
S3 Advanced


Replication

Cross-Region Replication (CRR) vs Same-Region Replication (SRR)

Feature          | CRR                                                     | SRR
Regions          | Source → different region                               | Source → same region
Use case         | DR, latency, compliance                                 | Aggregate logs, dev/prod copies
Versioning       | Required on both buckets                                | Required on both buckets
Existing objects | ❌ Not replicated by default (use S3 Batch Replication) | ❌ Same
Delete markers   | Not replicated by default (opt-in)                      | Not replicated by default (opt-in)
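Caution

Replication does not replicate delete markers by default, which guards against accidental cross-region deletes. Enable Delete Marker Replication explicitly if you want them copied. Permanent deletions of a specific object version (a delete that names a version ID) are never replicated.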
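The default behavior can be pictured with a toy model of two versioned buckets. This is not the S3 API: the class, its fields, and the `replicateDeleteMarkers` flag are hypothetical stand-ins for a replication rule with Delete Marker Replication disabled or enabled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: each bucket maps a key to its version stack (newest first).
final class ReplicationModel {
    static final String DELETE_MARKER = "<delete-marker>";

    final Map<String, Deque<String>> source = new HashMap<>();
    final Map<String, Deque<String>> destination = new HashMap<>();
    final boolean replicateDeleteMarkers;

    ReplicationModel(boolean replicateDeleteMarkers) {
        this.replicateDeleteMarkers = replicateDeleteMarkers;
    }

    /** New object versions are always copied to the destination. */
    void put(String key, String body) {
        push(source, key, body);
        push(destination, key, body);
    }

    /** A delete without a version ID adds a delete marker on the source. */
    void delete(String key) {
        push(source, key, DELETE_MARKER);
        if (replicateDeleteMarkers) {
            push(destination, key, DELETE_MARKER);
        }
    }

    /** An object is visible when its newest version is not a delete marker. */
    static boolean visible(Map<String, Deque<String>> bucket, String key) {
        Deque<String> versions = bucket.get(key);
        return versions != null && !DELETE_MARKER.equals(versions.peekFirst());
    }

    private static void push(Map<String, Deque<String>> bucket, String key, String version) {
        bucket.computeIfAbsent(key, k -> new ArrayDeque<>()).addFirst(version);
    }
}
```

With the flag set to `false`, a `put` followed by a `delete` leaves the key hidden in the source but still visible in the destination, which is the situation the caution describes.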


S3 Transfer Acceleration

Client → CloudFront Edge Location → AWS Backbone → S3 Bucket
  • Speeds up uploads from distant clients
  • Uses CloudFront edge network as an entry point
  • Extra cost per GB transferred
  • Separate endpoint: bucket.s3-accelerate.amazonaws.com
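As a small illustration of the separate endpoint, the sketch below derives the accelerate hostname from a bucket name. The class and method names are hypothetical. The period check reflects the requirement that buckets used with Transfer Acceleration have DNS-compliant names without periods.

```java
// Hypothetical helper, not part of the AWS SDK.
final class AccelerateEndpoint {
    private static final String SUFFIX = ".s3-accelerate.amazonaws.com";

    /** Returns the Transfer Acceleration hostname for the given bucket. */
    static String hostFor(String bucket) {
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("bucket name is required");
        }
        // Transfer Acceleration does not support bucket names containing periods.
        if (bucket.contains(".")) {
            throw new IllegalArgumentException("bucket name must not contain '.': " + bucket);
        }
        return bucket + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(hostFor("my-data-lake"));
    }
}
```

In practice you would rarely build this hostname by hand, since acceleration must also be enabled on the bucket itself and the AWS SDKs offer a client option for using the accelerate endpoint.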

S3 Select & Glacier Select

Query data inside S3 objects without downloading the whole file:

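// AWS SDK for Java 2.x model classes
import software.amazon.awssdk.services.s3.model.CSVInput;
import software.amazon.awssdk.services.s3.model.CSVOutput;
import software.amazon.awssdk.services.s3.model.CompressionType;
import software.amazon.awssdk.services.s3.model.ExpressionType;
import software.amazon.awssdk.services.s3.model.FileHeaderInfo;
import software.amazon.awssdk.services.s3.model.InputSerialization;
import software.amazon.awssdk.services.s3.model.OutputSerialization;
import software.amazon.awssdk.services.s3.model.SelectObjectContentRequest;

// Select only the failed orders; the CSV header row supplies the column names.
SelectObjectContentRequest request = SelectObjectContentRequest.builder()
        .bucket("my-data-lake")
        .key("orders/2024-01.csv")
        .expressionType(ExpressionType.SQL)
        .expression("SELECT * FROM S3Object s WHERE s.status = 'FAILED'")
        .inputSerialization(InputSerialization.builder()
                .csv(CSVInput.builder().fileHeaderInfo(FileHeaderInfo.USE).build())
                .compressionType(CompressionType.NONE)
                .build())
        .outputSerialization(OutputSerialization.builder()
                .csv(CSVOutput.builder().build())
                .build())
        .build();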

Because the filter runs inside S3 and only the matching rows are returned, this reduces data transfer and client-side processing cost.
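To make the semantics of the expression concrete, here is a minimal local sketch of the filtering that S3 Select performs server-side: treat the first CSV line as a header (the `FileHeaderInfo.USE` behavior), then keep rows whose `status` column equals `FAILED`. It is an illustration only, not how S3 implements the feature. The class name, the sample data, and the naive comma splitting (no quoted fields) are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local illustration of "SELECT * FROM S3Object s WHERE s.<column> = '<wanted>'".
final class SelectSketch {
    /** Returns the data rows whose value in the named column equals wanted. */
    static List<String> filter(List<String> csvLines, String column, String wanted) {
        // The first line is the header; it supplies the column names.
        List<String> header = Arrays.asList(csvLines.get(0).split(",", -1));
        int idx = header.indexOf(column);
        if (idx < 0) {
            throw new IllegalArgumentException("no such column: " + column);
        }
        List<String> matches = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] fields = line.split(",", -1);
            if (idx < fields.length && fields[idx].equals(wanted)) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample rows for demonstration.
        List<String> csv = List.of(
                "order_id,status,amount",
                "1001,OK,25.00",
                "1002,FAILED,13.50",
                "1003,FAILED,99.99");
        System.out.println(filter(csv, "status", "FAILED"));
    }
}
```

The difference with S3 Select is where this loop runs: the client here must already hold every line, whereas S3 Select sends back only the rows that match.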


Object Lambda

Transform S3 objects on the fly during a GET request:

Client GET request
↓
S3 Object Lambda Access Point
↓
Lambda function (transform: redact PII, resize image, format conversion)
↓
Transformed response to client

Use cases: redact SSN/email from CSV, resize images, add watermarks.
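The transformation step itself is ordinary code. The sketch below shows only the redaction logic such a Lambda function might apply to CSV text; it leaves out the Lambda handler and the `WriteGetObjectResponse` call that hands the result back to S3. The class name and the regular expressions are illustrative assumptions and would not catch every SSN or email format.

```java
import java.util.regex.Pattern;

// Hypothetical redaction helper for an Object Lambda style transform.
final class PiiRedactor {
    // US SSN written as 123-45-6789 (simplified, illustrative pattern).
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    // Deliberately loose email pattern; real-world matching is more involved.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Masks SSNs and email addresses found anywhere in the text. */
    static String redact(String text) {
        String withoutSsn = SSN.matcher(text).replaceAll("***-**-****");
        return EMAIL.matcher(withoutSsn).replaceAll("[redacted-email]");
    }

    public static void main(String[] args) {
        // Made-up sample row for demonstration.
        System.out.println(redact("jane,123-45-6789,jane@example.com"));
    }
}
```

Keeping the transform as a pure function of the object's text makes it easy to unit test without deploying anything.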


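MFA Delete

  • Requires MFA to permanently delete an object version or to change the bucket's versioning state (for example, suspending versioning)
  • Only the bucket owner (root account) can enable MFA Delete
  • Cannot be enabled from the console; use the AWS CLI, an SDK, or the REST API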

Lifecycle Rules

{
  "Rules": [{
    "ID": "ArchiveOldLogs",
    "Status": "Enabled",
    "Filter": { "Prefix": "logs/" },
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER_IR" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 },
    "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
  }]
}
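Read as a timeline, this rule maps an object's age to a storage class, ending with expiration after 2555 days (about seven years). The sketch below encodes that mapping for the `logs/` rule. The enum and method names are hypothetical, it assumes objects are uploaded to S3 Standard, and it ignores the fact that lifecycle actions run asynchronously, so a real object may transition somewhat after it becomes eligible.

```java
// Hypothetical model of the ArchiveOldLogs rule; thresholds mirror the JSON.
final class LifecycleSketch {
    enum State { STANDARD, STANDARD_IA, GLACIER_IR, DEEP_ARCHIVE, EXPIRED }

    /** Returns where an object of the given age (in days) would be under the rule. */
    static State stateAt(int ageDays) {
        if (ageDays < 0) {
            throw new IllegalArgumentException("age must be non-negative");
        }
        if (ageDays >= 2555) return State.EXPIRED;       // "Expiration": { "Days": 2555 }
        if (ageDays >= 365) return State.DEEP_ARCHIVE;   // third transition
        if (ageDays >= 90) return State.GLACIER_IR;      // second transition
        if (ageDays >= 30) return State.STANDARD_IA;     // first transition
        return State.STANDARD;                           // assumed upload class
    }

    public static void main(String[] args) {
        for (int days : new int[] {0, 30, 90, 365, 2555}) {
            System.out.println(days + " days: " + stateAt(days));
        }
    }
}
```

The `AbortIncompleteMultipartUpload` setting is separate from this timeline: it cleans up multipart uploads that were started but not completed within 7 days.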

S3 Object Ownership & ACLs

  • Bucket owner enforced (recommended): ACLs disabled, bucket owner owns all objects
  • ACLs are legacy; use bucket policies and IAM instead

🧪 Practice Questions

Q1. A company replicates S3 objects from us-east-1 to eu-west-1 for compliance. A user deletes an object in us-east-1. Will the object be deleted in eu-west-1?

A) Yes, deletions are always replicated
B) No, delete markers are not replicated by default
C) Yes, if the bucket policy allows cross-region deletes
D) No, only new objects are replicated, not modifications

✅ Answer & Explanation

B. By default, S3 replication does not replicate delete markers, so the object remains readable in eu-west-1. This protects against accidental or malicious deletes propagating across regions. To change this, explicitly enable Delete Marker Replication.


Q2. A developer has a 10 GB CSV file in S3. They only need rows where status = 'ERROR'. What is the MOST cost-effective approach?

A) Download the file and filter locally
B) Use Lambda to stream and filter the file
C) Use S3 Select with a SQL expression
D) Use Athena to query the file

✅ Answer & Explanation

C. S3 Select executes the filter server-side and returns only the matching rows. You pay for the data scanned and the data returned rather than for downloading the full file. Athena is better suited to complex analytics; S3 Select is simpler for single-object queries.


🔗 Resources