Amazon S3
Key exam themes: Encryption types, presigned URLs, CORS, event notifications, storage classes, bucket policies.
What Is Amazon S3?
Amazon S3 (Simple Storage Service) is an object storage service offering virtually unlimited storage with 99.999999999% (11 nines) durability. Objects are stored in buckets and accessed via unique keys.
Analogy: S3 is like an infinite filing cabinet in the cloud. Each drawer is a bucket, each file inside is an object. You can organize files with "folders" (key prefixes), lock them with encryption, and set rules to automatically archive old files.
S3 vs Other Storage
| Feature | S3 (Object) | EBS (Block) | EFS (File) |
|---|---|---|---|
| Access | HTTP/HTTPS API | Attached to EC2 | NFS mount |
| Scalability | Unlimited | Provisioned size, resizable (up to 64 TiB per volume) | Grows and shrinks automatically |
| Sharing | Any number of clients | Single EC2 (or multi-attach io2) | Multiple EC2 |
| Use case | Static files, backups, data lake | Databases, OS volumes | Shared file system |
| Durability | 11 nines | 99.8% to 99.999% depending on volume type (replicated within one AZ) | 11 nines |
Storage Classes
| Class | Use Case | Min Duration | Retrieval | Availability |
|---|---|---|---|---|
| Standard | Frequently accessed | None | Instant | 99.99% |
| Standard-IA | Infrequent, fast retrieval needed | 30 days | Instant | 99.9% |
| One Zone-IA | Infrequent, recreatable data | 30 days | Instant | 99.5% |
| Glacier Instant | Archive, quarterly access | 90 days | Instant | 99.9% |
| Glacier Flexible | Archive, hours acceptable | 90 days | 1 minute to 12 hours | 99.99% |
| Glacier Deep Archive | Long-term (7-10yr) | 180 days | 12-48 hours | 99.99% |
| Intelligent-Tiering | Unknown access patterns | None | Instant | 99.9% |
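The Min Duration column has a direct cost effect: an object deleted before its minimum is still billed as if it had been stored for the full period. A minimal sketch of that arithmetic, keyed by the API storage-class identifiers; `MinDurationBilling` is a hypothetical name, and the sketch ignores per-GB prices and retrieval fees.

```java
import java.util.Map;

// Hypothetical helper: minimum storage durations in days, copied from the storage class table.
// Keys are the API storage-class identifiers (x-amz-storage-class values).
final class MinDurationBilling {
    static final Map<String, Integer> MIN_DAYS = Map.of(
            "STANDARD", 0,
            "STANDARD_IA", 30,
            "ONEZONE_IA", 30,
            "GLACIER_IR", 90,            // Glacier Instant Retrieval
            "GLACIER", 90,               // Glacier Flexible Retrieval
            "DEEP_ARCHIVE", 180,
            "INTELLIGENT_TIERING", 0);

    /** Days of storage you pay for if the object is deleted after storedDays. */
    static int billedDays(String storageClass, int storedDays) {
        Integer minDays = MIN_DAYS.get(storageClass);
        if (minDays == null) {
            throw new IllegalArgumentException("Unknown storage class: " + storageClass);
        }
        // Deleting before the minimum still bills the remaining days up to the minimum.
        return Math.max(storedDays, minDays);
    }
}
```

For example, a Standard-IA object deleted after 10 days is billed for 30 days: `billedDays("STANDARD_IA", 10)` returns 30.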
Glacier Flexible Retrieval Options
| Option | Speed | Cost |
|---|---|---|
| Expedited | 1-5 minutes | Highest |
| Standard | 3-5 hours | Medium |
| Bulk | 5-12 hours | Lowest |
- "Rarely accessed but needs millisecond retrieval" → Glacier Instant Retrieval
- "Unknown access pattern" → Intelligent-Tiering (auto-moves between tiers)
- "Can lose one AZ, infrequent access" → One Zone-IA (cheapest IA)
- "Legal compliance, 7+ year retention" → Glacier Deep Archive
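The Glacier Flexible Retrieval options table reduces to one rule: pick the cheapest tier whose worst-case time still meets your deadline. The sketch below encodes the upper bounds from that table (12 hours Bulk, 5 hours Standard, 5 minutes Expedited) as typical figures, not guarantees. `RetrievalTier` is a hypothetical enum that mirrors the `Tier` values (Bulk, Standard, Expedited) of a restore request.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical enum mirroring the Glacier Flexible Retrieval restore tiers.
// Declared cheapest first; durations are the table's upper bounds (typical, not an SLA).
enum RetrievalTier {
    BULK(Duration.ofHours(12)),
    STANDARD(Duration.ofHours(5)),
    EXPEDITED(Duration.ofMinutes(5));

    final Duration typicalMax;

    RetrievalTier(Duration typicalMax) {
        this.typicalMax = typicalMax;
    }

    /** Cheapest tier whose typical worst case fits the deadline, or empty if none does. */
    static Optional<RetrievalTier> cheapestWithin(Duration deadline) {
        for (RetrievalTier tier : values()) { // values() follows declaration order: cheapest first
            if (tier.typicalMax.compareTo(deadline) <= 0) {
                return Optional.of(tier);
            }
        }
        return Optional.empty();
    }
}
```

A restore needed by tomorrow maps to Bulk; one needed within the hour only fits Expedited.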
Versioning
- Enable per bucket → each object version gets a VersionId
- DELETE without specifying a version → adds a delete marker (old versions preserved)
- DELETE with a VersionId → permanently deletes that specific version
- Once enabled, versioning can be suspended but never fully disabled
- MFA Delete → requires MFA to permanently delete a version or change the bucket's versioning state
Versioning Behavior
PUT object.txt (v1) → { VersionId: "abc", Content: "Hello" }
PUT object.txt (v2) → { VersionId: "def", Content: "World" }
DELETE object.txt → { VersionId: "ghi", DeleteMarker: true }
GET object.txt → 404 (latest is delete marker)
GET object.txt?versionId=abc → "Hello" (still exists!)
DELETE object.txt?versionId=ghi → Removes delete marker, v2 is latest again
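The version-stack behavior in this listing can be modeled in a few lines. This is a hypothetical in-memory sketch for a single key, not an S3 client: real version IDs are opaque strings generated by S3 (random UUIDs stand in here), and a real GET by version ID on a delete marker returns an error rather than content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory model of one key in a versioning-enabled bucket.
final class VersionedKey {
    private static final class Version {
        final String id;
        final String content;        // null for a delete marker
        final boolean deleteMarker;

        Version(String id, String content, boolean deleteMarker) {
            this.id = id;
            this.content = content;
            this.deleteMarker = deleteMarker;
        }
    }

    private final List<Version> versions = new ArrayList<>(); // oldest first, latest last

    /** PUT: always adds a new version on top; older versions are kept. */
    String put(String content) {
        String id = UUID.randomUUID().toString();
        versions.add(new Version(id, content, false));
        return id;
    }

    /** DELETE without a version ID: adds a delete marker, nothing is removed. */
    String delete() {
        String id = UUID.randomUUID().toString();
        versions.add(new Version(id, null, true));
        return id;
    }

    /** DELETE with a version ID: permanently removes that version or delete marker. */
    void deleteVersion(String versionId) {
        versions.removeIf(v -> v.id.equals(versionId));
    }

    /** GET without a version ID: empty (HTTP 404) if no versions exist or the latest is a delete marker. */
    Optional<String> get() {
        if (versions.isEmpty()) {
            return Optional.empty();
        }
        Version latest = versions.get(versions.size() - 1);
        return latest.deleteMarker ? Optional.empty() : Optional.of(latest.content);
    }

    /** GET with a version ID: returns that version even when a delete marker is on top. */
    Optional<String> get(String versionId) {
        return versions.stream()
                .filter(v -> v.id.equals(versionId) && !v.deleteMarker)
                .map(v -> v.content)
                .findFirst();
    }
}
```

Replaying the listing (put "Hello", put "World", delete, then `deleteVersion` on the marker) leaves "World" as the latest version again.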
Encryption
| Type | Key Management | Who Manages? | Audit Trail |
|---|---|---|---|
| SSE-S3 | AWS-managed (AES-256) | AWS | No key-usage audit in CloudTrail |
| SSE-KMS | KMS key (customer managed or AWS managed) | You + KMS | Yes, KMS calls logged in CloudTrail |
| SSE-C | Customer-provided key | You (key sent with every request) | No (your responsibility) |
| Client-Side | Key never leaves client | You | No (your responsibility) |
SSE-KMS Considerations
PUT request → S3 → KMS:GenerateDataKey → object encrypted with the data key and stored
GET request → S3 → KMS:Decrypt (data key) → decrypted object returned
Each request counts toward KMS API quotas!
- Shared KMS request quota of 5,500, 10,000, or 50,000 requests/sec, depending on the Region
- High-throughput buckets with SSE-KMS may need to request a quota increase
- SSE-KMS → audit trail in CloudTrail + KMS quota limit
- SSE-C → you send the key with every request (HTTPS required!)
- SSE-S3 → default encryption, no extra cost, no KMS audit trail
- Bucket keys (with SSE-KMS) → reduce KMS API calls by up to 99%
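A rough capacity check for the quota note above, assuming Bucket Keys are off and each request makes exactly one KMS call (GenerateDataKey per PUT, Decrypt per GET). Multipart uploads, copies, and other KMS consumers in the same account and Region also draw on the quota and are ignored here. `KmsQuotaCheck` is a hypothetical name, and the quota is passed in because it differs by Region.

```java
// Hypothetical back-of-the-envelope check for SSE-KMS traffic without S3 Bucket Keys.
// Assumes one KMS call per request: GenerateDataKey for each PUT, Decrypt for each GET.
final class KmsQuotaCheck {
    static long kmsRequestsPerSecond(long putsPerSecond, long getsPerSecond) {
        return putsPerSecond + getsPerSecond;
    }

    /** True if the S3 traffic alone would exceed the KMS request quota. */
    static boolean wouldThrottle(long putsPerSecond, long getsPerSecond, long kmsQuotaPerSecond) {
        return kmsRequestsPerSecond(putsPerSecond, getsPerSecond) > kmsQuotaPerSecond;
    }
}
```

With a 5,500 req/s quota, 1,000 uploads plus 5,000 downloads per second (6,000 KMS calls) would throttle; enabling Bucket Keys or requesting a quota increase are the two levers.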
Force Encryption via Bucket Policy
{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-bucket/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "aws:kms"
}
}
}
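The Deny statement relies on a subtle rule of negated condition operators: when the condition key is missing from the request, `StringNotEquals` evaluates to true. The simplified model below shows the consequence, namely that uploads omitting the `x-amz-server-side-encryption` header are denied too, even if the bucket has default encryption. `ForceKmsCondition` is a hypothetical name, and the model covers only this one statement.

```java
// Hypothetical model of the single Deny statement above for one PutObject request.
// Real policy evaluation also considers every other applicable policy.
final class ForceKmsCondition {
    /**
     * @param sseHeader value of the x-amz-server-side-encryption header, or null if absent
     * @return true if the Deny statement matches (the upload is rejected)
     */
    static boolean denies(String sseHeader) {
        // StringNotEquals is a negated operator: a missing key makes the condition true.
        if (sseHeader == null) {
            return true;
        }
        // StringNotEquals is case-sensitive; only the exact value "aws:kms" escapes the Deny.
        return !sseHeader.equals("aws:kms");
    }
}
```

So `denies(null)` and `denies("AES256")` (the SSE-S3 header value) are both true, while `denies("aws:kms")` is false.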
Default Encryption
{
"ServerSideEncryptionConfiguration": {
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "aws:kms",
"KMSMasterKeyID": "arn:aws:kms:us-east-1:123:key/abc-def"
},
"BucketKeyEnabled": true
}]
}
}
Presigned URLs
Generate time-limited URLs that grant temporary access to private objects:
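// Imports (AWS SDK for Java 2.x, s3 module)
import java.net.URL;
import java.time.Duration;
import java.util.UUID;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.PresignedGetObjectRequest;
import software.amazon.awssdk.services.s3.presigner.model.PresignedPutObjectRequest;

// S3Presigner is AutoCloseable; signing happens locally, no request is sent to S3
try (S3Presigner presigner = S3Presigner.create()) {
    // Generate presigned GET URL (download), valid for 1 hour
    PresignedGetObjectRequest presigned = presigner.presignGetObject(b -> b
        .signatureDuration(Duration.ofHours(1))
        .getObjectRequest(r -> r.bucket("my-bucket").key("reports/2024-Q4.pdf")));
    URL downloadUrl = presigned.url();

    // Generate presigned PUT URL (upload directly from client browser), valid for 15 minutes.
    // The client must send the same Content-Type, since it is included in the signed headers.
    PresignedPutObjectRequest presignedPut = presigner.presignPutObject(b -> b
        .signatureDuration(Duration.ofMinutes(15))
        .putObjectRequest(r -> r
            .bucket("my-bucket")
            .key("uploads/" + UUID.randomUUID() + ".jpg")
            .contentType("image/jpeg")));
    URL uploadUrl = presignedPut.url();
}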
| Property | Details |
|---|---|
| Permissions | Inherits permissions of the signer (IAM user/role) |
| Expiry | Set at signing time; max 7 days (SigV4 with IAM user keys). URLs signed with temporary credentials (e.g. an assumed role) stop working when those credentials expire |
| Use case | Client-side uploads/downloads without exposing AWS credentials |
| Security | Permissions are evaluated when the URL is used: if the signer's permissions are revoked, the URL stops working |
"Allow browser to upload directly to S3 without going through your server" → Generate a presigned PUT URL on your backend, return it to the client.
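To make "inherits the signer's permissions" concrete: a presigned URL is the request's parameters plus an HMAC signature computed from the signer's secret key (AWS Signature Version 4, query-string form). The educational sketch below builds a presigned GET URL with only the JDK. It assumes a virtual-hosted-style host `<bucket>.s3.<region>.amazonaws.com`, long-term IAM user keys (no session token), and only the `host` header signed; `PresignSketch` is a hypothetical name, and real code should use `S3Presigner`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Educational sketch of SigV4 query-string signing for a presigned S3 GET URL.
// Assumptions: virtual-hosted-style host, long-term IAM user keys (no session token),
// only the "host" header signed. Use S3Presigner in real code.
final class PresignSketch {
    static final long MAX_EXPIRES_SECONDS = 7L * 24 * 60 * 60; // 604,800 s = 7 days

    static String presignGet(String accessKeyId, String secretKey, String region, String bucket,
                             String key, long expiresSeconds, ZonedDateTime now) {
        if (expiresSeconds < 1 || expiresSeconds > MAX_EXPIRES_SECONDS) {
            throw new IllegalArgumentException("expiry must be between 1 and 604800 seconds");
        }
        String amzDate = now.withZoneSameInstant(ZoneOffset.UTC)
                .format(DateTimeFormatter.ofPattern("yyyyMMdd'T'HHmmss'Z'"));
        String dateStamp = amzDate.substring(0, 8);
        String host = bucket + ".s3." + region + ".amazonaws.com";
        String scope = dateStamp + "/" + region + "/s3/aws4_request";

        // Canonical query string: parameters sorted by name, values percent-encoded.
        String query = "X-Amz-Algorithm=AWS4-HMAC-SHA256"
                + "&X-Amz-Credential=" + encode(accessKeyId + "/" + scope)
                + "&X-Amz-Date=" + amzDate
                + "&X-Amz-Expires=" + expiresSeconds
                + "&X-Amz-SignedHeaders=host";
        String path = "/" + encode(key).replace("%2F", "/"); // '/' in keys stays unencoded

        String canonicalRequest = "GET\n" + path + "\n" + query + "\n"
                + "host:" + host + "\n\n"   // canonical headers, then a blank line
                + "host\n"                   // signed header names
                + "UNSIGNED-PAYLOAD";        // presigned URLs do not sign the body
        String stringToSign = "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n"
                + hex(sha256(canonicalRequest));

        // Signing key: HMAC chain over date, Region, service, and the terminator string.
        byte[] kDate = hmac(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), dateStamp);
        byte[] kRegion = hmac(kDate, region);
        byte[] kService = hmac(kRegion, "s3");
        byte[] kSigning = hmac(kService, "aws4_request");
        String signature = hex(hmac(kSigning, stringToSign));

        return "https://" + host + path + "?" + query + "&X-Amz-Signature=" + signature;
    }

    // RFC 3986 encoding: URLEncoder does form encoding, so patch its differences.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20").replace("*", "%2A").replace("%7E", "~");
    }

    static byte[] hmac(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] sha256(String data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because only the signer's secret key can produce this signature, S3 evaluates the request as that signer, and anyone holding the URL acts with the signer's permissions until `X-Amz-Date` plus `X-Amz-Expires` has passed.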
Event Notifications
| Destination | Use Case | Setup Complexity |
|---|---|---|
| SNS | Fan-out to multiple subscribers | Low |
| SQS | Queue for async processing | Low |
| Lambda | Direct serverless processing | Low |
| EventBridge | Complex routing, filtering, replay | Medium |
Event Types
s3:ObjectCreated:* → PUT, POST, COPY, CompleteMultipartUpload
s3:ObjectRemoved:* → DELETE, DeleteMarkerCreated
s3:ObjectRestore:* → Glacier restore initiated/completed
s3:Replication:* → Replication failures and replication-time threshold events
s3:LifecycleExpiration:* → Object expired by lifecycle
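The wildcard forms above cover every event type that shares the prefix. A tiny hypothetical helper makes the rule explicit. Note that the `eventName` field inside a delivered notification record omits the `s3:` prefix (for example `ObjectCreated:Put`), so real code would normalize before comparing; both arguments here use the `s3:` configuration form.

```java
import java.util.List;

// Hypothetical helper showing how a configured event type such as "s3:ObjectCreated:*"
// covers specific event types. Both arguments use the "s3:" configuration form.
final class S3EventTypes {
    static boolean matches(String configured, String eventType) {
        if (configured.endsWith(":*")) {
            // Drop the trailing '*' and compare prefixes, e.g. "s3:ObjectCreated:"
            String prefix = configured.substring(0, configured.length() - 1);
            return eventType.startsWith(prefix);
        }
        return configured.equals(eventType);
    }

    static boolean anyMatch(List<String> configuredTypes, String eventType) {
        return configuredTypes.stream().anyMatch(c -> matches(c, eventType));
    }
}
```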
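EventBridge Integration
{ "EventBridgeConfiguration": {} }
Adding this empty element to the bucket's notification configuration sends all of the bucket's events to EventBridge, which provides filtering, multiple targets, archive & replay, and a schema registry → much more powerful than native S3 notifications.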
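CORS (Cross-Origin Resource Sharing)
When a page loaded from one origin (e.g. https://myapp.example.com) requests resources from an S3 bucket endpoint (a different origin), the browser only exposes the response if the bucket's CORS configuration allows that origin: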
<CORSConfiguration>
<CORSRule>
<AllowedOrigin>https://myapp.example.com</AllowedOrigin>
<AllowedMethod>GET</AllowedMethod>
<AllowedMethod>PUT</AllowedMethod>
<AllowedHeader>*</AllowedHeader>
<MaxAgeSeconds>3000</MaxAgeSeconds>
<ExposeHeader>x-amz-request-id</ExposeHeader>
</CORSRule>
</CORSConfiguration>
CORS is NOT a security control โ it only tells browsers whether to allow cross-origin responses. Direct API calls (curl, SDK) bypass CORS entirely. Use bucket policies and IAM for actual access control.
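How S3 applies a rule like the one above to a browser preflight can be sketched as three checks: the Origin must match an AllowedOrigin, the requested method must be an AllowedMethod, and every requested header must match an AllowedHeader. `CorsRule` is a hypothetical class; it does not model ExposeHeader, MaxAgeSeconds, or multiple rules (S3 uses the first rule that matches).

```java
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model of matching a browser preflight (OPTIONS) request
// against one S3 CORSRule.
final class CorsRule {
    final List<String> allowedOrigins;
    final List<String> allowedMethods;
    final List<String> allowedHeaders;

    CorsRule(List<String> allowedOrigins, List<String> allowedMethods, List<String> allowedHeaders) {
        this.allowedOrigins = allowedOrigins;
        this.allowedMethods = allowedMethods;
        this.allowedHeaders = allowedHeaders;
    }

    // AllowedOrigin and AllowedHeader values may contain at most one '*' wildcard.
    static boolean wildcardMatch(String pattern, String value) {
        int star = pattern.indexOf('*');
        if (star < 0) {
            return pattern.equals(value);
        }
        String prefix = pattern.substring(0, star);
        String suffix = pattern.substring(star + 1);
        return value.length() >= prefix.length() + suffix.length()
                && value.startsWith(prefix)
                && value.endsWith(suffix);
    }

    /**
     * @param origin         value of the Origin header
     * @param requestMethod  value of Access-Control-Request-Method
     * @param requestHeaders names listed in Access-Control-Request-Headers
     */
    boolean allowsPreflight(String origin, String requestMethod, List<String> requestHeaders) {
        boolean originOk = allowedOrigins.stream().anyMatch(p -> wildcardMatch(p, origin));
        boolean methodOk = allowedMethods.contains(requestMethod);
        // Every requested header must match some AllowedHeader; header names are case-insensitive.
        boolean headersOk = requestHeaders.stream().allMatch(h -> allowedHeaders.stream()
                .anyMatch(p -> wildcardMatch(p.toLowerCase(Locale.ROOT), h.toLowerCase(Locale.ROOT))));
        return originOk && methodOk && headersOk;
    }
}
```

Nothing here authenticates the caller: the outcome only decides which Access-Control-* headers S3 returns, which is why CORS is not an access control.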
Multipart Upload
| Property | Value |
|---|---|
| Recommended | Objects >100 MB |
| Required | Objects >5 GB |
| Max parts | 10,000 |
| Part size | 5 MiB to 5 GiB (the last part can be smaller) |
| Parallelism | Parts uploaded in parallel |
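// SDK v2 handles multipart automatically with S3TransferManager (s3-transfer-manager module)
import java.nio.file.Paths;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.CompletedFileUpload;
import software.amazon.awssdk.transfer.s3.model.FileUpload;
import software.amazon.awssdk.transfer.s3.model.UploadFileRequest;

try (S3TransferManager transferManager = S3TransferManager.create()) {
    FileUpload upload = transferManager.uploadFile(UploadFileRequest.builder()
        .putObjectRequest(PutObjectRequest.builder()
            .bucket("my-bucket")
            .key("large-file.zip")
            .build())
        .source(Paths.get("/path/to/large-file.zip"))
        .build());
    // Block until all parts are uploaded and the multipart upload is completed
    CompletedFileUpload result = upload.completionFuture().join();
}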
Always create a lifecycle rule to abort incomplete multipart uploads after N days → orphaned parts incur storage costs!
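S3TransferManager chooses part sizes for you, but the limits in the table constrain any choice. The arithmetic behind it, as a hypothetical `PartSizer` sketch (not the SDK's actual algorithm):

```java
// Hypothetical helper: pick a part size that respects the multipart limits in the table
// (5 MiB minimum part, 5 GiB maximum part, at most 10,000 parts).
final class PartSizer {
    static final long MIN_PART = 5L * 1024 * 1024;        // 5 MiB
    static final long MAX_PART = 5L * 1024 * 1024 * 1024; // 5 GiB
    static final long MAX_PARTS = 10_000;

    /** Smallest part size (at least 5 MiB) that fits the object into at most 10,000 parts. */
    static long partSize(long objectSize) {
        long needed = (objectSize + MAX_PARTS - 1) / MAX_PARTS; // ceil(objectSize / 10,000)
        long size = Math.max(MIN_PART, needed);
        if (size > MAX_PART) {
            throw new IllegalArgumentException("object too large for 10,000 parts of 5 GiB");
        }
        return size;
    }

    /** Number of parts; only the last part may be smaller than the part size. */
    static long partCount(long objectSize) {
        long size = partSize(objectSize);
        return (objectSize + size - 1) / size; // ceiling division
    }
}
```

For a 1 GiB file this gives 205 parts of 5 MiB, the last one being 4 MiB; a 100 GiB file needs parts of roughly 10.2 MiB to stay within 10,000 parts.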
S3 Access Points
Simplify bucket policies for large teams:
Bucket "data-lake"
├── Access Point "finance-ap" → /finance/* (finance team only)
├── Access Point "analytics-ap" → /analytics/* (data scientists)
└── Access Point "public-ap" → /public/* (read-only, anyone)
- Each access point has its own DNS name and its own access point policy (a resource policy scoped to that access point)
- Can restrict to a specific VPC (VPC-only access point)
- Simplifies managing complex bucket policies with many principals
Bucket Policies vs IAM Policies
| Feature | Bucket Policy | IAM Policy |
|---|---|---|
| Attached to | S3 bucket | IAM user/role/group |
| Scope | Cross-account, anonymous | Same account only |
| Use case | Public access, cross-account | User-level permissions |
| Deny | Can explicitly deny any principal | Applies to attached principal |
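The Scope row is the key exam point: within one account, an Allow in either the IAM policy or the bucket policy is enough, while cross-account access needs an Allow on both sides, and an explicit Deny anywhere wins. A deliberately simplified sketch of that decision; `AccessCheck` is hypothetical and ignores SCPs, permission boundaries, session policies, ACLs, Block Public Access, and KMS key policies.

```java
// Deliberately simplified model of S3 authorization for one request, considering only
// the caller's IAM policy and the bucket policy (hypothetical, for illustration).
final class AccessCheck {
    enum Effect { ALLOW, DENY, NO_MATCH }

    static boolean isAllowed(Effect iamPolicy, Effect bucketPolicy, boolean sameAccount) {
        // An explicit Deny in either policy always wins.
        if (iamPolicy == Effect.DENY || bucketPolicy == Effect.DENY) {
            return false;
        }
        if (sameAccount) {
            // Same account: an Allow in either policy is sufficient.
            return iamPolicy == Effect.ALLOW || bucketPolicy == Effect.ALLOW;
        }
        // Cross-account: the caller's IAM policy AND the bucket policy must both allow.
        return iamPolicy == Effect.ALLOW && bucketPolicy == Effect.ALLOW;
    }
}
```

This is why the cross-account bucket policy pattern in the next section only works if the other account also grants its users `s3:GetObject`/`s3:PutObject` in their IAM policies.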
Common Bucket Policy Patterns
// Force HTTPS only
{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": ["arn:aws:s3:::bucket/*", "arn:aws:s3:::bucket"],
"Condition": { "Bool": { "aws:SecureTransport": "false" } }
}
// Cross-account access
{
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::987654321098:root" },
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::my-bucket/*"
}
Best Practices
Security
- Block all public access by default → enable only when explicitly needed
- Use SSE-KMS with bucket keys for audit + cost optimization
- Enable versioning + MFA Delete for critical data
- Force HTTPS via bucket policy condition
Cost
- Use Lifecycle Rules to transition to cheaper tiers automatically
- Abort incomplete multipart uploads via lifecycle rule
- Use Intelligent-Tiering when access patterns are unknown
- S3 Select → filter data server-side instead of downloading entire objects
Performance
- Multipart upload for files >100 MB → parallel parts
- Transfer Acceleration for distant clients (uses CloudFront edge locations)
- Prefix partitioning → distribute objects across prefixes for high request rates
- S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix → use multiple prefixes to scale further
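The per-prefix figures turn into simple sizing arithmetic. The sketch below estimates how many prefixes a target request rate needs and shows one hypothetical key layout that spreads keys across them with a hash-derived prefix. `PrefixPlanner` and the `NN/key` layout are assumptions for illustration; readers must compute the same prefix to find an object again.

```java
// Hypothetical sketch of the per-prefix arithmetic above plus one hash-based key layout.
final class PrefixPlanner {
    static final int GET_PER_PREFIX = 5_500; // GET/HEAD requests per second per prefix
    static final int PUT_PER_PREFIX = 3_500; // PUT/COPY/POST/DELETE requests per second per prefix

    /** Prefixes needed so neither the read nor the write rate exceeds the per-prefix figure. */
    static int prefixesNeeded(int getsPerSecond, int putsPerSecond) {
        int forGets = (getsPerSecond + GET_PER_PREFIX - 1) / GET_PER_PREFIX; // ceiling division
        int forPuts = (putsPerSecond + PUT_PER_PREFIX - 1) / PUT_PER_PREFIX;
        return Math.max(1, Math.max(forGets, forPuts));
    }

    /** Hypothetical layout "<2-digit shard>/<key>", shard derived from the key's hash (shards <= 100). */
    static String shardedKey(String key, int shards) {
        int shard = Math.floorMod(key.hashCode(), shards); // deterministic, so readers can recompute it
        return String.format("%02d/%s", shard, key);
    }
}
```

For example, 20,000 GET/s needs `prefixesNeeded(20_000, 0)` = 4 prefixes, and 7,000 PUT/s needs 2.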
DVA-C02 Exam Tips
- SSE-KMS = CloudTrail audit trail but KMS quota limits
- SSE-C = you provide key with every request, HTTPS mandatory
- Presigned URL = temporary access inheriting signer's permissions
- CORS = browser-only, not a security mechanism
- Versioning DELETE = adds delete marker (object not actually deleted)
- Multipart = required >5GB, recommended >100MB
- S3 Event → EventBridge gives more filtering than native notifications
- Lifecycle rule = auto-transition storage class + abort multipart
- Bucket keys with SSE-KMS reduce KMS API calls by up to 99%
- S3 Access Points simplify complex multi-team bucket policies
Practice Questions
Q1. Allow client browser to directly upload to S3 without going through your server. Best approach?
A) API Gateway proxy to S3
B) Presigned PUT URL
C) Make bucket public
D) Transfer Acceleration
Answer & Explanation
B: A presigned PUT URL lets the client upload directly with time-limited access, without needing AWS credentials. No server sits in the upload path.
Q2. All objects must use customer-managed KMS key with audit trail. Which encryption?
A) SSE-S3
B) SSE-KMS
C) SSE-C
D) Client-Side
Answer & Explanation
B: SSE-KMS uses a customer managed key in KMS, and every KMS call S3 makes on your behalf (GenerateDataKey, Decrypt) is logged in CloudTrail.
Q3. Versioning enabled. User deletes a file. What happens?
A) Permanently deleted
B) All versions deleted
C) Delete marker added; previous versions preserved
D) Moved to Glacier
Answer & Explanation
C: DELETE without a VersionId adds a delete marker. All previous versions remain intact.
Q4. High-throughput application using SSE-KMS encryption starts getting ThrottlingException. What should you do?
A) Switch to SSE-S3
B) Request KMS quota increase
C) Enable S3 Bucket Keys
D) Both B and C
Answer & Explanation
D: S3 Bucket Keys reduce KMS API calls by up to 99% (S3 uses a bucket-level key from KMS to generate per-object data keys). Also request a KMS quota increase if needed. Switching to SSE-S3 (A) would lose the KMS audit trail.
Q5. A React app on app.example.com fetches images from S3. Requests fail with CORS error. What to configure?
A) IAM policy on the React app
B) S3 bucket policy allowing the domain
C) S3 CORS configuration allowing app.example.com
D) CloudFront distribution
Answer & Explanation
C: CORS is a browser mechanism. Configure S3 CORS rules to allow the origin domain. Bucket policies control access, not CORS headers.