SAA-C02 Storage Services

Different storage solutions suit different types of workloads and migrations.
On prem you use different storage types, and the same applies in the cloud. They differ in cost, capacity, durability, read/write speed, auditing, backup, etc.
Not all data is treated the same.
Block: low latency, like directly attached storage.
File: separate files in directories, within a file system. Like NAS.
Object: referenced by key, in a flat address space.

EBS (Elastic Block Store)

Block-level storage for EC2, e.g. a database disk.
Persistent and durable.
Flexible.
Good for rapidly changing data.
Attached to one EC2 instance at a time.
Separate service, so independent of the EC2 instance's lifecycle.
EBS snapshots are stored on S3, taken manually or automatically. Snapshots are incremental. If you lose an EBS volume, you can restore it from a snapshot.
Writes to an EBS volume are replicated within its own AZ by default. To use the data in a different AZ, create a new volume there from a snapshot.
IOPS: input/output operations per second. Performance is also measured as throughput (MB/s).
Can be created as part of EC2 setup or separately.
Elastically scalable, so volumes can be resized with Modify Volume.
Not good for: temporary storage, multi-instance access, very high durability / availability.
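A toy Python model of why snapshots are incremental, assuming a volume is just a dict of block index to block contents (blocks are never freed, and the function names are hypothetical, not an AWS API). Each snapshot keeps only the blocks that changed since the previous one, and a restore replays the snapshots in order.

```python
from typing import Dict, List

Volume = Dict[int, bytes]  # block index -> block contents (toy model)


def take_snapshots(states: List[Volume]) -> List[Volume]:
    """For each successive volume state, keep only blocks changed since the last snapshot."""
    snapshots: List[Volume] = []
    previous: Volume = {}
    for state in states:
        delta = {i: data for i, data in state.items() if previous.get(i) != data}
        snapshots.append(delta)
        previous = dict(state)
    return snapshots


def restore(snapshots: List[Volume], upto: int) -> Volume:
    """Rebuild the volume as of snapshot `upto` by replaying each delta in order."""
    volume: Volume = {}
    for delta in snapshots[: upto + 1]:
        volume.update(delta)
    return volume


states = [
    {0: b"boot", 1: b"data-v1"},
    {0: b"boot", 1: b"data-v2"},             # only block 1 changed
    {0: b"boot", 1: b"data-v2", 2: b"log"},  # only block 2 is new
]
snaps = take_snapshots(states)
print([sorted(s) for s in snaps])  # [[0, 1], [1], [2]]
```

Later snapshots are small because they only hold changed blocks, yet any snapshot can still rebuild the full volume.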

EBS Disk volume types

HDD is for high throughput: big data, logging, large blocks. Can't be used as a boot volume.
SSD is good for small blocks, database transactions, and boot volumes.

SSD gp2: general purpose, fast. Bursts to 3,000 IOPS; 128-250 MB/s depending on volume size.
SSD io1: provisioned IOPS. Predictable performance: you specify the IOPS rate. Max 64,000 IOPS per volume. Most expensive.
HDD sc1: cold HDD. Good for large, infrequently accessed workloads needing high throughput. Up to 250 MB/s. Cheapest.
HDD st1: throughput optimised. Good for frequently accessed large datasets. Up to 500 MB/s.
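The gp2 IOPS figures follow a simple formula. A sketch, assuming the gp2 rules in the AWS EBS documentation (baseline of 3 IOPS per GiB, minimum 100 and maximum 16,000 IOPS, and bursting up to 3,000 IOPS when the baseline is lower); check current documentation before relying on the numbers. `gp2_iops` is a hypothetical helper.

```python
from typing import Tuple


def gp2_iops(size_gib: int) -> Tuple[int, int]:
    """Return (baseline IOPS, peak IOPS) for a gp2 volume of the given size."""
    if size_gib <= 0:
        raise ValueError("volume size must be positive")
    baseline = min(max(3 * size_gib, 100), 16_000)  # 3 IOPS/GiB, clamped to [100, 16000]
    peak = max(baseline, 3_000)  # smaller volumes can burst to 3,000 IOPS
    return baseline, peak


for size in (20, 100, 1_000, 6_000):
    print(size, gp2_iops(size))
```

This is why a small gp2 volume can look fast in short tests but slow down once its burst credits run out.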

EBS Encryption

EBS offers encryption at rest and in transit (between the instance and the volume).
Just choose whether to encrypt the volume when you create it.
Uses AES-256, with keys managed by KMS.
Snapshots of encrypted volumes are also encrypted.

EBS Pricing

Charged per GB-month for provisioned capacity, even if you don't use all of it.
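A worked sketch of the billing rule, with a made-up price (real prices vary by region and volume type). The charge depends only on the provisioned size and how long the volume exists, not on how much data is on it. The proration by hours and the 730-hour month are assumptions for illustration.

```python
def ebs_storage_cost(provisioned_gib: float, price_per_gib_month: float,
                     hours_provisioned: float, hours_in_month: float = 730.0) -> float:
    """Cost of one volume: provisioned size x price, prorated by time provisioned.
    How much of the volume is actually used does not appear in the formula."""
    return provisioned_gib * price_per_gib_month * hours_provisioned / hours_in_month


# Hypothetical price of 0.10 per GiB-month: a 500 GiB volume kept for a full month
# costs the same whether it holds 1 GiB or 500 GiB of data.
print(round(ebs_storage_cost(500, 0.10, 730), 2))  # 50.0
```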

AWS EC2 instance store (Instance store volume)

Stores data locally, on disks attached to the EC2 instance's host.
Temporary storage only: data is lost if the instance is stopped or terminated, or if the underlying hardware fails. Data survives a reboot.
E.g. if the EC2 instance fails, all data is lost.

Benefits:

Storage is included in the EC2 instance cost.
I/O is very fast compared to EBS (up to 3.3 million IOPS on some instance types).
Ideal as a cache or buffer.
Often used by a group of load-balanced instances.

Features:

Not available on all instance types.
Capacity grows with EC2 instance size.
Same security as the EC2 instance.
Not good for: shared data or persistent data. Use EBS instead.

S3 (Simple Storage Service)

Most heavily used AWS storage service.
Object based.
Unlimited total storage, but individual objects are limited to 5 TB.
Object storage, so no real directory hierarchy: each object is reached by a URL.
Regional service: each bucket lives in one region.

Definitions

Availability: designed uptime ranges from 99.5% to 99.99%, depending on the storage class.
Durability: probability that data is not lost (e.g. to corruption). All S3 storage classes are designed for 11 9s (99.999999999%).

Features

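Bucket: container for the data. Bucket names are globally unique.
Flat address space, so every key in a bucket is unique.
An account can have up to 100 buckets by default; this limit can be increased.
You can create folders in a bucket, but a folder is really just part of the object's key (a prefix).
An object is identified by its bucket and key; folder names are included in the key.
Each object is accessed by a URL built from the bucket and key.
With versioning, a change creates a new version of the object (same key, new version ID).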
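A small sketch of how a key maps to a URL, assuming the virtual-hosted-style endpoint format `https://<bucket>.s3.<region>.amazonaws.com/<key>`; the bucket name is a placeholder and both helpers are hypothetical. It also shows that a "folder" is only the part of the key before the last slash.

```python
from urllib.parse import quote


def object_url(bucket: str, region: str, key: str) -> str:
    """Virtual-hosted-style URL; '/' in the key is kept, other special characters are percent-encoded."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


def folder_of(key: str) -> str:
    """The console's 'folder' is just the key prefix up to the last '/'."""
    return key.rsplit("/", 1)[0] + "/" if "/" in key else ""


key = "photos/2021/cat.jpg"
print(object_url("example-bucket", "eu-west-1", key))
print(folder_of(key))  # photos/2021/
```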

S3 storage classes

Lifecycle rules can move or delete data to save storage costs.

S3 Standard: good for high throughput, low latency, frequent access. Durable, available, supports lifecycle rules.
S3 Intelligent-Tiering: moves objects to an infrequent access tier after 30 consecutive days without access, and back to the frequent access tier when they are accessed. Good for unknown or changing access patterns.
S3 Standard-IA: standard infrequent access. Just the infrequent tier. Availability is a bit lower.
S3 One Zone-IA: one-zone infrequent access. Lower availability again, as data is stored in only one AZ. 20% cheaper than Standard-IA.
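A simplified Python model of the Intelligent-Tiering rule above, assuming a single 30-day threshold and day numbers counted from upload. The real service has more tiers and its own monitoring, so treat the hypothetical `tier_on_day` as illustrative only.

```python
from typing import Iterable


def tier_on_day(access_days: Iterable[int], day: int, threshold: int = 30) -> str:
    """Tier of an object on `day`, given the days it was accessed (day 0 = upload)."""
    last_access = max((d for d in access_days if d <= day), default=0)
    # threshold consecutive days without access -> infrequent tier; any access moves it back
    return "infrequent" if day - last_access >= threshold else "frequent"


print(tier_on_day([], 10))    # frequent
print(tier_on_day([], 45))    # infrequent
print(tier_on_day([45], 50))  # frequent (accessed on day 45)
```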

S3 Glacier

Archival data; works with lifecycle rules. Much cheaper than S3 Standard, but access is not instant: retrieval can take several hours. Low-cost, long-term cold storage for backup and archiving. Glacier vaults are regional; a vault is just a container for archives.
Same durability and availability as S3 Standard.
Vaults and archives, not buckets and folders.
Vaults live in a region; archives live in a vault.
Limited GUI: the Glacier dashboard in the console manages vaults, but archives are not uploaded through it.
Moving data: create a vault, then move data using the APIs or SDKs. Or use S3 lifecycle rules.

Glacier retrieval

Expedited: archives under 250 MB, 1-5 minutes.
Standard: any size, 3-5 hours.
Bulk: large amounts of data, 5-12 hours. Cheapest.
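Choosing a retrieval option is a trade-off between size, deadline and cost. A sketch using the worst-case times and the 250 MB Expedited limit from the notes above, ordered cheapest first; the helper name and the exact thresholds are assumptions for illustration.

```python
from typing import Optional

# (option, worst-case minutes, max size in MB or None), cheapest first
RETRIEVAL_OPTIONS = [
    ("Bulk", 12 * 60, None),
    ("Standard", 5 * 60, None),
    ("Expedited", 5, 250),
]


def choose_retrieval(size_mb: float, deadline_minutes: float) -> Optional[str]:
    """Cheapest option that can return `size_mb` within the deadline, or None."""
    for name, worst_minutes, max_mb in RETRIEVAL_OPTIONS:
        if max_mb is not None and size_mb > max_mb:
            continue  # Expedited only suits small archives
        if worst_minutes <= deadline_minutes:
            return name
    return None


print(choose_retrieval(100, 10))          # Expedited
print(choose_retrieval(10_000, 6 * 60))   # Standard
print(choose_retrieval(10_000, 24 * 60))  # Bulk
```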

S3 Glacier Deep Archive

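Lowest-cost storage class, for long-term retention of rarely accessed data, e.g. financial records kept for years. No expedited option: retrieval is Standard (within 12 hours) or Bulk (within 48 hours), so it is slow.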

S3 storage classes summary

Choose based on durability, access times, etc.

S3 Bucket properties and management features

S3 Versioning

Keeps multiple versions of the same object.
Managed automatically when you update an object in a bucket with versioning enabled.
Only the latest version is shown in the console by default.
Once enabled, versioning can't be disabled, only suspended (existing versions are kept).
A bucket's versioning status is unversioned (the default), versioning-enabled, or versioning-suspended.
Takes more space, so costs more.
Enable it when creating the bucket, or later by updating the bucket properties.
In the console, switch on the versions view to see every version saved since versioning was enabled.
Each save creates a new version ID.
Deleting an object without specifying a version adds a delete marker, which becomes the current version.
To delete permanently, delete each version by its version ID (e.g. with the AWS SDK). Deleting the delete marker's version ID restores the previous version instead.
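An in-memory Python model of the versioning rules above: delete markers, permanent deletes by version ID, and undeleting by removing the marker. `VersionedBucket` is a hypothetical class for illustration, not the AWS SDK, and the version IDs are simplified to `v1`, `v2`, and so on.

```python
import itertools
from typing import Dict, List, Optional, Tuple

# Each version is (version_id, data); data is None for a delete marker.
Version = Tuple[str, Optional[bytes]]


class VersionedBucket:
    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._ids = itertools.count(1)

    def _new_id(self) -> str:
        return f"v{next(self._ids)}"

    def put(self, key: str, data: bytes) -> str:
        """Every save adds a new version with a new version ID."""
        vid = self._new_id()
        self._versions.setdefault(key, []).append((vid, data))
        return vid

    def delete(self, key: str, version_id: Optional[str] = None) -> Optional[str]:
        """Without a version ID: add a delete marker (it becomes the current version).
        With a version ID: permanently remove that version (or marker)."""
        if version_id is None:
            vid = self._new_id()
            self._versions.setdefault(key, []).append((vid, None))
            return vid
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return None

    def get(self, key: str) -> Optional[bytes]:
        """Current version's data, or None if there is none or it is a delete marker."""
        versions = self._versions.get(key, [])
        return versions[-1][1] if versions else None

    def list_versions(self, key: str) -> List[str]:
        return [vid for vid, _ in self._versions.get(key, [])]


bucket = VersionedBucket()
v1 = bucket.put("report.txt", b"draft")
v2 = bucket.put("report.txt", b"final")
marker = bucket.delete("report.txt")  # object now "deleted", but versions remain
print(bucket.get("report.txt"), bucket.list_versions("report.txt"))  # None ['v1', 'v2', 'v3']
bucket.delete("report.txt", marker)   # removing the marker restores 'final'
print(bucket.get("report.txt"))       # b'final'
```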

S3 Server access logging

Captures details of requests made to objects in the bucket.
Log delivery is best effort, not guaranteed.
Specify a target bucket for the logs (in the same region) and a file prefix.
The S3 Log Delivery group needs write access to the target bucket.
Log records include bucket owner, time, IP address, request ID, operation, key, etc.
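A sketch of reading the leading fields of one access log record, assuming the documented field order (bucket owner, bucket, [time], remote IP, requester, request ID, operation, key, ...). The sample record and the helper name are made up; in real logs the key field is URL-encoded and is `-` for bucket-level operations.

```python
import re
from typing import Dict, Optional

# First eight space-separated fields of an S3 server access log record;
# the time field is wrapped in [brackets] and contains a space.
LOG_PATTERN = re.compile(
    r"^(?P<bucket_owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] "
    r"(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) "
    r"(?P<operation>\S+) (?P<key>\S+)"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the leading fields of one log record, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up record for illustration (IDs and bucket name are placeholders).
sample = ('OWNERID example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 REQUESTERID '
          '3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg "GET /photos/cat.jpg HTTP/1.1" 200 -')
record = parse_log_line(sample)
print(record["operation"], record["key"])  # REST.GET.OBJECT photos/cat.jpg
```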

S3 Static website hosting

Static content only; no server-side scripting.
Gives a website endpoint URL, but this doesn't support HTTPS or Requester Pays.
Needs an index document; an error document can also be set.
Can redirect requests to another bucket or domain.
Change permissions to allow public access (by default all public access is blocked), and add a bucket policy.
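A sketch of the usual public-read bucket policy for a static website, built as a Python dict and printed as JSON. The bucket name is a placeholder and the helper is hypothetical; only apply such a policy to buckets that are meant to be public.

```python
import json
from typing import Any, Dict


def public_read_policy(bucket: str) -> Dict[str, Any]:
    """Bucket policy letting anyone read (GET) every object in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # /* = all objects, not the bucket itself
            }
        ],
    }


print(json.dumps(public_read_policy("example-website-bucket"), indent=2))
```

Block Public Access also has to be relaxed for the bucket; otherwise S3 rejects or ignores a public policy like this one.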

S3 Object level logging

Part of AWS CloudTrail (S3 data events).
CloudTrail logs API calls made from the CLI, the console, other services, etc.
Log files are stored on S3.
Records the request and its metadata (who made it, when, and from where).
Set up in CloudTrail, or at bucket level.

S3 Default encryption

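Only applies to new or updated objects; existing objects are not encrypted retroactively.

SSE-S3: S3-managed keys (AES-256). Keys are managed by AWS, so it is simple and invisible. Two-stage encryption: the plaintext data key is encrypted by a master key, and the plaintext key is then deleted from memory.
SSE-KMS: KMS-managed keys. KMS generates a data key and returns it in two forms: a plaintext key used to encrypt the object, and a copy encrypted under the KMS key, which is stored with the object.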

Object lock

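Good for WORM (write once, read many) compliance.
Can add retention periods.
Can only be enabled when the bucket is created, and can't be disabled.
Needs versioning enabled.
Enter a retention period.
Choose governance mode (users with special permission can override the retention settings) or compliance mode (no user, including root, can overwrite or delete the object or shorten the retention period).
A legal hold is set per object version and has no expiration date; it stays until it is removed.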
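A toy Python check of the Object Lock rules above for deleting one object version. The mode strings match the S3 API values `GOVERNANCE` and `COMPLIANCE`, but `can_delete_version` is hypothetical and simplified: the governance override is reduced to a single boolean rather than real IAM permissions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def can_delete_version(now: datetime, retain_until: Optional[datetime], mode: Optional[str],
                       legal_hold: bool, can_bypass_governance: bool = False) -> bool:
    """Simplified Object Lock decision for deleting one object version."""
    if legal_hold:
        return False  # legal hold has no expiry; it must be removed first
    if retain_until is None or now >= retain_until:
        return True   # no retention period, or it has expired
    if mode == "GOVERNANCE":
        return can_bypass_governance  # only users with special permission can override
    return False      # COMPLIANCE: nobody, including root, can delete early


now = datetime(2021, 1, 1, tzinfo=timezone.utc)
later = now + timedelta(days=365)
print(can_delete_version(now, later, "GOVERNANCE", False, can_bypass_governance=True))  # True
print(can_delete_version(now, later, "COMPLIANCE", False, can_bypass_governance=True))  # False
```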

S3 Tags

Used for cost allocation and categorisation.
Tags are key-value pairs, e.g. project name or environment name (e.g. test, dev).

S3 Transfer acceleration

Speeds up long-distance transfers to and from a bucket.
CloudFront is the AWS content delivery service; it distributes traffic worldwide via edge locations.
When transfer acceleration is enabled, transfers go through CloudFront edge locations.
Increased cost.

S3 Events

Multiple event types, e.g. put, copy, replication started.
Events can be sent to an SNS topic, an SQS queue, or a Lambda function.
The destination needs permissions that allow S3 to send events to it.
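A sketch of a Lambda handler for S3 event notifications, assuming the standard notification JSON (`Records[].eventName`, `Records[].s3.bucket.name`, `Records[].s3.object.key`). Object keys arrive URL-encoded (spaces become `+`), so they are decoded with `unquote_plus`. The sample event is trimmed and made up.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def handler(event: Dict[str, Any], context: Any = None) -> List[Tuple[str, str, str]]:
    """Return (event name, bucket, key) for each record in an S3 notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys are URL-encoded in events
        results.append((record["eventName"], bucket, key))
    return results


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "uploads/my+photo.jpg"}},
        }
    ]
}
print(handler(sample_event))  # [('ObjectCreated:Put', 'example-bucket', 'uploads/my photo.jpg')]
```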

S3 Requestor pays

The costs of data transfer and requests become the responsibility of the requester, not the bucket owner.
The owner still pays storage costs.
All access must be authenticated, so AWS knows which account to charge.
Requesters confirm they accept the charges by including the x-amz-request-payer header in their requests.

Elastic File System (EFS)

Storage type differences:

S3 is object storage: each object is stored as a single unit, not in blocks, so changing any part of a file means writing a new version of the whole object. Good for e.g. video files and write once, read many workloads. Not good for data that is read and written at the same time.
EBS is block storage: data is stored in small blocks rather than as single objects, so when part of a file changes, only the affected blocks are rewritten. Good for e.g. applications and operating systems, where parts of files change and there are many reads and writes.
EFS is file-level storage with low latency. Multiple EC2 instances can use it at once. Supports file locking and a directory hierarchy, a bit like an on-prem file system.

To the user, it looks like an on-prem file system.

Good throughput, scalability, latency.
Data is stored across multiple AZs in a region, so instances in different AZs can access the same files.
Not in all regions.

EFS Storage classes and options

Standard: the default. Charged for storage only, per GB-month.
IA (Infrequent Access): cheaper storage, higher latency. Charged for both access and storage.

Both have the same durability and availability.
Use EFS lifecycle management to move files between these storage classes. Small files and metadata aren't moved to IA.

EFS has 2 different performance modes:

General Purpose: typical. Low latency; IOPS capped at 7,000.
Max I/O: higher throughput and IOPS, but higher latency. If you don't need that extra throughput and IOPS, use General Purpose.

EFS has 2 throughput modes:

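Bursting throughput: the default. Throughput scales with the amount of data stored. You earn burst credits while using less than the baseline throughput, then spend them on bursts; no extra charge.
Provisioned throughput: specify a guaranteed throughput level independent of how much data is stored; you pay for throughput above the baseline.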
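Burst credits behave like a token bucket. A simplified Python model with made-up units (MB/s per one-second step): unused baseline is banked as credit and spent when demand exceeds the baseline. Real EFS baselines scale with the data stored and credits are capped, which this hypothetical sketch ignores.

```python
from typing import List, Tuple


def simulate_bursting(demand: List[int], baseline: int, burst_limit: int,
                      credits: int = 0) -> Tuple[List[int], int]:
    """Throughput delivered each step, and credits left at the end (toy token bucket)."""
    delivered = []
    for want in demand:
        if want <= baseline:
            credits += baseline - want  # unused baseline is banked as credit
            delivered.append(want)
        else:
            extra = min(want, burst_limit) - baseline
            spent = min(extra, credits)  # can only burst while credits last
            credits -= spent
            delivered.append(baseline + spent)
    return delivered, credits


print(simulate_bursting([10, 10, 100, 100], baseline=50, burst_limit=100))
# ([10, 10, 100, 80], 0)
```

The second burst is cut short because the credits earned during the quiet period run out.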

Creating EFS

Mounting connects the storage to EC2. Use either the standard Linux NFS client or the EFS mount helper (preferred). Install the EFS mount helper to set the recommended mount options; it has built-in logging. Create a security group that allows the EC2 instances to reach the EFS over NFS (TCP port 2049).

Basically: create the EFS, create a folder on the instance, and mount the EFS to that folder. Like mapping a drive, so you can access e.g. one EFS mount from 2 EC2 instances.

EFS Security

To create an EFS file system, you need 'allow' permissions for EFS, plus EC2 permissions to create network interfaces. These policies need to be set up in advance.

Encryption at rest
Needs a customer master key (CMK). AWS KMS uses the CMK to encrypt and decrypt the data keys. 2 types of CMK: AWS-managed (created automatically the first time you use encryption in the service) and customer-managed (created by you, optionally with key material imported from outside AWS).
Encryption is covered in more detail in a separate course.
Encryption in transit
TLS (Transport Layer Security) via stunnel, an open-source TLS/SSL tunnelling service. The NFS client sends traffic to a local stunnel process, which encrypts it and forwards it to EFS.

Importing data to EFS

AWS DataSync: transfers data from a customer site.
Download the DataSync agent, then configure the source and target.
Also good for syncing data between EFS file systems.

Backup and DR strategies

RTO: recovery time objective - the time allowed after a disruption to restore a business process to its agreed service level.
RPO: recovery point objective - the maximum acceptable data loss, measured in time. E.g. losing the last 10 minutes of data may be acceptable.

Either could be the priority.
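A rough Python check of a backup plan against both objectives, with hypothetical numbers (all values in minutes). It assumes the worst-case data loss equals the backup interval (failure just before the next backup) and the recovery time is the sum of the restore steps; it ignores failure detection time and how long the backup itself takes.

```python
from typing import Dict, List


def check_objectives(backup_interval_min: float, restore_steps_min: List[float],
                     rpo_min: float, rto_min: float) -> Dict[str, bool]:
    """Rough check of a backup plan against RPO and RTO (all values in minutes)."""
    worst_data_loss = backup_interval_min   # failure just before the next backup
    recovery_time = sum(restore_steps_min)  # e.g. provision, copy data, switch DNS
    return {"rpo_met": worst_data_loss <= rpo_min, "rto_met": recovery_time <= rto_min}


# Hourly backups; restore takes 30 + 45 + 15 minutes; RPO 10 min, RTO 2 hours.
print(check_objectives(60, [30, 45, 15], rpo_min=10, rto_min=120))
# {'rpo_met': False, 'rto_met': True}
```

Meeting the RPO here means backing up (or replicating) more often; meeting the RTO means shortening the restore steps.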

Backup and restore

Like a virtual tape library.
Copy objects to S3, and restore from there.
Decide on a retention time.
Set encryption policies.
Test restores regularly.

Pilot light

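Keep a minimal copy of the core stack (e.g. replicated data) running, with the rest ready to start.
In a disaster, start up the rest of the environment around the pilot light, and quickly scale out horizontally to the size of the production system.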

Warm standby

Ready to go, with all services running at minimal capacity.
Data is mirrored or replicated.
When the primary fails, change DNS settings to point to the standby, and scale it up.

Multi site: full replica

Route traffic to all sites at once.
When one site fails, route all traffic to the remaining sites.
Most expensive

Replication considerations

distance between sites (latency)
Available bandwidth
Data rate required by the application
Replication technology should be parallel (to use the available bandwidth efficiently)

Replication types

Synchronous: atomic write to both locations; a write only completes when both copies are updated.
Asynchronous: not written atomically to multiple locations; data is transferred as the network allows. Used by file systems and some databases; good for a read-only copy.

Using AWS Storage for on-prem backup and DR

Traditional

Backup files
May not be available quickly
Long RTO
Scalability issues
Manual processes

Issues

Time to transfer files
Meeting RTO / RPO

Data transfer mechanisms

3 network connection types (plus Snowball for offline transfer):

Direct Connect (fastest)
VPN connection
Internet connection

Each option gives a different transfer time when restoring data, so calculate the time for your data volume.
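A minimal transfer-time calculator for that comparison. The link speeds are example values, and the 80% default utilisation (the fraction of the link actually usable) is an assumption; adjust both for a real network.

```python
def transfer_hours(data_bytes: float, link_mbps: float, utilisation: float = 0.8) -> float:
    """Hours to move `data_bytes` over a link of `link_mbps` megabits per second.
    `utilisation` is the assumed fraction of the link actually usable."""
    bits = data_bytes * 8
    seconds = bits / (link_mbps * 1_000_000 * utilisation)
    return seconds / 3600


TB = 10**12
for name, mbps in [("Internet/VPN, 100 Mbps", 100), ("Direct Connect, 1 Gbps", 1_000)]:
    print(f"{name}: {transfer_hours(10 * TB, mbps):.1f} hours for 10 TB")
```

At 100 Mbps, 10 TB takes well over a week, which is where shipping a Snowball device becomes attractive.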

S3 as a backup solution

Storage classes

Standard: 11 9s of durability, 4 9s of availability.
IA: infrequent access. 11 9s of durability, 3 9s of availability. Cheaper. A good choice for backup data.
Glacier: 11 9s of durability, availability N/A. Uses archives and vaults.

Moving data to glacier

Lifecycle rules
AWS SDK
Glacier API

Has Vault Lock, to enforce compliance controls (e.g. WORM) on a vault.

Glacier retrieval

Expedited (urgent)
Standard (cheaper)
Bulk (slow and cheap)

S3 regions

S3 data stays in a single region by default. Cross-region replication (CRR) can be turned on.
Consider it for extra durability.

S3 performance

Use multipart upload for objects over 100 MB: faster, higher throughput, recovery from interruptions (only failed parts are re-sent), easier management. Required for objects over 5 GB.
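A sketch of splitting an object into parts within the documented S3 multipart limits (every part except the last at least 5 MiB, at most 5 GiB per part, at most 10,000 parts). The 8 MiB starting part size and the helper name are assumptions. Each (part number, start, end) range would be uploaded separately, so an interruption only means resending the failed parts.

```python
from typing import List, Tuple

MiB = 1024**2
MIN_PART = 5 * MiB          # every part except the last must be at least this big
MAX_PART = 5 * 1024 * MiB   # 5 GiB per part
MAX_PARTS = 10_000


def plan_parts(object_size: int, part_size: int = 8 * MiB) -> List[Tuple[int, int, int]]:
    """Return (part_number, start_byte, end_byte_exclusive) for each part."""
    if object_size <= 0:
        raise ValueError("object must not be empty")
    part_size = max(part_size, MIN_PART)
    while (object_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2  # too many parts: use bigger ones
    if part_size > MAX_PART:
        raise ValueError("object too large for multipart upload")
    parts = []
    for number, start in enumerate(range(0, object_size, part_size), start=1):
        parts.append((number, start, min(start + part_size, object_size)))
    return parts


parts = plan_parts(100 * MiB)
print(len(parts), parts[-1])  # 13 (13, 100663296, 104857600)
```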

S3 Security

IAM policies (allow or restrict identity / user group / role access)
Bucket policies (JSON policies per bucket)
Access control lists
Lifecycle policies
MFA (multi-factor authentication) Delete
Versioning

AWS Snowball for data transfer

In and out of AWS.
Physical appliance.
50 TB or 80 TB devices.
Dust, impact and tamper resistant.
E Ink shipping label, so it goes to the correct place.
Tracked with SNS (Simple Notification Service).

Consider Snowball for restores when your network connection is slow.
Create an export job in the AWS Management Console.

AWS storage gateway

Software client in your data centre that connects to AWS.
It's a VM running on your site.
Can be configured as a file gateway, volume gateway, or virtual tape library.

File gateway: files are stored in S3, sent over HTTPS and encrypted. S3 appears as a normal NFS file system that you mount internally.
Volume gateway / stored volume gateway: backs up local storage to S3, stored as EBS snapshots. Volumes are presented as iSCSI devices for asynchronous backup. Your data stays on site. Storage is buffered on site for staging. Easy disaster recovery: just …
Volume gateway / cached volume gateway: primary storage is S3. Frequently accessed data is cached locally, and local disks also act as an upload buffer. Presented as iSCSI devices.
Virtual tape library (VTL): cloud-based tape. Use your existing backup infrastructure, but store on S3. Up to 1,500 virtual tapes. Presented as iSCSI devices. Archive tapes from the VTL to Glacier.

Optimising Storage costs

S3 storage cost is based on the volume stored.
With S3 Standard and Intelligent-Tiering, the price per GB drops as you store more data each month (tiered pricing).

Other storage classes have a fixed cost per GB.
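Tiered pricing means each GB is billed at the rate of the tier it falls into. A sketch with made-up tier sizes and prices (not real AWS rates), showing that the average price per GB falls as the stored volume grows.

```python
from typing import List, Optional, Tuple

# (GB in this tier, or None for "everything else", price per GB-month): hypothetical values
TIERS: List[Tuple[Optional[float], float]] = [(50_000, 0.030), (450_000, 0.025), (None, 0.020)]


def monthly_storage_cost(total_gb: float) -> float:
    """Bill each GB at the rate of the tier it falls into."""
    cost, remaining = 0.0, total_gb
    for tier_size, price in TIERS:
        in_tier = remaining if tier_size is None else min(remaining, tier_size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost


for gb in (10_000, 600_000):
    cost = monthly_storage_cost(gb)
    print(f"{gb} GB: {cost:.2f} total, {cost / gb:.4f} per GB")
```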

Retrieval costs: pay for requests (e.g. API calls) and per GB retrieved.

Data transfer costs: transfer to EC2 in the same region or to CloudFront is free, but transfer out to the internet is charged. Transfer acceleration (via edge locations) costs more.

Management and replication costs: charged when you use S3 Inventory, S3 analytics, or object tagging.

Batch operations: run S3 operations in bulk. Priced per job and per million object operations.

S3 Select and Glacier Select: retrieve only the subset of data you need from an object, which is cheaper than retrieving the whole object; charged for data scanned and returned.

S3 replication: CRR (cross-region replication) or SRR (same-region replication). Costs for storage at the other location, plus data transfer between regions.

Versioning: use lifecycle policies to expire or transition old (noncurrent) versions, because every version is billed as storage.
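A sketch of a lifecycle configuration that tiers current objects down and expires old versions, as a Python dict in the shape used by the S3 lifecycle API (e.g. the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`). The rule ID, prefix and day counts are placeholder choices, and the helper only builds the dict; no AWS call is made.

```python
from typing import Any, Dict


def lifecycle_config(prefix: str, ia_days: int = 30, glacier_days: int = 90,
                     noncurrent_days: int = 60) -> Dict[str, Any]:
    """Move current objects to Standard-IA, then Glacier; delete noncurrent versions."""
    if not ia_days < glacier_days:
        raise ValueError("Glacier transition must come after the Standard-IA transition")
    return {
        "Rules": [
            {
                "ID": "tier-down-and-expire-old-versions",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                # Noncurrent versions are billed too, so remove them after a while
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }


print(lifecycle_config("logs/")["Rules"][0]["Transitions"])
```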

Cost optimisation

Understand your data: retrieval profile, request patterns, retrieval costs, and how quickly you need the data. Make use of lifecycle controls.

EFS costs

EFS Standard storage is charged on volume per month, with no access charges.
EFS IA has cheaper storage (about 90% cheaper), but also charges for reads and writes.
Lifecycle management moves data between these automatically. It excludes files below 128 KB, and metadata. Configure the lifecycle management policy, e.g. to 30 days.
EFS bursting throughput: caters for occasional high peaks; included in the storage price.
EFS provisioned throughput: a higher, guaranteed level; charged extra.

FSx

FSx for Windows File Server: a managed Windows file system on AWS. Full support for NTFS.
FSx for Lustre: for compute-intensive workloads and ML.
Pay for storage capacity.

AWS Storage Gateway costs

Hardware appliance or software (VM) on prem that connects to AWS. Types are file, volume, and tape.

File gateways: an S3 file share mounted on prem. Storage and requests are priced as per S3.
Stored volume gateways: billed as EBS snapshots.
Cached volume gateways: charged at S3 rates.
Tape gateways: charged at S3 and Glacier rates.

AWS Backup costs

A service to back up AWS services.
Can also automate backups from on prem.
Centralises management, auditing, logging, and restores.
Create backup plans: schedule, backup window, vault, tags, cross-region copies, etc.
Can use tag-based policies.