AWS SAA-C02 Databases

AWS SAA-C02 Databases

Relational vs Non Relational

Structured data

transactional and analytical. Relational or non relational.

Semi structured

e.g. xml. But flexible, updated withing the requirement to change the schema for every record in the table. Non relational stores, e.g. XML,. JSON, email

Unstructured

not organized. e.g. key value pairs, lots of irrelevant information. e.g. text messages, docs, videos. Files in file store or data lake.
often combined, e.g. point of sale system produces structured data, integrated with clickstream data form the website in XML. Need to bring together to query.

RDS

Covers Aurora, MySQL, Postgres, Oracle, SQL Server
OLTP, replacement for on prem db.
RDS encryption: only enable at creation (unless you load an encrypted backup into non encrypted)
Read replicas need to be same encryption
Monitor with cloudwatch
maintenance windows
specify storage, vpc, instance size (cpu / mem)
specify instance type, some downtime when changing this.
No access to underlying EC2 instance so no customisation.
patching, backup are automated
easy to scale
synchronous replication to multi AZ
auto failover
read replicas to scale for reads
pay for: DB hours, storage, IO requests, backup storage
in multi AZ, pay for 2nd install but data transfer between the 2 is not chargeable
scale up based on cpu and storage, but not down. Causes downtime if change instance type.

more custom option: EC2 instance with database

option if need more control than RDS offers
e.g IBM DB2
need to manage everything: patches, scaling, backups

Setup Wordpress on MySQL

Set up MySQL db, free tier Set up EC2 instance

# Install WordPress on EC2 using RDS MySQL DB
yum update -y
amazon-linux-extras install -y php7.2 yum install -y httpd
systemctl start httpd
systemctl enable httpd
cd /var/www/html
wget https://wordpress.org/latest.tar.gz tar -xzf latest.tar.gz
cp -r wordpress/* ./
chmod -R 755 wp-content
chown -R apache:apache wp-content
yum install php-mbstring -y
systemctl restart httpd
systemctl restart php-fpm
wget https://www.phpmyadmin.net/downloads/phpMyAdmin-latest-all- languages.tar.gz
mkdir phpMyAdmin && tar -xvzf phpMyAdmin-latest-all-languages.tar.gz -C phpMyAdmin --strip-components 1
rm phpMyAdmin-latest-all-languages.tar.gz

Go to public I{ address and see wordpress installation screen Put in database urn in wordpress setup page SSH to the EC2 instance, create wordpress wp-config.php file in EC2. Give site title

Install PHP etc

# Install PHP MyAdmin
yum install php-mbstring -y
systemctl restart httpd
systemctl restart php-fpm
wget https://www.phpmyadmin.net/downloads/phpMyAdmin-latest-all-languages.tar.gz
mkdir phpMyAdmin && tar -xvzf phpMyAdmin-latest-all-languages.tar.gz -C phpMyAdmin --strip-components 1 rm phpMyAdmin-latest-all-languages.tar.gz

# Edit config:
nano config.sample.inc.php #add database host name

# Add text (enter RDS endpoint):
$cfg['Servers'][$i]['host'] = 'RDS MYSQL ENDPOINT'; $cfg['Servers'][$i]['compress'] = false; $cfg['Servers'][$i]['AllowNoPassword'] = false; $cfg['Servers'][$i]['connect_type'] = 'tcp'; $cfg['Servers'][$i]['extension'] = 'mysqli';

# Copy sample file over live file:
cp config.sample.inc.php config.inc.php
service httpd restart

Go to ip address/phpmyadmin to get MySQL database manager.

Multi AZ database

Multi AZ:

standby database in separate AZ.
synchronous replication to instance in different AZ.
No endpoint, on failover end point address is mapped to standby instance.
2 AZs in one region.
Standby used to take backups.
Auto or manual fail over.
When flips over to secondary, replicates the other way. Takes a couple of mins.

setup: just choose setting to make multi AZ. Takes time initially. Demo with reboot, wordpress site goes to replica.

Read replica

Read replica: Asynchronous replication. Separate endpoint, so you can use it. Can be cross region! Can promote to writable db.

Multi AZ read replicas

If you need fail over on read replicas.. If you fail over to a rr, then the rr copy becomes the OLTP replica.

RDS encryption

Encryption at rest: means storage in EBS, backups, snapshots, replicas
Uses AES 256.
Transparent
no performance impact
enabled at creation only, can't disable
RR encrypted at same level. In same region, can use the same key.
If in different regions, need different keys
work around to encrypt / decrypt: take snapshot, start a new instance with different encryption, load snapshot.

Snapshots

Taken to S3, manually delete
user initiated backups
restored databases to new RDS instance (not to existing DB)

Migration

AWS database migration service (DMS)
Schema conversion tool (SCT)
Lots of source databases

Aurora

Different from other RDS databases
Compatible with MySQL or PostgreSQL, faster than these
point in time recovery without restore
cheap
highly available
Aurora replicas: like read replicas, in region read scaling. up to 15.
MySQL read replicas: cross region cluster with read scaling, up to 5.
Global database: cross region cluster with read scaling. Fastest replication and low latency reads.
Multi master: scales writes in a region.
Serverless: autoscaling for Aurora.
2 copies of data in each AZ, with min of 3 AZs, so min 6 data copies

Aurora replicas vs MySQL replicas

Aurora replica

single region
high performance
support for failover

Aurora Cross region replica

multi region for better DR
eay to migrate between regions
fast local reads

MySQL replicas

multi region
asynchronous
higher impact on primary

Aurora fault tolerance

Multi AZ, multi copies in an AZ, so 6 min copies of the data

Aurora setup

provisioned (defined scale) or serverless (autoscaling) tiers control the order of failover can enable backtrack

Cross region read replicas: need to enable bin logging. Promote: makes a read replica a primary, stops and reverses replication.

Global database setup

write to one region, replicated to others.
Low latency (1 sec) read only copy
little impact on the database (uses non db processes to replicate data)
can promote secondary region
storage based replication

Aurora multi master

mySQL only
High availability
ACID
multi AZ

Aurora Serverless

automatically starts up, scales, shots down when required
simple and cheap for infrequent intermittent, unpredictable workload
On demand auto scaling
MySQL or Postgres SQL only
pay for storage, and io while active.

Dynamo DB

Features

Fully managed, serverless, no setup or maintenance to do, no downtime.
Scales easily: horizontal, no downtime. Scale with read and write units, or autoscaling.
Availability very high. 4 9's, 5 9's when use global tables.
DynamoDB streams: use for queue.
DynamoDB accelerator (DAX): caches with microsecond performance (no read replicas for DynamoDB!)
Point in time recovery
Global tables are multi region multi master.
Transaction support: strong consistent, eventual consistent, or ACID.
multi AZ or cross region
Defaults to eventual consistency, but can request strongly consistent reads
Use DynamoDB Transactions for ACID
auto scaling
pay for capacity: on demand (auto scaling) or provisioned capacity (specify reads and writes per second)
fast to store data from e.g. Kinosis

DynamoDB table setup

Primary key and sort key optional
read write capacity: choose Provisioned or on demand. Can choose reserved capacity (big discount)
Scaling parameters
Encryption with KMS or default
Items can have different attributes
Search with scan (uses filters) or query (uses sort key, fast)

DynamoDB Streams

Switch on to get a list of item level changes
integrate with Lambda as trigger, associate the stream with a lambda function

DynamoDB Accelerator (DAX)

Look at python and DAX example

Cache, has nodes in various AZs.
Gives endpoint.
Managed service
10 x faster than dynamoDB

Global table

Replicated to regions, and can write in regions.
Async replication but uses AWS backbone.
Need to switch on DynamoDBStreams
Set up a table and add a region to it.
Monitoring shows pending replication count.

Elasticache

Fully managed, in memory datastores Use when want to improve latency and throughput for read intensive apps. best for OLAP 2 different options

Redis

better for real data, for very low latency and high performance
persistent
really just like an in memory database
multi AZ with failover
with no clusters, all data in one db and have between 0 and 5 replicas
cluster mode can be enabled, this splits the data into shards in different AZs.
- a shard has between 0 and 5 replicas
slots and key spaces to segregate the data into shards
can encrypt
can have auto failover to make a different node the primary

Redis AUTH command: requires users to have a token (password) before allowing clients to execute commands.

MemCacheD

high performance object storage
good for cache data for web apps, e.g. session state
partitioned data
set up number of nodes, scales these, multi AZ
can get reserved instances for cheaper cost
no backups
no encryption

Need to write code to use this effectively.

Redshift

OLAP Analytical database
Data warehouse with SQL and BI tools
Load data from OLTP, or s3 or something.
Leader nodes coordinate the query
compute nodes do the query execution
Take snapshots, can replicate snapshots
Single AZ, but snapshots are in S3 region, so can restore to a second AZ
Redshift spectrum: query data directly on S3 without loading to Redshift
Uses columnar data storage: this is good for warehousing as less IO for L shaped data
Massively parallel processing with lots of compute nodes.
cross region snapshots
can use reserved nodes with optional upfront payment
good for long running queries, and also the cache helps with repeated short queries

Document DB

mongo DB implementation
auto scaling
JSON type storage

Neptune

Graph database
supports graph query languages Apache Tinkerpop etc
use cases social networking, recommendations.
Create relationships and quickly query.
Hard in Relational because multiple foreign keys, difficult nested queries.
Neptune uses graph structures such as nodes (data entities), edges (relationships), and properties to represent and store data

Gremlin query to find Howard's friends friends

g.V().has('name', 'Howard').out('friend').out('friend').values('name')

QLDB

Quantum ledger database
fully managed
ledger database to record history
immutable transaction log known as journal, read only, with hash chained blocks
SQL Transaction logs can be replayed but not secure because can be changed, and cant be easily queried
QLDB similar to trans log, but immutable.
Write transactions committed to journal first
journal materialised into queryable user defined tables

Timestream

Time series database
store and analyse time series data points
keeps recent data in memory and moves old data to different storage automatically
query engine to analyse recent and historical data together without specifying location
built in time analytics functions
managed service
use cases: metrics, telemetry, IOT