AWS Analysing data at scale

Extraction / enrichment

Many Amazon services can extract data from sources.

Output as a JSON file.

Store in DynamoDB (single items can be up to 400 KB).

Kinesis Data Firehose -> data lake.
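As a rough illustration of the 400 KB item limit, the sketch below estimates the size of a JSON record before it is stored. The function names are hypothetical, the limit is taken here as 400 * 1024 bytes, and the UTF-8 length of the serialized JSON is only an approximation: DynamoDB measures item size from attribute names and values in its own encoding.

```python
import json

# Maximum DynamoDB item size (400 KB), assumed here to mean 400 * 1024 bytes.
DYNAMODB_MAX_ITEM_BYTES = 400 * 1024


def approx_item_size(record: dict) -> int:
    """Approximate item size as the UTF-8 length of the compact JSON form."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))


def fits_in_dynamodb(record: dict) -> bool:
    """Return True if the record is likely to fit in a single item."""
    return approx_item_size(record) <= DYNAMODB_MAX_ITEM_BYTES
```

Records that fail this check would need to be split or stored elsewhere, with only a pointer kept in the table.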

Typical problems

Solutions

Host in Lambda, write to EFS.
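A minimal sketch of a Lambda handler that writes to EFS, assuming the function has an EFS access point mounted. The `EFS_DIR` environment variable and the `/mnt/data` default are hypothetical; the real path is whatever local mount path is configured for the function.

```python
import json
import os
import uuid


def efs_dir() -> str:
    """Directory on the mounted file system (hypothetical default path)."""
    return os.environ.get("EFS_DIR", "/mnt/data")


def handler(event, context):
    """Write the incoming event to a uniquely named JSON file."""
    target = efs_dir()
    os.makedirs(target, exist_ok=True)
    path = os.path.join(target, f"{uuid.uuid4()}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(event, f)
    return {"written": path}
```

Because the path comes from an environment variable, the same handler can be exercised locally against a temporary directory.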

Simple solution:

Better: host in EFS.

For Lambda examples look at aws.amazon.com/blogs/compute/pay-as-you...

Look at "Postman" software to run requests and see the results.

Look at "Papermill" software to run notebooks in batch.
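Batch execution with Papermill might look like the sketch below, which runs one notebook once per parameter set. The helper names, the notebook name, and the parameters are hypothetical; `papermill.execute_notebook` is the library's own entry point and needs `pip install papermill`.

```python
from pathlib import Path


def build_jobs(notebook, out_dir, param_sets):
    """Pair each parameter set with its own output notebook path."""
    stem = Path(notebook).stem
    return [
        (notebook, str(Path(out_dir) / f"{stem}_{i}.ipynb"), params)
        for i, params in enumerate(param_sets)
    ]


def run_batch(notebook, out_dir, param_sets):
    """Execute the notebook once per parameter set."""
    import papermill as pm  # third-party, imported lazily

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for src, dst, params in build_jobs(notebook, out_dir, param_sets):
        pm.execute_notebook(src, dst, parameters=params)
```

For example, `run_batch("analysis.ipynb", "out", [{"year": 2019}, {"year": 2020}])` would write one executed copy of the notebook per parameter set, assuming the notebook has a cell tagged `parameters`.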

Conclusion