In the previous blog post (Part-3), I compared a basic workload on Athena with other query engines, both on-premises and cloud-based. In this post, we will dive a bit deeper to understand how the service works and how Amazon built the Athena service.
First, let's understand the service flow. The figure below shows how data flows through the AWS Athena service and how you can take cold data and run analytics on the dataset.
Let's break down the entire flow:
- When you create a table, the table metadata is stored in the data catalog, indicated with a red arrow.
- The table definition holds a reference to where the data resides in the S3 bucket, indicated by blue pointers.
- Athena also creates an S3 bucket to store service logs and outputs, indicated by the dotted line.
- AWS Athena relies on Presto, a distributed in-memory SQL query engine, for fast query analytics.
- The results can either be displayed in the Athena console or pushed to AWS QuickSight to slice and dice the data.
- AWS QuickSight is a great way to explore data, slice and dice it, and publish dashboards.
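To make the flow more concrete, here is a minimal Python sketch of the same steps driven through the boto3 Athena client: a `CREATE EXTERNAL TABLE` statement that registers only metadata and points at an S3 location, followed by a query whose output Athena writes to an S3 results location. The database, table, columns and bucket paths are hypothetical placeholders, and the client is passed in as a parameter so the sketch can run offline against a small fake; swap in `boto3.client("athena")` (with credentials and real buckets) to try it against AWS.

```python
import time
from typing import Any, Dict, List

# Hypothetical placeholders: not values from this post.
DATABASE = "demo_db"
DATA_LOCATION = "s3://example-data-bucket/cold-data/"
OUTPUT_LOCATION = "s3://example-athena-output/"


def create_table_ddl(table: str, columns: Dict[str, str], location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement.

    Only metadata goes into the catalog; the LOCATION clause is the
    pointer to where the data actually lives in S3.
    """
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


def run_query(client: Any, sql: str, poll_seconds: float = 1.0) -> List[List[str]]:
    """Submit a query, wait for it to finish, and return rows as lists of strings.

    `client` is expected to behave like boto3.client("athena").
    """
    qid = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        # Athena writes the query output files to this S3 location.
        ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
    )["QueryExecutionId"]

    # Poll until the query leaves the QUEUED/RUNNING states.
    while True:
        status = client.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} ended in state {state}")

    # Only the first page of results is read; NextToken paging is omitted.
    result = client.get_query_results(QueryExecutionId=qid)
    return [[cell.get("VarCharValue", "") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]


class FakeAthenaClient:
    """Offline stand-in for boto3.client("athena") so the sketch runs without AWS."""

    def __init__(self) -> None:
        self.submitted: List[str] = []

    def start_query_execution(self, QueryString, QueryExecutionContext, ResultConfiguration):
        self.submitted.append(QueryString)
        return {"QueryExecutionId": f"q{len(self.submitted)}"}

    def get_query_execution(self, QueryExecutionId):
        return {"QueryExecution": {"Status": {"State": "SUCCEEDED"}}}

    def get_query_results(self, QueryExecutionId):
        # For SELECT queries, the first row holds the column names.
        return {"ResultSet": {"Rows": [
            {"Data": [{"VarCharValue": "status"}, {"VarCharValue": "hits"}]},
            {"Data": [{"VarCharValue": "200"}, {"VarCharValue": "42"}]},
        ]}}


if __name__ == "__main__":
    client = FakeAthenaClient()  # replace with boto3.client("athena") for real use
    run_query(client, create_table_ddl("web_logs", {"status": "string", "hits": "int"}, DATA_LOCATION))
    print(run_query(client, "SELECT status, hits FROM web_logs LIMIT 10"))
```

Polling is the simplest way to wait for a result. For large result sets you would also page through `get_query_results` with `NextToken`, which this sketch leaves out.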
AWS Athena has some service limits, shown in the table below:
| Athena service limitation | Limit |
|---|---|
| Parallel query executions | 5 |
| Number of databases | 100 |
| Tables per database | 100 |
| Partitions per table | 20K |
| S3 bucket (logs) | Log bucket for service outputs |
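One practical consequence of the parallel-query limit is that batch jobs may want to cap their own concurrency. The Python sketch below assumes the value of 5 from the table and uses a shared `threading.BoundedSemaphore` so that no more than that many queries are in flight at once. `QueryThrottle`, `run_all` and `fake_execute` are hypothetical names; `fake_execute` stands in for whatever actually submits a query, for example a function wrapping boto3's `start_query_execution`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

# Assumed from the limits table above; confirm the current quota for your account.
MAX_PARALLEL_QUERIES = 5

T = TypeVar("T")


class QueryThrottle:
    """Caps how many queries are in flight at once on the client side."""

    def __init__(self, limit: int = MAX_PARALLEL_QUERIES) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, handy for checking the cap

    def run(self, fn: Callable[[], T]) -> T:
        with self._slots:  # blocks while `limit` queries are already running
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return fn()
            finally:
                with self._lock:
                    self.in_flight -= 1


def run_all(queries: List[str], execute: Callable[[str], T],
            throttle: QueryThrottle) -> List[T]:
    """Submit every query from a large thread pool; the throttle enforces the cap."""
    with ThreadPoolExecutor(max_workers=len(queries) or 1) as pool:
        futures = [pool.submit(throttle.run, lambda q=q: execute(q)) for q in queries]
        return [f.result() for f in futures]


def fake_execute(sql: str) -> int:
    """Hypothetical stand-in for a real Athena call: pretend the query takes 50 ms."""
    time.sleep(0.05)
    return len(sql)


if __name__ == "__main__":
    throttle = QueryThrottle()
    results = run_all([f"SELECT {i}" for i in range(12)], fake_execute, throttle)
    print(results, "peak concurrency:", throttle.peak)
```

If all queries come from one place, `ThreadPoolExecutor(max_workers=5)` on its own gives the same cap; a shared semaphore is useful when several parts of an application submit queries independently.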
Again, AWS Athena is a good way to start learning about your data quality and data trends, and to convert raw data into dashboards with a few clicks.
In Part-5, I will cover AWS Athena + QuickSight in more detail and show how data can be quickly converted into dashboards.
I hope this post helps you understand the AWS Athena workflow. Comments and questions are welcome!