Getting started with AWS Athena – Part 4

In the previous post (Part 3), I compared a basic workload on Athena against other query engines, both on-premises and cloud-based. In this post, we will take a deeper dive to understand how the service works and how Amazon built Athena.

First, let's understand the service flow. The figure below shows how the flow works with the AWS Athena service and how you can take cold data and run analytics on the dataset.

Athena flow

Let's break down the entire flow:

  • When you create a table, the table metadata is stored in the metadata store, indicated by the red arrow.
  • The table definition holds a reference to where the data resides in the S3 bucket, indicated by the blue pointers.
  • Athena also creates an S3 bucket to store service logs and query outputs, indicated by the dotted line.
  • AWS Athena relies on Presto, an in-memory query engine, for fast query analytics.
  • The results can either be displayed on the Athena console or pushed to AWS QuickSight to slice and dice the data.
  • AWS QuickSight is a great way to understand, slice and dice data, and publish dashboards.
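To make the flow a bit more concrete, here is a minimal sketch of the same steps driven from Python with boto3 instead of the console: submit a query, let Athena write the output to an S3 location, wait for it to finish, and read the rows back. This is only an illustration, not how the console does it internally. The table, database and bucket names are made up, and the helper name run_athena_query is my own.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return rows as lists of strings."""
    # Submit the query; Athena writes its result files under output_location in S3.
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    # Fetch the first page of results; for a SELECT, the first row is the header.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [col.get("VarCharValue") for col in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]


def main():
    # Assumes boto3 is installed and AWS credentials are configured.
    import boto3

    athena = boto3.client("athena")
    rows = run_athena_query(
        athena,
        sql="SELECT * FROM my_table LIMIT 10",           # hypothetical table
        database="my_database",                          # hypothetical database
        output_location="s3://my-athena-results/out/",   # hypothetical bucket
    )
    for row in rows:
        print(row)
```

Calling main() runs the example against a real account. The client is passed in as a parameter so the helper can also be tried with a stub, and only the first page of results is read, so larger result sets would need pagination.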

AWS Athena has some limitations, shown in the table below:

Service limits

Athena service limitations

Action                      Limit
Parallel submit             1
Parallel query executions   5
Number of databases         100
Tables per database         100
Partitions per table        20K
S3 bucket - log             Log bucket for service outputs
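If you submit queries from a script, the parallel query execution limit is the one you are most likely to hit. One way to stay under it is to cap concurrency on the client side. The sketch below is just one possible approach, assuming the limit of 5 from the table. The names run_all and run_one are hypothetical; run_one stands for whatever function submits a single query and waits for it.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_QUERIES = 5  # assumed from the service limits table


def run_all(queries, run_one, max_parallel=MAX_PARALLEL_QUERIES):
    """Call run_one(query) for every query, never more than max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in the same order as the input queries.
        return list(pool.map(run_one, queries))
```

With a pool of 5 workers, the sixth query waits locally until one of the first five finishes, instead of being sent to the service while five are still running.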

Conclusion

Again, AWS Athena is a good way to start learning about your data quality and data trends, and to convert raw data into dashboards with a few clicks.

In Part 5, I will cover more on AWS Athena with QuickSight and how data can be quickly converted into dashboards.

I hope this post helps you understand how the AWS Athena workflow works. Comments and questions are welcome!

Thanks!

Author: Abdul H Khan

Trying to be cloudy!
