
If you're already in AWS, why wouldn't you use AWS Glue Catalog + AWS SDK for pandas + Athena?

You can set up a data lake, save data, and start running queries in about 10 minutes with this setup.
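For anyone who hasn't tried it, here is a rough sketch of what that flow can look like with AWS SDK for pandas (the `awswrangler` package). The bucket path, database name, and table name are hypothetical placeholders, and it assumes the Glue database already exists and your AWS credentials are configured.

```python
def save_and_query(df, path, database, table):
    """Write a DataFrame to S3 as Parquet, register it in Glue, query it via Athena.

    All argument values are supplied by the caller; nothing here is specific
    to any real bucket or database.
    """
    # Imported here so the module can be loaded without awswrangler installed.
    import awswrangler as wr

    # dataset=True plus database/table registers the table in the Glue Catalog.
    wr.s3.to_parquet(
        df=df,
        path=path,
        dataset=True,
        database=database,
        table=table,
        mode="append",
    )

    # Athena runs the SQL and the result comes back as a pandas DataFrame.
    return wr.athena.read_sql_query(
        sql=f'SELECT * FROM "{table}" LIMIT 10',
        database=database,
    )
```

Usage would be something like `save_and_query(df, "s3://my-example-bucket/events/", "my_database", "events")`, where all three names are made up for illustration.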



These days you can 'just' create an S3 tables bucket. https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tab...


Athena is really expensive though and you will often run into a hard limit on the size of your query.


Like most things serverless, Athena is cheap as long as you don't use it.

My company has hundreds of data pipelines that are executed infrequently.

For this use case Athena is ridiculously cheap and easy to use vs most other solutions.


I never found Athena expensive. Compared to employment cost it will be minuscule.

And sometimes, if your query is CPU-intensive but the queried data size is not huge, you can get ridiculous value for money: many CPU-days of work in 10 minutes for just $5, if your query covers 1 TB after partitioning.
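To make the arithmetic concrete, here is a back-of-the-envelope sketch. It assumes the $5 per TB scanned rate implied above, a decimal terabyte, and a 10 MB minimum charge per query; check the current pricing for your region before relying on any of these numbers.

```python
TB = 10**12                 # assumption: decimal terabyte
PRICE_PER_TB_USD = 5.0      # assumption: rate implied by "$5 for 1 TB"
MIN_BILLED_BYTES = 10**7    # assumption: 10 MB minimum per query


def athena_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the cost of one query from the bytes it scans.

    CPU time does not appear in the formula, which is why a CPU-heavy
    query over a modest amount of data is such good value.
    """
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / TB * price_per_tb


# A query that scans 1 TB after partition pruning:
one_tb_cost = athena_query_cost(1 * TB)   # 5.0
```

The same function shows why infrequent pipelines are cheap: a query scanning 50 GB comes out at roughly a quarter of a dollar under these assumptions.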

Query size limits are also configurable.

Obviously it depends on what data you are working on, but not having to set up and pay for a computational cluster is a huge cost saving.


Agreed.

A lot of people would worry about "vendor lock-in" here, but it's certainly convenient.



