
Cultivating Your Data Lake - yarapavan
https://segment.com/blog/cultivating-your-data-lake/
======
s188
I find it odd that the words 'consent' and 'permission' don't appear anywhere
in this article. I realise it is a purely technical article but is it off
topic to include consent in an article about collecting customer data? It
seems to me that sitting above all the various architectural layers, there
should be a 'consent layer'.

> If you take one idea away from this blog post, let it be this: store a raw
> copy of your data in S3.

I would have been happier if the 'one idea' was this: first get your
customers' informed consent before you collect their data.

------
yarapavan
As heavy users of all of these tools in AWS, we’ll share some examples, tips,
and recommendations for customer data in the AWS ecosystem. These same
concepts also apply to other clouds and beyond.

1\. S3 or Google Cloud Storage as a cheap and reliable storage layer

2\. Athena or BigQuery as a SQL query layer

3\. AWS Glue Catalog as a metadata store

4\. EMR as a transformation layer
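
To make the four layers above a bit more concrete, here is a rough sketch of how the storage and query layers might line up. It is not taken from the article: the bucket name, table name, key layout, and both helper functions are hypothetical, and it only builds strings rather than calling any AWS API. The one real convention it relies on is Hive-style `key=value` path segments, which Athena and the Glue Catalog can treat as partitions.

```python
from datetime import datetime, timezone

# Hypothetical names for illustration only; none of these come from the article.
BUCKET = "example-data-lake"
TABLE = "raw_events"


def raw_event_key(source: str, received_at: datetime, batch_id: str) -> str:
    """Build a Hive-style partitioned object key for one batch of raw events.

    Pass a timezone-aware datetime; it is converted to UTC before the day
    partition is derived.
    """
    day = received_at.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"raw/source={source}/day={day}/{batch_id}.json.gz"


def daily_count_query(source: str, day: str) -> str:
    """Build a SQL string that a query layer such as Athena could run.

    The partition columns (source, day) would have to be registered in the
    metadata store first. Plain string formatting is used only to keep the
    sketch short; do not build queries this way from untrusted input.
    """
    return (
        f"SELECT COUNT(*) AS events FROM {TABLE} "
        f"WHERE source = '{source}' AND day = '{day}'"
    )


received = datetime(2020, 1, 2, 23, 30, tzinfo=timezone.utc)
print(f"s3://{BUCKET}/{raw_event_key('web', received, 'batch-0001')}")
print(daily_count_query("web", "2020-01-02"))
```

Because the partition values live in the object key, a query that filters on `source` and `day` only needs to scan the matching prefix, which is the main reason to pick the key layout before any data lands in the storage layer.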

