
Need Input – MySQL Database vs. Redshift - akshayB
We have an extremely small database, a few GBs in size, and want to combine data from a lot of third-party vendors for analysis. Is it worth moving to AWS Redshift, or are we better off just creating a new MySQL instance and aggregating the data in MySQL? Performance & processing are not an issue here since our data is pretty small; we just intend to do simple querying, nothing fancy.

I am leaning towards a new MySQL instance, since we have a lot of code and functionality that already works fine with MySQL, plus cost considerations. I guess my big question is: does AWS Redshift offer any functionality worth exploring for this use case of simple querying and analytical reporting?
======
scapecast
For a small dataset like that, Redshift is overkill. Especially if it's a
one-time thing. Redshift excels when you have (at least) TBs of data from many
different sources.

Having said that - if you expect to query your data on an ongoing basis, with
fresh (and growing) data coming in every day, then it's worth considering. You
can run your transformations on top of raw data in Redshift. It's certainly
more expensive than S3. But with just a few GBs, you'll stay under $150 /
month.

Regarding Spark vs. Redshift, see my post on Quora:

https://www.quora.com/Spark-vs-Redshift-Should-I-be-using-both-for-big-data-Which-is-better/answer/Lars-Kamp?srid=DtA

------
vlahmot
You don't need Redshift and it's not really the best for "combining data".

I'd throw the data on S3, do the processing in Spark (at that scale you can
likely run on one node in local mode for now, and scale out as the data grows),
write the results back to S3, then load that processed/aggregated data from S3
into MySQL, since you're running that already and can just plug in your BI tools.

It's much easier to process data outside the DB, S3 as a source of truth works
great in AWS, and it's much cheaper.
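
As a rough illustration of that flow (raw vendor files in, aggregated rows
out, then a load into MySQL), here is a minimal sketch in plain Python. It
swaps in the standard library for Spark and in-memory strings for S3 objects
so it runs anywhere; the vendor names, column names, and the
`vendor_daily_totals` table are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw vendor exports; in the setup described above these would be
# objects on S3 rather than in-memory strings.
RAW_FILES = {
    "vendor_a": "date,amount\n2024-01-01,10.5\n2024-01-01,4.5\n2024-01-02,3.0\n",
    "vendor_b": "date,amount\n2024-01-01,7.0\n",
}


def aggregate(raw_files):
    """Sum `amount` per (vendor, date) across all raw files."""
    totals = defaultdict(float)
    for vendor, text in raw_files.items():
        for row in csv.DictReader(io.StringIO(text)):
            totals[(vendor, row["date"])] += float(row["amount"])
    # Sorted tuples are easy to write back out or bulk-insert.
    return sorted((vendor, date, total) for (vendor, date), total in totals.items())


def to_csv(rows):
    """Serialize aggregated rows; this is what would be written back to S3."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["vendor", "date", "total"])
    writer.writerows(rows)
    return buf.getvalue()


# Parameterized statement for the final load step (hypothetical table name);
# with a MySQL driver it would be run via cursor.executemany(INSERT_SQL, rows).
INSERT_SQL = "INSERT INTO vendor_daily_totals (vendor, date, total) VALUES (%s, %s, %s)"

rows = aggregate(RAW_FILES)
processed_csv = to_csv(rows)
```

If the data outgrows a single process, `aggregate` is the piece that would
likely become a Spark group-by job; the write-back and MySQL load steps could
stay roughly as they are.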

