

Amazon Redshift Now Available to All Customers - jrnkntl
http://aws.amazon.com/redshift/?open

======
alexatkeplar
We're hugely excited about this for SnowPlow
(<https://github.com/snowplow/snowplow>) - Redshift's Postgres-based engine is
a really attractive storage target for event-stream data. Bit of a shame they
don't support hstore/JSON yet, but hopefully that will come in time.

We're going to work on SnowPlow-Redshift integration next week, using the COPY
command + SnowPlow S3 event files. It's great timing, as we've been hitting the
limits of what we can do in Infobright (which inherits MySQL's limit of 65,535
bytes per row - an unfortunate restriction for a columnar database).
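
In case it's useful to anyone, a rough sketch of what that load step might look
like over JDBC (the endpoint, table name, bucket path and credentials below are
all placeholders, not our actual setup):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RedshiftS3Copy {
        public static void main(String[] args) throws Exception {
            // Redshift speaks the Postgres wire protocol, so the stock driver works.
            String url = "jdbc:postgresql://mycluster.example.us-east-1.redshift.amazonaws.com:5439/events";
            try (Connection conn = DriverManager.getConnection(url, "admin", "password");
                 Statement stmt = conn.createStatement()) {
                // Bulk-load tab-delimited, gzipped event files already sitting in S3.
                stmt.execute(
                    "COPY events " +
                    "FROM 's3://my-bucket/snowplow/events/' " +
                    "CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>' " +
                    "DELIMITER '\\t' GZIP");
            }
        }
    }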

------
amalag
I think this is a smart move. I know companies doing their custom data
warehousing with the free version of Infobright (another column-store
database). I'm sure they'd be interested in dumping a lot of custom scripts and
doing all their querying on Amazon, since their data is there anyway.

------
arielweisberg
Initially I was really excited by Redshift, but when I got a chance to play
with it I found out that there is no JDBC support for any kind of bulk insert
or trickle loading.

When you try to do batch inserts, the Postgres JDBC driver runs each statement
individually, so you end up inserting tens of rows a second.
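
For reference, this is the standard JDBC batching pattern I mean (the table and
column names here are just placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;

    public class BatchInsert {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://mycluster.example:5439/events"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "admin", "password");
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO clicks (user_id, url, ts) VALUES (?, ?, ?)")) {
                conn.setAutoCommit(false);
                for (int i = 0; i < 10000; i++) {
                    ps.setLong(1, i);
                    ps.setString(2, "http://example.com/" + i);
                    ps.setTimestamp(3, new Timestamp(System.currentTimeMillis()));
                    ps.addBatch();
                }
                // Against Redshift each batched INSERT still ends up executed as its
                // own statement, which is where the tens-of-rows-per-second
                // throughput comes from.
                ps.executeBatch();
                conn.commit();
            }
        }
    }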

I wish they had gone with something like Vertica.

~~~
mallipeddi
You can use the COPY command to do bulk imports from S3. We also support
importing from DynamoDB.
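
Roughly, the DynamoDB variant looks like the following (the table names,
credentials and read ratio are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DynamoCopy {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://mycluster.example:5439/events"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "admin", "password");
                 Statement stmt = conn.createStatement()) {
                // READRATIO caps how much of the DynamoDB table's provisioned read
                // throughput the load is allowed to consume.
                stmt.execute(
                    "COPY clicks FROM 'dynamodb://clicks-table' " +
                    "CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>' " +
                    "READRATIO 50");
            }
        }
    }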

~~~
arielweisberg
What if DynamoDB doesn't solve my problem because I need transactions?

Why do I have to write code to perform an extra step and pay the extra cost
and latency of pushing data through S3 just to get it into Redshift?

Not supporting trickle loading is a leaky abstraction IMO. It's not a ton of
code to log statements until you have enough to justify an import, but you
shouldn't push that complexity onto every database user.

Postgres supports copying from a binary stream, why not support that?
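
For comparison, with stock Postgres the pgjdbc driver already lets you stream
straight into a table; a rough sketch (connection details and the input file
are placeholders):

    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    public class StreamCopy {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/events"; // plain Postgres, not Redshift
            try (Connection conn = DriverManager.getConnection(url, "admin", "password");
                 FileInputStream in = new FileInputStream("clicks.tsv")) {
                CopyManager copy = ((PGConnection) conn).getCopyAPI();
                // Rows are streamed into the table without a round trip per row.
                long rows = copy.copyIn("COPY clicks FROM STDIN", in);
                System.out.println("Loaded " + rows + " rows");
            }
        }
    }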

~~~
shanif
I'd have to agree with arielweisberg here. Our organization was really excited
about Redshift a few days ago, but after seeing each of our individual INSERTs
take upwards of 2 seconds, and hearing that we should first upload to S3 or
Dynamo, we decided the platform would not fit our needs.

Our goal is minimal architectural complexity, and having to upload log files or
other data to a file system before loading them into a data warehouse just
doesn't make sense.

We're currently looking into Hadoop/HDFS/Impala due to cost constraints
(Vertica would have been our primary choice). If anyone has any other
suggestions it would be great to hear them.

------
espeed
What are the best options for clickstream tracking when you're storing the data
in a data warehouse?

I've looked at Snowplow (<https://github.com/snowplow/snowplow>) -- is that
what most people are using, or are you rolling your own, etc.?

~~~
ra
We use the Snowplow JS tracker with a custom Django app that we include in each
project. It stores clicks and events in Redis, as well as in gzipped logfiles
for permanent storage. The Redis data expires after a configurable period of
time.
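
Roughly, per event the flow is the following; this sketch uses Java with the
Jedis client rather than our actual Django code, and the key name, TTL and log
path are placeholders:

    import java.io.FileOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    import redis.clients.jedis.Jedis;

    public class EventSink {
        private static final int TTL_SECONDS = 7 * 24 * 3600; // configurable retention

        public static void main(String[] args) throws Exception {
            String event = "{\"type\":\"click\",\"url\":\"/pricing\"}"; // placeholder event

            // Short-lived copy in Redis for querying recent activity.
            try (Jedis redis = new Jedis("localhost")) {
                redis.rpush("events:recent", event);
                redis.expire("events:recent", TTL_SECONDS);
            }

            // Append-only gzipped logfile as the permanent record.
            try (GZIPOutputStream log =
                     new GZIPOutputStream(new FileOutputStream("events.log.gz", true))) {
                log.write((event + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
    }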

------
UnoriginalGuy
What's the difference between this and Amazon's RDS or S3? Is it just data
storage with an easy way to query the aforementioned data?

Seems like an "odd" product that kind of competes with Amazon's existing
offerings in many ways...

~~~
SpikeGronim
Redshift is optimized for bulk analysis, while RDS is optimized for low-latency
queries. That's because Redshift is column-oriented:
<http://en.wikipedia.org/wiki/Column-oriented_DBMS>

~~~
kevindication
Yep. Redshift is based on ParAccel. We used that DB for some really awesome
large-scale analytics on a previous project.

<http://www.zdnet.com/amazon-redshift-paraccel-in-costly-appliances-out-7000008111/>

------
valhallarecords
How is this different from/better than Google BigQuery?

How does speed/performance compare to something like what's shown here:

<https://cloud.google.com/bigquery-tour>

------
grzaks
Perfect, but how do we upload the 300GB of raw data we've already collected ...

~~~
TY
Use the AWS Import/Export service to bring it into S3; then you can load it
from there:

<http://aws.amazon.com/importexport>
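
Once it's in S3, splitting the data into a number of gzipped files under one
key prefix lets the COPY load them in parallel across the cluster; a rough
sketch (bucket, table and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://mycluster.example:5439/events"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "admin", "password");
                 Statement stmt = conn.createStatement()) {
                // COPY from a key prefix picks up every file under it, e.g.
                // raw/part-000.gz, raw/part-001.gz, ... and loads them in parallel.
                stmt.execute(
                    "COPY events FROM 's3://my-bucket/raw/part-' " +
                    "CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>' " +
                    "GZIP DELIMITER '\\t'");
            }
        }
    }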

