

With Amazon Redshift SSD, querying a TB of data took less than 10 seconds - fujibee
https://www.flydata.com/blog/posts/with-amazon-redshift-ssd-querying-a-tb-of-data-took-less-than-10-seconds

======
jandrewrogers
These numbers are not that surprising for an OLAP cluster. Even though
Redshift is really architected to run on spinning disks, SSDs will almost
always improve the performance.

On the other hand, the load performance is quite poor. On the 12x dw2.large
hardware, a good clustered analytical database engine should be able to easily
load 1.2TB in less than 15 minutes while the database tables are online and
being queried. That it took well over an hour, and with a very simple data
model at that, would argue against it being good for "real-time" even with
SSDs. (This is not a surprising result though; Redshift is just a clustered
PostgreSQL variant, which does not have the best internals for real-time.)
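For scale, here's a back-of-envelope sketch of what a 15-minute load of 1.2TB would imply per node. The 12-node cluster size is from the comment above; the throughput figures are just arithmetic, not measurements:

```python
TB = 1024 ** 4  # bytes in a terabyte

def load_throughput_mb_s(bytes_loaded, seconds, nodes):
    """Aggregate and per-node load throughput in MB/s."""
    mb = bytes_loaded / (1024 ** 2)
    return mb / seconds, mb / seconds / nodes

# A 15-minute load of 1.2 TB across 12 nodes:
agg, per_node = load_throughput_mb_s(1.2 * TB, 15 * 60, 12)
print(f"aggregate: {agg:.0f} MB/s, per node: {per_node:.0f} MB/s")
# aggregate: 1398 MB/s, per node: 117 MB/s
```

Roughly 117 MB/s of sustained ingest per node is well within reach of SSD-backed hardware, which is why an hour-plus load time looks slow.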

~~~
kevindication
It's not a Postgres variant at all. Postgres is emulated as an interface to
the columnar ParAccel database underneath. ParAccel does neat things (compiles
your SQL into a program that it runs to answer the question, for instance) and
really rips if you can order your data on good keys up front (and then use
those keys in your query, of course).

Source: I helped build a very high speed network data analytical tool on top
of ParAccel (before it was bought by Amazon and rolled into redshift).

~~~
jandrewrogers
That is not an emulation.

ParAccel, like a large percentage of parallel analytical databases, is forked
off the excellent PostgreSQL code base because those internals were designed
to be easy to extend and modify. Netezza, Vertica, EMC/Greenplum,
Teradata/Aster, et al. are all PostgreSQL derivatives as well, with varying
degrees of divergence. I've designed and built custom parallel derivatives of
PostgreSQL for companies too; it is surprisingly straightforward.

There are only a handful of original, high-quality database kernels out there
because it is enormously difficult to design one from scratch. Most good
databases copy an existing design, or even more conveniently, fork the mature,
easily modifiable, BSD-licensed, Stonebraker-designed PostgreSQL kernel. Every
basic kernel design has distinctive characteristics that tend to stick with
everything derived from them, which leaves an identifiable "fingerprint" on a
new database if you know what to look for. You inherit both the strengths and
weaknesses of the underlying kernel design.

(Source: I've designed analytical database engines for a long time.)

~~~
kevindication
But to call that a Postgres variant seems to suggest that they have way more
in common than they really do. The trade-offs that columnar databases make
are kind of alien to someone who is used to Postgres.

Really cool work though!

------
fear91
An SSD saved my life when I had to query a 300 GB MySQL table that couldn't
fit in RAM. Since the data was organized by the primary key (which the SELECT
queries accessed in random order), both reads and writes came from random
places and the whole process became IOPS-bound (an ordinary HDD can service
only around 75-150 random I/O operations per second). So while a normal HDD
can achieve good sequential read speed, it SUCKS when it comes to reading
data spread randomly.

I was amazed at how much improvement I saw just by getting an SSD - and how
cheap it was compared to all the other solutions.
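The IOPS math really does dominate here. A rough sketch, assuming ~100 IOPS for a spinning disk and ~50,000 for a commodity SSD (both illustrative assumptions - real numbers vary by device and queue depth), with a hypothetical count of random primary-key fetches:

```python
def lookup_hours(lookups, iops):
    """Hours to service `lookups` random single-row fetches at `iops` ops/sec."""
    return lookups / iops / 3600

lookups = 10_000_000  # hypothetical number of random row fetches
print(f"HDD: {lookup_hours(lookups, 100):.1f} h")     # roughly 27.8 hours
print(f"SSD: {lookup_hours(lookups, 50_000):.2f} h")  # roughly 0.06 hours
```

A couple of orders of magnitude more random reads per second translates directly into the same factor of wall-clock time when every row needs its own seek.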

------
leobelle
It's not cheap. Base price is $0.25 per hour:

[http://www.wolframalpha.com/input/?i=%240.25+per+hour+for+a+...](http://www.wolframalpha.com/input/?i=%240.25+per+hour+for+a+month)

$183 a month.
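The quoted figure checks out, assuming a ~30.5-day billing month:

```python
rate = 0.25        # USD per hour, the base price quoted above
hours = 24 * 30.5  # hours in a ~30.5-day month
print(f"${rate * hours:.0f} per month")  # $183 per month
```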

~~~
flavor8
That's still a bargain compared to running your own Vertica or Greenplum
cluster.

~~~
leobelle
That's just the base price if you do nothing. The costs increase when you
actually store and query data.

~~~
jeffbarr
No, that's not the case. You pay for the cluster by the hour.

------
antonmks
Is it possible to generate the dataset that you used? I would like to run the
benchmark myself, and downloading a 1 TB file from Amazon is unfortunately
not an option.

------
CompleteMoron
whoa! sign me up! I wanna develop something with this speed

~~~
goldenkey
Are you by chance, a complete moron? Wait a minute...

