

Database startup Drawn to Scale is closing down - dmor
http://gigaom.com/2013/05/17/database-startup-drawn-to-scale-is-closing-down/

======
mratzloff
I know nothing about this company, so perhaps I have answered my own question,
but...

How does a SQL-on-Hadoop solution, especially one of the first on the market,
fail in this big data-obsessed investment climate?

~~~
jandrewrogers
The issue is that Hadoop, once you strip away the hype, is a terrible
analytical database system. It was designed for a narrow analytic use case but
has been sold far beyond its capabilities, and no amount of wrapping or
abstraction will make the reality of the underlying architecture go away.

Consequently, you have a crowded market of companies that try to differentiate
by modding Hadoop to make it do things it really can't do well for fundamental
computer science reasons. So the market is both highly competitive _and_ one
where only marginal value, as far as customers can discern, is being added to
the base Hadoop platform. VCs know this and are mostly done with the Hadoop
portion of the Big Data market.

Most of the new growth is in non-Hadoop platforms with more capable
architectures. It is a point of differentiation that they can handle many
workloads (e.g. real-time) and many data models (e.g. geospatial) that Hadoop
fundamentally cannot handle well.

It is the natural evolution of the hype cycle. Hadoop is being pushed back
into the niche where it actually offers some value, and other more capable
platforms are starting to eat into the parts of the market for which it is
poorly suited.

~~~
cmccabe
Hadoop can do real-time. One example is Cloudera Impala, which can run small
SQL queries in seconds or less. Another, non-SQL example is using the Lambda
Architecture
(http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting)
with something like Storm or S4.
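
The gist of the Lambda Architecture, roughly: serve queries by merging a batch
view (complete but stale, recomputed by a Hadoop job) with a speed view (fresh
but partial, maintained by something like Storm). A toy sketch in Python, with
made-up page-view counts rather than any real API:

    # Serving layer: merge the batch view (complete but stale) with the
    # speed view (fresh but partial). All data below is made up.
    batch_view = {"page_a": 10000, "page_b": 7500}  # from the Hadoop batch job
    speed_view = {"page_a": 42, "page_c": 17}       # from the Storm speed layer

    def pageviews(page):
        # Query-time merge: batch total plus the real-time delta.
        return batch_view.get(page, 0) + speed_view.get(page, 0)

    print(pageviews("page_a"))  # 10042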

~~~
jandrewrogers
Real-time is round-trip time: the latency between when new data is available
for ingest and when that data shows up in queries. Database engines that are
designed for these types of workloads have a round-trip latency measured in
milliseconds, and some of them can do this while ingesting millions of records
per second at petabyte scales. As in, very fast SQL queries concurrent with
extremely high continuous ingest workloads.

The term does not mean "fast queries". Otherwise, many parallel SQL databases
would be "real-time" because some of those are even faster than Impala in this
regard. A system that uses offline data loading is not real-time.
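
To make the definition concrete, a round-trip measurement looks something like
the sketch below (Python; "db" is a hypothetical DB-API-style connection and
the "events" table is a stand-in, not any particular product's schema):

    import time
    import uuid

    # Probe for ingest-to-query latency ("round-trip time"): insert a
    # marker row, then poll until a query can see it.
    def round_trip_latency(db):
        marker = str(uuid.uuid4())
        start = time.monotonic()
        db.execute("INSERT INTO events (id) VALUES (?)", (marker,))
        while db.execute("SELECT 1 FROM events WHERE id = ?",
                         (marker,)).fetchone() is None:
            time.sleep(0.001)
        # A real-time engine keeps this in the low milliseconds even under
        # heavy concurrent ingest; a system with offline loading does not.
        return time.monotonic() - start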

Stream processing systems like Storm are real-time in this sense but are not
databases. I don't think you can run ad hoc SQL queries against the data being
processed by Storm, nor can it store those streams to disk for future queries.
They also aren't "big" data due to the limitations of the architecture.

~~~
cmccabe
You're missing the point, which is that the current limitations of the system
may not be the limitations in the future.

~~~
jandrewrogers
Current limitations _do_ imply future limitations. Database engines (and
similar software) necessarily embed a large number of tradeoffs and
assumptions in their design at the most fundamental levels. Every line of code
is written to support the target workload to the exclusion of others because
of the tradeoffs required. Design decisions are extremely sticky over the long
term because they are tacitly embedded in every piece of code.

The point you are missing is that (1) significantly altering the basic
architectural characteristics is tantamount to a complete rewrite from scratch,
and (2) existing users design their applications around the design assumptions
of the platform, so fundamentally changing the architecture abandons the user
base as well.

This is why, in practice, almost everyone starts from a blank slate if they
need new architectural capabilities. And in the few cases where someone did
manage to re-architect an existing system, not only did it cost more than
doing it from scratch, but they lost their user base anyway.

I've designed these types of systems for a long time. When a database engine
is first designed, its capabilities, limitations, and future performance
ceiling are essentially set in stone. All potential modifications have to fit
within those constraints so you need to be cognizant of the choices you are
making but not really thinking about. An experienced database engine designer
can look at a system design and tell you what types of data models and
workloads the system will always do poorly on. And every database engine
design has weaknesses designed into it. Hadoop just happens to be a
particularly weak engine with a low ceiling in terms of expressiveness.

------
gsteph22
Hey -- CEO of DTS here. Quick word on the tech -- it was basically a clone of
Google F1, with full ANSI SQL support. It was damn fast and could handle tens
of thousands of connections on a modest cluster. That's way more than
Salesforce's Phoenix, which supported only a small subset of SQL and was
focused on analytics.

We failed for human reasons :)

