
Presto: Interacting with petabytes of data at Facebook - ternaryoperator
https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920
======
pvnick
>As mentioned above, development on Presto started in Fall 2012. We had our
first production system up and running in early 2013. It was fully rolled out
to the entire company by Spring 2013.

Wow. As somebody who skimmed through the original Google Dremel paper and
thought for a while about how one would go about implementing such an
interactive system, that strikes me as an amazingly impressive timeline.

------
electrum
Hi, I work on Presto at Facebook and would be happy to answer any questions
about it.

~~~
polskibus
1. Can you force it to do exact (not approximate) counts and aggregations? If
so, what are the preconditions (size of tables?)

2. In data warehouse terms, would you fit it as a MOLAP, ROLAP or HOLAP kind
of engine? I'm not sure whether all data is held in memory and whether
aggregates are cached. Can you preprocess the dataset in batch mode (let's say
overnight), repopulate aggregate caches for faster retrieval later on? (I mean
that as something similar to SQL Server Analysis Services in MOLAP mode).

3. Can you compare it to Apache Drill?

~~~
electrum
1) Presto always returns exact answers, with the exception of the approximate
aggregate functions approx_distinct() and approx_percentile() (see
[http://prestodb.io/docs/current/functions/aggregate.html](http://prestodb.io/docs/current/functions/aggregate.html)).
We do not currently support COUNT(DISTINCT), though if you really need that
you can use a subquery with SELECT DISTINCT.
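
To make that concrete, the rewrite looks like this (sketched with SQLite
through Python, since the semantics are plain SQL; the events table and its
columns are invented for illustration):

```python
import sqlite3

# Presto (at the time) lacks COUNT(DISTINCT), so the workaround is a
# subquery over SELECT DISTINCT. Both forms are shown side by side here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, action TEXT);
    INSERT INTO events VALUES
        (1,'view'),(1,'click'),(2,'view'),(2,'view'),(3,'click');
""")

# Equivalent to the unsupported COUNT(DISTINCT user_id):
workaround = conn.execute("""
    SELECT COUNT(*) FROM (SELECT DISTINCT user_id FROM events)
""").fetchone()[0]

# The direct form, which engines that support it would run:
direct = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM events"
).fetchone()[0]

print(workaround, direct)  # both 3
```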

We have a pull request for partitioned hash aggregations, but currently all
the groups must fit in the memory limit specified by the "task.max-memory"
configuration parameter (see
[http://prestodb.io/docs/current/installation/deployment.html](http://prestodb.io/docs/current/installation/deployment.html)
for details).
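
(For reference, that setting lives in each node's etc/config.properties file;
the value below is just an illustrative example, not a recommendation:)

```
task.max-memory=1GB
```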

Regarding approximate queries, we are working with the author of BlinkDB
([http://blinkdb.org/](http://blinkdb.org/)) to add it to Presto. BlinkDB
allows very fast approximate queries with bounded errors (an important
requirement for statisticians / data scientists).

2) Presto is a traditional SQL engine, so it would be ROLAP. We don't yet have
any support for building cubes or other specialized structures (though full
materialized view support with rewrite is on the roadmap).

The Presto query engine is actually agnostic to the data source. Data is
queried via pluggable connectors. We don't currently have any caching ready
for production.

There is an alpha-quality native store that we plan to use soon for query
acceleration. The idea is that you create a materialized view against a Hive
table, which loads the data into the native store. The view then gets used
transparently when you query the Hive table, so the user doesn't have to
rewrite their queries or even know about it. All they see is that their query
is 100x faster. (All this code is there today in the open source release but
needs work to productionize it.)

We have a dashboard system at Facebook that uses Presto. For large tables,
users typically set up pipelines in Hive that run daily to compute summary
tables, then write the dashboard queries against these summary tables. In the
future, we would like to be able to handle all of this within Presto as
materialized views.
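
The summary-table pattern looks roughly like this (demonstrated with SQLite
via Python, since it's ordinary SQL; the table and column names are made up):

```python
import sqlite3

# A daily batch job aggregates the big fact table once, and dashboards
# read the small result instead of rescanning the detail data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (ds TEXT, page TEXT, user_id INTEGER);
    INSERT INTO page_views VALUES
        ('2013-11-06','home',1),
        ('2013-11-06','home',2),
        ('2013-11-06','about',1);

    -- The "pipeline" step, run once per day over the large table:
    CREATE TABLE page_views_daily AS
        SELECT ds, page, COUNT(*) AS views
        FROM page_views
        GROUP BY ds, page;
""")

# The dashboard query touches only the small summary table:
rows = conn.execute("""
    SELECT page, views FROM page_views_daily
    WHERE ds = '2013-11-06' ORDER BY page
""").fetchall()
print(rows)  # [('about', 1), ('home', 2)]
```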

3) We're excited about the Drill project. They have some interesting ideas
about integrating unstructured data processing (like arbitrary JSON documents)
with standard SQL. However, last I looked they were still in early
development, whereas Presto is in production at Facebook and is usable today.
Please also see this comment:
[https://news.ycombinator.com/item?id=6684785](https://news.ycombinator.com/item?id=6684785)

~~~
polskibus
Thanks a lot for the answers, and good luck with further development. I'm a
regular SQL Server user and MOLAP goodies can be very useful. From what I
understand, it should be very easy to join a new data source to an existing
Presto warehouse. So I suppose you could "upload" additional mappings from
Excel/CSV into the warehouse and run new queries live (instead of waiting for
a batch reprocess). This can be very beneficial to the user/analyst who often
likes to define new data categorizations on the fly and use them to dig
further into the data. Great job! I'm putting this project on my must-try
shortlist right now!

~~~
spydum
If this is a hot-button item, seriously look into MicroStrategy >9.2.1. You
build the project against some warehouse, and can import other data sources
like Excel, Access, or even other ODBC sources. The imported data gets
converted to an in-memory cube, which you can then combine with analysis
against the larger warehouse. No reprocessing on the warehouse needed.

------
dude_abides
It will be very interesting if someone does a benchmark comparison of Presto
with Cloudera Impala, Amazon Redshift, and Apache Drill.

Also, very curious to know (from any Googlers browsing HN) if Dremel is still
the state-of-the-art within Google, or if there is already a newer
replacement.

~~~
necubi
The state of the art at Google appears to be F1 [0], although that takes a
pretty different approach than Presto/Dremel and may be complementary. This
area (interactive SQL on big data) has become very active lately. In addition
to those you list, there's Shark [1] and BlinkDB [2] from Berkeley's AMP Lab.

[0] [http://research.google.com/pubs/pub41344.html](http://research.google.com/pubs/pub41344.html)
[1] [http://shark.cs.berkeley.edu](http://shark.cs.berkeley.edu)
[2] [http://blinkdb.org/](http://blinkdb.org/)

~~~
snewman
As sprizzle says, F1 and Dremel are not competitors; Dremel is a read-only
system supporting large analysis queries, while F1 is a read-write system
designed for large numbers of smaller queries. Last year Google published a
paper on PowerDrill
([http://vldb.org/pvldb/vol5/p1436_alexanderhall_vldb2012.pdf](http://vldb.org/pvldb/vol5/p1436_alexanderhall_vldb2012.pdf)),
which sounds somewhat like Dremel but is designed to use large amounts of RAM,
which in turn enables some very powerful optimizations.

------
wehadfun
I'm curious what these thousands of queries by Facebook employees are for.
What data/information does Facebook mine daily?

~~~
nbm
I'm a recent user of Presto and Hive at Facebook, after being here for coming
up on three years. I've been using them primarily to understand ways in which
we might be sending people's traffic to non-optimal locations, or over
non-optimal paths between our CDN and the source content, and to help with
understanding current and extrapolating future capacity needs on our CDN
(total requests, changes in cache hit object/byte rates, &c.)

------
cakeface
> (In a later post, we will share some tips and tricks for writing high-
> performance Java system code and the lessons learned while building Presto.)

I'm eagerly anticipating that article!

------
monokrome
Rounding down is not a good solution to the rounding problem. As a software
engineer, I also used rounding in my contracts when I was working as a
contractor, and here's why:

If I have 5 clients at a time, and each one has a concept/idea every day that
takes 20 seconds to explain to me (and isn't in the current iteration), then
they are distracting me by not managing their time properly, calling me a
dozen times a day to tell me about their thoughts. If it is a change to the
current iteration, then it should have been discussed when we agreed on the
current iteration's feature set. Either way, the call is not a result of
well-thought-out time management.

As much as I like hearing new concepts and ideas, I also have to take
attention away from a project that I'm working on in order to provide my full
attention to the client calling me.

After the call is done, I also have to come back to the project at hand,
hoping I wasn't working on something that requires retaining a complicated
chain of thought, which may or may not have been lost in the discussion with
the other client. This is especially true considering that I'm not going to
bill another client's project for the time I've spent being sidetracked and
getting back to where I was before the call.

So, charging in $15 increments causes the client to actually manage their time
with the same effectiveness that they would hope that I am managing mine.

Rounding down makes this more of a problem for me, not less of one. Now a
90-second call is at no charge, and I can get more than one of those in an
hour, still at no charge, under the suggested agreement.

~~~
haberman
Posted on the wrong story?

~~~
asselinpaul
should be here:
[https://news.ycombinator.com/item?id=6684156](https://news.ycombinator.com/item?id=6684156)

------
tszming
I really hope that one day Google can follow Facebook in open-sourcing some
of their core infrastructure technologies (even the legacy ones, e.g. the 1st
generation of GFS) to the open source community :)

------
hatred
It would be extremely kind if someone could explain the major differences
between Presto and Impala or other similar products, or the features Presto
offers that they don't.

~~~
mtraverso
I can't speak to what other products like Impala, Apache Drill, etc. offer,
but Presto supports the following:

- Standard ANSI SQL syntax, including all the basic features you'd expect
from a SQL engine (aggregations, joins, etc.) and more advanced features like
analytic window functions, common table expressions (WITH), and approximate
distinct counts and percentiles.

- It's extensible. The open source code base includes a connector for Hive,
but we also have some custom connectors for internal data stores at Facebook.
We're working on a connector for HBase, too.

- In comparison to Hive, it's very fast and efficient. For our workloads it's
at least 10x more CPU-efficient. Airbnb is using it and has had a similar
experience.

- Most importantly, Presto has been battle-tested. It's been in production at
Facebook since January, and it's used by 1,000 employees every day, running
30,000 queries daily. We've hit every edge case you can imagine.
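
To make the first bullet concrete, here is the shape of a query that combines
a common table expression with an analytic window function (run against
SQLite via Python, since the syntax is standard SQL; the sales table is
invented for illustration):

```python
import sqlite3

# A CTE (WITH) computes per-region totals, and a window function ranks
# the regions by that total in the outer query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east',10),('east',30),('west',20),('west',5);
""")

rows = conn.execute("""
    WITH by_region AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total,
           RANK() OVER (ORDER BY total DESC) AS rnk
    FROM by_region
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('east', 40, 1), ('west', 25, 2)]
```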

~~~
rpedela
What about connectors to other relational databases: PostgreSQL, MySQL, etc?

~~~
dsundstrom
Connecting Presto to a relational database is tricky. If you just want Presto
to scan the table and perform all the computation itself, it is pretty easy,
but for this to perform well you would want to push down some of the
computation to the database. The other problem is that if you only have one
database, you would have hundreds of cores hammering that single database for
data.

That said, earlier this year during a hackathon, we built a prototype
connector that could split a query and push down the relevant parts to a
distributed database that supports simple aggregations. It would be more work
to clean this up and integrate it, so if a lot of people are interested in
this we can prioritize it.

~~~
CurtMonash
If you did that, wouldn't you just be re-inventing DATAllegro or early
versions of Greenplum or early versions of Aster? I.e., better than nothing,
but still far short of a modern analytic relational DBMS?

~~~
rpedela
Are those things open-source?

~~~
CurtMonash
Actually, there was an early version of Greenplum that was open source. Nobody
seemed to care much.

------
RyanZAG
Interesting use of optimizing SQL query plans down to JVM bytecode. I don't
think anybody has done that before, but it sounds like it would work really
well. It would be interesting to hear more on that, as it might be applicable
to a lot of other areas as well.

~~~
yonks
Maybe it's new with SQL, but compiling a DSL to bytecode isn't a new idea:

[http://xml.coverpages.org/saxon42Ann.html](http://xml.coverpages.org/saxon42Ann.html)

------
leadgen
How can I install and use this on a Windows 2012 server with SQL Server?

------
gesman
Really nice to see posts like that from Facebook.

------
pswenson
What data format does FB store? (XML/JSON/?)

~~~
electrum
Most of the data in our data warehouse is stored as relational records using
Hive's RCFile format, though we are actively working on converting it to the
new ORCFile format.

The Hive data model supports complex types including arrays, maps and structs
(nested tables), and those are used liberally. We also have a fair amount of
data stored as JSON inside of string columns in Hive tables, though much of
this is actually structured and could be better modeled using Hive's complex
types.

Presto currently has limited support for Hive complex types. All complex types
are converted to JSON at query time and can be accessed using the JSON
functions:
[http://prestodb.io/docs/current/functions/json.html](http://prestodb.io/docs/current/functions/json.html)
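
The idea looks like this (sketched with SQLite's json_extract() via Python;
Presto exposes the same kind of path-based access through its JSON functions
such as json_extract_scalar(); the profiles table is invented):

```python
import sqlite3

# Complex values stored as JSON strings in an ordinary text column,
# pulled apart at query time with JSON path expressions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles (id INTEGER, data TEXT);
    INSERT INTO profiles VALUES
        (1, '{"name":"alice","langs":["en","fr"]}'),
        (2, '{"name":"bob","langs":["en"]}');
""")

rows = conn.execute("""
    SELECT id,
           json_extract(data, '$.name')     AS name,
           json_extract(data, '$.langs[0]') AS first_lang
    FROM profiles
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'alice', 'en'), (2, 'bob', 'en')]
```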

------
knodi
Java -_-

------
teddyh
“Beautiful. Unethical. Dangerous.”

— Lucius Fox, _The Dark Knight_

