
Airbnb open-sources Caravel: data exploration and visualization platform - caravel
https://github.com/airbnb/caravel
======
twakefield
If anyone involved with this is around, I'm curious why Airbnb would build
something like this - cost, performance, features, all of the above? Data
querying and visualization is a pretty crowded field with a lot of commercial
options to choose from[1].

I'm not knocking Caravel (it looks amazing), just curious why build vs. buy in
this case.

[1] Tableau, Looker, Periscope, Chartio, Qlikview, Gooddata are just some that
come to mind.

~~~
phunge
They mention that it was originally paired with Druid. The data volume that
Druid excels at (and that Airbnb must have) is orders of magnitude larger than
what Tableau and Looker do well with. It's probably just built for
bigger-than-SQL OLAP use cases.

~~~
dwmintz
It's worth distinguishing between the tools that leave the data in your data
warehouse (Caravel, Periscope, Mode, Looker (where I work)), and those that
have their own data stores (Good Data, Qlikview, etc.) Tableau can connect
directly to your datastore, but it's happier if it can operate on data that's
stored locally in-memory.

Anyway, the ones where you bring your own database can scale as far as the
database can take you.

~~~
caravel
Note that Tableau doesn't play well with Presto, which Airbnb uses extensively.
There is no possibility of using the "live mode".

~~~
dwmintz
Not arguing that Airbnb should use Looker, but fwiw, Looker (where I work)
does in fact connect to Presto fine.

------
tedmiston
I really like the Python style you guys have adopted.

Grouping imports into: standard lib, third party, local is a strong pattern
that I don't see done consistently in many repos. Likewise with your use of
wrapping long imports with ()s and a single tab.
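
As a rough illustration of the pattern described above (not taken from the Caravel repo), here is a minimal sketch of grouped imports with a long import wrapped in parentheses at one level of indentation. The third-party and local groups are commented out so the sketch runs with only the standard library; `myapp`, `FileInfo`, and `group_by_extension` are hypothetical names.

```python
# Group 1: standard library
import json
import os.path
from collections import (
    defaultdict,
    namedtuple,
)

# Group 2: third-party packages (commented out so this runs without them)
# from flask import (
#     Flask,
#     request,
# )

# Group 3: local application modules (hypothetical names)
# from myapp import models
# from myapp.utils import slugify

FileInfo = namedtuple("FileInfo", ["name", "extension"])


def group_by_extension(paths):
    """Return a JSON object mapping each file extension to its file names."""
    groups = defaultdict(list)
    for path in paths:
        name, extension = os.path.splitext(os.path.basename(path))
        info = FileInfo(name, extension)
        groups[info.extension].append(info.name)
    return json.dumps(groups, sort_keys=True)
```

Each group is separated by a blank line and sorted alphabetically within itself, which keeps diffs small when an import is added or removed.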

Any chance of sharing your Python style guide? My startup is Python based
(Django and Flask) and would really appreciate it!

~~~
kevinastone
The import ordering is pretty common:
[https://google.github.io/styleguide/pyguide.html?showone=Imp...](https://google.github.io/styleguide/pyguide.html?showone=Imports_formatting#Imports_formatting)

------
cauthon
What's the name of this style of plot?

[https://camo.githubusercontent.com/c22acad6c1302c5da3236cb8e...](https://camo.githubusercontent.com/c22acad6c1302c5da3236cb8eea816013a866006/687474703a2f2f692e696d6775722e636f6d2f44326b5a4c37712e706e67)

~~~
devy
I believe it's called a "Sankey Diagram", as shown in the dropdown menu at the
top left.

Here is the original demo[1] from Mike Bostock, D3's author.

[1]: [https://bost.ocks.org/mike/sankey/](https://bost.ocks.org/mike/sankey/)

~~~
nthitz
Though they have been around much longer than D3 has...
[https://en.wikipedia.org/wiki/Sankey_diagram](https://en.wikipedia.org/wiki/Sankey_diagram)

------
don_draper
The code is so clean and simple. This is great PR for the company. I want to
work there.

~~~
ultimoo
I think so too!

I did a fair bit of Ruby a few years ago, but I'm new to Python CRUD apps
and trying to improve my knowledge here. Is defining all models in the same
file[1] conventional in Python apps? Rails used to have separate files for
each model. And most Ruby apps that I have seen advocate the
one-class-one-file convention.

[1]
[https://github.com/airbnb/caravel/blob/master/caravel/models...](https://github.com/airbnb/caravel/blob/master/caravel/models.py)

~~~
tedmiston
All models in one models.py file is common for Flask and Django.

If you use multiple apps within one Django project or the equivalent in Flask
(Blueprints), that extends to one models.py per app (where a "project" is a
collection of "apps").

Sometimes you'll see one file per model (with a models/__init__.py that
imports them for use). While I think it keeps the dependency imports for each
model very cleanly separated, you end up with a lot of redundancy, importing
the same basic pieces in every model file.
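
To make the one-file-per-model layout concrete, here is a minimal sketch, assuming a hypothetical package called `demo_models` with two made-up models, `User` and `Dashboard`. It writes the package to a temporary directory only so that the example is runnable on its own; in a real project these would simply be files on disk.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical file contents: one small module per model.
FILES = {
    "user.py": """
        class User:
            table_name = "users"
    """,
    "dashboard.py": """
        class Dashboard:
            table_name = "dashboards"
    """,
    # __init__.py re-exports every model so callers keep a single import path.
    "__init__.py": """
        from .dashboard import Dashboard
        from .user import User

        __all__ = ["Dashboard", "User"]
    """,
}


def build_package(root, name="demo_models"):
    """Write the package files under root and return the package directory."""
    package_dir = Path(root) / name
    package_dir.mkdir()
    for filename, source in FILES.items():
        (package_dir / filename).write_text(textwrap.dedent(source))
    return package_dir


def load_models(name="demo_models"):
    """Build the package in a temporary directory and import it."""
    with tempfile.TemporaryDirectory() as root:
        build_package(root, name)
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            package = importlib.import_module(name)
        finally:
            sys.path.remove(root)
    return package
```

With this layout, calling code can still write `from demo_models import User`, exactly as it would with a single models.py; the trade-off is the repeated imports inside each model file.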

------
polskibus
I'm wondering - what's the effort required to build such a BI tool (tables,
charts, maps) these days assuming reusing open source components and focusing
on SQL-speaking datastores? Could a small team of experienced devs accomplish
such a feat in a year?

~~~
caravel
Yes, look at the commit log.
[https://github.com/airbnb/caravel/graphs/contributors](https://github.com/airbnb/caravel/graphs/contributors)

~~~
polskibus
Does that cover the very beginning, when Caravel wasn't open source yet?

------
mooneater
I really need a great data explorer/dashboard for my Postgres-based systems. I
was going to use Shiny but this looks really nice -- I hope the docs can be
built out very soon. Can anyone comment on other competing products? In the
commercial space, I like Looker but it's too pricey.

~~~
jonbishop
Hey! I run marketing for Periscope Data, a data explorer/dashboard product. We
have a lot of customers using postgres and get compared to Looker a lot. We
focus on optimizing for the analyst whereas tools like Looker focus on
business users. We have a lot of features for business users, but chart
creation is all SQL based.

Our site is here:
[https://www.periscopedata.com/](https://www.periscopedata.com/) and if you
have any questions, shoot me an email at jon@periscopedata.com.

~~~
Rapzid
Do you have a self-hosted option? What's your pricing?

~~~
ngould
Word on the street is Periscope costs $1000/mo for unlimited users, up to 1B
rows.

------
kfk
Hey, this is very interesting, I will take a look. I am trying to shift a
whole $700m division, and then hopefully a $3b segment, onto a new workflow
paradigm for data (automatically refreshed dashboards instead of sending
around Excel files, a focus on building models rather than pasting data into
spreadsheets), and unfortunately Tableau is my only option for now. I feel
very uncomfortable going with a closed solution, since I know that we will
have lots of edge cases and that being able to do your own coding is in the
end the best way to deal with those. The license cost is also incredibly high:
we are talking $200 per user per year at minimum, which means $200 to $400
thousand per year for a big organization of between 1000 and 2000 users.

~~~
dwmintz
I'm a huge proponent of the idea of centralizing the data model. That's the
core idea behind Looker (where I just came to work after 3 years as a
customer), and I agree it's a hugely powerful change from the world of
everybody-in-their-own-spreadsheet.

On your other point, though, to echo the build vs. buy discussion from above,
I think it's a bit misleading to say "oh, we'll just use an open-source
solution and that'll be cheaper." Because if open source means a couple of
internal developers and an analyst, that's easily $300k+/year in salaries that
you might not spend if you were using a vendor.

Anyway, given your particular statement of the problem you're facing, I'd
humbly suggest you take a look at Looker. The data modeling layer that's core
to Looker is meant to solve EXACTLY that problem, by leaving your data where
it lives and then embedding your business logic in the layer that sits between
end users and the data.

------
jedisct1
Wasn't Panoramix already open source?

~~~
caravel
It was (Panoramix got renamed to Caravel); it's just officially supported,
maintained, and grown by Airbnb now.

------
dschiptsov
It is not Java, but Python. What a surprise!

------
arikfr
Have you considered using Re:dash[1] before writing your own tool?

[1] [https://github.com/getredash/redash](https://github.com/getredash/redash)

~~~
gorkemcetin
Redash has only been around for 2.5 years, and Airbnb's engineers probably
thought it was too early to take a look at it.

------
johnieeboy
It seems a little sparse on documentation, or am I missing something?

~~~
caravel
We'll be providing short user training videos very soon.

~~~
teej
As someone who is the core audience for this tool, can I say that I strongly
prefer clear documentation over videos? Videos are way too hard to maintain
and end up being stale the minute after you post them in fast-moving projects.
I can't text-search a video and I can't be linked directly to an answer in a
StackOverflow response.

Written documentation is vastly superior to videos in my opinion.

~~~
shostack
To add--you can also skim text MUCH more quickly for the piece you want
compared to video.

I absolutely loathe video for any analytics-related documentation. It rarely
adds any real value over text outside of live webinars where I can ask
questions.

------
flashman
Having some trouble getting it working on Windows, which I see you don't
currently support (I need to create caravel_config.py to get past the
fabmanager installation step). This looks really interesting but I might wait
until someone has posted Windows instructions.

~~~
robroy72
I got it to work on Windows using the Anaconda Python 2.7 installation. Keep
in mind that the caravel commands in the docs have to be run from your install
dir; e.g. change dir to

<yourPythonInstallDir>\Lib\site-packages\caravel\bin

then run as

python caravel db upgrade

------
gavin6252
Great-looking product! Has anyone figured out how to join tables yet, or do
you need to define views in your SQL database?

------
uberneo
It's Apache licensed, so does that mean I can use it directly in my company
to replace existing commercial products like Tableau / Looker?

------
polartx
What are the supported data sources? I saw SQL tables and I imagine flat
files, but can you write calls to web service endpoints?

~~~
wesd
From the readme file:

Database Support

Caravel was originally designed on top of Druid.io, but quickly broadened its
scope to support other databases through the use of SqlAlchemy, a Python ORM
that is compatible with most common databases[1].

[1][http://docs.sqlalchemy.org/en/rel_1_0/core/engines.html](http://docs.sqlalchemy.org/en/rel_1_0/core/engines.html)

~~~
tedmiston
Specifically, SQLAlchemy includes dialects out of the box for: Firebird,
Microsoft SQL Server, MySQL, Oracle, PostgreSQL, SQLite, Sybase.

[http://docs.sqlalchemy.org/en/rel_1_0/dialects/index.html](http://docs.sqlalchemy.org/en/rel_1_0/dialects/index.html)

~~~
infinite8s
And has adapters for Hive, Presto, Redshift and Google BigQuery.

------
markovbling
Awesome - great work!

A tutorial on how to link it to a MySQL database would be greatly appreciated
:)

~~~
tedmiston
They're using SQLAlchemy as a database abstraction layer which supports MySQL
out of the box.

So, you just need to set the config param SQLALCHEMY_DATABASE_URI like this:

[https://github.com/airbnb/caravel/blob/1b4e750b2aa111445703d...](https://github.com/airbnb/caravel/blob/1b4e750b2aa111445703dca07fe929f7ffde38c9/caravel/config.py#L31)

The configuration guide explains it further:

[https://github.com/airbnb/caravel/blob/master/docs/installat...](https://github.com/airbnb/caravel/blob/master/docs/installation.rst#configuration)
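
For illustration only, here is a minimal sketch of what such a setting could look like for MySQL, following SQLAlchemy's `dialect+driver://user:password@host:port/database` URL format. The helper `build_mysql_uri` and the credentials, host, and database name are all hypothetical placeholders, not values from the Caravel repo.

```python
from urllib.parse import quote_plus


def build_mysql_uri(user, password, host, database, port=3306, driver=None):
    """Build an SQLAlchemy-style MySQL URI from its parts.

    The user and password are URL-encoded so that characters such as
    '@' or '/' do not break the URI.
    """
    dialect = "mysql" if driver is None else "mysql+{}".format(driver)
    return "{}://{}:{}@{}:{}/{}".format(
        dialect, quote_plus(user), quote_plus(password), host, port, database
    )


# Hypothetical credentials and database name, for illustration only.
SQLALCHEMY_DATABASE_URI = build_mysql_uri(
    user="caravel_user",
    password="p@ss/word",
    host="localhost",
    database="caravel_db",
)
```

Passing `driver="pymysql"` would produce a `mysql+pymysql://` URI instead, assuming that driver is installed; which driver to use depends on your environment.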

------
educar
Is this like Piwik?

~~~
teej
Piwik is more of a Google Analytics replacement. It's a package that contains
a data visualization platform, a data storage engine, and a data emitter
(website tag) all in one.

This is just a data visualization platform. You need to bring your own data
store and data.

------
vhhuhhfryuhgfh
Any pictures anywhere?

