Bokeh – Interactive web visualization library in Python (pydata.org)
234 points by trueduke on Nov 20, 2013 | 64 comments



I wanted to point out one important distinction between Bokeh and anything that's currently out there - we have a full-blown python/js object bridge that synchronizes client-side models with objects you can interact with in python. The significance of this is that someone can select points on a scatter plot, and you can then retrieve the indexes of those points on the python side and use them to dive further into your data.

It's worth mentioning that the IPython guys are implementing a similar json/python bridge to support the new interactive tools in the IPython notebook. Once that is up and running, we'll probably just piggyback on that bridge when you're running in the notebook.
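
For readers who want something concrete, here is a minimal sketch of the selection round trip described above. It is not Bokeh's code: the SelectionModel class, the message shape, and the "id" and "selected" keys are all assumptions invented for illustration.

```python
import json


class SelectionModel:
    """Hypothetical Python-side mirror of a client-side plot model."""

    def __init__(self, model_id, x, y):
        self.model_id = model_id
        self.x = list(x)
        self.y = list(y)
        self.selected = []  # indexes of points selected in the browser

    def apply_update(self, message):
        """Apply a JSON update sent by the client for this model."""
        update = json.loads(message)
        if update["id"] != self.model_id:
            raise ValueError("update is for a different model")
        self.selected = sorted(update["selected"])

    def selected_points(self):
        """Return the (x, y) pairs behind the current selection."""
        return [(self.x[i], self.y[i]) for i in self.selected]


model = SelectionModel("scatter-1", x=[1, 2, 3, 4], y=[10, 20, 30, 40])
# Pretend the browser reported that the user selected points 3 and 1.
model.apply_update(json.dumps({"id": "scatter-1", "selected": [3, 1]}))
print(model.selected_points())
```

The point is only that the selection arrives as plain indexes, so whatever you do next (filtering, re-plotting, statistics) stays in python.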


As a "database guy" who uses Python for most things not bash, is this an approach for viz apps that would eliminate (most) of the need to muck about in JavaScript?

D3 and its children produce some awesome visualizations, but the bandwidth does not exist for me to begin developing apps in a language I don't have much experience in.

If something like Bokeh allows me to live mostly in Python, it becomes even more interesting.


That is exactly the point of bokeh. It allows you to write python code and get browser-based plots. Currently we have a python interface, but we intend to build interfaces in other languages. Look at the examples gallery, which includes the code needed to generate the plots. http://bokeh.pydata.org/gallery.html

A relatively simple plot (scroll down to see the code): http://bokeh.pydata.org/plot_gallery/correlation.html

The simplest example plot in the repo: https://github.com/ContinuumIO/bokeh/blob/master/examples/pl...


This is really cool, but what about adding interactivity with the charts? My first thought when I saw the headline was that this python library just generates D3 code, but it seems to be generating some sort of a static SVG object.


We're actually using canvas.

http://bokeh.pydata.org/plot_gallery/iris.html

That has a selection tool you can play around with.


There are already some tools available: pan and wheel zoom, plot resizing, and selection (on some plot types). Many more tool types are planned.

As for the architecture, BokehJS is built entirely on top of HTML canvas. The python bokeh library sends data and plot specifications to the browser, which uses BokehJS to render the plot and handle interactive tools, etc.


Gotcha, thanks.


Here is a linked brushing example in the IPython notebook. http://nbviewer.ipython.org/urls/raw.github.com/ContinuumIO/...


For some reason I can't get any of the interactivity buttons to work.

EDIT: Ok, everything but the zoom works. How do you zoom?


The current zoom tool is a scroll wheel zoom, not a box selection zoom. Guessing that might be the issue since others have run into the ambiguous description as well. It will be labeled more precisely in master in a few days and in the next release as well.

If that is not the issue, please file a ticket on GitHub!


On a Macbook, the scroll wheel equivalent is the two-fingered scroll, and that doesn't seem to work for me (it scrolls the page instead).


Do you have the tool selected in the toolbar above? If so, please file a ticket on GH about this. Most of the Bokeh devs are on OSX so it would surprise me if it does not work on OSX, but if there is a bug we want to fix it!


Hope there's some kind of python server side push technology to update browser graphs in realtime.



To be more clear, there are two main parts of bokeh. First there is the js portion which uses canvas to create the plots. There is an object model that allows plots to be composed of multiple components (glyphs, data sources, axes, data ranges).

The python side produces json that represents the objects to be plotted. Python only writes a small amount of js to start the js running. For the most part python just produces json objects that the js side reads.
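
As a rough sketch of what "python just produces json objects" can look like, here is a tiny example. The key names are hypothetical and only mirror the components listed above (glyphs, data sources, axes, data ranges); this is not the real BokehJS schema.

```python
import json


def scatter_spec(x, y, width=400, height=400):
    """Build a JSON-serializable description of a scatter plot.

    The layout of this dict is an assumption made for illustration.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return {
        "type": "Plot",
        "width": width,
        "height": height,
        "data_source": {"x": list(x), "y": list(y)},
        "ranges": {"x": [min(x), max(x)], "y": [min(y), max(y)]},
        "glyphs": [{"type": "circle", "x": "x", "y": "y"}],
        "axes": [{"dimension": "x"}, {"dimension": "y"}],
    }


spec = scatter_spec([1, 2, 3], [4, 1, 9])
payload = json.dumps(spec)  # this string is what the js side would read
print(payload)
```

Because the payload is plain json, nothing on the rendering side needs to know which language produced it.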

There could be alternate implementations of the python side that still use the same js rendering logic. You could even write a nicer higher-level js api that wraps the low-level component construction.

I talked about this at PyData NYC. Here is my notebook (which I am in the process of updating for bokeh 0.3) http://nbviewer.ipython.org/urls/raw.github.com/paddymul/bok...


Take a look at Vincent, which may give you what you're looking for (Python -> Vega, a d3 wrapper): https://vincent.readthedocs.org/en/latest/


If you're looking to make scientific D3 graphs with Python, never touching anything close to javascript, you can also check out the Plotly Python API:

https://plot.ly/api/python

It was designed foremost to make graphs pertinent for scientific and engineering applications: https://plot.ly/~alex/76/

(Disclosure: I'm a dev @Plotly)


As a photographer, I gotta say I really love the name. :)

For those who aren't photography nerds, "bokeh" is a Japanese word that means the out-of-focus areas in a photograph. Different lenses have different kinds of bokeh, and beautiful or ugly bokeh is an important dividing line between good and bad lenses.


As someone who likes photography and analytics, I disagree about the name.

As you say, bokeh is about out-of-focus blur. That's sort of the opposite impression you want to present in a tool that's intended to give you "clarity" via its visualizations.


Actually, bokeh is about the quality of the blur. Yes, blur can have quality. If you simply remove everything that is not in focus, you lose context and texture about the subject. If you use a pinhole camera and present everything in sharp focus, you lose the insight.

This is actually mentioned in the documentation: http://bokeh.pydata.org/#technical-vision

""" Photographers use the Japanese word “bokeh” to describe the blurring of the out-of-focus parts of an image. Its aesthetic quality can greatly enhance a photograph, and photographers artfully use focus to draw attention to subjects of interest. “Good bokeh” contributes visual interest to a photograph and places its subjects in context.

In this vein of focusing on high-impact subjects while always maintaining a relationship to the data background, the Bokeh project attempts to address fundamental challenges of large dataset visualization... """


>> Actually, bokeh is about the quality of the blur.

Yes, you're right. But most people tend to treat out-of-focus blur synonymously with bokeh (the quality of the blur) -- they're related but not the same. In this particular library's case, I think they're talking about out-of-focus blur, not bokeh.


Yeah, I kinda glossed over that.


As someone who likes photography, I have to disagree about your reasoning. The clarity of your subject is as much about how much it is in focus as how much irrelevant things are out of focus. Portrait photos are often beautiful when the lens has good bokeh characteristics.


>> The clarity of your subject is as much about how much it is in focus as how much irrelevant things are out of focus.

If you actually look at the sample output of the library, there's nothing out of focus, at least from the perspective of depth of field. In my opinion (and it is just an opinion), calling all de-emphasized data bokeh is a stretch at best. Blurring and de-emphasis using color and size are two different things.

>> Portrait photos are often beautiful when the lens has good bokeh characteristics.

Let's be clear -- while bokeh can enhance the beauty of a portrait, it doesn't make a portrait beautiful. Most people don't know the difference between good bokeh and bad bokeh (pwang's definition of bokeh in his response to me is very good), but they can usually identify a blurred background vs. a sharp background.

Many people tend to prefer a sharp subject against a blurred background, and that's usually enough for most people to consider a portrait beautiful even if the bokeh is quite ugly. Without getting into a long drawn out discussion of bokeh, you have to remember that there's also more to a beautiful portrait than the novelty of a blurred background.


> If you actually look at the sample output of the library, there's nothing out of focus,

We are working on the semantic downsampling and perceptual integration aspects of visualizing large data. This currently lives in its own repo: https://github.com/JosephCottam/AbstractRendering

> calling all de-emphasized data bokeh is a stretch at best

It's really just meant to be an evocative metaphor... :-)


If you're thinking in terms of the _spirit_ of bokeh, and understand 'semantic downsampling', you've got my vote. As someone who's tried to use D3, I like this a lot. I'll be experimenting with Bokeh soon.


We have a paper that we'll be presenting at SPIE VDA 2014 in February: http://spie.org/EI/conferencedetails/visualization-data-anal...

With the 0.3 release out now, I'm focusing on hooking up the abstract rendering backend for the plot server, so just keep an eye out.


Do you guys all work for Continuum? Need more hands? ;)


Could always use more hands, especially in certain areas. Shoot an email to jobs@continuum.io and reference this post.


>> It's really just meant to be an evocative metaphor... :-)

Gotcha - me not liking a name is just my own personal opinion. The library itself is interesting.

You can't please everyone all of the time. ;)


I didn't realise "bokeh" was a Japanese word. That's really cool. Seems like it uses this kanji 暈 [1] meaning "corona" or "halo" (in turn made up of the "sun" kanji, "car" kanji and "crown" radical–I'm not sure if there's a historical reason for this but it makes it easy to remember at least).

[1] http://jisho.org/kanji/details/暈


Boke (meaning blur) is from a combination of kanji 暈 (bo) and hiragana け (ke). You'll notice that "bokeh" doesn't look like usual romaji (Japanese romanization) spelling, since it would normally be transliterated as boke. It was spelled bokeh with the extra 'h' to avoid accidental pronunciation like poke.

[1] http://www.luminous-landscape.com/essays/bokeh.shtml


Thanks for the info. I just realised too (thanks to a friend) that the kanji might originate because "corona" sounds similar to "kuruma" (meaning "car"), and that's why the "bo" kanji has the "car" kanji in it.


I thought bokeh was the aesthetic quality of the out of focus area, not the area itself.


It's nice, I heard of it a while ago. I just had a crazy thought of combining this with UTFGrid for interacting with data points, but that's probably silly :)


No - that's not crazy at all.

In fact we are working on (and open sourcing) similar ideas

http://www.youtube.com/watch?v=b0-4xtFeaT8


Are mouse hover interactions in the timeline (display value of selected point)? Don't see any references but otherwise this is a very interesting project.


[Another bokeh dev chiming in] Additional tools like crosshair, data and color inspectors, box zoom, more types of selections (point, lasso, etc), and measurement tools are all planned.


Check out the Plotly APIs for hover - Here's an example: https://plot.ly/~alex/75/ https://plot.ly/api/python (Disclosure: I'm a dev at @Plotly)


This is badass. I wish there were something like this or Seaborn [1] for Matlab. Anyone know of anything similar that can make the ugly default Matlab plots turn into beauties like these?

[1] http://stanford.edu/~mwaskom/software/seaborn/index.html


This doesn't directly answer your question, but in regards to seaborn, our goal is to support enough of the mpl API so we can get seaborn to work with bokeh. Either that, or we would write our own ggplot interface, whichever approach ends up being easier.


Yes! It would be crazy awesome if I could do 'from bokeh import pyplot' or something similar.


The Plotly MATLAB API does exactly that: https://plot.ly/api/matlab


I don't know if anyone else noticed but we owe DARPA's XDATA program a thank you note too for funding this project.


They've been great supporters of this effort, as well as Blaze[1] and Numba[2].

[1] http://blaze.pydata.org

[2] http://numba.pydata.org


What does 'large datasets' mean here? We are building a visualization service to abstract users from fiddling with d3 and other libraries. We want users to be able to use all of viz libraries out there with just providing data input and tweaking settings, so this looks interesting.


Part of the 0.4 release is to incorporate the concept of abstract rendering - which means you render on the server, and then send the necessary information over to the client on demand. For example, if someone tries to scatter a billion points, instead of just drawing a useless point cloud, you would figure out where all the points fit inside your 512x512 canvas (or whatever size you have), figure out how all the points stack up, compute an alpha that is meaningful for that number of points, and then send the heatmap to the client.
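
To make the scatter case concrete, here is a minimal pure-python sketch of the binning idea: count points per pixel, then turn counts into alpha. It is not the Abstract Rendering implementation. The function names are made up, and scaling alpha linearly against the densest pixel is just one assumed choice of "meaningful" alpha.

```python
import random


def bin_points(points, x_range, y_range, width=512, height=512):
    """Count how many points land in each pixel of a width x height grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    counts = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # outside the visible range
        # Clamp so that points exactly on the upper edge fall in the last pixel.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        counts[row][col] += 1
    return counts


def counts_to_alpha(counts):
    """Scale counts to alpha in [0, 1], relative to the densest pixel."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0.0] * len(row) for row in counts]
    return [[c / peak for c in row] for row in counts]


random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10000)]
counts = bin_points(points, (-4, 4), (-4, 4), width=64, height=64)
alpha = counts_to_alpha(counts)  # a 64x64 grid is all the client would need
```

However many points go in, the output is always width x height values, which is why the payload sent to the browser stays small.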

You can easily imagine a similar approach for line plots that selectively downsamples data points in order to preserve interesting features in the plot.
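
One simple way to downsample a line while keeping its interesting features is to keep the min and max of each bucket. This is a swapped-in, well-known technique shown as a sketch; the thread does not say which algorithm the 0.4 release will use, and downsample_minmax is a hypothetical name.

```python
def downsample_minmax(values, buckets):
    """Keep the min and max of each bucket, in their original order.

    Returns a list of (index, value) pairs. Spikes and dips survive because
    every bucket contributes its extremes rather than an average.
    """
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    n = len(values)
    if n <= 2 * buckets:
        return list(enumerate(values))  # already small enough
    kept = []
    for b in range(buckets):
        start = b * n // buckets
        end = (b + 1) * n // buckets
        indexes = range(start, end)
        lo = min(indexes, key=lambda i: values[i])
        hi = max(indexes, key=lambda i: values[i])
        for i in sorted({lo, hi}):
            kept.append((i, values[i]))
    return kept


signal = [0] * 1000
signal[437] = 50  # a single spike that plain striding could easily miss
reduced = downsample_minmax(signal, 10)
print(len(reduced), (437, 50) in reduced)
```

Taking every 100th sample of the same signal would drop the spike at index 437 entirely; keeping per-bucket extremes does not.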

And then we'll build interactors on top of that, so you can actually treat it like a scatter plot, even though it's a heatmap that's being sent to your browser.

So the answer is: "large datasets" means as large as our abstract rendering algorithm can handle on your hardware, so those data sets should be pretty big.


Interesting, this is for our second phase then (we're launching soon, you'll know about it). We'll definitely look into whether we can provide an interface for bokeh as well. Currently we're transforming user-provided sheets (csv etc.) into json and tying them into viz on the client side. Thanks for the answer.
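
For anyone curious, a sheet-to-json step like the one mentioned above might look roughly like this with only the standard library. It is a sketch, not our production code: the csv_to_columns name and the column-oriented output layout are assumptions.

```python
import csv
import io
import json


def csv_to_columns(text):
    """Parse CSV text into a dict of column name -> list of values.

    Values that parse as numbers become floats; everything else stays a string.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in columns:
            raw = row[name]
            try:
                columns[name].append(float(raw))
            except (TypeError, ValueError):
                columns[name].append(raw)  # keep non-numeric cells as-is
    return columns


sheet = "species,petal_length\nsetosa,1.4\nvirginica,5.1\n"
print(json.dumps(csv_to_columns(sheet)))
```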


I have a project involving multi-gigabyte datasets of line plot data. With your 0.4 release, will it be possible to show down-sampled subsets of these plots, with the ability to pan/zoom around and get more data on demand without having it all held in memory?


Well, we'll be able to do that without sending the data to the client. I'm not sure whether our implementation right now will work without loading the data into memory, though long term that is definitely the plan (we will leverage http://blaze.pydata.org/)

If you want to discuss further, please email bokeh@continuum.io


The Python version of Abstract Rendering would currently load it all into memory. The Java version, which is based on the same algorithms, would not. It routinely handles multi-gigabyte files and lets us know that the core algorithm can scale. We're working on getting the Python implementation to scale as well.


When is the 0.4 release planned? I would be really interested in this, since I have to visualize terabytes of data in the browser.


January - but probably with abstract rendering support only for scatter plots and line plots. We'll have to roll it out incrementally, but the good thing is 90% of plots are scatters and lines =)


Perfect, thanks.


Has anyone used it in production? I would be very interested to hear about interoperability potential with other platforms - some json-based protocol perhaps? In D3 I can just point it to csv and do whatever I need to. Is it just as easy in BokehJS?


We will be adding stream datasources (which allow pulling from 3rd party jsonp feeds) for bokeh 0.4. We expect to release bokeh 0.4 in early 2014.

It is incredibly easy to use bokeh from python. The burtin example in the gallery reads from CSV http://bokeh.pydata.org/plot_gallery/burtin_example.html . Scroll down a bit, and you can see the code.

I am a bokeh dev at Continuum Analytics.


I realize this is nitpicky, but it seems obnoxious to call these[1] "candlesticks" when it would make much more sense to call the page box plots[2]. Wouldn't it?

[1] http://bokeh.pydata.org/plot_gallery/candlestick.html

[2] http://en.wikipedia.org/wiki/Box_plot



Awesome! I fell for the superficial resemblance, clearly. :) (I even searched for the term candlestick on the wikipedia page for box plots... )


This is really awesome. I just added it to my IPython Notebook Mac app: https://github.com/liyanage/ipython-notebook/wiki


Cool! Thanks!

Let us know if you ever have any problems with it.


Just for reference, here is the actual 0.3 announcement: http://continuum.io/blog/bokeh03


Can someone comment on how this compares with matplotlib?


In the FAQ: http://bokeh.pydata.org/faq.html

""" Q: Why did you start writing a new plotting library, instead of just extending e.g. Matplotlib?

A: There are a number of reasons why we wrote a new Python library, but they all hinge on maximizing flexibility for exploring new design spaces for achieving our long-term visualization goals. (Please see Technical Vision[1] for details about those.) """

[1] http://bokeh.pydata.org/index.html#technicalvision



