I wanted to point out one important distinction between Bokeh and anything that's currently out there - we have a full-blown python/js object bridge that synchronizes client-side models with objects you can interact with in python. The significance of this is that someone can select points on a scatter plot, and then you can retrieve the indexes of those points on the python side and use them to dive further into your data.
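To illustrate the second half of that (what you do once the indexes arrive on the python side), here is a minimal sketch in plain python. It does not use Bokeh's API at all: the `data` columns, the `rows_for_selection` helper, and the hard-coded `selected` list are all made up for illustration, and the list stands in for whatever the bridge would report.

```python
# Hypothetical column data backing a scatter plot (names are illustrative only).
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8],
    "label": ["a", "b", "c", "d", "e"],
}


def rows_for_selection(columns, selected_indices):
    """Return one dict per selected point, pulling each column at that index."""
    return [
        {name: values[i] for name, values in columns.items()}
        for i in selected_indices
    ]


# Assumption: the bridge reported that points 1 and 3 were selected in the browser.
selected = [1, 3]
subset = rows_for_selection(data, selected)

for row in subset:
    print(row)
```

The point is only that a selection reduces to a list of integer indexes, so any further analysis is ordinary python from there.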
It's worth mentioning that the IPython guys are implementing a similar json/python bridge to support the new interactive tools in the IPython notebook. Once that is up and running, we'll probably just piggyback off of that bridge when you're running in the notebook.
As a "database guy" who uses Python for most things not bash, is this an approach for viz apps that would eliminate (most) of the need to muck about in JavaScript?
D3 and its children produce some awesome visualizations, but the bandwidth does not exist for me to begin developing apps in a language I don't have much experience in.
If something like Bokeh allows me to live mostly in Python, it becomes even more interesting.
That is exactly the point of bokeh. It allows you to write python code and get browser-based plots. Currently we have a python interface, but we intend to build interfaces in other languages. Look at the examples gallery, which includes the code needed to generate the plots. http://bokeh.pydata.org/gallery.html
This is really cool, but what about adding interactivity with the charts? My first thought when I saw the headline was that this python library just generates D3 code, but it seems to be generating some sort of a static SVG object.
There are already some tools available: pan and wheel zoom, plot resizing, and selection (on some plot types). Many more tool types are planned.
As for the architecture, BokehJS is built entirely on top of HTML canvas. The python bokeh library sends data and plot specifications to the browser, which uses BokehJS to render the plot and handle interactive tools, etc.
The current zoom tool is a scroll wheel zoom, not a box selection zoom. Guessing that might be the issue since others have run into the ambiguous description as well. It will be labeled more precisely in master in a few days and in the next release as well.
If that is not the issue, please file a ticket on GitHub!
Do you have the tool selected in the toolbar above? If so, please file a ticket on GH about this. Most of the Bokeh devs are on OSX so it would surprise me if it does not work on OSX, but if there is a bug we want to fix it!
To be more clear, there are two main parts of bokeh. First there is the js portion which uses canvas to create the plots. There is an object model that allows plots to be composed of multiple components (glyphs, data sources, axes, data ranges).
The python side produces json that represents the objects to be plotted. Python only writes a small amount of js to start the js running. For the most part python just produces json objects that the js side reads.
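As a rough sketch of that division of labor, the python side could build a plain description of the plot and serialize it. The schema below (`type`, `data_source`, `glyphs`, `ranges`, `axes`) and the `scatter_spec` helper are invented for illustration and are not Bokeh's actual wire format; only the standard `json` module is used.

```python
import json


def scatter_spec(x, y, width=400, height=300):
    """Build a JSON-serializable description of a scatter plot.

    The field names are hypothetical; they only mirror the components
    mentioned above (glyphs, data sources, axes, data ranges).
    """
    return {
        "type": "Plot",
        "width": width,
        "height": height,
        "data_source": {"x": list(x), "y": list(y)},
        "glyphs": [{"type": "circle", "x": "x", "y": "y"}],
        "ranges": {"x": [min(x), max(x)], "y": [min(y), max(y)]},
        "axes": ["bottom", "left"],
    }


spec = scatter_spec([1, 2, 3], [4, 1, 9])

# This string is what would be handed to the js side for rendering.
payload = json.dumps(spec, sort_keys=True)
print(payload)
```

Because the payload is just json, any other language that can emit the same structure could drive the same js renderer, which is the point made in the next paragraph.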
There could be alternate implementations of the python side that still use the same js rendering logic. You could even write a nicer higher-level js api that wraps the low-level component construction.
As a photographer, I gotta say I really love the name. :)
For those who aren't photography nerds, "bokeh" is a Japanese word that means the out-of-focus areas in a photograph. Different lenses have different kinds of bokeh, and beautiful or ugly bokeh is an important dividing line between good and bad lenses.
As someone who likes photography and analytics, I disagree about the name.
As you say, bokeh is about out-of-focus blur. That's sort of the opposite impression you want to present in a tool that's intended to give you "clarity" via its visualizations.
Actually, bokeh is about the quality of the blur. Yes, blur can have quality. If you simply remove everything that is not in focus, you lose context and texture about the subject. If you use a pinhole camera and present everything in sharp focus, you lose the insight.
"""
Photographers use the Japanese word “bokeh” to describe the blurring of the out-of-focus parts of an image. Its aesthetic quality can greatly enhance a photograph, and photographers artfully use focus to draw attention to subjects of interest. “Good bokeh” contributes visual interest to a photograph and places its subjects in context.
In this vein of focusing on high-impact subjects while always maintaining a relationship to the data background, the Bokeh project attempts to address fundamental challenges of large dataset visualization...
"""
>> Actually, bokeh is about the quality of the blur.
Yes, you're right. But most people tend to treat out-of-focus blur synonymously with bokeh (the quality of the blur) -- they're related but not the same. In this particular library's case, I think they're talking about out-of-focus blur, not bokeh.
As someone who likes photography, I have to disagree about your reasoning. The clarity of your subject is as much about how much it is in focus as how much irrelevant things are out of focus. Portrait photos are often beautiful when the lens has good bokeh characteristics.
>> The clarity of your subject is as much about how much it is in focus as how much irrelevant things are out of focus.
If you actually look at the sample output of the library, there's nothing out of focus, at least from the perspective of depth of field. In my opinion (and it is just an opinion), calling all de-emphasized data bokeh is a stretch at best. Blurring and de-emphasis using color and size are two different things.
>> Portrait photos are often beautiful when the lens has good bokeh characteristics.
Let's be clear -- while bokeh can enhance the beauty of a portrait, it doesn't make a portrait beautiful. Most people don't know the difference between good bokeh and bad bokeh (pwang's definition of bokeh in his response to me is very good), but they can usually identify a blurred background vs. a sharp background.
Many people tend to prefer a sharp subject against a blurred background, and that's usually enough for most people to consider a portrait beautiful even if the bokeh is quite ugly. Without getting into a long drawn out discussion of bokeh, you have to remember that there's also more to a beautiful portrait than the novelty of a blurred background.
If you're thinking in terms of the _spirit_ of bokeh, and understand 'semantic downsampling', you've got my vote. As someone who's tried to use D3, I like this a lot. I'll be experimenting with Bokeh soon.
I didn't realise "bokeh" was a Japanese word. That's really cool. Seems like it uses this kanji 暈 [1] meaning "corona" or "halo" (in turn made up of the "sun" kanji, "car" kanji and "crown" radical–I'm not sure if there's a historical reason for this but it makes it easy to remember at least).
Boke (meaning blur) is from a combination of kanji 暈 (bo) and hiragana け (ke). You'll notice that "bokeh" doesn't look like usual romaji (Japanese romanization) spelling, since it would normally be transliterated as boke. It was spelled bokeh with the extra 'h' to avoid accidental pronunciation like poke.
Thanks for the info. I just realised too (thanks to a friend) that the kanji might originate because "corona" sounds similar to "kuruma" (meaning "car"), and that's why the "bo" kanji has the "car" kanji in it.
It's nice, I heard of it a while ago. I just had a crazy thought of combining this with UTFGrid for interacting with data points, but that's probably silly :)
Are mouse hover interactions in the timeline (display value of selected point)? Don't see any references but otherwise this is a very interesting project.
[Another bokeh dev chiming in] Additional tools like crosshair, data and color inspectors, box zoom, more types of selections (point, lasso, etc), and measurement tools are all planned.
This is badass. I wish there were something like this or Seaborn [1] for Matlab. Anyone know of anything similar that can make the ugly default Matlab plots turn into beauties like these?
This doesn't directly answer your question, but in regards to seaborn, our goal is to support enough of the mpl API so we can get seaborn to work with bokeh. Either that or we would write our own ggplot interface, whichever approach ends up being easier.
What does 'large datasets' mean here?
We are building a visualization service to save users from fiddling with d3 and other libraries. We want users to be able to use all of the viz libraries out there by just providing data input and tweaking settings, so this looks interesting.
Part of the 0.4 release is to incorporate the concept of abstract rendering - which means you render on the server, and then send the necessary information over to the client on demand. For example, if someone tries to scatter a billion points, instead of just drawing a useless point cloud, you would figure out where all the points fit inside your 512x512 canvas (or whatever size you have), figure out how all the points stack up, compute an alpha that is meaningful for that number of points, and then send the heatmap to the client.
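A toy sketch of the binning step described above, in plain python so it runs anywhere. This is not the Abstract Rendering implementation: the function names are made up, and the log scaling used for alpha is just one assumed way of picking "an alpha that is meaningful" so that very dense cells don't wash out sparse ones.

```python
import math


def bin_points(points, x_range, y_range, width=512, height=512):
    """Count how many (x, y) points land in each cell of a width x height grid."""
    x0, x1 = x_range
    y0, y1 = y_range
    counts = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the visible range, so it is never drawn
        # Clamp so points exactly on the upper edge fall in the last cell.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        counts[row][col] += 1
    return counts


def counts_to_alpha(counts):
    """Map per-cell counts to an opacity in [0, 1] using a log scale (assumed)."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0.0] * len(row) for row in counts]
    scale = math.log1p(peak)
    return [[math.log1p(c) / scale for c in row] for row in counts]


# Tiny example: a 2x2 "canvas" instead of 512x512.
points = [(0.1, 0.1), (0.1, 0.1), (0.9, 0.9), (5.0, 5.0)]
counts = bin_points(points, (0.0, 1.0), (0.0, 1.0), width=2, height=2)
alpha = counts_to_alpha(counts)
print(counts)
```

However many points go in, what gets sent to the client is bounded by the canvas size, not by the size of the dataset.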
You can easily imagine a similar approach for line plots that selectively downsamples data points in order to preserve interesting features in the plot.
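One simple way to do that kind of feature-preserving downsampling is min/max decimation: split the series into buckets and keep only the lowest and highest sample in each, so spikes survive. This is a swapped-in illustration, not necessarily what Bokeh will ship, and `downsample_minmax` is a hypothetical name.

```python
def downsample_minmax(values, n_buckets):
    """Keep the min and max sample of each bucket as (index, value) pairs.

    Output is in original order and has at most 2 * n_buckets points.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    n = len(values)
    if n <= 2 * n_buckets:
        # Already small enough; nothing to throw away.
        return list(enumerate(values))
    kept = []
    for b in range(n_buckets):
        start = b * n // n_buckets
        end = (b + 1) * n // n_buckets
        indexes = range(start, end)
        lo = min(indexes, key=values.__getitem__)
        hi = max(indexes, key=values.__getitem__)
        # A set avoids emitting the same sample twice in a flat bucket.
        for i in sorted({lo, hi}):
            kept.append((i, values[i]))
    return kept


series = [0, 0, 0, 9, 0, 0, 0, 0, -5, 0, 0, 0]
reduced = downsample_minmax(series, n_buckets=2)
print(reduced)
```

Plain every-Nth sampling could easily miss the spike at index 3 or the dip at index 8; keeping per-bucket extremes guarantees both are still drawn.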
And then we'll build interactors on top of that, so you can actually treat it like a scatter plot, even though it's a heatmap that's being sent to your browser.
So the answer is: "large datasets" means as large as our abstract rendering algorithm can handle on your hardware, so those data sets should be pretty big.
Interesting, this is for our second phase then (we're launching soon, you'll know about it). We'll definitely look into whether we can provide an interface for bokeh as well. Currently we're transforming user-provided sheets (csv etc.) into json and tying them into viz on the client side.
Thanks for the answer.
I have a project involving multi-gigabyte datasets of line plot data. With your 0.4 release, will it be possible to show down-sampled subsets of these plots, with the ability to pan/zoom around and get more data on demand without having it all held in memory?
Well, we'll be able to do that without sending the data to the client. I'm not sure if our implementation right now will work without loading the data into memory, though long term that is definitely the plan (we will leverage http://blaze.pydata.org/).
If you want to discuss further, please email bokeh@continuum.io
The Python version of Abstract Rendering currently would load it all into memory. The Java version, which is based on the same algorithms, would not. It routinely handles multi-gigabyte files and lets us know that the core algorithm can scale. We're working on getting the Python implementation to scale as well.
January - but probably only with abstract rendering support for scatter plots and line plots. We'll have to roll it out incrementally, but the good thing is 90% of plots are scatters and lines =)
Has anyone used it in production? I would be very interested to hear about interoperability potential with other platforms - some json-based protocol perhaps? In D3 I can just point it to a csv and do whatever I need to. Is it just as easy in BokehJS?
I realize this is nitpicky, but it seems obnoxious to call these[1] "candlesticks" when it would make much more sense to call the page box plots[2]. Wouldn't it?
"""
Q: Why did you start writing a new plotting library, instead of just extending e.g. Matplotlib?
A: There are a number of reasons why we wrote a new Python library, but they all hinge on maximizing flexibility for exploring new design spaces for achieving our long-term visualization goals. (Please see Technical Vision[1] for details about those.)
"""