Google open sources key piece of Street View technology (google-opensource.blogspot.com)
350 points by mierle on May 1, 2012 | hide | past | favorite | 51 comments

This is a key component of Street View. We're really happy to have released this. When we started writing Ceres, there was nothing available like it. It's state of the art; a real contribution to computer vision and other fields which need large scale nonlinear least squares.

Part of the reason I worked on this in my spare time (my real job at Google is something else) was to get it integrated in libmv (http://code.google.com/p/libmv) and subsequently integrated into Blender. This opens the door to many sophisticated tracking features, like autocalibration, rolling shutter, multi-shot solves, tripod solves, planar tracking, and more.

EDIT: Changed link title to make it more provocative

When you say "a key component of Street View", which part is it a key component of? Snapping vehicle track data to existing roads? 3D model computation for scene-to-scene jumps? Both?

If you're snapping track data to roads, how do you deal with areas where you have no existing data? Do you take into account aerial imagery?

</openstreetmap-nerd jealous="true">

Google Street View was filmed as 30fps video. You can generate a 2D route map directly from the footage, even without GPS or when GPS is unavailable.

I took some notes during the StreetView Q&A on reddit two years ago: http://blog.est.im/archives/554

Unfortunately we can't talk much about this; sorry.

There's some more discussion over at Reddit: http://www.reddit.com/r/programming/comments/t1mo7/ceres_sol...

Thanks for adding your note about integrating this into Blender--that makes the news much more relevant to my interests. :)

I was looking forward to integration of motion tracking into Blender and had a small play with the feature when released. Now I just need to make the time to play some more...

What exactly would you use this in Blender for? As a library for someone to build a script that would turn 2d photos into 3d objects or something? Could you expand on that?

Blender got integrated matchmoving / tracking with the 2.61 release, by way of libmv (which I am the BDFL of). A critical part of many tasks in 3D reconstruction and tracking is a solid, flexible nonlinear minimizer. Since Ceres is so easy to model problems with, is so fast, and is appropriately licensed (unlike some alternatives), it is an enabler for these more sophisticated tracking operations.

Note that Blender's next open movie, http://mango.blender.org, is live-action done with Blender's tracking tools. Ceres is going to be used for that as well.

I guess that one application would be camera tracking to implement automatic compositing of 3d models into videos.

It would be nice if we could use it, or (better) if Google exposed the ability in the Maps API, to fit paths to the most likely road, just like is being done in the demo video on the project page.

The roads are not an input to the optimization shown in the examples. How this works is left as an exercise for the reader.

Since it mentions sensor fusion, is this not just* combining the data from the GPS, compass, accelerometers (and any other sensor input from the car, e.g. comparing photos, IR, wifi signals) to smooth out irregularities in any single sensor and make better guesses where they all agree? If I'm right, then the demo video is cheating a bit by initially showing only the GPS input; the rest of the information seems to come from thin air, which might be why people think it's working with the underlying street data. Dead reckoning with a compass and accelerometer and/or wheel odometer/tachograph might also have produced a similarly wonky line, but then you'd have two different wonky lines to average together into a better approximation of the true path.

Note this supposition is based almost entirely on doing the first 3 weeks of the Udacity course CS373: Programming a Robotic Car, but it seems to fit quite neatly with the stuff they talk about i.e. a car/robot moving, guessing how far it's moved, taking sensor snapshots of its environment then cleverly combining the data to figure out accurately where it is.

Off-topic: scrubbing back and forth in the blog video (to get a better idea of what was happening) seemed incredibly smooth for me, is this a side effect of Youtube/Firefox using HTML5 video? (edit: trying it out with Flash, I get a fancier, but less useful pop-up when I try to scrub, a tech limitation of Flash or just a design decision?)

* not meaning to underestimate the hard work and genius that underlies this technology, but the same concept is used in such everyday items as Wii remotes and suchlike.

Is any external geometry point involved?

I mean, when you have a loop (say, around the Manhattan grid), then I can understand how everything works when you have one hard(er) constraint such as, "the two farthest points visited are 2nd street at avenue C and 112th street at 12th avenue" - and get everything else aligned accordingly.

But if no external geometry constraint is involved (roads being the most abundant, but not the only type of external geometry constraint), then this reader has been unsuccessful with the exercise.

Awesome! I just filed a few bugs with patches that let it build out-of-the-box on Ubuntu. There didn't seem to be much in this space outside of levmar ( http://www.ics.forth.gr/~lourakis/levmar/ ); levmar is great for what it does, but some friendly competition is most welcome from my perspective.

It's really surprising how broadly applicable non-linear least squares is.

Trust me when I say that this baby's performance on large problems is to good ol' levmar what a Ferrari is to a tricycle. Or rather, don't; just benchmark them and see for yourself ;-) Keir, Sameer and Co. have done an amazing job here.

Working on integrating fixes now...


Wow. Released, then community-sourced patches offered and integrated, all in ten hours. Sometimes open source makes me feel all warm & fuzzy inside. :)

Thank you for the bugs and the fixes.

> Automatic differentiation

This is extremely cool. The documentation seems to be a WIP on this, can anyone comment on what kind of cost functions are supported by automatic differentiation?

EDIT: it would also be interesting to see how this compares to Theano [1], which also does symbolic differentiation and can JIT to native code and GPU. It is a considerably more generic framework, but I'm not sure how well it can handle large sparse problems.

[1] http://deeplearning.net/software/theano/

You can define arbitrary functions. You are only limited by the operator overloads defined in jet.h. Let us know if you think something is missing.

I guess you mean arbitrary expressions, just using arithmetic operations and the overloaded functions (I see that abs, log, sin, cos, ... are supported).

What happens if I try to use an if-then or a for loop depending on one of the variables?

It's not just expressions; it's any computation you normally would do with doubles. Take a look at the simple bundle adjustment example:


In particular, search for "SnavelyReprojectionError". In this case there are no loops or if statements, but they work fine too. The only issue with branches is that they can make the cost function discontinuous, which can cause problems.

The way this works is that you write your cost functions templated on T. For pure-cost evaluation, this is a double. For cost and jacobian, T is replaced with the "Jet" object, found in jet.h. However, you never see this; Ceres does the substitution for you.

There are some caveats; for example, sqrt(0) doesn't work on autodiff objects since the derivative is undefined. You have to use a Taylor approximation or similar if the argument to sqrt is exactly zero.

If you want to see some more complicated code that runs with autodiff, check out the included rotation.h header. We use that internally for 3D reconstruction type cost functions:


We have the comparison operators overloaded, which allows you to compare jets with other jets and with doubles; these boil down to a comparison with the "value" part of the jet.

So conditionals are not a problem. Loops I would stay away from, since they would destroy the inlining. But if you have an example of a dynamic loop, we would like to hear about it.

From a very brief look, Theano is an expression evaluator. It optimizes expressions; it does not solve numerical optimization problems.

I think it can be used as a backend in Ceres to push computations to the GPU, but as it stands it is a different category of software than Ceres.

It means automatic numerical differentiation, not symbolic.

No, this is not true. Autodiff != numeric diff.

While we do support numeric differentiation, we don't suggest you use it. What we have is automatic differentiation, which is a technique to take exact derivatives without resorting to symbolic differentiation.

Check out the jet.h file, which is the core implementation of this:


The header comment has a nice description of autodiff.

In summary, Ceres supports three ways to compute derivatives:

(1) Automatic differentiation (easiest, fastest, most accurate)

(2) Numeric differentiation (easy, but worse convergence and hazardous)

(3) User-supplied jacobian (use a pen and paper then implement the jacobian for your residual manually)

I forgot an important technique that is somewhere between numeric differentiation and automatic differentiation:

(4) Complex-step differentiation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

The technique is clever: they add a finite difference step along the imaginary axis, and this turns out to give a more accurate derivative than normal forward differences.

We don't support this in Ceres and suggest autodiff instead.

I apologize for replying without actually looking at your posted code first.

What I meant to say was, "If it's all in one C++ project, it's not symbolic differentiation" and I took the liberty of changing "not symbolic" to "numeric."

Having now read your posted code, I kind of feel like a 17th century naturalist trying to classify a platypus. Clearly this technique is not symbolic differentiation, since it can only produce numerical and not symbolic results (although I suppose the Jet template could handle an "expression" type as its first argument - have you actually tried this?), and it's not really numerical in the normal sense because there's no h parameter that changes the accuracy as it varies.

As an aside, "automatic differentiation" is a terrible name, in the sense that it doesn't convey any real information about the technique, except maybe that it's being done by a computer, and I knew that already. It might still be a good marketing term (like how Richard Bellman named his technique "Dynamic Programming" because it's impossible to use "dynamic" in a pejorative sense, although there really isn't anything intrinsically dynamic about it, and in fact you commonly end up solving "static dynamic programming" problems).

Automatic differentiation is "symbolic" in the sense that the resulting derivatives are exact, in the same way that you get exact derivatives if you differentiate by hand and implement the resulting expression. There is no approximation.

Numeric differentiation has a specific meaning - which is computing derivatives via finite differencing.

There are three ways to compute derivatives; for whatever reason, most people only know about symbolic (take the derivative by hand, implement it in C++) and numeric (implement your cost function, do finite differences on each parameter to determine the gradient) differentiation. Automatic differentiation is a totally different way to take derivatives. The Wikipedia article about it is fairly good.

We didn't pick the name.

Interesting; this is the first I've heard of automatic differentiation and it's quite elegant and exact. Do you know why it's not covered more in academia?

I've wondered the same myself. It's rather popular in industry; I hadn't heard of it until coming to Google. Fun fact: backpropagation for neural networks, which is really just a fancy word for computing the derivatives of the cost, is equivalent to reverse-mode autodiff.

For the people wondering what autodiff is, I found the following article a real eye-opener when I came across it (mainly, his paper he links is very accessible: http://homepage.mac.com/sigfpe/paper.pdf).


This is no Instagram, no Facebook; this is technology, not the Hollywoodisation of Silicon Valley.

Neat, what a pity that real hacker news gets under-appreciated!

Thank you for your contribution. It's great to see a giant who's gained so much from open source contribute back in a significant and meaningful way! For those interested, it appears it was open sourced under the New BSD license.

What exactly is the operation being done in the video? My best guess is a noisy set of GPS coordinates / panoramas being repositioned via bundle adjustment on the street view panoramic captures. Or something.

In the video, no vision/bundle adjustment is used, just GPS, IMU and other sensors.

I think I can sort of see what's going on: we're trying to take the GPS position data from a walk around the park or something similar, and fit it to some linear combination of a basis of "street functions" to minimize its two-norm, where the "street function" corresponding to each street, for some point (x, y), is some measure of how far that point is from the street.

For example, in the subproblem of only one-way streets that we can assume to be composed of one or more straight-line segments per street, we take a simple street function $\phi_i(x,y)$, corresponding to street $i$, to be the norm of the projection of $\langle x,y\rangle$ onto that street. Furthermore, we add into the system some "smoothing function" to ensure that the overall shape of the final path is plausible, for example by constraining the distance between successive points. Next, we solve the argmin of the norm equation for each point, so that each point is moved to some linear combination of the streets; then we truncate all but the most significant street basis functions and rerun until we reach some acceptable tolerance.


Johnny Lee has written a short post about Ceres: http://procrastineering.blogspot.se/2012/05/today-sameer-aga... "If I find the time, I might try to post some tutorials on using Ceres. Because I believe this is one of the most powerful tools in modern engineering, and no one ever taught it to me in undergrad or high school. It's like the difference between doing long division by hand and then being handed a calculator."

It'd be great if this is just the start of a trend where Google supports open source mapping tech that would make projects like OSM more viable (for reasons why it's not currently viable, see my other posts).

It's a bit like Google being the biggest supporter of Firefox, even though they are also a competitor (Chrome). Though Google supporting OSM is more like Microsoft supporting OpenOffice, since Google has such a large monopoly in maps right now.

From a cursory look - that's a nicely organized, formatted and documented piece of code. Well done.

In the video they figure out the road inside a tunnel, with no GPS data. How does it do that?

Fusing sensor data (IMU, etc.). You can formulate this as one (extremely) giant sparse least-squares problem.

The video in the link is awesome. So that's how Google figures out roads and stuff!

This is much more than a key piece of Street View. It's an awesome building block for thousands of very diverse AI applications. Great contribution!

Google has often contributed code to the open source world (like this). However they don't AFAIK release any data. Google drove cars around the world photographing everything. But they claim copyright on it all. If they allowed derivative works of it, then Open Street Map could bloom and benefit massively.

Downvotes because I point out how Google can donate data, not just code? ☹


Took long enough but we finally made it!

Good job!
