Hacker News
The new Photosynth (photosynth.net)
445 points by ot on Jan 7, 2014 | 91 comments

If you didn't bother to read the "learn more":

When in the viewer, press C to see the 3D interpretation of individual shots and M for a map of the path taken by the camera.

Controls in 3D mode: right-click and drag to move the camera, scroll to zoom. Arrow keys move the camera location.

Play the "walk" photosynth of the motorcycle in the alley, then press C and move around to see how the 3d model of the reflection in the puddle is a surface far underground where the reflected building appears to be! So cool! http://photosynth.net/preview/about

Another bit of code in this space is libmv: https://github.com/libmv/libmv.

libmv's codebase seems to be forked, with an earlier version on Google Code: http://code.google.com/p/libmv/ which also contains an interesting summary of other libraries in the 3D reconstruction space. Blender also has its own fork, which it uses for matchmoving (the integration of animated objects into a real-world scene).

In turn, libmv seems to be influenced by the work of Marc Pollefeys; his tutorial is a readable summary of how to go from a collection of 2D images to a 3D model.




Question: Can a knowledgeable person here suggest which codebase is the best to start experimenting with, to build an application that converts a 2D photo sequence into a dimensionally accurate 3D model?

Sure! My lab focuses on SLAM and 3D reconstruction, especially for robotic applications. We've developed a BSD-licensed C++ library (with a MATLAB wrapper) with specific applications to 3D reconstruction problems such as SLAM and structure from motion. It's called GTSAM [1].

We actively maintain it and release new features as they are published. While we don't provide a full out-of-the-box pipeline (yet!), there are plenty of examples and documentation which walk you through the math, implementation, and other issues. If you want to read about the graphical models underlying GTSAM, see [2].

Utilizing OpenCV for feature detection and association is pretty much all you need to add in order to recreate Photosynth using GTSAM. I'd also recommend KAZE features, from a former post-doc out of our lab; it's state of the art and recently gained OpenCV wrappers [3]. It's also trivial to integrate other sensors such as IMUs, GPS, lasers, etc. for full navigation problems.

If you wish to know more about the actual subject, I definitely recommend Hartley and Zisserman's Multiple View Geometry book [4].

[1] https://borg.cc.gatech.edu/borg/

[2] http://www.cc.gatech.edu/~dellaert/pub/Dellaert06ijrr.pdf

[3] https://github.com/pablofdezalc/akaze

[4] http://www.robots.ox.ac.uk/~vgg/hzbook/index.html
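For a flavor of the feature-association step mentioned above, here is a sketch in plain NumPy rather than OpenCV, with made-up toy descriptors: brute-force matching with Lowe's ratio test is the classic baseline that KAZE/AKAZE descriptors would feed into.

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    d1: (N, D) descriptors from image 1; d2: (M, D) from image 2.
    Returns (i, j) pairs where d1[i] matched d2[j].
    """
    matches = []
    for i, d in enumerate(d1):
        dists = np.linalg.norm(d2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Keep the match only if it is clearly better than the runner-up;
        # this discards ambiguous matches on repetitive texture.
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy descriptors: d1[0] is close to d2[1] and far from the rest.
d1 = np.array([[1.0, 0.0, 0.0]])
d2 = np.array([[0.0, 1.0, 0.0],
               [0.9, 0.1, 0.0],
               [0.0, 0.0, 1.0]])
pairs = match_descriptors(d1, d2)  # [(0, 1)]
```

In practice you would use OpenCV's matchers on real descriptors; the ratio test is the part that keeps bad associations out of the optimization.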

Do you think it would be possible and/or worth it to reconstruct an automotive road course? This is something I've always wanted to do. There's a ton of video online (though usually shot from inside the car).

The one I spend the most time at is pretty flat, though. I suspect the green hills are hard to get a match on? Here is an example: http://www.youtube.com/watch?v=EyZcERAlBeE

I often go on rides around mountainous areas on my motorbike with a video camera attached to my helmet (some videos here http://www.youtube.com/user/slashtomeu/videos ). Since I carry a GPS as well, I've always wondered if I could reconstruct a 3D model of the landscape I ride around in.

You need to come & talk with openstreetmap people more.

Although I don't have first hand experience with the code, a former labmate of mine (I'm at the Univ. of Washington, and he's now at Google) is one of the leading experts on this area of research, and his VisualSFM [1] tool is, I think, the best and easiest-to-use available online.

Briefly, there are three main steps required to go from images to a 3d viewer like PhotoSynth:

1. Figure out where each image was shot from (the "camera pose") and get a sparse set of 3d points from the scene. These two are estimated simultaneously using bundle adjustment [2].

2. Go from a sparse set of 3d points to a dense 3d model. This is done using a technique called Multiple View Stereo (MVS), of which the leading (open) implementations are PMVS/CMVS [3,4].

3. Build an image-based rendering system that intelligently blends between the 3d models and images to minimize artifacts.

The VisualSFM software will do steps 1 and 2. Step 3 is still quite a challenging problem, but depending on what you're doing, you could use standard 3d modeling environments to look at your data.

[1] http://ccwu.me/vsfm/

[2] http://en.wikipedia.org/wiki/Bundle_adjustment

[3] http://www.di.ens.fr/pmvs/

[4] http://www.di.ens.fr/cmvs/
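To make step 1 concrete: once features in two images are matched and the camera poses are known, each sparse 3D point is recovered by linear triangulation. Below is a minimal NumPy sketch of the textbook DLT method, using toy cameras with identity intrinsics (this is the standard algorithm, not VisualSFM's actual code):

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) image coordinates of the same point in each view.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous point X; stack them and solve A X = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector belonging to the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Toy setup: two cameras one unit apart along x, both looking down +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])

x1 = (0.0, 0.0)    # where the point projects in camera 1
x2 = (-0.2, 0.0)   # where it projects in camera 2
X_est = triangulate_point(P1, P2, x1, x2)  # recovers (0, 0, 5)
```

Bundle adjustment then jointly refines all the poses and points by minimizing reprojection error over many such observations.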

A good place to start is the Photogrammetry forum; there you'll find info about Photosynth, Bundler, PMVS, etc.:


Glad to see they're still working on this. I remember seeing a demo of this some years back and being really impressed.

A nice reminder that Microsoft really does have some great engineering talent and they can break new ground.

> Microsoft really does have some great engineering talent

Not to dispute this--they certainly do.

But sadly for Microsoft, Blaise Agüera y Arcas (one of the creators of the original Photosynth) just left MS for Google in December.


Microsoft is always doing incredible research, just depressing that it's rarely converted into a compelling consumer product.

Well, Ballmer's gone. So maybe they will figure out how to pull together their innovation. It is kind of deflating sometimes to realize that MS was doing something that they simply dropped the ball on. The smart phone is solidly one of those things. They dominated the PDA market prior to the iPhone coming out. All they had to do was put some focus towards it.

They were also ahead of the curve with tablets. Bill Gates demoed a tablet in 2000: http://news.cnet.com/2100-1001-248474.html

The first photosynth iOS app was pretty cool and really well made when it was first released. Hopefully it'll be updated with this new 3d stuff!

I remember it used to seriously limit the resolution of the pictures though. You'd stitch together lots of high-res photos and yet the app would merge them into a low-res output. I wonder if they ever changed that?

If I remember correctly, that only occurred if the panorama was extremely wide or extremely tall in aspect ratio. I think it had a limit for the maximum dimension. But I remember exporting approximately 4:3 panoramas that were quite large.

Also very impressed here. However, it appears to have a huge limitation compared to the previous implementation: you can only move in a linear fashion through the scene. Gone is the ability to jump around in a 3D environment, at least in the first handful of examples I looked at.

Does anyone see an environment in this new version that still allows freedom of movement?

My guess is that they are optimizing for smartphone use cases: i.e., record a quick video instead of stitching together photos from lots of different folks. Using linear video as an input means you have knowledge you didn't have before: successive shots must be taken from a relatively close position and direction.

The technology is clearly deeply related to the more freeform movement variants of the past. It's likely that even better freeform movement can be stitched together from a collection of linear videos than could be from a collection of stills in the past. I wouldn't be surprised if we see that start to happen soon.
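The sequential constraint is also a big computational win: with an unordered photo collection, every pair of images is a candidate for feature matching, while an ordered video only needs nearby frames. A toy sketch with a hypothetical helper (not anyone's shipping code):

```python
from itertools import combinations

def candidate_pairs(n_frames, sequential=True, window=1):
    """Which image pairs to attempt feature matching on.

    Unordered collections must consider all O(n^2) pairs; an ordered
    video only needs each frame against its `window` successors, O(n).
    """
    if sequential:
        return [(i, j)
                for i in range(n_frames)
                for j in range(i + 1, min(i + 1 + window, n_frames))]
    return list(combinations(range(n_frames), 2))

n_seq = len(candidate_pairs(100))                    # 99 pairs
n_all = len(candidate_pairs(100, sequential=False))  # 4950 pairs
```

Since matching is usually the slowest stage of a reconstruction pipeline, that difference is what makes quick phone-shot captures feasible.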

This isn't really about 3D reconstruction (better solutions for that already exist; see http://vimeo.com/61625715). Rather, it seems to be more about stitching consecutive photos together. If you press C in the demo you'll see that it appears to create different 3D geometry for every photo. This approach allows movement in the scene.

They have translation and rotation. What path can you not construct from those?

It's my understanding that you can't use them simultaneously. You have to choose one of the types: spin, panorama, walk, or wall. (http://photosynth.net/preview/about)

In the previous version, random pictures around a scene could be stitched together allowing an experience you could explore. For example, one of the original, popular photosynths allowed you to explore inside an art studio. You could look up at the ceiling, walk on various paths, move close into pictures, etc. In this new version, you're stuck on rails.

tl;dr: In this version each node has two exit points: next or previous picture. In the previous version each node had an unlimited number of exit points to other pictures.

Note you can press "c" to make the camera break free of the rails.

Thanks for the tip! This allows you to move your perspective outside of the rail system, but it doesn't change the fact that you only have images established on the rail system itself.

I've been excited about this project ever since I saw the Ted talk by Blaise Agüera y Arcas. Here are some related projects.

Multiple photos: http://www.123dapp.com/catch

Single photo: http://make3d.cs.cornell.edu/ http://www.3defy.com/ and http://hackaday.com/2013/09/12/3-sweep-turning-2d-images-int...

Video: http://www.3ders.org/articles/20130729-disney-new-image-algo... and http://punchcard.com.au/

That talk was a lot of fun to watch. And the first half of that technology in the talk, Deep Zoom, is now open source: http://openseadragon.github.io/

I'm sure I'm missing something with this particular one, and I'm sure it's got a lot of clever tech behind it, but is the end result actually achieving anything more than a simple video of the same walk?

bluekitten, you have been shadowbanned for 11 days for some reason. You might want to contact info@ycombinator.com to find out why.

Haha, the end kind of freaked me out. This would be fun for making scary photo collages.

It is impressive. But, I have to wonder: what does this get you above-and-beyond taking a short video of an object, and then allowing the view to "scrub" back and forth within the video?

One thing that's illustrated in the demos is that you can zoom into detail in the photosynth images that you couldn't in a video.

I imagine there could eventually be better interactivity with the underlying 3D model than video could provide. Certain surfaces could be links to more information or another photosynth, for example. It kind of reminds me of some of the VRML demos from the 90s, but without the plugins and working backwards from photos instead of forward from models.

Photosynth collages can be created by stitching together a lot of disparate photos. So you can have a 3D, interactive representation of, say, Trafalgar Square, created from photos available on Flickr.

One nice thing is that the interpolation smooths out any bumps, so it's kinda like using a steadicam.

The movement is much smoother than anything I could accomplish with a cam ...

Mount it on a dolly?

Excellent point. It seems to have lost functionality over the previous implementation (see my other post).

A friend wanted to know the nitty gritty details of how this new Photosynth works. I don't work on it, but I saw a talk on it this summer and I've worked on similar projects, so I wrote up all that I know/can speculate.


Cool, I remember it.

Is it possible to release the source code of some of the older projects like the PhotoTour Viewer?


Amazing stuff, and much nicer without the Silverlight requirement!

There are still some strange artefacts remaining, though. For example, on this demo - http://photosynth.net/preview/view/c7287786-a863-4291-a291-d... - watch the bases of the dragons as the camera pans left to right. The first two seem to stitch together fine, but the last two go wrong and bend outwards as if they are moving in the wrong direction. It's strange because other parts of the scene are perfect.

Seems a bit of a step back in possibilities, but with a streamlined UI. With the old Photosynth you could even extract the point clouds from a bunch of photos of a scene you uploaded.



But making meaningful 3d triangulations out of point clouds is a whole other story.

The glitchy charm of the new version pales next to the wonder of seeing an explorable point cloud created out of a pile of photos of Stonehenge.


WTF, Microsoft: "Your password can't be longer than 16 characters."

same on Office 365 (if you don't use SAML)

Geez, Google, you've limited my passwords to 200 characters. What gives? Microsoft allows passphrases with SAML... though at that point (>200) it might be pass-paragraphs.

I know this is a joke, but allowing arbitrarily long passwords allows a DoS attack if your server uses bcrypt or similar (consider uploading a 1 GB password, for example).

Good point. You need to draw the line somewhere. I mentioned the 200-character limit Google uses because I hit it the other day. I wondered about it, but that makes sense. Wouldn't surprise me if they also took networking into consideration.
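A common mitigation (a generic sketch, not what Google or Microsoft actually do) is to pre-hash the submitted password to a fixed-length digest before it reaches the slow KDF, so a 1 GB submission costs one fast SHA-256 pass instead of slow hashing over the whole input:

```python
import hashlib

def prehash_password(password: str) -> str:
    """Collapse an arbitrarily long password to a fixed-size value
    before handing it to a slow KDF such as bcrypt.

    Hex encoding keeps the result ASCII and free of null bytes,
    which some bcrypt implementations mishandle; it also sidesteps
    bcrypt's 72-byte input truncation.
    """
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

short = prehash_password("hunter2")
huge = prehash_password("x" * 1_000_000)  # a 1 MB "password"
# Both are 64 hex characters, so the KDF's work stays bounded.
```

The alternative, as the parent notes, is simply rejecting inputs over some length limit at the edge.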

My company (360 Cities) works with some of the people who are on the Photosynth team, and I have had the pleasure of talking to some of them about all the different iterations of Photosynth over the years, including this new iteration. I'll share some of my own observations (without breaking my NDA ;-)

First off, I have a ton of respect for everyone I've met and spoken with on the Photosynth team. They represent all that is great about Microsoft Research (though Photosynth moved to the Bing Maps department a few years ago).

The first iteration of Photosynth was the one shown by Blaise Aguera y Arcas in what is now one of the most popular TED talks of all time [1]. Basically, it automatically arranged photos in 3d space from where each picture was taken, and allowed the user to "fly" from one photo to the next, giving a real feeling of navigating through 3d space.

The prospect and amazing, working demonstration of taking all the world's photos and mapping them together into a single quasi-3d space was a pretty incredible idea (for which Apple has just had a patent approved - WTF! [2]).

The Photosynth service itself, in my opinion, did not go far enough in combining the content of different users to achieve huge groups of images spanning very large (even city-wide) spaces. (There must have been significant usage/copyright issues which prevented a service like this from aggregating as many photos as that would require.)

On the user side, regular people had some trouble with the UI of Photosynth -- while the technology was obviously impressive, breathtaking at times, navigating this 3d space on a 2d screen is a very difficult thing to design well, and there is a learning curve. This was something which I think prevented more wide-scale adoption. (The other thing which personally turned me off was the silverlight requirement...)

Around the same time, Google built a "look around" feature in Panoramio [3] which was a very similar functionality, but remained fairly obscure, despite being eventually baked into the Panoramio layer on Google Maps/Streetview.

A couple years later, the Photosynth team built an iPhone app for stitching panoramas, and redesigned the Photosynth service to be more centered around 360° photography.

The Photosynth iPhone app was absolutely groundbreaking for its time, blowing away every comparable app in every respect. (The output pano's size limit, as remarked elsewhere here, is small, but this is mainly due to the strict RAM limitations of the iPhone rather than any fault of the app itself.) It has taken 3 or 4 years for anything to catch up to the quality and usability of the Photosynth app (Android Photo Sphere now has that crown).

Now, we are seeing the "New Photosynth" (which Microsoft seems to be calling "Photosynth 2", though it seems to me more like "Photosynth 3"). This New Photosynth, to me, is simply awesome. What is interesting is that it seems to have the same guts as the original Photosynth, but the UI is completely redesigned and built in a very linear way, which obviously addresses the original "weirdness" of the Photosynth 1 UI. This accomplishes a few things: it directs users to make a more consistent type of content (you now have 4 different types of photo sequences you can shoot), and it gives viewers one and only one way to consume that content. It also allows a better kind of "autoplay" functionality, if you want to simply watch the sequence of images without interacting with it.

What I don't like about the content I've personally created so far is that it seems quite glitchy. Even when I shoot something carefully, there are numerous artifacts in the 3D shapes that are created. I am guessing this could be reduced considerably if the full resolution of the images were used for the 3D reconstruction, at the cost of more computation.

All things considered, I really like where Microsoft is headed with Photosynth, and I look forward to seeing where things move.

One hint at what could be to come is that the amazing new Ricoh Theta has Photosynth support [4][5], which hopefully means that there will be some way to join together spherical panoramas into a "synth" at some point in the future, allowing a more freeform navigation within the 3d space.

[1] http://www.ted.com/talks/blaise_aguera_y_arcas_demos_photosy...

[2] http://venturebeat.com/2014/01/07/apple-patent-street-view/

[3] http://blog.panoramio.com/2010/04/new-way-to-look-around.htm...

[4] https://theta360.com/en/info/news/2013-10-07/

[5] http://blogs.msdn.com/b/photosynth/archive/2013/09/20/photos...

Does it still require Silverlight?

No. I set plugins like Flash and Silverlight to "click to play", and the demos worked without my enabling any plugins.

Great, thanks. Didn't want to go to the trouble of creating an account if I couldn't use it :).

Looks like it is WebGL based

What is the business model for this? (just curious)

Microsoft has a budget of around $9 billion per year for Microsoft Research. I'm pretty sure Photosynth falls under that umbrella?

It's a tech demo. I don't think they created a business model.

Here are two (non-new) photosynths I took while in Paris this summer:

The Louvre: http://photosynth.net/view.aspx?cid=3d67aa96-ac60-43ee-9644-...

Underneath Eiffel Tower: http://photosynth.net/view.aspx?cid=f0f50007-42cb-4236-83a9-... (look up!!)

They're very fun to take and the apps they have make it super easy to do. Curious to try these new versions (though they seem sort of more cumbersome..)

Very cool, but I found the interpolation artifacts to be super distracting.

As a VJ I find the artifacts really interesting and plan on using them to generate new footage.

I liked the idea; I think it's a different (neither better nor worse) way to explore imagery. I even found it more intimate in some ways... like discovering some details of the images.

Some of these remind me of what I felt the first time I played Myst.

This is really impressive, and I can see it being useful for a lot of people - the GoPro crowd, and people selling uncommon things on eBay or Etsy come to mind.

In addition to being an impressive demo, the type and range of experiences represented in these is interesting. After some random clicking, I saw a:

  - walkthrough of a wealth manager's office
  - boat cruising around a marina
  - a walk through an exclusive shopping district with an Hermes and Louis Vuitton.
  - a duomo in Florence.

The samples are really, really cool. But it got me thinking: aside from a bit of parallax, what's the practical difference between this and 60fps video with a smooth/inertial seek slider?

EDIT: I guess not having to use a dolly for smooth motion is a huge plus. But the tradeoff, of course, is loss of quality in the interpolated "frames".

I can't wait for the next generation Richard Linklater to make an entire movie using this technology.

I bet it would be an awesome experience to view those with an Oculus Rift. Even moreso in high resolution: http://www.wired.com/gadgetlab/2014/01/oculus-rift

Looks to me like this version of Photosynth doesn't let you look around like old versions did; it seems to follow a fixed path with a fixed camera angle?

I guess the Oculus would improve the 3D aspect of it but you wouldn't be able to look around left and right while traveling.

I don't get what's really cool about this. The viewport angle and camera path are fixed, so what's really better than just a video? Can someone let me know the point of this tech?

I tried pressing `c`, but it showed only some cracked images, doesn't seem really meaningful.

Try and take a sequence of ~20 photos (you can see how many photos were taken by counting the dots after you hit `m`), and stitch those into a smooth video to simulate this effect.

The smooth transitioning between the stills is the tech here.
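For contrast, the naive way to transition between stills is a plain crossfade, which ghosts whenever the camera moves; Photosynth instead warps each photo over reconstructed geometry before blending. A minimal crossfade baseline (toy frames, not Photosynth's renderer):

```python
import numpy as np

def crossfade(frame_a, frame_b, t):
    """Naive linear blend between two frames at position t in [0, 1].

    With any parallax between the shots, this produces double edges
    ("ghosting"); view-dependent warping is what avoids that artifact.
    """
    return (1.0 - t) * frame_a + t * frame_b

dark = np.zeros((2, 2))
light = np.ones((2, 2))
halfway = crossfade(dark, light, 0.5)  # every pixel is 0.5
```

The difference you see in the demos is exactly the gap between this blend and geometry-aware interpolation.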

I registered and verified my e-mail, then opened the page and clicked the create button. I successfully uploaded my synth and am just waiting for it to be processed. Not sure, but it might be a way to bypass waiting for an invitation.

What's the difference between this and a video?

Given that most cameras nowadays record video, this seems rather pointless. The only advantage I can think of is that you can normally record higher-quality images with stills than with video.

Hmm: no way to download the 3D objects created.

Interesting tech, but that's a real pity.

It is WebGL running client-side, so there might be a hack, but the data is probably hard to get at.

Reminds me a lot of QuickTime VR. Anybody old enough to remember it? That was 90s tech, and worked on the Web too. It never caught on though, despite QuickTime being fairly widely deployed.

QuickTime VR was simply a 360 panorama player. (And it also did "object movies", which are just glorified slideshows simulating different views around the circumference of an object.)

I beg to differ that it never caught on ;-) Full disclosure: I've based my career around it ;-)

My college fraternity chapter house is getting demolished and rebuilt this summer. It hit me like a ton of bricks because I have so many college memories there, and I was saddened that I couldn't walk through it in another 20 years. I think I'll have to take a shit ton of pictures so I can build a Photosynth reconstruction of it :)

Chrome keeps giving me the "aw snap" page when trying to view one of them.. :\

The giant shadowy hand on the intro video was an interesting choice.

What a great example of pushing technology further.

I wonder what this technology will allow us to do?

....so it's a really glitchy video? Does it need glasses?

It is built from images. Of course there will be glitches.

My point is: how is it better than simply taking a video, which is also built from images?

It generates a 3D model; its purpose is not just to show you around.

Saw "Sign in with Microsoft" and left.

You left too soon. No sign in required. Your loss!

I think it could do with a little bit of information on the first page to tell you what it actually is. If it were not a highly voted link from HN, I'd most likely not have bothered to figure out what it is.

Also, in Chrome Canary it frequently crashes the tab or gives the "WebGL hit a snag" message, which requires you to click reload before the site works properly again.

Edit: Why has this been downvoted? All you get on the first page is a large photograph and a circle with more photos. Until you click a photo, or Learn More, it is not clear what the site is about...

Also, think I'm missing something because I just get a HTML5 video of a scene.

If you're using Chrome Canary, shouldn't you try regular Chrome first before you complain about bugs?

It's not a complaint, and it might equally be a problem with Chrome Canary itself and have nothing to do with the site.

However, I'd personally find it useful to know if my project was working in pre-release browsers, especially if it is more than your basic web app, to ensure future compatibility before that pre-release version makes it out as a stable version.

You've been downvoted because your criticism is unwarranted. On the front page, there is a large link "Learn more" to a page that very clearly explains what PhotoSynth is. There's also a menu with similar info.

That's not my point though, I'm talking about what you see when you visit for the first time.

If you visit the site for the first time, like I just have, you have no idea what it is. You are just looking at a large image with some additional photos in circles. Just a simple phrase, such as the first line from the Learn More page ("Capture [and view] the places you love in amazing resolution and full 3D."), and perhaps a "Try it, select a scene" near the circles would make it much more obvious.

Or perhaps even better, when you visit for the first time, give them a quick demo or walkthrough.

n.b. if you view in Firefox, it might not load correctly. I got a mostly blank page, but when I viewed it in Chrome, the page loaded fully.
