Augmented reality with Python and OpenCV (bitesofcode.wordpress.com)
291 points by galloafro on Nov 14, 2017 | 22 comments

here's a secret you can be in on too if you have access to ida/hexrays: every single AR sdk (vuforia/kudan/wikitude/easyar/arkit/etc.) is 90% opencv code, 5% novelty, and 5% glue.

Hmmm... makes sense. As someone interested in using AR for creative/art uses, I wonder if learning OpenCV would be a good route. It’s tempting as I already use Python for a lot of work. But, I am not predominantly a programmer. Would this impair me when it comes to distribution? Or cause other complications? Maybe I’ll just have to find out the hard way.

image recognition/homography sdks aren't real ar (of the sort of arkit and arcore) and arkit is a free unity plugin. the only thing you would use vuforia et al. for is image recognition. if you do need that then my recommendation is to use vuforia but just the trial.

Thanks for the response. I prefer Unreal engine, and evidently it is also offered as a plugin for Unreal. I only prefer Unreal because my experience with Unity involved some terrible black boxes with lightmapping and unpredictable material and texture systems.

i would love for there to be an officially supported vuforia plugin for unreal but unfortunately there is not.

Is there a particularly good book that you could recommend on learning OpenCV? I have been wanting to try this for a while now, but have unfortunately never found the time.

The book Mastering OpenCV helped me a bunch several years ago during implementation. The best advice I can give along with this book is to try and understand the fundamental geometric problem that computer vision tries to solve. It's been the same for like 30-50 years...we've just thrown some ML at it for these past 10-20. For that, I recommend a copy of Trucco and Verri's book that you can probably find floating around online (link below). Good luck!



Awesome! Thanks for your help!

The "official" O'Reilly books are pretty good: Learning OpenCV by Kaehler and Bradski (I say official because Gary Bradski was one of the original maintainers). Make sure you get the new one for OpenCV 3. They cover a lot of the algorithms behind the code, which is nice.

There's also https://www.pyimagesearch.com/ which is great. It's ultimately a hook to get you to buy the course, and I think the teaching style may put off some people, but everything is thoroughly explained and there's a lot of cool mini projects.

I've used pyimagesearch.com a few times for various things - it's not perfect, but the example code is pretty useful.

The OpenCV examples look a lot less smooth and well registered than the ARKit demos I've seen. Do you put that down to 'novelty' or 'glue'? (Or 'advertising'? :P )

Or maybe it is because it isn't actually mainly OpenCV code, and the top commenter didn't actually look at it, but nobody checked and everyone just upvoted.

One thing that ARKit is able to do is utilize information about the device motion from the accelerometer and gyroscope. Also, I believe it is a slightly different technique than what is described in this post. I think they do more of a 3D reconstruction (i.e. structure from motion), and build up a sparse model of the environment, kind of like SLAM. That can perform better because it uses the knowledge that the environment is relatively constant between frames. This post, and most examples you see, essentially start over from scratch each frame, throwing away all the information from the last tracking frame.

Of course, many of the algorithms you need to do that are part of OpenCV, so maybe it is still 95% OpenCV!

Probably a combination of not using something as simple as RANSAC, and passing the output through a Kalman filter to smooth it out. (so: sufficiently advanced glue until proven otherwise)
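For readers who haven't met RANSAC, here is a minimal pure-Python sketch of the idea. It is a simplified stand-in, not what any AR SDK or the linked post does: it fits a 2D translation instead of a full homography, and the function name `ransac_translation`, its parameters, and the synthetic matches are all made up for illustration. For a real homography, OpenCV's `cv2.findHomography` accepts a `cv2.RANSAC` method flag.

```python
import random


def ransac_translation(matches, threshold=3.0, iterations=100, seed=0):
    """Estimate a 2D translation (dx, dy) from noisy point matches with RANSAC.

    matches: list of ((x1, y1), (x2, y2)) pairs, some of which may be wrong.
    Returns the refined translation and the list of inlier matches.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: a single match fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Consensus set: matches whose error under this hypothesis is small.
        inliers = []
        for (ax, ay), (bx, by) in matches:
            error = ((ax + dx - bx) ** 2 + (ay + dy - by) ** 2) ** 0.5
            if error <= threshold:
                inliers.append(((ax, ay), (bx, by)))
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the winning consensus set by averaging its displacements.
    n = len(best_inliers)
    dx = sum(b[0] - a[0] for a, b in best_inliers) / n
    dy = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (dx, dy), best_inliers


# Synthetic data (hypothetical): 20 good matches shifted by roughly (10, -5)
# with small jitter, plus 8 bad matches with unrelated displacements.
matches = []
for i in range(20):
    p = (i * 7 % 50, i * 11 % 40)
    jitter = ((i % 3 - 1) * 0.4, (i % 2) * 0.5 - 0.25)
    matches.append((p, (p[0] + 10 + jitter[0], p[1] - 5 + jitter[1])))
for i in range(8):
    p = (i * 5, i * 3)
    matches.append((p, (p[0] - 40 + 13 * i, p[1] + 60 - 7 * i)))

shift, inliers = ransac_translation(matches)
print("estimated shift:", shift, "inliers:", len(inliers))
```

The structure stays the same for a homography: only the minimal sample (four matches instead of one) and the model-fitting step change.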

That’s certainly true.

The next thing I will do if I keep working on it will be to implement a Kalman filter to reduce shakiness. It could improve the results a lot visually, since coherence of the projection between frames would make it much more attractive.

And I think this is easier to achieve with a Kalman filter than by using a more precise estimation method.
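As a rough illustration of that smoothing idea (not the post's code), here is a one-dimensional Kalman filter with a constant-position model in plain Python. The class name `ScalarKalman`, the variance values, and the jittery test signal are assumptions chosen for the example; in practice one such filter could be run per coordinate of each projected corner, and a constant-velocity model would probably track a moving camera better.

```python
class ScalarKalman:
    """1D Kalman filter with a constant-position (random walk) model."""

    def __init__(self, process_var=1e-2, measurement_var=4.0):
        self.q = process_var      # assumed drift of the true value per frame
        self.r = measurement_var  # assumed noise of each per-frame estimate
        self.x = None             # current state estimate
        self.p = measurement_var  # variance of the current estimate

    def update(self, z):
        """Fold in one measurement and return the smoothed value."""
        if self.x is None:
            # Initialise the state from the first measurement.
            self.x = float(z)
            return self.x
        # Predict: the state stays put, but uncertainty grows by q.
        p_pred = self.p + self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = p_pred / (p_pred + self.r)
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return self.x


# Hypothetical jittery x-coordinate of one projected corner:
# the true value is 100 px and the per-frame estimate flips by +/- 3 px.
measurements = [100 + (3 if i % 2 else -3) for i in range(50)]

kf = ScalarKalman()
smoothed = [kf.update(z) for z in measurements]

raw_jitter = max(abs(m - 100) for m in measurements[-10:])
smooth_jitter = max(abs(s - 100) for s in smoothed[-10:])
print("raw jitter:", raw_jitter, "smoothed jitter:", smooth_jitter)
```

The trade-off sits in the two variances: a larger `measurement_var` relative to `process_var` gives a steadier overlay but makes it lag behind real motion.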

Yeah, that alone might do it. Only one way to find out :-)

>5% novelty

Oh man this is so true. Here's what you can do with drones by using opencv and the opencv_contrib extra modules along with a usb radio transmitter and web camera for tracking position with a bit of matrix math. OpenCV is an amazing library.


Pick a task that needs to use images and you'll find that OpenCV is probably what you want to use.

I looked at OpenCV and while OpenCV itself has a pretty decent license, its dependencies would give your lawyer PTSD if you were working on a commercial product. I’m not sure, therefore, how ARKit in particular could include it.

It's scary how much of this could be done 10-15 years ago with exactly the same tools. They really haven't changed much. The difference now is that everyone has a mobile computing device in their hands, with the power to run this stuff.

The transition in eye tracking may be especially dramatic. From 1500$ specialty item to 1$ MEMS at Mega-unit scale in perhaps one year. Until scale, there's no reason to price low, blocking adoption.

I can wave my hand as a 3D mouse above my laptop keyboard. The hardware is years old. The hand-tracking software was created recently to chase VR gaming dollars. I was talking to someone not-quite legally blind. VR, what's in it for them? Well, a lot of input tech and haptics possibilities are getting unstuck.

> The difference now is that

It's not just market prerequisites, but also perceived market validation. One might make an HMD with 4x-linear greater resolution than current, with existing(?) panels. But "who would buy it?" The game market is somewhat established, but not yet a fit. The professional/commercial market is less established, but a candidate. So maybe next year, priced at 5000$ to 10000$. And if it turns out there's a market for screen substitutes, perhaps a 10x price decrease the year after.

China is much better than the US at doing "run in all directions" product space exploration, but I've found it surprising just how slowly and poorly even that's working around AR/VR.

The same can be said for machine learning in general, we just got really fast at matrix multiplication.
