Introducing ccv, a modern computer vision library (libccv.org)
301 points by liuliu on June 30, 2012 | 54 comments

To build on Ubuntu 12.04, I had to do:

    sudo apt-get install clang libjpeg-dev libpng-dev libfftw3-dev libgsl0-dev libblas-dev liblinear-dev
    cd bin/
Then add -lgslcblas to lib/.LN and run make again.

At a minimum, it should be able to build without any dependencies (drop the code into your Xcode project, and it will run). But to use the command line tools in ./bin, you do need libjpeg and libpng.

libgsl and liblinear are required for all the training programs (bbfcreate, dpmcreate). libfftw3 can help with dpmdetect performance.

Nice work, from what I can tell! How does this compare to OpenCV? Are there Python bindings in the works yet? Congrats!

Hmm. What's the main motivation to use libccv as opposed to OpenCV?

Personally, I am a little frustrated by OpenCV's move to a C++ interface. Yes, it looks like a high-level language by mimicking Matlab style. But at the end of the day, the implementation uses many advanced C++ features and ignores the one thing that C++ is really good at: abstracting objects and object interactions.

A computer vision task is very different from, for example, a game engine. From an engineering perspective, computer vision tasks are very functional. You throw an image at it, get some semantic information out of it, and done. The only objects there are images, and they are boring (Really? You need an inheritance relationship between different types of images? Tell me what the behavioral difference is!). Thus, object-oriented style is not the best abstraction for this kind of task at all. However, C++ is all about object-oriented programming! (And it is really useful for game engines, where you actually have a lot of different objects interacting with each other.)

(Then here is the question: how do you support different types of images? From the people I have talked with, it seems to me the general consensus is to use generated code whether you use C or C++. Generated code can be type safe and still very fast. Effectively, this is what templates do for OpenCV and macros do for ccv, although neither approach is the best.)

Take the end result: it is easy to use OpenCV's new interface (if you are familiar with Matlab), but hard to hack on. OTOH, it does provide quite a lot of algorithms for you to experiment with, but then, if you need to write something new (the end goal of experimentation after all), it is really not an easy undertaking.

ccv takes a different path. Don't experiment with it, use it in your application, trust the underlying implementation is the best-of-its-kind, and you will be happy.

> However, C++ is all about object-oriented programming!

That really depends on how you use the language. Looking at the code I normally see there's almost no inheritance, but lots of static polymorphism through templates. Data tends to be stored in structs, tuples and the default data structures (vectors, mostly), and a lot of use is made of the new lambdas and the functional bits of the STL.

I know that you can write Java in C++ (and that many people do), but it's not really the language's raison d'être.

This matches my experience with opencv. Great job liuliu! For practical tasks your libccv looks really useful.

Hmm, sounds to me like one of those "Flask vs. Django" arguments - Flask has less features, hence it's more flexible and better.

One of the creators of Django once told me, "Flask is what Django would have been if we designed it in 2010 instead of 2006." Sometimes smaller is actually better.

There's a difference between software having features and having opinions. For example, Pylons (http://www.pylonsproject.org/) ships with a large feature set but doesn't feel like you're forced to use any of them, whereas Django ships with a larger feature set and doesn't give you much choice.

Naturally, there are advantages and disadvantages to both approaches.

I think, judging from his commit comments, the focus was on applications rather than completely covering all possible algorithms without an eye towards ease of usage and implementation.

also check out http://simplecv.org

Nice work, but you need to add a license for the library, if you want people to start using and improving your library.

+1 more software should say what the copyright is.

On their GitHub page, COPYING has a licence ( https://github.com/liuliu/ccv/blob/unstable/COPYING ). Looks BSD-like.

It looks like they link to the GNU Scientific Library, which is GPL, so they need to have a GPL-compatible licence.

Yes, it is the BSD 3-clause license. Added a link on the front page. As for GSL, I am not a guru on licensing; AFAIK, dynamic linking is a gray area, and you can compile ccv without the GSL dependency.

What about the SIFT and SWT patents?

A bit off-topic: a prof. at the Uni once told me that David Lowe (who invented SIFT) never got any money from the patent (which I think belongs jointly to him and UBC) because commercial implementations would make a few changes to the algo and just use it, without citation or license. If any HN'ers know more about this, I would be glad to hear it

While it does seem to be regarded to be easy to avoid infringing on the SIFT patent by changing the algorithm in a minor way (e.g. using SURF features but doing SIFT-style search), I know there are companies that have licensed SIFT from Lowe, and I would expect he's been paid for those licenses.

IP protects usage (real-world applications, ideally with assessed damages) rather than the implementation. The analogy is: the bulb is patented, but the patent shouldn't cover the bulb factory as well. You can pay for the damage, or shut down the factory, but no one can force the factory to be dismantled.

You can use patented algorithms if you have a proper license, but a patent shouldn't forbid independent implementations written from the ground up. (However, you can make the argument that the implementation is dedicated to violating the patent rights and thus should be held responsible as well.)

As always, software patent is a bch. Without a court trial, no one knows the exact answer.

Check Megawave for another great C library with excellent and quality algos!


the library design is a bit dated but the code contributions are outstanding!

Also, I think the way forward is something like scikit.image (python library) with a way to add C-coded algorithms

Doesn't scikit already have an obvious way to add C code? It's Python. There's Cython/raw CPython API if you don't care about compatibility with alternative implementations, and ctypes (shudder) if you do.
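For anyone who hasn't tried the ctypes route, here is a minimal sketch of what it looks like. It calls `strlen` from the C standard library rather than anything from scikit or ccv, assumes a POSIX system where the C library can be located, and the `c_strlen` wrapper name is made up for illustration.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If it cannot be found by name, fall back to
# the symbols already loaded into this process (assumption: POSIX system).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path) if libc_path else ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper: call C strlen() on a Python bytes object."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"libccv"))
```

The part people tend to dislike is that the signature has to be declared by hand (`argtypes`/`restype`), and nothing checks it against the real C header.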

Way off-topic, but are you saying that ctypes is less pleasant than the CPython API? My experience with the latter wasn't all that pleasant...

The radix tree caching mechanism is interesting, and seems very general. I'm new to techniques of randomized algorithm analysis, so I can't whip up a perspective of my own yet.

Does anyone else recognize this? Has it been rigorously treated? Seems like a similar approach to more applications in general using bagwell tries would be useful for both scala and clojure.

Check also http://www.vlfeat.org/.

It is quite used in the research community, with new algorithms coming soon (check this: http://eccv2012.unifi.it/program/tutorials/modern-features-a...).

I admire their work, http://libccv.org/post/call-for-a-new-lightweight-c-based-co...

Their documentation is very nice, and back when I was implementing linear MSER for OpenCV, I learned a lot from their semilinear MSER implementation.

But I never understood why they want to mimic an object-oriented interface in their implementation. In particular, their abstraction strikes me as "weird". Each algorithm implementation is an "object" to encapsulate some intermediate data, or to retain some parameter settings? That is a far stretch from what these computer vision algorithms do.

In my mind, computer vision algorithms are functions, which take an image or image sequence in and emit semantic information out. Reusable intermediate data should be an implementation detail; it shouldn't overshadow what the interface looks like.
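To make the "image in, semantic information out" idea concrete, here is a small sketch in Python (not ccv's actual API; every name in it is hypothetical). The caller only sees one function, while the reusable intermediate data, a summed-area table, is cached behind it.

```python
from functools import lru_cache
from typing import List, Tuple

# Hypothetical image type: a tuple of rows of grayscale pixel values.
Image = Tuple[Tuple[int, ...], ...]


@lru_cache(maxsize=32)
def _integral_image(image: Image) -> Tuple[Tuple[int, ...], ...]:
    """Summed-area table with one extra row and column of zeros.

    This is the reusable intermediate data; it is cached here and never
    exposed through the public interface.
    """
    height, width = len(image), len(image[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return tuple(tuple(row) for row in table)


def detect_bright_regions(image: Image, size: int, threshold: int) -> List[Tuple[int, int]]:
    """Return (x, y) top-left corners of size-by-size windows whose mean
    pixel value is at least `threshold`."""
    table = _integral_image(image)
    height, width = len(image), len(image[0])
    hits = []
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            window_sum = (table[y + size][x + size] - table[y][x + size]
                          - table[y + size][x] + table[y][x])
            if window_sum >= threshold * size * size:
                hits.append((x, y))
    return hits


if __name__ == "__main__":
    img = ((0, 0, 0), (0, 9, 9), (0, 9, 9))
    print(detect_bright_regions(img, 2, 9))
```

Calling the function twice on the same image with different thresholds reuses the cached table, but nothing about the cache leaks into the signature.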

Sorry if I am acting too opinionated on this topic, I am too passionate about good interface design.

In vlfeat there is a strong focus on the algorithms, since it is mainly developed by a researcher for his own research. This is the reason for the "weird" object-oriented interface.

Keeping the "status" of the algorithm in an object gives the following advantages:

- The interface encourages cleaner code: computer vision algorithms often come with many parameters. One has to use one's own data structure, or many variables or constants, to keep them in memory, and then call a function with a very long signature. This produces not-so-clean code, and the long signature can lead to mistakes in the parameter values. Using an object with get/set functions makes it easier to define the parameters in an iterative manner and call the function in a clean and less error-prone way.

- Many computer vision algorithms are iterative. For this kind of algorithm, having an object representing the status of the algorithm allows one to stop/restart/continue it easily. The ability to check the convergence of certain algorithms is vital in computer vision (SVMs, for example).
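As a rough illustration of both points (this is not vlfeat's API; the class and method names are invented), here is a sketch of a stateful solver in Python. It uses a simple iterative intensity threshold as a stand-in algorithm: parameters are set through a setter, and the caller drives the iteration and checks convergence itself.

```python
from typing import Iterable


class IterativeThreshold:
    """Hypothetical stateful solver: repeatedly moves a threshold to the
    midpoint between the mean of the darker and the brighter pixels."""

    def __init__(self, pixels: Iterable[float]) -> None:
        self._pixels = list(pixels)
        self._tolerance = 0.5
        self._threshold = sum(self._pixels) / len(self._pixels)
        self._converged = False
        self.iterations = 0

    def set_tolerance(self, tolerance: float) -> None:
        self._tolerance = tolerance

    def get_threshold(self) -> float:
        return self._threshold

    def is_converged(self) -> bool:
        return self._converged

    def step(self) -> None:
        """Run one iteration; the caller decides when to stop or resume."""
        low = [p for p in self._pixels if p <= self._threshold]
        high = [p for p in self._pixels if p > self._threshold]
        if not low or not high:
            # Only one class left, so the threshold cannot move any further.
            self._converged = True
            return
        new_threshold = (sum(low) / len(low) + sum(high) / len(high)) / 2
        self._converged = abs(new_threshold - self._threshold) < self._tolerance
        self._threshold = new_threshold
        self.iterations += 1


if __name__ == "__main__":
    solver = IterativeThreshold([10, 12, 11, 200, 210, 205])
    solver.set_tolerance(0.1)
    while not solver.is_converged():
        solver.step()
    print(solver.get_threshold(), solver.iterations)
```

Because the state lives in the object, the loop can be paused after any `step()` and resumed later, which is the stop/restart/continue property described above.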

You are right when you say that implementation details shouldn't interfere with the interfaces. But vlfeat doesn't have a general "extractFeatures" function; it implements different famous algorithms. So it makes sense to assume that someone who uses SIFT has a basic knowledge of how it works and prefers to exploit it to the fullest (in the documentation you can find basic information on any algorithm anyway).

So I think the two libraries are different: libccv focuses on developers who want to use computer vision algorithms without deep knowledge of them, while vlfeat requires the developer to have proper computer vision knowledge but gives them more power.

Nice job liuliu. Being new to CV, I have what is probably a stupid question: I see that you do text detection, but what then? Does your library have the ability to do OCR on the text? And if not, what is your recommended way to do OCR on the text?

No, ccv doesn't do OCR. Tesseract: https://code.google.com/p/tesseract-ocr/

Thanks! That looks like just the ticket. Do you have any tips for efficiently combining your library with Tesseract?

OpenCV is intended to be useful for computer vision researchers to test new functionality easily and also for developers to put together CV based functionality easily. I think the new C++ interface is pretty neat. Although I do agree that it is becoming humongous, that is a sign of maturity rather than neglect. As a computer vision developer, I believe that choice of algorithms, and ability to use low level functionality is key to developing robust computer vision applications. Mostly because there is no one size fits all. Probably you are aiming for fewer use cases than what is possible with OpenCV. In that case, it makes sense.

Just thought you might like to know that there is another project with the same name. (http://ccv.nuigroup.com/). Although they seem to stylize it with all caps.

Cool though, looks good.

I just started reading the OpenCV book this week. Eerily apropos.

Suggestion for liuliu: list supported platforms in your Readme - it may save you some email. And it seems you're targeting Macs from all the Xcode comments.

Could this be adapted to run on arduino? Or does it require too many libs?

I suspect it requires too much everything (cpu, ram, io speed), by several orders of magnitude, to run on an arduino.

If you're seriously asking, and this isn't some kind of internet meme/joke I'm not familiar with, you might find it interesting and instructive to have a look at the datasheet of the AtMega328[1], the chip an Arduino Uno uses, and compare it to, say, what you might find in a typical intel desktop machine.

Just as one crude scalar comparison to illustrate the sort of differences, the ATmega328 doesn't have the ability to do floating point operations in hardware, so you have to emulate them in software using the core's fixed point hardware (the toolchain will do this for you). This means potentially hundreds of CPU cycles per floating point operation. In the past I have managed to get an ATmega328, at 16 MHz, to perform about 60,000 floating point multiplications per second. According to Wikipedia[2], a 2010 Intel Core i7 CPU can do about 109 billion floating point operations per second. That's nearly 2 million times the speed!
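The arithmetic behind those figures, as a quick Python sketch. The three input numbers are the ones quoted in the comment; the function names are just for illustration.

```python
# Figures quoted in the comment above.
ATMEGA_CLOCK_HZ = 16_000_000          # 16 MHz clock
ATMEGA_FLOAT_OPS_PER_SEC = 60_000     # measured software float multiplications
CORE_I7_FLOPS = 109e9                 # about 109 billion operations per second


def cycles_per_float_op(clock_hz: float, ops_per_sec: float) -> float:
    """Average clock cycles spent on one emulated floating point operation."""
    return clock_hz / ops_per_sec


def speed_ratio(fast_ops_per_sec: float, slow_ops_per_sec: float) -> float:
    """How many times faster the first machine is than the second."""
    return fast_ops_per_sec / slow_ops_per_sec


if __name__ == "__main__":
    print(f"{cycles_per_float_op(ATMEGA_CLOCK_HZ, ATMEGA_FLOAT_OPS_PER_SEC):.0f} cycles per operation")
    print(f"{speed_ratio(CORE_I7_FLOPS, ATMEGA_FLOAT_OPS_PER_SEC):,.0f}x faster")
```

So roughly 267 cycles per multiplication on the microcontroller, and a ratio of about 1.8 million.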

If you want something of Arduino size and Arduino price that will let you play with machine vision, the Raspberry Pi[3] might be far more useful to you.

Finally, I'm not saying you can't do CV on an 8-bit embedded micro, I have done some in the past myself, but it's a very different and far more limited beast to what most people would understand as modern CV, and it required in my case a great deal of hand optimisation and gory bit-hacks.

[1] http://www.atmel.com/devices/atmega328.aspx [2] http://en.wikipedia.org/wiki/FLOPS [3] http://www.raspberrypi.org/

Thanks for your response, and no it was not a troll but a serious (albeit naive) question. Thank you for answering my question with good references and very respectfully. I actually do know a fair amount about Arduino and I should have put it together that the 328 wouldn't be anywhere near fast enough. I'm sure it would be pretty easy to run it on a Raspberry Pi and I have been clamoring to get my hands on a RasPi but am still in the preorder queue :/

But Arduino also has the upcoming Arduino Due which is based on a 32bit ARM Cortex M3, which still may be powerful enough. I guess I would revise my question to be would it be possible to get this (or something similar) running in the Arduino Environment, regardless of speed (without a full OS stack)?

looking at the specs for it, there might still be an issue of ram, since the ATSAM3X8E has only 100kb of ram. if you limit what you do you can likely get some form of it working but to be honest i'm not sure how limited that would be at the moment. You'd certainly have to lower the resolution to something like 160x120, since even 320x240 would take up about 225kb of ram in raw rgb (3 bytes per pixel). so limited resolution, possibly even doing it in yuv with 4:2:0 like it's a low quality jpeg would probably be possible, but you would end up having some things more difficult to work with e.g. find the red ball. speed wise at 84-85mhz it ought to be good enough to do the processing at a few frames per second which for lots of things would likely be good enough.
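a quick python sketch of the frame buffer arithmetic, for anyone who wants to try other resolutions. assumptions on my part: raw rgb is 3 bytes per pixel, yuv 4:2:0 averages 1.5 bytes per pixel, 1kb is 1024 bytes, and the whole 100kb would be available for one frame (it won't be). the function and table names are made up.

```python
# Bytes per pixel as (numerator, denominator) so the math stays in integers.
BYTES_PER_PIXEL = {
    "rgb888": (3, 1),   # 8 bits each for red, green, blue
    "yuv420": (3, 2),   # full-resolution luma, chroma subsampled 2x2
}

RAM_BYTES = 100 * 1024  # the 100kb figure quoted above


def frame_bytes(width: int, height: int, pixel_format: str) -> int:
    """Size in bytes of one uncompressed frame."""
    numerator, denominator = BYTES_PER_PIXEL[pixel_format]
    return width * height * numerator // denominator


def fits_in_ram(width: int, height: int, pixel_format: str, ram_bytes: int = RAM_BYTES) -> bool:
    """True if a single frame fits, ignoring everything else that needs RAM."""
    return frame_bytes(width, height, pixel_format) <= ram_bytes


if __name__ == "__main__":
    for width, height in [(320, 240), (160, 120)]:
        for pixel_format in BYTES_PER_PIXEL:
            size = frame_bytes(width, height, pixel_format)
            verdict = "fits" if fits_in_ram(width, height, pixel_format) else "too big"
            print(f"{width}x{height} {pixel_format}: {size / 1024:.1f}kb ({verdict})")
```

under those assumptions only the 160x120 frames fit, and even 320x240 in yuv 4:2:0 is slightly over.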

all of this is just speculation based off of what i found googling the chip on it, and then the libccv might not even fit on the 512kb flash it has for the program you put on it.

my thought would be, bite the bullet and go with a raspberry pi or something like that for doing image processing, faster cpu, more ram, and you could hook it up to something like the arduino due, to do other logic, and have the due ask it questions like, "where is the ball" etc.

source: http://www.linkedin.com/groups/Arduino-Due-featuring-Atmel-A...

The Raspberry Pi has me interested, as it does seem to have significantly more power for this type of thing (in a limited sense)

Or for a bit more power, check out the APC for $49: http://www.geek.com/articles/chips/via-launch-a-49-android-p...

"The APC’s spec includes a VIA WonderMedia ARM 11 processor at its core, 512MB DDR3 RAM, 2GB of on-board flash storage, 4 USB 2.0 ports, a microSD slot, Ethernet port, and both VGA and HDMI display ports. As for power consumption, it tops out at 13 watts under load and 4 watts when idle."

I'm familiar with the apc but would ccv run on Android?

Should be possible; not sure how hard it will be to compile for ARM

Are there any video demonstrations? I found nothing good on youtube.

Any thoughts about putting this on a smartphone?

It compiles with your iOS Xcode project (drop the source code in and it will compile).

I need to sit down and read the docs better, but is there any easy(ish) way for me to use this with Ruby/Rails?

I would also be greatly interested in this, as well as an explicit license to see how viable this is as an option.

I probably missed it but how do you train more faces with your library?

It is face detection (outline faces from images), not face recognition (tell whom the face belongs to). To train your own face detector, follow instructions here:


It's nice to see some competition with Open CV


Yes. But I believe it's licensed for non-commercial use, from what I can tell from http://www.cs.ubc.ca/~lowe/keypoints/

Nice job!
