
Ccv open sources near state-of-the-art image classifier under Creative Commons - liuliu
http://libccv.org/post/ccv-0.6-open-sources-near-state-of-the-art-image-classifier-under-creative-commons/
======
jeresig
This release is absolutely phenomenal. As liuliu mentioned in the release
notes, most of the pre-existing implementations are either not that good,
closed source, or built on closed datasets. This is, seemingly, the first
_good_ open source deep learning classifier out there! There is a whole
cottage industry of commercial deep learning image classifier services
springing up (like [http://www.ersatzlabs.com/](http://www.ersatzlabs.com/))
but this will surely make this technology even more accessible. I can't wait
to use it in some of my side projects!

~~~
kastnerkyle
I think this is a great thing to put out into the community, and I hope it
gains widespread adoption.

I do have a few things to comment on here:

The architectures themselves have never been closed source (hence
publication). Frankly, I think Alex didn't release his implementation because,
until TITAN came along, you needed two GTX 580 GPUs to do distributed
training. That is unique (and difficult) to set up and support! Just "putting
the code out there" is not useful to many researchers, and the architecture is
already in the paper to implement, extend, or modify. You will be bombarded
with support requests, even if you say "this is provided as is". Many
researchers don't want to deal with that, IMO.

Data is EVERYTHING for training these architectures, and the ImageNet dataset
itself is still semi-closed source, and hard to get. You have to sign up to
get their license coverage if you want to download the original images for
research. I _think_ you can get weblinks to the original data, which you could
then ostensibly crawl yourself, but that gets murky from a copyright
perspective.

It will also take a very, very long time to wget all 1.2TB of ImageNet. My
download took about 45 days, so keep that in mind!
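As a quick back-of-envelope check (the 1.2TB size and 45-day duration are the
approximate figures above, so treat this as an order-of-magnitude estimate):

```python
# Rough sustained bandwidth implied by fetching ~1.2 TB in ~45 days.
# Both figures are approximate, so this is only an order-of-magnitude check.
dataset_bytes = 1.2e12              # ~1.2 TB of ImageNet images
seconds = 45 * 24 * 60 * 60         # ~45 days of wall-clock time

bytes_per_sec = dataset_bytes / seconds
print(f"~{bytes_per_sec / 1e6:.2f} MB/s sustained")
```

That works out to roughly 0.3 MB/s (about 2.5 Mbit/s) sustained for the entire
45 days, which is why the download dominates the wall-clock time.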

Most people just do not have the hardware to train a network of this
magnitude. CIFAR-10 is a much more appropriate dataset for consumer-grade
hardware/non-lab-funded work IMO. I think this is one of the reasons most
groups advancing this front have simply released the trained network
parameters as a "black-box" preprocessor - it is much more useful for general
tasks on normal laptops and PCs! See
[http://fastml.com/yesterday-a-kaggler-today-a-kaggle-master-a-wrap-up-of-the-cats-and-dogs-competition/](http://fastml.com/yesterday-a-kaggler-today-a-kaggle-master-a-wrap-up-of-the-cats-and-dogs-competition/)

I personally used a simple wrapper around DeCAF plus a sklearn logistic
regression to get a pretty good score (96%) in the Kaggle Cats vs. Dogs
competition on a Lenovo T400, and I think this is where the commercial
usefulness will go as well. The training + data to get the "magic numbers" is
probably going to be the KFC/Coke trade-secret formula of data science in
coming years. Plowing
through a bunch of floats in a particular format, then applying simple
classifiers is good enough for many standard computer vision tasks, and could
possibly be fit into a very small formfactor. Think FPGAs...
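A minimal sketch of that kind of pipeline, with random vectors standing in for
the DeCAF features (the actual feature extraction is omitted; the 4096-dim
feature size and the synthetic labels are illustrative assumptions, not the
real competition data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for precomputed deep features: 2000 images x 4096-dim vectors.
# In the real pipeline these come from a forward pass through the
# pre-trained net; here they are random, with synthetic linear labels.
X = rng.normal(size=(2000, 4096))
w = rng.normal(size=4096)
y = (X @ w > 0).astype(int)          # pretend 0 = cat, 1 = dog

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Only this cheap linear classifier on top needs training per task.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

In the real pipeline the feature matrix would come from running each image
through the pre-trained network once; everything after that runs comfortably
on a laptop.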

------
yuvipanda
Just the samples were moved to CC, which makes more sense. The code is still
3-clause BSD, and always has been.

------
plicense
The code is in C and highly optimized, so it would be really awesome if they
could comment it better.

~~~
forgottenpass
sorry for the accidental downvote, does replying reverse that vote? I think I
saw someone post that it does, but HN can be smoke and mirrors sometimes

~~~
nitrogen
No, replying does not explicitly reverse a downvote, but pointing out your
mistake sometimes causes others to fix it for you by upvoting.

------
kastnerkyle
What is the advantage of using this instead of OverFeat for natural images? I
have been operating on the assumption that the strange license of OverFeat is
not a problem, since you are using someone's software to feed in input, and
generate a bunch of floating point values.

Are the floating point outputs of a program then "tainted" by the license of
the code as well? If not, why not use the state of the art, which handles both
localization AND classification?

I can see using ccv for custom datasets, though I am also assuming the 6GB
TITAN requirement for GPU use is only for ImageNet, and not for a custom
dataset? TITAN GPUs are pretty expensive...

I am very impressed by the result, as I have not been able to even approach
OverFeat with my own work (pylearn2 + theano) thus far. Will definitely be
experimenting with this in the future. Thanks!

~~~
liuliu
From my understanding, you cannot deploy OverFeat in production because its
license is only for research and evaluation purposes.

You still need a TITAN for any reasonably sized custom dataset (with
non-trivial data). The current CPU implementation doesn't support dropout;
therefore, you can only play with the CIFAR-10 dataset (./bin/cifar-10.c) if
you don't have a GPU.

This is a preliminary implementation; I do plan to finish the CPU training
path so it is on par with the GPU in subsequent releases.

~~~
kastnerkyle
So you are storing the entire dataset on the GPU then, to speed up processing?
Or is there support for a "minibatch" mode that sends chunks at a time to be
processed?

Unless this is more an issue with the model size of the "Krizhevsky net" being
on the order of 6GB?

Also, does this mean you managed to get a 2D convolutional kernel optimized
for Kepler architectures? If so, that is awesome! Alex's code is still only
optimized for Fermi architectures if I recall correctly.

~~~
liuliu
Not really the whole dataset. For ImageNet, I have a mini_batch size of 256,
and I need to allocate the whole network on the GPU for this mini_batch (which
is 256 * the neurons in the network), plus parameters (the parameters are
about 200MiB, * 3 for updates and momentum). Also, to speed up certain
operations, the data needs to be reshaped, and there is 500MiB of scratch
space just for that purpose. In total, I am using close to 6GiB of GPU memory.
You could probably get down to about 4GiB if the batch size is 128.
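In back-of-envelope form (the activation figure is inferred from the ~6GiB
total; all numbers are the approximate ones quoted above):

```python
# Rough GPU memory budget, in MiB, using the figures quoted above.
params_total = 200 * 3            # parameters + update + momentum copies
scratch = 500                     # reshape scratch space
total_256 = 6 * 1024              # ~6GiB reported total at batch size 256

# Whatever is left over is activation memory, which scales with batch size;
# the parameter and scratch allocations are fixed.
activations_256 = total_256 - params_total - scratch
activations_128 = activations_256 / 2

total_128 = params_total + scratch + activations_128
print(f"batch 128: ~{total_128 / 1024:.1f} GiB")
```

That lands around 3.5GiB at batch size 128, consistent with the ~4GiB estimate:
only the activations shrink when the batch is halved.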

The code is by no means optimized to the extreme. I optimized it to the point
of being able to finish in a reasonable time (9 days for 100 epochs). The
convolutional kernel is parametrized (with templates and some macro tricks,
the forward and backward propagation convolutional kernels are parametrized
into hundreds of small functions), and the best parameters were chosen with a
mini-benchmark at the beginning of the training process.

~~~
kastnerkyle
So... duh. You would need > 1.2 TB to fit all of ImageNet on the card :).
Thanks for clarifying, and pardon my brain lapse! Also, thanks for putting
this out there - if I get some time I may send some pull requests your way.
Awesome stuff.

~~~
liuliu
You definitely need a 1.2TB SSD to train on the complete ImageNet dataset. The
data is loaded into GPU memory only one batch (256 images) at a time, but the
loading will be the bottleneck if you use a rotational disk.

------
jkrippy
Here's a link to their documentation for the algorithm:
[http://libccv.org/doc/doc-convnet/](http://libccv.org/doc/doc-convnet/)

Had to dig for a few minutes and wanted to help others find it.

------
jdonahue
As a user of and contributor to Caffe [1], I have to take the opportunity to
plug it here. Like the CCV classifier linked, Caffe is fully open-source [2],
has a downloadable state-of-the-art model pre-trained on ImageNet [3], and
scripts/documentation that make it very easy to compute features using our
pre-trained model or other models [4].

Unlike the linked CCV release (unless I'm misinformed -- haven't actually
tried it, please correct me if I say anything inaccurate), Caffe supports
completely customizable architectures via a configuration language, fully
supports training [5], finetuning [6], and inference (feature
extraction/classification) in these customizable architectures, and seamlessly
runs on both CPU and GPU.

Caffe is also very fast: twice as fast as its predecessor DeCAF at CPU feature
computation, and faster than cuda-convnet at training/testing ImageNet
architectures on a Titan/K40 GPU.

The linked CCV release does mention Caffe, but quickly dismisses it due to the
license. It's true that our pre-trained model [3] is licensed only for non-
commercial use, but ALL of the Caffe code is BSD-licensed, including the exact
script we used to train said model. So if you're a commercial entity, using
Caffe for feature extraction/classification from a state-of-the-art network is
a matter of purchasing a $1000 GPU (NVIDIA Titan -- I'm assuming you own a
computer), downloading the ImageNet dataset, and waiting about a week for
training to converge. This will buy you the ability to adapt the classifier to
YOUR visual classification problem by finetuning [6], rather than being stuck
with the particular 1000 categories the pre-trained model knows about.

[1] [http://caffe.berkeleyvision.org/](http://caffe.berkeleyvision.org/)

[2] [https://github.com/BVLC/caffe](https://github.com/BVLC/caffe)

[3] [http://caffe.berkeleyvision.org/getting_pretrained_models.html](http://caffe.berkeleyvision.org/getting_pretrained_models.html)

[4] [http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/imagenet_pretrained.ipynb](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/imagenet_pretrained.ipynb)

[5]
[http://caffe.berkeleyvision.org/imagenet_training.html](http://caffe.berkeleyvision.org/imagenet_training.html)

[6] [http://caffe.berkeleyvision.org/caffe-presentation.pdf](http://caffe.berkeleyvision.org/caffe-presentation.pdf) (see page 14)

~~~
liuliu
Hey, Donahue, I actually referenced Caffe extensively in the detailed
documentation: [http://libccv.org/doc/doc-convnet/](http://libccv.org/doc/doc-convnet/)

This is a preliminary implementation, but a complete one that includes both
training and testing code. The big difference is that ccv is a general
computer vision library and Caffe is an artificial neural network library.
That does mean quite a few differences in how things are approached; for
example, ccv's implementation does allow you to specify the network topology,
but doesn't have an implementation of a local non-weight-sharing layer
(because CIFAR-10 and ImageNet don't need that type of layer).

You can also chop off the last fully connected layer and train an SVM on top
of it with ccv; I actually plan to do exactly what you guys did with that and
train on the VOC 2012 dataset.

All in all, ccv 0.6 is a preliminary implementation of a convnet, but it is
important for a library that claims to be "modern" to contain such an
implementation. And providing the pre-trained data model under a liberal
license (so that you can fine-tune your classification problem on top of the
pre-trained data model) is also aligned with ccv's goals.

~~~
jdonahue
I hadn't seen the detailed documentation - thanks so much for the
acknowledgments there!

And thanks for correcting me about CCV's support for custom architectures and
training -- I'd just assumed that it wasn't supported since it wasn't
mentioned in the post, but I guess this was more of a marketing decision as
most users are probably just interested in feature extraction/classification
from the pretrained net. :) I would argue that GPU support is pretty necessary
for training modern network architectures a la Krizhevsky to be remotely
practical, though.

I apologize if I came off as overly competitive or derisive, this is obviously
very nice work and it seems like an attractive option for many users. Always
happy to see deep learning made more accessible and open!

------
caio1982
This older post does a better job of explaining why we should care about it
(or not):
[http://libccv.org/post/an-elephant-in-the-room/](http://libccv.org/post/an-elephant-in-the-room/)

------
unhammer
Very cool :-)

On a sort-of-related note: anyone know of a good FOSS library for making a
reverse image search? My current method is just to use findimgdupes, but this
is rather slow and hard to script around. (Ideally, my desktop with my main
private image storage would have a simple web interface that works like
tineye.)

~~~
sheraz
You are probably looking for image similarity tools. I've been playing with
the IMGSeek tools [1] for a few months in my spare time. My hobby project is
to create a reverse image search engine.

[1] - [http://www.imgseek.net/isk-daemon](http://www.imgseek.net/isk-daemon)

------
steeve
I've been using the Stroke Width Transform (SWT) from libccv for more than a
year now. It's amazing.

------
swah
I need an ELI5 for this...

------
bsaul
Is there any image recognition software that works with textured 3D models as
input for training data?

~~~
nitrogen
Can you treat Z as just another color channel and use the same algorithms?

~~~
kastnerkyle
It depends - some algorithms exploit correlations between pixels to reduce the
computational load. This is bad if the Z channel is not strongly correlated
with the others (R, G, and B are very strongly correlated in natural images).
Since depth shouldn't usually be correlated with color, this might cause some
issues. Some related experiments were done in "Learning Feature
Representations with K-means" by A. Coates and A. Ng
([http://www.stanford.edu/~acoates/papers/coatesng_nntot2012.pdf](http://www.stanford.edu/~acoates/papers/coatesng_nntot2012.pdf)).

Give it a try, but also look at the underlying assumptions of your algorithm
if it performs poorly.
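One quick way to sanity-check that assumption on your own data is to compute
the inter-channel correlations directly. Here is a sketch with a synthetic
"RGBZ" image whose R/G/B channels share a luminance component while Z is
independent (all values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 64x64 image: R, G, B share a luminance component (so they are
# strongly correlated), while Z (depth) is drawn independently.
h, w = 64, 64
luma = rng.normal(size=(h, w))
r = luma + 0.1 * rng.normal(size=(h, w))
g = luma + 0.1 * rng.normal(size=(h, w))
b = luma + 0.1 * rng.normal(size=(h, w))
z = rng.normal(size=(h, w))

# Each row is one flattened channel; corrcoef gives the 4x4 channel matrix.
channels = np.stack([c.ravel() for c in (r, g, b, z)])
corr = np.corrcoef(channels)

print(np.round(corr, 2))
```

R/G/B correlate strongly with each other while Z correlates with none of them,
which is exactly the situation where an algorithm that exploits inter-channel
correlation may mishandle the depth channel.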

------
stefantalpalaru
From
[http://wiki.creativecommons.org/FAQ#Can_I_apply_a_Creative_C...](http://wiki.creativecommons.org/FAQ#Can_I_apply_a_Creative_Commons_license_to_software.3F)
:

> We recommend against using Creative Commons licenses for software.

~~~
MatthewWilkes
It's not software, it's a data structure.

