
AVA: A Finely Labeled Video Dataset for Human Action Understanding - hurrycane
https://research.googleblog.com/2017/10/announcing-ava-finely-labeled-video.html
======
SloopJon
From the download page:

> The AVA dataset contains 192 videos split into 154 training and 38 test
> videos. Each video has 15 minutes annotated in 3 second intervals, resulting
> in 300 annotated segments.

So basically this is a couple of CSV files annotating 192 videos, which are
hosted on YouTube. ava_train_v1.0.csv is about 7 MB.
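
A quick sanity check of the quoted numbers, as a rough sketch. The constant
and function names are made up for illustration, and the dataset-wide total
assumes every video has the full 15 minutes annotated, which the quote does
not state outright:

```python
# Back-of-the-envelope check of the figures quoted from the download page.
TRAIN_VIDEOS = 154
TEST_VIDEOS = 38
ANNOTATED_MINUTES = 15
INTERVAL_SECONDS = 3


def segments_per_video(minutes=ANNOTATED_MINUTES, interval=INTERVAL_SECONDS):
    """Number of fixed-length intervals that fit in the annotated span."""
    return minutes * 60 // interval


total_videos = TRAIN_VIDEOS + TEST_VIDEOS
# Assumption: every video is annotated for the full 15 minutes.
total_segments = total_videos * segments_per_video()

print(total_videos, segments_per_video(), total_segments)
```

So the split adds up to 192, and 15 minutes at 3-second intervals does give
300 segments per video.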

~~~
tzm
> basically this is a couple of..

I would prefer accuracy over complexity any day.

------
lifeisstillgood
The most interesting thing I found was "We use movies as the source of AVA".

While the datasets will only grow, movies are not realistic: they are by
design faked, acted, well lit, etc. While that is probably the best thing to do
with a starting set, I am waiting for the CNN/RNN to start saying (much like
the earlier case of the black female Stanford researcher whose face was not
identified as human) "that person is not walking - I know walking, it's just
like John Cleese".

~~~
chimtim
This is what makes this dataset poor. Other datasets mentioned in the blog are
based on YouTube, which is more realistic. Movie-based datasets have perfect
lighting and center the subject, and are almost never useful (e.g. HMDB).

~~~
yodon
YouTube/Flickr/etc are far from ideal data sources. Do dogs drive cars? Flickr
has tons of photos of dogs driving cars, eating ice cream, and doing tons of
other rare-for-dogs things. Ultimately, whatever the raw data source is, what
matters is how well it is curated, and that’s always going to be a highly
labor-intensive job that can be done well or poorly regardless of the source
of the images being curated.

~~~
chimtim
These are not random, raw YouTube video datasets. They are hand-curated
datasets in specific classes (like using Mechanical Turk). YouTube has really
diverse videos with different lighting and real-world scenarios, which makes
it an excellent dataset. Movie clips look great, but models trained on them are
useless in the real world.

