
Kaggle’s Yelp Restaurant Photo Classification Competition, Fast.ai Style - harveynick
https://harveynick.com/2018/06/24/kaggles-yelp-restaurant-photo-classification-competition-fast-ai-style-part-1/
======
kaveh_h
Nick (the author) reflects on this: "The data needs to be merged into a format
which can be used to train a neural network. Solving this leads to the second,
much bigger issue: many of the resulting label to image mappings are
inappropriate...".

If one instead uses an RNN (recurrent neural network), particularly an LSTM,
it could take as input the sequence of all photos from a business, and the
output of the model would then be the sequence of labels, similar to how
translation models work. Of course, another problem then could be that the
ordering of the labels is unrelated to the order of the photos for a business,
but there is probably some way to handle this, either by data synthesis
(multiple permutations of the training data, so the model learns to ignore the
ordering of labels) or by sorting the labels in some fixed fashion that the
model can learn.
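
As a rough sketch of that framing in PyTorch (every name and size below is an
illustrative assumption, not something from the post): an LSTM encoder consumes
per-photo CNN features, and an LSTM decoder then emits label tokens one at a
time, the way a translation model emits words.

    import torch.nn as nn

    class PhotoSeq2Labels(nn.Module):
        # vocab = the label set plus hypothetical <start>/<end> tokens
        def __init__(self, feat_size=512, hidden=256, num_labels=9):
            super().__init__()
            vocab = num_labels + 2
            self.encoder = nn.LSTM(feat_size, hidden, batch_first=True)
            self.label_emb = nn.Embedding(vocab, hidden)
            self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab)

        def forward(self, photo_feats, label_tokens):
            # photo_feats: (batch, n_photos, feat_size) from any CNN backbone
            # label_tokens: (batch, n_labels) teacher-forced previous labels
            _, state = self.encoder(photo_feats)  # summarise the photo sequence
            dec_out, _ = self.decoder(self.label_emb(label_tokens), state)
            return self.out(dec_out)              # logits over the next label / <end>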

~~~
harveynick
Hi, Nick here.

That's really interesting. I have another 2000-ish-word part two* which I'm
hoping to put out around the end of the week, with some ideas for other things
to try. Most of my ideas for next steps revolved around tweaking the data
loader to output composite images, or tweaking the loss function to be more
forgiving of false negatives. This is an alternative I hadn't considered.

You would still need 34+ layers of ResNet to detect the features, I think, but
perhaps the fully connected classification layers could be replaced with an
RNN. You would probably ignore all of the outputs aside from the last one, so
that part of the ordering wouldn't matter. Input ordering might have an
effect, though.
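
Something like this, perhaps (a sketch only; the exact shapes and the choice
of ResNet-34 here are my assumptions, not anything the fast.ai library gives
you out of the box):

    import torch.nn as nn
    from torchvision import models

    class BusinessClassifier(nn.Module):
        def __init__(self, num_labels=9, hidden=256):
            super().__init__()
            resnet = models.resnet34(pretrained=True)
            self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the FC head
            self.lstm = nn.LSTM(512, hidden, batch_first=True)
            self.head = nn.Linear(hidden, num_labels)

        def forward(self, photos):  # photos: (batch, n_photos, 3, 224, 224)
            b, t = photos.shape[:2]
            feats = self.backbone(photos.flatten(0, 1)).flatten(1)  # (b*t, 512)
            _, (h_n, _) = self.lstm(feats.view(b, t, -1))
            return self.head(h_n[-1])  # one set of multi-label logits per business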

Assuming you don't object, I'm going to add this to the "extension ideas" at
the end of my part 2 (with a hat tip, of course).

Of course the next problem would be figuring out how to actually accomplish
this with the fast.ai library...

* I was originally on the fence about just publishing the entire 5000-ish words as a single post, but Ulysses actually started to have trouble with it and that pushed me over the edge.

~~~
matt4077
The LSTM idea is good, but the ordering problem stems from the LSTM's
assumption that the input is in some way ordered (e.g. words in a text, or
frames in a video).

I would instead try separate inputs for each image, followed by an aggregation
layer. I think a maximum-pooling approach makes the most sense for most of the
labels: for each restaurant, a few photos seem to provide all the information
relevant to certain labels. When there are five interior images and one
showing a patio, you want the latter to determine the label "outdoor dining".

For other characteristics, such as "elegant" (I'm making up these labels
because I read the post yesterday and don't remember the specifics) average
pooling might be a better fit.

In fact, a simple concatenation might work well, allowing the model to learn
these things on its own. With differing numbers of images per restaurant that
wouldn't work, though. Instead, one might try both MAX and AVG pooling,
followed by concatenation. Or, if available, play around with other types of
averaging, such as trimming outliers.
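
Concretely, something like this (a sketch, with made-up names and sizes): pool
the per-photo feature vectors over the photo axis both ways, concatenate, and
classify. Because both pools reduce over the photo axis, a varying number of
photos per restaurant is handled for free.

    import torch
    import torch.nn as nn

    class PooledAggregator(nn.Module):
        def __init__(self, feat_size=512, num_labels=9):
            super().__init__()
            self.head = nn.Linear(2 * feat_size, num_labels)

        def forward(self, feats):               # feats: (batch, n_photos, feat_size)
            max_pool = feats.max(dim=1).values  # one patio photo can decide "outdoor dining"
            avg_pool = feats.mean(dim=1)        # softer signal for traits like "elegant"
            return self.head(torch.cat([max_pool, avg_pool], dim=1))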

On another point: I was slightly put off by the criticism of the data
structure. That structure follows directly from the problem being solved and
isn't some oversight making life more complicated than necessary. To work with
such data, I tend to throw it into a rails application and export whatever
format I need (although I'd use a python solution if I were more familiar with
them).

~~~
kaveh_h
Actually, the LSTM model doesn't assume anything about ordering itself, but it
could probably overfit to the ordering, depending on how much training data is
available, and in this case we know that the order of the input and the order
of the output are not directly related. One way to make the LSTM model more
robust and less prone to over-fitting would simply be to make a number of
copies of each training example, where each copy is mutated by changing the
order of the images and tags.
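
For example, something like this hypothetical helper (not from the thread)
could generate the copies:

    import random

    def permuted_copies(photos, labels, n_copies=4):
        # emit several copies of one training example, with the photos and
        # labels shuffled independently, so neither ordering carries signal
        return [(random.sample(photos, len(photos)),
                 random.sample(labels, len(labels)))
                for _ in range(n_copies)]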

------
namuol
I'd love to see more articles from fast.ai students covering topics other than
image (multi) classification. I've been gradually going through the course,
and attempting to apply what I'm learning as I go, but I rarely see good
results, especially with structured data.

~~~
autokad
> "but I rarely see good results, especially with structured data."

What algorithms are you using? I don't know if I'm a DS neanderthal, but DNNs
are the last thing I try. xgboost and some ensembling with catboost usually
get very good results with little effort.

~~~
halflings
I wouldn't call ensembling xgboost models with catboost "little effort". One
thing that is true, however, is that xgboost will usually give better
performance without tuning than DNNs will without any tuning / hyperparameter
search.

~~~
autokad
pseudocode here:

    # assumes xg (an xgboost model) and cat (a catboost model) are already fitted
    y_hat_xgboost = xg.predict(x_test)
    y_hat_cat = cat.predict(x_test)

    # weighted blend of the two sets of predictions
    y_hat = 0.7 * y_hat_xgboost + 0.3 * y_hat_cat

how is that not little effort?

~~~
namuol
I really think the FastAI course needs to be taken with a heaping dose of salt
in the form of "here are all the non-DL solutions to common problems" --
there's a ton of material out there already, but based on anecdotal evidence,
a lot of the folks taking the course are very new to ML & data science in
general (i.e. people like myself).

~~~
harveynick
Have you looked at fast.ai's ML course? It might also be worth a try:
http://forums.fast.ai/t/another-treat-early-access-to-intro-to-machine-learning-videos/6826

I'd also recommend the Andrew Ng Coursera course, which I talked about here:
https://harveynick.com/2018/04/25/some-notes-on-the-andrew-ng-coursera-machine-learning-course/

