
An MNIST-like fashion product dataset - kashifr
https://github.com/zalandoresearch/fashion-mnist
======
jph00
I don't understand why this seems to be getting so much attention. There are
plenty of small image datasets around, and wide recognition of the issues with
MNIST.

I see no evidence at all that this particular dataset is better than MNIST.
None of the issues they themselves list with MNIST are discussed with relation
to their proposed replacement.

The benchmarks they provide are entirely useless - sklearn does not claim to
be a platform for computer vision models. A quick WRN model gets 96% on this
dataset (h/t @ajmooch on Twitter), suggesting that it doesn't address the
"too easy" issue.

The images clearly don't address the lack-of-translation-invariance problem
either.

On the downside, the images lack the immediate legibility of hand-drawn
digits, which is extremely helpful for teaching, debugging, and visualization.

~~~
Q6T46nT668w6i3m
> sklearn does not claim to be a platform for computer vision models.

There are more than a dozen image classification and segmentation examples on
the scikit-learn gallery:

[http://scikit-learn.org/stable/auto_examples/index.html](http://scikit-learn.org/stable/auto_examples/index.html)

~~~
jph00
Yes, but at this stage they're really toy examples. IIRC the best sklearn
result they showed on this benchmark had 20% error, whereas the little WRN
network was at 4%.
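
For context, a minimal sketch of the kind of non-vision sklearn baseline the
repo reports. This is an assumption-laden example, not the repo's own setup:
the "Fashion-MNIST" OpenML dataset name and the classifier choice are mine.

```python
# Sketch of a non-vision sklearn baseline; the OpenML dataset name
# "Fashion-MNIST" and the classifier are assumptions, not the repo's code.
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale 8-bit pixel intensities to [0, 1]

# Hold out 10k examples for testing, mirroring the usual MNIST-style split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10_000, random_state=0)

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```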

------
nip
How would you go about generating such a dataset?

1. Scrape images and store them as PNGs

2. Downscale to 28px

3. Convert each image to grayscale

4. Convert to matrices and add a label (an additional row?)

5. Normalize to matrices of 1s and 0s for faster computation

6. Vectorize said matrices

7. Concatenate into one big vector

Did I miss something / Am I fooling myself?

I plan on working on my first ML side project and I would love to gain some
insights from HN.

~~~
e_ameisen
That's the right idea overall, with a few caveats.

1. Yes, but you need to manually inspect and verify that the images are of
the right class.

5. The images are grayscale, not just black and white, so normalizing gives
values in [0, 1] rather than only 1s and 0s.

Additionally, MNIST and fashion-MNIST have all their objects centered and of
similar scale. This is a large part of what makes them a popular first test
for any image model: they are very simple to solve as the model need not be
very robust to fit the dataset.
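
To make the pipeline above concrete (with these caveats folded in), here is a
minimal sketch using PIL and numpy. The directory layout and the
folder-name-as-label scheme are hypothetical placeholders:

```python
import numpy as np
from pathlib import Path
from PIL import Image

# Hypothetical layout: verified, scraped PNGs under data/<label>/*.png;
# the paths and label scheme are placeholders, not anything prescribed.
images, labels = [], []
for path in sorted(Path("data").glob("*/*.png")):
    img = Image.open(path).convert("L")              # step 3: grayscale
    img = img.resize((28, 28), Image.LANCZOS)        # step 2: downscale to 28x28
    arr = np.asarray(img, dtype=np.float32) / 255.0  # step 5: [0, 1], not binary
    images.append(arr.reshape(-1))                   # step 6: flatten to 784 values
    labels.append(path.parent.name)                  # step 4: one label per image

X = np.stack(images)   # step 7: one (n_samples, 784) design matrix
y = np.array(labels)
```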

~~~
Q6T46nT668w6i3m
> Additionally, MNIST and fashion-MNIST have all their objects centered and of
> similar scale. This is a large part of what makes them a popular first test
> for any image model: they are very simple to solve as the model need not be
> very robust to fit the dataset.

Yeah, this is crucial, especially when trying to get models to generalize.
It's easy to verify the usefulness of data augmentation when you can make
basic assumptions about the data (e.g. that it's centered).

------
eggie5
Looks like this was sourced in-house at a German online retailer: zalando.de.
There is a similar dataset from Amazon, collected by UCSD:
[http://jmcauley.ucsd.edu/data/amazon/](http://jmcauley.ucsd.edu/data/amazon/)

And our research on recommenders using it:
[http://sharknado.eggie5.com](http://sharknado.eggie5.com)

In particular, see the 2D scatter of the CNN features:
[http://sharknado.eggie5.com/tsne](http://sharknado.eggie5.com/tsne)

~~~
boulos
The README says it's from them (Zalando). Per a Google search, they're an
e-commerce site from Berlin funded by the Rocket Internet folks.

------
edshiro
I'd love to play around with this dataset! It certainly seems richer than
MNIST, and would most likely force the network to extract more features.

But just like MNIST, it seems to lack variety in the positioning of the
important elements: they are all centered, which means the dataset doesn't
train the network to be translation invariant. I presume this issue can be
tackled with data augmentation techniques like applying affine
transformations.

~~~
ENGNR
You could potentially add translations to the dataset automatically (and
generate mosaics with multiple images). Since you've done the transforms
yourself, you have perfect knowledge of the new images, and they're ready to
train on.
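
As a rough sketch of that idea (the shift range and zero padding here are my
assumptions, not part of the dataset):

```python
import numpy as np

def random_translate(img, max_shift=4):
    # Shift a 28x28 grayscale image by a random (dy, dx) offset, padding
    # the exposed border with zeros. max_shift=4 is an arbitrary choice.
    dy, dx = np.random.randint(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    src = img[max(0, -dy):img.shape[0] - max(0, dy),
              max(0, -dx):img.shape[1] - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    # Because we applied the shift ourselves, (dy, dx) and the label are
    # known exactly, so the augmented image is ready to train on.
    return out, (dy, dx)
```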

------
stared
For an MNIST-like dataset, I often use notMNIST
([http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html)),
which is more difficult than the original (see examples of misclassified
characters here:
[https://docs.neptune.ml/get-started/character-recognition/](https://docs.neptune.ml/get-started/character-recognition/)).

However, I am not sure we need more MNIST-like datasets. At this small size,
many things make much less sense (data augmentation, and even convnets, since
the images are centered anyway), plus using many channels is typical (IRL I
rarely work with grayscale images). So I am curious: in what way is this
dataset better than CIFAR-10?

See my note on datasets in Learning Deep Learning:
[http://p.migdal.pl/2017/04/30/teaching-deep-learning.html#datasets](http://p.migdal.pl/2017/04/30/teaching-deep-learning.html#datasets).

------
a3864
If I am understanding the side-by-side comparison correctly, performance is
highly correlated with MNIST (at least for high-accuracy methods).

[https://i.imgur.com/viV7gFB.png](https://i.imgur.com/viV7gFB.png) (x-axis:
Fashion, y-axis: MNIST)

~~~
Q6T46nT668w6i3m
They should be correlated. It's difficult (or near impossible) to interpret
the underlying convolutional features, but one can assume they are recognizing
basic image descriptors like intensities and shapes. The images are rescaled
to a size that de-emphasizes complex descriptors like texture; likewise, they
are single-channel, which de-emphasizes color features.

------
ntenenz
One of the reasons people have shifted away from MNIST is that it's simply too
easy: single channel, small image size, few classes, etc. Unfortunately, this
dataset does not address any of those concerns.

------
singularity2001
How is this 'better' than CIFAR-10 / CIFAR-100?

~~~
Q6T46nT668w6i3m
From looking at the examples, zeroed-out backgrounds are one advantage over
CIFAR (depending on your task).

