
Not_notMNIST: Generate your own datasets - RafazZ
[Teaser](http://zafar.cc/images/letters.png)

[Personal Blog](http://zafar.cc/not-notmnist-dataset-generation/)

[GitHub Link](https://github.com/zafartahirov/not_notMNIST)

I wrote a little script that you can use to generate datasets for classification (like MNIST or notMNIST).

It takes fonts that you have and creates images plus a label/features pickle that you can load into Python.

A more detailed explanation is here: http://zafar.cc/not-notmnist-dataset-generation/ I would really appreciate any critique, issue reports, and pull requests on GitHub: https://github.com/zafartahirov/not_notMNIST

The main benefit I personally see is that you can test your classifier on datasets that involve Unicode characters. The problem is that you need a lot of fonts to generate a decent dataset. If you have a lot of fonts for your language, I would appreciate it if you could share the dataset :) I generated some using Hiragana, but I don't have a license for a lot of fonts, so it is more of a demo (check GitHub). I would really love to have datasets for Chinese, Arabic, Hebrew, Cyrillic, etc.
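
For anyone wondering what loading an images-plus-labels pickle might look like, here is a minimal sketch using only the standard library. The dictionary layout (`"images"` and `"labels"` keys), the toy 2x2 "glyphs", and the helper names `make_toy_dataset`, `save_dataset`, and `load_dataset` are all assumptions for illustration; the file the script actually writes may be organized differently, so check the GitHub README before relying on this.

```python
import io
import pickle


def make_toy_dataset():
    """Build a tiny stand-in dataset (hypothetical layout, not the script's real one)."""
    images = [
        [[0, 255], [255, 0]],  # stand-in 2x2 grayscale glyph for "あ"
        [[255, 0], [0, 255]],  # stand-in 2x2 grayscale glyph for "い"
    ]
    labels = ["あ", "い"]
    return {"images": images, "labels": labels}


def save_dataset(dataset, fileobj):
    """Serialize the dataset dictionary to a binary file-like object."""
    pickle.dump(dataset, fileobj, protocol=pickle.HIGHEST_PROTOCOL)


def load_dataset(fileobj):
    """Load the dictionary back and check that images and labels line up."""
    data = pickle.load(fileobj)
    images, labels = data["images"], data["labels"]
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    return images, labels


# Round-trip through an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
save_dataset(make_toy_dataset(), buffer)
buffer.seek(0)
images, labels = load_dataset(buffer)
print(len(images), labels)
```

With a real file you would pass `open(path, "rb")` instead of the in-memory buffer. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.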
======
opless
Great stuff!

