
Indigenous datasets: A listing to help AI perform better in India - LogicRiver
https://factordaily.com/indigenous-datasets-from-india/
======
skbly7
Glad to see multiple contributions from my college.

Here are a few more datasets that I am aware of and that could be added to the list:

\- Dataset for Indian National Rupee
[https://cvit.iiit.ac.in/research/projects/cvit-projects/currency-recognition-on-mobile-phones](https://cvit.iiit.ac.in/research/projects/cvit-projects/currency-recognition-on-mobile-phones)

\- City-scale Road Audit using Deep Learning dataset
[https://cvit.iiit.ac.in/research/projects/cvit-projects/city-scale-road-audit](https://cvit.iiit.ac.in/research/projects/cvit-projects/city-scale-road-audit)

\- Multi-domain corpus for sentiment analysis (Telugu dataset; the download
link isn't working, so you will probably need to email for the download)

~~~
geevi
There is also an India Driving Dataset:
[http://idd.insaan.iiit.ac.in/](http://idd.insaan.iiit.ac.in/)

------
johnny313
This reminds me of a paper put out by a few researchers at Google Brain: `No
Classification without Representation: Assessing Geodiversity Issues in Open
Data Sets for the Developing World` [0]

The takeaway was that existing open image datasets are biased toward Western
contexts (e.g., what a wedding looks like), leading to low performance when
applied in non-Western contexts.

[0] [https://arxiv.org/abs/1711.08536](https://arxiv.org/abs/1711.08536)

------
sairahul82
[https://tdil-dc.in/index.php?lang=en](https://tdil-dc.in/index.php?lang=en)
has some good datasets. The only problem is that you need to agree to the terms
and fax the document to Delhi to get access. They will supposedly then send a
DVD. I don't understand why they do it this way. If the Indian government wants
to foster research, it should put the datasets in the public domain.

