
Let me start by saying that LAION is a non-profit, open to anyone who wants to contribute.

Agreed about the website CSS. Do you want to contribute?

What's the problem with the dataset name exactly? Seems to work pretty well.

Yes, the dataset is an extract of Common Crawl. That makes it a method anyone can use to produce a valuable dataset, unlike supervised datasets, which are reserved for organizations with millions of dollars to spend on annotation and which do not scale.

Non-annotated datasets are the basis of self-supervised learning, which is the future of machine learning. Image/text pairs with no human labels are a feature, not a bug. We provide safety tags for safety concerns and watermark tags to improve generations.

It also so happens that this dataset collection method has been validated by using LAION-400M to reproduce the CLIP model (and by a bunch of other models trained on it).



