Ask HN: unstructured data and unsupervised learning
3 points by jonathanbesomi on July 10, 2020 | 3 comments
About 80% of data in the world is unstructured: work descriptions, CVs, emails, text documents, voice recordings, images and videos, social media posts, websites, etc.

As far as I know, we aren't yet really able to extract insights and profit from all this unstructured data. For instance, in most cases structured data is needed to train neural networks; i.e., the settings are predominantly supervised.

Furthermore, with the introduction of transformer architectures (BERT, GPT-2, ...), the last couple of years have been incredible for the NLP ecosystem.

My question is: looking ahead two, five, and ten years, how well will we be able to deal with unstructured data? Also, which tools and technologies are we missing now that will be available in a couple of years and will let us get the most out of all this data? What role will language models and transformer architectures play in this field? Will semi-supervised learning be the key?

These are all quite general and broad questions. The idea is to discuss unstructured data and unsupervised algorithms.




If you like the way NLP progresses with all the recent large models, the next natural step is to apply the same idea to video - predict the next frame. Obviously, it's not the same, as we don't have a "vocabulary" of frames, so some novel approach is needed. I haven't actually looked into video prediction/generation literature, so I don't know what's happening in that field.
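
To make the "predict the next frame" idea concrete, here is a minimal sketch of how an unlabeled clip supplies its own training targets. Everything in it is an illustrative assumption rather than a method from the video-prediction literature: frames are tiny flat lists of floats, `make_pairs`, `mse`, and `copy_last_frame` are hypothetical helper names, and a pixel-wise regression loss stands in for the softmax over a vocabulary that text models use.

```python
# Sketch: turning an unlabeled clip into (context, next-frame) training pairs.
# Frames are tiny grayscale images stored as flat lists of floats (an
# assumption made to keep the example dependency-free).

def make_pairs(frames, context=2):
    """Slide a window over the clip: `context` past frames -> the following frame."""
    return [(frames[i:i + context], frames[i + context])
            for i in range(len(frames) - context)]

def mse(pred, target):
    """Mean squared error over pixels. Frames are continuous, so a regression
    loss is used here in place of a softmax over a discrete vocabulary."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def copy_last_frame(context_frames):
    """Trivial baseline 'model': predict that nothing changes."""
    return context_frames[-1]

# Synthetic 4-pixel clip in which every pixel brightens by 0.1 per frame.
clip = [[0.1 * t] * 4 for t in range(5)]
pairs = make_pairs(clip, context=2)

# Average loss of the baseline over all pairs; a learned model should beat it.
baseline_loss = sum(mse(copy_last_frame(c), y) for c, y in pairs) / len(pairs)
print(len(pairs), baseline_loss)
```

No labels are needed: the target for each example is simply the frame that follows, which is what makes the setup attractive for large amounts of unstructured video.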


Interesting; thank you! Related to videos: OpenAI recently released Image GPT, a transformer model trained this time on images ...




