It is often estimated that about 80% of the world's data is unstructured: job descriptions, CVs, emails, text documents, voice recordings, images, videos, social media posts, websites, etc.
As far as I know, we are not yet really able to extract insights and value from all this unstructured data. For instance, in most cases structured, labeled data is needed to train neural networks; i.e., most settings are still predominantly supervised.
Furthermore, with the arrival of transformer architectures (BERT, GPT-2, ...), the last couple of years have been incredible for the NLP ecosystem.
My question is: looking ahead two, five, and ten years, how well will we be able to deal with unstructured data? Also, which tools and technologies are we missing now that will become available in a couple of years and let us get the most out of all this data? What role will language models and transformer architectures play in this field? Will semi-supervised learning be the key?
These are all quite general and broad questions. The idea is to discuss unstructured data and unsupervised algorithms.