I'm an Aquarium user. Aquarium has provided value to my company in two ways. First, we improved our model performance. Second, I spent less time and fewer clicks curating my dataset.
Regarding model performance, I used Aquarium to improve the AUC for my model by 18 percentage points (i.e., comparing the AUC for the first model trained on my new dataset to the AUC for my production model).
Regarding dataset curation efficiency, I spent much less time curating my dataset using Aquarium than I would have spent using our own in-house tooling. For example, the embedding-based point cloud allowed me to identify many images with the same issue at once, rather than image by image, click by click.
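To make the "at once" part concrete, here's a rough sketch of the general idea behind embedding-based selection. This is not Aquarium's API or implementation; the image IDs, the 2-D vectors, the `similar_images` helper, and the 0.95 threshold are all made up for illustration. The assumption is simply that images with a similar problem end up close together in embedding space, so one known bad image can pull in its neighbors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similar_images(embeddings, query_id, threshold):
    """Return IDs of images whose embedding is close to the query image's.

    Hypothetical helper: `embeddings` maps image ID -> embedding vector.
    """
    query = embeddings[query_id]
    return sorted(
        image_id
        for image_id, vector in embeddings.items()
        if image_id != query_id
        and cosine_similarity(query, vector) >= threshold
    )

# Toy 2-D embeddings (real ones would have hundreds of dimensions).
embeddings = {
    "img_001": [0.98, 0.05],
    "img_002": [0.95, 0.10],
    "img_003": [0.04, 0.99],  # the one image I already know is bad
    "img_004": [0.08, 0.97],
    "img_005": [0.02, 0.96],
    "img_006": [0.90, 0.15],
}

# One known-bad example flags its whole neighborhood in a single step.
flagged = similar_images(embeddings, "img_003", threshold=0.95)
print(flagged)
```

In a point-cloud UI the same thing happens visually: you lasso a cluster instead of computing similarities by hand, but the underlying selection is the same kind of neighborhood query.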
This thread has been mostly focused on improving model performance (i.e., my first point), but Aquarium is also valuable for improving dataset curation labor efficiency (i.e., my second point). For the business owner, dataset curation labor efficiency means less money wasted on having some of your most expensive employees, ML data scientists, clicking around and writing ad-hoc scripts. For the ML practitioner, dataset curation labor efficiency means fewer clicks and less wear on your carpal tunnels.
The founders, Peter and Quinn, didn't ask me to write this. I chose to write it because it's a great product that I think can help a lot of businesses and people.
To second your comment, I think non-ML folks don't understand how much of an impact dataset curation can have on model performance. More high-quality data will outshine clever network architectures trained on less data. I've seen it again and again. But the thing is, it's really hard to curate your data once the dataset has a lot of "dimensionality" to it (sorry, couldn't think of a better word...). To be honest, if I had to pick the area of dev tooling I'm most excited about over the next 5 years, this is probably it.