onasta's comments

onasta · 2025-01-09T23:30:26

There have been a ton of improvements! Much better performance overall, a much larger data size limit (1K to 10K rows, 100 to 500 features), regression support, native handling of categorical data and missing values, much better support for uninformative or outlier features, etc.

onasta · 2025-01-09T22:55:43

TabPFN has been better on numerical data since v1 (see Figure 6 in the CARTE paper). CARTE's main strength is in text features, which are now also supported in the API version of TabPFN v2 (https://github.com/PriorLabs/tabpfn-client). We compared this to CARTE and found our model to be generally quite a bit better, and much faster. CARTE's multi-table approach is also very interesting, and we want to tackle this setting in the future.

onasta · on Aug 4, 2022

Super interesting! Do you know the kind of data that it's usually used for? And in the remaining 80% to 60%, do NNs account for a large portion of the best models?

Bonus question: are the stats you're mentioning publicly available?

coffee_am · on Aug 5, 2022

Sadly (but correctly), nothing is public: no one ever sees any data, it's a service. Pure FNN (feedforward NN) models, if I recall correctly, are also ~30 to 40%.

Since the server doesn't work for all types of data, and folks who are ML experts would probably do their own hyperparameter tuning and build custom models, this leads to a bias in the types of datasets that compete.

But this share has been consistent over many months across various unrelated datasets, I believe.