I’m not sure how the authors can claim to be comparing their approach against “state of the art” AutoML and not include AutoGluon, FLAML or H2O. This independent benchmarking paper[0] is what the AutoML field points to as establishing SOTA, and the libraries compared against in the SapientML paper are middle of the pack at best.
Is calling it "sapient" too presumptuous? I feel like that word should be reserved for something more AGI-like.
Am I completely off base with that opinion? I've been trying to temper my desire to jump in with any comment however irrelevant. Sorry, weird comment and question.
Any reason for using the term "generative", which may confuse readers by implying generative AI/LLMs? It's a traditional tabular AutoML system, though it does learn from a corpus of Kaggle solutions and generates pipelines with a "three-stage program synthesis approach" [1].
Is there a reason why those frameworks were suggested?
There are many commercial offerings that greatly outperform open-source AutoML approaches.
At my job we use DataRobot and it's super impressive.
There is Azure AutoML, Vertex BigQuery AutoML, ...
Other data-focused software products have AutoML solutions as well, I believe, like Alteryx, Dataiku, SAS, ...
If you want state of the art in AutoML, I am afraid this is one of the areas where the commercial space is well ahead of the open-source space.
Are there benchmarks or third-party evaluations you can share that support your claim? I haven't used all these offerings, so my experience is anecdotal, but I haven't seen commercial offerings outperform AutoGluon, at least for tabular data.
[0] https://arxiv.org/abs/2207.12560