
Tribuo, a Machine Learning Library for Java - samcolvin
https://github.com/oracle/tribuo
======
craigacp
The announcement blog talks a little about why we built this library, and
where we think it sits in the ML ecosystem -
[https://blogs.oracle.com/datascience/tribuo-java-machine-learning-library](https://blogs.oracle.com/datascience/tribuo-java-machine-learning-library)

------
stevehiehn
Hmm, the first thing that comes to mind is DL4J
([https://deeplearning4j.org/](https://deeplearning4j.org/)). I guess DL4J is
neural-net focused and maybe Tribuo is a toolkit; not sure.

------
bratao
Looks very interesting! Another Java library for ML is
Smile ([https://github.com/haifengl/smile](https://github.com/haifengl/smile)).
I highly recommend it!

~~~
hakjink
I want to use Smile, but its license prohibits using it at my company. I
understand the project authors' intention to promote contributions to the
project, but the requirement to get a commercial license is a deal breaker
for many.

~~~
mumblemumble
I suspect the (L)GPL is a big part of why Java has more-or-less completely
ceded the data space to Python. Corporate policy makes it painful for me to
take a dependency on GPL (including LGPL), too, and that makes building and
maintaining data applications in Java an absolute minefield, because so many
projects are copyleft. Even the packages that are Apache, MIT or BSD licensed
often require you to plug in netlib-java (LGPL) if you want decent
performance.

It's encouraging to see a project like Tribuo come along, but it also feels
like too little, too late. I'm already well underway on jumping ship and
migrating to Python, and have yet to encounter any particular reason why I
should look back.

~~~
johnc1231
I don't think netlib-java is LGPL. Where does it say that it is?

~~~
mumblemumble
Sorry, I was misremembering the problem. It's not netlib-java that is LGPL.
It's some of the native math libraries that plug into netlib-java.

------
jarym
This looks excellent, will take a bit of time to go through and understand.
The announcement blog is really helpful actually and explains the problem
you're solving well (and one I am intimately familiar with so I see value
here).

Congratulations!

------
latenightcoding
It would be cool if they added isolation forests to their anomaly detection
algorithms. I have yet to meet someone who uses one-class SVMs in production.

~~~
craigacp
We're implementing the extra trees algorithm at the moment; an isolation
forest is only a small amount of code from there.
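
For readers unfamiliar with the technique, here is a minimal isolation forest sketch in plain Java. It is not Tribuo code and all class and method names are made up for illustration. Each tree picks a random feature and a random split point (the same kind of randomised split that extra trees build on, minus any label-based scoring), and points that end up isolated after few splits get scores closer to 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative isolation forest sketch; hypothetical names, not Tribuo's API.
final class IsolationForestSketch {

    /** A tree node: leaves only use size, internal nodes hold a random split. */
    static final class Node {
        int feature; double split; Node left, right; int size;
        boolean isLeaf() { return left == null; }
    }

    private final List<Node> trees = new ArrayList<>();
    private final int sampleSize;

    IsolationForestSketch(double[][] data, int numTrees, int sampleSize, long seed) {
        if (data.length < 2 || sampleSize < 2) throw new IllegalArgumentException("need at least 2 points");
        Random rng = new Random(seed);
        this.sampleSize = Math.min(sampleSize, data.length);
        // Depth limit of roughly log2(sampleSize), as in the original algorithm.
        int maxDepth = (int) Math.ceil(Math.log(this.sampleSize) / Math.log(2));
        for (int t = 0; t < numTrees; t++) {
            // Subsample without replacement by shuffling and taking a prefix.
            List<double[]> pool = new ArrayList<>(Arrays.asList(data));
            Collections.shuffle(pool, rng);
            trees.add(build(new ArrayList<>(pool.subList(0, this.sampleSize)), 0, maxDepth, rng));
        }
    }

    private static Node build(List<double[]> rows, int depth, int maxDepth, Random rng) {
        Node node = new Node();
        node.size = rows.size();
        if (rows.size() <= 1 || depth >= maxDepth) return node;
        int f = rng.nextInt(rows.get(0).length);
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double[] r : rows) { min = Math.min(min, r[f]); max = Math.max(max, r[f]); }
        if (min == max) return node; // nothing to split on for this feature
        node.feature = f;
        node.split = min + rng.nextDouble() * (max - min);
        List<double[]> l = new ArrayList<>(), r = new ArrayList<>();
        for (double[] row : rows) (row[f] < node.split ? l : r).add(row);
        node.left = build(l, depth + 1, maxDepth, rng);
        node.right = build(r, depth + 1, maxDepth, rng);
        return node;
    }

    /** Average path length of an unsuccessful binary search tree lookup over n points. */
    static double c(int n) {
        if (n <= 1) return 0.0;
        if (n == 2) return 1.0;
        double harmonic = Math.log(n - 1) + 0.5772156649;
        return 2.0 * harmonic - 2.0 * (n - 1) / (double) n;
    }

    private static double pathLength(Node node, double[] x, int depth) {
        if (node.isLeaf()) return depth + c(node.size);
        return pathLength(x[node.feature] < node.split ? node.left : node.right, x, depth + 1);
    }

    /** Anomaly score in (0, 1]; higher means the point was isolated more quickly. */
    double score(double[] x) {
        double total = 0.0;
        for (Node t : trees) total += pathLength(t, x, 0);
        return Math.pow(2.0, -(total / trees.size()) / c(sampleSize));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[][] data = new double[256][2];
        for (double[] row : data) { row[0] = rng.nextGaussian(); row[1] = rng.nextGaussian(); }
        IsolationForestSketch forest = new IsolationForestSketch(data, 100, 128, 7L);
        System.out.println("inlier score:  " + forest.score(new double[] {0.0, 0.0}));
        System.out.println("outlier score: " + forest.score(new double[] {8.0, 8.0}));
    }
}
```

A point far from the training cloud should score higher than one at its centre; the tree count and subsample size used in `main` are arbitrary choices for the demo.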

------
londogard
Hi,

What would you say differentiates you from Smile, which includes a simple
dataframe, visualisation, support for CBLAS, etc.?

Is speed on par?

~~~
craigacp
We have a strong focus on provenance: Tribuo models capture their input and
output domains, along with the configuration necessary to rebuild a model.
Tribuo's also more object oriented. Nothing returns a bare float or int; you
always get a strongly typed prediction object back which you can use without
looking things up. Tribuo is also happy to integrate with other ML libraries
on the JVM like TensorFlow and XGBoost, providing the same provenance/tracking
benefits as standard Tribuo models, and we contribute fixes back to those
projects to help support the ecosystem. Plus we can load models trained in
Python via ONNX.
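
To make the "no bare float or int" point concrete, here is a rough plain-Java sketch of the idea. The classes below are hypothetical stand-ins written for this illustration and are not Tribuo's actual types: the prediction carries the label name and every per-label score, so the caller never has to map a class index back to a name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a typed prediction; NOT Tribuo's API.
final class TypedPredictionSketch {

    /** A named label with its score, instead of a bare class index. */
    static final class Label {
        final String name;
        final double score;
        Label(String name, double score) { this.name = name; this.score = score; }
    }

    /** The winning label plus the score of every label in the output domain. */
    static final class Prediction {
        final Label output;
        final Map<String, Label> scores;
        Prediction(Label output, Map<String, Label> scores) {
            this.output = output;
            this.scores = scores;
        }
    }

    /** Pairs raw model scores with the stored output domain and picks the best label. */
    static Prediction toPrediction(String[] outputDomain, double[] rawScores) {
        if (outputDomain.length == 0 || outputDomain.length != rawScores.length) {
            throw new IllegalArgumentException("domain and scores must be non-empty and the same length");
        }
        Map<String, Label> scores = new LinkedHashMap<>();
        Label best = null;
        for (int i = 0; i < rawScores.length; i++) {
            Label label = new Label(outputDomain[i], rawScores[i]);
            scores.put(label.name, label);
            if (best == null || label.score > best.score) best = label;
        }
        return new Prediction(best, scores);
    }

    public static void main(String[] args) {
        Prediction p = toPrediction(
                new String[] {"setosa", "versicolor", "virginica"},
                new double[] {0.1, 0.7, 0.2});
        System.out.println(p.output.name + " " + p.output.score); // prints "versicolor 0.7"
    }
}
```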

To your direct question, I've not benchmarked Smile against Tribuo. We are
very interested in the upcoming Java Vector API
([https://openjdk.java.net/jeps/338](https://openjdk.java.net/jeps/338)),
targeted at Java 16, which will let us accelerate computations that C2 or
Graal don't autovectorise.

------
suyash
Happy to answer any questions you may have about Tribuo.

~~~
nikhilgk
This looks really interesting! Kudos and thanks for the good work. Some
questions:

- What does the future roadmap look like?

- Are you planning on adding more algorithms?

- Any plans to bring in dataset and dataframe handling capabilities such as
in numpy/pandas etc.?

- What other interop features with other languages/platforms are planned?

- Any plans for AutoML features?

~~~
craigacp
- Short term roadmap is here:
[https://github.com/oracle/tribuo/blob/main/docs/Roadmap.md](https://github.com/oracle/tribuo/blob/main/docs/Roadmap.md).
Longer term we'd like to see what the community wants.

- Yep.

- There are various efforts on the JVM to build multidimensional arrays, and
we're talking to many of them to try and figure out a strategy for the whole
platform. Ditto for dataframes, though Apache Arrow looks like a good
baseline.

- We're not looking at other languages outside of the JVM at the moment, but
we're continuing to contribute to TensorFlow Java and ONNX Runtime to improve
their Java support. We could look at PyTorch inference support based on their
Java API, but that overlaps pretty well with the things that ONNX Runtime
supports. Do you have any suggestions?

- Not beyond hyperparameter tuning.

~~~
nikhilgk
> We're not looking at other languages outside of the JVM at the moment, but
> we're continuing to contribute to TensorFlow Java and ONNX Runtime to
> improve their Java support. We could look at PyTorch inference support based
> on their Java API, but that overlaps pretty well with the things that ONNX
> Runtime supports. Do you have any suggestions?

Many models are deployed as RESTful endpoints, so a quick and easy way to
deploy models as services with containers or serverless providers would be
very useful. Admittedly, you might not want that in the core project, but it
could be a good sidecar project. Given your focus on model provenance,
extending that to model deployment and life cycle management tools such as
MLflow could also be very useful.
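
As a rough idea of what the smallest version of such an endpoint could look like, here is a sketch using only the HTTP server that ships with the JDK. The `predict` method is a hypothetical stand-in for a real trained model, and the `/predict` path, `features` query parameter, and port are assumptions made for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch of serving predictions over HTTP; the "model" is a hypothetical stub.
final class ModelEndpointSketch {

    /** Stand-in for a trained model: thresholds the sum of the features. */
    static String predict(double[] features) {
        double sum = 0.0;
        for (double f : features) sum += f;
        return sum > 1.0 ? "positive" : "negative";
    }

    /** Turns a query string such as "features=0.2,0.9" into a small JSON body. */
    static String handleQuery(String query) {
        if (query == null || !query.startsWith("features=")) {
            return "{\"error\":\"expected ?features=x1,x2,...\"}";
        }
        String[] parts = query.substring("features=".length()).split(",");
        double[] features = new double[parts.length];
        try {
            for (int i = 0; i < parts.length; i++) {
                features[i] = Double.parseDouble(parts[i].trim());
            }
        } catch (NumberFormatException e) {
            return "{\"error\":\"features must be numbers\"}";
        }
        return "{\"label\":\"" + predict(features) + "\"}";
    }

    /** Starts the server; for brevity every response is sent with status 200. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/predict", exchange -> {
            byte[] body = handleQuery(exchange.getRequestURI().getQuery())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // then try: curl "http://localhost:8080/predict?features=0.2,0.9"
    }
}
```

A real deployment would also need proper status codes, input validation, and a way to load and version the model, which is where the provenance information would come in.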

~~~
suyash
Yes, I'll be building some demos showing just that using cloud services.

------
RocketSyntax
What is the use case for non-Scala? Apps written in Java?

