
Anyscale, from the creators of the Ray distributed computing project, launches - ceohockey60
https://techcrunch.com/2019/12/17/anyscale-ray-project-distributed-computing-a16z/
======
thinkingkong
Ray seems to be specifically focused on scaling AI systems. That is important
context, and it's missing from the article.

[https://bair.berkeley.edu/blog/2018/01/09/ray/](https://bair.berkeley.edu/blog/2018/01/09/ray/)

~~~
ignoramous
Previous discussions on the Ray project:

[https://news.ycombinator.com/item?id=15481169](https://news.ycombinator.com/item?id=15481169)

[https://news.ycombinator.com/item?id=16510610](https://news.ycombinator.com/item?id=16510610)

[https://news.ycombinator.com/item?id=20064241](https://news.ycombinator.com/item?id=20064241)

------
vonnik
Ray user here. Their adoption is exponential, much faster than Spark's when it
was "the same age." It's not limited to AI workloads; that's just its flagship
use case, with RLlib.

~~~
choppaface
Has Ray grown beyond reinforcement learning? Ray appears to lack a datastore
or locality mechanism for at-rest distributed data, which is really important
for ETL and SQL workloads. So far Ray looks like it has some nice features
that are missing from the Spark RDD API, but for ETL / SQL it looks like you’d
only choose Ray there for your own entertainment.

It would be compelling if Ray were to provide Horovod support rather than the
authors re-applying some of the same research in their own thing. Ray's
programming API + code distribution + Horovod performance primitives is what
the community probably wants.

~~~
agibsonccc
It's still pretty early days for Ray. That being said, Spark never really
got the hang of doing machine learning properly. It "works," but not for the
newer workloads that Ray is trying to support.

It's good someone is building a company around it. I could see them building
services on top of it and offering a SaaS, like Databricks did with Spark.

I'll be curious to see how ray matures.

~~~
choppaface
I agree that ML on Spark was only a limited hit (iterative jobs were actually
feasible, unlike on Hadoop), but I have yet to find a better ETL and SQL tool,
and that’s a big part of most ML projects.

I’m worried about Ray as a SaaS company because so far it looks to me like
they’re riding reinforcement learning hype. They’d need to really penetrate the
users of Horovod and TensorFlow Distributed to get beyond a beachhead. And what
if TPUs and Cerebras become more common? Then the market for multi-machine
workloads becomes smaller (definitely not zero, though).

~~~
agibsonccc
Your concerns are right on point. I agree that Spark is a great SQL/ETL tool.
My thinking was on the "math execution" part. Ray is able to do a bit more
there. I do feel like there is a bit of hype riding going on here as well.

One interesting thing that could happen is the hardware gets better, and then
these distributed schedulers might not be able to keep up with all the
different options on the market.

There is also the tension between hardware vendors wanting to give away things
that only run on their chips and software makers who want things to run on
every chip. It seems like there will be a lot of competition among the various
infra players in the next few years now that NVIDIA is starting to have real
competition (even if it's not big yet).

~~~
choppaface
Just to qualify that "math execution" part, the beauty of Ray is that you get
threadpool-like features to speed up arbitrary Python code. So not just
parallelism, but state/variable sharing _for relatively small data_. So this
is great for some optimizers and definitely RL (where your "math" is some
really complicated simulation / loss logic), but Ray wouldn't make much sense
for BLAS stuff. Am I missing something here?

Ray shows expertise in multi-machine execution that's lacking in stuff like
JAX, TensorFlow, and PyTorch. Horovod nailed down a lot of the performance
issues for SGD in particular, but is missing the sort of rapid deployment /
distribution stuff in Ray. If only they could all work together ...

------
one_electron
another one!

berkeley rise lab is such a powerhouse - spark, mesos, etc. just in the past
few years. it really puts some of these big tech companies to shame.

~~~
zamadatix
I think Spark and Mesos are about a decade old at this point. I'm not really
sure anyone is being put to shame either; there are wildly successful
alternatives to all of these.

