
Ask HN: How are distributed systems as a field of study compared to ML/AI? - distsysdude
I currently work on database development at Oracle. I would like to learn about something other than DB internals, and I've recently been fascinated by distributed databases like CockroachDB, AWS Aurora, Azure Cosmos, TiDB, etc.

So I was thinking I'd dive deep and learn more about distributed systems and possibly try to switch to one of the companies mentioned above.

I would like HN's opinion about the scope for Distributed Systems (& Distributed Databases) as a career choice, compared to the current hot areas like Machine Learning/AI.

A bit of background about me: I did my undergrad in a non-CS field, but I've had an interest in Computer Science since high school, so the switch to a full-fledged CS job was not that difficult. I'm self-motivated and willing to spend copious amounts of time learning something that would help with my career. But I'm confused as to what to study.

Should I stick with my current interests, or follow the industry trend and try to learn ML/AI?

Thanks!
======
montereynack
I’m biased as my work is primarily focused on large-scale distributed physics
simulations, and incorporating machine learning into these. As a result, I
treat ML very much as a means to an end.

Of course with the caveat that your situation is unique to you so I can’t give
any definitive answers, I would think long and hard before jumping on the ML
hype train. In my experience, it doesn’t pay to follow the trend; you’ve
either gotta be first or you gotta be unique. Now that’s not to say that doing
ML work will only be restricted to a select few which you aren’t a part
of, but myself and a few others are wary that the ML hype train (at least as
far as deep learning is concerned) might be passing. The days of the AI labs
paying million-dollar bonuses are nearly gone, unless (and someone can correct
me if I’m wrong) you’ve got an alternate skill set they’re looking for. Of
course, that doesn’t mean there aren’t plenty of people and businesses who
would need CRUD-type ML setups; with your experience in databases I imagine
that could be a unique angle to attack it from. Whether it’s a good idea to
try and pivot into a career using ML really depends on your specific situation
and the opportunities therein; to get more solid advice I would ask a trusted
colleague or mentor, and would not consult people online, even if they are
from HN.

For my PERSONAL opinion: I can’t speak to what is normally done in other parts
of distributed systems, since scientific tools are usually bespoke and don’t
use the same set of approaches as commercial products. However, just thinking
about it from an outsider’s POV, it seems to me like focusing more on
distributed systems would be a winning combination. I don’t think computers
will advance enough in the next 30 years that the need for distributed data
and compute management skills will go away; hell, with IoT you might be looking
at a boom down that career path. From my perspective it’s only upside if you
focus on expanding your skillset in these areas; if ML continues to thrive
there’ll most definitely be a need for distributed systems to run these models
on. And if an AI winter hits, you’ll have a solid set of core skills to fall
back on which I don’t imagine will go out of favor anytime soon. Those are
just my two cents though, of course YMMV.

~~~
distsysdude
Thank you for taking time to answer this!

I never knew that distributed physics simulations could be a career field. I
always thought that such problems would be handled by scaling up vertically or
just throwing a super computer at it.

If you don't mind, can you please elaborate a bit about the type of work that
you do and scope of problems that you solve every day?

Thanks again!

~~~
montereynack
I can elaborate a bit. Most of the large-scale problems are actually as
straightforward as “just throw a supercomputer at it”. However, just like when
mathematicians say “that’s an implementation detail”, it turns out that
actually throwing a supercomputer at the problem is much more difficult to do
in practice than merely setting up some shell scripts to run, especially where
scientific computing is concerned. For one, there’s usually no concept of
“micro services” or “containerized” applications, at least not in my
experience. Most of the modern distributed computing practices are actually
thrown right out the window when it comes to scientific computing, since the
scientists are going to be directly programming distribution schemes via MPI
and stuff. The reason is that academic projects don’t have lots of money
and need to efficiently use every dollar, and that most of the time generic
distribution schemes really aren’t suitable for the science. You might have
one layer where node interactions occur according to some mathematical and
physical criteria instead of “load” or some other abstract flag, for example;
that’s a bit harder to code for, and it’s much better to have a domain
scientist who knows the physics deciding how to decompose the problem, instead
of a computer scientist who has no idea of the physics adopting a scheme which
ignores the problem entirely. Hence why I said most scientific tools are
“bespoke”.
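
To make the decomposition point a bit more concrete, here is a minimal sketch in plain Python. It makes no MPI calls; the "ranks" are just integers standing in for nodes. Everything in it is an assumption for illustration: a hypothetical 1D particle system in a non-periodic box, cut into equal-width slabs, with an interaction cutoff smaller than one slab. The function names (`owner_rank`, `decompose`, `halo_targets`) are made up. The point it tries to show is that which nodes must talk to each other falls out of the physics (position and cutoff) rather than out of a load metric.

```python
# Hypothetical sketch: physics-driven decomposition of a 1D particle system.
# No real MPI is used; a "rank" is just an integer standing in for a node.
from collections import defaultdict


def owner_rank(x, box_length, n_ranks):
    """Rank that owns position x; the box is cut into equal-width slabs."""
    slab_width = box_length / n_ranks
    # Clamp so that x == box_length still maps to the last rank.
    return min(int(x / slab_width), n_ranks - 1)


def decompose(positions, box_length, n_ranks):
    """Group particle indices by the rank whose slab contains them."""
    owned = defaultdict(list)
    for i, x in enumerate(positions):
        owned[owner_rank(x, box_length, n_ranks)].append(i)
    return dict(owned)


def halo_targets(x, box_length, n_ranks, cutoff):
    """Other ranks that need a copy of the particle at x.

    Assumes cutoff < slab width and a non-periodic box, so only the
    immediate left/right neighbours can be affected.
    """
    home = owner_rank(x, box_length, n_ranks)
    targets = set()
    for probe in (x - cutoff, x + cutoff):
        if 0.0 <= probe < box_length:
            rank = owner_rank(probe, box_length, n_ranks)
            if rank != home:
                targets.add(rank)
    return sorted(targets)


if __name__ == "__main__":
    positions = [0.5, 2.4, 2.6, 9.9]
    print(decompose(positions, box_length=10.0, n_ranks=4))
    for x in positions:
        print(x, "->", halo_targets(x, 10.0, 4, cutoff=0.3))
```

In a real code the slab boundaries, the cutoff, and the exchange itself would be chosen by the domain scientist and carried out with actual message passing; this sketch only shows the bookkeeping.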

The result is that most of the distributed systems people are moved to a
supporting role, where their job is to develop tooling and libraries to allow
for better communication between nodes, for example. I’ve also heard of some
compsci people being directly integrated into these scientific teams to
develop specialized APIs and such in-house, but that’s a bit more rare imo.
These are just some examples of how science and compsci intersect; here’s one
group I know of, for instance:
[https://www.ornl.gov/group/dcs](https://www.ornl.gov/group/dcs)

------
s1t5
They're basically everywhere, there's a lot to learn, it's a valuable skill
for a potential employer and you're interested. Seems pretty obvious that you
should go deeper with distributed systems. Also, your lack of formal education
won't hold you back as much as it would with ML. The idea that you should go
do math for the next couple of years just because the field seems hot doesn't
seem well thought out.

~~~
distsysdude
Thank you for taking time to answer this!

I honestly thought that my lack of formal education would be a hurdle when
learning/working on Distributed Systems. Don't you agree?

Also, I've always wondered how ML models are deployed in production at scale.
Building a model using libraries in Python seems fine, but how do they
distribute and deploy it in Production?

------
drallison
Huh? Seems to me that you should be interested in and learning about all
manner of computer-related things including Distributed Systems, Machine
Learning, and Artificial Intelligence. Knowledge and techniques useful in one
context are generally useful in other contexts. The ability to see structure
and similarities across systems and applications is invaluable.

~~~
distsysdude
I'm not quite sure about that. For example, I don't think any major database
vendor has considered using ML algorithms inside a DB. Google did try
replacing B-Tree indexes with neural networks, but that was only a research
project and it wouldn't have been possible to scale it up to meet production
demands.
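
For readers unfamiliar with the idea, here is a toy illustration in Python of what "replacing an index with a model" can mean. It is not the method from that research project: it swaps the neural network for a single least-squares line, and the class name `LinearLearnedIndex` is made up. It assumes a non-empty, sorted list of distinct numeric keys. The model predicts where a key sits in the sorted array, the worst prediction error seen at build time bounds how far off it can be, and a binary search inside that window corrects the guess.

```python
# Toy "learned index": a straight line predicts a key's position in a sorted
# array, and a bounded binary search fixes up the prediction error.
# Assumes a non-empty, sorted list of distinct numeric keys.
import bisect


class LinearLearnedIndex:
    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (p - mean_pos)
                  for p, k in enumerate(self.keys))
        # Ordinary least-squares fit of position as a function of key.
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Worst-case error over the stored keys bounds the search window.
        self.max_err = max(abs(p - self._predict(k))
                           for p, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    index = LinearLearnedIndex([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])
    print(index.lookup(11), index.lookup(4), index.max_err)
```

The sketch ignores inserts, deletes, and skewed key distributions (where one line fits badly and the search window grows), which is where the production concerns mentioned above would come in.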

But I think I understand where you're coming from and I should definitely try
to widen my knowledge base.

