Ask HN: How to pivot to a Machine Learning engineer?
103 points by lma21 9 months ago | hide | past | favorite | 55 comments
Software engineer here with 10+ YOE building data (mildly) intensive applications: mainly back-end development experience (from legacy to modern/cloud-native applications, brownfields and greenfields).

(1) is it wise to do this transition?

(2) has anyone else here in HN done it?

(3) how can I do it if my job has no ML in it?

Is there an ML engineering practice that isn't focused on building models but more on managing/deploying/scaling models? i.e. can I avoid learning all the maths underneath?




I was an MLE Tech Lead at Snap and laid some of the foundations of the generative AI infra there. I would highly recommend the MLE route as a very rewarding career path.

This book is a very good introduction to designing Machine Learning Systems for production: https://www.amazon.com/Designing-Machine-Learning-Systems-Pr...

This blog by the same author is highly recommended as an intro into building production grade AI and ML systems: https://huyenchip.com/2023/04/11/llm-engineering.html

To summarize answers to your questions:

(1) Yes, it is wise to make this transition, especially at an inflection point in the zeitgeist like the one we're at now.

(2) Yes

(3) See above for resources on how to get started and reach mastery in the craft of ML Engineering.


I dipped my toes in about 7 months ago with a 3-month-long project to make content recommendations using ML. I started with off-the-shelf collaborative filtering libraries and ended with PyTorch. ChatGPT was a huge help. I would have been OK continuing down that route, but execs wanted faster and better results, and 3 months is just enough time to really get a flow going when starting from no experience.

A lot of ML was cleaning up and preparing datasets, which wasn’t that much fun. I had an exec pushing for using Amazon Personalize, which I gave a good try but ultimately it didn’t deliver. Was that because of data problems or underlying models? That’s the crux of the problem when using black-box ML services: you can’t analyze what’s going on. And Amazon Personalize makes changing the data layer a pain, so you never know if you’re getting closer to a better solution. Who knows, Personalize is probably great in the hands of experienced ML folks.

So, if you can swing it, definitely recommend doing an ML trial project to see if you like it before committing your career to it.


> Is there an ML engineering practice that isn't focused on building models but more on managing/deploying/scaling models? i.e. can I avoid learning all the maths underneath?

I transitioned to this in 2018. It was called MLOps. I was a mobile developer before that. Transitioning was pretty easy at that time (it might be more competitive now). What I did: worked on an intensive ML project for myself and realized I enjoyed working across the ML stack. I wrote a blog post on the project here: https://www.nicksypteras.com/blog/aisu.html. Then I applied to an MLOps team and leveraged the project to demonstrate skills/experience. As for avoiding the math, you could probably get away with it but learning the basics will make everything way easier. I feel like without some basic ML math under my belt I would have been flying very blindly.


I think they call that focus "AI Engineer".

Edit: just realized you might also be thinking of "MLOps". See end of comment.

It's what I have been doing for the last two years but I refer to it as "software engineer with a recent focus on generative AI".

I also think AI Integration Engineer is good but I have only really seen AI Engineer.

The thing is, up until a few years ago, doing useful things with AI generally did require something more like machine learning knowledge.

But now that we have general-purpose models like GPT-4o, Claude 3.5, LLaVA, etc., you can just make an API call and in a day or two have a more functional system than what a machine learning engineer might previously have spent months training a custom model to achieve.

So that somewhat explains the confusion. I think it's best to just be honest: most applications don't actually need "real" ML knowledge, custom neural network architectures, or training a model from scratch.

I do think that ML is a good field to be in if you have the patience to learn the math and about neural networks etc. But I think that is not what you are talking about. And the architectures and models are very general purpose so as I said you can work on many applications of ML now without having that background.

Go to the Anthropic or OpenAI documentation and copy paste their examples and try inserting some customization into the system message using an f-string.
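As a minimal sketch of that idea (the function name, model, and prompt text here are just placeholders, assuming the OpenAI Python SDK):

```python
# Build a chat request whose system message is customized with an f-string.
# Everything here is illustrative; only the message format matches the API docs.

def build_messages(user_question: str, brand_voice: str) -> list[dict]:
    """Assemble a chat request with a customized system message."""
    system = f"You are a support assistant. Answer in a {brand_voice} tone."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_question},
    ]

messages = build_messages("How do I reset my password?", "friendly")

# With an API key set, the actual call looks roughly like:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(model="gpt-4o", messages=messages)
#   print(resp.choices[0].message.content)

print(messages[0]["content"])
```

That's genuinely most of what a lot of "AI Engineer" work looks like day to day: shaping prompts and wiring the responses into an application.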

I think MLOps is also a thing. Go to HuggingFace and RunPod and practice deploying models with Python. Also find some tutorials on LLM pre-training, fine-tuning, and evaluations. Check out Predibase.

A big thing right now I believe is Diffusion Transformers. If you can find some article explaining how to run a training job for that, you may be able to help people.

If you want to "cheat", check out replicate.com. Cog could also be useful for self-hosting ML models outside of replicate.com.


I'm not "pivoting to a ML engineer", but in the last 2.5 months I've learned, to some extent, to use public models and the tools and APIs to train and run them. That was a lot of reading with little code writing.

I didn't pivot into it, that was part of the project (object recognition in a video stream).

It helps if you work with small organizations that don't box you into a role but just give you stuff to do.


Could you please share some of the learning resources you used and found useful?

I am overwhelmed by the amount of ML-related material out there and am having a hard time figuring out what is worth my time as a software engineer.


The most important thing is I have a fixed goal and a paying customer to keep happy :)

The resources are all crap to be honest. Half of them have been obsolete for a year or more and most of the rest look like they're done for self promotion and assume you already know everything. Every public repo you run into has already been forked 3 to 10 times and now you have to find out which one(s) is/are up to date.

I had one of these toys:

https://shop.luxonis.com/collections/oak-cameras-1

They're cameras with a colour cam, two b&w cams that give you depth info (z-distance) and a small coprocessor from Intel that can run a reduced neural network directly on the cam.

The other thing they have is their own API and pretty good documentation for it.

It won't teach you ML math, but you'll get used to loading pre-trained models and getting your data out of them. Then you'll move on to training your own model (look at DarkMark, for example), converting models between various formats, and other things like that which I'm still learning.

And you get a pretty fun toy!

Mind, I've only worked with object recognition. I have no idea about LLMs or other applications of neural networks.


Karpathy’s Zero to Hero series on YouTube is considered one of the gold standards for starting out.


1) depends, I find ML much more interesting than software engineering.

2) Not me

3) With difficulty, but through courses like fast.ai

If you’re not into maths and problem solving this is probably the wrong path. The main value add you bring is being able to transform a business problem into solvable maths.

Read Introduction to Statistical Learning and look at the fast.ai courses. I also recommend the paper "Attention Is All You Need". If you find all of those things interesting and not too mathsy, then I’d say go for the switch. Otherwise, maybe look at options in data engineering or as a software engineer on an ML team.


When people talk about working with ML requiring a lot of math, what do they mean? That is to say, I have a degree in electrical engineering so I learned a lot of math. However, my programming career hasn't required me to actually use it since I graduated.

So I understand/"know" a lot of math. However, it would be tough for me to build back up to the point where I can, for example, solve differential equations again.

In a manner of speaking: Does a career in ML require a strong understanding/knowledge of math or does it require you to be able to solve a lot of math?


Visualizing 3D matrix multiplications, and getting comfortable with it. Then there's basic calculus in understanding gradient descent. Can't think of any other advanced math that was necessary to grok the innermost workings of most models today.

Source: I won a silver medal in a kaggle competition after 6 months of ML self-learning.
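The matrix-multiplication point above can be made concrete with a tiny hand-rolled example (the numbers are arbitrary): the shape rule (m × k) @ (k × n) → (m × n) is the one behind every dense layer.

```python
# Hand-rolled matrix multiply, just to internalize the shape rule.

def matmul(A, B):
    m, k = len(A), len(A[0])
    k2, n = len(B), len(B[0])
    assert k == k2, "inner dimensions must match"
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3, e.g. a batch of 2 inputs with 3 features
W = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2, e.g. weights mapping 3 features -> 2 outputs
print(matmul(A, W))      # [[4, 5], [10, 11]], a 2 x 2 result
```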


> Does a career in ML require a strong understanding/knowledge of math or does it require you to be able to solve a lot of math?

No. With ANNs, unless you're doing research into totally new ML architectures, you're not gonna do any maths apart from some arithmetic. And even if you do, it's usually quite simple maths, mostly matrix multiplications and simple non-linear transformations of scalars.

Nothing even close to solving differential equations, and there is very little analytical solving of anything. Advanced stuff may need some statistics in the theory side, but not really in applications.
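To illustrate how simple that maths really is, here is a single "neuron" in plain Python (the weights are made up): a dot product followed by a non-linear transformation of a scalar.

```python
import math

# A weighted sum plus bias, passed through ReLU -- the whole "unit" of an ANN.

def neuron(x, w, b):
    z = sum(xi * wi for xi, wi in zip(x, w)) + b  # dot product plus bias
    return max(0.0, z)                            # ReLU non-linearity

def sigmoid(z):
    """Another common scalar non-linearity."""
    return 1.0 / (1.0 + math.exp(-z))

print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # 0.5*1 - 0.25*2 + 0.1 = 0.1
print(sigmoid(0.0))                           # 0.5
```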


Just to be sure I understand the context: are you working as an MLE, or where did you get that knowledge?


No. I use ML in academic research and teach it. Some of that comes close to what is done in the "real world", e.g. training and evaluating LLMs with real-world datasets.


My 2c: there isn't really a lot of math underneath many ML jobs in practice. But I would read up on the basics: get a basic understanding of optimization and gradient descent. If you are not math-inclined, do not bother untangling the chain rule for networks; instead, play with a few toy, non-ML, low-dimensional problems. That understanding will help a lot.

And I would look at applying existing models to business problems. That I think has a lot of demand now. Solve a few problems for yourself, make a few blog posts and apply.
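A toy, non-ML, one-dimensional problem of the kind suggested above might look like this (all constants are arbitrary): minimize f(x) = (x - 3)² with plain gradient descent.

```python
# Gradient descent on a one-dimensional toy problem: minimize (x - 3)^2.

def grad(x):
    return 2 * (x - 3)        # derivative of (x - 3)^2

x, lr = 0.0, 0.1              # start far from the minimum; small learning rate
for _ in range(100):
    x -= lr * grad(x)         # the entire algorithm: step against the gradient

print(round(x, 4))            # converges to 3.0
```

The same loop, with the gradient computed by backpropagation instead of by hand, is what every neural network training run is doing underneath.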


Answer: Just do whatever the fuck is fun for you. Or has the best expected value, if you care about money.

Edit:

>i.e. can I avoid learning all the maths underneath?

Why would you want to avoid that? Math makes everything easier. It doesn't even matter which field you work in.


> Math makes everything easier

This can't be overstated. Understanding the math behind what you are doing can turn mind-bogglingly hard problems trivial.


A lot of great takes in this thread, but let me give some more perspective on the math, as an MLE/MLOps engineer myself.

For me, mathematics and statistics represent (a) unemployment insurance, since they give me a lot of mobility across roles in the space, (b) a good toolbox for talking as an equal with data scientists, and (c) the ability to chime in on any implementation made by the Research Engineers/Data Scientists, give insights, and avoid wasted time and resources.

One example: when BERT was released (ca. 2018), I was working at a place where several Research Engineers and DSs wanted to use it in production for text classification.

The issue was that BERT was architecturally suboptimal due to a process called masking [1], which significantly increases training time, and its inference time was not great either. The alternative I proposed at the time was a mechanism called "Bag of Tricks" [2], which is a very efficient modification of Bag of Words. Knowing the math (and being on top of the literature) saved me from implementing something that would have been inherently inefficient. Without that, it's hard to push back on DS/ResEng.

[1] - https://datascience.stackexchange.com/questions/97310/what-i... [2] - https://arxiv.org/abs/1607.01759
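For context, a hedged sketch of the plain bag-of-words representation that "Bag of Tricks" builds on (the real fastText model averages learned word/n-gram embeddings and feeds them to a linear classifier; the sentences here are made up):

```python
from collections import Counter

# Each text becomes a vector of word counts over a shared vocabulary.
# fastText's "Bag of Tricks" replaces these raw counts with averaged,
# learned embeddings, which is what makes it both fast and accurate.

docs = ["the cat sat on the mat", "the dog ate my homework"]
vocab = sorted({w for d in docs for w in d.split()})

def bow_vector(text):
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

print(vocab)
print(bow_vector("the cat sat on the mat"))  # [0, 1, 0, 0, 1, 0, 1, 1, 2]
```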


I actually wrote a blog post about this for experienced software engineers like you who are thinking of transitioning to ML, so I wanted to share it here: https://www.trybackprop.com/blog/2024_06_09_you_dont_need_a_...

I write about various engineers who now work at Meta, Google, Amazon, and OpenAI who made the switch. You can see what strategies and tactics they used to do it.

1) It's "wise" if you find yourself enjoying hacking on it during your personal hours. Before I made the switch, I spent a year studying the material on nights and weekends, so that was my first data point that perhaps this was something I wanted to do full time.

2) Yes, I have! And I've been an ML engineer for 7 years now after I made the switch. For context, I'm an ML tech lead at FAANG. Prior to that, I worked in infrastructure and product.

3) One piece of advice I got on this years ago is to join a team adjacent to ML work so that you can get familiar with what production ML looks like. You can also start practicing ML thinking on Kaggle.com.

P.S. You can check out other posts in my blog for resources to learn AI/ML and the math needed for this career, such as my Linear Algebra 101 for AI/ML series: https://www.trybackprop.com/blog/linalg101/part_1_vectors_ma... (includes interactive quizzes, fundamentals of vectors/matrices, and a quick intro to PyTorch, an open source ML framework widely used in industry)


How much math is truly necessary to work as MLE in a company where you do not need to write papers but need to deliver working ML systems?


Just basic stats, basic calculus, etc., to work with data and use ML algorithms/techniques.


You should try fast.ai's Practical Deep Learning for Coders, parts 1 and 2. It's somewhat dated (2022), but the principles you learn are very valid and highly useful in today's context, especially self-attention, transformers, and the newer architectures based on these concepts.

Many who have done the fast.ai course have pivoted their careers not only into ML engineering but also into research scientist roles.

It's not an easy course, so to speak, so you have to work through it in your spare time.

Since you are interested in deploying/scaling feel free to jump straight ahead to lesson 2 of part 1. Jeremy is an awesome teacher. I don't like or come from academia so I find his style of teaching very wholesome.

https://course.fast.ai/Lessons/lesson2.html


Great question!

1) Depends on your own goals. It's wise if you believe that, going forward, the market will value ML engineers more highly than software engineers (with the same years of experience).

2) Yes. I did (> 3 years now)

3) Start looking for an ML job.

> Is there an ML engineering practice that isn't focused on building models but more on managing/deploying/scaling models?

Yes. Companies who adopt ML at large scale usually need this.

> i.e. can I avoid learning all the maths underneath?

Yes, it's possible as long as you focus on infrastructure, e.g. creating/monitoring pipelines, models, etc.

Note: while you're looking for a job, try to take a professional ML engineering certification/program (Google, AWS, Azure, etc.). They provide hands-on lab experience with some case studies, and the cost is quite low.


I would recommend understanding the ML algorithms at least on a conceptual level, and using them "raw". That's quite straightforward nowadays with e.g. the transformers library.

The maths in ANN ML aren't that hard, and you don't need to understand them very deeply even to come up with new models. A lot of the new model development is just stacking pre-built layers with torch.nn.Module.

The difficulty comes from getting the beasts to actually work. But it's largely trial and error for everybody.
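A minimal sketch of the layer stacking mentioned above (layer sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# A hypothetical two-layer classifier built purely by composing pre-built
# torch.nn layers -- no custom math anywhere.
class TinyClassifier(nn.Module):
    def __init__(self, in_dim=16, hidden=32, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier()
logits = model(torch.randn(4, 16))  # batch of 4 fake examples
print(logits.shape)                 # torch.Size([4, 3])
```

The "trial and error" part is everything around this: data, learning rates, regularization, and figuring out why the loss isn't going down.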


There’s definitely going to be plenty of work in the area you’re describing, but I think it’s worth going in eyes-open and making sure you’re doing it for the right reason.

AI/ML is already becoming rapidly commoditised, and the level you're talking about is very infrastructure/platform orientated, whereas all the real action (and potentially higher-value stuff) is going to happen an abstraction level or two above that.

It’s kind of like when electricity was invented. Do you want to become an engineer working on building the electricity grid (rapidly commoditised) or do you want to be the inventor/builder working on new things powered by this fancy new electricity?

The analogy isn’t perfect but I always think it’s worth carefully thinking about what level of abstraction you want to work at. Working in the infrastructure layer is definitely fun and rewarding. I would happily do that kind of work too. I love Devops. But like a lot of infrastructure level stuff it may not be where the “action” really is eventually.

But it depends what your motivations are too. “Do what you enjoy” is always a good way to set a general direction I think.

Just my two cents. I’m just an internet moron. I could be wrong.


I transitioned into an ML Engineer.

There are different types of roles in an ML project.

1. Data scientists - These are the people who analyze the data and prepare the models. They basically deliver a Jupyter notebook to us.

2. ML Engineers - We take the notebooks from the data scientists and productionize them.

3. MLOps - These are the people who take care of the required infra, basically the equivalent of DevOps.

Personally, I worked as a hybrid data scientist/ML engineer. I liked being an ML engineer better.


How well do you know math? Do you really know math?


Yes, well enough to get by. Not enough for research or coming up with new ML algorithms, etc.


Which math should I learn to get to your level?


I transitioned to what's now referred to as ML engineering 11 years ago. As a programmer, I started by working on scripts doing all sorts of ETL (in Python, which was my intro to the language) and handled saving and loading datasets for training. I managed model serving (we were using Theano at the time) in prod via REST APIs. I also worked (and still do) on writing model architectures with DL experts, and I can say I still don't have a solid understanding of the maths, but I sure know how to manipulate matrices and write ML layers/models when working with an expert.

There are other aspects to the job (like dispatching experiments) but the point is that I was able to bring value in all of this as a programmer without requiring any new skills apart from the natural learning experience that one has to go through in any discipline. I think you can surely transition as long as your job requires it.


I'm looking into learning more about ML and how I can create my own models. I enjoy math, but when I read research papers the formulas sound like gibberish. I also like the idea of Kaggle competitions, do you think I need to have research-level math understanding of ML models to do well on Kaggle?


Yes, I believe you do need a good understanding of the math to see why a model is behaving the way it is and to develop intuition about where the problem might be, but that comes with experience. If you're looking into ML research, then you absolutely need to learn the math; but if you intend to support ML as a programmer, there's plenty of space for you to do so.


Thanks for the response! I'm torn between AI research and just implementing models with my own data, I do enjoy math but I can't seem to understand advanced math, that might change after I get an engineering degree though, or not ¯\_(ツ)_/¯

But for the time being, I'm just wondering if people who win Kaggle competitions implement their own algorithm, or do they just read a lot and try techniques that are already out there?


Good luck in your studies and plans!

I can't answer you about Kaggle competitions, as I was never involved in any, but I imagine the novelty would be in the way people encode the datasets and maybe the cost functions they implement; the models themselves are probably based on already-established architectures.


I've been wanting to do the same, but when I get far into machine/deep learning there is some high-level math involved, so I've been taking all the math courses on Khan Academy for the past couple of months: https://www.khanacademy.org/. Soon I'll finally be done with all the math listed there and will move back to learning more about deep/machine learning (even if I stick with programming, the math will help in all areas of my life). If anyone has a good math resource geared more toward machine learning that I can take after Khan Academy, I am all ears.


The last year has really intensified how often I see this question here, on Reddit, and elsewhere in my online travels.

You're describing MLOps.

Why would you want to make this transition if you're not keen on the underlying math? Why not include DevOps or SRE if you're interested in code-based pipelines, deployments, and scaling? How are you going to get into things like drift detection without understanding the math?

(...hoping for an answer that isn't shouted by Rod Tidwell at Jerry Maguire -- but genuinely curious why I'm seeing this so much and trying not to make a cynical assumption :) )


Trying to make the same transition right now. I’ve got almost 10 years experience with python and data engineering and I’ve been reading tutorials and playing with projects on the side.

I think I’ve got a grasp of the fundamentals and the ability to learn fast on the job, but every MLE job listing I see wants “4+ years of experience training and deploying models in a production environment” or something (even non-senior roles!). I’m not sure how to break into it: you need an MLE job to get the experience to acquire an MLE job. Does anyone have any advice?


Just the standard advice: it's usually much easier to switch into a new role (MLE, manager, ...) at your existing company than to land one of those roles directly at a new company. So if your current employer does not employ any MLEs, consider joining another company that does, but apply for a role you're currently well qualified for and then try to make the switch internally. Consider being up-front about that in the hiring process to get a signal on how supportive a company is.


It is not wise.

The AI/ML hype will die out, or you’ll be replaced by some third party service, and you’ll be left with no job, having to relearn how to be a regular software engineer.


You could try your hand at the following course and see how it feels? https://fullstackdeeplearning.com/course/ (I loved the experience.)

It is much more focused on the engineering than the maths of deep learning itself.


Is it a wise choice though?

I feel like the market is saturated with machine learning. Everyone seems to be doing machine learning these days.


Get a PhD in ML from a top school. If you can't, get a MS CS/DS with ML emphasis from a top school, AI grad cert from Stanford at a minimum so that you can understand the latest arxiv papers. If you can't, YOLO and sift through a lot of low-quality articles on the Internet, find the gold nuggets and learn to apply them rapidly and then hope somebody will notice you and hire you. Competition is brutal right now as AI is the only area that is still hiring like crazy. I still think you are 5-10 years too late to start right now. If you can do DevOps, you can likely learn MLOps quickly but it's the same horrible job as regular DevOps. Also, data engineering is not ML but those jobs are easier to find.

EDIT: For downvoters, that's how I did it. I was a very successful SWEng (some of my work was among top posts on HN under different nicks) but saw the ball rolling towards ML in 2012 so I reskilled.


Citation needed on "competition is brutal right now".

I'm seeing folks with their first and only workshop paper at an ACL track conference landing 150K offers starting at no-name startups. Some of these folks are not even 20 yet. Workshop papers are considered "easy" to publish, and are held in lower regard compared to main conference publications.

If it's "brutal" to compete against folks like this, I think a lot aren't cut out for this field.


There aren't that many folks who publish even workshop papers. Most folks are scared of academics and hope to blaze their way into ML with dev skills alone, which is unlikely to work, as they won't be able to grasp the concepts they need to implement, especially if they work on anything less than two years old. $150k is also on the low end.


There's plenty of demand for doing ML just by calling OpenAI or similar APIs as more or less total black boxes. Probably moreso than for designing and training your own models. And even then it's mostly taking a pretrained model from huggingface and doing fine-tuning and prompt churn by trial and error.

E.g. building or hosting state-of-the-art LLMs yourself is more or less infeasible for many/most use cases. (Applying LLMs successfully to many/most use cases is probably fundamentally infeasible too, but that doesn't mean you can't get paid doing it anyway.)


Those jobs are quickly getting commoditized - you can see it e.g. on TopTal where these types of jobs had $150/h last year and $60/h this year. But jobs like "create a framework for interpretable transformers based on some DeepMind research" are still at $250/h.


So 2014, the year before OpenAI was even founded, was too late to get into the ML space? Very interesting take.


No, but add ~5 years to master it if you are a decent academic performer, then an additional few years to learn how to scale it up. Some folks could master it faster, but most would likely fail due to lack of commitment. There is also a bunch of folks who still live in the RNN days and cast evil eyes at anyone who uses transformers (hello Jürgen!), so one has to consistently update their knowledge to stay relevant (CS25 could help there).


He does not want to be an AI/ML researcher, but an ML engineer.


Companies are picky. FAANG don't need puzzle solvers but want them anyway.


I've had that title.

It helped that I got a PhD in Physics so I knew a lot of math already, also I had been interested in ML for text classification circa 2005 or so. From 2001 to 2010 or so I was busy with a wide range of web applications such as: a social network for a secret society, a blog for a local political party with an integrated telephone response system, an application tracking system for a nanotechnology internship program, etc. There was plenty of brownfield work in there.

I had worked on a series of side projects that got attention and landed me a job as a "relevance architect" (recommender systems and such for a new social media site that softlaunched but didn't get big) and then an ML software engineer where I completed a search engine for patents based on a neural network.

After that I went through a phase of doing random consulting projects but also trying to start my own startup around data engineering problems that didn't get the support I needed. I learned Python because there was huge amounts of Python work in this space. I did a project involving LSTM networks for text around this time.

When I threw in the towel, I joined up with a company where I sat between the engineering and data science teams. Over the course of a year I figured out most of why our Python systems were not entirely reliable, but the company was pivoting a lot, and towards the end I was writing more TypeScript + Scala. Even though I had the physics education, I felt my long experience as an applications developer was more important to this role: we had data scientists who were better at creating models than me, but most of them worked in Jupyter notebooks and didn't have a clear idea of how to take the code they wrote and make it reusable: how to make the "monthly sales report" as opposed to the "April sales report". It was my role to instill that discipline in our collaboration with the data scientists, in the process we used, and in the software I developed.

It was a great experience but also a bit disorganized, as we were trying to develop a product while also having to change tack every week to accommodate project work for the A-list companies we had as customers. We had a system called "Themis" that was used to build training sets; it worked, but I disagreed with it architecturally (the #1 requirement is not sweating it when your A-list company needs something really different). I wrote up a description of a system called "Nemesis" which the company didn't go with, but I developed a number of (image|document|comment|job advertisement)-sorters on my own account afterwards that were all called "Nemesis", until I had a vision which led to YOShInOn and the newer FraXinus, which is meant to be an everything-sorter. We were working on CNN text models at this time; BERT came out and we thought it was a big advance, but we had no idea how big it would be.

I was burned out from working remote, so I went looking for a local job, which means doing more ordinary stuff like React programming, but sometimes I get to do more systems-oriented stuff such as writing parsers and code generators. At the end of my ML phase I felt that (1) it was more important to have the appropriate labeled data than the best models, and (2) UI was the bottleneck for (1), so it was worth "getting gud" at UI, so I have. I still have my side projects.


Don't. Most of these roles are not ethical.


Can you elaborate? How are the ethics different from any role in tech?


Lol like ordinary programmers have a such a great ethical reputation... Cambridge Analytica, Ad Platforms, and on and on.


My view stands with those as well.



