Hacker News
Ask HN: How to Break into AI Engineering
131 points by dragonmouse on June 22, 2023 | 65 comments
What are some great resources for learning the skills and knowledge necessary to start a career as an AI engineer?

Thanks!




- Have crystal-clear mathematical foundations, as in understanding why a formula/method is the way it is, rather than just being able to solve college/HS test problems. A really solid footing in differential calculus and linear algebra is necessary.

- Know the statistical language you learn in a basic college-level Stat 101 course. Be able to translate plain sentences into statistical notation, and to read that notation easily. Also, know basic statistics.

- You already know programming, I assume. Learn Python if you don't know it already. It's really easy.

- There are a number of paths you can go from there. Here's what I did.

-- IBM Data Science Professional Certificate (not deep at all, but lays out the landscape well; did it in a week)

-- Machine Learning for Absolute Beginners by Oliver Theobald, which you can finish in an evening.

-- Machine Learning Specialization by Andrew Ng on Coursera.

-- Deep Learning Specialization by Andrew Ng on Coursera.

-- fast.ai course.

- Learn PyTorch really well. I suggest Sebastian Raschka's book.
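
As a taste of what "learn PyTorch really well" builds on, here's a minimal training-loop sketch on made-up toy data (the model, sizes, and learning rate are only illustrative):

    import torch
    from torch import nn

    # Toy regression: learn y = 3x + 1 from noisy samples (made-up data).
    x = torch.randn(256, 1)
    y = 3 * x + 1 + 0.1 * torch.randn(256, 1)

    model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    for epoch in range(100):
        opt.zero_grad()               # clear gradients from the previous step
        loss = loss_fn(model(x), y)   # forward pass + loss
        loss.backward()               # autograd computes d(loss)/d(parameters)
        opt.step()                    # gradient descent update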

Now from here, you can chart your own path. You can choose NLProc, Vision, RL, or something else.

I went towards Vision. And I do Edge AI as a hobby.

I was in my last year of college as a Physics undergrad when I was hired to do Vision modelling/research for a non-flashy company in 2021. I'm finishing my CS Master's next month and starting to look for a PhD. I've worked at the same company for ~2.5 years.

EDIT: If you want a job in big tech, grind Leetcode, learn about system design, study machine learning systems, and be able to design them. Chip Huyen has a good book, as I hear. 6-7 rounds of interviews are common at Meta/Google. DL hackathon awards and open source contributions help significantly.


Hijacking for self-edification since you led with math...

Is math going to become more important for software engineers in general, as AI is adopted into more and more aspects of the SDLC, even if said engineers aren't working directly on the AI systems themselves?

I ask because I'm quite math-averse (dyscalculia). I am a self-taught engineer, largely because it became glaringly obvious I wasn't going to be able to complete a CS program, so I pivoted my educational focus.

I still manage a relatively successful and effective career in software. I love the engineering aspects, e.g. programming, system design, problem solving, pattern recognition, etc.

When it comes down to it, I couldn't calculate my way out of a wet paper sack. Are my days (pun intended) numbered?


I've been an ML/data engineer for the majority of my nearly decade-long career. You don't need hardcore math unless you're building the models themselves, and even then it's pretty variable.

There's a LOT that happens around building models; posts like the top one ignore that, and that work is more akin to software engineering than anything else.


> Are my days numbered?

Only in the sense that ultimately everyone's days are numbered. LLMs can suggest bits of code, but there's no reason to believe they can undertake large-scale, novel software engineering projects. In fact, the lack of an underlying mental model makes me doubt that the LLM paradigm will ever get to that point.

For the foreseeable future I think there will always be a role for the person who can turn around and say "I see what you're getting at, but it just doesn't fit conceptually with the system. What are you actually trying to achieve? What if we did it this way instead?".

I could well be wrong!


Copilot can absolutely do the jobs of Data Scientists who got hired based on some minimal skills that were considered valuable even three years back.

I have tried applying Copilot to my own work, where it spews garbage. But I use Copilot a lot to generate example data or write repetitive code: I needed a basic Flask frontend and used Copilot to do 70% of the job. But it cannot yet solve Vision tasks.

> " I think there will always be a role for the person who can turn around and say"

These roles, if and when they start to exist, will exist in extremely tiny numbers. And they will all likely be hired from Stanford/Oxbridge/MIT/Caltech, and they will have PhDs.

If this scenario plays out, then, I am afraid for my future, too.


> Is math going to become more important for software engineers in general, as AI is adopted into more and more aspects of the SDLC, even if said engineers aren't working directly on the AI systems themselves?

I'm definitely not any authority on the matter -- also self-taught, but I don't think so.

Having a conceptual understanding of the math is probably a really good idea, but I imagine quite a bit can be hidden behind an API that you essentially treat as "does xyz by means of magic".

I imagine this winds up creating three distinct roles:

- Focusing on improving the model's usefulness
- Focusing on implementing the model for optimal performance
- Focusing on using the model within a larger application

I say "creating", but I imagine this is also probably roughly the current status quo.


The math isn't actually that complicated. 99% of what PyTorch is doing is matrix multiplication and addition on n-dimensional arrays (tensors). For backprop you need a very limited understanding of calculus. [1] Stuff like sigmoid/softmax is also straightforward, and you can grok it from a 5-minute youtube video. [2] If something doesn't make sense, just ask ChatGPT to walk you through it.
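
For example, here's roughly what those pieces look like in a few lines of PyTorch (toy shapes, purely illustrative):

    import torch

    x = torch.randn(4, 8, requires_grad=True)   # small n-dimensional arrays
    w = torch.randn(8, 3, requires_grad=True)

    logits = x @ w                               # matrix multiplication
    probs = torch.softmax(logits, dim=-1)        # softmax over the last dim
    loss = -probs[:, 0].log().mean()             # some scalar to backprop from

    loss.backward()                              # the chain rule, done for you
    print(w.grad.shape)                          # torch.Size([8, 3])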

There is a lot of other complicated math that you will need when reading papers because people writing AI papers like to overcomplicate a lot of things that can be represented in 5 lines of pytorch.

[1] https://towardsdatascience.com/understanding-backpropagation...

[2] https://www.youtube.com/watch?v=Qn4Fme1fK-M


TIL about dyscalculia!! Interesting. I know math can cause anxiety in kids when there is an external "reward/loss" contingent on it. I wonder if that anxiety persists when those external incentives are removed?

I did an EE and CS degree in college and I hated the engineering part of it, because I hated the exams, which were mostly rote stuff. When I finished college, though, and started working in a job that had no linkage to EE, I one day just picked up courses (which were just arbitrary web 1.0 sites by random folks) and started going through the (signal processing/Kalman filtering/convolution) problems and implementing them in software, which was a lot of fun to see in action.

The reason for that long-winded story: I was thinking that in your case, would the anxiety still be there now that you (might) have a reason to develop a passion for the math, unlike back in school where it was forced without a reason?


I have attempted a couple of times to re-learn and the result is always the same. It goes beyond anxiety. If I try to read so much as a three-digit number, my brain turns to mush and things start to get blurry. I have trouble reading ticket numbers on a daily basis, and struggle to help my grade school kids with their homework. I _can_ do formulae and calculations when I really hunker down, but it takes me considerable time, and I'm often wrong.


Have you tried recently? I was one of these people in high school but now in my 30s, stuff that seemed impossible no longer does. Maybe me or the context is different.


I have. I am disappointingly crippled by dyslexia. It takes a tremendous amount of energy for me to work with numbers and formulae which is one of the reasons I was drawn to programming. Once I realized I could abstract away the numbers and only really needed to get the calculation right once before putting it into a function, it was like jet fuel.


What did you try? I was the same, also in my 30s, did some online stat and calc classes, passed those classes, and I don't think I learned much. If I did, I've already forgotten it I'm afraid.


You clearly have much more experience than me when it comes to software development. I am not sure I can add anything to what you already know.

As to whether all engineers everywhere will need to deal with more math: I don't think that will be true.

And exactly how the use of AI will pan out in the development scenario is an unknown unknown to me, so I will not comment there.


Great post - thank you.

What directions can you head in AI Eng these days if you _don't_ want big tech?


AI Engineering is basically Data Engineering focused on AI. Where in "traditional" Data Engineering you create pipelines that store processed data in something like a Data Lake, in AI Eng your end storage might be a specialized feature store (like Feast [0] or GCP Vertex AI).

There are some AI Engineers with strong scientific/mathematical background, but that's rare. Usually, you're paired with these ML people that actually develop and evaluate the models.

So my advice is to start with Data Engineering and then specialize in AI. You should have a VERY solid foundation in scripting and programming, especially Python, plus a lot of "data wrangling" concepts: understanding how data flows from point A to point B, how the intermediate storage and streaming engines work, etc. Functional programming is key here.

[0] https://github.com/feast-dev/feast
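
To make that concrete, fetching features at serving time from a Feast store looks roughly like this (the repo path, feature names, and entity key below are made up for illustration; see the Feast docs for a real setup):

    from feast import FeatureStore

    # Point at a Feast feature repo; names here are hypothetical.
    store = FeatureStore(repo_path=".")

    features = store.get_online_features(
        features=["user_stats:purchase_count_7d", "user_stats:avg_session_len"],
        entity_rows=[{"user_id": 1001}],
    ).to_dict()

    print(features)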


> AI Engineering is basically Data Engineering focused on AI.

I work in machine learning and this isn't how I see it at all. Data engineering specifically evolved as a term to differentiate the people who work with data but don't work on ML/AI.


I work in machine learning and also in data engineering, and for most of my career the data engineering title was for people doing everything in the lifecycle outside of R&D workflows (building the models/model architecture itself). It's only very recently that it differentiated into MLE/DE, and even that is far from being a standard.

The skillset is largely the same, but with some specialized knowledge for ML data work.


This is exactly what I did years ago, and it's much closer to software engineering than building models (which a lot of commenters are conflating with MLE - but tbf the titles aren't delineated well in practice).


Check out the Huggingface NLP course: https://huggingface.co/learn/nlp-course/chapter1/1

Huggingface has a bunch of courses, but that's a good one to start with. You can do the exercises on your own computer or on a cloud server if you want access to a more powerful GPU. If you go through these courses and pay attention you'll be in a really good position.
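
To give a flavour of what the course builds on, the transformers pipeline API gets you running a pretrained model in a few lines (it downloads a default model on first use; exact scores will vary):

    from transformers import pipeline

    # Loads a default sentiment-analysis model from the Hugging Face Hub.
    classifier = pipeline("sentiment-analysis")
    print(classifier("The Huggingface NLP course was well worth the time."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]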


Maybe a little bit of a contrarian idea, but I would be really leery of trying to become an "AI engineer" now. There is a possibility that we are at the apex of this cycle of AI (if you look at history, you will see AI goes in cycles), and we will run into more and more limitations.

Instead of targeting AI engineering, I would focus on obtaining a solid mathematics background (calculus, linear algebra, discrete mathematics) and a solid computer science background (algorithms, data structures, distributed systems, databases/data storage/data retrieval).

Then with those skills, you can easily become a "SW Engineer who leverages AI", which, my guess is, will be a much better and more stable job than "AI Engineer".


I wrote something a bit ago to answer this question: https://llm-utils.org/AI+Learning+Curation

It was previously popular on HN though only inside a comment thread, and I haven't submitted it as a link post yet.


This looks like a very good resource, thank you. It also clarifies for me a bit more the domains within the field, which I wouldn't have been able to define in terms of prerequisites.


Assuming you have the math and algorithmic background, I would start by reading the “attention is all you need” paper. After reading, attempt to build a baby transformer model in PyTorch. After that, consider constructing some of the building blocks without libraries to understand how they work.
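
For reference, the core you'd be rebuilding is small. A rough single-head scaled dot-product attention sketch in PyTorch (toy shapes, no masking or multi-head plumbing) looks something like:

    import math
    import torch

    def attention(q, k, v):
        # q, k, v: (batch, seq_len, d_k) -- simplified single-head version
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v

    q = k = v = torch.randn(2, 10, 64)
    out = attention(q, k, v)   # shape (2, 10, 64)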


I read this exact advice often here on HN and I can’t help but wonder.

Is the person writing it just repeating something they read? Is it just because they like the 'coding from first principles' aesthetics?

I mean, let's imagine that someone does read that paper, and manages to replicate the code (quite an effort for someone coming from outside AI and academia).

Then what? I doubt it's particularly illuminating. That doesn't really qualify for a job by itself. So what's the goal there? Is it just a thing to say to look like a cool hacker who codes from scratch?


> Is the person writing it just repeating something they read? Is it just because they like the 'coding from first principles' aesthetics?

of course. you know how i know? absolutely no one except the wannabees has time to read papers - people working in the area have deadlines and meetings. we absorb the content of the paper by osmosis - convos, code bases, occasionally a talk at a conference.

it's especially horrible advice from the perspective of pedagogy to tell a n00b to read a paper (so the person giving the advice has immediately disqualified themselves from possibly being an expert) because papers are horribly written, omit critical details, and function purely as advertising for the authors, group, etc.

for every poor undergrad/n00b soul reading this comment, take this thing to heart that took me too long to unlearn (due to its constant perpetuation by people like the GP): reading the paper is 100:1 waste-of-time:value-derived.

if i hear about/see some paper that makes strong claims that are relevant to my work, i will look for a github link and/or email the first author. 5/10 i get a response (ratio is going up as i'm getting to be more ingratiated in my community). the other times i just move on - none of these papers have some revolutionary cure for cancer in them so most of the time what i'm already doing is already close enough that i don't need to kill myself figuring out the new thing.

that paper in particular (attention) has nothing in it that is in the least bit interesting/revolutionary. the hard part of attention isn't writing down softmax(QK^T)V, the hard part is executing that matrix product fast enough that you're not waiting eons for your model to converge.


I’m an ML engineer and frankly that paper is not very good in my opinion. It’s quite confusing. The BERT paper is much more approachable.


Could you post a link to "the BERT paper"? I've read some, but would be interested in reading anything that anyone considered definitive :) Is it this one? "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding": https://arxiv.org/abs/1810.04805


> I doubt it’s particularly illuminating.

Every time I do a foundational activity like this, it does turn out to be illuminating. Why do you doubt you'll learn something?


The goal here is to actually understand the mechanics of the model and begin the process of intuitively understanding the space. I would put the effort here at a few weekends of focus.

I'll also add that the models turned out to be a lot simpler to understand than I expected going in.


If developing intuitions is the goal, I really do think Jay Alammar's "Illustrated Transformer" [1] is at least a step function better in the service of that outcome than the academic paper itself.

(I totally realize this is subjective, but that has been my experience with my own learning in the space over the last few years as well as some folks I've mentored)

[1] http://jalammar.github.io/illustrated-transformer/


> That doesn’t really qualify for a job by itself

Apple often asks its ML engineering candidates to explain attention & transformers from scratch.


Just a quick reminder to everyone reading that AI / ML (let's face it, it's 0.1% AI and 99.9% ML) is still a ton more than just SOTA Deep Learning models. Depending on where you work, it could be all classical machine learning methods, and zero deep learning - or the other way around.

Having a broad enough understanding in ML would be a good starting point, along with solid SW engineering skills.


And would you consider this resource to contain "the math and algorithmic background" necessary? Or is it overkill/missing some things?

https://www.freecodecamp.org/news/all-the-math-you-need-in-a...


I'm not qualified to answer this, but I would say nothing is really "overkill".

I've been "filling in the gaps" in math for almost a year now to learn machine learning stuff casually. I don't even need to use it, I am just obsessed with learning and I read about it for nearly an hour a day, and its still not enough.

Being self-taught at math introduces so many painful problems. If I were to do this seriously I would start ALLLL the way back at algebra in 5th grade and work forwards ALLL the way up to linear algebra/calculus etc.

There are just too many tiny things and subtle details I find I've missed. It makes any example require 10 times more brain power just to do simple things I don't remember, like the rules of factorization, etc. So I'll go learn that thing, which is simple, go back, and 2 minutes later I'm off hunting for some other simple thing. Mainly, the idea of learning 5th grade math and such is so boring that I never actually have, so instead of what I do now, I would just learn the ENTIRE freaking thing.

Learning stuff like gradient descent is easy, but that's not even where the hard stuff lies. I feel like trying to deeply understand the math behind those topics, where you're not just glossing over the explanation, is where it gets difficult, and to do that you pretty much need a solid math background without gaps.
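
(For what it's worth, the "easy" part really is tiny -- a bare-bones gradient descent on a one-variable toy function fits in a few lines; the function is just an example:)

    # Minimize f(x) = (x - 3)^2 by gradient descent; f'(x) = 2 * (x - 3).
    x = 0.0
    lr = 0.1
    for _ in range(100):
        grad = 2 * (x - 3)
        x -= lr * grad
    print(x)  # converges toward 3.0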


If you have a good handle on undergraduate math material I highly recommend graduate math material. You will "restart" math in a sense and start building up those building blocks piece by piece. If you have a shaky understanding of a derivative that's fine because you will do epsilon-delta proofs until your eyes bleed in a Real Analysis course.

Edit: I didn't have the laws/properties of logarithms memorized/understood until maybe 3-4 years into my Math degree. I could have learned it sooner probably, but I just had an aversion to it and would desperately translate any problem into exponentials and work with those instead. I definitely sympathise with the desire to "restart" Math.


> Being self-taught at math introduces so many painful problems. If I were to do this seriously I would start ALLLL the way back at algrebra in 5th grade and work forwards ALLL the way up to linear algebra/calculus etc.

If you want to do this, these books are great and have complete solutions manuals available: https://artofproblemsolving.com/store/recommendations


I worry that even if self-taught, I wouldn't have the credentials (job experience or degree) to do a full-time ML job.

Are my concerns unfounded?


ML is a relatively new field. How do you think current engineers got their start?


5 years ago, you needed a master's or PhD to qualify for a Data Science role. At my current company, that is still the case.

Has that changed?


DS roles aren't the only roles available in the field.



I would recommend AI Applications Engineering (e.g., applying LLMs as a unit of compute).

Start with reading the AI Canon and setting up projects like PrivateGPT and AutoGPT locally, then work with LocalAI to serve up HuggingFace models in place of OpenAI models.
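
As a rough sketch of the "LLM as a unit of compute" part: LocalAI exposes an OpenAI-compatible endpoint, so (assuming the 2023-era, pre-1.0 openai Python client) swapping it in for OpenAI looks something like this; the URL and model name are assumptions, use whatever your LocalAI instance actually serves:

    import openai

    # Point the client at a local OpenAI-compatible server instead of api.openai.com.
    openai.api_base = "http://localhost:8080/v1"   # hypothetical LocalAI address
    openai.api_key = "not-needed-locally"

    resp = openai.ChatCompletion.create(
        model="my-local-model",   # hypothetical name of a served HuggingFace model
        messages=[{"role": "user", "content": "Summarize what AutoGPT does."}],
    )
    print(resp["choices"][0]["message"]["content"])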


Can you elaborate on what "AI engineer" means to you?


I left the question a bit open ended to see the diversity of the field and to hopefully gather advice that could benefit others as well.

That said, primarily _for me_ it's defined more by what it doesn't mean, which is software engineering _using_ AI (e.g. LLMs) and prompt engineering. I'm already a self-taught web/mobile developer with 10+ years of experience, and have built some toy projects using LLMs and image diffusion models. Where I believe I have more interest is the research scientist route. I have prior education in biology/ecology/behavior, and would love to combine the fields.


There's ML, which requires math, lots of it. For the rest of us, there's so much we can build now by just interfacing with language models using tools like LangChain or LlamaIndex.

The future of AI is models so advanced that only an AI can understand or create them, and they will do just that in a feedback loop of sorts.


Got it, thx for clarification. I think you're asking how to move into data science.


Make a Terminator to smite my enemies.


AI Engineering is broad. There are a lot of things to learn and a lot of mistakes you have to make yourself.

For applied ML, my tips are: make sure you learn the dark side of BatchNorm and Dropout, start with simple and elegant baselines instead of complex SOTA algorithms, spend more time understanding your data than trying algorithms, be aware that SOTA on a related task will often suck at your task, and be data-driven. Also, most of your ideas will not work, but you have to try them and conduct experiments carefully.
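
One concrete bit of that "dark side": both layers behave differently in train vs. eval mode, and forgetting model.eval() at inference time is a classic bug. A minimal illustration (toy model made up for the example):

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.Dropout(p=0.5))
    x = torch.randn(4, 8)

    model.train()
    a = model(x)   # Dropout active, BatchNorm uses batch statistics

    model.eval()
    b = model(x)   # Dropout off, BatchNorm uses running statistics

    print(torch.allclose(a, b))  # almost certainly False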


I think studying what happened to the “data engineer” role is a good indicator.

There was a brief moment in time wherein data engineers were computer scientists specialised in distributed systems and data processing algorithms on commodity hardware. You had to know a lot on average.

Then came commoditisation via the big vendors and now you really don’t need to know very much. As a result it is not uncommon to meet “senior” data engineers who mostly script Python, do SQL and configure Airflow.

I think ML, a.k.a. AI, has already gone that way, and many vendors are strongly promoting developer participation with courses and plug-and-play resources.

So what do you need? A vendor certificate you took over a weekend, and an employer to say “yes”…


> an employer to say “yes”…

This is a "just draw the owl" kind of comment. It's hard to get hired with just a certificate for plug and play tools and it isn't the kind of job OP wants anyway.


I wasn't commenting on how difficult it is to be hired -- that's relative to many things. But in terms of the least you need, I think I'm unfortunately right, because I have seen it many times over the last few years.

We will see over the next few years, when every 2nd BI analyst has become an AI engineer :-)


1. Log onto OpenAI

2. Prompt ChatGPT for "how do I make a pytorch program that learns to do <some task>"

3. Ask ChatGPT for code and for it to explain the code

4. Run the code on Google Colab; if it doesn't work, ask ChatGPT why and keep rerunning it until it does

5. If you find some API that's too new for ChatGPT to know about, just paste in the API documentation and then ask ChatGPT to propose some code using it

ChatGPT is wrong a lot, but if you keep badgering it, eventually you will get a solution that works. It's like having a tutor standing next to you that you can ask questions; I can't think of a better way to learn, even if it's wrong on occasion.


Learn everything you can about cleaning/standardizing datasets and the boots-on-the-ground process of labeling data and training algorithms. These are fundamental, and are often overlooked and undervalued.


I'm surprised that many recommend heavy math, but I always thought an AI engineer is a sort of specialized data engineer for AI projects.

It's definitely a plus to know a bit about the mathematics, but I doubt anything short of a Master of Science in Math with an AI specialization is going to close the gap. How many data engineers can do that?

Wouldn't it make a lot of sense to hire someone with zero AI exposure but tons of experience in sysadmin, data engineering, and ops? It's going to be tough to find someone who is both an engineer and a math wizard, I think.


Has anyone here transitioned into an AI-dev-adjacent role, something a bit more involved than "prompt engineering", potentially the "product eng" equivalent for AI?


That's me, currently. I'm at a small startup and no one else seemed as interested in LLMs, so they made me the guy. Still a dev: deploying various models, integrating with our services, instrumentation, prompt management, etc.

Basically architecting LLM-related infrastructure to enable the product features they want, while managing expectations.


I moved into content moderation, which basically does all the engineering around ML models (logging, queuing, caching, databases, etc.) with limited opportunities to create my own models. Unfortunately, the barrier to entry is high, and it has been difficult for me to find the time to step out of my comfort zone.


I've been talking up the idea of finally settling on a PE certification for software engineers (in the U.S.). It seems like most of the risks and responsibilities being discussed in the context of government regulations could be addressed with mechanisms similar to what we rely on for aircraft, bridges, power plants, etc. -- all those areas have professional credentialing in addition to bureaucratic oversight.


They tried it from 2013 to 2019 with a total of 81 candidates, and it's been discontinued for lack of interest.



I'm familiar, but that definition wasn't well thought out.

Other approaches have been more widely adopted, like FINRA certs for working with trading infrastructure.

I'd much rather include knowledgeable individuals with professional authority in the loop than rely entirely on box checking, as with various ISO standards, HIPAA, FedRAMP, etc.


Can you clarify what you mean by "AI engineering?" There are two main paths right now - this is from my experience as a software engineer overall for a decade, a data engineer of some form for all but a year and a half of that, and a DE/MLE working on AI R&D teams for the last 5.5 years.

1. MLE/DE/MLOps - this is more like typical software engineering. You're responsible for building data platforms, tools, monitoring, and more around the model development lifecycle. This can include: data ingestion, data architecture, data transformation and storage, automating and productionizing various workflows like training, evaluation, and deployment, monitoring deployed models, data monitoring (and building monitoring), tooling like feature stores (and libraries for R&D teams to interact with them) or internal deep learning frameworks, etc. You'll basically work as a part (or an adjunct to) the research team that is testing new model architectures, different approaches towards some goal, etc. These are largely taken from my own experiences and projects I've built. Skills: software engineering, Python, knowledge of the model development lifecycle, data architecture/engineering, some knowledge about the frameworks used, cloud platforms, etc. Designing ML Systems by Chip Huyen is a great overview of all of this kind of work.

2. Research. This is actually building models, implementing papers, very occasionally (especially in big companies) doing publishable research. This is more akin to academic work (my educational background is in hard science academia), and requires a lot of paper reading, experimentation, etc. It will require knowledge of your niche (I mostly work with CV teams, for instance), strong math fundamentals, and very often a PhD.

I can tell you how I, as a self-taught software engineer with a bio education, got here. My first job was a generic enterprise desktop application development role; I randomly joined a data engineering team shortly after that, not even knowing what DE was, but knowing I liked it. We worked on a massive distributed ETL system. I then joined my first startup, also in a DE role, but we were a small group in a larger research team, where I got my initial exposure to ML workflows and especially to moving them to the cloud. We did some simple model training and data management, and built products around the models while also supporting the research efforts of the larger team.

I then went to another startup, where I had the sole responsibility of our research infrastructure (largely based on the strength of my knowledge of AWS and Python). I was the sole engineer on a team of CV researchers, and did things like automate their entire evaluation workflow and move it to the cloud, worked on the internal deep learning framework, and built a team to evaluate the current AI development lifecycle and design a platform to harden and optimize the process. Covid put the kibosh on that. I moved to another, earlier startup, doing similar work but more foundational - almost everything was built from scratch.


When does the AI do the AI engineering?


Why? Do you like to make software that barely works? Do you like to make and sell software that is dishonest about its capabilities? In traditional software, a 10% error rate is unacceptable - it can't be released. But hey, a 10% error rate in AI is good! No, it isn't.


You’re either missing the point, or you’ve seen too many demos of people using neural networks for the wrong problem.



