How I Became a Machine Learning Practitioner (gregbrockman.com)
290 points by sama 6 months ago | 47 comments

I think it's doable if you're in the right place, have enough opportunities around you, and have something to show for it. I feel that companies and startups around tech hubs are more willing to give someone a chance and look past the formalities if you manage to convince or impress them.

Where I live, far away from tech, it's almost impossible to land a job in ML/AI/DS unless you have (at minimum) a master's degree in something relevant, preferably a Ph.D. and solid experience to show for it. I know because I work in the field, and lots of F500 dinosaurs are just now waking up, but are unfortunately also clinging to their old ways of hiring people.

Schools all over are also picking up the slack, starting to offer specialized graduate degrees in those domains. When I got my degree in ML, it was a sub-field at my school's engineering department, mixed in with the signal processing and control theory groups.

When I was first trying to get a job, the main problem was explaining what I could actually bring and do, and a lot of the recruiters and managers had no idea what machine learning was. Then you said "It's basically Artificial Intelligence", and they were instantly wooed.

Your observation is correct. No degrees are needed; there is a lot of stuff to build.

Given that Greg Brockman was the CTO of Stripe before OpenAI, he's an order of magnitude more technically/CS capable than the typical reader who might be looking into ML.

Yes, but the transition is definitely doable, and his advice is great. The key part of his advice is spending time experimenting, rapidly failing, and continuing to work on it with real-world use cases. Often the challenge is making the jump from the simple toy examples used in educational materials to the messiness of real-world data.

I'm a senior data scientist at a VC-backed startup, in a hybrid data scientist / machine learning engineering role, where I build and train ML and deep learning models and also build the scaffolding to support their production usage. But my previous roles included business analyst, project manager, and research analyst. My undergrad education was in Creative Writing and the social sciences.

While I kind of accidentally transitioned into this career, how I got here is similar to most folks coming from a different background: lots of self-study and experimentation. I think one of the challenges of transitioning into ML and deep learning is that there are so many applications, domains, and input formats. It can be overwhelming to learn about vision, NLP, tabular, time series, and all the other formats, applications, and domains.

Things solidified for me when I found a space I found compelling and was able to dive deep into it. You kind of learn the fundamentals along the way through experimentation and reflection. My pattern was: pick up a model or architecture, learn to apply it first to get familiar with it, experiment with different data, and then go back and build it from scratch to learn the fundamentals. That, and I read a lot of papers related to problems I was interested in. After a while, I started developing intuitions around classes of problems and how to engage them (in DS you rarely ever solve the problem; there's always room to improve the model...).

Thanks for the info.

I have a serious question (not for bashing)

Can you please describe what part of your job CANNOT be automated?

No worries, fair question. It's worth noting that my job is not data analysis, though I do use data analysis to evaluate our metrics and model performance.

Really, none of it is automatable. I'm working on developing NLP features for our product (question answering, search, neural machine translation, dialog, etc.). Our customer data is diverse and in different formats, and their use cases are all distinct. So most of my work is novel applied research and development.


I do assume that the data formats are different (though I also assume that they are all some sort of text file with known fields and types).

But after you set up the dataset definition and defined the schema, the rest can be based on neural search?

Moreover, isn't there a state-of-the-art architecture for each of the tasks? E.g., Seq2Seq for machine translation. Can you just use that as a baseline and let the NAS engine search hyperparameters, etc.?

Happy to talk more offline; my email is in my profile. The short answer is no, because there are more complexities involved, both related to our specific use cases and to natural language in general. If it were that simple, NLP would be solved and every company that could exist already would. From my experience, I'm not sure where the line is between choosing the right model and having the right data; either can solve most of a problem. There have been novel architecture developments like RNNs and LSTMs that have been shown to work well in certain domains, and new architectures are developed each year; the space moves very quickly. On the flip side, having petabytes of data (like BERT or OpenGPT) and simpler architectures is also powerful, but prohibitive for everyone that isn't Google or a state government. The real answer is probably somewhere in between, and while it's unsolved, there is work for me to do. That being said, our strategic philosophy is to make our AI a commodity so that we can differentiate ourselves on other features.

Most of our problems don't cleanly map to existing NLP tasks, and the state of the art often isn't as high as you'd think on many tasks. For example, take machine translation in relation to a beta feature we're building that lets you ask questions of arbitrary single tables (kind of like wiki-tables), where we don't know the schemas in advance or the questions the user may ask. Beyond the issue of having quality annotated data (which we often don't; a cold-start problem), we need to do more than simple model tuning. It requires building custom architectures.

But even when you consider known tasks, state-of-the-art models often do not produce those same results on real-world data. If you put aside data quality issues (which is another huge challenge for us), in the context of question answering, the training data rarely captures the distribution of natural language in the wild. People ask questions differently and use language that doesn't match the content in our knowledge base.

I could go on. But short answer: it's not as straightforward as you think. Even at Google scale, machine learning is not solved. For everyone else, with less data and domain-specific use cases, it's even harder.

Thanks for the answer. I am happy to discuss offline. I did my master's in computational linguistics, which is related to your field, and I am currently creating a new AutoML platform, so I would appreciate your feedback. My goal is to automate the straightforward parts.

As you mentioned, some tasks in NLP, like full conversation, are not solved and will likely never be solved with deep learning by itself (at the level of the conversation). There should be some sort of symbolic AI or taxonomies/knowledge graphs (like RDF) in combination with deep models.

>But after you set up the dataset definition and defined the schema, the rest can be based on neural search?

Sure, but hyperparameter tuning and architecture selection take such an insignificant amount of any competent ML practitioner's time that they're pretty much irrelevant.
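To illustrate the point: in practice a hyperparameter search is often just a few declarative lines handed off to a library. Here's a minimal sketch using scikit-learn's GridSearchCV on synthetic data (the dataset, estimator, and grid are all made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; in real work this is where the actual effort goes.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The "tuning" itself is a small declarative grid; the machine does the work.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

The practitioner's time goes into everything around this snippet, not the snippet itself.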

At least for me, my time is mostly spent:

1. Understanding (or designing) the process that generated the data.
2. Organizing the training schema.
3. Understanding the customer's business problem so that an appropriate ML system can be designed.
4. Doing an initial design of the ML system based on that understanding, then iteratively designing new components for said system based on customer feedback.
5. Developing or researching how to measure model performance.
6. Searching for alternative data sources.
7. Answering customer and stakeholder questions about the ML system.
8. Implementing the ML system in code.

None of these can be automated with current technology, and there's a reason for that: if it was possible to automate a task then our team already would have.

Sure, provided you have enough data to feed a neural net, the problem is well suited to it, and you don't mind giving up huge chunks of explainability.

I recently replaced a classifier at work that was using a neural net with a decision tree and some hand-chosen features. It performs a bit better, takes way less time to train, and is significantly more explainable: my teammates asked why it sometimes misclassifies a certain edge case, and because the features and model properties were so easy to understand, fixing the issue was a couple of hours' work and not a case of "who knows".
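A big part of why a small tree is debuggable is that you can print the learned rules and read them directly. A toy sketch of that workflow with scikit-learn (synthetic data and made-up feature names; the commenter's actual features and task are unknown):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for hand-chosen features.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

# A shallow tree: a few dozen human-readable rules at most.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# The entire decision logic, printed as nested if/else rules. This is what
# makes diagnosing a misclassified edge case tractable.
tree_text = export_text(clf, feature_names=[f"feat_{i}" for i in range(4)])
print(tree_text)
```

With the rules in front of you, "why did it misclassify this input?" becomes a matter of tracing one path down the tree.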

One of the difficulties is that the broader the scope your optimiser has to push towards a solution, the better your measurements need to be. And having an accurate measure of which thing is "better" can be prohibitively expensive.

The cost of errors varies drastically across different domains and use cases, so an important part is understanding how and why different models typically fail, and making tradeoffs there.

This is really good advice. I'd say it generalizes to learning most engineering challenges: Pick a problem you're interested in and solve it from top to bottom, tweaking all components as you move forward.

Studying Math at Harvard/MIT certainly puts you in a different category than the average software engineer. And if ML was still challenging to Greg, it is honestly a bit discouraging.

(I wrote the post.)

If it's helpful, I dropped out of both schools — the vast majority of my knowledge is self-taught!

>I learn best when I have something specific in mind to build.

This is so incredibly important for me and, based on my conversations, many others as well.

The other thing I struggle with is the feeling that many of the problems I wish to solve are likely also solvable with simpler statistical methods and that I'm just being a poser by trying to pound them home with the ML hammer.

Question: will the OpenAI fellows curriculum ever be released? I need a nice, structured intro to deep learning research and feel like the curriculum modules your company has developed would have the highest quality.

(For reference, I’m an undergrad looking to get into this field)

How often do you go "back to basics" (read introductory book chapters, or classic papers, etc)?

So there is a difference between ML research and application. Being a practitioner doesn't require the deep math knowledge that research perhaps would. Jeremy Howard's fastai course is a great example of how someone with a solid programming background can effectively transition into being a deep learning practitioner. Given that production ML and deep learning are still the wild west, as a practitioner you can also contribute to the research around effective training, scaling, and application of these models. The math and intuition required are definitely acquirable.

I think when you shift into pure research, yes, a deep probability, information theory, linear algebra, and calculus background is needed. But at that level, you're rarely writing code and more likely working at a theoretical level.

I recently got the assignment to "do ML" on some data. I hadn't done anything in the area before, and a couple of things surprised me:

1. Most of your time is spent transforming data. Very little is spent building models.

2. Most of the eye-grabbing stuff that makes headlines is inapplicable. My application involves decisions that are expensive and can be safety critical. The models themselves have to be simple enough to be reasoned about, or they're no use.

You might argue that this means what I'm actually doing is statistics.

The longer you work with ML, the more you discover that it's almost exclusively about handling data.

It's also one critique I have of the world of academia. When learning ML in academia, 9 out of 10 times you work with clean and neat toy datasets.

Then you go out into the "real world" and instantly get hit with reality: you're gonna spend 80% of your time fixing data.
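That 80% is usually mundane transformation work: inconsistent casing, missing values, numbers stored as strings. A toy sketch of what it looks like in pandas (the columns and values here are entirely made up):

```python
import pandas as pd

# Toy "real world" extract: inconsistent casing, missing values, and
# strings where numbers should be. Column names are hypothetical.
raw = pd.DataFrame({
    "city": ["Oslo", "oslo ", None, "Bergen"],
    "revenue": ["1200", "n/a", "950", "1100"],
})

clean = (
    raw
    .assign(
        city=raw["city"].str.strip().str.title(),
        revenue=pd.to_numeric(raw["revenue"], errors="coerce"),
    )
    .dropna()
)
print(clean)
```

Multiply this by hundreds of columns and dozens of sources and you get the 80%.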

With that said, I think that 10 years from now, ML is going to be almost exclusively SaaS with very high levels of abstraction and very little coding for the average user. Maybe some light scripting here and there, but mostly just drag'n'drop stuff.

> You might argue that this means what I'm actually doing is statistics.

What's the difference?

ML conferences have way bigger budgets.

Also, note that Greg's goal was to contribute to OpenAI's flagship project. That's a rather ambitious goal!

Also, most folks I know that are making practical deep learning contributions are doing so by combining their pre-existing domain expertise with their new deep learning skills. E.g. a journalist analyzing a large corpus of text for a story, or an oil&gas analyst building models from well plots, etc.

As a side note, I love that you highlight regex in your new NLP course. There is an inherent tension between the probabilistic nature of models and the need for deterministic outputs in most production settings. Often, if we can uncover linguistic rules or regex patterns that guarantee minimal precision (or as our VP puts it, "don't look stupid"), we'll eschew the model in the short term or use the model to augment the rules.
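The rules-first, model-as-fallback pattern described here can be sketched in a few lines. Everything below is hypothetical: the rules, labels, and the stubbed model stand in for whatever a real system would use:

```python
import re

# Hypothetical high-precision rules: if one fires, trust it and skip the
# model entirely. These are the deterministic, "don't look stupid" paths.
RULES = [
    (re.compile(r"\bhow (?:do|can) i reset my password\b", re.I),
     "password_reset"),
    (re.compile(r"\bcancel (?:my )?subscription\b", re.I),
     "cancel_subscription"),
]

def model_predict(text: str) -> str:
    # Stand-in for a probabilistic classifier.
    return "other"

def classify(text: str) -> str:
    for pattern, label in RULES:
        if pattern.search(text):
            return label  # deterministic rule wins
    return model_predict(text)  # model only handles what rules don't cover
```

The model can later be used to propose new rules, or to take over labels once its precision is proven on live traffic.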

Also, I really appreciated that one of the training goals for ULMFiT was to be trainable on a single GPU. With these large-capacity models, training is getting crazy expensive and out of hand. Any chance that your future work will still keep the single-GPU training goal?

I think "doing" ML math is a little different than "doing" math math.

With math, on paper say, it is hard to tell if you are doing it right or wrong. You can trick yourself quite easily. A compelling proof can have a huge hole.

You can still trick yourself programming -- in a sense, that is what a bug is -- but it is much harder.

The upshot is, I think it is easier to teach yourself math that is applied to a computer program than math on a piece of paper.

Maybe if you want to write tensorflow, but not if you want to use it.

If you want to write TF, then you are an ML engineer. If you want to use TF, then you are a scientist (data, research, whatever).

We know that someone with a good CS degree can do this because.. Ph.D students....

Does anyone not want to become a ML engineer? Is this the future, and will we even have a choice or else be out of a job?

There's plenty of non ML software engineering to be done. Anecdotally a large proportion of interns want an "ML project", but only a small percentage of teams looking for interns are offering one.

Too many people going into ML could skew the supply/demand into making it a worse job option (more work, less pay), like game programming or academia.

There aren’t that many engineers with 10+ years of ML experience ATM, but there will be tons in 5 or so years. Chasing tomorrow’s trend is always more productive than chasing today’s, but of course the former requires predicting the future.

The question is this: 10 years from now when the top job requirements list ML - are you going to be ready or out of the game?

Alternatively the market is flooded with ML people and the value of ML on your CV is 0.

I got an offer to do ML on video data recently, but turned it down to do some non-ML software engineering instead, for double the money. The ML work would be way cooler, but it would be at a startup (= stressful), and double the money means half the time to reach FIRE.

The caveat is that I've worked on ML in the past, and I think the work is maybe less intellectual than software engineering: with complex enough models they become impossible to understand, and you just start trying out ideas based on random intuitions. The things I mostly like about it are the ability to use math and the more independent style of work: no scrum, less need for cooperation with other team members, etc.

I started in ML about 7 years ago so well before the hype and back then very few people wanted to be ML engineers.

What's happening, at least in Australia, is that contract rates (a good indicator of the supply/demand ratio) have halved for ML engineers. Which means (a) a lot of people want to be ML engineers and (b) there aren't that many jobs for them.

Just to be sure I understand correctly: you're saying that money has halved for ML engineers because plenty of people are getting into ML and are willing to take much less money? You said they "want to be" ML engineers, which implies they aren't really, and thus aren't worth full rate.

ML engineer is a super boring job content-wise and has insane outside pressure. It's about building data pipelines, the ugly grunt work. ML/data scientist is the interesting job. Usually data scientists view ML engineers as replaceable drones who don't understand anything interesting and do the boring part of the job for 2-3x less than they do. The only advantage of ML engineers is that AutoML is unlikely to replace the dirty work, but it might endanger outdated data scientists.

I might disagree on this. The software engineering behind production machine learning systems can be quite interesting and nontrivial. It really depends on the scope of the challenges being faced. If you have thousands of models that need to be served in production and continually retrained and monitored, that becomes a pretty sophisticated problem space to work in.

Yes, however most ML engineers don't get to work at Jeff Dean's level to actually do such interesting work. There are very few companies willing to write their own Horovod or distributed PyTorch.

Except for the part where data science is an incredibly broad term and the majority of the positions are seemingly what used to be called 'data analyst'.

It makes finding good positions really hard.

The vast majority of ML/data scientist positions are actually similar to the hypothetical ML engineer position you described.

Well, you could save money and use it to feed yourself while learning new skills when you're out of a job. That works too.

Congrats on your cool life, your ivy league education, your CTO role at OpenAI and all the access that provides. You've done it!

Also thanks for telling us how you became a practitioner. It's definitely relatable and not a humble brag at all.

Comments like this are what make HN great and are not at all toxic!

Jealousy is unbecoming. I had no idea who this guy was, and I found the article nice to read. It is a story about a guy who feels out of his depth in a highly technical domain, but instead of giving up, he devotes time to it and comes out the other end much more competent. The internet provides everyone with world-class experts to help if you are stuck on something; most people don't have the will to help themselves.

Indeed, an inspirational article.
