
The cold start problem: how to build your machine learning portfolio - sharemywin
https://towardsdatascience.com/the-cold-start-problem-how-to-build-your-machine-learning-portfolio-6718b4ae83e9
======
pragmacoders
I agree with the general point of this article. Showing that you have the
skill to accomplish a job effectively will get you a job most anywhere.

There's a major issue when this is applied to a certain class of problems in
ML and Data Science that people tend to ignore.

If you could get a job as a civil or mechanical engineer (building bridges or
whatnot) by showing that you built a small bridge in your backyard... We'd
have some unstable bridges.

If hospitals just let residents run the hospital... We'd have a lot of
mistreated illnesses.

If you could show a realty company that you can build a recommendation engine
and they hire you to build their advertisement algorithm... Suddenly you're
breaking housing discrimination laws.

I am all for folks being able to get jobs from their cool projects. But we
need ethical standards and educational standards before folks are given large,
real-world problems to work with.

We need to take a page out of engineering and medical playbooks and build
official education or apprenticeship requirements. We need to have licenses
that can be revoked if someone fails to follow ethical or quality standards.

So - love creative people getting jobs. Now give them a high-quality education
program along with those jobs.

~~~
Ma8ee
This is the problem of a lot of software development. Look, this bright young
man just built a wooden box all by himself. Let’s employ him to build our new
shed. A shed is just a big box, isn’t it? This bright young man just built a
shed. A house is just a big shed, isn’t it? He’ll figure out insulation,
electricity and plumbing when we get to those parts...

~~~
raducu
You know, Hercules should never try to catch that turtle, because he would
first have to walk half the distance to the turtle, but in order to walk half
the distance he first must walk 1/4 the distance; but first he needs to walk
1/8...

Building ML projects at home is nothing like building a box to building a
shed; it's more like building a shed to building a house.

Sure, building a house is more complicated, but you won't be building that
house alone, and if someone hired you to build a house because you were good
at building sheds, I'm sure you'll start your job as a junior house builder,
not master architect.

~~~
pragmacoders
There were budget cuts. Teams were shuffled around.

You're running the team now. Your colleagues have only ever built boxes. Go
get 'em, master architect!

This is my (hopefully humorous but actually taken from my experience) way of
saying that companies will often do what is most immediately profitable rather
than what's best in the long run (for humanity or themselves).

~~~
llampx
If they are building houses where the roof caves in after the first rain, they
won't be in business long.

~~~
Ma8ee
Why? Nowhere was it written in the requirements that the house must be able to
withstand rain.

~~~
no-s
woodpeckers. civilization.

------
drawkbox
Companies want people that can come in and hit the ground running and are
self-starters.

It is always better to go into the interview, meeting, pitch with projects
that are shipped or working to demonstrate.

For instance, if you want to get into the game industry, build games.

Same with any field, when you create/build you can rise above education,
experience and more competitive metrics.

Created/functional projects, and especially shipped products, are the best
qualifications.

~~~
pj_mukh
Instead of the n-th Tutorial website, I would love a list of _worthwhile_
projects one can build in increasing order of difficulty: start with MNIST and
move up. You don't have to hand-hold me by providing a sandbox, but suggested
frameworks are welcome.

Ending the list with a couple of research/unsolved problems is perfect.

To me, this is the perfect way to learn, almost like feeling your way blind
through a cave until you reach its extents.

~~~
skosch
OpenAI has "Requests for Research" from easy to very difficult:

[https://openai.com/requests-for-research/](https://openai.com/requests-for-research/)

[https://blog.openai.com/requests-for-research-2/](https://blog.openai.com/requests-for-research-2/)

~~~
pj_mukh
Woah. This is a good list, I just kinda assumed it was all unsolved problems
(“request for research”) but it’s not!

------
nightski
I learned two things from this -

1. Apparently modeling is a solved problem. No need for any knowledge of
math/stats, folks, just use the latest Python packages.

2. Data collection is more important than actually knowing how a model works
and even more importantly doesn't work. What happens when he is hired and
can't put together a reliable model?

I think his approach is solid (using projects to learn and being ambitious),
but it feels like trying to run before you can walk.

~~~
metafunctor
Like Jeremy Howard (fast.ai) says, nobody learns baseball by first spending
several years studying how to build bats, optimal playing strategies, or how
to manage a team. Nope, you're given a bat, told where to stand, swing
the bat and try to hit the ball. Suddenly, you're playing baseball.

~~~
majormajor
This feels inaccurate, or misleading because you typically learn baseball when
very young. Going out and swinging wildly without knowing how is hardly
"playing baseball."

When I tried to learn how to golf a _lot_ of time was spent on proper form and
club choice, not just "swing!" I swung as much _without_ a ball in front of me
as with one.

~~~
kkarakk
did your instructor also sit you down and instruct you on how the materials
involved in your club make it possible for you to swing?

the club is just the component you need to launch the ball towards the hole
using your skill. you don't need or want to know what the club is made of
until you think of buying a more expensive club(which might be never)

------
aabhay
The title of this article is misleading. I thought it was a discussion of the
"Cold Start Problem"
([https://en.wikipedia.org/wiki/Cold_start_(computing)](https://en.wikipedia.org/wiki/Cold_start_\(computing\))),
which is a common technical challenge. Instead, it is a recollection of two
stories for how to get a job in ML.

~~~
nostromo
It's a playful title -- using a term from the field in a new way.

~~~
dotancohen
It is using a term from the field in an incorrect way.

~~~
gdy
You've just failed the Turing test :)

~~~
cwilkes
Huh that’s an interesting way of thinking about the Turing Test. State
something that is categorically wrong according to a dictionary lookup (“cold
start problem” means X in machine learning and ML only) and see if the
respondent can be malleable enough to use that in a different context. Course
that’s the definition of a Turing test but in a way I hadn’t thought about.

------
nharada
One tricky thing I've noticed when trying to hire ML people is that it's very
difficult to tell when someone is okay at ML and when someone is great at ML
from their past projects. Because machine learning is just statistics, it is
often resilient to errors in design and implementation. You can mess things up
and still get reasonable (but suboptimal) results.

This means that someone can easily claim "this problem is difficult to learn
for machines" when they fail or claim "we got X% accuracy look how great we
are!" when they do okay. But a really good engineer or scientist would have
succeeded in the same task, or have gotten X+10% accuracy with the right
models, data, or engineering.

~~~
lstmemery
On my resume, I compare my results to either the previously implemented model
or the state-of-the-art for that specific problem. If you bring someone in for
an interview, ask them what the previous best was.
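The baseline comparison described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual workflow from the thread: the function names and the toy data are hypothetical, and the "previous best" here is simply a majority-class predictor standing in for whatever earlier model or published result applies to a real problem.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("labels must be non-empty")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def compare_to_baseline(labels, predictions):
    """Report model accuracy next to the trivial baseline and the gap."""
    base = majority_baseline_accuracy(labels)
    acc = accuracy(labels, predictions)
    return {"baseline": base, "model": acc, "lift": acc - base}


# Hypothetical toy data: 9 negatives, 1 positive, and a "model" that
# always predicts the negative class.
labels = [0] * 9 + [1]
predictions = [0] * 10
report = compare_to_baseline(labels, predictions)
print(report)
```

On imbalanced data like this, a headline accuracy figure can look strong while the lift over the trivial baseline is zero, which is why the "what was the previous best?" question is useful in an interview.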

------
garysieling
This is also a great way to become a better engineer - a lot of jobs are
improving existing systems, so you get a different category of lessons
building something from scratch to completion. Doing a write-up at the end
also teaches you to communicate better.

------
timdellinger
The number of prospective employers that will actually pull up your github /
portfolio project website / etc (let alone read it) is not especially
different from zero.

It's important to have stories to tell during an interview, but getting the
interview is the hard part.

~~~
scrollaway
Getting an interview when you have a substantial Github portfolio is very easy.

Prospective employers not only _do_ actually pull up your github profile, many
of them will find you _because_ of it.

As an aside, writing this makes me realize how successful Github has been at
implementing their motto of "social coding" when none of us had any idea wtf
that could actually mean or look like.

~~~
timdellinger
I'd be interested in the numbers on this. To get a rough number, I typed "data
scientist" "San Francisco, CA" into indeed.com and it's reporting 2,832 job
listings.

So of those 2,832 openings that companies are trying to fill, how many
employers are out actively recruiting candidates based on their github
profiles?

It seems to me that Github / social coding is an interesting mix of networking
(the social kind) and coding (which demonstrates abilities), but like most
networking, it's only one of many ways into the door, and it's unclear how
many of the 2,832 positions will be filled via that particular flavor of
networking. My guess is that it's the minority.

------
ScoutOrgo
I had a similar experience with a personal project. I scraped NFL player/game
stats dating back past 2000 and spent a bunch of time on the data, feature
engineering, and modeling. Although my original goal was to beat Vegas models
(got to even in some metrics, but not enough to cover the juice), it ended up
being the best line on my resume. Every interviewer during my last job search
asked about it, and it was very easy to talk about.

I did make money in the end, but only because I was buying bitcoins to
gamble with in the 2017 season.
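For readers unfamiliar with "the juice": getting to even with the market is not enough, because the bookmaker's margin pushes the break-even win rate above 50%. A rough sketch of that arithmetic follows. The -110 line is an assumption used for illustration (it is a common standard price, but the comment does not say which odds were involved), and the function names are hypothetical.

```python
def break_even_win_rate(american_odds):
    """Win probability needed to break even at the given American odds."""
    if american_odds < 0:
        # Negative odds: risk |odds| to win 100.
        risk, win = -american_odds, 100
    elif american_odds > 0:
        # Positive odds: risk 100 to win odds.
        risk, win = 100, american_odds
    else:
        raise ValueError("American odds cannot be zero")
    return risk / (risk + win)


def expected_profit_per_unit(win_rate, american_odds):
    """Expected profit per 1 unit staked at the given win rate and odds."""
    if american_odds < 0:
        payout = 100 / -american_odds
    elif american_odds > 0:
        payout = american_odds / 100
    else:
        raise ValueError("American odds cannot be zero")
    return win_rate * payout - (1 - win_rate)


# Assumed -110 line, for illustration only.
needed = break_even_win_rate(-110)
print(f"break-even win rate at -110: {needed:.4f}")
print(f"expected profit at a 50% win rate: {expected_profit_per_unit(0.5, -110):.4f}")
```

Under that assumed line, a model that picks winners exactly half the time still loses money on average, which matches the "got to even, but not enough to cover the juice" description.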

~~~
ptd
Serendipity? Or just good planning.

------
munchbunny
There's another important trend in these two examples that generalizes well
beyond just machine learning.

If you show a willingness to scrape resources together to work on difficult
problems, that's scrappier than the majority of your competition for the job.
Startups love that.

Of course actual technical skill matters, but in my experience, the
willingness to reach for aspirational goals is rarer than baseline competence,
so I'll happily interview someone who shows that specific trait.

------
jldugger
> Alex planned to improve his accuracy, of course, but he was hired before he
> got the chance.

Reading these posts I get the impression people prefer portfolio projects over
studying the mathematical fundamentals; "dazzle hiring managers with these
three easy tricks!" Seems like a great strategy for getting hired, but one
wonders how long people using these lifehacks persist in the role.

~~~
joshwcomeau
(not an ML person, take with grain of salt)

I think someone without experience training a model to high-accuracy might
struggle at a startup where they're the first hire in the department, but at a
larger company, presumably there are senior folks to help newcomers along. I
think having motivation is a much better signal, and you need to be motivated
to develop a side-project of significant magnitude.

------
smrtinsert
I had a hunch. I'm sitting here perusing theory after theory for years now. I
need to just sit down with some data and a python ide. It's more fun that way
anyway. This doesn't mean I think I'll get or even want a job, but I'll get a
better sense of what the work is and how to best use it.

------
sheeshkebab
Side projects directly related to the area someone is interested in getting a
job in are very useful, or those that use trendy technologies.

However if it’s in unrelated technology (or no longer trendy), then I’d argue
they should be removed from the profile altogether. These might raise
unnecessary questions and trigger biases.

------
Jerry2
> _But it was similar enough that they quickly asked Ron to make his repo
> private._

How is this infringement? Why did he oblige? Unless he got paid by them, he
should not have complied with their request.

~~~
max76
Ron wanted a job with the organization that asked him to make his repo
private. Cooperating is the best strategy given the goal. Ron can always make
his repo public later.

------
ziont
Seems to me more of a case where companies no longer train grads, and instead
tell them to work on a proof of concept on solving a hard problem without
paying for it, a sad state of affairs.

~~~
simonw
I disagree.

I think it is absolutely immoral (and dumb from an IP risk point of view) for
companies to try to get interviewees to solve real problems for free.

This is different though: this is about potential candidates finding creative
ways to demonstrate their skills.

The companies didn't even ask people to do this: these people chose to take on
projects that would demonstrate their skills in a new space in which
they did not yet have commercial experience. I think that's commendable and a
very smart strategy.

~~~
munk-a
This isn't different though. This article was written as "How to get into ML"
and is basically selling the idea that getting into ML requires (or is aided
by) doing some independent open source work to bulk up a profile.

A very competent developer who is interested in doing some ML work may have
just recently read this article and gotten to work on an ML project because he
feels that not doing so will hurt his chances of getting a job.

As a note, I have no idea what the right solution to this problem is. It is
good to confirm a candidate's knowledge (I prefer creative ways similar to the
parent) and part of that knowledge certainly can include open source work, but
going down this road too much leads to people feeling obligated to make open
source work to put on their resume and even to people faking open source work
to try and land a job.

~~~
jimbokun
"but going down this road too much leads to people feeling obligated to make
open source work to put on their resume"

...which makes the world a better place. Why do you object to people writing
open source code?

------
hnuser355
I don’t know why everyone cares about machine learning so much. Frankly there
seem to be so many prerequisites to knowing what goes on, when I imagine a
software developer could just get better at java and then learn about parallel
code or something to get lots of money for less effort. I’m not in software so
I don’t know if any of that is actually true.

~~~
t0astbread
There are lots of ways to make money easier and faster. I guess this is for
people who are genuinely interested in the subject.

