Ask HN: Is it a waste of time to teach yourself data science without a degree?
182 points by thewarrior on May 5, 2017 | 138 comments
I see a lot of people teaching themselves data science and machine learning but it seems that in the real world you won't be allowed anywhere near such a position without having a degree in the subject.

This is opposed to regular programming gigs where you can get work based on a portfolio.

Also there are efforts to commoditize common methods and algorithms by wrapping them up in APIs and SDKs.

So is it a waste of time to learn it on the side with the hope of getting a data science job?




(copied from answer to another similar question.)

Companies are looking for what you as a candidate can do for them.

Self-study or taking a class signals some level of "I tried to learn this thing." So that's a start.

Even better is "I built X", where X is obviously based on skill you learned. In that case you can omit the class, because you have proof of learning, not just of trying to learn.

Even better is "I provided business value V to my employer by building X." Because now you're showing how this skill is useful to someone else. So using skill at work is another thing to try.

Ideally, you write the above, but emphasize V (or choose among multiple things you can list) in a way that suggests you can meet the needs of the particular company you're applying to.

So there's having the skill (which is good), but there's also how you present it to show it will provide value (also important).

More on the contrast between having engineering skills and marketing yourself here: https://codewithoutrules.com/2017/01/19/specialist-vs-genera...


While this sounds rational, I don't think that's how it works in most cases. Those who do the hiring usually prefer covering themselves by hiring someone with a degree rather than having to explain how they hired you because you made millions for someone else in the past. Also, while we understand the hacker culture, many people only understand "degree > job"; they can't comprehend why you don't have a degree if you are so good.


> While this sounds rational, I don't think that's how it works in most cases. Those who do the hiring usually prefer covering themselves by hiring someone with a degree rather than having to explain how they hired you because you made millions for someone else in the past.

Definitely not my experience. I can count on one hand the number of employers who didn't want to proceed because of a lack of degree.


By that, do you mean <=5 of them, or given that your hand has 5 bits, <32 of them?


Given the rotating power of your average radius and ulna, isn't it possible to go all the way to <64 on single hand binary? (whole hand binary)


65 if you include sticking up the middle finger.


For data science specifically?

Remember, this question isn't about general CS.


The original comment sounded like general advice not specific to data science. I was thinking about it in the context of general CS/programming. But even in data science specifically, in my experience the focus in hiring is on actual skills rather than credentials/qualifications. I recently (a few weeks ago) interviewed with a data science company for a data science role as someone with a software engineering background. It was a challenging interview, but the subject of my degree never even came up and I got the job.


Data science is absolutely based on degree level. Software engineering, maybe 50% of people care about your degree.


I'm a data scientist at a research institution and I have no college degree. I'm self-taught and was originally hired as a software engineer on the basis of my projects and work experience.

> Remember, this question isn't about general CS.

What's your perception of data science vs CS, especially with respect to hiring?


Can you please elaborate more on the kind of work you do?


My impression is that there is huge demand for data scientists (e.g. a friend who is trying to hire someone and can't find anyone). The Open Data Science Conference is running right now in Boston, with 3600 attendees... it seems likely that if you can demonstrate skill, someone somewhere will hire you.


Depends a lot on the company. I know many successful programmers without a CS degree, for example.


That's exactly the problem, though: they're, I assume, programmers, not "data scientists". The notion that you can learn to program on your own and have the self-education result in outcomes comparable to (or better than) attending classes is widely accepted. I don't see the same attitude towards data scientists, which is what I think the OP is asking about.


This is great; however, the biggest roadblock might be automated HR application scanning. As a victim of this, I'm back in school to finish the degree I started 17 years ago.


How is it going? How do you do it?

I'm in a similar position right now, but every time I think about being subject to the bureaucracy again I shudder and choose to write interesting blog posts instead... It's not about the costs or the time; I feel like my dignity as a professional (and possibly even as a human being) would be threatened. It could be because of the way higher education is set up in my country, though.


It's going well. It's tough working full time, with kids, and taking accelerated full-time courses. I have almost no free time whatsoever. I'm currently in the B.S. Computer Science program at Regis University.


> my dignity as a professional would be threatened... because of the way higher ed is set up in my country.

Would you mind elaborating on this?


In short, public universities and colleges have more than 50 years of history operating in an almost communist, authoritarian state. Private institutions are still widely considered to be worse than public ones, and the latter still operate on the presumption that students should be happy just to be accepted (and ripped off, sometimes unlawfully, as public higher education is supposed to be free). In reality, when we chose to go all democratic in '89, the amount of money sent to the institutions diminished, while any change in their mode of operation was resisted for 25 years and counting, so things have only gotten worse since then.

It's hard to imagine without experiencing it firsthand. How could it be that big of a deal? But it is: you're treated not as a customer, not as a partner, but as a lowly supplicant in all your interactions with the institutions. You're expected to just "put up with it for a while" (like 3 to 5 years...) and drink some vodka if you get frustrated a bit too much.


Precisely. When giving career advice, remember to put your HR hat on. Most of the front lines receiving your resume don't think very much like an engineer. That's why names of languages make for such popular keywords.


May I ask which degree was that?


I'm working on B.S. in Computer Science


Do you take the classes at night and work full time during the day? That sounds tough considering you also have kids. Congratulations!


I think with DS more often than not you're being hired into an existing operation rather than being the first hire. Your hiring manager will likely be a DS guy with a higher degree who'll hardly believe that his PhD was a waste of time. Like finds like, people like me, and all that ...


I have this same non-background and work on the proverbial team of mostly PhDs. Short answer is yes, you can do it. Long answer is that you have to be really, really good to compensate, and getting to that point is absolutely exhausting. It's not about just going through a couple of ML courses on Coursera. You need to understand statistics, CS, and ML at a really deep level, and that means being good at applied math too. I was lucky to come out of physics and have a solid applied math background anyway, giving me a few years' head start on that self-study.

If you need structure to go through a few years of coursework on your own, you should go for the degree. If you just want to learn how to put pieces together and not learn how/why they work under the hood, you should opt for something else.

As with most questions about going nontraditional routes, you have to be really good to compensate, and getting really good is constant exhausting work.


I don't have a degree myself, and about a third of the people I hire don't have one either. Why? Because I don't put any weight on degrees.

I'd say if you don't want to work for a large, respected company first, it's a waste of time. Your degree is your entry ticket to your first job, not more. Later on, you can even work at Google if you want - just make a great product and get acquihired.

Three tips on what you should do instead:

1) BUILD something and show off your skills. Like, continuously. Always have your own challenges, do something about them, put your code online on GitHub. Host it so it can be seen and played with. Work towards a goal and learn what you need to learn on the side.

2) Focus on applying to companies not listing a degree in their job ad. You'll see there are quite a lot of them.

3) Don't focus on your lack of a degree in any interviews. Don't deny it, but just don't make it seem like a big deal. Often, people won't even ask.


> I'd say if you don't want to work for a large, respected company first, it's a waste of time.

I wouldn't necessarily say that.

I don't have a degree (well, an Associates that ain't worth much). I've been employed as a software developer for 25+ years now (since I was 18 years old).

I wish I had pursued a degree, though.

Back then, when it came to my education, I was pretty lazy - at least when it came to more structured learning. I liked to pursue stuff on my own, though, at my own pace. I've done well in that manner.

In 2011, I "discovered" the idea of a MOOC: I took Andrew Ng's "ML Class" and successfully completed it. That led me to Udacity's CS373 course in 2012, which I also completed successfully.

That isn't to say I didn't struggle with both of those: I had no experience with probabilities and stats, and I hadn't touched linear algebra since high school. But with the help of resources on the internet and elsewhere (along with help from others on the internet, and fellow classmates), I managed to complete both successfully, and I learned a lot in the process.

Last year, I started Udacity's Self-Driving Car Engineer Nanodegree. Today, I'm working on term 2. We're dealing with localization - basically learning SLAM, which was covered in the CS373 course, too. Prior to that, we learned how Kalman filters (standard, EKF, and UKF) work to integrate sensor data. In the first term, as part of one of the projects, I implemented NVIDIA's End-to-End CNN to drive a virtual car around a track.
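For readers who haven't met the Kalman filters mentioned above, here is a minimal one-dimensional sketch. It assumes a scalar state that follows a random walk, measurements corrupted by Gaussian noise of known variance, and illustrative parameter values; the function name `kalman_1d` is hypothetical. The EKF and UKF variants keep the same predict/update loop but linearize or sample a nonlinear model instead.

```python
def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1.0):
    """Estimate a scalar state from noisy measurements (illustrative sketch).

    Assumed model: x_k = x_{k-1} + w,  w ~ N(0, process_var)
                   z_k = x_k + v,      v ~ N(0, meas_var)
    Returns the list of estimates and the final estimate variance.
    """
    x, p = x0, p0  # current state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is expected to stay put, but uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement, weighted by the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates, p


if __name__ == "__main__":
    # Hypothetical noisy readings of a quantity whose true value is about 5.
    readings = [4.9, 5.2, 5.1, 4.8, 5.0, 5.3, 4.95]
    est, var = kalman_1d(readings, meas_var=0.1, process_var=1e-4, x0=0.0, p0=100.0)
    print(est[-1], var)
```

A large initial variance `p0` tells the filter to trust the first measurement almost completely; after that, the gain shrinks as the estimate's variance drops below the measurement noise.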

All of these experiences, and others outside of all this, have taught me that perhaps I cut myself short by not pursuing a degree when I was younger. My current plan is, once I finish this Udacity course, to get my BS online, then work toward an MS in comp sci. It isn't a matter of "I think I can do it" - I know I can do it. It's more a matter of absolutely proving it, and likely learning a lot more along the way.

I don't think a degree is a waste of time, unless you intend to never learn more stuff as you "grow older". If your only goal is to "make money" and all that, maybe it is. To me, though, had I gotten my degree back then, I believe I would be much, much further along today. I can't change that, though - so all I can do is move forward.


>Last year, I started Udacity's Self-Driving Car Engineer Nanodegree

Same. I am also doing the course and am just finishing term 1 (in the Feb cohort), although I have a degree, taken around 23 years back. But I think it's the always-learning attitude that's more important.

As of now, I'm not exactly sure how I can utilize what I've learned in the CarND nanodegree, but I'm enjoying learning about stuff all the same. I had never coded in Python before I started taking Udacity courses, so I learnt Python along the way, and that's just one of several things.

For me, I find I am not satisfied with knowing some things only at a high level. Until I started taking these courses, I'd always get confused between the terms AI/ML/DL, for example, and other buzzwords would be learnt in some vague way and soon forgotten. Now the minimum value-add of this is that I can distinguish BS/hype from substance when I read articles on ML etc. I have also advised some friends on what it can and can't do. That's a big achievement by itself to me. And lastly, although I am pursuing my own startup and so am not actively looking for a job, if some opportunity arises, I will seriously consider it. Because, heck, why not?


> I wish I had pursued a degree, though.

Why? I'm in a similar boat, but don't really regret it or wish I had done things differently.


Good for you! Your attitude is inspiring.


A good programmer with even just a high-level overview of ML and stats concepts would be an incredibly valuable asset to a data science team. Most ML people are academics who tend not to have good software engineering skills; finding people who have mastered both domains is really hard.

Also, to add to that: most of the work in ML is feature engineering, data cleaning, testing, and building pipelines, all of which require a good software engineering background.
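To make the "data cleaning" part concrete, here is a small stdlib-only sketch of one cleaning step. It assumes raw records arrive as dicts of strings (say, from a CSV export); the field names `customer_id`, `age`, and `signup_date`, the two accepted date formats, and the plausibility rule for ages are all hypothetical choices for illustration, not a standard.

```python
from datetime import datetime


def clean_record(raw):
    """Normalize one raw record; return None if it cannot be salvaged.

    Field names and rules here are hypothetical examples.
    """
    cid = raw.get("customer_id", "").strip()
    if not cid:
        return None  # a row without an ID cannot be joined to anything else
    age_text = raw.get("age", "").strip()
    age = int(age_text) if age_text.isdigit() else None
    if age is not None and not (0 < age < 120):
        age = None  # treat implausible ages as missing rather than trusting them
    date_text = raw.get("signup_date", "").strip()
    signup = None
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):  # accept two common input formats
        try:
            signup = datetime.strptime(date_text, fmt).date()
            break
        except ValueError:
            continue
    return {"customer_id": cid.upper(), "age": age, "signup_date": signup}


def clean_all(rows):
    """Clean every row, de-duplicate on the normalized ID, and count rejects."""
    cleaned, seen, rejected = [], set(), 0
    for raw in rows:
        rec = clean_record(raw)
        if rec is None or rec["customer_id"] in seen:
            rejected += 1
            continue
        seen.add(rec["customer_id"])
        cleaned.append(rec)
    return cleaned, rejected


if __name__ == "__main__":
    rows = [
        {"customer_id": " a1 ", "age": "34", "signup_date": "2017-05-05"},
        {"customer_id": "A1", "age": "35", "signup_date": "05/05/2017"},  # duplicate ID
        {"customer_id": "", "age": "20", "signup_date": "2017-01-01"},  # missing ID
        {"customer_id": "b2", "age": "999", "signup_date": "not a date"},
    ]
    print(clean_all(rows))
```

Counting rejected rows instead of dropping them silently is a deliberate choice here: in a real pipeline you would usually log or quarantine them so the data loss is visible.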


I work for a technical statistical team in the financial world. We've been hiring PhDs to my team lately, which has meant exactly what this comment points out.

I do a lot of the grunt work of getting the data sourced, cleaned and ready and am called the 'data wizard' and other such annoying names.

What's frustrating is that I can run the last lines of code and read and understand the output of the last step, but, as the original question suggested, management would prefer someone with a PhD or master's in customer analytics to be the expert on the data output.


It sucks that you are doing the boring data cleansing job while those people with degrees do the more interesting, higher-level analysis.


That's actually quite common, especially in the financial industry. I work at a financial company and do the same line of work, building data management tools/ETL systems that provide data to researchers on demand. There are many interesting challenges but a lot of grunt work too. Many of my teammates have CFA certs and math + CS degrees (though not PhDs), so the majority of them are definitely capable of performing high-level analysis/financial modeling. However, only a few are needed to do research; the rest implement and maintain data infrastructure, which is quite a lot of work and needs more headcount.


Data cleansing + pipelining requires a certain degree of cleverness.


Sounds more like a "data monkey" (no offense)


Data monkey or data engineer? #loadedlanguage


Data Sanitation Engineering


I'm currently working through Andrew Ng's ML course on Coursera. It's definitely high-level (though a fantastic overview of the fundamentals), and I plan to take something a little more mathematically rigorous at some point, but I'll probably want to take some refresher courses in Linear Algebra and calc before doing so.

I'm not trying to learn about ML for purposes of employment. It's somewhat relevant to my current job, and I may have some interest in using it in my personal projects. But mostly I'm just learning for the 'fun' of it :)

I don't have the time, money, or inclination to pursue a MS in data science atm (My current 'formal' education consists of a BS in Comp Sci and an MBA), but I may go back to school when the kids are grown, more for personal edification than anything else, however. A big shift in career, from software engineer to 'data scientist (or whatever they call it)' is probably not possible at my advanced age (37).


> I'm currently working through Andrew Ng's ML course on Coursera. It's definitely high-level

I agree that it is "high level" and glosses over (purposefully) the nitty-gritty details of the "black boxes" for the most part. I say this as someone who took the first incarnation of the course, which was known as "ML Class" in the fall of 2011, before Coursera came about.

Despite it being high-level, though - this is what one of my "classmates" was able to create, about halfway or so thru the course:

http://blog.davidsingleton.org/nnrccar/

In 2012, I completed Udacity's CS373 course (https://www.udacity.com/course/artificial-intelligence-for-r...).

Today, I'm in the second term of Udacity's Self-Driving Car Engineer Nanodegree (the lesson I'm on is actually part of CS373, so it's a kind of review lesson for me - heh). I'm having a great time gaining a more in-depth understanding of self-driving vehicles. Much of the learning can be applied to other areas of ML as well (learning how to use and abuse TensorFlow and Keras, for instance).

> A big shift in career, from software engineer to 'data scientist (or whatever they call it)' is probably not possible at my advanced age (37).

Don't let that stop ya! My plan after finishing this Udacity course is to actually work toward getting my BS and maybe MS in Comp Sci. By that time, I'll be well into my 44th year of age. I don't know if any of this will lead to a different direction in my career, but that isn't something I am really worried about or planning for. I'm currently happy with where my career is; it pays the bills and allows for some fun, too. But if it should lead in another direction, so be it! I figure having this knowledge can't hurt me as an employment candidate, and will likely be seen as a plus. Worst-case scenario, it will make my hobbyist robotics projects more interesting.

I figure I have another 20 or more years in me doing software development (assuming it remains a career option, of course); I personally have met more than a few other developers that age or older who are still making a living at it. So I'm not ruling out the possibility of a lateral move toward something involving my knowledge of machine learning.

Good luck with your studies!


Really? I don't know if I'm good, but I think I'm at least a "decent" programmer and I have a solid grasp of ML and Stats concepts and I've had absolutely no luck getting interviews, much less call backs for data science jobs.


I'd try to apply as Data Engineer, you'll be close to the data science work and can more easily pivot into there if you want.


Heck, before my first web dev job, it was VERY difficult to get a foot in the door. Now I feel like getting a restraining order on recruiters.


If only you were willing to work in Canada :P.


Have you seen the job listings for data scientists in Canada? They're paltry even here in Toronto compared to parts of the U.S. like SF, which have a fraction of the population. Last time I checked, when I searched for Data Scientist in SF there were 3000+ listings, while in Toronto there were maybe a little over a hundred.

And of the few job listings I've seen, most have high standards (PhD or min. Masters, x years of experience) with old companies (banks, car companies).

What's funny is that a lot of people in my circle in Canada are actually doing work for companies outside of the country (U.S., China...)


I was speaking in the context of the company I work for. You're right, but we're getting there.


You can get a job with a portfolio in data science. Just go to Kaggle and beat everybody in all the competitions. That is worth more than a degree. Companies will try to reach you if you can do it.

But, honestly, I think it is very difficult to learn data science by yourself. Having someone with experience teach you makes a huge difference. Data science is different from programming: in programming you can see step by step what is happening, while in data science, most of the time, it either works or it doesn't, and you only find out after your algorithm has run through all the data for at least an hour. It is really hard to learn this way; you need hints that only someone with experience can give you. Moreover, you can make a lot of mistakes without knowing it. For example, when cleaning a dataset, people often use the whole dataset to fill gaps and then split it into training and test sets. It feels right, but it is a huge mistake that invalidates the whole experiment, because you are using information from the test set to fill the gaps in the training set.
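The imputation mistake described above can be shown with a minimal stdlib sketch, assuming simple mean imputation of a single numeric column. The correct order is: split first, compute the fill value from the training rows only, then reuse that same value on the test rows. The helper names (`split`, `fit_mean`, `impute`) and the data are illustrative, not from any particular library.

```python
import random


def split(rows, test_fraction, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def fit_mean(values):
    """Mean of the non-missing values (None marks a gap)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


def impute(values, fill):
    """Replace gaps with a fill value that was computed beforehand."""
    return [fill if v is None else v for v in values]


# Hypothetical column with gaps and one outlier.
data = [1.0, 2.0, None, 4.0, 100.0, None, 3.0, 5.0, None, 2.0]

# Correct: split first, then learn the fill value from the training rows only.
train, test = split(data, test_fraction=0.3)
train_mean = fit_mean(train)              # uses no information from the test rows
train_filled = impute(train, train_mean)
test_filled = impute(test, train_mean)    # reuse the *training* statistic

# Leaky: a fill value computed on the whole dataset before splitting lets
# test-set values (e.g. the outlier 100.0, if it lands in the test split)
# influence the training data.
leaky_mean = fit_mean(data)
```

The same rule applies to any statistic learned from data (scaling factors, vocabularies, category encodings): fit it on the training split and only apply it to the test split.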


It depends on how you define "data science".

If you are like AWS and say that using logistic regression is machine learning, then yes, you can teach yourself data science. Learn SQL, read a couple of books on logistic regression, use some open data for building a couple of models. There are many companies where you can have a decent job and an easy living with SQL and logistic regression on your tool belt.
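As a rough illustration of the "SQL plus logistic regression" tool belt, here is a minimal sketch that pulls features out of a database with a SQL query and fits a logistic regression by plain batch gradient descent on log loss. Everything about the data is hypothetical: the in-memory SQLite `customers` table, its `tickets`/`months`/`churned` columns, and the rows are invented for illustration, and in practice you would normally use a library solver rather than this hand-rolled loop.

```python
import math
import sqlite3


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w . x + b) by batch gradient descent on log loss."""
    n_features = len(xs[0])
    w, b, m = [0.0] * n_features, 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss w.r.t. the score is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / m for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical table: did a customer churn, given support tickets and months active?
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (tickets INTEGER, months INTEGER, churned INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(0, 24, 0), (1, 18, 0), (0, 30, 0), (2, 12, 0),
     (5, 3, 1), (4, 2, 1), (6, 5, 1), (3, 4, 1)],
)
rows = conn.execute("SELECT tickets, months, churned FROM customers").fetchall()
xs = [(t / 10.0, mo / 30.0) for t, mo, _ in rows]  # scale features to roughly [0, 1]
ys = [c for _, _, c in rows]
w, b = fit_logistic(xs, ys)
```

Scaling the features keeps a fixed learning rate stable; the fitted weights can then be read directly as "how much each feature pushes the log-odds of churn", which is a large part of why this simple model is so popular in business settings.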

If you say that data science starts with automating stock trading or building the intelligence of self-driving cars, then no, you can not teach yourself data science. You will need at least one degree. Or more.


> It depends on how you define "data science".

I think the widely accepted definition is "Statistics, but on a Mac"


Shots fired???

I'm new to the startup community, i.e. still in school but excited about startups. Is there a general aversion to Windows, and why is that?


TLDR;

1. Start-ups love OSS. OSS loves NIX.
2. Macs just work (mostly).
3. Cult of Apple.

1. OSS is generally free. Windows software, esp. on the server side, tends not to be. For a start-up it means you might need to invest some sweat but you can spend your cash elsewhere (typically on hires or feeding yourself). OS X being a NIX allows easier porting of OSS than Windows.

2. Apple tech has a reputation for "just working" and continuing to work. Windows is still perceived to need a spring cleaning to reinstall it every year or so to keep it purring. Apple being a closed ecosystem from end-to-end doesn't have to support as much random stuff as MS. It keeps the problem space narrow and presumably that results in higher reliability.

3. The cult of Apple. Apple has a brand that is perceived as creative, fun, enlightened, whatever. MS is viewed as big enterprise, which is often the Goliath that start-ups are looking to slay.

--

Anecdotally, at one company where I was an early "many hats" hire, I standardised on Macs because it meant less support effort and easier licensing.

If someone's Mac was acting up or fell down the stairs, we could swap the drive into another machine and not faff with driver setup. If someone left the company, we could easily transfer the licenses, which isn't always as straightforward with Windows as with OS X.

One drawback is that we had one developer who preferred Windows. He was probably less productive than he could've been as a result. I would generally advocate to allow developers to pick their hardware when they start and take it when they leave. Once the company was big enough for a full-time IT support staff we diversified but it's a tradeoff either way.


Thanks for the reply. How does this apply if you're developing software that is more than likely going to run in a Windows environment, e.g. HoloLens, or on a gaming rig? I would imagine developing for that environment would require access to it fairly regularly.


It doesn't if you're in the AR/VR or video game space; that tends to be Windows-centric. Server-side stuff might be Linux, but as you've alluded, the client development will target Windows (e.g. the HoloLens SDK requires Windows/MS Studio).

The only suggestion I'd make is to learn to use the command line with your tools (where possible) and practise writing scripts to optimise your workflow. That will make the transition to a NIX a little less foreign.


I have a class next semester specifically on the command line and NIX that I'm pretty excited about. When you say scripts, do you specifically mean bash scripts, or more generally Python scripts etc.?


Maybe because it's not unix-like. But I know many colleagues using Windows so I don't think there's an issue with that.


I saw one guy on Google Entrepreneur who eliminated potential hires based on whether they used Windows. He was in the web development space, so that could have been part of it, but still, it was really weird.


+1 to this. SQL + Logistic regression creates millions of dollars in business value every year. Some of that could be yours!


"If you say that data science starts with automating stock trading or building the intelligence of self-driving cars, then no, you can not teach yourself data science. You will need at least one degree. Or more."

Isn't it a little presumptive telling others they can't teach themselves something? Or do you mean to imply that they can't get a job without a degree? Those are different things.


> no, you can not teach yourself data science. You will need at least one degree. Or more.

Why not? What is it that prevents anyone from learning anything without getting a degree? I disagree with your statement, I think it might be harder, but I don't think anyone "cannot teach themselves X".


Business fundamentals rely on norms, whether we like it or not. As a society, we rely on credentialing for complex matters. Sure, while anyone can absolutely learn to build a self driving car without a degree, here are two very likely paths as a result:

1. A company hires this data scientist, but regulators are skeptical of the efficacy of his/her implementation.
2. Companies adopt the notion that everyone can be a data scientist and build self-driving cars, and the cars turn out to be very error-prone, causing harm.

Businesses have to set a bar somewhere to ensure their expected return on a data scientist is positive. Just as pharma execs and investors vet their scientists, highly complex data science positions will require convincing the players (investors, regulators, etc.) that your guy is legit. A pharma company would never endorse Walter White as the scientist responsible for delivering its drugs.


It's quite possible that you can teach yourself a lot of the relevant background. If you read David Barber's book cover to cover and do all the problems, that's a machine learning master's course covered.

But the problem is that you're competing against job applicants who already have a degree in machine learning. So it will often be the case that you're lucky to make it past an initial HR screen. If they want to interview ten people, it's easier for them to pick ten who already have the right piece of paper.


Hah, I couldn't read that book when I was taking his class, it's not likely I'll start now.

However, to your point, "you can't teach yourself" and "you can't get a job just by reading a book" aren't the same thing. I'm not so sure you can't get a job by teaching yourself machine learning and making your own self-driving toy car or some other fun project, though.


If someone can't understand Barber, I don't think they've taught themselves machine learning. I think they've taught themselves to plug data into an off-the-shelf model.


>What is it that prevents anyone from learning anything without getting a degree?

The people with advanced degrees want to protect the value of their investment.

As others have noted, the reality is you can generate a ton of value to business by "learned at home data science". Will you be doing cutting edge ML research? Of course not. But 99.9% of everyday problems can be solved with simple tools.


I may be biased because I am from a country where the first degree was for free and we did not have to pay tuition fee. At that time "I can teach myself X" meant "I can teach myself X so why not have a paper about it?". A lot of us are on the job market and you compete with us.


I agree with parts of both statements. On one hand, if you're looking to seriously get into data science then it's going to be hard to even get interviews with companies that are looking for real data scientists without those pieces of paper (diplomas). On the other hand, I agree that with enough dedication and effort you can teach yourself anything (after all, university is largely just you teaching yourself with the help of a schedule set by an institution).


My company is hiring data scientists all the time. Nobody looks at the education section of the resume of the candidates we get. We look at what the person has actually done, and then we interview to make sure the resume wasn't filled with lies about that experience.


That has been my experience as well. Not a single person has ever asked me "what's your degree", but everyone looks at my projects and past work.


I don't want to design the next self driving car.

Just design models that deliver business value.


Self-driving cars like comma.ai, built by George Hotz (a college dropout)?

(And it's one of the best self-driving software efforts out there.)

Of course you need to study (a lot), but a degree is not required.


None of the data scientists I know actually have a degree in data science. They tend to come from a physics, math, or statistics background and have picked up the data science bits on the side.

Also, many jobs that aren't data science jobs per se offer many opportunities to do data-science-type things. Get a job at a company that works with a type of data you find interesting, and that perhaps doesn't have a dedicated in-house data scientist, and every time an interesting data-related challenge shows up, just say "I have a good idea of how we can approach this" (assuming you actually do). Next thing you know, people will be coming to you with their data science problems, and before you know it you have several years of data science experience on your CV.


>> in the real world you won't be allowed anywhere near such a position without having a degree ...

Yes, most likely they won't hire you for a "Data Scientist" position, but there are related jobs out there you can be qualified for if you have programming skills and understand DS stuff to some degree.

I've seen setups where a PhD with a "scientist" in his title would act as an architect/co-team lead with a senior engineer running a team of developers.

Someone has to implement DS' ideas after all and unless we're talking a really small team (or a jack of all trades DS) where DS has to write all the code himself - there is a need for developers with "some DS background" in those situations.


> it seems that in the real world you won't be allowed anywhere near such a position without having a degree in the subject.

I don't have a degree but work as a data scientist at a research institution. I'm self-taught and was originally hired as a software engineer on the basis of my projects and work experience.

It's true that you have to convincingly make the case for your competence, but a bachelor's degree is really at best a certificate of minimal competency in a subject. Its signalling[1] value quickly gets swamped out by actual work experience where you're continually learning and improving. So there's a great hack: just do actual good work and put it on your resume. Your portfolio of work should convey your competence so well that having a degree wouldn't really add anything. (So you can skip the degree, but you'll still have to put in the work.)

Remember that any healthy organization wants to hire for competence at job duties. If some company rejects you for not having a degree because the hiring manager has to cover their ass to upper management instead of optimizing for getting work done, you should really just be glad that you dodged a bullet.

I think what's most important is to keep growing and learning. Pg had it right: "If you're worried that your current job is rotting your brain, it probably is"[2].

1. https://en.wikipedia.org/wiki/Signalling_(economics)

2. http://www.paulgraham.com/gh.html


Writing a full reply since I don't agree with much of the advice given.

I've worked around/in data science teams at a large BigCo and I think that you're far overestimating the bar here. There aren't enough people who can write data pipeline code (SQL/Shell/etc.), much less implement and intelligently explain statistical/ML models. Also, the average decision maker here does not understand the difference between 'created model in Pandas' and 'created model with Amazon's ML API'.

The modal background of data scientists in industry is closer to 'Econ BA + knows Python' than 'Artificial Intelligence PhD'. Moreover, the former will still enjoy a remunerative career if (s)he's sufficiently savvy about identifying problems and showing off how they can be solved with technology.

There may be a point in time when companies can't get a return by throwing math-savvy programmers at a problem, but that will be long after you and I have passed from the scene.


I don't think it's a waste of time. Even if you can't straight-up get a pure data science job, you can still benefit from having this background:

(1) You could focus on building data processing platforms using, e.g., Spark. This will get you very close to the data science folks and you could probably end up doing some interdisciplinary work if you wanted it and demonstrated enough interest and competence. At the very least, people who can build highly scalable data processing systems and who also have a reasonable understanding of how the data is being used are very valuable.

(2) There are lots of companies out there that don't engage in data science/machine learning at all. You could join such a company and represent the push towards developing a data science or ML division or team. If you're successful this could also get you major credit as a manager as well as putting you very close to real-world data scientists and ML projects.


A lot of employers still use degrees as a rough proxy for ability and dedication. This may be especially prevalent in data science since the field itself has a lot of Master's/PhD holders, which will tend to bias the hiring process towards viewing degrees as a strong positive signal.

With that said, a lot of companies hiring for data science roles fall into the category of software startups -- larger companies like Google or Facebook are looking for specialists who tend to hold degrees. But at smaller companies, you can be more of a generalist and there, the old mantra of "show me what you've built" often applies. You could build out a data science career if you found just the right company.

By no means is it easy, but I wouldn't say it's a waste of your time (unless you have some incredible opportunity cost you're using up).

If you were to go about doing it, I found this blog post that can help you with your plan of attack: https://www.springboard.com/blog/learn-data-science-without-...


As Mike Acton (Data Oriented Design Guru) once said in an interview, "I don't care what you learned in school... I care about what you learned of your own volition" [paraphrased from 1].

It never hurts to learn new things. Another HN poster suggested this channel for beefing up on linear algebra, and I absolutely love it [2].

[1] https://youtu.be/qWJpI2adCcs?t=58m

[2] https://www.youtube.com/playlist?list=PLlXfTHzgMRUKXD88IdzS1...


To hit a target, first you have to see it clearly. The term "Data Science" covers a broad collection of jobs, from statistician to machine learning/pattern recognition/AI expert to DBA to business analyst to visualization/animation expert to cloud/cluster/Hadoop expert to general data wrangler.

The skills required for each DS role vary a lot. I wouldn't expect a cloud expert to have learned about the Hadoop stack or HPC workflows in school, at least not to a useful degree. The same goes for DBA or business analyst or data wrangler.

But statistics and ML lie at the other end of the spectrum. These roles require a hierarchy of formal skills that are rarely mastered outside of college. They're expected to keep up with the research literature or formal techniques, which almost always requires the math skills of an engineer or mathematician.

Remember, HR everywhere is technically clueless. If management doesn't tell them the precise set of skills needed for the job, they'll minimize risk and ask for more expertise and experience than is needed -- usually in the form of excess degrees or prestige or buzzwords. The best cure for this is to bypass HR and go straight to a technical manager who knows what s/he wants. That's hardest at large corporations, who tend to outsource their HR needs to the lowest bidder.

At a smaller company, a lack of degree will matter less. If you can convince them you know what they need RIGHT NOW and can learn future material quickly, that's what they want to hear. (That's probably what the bosses of the startup did).

Or if you're targeting a specific project, then if you can show (e.g. via Kaggle or an online portfolio) that you clearly have the needed skills and you're not just a script kiddie, that speaks a lot louder than a mere degree (especially if it's over a decade old).


I hold a degree in mathematics. Small-minded HR drones have told me I'm not qualified to do programming since I'm not formally trained in computer science. I have been doing this since I was a kid.

Don't listen to them. Every professional will at some point in their career be judged by those less capable.


I teach data engineering and data science. I've taught at hundreds of companies. Yes, there are self-taught people doing data science in the real world. They're few and far between, but they are out there.

If you're coming from a programming background, I'd suggest becoming a Data Engineer with the goal of becoming a Data Scientist. I've had several students do that. They were general programmers who learned Big Data/data engineering and eventually became more technical Data Scientists. You can start to learn more about the whys here: http://www.jesse-anderson.com/2017/03/what-happens-when-you-....


Teaching yourself anything is definitely not a waste of time.

Don't get so caught up in the "degree."

I've met individuals with graduate degrees in computer science (I know OP asked about data science, but the overall point here applies to any field) who didn't hold a candle to self-taught developers. If you're actually passionate about and interested in something, you will become extremely well-versed in it. On the other hand, if you're not excited about data science, a degree will probably benefit you more than going without one, since it will force you to learn the topic.

In a nutshell, it's up to you to make yourself valuable and present that value to the world - a degree is just a shortcut for recruiters to filter on, but you can skip recruiters and talk to anyone in any company.


My experience has been that when it comes to the job market _knowing_ stuff is extremely valuable, but _having learnt_ stuff isn't very valuable, unless you have an excellent degree from a top tier university. What this implies is that you should select online study options based on how they contribute to your actual knowledge, rather than how they will appear to employers (in most cases, they will appear like nothing). Once you know enough, build a portfolio of projects to show what you know and look for a job based on that - if you really know how to get stuff done in the field you'll have many options to choose from.


Some actual data: in 2012, 70% of employed data scientists had a Master's degree or more

http://cdn.oreillystatic.com/oreilly/radarreport/06369200290...

So no, not futile.


Does anyone have experience with this scenario, having actually completed Udacity nanodegrees for Machine Learning or Data Science or AI?

Their programs express job placement as a perk of graduation.

https://www.udacity.com/nanodegree

Educating for the "jobs of the future" is one of Udacity's goals, data scientist being one of those jobs.


I'm half-way through their Deep Learning Foundations nanodegree and I'm generally happy with it.

Note that only selected nanodegrees come with the job placement guarantee, and that the guarantee seems to essentially mean a refund, if you fail to find a job within 6 months. https://www.udacity.com/nanodegree/plus

As a sidenote - the deepest (meta) learning I've gotten is that paying for the course made me much more engaged and determined to invest time in understanding the material and completing assignments.


I started the AI engineer nanodegree in April and so far have been happy (https://www.udacity.com/course/artificial-intelligence-nanod...)

The first three months are basically a walkthrough of Peter Norvig's AI book, and the second part will be about deep learning.


It's possible, but definitely challenging. I did exactly this last year and got several offers, including from prestigious companies. But I didn't have my pick of jobs as I did before, and I had to make some trade-offs to be doing what I wanted. Still, it's definitely possible if you're a talented dev.


I can't answer the question directly, but I will say this: machine learning is a lot of applied math.

Suppose you are setting up a convolutional network to recognize some special object for a company. You will need to understand that math to know what parameters to tweak.

Is it the learning rate? Is it the way you randomized the weights? Is it the activation function?

Although, in fairness, I don't think even a PhD-level candidate works out the reason from first principles. More than likely they have a few heuristics in their head (oh, it stops learning too soon? Let's just drop the learning rate. Oh, it never converges? That activation function can't propagate error, and so on).

The point is that you have to know the theory to be useful. It hasn't been worked out. It is very much a living science project. That's the fun of it though.
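To make the learning-rate point concrete, here is a minimal sketch on a toy one-parameter loss L(w) = w**2. That loss is an assumption for illustration, not a convolutional network: each gradient step multiplies w by (1 - 2*lr), so the choice of rate alone decides whether training crawls, converges, or blows up. The function name `gradient_descent` and its defaults are hypothetical.

```python
def gradient_descent(lr: float, w0: float = 5.0, steps: int = 50) -> list:
    """Minimise the toy loss L(w) = w**2 with plain gradient descent.

    Returns the loss before each update, followed by the final loss.
    """
    w = w0
    history = []
    for _ in range(steps):
        history.append(w * w)
        grad = 2 * w        # dL/dw for L(w) = w**2
        w = w - lr * grad   # each step multiplies w by (1 - 2 * lr)
    history.append(w * w)
    return history


for lr in (0.001, 0.1, 1.1):
    h = gradient_descent(lr)
    print(f"lr={lr:<6} start loss={h[0]:.3g}  final loss={h[-1]:.3g}")
```

With these settings the smallest rate barely reduces the loss in 50 steps, 0.1 drives it close to zero, and 1.1 makes it grow without bound because |1 - 2*lr| > 1. Real networks have many parameters and non-quadratic losses, so this only shows the mechanism behind the "drop the learning rate" heuristic.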


Machine learning is only a small part of data science.


True.

However, I think what I said about Machine Learning is just as true -- perhaps even "more" true -- of Data Science.

Data Science is applied statistics. Knowing the underlying math is key to interpreting the results, knowing what to tweak and so forth.
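As a rough illustration of "interpreting the results", a percentile bootstrap attaches an uncertainty range to a statistic instead of reporting a bare number. This is a minimal sketch using only the Python standard library; the data values, the function name `bootstrap_ci`, and its defaults are hypothetical.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` computed on `sample`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Made-up measurements, e.g. task completion times in minutes.
data = [12.1, 9.8, 11.4, 10.2, 13.7, 8.9, 10.5, 12.8, 9.4, 11.9]
lo, hi = bootstrap_ci(data)
print(f"mean={statistics.mean(data):.2f}, approx. 95% CI=({lo:.2f}, {hi:.2f})")
```

A wide interval from ten data points is itself a finding: it tells you how much (or how little) a difference between two such means should be trusted.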

(Wait, Hadley Wickham himself commented on my comment!)


Well, imagine yourself on the interviewer side of the table. If you have a candidate who genuinely knows more than you, will you honestly turn them away for lack of a degree?

Obviously, you'll have problems getting past HR/filtering processes, and knowing more than whoever interviews you is a high bar.


I would suggest joining an early-stage startup and getting involved with anything remotely to do with data science at every opportunity. I joined a small company as an analyst with no programming experience and minimal statistical knowledge. I was a graduate, but not in a relevant subject, and just taught myself the relevant skills on the side. It was a lot of work but not a waste of time. The programming side of the job can be learnt fairly quickly, but the maths and stats side takes longer. I don't think you can really succeed in data science without both. That said, you certainly don't have to have a degree to be able to use that knowledge. I did just do a statistics degree though, and it has made the job a lot more pleasurable.


As a sidenote to your question, you may want to consider Data Engineering. It's not as sexy as ML, but it pays well and it's in high demand, because somebody has to pipe all that data around so that the ML folks can do their thing. IMO it's much easier to go from a more traditional software development role into Data Engineering than into something as math-and-theory-heavy as ML, because Data Engineering is based on how computers work and some knowledge of algorithmic scaling, not on heavy linear algebra/stats like ML.
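A small taste of the "algorithmic scaling" side of data engineering: the sketch below streams rows through parse, clean, and aggregate stages with generators, so memory grows with the number of distinct users rather than the number of rows. It is a hypothetical standard-library example; the column names, the in-memory CSV, and the function names are assumptions, and a real pipeline would read from files, queues, or a framework like Spark.

```python
import csv
import io
from collections import defaultdict
from typing import Iterable, Iterator


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Parse CSV rows lazily, so only one row is held in memory at a time."""
    yield from csv.DictReader(lines)


def clean(events: Iterable[dict]) -> Iterator[dict]:
    """Drop rows whose amount is missing or non-numeric (a typical cleaning step)."""
    for e in events:
        try:
            e["amount"] = float(e["amount"])
        except (TypeError, ValueError):
            continue
        yield e


def total_by_user(events: Iterable[dict]) -> dict:
    """Single-pass aggregation: memory is O(number of users), not O(number of rows)."""
    totals = defaultdict(float)
    for e in events:
        totals[e["user"]] += e["amount"]
    return dict(totals)


raw = io.StringIO("user,amount\nalice,10\nbob,oops\nalice,5.5\ncarol,\nbob,2\n")
print(total_by_user(clean(read_events(raw))))
```

The malformed rows ("oops" and the empty amount) are silently dropped here; in practice you would usually count or log them rather than discard them without a trace.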


I think it's harder but still feasible to get a job in data science; after the first job, things will roll a lot quicker.

What a self-taught DS would need to do in order for me to feel comfortable hiring them is have a public body of work that I find impressive.

There are a huge number of publicly available datasets packed full of interesting information. Someone who shows they can do the work, with a few findings on their GitHub, would be equivalent to a degree on a resume.


You can do a "regular" programming job and look for business cases at your company where data science could help.

Then meet your boss and tell him something like "I can make this process 10-20% faster with a 3-month project."

If he accepts, you will have real-world data science experience on your CV, and it will raise your standing in the CV stack when you apply for data science jobs.


I don't think the goal should necessarily be to get a job that has "data science" in the title. There are plenty of projects where data science could help but not many devs know about the available tools so if you know something about data science you have an edge over other devs.

For example, in my company there would be plenty of opportunities for applying machine learning or computer vision. Nobody knows enough to approach these problems, so nothing happens. We could use somebody who knows how to move forward.


Just think about how much better you need to be than someone with actual credentials (e.g. PhD in machine learning and real presentable experience) and then assess whether you are good enough to compete with them.

If you don't know how good you are relative to the competition with PhDs, then it would be worth it to have a discussion with people who have a taste for the field.


Not a waste of time, but you'll need to be entrepreneurial to get a job without a higher degree at the moment. In a few years' time you'll be able to nail a job in DS simply because it'll likely be more pervasive in everyday SaaS products, and it'll become yet another thing you do as a dev that isn't strictly part of your job title.


On a related topic, I'm graduating with a master's in (computational) physics, and am already incredibly insecure about not having a PhD as many of the data science positions seem to prefer those.

Would a four year PhD, let's say in ML, be a worthwhile investment from a data science career point of view?


I would like to say no, but I hear from others that this can be an issue. Honestly, you can probably do a PhD in anything quantitative.

FWIW, I hire data sciency people all the time, and I don't care about PhDs. I think the qualification means that someone was stubborn enough to finish and can do (some) work independently, and that's about it.

The actual skills that I tend to like are the following:

- data cleaning experience: this is the majority of the job

- scepticism, especially about one's own theories

- statistics and experimental design: you don't need to be able to prove theorems, but you should be able to follow them. Experimental design is one of the highest value skills in the world, and a lot of people don't have it.

- Communication skills: it doesn't matter what you did if no-one else can understand it

- For the avoidance of doubt, programming (not just SQL) experience (on HN, I would probably assume this).
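One concrete piece of the experimental design skill mentioned above is deciding how large an A/B test must be before running it. The sketch below uses the standard normal-approximation formula for a two-sided, two-proportion z-test; the conversion rates in the example are hypothetical, and the function name is an illustration, not anyone's library API.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect a change from rate p1 to rate p2.

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical: baseline conversion of 10%, hoping to detect a lift to 12%.
print(sample_size_per_group(0.10, 0.12))
```

The required sample shrinks roughly with the square of the effect size, which is why small expected lifts need thousands of users per arm; knowing that before launching a test is exactly the kind of judgement being described.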

Full disclosure: I have a PhD, and lots of friends of mine in this area tell me that it's very difficult to get the better jobs without one.


Doing a PhD for the sake of getting hired by specific companies/industries later is an extremely bad idea. A PhD is for those who want to pursue a research career (although it doesn't work out for more than 50% of PhDs). I did a PhD myself and transitioned into data science, but there were better ways to get there. Some reasons include:

- Getting a PhD is a long, unpredictable process that can take anywhere from 4 to 8 years. Data science may be hot right now, but that won't necessarily still be true by the time you graduate.

- You will pay a ton of opportunity cost doing a PhD. In the same amount of time, you could earn a lot of money and learn on the job.

- A PhD doesn't give you real-world industry experience. I know a lot of fellow doctors who are trying to transition to data science but struggling.

I would say you should polish your programming skills and get hired as a software engineer in a big-data-related field. Then get involved in some internal data science projects and transition into data science.


> a four year PhD

Those exist? Be careful, very few doctorates are short and well-scoped.

Source: three letter, several of my friends were 7+, masters or not.


[raises hand]

Including the Master's (which was when I did most of my graduate coursework), I took about 7.5 years for a PhD.


Anecdotal, but Jesse Anderson is a world-class big data expert (formerly at Cloudera, etc.), and my understanding is that he is entirely self-taught: http://www.jesse-anderson.com/


With data science, do you mean data science as in learning the tools, the software behind data science? That's like learning any technologies or tools.

Along this line, you would just be a "tech", not a "scientist". That's not to say you won't be very well compensated.

Data science as in being someone able to make sense of myriad conflicting data, derive patterns, and synthesize bits and bytes into action plans: there is no degree in that :)

As an example of this line of thought, people may win the Nobel prize in Economics even though they may have no idea how to use Excel :)


Let's try! For example, this is the math behind a 2x2 neural network:

http://htmlpreview.github.io/?https://github.com/aguaviva/Ar...

It is computing the derivatives of the error with respect to the weights.

If you feel comfortable reading that then you are good to go.
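The linked notebook isn't reproduced here, so as an assumption the sketch below treats a "2x2 network" as two inputs wired directly to two sigmoid outputs with squared error; the notebook's exact setup may differ. It computes the derivatives of the error with respect to the weights via the chain rule and checks them against central finite differences, which is a standard way to confirm a hand-derived gradient. All names and numbers are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, x):
    """Two inputs -> two outputs: y_j = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    return [sigmoid(W[j][0] * x[0] + W[j][1] * x[1] + b[j]) for j in range(2)]


def error(W, b, x, t):
    """Squared error E = 1/2 * sum_j (y_j - t_j)^2."""
    y = forward(W, b, x)
    return 0.5 * sum((y[j] - t[j]) ** 2 for j in range(2))


def grad_W(W, b, x, t):
    """Chain rule: dE/dW[j][i] = (y_j - t_j) * y_j * (1 - y_j) * x_i."""
    y = forward(W, b, x)
    return [[(y[j] - t[j]) * y[j] * (1 - y[j]) * x[i] for i in range(2)] for j in range(2)]


def numeric_grad_W(W, b, x, t, eps=1e-6):
    """Central finite differences, used only to check grad_W."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        for i in range(2):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[j][i] += eps
            Wm[j][i] -= eps
            g[j][i] = (error(Wp, b, x, t) - error(Wm, b, x, t)) / (2 * eps)
    return g


W = [[0.1, -0.2], [0.4, 0.3]]
b = [0.0, 0.1]
x, t = [1.0, 0.5], [1.0, 0.0]
analytic, numeric = grad_W(W, b, x, t), numeric_grad_W(W, b, x, t)
print("max gradient mismatch:", max(abs(analytic[j][i] - numeric[j][i]) for j in range(2) for i in range(2)))
```

If the printed mismatch is tiny, the derivation matches the code; a large mismatch usually means a sign or index slip in the chain rule.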


I think if you really want to get into the field, self-study can be great. Yes, you could learn some of the frameworks and libraries out there, but I think you will miss the bigger picture if you do not grasp the fundamentals.

Even brushing up on probability and linear algebra has benefits. You're learning a skill set that you can use in other areas of life. Heck, if you have kids or will have kids someday, you will have the knowledge to teach them valuable skills.


Most data scientists don’t have formal data science training. Most of the ones that go through our free fellowship (https://www.thedataincubator.com/fellowship.html - warning, I work at TDI) have STEM backgrounds and still land data science jobs at places like LinkedIn, eBay, Amazon, Capital One, Facebook, etc …


Break down the term "data science" into non-bullshit terms and actionable items, and you will see how achievable it actually is.


Bum in a seat, using the free Python tools as a black box and the internet as a reference, then? In most business cases that would work just fine, but employers want to buy as much as they can in advance; that's why degrees act as a filter. Your best shot is showing up with one or more interesting, unheard-of case studies to gain attention.


What I've seen is that more and more companies just care about skills rather than degree. Self-teaching requires a lot of tenacity and most hiring managers would love this soft skill as well. Skills-based hiring is the future. If you can build real-world projects and demonstrate your skills, you should have a good shot.


How do you plan to study? Have you created your own curriculum or will you be following one you've found?

Like this Open Source Data Science Masters: http://datasciencemasters.org/


To tag on to this question a little bit. If someone wanted to teach themselves, even without the purpose of getting a job. What books / references would be recommended?

I saw a mention of David Barber's book in one of the threads here, but what else?


The problem is that most of the people here don't understand how tough a problem is. Once you figure that out, then search for books with answers. A good book to start with is Python Machine Learning.

Also, don't read a book just because it is free. Barber's book is heavy on maths; you need at least college-level calculus and an advanced statistics/probability course to understand it.


Machine learning is the new electricity. There will be tons of positions available.


I suspect it depends in part on where you want to apply. Generally speaking, large corporations and government entities tend to want formal credentials, like degrees. This may be less true of smaller or newer operations.


I have been working in software and IT engineering for almost 35 years. I'm self-taught and never took a college course until about 5 years ago. I have not been out of work for many years now, and the reason for that is that most companies want people who can hit the ground running. College degrees and books are fine for getting the basics, but what you learn in college is FAR different from what is in the real world. Companies want people who have been in the trenches and learned through "trial by fire".

If you want to get a start as a self-educated person in IT, learn what you can on your own and then reach out to contracting firms. Get a few entry-level contract gigs under your belt to pad your resume with some experience, and then move up the ladder.


Quite often, Self-directed education + track record > Formal education


I'm a lead data scientist and I don't have a degree. I do have programming/technical chops though which helped a ton.


You seem to assume that the only use for knowledge is garnering employment: this is patently false, as you could easily learn something and apply it for your own pleasure in the non-professional domain.

P.S. It's called ’statistics’.


> P.S. It's called ’statistics’.

No, it's not. Read Breiman's "Statistical Modeling: The Two Cultures."

http://www2.math.uu.se/~thulin/mm/breiman.pdf

"Statistics" has largely been concerned with the "data modelling culture" Breiman talks about; a lot of data science is focused on algorithmic modelling, things like neural nets, random forests, and so on. A lot of these techniques have been refined outside of modern statistics because of statistics' focus on data modelling.

This also ignores all of the things that fall outside of the purview of the modelling steps altogether, things like data cleaning, data engineering, and so on. All of those are properly "data science" but often fall outside of what's in a statistics textbook.

If you are going to do data science, you should know statistics. You should know a lot of it. But that is far from the only thing you should know.
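A toy illustration of the contrast between the two cultures, not Breiman's own example: the "data modelling" approach assumes a form (here a straight line, with interpretable intercept and slope), while the "algorithmic" approach assumes no form and just predicts from nearby training points (here 2-nearest-neighbour averaging). Both are textbook methods; the data (y = sin(x) on a grid) and all names are hypothetical.

```python
import math

# Hypothetical toy data: y = sin(x) on a grid; even points for training, odd for testing.
xs = [i / 10 for i in range(63)]
train = [(x, math.sin(x)) for k, x in enumerate(xs) if k % 2 == 0]
test = [(x, math.sin(x)) for k, x in enumerate(xs) if k % 2 == 1]


def fit_line(points):
    """Data-modelling style: assume y = a + b*x, estimate a and b by least squares."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x, _ in points)
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(points, k=2):
    """Algorithmic style: no assumed form; average the k nearest training targets."""
    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict


def mse(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


line, knn = fit_line(train), fit_knn(train)
print(f"linear model test MSE: {mse(line, test):.4f}")
print(f"2-NN test MSE:         {mse(knn, test):.4f}")
```

On this curved data the neighbour method predicts far better, while the line offers parameters you can actually interpret; that trade-off between predictive accuracy and an explicit model of the data is roughly what the two-cultures argument is about.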

As to your larger point... yeah, well, jobs allow people to eat and get health insurance and all that, so it's understandable that OP might want to be able to do those things and not just apply it for his own pleasure. My take on that is that it's hard, both to acquire the skills needed and to signal to employers that you have them. If you're going that route, you need to build a solid portfolio of work. Kaggle might be a good place to start.


I agree with this observation. And given data science's distinguishing emphasis on algorithms and process, maybe it would be more properly called "data engineering."



I wasn't assuming that at all. Just wondering about the real world prospects for someone self taught.


> So is it a waste of time to learn it on the side with the hopes of getting a data science job ?


Not at all. Worst case, even if it doesn't land you a data science job, that doesn't mean it won't greatly help in another job you may be doing.

The question you're asking is essentially "what's the point of going to the gym if I won't ever become Mr. Olympia".


I'm not the OP.

The comment above responded to the OP by treating his/her question as trivial and followed it with a snarky "P.S. It's called ’statistics’." remark.

P.S. @qubex Some people don't have degrees and would actually like to work in data science regardless. The OP is asking about the practicality of that scenario.


I didn't treat it as trivial (exactly), but I oozed what I judge a suitable level of condescension for what is clearly a very venal question that betrays a very narrow horizon. As for the ”it's called ’statistics’” comment: it is, and that subject has a rich history going back centuries; much of the ’modern’ ”data science” is just a less refined, more brutal reinvention of those same techniques.


I run a data science team.

We have statisticians (yes, plural) on my team who have published in Nature, and plenty with other backgrounds.

Even ignoring the data engineering side, there is plenty that statisticians don't do or know which is useful data science.

Take the two attitudes to p-tests, or what a "reasonable number of features" means. Drop the Jeff Dean "consider training models with billions of features" quote on a statistician's desk and watch their eyes open.

Statistics is great, but data science is just as much programming as it is stats.
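One reading of the p-test remark is the multiple-comparisons problem: screen enough features and some will look "significant" by chance. The sketch below correlates 1000 pure-noise features with a pure-noise target and counts hits at p < 0.05, with and without a Bonferroni correction. It is a hypothetical illustration using a large-sample normal approximation (sqrt(n) * r is roughly standard normal under the null), not an exact test; the feature counts and names are assumptions.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def count_hits(n_features=1000, n_samples=200, alpha=0.05, seed=42):
    """Count noise features that pass a naive vs. a Bonferroni-corrected threshold."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    z_naive = NormalDist().inv_cdf(1 - alpha / 2)                # per-feature two-sided test
    z_bonf = NormalDist().inv_cdf(1 - alpha / (2 * n_features))  # Bonferroni: alpha / n_features
    naive = bonf = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        score = abs(pearson(feature, target)) * math.sqrt(n_samples)
        naive += score > z_naive
        bonf += score > z_bonf
    return naive, bonf


naive, bonf = count_hits()
print(f"noise features flagged: uncorrected={naive}, Bonferroni={bonf}")
```

Expect roughly 5% of the pure-noise features to be flagged without correction and almost none with it. Bonferroni is very conservative at millions or billions of features, which is partly why large-scale ML leans on held-out validation instead of per-feature p-values.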


Awesome, so what qualifications do your Data Science team members have and would you hire someone without a degree? Or what if they had a data science / ML / AI MOOC like Udacity or Coursera? And do you have any advice for OP or others who don't have degrees?


We've interviewed people without degrees for SWE positions. I don't think we've hired any.

I wouldn't rule it out, but it would be useful to have accomplishments to point to, or a personal recommendation.

We have a stack of maybe 25 well qualified PhD data scientists waiting for interviews. It takes a lot to get to the head of that line.


Interesting. So perhaps tales of shortages are bullshit?


It's pretty hard to find people who know how to apply the tools that they have to new problems.


Nothing is a waste of time so long as you learn from it.


Learning something you like is never a waste of time.


It depends on what else you're doing. If you're a Scala dev and regularly work with something like Spark and Hadoop, you could probably find an entry-level data science job at a non-FB/Google company, because your programming and framework experience are much needed. But if you're just a Java dev and you're taking a Udemy nanodegree or something, you would have to know somebody or get very lucky.

It's possible to help another team and segue that into a data science job internally, but outside, forget it.


i fell into an ML consulting gig that was very lucrative once.

something to ponder: you just have to know enough to actually deliver on something management wants, and know more about it than everyone else at the company.


True. I'm an AV security expert at my mother-in-law's house.



