I interview a lot of developers for ML positions at our company, and the first red flag is always a lack of math. Candidates who come in with API-level competence, i.e. who can implement an ML algorithm using this, that, or the other API without understanding the basic math behind it, always fare poorly. At least in ML, not understanding the math is pretty much like claiming expertise in riding a bicycle because you've watched hundreds of bicycle videos on YouTube without ever riding one yourself.

This isn't the case with, say, front-end web dev, where nobody really cares whether what's under the hood is PHP or jQuery or elegantly handcrafted Elm, as long as the webpage looks good and functions as advertised. You can get a lot of mileage by hiring someone who can "make webpages" without knowing what tools they use to make them. Once you get traction, you'll probably rewrite all of the cruft :)

With ML, if you hire somebody who can "create a decision tree in Python mllib" without knowing the first thing about what entropy is or how to compute it (a real candidate, unfortunately), you are simply inflicting a lot of pain on yourself and your customers. Suppose such a person is deciding whether or not to give you a loan, and he decides to construct a decision tree. He'll happily throw your zip code and credit card number into the mix, not realizing that those two features have super high entropy and that the tree will have serious overfitting issues, i.e. it will simply not generalize to unseen data. He won't realize these things because he won't know what entropy is in the first place, since he only thinks of decision trees as a black box that comes out of some ML API.
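To make the entropy point concrete, here's a minimal sketch (toy data of my own invention, not from the comment above) of why a feature that takes a unique value per row, like a zip code or card number in a small sample, looks maximally informative to a greedy tree: splitting on it drives the conditional entropy of the labels to zero, so the measured information gain is as high as it can possibly be, even though the split cannot generalize.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a feature's values."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy loan data: 8 applicants, label 1 = repaid, 0 = defaulted.
labels   = [1, 0, 1, 1, 0, 0, 1, 0]
income   = ["hi", "lo", "hi", "hi", "lo", "lo", "hi", "lo"]  # genuinely predictive
zip_code = [f"9410{i}" for i in range(8)]                    # unique per row

print(information_gain(income, labels))    # 1.0 bit: real signal
print(information_gain(zip_code, labels))  # 1.0 bit: spurious -- singleton leaves
```

Both features show the maximum possible gain of 1 bit here, but the zip-code split achieves it by memorizing each row; on unseen zip codes the tree has learned nothing.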
A lack of mathematical intuition is a serious problem for many people, from engineering to biology to economics. It certainly plagued me throughout my engineering bachelor's studies, and it is something I continually work to get better at.
In my opinion, physics students learn the best framework for thinking and get a very good mathematical intuition. For example, here's a problem from an introductory QM book that really threw me for a loop when I was studying:
A needle of length L is dropped at random onto a sheet of paper ruled with parallel lines a distance L apart. What is the probability that the needle will cross a line?
Since the lines are parallel, you can rephrase the problem:
A circle of radius L is centered a distance x from a line, with 0 < x < L. This works because one end of the needle always lands somewhere (the center) and the other end is a distance L away (on the circle).
How much of the 2*pi boundary is outside the zone?
When x -> 0, it's going to be 50%, since one line becomes a tangent and the other passes through the middle. When we move x by k (i.e. f(x+k)), 2k new points are added on one side of the boundary while 2k points leave it on the other side. When x = L/2, the lines split the circle into four equal parts (since they sit at r/2 on both sides), so intuitively it's 50%.
I will say that after my discrete math class we really only talked about how to write a proof, and, well, that didn't help much for me. (I think we were supposed to get further into the material, but the class wasn't paced well: new professor, etc.)
For such a problem, "at random" usually means a uniform distribution. But on the plane, there is no uniform distribution. So the "paper" can't be the whole plane, and it might be fair to ask the size of the paper and what happens with the needle near the edges. E.g., on a rectangular sheet of paper of finite size, the needle can land in a position where it does not cross a line but would on a larger sheet.
I don't think that page explains it very well, but I have a poor math background. I imagined notebook paper with horizontal lines spaced L apart, and the needle dropping at any angle. When the needle is vertical, the probability it crosses a line is 1; when horizontal, it is 0. The length of the needle L is the hypotenuse of a triangle: if we call the angle from horizontal x, the "height" of the needle is h = L*sin(x) for x between 0 and pi/2.
The "lines" are like a sample of a point from a uniform distribution U of width L, and h is an interval inside U. The probability that a number sampled from a distribution of width L falls within an interval of length h is h/L. Substituting for h gives p(cross|x) = sin(x).
Then, assuming the needle is equally likely to drop at any angle, for any one angle we get the probability density p(theta = x) = 1/(pi/2 - 0) = 2/pi.
The probability that the needle drops at angle x AND crosses a line is the product p(theta = x) * p(cross|x) = (2/pi)*sin(x). As mentioned, x can range between 0 and pi/2. To get the probability that the needle drops at angle x1 OR x2 OR x3, etc., and crosses, we need to sum all of these, i.e. take the integral of (2/pi)*sin(x) from 0 to pi/2. This gives 2/pi.
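The derivation above can be sanity-checked with a quick Monte Carlo sketch (my own illustration, using the same parametrization: position of one needle end uniform in [0, L), angle from horizontal uniform in [0, pi/2]):

```python
import math
import random

def buffon_crossing_probability(trials=1_000_000, seed=0):
    """Estimate P(cross) for a needle of length L on lines spaced L apart."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        y = rng.random()                     # lower end's distance to the line below, in units of L
        theta = rng.random() * math.pi / 2   # angle from horizontal
        if y + math.sin(theta) >= 1.0:       # vertical extent reaches the next line
            crossings += 1
    return crossings / trials

print(buffon_crossing_probability())  # hovers around 2/pi, i.e. ~0.6366
```

The estimate lands close to 2/pi ≈ 0.6366, matching the integral of (2/pi)*sin(x) over [0, pi/2].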
Not to detract from your point that math is important, but in that example proper methodology (e.g., cross-validation), proper feature engineering, and especially domain knowledge are probably even more important. You can be aware of the strengths and weaknesses of different machine learning algorithms without being intimately familiar with the math. Ideally, ML methods are not treated as black boxes, but some aspects are inherently black box even if you do know the math (e.g., parameter tuning).
>> He'll happily throw in your zip code and credit card number into the mix, not realizing that those two features have super high entropy
You would be surprised: you can make significant gains by including the zip code. I've seen that happen in a competitive setting. Where you live probably contains some signal about your creditworthiness.
Having said that, of course it doesn't make sense to simply feed the raw zip code to the tree. An appropriate encoding of the zip code (most people would use a one-hot encoding, though better ones exist) will be key to extracting the signal in a robust way.
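As a small illustration of the one-hot idea (toy data and column names of my own invention), pandas' `get_dummies` turns each zip value into its own 0/1 indicator column, so a tree can split on "is this zip 94103?" rather than on a meaningless numeric ordering of zip codes:

```python
import pandas as pd

# Toy applicant table; zip codes are categorical, not ordinal:
# feeding the raw integer to a tree invites nonsense splits like "zip < 60614".
df = pd.DataFrame({
    "zip":    ["94103", "60614", "94103", "10001"],
    "income": [72_000, 48_000, 65_000, 91_000],
})

# One-hot encode: each distinct zip becomes its own indicator column.
encoded = pd.get_dummies(df, columns=["zip"], prefix="zip")
print(sorted(encoded.columns.tolist()))
# ['income', 'zip_10001', 'zip_60614', 'zip_94103']
```

With tens of thousands of real zip codes you'd likely group or hash them first; the sketch just shows the mechanics.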
>> ... the tree will have serious overfitting issues
Isn't it almost standard practice now to use an ensemble of trees, such as a random forest? Decision trees have long been known to be prone to overfitting.
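To illustrate that point, here's a sketch with scikit-learn on synthetic noisy data (parameters are my own choices, not from the thread): an unpruned single tree memorizes the training set, while a random forest averages many decorrelated trees and typically generalizes better.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise: plenty of room to overfit.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)        # grown to purity
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print(f"tree:   train={tree.score(X_tr, y_tr):.2f}  test={tree.score(X_te, y_te):.2f}")
print(f"forest: train={forest.score(X_tr, y_tr):.2f}  test={forest.score(X_te, y_te):.2f}")
```

On runs like this, the single tree hits 100% training accuracy while its test accuracy usually falls noticeably below the forest's; exact numbers depend on the seed and the noise level.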
>If this is possible you don't have a real job and the Indian will shortly be replaced in turn by a piece of software.
This is actually possible for a LOT of jobs that Americans do, modulo edge cases. Most of what nurse practitioners do, for instance, can very easily be replicated by a random forest. Try telling nurse practitioners they don't have a real job. Hell, by that metric, most CRUD work isn't a real job either. In fact, the whole point of "work experience" is that the new hire with zero work experience mimics the more experienced worker until he passes the Turing test, i.e. the boss, given the work output, can no longer distinguish whether it was performed by the new hire or the old hand. It's turtles (strike that: mimicry) all the way down.
May yield... may lead... may replace. That's a lot of maybes. I submit that from a purely business-ROI point of view, you are far better off building a "very complex hand-engineered system", as you put it, since that's the natural outcome of hiring, say, a bunch of Rails/Python devs at pennies on the dollar on some offshore dev portal. Building a nicely tuned, scalable RNN model in industry requires a team with 100x the intellectual capabilities, for which you will pay 100x, and there may not be a mature business case for that yet. Though I agree much of this skill set is being commoditized rapidly.
Seems like a bit of a non sequitur. Nothing is guaranteed in life.
That being said, you've got this backwards. The natural outcome of hiring a bunch of Rails/Python devs to fine-tune a machine learning / translation / recommendation system is that you get hundreds of thousands of lines of code that run slowly AND don't work. The entire premise of "deep" learning is that the system is a black box: features are learned by the black box, you typically use pre-rolled fast GPU implementations, and, most importantly, very little domain-specific knowledge is needed. In fact, the hand-engineered system is going to be more complex, more costly, and it's going to require people with more domain expertise.
I agree, I don't think we're there yet. I think the tipping point will be when two key things converge:
a) when data (particularly historically-oriented, time series-type data) becomes as accessible and as commoditized as `npm install <foo>` (e.g. `datawiz install <a-PB-sized-data-set-of-everything-that's-ever-happened-in-this-domain-ever>`), and
b) the realization that software engineers and data scientists work best when paired together; they're symbiotic roles, not competitive/opposing (e.g. think an F-14 pilot and RIO).
I wish every keystroke, every OS, every program, console, TTY, GUI, errno, ssh session, Window, RDP frame, TCP/IP packet... everything I ever did was logged and timestamped. Imagine pairing that level of data with an RNN and a feedback loop that could self-evaluate predictions.
(And then imagine if that could be anonymized and publicized, such that in 50-100 years from now, new developers could get a head start with a "friendly AI" that nurses them from "well, that segfaulted" to "end-to-end enterprise app implemented from scratch" over the course of their career.)
I don't think there is any correlation between exciting/boring and profitable. Index funds are one of the more profitable strategies (as they don't involve paying some guy a ton of money to play the lotto for you), and they are certainly boring. Taleb argues that people investing are really bad at estimating the value of catastrophic failure (or blowout success), and he invests accordingly. He is playing the other side of Y Combinator's strategy: very much in agreement with Fred Wilson, Y Combinator, etc., just at the other end of the spectrum.
I used to be a fan of polyglotism and knew a bunch of languages to various degrees of proficiency. But the market pays much, much more for depth than breadth. The tools that pulled me into quarter-million-paycheck territory were:
1. Good Scala - FP-ish Scala, coupled with Scala libraries such as Spark, MLlib, scalaz, Scalding, Apache Commons Math, Colt, and Akka.
2. Bad Scala - using Scala much more as an imperative language, primarily for three purposes: shell scripting, front-end HTML5 graphics and DOM manipulation with Scala.js, and web back-ends with Scalatra.
3. R - this one is not going anywhere in a million years, thanks to Hadley Wickham and co. Now with SparkR I get to use #1 and #2 with #3.
My one desire is that Styla becomes a first-class citizen in this ecosystem as well: when you need Prolog, there is often no easy substitute.
xyz will be a lucrative profession if and only if
1. xyz is gated
2. xyz is intrinsically hard
3. fewer people attempt to get into xyz over time
4. xyz yields higher wages per hour relative to other professions.
I don't see programming fitting any of these axes, let alone all of them.
I don't think the presented axes are correct (#3, I'd argue, isn't necessary, though there may be a different supply factor that is; and #4 is itself equivalent to being a currently lucrative profession, not a separate necessary condition). And programming certainly meets #4 at present, and arguably meets #2, at least for some significant subfields of programming.