
An Interview with an Anonymous Data Scientist (2016) - PaulJulius
https://logicmag.io/01-interview-with-an-anonymous-data-scientist/
======
Terr_
Good interview, there are a bunch of bits I feel like I ought to be Quoting
For Truth but then I'd end up with a pretty bloated reply.

> I want to emphasize that historically, from the very first moment somebody
> thought of computers, there has been a notion of: “Oh, can the computer talk
> to me, can it learn to love?” And somebody, some yahoo, will be like, “Oh
> absolutely!” And then a bunch of people will put money into it, and then
> they'll be disappointed.

Reminds me of a pre-transistor computing quote from Charles Babbage, about
some overeager British politicians:

> On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into
> the machine wrong figures, will the right answers come out?" In one case a
> member of the Upper, and in the other a member of the Lower, House put this
> question. I am not able rightly to apprehend the kind of confusion of ideas
> that could provoke such a question.

~~~
gwern
Speaking as a 'loon', his AI history is wrong in several places:

1. The Fifth Generation Project ([https://en.wikipedia.org/wiki/Fifth_generation_computer](https://en.wikipedia.org/wiki/Fifth_generation_computer)) was the 19 _8_ 0s, officially ending in 1992, not 'late 1990s' (during the Dot-com bubble?!).

2. The Lisp bubble didn't pop because of a failed DoD piloting project; it popped because of the first AI Winter + commodity SPARC/x86 pressure + recession ([https://en.wikipedia.org/wiki/Lisp_machine](https://en.wikipedia.org/wiki/Lisp_machine)) (and I don't recall DARPA instituting any policy like 'no AI', just stopping subsidizing Symbolics and later Connection Machine).

3. The Club of Rome report couldn't've killed its modeling language because it only really acquired its present ill repute by the 1990s; the implementation language Modelica ([https://en.wikipedia.org/wiki/Modelica](https://en.wikipedia.org/wiki/Modelica)) didn't die (last release: April 2017) and is still in industrial use, which is more than almost all languages from the 1960s-1970s can say, and even the World3 model ([https://en.wikipedia.org/wiki/World3](https://en.wikipedia.org/wiki/World3)) analyzed in the report continued development for decades.

4. The Oxford paper ([https://www.fhi.ox.ac.uk/wp-content/uploads/The-Future-of-Employment-How-Susceptible-Are-Jobs-to-Computerization.pdf](https://www.fhi.ox.ac.uk/wp-content/uploads/The-Future-of-Employment-How-Susceptible-Are-Jobs-to-Computerization.pdf)) doesn't make precise forecasts for when any automation may happen (merely saying "associated occupations are potentially automatable over some unspecified number of years, perhaps a decade or two").

5. The GPU server comparison is really weird, as computers have almost always cost more than humans and only relatively recently do any computers' hourly costs fall below minimum wage.

6. The Dartmouth description is wrong; the conference merely proposed ([http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html](http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html)) that meaningful progress could be made by 10 researchers, not grad students ("We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College...We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.")

Also, come on dude, Keras isn't hard to use - it's not even comparable to
Tensorflow. But at least he didn't tell the tank story.

~~~
fnl
Here's another factual error: Data science is from the 1960s, and was first used in a paper published by Peter Naur in 1974:
[https://en.wikipedia.org/wiki/Data_science](https://en.wikipedia.org/wiki/Data_science)

~~~
geezerjay
Data science is actually statistics, which goes back quite a bit further than the
1960s. In fact, today's data scientists love to quote Box and Fisher.

Data science and data mining are victories of marketing over common sense.

~~~
fnl
Sorry, I meant that in the sense of the origin of the term. But yes, DS is
mostly just another word for statistics. About as pointless as the term AI has
become.

------
sriku
> You become so acutely aware of the limitations of what you’re doing that the
> interest just gets beaten out of you. You would never go and say, “Oh yeah,
> I know the secret to building human-level AI.”

A colleague of mine called these "educated incapacities" - where we become
acutely aware of impossibilities and lose sight of possibilities. Andrej
Karpathy, in one of his interviews iirc, said something like "if you ask folks
in nonlinear optimization, they'll tell you that DL is not possible".

It is useful to keep that innocence alive despite being educated, especially
if the cost of trying something out doesn't involve radical health risks. That,
plus a balance with scholarship.

Knowledge, courage and the means to execute are all needed.

~~~
brucephillips
> If you ask folks in nonlinear optimization, they'll tell you that DL is not
> possible.

I sincerely doubt anyone who knows more than one sentence about deep learning
would say that, since deep learning doesn't claim to optimize.

~~~
aoki
i suspect that what he's referring to is that he's heuristically minimizing a
somewhat arbitrary (loss) function in a million-ish dimensions using the
simple variants of gradient descent that work under these conditions. it
sounds far too WIBNI ("wouldn't it be nice if") to produce good results
reliably (in practice, let alone in theory). the landscape has so many
stationary points at which to get stuck; why would you ever get good results?
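
for concreteness, a minimal sketch of the kind of procedure being described - plain minibatch SGD on a made-up non-convex loss (purely illustrative; real nets have millions of parameters and far messier loss surfaces than this toy):

    import numpy as np

    rng = np.random.default_rng(0)

    # toy stand-in for a non-convex loss: 0.5 * mean((X@theta - sin(X@theta))^2)
    def loss_and_grad(theta, batch):
        z = batch @ theta
        residual = z - np.sin(z)
        loss = 0.5 * np.mean(residual ** 2)
        grad = batch.T @ (residual * (1.0 - np.cos(z))) / len(batch)
        return loss, grad

    data = rng.normal(size=(10_000, 1_000))   # pretend dataset, 1000-dimensional
    theta = 0.01 * rng.normal(size=1_000)     # the "weights"
    lr = 0.1

    for step in range(2_000):
        batch = data[rng.choice(len(data), size=64, replace=False)]
        loss, grad = loss_and_grad(theta, batch)
        theta -= lr * grad   # no guarantee this lands anywhere near a global minimum
        if step % 500 == 0:
            print(step, loss)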

there's a small cottage industry of papers (like [0]) that try to explain
this.

[0] [https://arxiv.org/pdf/1412.0233.pdf](https://arxiv.org/pdf/1412.0233.pdf)

~~~
azag0
I think this recent paper [1] sheds quite a bit of light on this.

[1] [https://arxiv.org/abs/1703.00810v3](https://arxiv.org/abs/1703.00810v3)

~~~
chillee
Really don't think that's the best paper to say "sheds quite a bit of light on
this". That paper has been somewhat controversial since it came out.

I think [https://arxiv.org/abs/1609.04836](https://arxiv.org/abs/1609.04836)
is seminal in arguing that flat (non-sharp) minima correlate with
generalization, the parent's paper is good for showing that gradient descent
over non-convex surfaces works fine, and
[https://arxiv.org/abs/1611.03530](https://arxiv.org/abs/1611.03530) is a
landmark for kicking off this whole generalization business (mainly showing
that traditional models of generalization, namely VC dimension and ideas of
"capacity", don't make sense for neural nets).

------
nocoder
I work at a tech company and one of the things I have noticed recently is how
much the ML and AI terms are being used by the business people - folks with no
technical understanding, accountants or marketing guys, saying we should ask
the tech team to design ML to solve these problems. It's as if ML is a thing to
throw at every imaginable problem and it will be magically solved. I believe a
lot of this has to do with the PR around it from big tech companies.

Take, for example, the recent AlphaZero vs Stockfish PR: it has been spun by
Google as if it were some kind of magic. You hear a lot about how it took just
4 hours, and I find it hard to explain to people that the 4-hour figure is
meaningless - it is about how many games it could play in that time. Moreover,
the match happened between two systems on different hardware, which is a big
difference, and it also used an arbitrary time control of 1 min/move. Again,
this can make a big difference, but it is a big struggle to get past this PR
fluff. To be clear, I am not denying the advances made by DeepMind; I just want
people to understand that they have come on the back of probably the world's
best team of scientists, alongside state-of-the-art Google-designed hardware
and the incredible monetary resources of Google.

~~~
blueplastic
I'm pretty sure you can throw IBM Watson's AI at any of these business
problems and solve them very quickly.

------
trts
This articulated so much of what I have learned about the field in the past 5
years. I am someone who inherited the title 'data scientist' because that's how
my department designated us when it became fashionable, who felt fraudulent due
to the gap between the unlimited expectations of what data science is and what
I understood it to be, and who has subsequently interviewed probably nearly a
hundred data science and machine learning 'experts'. There seems to be little
cohesion to what these terms describe, little understanding by laypersons of
data science beyond it being some kind of magic that only the very gifted can
command, and I have never seen a greater distance between hubris and praxis
sustain itself for so long and so intensely.

The whole interview was an absolute joy to read.

------
carlsborg
It was 2016 and he said: "[What] I've noticed on AWS prices was that a few
months ago, the spot prices on their GPU compute instances were $26 an hour
for a four-GPU machine, and $6.50 an hour for a one-GPU machine. That's the
first time I've seen a computer that has human wages."

Minimum wage (or thereabouts, $7.20) now gets you a whopping p2.8xlarge (8
GPUs, 32 vCPUs, 488 GB RAM), and the single-GPU p2.xlarge is now $0.90 per
hour. Per GPU-hour that's roughly $6.50 then versus about $0.90 now - a drop
of roughly 7x in under two years.

This is a crazy data point. What will minimum wage buy you five years from
now?

~~~
SiempreViernes
Depends, do you think the lowest legal wage should go up or down?

~~~
jononor
Even if I wanted it to double, I don't think that would make it more likely to
actually happen. I think the likelihood that the machine power available will
be double or quadruple what it is now is pretty good.

------
CalChris
This reminds me of ... What’s the difference between a data scientist and a
statistician? A data scientist lives in San Francisco.

~~~
tikhonj
More cynically, the difference is 100k/year :P.

~~~
dllthomas
In rent? :D

------
sundarurfriend
It's an interesting read, though not very enlightening in terms of new
information. It's the same old pre-existing arguments put in a more informal,
more directly honest package.

As another person who's seen robots fall over again and again and has a feel
for the scope of the problem's difficulty, I'd say there's also the risk of the
day-to-day failures making us lose sight of the forest for the trees, with
availability bias working against us.

Also,

> the Y Combinator autistic Stanford guy thing

> the Aspy worldview

It's a bit worrying that the use of these terms has turned into a kind of
slur, lumping an imagined stunted worldview in with a medical diagnosis. I'm
not particularly pissed that this guy used them; I'm more worried about what it
indicates - that they have become so common as to infiltrate friendly informal
conversations from seemingly intelligent people.

~~~
muraiki
Yeah, I was shocked when I came across that. The data scientist appeared to be
really in tune with ethical problems, and then speaks like that. It's very
disappointing.

------
MikeGale
It is just so amazingly refreshing to read something not put together by a
know-nothing.

I wish I saw more than one or two of these a year.

------
comstock
Any bets on when the current deep learning bubble is going to burst?

It’s shocking to me how much technical people buy into this - how “this time
it’s different” and AI isn’t “over-promising and substantially under-
delivering” this time. Really odd to watch it come round again, when the
reality is we’re more likely to see incremental progress, partly fueled by
more compute and algorithmic advances, and partly by a lot of PR.

~~~
marshray
I think we're just used to computers advancing noticeably on a regular basis:
"Is this year's iPhone better enough to justify an upgrade?"

Also, we judge the difficulty of things by our own experience. It took us ~1
billion years to get to the point where we could communicate abstract ideas
and play chess. These were once believed to be the challenging problems in AI.

It turned out that chess is easy; we're just relatively bad at it.

~~~
yters
Chess is easy when you have the hardware to effectively brute-force it. Once
someone develops an algorithm that considers a number of moves comparable to a
human, and still significantly outperforms a human, then AI will be interesting.

------
deviationblue
I've noticed an alarming uptick in articles around job titles and what people
call themselves, so I feel compelled to say something. I don't care what
someone calls themselves as long as they can actually get shit done. The focus
on titles is misplaced, especially for people who work at a BigCo, as most
titles in such places are handed down by HR anyway, so I don't focus too much
on them. But what is the person actually doing on a day-to-day basis? Is it
stats? Is it exploratory analysis and modeling? Are they using ML, or working
with data that doesn't fit on a single commodity machine? Writing people off
based on what titles they might have had at some job (which they probably had
no control over) is a good way to lose out on talent that you might have
appreciated. But of course, this cuts both ways: would you want to work for
someone who gets hung up on things like that?

Anyway, overall great article, but this was the one thing that bothered me
enough to comment.

------
nicolewhite
I enjoyed his comments on Tensorflow.

> It’s really bad to use. There’s so much hype around it, but the number of
> people who are actually using it to build real things that make a difference
> is probably very low.

I wonder how many data scientists out there are actually developing Tensorflow
models for a mission-critical project at work. I'm not. I have used Tensorflow
successfully within my personal projects, but I've yet to need it for anything
"real."

~~~
mslate
We used it for a sales email classification problem--it significantly out-
performed our conventional approaches (i.e. logistic regression + bag-of-
words)--but we were not PhDs and none of our job titles were "data scientist",
so I guess that makes us charlatans ;)

That service was marginal relative to the rest of the business, so it never
became an offering that our sales team pitched to customers very aggressively;
in this particular case TensorFlow did not move the needle, so to speak.
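
For reference, the "conventional approach" in question is basically this (a hypothetical scikit-learn sketch; the file and column names are made up):

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # hypothetical labeled sales emails: free-text 'body', categorical 'label'
    emails = pd.read_csv("sales_emails.csv")
    X_train, X_test, y_train, y_test = train_test_split(
        emails["body"], emails["label"], test_size=0.2, random_state=0)

    # bag-of-words features + logistic regression: the classic baseline
    baseline = make_pipeline(
        CountVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000))
    baseline.fit(X_train, y_train)
    print("held-out accuracy:", baseline.score(X_test, y_test))

That's roughly the bar the TensorFlow model had to clear.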

~~~
amrrs
Wondering what TensorFlow has to do with that out-performance, since it must be
all about the model/algorithm you implemented in it - you could have had TF
code running the same conventional approach you mentioned above, and it
wouldn't have done any magic. Isn't it the algorithm, like a convnet, doing the
magic, rather than TF itself?

~~~
mslate
Yes, TF is merely a framework for implementing things like convolutional
neural nets, not a novel algorithm in itself.

We chose TF over other convolutional neural net libraries because it was 1.
Python and 2. heavily sponsored by Google.

------
EdwardDiego
Can anyone comment on his point about Spark's ML libs? I note that was from
last year (about 2015 code), and I'm not sure what level of beta they were at.
I use Spark for batch processing but have never used the ML aspects, so I'm
just curious.

> And even up to last year, there’s just massive bugs in the machine learning
> libraries that come bundled with Spark. It’s so bizarre, because you go to
> Caltrain, and there’s a giant banner showing a cool-looking data scientist
> peering at computers in some cool ways, advertising Spark, which is a
> platform that in my day job I know is just barely usable at best, or at
> worst, actively misleading.

~~~
Radim
Getting better obviously, but the feet-on-the-ground experience for MLlib is
still far from pleasant: hard to configure, hard to manage, hard to scale,
hard to debug.

By way of anecdote, Spark's MLlib used to contain an implementation of
word2vec that failed when used on more than 2 billion words (some arcane
integer overflow). So much for scale!

As for performance, in 2016 the break-even point where a Spark cluster started
being competitive with a single-machine implementation was around 12 Spark
machines (a bit of a hindrance to rapid iterative development, which is the
cornerstone of R&D):
[https://radimrehurek.com/florence15.pdf](https://radimrehurek.com/florence15.pdf)
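
(For context, "using MLlib word2vec" means something along these lines - a minimal sketch assuming the Spark 2.x DataFrame API and a hypothetical pre-tokenized dataset:)

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import Word2Vec

    spark = SparkSession.builder.appName("w2v-demo").getOrCreate()

    # hypothetical DataFrame with a 'tokens' column of type array<string>
    docs = spark.read.parquet("tokenized_docs.parquet")

    w2v = Word2Vec(vectorSize=100, minCount=5, inputCol="tokens", outputCol="vec")
    model = w2v.fit(docs)   # the fit step is where the >2B-word overflow reportedly bit

    # nearest neighbours in the learned embedding space
    model.findSynonyms("spark", 10).show()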

~~~
kwisatzh
Can you be more specific about the issues with MLlib? I'm thinking of using it
with Spark because of big data requirements, but I have heard MLlib in
particular is highly unreliable.

------
Jesus_Jones
Hah, this is a great interview! [You can't really trust someone who calls
themselves a data scientist; they are just taking that exciting and
financially rewarding name], loosely paraphrasing. Too bad it is anonymous. It
totally fits my unfair preconceptions of this field. I know, I'm a "computer
scientist" with a PhD; it's not a real science if you have to put science in
the name, or so they tell me.

------
eanzenberg
Eh, pretty disappointing interview. It doesn’t take a team to utilize GPU
computing; it takes one person, and I’ve done it. Also, you can’t complain
about there being no strong-AI companies and then list accomplishments of
strong-AI companies.

I personally don’t like the phrase "data scientist", but I get it, and I get
why it’s science as opposed to engineering. I personally like the split between
machine learning, BI, and data engineering.

~~~
sjg007
I think the contrast is between statistics and physics PhDs compiling GPU
support... even some CS PhDs have a hard time with that... This becomes less
important as time goes on, since the engineers figure it out and make it
readily available.

~~~
chestervonwinch
When I installed Theano, it was just `pip install theano`, and editing a
couple of lines in a config file. Are other GPU libs (tensorflow, caffe, etc.)
really that much more difficult?
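
(For anyone curious, the "couple of lines in a config file" refers to ~/.theanorc, something like the following - exact values depend on your setup:)

    [global]
    # use device = cuda on newer Theano versions with the libgpuarray backend
    device = gpu
    floatX = float32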

~~~
eanzenberg
pip install tensorflow-gpu

is all I do, once the dependencies are set up.
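
(And a quick sanity check that the GPU build actually sees the card - TF 1.x-era API:)

    import tensorflow as tf
    from tensorflow.python.client import device_lib

    # should print True and list a '/device:GPU:0' entry if CUDA/cuDNN are wired up
    print(tf.test.is_gpu_available())
    print([d.name for d in device_lib.list_local_devices()])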

------
perturbation
I've been seeing nothing but negative, dismissive comments about data science
on HN lately, which is really disappointing. There's definitely a lot of hype
right now about DL, but almost all of my job does not deal with Big Data or
Deep Learning, 'just' machine learning + stats + calc + scripting + data
cleaning + deploying models.

I think most people don't have big data (Amazon has an x1 with 4 TB of RAM,
after all!) but there's no shame in that. I'll use a big machine for grid
search or other embarrassingly parallelizable stuff, but I can confirm that
Spark is usually a bad tool for actual ML unless you use one of their out-of-
the-box algos. Even then, tuning the cluster on EMR with YARN is a pain,
especially for pyspark. There's a gap, I think, between the inflated
expectations of "I'm going to get general AI in 5 years and CHANGE THE WORLD"
and "this K-means clustering will be a good way to explore our reviews", but
somewhere in the middle there is actual value.
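
(To make the low end concrete: the "explore our reviews" kind of job is often just a few unglamorous lines - a hypothetical sketch with scikit-learn, file and column names made up:)

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # hypothetical dump of free-text customer reviews
    reviews = pd.read_csv("reviews.csv")["text"]

    # TF-IDF features + k-means to group similar reviews
    X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(reviews)
    km = KMeans(n_clusters=8, random_state=0).fit(X)

    # eyeball a few reviews per cluster to see what themes fall out
    for c in range(8):
        print(c, reviews[km.labels_ == c].head(3).tolist())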

(I also hate that "AI" is becoming the new hype train; I don't consider
anything I do to be "AI", but you have people calling CNNs or even
non-deep-learning models "AI".) This is only going to result in inflated
expectations - DS practitioners have to communicate the value without hype, and
also find a way to weed out charlatans.

~~~
gaius
_I've been seeing nothing but negative, dismissive comments about data
science on HN lately, which is really disappointing. There's definitely a lot
of hype right now about DL, but almost all of my job does not deal with Big
Data or Deep Learning, 'just' machine learning + stats + calc + scripting +
data cleaning + deploying models._

But all of those things are things people did in the '90s or even earlier. It
was called "data warehousing" or "decision support" back then. The fundamental
techniques - linear regression, logistic regression, k-means clustering - go
back even earlier, to the OR community post-WW2. Banks have been doing credit
scoring with these techniques for a loooong time. The manufacturing industry
has been using them for even longer. Engineering for even longer than that.

So you can see why people are quite cynical about the way old, established
techniques are being presented as the hot new thing - and you can see why
people who have been doing this stuff for 20+ years might be annoyed at
20-somethings who claim to have invented this new thing. What's wrong with
someone calling themselves a "statistician" or an "applied mathematician"?

But this is by no means purely a DS thing; it seems no one is a programmer
anymore either, they're all "senior certified enterprise solution architects"
or some grandiose thing.

~~~
perturbation
> But, all those things people did in the '90's or even earlier. It was called
> "data warehousing" or "decision support" back then.

I would say data warehousing is more concerned with things like OLAP, star
schemas, ETL, etc. than with what people are calling 'data science' right now.
The same goes for 'decision support', since data warehousing grew out of
decision support systems. The most overlap here is with 'data mining'
algorithms like association rules and clustering.

> The fundamental techniques - linear regression, logistic regression, k-means
> clustering - go back even earlier, to the OR community post-WW2.

Here I think you've got a stronger argument. OR has a long, proud history of
using applied math for business objectives. But again, I would say most of OR
deals with different problems and different techniques - it's more about
prescriptive analytics, constrained optimization, linear programming,
simulations, etc. than the type of predictive modeling in most data science.

I see data science as a separate field even though it's stitched together from
a bunch of others. It's certainly not entirely new, and certainly overhyped in
some annoyingly-breathless news reports. I could say the same thing about CS -
was it entirely "new" when it started as a discipline? Isn't CS "just" applied
math?

------
otalp
Jeff Hammerbacher, the guy who coined the term "data science", also said "The
best minds of my generation are thinking about how to make people click ads.
That sucks."

~~~
fnl
Um, no, that's yet another falsehood in that interview; the term DS is _much_
older and stems from Peter Naur - anecdotally coined in the 1960s, with a
provable [edit: removed wrong ref] paper from 1974 using that term:
[https://en.wikipedia.org/wiki/Data_science](https://en.wikipedia.org/wiki/Data_science)

~~~
chestervonwinch
Interestingly, Tukey's (of fast Fourier fame) paper, "The Future of Data
Analysis" [1], was published circa 1961.

[1]:
[https://projecteuclid.org/download/pdf_1/euclid.aoms/1177704711](https://projecteuclid.org/download/pdf_1/euclid.aoms/1177704711)

------
d--b
As important as it is to debunk the hype surrounding AI, it is also important
to note that the recent advances in neural nets hinted that we're onto
something regarding the functioning of the brain, and in my opinion it would
be equally foolish to dismiss the _possibility_ of a breakthrough that would
get us much closer to general AI (for instance, if someone came up with some
kind of short-term/long-term memory mechanism that works well).

I personally think that the main reason general AI may be very far away is
that there is little incentive today to work on it. Specialized AI seems good
enough to drive cars. Specialized AI should be good enough to put objects in
boxes, cut vegetables, flip burgers and so on, and the economic impact of
building that is much greater than the economic impact of making a robot that
barely passes the Turing test and is otherwise fairly dumb or ethically
unbounded.

------
brucephillips
> the data sets have gotten large enough where you can start to consider
> variable interactions in a way that’s becoming increasingly predictive. And
> there are a number of problems where the actual individual variables
> themselves don’t have a lot of meaning, or they are kind of ambiguous, or
> they are only very weak signals. There’s information in the correlation
> structure of the variables that can be revealed, but only through really
> huge amounts of data

This isn't really true, since this can be said of any ML model. ML is nothing
new. Deep learning is new. It works because we have so much data that we can
start to extract complex, nonlinear patterns.

------
vadimberman
> I feel like the Hollywood version of invention is: Thomas Edison goes into a
> lab, and comes out with a light bulb. And what you’re describing is that
> there are breakthroughs that happen, either at a conceptual level or a
> technological level, that people don’t have the capacity to take full
> advantage of yet, but which are later layered onto new advances.

Brilliant.

------
ramtatatam
I'm not a native English speaker and I find this sentence from the article
weird:

> Because the frightening thing is that even if you remove those specific
> variables, if the signal is there, you're going to find correlates with it
> all the time, and you either need to have a regulator that says, “You can
> use these variables, you can't use these variables,” or, I don't know, we
> need to change the law. As a data scientist I would prefer if that did not
> come out in the data. I think it's a question of how we deal with it. But I
> feel sensitive toward the machines, because we're telling them to optimize,
> and that's what they’re coming up with.

So is he saying that he is worried that optimisation throws up results that
are not what he would like to see?

~~~
pesmhey
Race is an incredibly sensitive topic in America. The best analogy I can come
up with for the author's statement is this:

You're looking to pick the fastest runners out of a group of people. You run
an optimization algorithm to pick out the fastest in that group. Nothing about
this optimization accounts for the fact that 1/3 of the people in the group
had been shot in the foot prior to your optimization. The data will show that
they are poor runners without addressing the crime previously committed. In
fact, many people would consider it a second crime.

------
yters
DL is hyped as a big thing, but why are multiple layers on a NN a
breakthrough? The only breakthrough is hardware, but I don't see that hyped.

~~~
srean
Shh, will you. Some truths are not to be aired in public.

We know that no manager got fired for choosing Java.

There is a researcher's version of that. No researcher got fired for making a
neural network more 'convoluted'. It helps if there exists one dataset where
it does 0.3% better. Doesn't matter if that dataset is (and has been since the
late 90s) standard fare as a homework problem in a machine learning course.

That said, we do understand these things a bit better than before. Some
concrete math is indeed coming out.

------
kerbalspacepro
Am I the only one who was expecting to learn about data science and instead
got some moralising?

------
DrNuke
Different communities play a game at different times: the pioneers at first,
then the early adopters, then the businessmen, then the masses, and in the end
the legislators.

------
reesefitz
I feel so many data scientists are bullshit. I've had the worst interviews,
like someone telling me how ARIMA is so good and why would I even use an LSTM
network. Even worse, they cite some bullshit consulting article with skewed
data to prove their point.

------
reesefitz
Some interviewers ask me the stupidest questions: "How large is your
dataset?", "Have you ever worked with 100GB of data?" Fucking morons.

