
Data Science at The New York Times - gk1
https://blog.dominodatalab.com/data-science-at-the-new-york-times/
======
everybodyknows
>Python has gotten sufficiently weapons grade that we don’t descend into R
anymore.

>Hadoop is definitely happening but it’s Google’s problem because now after
building our own Hadoop on iron solution, after dealing with Redshift for a
while, we now just gave it all to BigQuery.

A tidy simplification of the technology stack.

~~~
nerdponx
_Python has gotten sufficiently weapons grade that we don’t descend into R
anymore._

I've experienced this in my own work as well. The extra verbosity of Pandas
data frames compared to R data frames doesn't bother me anymore. Sometimes I
miss the Lispy homoiconic magic, but not enough to make me want to use R at
work.

I still use it once in a while for heavily "statistical" stuff that doesn't
ever need to be "productionized", but for run-of-the-mill machine learning I
see no reason to use it over Pandas.

~~~
mushufasa
Would anyone recommend or warn against any of the Python tidyverse ports, like
dplython (dplyr) or plotnine (ggplot2)?

I'd like to have my cake and eat it too, but I'm worried that's too good to be
true.

~~~
mkay313
I gave plotnine a go in one of my personal Python projects (I'm a big fan of
ggplot2 and tidyverse in general over pandas and seaborn) and after struggling
for a while with a more complicated graph I went back to using seaborn.

Not to mention that writing R-like code in Python will keep you from being
immediately understood by either R or Python developers. It's just not worth
it.

------
o10449366
I'd like to see more transparency from NYT on how they're actually collecting,
retaining, and distributing user data given both their data science and
privacy efforts.

------
PeterStuer
Interesting how at 11:45 he skirts the whole privacy topic by just stating
that linking all their data to an identified reader (the 'who' in the 'who
what where' of reader behavior tracking) 'involves third party data'.

------
lordleft
I’ve collaborated with Chris Wiggins at Columbia. He’s insanely hardworking
and it’s impressive to see how he balances an academic life with the life of a
working Data Scientist at the New York Times. Really inspiring guy to be
around.

------
sonabinu
Been a very long time since I heard anyone speak of Jeff Hammerbacher. His
passion for data and engineering is amazing!

~~~
TallGuyShort
And it seems like an understatement, when presenting his data science
credentials in an article that mentions Hadoop this much, to leave out that
the dude founded the first major Hadoop company between Facebook and
"retirement". The guy really was an inspiration when I worked there in the
early days.

