

Astronomer to Data Scientist - gammarator
http://womeninastronomy.blogspot.com/2013/01/datascience.html

======
elchief
Some clarifications.

1\. Java and Python are good languages for data mining, due to their
libraries, though the others aren't great. C++ is good for finance.

2\. Excel is not a statistical analysis package. "Just because you can drive a
car with your feet doesn't make it a good idea" - Chris Rock. Excel is great
for pivot tables, however.

3\. Hadoop is not a distributed database; it is MapReduce plus a distributed
file system. Hive is also not really a distributed database. It lets you write
SQL that turns into MapReduce jobs.
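
To make the Hive point concrete, here is a toy sketch in plain Python (not Hadoop or Hive code; the function names and sample rows are made up for illustration) of how a query like `SELECT dept, COUNT(*) FROM employees GROUP BY dept` breaks down conceptually into a map step, a shuffle, and a reduce step:

```python
from collections import defaultdict

# Hypothetical sample rows standing in for an "employees" table.
ROWS = [
    {"name": "a", "dept": "astro"},
    {"name": "b", "dept": "stats"},
    {"name": "c", "dept": "astro"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs):
    """Collect emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which gives COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


def group_count(rows):
    """Run the three phases in sequence on an in-memory list of rows."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(group_count(ROWS))
```

In a real cluster the map and reduce steps would run on many machines over files in the distributed file system; this version only shows the shape of the computation on one machine.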

~~~
rjdagost
Excel is the most common statistical analysis package on the planet, bar none.
When you need to deliver code to people who are numerate but don't program,
then Excel is the best thing to use. Sometimes you just need to give someone
(a manager) a tool where they can test out a bunch of different scenarios but
you don't have time to make a polished application. Even for "big data"
applications, sometimes seeing the numbers in front of you in cells is quite
useful.

~~~
elchief
Yes, it is popular, but that doesn't make it good for statistical analysis. It
also sucks compared to JMP.

On the accuracy of statistical procedures in Microsoft Excel 2007

[http://or.nps.edu/faculty/PaulSanchez/oa4333/handouts/Excel/...](http://or.nps.edu/faculty/PaulSanchez/oa4333/handouts/Excel/excel2007.pdf)

------
noelwelsh
Looking from the other side, if I were hiring I wouldn't be too worried if the
applicant knew this kind of stuff or not. It would be in their favour, but I'm
far more interested in a strong mathematical background, intellectual
horsepower, and personality. If they have these attributes (and any good
researcher should) they should be able to learn the tech.

~~~
berkeleyjess
I wish more people hiring had your attitude. I found that wasn't the case
during my interview process.

------
46Bit
If you've hired/been hired for a "Data Scientist" role recently, what are the
main skills and accomplishments you look for? I could do with a better idea as
to whether we're talking someone who's a statistics fan or a distinguished PhD
expert.

~~~
noelwelsh
At this point in time "Data Scientist" seems to be very loosely defined. I've
seen everything from "must know Excel" to "must hold a PhD". You'll find lots
of blog posts where people try to pin it down, but I don't think one
definition will suffice. A more likely outcome is that we'll recognise
specialisms within data science just like we have front-end, back-end, and
other specialisms within programming.

------
skimmas
When I looked at the site URL, the first thing I read was wo-meninas-astronomy,
and meninas in Portuguese is little girls. Ah :P funny

