
The Data Science of MathOverflow - coliveira
https://blog.wolfram.com/2019/02/01/the-data-science-of-mathoverflow/
======
zwaps
I find that Wolfram Mathematica is incredibly painful to use except for
symbolic math. And even then... the kernel often crashes, the syntax (and the
way it is entered) is weird, and things like dynamic graphics often just stop
working altogether. Mind you, I was basically just doing some symbolic
functions and Manipulate functions.

I cannot imagine actually using it with data and more complicated programs.
That must be... really tedious.

It's great for solving equations and such, but I seriously wonder why anyone
would do "data science" with it.

~~~
kanox
Somewhat off-topic, but it is strange that special-purpose languages for
scientific computing even exist. It seems far saner to just have these tools
available as a set of libraries for a general-purpose programming language,
and indeed the world seems to be heading that way, with Python as the chosen
language.

Maybe when MATLAB and Mathematica were first created, the existing dynamic
languages were not very good?

~~~
goatlover
Fortran was released in 1957 as the first compiled language. Guess what it was
for ... scientific computing.

~~~
DerekL
I don't think that Fortran was the first compiled language. Wikipedia says the
first language to be compiled was Autocode for the Mark 1.

[https://en.wikipedia.org/wiki/Autocode](https://en.wikipedia.org/wiki/Autocode)

------
o10449366
I don't think I've ever seen the term "data science" used in a sentence like
this. Does the word "statistics" not sufficiently describe things anymore?

~~~
gowld
Statistics is "the practice or science of collecting and analyzing numerical
data in large quantities"

~~~
digitalzombie
> Statistics is "the practice or science of collecting and analyzing numerical
> data in large quantities"

That's not true.

Statistics can analyze data in small quantities; you can read up on this in
nonparametric statistics. At least half of nonparametric statistics deals with
small data (mostly through the use of ranks). With Bayesian statistics you can
just assign a prior distribution.

I think the best way to describe statistics is that it uses data to make
inferences about a population: you take a subset of the data via sampling and
infer a statistic (mean, median, whatever) about the population at large.

More often than not I'm just surprised that Data Science is so big when it
seems like statistics does every freaking thing with data: how to sample
correctly, how to design an experiment that collects the right data to test a
hypothesis, how to make sure the data aren't biased, etc. It deals with the
data from inception to end, whether you collect the data yourself or the data
are given to you already, on top of dealing with problematic data such as
missing data (imputation), etc.
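
To make the sampling idea above concrete, here is a minimal sketch in Python
using only the standard library. The population is synthetic, and the helper
name `estimate_mean`, the sample size, and the seeds are all assumptions made
for illustration; none of them come from the linked article.

```python
import random
import statistics


def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns the sample mean and its estimated standard error.
    """
    rng = random.Random(seed)
    # Simple random sample without replacement.
    sample = rng.sample(population, sample_size)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_err


# Synthetic population (an assumption, purely for illustration).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50.0, 10.0) for _ in range(100_000)]

sample_mean, std_err = estimate_mean(population, sample_size=200)
true_mean = statistics.mean(population)

print(f"sample mean = {sample_mean:.2f} (standard error {std_err:.2f})")
print(f"true mean   = {true_mean:.2f}")
```

With only 200 of the 100,000 values, the sample mean should usually land
within a couple of standard errors of the true mean, which is the "infer the
population from a subset" step in miniature.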

------
minimaxir
Although a very cool and thorough approach to the data, it's very
_complicated_ code-wise and doesn't help sell the Wolfram Language. (Python/R
admittedly can't do as much with LaTeX equation formatting for data
analysis/visualization.)

~~~
4thaccount
Mathematica user here. It is nice that the Wolfram Language has an insane
amount of built-in features (graphics, charts, a massive amount of math
functionality, tables, database stuff, GPU, 3D printing, blockchain, audio
analysis, web scraping, one load function that can interpret over 100
different file formats, etc.). It is insanely powerful for a lot of things.

Some of the drawbacks are that it is closed source and costs money (although
it is cheap as far as this kind of software generally goes), but most
importantly the very succinctness of the language can make it painful to deal
with. Yes, I can probably write 3 lines to do something that would take half a
page of Python, but I first need to know which of thousands of functions to
use and the eccentricities of the language. I'm sure Wolfram employees are
that skilled, but I'm not and will not be anytime soon.

With that being said, I spent a few hours writing a notebook demonstrating key
fundamental and scientific formulas in my industry this weekend. It was easy
and the resulting code, pictures, and graphs look fantastic. I exported it as
a PDF for others to use. Even the console is pretty cool. I think it would be
a very popular language and environment if it were free and open source.
Another problem is running something in production. My solution is to not even
bother and just write the final solution in Julia if I need it. I think
Mathematica really makes sense, though, if you're at a lab where everyone else
uses it and can pass around notebooks. In short, I really like having it
around, but don't like dealing with licensing issues.

------
miguelrochefort
I didn't know Wolfram Language had support for RDF and SPARQL!

I'm always baffled by the expressiveness of the language. I think the platform
is extremely underrated. How is it not the silver bullet we're all looking
for?

~~~
reikonomusha
It’s closed source, for starters.

------
coliveira
It would be nice to see versions of this analysis in other languages such as
Python.

~~~
miguelrochefort
I would like to see a version with JavaScript and
[http://observablehq.com](http://observablehq.com).

~~~
h8hawk
I would like to keep JS out of data science. Why would anyone want this?

~~~
miguelrochefort
JavaScript is ruling the world and will soon replace Python for AI-related
stuff.

~~~
h8hawk
It won't. Scientific computing is not web development. Python will probably be
replaced by a saner language, like Julia, but JS is not saner anyway. It's a
messy little language that has become the bytecode of browsers.

