The Data Science of MathOverflow (wolfram.com)
84 points by coliveira 7 days ago | 29 comments

I find that Wolfram Mathematica is incredibly painful to use except for symbolic math. And even then... the kernel often crashes, the syntax (and the way it is entered) is weird, and things like dynamic graphics would often just stop working altogether. Mind you, I was basically just doing some symbolic functions and Manipulate functions.

I can not imagine actually using it with data and more complicated programs. That must be... really tedious.

It's great for solving equations and such, but I seriously wonder why anyone would do "data science" with it.

On the contrary, I don't have a problem with the syntax at all. You can almost treat it like a symbolic functional language. All the usual functional programming constructs are there, like map/fold/lambdas (Mathematica calls them pure functions).
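For anyone who hasn't seen those constructs, here is a rough Python analogue. It is only a sketch for comparison, not Wolfram Language; the Wolfram names in the comments (`Map`, `Fold`, `#^2 &`) are the built-ins I mean, and the variable names are made up for illustration.

```python
from functools import reduce

# Anonymous function, comparable to a Wolfram pure function such as #^2 &
square = lambda x: x * x

# map: apply the function to every element (Wolfram: Map[#^2 &, {1, 2, 3, 4}])
squares = list(map(square, [1, 2, 3, 4]))

# fold: combine elements left to right with an accumulator
# (Wolfram: Fold[Plus, 0, {1, 4, 9, 16}])
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```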

I agree the front end struggles if you have a large dynamic object with a lot of data; it has been 32-bit for a long time.

I was watching one of Wolfram's Twitch streams, and the front end is supposedly going to be 64-bit in the newest version (12, I think).

You are not alone. I think it is a powerful environment but the learning wall to get in is steep.

I found this just now http://www.wolfram.com/language/fast-introduction-for-progra...

And that page in particular is a good example of how not to write documentation for a language/environment.

Table[x^2, {x, 10}]

The page before introduced lists. And there was no mention of lists being able to do magic things like spanning values. I think that line up there makes a table and somehow that magic list goes from 1 to 10 ...
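To spell out what is hidden there: the `{x, 10}` part is not an ordinary list of data but an iterator specification, which tells `Table` to let `x` run from 1 to 10. As a rough comparison only (Python, not Wolfram Language), the same result written with the range made explicit:

```python
# Python analogue of Table[x^2, {x, 10}]: x runs from 1 to 10 inclusive
squares = [x**2 for x in range(1, 11)]

print(squares)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```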

There is just too much hidden there. It is a poor introduction.

I have read this guide before ... and each time I shake my head and wonder why anyone would bother trying to get through the opaque/hidden syntax when there are way better choices of languages.

The learning curve is pretty insane but once you get past that it's a really fun language to work in especially if you want to build or test something really fast just to see if something will work.

Somewhat off-topic, but it is strange that special-purpose languages for scientific computing even exist. It seems far saner to just have these tools available as a set of libraries for a general-purpose programming language, and indeed the world seems to be heading that way with Python as the chosen language.

Maybe when matlab and mathematica were first created the existing dynamic languages were not very good?

It is rare for general purpose languages to make matrices as easy to manipulate as MATLAB does. Maybe Julia will fill this need, once it becomes more popular. Right now, for fast coding involving matrices (not necessarily fast running time, mind you), MATLAB takes the cake.

Fortran was released in 1957 as the first compiled language. Guess what it was for ... scientific computing.

I don't think that Fortran was the first compiled language. Wikipedia says the first language to be compiled was Autocode for the Mark 1.


> Maybe when matlab and mathematica were first created the existing dynamic languages were not very good?

It had less to do with the languages and more to do with the libraries. Would you use Python over MATLAB if NumPy and scikit-learn did not exist?

And even today, if you have to do a lot of heavy-duty or specialized statistics, does Python match the R, MATLAB or SPSS libraries?

Why does it seem more sane to have libraries instead of specialized languages?

Even with Python and R's mathematical ecosystem, they don't replicate the sheer breadth and depth of specialized tools like Mathematica and MATLAB.

I don't think I've ever seen the term "data science" used in a sentence like this. Does the word "statistics" not sufficiently describe things anymore?

> Does the word "statistics" not sufficiently describe things anymore?

Not for SEO purposes at any rate.

I was told (just yesterday and only partly in jest) that a data scientist is just a statistician with a MacBook.

Statistics is "the practice or science of collecting and analyzing numerical data in large quantities"

> Statistics is "the practice or science of collecting and analyzing numerical data in large quantities"

That's not true.

Statistics can analyze data in small quantities; you can read up on it under nonparametric statistics. At least half of nonparametric statistics deals with small data (mostly through the use of ranking). With Bayesian statistics you can just assign a prior distribution.
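To make the ranking point concrete, here is a minimal sketch of one rank-based statistic, the Mann-Whitney U, computed by hand on two tiny samples. The function name and the numbers are made up for illustration, and this only computes the statistic, not a p-value.

```python
def rank_sum_u(a, b):
    """Mann-Whitney U statistic for sample a against sample b.

    Values are ranked in the pooled sample (ties share the average rank),
    then U = (sum of ranks of a) - n_a * (n_a + 1) / 2.
    """
    pooled = sorted(list(a) + list(b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        # Advance j past every copy of the current value
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i+1 .. j (1-based) share their average rank
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_a = sum(ranks[v] for v in a)
    return rank_sum_a - len(a) * (len(a) + 1) / 2


# Hypothetical measurements: only 3 and 4 observations
low = [1.1, 2.3, 2.9]
high = [3.4, 4.0, 5.2, 6.1]

print(rank_sum_u(low, high))  # 0.0: no value in low exceeds any value in high
print(rank_sum_u(high, low))  # 12.0: every one of the 3 * 4 pairs favours high
```

Only the ordering of the values enters the calculation, which is why this kind of method still works when there are too few observations to check a distributional assumption.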

I think the best way to describe statistics is that it uses data to make inferences about a population: you take a subset of the data via sampling and infer a statistic (mean, median, whatever) about the population at large.

More often than not I'm just surprised that data science is so big when it seems like statistics does every freaking thing with data: how to sample correctly, how to design an experiment that collects the right data to answer a hypothesis, how to make sure the data aren't biased, etc. It deals with the data from inception to end, whether you collect the data yourself or it is given to you already, on top of dealing with problematic data such as missing data, imputation, etc.

Although it is a very cool and thorough approach to the data, it's very complicated code-wise and doesn't help sell the Wolfram Language. (Python/R admittedly can't do as much with LaTeX equation formatting for data analysis/visualization.)

Mathematica user here. It is nice that the Wolfram Language has an insane amount of built-in features (graphics, charts, a massive amount of math functionality, tables, database stuff, GPU, 3D printing, blockchain, audio analysis, web scraping, one load function that can interpret over 100 different file formats, etc.). It is insanely powerful for a lot of things.

Some of the drawbacks are that it is closed source and costs money (although it is cheap as far as this kind of software generally goes), but most importantly the succinctness of the language, for all its benefits, can make it painful to deal with. Yes, I can probably write 3 lines to do something that would take half a page of Python, but I first need to know which of thousands of functions to use and the eccentricities of the language. I'm sure Wolfram employees are that skilled, but I'm not and will not be anytime soon.

With that being said, I spent a few hours writing a notebook demonstrating key fundamental and scientific formulas in my industry this weekend. It was easy and the resulting code, pictures, and graphs look fantastic. I exported it as a PDF for others to use. Even the console is pretty cool. I think it would be a very popular language and environment if it was free and open source. Another problem is running something in production. My solution is to not even bother and just write the final solution in Julia if I need it. I think Mathematica really makes sense though if you're at a lab where everyone else uses it and can pass around Notebooks. In short I really like having it around, but don't like dealing with licensing issues.

I didn't know Wolfram Language had support for RDF and SPARQL!

I'm always baffled by the expressiveness of the language. I think the platform is extremely underrated. How is it not the silver bullet we're all looking for?

It’s closed source, for starters.

I've also wondered whether it is underrated. I looked into it a bit recently and one thing I noticed is that nearly all the emphasis is on interactive "notebook" style work, with very little emphasis on writing code to be structured into modules and executed as a standalone piece of software. (You have to dig really hard to even learn how to write a module in a plain text source code file.)

It would be nice to see versions of this analysis in other languages such as Python.

A year ago I did a version of Stack Overflow analysis (https://minimaxir.com/2018/02/stack-overflow-questions/) around similar topics (question/answer rates) using R (code: https://minimaxir.com/notebooks/stack-overflow-questions/)

I would like to see a version with JavaScript and http://observablehq.com.

I would like to keep JS out of data science. Why should anyone want this?

JavaScript is ruling the world and will soon replace Python for AI-related stuff.

It won't. Scientific computing is not web development. Python will probably be replaced by a saner language, like Julia, but JS is not saner anyway. It's a messy little language that has become the bytecode of browsers.

Truly spoken by someone who doesn't really know what they're talking about.
