Exploring Mathematics with Matplotlib and Python (programmingzen.com)
156 points by acangiano 44 days ago | 31 comments

This is a nice article. For those who have not yet read it (it's short, read it!), a one-paragraph summary: the author starts with a list of random numbers. Visualizing it (plotting the numbers, with the list index on the x axis) leads the author to wonder how often numbers repeat. Plotting that leads to the question of what the maximum frequency would be, as a function of the size of the input list. This can lead to a hypothesis, which one can explore with larger runs. And then after some musings about this process, the post suddenly ends (leaving the rest to the reader), and gives the code that was used for plotting.
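The exploration is easy to retrace. Here is a minimal sketch of the same idea (not the author's code; the list size, value range, and seed are arbitrary choices here):

```python
import random
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

random.seed(42)  # arbitrary seed, for a reproducible run
n = 1000
numbers = [random.randint(1, n) for _ in range(n)]

# Step 1: plot the raw list, with the list index on the x axis.
plt.figure()
plt.plot(range(n), numbers, ".")
plt.savefig("random_numbers.png")

# Step 2: how often does each value repeat?
freq = Counter(numbers)
plt.figure()
plt.plot(sorted(freq.values(), reverse=True), ".")
plt.savefig("frequencies.png")

print("maximum frequency:", max(freq.values()))
```

From here, rerunning with larger n is what turns the plot into a hypothesis about how the maximum frequency grows.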

This article is essentially an encouragement and a reminder of our ability to do experimental mathematics (https://en.wikipedia.org/w/index.php?title=Experimental_math...): there's even a journal for it, and the Wikipedia article on it is worth reading (https://en.wikipedia.org/w/index.php?title=Experimental_Math...). See also (I guess I'm just reproducing the first page of search results here) this article (https://www.maa.org/external_archive/devlin/devlin_03_09.htm...), these two in the Notices of the AMS (https://www.ams.org/notices/200505/fea-borwein.pdf, http://www.ams.org/notices/199506/levy.pdf), this website (https://www.experimentalmath.info), this post by Wolfram (https://blog.stephenwolfram.com/2017/03/two-hours-of-experim...), and there's even a book by V. I. Arnold (besides a couple by Borwein and Bailey, and others).

Especially in number theory and probability, simple explorations with a computer can suggest deep conjectures that are yet to be proved.

Layman here

Thank you so much for pointing this out! Experimental mathematics feels like a missing puzzle piece; with it in place, so much more makes sense.

Quotes are from the wiki article you linked.

> As expressed by Paul Halmos: "Mathematics is not a deductive science—that's a cliché. When you try to prove a theorem, you don't just list the hypotheses, and then start to reason. What you do is trial and error, experimentation, guesswork. You want to find out what the facts are, and what you do is in that respect similar to what a laboratory technician does."[3]

I wish there were books in which people described their complete process (not only their proofs): how they actually figured things out.

> Mathematicians have always practised experimental mathematics. Existing records of early mathematics, such as Babylonian mathematics, typically consist of lists of numerical examples illustrating algebraic identities. However, modern mathematics, beginning in the 17th century, developed a tradition of publishing results in a final, formal and abstract presentation. The numerical examples that may have led a mathematician to originally formulate a general theorem were not published, and were generally forgotten.

Why is this the case? It seems like it doesn't benefit us other than saving some paper.

> The following mathematicians and computer scientists have made significant contributions to the field of experimental mathematics:

Fabrice Bellard

Donald Knuth

Stephen Wolfram

(among others)

--> This is so awesome; it also sheds some light on how these people think.

This isn't to defend how it's done, but the tradition has been that the "laboratory technician" skills are learned on the job. This is true of lab tech work as well. I've taught a number of summer interns how to solder, but it's not written up in any research paper. Of course that makes it hard if one isn't preparing to do it as a job.

> I've taught a number of summer interns how to solder, but it's not written up in any research paper.

Not in a research paper, but it is described in some nice decades-old training videos, https://www.youtube.com/playlist?list=PL926EC0F1F93C1837

By the way: the post ends with a conjecture that the maximum frequency is likely to be log n + 1 (with log here denoting the logarithm to base 10), but the more precise result seems to be that the largest frequency is:

((ln n)/(ln ln n))(1 + o(1))

with high probability (i.e., probability 1 - o(1)), where ln denotes the natural logarithm (log to base e). Empirically, the maximum frequency for n=1000, 10000, 100000 often seems to be, respectively, 5; 6 or 7; and 7 or 8.

This problem has applications in studying hash tables etc., and can be found under terms like "maximum load" with balls in bins, and proving this doesn't seem to be very easy. As the post says “the solution is likely not as trivial as it first looks”. The analysis may be hard, but these days if faced with a problem like this in the real world (e.g. we have a hash table of size M that will receive N entries in it, and we're curious about the likely maximum load), we can likely just experiment to find out. Even when the numbers are too large to run simulations directly, an in-between solution is to get a tractable expression (a recurrence relation using dynamic programming or whatever) for the closed form, and write a program to compute it.
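The balls-in-bins experiment is cheap to simulate directly, which makes it easy to compare the empirical maximum load against both the post's log10 n + 1 guess and the (ln n)/(ln ln n) asymptotic (a sketch; the trial count and sizes are arbitrary, and the o(1) term means the asymptotic is a loose fit at these small n):

```python
import math
import random
from collections import Counter

def avg_max_load(n, trials=200, seed=0):
    """Average maximum frequency when n balls are thrown into n bins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(n) for _ in range(n))
        total += max(counts.values())
    return total / trials

for n in (1000, 10000):
    print(n,
          round(avg_max_load(n), 2),                      # empirical
          round(math.log(n) / math.log(math.log(n)), 2),  # (ln n)/(ln ln n)
          round(math.log10(n) + 1, 2))                    # post's guess
```

This is the "in-between solution" in miniature: when n gets too large to simulate, the same quantity can be computed from a recurrence instead.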

1. How many people need to be in a room so that there is a greater than 50% chance of at least two of them sharing the same birthday?

2. How many numbers do we have to draw from 365 so that there is a greater than 50% chance that at least two of them are the same?

3. How many numbers do we have to draw from X so that there is a greater than Y% chance that at least Z of them are the same?

I think X, Y, Z are enough parameters:

Drawing from X=1000 numbers, what is the chance Y that some number appears Z=(5,6,7,8...) times?

Sorry, I'm not a mathematician, just some breakfast ideas ;)

Edit: Inspired by http://datagenetics.com/blog/february72019/index.html
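For questions 1 and 2 no simulation is needed; the probability that k draws from d values are all distinct has a short closed form, so a few lines give the classic answer of 23 (question 3, with a general Z, is exactly the harder maximum-load problem from the parent comment):

```python
def birthday_threshold(days=365, target=0.5):
    """Smallest group size at which P(some shared value) exceeds target."""
    p_all_distinct = 1.0
    k = 0
    while 1.0 - p_all_distinct <= target:
        k += 1
        # Multiply in the chance that the k-th draw misses the first k-1.
        p_all_distinct *= (days - (k - 1)) / days
    return k

print(birthday_threshold())           # the classic answer: 23
print(birthday_threshold(days=1000))  # question 2 generalized to X=1000
```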

I’m embarrassed to see such a post upvoted. Also, matplotlib is outdated. If you want a good visualization tool, it should leverage as many features as it can to present the most information possible. This includes not just color but interactive features like hover tools. A library like bokeh makes this extremely easy, for example. I’m a bit sad to see posts that aim to demonstrate how visualization tools can improve our understanding of a phenomenon written by people holding on to outdated legacy tools. It sends the wrong message.

Matplotlib is incredibly powerful for non-interactive visualisations, and I have yet to find another library that offers the same flexibility.

Interactive visualisations are often impractical. They don’t work in publications, presentations or documents. Generally speaking, visualisations that have everything clearly visible without requiring interaction are superior to visualisations that require extra interaction.

I found that it always paid off to do some extra thinking on how to reshape my plots so they don’t need to be interactive. I had very few cases where I needed the plots to be interactive, and ironically, on those occasions only matplotlib worked for me. Those were cases where I wanted to show and play audio snippets that belonged to data points in a dimensionality-reduction plot. It was quite hard to get matplotlib to do what I wanted, but I didn’t even get anywhere near a result with plotly et al.

Plotly has very robust static image export capabilities, meaning you can create an interactive vis for yourself in a notebook, and embed it in an app if you like, and then use the same tools to create publication-ready export in SVG, EPS, PDF, PNG etc.

Here is the relevant documentation: https://plot.ly/python/static-image-export/

How is matplotlib outdated? It's just relatively low-level, so it might be quicker to use high-level tools like seaborn (which uses matplotlib as its backend) or altair, if that fits the situation. For heavily customized publication-ready plots (that means print), there is no alternative to matplotlib. The other two low-level Python visualization libs, plotly and bokeh (see https://pyviz.org/tools.html), focus on interactive plots, which is an entirely different use case! Bokeh just does not generate visualizations for print. Thus, it does NOT make matplotlib obsolete. On top of that, last time I checked, bokeh was far less flexible than matplotlib.

Just wait 5 years. I was a heavy matplotlib user and before that used a primitive tool for plotting in C. It actually performed so much better than matplotlib. If working on those one or two figures for hours for a publication is what you’re aiming for, sure matplotlib is still useful for that (I did that during my PhD and postdoc days, sure it’s interesting). But I think that’s where the problem is. We have too many people stuck on an old tool because that’s all they know. I’m watching lectures and courses every morning before work and keeping up to date. It’s helped me get exponentially ahead. I think we need to encourage this mentality. This current post was about showing how to understand data. I guarantee you that if you learn to use bokeh the right way you’ll have significantly faster iteration times. Also, in industry where things are extremely fast paced, you need to move and present your information fast.

Again, my disappointment is that we need to encourage a mentality of flexible learners, and I find this post a regression. I’m a bit disappointed at the unpopularity of this comment, but maybe that’s why I’ve moved up to earn an enormous salary.

Try it

Mighty matplotlib isn't going away any time soon, but just because Plotly makes interactive plots, it doesn't mean you can't also

a) customize every aspect of the chart, from the fonts to the length of the axis ticks to the legend placement, etc. (see the full list of thousands of available customization attributes here: https://plot.ly/python/reference/)

b) export to raster or vector formats for publication (https://plot.ly/python/static-image-export/)

c) use high-level grammar-of-graphics-inspired tools like https://plotly.express/ to create complex charts in a single line of code.

I haven't really looked at plotly in a very long time. At that time, it was cloud-based. Having the data I'm plotting sent to plotly's servers, or more generally, my plots depending on the availability of some remote server, was a no-go for me. Has this changed?

Very much so! Plotly.py version 4 is "offline-only" just like matplotlib and other libraries: https://medium.com/plotly/plotly-py-4-0-is-here-offline-only...

Very nice! I might give it another try sometime soon, then!

Nobody cares if matplotlib is outdated. It served its purpose here, which was to show how plotting data can lead to further understanding data. You don't need a fully featured tool to do so.

As far as "fully featured" goes, there is nothing more fully featured than matplotlib. It just tends to take a lot more code to generate a plot, compared to some higher-level libraries, so for simple visualizations, matplotlib may not be worth the tradeoff of power vs ease of use.

Nothing really beats "plot sin(x)" in gnuplot.
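For comparison, a near-equivalent in matplotlib takes a few lines rather than one (a sketch; the Agg backend is used here only so it runs headless):

```python
import math

import matplotlib
matplotlib.use("Agg")  # drop this line for an interactive window
import matplotlib.pyplot as plt

xs = [i / 100 for i in range(-1000, 1001)]  # x from -10 to 10
plt.plot(xs, [math.sin(x) for x in xs])
plt.savefig("sin.png")
```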

As much as I love the Python language, it is shameful how it has become a sort of "schtrumpf"-like addition to any computer-related stuff. Such introductory tutorials are great (but this one, specifically, would benefit greatly from having the code that generates each figure next to it). However, it is really not necessary to specify that your particular thing is "with Python" as if it meant anything fundamental.

What kind of shitty reasoning leads to this? "Oh, let's introduce this elementary mathematics to the illiterate masses by writing it as a Python script. Now everyone will understand it!" This is a lack of respect for the agency of the readers.

While your comment (“let's introduce this elementary mathematics to the illiterate masses by writing it as a Python script. Now everyone will understand it”) may apply to some articles, what relevance does it have to this one? This article is fundamentally about how, starting even with a set of random numbers, simply visualizing them can lead to further explorations that can be carried out, again with computers. Sure, you could replace “Matplotlib and Python” with “a graphing library and a programming language”, but fundamentally the article is not trying to teach any elementary mathematics, and in fact it does not even reach the point of proving any theorems (for that matter, despite a fair bit of trying, I haven't actually figured out yet how to prove the conjecture the article ends with); it's just about the process of exploration (generating hypotheses, verifying them, etc.) using computers.

Perhaps it's just a factual element. Or perhaps the intention is `s/python/pragmatic/`. Yeah, python sucks in many ways but is it really so horrible? I know! Let's ask the reader to get intimate with a static compiler. If they're "smart" enough to satisfy (or trick) the compiler, then they've earned the "reward" of being able to execute their program.

There is value in static typing, but there are many instances where that cost is not worth the reward.

I think the grandparent poster was just pointing out that this post has very little to do with python or matplotlib, with the code to generate the plots just thrown in as an afterthought at the end. Which makes it weird to have them in the title. (Whether python is a good tool or not is unrelated to this observation.)

I would expect a post entitled “exploring mathematics with python” to have a whole lot more python code (inline with the text and better explained instead of an uncommented blob at the end) and a whole lot more mathematics.

A more accurately descriptive title for this post might be “counting the repetitions among randomly chosen positive integers”.... which of course isn’t going to get as many clicks or as many reflexive upvotes from non-readers as a post promising “exploring mathematics with python” because it doesn’t sound (and frankly isn’t) all that interesting to most readers. (It might make a decent short project for middle school students though.)

Personally I flagged the post for its misleading title.

What I got out of the post was that exploratory and experimental mathematics is fun and worthwhile and if you haven't tried it, you should, and by the way, this experiment uses Python and matplotlib (which some readers may already know). I think you missed the point of the article.

Everything in the title is a major part of the article. The author used Python's matplotlib to investigate mathematics, and shared both the math and the Python code.

It's not the author's fault if you bring your own baggage to the words.

> If they're "smart" enough to satisfy (or trick) the compiler, then they've earned the "reward" of being able to execute their program.

This is needlessly dismissive and frankly offensive.

I prefer having errors from a compiler (or static analysis, or ...) because they help me. Not because I am a better programmer, but because they help me be a better programmer.

> Compiler: Hey that type doesn't work there

Oh! Thank you! I meant to use this type instead.

> Compiler: This value is freed here but used right afterward here

I meant to clone it. Whoops. That would have been embarrassing to debug in production!

And so on.

I absolutely recognize that it's a barrier to entry, but it's not one erected to keep people out; it's there to catch your mistakes for you, so that you spend less time debugging and more time writing your actual application.

It's also needlessly dismissive to write as if everyone using Python is making a mistake.

> It's also needlessly dismissive to write as if everyone using Python is making a mistake.

That's true. If I start doing that, please call me out on it. From the HN Guidelines:

> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.[0]

I'm very much not a fan of meaningful whitespace, but I use Python occasionally, and regularly help my friends who are learning it grasp this or that topic. Except where pointing them to a specific library or other tool, I've never said "You shouldn't do that in Python, do it in this other language instead".

[0]: https://news.ycombinator.com/newsguidelines.html

I apologize if I misinterpreted what you wrote. It seems you just meant to criticize the title, or...? I'm actually not sure what your above comment was getting at. Sorry if I missed your point.

> I'm actually not sure what your above comment was getting at. Sorry if I missed your point.

I was responding specifically to the content that I quoted, which came from the post that I replied to.

> It seems you just meant to criticize the title, or...?

I should note that I am not the person who started this comment thread.

They were criticizing the post title as they did not believe that the contents within were specific to Python.

They were not criticizing Python either. Rather they were saying that the word Python in the title appears to be used to attract viewers who might otherwise be intimidated by the contents, instead of being relevant to the contents of that article.

I see! Thanks. What I meant to say was that the "'smart' enough" comment seemed to me to be meant in a jocular way and should not be something to take offence at. However, I did indeed misread the thread flow.

The scare quotes are what take it from potentially jocular to derisive.

