
The Two Cultures of Mathematics (2000) [pdf] - rfreytag
https://www.dpmms.cam.ac.uk/~wtg10/2cultures.pdf
======
ajtulloch
One technique Prof Gower's mentions (the phenomenon of the concentration of
measure) has many applications in machine learning/statistics (which may be of
more interest to HN readers than some of the probabilistic combinatorics
mentioned in the paper).

A few (non-rigourous) examples in ML/statistics are:

\- The use of the LASSO estimator [0] (minimizer of |Y - β X|₂ + λ |β|₁) for
variable selection is justified by Candes & Tao's 2005 results [1, 2] that
show, under fairly general conditions, that this estimate exactly recovers the
true set of active variables. A key step in the proof is finding a
concentration of measure result for sub-Gaussian random matrices. A corollary
is that (with high probability) we can exactly solve the non-convex problem |Y
- β X|₂ + λ |β|₀ by passing to the convex relation |Y - β X|₂ + λ |β|₁ and
have strong guarantees that our relaxed solution is optimal.

\- The Johnson-Lindenstrauss lemma for dimensionality reduction, etc [3] can
be proven by essentially the same principle (random projections are "probably"
nearly isometries) that Prof Gowers describes in this note.

[0]:
[http://en.wikipedia.org/wiki/Lasso_regression#Lasso_method](http://en.wikipedia.org/wiki/Lasso_regression#Lasso_method)

[1]:
[http://authors.library.caltech.edu/10092/1/CANieeespm08.pdf](http://authors.library.caltech.edu/10092/1/CANieeespm08.pdf)

[2]:
[http://arxiv.org/pdf/math/0410542.pdf](http://arxiv.org/pdf/math/0410542.pdf)

[3]:
[http://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_l...](http://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma)

------
throwaway283719
I was lucky enough to have Tim Gowers as a lecturer in my first term of
university (hmm... 2002) when he gave a lecture course introducing number
theory, set theory, combinatorics and axiomatic construction of number
systems. It was intoxicating to be taught by someone who was (a) brilliant,
and (b) so thoroughly enthusiastic about mathematics. I eventually routed off
into applied math and physics, but I look back on those initial lectures
fondly, and still have an appreciation for the kind of "problem solving"
mathematics that he taught.

------
dkarapetyan
The argument can be adapted almost verbatim for hacker type vs theory type
programmers. One is in the problem solver camp the other in the fundamental
understanding and theory building camp. Silicon Valley is dominated by hackers
and they are constantly re-inventing wheels because of lack of theoretical
understanding, e.g. NoSQL, and the theory builders are working on problems
that will almost surely collect dust in some dark corner of a university
library.

~~~
loup-vaillant
Funny… While I personally like beautiful theory, the last bit of deepish
theoretical work I am studying was to solve a particular problem.

I'm currently delving into Earley Parsing, so I can write compilers easily, so
I can write DSLs economically, so I can write short programs in any language,
so I can spend less time writing those programs, so I can generate less bugs
from those programs, so these programs ultimately cost less money$$$.

Then there is Bret Victor, who uses very simple principles to solve the long
standing problem of understanding in various domains.

I probably stand in the _" the best solutions come from the deepest theories"_
camp. With the caveat that "deep" does not necessarily mean "complex", on the
contrary. Lambda calculus is very deep, but quite simple. So, while I believe
problems are the centre of our field, I also believe we need to break down
those problems into deep and simple math before we can tackle them
efficiently. Not doing so is a recipe for bloat.

~~~
dkarapetyan
I disagree about Bret Victor. There is nothing simple about the principles
that he uses. In fact you need a whole lot of very deep theoretical
understanding to even get to the point of Bret's abilities. Time traveling
debuggers for one are very advanced technology. If anything Bret Victor is
applying very advanced theory to mundane domains like programming.

~~~
loup-vaillant
> _Time traveling debuggers for one are very advanced technology._

Yes they are. But the underlying principles are dead simple: just keep all the
history of the memory, never erase anything!

Of course, the _implementation_ is not that simple, because efficiency,
because complex imperative languages, because many other things. But the basic
principles are still dead simple. You may be amazed by time travelling with
GDB, but OCaml had that _years_ before GNU did it.

I feel the same way about Bret Victor. He does understand things that are hard
to grok. Probably very abstract stuff. But I suspect it wouldn't take much
space to write those things down.

I gather you have seen his videos, most notably the dead fish and dynamic
visualization ones? Learnable Programming also counts. Tell, me, why did you
find those presentations so amazing?

The reason _I_ found them so amazing was their simplicity. Bret's demos show
incredibly innovative features, but at the same time, those features felt
incredibly simple: easy to use, the feedback is obvious, and frankly, most of
his particular examples don't seem that hard to implement. The hard part was
coming up with the idea in the first place.

Plus, I can see 2 overarching principles in Bret's work: the first one is,
using our visual system to see the relevant stuff at a glance. Give me a
graph, not a listing. The second one is direct interaction. You could think of
it as a REPL on steroids, or even a live image, but it's more than that. In a
live environment, you would change a variable, and see the consequences in
real time. Bret Victor goes beyond even _time_ , and let us manipulate systems
from the outside, as if they were a block universe (where time is just another
dimension).

In learnable programming, you can see how he slides the force of the jump, to
make it just right so our little Mario can go through that little secret
passage. We can visualize the whole path (that's the first principle), and we
can see what that path _would have been_ if you change the laws of physics
(that's the second principle).

Very deep stuff. Very _hard_ stuff. But as I said, very simple as well. And I
expect future insights will make it even simpler.

\---

Let us go back to my Earley Parsing for a second, because it's quite
illustrative. My journey began with this talk[1] by Ian Piumarta. He mentioned
Earley parsing at some point saying it was very general, handles every
possible context-free grammars, and is also very simple, very easy to
implement: half an hour to translate the algorithm in any language.

Months later, I'm still at it.

But you know what? As I come to understand Earley parsing, I agree with him
more and more. This thing is _genuinely_ simple. I wrote a recognizer in less
than a hundred lines of Lua code, and the whole parser will likely take less
than 200 lines. As I said, simple.

Simple does not mean easy however. Without the proper insights, I could only
struggle. But then I stumble upon a sentence, a picture… and then something
_clicks_. And I can see the simplicity, as well as the depth. And then it
clicks _again_. And it becomes even simpler. To the point that now, I
understand where Piumarta is coming from. And I still remember why it was so
difficult for me back then.

I mean, Earley parsing did use to feel kinda complex, and quite ad-hoc. At the
same time, it didn't seem too difficult. I was wrong. Earey parsing proved to
be much more difficult, and at the same time much simpler, than I had
expected.

[1]:
[http://www.youtube.com/watch?v=EGeN2IC7N0Q](http://www.youtube.com/watch?v=EGeN2IC7N0Q)

\---

I think Bret's work is like that: deep, hard, and simple.

------
mathgenius
I am currently reading "The Master and His Emissary: The Divided Brain and the
Making of the Western World" by Iain McGilchrist. It provides a good overview
of current insight into the roles of the separate hemispheres of the brain
(way beyond the usual hippy stuff). One of the distinctions made is that the
left hemisphere creates "abstraction" and the right hemisphere works by using
"examples". A good course in (pure) mathematics will emphasize both:
understand the theorem but also consider these examples that demonstrate the
theorem. Now it strikes me that this article by Gowers paints a deeper picture
of this left/right brain divide in mathematics, and the tension between them
that McGilchrist describes in his book. How remarkable.

------
techadv
Related is Dijkstra's "on the fact that the ocean has two sides":
[https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/E...](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD06xx/EWD611.html)

------
kamaal
This isn't just with Mathematics. In general in any profession, people who
start new things are valued more than the people who solve problems with
existing things.

On earth day I was speaking to a volunteer, who talked about how people who
talk about saving existing forests are taken to be anti industrial progress
but people who plant new trees are considered as people who are changing the
world.

This applies to any thing you will ever do.

This is also why you must avoid jobs and responsibilities that deal with
legacy projects, and seek projects that are doing something new.

------
NhanH
Is this something really new? I've always thought that for any field of
science, there are always those that research for the sake of understanding,
and those that study/ research the field to be use as an abstraction for
another, more practical field (as noted in
[http://xkcd.com/435/](http://xkcd.com/435/) ). Even back hundreds of years
ago, Newton developed Calculus for physics, but Euler is studying maths for
the sake of understanding.

------
mberning
Do we not see similar trends in every aspect of computer science and software
development?

~~~
NAFV_P
If you are referring to "the two cultures", what would those cultures be?

~~~
chriswarbo
Computer Science vs. Software Engineering; although usually the label
"Computer Science" is applied to both.

Paraphrasing the PDF:

Loosely speaking, I mean the distinction between programmers who regard their
central aim as being to solve problems, and those who are more concerned with
building and understanding theories... If you are unsure to which class you
belong, then consider the following two statements. (i) The point of solving
problems is to understand computation better. (ii) The point of understanding
computation is to become better able to solve problems.

Here (i) corresponds to Computer Science and (ii) to Software Engineering.

~~~
NAFV_P
> _If you are unsure to which class you belong, then consider the following
> two statements. (i) The point of solving problems is to understand
> computation better. (ii) The point of understanding computation is to become
> better able to solve problems._

Like the original statement in the article, your paraphrasing strikes me as
being _egg and chicken_. I think most mathematicians and hackers would sit on
the fence.

~~~
AnimalMuppet
Perhaps. But even those who sit on the fence lean toward one side or the
other.

~~~
NAFV_P
Or the fence sitters lean one way or the other depending on their
mood/requirements.

------
iaf
Can someone please expand on this statement, "if one is counting solutions,
inside a given set, to a linear equation, then it is enough, and usually
easier, to estimate Fourier coefficients of the characteristic function of the
set"

~~~
mathgenius
The characteristic function of a set is one in the set and zero outside [1].
Take fourier transform of this function. Fourier transform is a linear
operator. Turn handle, bingo.

[1]
[http://en.wikipedia.org/wiki/Indicator_function](http://en.wikipedia.org/wiki/Indicator_function)

~~~
iaf
Ah, I didn't know 'characteristic' function and 'indicator' function were
synonyms. Thanks!

------
dang
It looks from [http://www.americanscientist.org/bookshelf/pub/prove-and-
con...](http://www.americanscientist.org/bookshelf/pub/prove-and-conjecture)
like the proper date for this article is 2000.

