

Putting the Science back in Computer Science - kenjackson
http://www.cs.princeton.edu/~rs/talks/ScienceCS10.pdf

======
T_S_
That is a big, rambling presentation with a lot of interesting topics in it.
Despite the title, I think Sedgewick is really making a case for putting more
mathematics into Computer Science. I think this is an important long-term
trend.

Last year I read Bergeron et al's Combinatorial Species and then dipped into
Flajolet and Sedgewick's Analytic Combinatorics. The first of these is
concerned with mathematical structures that are tantalizingly close (or
identical) to data structures that we often use. These can be represented as
algebraic expressions and converted into related or equivalent ones using
simple operations from algebra and calculus. F&S exhibit the tools necessary
for analyzing the properties of these structures, using complex analysis, the
same tools used to get a grip on "ordinary" algebraic expressions. It turns
out these tools can help predict the performance of algorithms or help
generate large random data structures for testing or statistical analysis.

We are used to the strong connections between type theory and mathematical
logic, and here we see something that looks like types being manipulated with
algebra!
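
To make that concrete, here is a minimal Python sketch (my own toy example,
not from either book): read the species-style equation B = 1 + z*B^2 ("a
binary tree is either empty, or a node with two subtrees") as a
generating-function equation, extract its coefficients, and check them
against a brute-force enumeration.

    # Toy sketch: the type equation B = 1 + z*B^2 doubles as a generating-
    # function equation whose coefficient of z^n counts the binary trees
    # with n nodes (the Catalan numbers).
    def tree_counts(n_max):
        b = [1] + [0] * n_max              # b[0] = 1: the empty tree
        for n in range(1, n_max + 1):
            # [z^n] z*B(z)^2: split the n-1 remaining nodes among subtrees
            b[n] = sum(b[k] * b[n - 1 - k] for k in range(n))
        return b

    def enumerate_trees(n):
        """All binary trees with n nodes, as nested (left, right) tuples."""
        if n == 0:
            return [None]
        return [(left, right)
                for k in range(n)
                for left in enumerate_trees(k)
                for right in enumerate_trees(n - 1 - k)]

    counts = tree_counts(8)
    assert all(len(enumerate_trees(n)) == counts[n] for n in range(8))
    print(counts)   # [1, 1, 2, 5, 14, 42, 132, 429, 1430]

F&S go much further (asymptotics of the coefficients via complex analysis),
but even the counting recurrence falls straight out of the type algebra.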

To me the trend is this: CS will benefit in the future from borrowing more and
more design patterns from math. For example: don't think databases, think
relations; or better yet, graphs.

------
PaulHoule
Hard to say.

Computer science has always been in an awkward position. Is it science, is it
math, is it engineering? Some people say that it's an entirely new thing, the
"study of the artificial".

Personally my impression is that work in description logics, databases and
such has been held back by a focus on mathematics. Roughly, the more powerful
a logic system gets, the harder it is to compute with it, both in the sense
of the computational resources required and in the sense that no sound and
complete algorithm may even be available.

On the other hand, the real world requires us to represent facts like "Captain
Kirk is a person in the Star Trek universe", "Christians believe that Jesus
Christ is the Son of God", "The identity of Deep Throat was unknown to most
people in 1993", etc. For that we need highly descriptive representations.

At the same time, we know that people and animals are quite capable of
various sorts of modal reasoning. People aren't good at solving arbitrary
problems with general structures, but they seem to have a bunch of heuristics
that can deal with the problems that actually come up in practice.

Perhaps a scientific approach can be experimental, and not be so concerned
with problems that ~might~ happen, but as an engineer I'm afraid the term
'science' is something that can be appropriated for the fashion of the day,
as in 'scientific socialism'.

~~~
mathgladiator
It is something entirely new: the superposition of science, math,
engineering, and art.

I've talked (and I'll blog more about this sometime) about the role of a
software architect.

Software is very free in an engineering sense. There is no reality except the
limitations of the machine, which, compared to physical reality, are kinda
easy. There is no gravity, no friction, no collision between matter, no heat,
no radiation. There is just a machine.

In this freedom, you have art. The machine can be used to implement math, or
as a foundation on which to build mathematical theories. You can do science
and have lots of fun.

The role of the software architect is to define reality and this reality will
enable engineering.

~~~
PaulHoule
Once you start dealing with a lot of data, you feel a very heavy weight from
the physical limitations of your computer. The path that led me to computing
was really a flight from my blue collar roots and the tyranny of the physical
universe, but now I have my days when I wish I was good at using a milling
machine.

------
alextp
This is very interesting and very true. I think the main points of this
article are:

1\. Constants do matter, especially since most programs end up with an
asymptotic running time that looks like a*N^b, where a and b are often not
what you'd expect. This matches my experience a lot, and it's interesting
that this is one level below O notation but one level above "counting how
many operations should happen, roughly", which is what you have to do in
practice to make algorithmic code go fast (see the sketch after this list).

2\. The examples around which an intro CS course is structured are really
uninteresting. I couldn't agree more; I never cared for computing factorial,
fibonacci, etc., or for writing silly games or dumb data-processing programs.
Simulations and numeric algorithms are far more interesting, and can get
students interested more easily when they realize that work for their other
classes can be made far easier with computers.

3\. Everyone should know CS. Even my philosophy-major wife often needs to run
short scripts to automate some tasks, and I think that if a philosophy major
can benefit from CS concepts, anyone else can as well.

All these things one should find out eventually, and I agree that exposing
people to these concepts as soon as possible is a very good idea, far better
than what is usually taught as CS-for-non-CS-majors.
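
To illustrate point (1), here is a rough sketch of the kind of doubling
experiment that reveals a and b empirically. The quadratic duplicate-checker
is a stand-in workload I made up, and the fit is an ordinary least-squares
line in log-log space.

    # Hedged sketch: estimate a and b in T(N) ~ a*N^b by timing a routine at
    # doubling input sizes and fitting a line to (log N, log T).
    import math
    import random
    import time

    def has_duplicate(xs):                 # deliberately naive: ~ a*N^2
        return any(xs[i] == xs[j]
                   for i in range(len(xs))
                   for j in range(i + 1, len(xs)))

    sizes, times = [], []
    for n in (250, 500, 1000, 2000):
        xs = random.sample(range(10**9), n)   # distinct values: full scan
        t0 = time.perf_counter()
        has_duplicate(xs)
        times.append(time.perf_counter() - t0)
        sizes.append(n)

    # Least-squares fit of log T = log a + b*log N.
    lx = [math.log(n) for n in sizes]
    ly = [math.log(t) for t in times]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
         / sum((x - mx) ** 2 for x in lx))
    a = math.exp(my - b * mx)
    print(f"T(N) ~= {a:.3g} * N^{b:.2f}")  # b should come out near 2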

~~~
mturmon
And regarding your point (1), the slides also lay out the interesting case in
which the typical-case asymptotic time complexity

    a1 * N^b

is far more favorable than the worst-case form of

    a2 * N^b

The first form is generally deduced through extensive experiments with typical
data, while the second is (often) easily proved with just paper and pencil.
The first is the more important in practice, but the second is what's quoted
in textbooks.

There are many algorithms for which a1 << a2, which enables all sorts of
problems to be addressed that would otherwise be intractable.

The linear programming/max-flow type problems the slides mention are a major
case.

------
joshes
"1970s: Want to use the computer? Take intro CS.

2000s: Intro CS course relevant only to future cubicle-dwellers"

This is a serious problem. When I speak to students from programs other than
Computer Science, even from other scientific disciplines, they question why
they should ever have to take a Computer Science class. But if you tell a
Chemistry major he needs a Physics course, or tell a Biology major that he
needs a Chemistry course, it is likely to be accepted as perfectly reasonable.
To many, CS == Computer Programming.

But this is a tough problem. To get non-CS students interested, we probably
need to stress the applicability and practicality of being able to understand
fundamentals and concepts of Computer Science as applied to other scientific
disciplines. "Putting the science back into Computer Science" may be the only
way to do so, while also dispelling the notion that you only need Computer
Science education if you wish to program computers for a living.

~~~
baddox
It seems like most computer scientists, when observing the general lack of
mainstream interest in computer science, assume that there's actually a
problem with having a low number of computer science students. I disagree. Why
do we ("we" being the computer science community, of which I casually consider
myself a member) need _more_ people? Is there reason to assume that there are
many people who would be interested in and excel at computer science but
never gave it a try? A decent portion of the supposedly interested computer
science students I've interacted with don't even seem like they're a good fit
for studying CS. If anything, I think we should have _fewer_ CS students
(perhaps this could be accomplished by requiring earlier or more difficult
weed-out courses like Discrete Math, Algorithms, or Theory of Computation).

Especially with today's ubiquity of personal computing, certainly anyone who's
interested in computers or computation can seek it out themselves—I don't
think they need CS departments to cater or market themselves to them.

Oh, and about the "CS == programming" debate: If you don't like the act of
programming, I really don't see how you're a good fit for studying computer
science. This isn't to say that computer science and programming are one and
the same. Consider this: biology != dissection, but if you can't stand
dissection, I doubt you're a good fit for studying biology.

------
lkozma
A good part of the presentation is a critique of worst-case analysis, which
is often not relevant for practical problems. The author doesn't mention
smoothed analysis, which addresses this very issue:
<http://en.wikipedia.org/wiki/Smoothed_analysis>
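
For a feel of what it does, here is a rough sketch (mine, not from the
article): deterministic first-pivot quicksort is quadratic on sorted input,
but a small random perturbation of the input, which is the setting smoothed
analysis studies, already dissolves most of the bad case.

    # Count comparisons of first-element-pivot quicksort (explicit stack,
    # so sorted input doesn't blow the recursion limit).
    import random

    def quicksort_comparisons(xs):
        xs, comps, stack = list(xs), 0, [(0, len(xs) - 1)]
        while stack:
            lo, hi = stack.pop()
            if lo >= hi:
                continue
            pivot, i = xs[lo], lo
            for j in range(lo + 1, hi + 1):    # Lomuto partition
                comps += 1
                if xs[j] < pivot:
                    i += 1
                    xs[i], xs[j] = xs[j], xs[i]
            xs[lo], xs[i] = xs[i], xs[lo]
            stack.extend([(lo, i - 1), (i + 1, hi)])
        return comps

    n = 2000
    sorted_input = list(range(n))

    # "Partial permutation" perturbation: pick ~10% of the positions at
    # random and shuffle the chosen elements among themselves.
    perturbed = sorted_input[:]
    idx = [i for i in range(n) if random.random() < 0.1]
    vals = [perturbed[i] for i in idx]
    random.shuffle(vals)
    for i, v in zip(idx, vals):
        perturbed[i] = v

    print("sorted (worst case):", quicksort_comparisons(sorted_input))
    print("lightly perturbed:  ", quicksort_comparisons(perturbed))
    print("fully shuffled:     ", quicksort_comparisons(
        random.sample(range(n), n)))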

~~~
bartonfink
That's interesting. I hadn't run into this before in college or grad school,
and it seems like a good way of bridging the analytical gap between theory and
practice that Sedgewick talked about.

It's unlikely that Sedgewick hadn't heard of this before. I suspect that he
didn't mention it simply because it isn't often taught. According to your
citation, it was introduced in 2001, which is quite recent in academic terms.
How did you come across it?

~~~
lkozma
Yes, it is quite interesting, and although the main idea is straightforward,
the analysis of concrete algorithms gets quite technical. Also, it's still
actively researched and maybe not entirely accepted into the mainstream yet.
AFAIK it has been successfully applied to many well-known algorithms which
have bad worst-case bounds but surprisingly good behavior in practice (e.g.
quicksort). I heard about it from my advisor.

A list of related papers:
[http://www.cs.yale.edu/homes/spielman/SmoothedAnalysis/index...](http://www.cs.yale.edu/homes/spielman/SmoothedAnalysis/index.html)

------
younata
To me, Computer Science has always really been more about math than anything
else. I tell people that CS is basically a glorified math degree. Although it
is possible to create programs without knowing much advanced math (beyond
high school), programming still requires you to understand math.

Also, telling people I'm a glorified mathematician means that I don't get as
many requests for tech help.

------
jasondavies
I thought his reasons for advocating Java for teaching were interesting, in
particular:

    Q. Why not Python?
    A. Poor data abstraction; everyone needs layers of abstraction

I've recently been telling my dad that he should teach himself Python (he has
no programming background aside from BASIC); perhaps I should rethink this. I
like Python's REPL for teaching, but back in my Java days I used to use
<http://www.beanshell.org/> to provide something similar.

~~~
andolanra
I'm not entirely sure that's a cogent argument. Two major objections:

1\. If anything, I think the problem with Java is that it encourages _too
much_ abstraction. This isn't inherent in the language design, but is a
product of Java culture: the massive UML diagrams and fragile class
hierarchies which are glorified in Java tend to encourage adding extra
abstractions to things which don't really need to be abstracted away, and you
spend time doing extra work to accommodate them. There's an attitude among
computer scientists, especially of the academic variety, that there's no such
thing as "too much abstraction," and while abstraction is generally a
decidedly good thing (I have, in my time, written Strategy-Factories, after
all), I think it is possible to overemphasize and overuse it. (John Carmack:
"It is not that uncommon for the cost of an abstraction to outweigh the
benefit it delivers. Kill one today!")

2\. I'm not entirely sure that I agree with the statement that Python _has_
poor data abstraction, especially in comparison to Java. Python in many ways
lets you abstract away more details than Java does; e.g. duck typing allows
you to treat several pieces of data as identical black boxes without having to
worry about the implementation details and without using subtyping or
interfaces, which is exactly the definition of abstraction. On the other hand,
Java fails to abstract away some of the implementation details which are
unimportant to the kind of programming that beginners do, e.g. the differences
between int and Integer.
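
To illustrate the duck-typing point with a toy example (mine, not from the
slides): the function below treats anything with a .read() method as the same
black box. No interface or common superclass is declared anywhere, whereas in
Java you would have to introduce one.

    import io

    def count_lines(source):
        """Accepts any object whose .read() returns a string: a file, a
        StringIO, a network-response wrapper, a hand-rolled test stub."""
        return source.read().count("\n")

    class FakeSensor:              # merely quacks like a file
        def read(self):
            return "reading 1\nreading 2\n"

    print(count_lines(io.StringIO("a\nb\nc\n")))   # 3
    print(count_lines(FakeSensor()))               # 2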

My own two cents: I also prefer Python's OOP-but-not-necessarily-always
philosophy to Java's OOP-all-the-time-everything-in-a-class-somewhere
philosophy. I am involved with teaching a Java course right now, and it has
made me really wish that students could start out not worrying about classes
and just writing scripts, as in Python, _before_ they have to learn about
classes and inheritance and data abstraction. (I have previously taught
everything from C to Scheme, and it really is wonderful to just start people
out _writing code_ in a high-level Python-like language without having to
handwave about static and classes and so forth.)

~~~
ShardPhoenix
My impression (given the next FAQ about Matlab) is that by "abstraction" he
means "static types". Whether static typing is better or worse for beginners
than dynamic typing is (as far as I know) an open question, like
(unfortunately) most questions about software engineering practices.

------
macrael
More than anything else, Computer Science is _young_. Unlike other
disciplines, there has not yet been a schism between the theoretical and the
practical. In school, we learned a bit of everything. I took classes focused
on algorithms and computation, but also took classes on software design,
learning how to use databases and practicing loose coupling and such.

If you look at physics, which has been around a little longer, you see a
definite split. A physics major is a license to explore the theoretical
limits of the physical world. Engineers design bridges and learn skills that
let them do that really well. Both sides have overlapping knowledge and their
knowledge is complementary, but schools split them into distinct groups.

My guess is that, in time, we will see the same split in computer science.

~~~
profgarrett
I really agree with this, but think that we'll see even more of a split than
just math/application. I got my PhD in Information Systems and Technology,
which fused a social science approach with a topic of information systems
being used in different domains. I found it really useful, and think that a
business-oriented/programming-oriented/math-oriented division will probably
split up the field.

------
PakG1
You know, correct me if I'm wrong, but I always viewed a true science as
something that holds the scientific method at its core. So the big test for
me is whether computer scientists apply the scientific method. And to be
honest, I can't think of many examples where people rigorously chase a
problem using the scientific method.

From Wikipedia, in case anyone's unfamiliar:

1\. Define the question

2\. Gather information and resources (observe)

3\. Form hypothesis

4\. Perform experiment and collect data

5\. Analyze data

6\. Interpret data and draw conclusions that serve as a starting point for new
hypothesis

7\. Publish results

8\. Retest (frequently done by other scientists)

How many people do this, other than maybe to try to solve P=NP?

edit: formatting

~~~
jamii
That sounds very much like the debugging process in an open source project.

~~~
PakG1
Debugging a program is slightly different from trying to understand the
nature of the world, no? I suppose that's the other major criterion I tend to
use when thinking about scientists.

~~~
jamii
Sure, I'm just comparing the methods. I think it's interesting that, as
software systems become more complex, scientific methods become more and more
useful.

------
baddox
Quadratic algorithms are useless because performance matters? That is a
textbook false dichotomy. Performance obviously matters, but the time
hierarchy theorems suggest that there is certainly legitimate computation that
_requires_ quadratic time. Max flow will need at least quadratic time.

~~~
mzl
There are lots of algorithms with very high time complexity that are still
useful in practice. One nice example is the decision algorithm for Presburger
arithmetic, which runs in O(2^(2^n)) time.

------
hsmyers
I wonder if one of the implications of this presentation is that use of 'O'
notation in design/algorithm selection is a form of cargo cultism?

~~~
mayank
Not really; Big-O has its place. I think his point in that area was that the
constant factors _do_ matter, and are glossed over (by definition) in Big-O.
Reminds me of DJB's book on high-speed crypto, where he talks about this
issue: <http://cr.yp.to/highspeed.html>
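
As a toy illustration of constants mattering (my example, not DJB's): an
O(N^2) insertion sort beats an O(N log N) mergesort on small inputs because
its constant factor is much smaller, which is why production sort routines
switch to insertion sort below a small cutoff.

    # Hedged sketch: time two sorts across sizes and watch the crossover.
    import random
    import timeit

    def insertion_sort(xs):
        xs = list(xs)
        for i in range(1, len(xs)):
            v, j = xs[i], i - 1
            while j >= 0 and xs[j] > v:
                xs[j + 1] = xs[j]
                j -= 1
            xs[j + 1] = v
        return xs

    def merge_sort(xs):
        if len(xs) <= 1:
            return list(xs)
        mid = len(xs) // 2
        left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    for n in (8, 32, 128, 512):
        xs = [random.random() for _ in range(n)]
        ti = timeit.timeit(lambda: insertion_sort(xs), number=100)
        tm = timeit.timeit(lambda: merge_sort(xs), number=100)
        print(f"N={n:4d}  insertion {ti:.4f}s  merge {tm:.4f}s")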

~~~
mattyb
Is that book available anywhere? I can't seem to find it.

~~~
pasbesoin
It's not the book, but go here and search the page for "speed":

<http://cr.yp.to/papers.html>

------
ssn
Nicely designed slides.

------
mariuskempe
"A lecture within a lecture", haha.

------
NY_USA_Hacker
Poor Sedgewick: He's looking for work! In the terms of Steve Blank, he is
looking for "product/market fit"! So far, he's a long way from finding it!

Computing is the greatest gift horse to civilization so far. Yet poor
Sedgewick wants to look at this horse, strain to find a flea on its back, rush
off to do a 'scientific' study of how to overcome fleas, and then scream that
he has saved hackers in cubicles struggling with performance problems and has
enabled the power of the gift horse for the benefit of civilization.

So he wants to "Save the world" and, thus, get praise, acceptance, approval,
status, prestige, popularity, admiring Princeton coeds, lecture invitations,
and book contracts! Heck, just the coeds should do! A lot of them have rich
fathers who actually have done well finding product/market fit!

Part of his evidence for the value of his 'science' is enrollment in his
'computer science' courses! He interprets the enrollments as high interest in
overcoming fleas instead of just using the horse!

For the science, he wants to use old Princeton heroes von Neumann, Ford, and
Fulkerson.

Alas, in his first step toward science, he talks about 'randomness' and falls
flat on his face in the mud. Instead, he needs to walk across campus and talk
to some people who actually know how to define and work with 'randomness';
there are such people, but I will omit names here!

Then he wants to claim to use 'von Neumann's science' to dig into details of
performance of Ford and Fulkerson graph problems and try to find one measure
of performance "to rule them all".

Then he discovers generating functions!

Where does he get that really strong funny stuff he's been smoking?

But, the rest of us can relax: So far Sedgewick at Princeton hasn't found
anything important in 'computer science' beyond some middle school course in
Java!

