
Rob Pike's Rules of Programming (1989) - tosh
http://users.ece.utexas.edu/~adnan/pike.html
======
jonahx
Torvalds' version of rule 5: “Bad programmers worry about the code. Good
programmers worry about data structures and their relationships.”

Brooks's version: "Show me your flowcharts and conceal your tables, and I
shall continue to be mystified. Show me your tables, and I won’t usually need
your flowcharts; they’ll be obvious."

~~~
analog31
I wrote in Pascal for many years. Trust me, I can design data structures that
will baffle the greatest scholars of the ages. ;-)

I found some of my old code a few months ago, and I'm still trying to figure
out what possessed me to write it.

~~~
kornish
Is that better than the alternative, though, which is _not_ designing data
structures?

Essential complexity [1] is still complexity. It has to exist somewhere; you
just get to choose how to move it around.

[1]:
[http://wiki.c2.com/?EssentialComplexity](http://wiki.c2.com/?EssentialComplexity)

------
whack
Rule 6: Every well-intentioned rule will be bastardized and used to justify
horrible code.

Foo: _"Hey, this 10000-element collection, which we do repeated lookups on...
why are we using a list and not a HashSet?"_

Bar: _"Because lists are so much simpler than a HashSet. Let's just KISS."_

Foo: _"But... doing repeated lookups on a 10k-sized list is so much slower
than just using a HashSet!"_

Bar: _"Oh really? Have you profiled the entire application and measured the
latency impact caused by this decision? Come back to me once you've done so,
and until then, we're not going to tune for speed."_
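For what it's worth, the asymptotic gap Foo is describing is easy to state in
code. A minimal TypeScript sketch (the names and data are mine, purely
illustrative), with JS's built-in Set standing in for a HashSet:

```typescript
// Membership test on an array is O(n) per lookup (a linear scan);
// on a Set it is O(1) on average (a hash lookup).
function containsArray(items: number[], target: number): boolean {
  return items.includes(target); // scans up to 10,000 elements
}

function containsSet(items: Set<number>, target: number): boolean {
  return items.has(target); // one hash probe
}

const data = Array.from({ length: 10_000 }, (_, i) => i);
const dataSet = new Set(data);

// Same answers either way; with repeated lookups the Set's constant-time
// probe is what keeps total cost from growing with collection size.
console.log(containsArray(data, 9_999)); // true
console.log(containsSet(dataSet, 9_999)); // true
```

Bar's retort still has teeth, though: for small or rarely-queried collections
the difference may never matter, which is exactly what measuring settles.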

~~~
winterlight
Ironically, in that situation, using a list might end up being more efficient
due to cache locality. Or not. That's why measuring is so important:
performance can be a very counterintuitive subject. Hard data should always
prevail over theory and guesswork.

~~~
anarazel
"Lists", presumably referencing a linked list, have horrible cache locality.
You were thinking of an array?

~~~
yen223
The rise of Python and similar languages has muddied the distinction between
"lists" and "arrays".

------
glangdale
Having spent a great deal of time - most of my career, really - doing
performance optimization, I find some of this rings true, but much of it
doesn't.

Rule 2 only applies if you can just decide that your current level of
performance is "pretty great" and then parachute away to another project.
Otherwise, if you find that, instead of 1 hot spot taking 80% of your time,
you have 5 warm spots each taking 16% of your time, you have 5x as much
thinking to do to fix your problems and get that potential 4x speedup
(assuming you could cut that 80% down by a factor of 16).
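The 4x figure above is just Amdahl's law; a quick sketch (the helper function
is mine):

```typescript
// Amdahl's law: overall speedup when a fraction p of total runtime
// is accelerated by a factor s.
function amdahl(p: number, s: number): number {
  return 1 / ((1 - p) + p / s);
}

// One hot spot taking 80% of the time, cut by a factor of 16:
console.log(amdahl(0.8, 16)); // ≈ 4, the potential 4x speedup

// Five warm spots of 16% each: fixing just one gains almost nothing,
// so you have to do all five fixes to reclaim the same 4x.
console.log(amdahl(0.16, 16)); // ≈ 1.18
```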

Rule 4 is an argument from bad implementation. It's harder to get fancy
algorithms right, but what does it mean to say an _algorithm_ is buggier than
some other algorithm? Are we seriously meant to think that there are some bugs
buried in Strassen's matrix multiply?

Rule 3 is dubious. It seems to imply that the performance cost of fancy
algorithms on low N matters, which assumes that we are actually executing the
algorithm quite a bit. Perhaps we've picked the wrong fancy algorithm? Maybe
if you have a godzillion tiny matrix multiplies you shouldn't be using
Strassen's but you might still get a performance win from doing something
'fancy' that's tuned for large numbers of small matrix multiplies...

As for Rule 5, this is probably the best of the lot. That's why I like to use
languages that allow me to describe my data structures in detail... even going
so far as to say that the contents of my user-built containers have _types_.
Ahem.

~~~
hasenj
Rule 5 is where dynamic languages usually utterly fail.

~~~
gavinpc
In twenty-five years as a working programmer, I have fallen in love with a
piece of technology four times: Emacs (on my third attempt), Tup (the build
system by Mike Shal), React... and now TypeScript. I've been using it full-
time for a few weeks now, and I can't imagine going back. If you like rule
5... you'll heart TypeScript.

 _Edit_: I also spent ~12 years writing C#, which is good. But TypeScript is
actually _more_ powerful in a way, because of the impossibility of typing
everything. This means it's likely to get things like variadic generics or
higher-kinded types long before C# does, since (1) in C# everything _has_ to
be typed down to the smallest particle (so it's much harder to add features),
and (2) changes to the CLR would be needed.

~~~
millstone
When you say "going back" - going back to what? I mean, is a language where
all array keys are strings really the best platform for higher-kinded types?

~~~
gavinpc
Going back to untyped JS, of course. It's not about having "the best" platform
for types; that's just the point. TypeScript's system is incomplete, but it's
plenty good enough to _dramatically_ improve the experience. The tooling is
now much more powerful. The type contracts also provide a whole new channel of
formal communication between team members, not to mention unknown users if
you're developing libraries.

And it's just _way more fun_. It's a rewarding challenge to write the most
expressive types that you can for the problem you're solving. But you don't
_have_ to. If you want to start by writing data types and then fulfilling
them, you can. If you want to start by writing code and then typing it, you
can. It's brilliantly designed to fit with all kinds of idiomatic JavaScript.

The language server provides fast incremental compilation and integrates
easily with all major editors. Structural typing means you can define types
in-place and conform to interfaces that haven't been written yet. The type
system's features are impressive and growing, including generics (with
constraints and defaults), tagged unions, mapped types, guard conditions and
flow-control analysis.

It's not "the best" of both worlds, but it is both worlds. What other language
is as "gelatinous" as JavaScript [0] but still gives you state-of-the-art type
inference when you want it? Hack? [1] I don't know, but it's the same value
proposition.

[0]
[https://news.ycombinator.com/item?id=13942674](https://news.ycombinator.com/item?id=13942674)

[1] [http://hacklang.org/](http://hacklang.org/)

------
candiodari
It's just a cycle: procedural programming -> data oriented programming ->
object oriented programming -> functional programming

procedural programming -> 'I hate all these imperative sequences
("flowcharts"), it's all about the data anyway !'

data oriented programming -> 'I hate keeping all these data relationships in
sync, can't the data do that itself ?'

object oriented programming -> 'all this behavior is too complex, often does
things I don't intend, can't it be simpler ?'

functional programming -> "it's simple, but way too dense (like algebra),
can't we just write out what happens in sequences of imperative statements ?"

So we arrive at Wapper's "Battlestar" law of programming paradigms:

"All Of This Has Happened Before And All Of It Will Happen Again"

Object oriented programming is where really complex and large programs go. I
agree with Pike : my favorite is data-oriented programming. Read the data and
relationships, but only document them, don't encode the data schema directly.
But here's the kicker:

The issue with these Pike 5 rules is that while some things are simple (and
yes, people can screw up and make simple things complex), there are many
things that are complex and can't be made simpler without destroying the
functionality of the program. There is no way to make a CAD drawing program
that is anything remotely close to these rules. Making such a program simple
and predictable, while possible, also destroys its function.

~~~
eikenberry
>.. there are many things that are complex and can't be made simpler without
destroying the functionality of the program.

This is because we haven't figured out how to make them simple yet. I believe
all problems can be made simple once we come up with the proper system. Maybe
we don't have the correct abstractions yet, or just haven't figured out the
best data layout. We've only been collectively programming for a few decades
now; I think we have plenty of room left to advance the field. I'm not ready
to just throw in the towel and call it impossible.

~~~
goatlover
Have architects figured out how to make designing any sort of building or
bridge simple?

Maybe with powerful AI and Holodeck-like programs we can make it simple for
humans to tell the computer to do anything they want easily, but short of
that, I'm not seeing why every program would be simple to build.

However, the more powerful your tools, the more complex buildings and programs
you're able to create. Programming hasn't necessarily gotten easier with more
powerful abstractions, rather we create more complex programs now.

------
hyperpape
"Rule 1. You can't tell where a program is going to spend its time.
Bottlenecks occur in surprising places, so don't try to second guess and put
in a speed hack until you've proven that's where the bottleneck is."

Everything about reading this quote depends on what you think a "speed hack"
is. Without practical agreement on that, you'll get a lot of people arguing
past each other.

~~~
khedoros1
I'd say "a modification to the code designed to speed up that section of
code". Are there other reasonable definitions? I thought it was pretty clear
that the rule could be accurately paraphrased as "Don't optimize a section for
speed until you're sure that's where it's needed."

~~~
kccqzy
I'd say "hack" implies there's some sort of trade off, perhaps in readability
and portability or additional assumptions.

------
drostie
I think #5, which is probably going to be thought of as great wisdom for our
age, is secretly an empty tautology. That is, my amateurish work in schemas
and validating data structures and type systems has led me to think that there
is a somewhat hard-to-see but extremely-important bijection between data
structures and the control structures that consume them. (In many ways this is
theoretically a non-issue as there are Church and Scott encodings of those
data structures, but I think it also extends lists to for-loops and not just
folds.) Seen that way it's really just saying "the most important part of the
algorithm is figuring out the right basic control structures to build the
algorithm out of." Well, yeah, that's what programming _is_.

~~~
gue5t
How is a for-loop not a fold?

~~~
JBiserkov
A fold is _usually_ restricted to a combining function of 2 arguments -
accumulated-so-far and current item.

A for-loop's body has no such restriction: it can access any number of
neighboring elements if the algorithm so requires:

    a[i] = a[i-1] * a[i+1]

~~~
rjeli
A for loop is a fold over the local context and the index
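That's literally true in JavaScript/TypeScript: `reduce` already passes the
combining function the index and the whole array, so the neighbor-access
example above fits a fold (sketch mine):

```typescript
const a = [1, 2, 3, 4, 5];

// Build a new array where each interior element becomes the product of
// its neighbors; the edge elements are copied through unchanged.
const b = a.reduce<number[]>((acc, x, i, arr) => {
  acc.push(i > 0 && i < arr.length - 1 ? arr[i - 1] * arr[i + 1] : x);
  return acc;
}, []);

console.log(b); // [1, 3, 8, 15, 5]
```

One real difference from the in-place loop: the fold reads from the original
array, while `a[i] = a[i-1] * a[i+1]` executed in place would see the
already-updated value at `a[i-1]`.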

------
Walkman
Rule 5 is actually the same as Linus Torvalds said: "Bad programmers worry
about the code. Good programmers worry about data structures and their
relationships."

------
avinassh
> Pike's rules 1 and 2 restate Tony Hoare's famous maxim "Premature
> optimization is the root of all evil."

Looks like it's wrongly attributed to Hoare? That famous quote is by Knuth.
[http://wiki.c2.com/?PrematureOptimization](http://wiki.c2.com/?PrematureOptimization)

[https://en.wikiquote.org/wiki/Donald_Knuth](https://en.wikiquote.org/wiki/Donald_Knuth)

~~~
mpweiher
Full Knuth quote:

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we should not pass up our
opportunities in that critical 3%."

1. It is only ever about _small_ efficiencies.

2. And about small efficiencies in the 97% non-critical parts.

So:

1. It is _always_ valid to concern yourself with _large_ efficiencies.

2. In the critical 3%, it is also legitimate to worry about small
efficiencies.

For context, later in the same paper (Structured Programming with goto
statements):

“The conventional wisdom [..] calls for ignoring efficiency in the small; but
I believe this is simply an overreaction [..] In established engineering
disciplines a 12% improvement, easily obtained, is never considered marginal;
and I believe the same viewpoint should prevail in software engineering."

So his _actual_ message is pretty much the opposite of what is attributed to
him.

~~~
fiddlerwoaroof
The issue though is that optimizing for performance is, in most cases,
optimizing for the wrong thing if it makes your code harder to understand,
maintain and test.

------
punnerud
Approximately what was said in the book Rework: if we added 1000 more
professors and expanded to more cities, Harvard/MIT/Stanford would be a better
school. Everyone laughs at this, so why do we still think it is a good thing
for a company? Start small, fix things as you go, and keep a customer focus
(not a growth focus, in both company size and software complexity).

~~~
mac01021
I can't speak for Harvard/MIT/Stanford, but the top-tier administrators at my
state's flagship university would probably _love_ to be able to engage in that
kind of expansion.

------
saagarjha
> If you've chosen the right data structures and organized things well, the
> algorithms will almost always be self-evident.

Therein lies the rub: it’s hard to choose data structures.

~~~
BatFastard
It's easy to choose data structures; it's hard to choose a good, flexible data
structure.

------
diedyesterday
Programming-language 'sages' tend to blurt out these generalizations, although
they are based on insights which are only PARTIALLY determinative.

Many advanced ('fancy', if you will) algorithms are based on mathematical
proofs and relations, which are seldom obvious. Some of them took hundreds of
years to be proven and become usable as theorems.

If all we needed were 'obvious' algorithms, we would hardly have solved any
real-world, non-rudimentary problems with computation.

------
dang
I put 2007 above because it's the earliest year at
[https://web.archive.org/web/20070210233739/http://users.ece....](https://web.archive.org/web/20070210233739/http://users.ece.utexas.edu/~adnan/pike.html),
but if someone has a better date for this we can change it.

~~~
Pete_D
[https://www.lysator.liu.se/c/pikestyle.html](https://www.lysator.liu.se/c/pikestyle.html)
includes the rules under the header "Complexity", and is dated 1989.

~~~
dang
Wow, well done. Edited above.

------
adamnemecek
"Data structures, not algorithms, are central to programming."

This seems to follow from the Curry-Howard isomorphism.

------
lngnmn
> _Data dominates. If you've chosen the right data structures and organized
> things well, the algorithms will almost always be self-evident. Data
> structures, not algorithms, are central to programming._

This is an especially beautiful insight given that it has a parallel with
molecular biology: proteins and other molecular structures dominate. Enzymes
are made of "standard" proteins that transform particular molecular
structures.

This, BTW, is also related to the usually overlooked "code is data" principle.

To be a good programmer one has to know molecular biology 101. It seems like
the good guys (John McCarthy, the MIT wizards, the Bell Labs and Erlang folks)
did.

------
maxpert
> If you've chosen the right data structures and organized things well, the
> algorithms will almost always be self-evident. Data structures, not
> algorithms, are central to programming.

OMG, this is so, so true! I have seen people write complicated/fancy
algorithms to do simple stuff, and 2 years later they don't even know how they
did it!

------
geocar
That last rule is pretty important; "Everything is a hash table" is the
antithesis of it.
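A minimal sketch of the contrast (the names are mine): the hash-table style
says nothing about the data, while a declared structure both documents and
enforces it.

```typescript
// "Everything is a hash table": a shapeless bag of values.
const userBag: Record<string, unknown> = { name: "Ada", logins: 3 };

// Rule 5 style: the shape of the data is stated up front.
interface User {
  name: string;
  logins: number;
}
const user: User = { name: "Ada", logins: 3 };

// user.logins is a number by construction; userBag["logins"] could hold
// anything, and every reader of the code has to go find out.
console.log(typeof user.logins); // "number"
```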

------
threatofrain
The sum of these rules seems to imply that it's difficult to theoretically
model programming, so you should always wait for your program to be fully
written so you can perform some brute-force empiricism, and only then think
about performance.

~~~
eternalban
It is not true that you can't "theoretically" (read: based on formal
reasoning) model program performance. You don't need to run a program to
determine that an O(n^2) approach will be slower than an O(log n) algorithm.
You don't need to run a program to determine the impact of optimizing a part
of your code; you can get a fairly good sense using Amdahl's law. Etc.

Like his creation, the Go language, Rob Pike's rules are somewhat
condescending and patronizing toward those he deems lesser programmers.

Rule 3, however, is the exception. That is the take away gem based on
experience.

~~~
inimino
> You don't need to run a program to determine that an O(n^2) approach will be
> slower than an O(log n) algorithm.

That depends on N and on the constant factors. Which means testing still might
surprise you.

More to the point, if the O(n^2) algorithm is simpler and leaves you more time
for profiling and fixing actual performance issues after the code works, you
may end up with faster code by doing the dumb thing.
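A toy illustration of the constant-factor point (the cost models are invented
numbers, not measurements):

```typescript
// Hypothetical cost models: a simple quadratic algorithm with a tiny
// constant vs. a fancy logarithmic one with heavy per-call overhead.
const quadraticCost = (n: number) => 2 * n * n;
const logarithmicCost = (n: number) => 5000 * Math.log2(n);

// At small n the "worse" algorithm wins; asymptotics only take over later.
console.log(quadraticCost(50) < logarithmicCost(50));   // true  (5,000 vs ≈28,219)
console.log(quadraticCost(500) < logarithmicCost(500)); // false (500,000 vs ≈44,829)
```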

------
dvirsky
Rule 3 is even more important today, with modern CPU caches. Running a simple
O(n) scan over a (small-ish) bunch of integers (and often strings) will in
some cases be significantly faster than storing them in some map, or even than
a binary search.

------
alexnewman
Honestly the more I learn about Pike the less I have respect for him. He
surely has coded with some of the best but I wonder where he has led us. I
appreciate the goals of go but see everything I worry about doubling up: ```
\- Everything wrong about c, for instance polymorphism in go is... \- Not
understanding ownership leads to a lack of design in terms of ownership \-
without a better sense of types people's thinking is not explicit and careful
enough ```

------
maxxxxx
These are very useful and practical. Especially 5, which can also be really
hard to achieve.

------
fergazen
These are not "Rules of Programming" but excellent "Guidelines for
Performance". Having 30 years under my belt, I have come to all the same
conclusions as Rob Pike over the years, so these are very deserving of some
serious contemplation by those with < 10 years of experience.

In my current job we have lots of large and often 'sparsely populated'
objects, which means you can basically never count on any of the properties
you were hoping for actually being present. This makes the data objects
essentially useless (at least for knowing what data you have to work with as
you are writing code), and they violate Pike's point about data-structure
design being critically important. In the modern world, the younger generation
of developers thinks "functional programming is great, and OOP (inheritance,
etc.) is obsolete", and only after 20 years of coding do they look back and
eventually realize OOP had it right all along.

In architectures with massive numbers of sparsely populated objects only the
guy who originally wrote any given function will be able to understand it, and
everyone else who tries to work in the code is pretty much screwed.
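In type-system terms (a hypothetical sketch, not the commenter's actual code),
a "sparsely populated" object is one where every field is optional, so the
type guarantees nothing and every access needs a guard:

```typescript
// Every property optional: the type tells you almost nothing.
interface SparseOrder {
  id?: string;
  total?: number;
  customer?: { email?: string };
}

// Every access must defend against absence, and readers of the code
// can never tell which fields a given caller actually supplies.
function customerEmail(order: SparseOrder): string {
  return order.customer?.email ?? "unknown customer";
}

console.log(customerEmail({ total: 5 })); // "unknown customer"
console.log(customerEmail({ customer: { email: "a@b.co" } })); // "a@b.co"
```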

~~~
cutler
".. in the modern world", "... younger generation". Maybe you're just getting
old? If "OOP had it right all along" there wouldn't be a revival of interest
in FP. OOP encourages mutable state, and the bigger your programme, the harder
it is to keep track of it. Rampant mutable state also makes concurrency
difficult, and concurrency is more important now than it used to be, due to
multi-core hardware.

------
johansch
"Rule 5. Data dominates. If you've chosen the right data structures and
organized things well, the algorithms will almost always be self-evident."

This should have been #1.

Really. In all of my 20 years of designing and writing software products, I
have learned that pretty much all of the work depends on the data model/the
data structures.

(And also: once you have understood the data structures of a program, you
kinda understand the program too.)

It is by far the number one mistake for juniors to make: focusing on the code
rather than on the data structures.

~~~
amelius
I've found that even as a user it is very useful to have a mental image of
what the "data model" looks like, abstractly. This is more important than
knowing about all the features of an application.

I'm thinking user documentation should start with a quick explanation of the
data model, and from there describe the operations that can be performed on
it.

~~~
johansch
Yeah, the data structures define the program. Both downwards towards the
developer and upwards towards the user.

~~~
evincarofautumn
This reminds me of a quote about Blender, which I find has a remarkably solid
architecture based on the idea of interactively editing a sort of “scene
database”:

“Although some jokingly called Blender a ‘struct visualizer’, the name is
actually quite accurate. During the first months of Blender's development, we
did little but devise structures and write include files. During the years
that followed, mostly tools and visualization methods were worked out.”

And the architecture of a compiler I’ve been working on has gotten
considerably simpler and more solid as I’ve figured out the right data
structures—with the right data model, especially if it’s enforced with types,
the code practically writes itself. _Finding_ that model is the hard part.

------
CoreXtreme
As a functional programmer, these rules sound like a joke to me. I'll only
partially agree with the last one.

