
Rob Pike's Rules of Programming (1989) - gjvc
http://users.ece.utexas.edu/~adnan/pike.html
======
dkarl
> Rule 1. You can't tell where a program is going to spend its time.
> Bottlenecks occur in surprising places, so don't try to second guess and put
> in a speed hack until you've proven that's where the bottleneck is.

I wish people would follow this rule and just let stuff work. I recently
encountered the most extreme version of this I've ever seen in my career: a
design review where a guy proposed a Redis caching layer _and_ a complex
custom lookup scheme for a <1GB, moderate read volume, super low write volume
MySQL database. And of course he wants to put the bulk of the data in JSON
fields and manage any schema evolution in our application code.

Can't we just let stuff work? I'm no fan of MySQL, but can't we admit that a
ubiquitous and battle-tested piece of technology, applied to a canonical use
case, on tiny data under near-ideal circumstances, is probably going to work
just fine? At least give it a chance before you spend days designing and
documenting a bunch of fancy tricks to save MySQL from being crushed under a
few megabytes of data.

~~~
maxk42
This is particularly exasperating for me. I can't tell you how many times in
my professional career I've ended up speeding up systems by removing two or
three layers of improperly-implemented "caching" and using good ol' MySQL and
a basic understanding of algorithmic time complexity to simplify things.

~~~
flukus
Me too. I've seen a few systems where a simple request requiring 10,000
queries was "optimized" into requiring 10,000 cache lookups, when they should
have just added some joins. The bottleneck is the network latency, not the
database. The worst I've seen is an NHibernate cache stored in a session
variable: half the database was being serialized/deserialized on every HTTP
request. Fortunately that was a small database.

Even with in-memory caches I've seen systems grind to a halt through death by
a thousand cuts: dictionary-based entity-attribute systems where each attribute
is looked up individually. There seems to be a mentality that a constant-time
lookup == a free lookup, and devs don't seem to realize that constant *
$bigNumber == $biggerNumber. Caching shouldn't be that granular.

Obligatory latency numbers every programmer should know:
[https://gist.github.com/jboner/2841832](https://gist.github.com/jboner/2841832)

~~~
noisy_boy
Having worked with Hibernate recently on a smallish database (that isn't
expected to grow too big), my takeaway is to minimize network trips. If you
have to check something for a list/set of things, and all of that can be done
from, say, two not-so-big tables, fetch the superset of records via a join,
with as much of the criteria pushed into the predicates as efficiently as
possible, and handle the rest of the logic in application code. It is so much
quicker. Of course, like most things in software, conditions apply: network
latency, the total number of records being fetched by the query, the potential
for the data to suddenly grow, etc.
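
A rough sketch of that pattern in Go (using database/sql, with made-up table
and column names; not anything from the thread): one joined query pulls the
superset in a single round trip, and the finer-grained filtering happens in
application code.

    package store

    import "database/sql"

    // Order pairs an order with its customer's country, fetched in one round trip.
    type Order struct {
        ID      int64
        Status  string
        Country string
    }

    // ordersForCustomersSince fetches the superset of candidate rows with a single
    // join, pushing the cheap criteria into SQL predicates ("?" placeholders are
    // MySQL-style). The finer-grained filtering then happens in application code.
    func ordersForCustomersSince(db *sql.DB, since string) ([]Order, error) {
        rows, err := db.Query(`
            SELECT o.id, o.status, c.country
            FROM orders o
            JOIN customers c ON c.id = o.customer_id
            WHERE o.created_at >= ?`, since)
        if err != nil {
            return nil, err
        }
        defer rows.Close()

        var out []Order
        for rows.Next() {
            var o Order
            if err := rows.Scan(&o.ID, &o.Status, &o.Country); err != nil {
                return nil, err
            }
            // "Handle the rest of the logic in application code."
            if o.Status != "cancelled" {
                out = append(out, o)
            }
        }
        return out, rows.Err()
    }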

~~~
bernawil
Not to cache simple trips to the database is pretty uncontroversial, the are
more controversial cases elsewhere.

Say you consume a RESTful API to hydrate order data in another system. Do you
fetch orders as:

    
    
      /orders?ids=id-1,id-2,id-3
    

or separate calls to

    
    
      /orders/id-x
    

which can be cached and retrieved by id as a memoized function?

Well, if you had to pick only one, the second is probably better. But the best
would be to abstract order fetching away in application code so it always
fetches single orders, and behind the scenes look up the cache for singles and
pool all the misses into a single request to the plural endpoint.
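
A minimal Go sketch of that design, assuming a plural /orders?ids=... endpoint
and an Order type invented for illustration: singles are served from the cache
and all misses are pooled into one batch request.

    package orders

    import (
        "fmt"
        "strings"
    )

    // Order is a stand-in for whatever the plural endpoint returns.
    type Order struct{ ID string }

    // fetchBatch stands in for one HTTP call to the plural endpoint,
    // /orders?ids=id-1,id-2,...; a real version would make the request.
    var fetchBatch = func(ids []string) (map[string]Order, error) {
        _ = fmt.Sprintf("/orders?ids=%s", strings.Join(ids, ","))
        return map[string]Order{}, nil
    }

    // Client exposes single-order semantics but pools cache misses into one call.
    type Client struct {
        cache map[string]Order
    }

    func NewClient() *Client { return &Client{cache: map[string]Order{}} }

    // GetOrders serves hits from the cache and fetches all misses with a single
    // request to the plural endpoint, caching whatever comes back.
    func (c *Client) GetOrders(ids []string) (map[string]Order, error) {
        result := make(map[string]Order, len(ids))
        var misses []string
        for _, id := range ids {
            if o, ok := c.cache[id]; ok {
                result[id] = o
            } else {
                misses = append(misses, id)
            }
        }
        if len(misses) == 0 {
            return result, nil
        }
        fetched, err := fetchBatch(misses)
        if err != nil {
            return nil, err
        }
        for id, o := range fetched {
            c.cache[id] = o
            result[id] = o
        }
        return result, nil
    }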

~~~
flukus
If this is a common occurrence the former would almost always be better; the
second would be absolutely glacial.

> but the best would be abstracting away order fetching in application code to
> always fetch single orders and behind the scenes looking up the cache for
> singles and pooling all the misses into a single request to the plural
> endpoint.

Unless a significant number of requests have 100% cache hits, I doubt a local
cache will make much of a difference at all; all it's saving is a bit of
bandwidth.

------
andrewl
In _The Mythical Man Month_ Fred Brooks said "Show me your flowchart and
conceal your tables, and I shall continue to be mystified. Show me your
tables, and I won't usually need your flowchart; it'll be obvious."

I first read that on Guy Steele's site:
[http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html](http://www.dreamsongs.com/ObjectsHaveNotFailedNarr.html)

~~~
vishnugupta
> Show me your tables, and I won't usually need your flowchart

A couple of years ago I spent quite some time trying to evaluate the tech
stack (and general engineering culture) of merger/acquisition targets of my
employer. It was quite a fun exercise, all said and done. I encountered all
sorts: from a small startup team who had their tech more or less sorted out, to
a largish organisation that relied on IBM's ESB, which exactly one person on
their team knew how to work!

I discovered this exact method during the third tech evaluation exercise. When
the team began explaining the various modules top-down, the user flows, etc., I
politely interrupted them and asked for the DB schema. It was just on a whim,
because I was bored of the typical one-way session interrupted by me asking
minor questions. Once I had the hang of their schema, the rest of the session
was literally me telling them what their control and user flows were and them
validating it.

Since then it's become my magic wand to understand a new company or team. Just
go directly to the schema and work backwards.

Conversely, I've begun paying more attention to data modelling, because once a
data model is fixed it's very hard to change, and once enough data accumulates
the inertia only increases. Instead of changing the data model (for fear of
data migration etc.), the tendency is to beat the use cases into fitting the
data model. It's not your usual fail-fast-and-iterate thing.

~~~
noisy_boy
I have learned to spend a good chunk of my effort and focus on the data model -
it is literally the heart of the application. Once that is done correctly, I've
seen that the code almost falls into place by itself.

------
lliamander
Rule 5 seems to mirror one of my favorite insights from Alexander Stepanov:

> In 1976, still back in the USSR, I got a very serious case of food poisoning
> from eating raw fish. While in the hospital, in the state of delirium, I
> suddenly realized that the ability to add numbers in parallel depends on the
> fact that addition is associative. (So, putting it simply, STL is the result
> of a bacterial infection.) In other words, I realized that a parallel
> reduction algorithm is associated with a semigroup structure type. That is
> the fundamental point: algorithms are defined on algebraic structures.

This is also exemplified in the analytics infrastructure used at Stripe:
[https://www.infoq.com/presentations/abstract-algebra-
analyti...](https://www.infoq.com/presentations/abstract-algebra-analytics/)
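
A minimal Go sketch of the idea (nothing Stepanov wrote, just an illustration):
because integer addition is associative, the slice can be cut into chunks, each
chunk summed concurrently, and the partial sums combined without changing the
result.

    package reduce

    import "sync"

    // ParallelSum splits xs into chunks, sums each chunk concurrently, and then
    // combines the partial sums. The regrouping is only harmless because integer
    // addition is associative; that is the semigroup structure doing the work.
    func ParallelSum(xs []int, workers int) int {
        if workers < 1 {
            workers = 1
        }
        partial := make([]int, workers)
        chunk := (len(xs) + workers - 1) / workers
        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            lo, hi := w*chunk, (w+1)*chunk
            if lo > len(xs) {
                lo = len(xs)
            }
            if hi > len(xs) {
                hi = len(xs)
            }
            wg.Add(1)
            go func(w, lo, hi int) {
                defer wg.Done()
                for _, x := range xs[lo:hi] {
                    partial[w] += x // each worker owns one slot; no races
                }
            }(w, lo, hi)
        }
        wg.Wait()
        total := 0
        for _, p := range partial {
            total += p
        }
        return total
    }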

~~~
skybrian
But adding floating point numbers _isn't_ associative, in general. Sometimes
you need to do it the right way to avoid catastrophic cancellation.

I guess the key is to know how to deal with things that are only mostly true.
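
One common way of "doing it the right way" is compensated (Kahan) summation;
that choice is my example, not something the parent or Pike names. A minimal Go
sketch:

    package fsum

    // KahanSum adds floats left to right but carries a compensation term, which
    // recovers much of the low-order error a naive sum silently discards.
    func KahanSum(xs []float64) float64 {
        var sum, c float64
        for _, x := range xs {
            y := x - c        // apply the error carried from the previous step
            t := sum + y      // low-order digits of y can be lost here...
            c = (t - sum) - y // ...and are captured back into c
            sum = t
        }
        return sum
    }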

~~~
Koshkin
That’s why in C++ we have traits and overloading.

~~~
rumanator
Could you explain where you see traits and overloading helping with floating
point operations?

------
karl11
Everyone building no code tools is learning or will learn that the problem
most businesses have is not a lack of coding skill, or the inability to build
the algorithm, but rather how to structure and model data in a sensible way in
the first place.

~~~
throwaway894345
Modeling the data and structuring the program are indeed the harder tasks, but
orgs have lots of smart people who have those skills but not the familiarity
with various existing syntaxes and standard libraries and so on that a
programmer learns over the decades of their career. Further, those same orgs
probably have many people with experience in the latter but without any
special ability to think abstractly. This significantly limits the ability to
create tools. Further, the no code tools often abstract at a more appropriate
level than general purpose programming languages’ standard libraries because
these tools aren’t trying to be general purpose (at least not to the same
degree as general purpose programming languages). Lastly, I’ve seen business
people use certain no code tools to build internal solutions quickly that
would have taken a programmer considerable (but not crazy) time to crank out,
especially considering things like CI/CD pipelines, etc. Nocode won’t replace
Python, but it serves a valuable niche.

------
commandlinefan
> Tony Hoare's famous maxim "Premature optimization is the root of all evil."

Actually that was Donald Knuth - it's an urban legend that it's an urban
legend that it was originally Knuth. Hoare was quoting Knuth, but Knuth forgot
he said it, and re-mis-attributed the quote to Hoare.

~~~
karmakaze
And it is usually quoted out of its context.

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we should not pass up our
opportunities in that critical 3%."

~~~
eps
It's also often interpreted literally.

Premature _complex_ optimization is a bad idea, but simple (read, cheap to
code) optimization for common bottleneck patterns is a perfectly reasonable
thing to do.

~~~
FridgeSeal
It's also (ab)used far too often to justify performing no optimisation at all.

~~~
userbinator
...or even _pessimisation_.

~~~
NohatCoder
That is a great word.

------
OliverJones
In long-lived systems (systems that run for many years) it's almost impossible
to choose the "right data structures" for the ages. The sources and uses of
your data will not last nearly as long as the data itself.

What to do about this? Two things:

STORE YOUR TIMESTAMPS IN UTC. NOT US Pacific or any other local timezone. If
you start out with the wrong timezone you'll never be able to fix it. And
generations of programmers will curse your name.

Keep your data structures simple enough to adapt to the future. Written
another way: respect the programmers who have to use your data when you're not
around to explain it.

And, a rule that's like the third law of thermodynamics. You can never know
when you're designing data how long it will last. Written another way: your
kludges will come back to bite you in the xxx.

~~~
aserafini
Sometimes storing in UTC is simply not correct. For example, a shop opening
time. The shop opens at 10am local time, whether DST is in effect or not. Their
opening time is 10am local time all year, but their UTC opening time actually
changes depending on the time of year!

~~~
wtetzner
But a shop opening time is not a timestamp, so I think the original advice is
still good. A timestamp is the time at which some event happened, which is
different than a date/time used for specifying a schedule.

For example, if you wanted to track the history of when the shop actually
opened, it would make sense to store a UTC timestamp.
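
A small Go sketch of the distinction, with types invented purely for
illustration: the event is a UTC instant, while the schedule is a local
wall-clock rule that only resolves to a UTC instant once you pick a date and
zone.

    package shop

    import "time"

    // OpeningEvent records when the shop actually opened: a point in time,
    // stored in UTC.
    type OpeningEvent struct {
        OpenedAt time.Time // always kept in UTC
    }

    // OpeningRule records the schedule: "10:00 local time", which maps to a
    // different UTC instant depending on the date (DST) and the shop's zone.
    type OpeningRule struct {
        Hour, Minute int
        Zone         *time.Location // e.g. from time.LoadLocation("Europe/Rome")
    }

    // OpensAtUTC resolves the local rule to a concrete UTC instant for one date.
    func (r OpeningRule) OpensAtUTC(year int, month time.Month, day int) time.Time {
        local := time.Date(year, month, day, r.Hour, r.Minute, 0, 0, r.Zone)
        return local.UTC()
    }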

~~~
TheCoelacanth
> A timestamp is the time at which some event happened, which is different
> than a date/time used for specifying a schedule.

Correct, but that makes this a rule with much more limited applications than
many people are going to interpret it as.

------
combatentropy
A quote that combines the rules about optimization with the rules about data
structures: "NoSQL databases that only have weak consistency are enforcing a
broadly applied premature optimization on the entire system." \--- Alexander
Lloyd, Google

Also in support of Rule 5, see Eric Raymond's treatment of Data-Driven
Programming:

"Even the simplest procedural logic is hard for humans to verify, but quite
complex data structures are fairly easy to model and reason about. To see
this, compare the expressiveness and explanatory power of a diagram of (say) a
fifty-node pointer tree with a flowchart of a fifty-line program. Or, compare
an array initializer expressing a conversion table with an equivalent switch
statement. The difference in transparency and clarity is dramatic. See Rob
Pike's Rule 5.

"Data is more tractable than program logic. It follows that where you see a
choice between complexity in data structures and complexity in code, choose
the former. More: in evolving a design, you should actively seek ways to shift
complexity from code to data."

[http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id...](http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id2878263)

[http://www.catb.org/~esr/writings/taoup/html/generationchapt...](http://www.catb.org/~esr/writings/taoup/html/generationchapter.html)
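
The array-initializer-vs-switch point is easy to sketch in Go (the escape
mapping here is just an invented example): the interesting content lives in the
table, and the code driven by it stays trivial.

    package escape

    // escapes is a conversion table: the interesting content is data, not code.
    var escapes = map[rune]string{
        '\n': `\n`,
        '\t': `\t`,
        '"':  `\"`,
        '\\': `\\`,
    }

    // Escape is the "stupid code" driven by the table above; compare with a
    // switch statement that restates every case as control flow.
    func Escape(r rune) string {
        if s, ok := escapes[r]; ok {
            return s
        }
        return string(r)
    }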

------
sethammons
A quote from one of our founders that I've always liked:

If you make an optimization that was not at a bottleneck, you did not make an
optimization.

~~~
renewiltord
Read _The Goal_ by Eliyahu Goldratt. While it's possible your founder came
upon the idea independently, this is one of many ideas repeated in that book.
It's relatively short and entertaining to read and has definitely survived the
36 years since its first publication quite well.

~~~
mplewis
This was adapted into a novel about IT/devops called The Phoenix Project. It's
an excellent read.

~~~
renewiltord
I second this. Quite the entertaining read, honestly. I also enjoyed the
completely unnecessary transformation of the security dork into a security
ubermensch.

~~~
b0afc375b5
I've read The Goal and The Phoenix Project. While I did enjoy the stories, I'm
uncertain, perhaps due to inexperience, what the main lesson/s are supposed to
be.

Anyone want to share their main takeaways from these books?

~~~
sethammons
Reduce feedback cycles

------
munificent
_> Rule 1. You can't tell where a program is going to spend its time.
Bottlenecks occur in surprising places, so don't try to second guess and put
in a speed hack until you've proven that's where the bottleneck is._

I agree with the general thrust of this. But it's worth pointing out that
often the easiest way to prove where a bottleneck is (or at least isn't) is to
try an optimization and see if it helps. I like profiling tools immensely, and
this kind of trial-and-error optimization doesn't scale well to widespread
performance problems. But there's something to be said for doing a couple of
quick optimizations as tracer bullets to see if you get lucky and find the
problem before bringing in the big guns.

The last three rules bug me. I wish we had a name for aphorisms that perfectly
encapsulate an idea _once you already have the wisdom to understand it_ , but
that don't actually _teach_ anything. They may help you remember a concept—a
sort of aphoristic mnemonic—but don't _illuminate_ it. The problem with these
is that espousing them is more a way of bragging ("look how smart I am for
understanding this!") than really helping others.

For example:

 _> Rule 5. Data dominates. If you've chosen the right data structures and
organized things well, the algorithms will almost always be self-evident. Data
structures, not algorithms, are central to programming. _

OK, well what are the "right" data structures? The answer is "the ones that
let you perform the operations you need to perform easily or efficiently". So
you still need to know what the code is doing too. And the algorithms are only
"self-evident" because you chose data structures expressly to give you the
luxury of using simple algorithms.

~~~
wtetzner
I think the point is more to direct how you write/refactor code: focus more on
the data structures than on the code. E.g. if you're struggling to write the
code, then maybe you need to take a step back and reconsider your data
structures.

------
dogbox
> Rule 5. Data dominates. If you've chosen the right data structures and
> organized things well, the algorithms will almost always be self-evident.
> Data structures, not algorithms, are central to programming.

Is "data structures" the correct term here? Assuming I'm not misinterpreting,
the usage of "data structures" can be misleading - one usually thinks of
things like BST's and hash tables, which are inherently tied to algorithms. I
feel like "data modeling" better captures the intended meaning here.

~~~
hellofunk
A custom type is also a data structure and that is usually what I think quotes
like these refer to.

------
glangdale
These rules are good as far as they go. However, they are notably silent on
what happens when they stop working.

If you're a celebrity computer scientist and a problem has been 'squashed' to
the point where there aren't any more 80-99% "hot spots" I guess you can just
parachute out of there and on to a more interesting problem.

However, if you're paid to work on a particular thing, sooner or later you
will fire up a profiler and say "oh, great, I don't have a hot spot anymore,
just a bunch of 10-30% 'warm spots'". At that point you need to attack problems
that aren't traditional bottlenecks if you still want to get speedups.

Moreover, the things that you learn from repeatedly attacking those 10-30%
'warm spots' might be fundamentally different from the learning you get from,
I dunno, taking some O(N^2) monstrosity out of the "99% of the profile" and
Declaring Victory.

------
jorangreef
Rules 1 and 2 depend on context: whether you're working on an existing program
or a new program. They can be true or false. They can really help or they can
really hurt. Are you going into an existing system to do performance
optimization? Sure, don't guess, measure. Are you designing a new system?
Throw out those parroted premature optimization mantras... you are responsible
for designing for performance upfront. You will always measure but depending
on context you will design for speed first and then test your prototype with
measurements. There's no way around an initial hypothesis when you're
designing new systems. You have to start somewhere. That's where Jeff Dean's
rule always to do back of the envelope guesses will pay off in orders of
magnitude, many times over.

Rules 3 and 4 are gold and always true.

Rule 5 is the key to good design.

------
ChrisMarshallNY
I've always taken a very practical, results-oriented approach to software
development.

That makes sense, to me.

One of the first things that we learned, when optimizing our code, was to _use
a profiler_.

Bottlenecks could be in very strange places, like breaking the L2 cache. That
would happen when the data was just a bit too big, or a method was called,
forcing a stack frame update.

We wouldn't see this kind of thing until we looked at a profiler; sometimes, a
rather advanced one, provided by Intel.

------
Ozzie_osman
It turns out rule 5 (Data dominates. If you've chosen the right data
structures and organized things well, the algorithms will almost always be
self-evident) is both true and hard.

Eric Evans' Domain Driven Design is a good book on the topic.

~~~
Supermancho
> If you've chosen the right data structures and organized things well, the
> algorithms will almost always be self-evident) is both true and hard.

The problem is not the self-evident algorithm, but the delicate implementation
(or god forbid, at scale).

Take in 1000 web requests per second. The data is all strictly validated and
has about 60 fields per record/request, plus dealing with errors.

How does that go from webserver to ( _rolls dice_ ) Kafka to a ( _rolls dice_
) Cassandra that can be queried accurately and timely? How much does that
cost?

Oh, that's not a programmer problem. Except it is. Creating a fantasy niche of
describing problems as data vs algorithm is the canonical ivory tower
disconnect.

~~~
veets
It seems you are arguing something different, although I am having a hard time
understanding what you have written. I think you are saying algorithms and
data structures aren't hard, distributed systems are hard. In my experience
choosing the correct data structures and algorithms in your
services/programs/whatever can dramatically simplify the design of your
systems overall.

~~~
Supermancho
> It seems you are arguing something different,

I'm making an argument that the stark reality of what's hard in software
development is not the simplistic "Rules of programming", which have limited
utility.

The reason the "rules" aren't self evident (or followed), is because we live
in the reality of disparate functionality paired with an ever-changing
technical landscape. You can't just make a DB KISS abstraction and expect it
to hold with all the different repository types like ( _rolls dice_ ) Athena
after using ( _rolls dice_ ) CockroachDB. There are concerns that are not
purely algorithmic vs data structure that are far more influential and
important to understand. Even knowing these details and cases, becomes less
useful as time goes on and new technologies emerge.

> I am having a hard time understanding what you have written

If you're not regularly (every year or two) interacting with new environments,
tooling, and problems, you don't encounter the real pain, which is far more
important to your career and your ability to produce functional software.
Reading almost every postmortem, the number of lines attributed to "we changed
the data structure to O" is dwarfed by "we learned that technology X does Y,
so we had to do Z".

This is only incidentally related to distributed systems, which is indicative
of a disconnect with the problem described. Of course when you sit around in
the same environment for a long time, you can observe and optimize on
structure and algorithm, but that's not getting you to market (you're already
there) and that's the nature of maintenance...not just being a fire
extinguisher who is on call.

~~~
lllr_finger
> we live in the reality of disparate functionality paired with an ever-
> changing technical landscape

It is the responsibility of tech leaders to minimize this (accurate)
stereotype. Choose boring technology, and only build your own or choose
something exotic when it gives you competitive advantage - because the reality
I also see is that 99% of devs aren't working on anything new or unseen in the
field. Even at the FAMANG companies, most people I know are working on boring
problems.

So when your CTO or architect or whomever buys into the hype for X technology,
make a good argument against it by proposing a better solution.

------
bob1029
Modeling the problem domain is so important that I don't know why the first
year of every compsci undergrad program isn't entirely dedicated to teaching
the idea.

Instead, day 1 is installing python or java, running hello world and talking
about pointers, binary, encoding, logic gates, etc.

We should be teaching students on day 1 that code is a liability and to be
avoided whenever it is convenient to do so.

~~~
codemonkey-zeta
IMO Matthias Felleisen (building on the work of Sussman (IIRC) et al.) gets
first-year CS education right with How to Design Programs
([https://htdp.org/2020-5-6/Book/index.html](https://htdp.org/2020-5-6/Book/index.html)).
Literally step 1 in the design recipe presented in the preface is:

    
    
       "1. From Problem Analysis to Data Definitions
        Identify the information that must be represented and how  it is represented in the chosen programming language.
        Formulate data definitions and illustrate them with examples."
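
Loosely translated into Go terms (the catalog domain here is invented purely
for illustration), step 1 means writing down the data definition, with example
data, before any function bodies:

    package catalog

    // Data definition first: what information must be represented, and how?
    // A CatalogItem is one product we sell; Price is in cents to avoid float money.
    type CatalogItem struct {
        SKU   string
        Name  string
        Price int // cents
    }

    // Examples of the data, as the recipe asks for; they double as early test cases.
    var exampleItems = []CatalogItem{
        {SKU: "A-100", Name: "M3 hex bolt", Price: 12},
        {SKU: "B-200", Name: "M3 nylon washer", Price: 3},
    }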

------
gridlockd
All this advice against "premature optimization" has created generations of
programmers that don't understand how to use hardware efficiently.

Here's the problem: If you profile software that is 100x slower than it needs
to be on every level, _there are no obvious bottlenecks_. Your whole program
is just slow across the board, because you used tons of allocations,
abstractions and indirections at every step of the way.

Rob Pike probably has never written a program where performance _really_
mattered, because if he did, he would've found that you need to think about
performance right from the beginning and all the way during development,
because making bad decisions early on can force you to rewrite almost
everything.

For instance, if you start writing a Go program with the mindset that you can
just heap-allocate all the time, the garbage collector will eventually come
back to bite you and the "bottleneck" will be your entire codebase.
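
A minimal sketch of the allocation pattern being described, assuming a
hypothetical hot encoding path: the first version allocates on every call and
feeds the garbage collector; the second reuses a caller-owned buffer.

    package hot

    // encodeAlloc allocates a fresh byte slice on every call; in a tight loop,
    // every one of those slices becomes garbage for the collector to reclaim.
    func encodeAlloc(values []int32) []byte {
        out := make([]byte, 0, len(values)*4)
        for _, v := range values {
            out = append(out, byte(v), byte(v>>8), byte(v>>16), byte(v>>24))
        }
        return out
    }

    // encodeInto appends into a caller-owned buffer (pass buf[:0] between calls),
    // so the steady-state hot loop performs no heap allocation at all.
    func encodeInto(values []int32, buf []byte) []byte {
        for _, v := range values {
            buf = append(buf, byte(v), byte(v>>8), byte(v>>16), byte(v>>24))
        }
        return buf
    }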

~~~
byronr
> Rob Pike probably has never written a program where performance really
> mattered

Rob Pike has written window system software which ran in what now would be
called a "thin client" over a 9600 baud modem and rendered graphics using a
2MHz CPU. He probably knows a thing or two about performance tuning.

~~~
gridlockd
By "rendered graphics" you mean "place characters on screen"? If the
bottleneck for that is a 9600 baud modem throughput, there's not a lot you
need to optimize, even on a 2MHZ CPU.

Also, having programmed more constrained systems decades ago doesn't magically
make you knowledgeable on performance on modern hardware with completely
different capabilities. In fact, it's probably what causes you to develop a
"computers are so fast now, no need to think about performance"-mindset,
because everything you want to do could be done in an arbitrarily inefficient
way on modern hardware. Performance doesn't matter to you anymore.

~~~
zimpenfish
> By "rendered graphics" you mean "place characters on screen"?

It was a fully graphical 800x1024 (or 1024x1024) system running on 1982
processors.

[https://en.wikipedia.org/wiki/Blit_(computer_terminal)](https://en.wikipedia.org/wiki/Blit_\(computer_terminal\))

> having programmed more constrained systems decades ago doesn't magically
> make you knowledgeable on performance

Perhaps not but it does mean you've "written a program where performance
really mattered" which I believe was the original claim?

~~~
gridlockd
> It was a fully graphical 800x1024 (or 1024x1024) system running on 1982
> processors.

I've looked into that. The Blit was monochrome, had an _8MHz_ processor, and a
relatively large 256KB framebuffer which could be directly written to. There
were only a handful of commands, mostly concerned with copying (blitting)
bitmaps around.

Rob Pike only wrote the first version of the graphics routines - the _slowest_
version, in C(!). It was rewritten another four times over, by Locanthi and
finally Reiser.

I don't think any credit should go to Pike for implementing the performance-
critical parts of that system.

[https://9p.io/cm/cs/doc/87/archtr.ps.gz](https://9p.io/cm/cs/doc/87/archtr.ps.gz)

> Perhaps not but it does mean you've "written a program where performance
> really mattered" which I believe was the original claim?

No, it doesn't mean that. You can be wasteful on constrained hardware as well,
performance doesn't necessarily matter even on the simplest chips, if what you
want to do doesn't need the full capabilities of the system.

However, I am specifically replying to the claim that "Rob Pike probably knows
a thing or two about performance". As you can see, Rob Pike handed off
performance-critical work to someone else. He probably _didn't_ know how to
write optimal code for that particular platform, but even if he did, most of
that knowledge wouldn't transfer over to modern systems.

At the very least, he didn't _care_ about optimizing that stuff, or he
wouldn't have handed it off. He would've _enjoyed_ optimizing that stuff. And
that's all completely fine, not every programmer needs to care about
performance. I just refuse to take advice from these people about performance
or "premature optimization", because it is _uninformed_.

~~~
byronr
A writeup of the Blit terminal's operating system is here; it consists of a
lot more than the bitblt primitive, with many whole-system performance
concerns at play:

[http://a.papnet.eu/UNIX/bltj/06771910.pdf](http://a.papnet.eu/UNIX/bltj/06771910.pdf)

Suggest you read this before denigrating Rob Pike's bona fides. Not sure what
axe you are trying to grind but it is ugly and unbecoming of a professional.

~~~
gridlockd
I'm not denigrating his bona fides, I'm questioning his credentials on
performance-oriented computing. For all I know, if Rob Pike had been a
performance freak, Blit might've never shipped. He may indeed have chosen all
the right trade-offs.

Nevertheless, the advice he gives on performance is wrong, plain and simple,
for the reason that I gave you: If you have overhead everywhere, there is no
bottleneck that you can observe - your software is just slower than it needs
to be across the board. If you write software without performance in mind from
the very beginning, you can never get all of it back by optimizing later -
without major rewrites that is.

How does one give wrong advice? By not having the required experience to give
correct advice. I don't care if you're Rob Pike, Dennis Ritchie or Donald
Knuth. If you're wrong, you're wrong.

------
rubyn00bie
> Rule 5. Data dominates. If you've chosen the right data structures and
> organized things well, the algorithms will almost always be self-evident.
> Data structures, not algorithms, are central to programming.

That one hits me in the feels, because I think a lot of folks (including
myself) focus on algorithms and code patterns before their data, and as a
result a lot of things end up being harder than they need to be. I've always
liked this quote from Torvalds on the subject, speaking on git's design (the
first line is for some context):

> … git actually has a simple design, with stable and reasonably well-
> documented data structures.

then continues:

> In fact, I'm a huge proponent of designing your code around the data, rather
> than the other way around, and I think it's one of the reasons git has been
> fairly successful […] I will, in fact, claim that the difference between a
> bad programmer and a good one is whether he considers his code or his data
> structures more important. Bad programmers worry about the code. Good
> programmers worry about data structures and their relationships.

When I have good data structures, most things just sort of fall into place. I
honestly can't think of a time where I've figuratively (or literally) said "my
data structure really whips the llama's ass" and then immediately said "it's
going to be horrible to use." On the contrary, I _have_ written code that is
both so beautiful and esoteric that its pedantry would be lauded for the ages--
had I only glanced over at my data model during my madness. No, instead, I
awaken to find I spent my time quite aptly digging a marvelous hole, filling
said hole with shit, and then hopping in hoping to not get shitty.

One thing that really has helped me make better data structures and models is
taking advanced courses on things like multivariate linear regression
analysis, specifically going over identifying things like multicollinearity
and heteroskedasticity. Statistical tools are incredibly powerful in this
field, even if you aren't doing statistical analysis every day. Making good
data models isn't necessarily easy, nor obvious, and I've watched a lot of
experienced folks make silly mistakes simply because they didn't want
something asinine like _two_ fields instead of one.

~~~
allover
The counter argument would be that git is the poster-child of poor UX, which
could be blamed on the fact that it exposes too much of its internal data
structure and general inner-workings to the user.

I.e. too much focus has been put on data structures and not enough on the rest
of the tool.

A less efficient data structure, but more focus on UX could have saved
millions of man hours by this point.

~~~
wtetzner
Or perhaps learning Git just requires a different approach: you understand the
model first, not the interface. Once you understand the model (which is quite
simple), the interface is easy.

~~~
rabidrat
People keep repeating this, but it's not true. The interface has so many "this
flag in this case" but "this other flag in that case" and "that command
doesn't support this flag like that" quirks. There's no composability or
orthogonality or suggestiveness. It's nonsensical and capricious and
unmemorable, even though I understand the "simple" underlying model and have
for years.

~~~
triangleman
Has anyone attempted to re-engineer a superior UX on top of the git data
structure? Would it even be possible?

~~~
codemonkey-zeta
Magit with emacs solves git's UX problem IMO. Discoverability is/was git's
real problem.

~~~
disgruntledphd2
This is true, but the trouble is that you need to know what git will do before
the magit commands and options make sense.

------
jonfw
"Write stupid code that uses smart objects"

That's a good one. It's amazing how much complexity can be created by using
the wrong abstractions.

~~~
chubot
FWIW I find this is especially important for compilers and interpreters.

It's not an exaggeration to say that such programs are basically big data
structures, full of compromises to accommodate the algorithms you need to run
on them.

For example, LLVM IR is just a big data structure. Lattner has been saying for
a while that a major design mistake in Clang was not having its own IR (in the
talks on the new MLIR project).

SSA is a data structure with some invariants that make a bunch of algorithms
easier to write (and I think it improves their computational complexity over
naive algorithms in several cases).

\----

In Oil I used a DSL to describe an elaborate data structure that describes all
of shell:

 _What is Zephyr ASDL?_
[http://www.oilshell.org/blog/2016/12/11.html](http://www.oilshell.org/blog/2016/12/11.html)

[https://www.oilshell.org/release/0.8.pre9/source-
code.wwz/fr...](https://www.oilshell.org/release/0.8.pre9/source-
code.wwz/frontend/syntax.asdl)

I added some nice properties that algebraic data types in some languages don't
have, e.g. variants are "first class", unlike in Rust.

Related: I noticed recently that Rust IDE support has a related DSL for its
data structure representation: [https://internals.rust-
lang.org/t/announcement-simple-produc...](https://internals.rust-
lang.org/t/announcement-simple-production-ready-rust-un-grammar/12827)

~~~
mamcx
> FWIW I find this is especially important for compilers and interpreters.

Totally. I'm building a relational language, and it's becoming very obvious
why RDBMSs don't fit certain purity ideals of the relational model (like all
relations being sets, not bags).

I'm stuck on deciding which structures to provide by default. I'm dancing
between flat vectors or ndarrays, or a split between flat vectors (columns)
and HashMaps/BTrees with n-values (this is my intuition now).

---

> I added some nice properties that algebraic data types in some languages
> don't have, e.g. variants are "first class" unlike in Rust.

This sounds cool; where can I learn more about this?

~~~
chubot
FWIW I found this post thought provoking in thinking about data models of
languages.

[https://news.ycombinator.com/item?id=13293290](https://news.ycombinator.com/item?id=13293290)

\---

About first class variants:

[https://lobste.rs/s/77nu3d/oil_s_parser_is_160x_200x_faster_...](https://lobste.rs/s/77nu3d/oil_s_parser_is_160x_200x_faster_than_it_was#c_gu4oaj)

[https://github.com/rust-lang/rfcs/pull/2593](https://github.com/rust-
lang/rfcs/pull/2593)

Another way I think of this is "types vs. tags":
[https://oilshell.zulipchat.com/#narrow/stream/208950-zephyr-...](https://oilshell.zulipchat.com/#narrow/stream/208950-zephyr-
asdl/topic/Better.20solution.3A.20tags.20vs.2E.20types) (Zulip, requires
login)

Basically variant types can stand alone, and have a unique tag. Tags are
discriminated at RUNTIME with "pattern matching".

But a variant can belong to multiple sum types, and that's checked statically.
This is modeled with multiple inheritance in OOP, but there's no
implementation inheritance. Related: [https://pling.jondgoodwin.com/post/when-
sum-types-inherit/](https://pling.jondgoodwin.com/post/when-sum-types-
inherit/)

So basically in the ASDL and C++ and Python type system I can model:

- a Token type is a leaf in an arithmetic expression

- a Token type is a leaf in a word expression

But it's not a leaf in, say, what goes in a[i], or dozens of other sum types.
Shell is a big composition of sublanguages, so this is very useful and
natural. Another construct that appears in multiple places is ${x}.

So having these invariants modeled by the type system is very useful, and
actually C++ and MyPy are surprisingly more expressive than Rust! (due to
multiple inheritance)

Search for %Token here, the syntax I made up for including a first class
variant into a sum type:

[https://www.oilshell.org/release/0.8.pre9/source-
code.wwz/fr...](https://www.oilshell.org/release/0.8.pre9/source-
code.wwz/frontend/syntax.asdl)

There is a name for the type, and a name for the tag (and multiple names for
the same integer tag). Tags (dynamic) and types (static) are decoupled.

------
okaleniuk
Just yesterday I removed an obsolete doxygen tag from some 300 files and my
Visual Studio went catatonic.

Well, I suspect it's not about Studio per se but rather the git integration,
but still. Someone avoided a "fancy algorithm" and wasted both my time and the
product's reputation with a "small n" workaround. Because git is about small
commits, right?

I'd like to restate the first rule as "You can't tell where a program is going
to spend its life".

------
sam_lowry_
"Bad programmers worry about the code. Good programmers worry about data
structures and their relationships."

— Linus Torvalds

~~~
eointierney
Amateur mathematicians worry about patterns, professionals worry about numbers

~~~
codemonkey-zeta
I'm neither an amateur nor professional mathematician, so I can't tell; is
this statement tongue-in-cheek? If not, what does it actually mean?

------
dang
If curious see also

2017
[https://news.ycombinator.com/item?id=15265356](https://news.ycombinator.com/item?id=15265356),

[https://news.ycombinator.com/item?id=15776124](https://news.ycombinator.com/item?id=15776124)

2014
[https://news.ycombinator.com/item?id=7994102](https://news.ycombinator.com/item?id=7994102)

Pete_D gets credit for the date:
[https://news.ycombinator.com/item?id=15266498](https://news.ycombinator.com/item?id=15266498).
These rules come from "Notes on Programming in C"
([http://www.lysator.liu.se/c/pikestyle.html](http://www.lysator.liu.se/c/pikestyle.html)),
which has its own sequence of threads:

2017
[https://news.ycombinator.com/item?id=15399028](https://news.ycombinator.com/item?id=15399028),

[https://news.ycombinator.com/item?id=13852734](https://news.ycombinator.com/item?id=13852734)

2014
[https://news.ycombinator.com/item?id=7728084](https://news.ycombinator.com/item?id=7728084)

2011
[https://news.ycombinator.com/item?id=3333044](https://news.ycombinator.com/item?id=3333044)

2010
[https://news.ycombinator.com/item?id=1887442](https://news.ycombinator.com/item?id=1887442)

------
cactus2093
Interesting that the first 3 are all about performance. Which strikes me as a
bit ironic given rule #1, which could be summarized as don't worry about
performance until you have to.

------
svec
Pike himself says "there's a 6th rule":
[https://twitter.com/rob_pike/status/998681790037442561?lang=...](https://twitter.com/rob_pike/status/998681790037442561?lang=en)

(6. There is no Rule 6.)

And points to the best source he finds for it on the web:
[http://doc.cat-v.org/bell_labs/pikestyle](http://doc.cat-v.org/bell_labs/pikestyle)

~~~
tinco
dang, if this sticks to the frontpage, can you change the title to "Rob Pike's
5 rules of programming _in C_"? As the title is now, it misrepresents Rob
Pike's words.

------
mywittyname
> "write stupid code that uses smart objects".

Writing stupid code is actually really difficult.

For me, it takes a little bit of iterating before I know just the right place
to insert stupid.

~~~
dnautics
Do you use TDD? I'm not religious about it in general, but when I'm lost,
confused, and easily distracted, I start with TDD to write the dumbest
possible code.

~~~
mywittyname
No.

It's really an issue of not being sure at first what needs to be flexible &
data-driven vs. handled in code. If you make everything data-driven, then it
becomes this horrible mess where your input is basically a program and your
actual code ends up being a terrible interpreter.

I tend to just build things bottom-up, and start with a small bit of
functionality, then when I have enough small bits, I bolt them together and
decide what I need to abstract at that point, do refactoring on the smaller
bits and provide data to them from the caller. Then repeat that continuously
until I have all of the functionality I need.

It might be different for other people, but I need to have working code before
I can abstract it.

~~~
dnautics
I don't understand. TDD gets you to working code (if only on a subset of all
your data) extremely fast. There are both bottom-up and top-down styles,
neither of which is particularly wrong.

~~~
mywittyname
The TDD aspect is neither here nor there.

To provide a more concrete example, say I have a function that performs a
transformation on a piece of data. From what I _currently know about the data_,
I can parse it using a regex. So I code up a function that accepts the data,
runs the regex, and returns the transformed result. Great.

Now as I'm continuing my work, I notice that some other data requires a
similar, but not exactly the same transformation. It can be done with a
slightly different regex. So rather than duplicating functionality, I modify
the transform function above to take a regex as input along with the data.
Everything works as expected.

I get further along in the project and I realize another piece of data needs a
somewhat similar transformation, but this time it's just slightly too
complicated for a stand-alone regex, it needs to be a function.

The "smart code" way to handle this would be to create another transformation
function and call that instead. The "dumb code" way of handling this is to
generalize the transformation function such that I can pass in some descriptor
for the transformation, and have the transform function return the correct
result.

That's the crux of the issue. I rarely have enough information at the time of
writing to know just how generalized to make a function. If I created this
hyper-generalized transformation at the beginning, but never needed anything
beyond the original simple regex, I would have wasted a bunch of time creating
code that's needlessly confusing.

TDD is perfectly applicable for development, and would help tremendously with
the refactoring aspect, but what it doesn't help with is information that you
don't yet know about.

------
Impossible
Rule 5 (data dominates) is exactly the same as lie 3 (code is more important
than data) of Mike Acton's Three Big Lies
([https://cellperformance.beyond3d.com/articles/2008/03/three-...](https://cellperformance.beyond3d.com/articles/2008/03/three-
big-lies.html)). Rob Pike's rule 3 (don't use fancy algorithms) is closely
related to Mike Acton's lie 1 (software is a platform) as well.

A part of me wonders if Mike was influenced by Rob Pike's rules directly or
indirectly. It's also something that an experienced programmer can discover
independently easily enough. Mike Acton was clearly heavily influenced by some
of the failures of classic OOP design (lie 2 of Three Big lies).

------
4thwaywastrel
I feel like #5 is why many people like GraphQL, and #1-#4 are why many hate
it.

It becomes much easier to understand what data exists and to build tools on
top of its schema. But the extra kludge it adds to optimise the specific case
of returning less data in a field is often counterproductive.

~~~
pier25
> _But the extra kludge it adds to optimise the specific case of returning
> less data in a field is often counterproductive._

Could you elaborate?

------
radmuzom
While the rules make sense, they are often not realistic due to business
constraints.

What tends to happen in real life is that the profiling and optimization of
bottlenecks is forgotten once the software is "ready".

I would propose Rule 0: Developers are irrelevant, only end users of your
software matter.

------
RcouF1uZ4gsC
One of the big problems with fancy algorithms is that they either access data
out of order and/or do pointer chasing. Simple algorithms tend to access the
data in order.

CPUs have a lot of logic for making in-order data access very fast.

------
johndoe42377
This is even more relevant after 30 years.

The Data Dominates principle is the key; everything else follows, including "n
is usually small" and the preference for simple algorithms.

Nowadays we have exactly the opposite, which is understandable since the field
has been invaded by uneducated amateur posers.

[https://karma-engineering.com/lab/blog/NodeModules](https://karma-
engineering.com/lab/blog/NodeModules)

------
GuB-42
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks
occur in surprising places, so don't try to second guess and put in a speed
hack until you've proven that's where the bottleneck is.

True, but emphasis on the speed _hack_. It shouldn't stop you from thinking
about performance in your design. If it means doing less, do less now. If it
means adding a speed hack (usually some sort of cache), don't do it until you
are sure that you need it.

Rule 2. Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.

I agree completely. But you have to realize that measuring is as much of an
art form as optimization is. Especially on today's ridiculously complex
systems, the bottleneck may not be obvious. For example a function may take a
long time to run, but the real cause may be another function flushing the
cache.

Rule 3. Fancy algorithms are slow when n is small, and n is usually small.
Fancy algorithms have big constants. Until you know that n is frequently going
to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)

Disagree, unless you can prove that n will stay small. You don't know how your
users will abuse your program. For example, you may have designed your program
to handle shopping lists of a few dozen items, and then someone decides to
import the entire McMaster-Carr catalogue, which has more than half a million
items. It may be a great use case you didn't think of, and that fancy
algorithm permits it by scaling well. There are also vulnerabilities that
exploit high algorithmic complexity and worst cases. Don't overdo it, but N^2
is rarely a good idea if you can avoid it.

Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder
to implement. Use simple algorithms as well as simple data structures.

True, but with the caveats of Rule 3

Rule 5. Data dominates. If you've chosen the right data structures and
organized things well, the algorithms will almost always be self-evident. Data
structures, not algorithms, are central to programming.

Agree, for me there is a hierarchy in code. From most important to least
important: data (structures), code (algorithms), comments.

The order comes from the fact that if you change your data, you need to change
your code too, and if you change your code, you also need to change your
comments. Going the other way, you can freely change comments, and changing
your code will not require you to change your data. Data is the cornerstone.

------
gentleman11
> Data dominates. If you've chosen the right data structures and organized
> things well, the algorithms will almost always be self-evident

What does this have to say about the careers and roles of data scientists vs.
programmers? A data scientist's entire job is to categorize and model data in
a useful way. In the future, will they be fundamentally more important than
coders, or will the two roles just merge?

~~~
ttamslam
I think you're conflating two things: in my mind, working on the shape of data
is different than pulling inferences out of that data.

------
terandle
Am I wrong to avoid writing O(n^2) code if at all possible, when it is fairly
easy to use hash tables for a better time complexity? Sure, when n is small
the O(n^2) one will be faster, but when n is small /anything/ you do is fast
in absolute terms, so I'm trying not to leave traps in my code just waiting
for n to get bigger than initially expected.

~~~
dragontamer
> Am I wrong to avoid writing O(n^2) code if at all possible when it is fairly
> easy to use hash tables for a better time complexity

Are you sure that std::unordered_map is faster than std::vector? Did you
measure?

Every time you access an element in std::vector, you also access nearby ones
(thanks to L1 cache, as well as CPU-prefetching of in-line data).

In contrast, your std::unordered_map or hash table gets almost no benefit from
the L1 cache. (It should be noted that linear probing, despite being the
O(N^2)-worst-case version of hash tables, is actually one of the better
performers due to L1 cache + prefetching.)
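
The parent's point is about C++, but the "did you measure?" exercise is easy
to set up in Go's benchmark harness as an analogous comparison (this is an
illustration, not a claim about std::vector): for small n, a linear scan over
a contiguous slice often competes with a hash map lookup, but the numbers
depend on the machine, so run it rather than trust it.

    // This belongs in a *_test.go file so `go test -bench .` can run it.
    package lookup

    import "testing"

    const n = 32 // "n is usually small"

    var keys, table = buildData()
    var sink bool

    func buildData() ([]int, map[int]bool) {
        ks := make([]int, n)
        m := make(map[int]bool, n)
        for i := range ks {
            ks[i] = i * 7
            m[ks[i]] = true
        }
        return ks, m
    }

    // BenchmarkSliceScan finds a key by scanning the contiguous slice.
    func BenchmarkSliceScan(b *testing.B) {
        for i := 0; i < b.N; i++ {
            target := keys[i%n]
            found := false
            for _, k := range keys {
                if k == target {
                    found = true
                    break
                }
            }
            sink = found
        }
    }

    // BenchmarkMapLookup finds the same key in a hash map.
    func BenchmarkMapLookup(b *testing.B) {
        for i := 0; i < b.N; i++ {
            sink = table[keys[i%n]]
        }
    }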

~~~
krzat
Worrying about performance of small collections is premature optimization.

Using maps or sets nowadays is mostly for clarity, as they are used to solve
certain kind of problems.

~~~
dragontamer
I agree with you. But what you're talking about is completely different from
what I was responding to originally.

If you need a set, use a set. But don't assume that its faster than a
std::vector.

Even then, std::vector has set-like operations through binary_search or
std::make_heap in C++, so it really isn't that hard using a sorted (or
make_heap'd) std::vector in practice.

\--------

Even if you don't plan on doing optimization work, it's important to have a
proper understanding of a modern CPU. The effects of L1 cache and prefetching
are non-trivial, and make simple arrays and std::vectors extremely fast data
structures, far faster than on 80s or 90s computers anyway. A lot of
optimization advice from the past has become outdated because of the evolution
of CPUs.

So it's important to bring up these changes in discussion, from time to time,
to remind others to restudy computers. Things change.
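
The "sorted flat array as a set" idea translates directly to Go (an analogy to
the std::vector point above, not code from the thread): sort once up front,
then answer membership queries with binary search over contiguous memory.

    package sortedset

    import "sort"

    // IntSet is a sorted slice used as a set: contiguous in memory, so membership
    // tests are binary searches over cache-friendly data.
    type IntSet struct{ xs []int }

    // NewIntSet copies and sorts the input once up front.
    func NewIntSet(values []int) *IntSet {
        xs := append([]int(nil), values...)
        sort.Ints(xs)
        return &IntSet{xs: xs}
    }

    // Contains reports whether v is in the set via binary search.
    func (s *IntSet) Contains(v int) bool {
        i := sort.SearchInts(s.xs, v)
        return i < len(s.xs) && s.xs[i] == v
    }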

~~~
avasthe
> sorted std::vector

But inserting into a sorted std::vector is O(n) in the worst case, right? (As
you have to binary search for the position and then move the other elements.)
But a hash set with linear probing can give O(1) (amortized) access, while
usually maintaining an invariant like total_size > 2*n. I don't think that
would have such an impact on cache locality. Linear probing doesn't require
linked lists.

Of course this is given that you have such a data structure in the standard
library. But at this point I think hash sets are pretty standard.

~~~
dragontamer
Yeah, linear probing helps hash-sets a lot on modern CPUs.

Perhaps linear probing is a better example of how BigO analysis can go wrong
on modern architectures. Inserting into a hashset with linear-probing is O(n)
worst-case, while inserting into a linked list is always O(1) (best case,
worst case, and average case).

And yet, linear probing seems to work out best in practice (with a bit of
rigging: the total_size > 2*n invariant is one example, and so is Robin Hood
hashing if you want to keep the table small).

Linear probing vs. linked-list implementations of hash sets seems to be a
clearer example of an O(1) vs. O(n) anomaly, where the O(n) approach is
superior.

~~~
avasthe
I don't think it is the same thing.

Inserting into a linked list assumes you have already found the node to insert
at.

I don't remember the exact details, but in a hash set with linear probing the
worst case happens quite rarely, given the hash function is a good one (and
they are quite sophisticated these days). It is O(1) amortized. The same
applies to a hash table with chaining too: all of your keys may go to the same
bucket given a sufficiently bad hash function.

Given the other choices, like (as far as I know) skip-list-based or tree-based
variants, hash sets are the obvious choice.

~~~
dragontamer
I mean "Hash-set with Linked List" vs "Hash-set with linear probing". I
realize I was getting lazy with my typing, so lemme try to be more clear this
time.

Hash-set with Linked List is O(1) all cases.

Hash-set with linear-probing is O(n) worst-case insertion. But happens to be
faster in practice with circa 2020-style CPUs (especially with Robin Hood
insertion)

Assume the load-factor to be 90%+, so that we actually get a reasonable
difference between the two strategies. We have a situation where O(n) is
better than O(1).

------
Areading314
These resonate, especially #1, but I'm not so sure about #5. Although it makes
sense to choose good data structures, I don't think that guarantees a simpler
algorithm. For example you can store your data in a heap (tree), and still
need to write a tree traversal algorithm to print out the elements in order.

------
spirographer
Rob Pike's rules of programming simplified:

- Start with stupid code built on smart data (after Fred Brooks)

- When in doubt use brute force (Ken Thompson)

- Premature optimization is the root of all evil (Tony Hoare)

------
zekrioca
Why don't interviewers at tech companies just follow these rules while grading
candidates?

------
rswail
Over time, the rule I've "discovered" is:

"Focus on the Nouns, not the Verbs"

------
HeavyStorm
Wait, wasn't it Knuth who said that premature optimization is the root of all
evil?

------
ed_elliott_asc
When people say worry about the data structures, what do they mean?

------
game_the0ry
> Data dominates. If you've chosen the right data structures and organized
> things well, the algorithms will almost always be self-evident. Data
> structures, not algorithms, are central to programming.

:O

Epiphany

------
badrequest
There are actually six rules!

~~~
combatentropy
There is no Rule 6 ---
[http://doc.cat-v.org/bell_labs/pikestyle](http://doc.cat-v.org/bell_labs/pikestyle)

------
nurettin
These rules look silly when you know that your tight loop that waits for IO or
redundantly computes things needs caching. No, you don't need to measure that,
and you know that your tight loop function is going to be the bottleneck.
Everyone knows that.

Now it does make sense when you introduce an entire constraint library instead
of looping over 3-4 variables with a small search space. But again, you know
it is a small search space. You know you don't have to optimize it.

I really don't get these rules.

Edit: Go ahead and roast me, but keep in mind I've probably been there and
back.

~~~
zimpenfish
> No, you don't need to measure that

Just this evening I came across some code which pasted an image on top of a
blue background (in Go) that set every individual pixel to the background,
then got every pixel from the source and set the corresponding pixel of the
destination to that colour. I figured it'd be quicker to paste the source onto
the destination with `image/draw`.

Turns out, if you're using NRGBA images, it's 40-50% slower. That's definitely
an "obvious optimisation" that was proven wrong by measurement.

(If you're using RGBA images, though, the pasting method is 300% faster.
Because obviously.)

