Brooks's version: "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious."
Not all algorithms are obvious from a picture of the data structure, otherwise we wouldn't need computer science at all.
No, you won't spontaneously invent a minimax search with alpha-beta pruning if you're just show the picture of the game tree with board positions.
Some of the "low-hanging fruit" is obvious, but that's about it.
In the other direction, some algorithms are abstract; the same structure will work for more than one data structure. Sometimes the process is important, not the low level representational pieces. E.g. "map reduce".
Gee, don't show me "map reduce"; I need to see the servers and requests! No wait, I mean linked lists and atoms, ...
I found one of my old codes, a few months ago, and I'm still trying to figure out what possessed me to write it.
Essential complexity  is still complexity. It has to exist somewhere; you just get to choose how to move it around.
Foo: "Hey, this 10000-element collection, which we do repeated lookups on... why are we using a list and not a HashSet?"
Bar: "Because lists are so much simpler than a HashSet. Let's just KISS"
Foo: "But... doing repeated lookups on a 10k sized list is so much slower than just using a HashSet!"
Bar: "Oh really? Have you profiled the entire application, and measured the latency impact caused by this decision? Come back to me once you've done so, and until then, we're not going to tune for speed."
Pike's rules are very good directionally for a certain clever and diligent type of programmer who knows their CS fundamentals but sometimes overthinks what they're doing. If you are Rob Pike and you work places like Bell Labs or Google, you are going to be surrounded by more of these people and your advice is going to be directionally aimed at them. If you work with people who are sloppy or have poor CS fundamentals, you're going to want to give them different advice than Rob Pike gives his coworkers.
(In your example, people don't usually get as militantly stupid as Bar unless Foo is somehow antagonizing them or being too argumentative with his advice, which is a completely different aspect of giving advice. Either that, or Bar is a particular combination of "smart enough to rationalize their original position" and "stubborn enough not to give up their original position".)
In any skill, people can fall prey to cults of myth and mysticism as they merit based on adherence to orthodoxy rather than suitability. Programming is no different and sometimes it's hard to hear anybody think over the mooing of all the sacred cows.
I agree that measurements & hard-data are preferable to guesswork, but it takes time and energy to gather these measurements and hard-data as well. For minor decisions where the alternative proposal is very slightly more complex, but there's a very compelling reason to assume order-of-magnitude performance improvements, I would argue that gathering data is a waste of time.
But the way too common situation is that you look at a program under development and tell people "no way, that list will be too large to keep searching, you should design it around a hash set", and people reply something about premature optimization and keep going, then it gets released and is too slow to use so you get to rewrite all their code under pressure.
But by providing a set type, Rust is able to provide for users various set operations which Go requires you to write by hand. It becomes hard to argue that a set is simpler than a slice when you have to write your own methods dealing with union, disjunction, intersection, difference, subsets/supersets, etc every time.
If it is easy to procure a set (as a hash set), then sets should be used when they could be used instead of arrays (unless there are performance concerns to using sets). In languages that do not include batteries (e.g. C), it makes sense to use an array simply because procuring a set isn't easy.
Lots of low-level data structures can be implemented in terms of one another, and which makes sense depends on circumstance.
Sets are much more restricted, they don't have manual indexing, they don't garaunteed iteration ordering without special semantics, they don't do as much as arrays even if they might be implemented with arrays.
Then why use an array when you only need a set and an array would be overkill?
Incidentally, Plan 9 was just 6 upside down.
So the next question is "What's the implementation of the Get(key) method?" Or maybe "Why aren't you using your language's library methods to loop over that list?"
Get is simpler precisely because it hides the complexity.
> Or maybe "Why aren't you using your language's library methods to loop over that list?"
What makes those methods not subject to the "hiding complexity" objection?
Part of what I was going for is that a clean interface might hide a complex implementation. If you've followed rule 1 and see that "Get" is a bottleneck, it makes sense to apply rules 3+4, and see if there isn't a simpler implementation than the one provided by "Get".
The point of the OP is that among different implementations for the same interface, choose the simplest one unless you have empirical evidence that compels you to do otherwise.
How does a List map values to key? How does it even represent them?
The problem with OP comment is that KISS and premature optimization are not diametrically opposed. Thet are two separate principles that mean different things.
Premature optimization is bad, but not because it necessarily violates KISS. Similarly, many people overcomplicate code for reasons nothing to do with optimization.
His argument reminds me of people who argue against free speech generally because we already ban people shouting fire in a cinema.
Not efficient, but very much possible.
In terms of program speed/optimization, the answer depends on how .Get() is implemented.
Premature optimization is the root of all evil.
- relies on simple "Programming 101" concepts
- doesn't require using or learning a framework or library
- corollary to above, doesn't require ascertaining that said framework or library implements a .Get() method
- doesn't rely on anyone else's code and has no dependencies or references
- doesn't require understanding ancillary concepts that might be present such as hash tables
Then writing the loop is "simpler." To answer your original question.
And just to turn things completely topsy-turvy, how about your .Get() method. If it's implemented the way such things are usually implemented, that means it uses hash tables and is very efficient. But that means it's a premature optimization, right, and therefore the devil? (Well we haven't defined "premature" either.)
The real answer is to think for yourself like a grown-up and don't rely on dogma. Every situation is different; tackle every problem by optimizing for what's important in that specific problem, using whatever resources and minding whatever constraints apply to that specific problem.
It might even be faster (cache locality etc as another said -- if the list is contiguous).
But even more so: you don't even need to profile, unless you have an actual problem with execution speed. If it's fast enough that you don't have to care, you don't have to care.
Rule 2 only applies if you can just decide that your current level of performance is "pretty great" and then parachute away to another project. Otherwise if you find that, instead of 1 hot spot taking 80% of your time, you have 5 warm spots taking 16% of your time, you have 5x as much thinking to do to fix your problems and get that potential 4x speedup (assuming you could get cut that 80% down by a factor of 16).
Rule 4 is an argument from bad implementation. It's harder to get fancy algorithms right, but what does it mean to say an algorithm is buggier than some other algorithm. Are we seriously meant to think that there are some bugs buried in Strassen's matrix multiply?
Rule 3 is dubious. It seems to imply that the performance cost of fancy algorithms on low N matters, which assumes that we are actually executing the algorithm quite a bit. Perhaps we've picked the wrong fancy algorithm? Maybe if you have a godzillion tiny matrix multiplies you shouldn't be using Strassen's but you might still get a performance win from doing something 'fancy' that's tuned for large numbers of small matrix multiplies...
As for Rule 5, this is probably the best of the lot. That's why I like to use languages that allow me to describe my data structures in detail... even going so far as to say that the contents of my user-built containers have types. Ahem.
These rules are great if you actually have a smoking gun, however if you're getting killed by random memory access patterns they rarely present that way.
Edit: I also spent ~12 years writing C# which is good. But TypeScript is actually more powerful in a way, because of the impossibility of typing everything. This means it's likely to get things like variadic generics or higher-kinded types long before C#, since (1) in C# everything has to be typed down to the smallest particle (so it's much harder to add features), and (2) changes to the CLR would be needed.
Do you have any alternatives to TypeScript, which compile to JS, which fix the array key issue you mention?
Again, this seems a bit like the assumption behind Rule 2 - that the time spent in figuring out the complex algorithm and ensuring it is bug free is not worth spending (just like saying "oh, it's not worth smashing a hot spot if it's only 20%). This will frequently be true, but not always. I suspect that most people would be happier to use OS schedulers, TCP implementations and compilers (to name a few things) written by people who don't get to "parachute out" when the easy solutions are exhausted.
procedural programming -> 'I hate all these imperative sequences ("flowcharts"), it's all about the data anyway !'
data oriented programming -> 'I hate keeping all these data relationships in sync, can't the data do that itself ?'
object oriented programming -> 'all this behavior is too complex, often does things I don't intend, can't it be simpler ?'
functional programming -> "it's simple, but way too dense (like algebra), can't we just write out what happens in sequences of imperative statements ?"
So we arrive at Wapper's "Battlestar" law of programming paradigms:
"All Of This Has Happened Before And All Of It Will Happen Again"
Object oriented programming is where really complex and large programs go. I agree with Pike : my favorite is data-oriented programming. Read the data and relationships, but only document them, don't encode the data schema directly. But here's the kicker:
The issue with these Pike 5 rules is that while some things are simple (and yes, people can screw up and make simple things complex), there are many things that are complex and can't be made simpler without destroying the functionality of the program. There is no way to make a CAD drawing program that is anything remotely close to these rules. Making such a program simple and predictable, while possible, also destroys its function.
Eh? A CAD program will have a lot of features, but that's not the same as using complex algorithms. For the bits where you do use complex algorithms, with a good architecture you can confine the complexity to plugins and corners of the program.
All of the other rules would apply to almost anything, surely?
Maybe the best example of irreducible complexity would be a video player; the algorithm is handed to you in a multi-hundred-page spec and there's nothing you can do to avoid that.
This is because we haven't figured out how to make them simple yet. I believe all problems can be made simple, once we come up with the proper system. Maybe we don't have the correct abstractions yet or just haven't figured out the best data layout. We've only been collectively programming for a few years now I think we have plenty of room left to advance the field. I'm not ready to just throw in the towel and say impossible.
Maybe with powerful AI and Holodeck-like programs we can make it simple for humans to tell the computer to do anything they want easily, but short of that, I'm not seeing why every program would be simple to build.
However, the more powerful your tools, the more complex buildings and programs you're able to create. Programming hasn't necessarily gotten easier with more powerful abstractions, rather we create more complex programs now.
Claiming that it's "all a cycle" is incredibly short sighted considering the time-scale.
I don't care how well (as in perf)
unless you meant something else (not skilled at logic programming).
Everything about reading this quote depends on what you think a "speed hack" is. Without practical agreement on that, you'll get a lot of people arguing past each other.
I'm ok with a rule of thumb. I just want to point out that they leave a lot of room for disagreement and that the most important part may end up being how you apply the rule in particular cases.
And the "experience" will change per-environment! Think in how many apply something like this well enough when inside his programming language, yet fail when using a RDBMS/NOSQL/CACHE/KV store. Or moving between UI/Backend. Or using the network, etc.
ie. Is easy to see the truth at low-scale but do the same across all the stack is another thing.
I have some anecdote from my early days, when doing a school management app. After almost finished the porting from FoxPro DOS 2.6 (procedural) to Visual FoxPro 3 (OO, procedural, sql) some task were far slower than in DOS.
Eventually I "discover" that was because some tables were normalized "by rows" instead of "by columns" (ie: The central process was around a table that, when printed in paper, was very much alike a pivot table). Then I change the tables to be as 1-1 to that pivot table. This massively improve the WHOLE program. Not only speed, but the simplification in coding and reduction on lines of the program was very big!
A for-loop's body has no such restrictions, it can access any number of neighbor elements if the algorithm so requires.
a[i] = a[i-1] * a[i+1]
This should have been #1.
Really. In all of my 20 years of designing and writing software products, I have the learned that pretty much all of the work depends on the data model/the data structures.
(And also: once you have you have understood the data structures of a program, you kinda understand the program too.)
It is by far the number one mistake for juniors to do - focusing on the code rather than on the data structures.
I'm thinking user documentation should start with a quick explanation of the data model, and from there describe the operations that can be performed on it.
“Although some jokingly called Blender a ‘struct visualizer’, the name is actually quite accurate. During the first months of Blender's development, we did little but devise structures and write include files. During the years that followed, mostly tools and visualization methods were worked out.”
And the architecture of a compiler I’ve been working on has gotten considerably simpler and more solid as I’ve figured out the right data structures—with the right data model, especially if it’s enforced with types, the code practically writes itself. Finding that model is the hard part.
Looks like wrongly attributed to Hoare? That famous quote is by Knuth
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
1. It is only ever about small efficiencies.
2. And about small efficiencies in the 97% non-critical parts
1. It is always valid to concern yourself with large efficiencies.
2. In the critical 3%, it is also legitimate to worry about small efficiencies.
For context, later in the same paper (Structured Programming with goto statements):
“The conventional wisdom [..] calls for ignoring efficiency in the small; but I believe this is simply an overreaction [..] In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering."
So his actual message is pretty much the opposite of what is attributed to him.
Therin lies the rub: it’s hard to choose data structures.
This is also a difficult investment to justify because, as always, it doesn't deliver any "user facing features". That makes me chuckle because most of the time the "feature" is that it continues to be possible to use the application at all.
Many advanced ('fancy', if you will) algorithms are based on mathematical proofs and relations, which are seldom obvious. Some of them have taken hundreds of years to reach proof and be capable of being used as a theorem.
If all we needed were 'obvious' algorithms we would have hardly solved any real-world non-rudimentary problems with computation.
This seems to follow from Curry-Howard isomorphism.
This is especially beautiful insight giving that it has a parallel with molecular biology - proteins and other molecular structures dominate. Enzymes are made of "standard" proteins to transform particular molecular structures.
This, BTW, is also related to usually overlooked the "code is data" principle.
To be a good programmer one has to know molecular biology 101. It seems like good guy (John McCarthy, MIT Wizards, Bell labs and Erlang guys) did.
OMG, this is so so true! I have seen people writing complicated/fancy algorithms to do simple stuff, and 2 years later they don't even know how did they do it!
Like his creation, the Go language, Rob Pike's rules are somewhat condescending and patronizing those he deems as lesser programmers.
Rule 3, however, is the exception. That is the take away gem based on experience.
That depends on N and on the constant factors. Which means testing still might surprise you.
More to the point, if the O(n^2) algorithm is simpler and leaves you more time for profiling and fixing actual performance issues after the code works, you may end up with faster code by doing the dumb thing.
He doesn't imply that from what I read. He simply said be empirical and practical. Let data (inputs and what you measure) speaks for itself. This is like the simplified version of scientific method in the context of programming.
In my current job we have lots of large and often 'sparsely populated' objects which means basically you can never count on any of the properties you were hoping would be present will actually be present. This means the data objects are essentially useless (at least for knowing what data you have to work with as you are writing code), and violate Pike's rules mentioned about data object design being critically important. In the modern world the younger generation of developers thinks "functional programming is great, and OOP (inheritance, etc) is obsolete", and only after 20yrs of coding they look back and eventually realize OOP had it right all along.
In architectures with massive numbers of sparsely populated objects only the guy who originally wrote any given function will be able to understand it, and everyone else who tries to work in the code is pretty much screwed.
(He also designed his latest language without inheritance)