
The Knuth version goes on:

"Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified."

Above the famous quote:

"The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can't debug or maintain their 'optimized' programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn't bother making such optimizations on a one-shot job, but when it's a question of preparing quality programs, I don't want to restrict myself to tools that deny me such efficiencies."

All this from "Structured Programming with Goto Statements"[1], which is an advocacy piece for optimization. And as I've written before, we as an industry typically squander many orders of magnitude of performance. An iPhone has significantly more CPU horsepower than a Cray-1 supercomputer, yet we actually think it's OK that programs have problems once their data sets grow past a hundred small entries (notes/tasks/etc.).
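Knuth's caveat, look carefully at the critical code but only after it has been identified, is usually applied by measuring first. Below is a minimal Python sketch of that step. The three-phase workload, the section names, and the deliberately quadratic dedupe loop are hypothetical illustrations, not anything from the thread:

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per named section.
timings: dict[str, float] = {}

@contextmanager
def timed(section: str):
    """Add the elapsed time of the with-block to timings[section]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[section] = timings.get(section, 0.0) + (time.perf_counter() - start)

def run_workload(n: int) -> list[str]:
    # Hypothetical workload: n small, unique "notes".
    with timed("load"):
        notes = [f"note-{i}" for i in range(n)]
    with timed("dedupe"):
        # Deliberately naive: `in` on a list scans it, so this loop is O(n^2).
        unique: list[str] = []
        for note in notes:
            if note not in unique:
                unique.append(note)
    with timed("format"):
        lines = [note.upper() for note in unique]
    return lines

run_workload(3000)

# Report sections from slowest to fastest.
for section, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{section:>8}: {seconds * 1000:.2f} ms")
```

On a typical machine the report should show `dedupe` dominating, which is where the "critical 3%" would be in this toy case. Python's built-in `cProfile` module gives the same kind of information at function granularity without manual instrumentation.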

Anyway, I write a lot more about this in my upcoming book: "iOS and macOS Performance Tuning: Cocoa, Cocoa Touch, Objective-C, and Swift"[2]

[1] http://sbel.wisc.edu/Courses/ME964/Literature/knuthProgrammi...

[2] https://www.amazon.com/iOS-macOS-Performance-Tuning-Objectiv...

I'm surprised that anyone thinks it's OK when a program has trouble with a data set with over 100 small entries. Can you point to an example of that?

In some SharePoint 2010 deployments I worked on, it was possible to create SharePoint workflows that would bog down, fail to process new entries, etc. once the list of items grew beyond 100-150 entries.

Admittedly, this was probably related to misconfiguration and database issues (i.e. having zero oversight or administrative maintenance of the underlying MS SQL Server). That specific local minimum might not apply to the context of the article (optimization in code and systems design).

I've seen Hive take minutes (OMG!) to count a table with 5 rows ... but (other) people still think it's OK, because it scales well. Its latency sucks for small data sets, but it can handle very large data sets.

It's true, the startup costs of a MapReduce job are immense. I'm surprised it took minutes, but I'm not sure this counts, since there are and always will be different solutions with different tradeoffs for problems of different orders of magnitude. A solution built for massive scale is often considered cumbersome for a small-scale problem.

For instance, I find test cases that spin up in-memory, in-process Spark extremely slow, but the same spin-up is quick relative to a job that processes gigabytes of data per task.
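The tradeoff described above can be sketched as a linear cost model: latency = fixed startup cost + rows × per-row cost. Below is a minimal Python sketch. The two engines and every number in them are made-up assumptions chosen only to show the shape of the curves, not measurements of Hive, MapReduce, or Spark:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Engine:
    name: str
    startup_s: float  # fixed cost paid once per job (hypothetical)
    per_row_s: float  # marginal cost per row (hypothetical)

    def latency(self, rows: int) -> float:
        """Estimated wall-clock seconds for a job over `rows` rows."""
        return self.startup_s + rows * self.per_row_s

def break_even_rows(a: Engine, b: Engine) -> Optional[float]:
    """Row count at which a and b take equal time, or None if they never cross at positive rows."""
    slope_diff = a.per_row_s - b.per_row_s
    if slope_diff == 0:
        return None
    rows = (b.startup_s - a.startup_s) / slope_diff
    return rows if rows > 0 else None

# Invented numbers: a cheap-to-start single process vs. an expensive-to-start cluster.
local = Engine("single-process", startup_s=0.01, per_row_s=1e-6)
cluster = Engine("distributed", startup_s=30.0, per_row_s=1e-8)

for rows in (5, 10**6, 10**9):
    print(rows, round(local.latency(rows), 3), round(cluster.latency(rows), 3))
print("break-even rows:", break_even_rows(local, cluster))
```

Under these invented numbers the single-process engine wins by orders of magnitude on a 5-row table and loses past roughly 30 million rows. Real systems are not this linear, so the crossover for a given workload has to be measured rather than assumed.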
