

Sorting and searching at the library - nkurz
http://jorendorff.github.com/hackday/2012/library/

======
bgilroy26
Thanks for the great read!

It's my life's dream to do a real-life process vs. algorithm design comparison
like this one.

Some day, I will visit hundreds of the world's best kitchens (normalized for
ingredients budget) and survey how the instructions filter down from the head
chefs there to the prep cooks.

I would love to see the best practices in the process from raw ingredients to
completed dishes abstracted to the highest level and then compared with
concurrency libraries/best practices, to see if there are any places where
software design and kitchen instruction exhibit convergent evolution, driven
by pressures that are similar at an abstract level.

~~~
btown
I, too, would love to see this kitchen-algorithm study done. Not just in how
the instructions filter down, but also in how resources are allocated and
locked.

What happens when two chefs are ready to fulfill an order at the same time?
What happens if a resource, such as a stovetop, is under contention? Does the
higher-up chef in the social hierarchy win whenever a resource is under
contention (giving a whole new meaning to master-slave), or is there weight
given to the priority of the task? Do kitchens evolve a verbal "code" to
communicate hierarchy? Are there any large kitchens that don't use the
strategy of tacking all orders on the same wall... and how was that strategy
developed in the first place? And are any of these things taught in cooking
schools?
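The stovetop question maps directly onto mutual exclusion. A minimal sketch in Python (the chef names and cook times are invented for illustration, and there's no priority or hierarchy here, just whoever grabs the lock first):

```python
import threading
import time

# One stovetop, modeled as a lock: only one chef may use it at a time.
stovetop = threading.Lock()
log = []

def chef(name, cook_time):
    # Both chefs try to acquire the stovetop; whoever wins goes first,
    # and the other blocks until the stovetop is released.
    with stovetop:
        log.append(f"{name} starts")
        time.sleep(cook_time)
        log.append(f"{name} done")

a = threading.Thread(target=chef, args=("chef_a", 0.01))
b = threading.Thread(target=chef, args=("chef_b", 0.01))
a.start(); b.start()
a.join(); b.join()

# The log shows strictly serialized use of the contended resource:
# one chef fully finishes before the other starts.
print(log)
```

A priority-aware kitchen would need something richer than a plain lock, e.g. a priority queue of waiting tasks, which is exactly the kind of design pressure the comment is asking about.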

Heck, this could be a whole book exploring a whole bunch of industries...
construction, door-to-door marketing, and more. I have no idea who the
ideally-situated person would be to do something like this (maybe it's you, or
the OP), but if the right person started a Kickstarter campaign, I'd
definitely contribute.

------
MTGandP
This was a fascinating article. I do have a small correction: bucket sort is
more similar to quicksort than to merge sort.

    merge sort             bucket sort

    1. divide (in half)    1. divide (in bins)
    2. sort halves         2. sort bins
    3. merge               3. done!

The big difference here is that for merge sort, dividing the list is trivial
because all you have to do is split it into the first half and the second
half. But for bucket sort, you have to actually look at each element to decide
which bin it should go in. This is more like quicksort's partition step.
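The comparison is easiest to see in code. A minimal bucket sort sketch (the bin count and value range below are arbitrary assumptions, not anything from the article):

```python
def bucket_sort(items, num_bins=4, max_value=100):
    # The divide step must inspect every element to pick its bin,
    # just like quicksort's partition (and unlike merge sort's
    # trivial split-in-half).
    bins = [[] for _ in range(num_bins)]
    for x in items:
        bins[min(x * num_bins // (max_value + 1), num_bins - 1)].append(x)
    # Sort each bin, then concatenate. No merge step is needed,
    # because the bins are already ordered relative to each other.
    result = []
    for b in bins:
        result.extend(sorted(b))
    return result

print(bucket_sort([42, 7, 99, 15, 60, 3]))  # [3, 7, 15, 42, 60, 99]
```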

------
OllieJones
Great article, thank you.

Librarians aren't as dumb as you make them out to be. Several times I've asked
a librarian to locate a mis-shelved book I really needed, and they've always
succeeded. I gather it's a pain, but it's considered a big victory.

Also, as a library patron, they've taught me to take any book that I notice
mis-shelved and set it on a nearby table.

It never ceases to amaze me how human intelligence and care can transform
algorithms that are theoretically brittle into systems that endure for
centuries.

------
hodgesmr
The article says this about insertion sort:

 _This is the fastest sorting algorithm for small arrays, up to maybe 20, 30
elements. Even in software!_

No no no. Insertion sort is fastest only when the items being sorted are in
_nearly sorted order_. Its runtime will always depend on the number of items,
n, being sorted. For a completely randomized array of items, it will run in
O(n^2) time no matter how small n is. Quicksort and mergesort will do better
in these situations.

~~~
jeltz
Any source for that? I am pretty sure insertion sort has a lower constant than
quick sort and merge sort. O(n^2) with low constant can beat O(n log n) when n
is small.

Wikipedia agrees with me too about quick sort being slower:

<https://en.wikipedia.org/wiki/Quicksort#Optimizations>

EDIT: Wikipedia says the same about merge sort.

[https://en.wikipedia.org/wiki/Merge_sort#Comparison_with_oth...](https://en.wikipedia.org/wiki/Merge_sort#Comparison_with_other_sort_algorithms)
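The constant-factor argument is easiest to see next to the code. A textbook insertion sort (this is the standard algorithm, not code from the article):

```python
def insertion_sort(a):
    # Worst case O(n^2) comparisons on random input, but the inner
    # loop body is just one compare and one shift, so the cost per
    # operation is tiny. On nearly-sorted input the inner loop exits
    # almost immediately, which is why library sorts (e.g. Timsort)
    # fall back to insertion sort for small runs.
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift larger elements one slot right
            j -= 1
        a[j + 1] = key
    return a
```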

~~~
hodgesmr
You bring up very valid points. I was speaking in terms of asymptotic runtime
analysis, which does not care about constant overhead. However, the cost per
operation will depend on the architecture that runs it.

~~~
nandemo
You cannot talk about asymptotic behavior ("as n grows to infinity") and
"regardless if n is small" at the same time. If you're talking asymptotic, you
should ignore the constant. If you're talking about small n, then you can't
ignore the constant.
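This can be made concrete with made-up constants (the values of c1 and c2 below are purely illustrative, not measured on any real machine):

```python
import math

# Suppose insertion sort costs c1 * n^2 and merge sort costs
# c2 * n * log2(n), with an invented 8x gap in per-operation cost.
c1, c2 = 1.0, 8.0

def insertion_cost(n):
    return c1 * n * n

def merge_cost(n):
    return c2 * n * math.log2(n)

# For small n the "slower" O(n^2) algorithm wins; the crossover
# happens where c1 * n = c2 * log2(n).
small = [n for n in range(2, 100) if insertion_cost(n) < merge_cost(n)]
print(small[-1])  # largest n where the quadratic sort is still cheaper
```

With these invented constants the quadratic sort wins for every n up to 43, which is exactly the "you can't ignore the constant for small n" point.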

------
ww520
Librarians aren't cheating when they post the prefix of the key on the
aisles. That's called radix sort.
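The aisle-prefix idea is essentially the first pass of MSD radix sort. A sketch using one-letter prefixes (the titles are invented examples):

```python
def shelve_by_prefix(titles):
    # First radix pass: bin by leading character, like aisle signs
    # that read "A-C", "D-F", and so on. Each bin can then be sorted
    # (or radix-partitioned again) independently of the others.
    aisles = {}
    for t in titles:
        aisles.setdefault(t[0].upper(), []).append(t)
    return {k: sorted(v) for k, v in sorted(aisles.items())}

print(shelve_by_prefix(["Dune", "Emma", "Dracula"]))
# {'D': ['Dracula', 'Dune'], 'E': ['Emma']}
```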

------
ninetax
Thanks for the article! I love to see this type of stuff on HN.

It's so interesting how we think of cost in a computer vs what really costs us
in human time.

------
peter_l_downs
Great article! Really hits home for me because I spent a year working in the
children's room of a local library. I was that guy who helped people find
books, stacked books on shelves, and checked people out.

The worst of the three jobs was definitely stacking books back on shelves. I'd
start with a cart filled with books. OP has it totally right — they are
grouped (sometimes, but not always!) by general type — but within those types
they're usually randomly arranged.

At this point I'd been programming for around a year and a half and was just
starting to learn about different sorting algorithms. If only I'd realized
that I was sorting every day! Looking back, I found the fastest method (for
me) to be quicksort. Go figure!
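For anyone curious what "quicksorting a cart" looks like, a textbook sketch (this is the standard algorithm, not a description of any actual shelving routine):

```python
def quicksort(books):
    # Pick a pivot book, split the cart into the pile that shelves
    # before it and the pile that shelves after it, then recurse on
    # each pile. Each partition pass handles a whole armful at once.
    if len(books) <= 1:
        return books
    pivot = books[len(books) // 2]
    before = [b for b in books if b < pivot]
    same = [b for b in books if b == pivot]
    after = [b for b in books if b > pivot]
    return quicksort(before) + same + quicksort(after)

print(quicksort(["Moby Dick", "Emma", "Walden"]))
# ['Emma', 'Moby Dick', 'Walden']
```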

------
the1
A library isn't RAM. If library books were RAM, you'd clone yourself many
times and stand in front of each book, and all of you would share one mind.
Then performing binary search in the library would be sort of okay.

Or the library could be organized in a circle with you at the center: you
never move, but you pull/push each book with a magnet. RAM is more like the
former, though. So many parallel flip-flops. No wonder it's expensive.

~~~
therandomguy
He says that in the post, in the part where he discusses the cost of the
algorithm. For computers it is only the comparison; for the library it is the
comparison plus the distance to walk.

------
nicholassmith
This is one of the few times I've read something and all the comments are
overwhelmingly positive.

I really enjoyed it, it's tricky finding real world metaphors that can be
applied and then deconstructed into the technical details, bookmarked to show
people struggling to understand algorithms.

------
pavel_lishin
I think most of the books at the library I visit already have RFID chips
inside. I look forward to reserving a couple of books, walking in, and having
the actual physical location of each book light up in an Augmented Reality
display.

At that point, it won't even be necessary to sort.

~~~
hodgesmr
Sorting is still necessary... but you might be interested in ShelvAR -
<http://shelvar.com/>

~~~
jorgem
That is cool. I wonder if a good library worker is faster than ShelvAR?

------
rinon
Awesome writeup! I will definitely be pointing friends who are curious about
algorithms to this.

------
alaaibrahim
Ok, I'm sorry to be the negative guy here, but there is something terribly
wrong in the article. The library is not an array. It's more of a hash table
than an array. You first calculate the hash for what you are searching for, go
to that location, and search there.

Once you stop thinking of a library as an array and get more realistic, you
can see why what the writer is talking about doesn't make much sense.
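In this model the "hash" is the classification scheme. A minimal sketch (the category-to-aisle table below is invented, standing in for something like Dewey or LCC):

```python
# Invented category-to-aisle table standing in for a real
# classification scheme.
AISLE = {"fiction": 1, "history": 2, "science": 3}

def locate(category, title):
    # "Hashing": the classification maps straight to a shelf area,
    # so finding a book is an O(1) jump to the right aisle plus a
    # short local scan, not a binary search over the whole library.
    return (AISLE[category], title)

print(locate("science", "Cosmos"))  # (3, 'Cosmos')
```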

------
T_S_
Really great. I wonder if there are other algos that are being overlooked
because we are looking at them with the wrong cost model in mind.

