
Moving Wikipedia from Computer to Many, Many Bookshelves - sharkweek
http://www.nytimes.com/2015/06/17/books/moving-wikipedia-from-computer-to-many-many-bookshelves.html?_r=1
======
rndn
That's a rather large font though, and no use of hyphenation. Here is the
Encyclopædia Britannica for comparison:

[http://i.imgur.com/qXGfGWa.jpg](http://i.imgur.com/qXGfGWa.jpg)

[http://i.imgur.com/obu7D8e.jpg](http://i.imgur.com/obu7D8e.jpg)

~~~
pimlottc
For the love of god, man, justify that text!

~~~
sp332
With columns that narrow, it will still look like crap without hyphenation.

~~~
jimktrains2
I never understood how word processors and web browsers are so incapable of
doing hyphenation. The Knuth algorithm is great and open. It requires a bit
more computing power, but I'd rather my computer be doing that than
implementing some crappy transition.

~~~
sp332
I don't think word processors are bad at hyphenation recently. They do
optimize for fewest lines instead of least raggedness, though. And Knuth's
algorithm takes O(n^2) time, which could get significant with long documents
and seems like overkill to recompute for every keystroke. I guess it could be
computed once when saving or printing, though.

~~~
jimktrains2
It's only O(n^2) for each paragraph, isn't it?

A "Rejustify" button that would recompute the proper justification is an
interesting idea.

~~~
sp332
n is the number of words, according to Wikipedia.
[https://en.wikipedia.org/wiki/Line_wrap_and_word_wrap#Minimu...](https://en.wikipedia.org/wiki/Line_wrap_and_word_wrap#Minimum_raggedness)

~~~
jimktrains2
That's what I meant: I thought it's O(n^2) in the number of words per
paragraph, not in the entire document.

~~~
sp332
Oh right... yeah I guess that's true.
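
For reference, the minimum-raggedness approach from the Wikipedia section linked above can be sketched as a small dynamic program run on one paragraph at a time. This is only an illustration, not the full Knuth-Plass algorithm (no hyphenation, no stretchable spaces), and the function name `wrap_min_raggedness` and the squared-slack cost are assumptions made for the example.

```python
def wrap_min_raggedness(text, width):
    """Break one paragraph into lines, minimizing the sum of squared
    trailing spaces on every line except the last."""
    words = text.split()
    n = len(words)
    if any(len(w) > width for w in words):
        raise ValueError("a word is longer than the line width")

    INF = float("inf")
    cost = [INF] * (n + 1)      # cost[i] = best cost of laying out words[i:]
    cost[n] = 0
    break_at = [n] * (n + 1)    # break_at[i] = index where the line starting at i ends

    for i in range(n - 1, -1, -1):
        line_len = -1           # length of words[i:j] joined by single spaces
        for j in range(i + 1, n + 1):
            line_len += len(words[j - 1]) + 1
            if line_len > width:
                break           # adding more words only makes the line longer
            slack = 0 if j == n else (width - line_len) ** 2
            if slack + cost[j] < cost[i]:
                cost[i] = slack + cost[j]
                break_at[i] = j

    lines = []
    i = 0
    while i < n:
        j = break_at[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


# A greedy wrapper would produce "aaa bb" / "cc" / "ddddd" here.
print(wrap_min_raggedness("aaa bb cc ddddd", 6))
```

Because `n` is the word count of a single paragraph, the quadratic worst case stays small, and the early `break` means the inner loop never looks at more words than fit on one line.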

------
danbruc
By the way, I'd love (Google Maps) satellite imagery printed onto a sphere
large enough to be able to see every house and street on earth. Unfortunately,
a diameter of 127.42 meters (1 mm = 100 m) plus a magnifying glass is probably
as small as you can make it without needing a microscope or something similar
to look at everything.
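
The 127.42 m figure follows directly from the stated scale. A quick check, assuming a mean Earth diameter of about 12,742 km (the constant and the helper names are just for this illustration):

```python
EARTH_DIAMETER_M = 12_742_000   # assumed mean diameter, about 12,742 km
SCALE = 0.001 / 100             # 1 mm on the globe per 100 m on the ground


def globe_diameter_m(real_diameter_m=EARTH_DIAMETER_M, scale=SCALE):
    """Diameter of the printed sphere in meters."""
    return real_diameter_m * scale


def printed_size_mm(real_size_m, scale=SCALE):
    """Size on the globe, in millimeters, of something real_size_m across."""
    return real_size_m * scale * 1000


print(globe_diameter_m())    # roughly 127.42 (meters)
print(printed_size_mm(10))   # a 10 m house comes out at roughly 0.1 mm
```

At this scale a typical house is on the order of a tenth of a millimeter across, which is why a magnifying glass is needed.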

~~~
vmorgulis
It's a good idea. It's feasible with a balloon.

~~~
toomuchtodo
Would make an incredible outdoor work of art.

------
danbruc
I could never do this - push the project that far and then not print every
single volume. Not sure if it is a feature or a bug of mine.

------
mark_l_watson
I had a slightly similar idea that I hope to implement this summer. I mention
my idea here in case someone else wants to do it first (and save me the
work!).

I keep just the text for the English language articles on my laptop in a text
file, one line per article. It is a little more than two gigs. My idea is to
compress the text using a ZIP entry for each article, build a Lucene index for
local search, add a small Java web app to search and peruse a local copy of
Wikipedia (or host it for family and friends) and package it all in one Java
JAR file. This could provide a useful local copy of the text in Wikipedia in
one handy 4 or 5 gigabyte file.

The downsides to this idea: no media files and it would not be up to date.

Edit: I use the Wikipedia text for NLP experiments, including finding entities
(my own code) and processing with both Open Calais and Stanford NLP
libraries.
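
A rough sketch of the storage side of this idea, in Python rather than Java and with a toy in-memory inverted index standing in for Lucene. The entry naming scheme (`0.txt`, `1.txt`, ...) and the function names `build_archive` and `search` are assumptions made for the example, not part of the plan described above.

```python
import io
import re
import zipfile
from collections import defaultdict


def build_archive(lines, zip_file):
    """Write one ZIP entry per article (one article per input line) and
    return a word -> set-of-entry-names index. The dict-based index is a
    stand-in for a real search library such as Lucene."""
    index = defaultdict(set)
    with zipfile.ZipFile(zip_file, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for n, line in enumerate(lines):
            name = f"{n}.txt"               # hypothetical naming scheme
            zf.writestr(name, line)
            for word in set(re.findall(r"\w+", line.lower())):
                index[word].add(name)
    return index


def search(index, zip_file, query):
    """Return the text of every article that contains all query words."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    with zipfile.ZipFile(zip_file) as zf:
        return [zf.read(name).decode("utf-8") for name in sorted(hits)]


# Usage with an in-memory archive; a real run would pass a file path and
# iterate over the lines of the two-gigabyte text file instead.
archive = io.BytesIO()
idx = build_archive(["Knuth wrote TeX", "Lucene is a search library"], archive)
print(search(idx, archive, "tex knuth"))
```

Storing each article as its own entry means a single article can be decompressed without reading the rest of the archive, at the cost of a somewhat worse compression ratio than compressing the whole file at once.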

~~~
sjs382
There are lots of Offline versions of Wikipedia. Some have search, some have
media, etc. Here's one:
[http://xowa.sourceforge.net/](http://xowa.sourceforge.net/)

~~~
mark_l_watson
Thanks, that looks good. I might still try my idea for a compressed, text-only
version but I will try xowa first. Thanks again.

~~~
toomuchtodo
With the recently released 512GB microSD cards, you could carry all of
Wikipedia around (not text only).

EDIT: It appears that since 2013, Wikipedia can fit on a 128GB SD card,
including images.

[http://arstechnica.com/information-technology/2013/11/all-of...](http://arstechnica.com/information-technology/2013/11/all-of-wikipedia-can-be-installed-to-your-desktop-in-just-30-hours/)

------
Splendor
tl;dr: "He will not, however, be printing all 7,600 volumes."

------
spiralpolitik
Don't suppose he's planning to release the source for the scripts he's using
to compile the books.

~~~
hachacha
The source code can be found here:
[https://github.com/mandiberg/printwikipedia/](https://github.com/mandiberg/printwikipedia/)
. It will be updated to the most current version and cleaned up once we are
done dealing with monitoring the uploading process. Check back in two weeks.

