
184-year-old Indian library goes digital, including 444-year-old book on Alexander - jayadevan
http://www.nextbigwhat.com/kerala-state-library-digital-available-online-297/
======
hzay
As a history amateur, I find it painful to force myself never to probe into South
Indian history [of which I've heard so much, growing up there] simply because
I've learned by experience that it only leads to frustration and
disappointment at the lack of information and research. This work is
incredible.

~~~
nsns
What "lack of information and research"? This field is full of amazing works.
Three great examples (there are many more):
[http://books.google.com/books/about/Symbols_of_substance.htm...](http://books.google.com/books/about/Symbols_of_substance.html?id=znFuAAAAMAAJ&redir_esc=y)
[http://books.google.com/books/about/Languages_and_Nations.ht...](http://books.google.com/books/about/Languages_and_Nations.html?id=0W0oVIZgyEwC&redir_esc=y)
<http://ukcatalogue.oup.com/product/9780198063124.do>

~~~
hzay
Those three books talk about the 1500-1700 Nayaka rule, languages in 1700-1800
and the third book's summary says it talks about the 12th-14th centuries. That
doesn't qualify as evidence of anything. Here is what I'd pay to learn about -

1) Origins, dating and translations of the literature - the Sangam literature,
the Andal songs, etc.

2) Exactly what the fuck was happening in South India from 500 BC to 500 AD?
There are Roman coins found in South India and South Indian coins found in
Rome. One of the Ptolemies refers to a Pandyan king. Pliny the Elder
complains about the amount of money the Roman people spent on Indian goods.
Apparently the Romans also had a huge space reserved for Indian pepper. One
of the Chinese explorers has some descriptions of a port in Tamil Nadu. All
this more or less sums up what we know about this period. I know I have to
cite sources, but it will take a lot of time to hunt them down and I'll do it
over the next few days. What else was happening? How did the people dress? Did
they know about Greek and Roman ideals?

3) The Pallava kingdom - Why were they so into sculpture? Why did they
practice so much religious tolerance, unlike the later Cholas, for example? How
did they come to power?

4) The Chola kingdom - are the names of the kings really all that we have
about the early Cholas who ruled around Jesus's time? How come they fell
from power and then came back into power after five centuries? What were the
Pandyas doing when the Cholas were in power? Were they hunted? Why were
Aditya Karikalan's murderers pardoned by Raja Raja Chola? Is it truly about
their caste? We have a shitload of kalvettus (stone inscriptions) from this
time - can we translate them all and upload them online, please?

5) Do we have anything at all to go by in terms of the food they ate at any
period in history? How did the language change over the two millennia?

Edit: There simply aren't that many books published about South Indian
history. There isn't all that much digging going on either. There is one
Nilakanta Sastri who is cited by everyone who talks of the Cholas, but his
books have been out of print for decades. In comparison, we have hundreds and
hundreds of books published on every conceivable aspect of a number of other
civilizations - the changes in English, Greek and Latin over the years,
the Mayans, the rise and fall of Rome, histories of the various European
monarchies, the Aztecs, etc. [all of which interest me hugely].

~~~
kamaal
The problem is something like this: the government of India is currently
totally uninterested in projects of this nature. What we do currently is treat
development work as the sole priority and work towards that.

Indians have never shown interest in quality research in history. Heck, we
don't even respect the symbols of history we have amidst us. We either
demolish archaeological evidence (a lot of it was demolished during
construction of the new airport road in Bangalore) or use the sites as places
where lovers hang around to escape from their parents. Just look at the state
of historical monuments and how badly they have been maintained; if it were
not for their tourism value, even those would have vanished long ago.

Apart from that, much of the historical research comes from archaeological
work carried out in the pre-independence era under the British, or
immediately after. The Archaeological Survey of India is a joke, and much of
the historical research is left to curious university professors who are
severely underfunded.

I think even if research is carried out now, we are only likely to find out
half-truths, and much of the story will have to be reconstructed by piecing
things together.

~~~
mkartic
But they still care about our heritage. Isn't that why all the cities with
'English' names were rechristened? :p

~~~
chimeracoder
> isn't that why all the cities with 'English' names were rechristened? :p

Interesting choice of words there - 'rechristened'. In any case, many of the
names were reverted to their original forms (not just cities, but things
like surnames too).

------
ChuckMcM
The cynic in me wants to say, "Just wait until descendants of Alexander the
Great sue you for copyright infringement."

I think this effort is great. I hope more and more history can be put online
and made accessible. I am a firm believer that knowing the past is the only
way to know where you are going.

~~~
yakiv
"How _dare_ you digitize books from the library of Alexandria. That's theft!"

No idea what I'm trying to say here, by the way.

------
gulbrandr
You can easily download the books from here:
<http://statelibrary.kerala.gov.in/rarebooks/site_media/>

------
bonchibuji
Hailing from Kerala, I'm really happy to see this happening. This is indeed a
great achievement, and I hope there will be more initiatives of this kind to
help bring together the vast amount of information scattered around the
subcontinent.

------
rwbt
While I commend them for digitizing, I'm somewhat disappointed that they are
just scanned copies instead of selectable/searchable text. I wish they made
them more accessible for reference by truly converting them to text.

~~~
igravious
This is something I know about.

If you waited until stuff was selectable/searchable (i.e. transcribed) then
you'd never get these types of documents online. The scanning part of this
process is tough, considering that either a) very decent equipment must be
bought, staff must be trained and extreme care must be taken handling old
documents, all of which costs money and takes time; or b) the scanning must be
outsourced because the archive doesn't have the expertise or tech, and this
comes at a cost.

Archives regard documents in their special collections as assets belonging to
the institution. There is resistance to putting them online. Once they are
online they can be 'stolen'. If you want to have the whole lot transcribed
first it could take decades because of the sheer volume of documents and the
lack of researchers and archivists (and it's not really the job of an archivist
to transcribe stuff, merely to synopsize it for a descriptive list and
catalogue). For instance, University College London, a liberal institution
whose spiritual founder was Jeremy Bentham, wanted to get all of Bentham's
documents online. But the guy was a prolific correspondent. It was taking them
years and years to transcribe the Bentham archive and they had only gotten (I
don't remember exactly but something like) 2% through it. In the end they
crowdsourced the task[1] by scanning everything, putting the scans in a wiki
and letting everyone on the net have a go at transcribing the documents - they
are 94% done now. And that's just one historical figure. Take into account that
documents need to be semantically marked up using standards like TEI[2] (the
Text Encoding Initiative's XML format), and that researchers in these areas are
not known for their techie skills and wouldn't know a programming language if
it came up and bit them on the bum, and you can see ...
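To make "semantically marked up" a bit more concrete, here is a minimal sketch of a bare-bones TEI P5 document built with Python's standard `xml.etree.ElementTree`. It assumes only the required header skeleton (`titleStmt`, `publicationStmt`, `sourceDesc`) plus plain paragraphs; `minimal_tei` and the sample text are hypothetical, and a real transcription project would typically layer much more on top (page breaks, deletions, hands, dates).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the TEI P5 Guidelines.
TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a local tag name with the TEI namespace ({uri}tag notation)."""
    return f"{{{TEI_NS}}}{tag}"


def minimal_tei(title: str, source_note: str, paragraphs: list[str]) -> str:
    """Build a bare-bones TEI document for one transcribed text.

    Hypothetical helper: only the required header skeleton plus plain <p>
    elements, none of the page, hand or deletion markup real projects add.
    """
    root = ET.Element(tei("TEI"))

    # teiHeader/fileDesc holds titleStmt, publicationStmt and sourceDesc.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished transcription."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source_note

    # The transcribed content itself goes in text/body.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for para in paragraphs:
        ET.SubElement(body, tei("p")).text = para

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(minimal_tei(
        "A letter (hypothetical example)",
        "Transcribed from a scanned manuscript page.",
        ["First paragraph of the letter.", "Second paragraph."],
    ))
```

Even this skeleton shows why the work needs people who are comfortable with XML as well as with the documents themselves.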

Finally, the institution may never have done research on the documents in the
archive and may want to vet everything before it goes online, or may be
reluctant to 'give away' its jewels. There is a serious tension between
enabling global research and respecting the 'property' of the archive. This is
something that needs to be dealt with now, and it's what I'm part of; at the
moment we call it the digital humanities.

Hope that gives you an overview. Well done to the Kerala State Central
Library.

[1] <http://blogs.ucl.ac.uk/transcribe-bentham/>

[2] <http://www.tei-c.org/index.xml>

~~~
eitally
This is something I know about, too, having once supervised this lab (in its
previous incarnation as part of UVA's eText center):
<http://www.digitalcurationservices.org/>. We used a pair of PhaseOne P40
scanning backs[1] with Hasselblad prime lenses, mounted aiming directly
downward at a custom tabletop with adjustable book mounts. We used standard
studio lighting (not strobes). Software-wise, for books and manuscripts we
created batch jobs in Photoshop to minimally post-process the images (adjust
levels & contrast, rename files, not much else). For 3D artifacts, everything
was custom & manual. Files were scanned into a pair of Mac Pros for processing
then burned in duplicate to archival CD for filing and shipping to the Indian
company[2] that had been contracted to transcribe and encode (SGML at that
time) the text. We chose manual transcription over OCR because it gave us
roughly 99.9% accuracy versus 95% accuracy (at that time, 1997-1999), and
Apex only charged $0.03/page. That rate bought us two transcribers encoding each
page independently, with their work then compared against each other for error
correction. In general we were quite happy with them.
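The "two transcribers per page, compared against each other" step is what is usually called double keying. Here is a minimal sketch of the comparison, assuming plain-text output from each keyer and using Python's standard `difflib` (not whatever tooling Apex or UVA actually used; `disagreements` is a hypothetical name):

```python
import difflib


def disagreements(keyed_a: str, keyed_b: str) -> list[tuple[int, str, str]]:
    """List the spans where two independent transcriptions of a page differ.

    Hypothetical helper for illustration. Compares word by word and returns
    (word index in keyed_a, words from keyed_a, words from keyed_b) for each
    replaced, deleted or inserted span; an empty list means the keyers agree.
    """
    words_a = keyed_a.split()
    words_b = keyed_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    spans = []
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op != "equal":  # "replace", "delete" or "insert"
            spans.append((a0, " ".join(words_a[a0:a1]), " ".join(words_b[b0:b1])))
    return spans


if __name__ == "__main__":
    first = "the cat sat on the mat"
    second = "the cat sat on teh mat"  # second keyer's typo
    for index, a_text, b_text in disagreements(first, second):
        print(f"word {index}: {a_text!r} vs {b_text!r}")
```

A reviewer then only has to check the flagged spans against the scan, since both keyers making the identical mistake in the same place is much less likely than either one erring alone. Comparing word by word keeps the output readable but ignores line breaks and any SGML/XML tags, so a real pipeline would probably compare the encoded markup too.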

Not including the staff, I think there was about $75k in hardware and about
$5k/yr in consumables (mostly repairing book stands, buying CDs and replacing
CD burners we wore out).

If anyone is curious, here are two projects I worked on:

Early American Fiction (1789-1875): <http://etext.lib.virginia.edu/eaf/>

Walt Whitman Leaves of Grass archive (a whole bunch of versions):
<http://etext.lib.virginia.edu/whitman/>

Note that these were both created 12-15 years ago and offered both high-quality
scans and searchable text, and even basic comparison (split-screen view of two
texts: <http://etext.lib.virginia.edu/whitman/whitframe2.html>).

Subsequently, UVA's library has joined TEI and I'm sure things are much more
modern now, but I wanted to provide a little more flavor to what you posted,
with some more examples. Obviously, manuscripts in any language are
time-consuming to transcribe. They are often in poor condition and handwriting can
be downright illegible, and don't get me started on issues with accurately
transcribing original authors' own grammatical and spelling mistakes! Argh!

[1] <http://www.phaseone.com/en/Camera-Systems/P-Series.aspx>

[2] <http://www.apexglobal.in/apextranscription/index.htm>

~~~
igravious
That's extremely interesting. Thanks for the detailed description. How many
pages were your transcribers able to get through and at what rate? I've heard
about the Whitman archive (it's often cited and quite famous I think). Are you
still in the field? You know about Nines (<http://www.nines.org/> for those
that don't) I presume :)

------
nodata
Does anyone know what archival format the library is using?

------
ankit28595
The city's name is Thiruvananthapuram, not Trivandrum.

~~~
gingerjoos
Trivandrum is what the English called it. Bombay = Mumbai, same way. Let's not
nitpick :)

------
impostervt
Glad to see it online, but a 444-year-old book on Alexander the Great is still
~2,000 years removed from when he lived.

------
tcbawo
> 3,28,268

> 1,84,321

That's an odd way to represent a number.

~~~
blntechie
Indian number system.

3,28,268 = 3 lakhs 28 thousand and 268

1,84,321 = 1 lakh 84 thousand and 321

~~~
rikacomet
Yes, in simple terms:

1 lakh = 100 thousand

1 crore = 10 million

So a lakh falls between a thousand and a million, and a crore falls between
a million and a billion.

1 --> billion
0 --> 10 crore / 100 million
0 --> crore
0 --> million
0 --> lakh
0 --> ten thousand
0 --> thousand
0 --> hundred
0 --> tens
0 --> ones
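The grouping described above (the last three digits together, then pairs of digits) is easy to sketch in Python, since `format(n, ",")` always groups in threes. Both function names below are hypothetical, and only non-negative integers are handled:

```python
def format_indian(n: int) -> str:
    """Format a non-negative integer with Indian digit grouping (e.g. 3,28,268).

    The last three digits form one group; everything to the left is grouped
    in pairs (lakh, crore, ...). Hypothetical helper for illustration.
    """
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])  # peel off two digits at a time
        head = head[:-2]
    groups.insert(0, head)  # leftmost group: one or two digits
    return ",".join(groups + [tail])


def lakh_breakdown(n: int) -> tuple[int, int, int, int]:
    """Split n into (crore, lakh, thousand, remainder below 1000).

    1 lakh = 10**5 and 1 crore = 10**7; anything above 99 crore stays in
    the crore count (so a billion is 100 crore).
    """
    crore, rest = divmod(n, 10**7)
    lakh, rest = divmod(rest, 10**5)
    thousand, rest = divmod(rest, 10**3)
    return crore, lakh, thousand, rest


if __name__ == "__main__":
    for n in (328268, 184321, 10**9):
        print(format_indian(n), lakh_breakdown(n))
```

For example, `lakh_breakdown(328268)` gives `(0, 3, 28, 268)`, i.e. 3 lakh 28 thousand and 268.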

------
harichinnan
I think the library and the books survived because the Kochi-Thiruvithankur
area was never under foreign rule.

