Tom Tryniski digitized nearly 50M pages of newspapers in his living room (cjr.org)
326 points by dzdt on July 30, 2018 | 70 comments



Tom T., I salute you. For the past 10+ years, I have used hundreds of scans in your archive to learn about my roots in small farming villages and factory towns in northern and western New York. Thank you for not selling out to Ancestry.com or Newspapers.com and keeping this resource free.

His devotion to old newspaper archives leads me to ask: How could vital records and other paper archives at the municipal level be digitized and indexed? What are the technical and legal considerations for such a project?


> How could vital records and other paper archives at the municipal level be digitized and indexed?

The typical business model is for a company to digitize the ones somebody might want to look at someday and offer online access.

> What are the technical and legal considerations for such a project?

Legal: more than can be articulated here, but the FCRA is sort of the big one that covers much more than you'd think. Technical: buy a scanner or two.


In terms of scanners, are there models that can handle odd-sized books and ledgers, many in fragile condition? This is where many old vital records and documents reside.

Another issue for indexing: 18th, 19th and early 20th century handwriting styles. Many clerks had good handwriting, but with lots of variation in style, written over printed templates (marriage certificates, property and tax records, etc). How good are modern OCR technologies at recognizing variations in handwriting?


There are a few different levels of digitization here if you're talking about something like a county clerk or a state office or something.

You can digitize the documents with OCR (or by hand) and process them with a level of precision great enough that you can archive or destroy the original documents and just print out a new document when you need it. One example would be the states that no longer issue a "real" birth certificate and just print one out on official letterhead when there is a demand.

You can index the documents and stuff them in the basement, which is pretty much the default for many county records. The computer is basically equivalent to a card catalog in that case. The database might very well be a scanned card catalog...

You can make a nice, pretty, OCR-enabled picture of the original document and store it on a database. Searchability can vary.

Things like signatures and cursive writing will often just need to be interpreted by a human during the process of digitization. If such records aren't important enough to justify a few minutes of someone's time, they might not ever be digitized.

There's a lot of variance here. A state agency like a Department of Motor Vehicles has a strong motivation to make things more efficient and the advantage that none of the records are really ancient. A county clerk in a poor, rural county has completely different resources and needs.
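
As a rough illustration of the "card catalog" style index and of why searchability can vary, here is a minimal sketch of an inverted index over OCR text. The page identifiers, the sample text, and the function names `build_index` and `search` are all made up for the example; a real records system would need far more than this.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical OCR output keyed by invented scan identifiers.
pages = {
    "ledger-001": "Marriage certificate, County of Oswego, 1887",
    "ledger-002": "Tax record for the Town of Fulton, 1887",
    "ledger-003": "Marriage certifioate, Town of Fulton, 1902",  # OCR misread "c" as "o"
}
index = build_index(pages)

print(search(index, "marriage 1887"))  # only ledger-001 has both words
print(search(index, "certificate"))    # ledger-003 is missed because of the misread
```

The second query shows the catch: an exact-match index silently drops any page where the OCR got a single character wrong, which is one reason a human pass is still needed for records that matter.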


1. If documents are odd sized or fragile, cameras are usually the best choice for digitization. A camera doesn't have to touch the pages, and software can handle differently sized formats.

2. OCR still isn't even 100% accurate for printed text. Handwritten text accuracy is worse. Cursive is even less accurate.
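
One common way to put a number on "not 100% accurate" is the character error rate: the edit distance between a hand-checked transcription and the OCR output, divided by the length of the transcription. A minimal sketch, assuming you have such a reference transcription to compare against (the sample strings and function names here are invented for the example):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Hypothetical example: one misread character in a 20-character phrase.
reference = "marriage certificate"
ocr_output = "marriage certifioate"
print(character_error_rate(reference, ocr_output))
```

Measuring this on a small hand-transcribed sample of printed, handwritten and cursive pages would show how far apart the three really are for a given collection.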


Please can you elaborate more about the legal aspects? I've scanned a lot of Chinese-language material (especially bilingual English/Chinese and dialects) and found some new Unicode characters that way.

I'm reluctant to share it online because I'm scared about copyright.


The original copyright is with the paper for anything published after the second world war (the exact date depends on the country). They have the sole right to decide how and when their archives are published online. Determining copyright status is non-trivial, here is an example for some US periodicals: http://onlinebooks.library.upenn.edu/cce/firstperiod.html

That being said, the paper has usually long since gone under and the copyrights will never be asserted. In the extremely unlikely case they are (for example, if the paper is still in circulation), a free online archive with no advertisements falls squarely under fair use, given the immense historical value of old papers and their nil current commercial value.


"nil current commercial value" can't be right.

"He has, on occasion, been approached by companies looking to partner with him or purchase licenses to his archives. He’s turned them all down, including one offer for half a million dollars.

“I knew [my collection] would ultimately be charged for,” Tryniski says nonchalantly, explaining why he declined the offer—for most people, an enormous sum of money. “I really didn’t like the idea of charging a guy to use my site, and then for them to take the biggest profit."

I assume the operation used a Russian server because of the copyright issue.


Actually, I think cameras are the preferred method for digitisation now --- they've gotten much better in terms of resolution, and have always been many times faster.


Dropbox has a document scanner that has completely obliterated the anxiety I get over whether I should keep a particular document or not. If it doesn't need to be physically presented anywhere, it can be scanned and tossed, and Dropbox offers a convenient, perfectly adequate solution to do that.

Documents that do need physical presentation can always be reacquired, but there's only a handful of those, easy to put them all in an envelope and stow 'em.


Yes! Dropbox really nailed this. I keep hoping it’s a feature iOS ends up implementing natively in the camera app with the ability to save to iCloud Files.


That's interesting. I guess it would happen eventually.

I knew that a while back drum scanners were the ultimate for scanning large documents/artworks. I wonder if now they're doing image stitching with cameras?


You can certainly do image stitching with camera images. From the software's perspective it's no different than putting together a panorama, really. (There are some minor differences, since the camera is not being panned from a static position, but they are not big.)

Drum scanners were always the ne plus ultra when it came to scanning film, particularly large format film like 4x5 negatives. But as CMOS sensors have gotten better, the advantage has diminished (the design of drum scanners was driven by PMTs and later, linear CCDs), and I don't know if even 4x5 format photographers are bothering anymore. They are a real pain to work with, especially the ones that required a "wet mount" of the negative. Historical negatives were never a good match for them.

It's nice that the advancements in the consumer market have made it much easier for archivists doing copy stand work.


There is something awe inspiring about people who spend long times doing something they love without expecting much in return, for public good. These are the projects that need to be supported so they are able to do it as long as they physically can.

Huge respect for not selling.

Anyone know of more examples?


I share this sentiment wholeheartedly. The power of single minded dedication is humbling. This type of act adds wealth to the human experience far beyond what the "corporate" model can deliver.

I hate to pull politics into stuff like this but I always thought one of the massive unforeseen benefits of a basic minimum income would be an explosion of these types of "labors of love".


That would be horrible for innovation. People want to feel that they could make millions of dollars creating something but instead turn the money away. That choice is very important.


I don't understand how that is incompatible with basic income? Basic income isn't going to give you millions of dollars...


Basic income discourages those at the top with more taxation in order to ensure a level base for everyone else. And it also encourages the citizens in the middle to get a little less aggressive in the workplace market. Based on primal human instincts alone, those that strive to outwork everybody are less likely to do so if it means that they are not getting compensated equally for their effort.


Are there any studies done that show these effects? I'm interested in basic income.


It's happening right now. This guy is even part of the study!

A certain demographic was singled out and provided with a guaranteed income for the rest of their lives. As expected, many of them just sat alone in their apartments taking drugs and watching the tube. Many, though, went out, socialized, volunteered, and generally made their community better. Some even did as Tom here, finding a passion that benefited others and dedicating themselves to it.

All in all, the study is succeeding. It's called "social security".


Iirc, the creator of VLC maintains much of it on his own, and going by his comments on Reddit, has turned down several eight figure offers.


I like VLC but one of the developers is a major dick. You sometimes see him ripping someone a new one for asking an innocent question on forums.


If you want to help out with a similar project, Ted Nelson (of Xanadu etc) used to sign up for "junk mail" about the present and future of computing, archiving ephemera that might not exist anywhere else anymore. Recently they've been scanned and uploaded to the Internet Archive. https://archive.org/details/tednelsonjunkmail The project is finished and the scanner has been paid, but the project overall is still in the red. Financials and paypal link at https://docs.google.com/spreadsheets/d/1rqfc2R_Ti6-WnuneaMfH...

Selections: https://twitter.com/hashtag/TedNelsonMail?src=hash


> There is something awe inspiring about people who spend long times doing something they love without expecting much in return, for public good.

If only we could have such people elected for president.


> Anyone know of more examples?

Jason Scott!

Also: https://www.nbcphiladelphia.com/news/local/Digitizing-35-Yea...


Isn't this the kind of work where Patreon could be used to its fullest?


Casey Neistat is someone who comes to mind. He has chronicled so many moments of his life on video and shared them with the world.


He's expecting something in return though, money from ad revenue.


Casey started monetizing his channel much, much later. In my view, the footage he has amassed and shared on Youtube over the years shares parallels with what Tryniski is doing, namely, he has provided people with a visual account of what it is like to live in New York City at a particular time in history. While obviously different than what Tryniski does in that it is more autobiographical, I definitely think it'll be significant if not now then in the future, especially as the city develops more and changes over time. There's been a lot of footage he has captured that only he has captured, e.g., https://www.youtube.com/watch?v=TKOdMA97FGM


Do you have a source for him monetizing his channel much later? The way he describes it in this video[1], he says he made a series for HBO (around 2010), then a couple films, then shifted his focus to youtube. It sounds like he considered it at that time his main income source. The youtube partner program launched in 2007. Even if he wasn't monetized right away, the idea of monetization in the future was likely a motivation. In this video[2] he describes his dream as making money making videos.

The video you linked was uploaded in 2016, certainly long after he started monetizing his videos. The title doesn't seem to be aimed at sharing information with future generations. The title is vague and non-descriptive. It's in all caps and clickbaity. The title is nearly the same as this other video of his[3] which is about a completely different event. Many of his other videos have vague clickbaity titles and thumbnails, and sometimes the titles are misleading.

[1] https://www.youtube.com/watch?v=V6Y-ahQFQDA

[2] https://www.youtube.com/watch?v=BQ_z48aJD5o

[3] https://www.youtube.com/watch?v=GJoDRUybisw


IIRC Casey Neistat didn't start vlogging daily until 2014

I've watched the vast majority of his videos. Some of his videos date back to the early 2000s as well, during times he did work for local political campaigns

His motivation is making a living doing what he enjoys doing, which is telling stories. He's said that many times over already. YouTube was his way of bridging his experiences from cinematic private production in Hollywood and bringing it to the masses so everyone can enjoy it. 368 is just his next brainchild after the downfall of Beme


Are you talking about vlogging?


Yeah, definitely. The vlogs are cool. Check out the amount of hard drives he ends up using https://youtu.be/Zlu5tkeTg9Y?t=4m54s


> Tryniski has no formal training in archiving and isn’t particularly interested in working with any of the various other online newspaper directories, especially those with regimented archival requirements. He has, on occasion, been approached by companies looking to partner with him or purchase licenses to his archives. He’s turned them all down, including one offer for half a million dollars.

I do hope his material is free to be stored on Archive.org at least? Or perhaps even his entire website. It would be terrible if this resource were to disappear


Yeah, that was sort of my reaction as well. It's great that the guy is doing what he's doing, and I respect his desire not to sell to some vampire-squid monetizer who will paywall all of it for revenue maximization, but ... what's his backup strategy? What's his estate plan? Does whoever is going to inherit his house know that there's more to that giant pile of hard drives, besides a task to be farmed out to 1-800-GOT-JUNK?

I am concerned that this entire effort is one sloppy-street-crossing or one house fire away from being gone forever.


It's amazing that an individual, working just with their own time and money in their own house, is so outpacing efforts like that of the Library of Congress.


That's an unfair characterization.

He's scanning the product of the New York State Newspaper Project, which is/was a project funded by the National Endowment for the Humanities from 1982-2010 and the New York State Library/Archives. That project got access to these old newspapers, scanned the paper to microfilm, etc. That required a lot of time and money.

Tom does a cool thing, but he's utilizing the work product of the government, mostly using the library system to provide him with the films at little or no cost, and will almost certainly sell the collection either himself when he needs to or via his estate.

Meanwhile, the microfilms will be around for hundreds of years.


Not amazing at all. Those bureaucracies spent their time planning, budgeting, arguing, doing studies, etc., while Tom Tryniski just puts the microfilm in the scanner and scans it.


That's a neat soundbite, but it's committees, planning, arguing, and meetings that got us the protocols underpinning an open internet that are holding up decades later. Not to discount his work, but wouldn't it be great if he sat down with some people who argued for the best format to store and deliver it?


David D. Clark, Chief Protocol Architect of the development of the Internet in the 1980s, famously wrote this similar neat soundbite:

    We reject: kings, presidents and voting.
    We believe in: rough consensus and running code.


I talked to a museum archivist about this once. She was obsessed with getting the best scans possible with the best equipment and budget. So no scans at all were done.


Given that some book scanning processes are destructive, that the presence of a digital archive may be a pretext to dispose of the physical one, and that you'll never get budget to digitise it again, this may not be the worst decision.


The microfilm he's scanning is a government product.

He's doing a better job making the output of a government project accessible to the public, but except where he is actually scanning the paper newsprint itself, he is building on the work of others, much of it taxpayer-funded. Credit where credit is due.


you clearly haven't worked in govt.

Govt is not that great at quickly getting something done.

Govt IS pretty good at keeping on doing something though. That can be an underrated benefit of govt.


No meetings and decision committees.


No accountability. Nobody's going to come and criticise how he spent their grant money, why he chose this paper over another, etc.

Distrust of the public sector leads to excessive accountability requirements that often take up a large fraction of the project cost, maybe even more than half.


Not being ironic, I really like his website. Brutalism amateurs should take a look http://fultonhistory.com/ That is the perfect example (on desktop), no regard for the design, still perfectly functional (including back button etc..).

Edit: just noticed the use of jAlbum for the gallery. Just adding +1 to nostalgia :)


Renders as a blank page with a color changing square in the lower right corner for me. Very minimalist.


I like it too. Reminds me of a similar ("not-ironic") site here in Brazil: https://www.ccdb.gea.nom.br/

This guy serves tens of thousands of pages of his masterpiece book called Gea, and even makes his own bizarre old style renderings and animations to go with it.

A recent rendering of his muse, Ky:

https://www.ccdb.gea.nom.br/ky_2019_altissima_resolucao_2758...


Brutalism != "no regard for design", but an initially unadorned and functional design that makes no attempt to hide materials and structure.


Wants me to install something called flash, no thanks


Flash isn't necessary to use the site. It seems to be used for a PDF display portion of the archive search's split-pane view. The search results are displayed fine without Flash.


Oh wow, thank you for pointing that out! You can barely see it because the internet is too fast these days, but after you click a link it says "A wise decision!"

EDIT: His donation page is here: http://fultonhistory.com/Donation%20paypal.html


Somewhat related: Nicholson Baker didn't expect librarians to act like barbarians - but then he learned they were destroying or dumping millions of books and newspapers. He tells Oliver Burkeman why he had to take a stand.

https://news.ycombinator.com/item?id=17644080


He reminds me of the woman who obsessively taped local TV news broadcasts for decades, and wound up with the only surviving record of those shows. Truly an invaluable service both of them did and are doing.


Ah, yes, Marion Stokes: https://kernelmag.dailydot.com/issue-sections/features-issue...

Wikipedia reports she left 71,716 VHS tapes of news recordings. The Internet Archive is on it.


This is a good time to remind everyone to donate to the Archive: https://archive.org/donate/


Am I the only one who feels that the author's withering rant about Tryniski's opinions, and her blabber about free press, is an indelicate and unneeded addition only to push some agenda?

Like, is it really the point of his project? Are his views expressed anywhere in his archives? Why be so judgmental?!


The CJR is a journal about the press; its writers' opinion that a free press is vital is nailed on.

I don't read it as trying to push an agenda (which I suspect the author believes is shared by her readers) but to explore the tensions between America's tradition of a viciously partisan press and its 20th Century ethic of studied neutrality, and of tensions between the press as abstract apple-pie-and-motherhood and distrust of actual publications (the archivist's selection of Fox over other media, and the observation that even in their heyday there was commercial pressure for small town press not to do investigative reporting).

If you find arguments about the state of the press distasteful I can see why you wouldn't like the last section. But I found it quite interesting.


His political position is unusual and relevant as his life’s work is dependent on the work done and funded by New York State and the National Endowment for the Humanities. Those microfilms weren’t created as the last act of a bankrupt newspaper!

Like many people entranced by the right wing message, he doesn’t realize what it really means. (ie, library budgets are always a target, which would mean no library network or librarians to get microfilm from, etc)

If you’re a student of journalism or archiving, it is probably very difficult to understand this gentleman’s position.


I totally agree. There is no doubt the reporter is a Clinton supporter and it almost seems like she has problems believing a good guy like this can support Trump. The article really shows off Tom as a guy who has his beliefs but also dares to hear other people with different opinions.


Tryniski supports Trump, who regularly describes journalists as public enemies. And at the same time he spends a lot of time working on the material produced by journalism. I'd say that's an interesting contradiction, and exploring it added colour to what would otherwise have been a dry article about microfilm scanning.

Some years ago I did a course on magazine journalism. You get taught to look for the human angle when writing a piece, and to find a "hook" for your article that relates it to current events. This was just a journalist doing that.


The entire article is undisciplined and sloppy. She really gets in the way of the story she's trying to tell.


"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." --George Bernard Shaw


https://www.youtube.com/watch?v=KVWDX6oaYCg - here's an older video about him. Pretty interesting. We need more home archivists (or whatever the proper word is) like him. There's so much stuff out there that would be interesting to search through.


> We need more home archivists

We are out there on r/DataHoarder, and on some obscure trackers like Myspleen. I hope that when I'm older I can preserve some history stuff as well for the greater good.


Looks like we need more capeless heroes like Tom Tryniski!


What Mr. Tryniski has done is more than admirable. But unfortunately he has failed in one respect, and that is as a leader. When he dies, however many pages have been archived, all efforts will cease.

If he had volunteer helpers and a foundation he could will money to help them keep it going.


This guy digitizes over 50M pages for the greater good and you're calling him a failure?


"... in one respect"



