Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: How to transcribe 1000s of handwritten notes
181 points by bckr 36 days ago | hide | past | favorite | 142 comments
I have 10 years’ worth of journals.

My handwriting is not great!

None of the off the shelf solutions come even close to recognizing my handwriting.

Can you think of anything better than just opening every single file and manually transcribing it?

I have been thinking about training a model to first divide the images into lines of text. Then, it will be easier to transcribe, and automatically those transcriptions will be associated with areas of the image, in case I figure out a good handwriting model.




Can you read them? Speech to text perhaps. That can also be done locally.

If a note's a minute, 1000 notes are around 16 hours of reading. Scale time needed depending on if it takes less or more than a minute to read. Add a note reference to the start of each recording, like a zettelkasten, so the scanned file, recording and text cross-reference.

If assessing other solutions, that's at least an upper bound on the cost of any other solution.


This is the best answer.

Any techie will desperately try to come up with a tech solution to this problem.

A few months of development later, you might have something that yields trustworthy output.

But 16 hours? No tech solution will be done faster than that.

Don't build a factory for a one-off.


> But 16 hours? No tech solution will be done faster than that.

True

> Don't build a factory for a one-off.

One thing on my wishlist is that I end up with a way to instantly transcribe my notes.


bookmark the nuwa pen project, and check back in 6 month or so when/if experience reports are in. ( https://nuwapen.com/en-us )

If you are willing to use special paper, there is existing Neo Smartpen ( https://shop.neosmartpen.com/ )

Both will force use to us D1 ballpoint pen cartridges, so no suggestions in you must write with favorite fountain pen, or are a Hi-Tec-C only pen lifestyle.


> One thing on my wishlist is that I end up with a way to instantly transcribe my notes.

Many of the implementations are clunky in my opinion, but this exists as a feature in many note taking tablet apps.


Err, 16 hours just to read. Then you still need to deal with the inaccuracies of speech to text.


Have 4 people do it and you're done by lunch.


If their handwriting is like mine, i think it will take more time that way. The other people will be interrupting them every second to ask “what’s that word?”.

Eventually, they’ll learn and speed will go up, but with this amount of work, work will be finished before they make up for the learning curve.


Modern speech to text is, for me, extremely accurate. You also have the original audio and can rerun things as and when technology improves.


Yea, I use a cheapie voice recorder that only saves .wav files for ~10 memos per day, and Whisper transcripts are good. "Tiny" model, 4GB ram laptop. "Base" model runs too, but slower, and produces different inaccuracies.

But overall, if I were suggest an ideal process: 1) transcribe notes w/ Whisper, 2) play back the media in VLC with the transcripts and correct the errors. T = 16 hours of proofing/correction + ~8 hours of headless transcription of *.wav before hand.


I’d add that I had better luck using smaller chunks (about 20 seconds) per wav file for accuracy. Whisper seems to go berserk if you pump in lengthy audio (30+ seconds).

I’d be tempted to at least try breaking down the notes into one line long images (about a sentence) each and give it ago with Gemini. I haven’t tested their ocr, but even if it has errors, I bet you could just ask Gemini again to best fix the sentence.


Whisper works on 30s chunks iirc. You need to use something that's automatically splitting up your input if it's longer.


Do it over 2 weekends. Or an hour a night instead of TV.


Since OP is doing this deliberately for speech2text they will presumably enunciate very clearly and have a good mic. S2T has very good performance under ideal conditions like that


so what? it's 10 years of notes. even if you double or triple or quadruple the time editing it for inaccuracies it's still better than an ocr almost certainly.


Have you tried gpt4o?


Or Gemini 1.5 Pro. The latest multimodal models, while still far from perfect, do seem to be getting better at image recognition and OCR.



>Don't build a factory for a one-off.

Maybe other people can use the software, so it's not a one-off?


I’d imagine 16 hours is a low estimate if OP wants to retain formatting.


I mean, I'd totally try Tesseract[1], a few samples, and a python script. Shouldn't take more than 5 minutes to validate this.

Adobe also has the whole scan thing, and apple can — in some cases — correctly transcribe characters from images.

https://github.com/tesseract-ocr/tesseract


Tesseract out of the box is terrible for anything non standard. I tried using it for the comic books. Unusable. The training for your font is doable, but it's very time intensive (while the tools are pretty good!).


I'd say any of the language models are far better than Tesseract. I did some work in this space and it was an absolute nightmare, event working with pdfs.


For OCR of handwriting, I did some comparative analysis a year back, and I found that Tesseract was... not good. However TrOCR was okay, certainly the best of the FOSS solutions. But Textract from Amazon was the best one by far far for handwriting, though your mileage will vary


from my experience with tesseract ~1 year ago, it was frequently fucking up even with crispy PNG screenshots

I really doubt it can handle handwriting


Handwritten notes, cmon! Don't waste time on tesseract for that.


You made my day. It's obviously an awesome approach!

Documented here: https://notes.dsebastien.net/30+Areas/33+Permanent+notes/33....


Great solution. And, if the notes don't contain confidential information, you could totally hire someone on Fiverr to read them for you. Or on Mechanical Turk, have the same notes be read more than once by different people, so you can compare and more easily find errors in transcription later.


Some good transcription solutions:

https://zapier.com/blog/best-text-dictation-software/#window...

https://otter.ai/

(Haven't actually tried Otter, but it gets a LOT of good reviews.)


Reading the notes aloud is a really good solution without having to spend a ton of time on trying to OCR handwriting.

I can recommend https://www.videototextai.com/ for transcribing huge amounts of audio. (Disclaimer, I am the founder of VideoToTextAI)


Bad solution simply because of information loss!

* after STT, there is objectively less info in the storage format

* OP cannot take advantage of rapidly advancing OCR tech on the storage

* inevitably OP might end up saving the originals “just in case”- rendering this entire process useless


Using STT today doesn't stop OP from also storing high resolution scans for the future.


Additionally, you could hire other people to read them, dividing the task into whatever manageable chunks or even having multiple people read the same parts for agreement.

In the days before good software transcription I saved a ton of time I grad school by splitting up interviews and using mechanical Turk or up work ( can’t remember which one, to transcribe 1 minute snippets, and then took another pass)


Great recommendation, thank you. I have considered this and it’s definitely the simplest way to achieve what I want.


If you also gave all that text, with its audio, to the putative AI, it might have enough training material to learn to read your handwriting.


I've been working on an AI app for 18 months and almost replied to tell you no, AI cant do that.

And, doh, took a minute, and realized I'm a dunce.* And now there's at least 3 I can think of off the top of my head, not to mention local.

Training a handwriting recognition AI is universally accessible. What a time to be alive.

* If you're dense like me: they're not saying "any AI" as "any handwriting recognition machine learning model you build from that dataset". They're saying as AI as any multimodal LLM, it'll do in context learning on what you upload.


Agreed!


I have a similar issue but reading them won't work because the person who wrote them passed away. It there another solution that could transcribe this sort of thing (maybe the original use case would have been for historical texts)?


Using MacWhisper (or other similar whisper.cpp app or utility), you could do it all on-device for a free or one-time fee, too.

note: I have no relation to MacWhisper, just a happy customer.


I have about 5000 pages of research notes. I have found that the quality and usefulness of the material varies greatly. Much of the older material is of little relevance with the passing of time. As futile it may seem, I'm finding that re-reading and summarizing rather than straight transcribing is effective. I'm refreshing my memory of what I did discover and only typing up what is relevant now. Fortunately I'm a fast touch typist, so I can stare at the handwritten page and type; only glancing at the screen after a paragraph or two. Two things I find useful to retain are the dates of the original materials and bibliographic references.


This is important, I think. Allen Ward expands on the topic of "reusable knowledge" standing in contrast to putting notes in a binder and sticking the binder in a cabinet somewhere and then pretending you have stored knowledge.

For knowledge to be reusable, it needs to be actively maintained, curated, summarised, integrated. It takes work so one shouldn't bother at all if one doesn't expect to want to refer to it later.


I think this is the best approach for project-related information. It’s essentially the Second Brain approach.

I’m interested in these journals for autobiographical / psychiatric reasons. Therefore, indeed, the more recent information is more valuable, but not with such a steep drop-off.

The oldest 10% might contain 5% of the value.


I'll throw in another vote for AWS Textract, I've had great results for it against 19th century handwriting: https://simonwillison.net/2022/Aug/25/sfms-archive/


I've found decent success with Googles Cloud Vision API for transcribing cursive writing on the backs of 1000s of family photos.

https://cloud.google.com/vision/docs/handwriting

I threw together a basic UI with the transcribed text in an editable area next to the image where I would edit any adjustments as it wasn't 100% perfect.


Thanks! I did try the vision demo in the console. One problem might be that my handwriting is idiosyncratic / there may be more training data available for historical handwriting styles?


I have this same issue. Got 25 years worth of journals written in cursive that's a bit... non-standard.


Yeah OCR remains an area where the open source solutions can't quite compete on quality with what the cloud providers offer. I've found that (unless you have a cost-prohibitive number of documents to process) if there are complex layouts, handwriting, etc. it's worth going to Google or AWS.


Take photos of them, or cut the binding and scan them all, and then feed the work out to mechanical turk?


Or avoid cutting by using a book scanner e.g. https://www.amazon.co.uk/CZUR-Professional-Document-Auto-Fla...


This is likely the fastest and cheapest option. Pennies per page. Double- or triple- assign them when they show signs of large differences between expected grammar, word choice, or spelling patterns.


I have a hundred or so pages of handwritten letters in Hungarian, but got useless results from AWS Textract and from transkribus. However, I also have about the same number of pages (written by the same person) that I have already gotten hand-transcribed into Hungarian. How might I approach using the already-transcribed stuff to train some kind of AI model or text-recognition model to work on the rest?


A hint that might help at least partially: novadays for managing digital and handwritten notes I juse Joplin, but before that I was an avid Evernote user. Having a paid plan active gives you access to Evernote's OCR function on their backend. I had a lot of handwritten notes uploaded as attachments to Evernote, and I remember that despite my handwritnig being awful their softwre was able to parse it and allow me to, among others, perform quite advanced searches on my handwritten notes. I'm not sure if there's a way to make Evernote's OCR backend work for you in scenarios more elastic that what it's been built for, but I wanted to menion that there's this unique OCR tech that I think does far better job that any standalone OCR software I tried (for my handwriting style which I consider awful). It might be worth researching further for you.


I used to use Evernote for a while, and like you, was a fan of its handwriting OCR.

Sadly, it is no longer software I would recommend:

https://news.ycombinator.com/item?id=36609641


Be careful about Evernote, they got bought out by a somewhat questionable company that has a history of buying up companies and basically not improving them like the old owners.


But the fact is that they are improving it!


BROADCOM?


Have you tried https://www.handwritingOCR.com?

It is designed to do exactly what you are looking for, and has been used very successfully by many others for that same purpose (I’m the founder).

It is not as cheap per page as Google Document AI, for example, but it does tend to be much more accurate for handwriting, so usually ends up cheaper when editing time is factored in.

If you find it does work well with your handwriting, please get in touch and I can try to fit the pricing to your use case.


Does it work for Arabic and hebrew? I am trying to teach myself how to fine tune a model and thought doing this with my own arabic notes could be a fun project. Not sure where to start though.

update: I tired it and it works to some degree and a lot better than chatgpt.


Will try the free trial this week, thank you.

I don’t see a way to fine tune on here, though. Is that right or am I missing it?


Hi,

There's no way to fine tune at present, but it does pretty well with all but the scrawliest of handwriting out of the box.

btw I wrote to the email in your profile, not sure if you got it.


I did, thanks. Will respond soon.


Sounds super cool, but why "per month" and not some "per page" pricing?


Thanks!

I’m still experimenting with pricing, and agree that per page pricing makes logical sense. Still, it’s harder for me to build a sustainable business on that model.

I will probably test a few per-page or single payment options soon, though.


My 2 cents ; let people buy packs of Scans. Say 100, 200, and 1000. Rarely people will have exact match with their work, so they are left with some pages "left over", which can nudge them back into using the product more often to "use up" what they paid for.

With this strategy you might be more successful in making a workflow out of it, and nudge people over to a monthly model. Just don't make the packs so small that they can be aligned with their normal workflows, eg. Transcribing a 40 page note book. I would advise to do some statistics to see how many pages people typically scan at the same-ish time.

Also: it is considered good practice to indicate that you are affiliated when promoting a product


Hey,

Thanks for the feedback and great suggestions. It's something I will try to implement in the coming days.

p.s. I tried to make my affiliation clear - I wrote "I’m the founder" in the original comment above :)


Sorry, I didn't catch the founder part. Got two small kids, so my attention falters a little


Why not do both? That way you can capture both types of customers. Even if you do subscription model people are gonna sign up for a month and then ditch it. Most people don’t need this service “constantly” so per page seems ideal for one type of customer and subscription for another who would need it for a long time, but you may want to put a hard limit on that too, as a summer interned high school student or college student could do a lot of damage for a law operation needing such a service :D


Thanks, I'm grateful for this useful feedback.


people don't want yet another monthly subscription so that's going to be a harder sell. even though the business advice is that monthly subscriptions are better for you, the business owner, you can't forget about your customer. who wants to setup a subscription for something they think they're only going to use once or twice? and then have to go through the bullshit of cancelling.


True, in cases like OP a one-off purchase will be better. But I also have business customers with regular and ongoing document processing needs for whom a subscription does work.

I expect the answer may be a combination of the two.


I was in a similar situation last month. Not quite 1000s of pages but close to 100. Just enough to make typing them out seem like too much work.

I found an app online (I wont even name it) which promised incredibly accurate handwriting transcription. Signed up and found it was true, but they were just sending images directly to chatGPT and returning the result and then charging a fee on top.

I started working on an open source version. It took me only a few hours and I'm sure anyone else could pull it together. used chatGPT example code to connect to API and send an image with a prompt along the lines of "please transcribe the text in this image and return only that, nothing else". even with that instruction it still sometimes prefaces with "sure! I can do that.", which I think is the AI equivalent of Homer Simpson writing "ok" in the "please leave this section blank" part of the form. Anyhoo, I had a basic job queue written, pull in images in order of file creation date and fire them off, append the text to a text file after. There was some cleanup of the file required (weird line breaks) but it saved me days of typing.

You still need a chatGPT API key for it but it does take a good bit of the work out.

At the moment I'm investigating using a free local model. LLava is just as accurate but takes longer than sending it to ChatGPT. but if you were worried about burning credits it would be the way to go.


I record myself reading my hand written notes, then I just upload the mp3 of the recording to MS 365 to transcribe.

I put special stop words like highlight/return so then I can post process and ensure the markdown formatting looks good.


Whisper.app will do this locally on Apple Silicon, FYI.


Whisper.cpp will do it locally on anything.


I have my 3 years of paper, I wanted to use it to experiment building a black mass program. A blackmass program is a concept which will yield to a black mass in the computer, capable of building conceptual cool tech like automating your daily work, self experimentation, self learning etc.

My notes will have instructions to reach the black mass state, a computer image scanner will try to learn my handwritings, take them as instructions, connect dots etc.

The design of this system is cryptic and challenging. because, side effect to create a computational program will result in a circling thoughts for me. And its hard for me to convert it into an action.

Taking that as an inspiration, this program is a circling program, which means, it will constantly spiral upwards in a value that is definitive to its actions in the past.

All my notes has information or points or ideas about this fictional concept. I burned the notes which were repetitive, kept the rest.

When I did that, It created more head space for me. The headspace, helped to solve problems and have more space for more learnings.


> I burned the notes which were repetitive, kept the rest.

> When I did that, It created more head space for me.

This is essentially the idea behind Getting Things Done and Building a Second Brain

As I said to another commenter, I’ve been able to separate out the project notes and ideas from my autobiographical diaries. These latter I want to keep and read.

Thanks for your interesting comment and good luck building your system!


For anyone whose handwritten notes have equations or pictures, Mathpix is stellar. Their APIs can take PDFs as input and return markdown with latex and embedded images. The handwriting recognition is pretty good on my cursive -- good enough anyway that a plain old LLM like Llama 3 can fix the typos.

(Likely under the hood Mathpix has done exactly what you're proposing, with image segmentation, text/image/math classification, then transcription.)

I've been using an Apple Shortcuts automation that turns my handwritten PDFs into notes in Obsidian, with the transcription up top and the PDF embedded below. Could pretty easily be adapted to turn a library of PDFs into a folder of Obsidian markdown notes. Here's a writeup: https://riddle.press/a-marriage-between-handwritten-notes-an...


If those notes are really worthy and meaningful to you, then hire someone to type them out for you. If there is something that money can buy, then save your time!


You’re right, and thanks to another one of the commenters, I have an idea for how I could do this.

Take my journals, and run a relatively simple word separation algorithm over them.

Shuffle up those words and pay to have them annotated.

Reconstruct the dataset from there.


It might be better to provide 2 3 words of context with each separated word. My handwriting is often bad, and I sometimes have to guess a word based on context.


I built this firm a decade ago. https://www.cogent.co.jp/en/

Works with English and Japanese. Sadly I'm no longer with the team there but the work is solid. Try it out.


It seems like using speech-to-text is a faster alternative. You can also consider outsourcing the work. I know abbyy.com offers a service for this. Even though you may not be their target market, they have services for implementing hybrid machine learning and data entry solutions.

If you're into dreaming up cool solutions, you could try using smart pens or tablets to write stuff and then teach a model to recognize your handwriting. But for now, it's just a dream.


Scan into pdf and organize them, keep as PDF.

You have to think about what your goal is. Handwritten notes can be perfectly digitized into handwritten notes. What do you need the ocr for? Publishing? … transcribe what you need, or better, rewrite.

Searching? As you scan, make a basic index so that you can refer to the notes. Organize the folders properly with your notes, use a useful naming scheme.


I'm unsure how recognisable your handwriting is, but the following tech understood mine.

Try LLMwhisperer[1] pdf extraction API. You are only one "curl" command away from extracting your handwritten text.

The best thing is it preserves the layout of your notes, which means it can keep tables as tables and lists as lists.

Check this screen grab for extracting handwritten notes > https://imgur.com/fXk0tcR

[1]: https://llmwhisperer.unstract.com/ [2]: Try it with your document here > https://pg.llmwhisperer.unstract.com/

[edited] added links


Have you tried chatgpt? 10k image requests should be pretty cheap


Theoretical solution: train a model on your handwriting. There should be plenty of easy (relatively) to use apps and frameworks for that.

It will take time but you will have a pretty tailored solution.

Also of course: first of all try to process the images so that they only are white and black (not greyscale, actual B/W pictures)


How about creating a crowdsourced captcha service?

Take scans of your journal pages, split the jpegs/pics into word fragments, display a couple of fragments to captcha clients, generate completed journal entries when the consensus gets reasonably high for each word fragment.

Not sure how captcha services start from scratch - probably ask around/check with google search.

Privacy goes out the door, but you should be able to show disjointed word fragments so no one could reconstruct enough of a single journal entry to expose your more personal info unless they were very determined. Or maybe split the scans into individual letter fragments instead?

Then monetize this for other people in the same situation...


I love this idea. It’s way overengineered for this problem, and I already have a startup that requires my complete attention, but thank you for writing this out.

And if anyone decides to do this, let me know!

Privacy is one of the reasons I would pay for a service like this, rather than pay a person to (try) to do it.

These journals contain a lot of psychiatric-level information about me, which is both what makes it valuable and sensitive.


>Then monetize this for other people in the same situation...

That's basically what Amazon Mechanical Turk is, without the captcha bit.


I’ve had good success with:

1. Scan them (or take photos of each page) 2. put them files in a directory 3. Make a Python script that sends them to OpenAI GOT-4o 4. Store the text as a new file in the directory.


Archaeologist tool, you'll want to fine tune it for yourself.

https://readcoop.eu/transkribus/


Shameless plug: https://getsearchablepdf.com

There's a free trial so you can check if it works for your handwriting.


I know you've said you've looked at off-the shelf tools, but in that did you consider https://www.transkribus.org/? It's a tool designed for reading historical, hand-written documentation—gets used a lot in archives and historical studies. Might be worth an evaluation to see if your handwriting is not great in similar ways to Dutch bankers from the 18th century.


I did have a look at transkribus and their models did not work for my needs.


I've been playing with https://huggingface.co/microsoft/trocr-base-handwritten which has been pretty good so far. I want to take it and fine-tune on my own handwriting. For equations, I either use mathpix or just type them manually.


I just tried ChatGPT on my handwritten notes, OCR can very seldom recognize my handwriting and it nailed it. It’s cheap, you should give that a shot.


Seconded. GPT4 can do this perfectly.


Hm. It depends how much you care about accuracy. ChatGPT does a great job overall, but I have found frequent errors and hallucinations around numbers, names, and dates in particular.

If you do go for GPT-4, just be careful of this. Where other transcription services might fail, or give some implausible output which highlights that you need to check the source, ChatGPT might give a highly plausible but incorrect transcription from which you might not immediately identify that transcription has failed.


ChatGPT-4o only hallucinated on my son’s god awful handwriting with mathematics, mine is pretty bad and it still did fine.

What’s funny is my son made an error in the arithmetic and ChatGPT corrected it - that was the hallucination.


Ideally OP would keep the source images of the original journal pages around even after transcription. I think ChatGPT (or LLM in general) is probably the best option, but the best overall solution would accept that LLMs are flawed and would require long-term iteration.


The problem with ChatGPT is that you might not know to check the original.

If the original text is “I’m getting married on the 10th July”, you’ll know to check the handwritten note if it says “I’m getting married on the l@ July” but not necessarily if it says “on the 16th July”. ChatGPT seems to do the second quite often.


Thanks all, I tried ChatGPT and it didn’t like my handwriting at all.

Which is understandable… :’)


Have you considered training a model on your handwriting?


Yep! However that needs a ton of labeled data, so a bootstrapping method is required.

I like the idea of doing it by speech recognition, or of chopping it up for privacy and then outsourcing that to humans at cost.

One thing I … Imagine … would help—is having a private web app where I could pull up a document and then make a voice recording on my phone.

Maybe I’ll put this together on my plane trip.


If you use Telegram, you can just voice message those to https://t.me/gienjibot, it uses OpenAI's Whisper under the hood, so recognition is superb and also you can immediately fix grammar with it. And yes, i'm both the creator of the tool and the happy user.


I have scanning my handwritten notes also on my todo-list, some of them are even taken digitally. I have noticed that the offline ocr on Samsung (or maybe on Android devices generally) is pretty good, even with characters that don’t exist in English. Unfortunately there don’t seem to be implementations for batch scanning with Android handwriting ml kit or Samsung vision ocr


You might be able to manually transcribe some of the notes and then fine tune an existing handwriting recognition model using them.


Yep, I think that’s what I’ll need to do in the end because my handwriting is just so… expressive.

In order to get that data I’ll probably need to chop up the words in my diaries for privacy, then outsource to a human-in-the-loop labeling service.


Amazon's Textract seems to do a decent job on my horrific scribbles, and is far better than any of the open source OCR tools I tried. To get started quickly, try using Textractor: https://github.com/Artikash/Textractor


Thanks, I recall trying this one too, but wasn’t aware of that repo


Just scan them over the course of a few months, spending a couple of hours a day.

If your notes are anything like mine there might be arrows, drawings or text effects like underlines and circling of words that you'd want to conserve.

You can later ocr the whole thing.


hello,

imho. (!)

* if you have a lot of "uniform" pages - read something like A4 -, get yourself a scanner with an automatic sheet-feeder

or throw some rainy-weekend afternoons on it & scan your notes with some decent SOHO scanner

* don't get too excessive with resolution, 400+ pixels/inch are enough for OCR ...

i always scan with 1200 and reduce the images to 600 px via simple batch-processing / for example imagemagick "convert".

* get yourself a decent OCR software, which is able to read your notes ...

i'm a big fan of abbyys "finereader", but sadly its prohibitively expensive ... ;)

idk how well FOSS OCR software a la tesseract works for hand-written notes.

* create pdfs with automatically detected text in the background for search and the scanned image of the notes.

it additionally generates XML-metadata & from there: whatever you want (web frontend ... :)

just my 0.02€


Thanks buddy. I’ve already got them all digitized thankfully. That was a whole thing in and of itself.

Unfortunately the half-dozen or so OCRs I’ve tried fail miserably on even my clearest pages.

Lots of good ideas in this thread, though.

Thank you!


Have you tried Appke notes. It does a pretty decent job. Here is a example https://youtu.be/eoIIUpdhKZs?si=PXWdhTt0DmFjbrLs


I have a similar situation (especially when it comes to the handwriting) and am now trying to train my own tesseract model, which with around 12 pages of manually transcribed content starts to work.


A really important question here is why?

That will guide what's important. Inaccuracies aren't as much of a problem if you're using it as a search index where you'll return the image of your writing.


Which off the shelf solutions have you tried? https://www.transkribus.org/ is generally pretty good with hard to read texts.


I would approach it like this: how long does it take to type out one journal? How long does it take to research, trial, config, use, retweak, retry, and finally confirm one of a half dozen OCR solutions? Will your chosen solution(s) be available for another 10 years?

I'm sorry but the type it out solution seems the the best choice. You will probably remember something interesting by doing it that way.


Would this work with Mechanical Turk? I wonder how much it would be.


Take a picture, upload it to chat GPT and see what happens.

If it works then scan all the pages and run though it with a script.

Shouldn't take you more than about an hour to code ( with Chat GPT!) in Python.


Your journaling must not be as insane as mine..?


Indeed ChatGPT is one of the solutions I tried.


If you don't care about personal privacy, I would probably just go on Fiverr and upload it to somebody to do it at like $5 a page. Even reading all the journal notes is going to be very tiring.


I presume not well, then [since you're again asking]?


Of course. If ChatGPT worked for this problem, I’d be in human-machine interface heaven.

Oh well.


Which off the shelf solutions have you tried?


Thanks for asking, I’ve tried at least:

- Google Cloud OCR

- Transkribus

- ChatGPT(4V)

- EasyOCR

- Tesseract

- MacOS text highlighting


There's Yandex OCR (1.2 cents per image with handwriting model).

https://yandex.cloud/en/docs/vision/pricing

https://yandex.cloud/en/docs/vision/operations/ocr/text-dete...

Not sure about their training on English content, but they made a search engine for parish registers kept in a number of local archives last year (in Russian):

https://habr.com/en/companies/yandex/articles/712510/


If you have access to a GPU, try a vision-language model using Ollama, and feed it your notes. Might work out!


AWS Textract has worked better for me than the other cloud OCR solutions.


Heard of GCP Document AI? And if you're rich, use gpt4o.


Pay someone on Upwork or Fiver to transcribe them for you.


Why do you want to transcribe the journals?


That’s a great question.

1) I want to have a more-readable version of my autobiographical info. I frequently write things down about my psychology that I specifically want to reflect on later.

2) I want to have a system that can read my handwriting. This is a ton of historical data that I’d love to label.


Can you think of anything better than just opening every single file and manually transcribing it?

No, because the work of manual transcription is a way of telling if transcribing them is worth doing. Or maybe pay someone to transcribe it. Spending money is also a good way to tell if something matters (assuming you have sufficient money).

Orthogonally, maybe building a system is what you really want to do (for many people that would be more enjoyable than revisiting old journal content).

Finally, starting from hand transcription is an entry point into rewriting what you wrote. Rewriting is writing and if there's publication on your roadmap, you will be rewriting anyway.

There's no easy way to write well. Good luck.


> No, because the work of manual transcription is a way of telling if transcribing them is worth doing.

Yes, for a percent of the work. I have spent a bunch of time already digitizing my journals (including a loooong detour where I had to organize them because I didn’t exactly have them in chronological order…)

I have seen and manually transcribed enough of my journals to know I want the rest.

But it’s not worth it to do it manually at this time.

> Orthogonally, maybe building a system is what you really want to do (for many people that would be more enjoyable than revisiting old journal content).

And I do want to build a system to do this, as part of my own personal mind bike.

Thanks for your comment.


Try using the RATH analyser from github


have you tried using gpt-4o? It's pretty incredible at recognizing handwriting.


live text is an iOS feature you could experiment with


in the time its taken you to look into this and procrastinate, you could have done it by hand

greener pastures


Indeed—except for the cost of time, attention, energy, wrist and neck pain.

This is something I want done eventually, and I’m smelling the flowers on the way there.

This HN comment thread is one of the flowers on the path to my OCR pure land zen state


Having experience of this, it's fun to explore technological solutions but at the end of the day, re-reading while transcribing is a lot more enjoyable than quality checking and correcting OCR results - especially if you have bad handwriting.

Good luck with whatever approach you choose!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: