
I remember that in Word for Windows (either 2 or 6, not sure) on my 40MHz 386, you could watch it reflow text if you edited a paragraph near the beginning of a multi-page document. It would correct for widows and orphans, which could cause surrounding pages to do the same. It could take a non-trivial amount of time to stabilize, and if you printed the document before it did, you could end up with duplicated or missing lines at page breaks.

> OCR is well and good; I thought it was mostly solved with Tesseract. What does this bring?

This is specifically for historic documents that Tesseract will handle poorly. It also provides a good interface for retraining models on a specific document set, which will help for documents that differ from the training set.


The Internet Archive generates MRC PDFs and has open-sourced its tooling: https://github.com/internetarchive/archive-pdf-tools

Tesseract wildly outperforms any VLM I've tried (as of November 2024) on clean scans of machine-printed text. True, this is the best case for Tesseract, but by "wildly outperforms" I mean: given a page on which Tesseract made a few errors, the VLM misread the text everywhere Tesseract did, plus in more places.

On top of that, the linked article suggests that Gemini 2.0 can't give meaningful bounding boxes for the text it OCRs, which further limits the places in which it can be used.
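
For contrast, word-level boxes fall straight out of Tesseract's TSV output; roughly like this (a Ruby sketch shelling out to the CLI, with a made-up file name):

    # Dump word-level bounding boxes from Tesseract's TSV output.
    # "page.png" is a placeholder; level-5 rows are individual words.
    tsv = `tesseract page.png stdout tsv 2>/dev/null`
    tsv.each_line.drop(1).each do |row|
      level, _pg, _blk, _par, _ln, _wd, left, top, w, h, conf, text = row.chomp.split("\t")
      next unless level == "5" && text && !text.strip.empty?
      puts "#{text}: (#{left},#{top}) #{w}x#{h} conf=#{conf}"
    end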

I strongly suspect that traditional OCR systems will become obsolete, but we aren't there yet.


> The wife finishing your sentences is an interesting analogy... My wife and I are usually on the same page about things, so for many topics we can use short-hand or otherwise cut discussions short.

My wife and I are just too different for this to happen. For the first 10 years or so we had the opposite happen a lot (multiple times a day for the first few years), where we thought we were on the same page, but had actually under-communicated. It still happens occasionally, but now we mostly overcommunicate about anything of any importance.

Our kids learned pretty quickly that if one parent was helping them with their homework but had to leave to do something else, asking the other parent for help was going to confuse them more, since we come at any given problem from completely different directions.


Sure, it's not a given that a conversation with a spouse will be at a "we can complete each other's sentences" level every time, or even most of the time. My wife and I are pretty lucky in that regard.

And to bring it full circle, the LLMs aren't going to know where you're going in all cases. I've had REALLY poor luck getting LLMs (admittedly, a year ago) to help me with Python Textual UIs. I don't understand Textual well enough, they don't seem to understand it well enough either, and I get the impression they understand archaic uses of Textual better than current ones (it seems to be a fast-moving target).

My wife and I literally just had a conversation of shared knowledge: "I want to get back into the habit of doing more exercise. Those things like Pikmin and that other thing." "Yeah, I know what you mean." "You know what I'm talking about?" "Yeah, but I can't remember the name." "You know the one?" "Yeah, the insurance exercise incentive one, something-go or something." "Yeah, that's the one."


Good goes in the pot, bad goes in the crop. (i.e. keep the good, toss the bad)

I don't know about Python specifically, but using a language I'm familiar with to generate ninja files (+ any header/environment/&c) for the build has become my go-to way of doing builds in the past 18 months or so.
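
The shape of it, sketched in Ruby (the file layout, flags, and rule names here are just for illustration):

    # Emit a build.ninja that compiles every .c file under src/ and links them.
    sources = Dir.glob("src/*.c")

    File.open("build.ninja", "w") do |n|
      n.puts "cflags = -O2 -Wall"
      n.puts
      n.puts "rule cc"
      n.puts "  command = gcc $cflags -MMD -MF $out.d -c $in -o $out"
      n.puts "  depfile = $out.d"
      n.puts
      n.puts "rule link"
      n.puts "  command = gcc $in -o $out"
      n.puts
      objects = sources.map do |src|
        obj = "build/" + File.basename(src, ".c") + ".o"
        n.puts "build #{obj}: cc #{src}"
        obj
      end
      n.puts
      n.puts "build build/app: link #{objects.join(' ')}"
    end

All the awkward parts (probing the environment, generating headers, per-target flags) become ordinary code above those puts calls; ninja just executes the result.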

> How in 2025 music players refuse to (not can't, refuse to) get "Album Artist" right blows my mind.

And many of the few that get "Album Artist" right won't respect tags that set its sort order either. If you are lucky, they will at least special-case "The" so that "The Beatles" ends up under B, but they'll still put John Denver under "J".

The only one I know of that gets both of these right is cmus; I don't use a terminal-based music player by choice, I'm forced to.


cmus does this as well.

Why does this need to be JIT compiled? If it could be written in C, then it certainly could just be compiled at load time, no?

If what could be written in C? The FFI library allows for dynamic binding of library methods for execution from Ruby without the need to write a native extension. That's a huge productivity boost and makes for code that can be shared across CRuby, JRuby, and TruffleRuby.
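
For anyone who hasn't used it, the dynamic binding looks roughly like this with the ffi gem (strlen from libc standing in for whatever library you actually care about):

    require 'ffi'

    # Bind a C function at runtime: no compiler, no native extension,
    # and the same code runs on CRuby, JRuby, and TruffleRuby.
    module MyLibC
      extend FFI::Library
      ffi_lib FFI::Library::LIBC
      attach_function :strlen, [:string], :size_t
    end

    MyLibC.strlen("hello")  # => 5

Every call through a binding like this goes through libffi's generic marshalling, which is the per-call overhead at issue here.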

I suppose if you could statically determine all of the bindings at boot you could write a stub and insert it into the method table. But that would still happen at runtime, making it JIT. And it wouldn't be able to adapt to the types flowing through the system, so it'd have to be conservative in what it accepts or what it optimizes, which is what libffi already does today. The AOT approach is to write a native extension.


By "it" I meant this part from TFA:

> you should write a native extension with a very very limited API where most work is done in Ruby. Any native code would be a very thin wrapper around the function we actually want to call that just converts Ruby types in to the types required by the native function.

I think our main disagreement is your assertion that any compilation at runtime qualifies as JIT. I consider JIT to be dynamic compilation (and possibly recompilation) of a running program, not merely anything that generates machine code at runtime.

