
This is a problem with Ubuntu and not VirtualBox. I spent a few hours today fighting with 24.04.2 in libvirt, only to realize that they are using some fancy new GUI library for the installer which crashes on all VMs: https://www.dell.com/support/kbdoc/en-us/000123893/manual-no...

Mandatory Ubuntu considered harmful.

If only NVidia considered Debian a first class distribution so I never had to use Ubuntu again.


> So now that we brought down prod for a day the new rule is no AI sql without three humans signing off on any queries.

If that’s the scenario, I would be asking why the testing pipeline didn’t catch this rather than why was the AI SQL wrong.

Because the testing pipeline was generated by AI, and code-reviewed by AI, reading a PR description generated by AI.

Because the testing pipeline isn't the real database.

Anyone who knows a database well can bring it down with an innocent-looking statement that no one else will blink at.


Sure, but everyone knows humans end up bringing down the database too by writing an innocent looking test query nobody else blinks at, which is why you end up needing a testing strategy for ANY SQL before YOLO'ing into prod.

To offer a 3rd option - what testing pipeline? Incompetent managers aren't going to approve of developers "wasting their time" on writing high quality tests.

It always is for the first week. Then you find out that the last 10% matter a lot more than the other 90%. And finally they turn off the high compute version and you're left with a brain-dead model that loses to a 32b local model half the time.

If a user eventually creates half a dozen projects with an API key for each, and prompts Gemini side-by-side under each key, and only some of the responses are consistently terrible…

Would you expect that to be Google employing cost-saving measures?


A PDF corpus with a size of 1 TB can mean anything from 10,000 really poorly scanned documents to 1,000,000,000 nicely generated LaTeX PDFs. What matters is the number of documents, and the number of pages per document.

For the first I can run a segmentation model + traditional OCR in a day or two for the cost of warming my office in winter. For the second you'd need a few hundred dollars and a cloud server.

Feel free to reach out. I'd be happy to have a chat and do some pro-bono work for someone building an open-source tool chain and index for the rest of us.


>replace their OCR pipelines with Flash for a fraction of the cost of previous solutions, it's really quite remarkable.

As someone who had to build custom tools because VLMs are so unreliable: anyone who uses VLMs on unprocessed images is in for more pain than all the providers who let LLMs interact directly with consumers without guard rails.

They are very good at image labeling. They are ok at very simple documents, e.g. single column text, centered single level of headings, one image or table per page, etc. (which is what all the MVP demos show). They need another trillion parameters to become bad at complex documents with tables and images.

Right now they hallucinate so badly that you simply _can't_ use them for something as simple as a table with a heading at the top, data in the middle and a summary at the bottom.


I wish I could upvote you more. The compounding errors of these document solutions preclude what people assume must be possible.

I've worked on this in my day job: extracting _all_ relevant information from a financial services PDF for a BERT-based search engine.

The only way to solve that is with a segmentation model followed by a regular OCR model and whatever other specialized models you need to extract other types of data. VLMs aren't ready for prime time and won't be for a decade or more.

What worked was using DocLayNet-trained (https://github.com/DS4SD/DocLayNet) YOLO models to get the areas of the document that were text, images, tables or formulas. If you don't care about anything but text you can feed the results into Tesseract directly (but for the love of god read the manual). Congratulations, you're done.

Here are some pre-trained models that work OK out of the box: https://github.com/ppaanngggg/yolo-doclaynet (I found that we needed to increase the resolution from ~700px to ~2100px horizontal for financial data segmentation).
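
For concreteness, here is a minimal sketch of that pipeline (not my production code - the weights filename is a placeholder for whatever checkpoint you download from the yolo-doclaynet repo, and the class names follow the DocLayNet label set):

    # Sketch only: assumes the ultralytics and pytesseract packages plus a
    # DocLayNet-trained YOLOv8 checkpoint from the repo linked above.
    from ultralytics import YOLO
    from PIL import Image
    import pytesseract

    model = YOLO("yolov8n-doclaynet.pt")   # placeholder path to downloaded weights
    page = Image.open("page.png")

    # Detect layout regions at a higher input resolution (~2100 px, as noted above).
    result = model.predict(page, imgsz=2016)[0]

    for box in result.boxes:
        label = result.names[int(box.cls)]
        if label not in ("Table", "Picture", "Formula"):   # keep the text-like regions
            x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
            crop = page.crop((x1, y1, x2, y2))
            print(label, pytesseract.image_to_string(crop, config="--psm 6"))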

VLMs on the other hand still choke on long text and hallucinate unpredictably. Worse, they can't understand nested data. If you give _any_ current model something as simple as three nested rectangles with text under each, it will not extract the text correctly. Given that nested rectangles describe every table, no VLM can currently extract data from anything but the most straightforward of tables. But it will happily lie to you that it did - after all, a mining company should own a dozen bulldozers, right? And if they each cost $35,000 it must be an amazing deal they got, right?


That looks like a pretty good starting point, thanks. I've been dabbling in vision models but need a much higher degree of accuracy than they seem able to provide, opting instead for more traditional techniques and handling errors manually.

For non-table documents a fine-tuned YOLOv8 + Tesseract with _good_ image pre-processing has basically a zero percent error rate on monolingual texts. I say basically because the training data has worse labels than what the multi-model system gives out in the cases that I double-checked manually.

But no one reads the Tesseract manual, and everyone ends up feeding it garbage, with predictable results.
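
By "read the manual" I mean the pre-processing it asks for. A minimal sketch of the sort of thing that matters (illustrative settings, not my exact pipeline):

    # Sketch of the pre-processing Tesseract's docs call for: grayscale,
    # upscale small crops toward ~300 DPI, binarize, then OCR with an
    # explicit page segmentation mode instead of the default.
    import cv2
    import pytesseract

    img = cv2.imread("region.png", cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # --psm 6: treat the crop as a single uniform block of text.
    text = pytesseract.image_to_string(img, config="--oem 1 --psm 6")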

Tables are an open research problem.

We started training a custom version of this model: https://arxiv.org/pdf/2309.14962 but there wasn't a business case since the BERT search model dealt well enough with the word soup that came out of EasyOCR. If you're interested, drop a line. I'd love to get a model like that trained since it's very low hanging fruit that no one has done right.


The first thing I did when I saw this thread was ctrl-f for doclaynet :)

I've been at this problem since 2013, and a few years ago turned my findings into more of a consultancy than a product. See https://pdfcrun.ch

However, due to various events, I burned out recently and took a permie job, so I would love to stick my head in the sand and play video games in my spare time - but I'm secretly hoping you'd see this and I'd get to hear about your work.


There's not much to say.

DocLayNet is the easy part: with triple the usual resolution, the previous generation of YOLO models has solved document segmentation for every document I've looked at.

The hard part is the table segmentation. I don't have the budget to do a proper exploration of hyperparameters for the GridFormer models before starting a $50,000 training run.

This is a back-burner project along with speaker diarization. I have no idea why those haven't been solved, since they are very low hanging fruit that would release tens of millions in productivity when deployed at scale, but regardless I can't justify buying an Nvidia DGX H200 and spending two months exploring architectures for each.


Thanks, that's interesting research, I'll look into it.

Screen's main use case is to open an Emacs session remotely.

Tmux's main use case is to be glue for a Unix IDE.

The two use cases are rather different and the tools are very specialized for them.


Nah, screen's main use case is as an ad-hoc method to daemonise random scripts.


A more modern alternative is systemd-run: <https://www.freedesktop.org/software/systemd/man/latest/syst...>

Yeah this is 100% of when I reach for screen. “I’m not willing/ready to make this a service, screen detach here I come”

I switched to dtach for the first case.

dtach for session persistence. “Do one thing well.”

> Screen's main use case is to open an Emacs session remotely.

Not an emacs user, but for this, what does screen do that tmux can't?


Nothing at all. I've used Emacs over tmux (and now Zellij) for many years. The Emacs server can replace both of them, but that's a different story.

Nothing, but replacing it with something newer invalidates decades of muscle memory.

I switched to tmux eventually though.


I'm confused by this statement. Are you claiming this is the projects' stated goal? Their primary use in the wild?

Emacs can work as a daemon.

It also has TRAMP mode, which means you can use all your local packages remotely.

When I realized how powerful TRAMP was, I don't think I ever used screen/tmux again. I'm sure there are uses, mind. Just TRAMP fully hit all of my needs.

It really is magical, isn’t it? And although I rarely need to use it, I love the multihop setups where you can ssh to this system, then ssh again to this other, then mount an SMB filesystem using these credentials, and start editing.

>And while a VC fund is limited in what it can do in providing open-ended freedom. It can try to provide a meaningful simulacrum of that space and community, which is why I’m so excited about programs like 1517’s Flux that invests $100k in people, no questions asked and lets them explore for a few months without demanding KPIs or instantaneous progress.

>>You can move to the United States. (We will help with visas.)

This is no longer viable for anyone who isn't already a US citizen. Not sure how serious that VC is about investing in individuals, but from talking to 16 to 22 year olds, _none_ of them want to move to the US with ICE deporting students for saying the wrong thing online - or the perception that it does. US universities and businesses are suffering from a brain drain that, unless reversed in the next 3 years, will be a drag on the US economy for decades.


There should be a name for the phenomenon where people upset about some injustice pick the least plausible example to use as the cause célèbre of the injustice.

For a more modern take, I can't understand why Daniel Shaver is not the face of police murder in the US. The video is on YouTube; you can find the unedited version with a Google search. There is no benefit of the doubt to give. It was straight-up murder done on live cam. The more you read the worse it gets.

But it got buried in a week and no one remembers it.


It's unfortunate that the shooter was not convicted, but the mere fact that there was an investigation and a trial differentiates it from a lot of police violence causes célèbres.

what, why?

It’s because there is no controversy. There isn’t anyone arguing that the cop who did it was actually justified. Everyone agrees.

A story lives on when people argue over things. If no one argues the other side of something, the story just kind of fades away.


Elijah McClain wasn't allegedly shooting an air rifle at a motel, so I'd reckon they should be the face.

Right?


> no one remembers it.

That doesn't resonate with my experience. People know about the murder, but aren't sure what to do.

The murderer, who clearly had mental health issues (e.g., having "you're fucked" on the dust cover of his personal AR-15, which he used to commit the act), was acquitted (in a trial with strange circumstances). It's baffling that none of his colleagues - who saw the message on his weapon - ever pulled him aside to ask if he was OK.

And anyway, what does this have to do with your point of holding up an unlikely / outlying example to demonstrate a phenomenon?


His colleagues likely didn't find the dust cover noteworthy. Within contemporary American gun culture, it would seem like a minor bit of braggadocio akin to a "Protected by Smith & Wesson" sticker or a "Warning: We Don't Dial 911" placard; tacky and unprofessional, but not something to take seriously. There's a whole little industry around AR-15 customization, offering thousands of options for engraved dust covers with all kinds of symbols and messages:

https://midstatefirearms.com/product/engraved-dust-cover-eje...

https://mcsgearup.com/ar-15-ejection-port-dust-cover-engravi...

https://www.wingtactical.com/firearm-parts/ar-15/upper-recei...

https://cordedarms.com/ExoticCovers1


I am not remotely aware of this case. How do those words, or any words, on a gun case/cover relate to mental health issues? This isn't a manifesto; it is more like a "guard dog? Beware of owner!" decal, or a Calvin pissing on a Coexist sticker. Or truck nuts. These might be distasteful to some but are unrelated to mental health. I'd be more worried about my former neighbor who had an unhealthy love of Maglite flashlights and owned like 50 of 'em. _That_ was strange.

The "you're fucked" was written on the inside of the ejection port dust cover so that it would become visible after the weapon was fired. The implication is that he was eager to shoot someone.

I’d say the medium is the message in this case.

People like flashlights. It's a thing. Just like we like computers.

No shade meant towards odd collections nor truck nuts.

As someone who avoids making threats in public, those stickers and whatnot do leave me concerned about the person's mental health.

You find your neighbour's torch collection more worrying than aggressive messages left by someone who went on to kill?

> There should be a name for the phenomenon where people upset about some injustice pick the least plausible example to use as the cause célèbre of the injustice.

Perhaps 'The Toxoplasma of Rage'? See https://slatestarcodex.com/2014/12/17/the-toxoplasma-of-rage...

Or you might like https://slatestarcodex.com/2017/04/12/clarification-to-sacre...


Thanks for reminding me of CGP Grey's "This Video Will Make You Angry": https://youtu.be/rE3j_RHkqJc

Now I feel I should rewatch this video annually as a reminder to myself, or maybe monthly.


The surest way to get flamed out of any crypto mailing list was to ask what the effective clearance rate for the coin was, then to follow it up by asking how it could be sped up.

Today the bitcoin network is still stuck at ~7 transactions a second.

This is not what the white paper promised.


Thus, other blockchains emerged

Yes, we only need checks notes 3,500 other coins to match the clearance rate of Visa.

And adding more confusion and more problems.

Btw, cross-chain transactions are the major problem this strategy added.


requiring much higher resources to run a node and making them much less decentralized.

Which one are you referring to? What do you want, all the freedom in the world and no effort for running a decentralized node? Staking blockchains don't require many resources.

The ones that allow hundreds of txs per second, making verification of the entire tx history orders of magnitude harder. The limited tx throughput of bitcoin is a feature, not a bug.
