zzleeper · 2026-03-07T17:32:33 1772904753

I used QGIS 2.x and 3.x a lot when making maps for research papers. But something that always stung was reproducibility. The Python tooling was not there compared to what I could do with point-and-click, and there was no easy way to transfer my point-and-click sessions into an equivalent Python script.

Is the situation unchanged? (Maybe a good use for Opus would be to write a wrapper for the python tooling?)

zzleeper · 2026-03-06T18:34:51 1772822091

Sad/funny that your comment is at the bottom.

Workers on strike are classified as not employed, so yeah we should ignore that category

dehrmann · 2026-03-07T01:32:10 1772847130

It's one of the challenges with data. It's technically accurate, and it's useful for trends like productivity and output, but only marginally useful as a gauge of the health of the economy. You also have to remember it for the next jobs report.

zzleeper · 2026-03-05T18:58:57 1772737137

For me the issue is why there's not a new mini since 5-mini in August.

I have now switched web-related and data-related queries to Gemini, coding to Claude, and will probably try QWEN for less critical data queries. So where does OpenAI fit now?

zzleeper · 2026-03-04T02:25:15 1772591115

Are there good open models out there that beat gemini 2.5 flash on price? I often run data extraction queries ("here is this article, tell me xyz") with structured output (pydantic) and wasn't aware of any feasible (= supports pydantic) cheap enough soln :/
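To make the "structured output" requirement concrete, here is a minimal sketch of the check a schema performs on a model's JSON reply. It uses only the standard library as a stand-in for a pydantic model, and the `ArticleFacts` fields are hypothetical, chosen purely for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ArticleFacts:
    # Hypothetical fields; swap in whatever you actually extract per article.
    bank_name: str
    city: str
    year: int


def parse_reply(raw: str) -> ArticleFacts:
    """Parse a model's JSON reply, rejecting missing or mistyped fields."""
    data = json.loads(raw)
    kwargs = {}
    for f in fields(ArticleFacts):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        if not isinstance(value, f.type):
            raise ValueError(f"{f.name} should be {f.type.__name__}")
        kwargs[f.name] = value
    return ArticleFacts(**kwargs)


if __name__ == "__main__":
    reply = '{"bank_name": "First National", "city": "Omaha", "year": 1893}'
    print(parse_reply(reply))
```

A model that "supports pydantic" in this sense is one whose API can be constrained to return JSON matching such a schema, so the validation step rarely fails.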

kristianp · 2026-03-04T10:27:03 1772620023

You'll have to try out models on your use case. Openrouter makes that easy.

zzleeper · 2026-03-03T21:29:49 1772573389

Wow. The number of quasi-xenophobic comments in this thread is nuts. They are also a bit misguided.

You don't hire a professor at an R1 school just to teach math101. You hire them so they can build a research lab or otherwise help to advance the frontier of the field (cancer, stats, etc.). The talent pool in several of these fields is very very small for Americans, because the brightest just go (used to go?) to work in finance or tech. So if you say you can't bring any bright foreigners, you are constraining yourself to a lower talent pool than other countries, and thus will pay a price (in less research, in no foreign students applying and thus no $$ from them, etc etc)

maldev · 2026-03-03T21:39:31 1772573971

There are different visas for gifted people to come into the US. The H1B is not intended for the purpose you claim. The brightest won't be affected by this.

zzleeper · 2026-03-03T21:50:31 1772574631

If you are early in your career (i.e. finished your PhD within the last 5 years) you are extremely unlikely to get the gifted people visa. The standard approach is to just get the H1B (not the lottery stuff for tech companies but the non-lottery one for hiring faculty at universities). Ask any foreign MIT professor hired early in their career: they went through H1B (and later on, they are more reluctant to move to a place like Florida..)

huddert · 2026-03-04T00:30:43 1772584243

[flagged]

ozozozd · 2026-03-04T06:23:53 1772605433

Millions? Look up how many H1Bs are issued every year.

Hint: it’s low 6 figures, some years falling to 5 figures.

peyton · 2026-03-04T06:34:36 1772606076

India’s Ministry of External Affairs counts 5,409,062 [1]. IIRC they have a big party every year to celebrate.

[1]: https://www.mea.gov.in/population-of-overseas-indians.htm

zzleeper · 2026-02-17T06:41:43 1771310503

That's amazing!

I'm working on a kinda similar project (documenting bank runs from historical newspapers) and also opted for Claude to build a static website. Crazy that the two sites have a very similar look and feel: https://www.finhist.com/bank-runs/index.html . The only big difference is that mine lacks a map, which I should hopefully fix soon (I already have lat and lon and am linking to google maps).

PS: Do you know if Mistral works better at OCRing handwritten text than Gemini 3? I was planning on going with Gemini 3 for another project

dogline · 2026-02-17T16:19:31 1771345171

That's cool! I've noticed when asking Claude for a website, it does have a certain look, like our two sites, if you don't give it any more guidance. I'm not sure if that's a good thing or not.

Digitizing history in different ways, with different resources that are unique or only known to small groups, might be a new development area, and that's exciting. As I've shown, and as other people have shared, using AI tools to digitize things that haven't been digitized before is now possible. Are there ways to make this easier for everybody? New techniques to discuss? I don't know, and I'd love to talk about it.

Concerning OCR: I used Mistral because of a posting here describing advancements with handwriting recognition a month or so ago. I didn't actually compare them. And I've got my setup that I can rerun everything again later if there are advancements in the area. Again, another area to keep track of and discuss.

zzleeper · 2026-02-17T20:53:25 1771361605

Thanks for the insights! I'll try Mistral as well. Gemini has worked well for me so far, but which model is SOTA changes quite frequently these days

zzleeper · 2026-02-11T18:03:46 1770833026

Surprisingly, I have a few hundred gigs of old newspaper scans so am very curious.

How fast was it per page? Do you recall if it's CPU or GPU based? TY!

Stagnant · 2026-02-11T20:59:43 1770843583

It is CPU-based. Somewhere between 1 and 2 seconds per page on a single core. I ran 20 instances of it in parallel to utilize 20 CPU cores so the avg time came down nicely.
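A rough sketch of that kind of fan-out, for anyone wanting to try it: one worker process per core, each handling one page at a time. `ocr_page` is a hypothetical placeholder for the actual OCR call (the real tool is not shown here), and `pages_per_minute` is just the ideal-case arithmetic, ignoring I/O and startup overhead.

```python
from concurrent.futures import ProcessPoolExecutor
import time


def ocr_page(path: str) -> str:
    """Hypothetical stand-in for the real single-core OCR call."""
    time.sleep(0.01)  # placeholder work; the real step takes roughly 1-2 s
    return f"text of {path}"


def ocr_all(paths: list[str], workers: int = 20) -> list[str]:
    """Spread pages over worker processes; results keep the input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ocr_page, paths))


def pages_per_minute(seconds_per_page: float, workers: int) -> float:
    """Ideal throughput if every worker stays busy the whole time."""
    return 60.0 / seconds_per_page * workers


if __name__ == "__main__":
    pages = [f"scan_{i:04d}.png" for i in range(40)]  # made-up file names
    texts = ocr_all(pages, workers=4)
    print(len(texts), "pages done")
    print(pages_per_minute(2.0, 20), "pages/min at 2 s/page on 20 cores")
```

Separate processes rather than threads matter here because the work is CPU-bound; if the OCR step is an external command, launching 20 copies of it from a shell gives the same effect.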

zzleeper · 2026-02-17T02:24:02 1771295042

That's actually amazing, and might give me a way to use all the cores I have lying around. 2s per page is an insane 600 pages per minute at 20 cores!

Please do open source it, even if you don't do much around it (worst case I can just spend a few million tokens trying to get opus 4.6 to get it to work)

zzleeper · 2026-02-10T21:23:59 1770758639

I just spent the last 20 minutes trying to debug why my non-www URL wasn't working and my www was. Oh well now I know.

zzleeper · 2026-02-03T16:21:42 1770135702

I only see one redaction on page 128 (122 as in top-right of page), and it's just a URL. So there's a rule to redact URLs.

zzleeper · 2026-01-30T22:46:28 1769813188

Do any of these models do well with information retrieval and reasoning from text?

I'm reading newspaper articles through a MoE of gemini3flash and gpt5mini, and what made it hard to use open models (at the time) was a lack of support for pydantic.

jychang · 2026-01-30T22:49:24 1769813364

That roughly correlates with tool calling capabilities. Kimi K2.5 is a lot better than previous open source models in that regard.

You should try out K2.5 for your use case, it might actually succeed where previous generation open source models failed.