I used QGIS 2.x and 3.x a lot when making maps for research papers. But something that always stung was reproducibility. The Python tooling lagged behind what I could do by pointing and clicking, and there was no easy way to turn my point-and-click sessions into an equivalent Python script.
Is the situation unchanged? (Maybe a good use for Opus would be to write a wrapper for the python tooling?)
It's one of the challenges with data. It's technically accurate, and it's useful for trends like productivity and output, but only marginally useful as a gauge of the health of the economy. You also have to remember it for the next jobs report.
For me the issue is why there's not a new mini since 5-mini in August.
I have now switched web-related and data-related queries to Gemini, coding to Claude, and will probably try Qwen for less critical data queries. So where does OpenAI fit now?
Are there good open models out there that beat gemini 2.5 flash on price? I often run data extraction queries ("here is this article, tell me xyz") with structured output (pydantic) and wasn't aware of any feasible (= supports pydantic) cheap enough soln :/
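For anyone unsure what "supports pydantic" means here, a minimal sketch of the workaround when a model has no native structured output: put the JSON Schema in the prompt and validate the raw reply yourself. It assumes pydantic v2. The `ArticleFacts` fields and the `parse_reply` helper are hypothetical, made up for illustration, and not tied to any provider's API.

```python
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


class ArticleFacts(BaseModel):
    # Hypothetical fields; replace with whatever "xyz" you extract.
    bank_name: str
    city: str
    year: int


def parse_reply(raw_reply: str) -> Optional[ArticleFacts]:
    """Validate a model's raw text reply; return None if it does not fit the schema."""
    try:
        return ArticleFacts.model_validate_json(raw_reply)
    except ValidationError:
        # Covers both malformed JSON and missing or wrongly typed fields.
        return None


# JSON Schema text that could be pasted into the prompt of a model
# without native structured-output support.
schema_text = json.dumps(ArticleFacts.model_json_schema(), indent=2)
```

A reply that fails validation can simply be retried, which is roughly what the native structured-output modes save you from doing by hand.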
Wow. The number of quasi-xenophobic comments in this thread is nuts. They are also a bit misguided.
You don't hire a professor at an R1 school just to teach math101. You hire them so they can build a research lab or otherwise help advance the frontier of the field (cancer, stats, etc.). The talent pool in several of these fields is very very small for Americans, because the brightest just go (used to go?) to work in finance or tech. So if you say you can't bring in any bright foreigners, you are constraining yourself to a smaller talent pool than other countries have, and thus will pay a price (in less research, in no foreign students applying and thus no $$ from them, etc etc).
There are different visas for gifted people to come into the US. The H1B is not intended for the purpose you claim. The brightest won't be affected by this.
If you are early in your career (i.e. finished your PhD within the last 5 years) you are extremely unlikely to get the gifted-people visa. The standard approach is to just get the H1B (not the lottery stuff for tech companies but the non-lottery one for hiring faculty at universities). Ask any foreign MIT professor hired early in their career: they went through the H1B (and later on, they are more reluctant to move to a place like Florida..)
I'm working on a kinda similar project (documenting bank runs from historical newspapers) and also opted for Claude to build a static website. Crazy that the two sites have a very similar look and feel: https://www.finhist.com/bank-runs/index.html . The only big difference is that mine lacks a map, which I should hopefully fix soon (I already have lat and lon and am linking to google maps).
PS: Do you know if mistral works better at OCRing handwritten text than gemini 3? I was planning on going with gemini3 for another project
That's cool! I've noticed that when you ask Claude for a website, it does have a certain look, like our two sites, if you don't give it any more guidance. I'm not sure if that's a good thing or not.
Digitizing history in different ways, with resources that are unique or only known to small groups, might be a new development area, and that's exciting. As I've shown, and as other people have shared, using AI tools to digitize things that haven't been digitized before is now possible. Are there ways to make this easier for everybody? New techniques to discuss? I don't know, and I'd love to talk about it.
Concerning OCR: I used Mistral because of a post here a month or so ago describing advances in handwriting recognition. I didn't actually compare them. And I've got my setup arranged so that I can rerun everything later if there are advances in the area. Again, another area to keep track of and discuss.
It is CPU-based. Somewhere between 1 and 2 seconds per page on a single core. I ran 20 instances of it in parallel to utilize 20 CPU cores so the avg time came down nicely.
That's actually amazing, and might give me a way to use all the cores I have lying around. 2s per page is an insane 600 pages per minute at 20 cores!
Please do open source it, even if you don't do much around it (worst case I can just spend a few million tokens trying to get opus 4.6 to get it to work)
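The throughput figure above is just per-core rate times core count, assuming the work scales linearly across cores with no I/O bottleneck (an assumption, not something measured here). A quick sketch of the arithmetic, with a hypothetical helper name:

```python
def pages_per_minute(seconds_per_page: float, cores: int) -> float:
    """Aggregate throughput, assuming each core processes pages independently."""
    pages_per_core_per_minute = 60.0 / seconds_per_page
    return pages_per_core_per_minute * cores


# The quoted range of 1 to 2 seconds per page on a single core, run on 20 cores.
slow_case = pages_per_minute(2.0, 20)  # 2 s/page on every core
fast_case = pages_per_minute(1.0, 20)  # 1 s/page on every core
print(f"{slow_case:.0f} to {fast_case:.0f} pages per minute")
```

So 600 pages per minute is the conservative end of the quoted range; at 1 second per page it would be double that.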
Do any of these models do well with information retrieval and reasoning from text?
I'm reading newspaper articles through a MoE of gemini3flash and gpt5mini, and what made it hard to use open models (at the time) was a lack of support for pydantic.