
The way https://gwern.net/ does it is quite good.

The links open in a popup window, so you can still have centre-aligned text and popups.


From the predictions of Gary Marcus (a notable AI skeptic) on what AI won't do in 2027:

> With little or no human involvement, write Pulitzer-caliber books, fiction and non-fiction.

So, yeah. I know you made a joke, but I guess you have the same issue as The Onion.


I'd be extremely surprised if AI labs are not already doing this, or planning to.

The same way that reasoning models are trained on chains of thought, why not do it with program state?

Give the AI a "separate" scratchpad where it keeps the expected state of the program. That prediction can be checked against the actual execution, so you can use RL to train the model to always get it right.
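
A minimal sketch of what that verification signal could look like; everything here is hypothetical and only illustrates the idea of checking a predicted program state against the real one, not how any lab actually trains:

    # Hypothetical reward: does the model's scratchpad match real execution?
    def actual_state(code: str) -> dict:
        """Run a (trusted/sandboxed) snippet and return its final variable bindings."""
        namespace: dict = {}
        exec(code, {}, namespace)
        return namespace

    def state_reward(predicted: dict, code: str) -> float:
        """1.0 if the scratchpad matches execution exactly, partial credit otherwise."""
        truth = actual_state(code)
        if not truth:
            return 0.0
        correct = sum(1 for k, v in truth.items() if predicted.get(k) == v)
        return correct / len(truth)

    snippet = "x = 3\ny = x * 2\nz = y - 1"
    print(state_reward({"x": 3, "y": 6, "z": 5}, snippet))  # 1.0
    print(state_reward({"x": 3, "y": 6, "z": 4}, snippet))  # ~0.67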


Posted 4 days ago:

> Three state of the art VLMs - Claude-3, Gemini-1.5, and GPT-4o

Literally none of those are state of the art. Academia is completely unprepared to deal with the speed at which AI develops. This is extremely common in research papers.

That's literally in the abstract. If I can see a completely wrong sentence 5 seconds into reading the paper, why should I read the rest?


What models would you recommend instead, for sophisticated OCR applications?

Honestly, I thought Claude-3 and GPT-4o were some of the newest major models with vision support, and that models like o1 and DeepSeek were more reasoning-oriented than OCR-oriented.


My anecdotal tests and several benchmarks suggest that Qwen2-VL-72B [0] is better than the tested models (even better than Claude 3.5 Sonnet), notably for OCR applications. It has been available since October 2024.

[0]: https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct
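
A rough usage sketch, adapted from memory of the Hugging Face model card, so the exact processor/utility calls may have drifted; the image path and prompt are placeholders:

    # Sketch of OCR-style inference with Qwen2-VL via transformers
    # (assumes transformers with Qwen2-VL support and qwen-vl-utils installed).
    from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
    from qwen_vl_utils import process_vision_info

    model_id = "Qwen/Qwen2-VL-72B-Instruct"
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # "scan.png" is a placeholder for whatever document image you want transcribed.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": "scan.png"},
            {"type": "text", "text": "Transcribe all text in this image."},
        ],
    }]

    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    images, videos = process_vision_info(messages)
    inputs = processor(text=[text], images=images, videos=videos,
                       padding=True, return_tensors="pt").to(model.device)

    out = model.generate(**inputs, max_new_tokens=512)
    out = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
    print(processor.batch_decode(out, skip_special_tokens=True)[0])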


For Google, definitely flash-2.0; it's a way better model. GPT-4o is kinda dated now. o1 is the one I'd pick for OpenAI; it's basically their "main" model now.

I'm not that familiar with Claude for vision. I don't think Anthropic focusses on that. But the 3.5 family of models is way better. If 3.5 Sonnet supports vision, that's what I'd use.


> For Google, definitely flash-2.0;

It was literally launched on February 5th, ~10 days ago. I'm no researcher, and I know "academia moves slow" is of course true too, but I don't think we can expect research papers to include things that launched after the reviews of said paper were probably already finished.

Maybe papers aren't the right approach here at all, but I don't feel like it's a fair complaint that they don't include models released less than two weeks ago.


It was officially launched 10 days ago, but has been openly available for way longer.

Also, this is arXiv, the website that's explicitly about posting research before peer review.


> It was officially launched 10 days ago, but has been openly available for way longer.

So for how long? How long did the papers you've written in the past take to write? AFAIK, it takes some time.

And peer review is not the only review a paper goes through, and it's not the review I was referring to.


Honestly? I don't know how long it's been available. But I do know it's been some time already. Enough to be aware of it when posting this on arXiv.

I'm not even disagreeing that it takes time to write papers, and that it's "common" for this to happen. But it's just more evidence for what I said in my original comment:

> Academia is completely unprepared to deal with the speed at which AI develops


Anthropic has a beta endpoint for PDFs that has produced impressive results for me with long and complex PDFs (tables, charts, etc.).
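
For reference, a rough sketch of what a call to that endpoint can look like via the raw Messages API; the beta header value, content-block shape, and model id here are from memory and may have changed, so treat them as assumptions and check the current docs:

    # Hedged sketch of Anthropic's PDF beta via the Messages API (assumptions noted).
    import base64
    import os
    import requests

    with open("report.pdf", "rb") as f:  # "report.pdf" is a placeholder file
        pdf_b64 = base64.standard_b64encode(f.read()).decode()

    resp = requests.post(
        "https://api.anthropic.com/v1/messages",
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "pdfs-2024-09-25",  # assumed beta flag; may differ
            "content-type": "application/json",
        },
        json={
            "model": "claude-3-5-sonnet-20241022",  # assumed model id
            "max_tokens": 1024,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "document",
                     "source": {"type": "base64",
                                "media_type": "application/pdf",
                                "data": pdf_b64}},
                    {"type": "text",
                     "text": "Summarise the tables and charts in this PDF."},
                ],
            }],
        },
    )
    print(resp.json()["content"][0]["text"])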



They may have been SotA at the time of writing.


Sure, but they posted this 4 days ago. The minimum I'd expect for quality research is for them to skim the abstract before posting and change that line to:

"Models from leading AI labs" or similar. Leaving it like now signals either sloppiness or dishonesty


Publishing is just too slow. If you want to apply any kind of scientific rigor and have your peers check what you're doing (not even doing a full peer review), things take more time than just posting on blogs and iterating.


oftwaresay engineeryay


If they are not listening to you, then I guess you did a great job ;)


Look, I won't try to convince you that you're wrong. But you came to a website with thousands of active users and asked: "Do you agree with me? Don't reply if not."

What do you expect? You might as well ask an AI to generate that text; it's the same level of information you'd be getting.


BTW, I'm not trying to defend LLMs here; I'd make the same comment if you flipped your question.


Yeah, the "Do you agree with me, don't reply if not" bit seems a little unsporting on a discussion forum! I mean, it's kind of supposed to be about discussing things.

Taking the other side: on one hand I can see "AI that can transform a text into an image etc., but otherwise the benefits of so-called AI are completely lost on me", but on the other hand AI overtaking biology is kind of an interesting thing.

And you can of course skip the articles.


I mean, we are all susceptible to confirmation bias; OP is just being explicit about it ;)


This is the 40% that OP mentioned. But there's a proportion of people/engineers who are just clueless and incapable of understanding code. I don't know the proportion, so I can't comment on the 50% number, but they definitely exist.

If you've never worked with them, you should count yourself lucky.


But the cost is _definitely_ falling. For a recent example, see DeepSeek V3 [1]. It's a model that's competitive with GPT-4 and Claude Sonnet, but it cost ~$6 million to train.

That's ridiculously cheaper than what we had before. Inference is basically getting 10x cheaper per year!

We're spending more because bigger models are worth the investment. But the "price per unit of [intelligence/quality]" is getting lower and _fast_.

Saying that models are getting more expensive is confusing the absolute amount spent with the value for money.

- [1] https://github.com/deepseek-ai/DeepSeek-V3/tree/main


> We're spending more because bigger models are worth the investment

Are they? Where's the value? What are they being used for actually out there in the real world? Not the shitty apps that simonw bleats about day in day out, not the lame website bots that repeat your FAQ back at me - actual real valuable (to the tune of the billions being invested in them) use cases?


ChatGPT is one of the fastest-growing apps ever. Saying that there are no products is willful blindness at this point.

This is Hacker News; I'd expect users to have a basic understanding of VC investment. The expected value of next-gen models, times the probability of creating them, is higher than the billions they are throwing at it.


> ChatGPT is one of the fastest-growing apps ever. Saying that there are no products is willful blindness at this point.

That's fair, but I think you're being a little uncharitable to the point being made.

I would postulate that most ChatGPT users are not using it in a productive capacity; they're using it as a sort of "Google that's better at understanding my queries". Obviously that serves a great niche for lots of people, but I don't think it's what mvdtnz had in mind.


> Inference is basically getting 10x cheaper per year!

You're gonna need some good citations for that.

There's a big difference between companies saying "the inference costs on our service are down" and the inference costs on the model being down. The former is often cheated by simplifying and dumbing down the models used in the service after the initial hype and benchmarks.

> But the "price per unit of [intelligence/quality]" is getting lower and _fast_.

Absolutely not a general trend across models. At best, older models are getting cheaper to run. Newer models are not cheaper "per unit of intelligence". OpenAI's fancy new reasoning models are orders of magnitude more expensive to run whilst being ~linear improvements in real-world capabilities.


See situational-awareness [1], in particular the "algorithmic efficiencies" section. He shows many examples of how models are getting cheaper, with many citations.

Costs are not just down on a specific service. Though I don't see the problem with that either, as long as you get the promised level of performance without being subsidised. See the DeepSeek model I linked above: it's an open model and you can run it yourself.

> At best, older models are getting cheaper to run.

What's your definition of old here? Comparing the literal bleeding-edge model (o3) to the best model from two years ago (GPT-4)? Not only is that a ridiculously misleading comparison, it's not even valid!

o3 is a reasoning model: it can spend money at test time to improve results, a capability previous models don't even have. You can't look at one example where they threw a lot of money at it and call that the cost; the cost is unbounded. If they want, they can just not let the model think for ages and get basically "0-thinking" outputs. That's what you should use to compare models.

If you compare _today's_ cost of training and inference for a model as good as GPT-4 was at release, that cost has gone down massively on both counts.

[1] - https://situational-awareness.ai/from-gpt-4-to-agi/#The_tren...


I'm not convinced about that 10x cheaper a year.

Larger models need more memory. I'm willing to bet that most of the tier-1 providers rely on multi-GPU setups to serve traffic.

None of that is cheap; 8x GPU nodes that serve fewer than 20 queries a second are exceedingly expensive to run.


Larger models are more expensive to run (ceteris paribus), but we're seeing that we can squeeze more performance out of smaller models.

You need to compare like for like. You can't say that the cost of building a 5-story apartment block is increasing by pointing at the Burj Khalifa.


Now remind us: what HW did we need to run local inference of llama2-70B (July 2023)? And then contrast it with the HW we need to run llama3.1-70B (July 2024). In particular, which optimizations dramatically cut the cost of inference, and in what way?

I seriously don't get this argument, and I see it being repeated over and over again. Model capabilities are increasing, no doubt about that, but HW costs for inference have remained the same, and they're mostly driven by the amount of (V)RAM you need.
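
For a rough sense of why (V)RAM is the driver, here's a back-of-envelope calculation of the memory needed just to hold a 70B model's weights at a few precisions (KV cache and activations come on top):

    # Back-of-envelope: memory to hold the weights of a 70B-parameter model.
    params = 70e9
    for precision, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        gb = params * bytes_per_param / 1e9
        print(f"{precision}: ~{gb:.0f} GB (weights only)")
    # fp16: ~140 GB, int8: ~70 GB, int4: ~35 GB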


Install `uv`, then run `uv run --python 3.13 my_script.py`.
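
If the script has third-party dependencies, uv can also resolve them from inline script metadata (PEP 723); a small sketch, with `requests` used purely as an example dependency:

    # my_script.py
    # /// script
    # requires-python = ">=3.13"
    # dependencies = ["requests"]
    # ///
    import requests

    print(requests.get("https://example.com").status_code)

`uv run my_script.py` then picks a matching Python and installs the listed dependencies into an ephemeral environment automatically.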

