Hacker News | XCSme's comments

Not sure if it's a joke, but I don't think an LLM is a bijective function.

If you had all the token probabilities it would be bijective. There was a post about this here some time back.

Kind of: LLMs still use randomness when sampling tokens, so the same input can lead to multiple different outputs.
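To illustrate the point, here's a toy sketch of token selection (not any real model's decoder): with temperature 0 you get a deterministic argmax, while temperature sampling draws from the softmax distribution, which is where the non-determinism comes in.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=None):
    """Pick a token index from raw logits; temperature=0 means greedy argmax."""
    if temperature == 0:
        # Greedy decoding: deterministic, same input -> same output.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature sampling: scale logits, softmax, then draw randomly.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    rng = rng or random
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]
print(sample_token(logits, temperature=0))  # always index 0 (greedy)
```

With temperature > 0, repeated calls on the same logits can return different indices, which is why the mapping from input to output isn't a function at all, let alone a bijective one.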

I'm not sure what the stereotype is, but I tried using LangChain and realised most of its functionality actually requires more code than simply writing my own direct LLM API calls.

Overall I felt like it solves a problem that doesn't exist, and I've been happily sending direct API calls to LLMs for years without issues.
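For context, a "direct API call" really is just a small JSON POST. Here's a minimal sketch against an OpenAI-style chat endpoint; the model name is only an example, and `send` is a hypothetical helper you'd call with your own API key.

```python
import json
import urllib.request

def build_chat_request(model, system_prompt, user_prompt):
    """Build an OpenAI-style chat completion payload, no framework needed."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

def send(payload, api_key, url="https://api.openai.com/v1/chat/completions"):
    """POST the payload and return the assistant's message text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

payload = build_chat_request("gpt-4o-mini", "You are terse.", "Say hi.")
print(payload["messages"][1]["content"])  # "Say hi."
```

Twenty-odd lines of stdlib, which is roughly the argument against pulling in a framework for it.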


JSON Structured Output from OpenAI was released a year after the first LangChain release.

I think structured output with schema validation mostly replaces the need for complex prompt frameworks. I do look at the LC source from time to time because they do have good prompts baked into the framework.
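The "schema validation replaces the framework" idea can be sketched in a few lines: validate the model's JSON, and re-prompt on failure. The `validate_person` check below is a hypothetical stand-in for a Pydantic or JSON Schema validator, and the stubbed `call_llm` just simulates a model that fails once before complying.

```python
import json

def validate_person(raw):
    """Parse and validate a {'name': str, 'age': int} JSON object, or raise."""
    obj = json.loads(raw)
    if not isinstance(obj.get("name"), str):
        raise ValueError("name must be a string")
    if not isinstance(obj.get("age"), int):
        raise ValueError("age must be an integer")
    return obj

def extract_with_retries(call_llm, prompt, retries=3):
    """Ask the model, validate the JSON, and re-prompt on failure."""
    for _ in range(retries):
        raw = call_llm(prompt)
        try:
            return validate_person(raw)
        except ValueError as e:  # JSONDecodeError is a ValueError too
            prompt = f"{prompt}\nYour last reply was invalid ({e}); return only valid JSON."
    raise RuntimeError("model never produced valid JSON")

# Stubbed model: fails once (missing age), then complies.
replies = iter(['{"name": "Ada"}', '{"name": "Ada", "age": 36}'])
print(extract_with_retries(lambda p: next(replies), "Extract the person."))
# -> {'name': 'Ada', 'age': 36}
```

With provider-enforced structured output the retry loop mostly disappears, which is the point being made above.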


IME you could get reliable JSON or other easily-parsable output formats out of OpenAI's models going back at least to GPT-3.5 or 4 in early 2023. I think that was a bit after LangChain's release, but I don't recall hitting problems that I needed to add a layer around in order to do "agent"-y things ("dispatch this to this specialized other prompt-plus-chatgpt-api-call, get back structured data, dispatch it to a different specialized prompt-plus-chatgpt-api-call") before it was a buzzword.

I can guarantee this was not true for any complicated extraction. You could reliably get it to output JSON, but not the JSON you wanted.

Even on smallish ~50k-row datasets the error rate was still very high, and its interpretation of the schema was not particularly good.


It's still not true for any complicated extraction. I don't think I've ever shipped a successful solution to anything serious that relied on freeform schema say-and-pray with retries.

To this day many good models don't support structured outputs (say Opus 4.5) so it's not a panacea you can count on in production.

The bigger problem is that LangChain/Python is the least set up to take advantage of strong schemas even when you do have them.

Agree about pillaging for prompts though.


> so it's not a panacea you can count on in production.

OpenAI and Gemini models can handle ridiculously complicated and convoluted schemas, if I needed complicated JSON output I wouldn’t use anything that didn’t guarantee it.

I have pushed Gemini 2.5 Pro further than I thought possible when it comes to ridiculously over-complicated (by necessity) structured output.


100%. With Gemini + Pydantic you don’t need a wrapper library in 2025.

When my company organized an LLM hackathon last year, they pushed for LangChain, but instead of building on top of it I ended up creating a more lightweight abstraction for our use cases.

That was more fun than actually using it.


I coach table tennis, and sometimes I tell people that we only actually "learn" while we sleep. So, without sleeping, the brain doesn't have time to "save" the new information for future use.

Not sure if it's factually correct, but it seems about right: sleep seems to be the magic sauce, the time when all memories are written from RAM to disk.


Only 10b active params and close to SOTA?

Funny how they didn't include Gemini 3.0 Pro in the bar chart comparison, considering that it seems to do the best in the table view.

Also, funny how they included GPT-5.0 and 5.1 but not 5.2... I'm pretty sure they ran the benchmarks for 5.0, then 5.1 came out, so they ran the benchmarks for 5.1... and then 5.2 came out and they threw their hands up in the air and said "fuck it".

gpt-5.2 codex isn't available in the API yet.

If you want to be picky, they could've compared it against gpt-5 pro, gpt-5.2, gpt-5.1, gpt-5.1-codex-max, or gpt-5.2 pro,

all depending on when they ran benchmarks (unless, of course, they are simply copying OAI's marketing).

At some point it's enough to give OAI a fair shot and let OAI come out with their own PR, which they doubtlessly will.


I didn't even notice that, I assumed it was the latest GPT version.

after or before running the benchmarks?

Gemini is garbage and does its own thing most of the time, ignoring the instructions.

Good point. It also reminded me of when I was trying to optimize my app for some scenarios, then realized it's better to optimize it for ALL scenarios, so it works fast and the servers can handle the load no matter what. To be more specific, I decided NOT to cache any common queries, but instead to make sure all queries are as fast as possible.

I've recently added error tracking to my self-hosted analytics app (UXWizz), and the way I did it was simply to add extra events to each user/session. Once you have the concept of a session or user, you can attach errors or logs as events stored for that user. This solves the main problem mentioned in the article, where you don't know what happened; plus, being an event stored in a MySQL database, you can still query it.

Why not simply use Events for logging, instead of plain strings?
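The pattern described above can be sketched in a few lines, with SQLite standing in for the MySQL store (the table and column names here are illustrative, not UXWizz's actual schema): errors are just another event type on the session, so they're queryable alongside everything else the user did.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE events (
        session_id TEXT,
        type       TEXT,   -- 'pageview', 'click', 'error', ...
        payload    TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def track_event(session_id, type_, payload):
    """Attach any event, including an error, to a session."""
    db.execute(
        "INSERT INTO events (session_id, type, payload) VALUES (?, ?, ?)",
        (session_id, type_, payload),
    )

# An error is just another event type on the same session.
track_event("sess-1", "pageview", "/pricing")
track_event("sess-1", "error", "TypeError: x is undefined")

# So you can query it alongside everything else the user did.
rows = db.execute(
    "SELECT type, payload FROM events WHERE session_id = ? ORDER BY rowid",
    ("sess-1",),
).fetchall()
print(rows)
```

The win over plain-string logs is exactly that `WHERE session_id = ?` join back to the user's full activity.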


I hope they'll release a new AM4 CPU.

Something like a 5900X on 2nm or 4nm.


Awesome!

Is it actually working well? Not really, at least not at this stage. But it's cool to see a new UX idea.


Anyone else getting ERR_TOO_MANY_REDIRECTS trying to access the post?


Yes only on that page, not the rest of his blog. Guessing he ansible’d it to redirect ;)

It's working now.
