> Yes – Chromium now ships a tiny on‑device sentence‑embedding model, but it’s strictly an internal feature.
What it’s for
“History Embeddings.” Since ~M128 the browser can turn every page‑visit title/snippet and your search queries into dense vectors so it can do semantic history search and surface “answer” chips. The whole thing is gated behind two experiments.
This declarative system is often less intuitive once you get into the nitty-gritty of things, IMO. Showing the most basic hello-world example proves nothing.
It's worse, IMO. In a real three.js example, to reproduce their example, you'd make one BoxGeometry and two Meshes. But that wouldn't be idiomatic React, so they don't show it.
Listen, I don't blame any mortal being for not grokking the AWS and Google docs. They are a twisting labyrinth of pointers to pointers, some of them deprecated yet still recommended by Google itself.
Their website has awful UX in which, if you deny cookies, they hard-refresh the page. Maybe someone decided to do that with no oversight in their managerless utopia.
So in my interactions with GPT, o3, and o4-mini, I am the organic middleman who copies and pastes code into the REPL and reports the output back to GPT if anything turns out to be a problem. And for me, past a certain point, even if you continually report back problems it doesn't get any better in its new suggestions. It will just spin its wheels. So for that reason I'm a little skeptical about the value of automating this process. Maybe the LLMs you are using are better than the ones I tried this with?
Specifically, I was researching a lesser-known Kafka-MQTT connector: https://docs.lenses.io/latest/connectors/kafka-connectors/si..., and o1 was hallucinating the configuration needed to support dynamic topics. The docs said one thing, and I even mentioned to o1 that the docs contradicted it. But it would stick to its guns. If I mentioned that the code wouldn't compile, it would start suggesting very implausible scenarios -- did you spell this correctly? Responses like that indicate you've reached a dead end. I'm curious how/if the "structured LLM interactions" you mention overcome this.
> And for me, past a certain point, even if you continually report back problems it doesn't get any better in its new suggestions. It will just spin its wheels. So for that reason I'm a little skeptical about the value of automating this process.
It sucks, but the trick is to always restart the conversation/chat with a new message. I never go beyond one reply, and I also copy-paste a bunch. I got tired of copy-pasting, so I wrote something like a prompting manager (https://github.com/victorb/prompta) to make it easier and to avoid having to neatly format code blocks and so on.
Basically, make one message; if the reply comes back wrong, iterate on the prompt itself and start fresh, always. Don't try to correct it by adding another message; update the initial prompt to make it clearer and steer it more.
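Roughly, the loop looks like this. This is only a minimal sketch of the idea (not prompta's actual code), assuming the OpenAI Python SDK and an OPENAI_API_KEY in the environment:

```python
# Sketch of the "one fresh message per attempt" workflow: every attempt is a
# brand-new single-turn request, never a follow-up in an existing chat.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_once(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send a single-turn request with no prior chat history attached."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# If the answer is wrong, don't append "that's wrong, fix it" to the chat --
# edit `prompt` itself to be clearer and call ask_once() again from scratch.
prompt = "Write a function that ...; it must handle X and must not do Y."
answer = ask_once(prompt)
```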
But I've noticed that every model degrades really quickly past the initial reply, no matter the length of each individual message. The companies seem to keep increasing the theoretical and practical context limits, but quality degrades a lot faster, well within those limits, and they don't seem to be trying to address that (nor do they seem to have a way of measuring it).
This is my experience as well, as has been for over a year now.
LLMs are so incredibly transformative when they're incredibly transformative. And when they aren't, it's much better to fall back on the years of hard-won experience I have - the sooner the better. For example, I'll switch between projects and languages, and even with explicit instructions to move to a strongly typed language they'll stick to dynamically typed answers. It's an odd experience to re-find my skills every once in a while. "Oh yeah, I'm pretty good at reading docs myself".
With all the incredible leaps in LLMs being reported (especially here on HN) I really haven't seen much of a difference in quite a while.
In other words, don't use the context window. Treat it as a command line with input/output, where the purpose of each command is to extract information, manipulate knowledge, mine data, and so on.
Also, special care has to be given to the number of tokens. Even with one question / one answer, only about 500 to 1,000 tokens can be focused on at once by our artificial overlords. After that they start losing their marbles. There are exceptions to that rule with the reasoning models, but in essence they are not that different.
The difference between using the tool correctly and not might be that instead of getting 99.9% accuracy, the user gets just 98%. That probably doesn't sound like a big difference to some people, but it means an error rate of 2% instead of 0.1%: roughly twenty times as many failures.
People keep throwing these 95%+ accuracy rates for LLMs into these discussions, but that is nonsense. It's closer to 70%. It's quite terrible. I use LLMs, but I never trust them beyond doing some initial search when I am stumped, and when it unblocks me I immediately put them down again. It's not transformative; it's merely replacing Google, because search there has sucked for a while.
95% accuracy vs 70% accuracy: both numbers are pulled out of someone's ass and serve little purpose in the discussion at hand. How did you measure that? Or rather, since you didn't, what's the point of sharing this hypothetical 25-point difference?
And how funny that your comment seems to land perfectly alongside this one, about people having very different experiences with LLMs:
> I am still trying to sort out why experiences are so divergent. I've had much more positive LLM experiences while coding than many other people seem to, even as someone who's deeply skeptical of what's being promised about them. I don't know how to reconcile the two.
It works very well (99.9%) when the problem resides in familiar territory of the user's knowledge. When I know enough about a problem, I know how to decompose it into smaller pieces, and all (most?) of the smaller pieces have already been solved countless times.
When a problem is far outside my understanding, AI leads me down the wrong path more often than not. Accuracy is terrible, because I don't know how to decompose the problem.
Jargon plays a crucial role there. LLMs need to be guided using as much of the problem's correct jargon as possible.
I have done this for decades with people. I read in a book at some point that the surest way to get people to like you is to speak to them in the words they usually use themselves. No matter what concepts they are hearing, if the words belong to their familiar vocabulary, they are more than happy to discuss anything.
So when I meet someone, I always try to absorb as much of their vocabulary as possible, as quickly as possible, and then I use it to describe the ideas I am interested in. People understand much better that way.
Anyway, the same holds true for LLMs: they need to hear the words of the problem expressed in that particular jargon. So when a programmer wants to use a library, they need to absorb the jargon used in that particular library. It is only then that accuracy rates hit many nines.
I will sidestep the gratuitous rudeness and state the obvious:
No, the pretended above-95% accuracy does not hold up against the hallucination rates of up to 50% reported by OpenAI itself, for example.
The difference in experiences is easily explainable, in my opinion. Much like some people swear by mediums and psychics while others easily see through them: it's easy to see what you want to see when a nearly random process lands you a good outcome.
I don't appreciate your insinuation that I am making up numbers, and I thought it shouldn't go unanswered, but do not mistake this for a conversation. I am not in the habit of engaging with such demeaning language.
It is "Gratuitous rudeness" to say these numbers without any sort of sourcing/backing are pulled from someone's ass? Then I guess so be it, but I'm also not a fan of people speaking about absolute numbers as some sort of truth, when there isn't any clear way of coming up with those numbers in the first place.
Just like there are "extremists" claiming LLMs will save us all, clearly others fall on the other extreme and it's impossible to have a somewhat balanced conversation with either of these two groups.
This has largely been my experience as well, at least with GH Copilot. I mostly use it as a better Google now, because even with the context of my existing code, it can't adhere to style at all. Hell, it can't even get Docker Compose files right, using various versions and incorrect parameters all the time.
I've also noticed that the language matters a lot. It's pretty good with Python, pandas, matplotlib, etc. But ask it to write some PowerShell and it regularly hallucinates modules that don't exist, more than with any other language I've tried.
And good luck if you're working with a stack that's not flavor of the month with plenty of online information available. ERP systems with their documentation living behind a paywall, so it's not in the training data - you know, the real-world enterprise CRUD use cases where I'd want to use it the most are where it's the least helpful.
To be fair, I find ChatGPT useful for Elixir, which is pretty niche. The great error messages (if a bit verbose) and the atomic nature of functions in a functional language go with the grain of LLMs, I think.
Still, at most I get it to help me with snippets. I wouldn't want it to just generate lots of code; for one thing, it's pretty easy to write Elixir...
I think “don’t use the context window” might be too simple. It can be incredibly useful. But avoid getting in a context window loop. When iterations stop showing useful progress toward the goal, it’s time to abandon the context. LLMs tend to circle back to the same dead end solution path at some point. It also helps to jump between LLMs to get a feel for how they perform on different problem spaces.
Depends on the use case. For programming where every small detail might have huge implications, 98% accuracy vs 99.99% is a ginormous difference.
Other tasks can be more forgiving, like writing, which I do all the time; there I load 3,000 tokens into the context window pretty frequently. Small details in the accuracy do not matter so much for most people, for everyday casual tasks like rewriting text, summarizing, etc.
In general, be wary of how much context you load into the chat; performance degrades faster than you can imagine. OK, the aphorism I started with was a little simplistic.
Oh sure. Context windows have been less useful for me on programming tasks than other things. When working iteratively against a CSV file for instance, it can be very useful. I’ve used something very similar to the following before:
“Okay, now add another column which is the price of the VM based on current Azure costs and the CPU and Memory requirements listed.”
“This seems to only use a few Azure VM SKU. Use the full list.”
“Can you remove the burstable SKU?”
Though I will say simple error fixes within a context window on programming issues are resolved fine. On more than one occasion, when I've copied and pasted an incorrect solution, providing the LLM with the resulting error was enough to fix the problem. But if it goes beyond that, it's best to abandon the context.
I'm in the middle of test-driving Aider and I'm seeing exactly the same problem: the longer a conversation goes on, the worse the quality of the replies... Currently, I'm doing something like this to prevent it from loading previous context:
I refuse to stop being a middleman, because I can often catch a really bad implementation early and course-correct. E.g., a function which solves a problem with a series of nested loops when it could be done several orders of magnitude faster using the vectorised operations offered by common packages like numpy.
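To make that concrete, here's a made-up illustration of the kind of thing I mean (not from any real project): the nested-loop version you often get first, next to the vectorised one.

```python
import numpy as np

def pairwise_dist_loops(points):
    """Nested Python loops: correct, but painfully slow for large inputs."""
    n, d = len(points), len(points[0])
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = sum((points[i][k] - points[j][k]) ** 2 for k in range(d)) ** 0.5
    return out

def pairwise_dist_vectorised(points):
    """Same result via numpy broadcasting; orders of magnitude faster."""
    p = np.asarray(points, dtype=float)       # shape (n, d)
    diff = p[:, None, :] - p[None, :, :]      # shape (n, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))  # shape (n, n)
```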
Even with all the coding-agent magik people harp on about, I've never seen something that can write clean, good-quality code reliably. I'd prefer to tell an LLM what a function's purpose is, what kinds of information and data structures it can expect, and what it should output; see what it produces; provide feedback; and get a workable, often perfect, function in return.
If I get it to write the whole thing in one go, I cannot imagine the pain of having to find out where the fuckery is that slows everything down, without diving deep with profilers etc., all for a problem I could have solved by just playing middleman, keeping a close eye on how things are building up, and being in charge of ensuring the overarching vision is achieved as required.
It seems no discussion of LLMs on HN these days is complete without a commenter wryly observing how that one specific issue someone is pointing to with an LLM is also, funnily enough, an issue they've seen with humans. The implication always seems to be that this somehow bolsters the idea that LLMs are therefore in some sense and to some degree human-like.
Humans not being infallible superintelligences does not mean that the thing that LLMs are doing is the same thing we do when we think, create, reason, etc. I would like to imagine that most serious people who use LLMs know this, but sometimes it's hard to be sure.
Is there a name for the "humans stupid --> LLMs smart" fallacy?
> The implication always seems to be that this somehow bolsters the idea that LLMs are therefore in some sense and to some degree human-like.
Nah, it's something else: it's that LLMs are being held to a higher standard than humans. Humans are fallible, and that's okay. The work they do is still useful. LLMs do not have to be perfect either to be useful.
The question of how good they are absolutely matters. But some error isn't immediately disqualifying.
I agree that LLMs are useful in many ways, but I think people are in fact often making the stronger claim I referred to in the part of my original comment that you quoted. If the argument were put forward simply to highlight that LLMs, while fallible, are still useful, I would see no issue.
Yes, humans and LLMs are fallible, and both useful.
I'm not saying the comment I responded to was an egregious case of the "fallacy" I'm wondering about, but I am saying that I feel like it's brewing. I imagine you've seen the argument that goes:
Anne: LLMs are human-like in some real, serious, scientific sense (they do some subset of reasoning, thinking, creating, and it's not just similar, it is intelligence)
Billy: No they aren't, look at XYZ (examples of "non-intelligence", according to the commenter).
Anne: Aha! Now we have you! I know humans who do XYZ! QED
I don't like Billy's argument and don't make it myself, but the rejoinder which I feel we're seeing often from Anne here seems absurd, no?
I think it's natural for programmers to hold LLMs to a higher standard, because we're used to software being deterministic, and we aim to make it reliable.
Well, they try to copy humans, and humans on the internet are very different creatures from humans in face-to-face interaction. So I see the angle.
It is sad that, inadvertently or not, LLMs may have picked up the traits of the loudest humans: abrasive, never admitting fault, always bringing up something that sounds plausible but falls apart under scrutiny. The only thing they hold back on is resorting to insults when cornered.
> the idea that LLMs are therefore in some sense and to some degree human-like.
This is 100% true, isn't it? It is based on the corpus of humankind's knowledge and interaction, so it is only to be expected that it would "repeat" human patterns. It also makes sense that the way to evolve the results we get from it is to mimic human organization, politics, and sociology in a new layer on top of LLMs to surpass current bottlenecks, just as they were used to evolve human societies.
> It is based on the corpus of humankind's knowledge and interaction
Something being based on X or using it as source material doesn't guarantee any kind of similarity, though. My program can also contain the entire text of Wikipedia and only ever output the number 5.
I'd love a further description of how you can have a program with the entire text of Wikipedia that only ever outputs 5. It is not immediately obvious to me how that is possible.
Assuming the text of Wikipedia is meaningfully used in the program, of course. A definition of "meaningful" I will propose is code which survives an optimization pass into the final resulting machine code and isn't hidden behind some arbitrary conditional. That seems reasonable as a definition of a program "containing" something.
You can have the agent search the web for documentation and then provide it to the LLM. That is what Context7 does, which is why it's currently very popular with the AI user crowd.
I used o4 to generate NixOS config files from pasted module source files. At first it produced outdated config stuff, but with the context files it worked very well.
Kagi Assistant can do this too but I find it's mostly useful because the traditional search function can find the pages the LLM loaded into its context before it started to output bullshit.
It's nice when the LLM outputs bullshit, which is frequent.
Seriously, Cursor (using Claude 3.5) does this all the time. It ends up with a pile of junk because it will introduce errors while fixing something, then go in a loop trying to fix the errors it created and slap more garbage on top of those.
Because it's directly editing code in the IDE instead of me transferring sections of code from a chat window, the large amount of bad code it writes is much more apparent.
Gemini 2.5 got into as close to a heated argument with me as possible about the existence of a function in the Kotlin coroutines library that was never part of the library (but does exist as a 5-year-old PR, still visible on GitHub, that was never merged).
It initially suggested I use the function as part of a solution, claiming it was part of the base library and could be imported as such. When I told it that the function didn't exist within the library, it got obstinate and argued back and forth with me to the point where it told me it couldn't help me with that issue anymore but would love to help me with other things. It was surprisingly insistent that I must be importing the wrong library version or doing something else wrong.
When I got rid of that chat's context and asked it about the existence of that function more directly, without the LLM first suggesting its use to me, it replied correctly that the function doesn't exist in the library but that the concept would be easy to implement... the joys(?) of using an LLM and having it go in wildly different directions depending upon the starting point.
I'm used to the opposite situation, where an LLM will slide into sycophantic, agreeable hallucinations, so it was in a way kind of refreshing for Gemini not to do this. But on the other hand, for it to be so confidently and provably wrong (while also standing its ground on its wrongness) got me unreasonably pissed off at it in a way that I don't experience when an LLM is wrong in the other direction.
That we're getting either sycophantic or intransigent hallucinations points to two fundamental limitations: there's no getting rid of hallucinations, and there's a trade-off in observed agreement "behavior".
Also, the recurring theme of "just wipe out context and re-start" places a hard ceiling on how complex an issue the LLM can be useful for.
> It will just spin its wheels. So for that reason I'm a little skeptical about the value of automating this process.
The question is whether you'd rather find out it got stuck in a loop after 3 minutes with a coding agent or after 40 minutes of copy-pasting. It can also get out of loops more often by being able to use tools to look up definitions with grep, ctags, or language-server tools; you can copy-paste commands for that yourself too, but it will be much slower.
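As a rough illustration of the kind of lookup tool I mean (a hypothetical helper, not any particular agent's implementation), something like this lets the model see a real definition instead of guessing at one:

```python
# Hypothetical definition-lookup helper: grep the repo for likely definition
# sites of a symbol and return the matches so they can go back into the prompt.
import subprocess

def find_definition(symbol: str, repo_path: str = ".") -> str:
    """Return lines that look like a definition of `symbol` (def/class/fn/...)."""
    pattern = rf"(def|class|fn|func|function)\s+{symbol}\b"
    result = subprocess.run(
        ["grep", "-rnE", pattern, repo_path],
        capture_output=True,
        text=True,
    )
    return result.stdout or f"no definition found for {symbol!r}"

# An agent can call this as a tool in seconds; by hand, you'd run the same grep
# and paste the output back, which is exactly the slow copy-paste loop above.
print(find_definition("parse_config"))
```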
Nothing, I guess? Except that they will continue to be vetted for the quality of their work after being hired.
Just spitballing, but even if someone has a remote computer after getting hired and is onboarded, they should not have access to sensitive systems. So while you can't completely prevent the possibility of hiring a malicious actor, security should not simply be on/off. The Register article mentioned how, after these devs were hired, they were immediately able to kick off their plans. I think security is not structured properly if that is the case.
Well, if you're still gonna browse on Chrome, don't settle for the uBlock Origin-less experience:
* download a release zip: https://github.com/gorhill/ublock/releases (expand Assets).
* go to chrome://extensions, toggle developer mode on
* click load unpacked and select the folder where you unzipped the release
Then you also have to watch out, because Chrome will, some time later, disable uBlock Origin. You have to go to your extensions page and find the option for 'Keep it for now' or something. Then you can continue to browse the internet like a real gee! Thanks, uBlock Origin!