
It's not that simple: how would the model know when it knows? Removing hallucination has to be a post-training thing, because you need to test the model against what it actually knows first in order to provide training examples of what it knows and doesn't know and how to respond in those circumstances.

I think only 2/3 of RAM is allocated to be available to the GPU, so like 14GB, which is probably not enough to run even a Q4 quant.


This is configurable by the way.

sudo sysctl iogpu.wired_limit_mb=12345
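
If I remember right, you can read the current value back with `sysctl iogpu.wired_limit_mb`, and the change doesn't survive a reboot unless you re-apply it.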


You could always run your own server locally if you have a decent gpu. Some of the smaller LLMs are getting pretty good.


Correct. My dusty Intel NUC is able to run a decent 3B model (thanks to Ollama) with fans spinning, but it does not affect any other running applications. It is very useful for local hobby projects. Visible lags and freezes begin if I start a 5B+ model locally.


Also M-series Macs have an insane price/performance/electricity consumption ratio in LLM use-cases.

Any M-series Mac Mini can run a pretty good local model with usable speed. The high-end models easily compete with dedicated GPUs.


Yes - of course. That's been my experience with "ultimate" privacy.


I spent a couple weekends trying to reimplement Microsoft's inferencing for Phi-4 multimodal in Rust. I had zero experience messing with ONNX before. Claude produced a believably good first pass, but it ended up being too much work in the end and I've put it down for the moment.

I spent a lot of time fixing Claude's misunderstanding of the `ort` library, mainly because of Claude's knowledge cutoff. In the end, the draft just wasn't complete enough to get working without diving in really deep. I also kind of learned that ONNX probably isn't the best way to approach these things anymore. Most of the mindshare is around the Python code and Torch APIs.


This is interesting.

AI leads to more useless dives down into the internets.


Doesn't 8B need at least 16GB of RAM? Otherwise you're swapping, I would imagine...


Depends on quantization selected - see https://www.canirunthisllm.net/
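
As a rough back-of-the-envelope for the weights alone (approximate bytes per parameter; ignores KV cache and runtime overhead):

    # rough weight-memory estimate for an 8B-parameter model
    # (approximate bytes/param; ignores KV cache and runtime overhead)
    params = 8e9
    for name, bytes_per_param in [("FP16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
        print(f"{name}: {params * bytes_per_param / 2**30:.1f} GiB")
    # prints roughly: FP16: 14.9 GiB, Q8: 7.5 GiB, Q4: 3.7 GiB

So an 8B model at Q4 leaves plenty of headroom in 16GB, while FP16 weights alone already roughly fill it.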


Not the exact same comparison, but I have an M1 Mac with 16GB RAM and can get about 10 t/s with a 3B model. The same model on my 3060 Ti gets more than 100 t/s.

Needless to say, ram isn't everything.


Could you say what exact model+quant you're using for that specific test + settings + runtime? Just so I could try to compare with other numbers I come across.


Minneapolis, Chicago, a lot of less temperate cities have protected walking tunnels, either underground or protected by buildings.


I do web dev and prefer Safari. It’s less power hungry on a laptop. If you develop for the lowest common denominator, every other browser just works.

There’s something to be said for adding _fewer_ features to web browsers. A simple web is a web where open source solutions can compete with Chrome. It helps avoid a browser monoculture.


Yeah but we’re not in a simple web. Safari isn’t trying to make it simple for you, they just use it to keep you in their walled garden.

PWAs are a great example of this. The general idea was even a product of the Jobs era of Apple, before the App Store.

Safari and iOS’s poor support of PWAs is exactly a result of Apple wanting to prevent app distribution channels other than their App Store.

Choosing Safari is choosing a more closed web, where missing standards mean that you, as a Safari user, are unable to access certain features or apps that Apple doesn’t want you to access.


I get that Apple shouldn't be allowed to dictate which features belong on the web, but you're just letting Google do it instead if you opt to use Chrome.

Maybe I don't care about PWAs (which I don't); maybe I don't feel like the majority of APIs introduced lately belong in the browser. There's very little of what you should be able to do on the web that you could not do 10 years ago. Yes, flexbox is awesome, let's have that, so is the dialog tag. WebGL, Bluetooth, USB, device memory, battery status... No, the browser doesn't need to support that.


The browser doesn't need to support WebGL?! Together with Web Audio and Web Assembly, they are exactly the kind of technologies allowing desktop-class apps like Figma to run in the trustworthy browser sandbox.

I don't want to have to install more apps, and more web APIs means more things can be done inside of the browser shell without having to install software that could literally do anything on your computer (and is often hard to remove completely when you're done).


> exactly a result of Apple wanting to prevent app distribution channels other than their App Store

Nah, it's because PWAs were a standard created by Google, with Google being the primary market driver, solely with the interest of advancing Google's own interests (both in undermining Apple, and making lower-end devices more usable in developing markets). Don't think Google invented PWAs, or heavily pushed them, out of some charity. Notice also that as the ultra-cheap phones (~$100) have become more powerful, and as Apple refused to take the bait, that Google's efforts behind PWA have mostly ended.

The same goes for RCS, even though it was initially made by a neutral vendor forum. Google became the heavy pusher of RCS, not just for the sake of Android, but because they had carrier deals to use Google's own infrastructure (Jibe) for RCS, and forcing Apple to accept and integrate with their own infrastructure is a better position to be in.

This is also, let's be clear, not the first time that Google has tried an "open" standard to advance their interests and bludgeon competition. AMP is what happens when the standard catastrophically fails.


Out of curiosity, what's bad about Safari's PWA support? I use it for a bunch of things from Plex to my NAS, and never ran into any issues. In fact, it works better than Windows, in that Safari will open external links in PWAs in my default browser (on Windows, if you use Edge for PWAs, it will insist on opening itself for external links as well).


Missing standards like Manifest V3 that everyone was crying out for?


What is the metric for LLMs? Shouldn't more than just accuracy be measured? If something has high accuracy but low recall, won't it be overfit and fail to generalize? Your metrics would give you false confidence in how effective your model is. Just wondering because the announcement only seems to mention accuracy.
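
For reference, a toy sketch (made-up counts) of how accuracy alone can look great while recall is poor:

    # toy confusion-matrix counts (entirely made up) where accuracy looks fine
    # but recall shows the model misses most positives
    tp, fp, fn, tn = 10, 5, 40, 945
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.955
    precision = tp / (tp + fp)                  # ~0.67
    recall = tp / (tp + fn)                     # 0.20
    print(accuracy, precision, recall)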


Good point, we should provide more detailed metrics. Since we are very early, we focus on the main metric in our view: higher accuracy of changes, to be more practically usable. We will do more testing on overfitting and on how the model performs on different types of tasks. At a high level we believe in the idea that "a well fine-tuned model should be much better than a large general model". But we need more metrics, I agree.


Why does AI have to be smarter than the collective of humanity in order to be considered intelligent? It seems like we keep raising the bar on what intelligence means ¯\_(ツ)_/¯


A machine that synthesizes all human knowledge really ought to know more than an individual in terms of intellect. An entity with all of human intellect prior to 1905 does not need to be as intelligent as a human to make discoveries that mere humans with limited intellect made. Why lower the bar?


The heightening of the bar is an attempt to deny that milestones were surpassed and to claim that LLMs are not intelligent.

We had a threshold for intelligence. An LLM blew past it and people refuse to believe that we passed a critical milestone in creating AI. Everyone still thinks all an LLM does is regurgitate things.

But a technical threshold for intelligence cannot have any leeway for what people want to believe. They don’t want to define an LLM as intelligent even if it meets the Turing test technical definition of intelligence so they change the technical definition.

And then they keep doing this without realizing it, trivializing it. I believe humanity will develop an entity smarter than humans, but it will never be called AGI because people keep unconsciously moving the goal posts and changing definitions without realizing it.


Disagree. The AI we have is very useful for specific things. The pushback you see is not so much denying the milestones that have been surpassed, but rather the milestones that enthusiasts claim are near. And for good reason! Every time and in every field we’ve extrapolated an exponential-looking curve ad infinitum, it’s turned out to be S-shaped, and life goes on.

> We had a threshold for intelligence.

We’ve had many. Computers have surpassed several barriers considered to require intelligence, such as arithmetic, guided search like chess computers, etc. The Turing test was a good benchmark because of how foreign and strange it was. It’s somewhat true we’re moving the goalposts. But the reason is not stubbornness, but rather that we can’t properly define and subcategorize what reason and intelligence really are. The difficulty of measuring something does not mean it doesn’t exist or isn’t important.

Feel free to call it intelligence. But the limitations are staggering, given the advantages LLMs have over humans. They have been trained on all written knowledge that no human could ever come close to. And they still have not come up with anything conceptually novel, such as a new idea or theorem that is genuinely useful. Many people suspect that pattern matching is not the only thing required for intelligent independent thought. Whatever that is!


If you consider that evolution has taken millions of years to produce intelligent humans--that LLM training completed in a matter of months can produce parrots of humans is impressive by itself. Talking with the parrot is almost indistinguishable from talking with a real human.

As far as pattern matching, the difference I see from humans is consciousness. That's probably the main area yet to be solved. All of our current models are static.

Some ideas for where that might be headed:

- Maybe all it takes is to allow an LLM to continuously talk with itself much like how humans have "the milk man's voice".

- Maybe we might need to allow LLMs to update their own weights but that would also require an "objective" which might be hard to encode.


> If you consider that evolution has taken millions of years to produce intelligent humans--that LLM training completed in a matter of months can produce parrots of humans is impressive by itself.

I disagree that such a comparison is useful. Training should be compared to training, and LLM training feeds in so many more words than a baby gets. (A baby has other senses but it's not like feeding in 20 years of video footage is going to make an LLM more competent.)


No, a baby is pre-trained. We know from linguistics that there is a natural language grammar template all humans follow. This template is intrinsic to our biology and is encoded and not learned through observation.


A baby has a template but so does an LLM.

The better comparison to the templating is all the labor that went into making the LLM, not how long the GPUs run.

Template versus template, or specific training versus specific training. Those comparisons make a lot more sense than going criss-cross.


The template is what makes the training process so short for humans. We need minimal data and we can run off of that.

Training is both longer and less effective for the LLM because there is no template.

To give an example, suppose it takes just one picture for a human to recognize a dog and it takes 1 million pictures for an ML model to do the same. What I’m saying is that it’s like this because humans come preprogrammed with application-specific wetware to do the learning and recognition as a generic operation. That’s why it’s so quick. For AI we are doing it as a one-shot operation on something that is not application specific. The training takes longer because of this and is less effective.


I disagree that an LLM has no template, but this is getting away from the point.

Did you look at the post I was replying to? You're talking about LLMs being slower, while that post was impressed by LLMs being "faster".

They're posing it as if LLMs recreate the same templating during their training time, and my core point is disagreeing with that. The two should not be compared so directly.


They are slower. In theory these LLMs with all the right weights can have intelligence superior or equivalent to humans.

But the training never gets there. It’s so slow it never reaches human intelligence even though we know these networks can compute anything.


> It’s somewhat true we’re moving the goalposts. But the reason is not stubbornness, but rather that we can’t properly define and subcategorize what reason and intelligence really is.

Disagree. Intelligence is a word created by humans. The entire concept is made up and defined by humans. It is not some concept that exists outside of that. It is simply a collection of qualities and features we choose to label with the word “intelligent”. The universe doesn’t really have a category or a group of features that is labeled intelligent. Does it use logic? Does it have feelings? Can it talk? Can it communicate? We define the features and we choose to put each and every feature under a category called “intelligence”.

Therefore when we define the “Turing test” as a benchmark for intelligence and we then invalidate it, it is indeed stubbornness and a conscious choice to change the definition of a word we originally made up in the first place.

What you don’t realize is this entire thing is a vocabulary problem. When we argue about what is conscious or what is intelligent we are simply arguing about what features belong in what categories we made up. When the category has blurry or controversial boundaries it’s because we chose the definition to be fuzzy. These are not profound discussions. They are debates about language choice. We are talking about personal definitions and generally accepted definitions, both of which are completely chosen and made up by us. It is not profound to talk about things that are simply arbitrary choices picked by humans.

That being said we are indeed changing the goal posts. We are evolving our own chosen definitions and we very well may eventually change the definition of intelligence to never include any form of thinking machine that is artificially created. The reason why we do this is a choice. We are saying, “hey these LLMs are not anything amazing or anything profound. They are not intelligent and I choose to believe this by changing and evolving my own benchmark for what is intelligent.”

Of course this all happens subconsciously, based on deeply rooted instincts and feelings. It’s so deep that it’s really hard to differentiate the instincts from rational thinking. When you think logically, “intelligence” is just a word with an arbitrary definition. An arbitrary category. But the instincts are so strong that you literally spend your entire life thinking that intelligence, like god or some other common myth made up by humans, is some concept that exists outside of what we make up. It’s human to have these instincts; that’s where religion comes from. What you don’t realize is that it’s those same instincts fueling your definition of what is “intelligent”.

Religious people move the goal posts too. When science establishes things in reality, like the heliocentricity of the solar system, religious people need to evolve their beliefs in order to stay in line with reality. They often do this by reinterpreting the Bible. It’s deeply rooted instincts that prevent us from thinking rationally, and it affects the great debate we are having now on “what is intelligence?”.


Since we know an LLM does indeed simply regurgitate data, having it pass a "test for intelligence" simply means that either the test didn't actually test intelligence, or that intelligence can be defined as simply regurgitating data.


Intelligence is debatable without even bringing ai into it. Nobody agrees on whether humans have intelligence. Well, smart people agree but those people also agree we have or will soon have agi or something negligibly different from it.


> Intelligence is debatable without even bringing ai into it. Nobody agrees on whether humans have intelligence.

Yep, that constitutes the second of the two options I mentioned.

> Well, smart people agree but those people also agree we have or will soon have agi or something negligibly different from it.

lol, the ol' "I know what all smart people think and it's what I think" appeal.


> The heightening of the bar is an attempt to deny that milestones were surpassed and to claim that LLMs are not intelligent.

That was never "the bar"; nobody denies that milestones have been surpassed; none of those milestones are relevant to the question of intelligence.

> We had a threshold for intelligence. An LLM blew past it and people refuse to believe

Have you ever actually looked at contemporary (to Turing) examples of what people thought "passing a Turing test" might look like? It's abundantly clear to me that we were simply wrong about what the output would have to look like in order to convince human judges in the 2020s.

Even examples from much more recently (see e.g. on http://www-logic.stanford.edu/seminar/1213/Hawke_TuringTest....) suggest a very different approach to the test than prompting ChatGPT and marveling at the technical accuracy of its prose.

(Exercise: ask an LLM to write a refutation to your comment from the perspective of a human AI skeptic. Notice the ways in which it differs from mine.)

> Everyone still thinks all an LLM does is regurgitate things.

No; people still think LLMs aren't intelligent. Because they aren't, and they cannot become so in principle. They can do many things that are clearly beyond "regurgitation" (as we would otherwise apply the word to computer programs), but none of those things are the result of intelligence. Producing a result that could plausibly come from an intelligent system does not, in fact, demonstrate that the actual system producing it is also intelligent. The https://en.wikipedia.org/wiki/Antikythera_mechanism wasn't intelligent, and applying a power source to turn the gears wouldn't have made it so, either.

> They don’t want to define an LLM as intelligent even if it meets the Turing test technical definition of intelligence so they change the technical definition.

The Turing Test was never a "technical definition" of intelligence. Turing's original paper (https://en.wikipedia.org/wiki/Computing_Machinery_and_Intell...) spoke of "thinking" rather than "intelligence". Besides, the "Imitation Game" is presented as a substitute problem exactly because "think" cannot be clearly enough defined for the purposes. The entire point:

> As Stevan Harnad notes,[7] the question has become "Can machines do what we (as thinking entities) can do?" In other words, Turing is no longer asking whether a machine can "think"; he is asking whether a machine can act indistinguishably[8] from the way a thinker acts. This question avoids the difficult philosophical problem of pre-defining the verb "to think" and focuses instead on the performance capacities that being able to think makes possible, and how a causal system can generate them.

But the usual processes of pop science seem to have created a folk wisdom that being able to pass a Turing test logically ought to imply intelligence. This idea is what has been disproven, not the AI skepticism.


"Why lower the bar?"

Because of the chance of misunderstanding. Failing to acknowledge artificial general intelligence standing right next to us.

An incredible risk to take in alignment.

Perfect memory doesn't equal perfect knowledge, nor perfect understanding of everything you can know. In fact, a human can be "intelligent" with some of his own memories and/or knowledge, and - more commonly - a complete "fool" with most of the rest of his internal memories.

That said, he is not a bit less generally intelligent for that.

Suppose there exists a human with unlimited memory, who retains every piece of information touching any sense. At some point, he/she will probably understand LOTs of stuff, but it's simple to demonstrate he/she can't be actually proficient in everything: you may have read how to do an eye repair surgery, but if you have not received/experienced the training, you could have shaky hands, and you won't be able to apply the precise know-how about the surgery. Even if you remember a step-by-step procedure, even knowing all possible alternatives in different/changing scenarios during the surgery, you simply can't hold the tools well enough to get anywhere close to success.

But you still would be generally intelligent. Way more than most humans with normal memory.

If we had TODAY an AI with the same parameters as the human with perfect memory, it would most certainly be closely examined and determined not to be an artificial general intelligence.


> If we had TODAY an AI with the same parameters as the human with perfect memory, it would most certainly be closely examined and determined not to be an artificial general intelligence.

The human could learn to master a task; current AI can't. That is very different: the AI doesn't learn or remember stuff, it is stateless.

When I can take an AI and get it to do any job on its own without any intervention after some training then that is AGI. The person you mentioned would pass that easily. Current day AI aren't even close.

