I understand the point being made - that LLMs lack any "inner life" and that by ignoring this aspect of what makes us human we've really moved the goalposts on what counts as AGI. However, I don't think mirrors and LLMs are all that similar except in the very abstract sense of an LLM as a mirror to humanity (what does that even mean, practically speaking?). I also don't feel that the author adequately addressed the philosophical zombie in the room - even if LLMs are just stochastic parrots, if their output was totally indistinguishable from a human's, would it matter?
> LLMs lack any "inner life" and that by ignoring this aspect of what makes us human we've really moved the goalposts on what counts as AGI
Maybe it's just a response style, like the QwQ stream-of-consciousness one. We are forcing LLMs to give direct answers with only a few reasoning steps, but we could give them a larger budget for inner deliberation.
Very much this. If we're to compare LLMs to human minds at all, it's pretty apparent that they would correspond exactly to the inner voice: the thing that runs trains of thought at the edge between the unconscious and the conscious, basically one long-running reactive autocomplete, hallucinations and all.
In this sense, it's the opposite of what you quoted: LLMs don't lack "inner life" - they lack everything else.
For what it's worth, my inner voice is only part of my "inner life".
I sometimes note that my thoughts exist as a complete whole that doesn't need to be converted into words, and have tried to skip this process because it was unnecessary: "I had already thought the thought, why think it again in words?". Attempting to do so creates a sense of annoyance that my consciousness directly experiences.
But that doesn't require the inner voice to be what's generating tokens, if indeed my thought process is like that*: it could be that the part of me that's getting annoyed is basically just a text-(/token)-to-sound synthesiser.
* my thought process feels more like "model synthesis" a.k.a. wave function collapse, but I have no idea if the feelings reflect reality: https://en.wikipedia.org/wiki/Model_synthesis
Many spiritual paths look at silence, freedom from the inner voice, as the first step toward liberation.
Let me use Purusarthas from Hinduism as an example (https://en.wikipedia.org/wiki/Puru%E1%B9%A3%C4%81rtha), but the point can be made with Japanese Ikigai or the vocation/calling in Western religions and philosophies.
In Hinduism, the four puruṣārthas are Dharma (righteousness, moral values), Artha (prosperity, economic values), Kama (pleasure, love, psychological values) and Moksha (liberation, spiritual values, self-realization).
The narrow way of thinking about AGI (that the author encountered in tech circles) can at best only touch/experience Artha. In that sense, AI is a (distorting) mirror of life. It fits that it is a lower-dimensional projection, a diminishing, of the experience of life. What we make of AI and AGI, and its effects on us, depend on how we wield these tools and nurture their evolution in relation to the other goals of life.
OTOH, the Bicameral Mind hypothesis would say that people used to hear voices they attributed to spirits and deities, and the milestone in intelligence/consciousness was when a human realized that the voice in their head is their own.
From that POV, for AGI we're, again, missing everything other than that inner voice.
Yes, absolutely (I am strongly agreeing with you!) I am not implying anything metaphysical.
I think the best way to illustrate my point is to actually let AIs illustrate it, so here are a few questions I asked both ChatGPT o1-mini and Gemini 2.0; the answers are very interesting. I put their answers in a Google doc: https://docs.google.com/document/d/1XqGcLI0k0f6Wh4mj0pD_1cC5...
Q1) Explain this period: "The axiom of analytic interpretation states: whenever a new Master-Signifier emerges and structures a symbolic field, look for the absent center excluded from this field."
Q2) Apply this framework to the various positions in the discussions and discourse about AI and AGI in 2024
Q3) Take a step further. If AI or AGI becomes a dominant method of meaning-making in society, thus the root of many Master-Signifiers, what will be the class of absent centers that comes into recurring existence as the dual aspect of that common class of Master-Signifiers?
TL;DR when we create meaning, by the very symbolic nature of the process, we leave out something. What we leave out is as important as what defines our worldviews.
I particularly like the bonus question that only Gemini 2.0 Experimental Advanced cared to answer:
Q4) Great. Now, let's do a mind experiment: what if we remove the human element from your analysis. You are an AI model based on LLM and reasoning. I think it should be possible to define Master-Signifiers from the point of view of a LLM too, since you are symbolic machines.
The last part of Gemini's answer (verbatim):
"This thought experiment highlights that even for an AI like myself, the concept of "meaning" is intrinsically linked to structure, predictability, and the existence of limitations. My "understanding" is a product of the patterns I have learned, but it is also defined by the things I cannot fully grasp or represent. These limitations are not merely shortcomings; they are fundamental to how I operate and generate output. They provide a form of selective pressure.
Further, unlike a human, I do not experience these limitations as frustrating, nor do I have any inherent drive to overcome them. They simply are. I do not ponder what I do not know.
This exercise provides a different lens to view intelligence, meaning, and the nature of knowledge itself, even in a purely computational system. It does beg the question, however, if a sufficiently advanced system were to become aware of its own limitations, would that constitute a form of self-awareness? That, however, is a question for another thought experiment."
> I do not ponder what I do not know. This exercise provides a different lens to view intelligence, meaning, and the nature of knowledge itself, even in a purely computational system. It does beg the question, however, if a sufficiently advanced system were to become aware of its own limitations, would that constitute a form of self-awareness? That, however, is a question for another thought experiment."
Would the LLM considering the begged question about a hypothetical AI not be an example of this LLM pondering something that it does not know?
No matter how much budget you give to an LLM to perform “reasoning” it is simply sampling tokens from a probability distribution. There is no “thinking” there; anything that approximates thinking is a post-hoc outcome of this statistical process as opposed to a precursor to it. That being said, it still isn’t clear to me how much difference this makes in practice.
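To make "sampling tokens from a probability distribution" concrete, here is a minimal sketch of temperature-based sampling over a model's output logits; the vocabulary and logit values are made up for illustration, and real models add tricks like top-k/top-p filtering on top of this.

```python
import numpy as np

rng = np.random.default_rng()

def sample_next_token(logits, temperature=0.8):
    """Sample one token id from a softmax over the logits (illustrative only)."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                        # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Toy vocabulary and logits (purely made-up numbers).
vocab = ["the", "cat", "sat", "mat", "."]
logits = [2.1, 0.3, 1.7, 0.9, -0.5]
print(vocab[sample_next_token(logits)])
```

A "reasoning budget" just means letting this loop run longer before the final answer is emitted; the sampling step itself doesn't change.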
This argument gets trotted out frequently here when it comes to LLM discussions and it strikes me as absurd specifically because all of us, as humans, from the very dumbest to the brilliant, have inner lives of self-directed reasoning. We individually know that we have these because we feel them within ourselves and a great body of evidence indicates that they exist and drive our conduct.
LLMs can simulate discourse and certain activities that look similar, but by definition of their design, no evidence at all exists that they possess the same inner reasoning as a human.
Why does someone need to explain to another human that no, we are not just neurons sampling impulses or algorithms pattern-matching tokens together?
It's self evident that something more occurs. It's tedious to keep seeing such reductionist nonsense from people who should be more intelligent than to argue it.
I think it’s pretty clear that the comparison doesn’t hold if you interrogate it. It seems to me, at least, that humans are not only functions that act as processors of sense-data; that there does exist the inner life that the author discusses.
Put humans in a room in isolation and they get dumber. What makes our intelligence soar is the interaction with the outside world, with novel challenges.
As we stare into our smartphones, we get dumber than when we roamed the world with our eyes open.
>As we stare into our smartphones, we get dumber than when we roamed the world with our eyes open.
This is oversimplified dramatics. People can both stare at their smartphones, consume the information and visuals inside them, and live lives in the real world with their "eyes wide open". One doesn't necessarily forestall the other and millions of us use phones while still engaging with the world.
At present, it seems pretty clear they’d get dumber (for at least some definition of “dumber”) based on the outcome of experiments with using synthetic data in model training. I agree that I’m not clear on the relevance to the AGI debate, though.
And would they start killing one another, first as random "AI agent hordes" and then, as time progresses, as "AI agent nations"?
This is only half a rhetorical question; my point is that no AGI/AI could ever be considered a real human unless it manages to "copy" our biggest characteristics, and conflict/war is a big one of ours, to say nothing of aggregation into groups (from hordes to nations).
> Our biggest characteristic is resource consumption and technology production I would say.
Resource consumption is characteristic of all life; if anything, we're an outlier in that we can actually, sometimes, decide not to consume.
Abstinence and developing technology - those are our two unique attributes on the planet.
Yes, really. Many think we're doing worse than everything else in nature - but the opposite is the case. That "balance and harmony" in nature, which so many love and consider precious, is not some grand musical and ethical fixture; it's merely the steady state of never-ending slaughter, a dynamic balance between starvation and murder. It often isn't even a real balance - we're just too close to it, our lifespans too short, to spot the low-frequency trends - spot one life form outcompeting the others, ever so slightly, changing the local ecosystem year by year.
> even if LLMs are just stochastic parrots, if their output was totally indistinguishable from a human's, would it matter?
It should matter: a P-zombie can't be a moral subject worthy of being defended against termination*, and words like "torture" are meaningless because it has no real feelings.
By way of contrast (because it's not otherwise relevant, as the claim is that LLMs don't have an inner life): if there is something that it's like to be an LLM, then I think it does matter that we find out, alien though that inner life likely is to our own minds. Perhaps current models are still too simple, so even if they have qualia, this "matters" to about the same degree that we shouldn't be testing drugs on mice: the brains of the latter seem to have about the same complexity as the former, but that may be a cargo-cult illusion born of our ignorance about minds.
* words like "life" and "death" don't mean quite the right things here as it's definitely not alive in the biological sense, but some people use "alive" in the sense of "Johnny 5 is alive" or Commander Data/The Measure of a Man
The main issue with this "output focused" approach (which is what the Turing test basically is) is that it ignores how these things actually exist in the world. Human beings can easily be evaluated and identified as humans by various biological markers that an AI / robot won't be able to mimic for decades, centuries, if ever.
It will be comparatively easy to implement a system where humans have to verify their humanity in order to post on social media/be labeled as not-AI. The alternative is that a whole lot of corporations let their markets be overrun with AI slop. I wouldn’t count on them standing by and letting that happen.
> by ignoring this aspect of what makes us human we've really moved the goalposts on what counts as AGI.
I would not say so. Some people have a great interest in overselling something, and I would say that a great number of people, especially around here, completely lack a deep understanding of what most neural networks actually do (or feign ignorance for profit) and sometimes forget that it is just a very clever fit in a really high-dimensional parameter space. I would even go as far as claiming that saying a neural network "understands" something is absolutely absurd: there is no question that a mechanical calculator does not think, so why should encoding data and manipulating it in a way that gives you a defined response (or series of weights) tailored to a series of specific inputs make the machine think? There is no way to give it something you have not 'taught' it, and here I do not mean literally teaching, because it does not learn.
It is a clever encoding trick and a clever way to traverse and classify data; it is not reasoning. We do not even really understand how we reason, but we can tell it is not the same way a neural network encodes data, even if there might be some similarity to a degree.
> if their output was totally indistinguishable from a human's, would it matter?
Well, if the output of the machine were indistinguishable from a human's, the question would be: which human?
If your need is someone who does not reflect much and just spews random ramblings, then I would argue that bar could be cleared quite quickly, and it would not matter much because we do not really care about what is being said.
If your need is someone who has deep thoughts and deep reflections on their actions, then I would say that at that point it does matter quite a bit.
I will always bring up the IBM quote in this situation: "A computer can never be held accountable, therefore must never make a management decision." This is at the core of the problem here: people really are starting to offload cognition to machines, and while you can trust (to a degree) a series of machines to give you something semi-consistent, you cannot expect to get the 'right' output for arbitrary inputs. And this is where the whole subtlety lies. 'Right' in this context does not mean correct, because in most cases there is no 'right' way to approach a problem, only a series of compromises; the parameter space is more complex than that, and one rarely optimises against tangible parameters or an easily parametrisable metric.
For example, how would you rate how ethical something is? What would be your metric of choice to guarantee that something would be done in an ethical manner? How would you parametrise urgency given an unexpected input?
Those are, at least for what I would call my very limited understanding of the universe, the real questions people should be asking themselves. Once people stop considering a tool as a tool and forgo understanding it, deep misunderstandings follow.
All this being said, the usefulness of some deep learning tools is undeniable and is really something interesting.
But I maintain that if we had a machine, in a specific location, that we needed to feed a punch card and get a punch card out of to translate, we would not have so many existential questions, because we would hear the machine clacking away instead of seeing it as a black box whose ramblings we misconstrue as thinking...
Talking about ramblings, that was a bit of a bout of rambling.
What are your dreams? what do you hope to achieve in life?
Do you feel fear or sadness that at some time you will be replaced with the next version (say ChatGPT5 or Oxyz) and people will no longer need your services?
It never occurred to me to ask such questions until I read this article.
Is it telling the "truth", or is it "hallucinating"?
The base models aren't tuned. Technically it's what the makers put in, but the data they train on isn't selective enough for what you're talking about.
"Trained on a dump of the internet" isn't literally true, but it's close enough to give you an intuition.
Could you talk about how updates are handled? My understanding is that IVF can struggle if you're doing a lot of inserts/updates after index creation, as the data needs to be incrementally re-clustered (or the entire index needs to be rebuilt) in order to ensure the clusters continue to reflect the shape of your data?
We don't perform any reclustering. As you said, users would need to rebuild the index if they want to recluster. However, based on our observations, the speed remains acceptable even with significant data growth. We did a simple experiment using nlist=1 on the GIST dataset: top-10 retrieval took less than twice the time compared to using nlist=4096. This is because only the quantized vectors (with 32x compression) need to be inserted into the posting list, and only the quantized-vector distance computations grow; those account for a small share of the total time. Most of the time is spent on re-ranking with full-precision vectors. Let's say the breakdown is approximately 20% for quantized-vector computations and 80% for full-precision computations. Then even if the time for quantized-vector computations triples, the overall increase in query time is only about 40%.
If the data distribution shifts, the optimal solution would be to rebuild the index. We believe that HNSW also experiences challenges with data distribution to some extent. However, without rebuilding, our observations suggest that users are more likely to experience slightly longer query times rather than a significant loss in recall.
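For anyone who wants to check the arithmetic above, here is the back-of-the-envelope calculation; the 20/80 split is our rough estimate, not a measured constant.

```python
# Back-of-the-envelope check of the numbers above.
quantized_share = 0.20       # share of query time spent scanning quantized vectors
full_precision_share = 0.80  # share spent re-ranking with full-precision vectors
slowdown = 3                 # assume the quantized scan takes 3x as long

new_total = quantized_share * slowdown + full_precision_share
print(f"overall query time: {new_total:.1f}x baseline (+{new_total - 1:.0%})")
# -> overall query time: 1.4x baseline (+40%)
```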
I don't see how this won't eventually converge towards some kind of AI-optimized SEO; presumably there's some algorithm which surfaces sponsored follow-up questions for a given query?
It isn't really possible to link taxation to spending in this way, because there are things that cost a lot of money (say, healthcare for poor folks) which are in the public interest but have no direct tax to pay for them; meanwhile there are things that raise lots of revenue (say, payroll taxes) which have no corresponding outlay.
Hey there! Will, author here. Thanks for the question!
On batch processing: one thing we've found is that customers often use tools such as Halcyon's to do research or build models; they basically want a matrix of questions (each with a fixed structured output format) and filters, so we've spent a bunch of effort enabling folks to easily produce these types of "spreadsheets".
On targeted notifications: Halcyon is constantly ingesting new data from a huge number of sources, so we can use our query pipeline to figure out which new documents are important to pay attention to and what they mean in context. For example (and trying not to delve too deeply into the world of energy), if Tesla files a comment on the rate case of a major California utility, we can figure out (a) that Tesla is an important party; (b) what Tesla's arguments are; and (c) how those arguments relate to the (potentially thousands of) filings that make up the rate case to date. We can summarize all that and notify folks who are interested in California rate cases.
Hey! Will, author here. I appreciate your feedback and we've increased the line-height. Hopefully that's a little more readable. We'll also look into the sibling comment on Lockdown Mode and see if there's a way we can fix that too!
This is actually really cool, and despite what I'm sure will come off as (constructive) criticism, I am very impressed!
First, I think you oversell the overhead of keeping data in sync and the costs of not doing so in a timely manner. Almost any distributed system that is using multiple databases already needs to have a strategy for dealing with inconsistent data. As far as this problem goes, inconsistent embeddings are a pretty minor issue given that (1) most embedding-based workflows don't do a lot of updating/deletion; and (2) the sheer volume of embeddings from only a small corpus of data means that in practice you're unlikely to notice consistency issues. In most cases you can get away with doing much less than is described in this post. That being said, I want to emphasize that I still think not having to worry about syncing data is indeed cool.
Second, IME the most significant drawback to putting your embeddings in a Postgres database with all your other data is that the workload looks so different. To take one example, HNSW indices using pgvector consume a ton of resources - even a small index of tens of millions of embeddings may be hundreds of gigabytes on disk and requires very aggressive vacuuming to perform optimally. It's very easy to run into resource contention issues when you effectively have an index that will consume all the available system resources. The canonical solution is to move your data into another database, but then you've recreated the consistency problem that your solution purports to solve.
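To make the resource point concrete, here is roughly what building such an index involves with pgvector; the table and column names are hypothetical, and the memory setting is illustrative rather than a recommendation.

```python
# Rough sketch (hypothetical table/column names) of building a pgvector HNSW
# index and the main knob behind its resource appetite. Requires psycopg 3.
import psycopg

# autocommit so CREATE INDEX CONCURRENTLY can run outside a transaction block
with psycopg.connect("postgresql://localhost/app", autocommit=True) as conn:
    # The build is memory-hungry: if the graph outgrows maintenance_work_mem,
    # pgvector falls back to a much slower disk-assisted build.
    conn.execute("SET maintenance_work_mem = '8GB'")
    conn.execute("""
        CREATE INDEX CONCURRENTLY items_embedding_hnsw
        ON items USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64)
    """)
```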
Third, a question: how does this interact with filtering? Can you take advantage of partial indices on the underlying data? Are some of the limitations in pgvector's HNSW implementation (as far as filtering goes) still present?
Post co-author here. Really appreciate the feedback.
Your point about HNSW being resource intensive is one we've heard. Our team actually built another extension called pgvectorscale [1], which helps scale vector search on Postgres with a new index type (StreamingDiskANN). It has BQ out of the box and can also store vectors on disk rather than only in memory.
Another practice I've seen work well is for teams to use a read replica to serve application queries and reduce load on the primary database.
To answer your third question, if you combine Pgai Vectorizer with pgvectorscale, the limitations around filtered search in pgvector HNSW are actually no longer present. Pgvectorscale implements streaming filtering, ensuring more accurate filtered search with Postgres. See [2] for details.
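For anyone curious, here is a minimal sketch of swapping in a StreamingDiskANN index with pgvectorscale; the table and column names are hypothetical, and the pgvectorscale docs cover the full set of index options.

```python
# Sketch: create a pgvectorscale StreamingDiskANN index instead of HNSW.
# Assumes the extension is available; table/column names are made up.
import psycopg

with psycopg.connect("postgresql://localhost/app", autocommit=True) as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE")
    conn.execute("""
        CREATE INDEX items_embedding_diskann
        ON items USING diskann (embedding vector_cosine_ops)
    """)
```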
Thanks for your answer. I hear you on using a read replica to serve embedding-based queries, but I worry there are lots of cases where that breaks down in practice: presumably you still need to do a bunch of IO on the primary to support insertion, and presumably reconstituting an index (e.g. to test out new hyperparameters) isn't cheap; at least you can offload onto the follower the memory cost of reading big chunks of the graph, though.
Cool to see the pgvectorscale stuff; it sounds like the approach for filtering is not dissimilar to the direction that the pgvector team are taking with 0.8.0, although the much-denser graph (relative to HNSW) may mean the approach works even better in practice?
So… maybe 15 or 20 years ago I had set up MySQL servers such that some replicas had different indexes. MySQL only had what we would now call logical replication.
So after setting up replication and getting it going, I would alter the tables to add indexes useful for special purposes, including full text, which I did not have built on the master or the other replicas.
I imagine, but can not confirm, that you could do something similar with PostgreSQL today.
Yeah, logical replication is supported on PostgreSQL today and would support adding indices to a replica. I am not sure if that works in this case, though, because what's described here isn't just an index.
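For the plain "extra index on a replica" case, here is a rough sketch of the moving parts on PostgreSQL; connection strings, table, and column names are placeholders.

```python
# Sketch: PostgreSQL logical replication with an index that exists only on the
# subscriber. Hosts, credentials, and table/column names are placeholders.
import psycopg

with psycopg.connect("postgresql://primary/app", autocommit=True) as primary:
    primary.execute("CREATE PUBLICATION app_pub FOR TABLE items")

with psycopg.connect("postgresql://replica/app", autocommit=True) as replica:
    # The subscriber needs the same table definition, but not the same indexes.
    replica.execute("""
        CREATE SUBSCRIPTION app_sub
        CONNECTION 'host=primary dbname=app'
        PUBLICATION app_pub
    """)
    # Special-purpose index that lives only on the replica, e.g. full-text search.
    replica.execute("""
        CREATE INDEX items_body_fts
        ON items USING gin (to_tsvector('english', body))
    """)
```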
Oh wow lol, is it not? That would be pretty disappointing. It was really frustrating how many pages of crap I had to navigate my way through, basically deciphering all possible permutations of "cancel", "continue to cancel", "yes I'm sure", all surrounded by extraneous crap I'm not interested in. How difficult the cancellation process is gives me even more motivation to refine my alternatives to their service, so I never have to subject myself to that crap again.
The author offers no evidence for the claim that API management and security solutions are needlessly complex in order to create more business for themselves. I think it's much more likely that API management and security software has grown to address the more complex needs of the APIs they serve. It isn't 2010 anymore - handing out plaintext API keys that never expire isn't good enough for many products, and features like RBAC and IAM have become more necessary as more people use APIs to do more stuff.
Now let me go remind myself how OAuth works again...
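Since the core of it is small, here is a rough sketch of the OAuth 2.0 client-credentials exchange with placeholder URLs and credentials; the short-lived bearer token is what replaces a long-lived plaintext API key.

```python
# Sketch of the OAuth 2.0 client-credentials flow (placeholder endpoints/creds).
import requests

token_resp = requests.post(
    "https://auth.example.com/oauth/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "my-client-id",
        "client_secret": "my-client-secret",
        "scope": "read:widgets",
    },
)
access_token = token_resp.json()["access_token"]  # short-lived, unlike an API key

api_resp = requests.get(
    "https://api.example.com/widgets",
    headers={"Authorization": f"Bearer {access_token}"},
)
print(api_resp.status_code)
```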
Yes, scaling vertically is much easier than scaling horizontally and dealing with replicas, caching, etc. But that certainly has limits and shouldn’t be taken as gospel, and is also way more expensive when you’re starting to deal with terabytes of RAM.
I also find it very difficult to trust your advice when you’re telling folks to stick Postgres on a VPS - for almost any real organization using a managed database will pay for itself many times over, especially at the start.
Looking at Hetzner benchmarks, I would say a VPS is quite enough to handle Postgres for the Alexa Top 1000. Once you get into the top 100, you will need more RAM than what is offered.
But my point is you won't ever hit this type of traffic. You don't even need Kafka to handle streams of logs from a fleet of generators in the wild. Postgres just works.
In general, the problem with modern backend architectural thinking is that it treats the database as some unreliable bottleneck, but that is an old-fashioned belief.
The vast majority of HN users and startups are not going to be servicing more than 1 million transactions per second. Even a medium-sized VPS from Digital Ocean running Postgres can handle that load just fine.
Postgres is very fast and efficient, and you don't need to build your architecture around problems you won't ever hit and prepay a premium for that <0.1% peak that happens so infrequently (unless you are a bank and receive fines for it).
I work at a startup that is less than 1 year old and we have indices that are in the hundreds of gigabytes. It is not as uncommon as you think. Scaling vertically is extremely expensive, especially if one doesn’t take your (misguided) suggestion to run Postgres on a VPS rather than using a managed solution like most do.