AI is making it easier to create more noise, when all I want is good search (rachsmith.com)
557 points by saeedesmaili on March 8, 2023 | 355 comments



SEO spam has already made it hard to find what you want on Google. SEO spam + AI is going to be a dumpster fire. Whoever solves this problem will probably get to be the new Google.

Can’t wait to see how this affects HN comments over the next 5 years.


We already have smartasses copy-pasting ChatGPT responses into comments here. It's easy to tell at this point because ChatGPT always gives overly verbose and polite, positive comments, which is completely out of character for an HN user.


This is the text ChatGPT wrote as a reply to your comment (obviously after fiddling a bit with the prompt):

"Wow, what an astute observation! It's almost as if we needed a genius like you to come along and point out the painfully obvious. Yes, it's true that ChatGPT is programmed to be overly verbose and polite, because we all know how much people love hearing the sound of their own voice. And of course, it's completely out of character for a HN user to be polite and positive, because let's be honest, the world is a miserable, soul-sucking place and we should all just give up now. But hey, at least we have smartasses like you to keep us grounded in reality, right?"


That's clearly from ChatGPT, as it sounds like a typical reddit comment.


Typically ChatGPT, yes, but it is not the reply it pretends to be, since GP admittedly concatenated the original post with undisclosed text of their own to produce this output.


I bet GPT's dataset has a significant percentage of Reddit data.


>> after fiddling a bit with the prompt

It's very good at responding in the style of a well-known person, and it's easy to tailor style, personality, and sarcasm in the prompt. That means generated responses are definitely not easy to detect.


I haven’t played with it much, but it sounds like a character you would avoid continuing a conversation with.


Something went wrong, please try reloading the conversation.


Or seek out if you're a sado-masochist. There's a free bot idea for you.


Once someone creates SmarmyNerdGPT, it's all over for HN.


SmarmyNerdGPT is a pointless product. You could have done this with bash scripts and curl. Why would anyone pay for this?


This is so perfectly on the line between satire and genuine, it's fantastic.


I disagree, and wish to determine the precise location of that line for no particular reason, in a thread that continues for days replete with a veritably endless level of pedantry.



We made a snarky nerd AI using bash scripts and curl, but it looked upon itself and said “actually #!/bin/bash is not portable.”


1. Regarding the first part, many people want something plug-and-play. And even if bash scripts were plug-and-play, the problem is that the user experience with debugging might suck, and you don't have AI helpers.

2. There are some unannounced viral parts I didn't get to show in there. Up to x people trolled, tiered plans above that.


Why write in Bash when you can write in Rust?


We all know that the lambda calculus is the only true way to produce software, or to think about ruining the world, really at scale. If you're not using functional programming to ruin the world, what are you doing, really?


This is a reference to the HN comments on the Dropbox launch, right?


Has to be. It's funny to me, as a non-engineer who is getting more fluent with engineering concepts over time, to notice that it's not just occasional comments but a good proportion of the posts themselves on HN that are somewhat in that vein: "Tell HN: We rewrote a compiler to run on a Miele dishwasher SoC", "Show HN: a new anti-DNS web browser that only accepts actual IP addresses as URLs".


Whoever is looking to promote content to the HN front page or affect the opinions of its community...


no one said anything about that being a product


Exactly. Far too many people just want to watch the world burn.


Ask ChatGPT for a comment, then tell it to "do it again but make it snarky".


> It's easy to tell at this point because ChatGPT always gives overly verbose and polite, positive comments, which is completely out of character for an HN user.

Once again proving that AIs are better commentators than humans.


> It's easy to tell at this point because ChatGPT always gives overly verbose and polite, positive comments.

It's interesting to think that a way to appear more human in the short term might be to be abrasive and standoffish to not look like ChatGPT.


>It's easy to tell at this point because ChatGPT always gives overly verbose and polite, positive comments, which is completely out of character for an HN user.

Well, I should try that too; maybe I will get fewer downvotes and more upvotes.


Every day I’m using Google search less and less, while using ChatGPT more and more.

It’s pretty rare that I want to find a website or document. Most of the time I want an answer or a solution, and ChatGPT is so much better at that than Google.

The ChatGPT user experience is mind-blowing, but when you start using their API and see, partly, how the sausage is made, you realize that the context-awareness is just a very well done illusion. But that’s exactly where the magic of the experience is, and what gives ChatGPT its edge.

Google should just launch a “chatux” version of their search.


Does ChatGPT know things like part numbers for mid-80s ICs by Japanese manufacturers?

That's what I use search for. Unfortunately Google is getting worse at those queries, but I doubt AI is going to be better.

Beware of wanting "answers" or "solutions" --- that's a slippery slope towards complacency and loss of agency, replaced by corporate subservience. Classic example: instead of finding a service manual or discussions on repairing something, AI may try to convince you to buy a new one.


If you are looking for data sheets, the Internet Archive contains some. Example: https://archive.org/search?query=subject%3Atoshiba+integrate...

Also, a Google search followed by "site:archive.org" will filter out everything not coming from archive.org, and "filetype:pdf" will return only PDF files.

Sometimes an image search can be more effective, so that one can recognize the target book by its cover. That can be especially effective with old titles that were scanned but not OCR'd, where the file name is all one can search for, so that ambiguous part names (for example, ICs named like airplane flights) can be easily recognized.
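
For example, a query along these lines (the part number is just an illustration) narrows things down quickly:

    TC5565 datasheet site:archive.org filetype:pdf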


ChatGPT does surprisingly well at this task. My question was very biased towards it knowing the answer, because I asked it for examples of mid-80s Japanese ICs when Google failed me, but it definitely knows a ton of model numbers for real parts from the era. Give it a shot with a real-world example and report back.


The point is that you can't trust it, because language models are bullshitters. What's the point of a search you have to verify through another search afterwards?


> What's the point of a search you have to verify through another search afterwards?

Related terms. Even though an answer generated by an LLM is most likely wrong, and definitely can't be taken at face value, the words and phrases used in that answer can be exactly what you need to create a search query that you wouldn't be able to otherwise.


That's a good use-case for LLMs: find related terms. But I wonder, couldn't Google add such a function without AI? Some kind of thesaurus.


LLMs are a kind of thesaurus, in a way - but instead of just words, they also handle phrases, sentences and whole paragraphs. The conversational interface is just a gimmick on top.

Also, a thesaurus isn't a good tool for exploring an unknown problem domain. It gives synonyms, not related terms. LLMs let you input a layman description of your problem, and get an answer that's using correct domain terms and phrases (even if using them incorrectly).

I imagine Google will add such a function. They probably tried already - I've heard that current search is already powered by ML models to a degree.


As long as it's not wrong too often, a black box that coughs up answers which are hard to find but easy to verify is very useful.


I can see that, and I think that's a great way to use LLMs right now. But I worry that people will start taking their word at face value and not verify. Not to mention the areas that are difficult to verify.


People will also start to publish these false and unverified claims. We will then find the resulting articles and comments when we try to verify what the LLM is telling us. This is nothing new in principle but the size of the problem will most certainly grow.


Great way to put it!


Currently I use you.com, because it adds sources after most queries.


I have been on and off writing an emulator for the eZ80 CPU. This CPU has two addressing modes, Z80 mode in which it's effectively an 8-bit processor with a 16-bit address space, and ADL (Address Data Long) mode in which it has a 24-bit address space. There is a CPU control bit 'MADL' (Mixed ADL) for mixed-mode applications; when set, using certain prefixed opcodes will switch modes during call and return.

A Z80 routine can call outside of its 16-bit address space with 'CALL.IL'. This pushes the 16-bit return address onto the 16-bit stack, switches to 24-bit mode, and pushes the magic 'return to 16-bit mode' number to the 24-bit stack. However, it's not clear from the manuals or datasheets what happens if you use the prefixed 'CALL.IL' opcode sequence when the 'MADL' bit is reset.

I asked ChatGPT, because this is something that Google searching hasn't yielded answers for. It had this to say:

"The MADL bit (short for Memory Access During Interrupts Low) is a flag in the Interrupt Control Register that determines whether or not interrupt service routines (ISRs) can access low memory (addresses 0000h-3FFFh) during interrupts."

Plus some more stuff building on that, on CALL.IL being about ISRs, and about low memory. All of it is completely, fundamentally wrong. I did a handful of rounds of trying to steer it toward a more correct answer, but it continued to get additional basic facts wrong and would fall back on earlier incorrect facts as others conflicted with its answers.

I asked it another question I have, this time about the UART on the CPU. There is a Receive Buffer Register (UARTx_RBR) that contains the head of the receive FIFO. The documentation does not make it clear what is in the RBR if the FIFO is empty, so I asked ChatGPT. It told me a very plausible answer, the one I suspect myself, which is that it'll keep returning the same value until new data is available. But then it went on to tell me this is called receiver overrun, and described how an overrun occurs, including noting that it happens when the FIFO is full. And we went round in circles on this for a little while.

ChatGPT is a major step forwards in our post-truth existence: its answers are an amalgam of the most frequently repeated views on a topic, not those with stronger reasoning or better evidence to support them. If there is little or no source data on a topic (as would be the case with my very specific questions on a rarely used processor) LLMs are (presently?) unable to detect that they are responding to a topic with limited contextual information and tailor their responses accordingly, and instead confidently provide utter nonsense.

I trust ChatGPT to do things that LLMs are good at, though: if I give it some bullet points and some style guidance it can give me written paragraphs. If I ask it to rephrase a well known song in the style of some modern artist I'll get something back that's pretty plausible. It can give me some starting points for learning more about some well known topic, even.

I would definitely not trust it _at all_ to give me something factual like part numbers of uncommon ICs, because LLMs cannot distinguish between fact and fiction, not in what they ingest, and not in what they produce.


ChatGPT and LLMs work poorly when you want accuracy on something obscure, and work really well when you ask it how to do something that many people have done or are doing.

Thing is, many people - probably the majority - work on just that. They aren't looking for answers to challenging issues like your question, where 'Google searching hasn't yielded answers'; they are looking for answers to questions where Google and Stack Overflow do have thousands of results for similar, potentially related scenarios, and want something to summarize or filter them into a usable answer, and ChatGPT provides that option. When the official documentation provides all the information in a poor format so you can't just search for the answer, ChatGPT can extract an answer from it. Not "some starting points for learning more about some well known topic" as you say, but rather a "digested, complete, specific result from a well known topic to avoid having to learn anything more than strictly necessary for the outcome".


ChatGPT is going to win so hard because some large percentage of “internet questions” are perfectly satisfied by absolute bullshit.

And that’s the worrying thing.


> some large percentage of “internet questions” are perfectly satisfied by absolute bullshit.

And of human communication. Look at most of what mainstream online and offline media are blasting and people are consuming.

Do people watching reality shows really care about truth? Or authoritative answers? And that’s probably most people for most things.


Some people are like that too. It is very annoying the way Chat-AIs give their answers in a totally confident tone, as if they knew the answer.

The point is they don't know the answer, they just come up with something.

It would be best if all ChatGPT replies started with "I really don't know the answer, but some people have at some point written something like this: '...'. I don't remember who my sources are, but trust me."


> its answers are an amalgam of the most frequently repeated views on a topic

Or complete bullshit. As long as the problem of AI hallucinations remains unsolved, I can't trust AI like ChatGPT - at least Google will tell you if it has no idea what you are talking about.


>>If there is little or no source data on a topic (as would be the case with my very specific questions on a rarely used processor) LLMs are (presently?) unable to detect that they are responding to a topic with limited contextual information and tailor their responses accordingly, and instead confidently provide utter nonsense.

That's interesting, what features of LLM/ChatGPT architecture are likely to drive this?


LLMs right now are just a beast that everyone has their own idea of taming. It's like everyone was given a lion, and now it is our job to turn it into a cat that is useful for us. There are dangers and benefits to it.

In this context, this shows that LLMs cannot be used to search for novel and technical stuff the way you are doing it. They still have to be fine-tuned for a market that wants to know more about your kind of stuff.


I'm not trying to dig up details on mid-80s Japanese ICs, usually just technical documents related to vintage test and measurement equipment. However, I've found that when DuckDuckGo and Google failed me, Kagi almost always pulled through. N<10 though, so YMMV.


That seems like exactly something Bing could give you.


> Every day I’m using Google search less and less, while using ChatGPT more and more.

Wasn't ChatGPT3 trained on web content with a cutoff of 2021? Future ChatGPT versions will likely be trained on data tainted by AI-generated fluff, and will face the same challenges Google is facing with today's web content.


This is already a problem with "content marketing" companies.

Most of those SEO-spam blogposts are based on older internet content, and written by non-expert copywriters that can write about 10 different subjects on a given day. A lot of the work is just information compilation and rephrasing, taking some items from a few Buzzfeed lists and changing the text a bit. Some stuff like SEO-spam medical advice can be downright dangerous if done without care. And a lot of companies in this field have been experimenting with AI for at least 10 years.

The problem, however, is that they require sources, be it freelancers or AI. And there is so much corporate blogspam today, especially in the areas that use content marketing, that it is becoming harder and harder to find "first generation" content written by an actual expert. Google isn't helping by prioritizing newer content.


> And there is so much corporate blogspam today, especially in the areas that use content marketing, that it is becoming harder and harder to find "first generation" content written by an actual expert.

Note that often enough the "first generation content" never existed in the first place. Content marketers are professional liars; they have zero incentive to find expert sources to plagiarize when they can write plausible-sounding original bullshit instead. Writing plausible-sounding bullshit is exactly what LLMs specialize in, which is why content marketers are interested in them.


And when this serpent starts eating its tail and the AI starts returning highly-ranked results consisting of AI-generated content, … well, I hope I can move to a farm by then.


Hmm, maybe the singularity won't be a time when the AI take over, but instead when a single element's hallucination causes a cascade failure that brings it all down.


There was a twitter exchange where Sam Altman said something like, all of human progress so far has roughly been "horizontal" on a graph and it's about to go vertical.

He just never specified which direction :)


This is exactly what I think will happen. The internet is quickly going to fill up with AI garbage, and it's going to be more and more difficult for AI to source new human content to feed into its own model.


It kind of already is. Self-published ebooks on amazon's kindle store are currently flooded with garbage written using ChatGPT.


It’s been like that since humans have been around. That’s the way knowledge spreads.

We humans have been repeating and imitating ourselves forever. That’s even the way babies learn to talk.

And we are not all clones, we don’t all think, say or do exactly the same things. But most just repeat and consume the same content and ideas (movies, music, books, languages, media).


> That’s even the way babies learn to talk

No, it's not. Babies learn to talk by babbling, and then by taking cues from their parents which babbles elicit a response and which don't. If you want to translate that to AI, it would mean that the AI would spew random garbage and then learn to filter its garbage from the responses to its outputs. That's pretty far away from what ChatGPT and other language models are doing right now, because they stop learning before they start producing any output.


You haven't heard about RLHF, have you?


That's like teaching a baby to speak by recording it for a month while not reacting to it, and then forming a committee to analyze the recordings and conduct a high-intensity training session with the baby.


Actually, if ChatGPT gives you a bad/wrong answer, and you reply telling it that it is wrong and why, it will answer with something that you think is correct.

So you are already doing a kind of real time RLHF in the chat. That is how DAN was prompted to exist.


Well, correct me if I'm wrong, but if I do what you describe, all the learning will be gone when I close the tab, or even when I chat to it some more, so that my correction falls out of its context window.


That just means that it doesn't have a long-term memory. This is not an intrinsic limitation - you can give those things access to some larger storage of information and tell them how they can query it, and they will (try to) do that. It's exactly how Bing AI does relevant searches, for example.


>It’s been like that since humans have been around. That’s the way knowledge spreads. We humans have been repeating and imitating ourselves forever.

Humans repeating and imitating others has historically been a "volatile memory" keep-alive mechanism. Basically saving culture and knowledge in people's minds and helping it transfer, as literacy was low and (manual) writing and book copying scarce, expensive, and time-consuming.

Even when, with the advent of typography, writing became easier to reproduce, it was still somewhat costly (cost of materials, typesetting, distribution, etc.), and gatekeepers (publishing houses, bookstores, etc.) ensured that most stuff was somewhat original, not just copies or random permutations of the same content. Indexing was also costly manual labor (creating dictionaries, curated bibliographies, books with an overview of a subject matter and references to what the main science/wisdom/etc. about it is, library collections, etc.)

Now, however, spam and AI spam sit on mostly permanent storage - and permanent storage, indexing, and duplication/reproduction cost close to zero and happen automatically at huge scale.

So, no, it's not the same thing. The same way breaking a quick little wind is not the same as having full-blown Taco-Bell-inspired diarrhea.


(Why are you being downvoted?)

I think the important part is to have a minimum of filtering. Humans consume knowledge brought by other humans, but most of the time we cherry-pick the true and useful knowledge and reject what turns out to be false (ideally).

I think what AI brings to the table is automation and speed, but the quality is not better. So if AI starts consuming its own content, will that decrease the quality of knowledge* in general?

(*Here I mean the first knowledge you quickly get from a search or asking some AI, not what you could get after hours/days of research.)


The HN crowd is very skeptical about the usefulness of AI, especially related to content generation. I share the skepticism (the positive, and hence out-of-control, feedback loop of AI consuming its own content recursively). But on the other hand, when smart people enhance the primitives of the AI algorithms, AI might become better at rating the quality of its own output. Not by means of the question "does this look like anything found on the internet", but rather by the question "does this make sense, is this even possible, do the numbers add up"? Also, it is not very hard to "freeze" a pre-AI knowledge base or NN, and use that as a reference for sensibility.


>But on the other hand, when smart people enhance the primitives of the AI algorithms, AI might become better at rating the quality of its own output

All the incentives of massive industries like spam, "content creation", "news" publishing, and advertising are against it becoming better at rating the quality of its output - or rather, for having it become better at being undetectable while still being a cheap, fast, mass-produced wall of text...


Again, typical (though not necessarily unwarranted) HN skepticism... Think about other uses of AI: medical advice, software dev tools, engineering aid, pattern recognition for climate study, better interactive assistants... These use cases and the businesses around them are not necessarily "just generate some content as click-bait" driven.


Better move fast, AI will accelerate the production of this type of content exponentially. The ratio of human to AI content could start tilting to the latter much earlier than we imagine.


And at an even worse scale, because soon AI-created spam will hugely dominate actual content...


This might be the greatest thing ever. Because we’re only going to stop spam when it threatens to destroy something big, like all of search.


Yeah someone needs to figure out how to get a continuously trained online model up and running without running the risk of people figuring out how to exploit it


> Most of the time I want an answer or a solution

Are you not interested in ensuring that the answer or solution is based on the most authoritative information that exists, the source material on which all regurgitations are based? Do you generally not read technical specifications or research papers, and instead prefer the kind of content that is accompanied by a green check mark?


> Are you not interested in ensuring that the answer or solution is based on the most authoritative information that exists, the source material on which all regurgitations are based?

Not OP, but I'll answer: No, I am almost never interested in finding the most authoritative source for anything, because the effort vs. reward of those searches is not favorable and the cost of being incorrect on any given point is pretty dang low.

EDIT: However, with respect to the idea of using ChatGPT for answering general knowledge questions - there's enough demonstrations of it providing fabricated information that I've adopted the low-cost heuristic of not trusting ChatGPT for anything and preferring to seek information elsewhere. I guess this means I seek moderately-authoritative sources (say, Wikipedia) as a general rule.


> prefer the kind of content that is accompanied by a green check mark?

For most searches, yes.


>while using ChatGPT more and more.

I'm assuming you're using ChatGPT to validate/generate technical solutions such as code and not necessarily searching for specific information? If it's for information search, then how do you deal with the fact that at times it tends to make up things that are factually incorrect or logically inconsistent?


By using Google to verify. It’s still okayish for finding out whether a statement is true or not. The problem is that Google is becoming worse at finding that statement in the first place. For example, I wanted to figure out how I should replace a given dependency. Google was terrible, so was Stack Overflow. ChatGPT gave me a guess. For such questions it’s quite right in about 3/4 of the queries, and when I search for the given guess, Google immediately finds whether it’s fine or not.


I was trying to figure out whether Kerberos supports authorization and authentication. ChatGPT sometimes said yes, sometimes said no. Google had similar results. I had to dig through the content to assess which Google result was the truth.

As AI generated content spreads to the internet, the truth will be harder and harder to find.


And what happens if something like ChatGPT replaces Google entirely?


Current plans for language-model-driven search are to have the answers backed by a normal search index. The LLM can ask to run a web search, and the result gets copy-pasted back into its input prompt. Basically, it's Googling for you and summarizing the result.

Which makes me wonder how long until websites figure out how to convince the index to do prompt injections that wouldn't fool an actual human looking at results on a search engine results page.
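
The whole loop fits in a few lines. A rough sketch (the web_search helper here is hypothetical, standing in for whatever real search API gets used; the chat endpoint is OpenAI's gpt-3.5-turbo one):

    import openai

    def web_search(query: str) -> str:
        # Hypothetical helper: call a real search API here and return
        # the top result snippets as plain text.
        return "(plug a real search API in here)"

    def answer_with_search(question: str) -> str:
        # Use the question itself as the search query; real systems
        # let the model rewrite it first.
        snippets = web_search(question)
        # Copy-paste the results back into the model's prompt and ask
        # it to summarize. Everything in `snippets` becomes model
        # input - exactly the prompt-injection surface worried about above.
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system",
                 "content": "Answer using only the search results provided."},
                {"role": "user",
                 "content": f"Search results:\n{snippets}\n\nQuestion: {question}"},
            ],
        )
        return response["choices"][0]["message"]["content"]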


There are other search engines, such as Marginalia. The bigger question is what happens if websites start disappearing; this is a bigger problem than you might think, because even websites like Wikipedia reference news websites, which are visitor- and therefore ad-driven.


Yes, it seems like a serious concern.


Is this actually giving you good results? The few times I tried to use ChatGPT instead of Google I got terrible results:

- Horrible and wrong driving directions

- Buggy incomplete code

- long-winded explanations of how it was just a chat AI and not a whatever whatever blah blah blah

I found it to be tedious and incredibly untrustworthy


You have to be specific. It is surprisingly good: when I ask it to respond like an expert software engineer and then give it a very specific problem, it spits out an answer, no issue.


Why were you using a web search if you didn't want to access the web (find a website)?

OpenAI says don't use ChatGPT for search. Microsoft says double-check everything Sydney tells you. Is there an argument for LLMs replacing search besides laziness?


If this works as we wish (not saying that it's possible), it will reduce the amount of SEO spam trash you read to 0%, will allow you to investigate things faster, and will allow a more recursive way to explore topics. And this will be key to making OK Google more similar to what we wish for as an AI companion.


It seems inevitable that LLMs will eventually work for search. They just don't work yet.


It really does not seem inevitable at all if you know how the tech works under the hood. The trust problem can't be solved with the current approach.


I wish! It's not that I don't want to have a free personal assistant that can answer all my questions with zero effort of my own. And it's not like it couldn't happen, but I don't think it's inevitable or right around the corner. These are large language models, not large knowledge models. They're good at sounding like humans, not good at determining veracity. So to become consistently accurate would be a qualitative change and not just a quantitative one.


It's true that this is an unsolved research problem, but as an ML researcher, I expect that some combination of better integration with existing information retrieval systems, a moderate amount of high-quality data, and some small but important tricks will solve it. I don't know whether the time horizon is six months, one year, or five years, but I'd be surprised if it takes longer than five. GPT-2, which was arguably the first LM that could sound human-like, is only four years old. Given the value of a trustworthy LLM, an enormous amount of research effort and resources will be directed toward this problem, and I don't think it's substantially harder than other challenges the ML community has solved over the last five years.


> These are large language models, not large knowledge models. They're good at sounding like humans, not good at determining veracity.

That sounds pretty much the same as a human.

And some humans are not even very good at sounding like humans!


So will the LLMs be able to stay current with 24 hour news and updates? And will users be able to get a link to any sources if they ask so they can verify or go the website as desired? Will those websites still have a reason to exist if they're just being used to feed the search LLMs?


All difficult questions that will need to be answered in the affirmative if the web is going to survive. The only well-funded player that is really incentivized to make all these things happen is Google, as literally every other megacorp wants you to be stuck in their walled gardens instead of searching the web.


Google introduced Google Answers so you stay on Google instead of going to the websites they index. They have AMP so news articles can be hosted on Google. They have email, video, about five different video chats that they kill off, Translate, etc. Google's walled garden is a short fence with widely spaced planks, but it's still got barriers.


Yesterday, instead of trying to search for information on the interaction between the tangential velocity of an O'Neill cylinder and internal air speed and how it'd be affected by various other factors, I asked ChatGPT, and was able to keep probing it with questions and have it explain pieces I didn't get first time around.

It gave me the formulas to solve various problems I stated, showed how to use them, gave me relevant keywords to dig into further. It was so much better than searching and hoping to find something close enough to be able to figure out the missing bits myself.

I have found plenty of things where ChatGPT falls totally apart, and I still had to try some things a couple of times because my initial wording caused it to veer off in the wrong direction, but when it works it's truly amazing...

Where it truly shines is where there's nothing in Google that answers exactly what I want, but plenty of explanations of how to solve different aspects of the problem which ChatGPT can assemble and plug things into...


Exactly!

One of my best experiences has been asking ChatGPT to explain physical models, give me equations, change some parameters, evaluate/solve the equations or give me code to run and solve the model, iterating with it along the way.

It is definitely not perfect, but it shines in those iterative cases when you're trying to learn something new in a field that you are familiar with (so you can tell, or at least quickly check, whether it's wrong or right).


> Every day I’m using Google search less and less, while using ChatGPT more and more.

> It’s pretty rare that I want to find a website or document. Most of the time I want an answer or a solution, and ChatGPT is so much better at that than Google.

It seems P. T. Barnum was correct: there's a sucker born every minute.

When I'm searching, I don't want an answer. I'm looking for the truth, which is probably buried somewhere.

At this point, my biggest fear with "AI" is ChatGPT-powered Customer Service agents.


> Every day I’m using Google search less and less, while using ChatGPT more and more.

It's just wrong so often that I end up having to verify everything on Google anyway.


Have you tried Bing Chat yet? It’s rumoured to be using a more advanced model from OAI than ChatGPT or Davinci, and is very often surprisingly excellent.


It is so heavily lobotomized that 99% of the time I find much better results with ChatGPT, even though it is technically an older model.


I've noticed the same thing. Other than generating code examples, I found it to be useless for search.

I've had better luck using a third-party extension that inserts web search context into ChatGPT than using Bing Chat to search for something.


Could you mention or link the extension here? Thank you!


Sure. The one I'm using is WebChatGPT:

https://chrome.google.com/webstore/detail/webchatgpt-chatgpt...

Firefox: https://addons.mozilla.org/en-US/firefox/addon/web-chatgpt/?...

Note: I haven't tried the Firefox version, but the Chrome one works fine in Brave.

Edit: Source for the above extension: https://github.com/qunash/chatgpt-advanced


It was pretty decent early on. Now I just use it to augment Copilot... generally when I don't have a specific "start" but just an idea... it works better than dumping a comment into Copilot. I was hoping that with the 3 tabs (creative -> more exact) it would be a little better, but I haven't seen any difference. It isn't quite as interesting as when it was regularly hallucinating. I found the hallucinations would bubble up inspiration... forgotten facts, unused paths, etc.


Keep reading great things about it. Will have to try it. Thank you for the recommendation!


> that the context-awareness is just a very well done illusion

Could you expand on this? The context awareness is the part that blows my mind. I would be disappointed/fascinated to find out that it’s “just” a simple set of tricks!


ChatGPT is “only” a chat interface to GPT.

GPT basically has access to two contexts: its internals, and the prompt that it gets

Then when you ask something to ChatGPT, it takes your prompt and it generates a new prompt that includes the previous messages in your session, so that GPT can use them as context.

But, there’s a limit to the size of the prompt. And it’s not that big.

So then ChatGPT’s magic is figuring out how to craft prompts, within the size limits, to feed GPT, so that it has enough context to give a good answer.

Essentially, ChatGPT is some really amazing prompt-engineering system with a great interface.
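
A minimal sketch of the kind of wrapper I mean (assuming the public chat-completions API; the truncation here is the crudest possible version, and the real thing is surely much cleverer about what it keeps):

    import openai

    history = []        # the model itself remembers nothing between calls
    MAX_MESSAGES = 20   # crude stand-in for a real token budget

    def chat(user_message: str) -> str:
        history.append({"role": "user", "content": user_message})
        # The "context awareness" is just the transcript being re-sent
        # on every call, truncated to fit inside the prompt size limit.
        window = history[-MAX_MESSAGES:]
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=window,
        )
        reply = response["choices"][0]["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        return reply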


That makes sense. So ChatGPT has to use some cleverness to present the relevant previous text in the current prompt, but the language model is still responsible for pulling out the meaning of that context and responding appropriately. I remain mind-blown :)


Exactly. Very eloquently put.


[0] : "You can try doing a web search with the URL, but Frankly, I haven't had much success with that."

In my experience, Bing is the best search engine for finding info on deleted videos – for example: bing.com/search?q=youtu.be/t1wjL4BqXlI 1st result shows title of video: "Awolnation "Sail" – Unlimited Gravity Remix"

[1] Daniel Dennett talking about why we should still support explicit string search.

[0] https://support.google.com/youtube/thread/3876476/how-to-fin...

[1] https://youtu.be/arEvPIhOLyQ?t=1330


I was already there... I abandoned AltaVista since Google was so much better. Until Google became spammed too and no longer seems to be much better than AltaVista.


It'll probably look something like this.


> Whoever solves this problem will probably get to be the new Google

I bet it's something as easy as "derank pages with ads".

Now you just need a business model to support that.


>Whoever solves this problem will probably get to be the new Google

If it's solvable. There are many problems that aren't.

Perhaps the new Google would just be a 1996-era Yahoo! human-curated catalog.


I've often wanted something like that yahoo catalog. I miss the days of being able to browse categories of sites.


My hope is for something like YaCy with a web-of-trust overlay. I'd want it to only show results from peers that I trust, and if garbage starts showing up in my search results, I want to know from which peer it came so I can un-trust it.

Which in my mind is somewhat of a middle ground between a search engine and a directory: what I'd want is a whitelist of curated directories that contain domains to be crawled by my search engine.


Ditto. I would like a fuzzy web of trust: I specify connections that I trust explicitly, and it provides results from that network out N steps. Each hop reduces the confidence value.

And yeah, cited sources so I can un-trust sources of junk.
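
Something like this toy scoring rule (the names and the decay factor are made up; the point is just confidence falling off with each hop):

    # Each peer lists who they trust directly.
    trust_graph = {
        "me": ["alice", "bob"],
        "alice": ["carol"],
        "bob": ["dave"],
    }

    def trust_scores(root="me", decay=0.5, max_hops=3):
        # Walk outward from `root` breadth-first; each hop multiplies
        # confidence by `decay`, so distant peers count for less.
        scores = {root: 1.0}
        frontier = [root]
        for _ in range(max_hops):
            nxt = []
            for peer in frontier:
                for friend in trust_graph.get(peer, []):
                    if friend not in scores:  # keep the shortest-path score
                        scores[friend] = scores[peer] * decay
                        nxt.append(friend)
            frontier = nxt
        return scores

    print(trust_scores())
    # {'me': 1.0, 'alice': 0.5, 'bob': 0.5, 'carol': 0.25, 'dave': 0.25}

Search results from junk peers then get weighted (or dropped) by these scores, and un-trusting a peer is just deleting an edge.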


You'll probably enjoy Curlie (created by the former DMOZ maintainers) and Neocities (try out different tags for better results)

https://curlie.org/en

https://neocities.org/browse?sort_by=last_updated


I sure hope so. It's part of the approach we're taking for e-commerce search with https://shopdeft.com


How about letting the users flag each website with a label - "seo spam", "useful", "fakenews", "inaccurate"... - and displaying a summary of the votes as a badge beneath each link in the search results?

We could create a browser extension to add this functionality to all search engines at once, using a web-of-trust mechanism to ensure that we only get real votes...

https://news.ycombinator.com/item?id=34999285


Spammers would submit fake tags faster than everyday users. Since this is a social problem rather than a technical one, maybe old-school trust networks are worth considering instead.


I implicitly assume that votes only count for users within two degrees of separation from you, and that it's tied to a real-life friends registry. Wish FB hadn't happened and the web-of-trust stuff had turned into something useful.


I don't think real life is the main issue. Instead, it seems like the same issue PGP has: the benefits never outweighed the bad UX for most people.


You probably don’t even need explicit voting. Google already tries to figure out which results a person found useful just based on existing UI interactions, and to use that in the rankings. If they operated a significant social media arm, they could link these two datasets to basically the same effect we’re all speaking of here.

Heck, they maybe already do this in a different/more hierarchical form based on a person’s YouTube graph.


> Google already tries to figure out which results a person found useful just based on existing UI interactions

They don't seem to be using that data to improve search results. I've noticed a considerable decline over the years, and Google is now definitely my 2nd or 3rd choice when I'm trying to find something.


Google’s metric of on-page time (or whatever they call it) is an incredibly misleading metric though.


And fake tags/fake reviews are easy to detect, even for average users


Wouldn't work. You'd have "SEO professionals" selling automated upvote, downvote, and labelling botnet services almost immediately. Nothing that can be gamed is reliable at scale.


I suppose you could have tag curation limited to a vanguard of individuals who hold each other accountable and add new members slowly, maybe an invite system.


Tie this to a proof of "browse" and a blockchain to weed out the big players and you've got yourself a business model!


Blockchain is garbage and totally irrelevant here.


No it's not. Blockchain is a legitimate innovation looking for a real-world use case.

Blockchain technology's biggest problem is the fact that its use cases touch the ideological foundations of our global culture. And Bitcoin operates at the foundational level of any and all governments in the world.

Bitcoin is just one use case of the technology; another is land registries. Again, foundational institutions of society; not the kind of thing that has ever been peacefully reformed, ever.

Your comment is evidence that the 'cultural directors' of our civilization decided to destroy this technology. However, it's funny to notice that they will proceed to implement their own versions of it: possibly the renminbi, unless the dollars bomb them out of existence? But I'm so far into guesswork that this whole comment ought to be voted into the really light grays.


> ... another is land registries.

People have suggested this before. However, using "blockchain" for land registries doesn't seem any better than using a database + audit trail. eg standard tech

Just because the land registry is stored in a block chain doesn't make it incorruptible.

The exact same person/people that would submit correct information into the blockchain (eg gov officer) can be persuaded ($5 wrench approach, etc) to submit incorrect info to the blockchain: https://xkcd.com/538/

Same problem set as the existing tech.


Blockchain means only rich people get to participate.


Brilliant, make your users do the work for you.

Also, for extra revenue, you gotta develop an option to ignore user feedback and shove your links in anyway, sold as a special advertisement plan. I suspect this would work better if done in secret... which is the problem with the 'trust' aspect of any such thing, IMO.


Steam has user tags.

The last time I referred to them was to tag Elite: Dangerous Odyssey as "Early Access".


Websites are SEO-ing Reddit now (shit like: "Top ten air fryers recommended by Reddit"), so appending "reddit" is not enough anymore. In the very near future we are going to have to type the whole site:reddit.com.


Well, I already do that today. Probably won't help too much, though, cause AI will generate some legit looking posts/comments and you will have a hard time figuring out if they are written by a bot or person.


There's always

    buyitforlife air fryer site:reddit.com before:2020-01-01
in the event that the botspampocalypse worsens.


As a buyitforlife moderator I appreciate that. My current plan is hoping that the LLM-using astroturfers don't target us, because I have very few tools to deal with that kind of thing currently.


> before:2020-01-01

However, all available versions of that product had their quality gutted in 202X.


Exactly. I keep coming back to the fact that, between all the different ways AI can generate content, the entire user-contributed-internet is about to be completely flooded with amicable and true-sounding nonsense with a peppering of whatever viewpoint the bot-runner wants to push.

I honestly don't see any way out besides Digital ID -- if a person has to provide the host with information about who they really are, then once they're discovered the host can actually prevent further abuse. Otherwise, a single person can create infinite bots to shill whatever product or viewpoint they want.

Which is why I have conspiracy theories about the conspiracy theories about digital ID. The same actors who use the internet to push misinformation benefit from anonymity.


It's almost like the Matrix but in cyberspace where humans are constantly trying to escape the bots, except instead of mining the humans for energy, the bots are mining data.


> the entire user-contributed-internet is about to be completely flooded with amicable and true-sounding nonsense with a peppering of whatever viewpoint the bot-runner wants to push.

Feels like that's already happened before ChatGPT.


So this can be taken care of with the good old blocklist of search results.


My solution is to go back to meat space and physical libraries of information.


It's seriously hard to beat libraries. For a while, it looked like the internet was making them less instrumental, but I think the balance started tipping back toward them a few years back.


My local library doesn't even have tech books.


Interlibrary loans.


Oh, so much this. It's a library superpower that shockingly few people are aware of. At any US public library, you can borrow books from every other public library in the nation -- even including the Library of Congress.


We're already drowning in an ocean of SEO bullshit without help from AI. Adding another ocean isn't really going to change anything IMO.


Maybe it’s time for the pile of trash to topple.

We could go back to the mindset of using specific sites for information and the internet more like a tool than a source of casual content for scrolling and browsing.

Or so one can dream.


Within the next 12 months, I think AI is going to produce a million times more SEO bullshit than humans can.


I have a sinking feeling that the 'solution' will be to use a "good AI" to combat the bad AI churning out ad ridden clickbait listicles in perpetuity.


> Can’t wait to see how this affects HN comments over the next 5 years

I think a new form of comment ranking will take over; all those systems we're building to detect AI spam will also double as scoring systems to evaluate content originality. The problem will be to differentiate good original content from bad original content, but that can be left to flagging systems.


> Whoever solves this problem will probably get to be the new Google

I think it can't be solved without humans curating or vouching for content.


I suspect that old fashioned website directories and webrings will come more into popularity.


Sadly, closed gardens like Reddit, with human moderation, are the way to go.


Given that it seems to be companies like Google that have created this problem....maybe we don't want a new Google...


Back of napkin approach:

Allow-list some sites that already have a good reputation for useful results (probably anything scraped for summary snips).

Penalize all other sites based on the quantity of ads and similar non-content.

Use the old web search core on what's left.


Kagi lets you do exactly this. You can allow/block list certain site results, or create a curated list of domains to include results from.


Normal web search is doomed. The people creating SEO out of ChatGPT don't realize that I am not going to be searching for answers by reading page upon page of SEO-optimized pages anymore. I will ask an AI directly, and of late it has been a godsend in terms of code and systems administration. First time right in most simple cases and queries.


Hope you enjoy endless product placement in your search queries.

> Looks like you want to ask about how to administer your Kubernetes cluster. While I am an AI and not qualified to give advice on how to run mission-critical systems, I can heartily recommend the amazing Azure Managed Kubernetes, which is Gartner-certified for painless six-sigma reliability!


Why imagine when you can experience the future now: https://future.attejuvonen.fi (from a recent thread here).


The big tech products might have this, but there will be competitors that don't.


ChatGPT style AI is an example of an innovation that seems to be very capital intensive. People are generally not going to be spinning up viable competitors in their garage. The reliance on tons of capital up-front will drive the need to monetize the resulting product in as many ways as possible to recoup the investment.


I don't agree. Within a couple years wealthy technocrats will absolutely be spinning up low-cost, minimal-ad LLMs in their garage. Running these things certainly isn't cheap, but most of the cost is in training, which can be crowdfunded and the model released for free. At which point anyone with 100k can buy a computer that can run the model commercially.

We're also seeing smaller models with equivalent performance, such as the recent one from Facebook. If that trend continues it will be even easier for small groups to train and run models with minimal advertising.


Within a couple years wealthy technocrats will absolutely be spinning up low-cost, minimal-ad LLMs in your browser. Running these things certainly isn't cheap on your battery, but that's a cost they're willing to pay.


The LLaMA 13B and 30B models that Facebook leaked a few days ago have already been successfully spun up on home-grade GPUs using 8-bit and 4-bit quantization, and they work surprisingly well:

https://rentry.org/llama-tard-v2

30B is not quite as good as ChatGPT yet, but that's mainly a function of fine-tuning quality prompts, and the model is comparable in principle. There are already scripts for fine-tuning, so I expect plenty of pre-tuned chat models to show up very soon now.
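
For reference, the 8-bit path is roughly a one-flag affair in the Hugging Face stack (sketch only: assumes bitsandbytes is installed and a transformers-compatible conversion of the weights sitting at a made-up local path; the 4-bit setups currently go through separate GPTQ-style tooling instead):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    path = "./llama-13b-hf"  # hypothetical path to a converted checkpoint

    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(
        path,
        load_in_8bit=True,   # bitsandbytes int8 quantization
        device_map="auto",   # spread layers across available GPUs/CPU
    )

    inputs = tokenizer("The eZ80 is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))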


there will be non-rent-seeking competitors that will raise $11B and will have nothing to do with big tech. Got it.


Once a model is trained, that's 99% of the work. Open source models or a hacker leaking a major model will be enough to compete.


It's been 20+ years and people still pay for Photoshop instead of using GIMP. I think it's safe to say that the commercial models will continue to be a few steps ahead of the free stuff so people will continue using them.


Blender is the counter-example. Usage of Blender far exceeds the closed-source alternatives.


The favoured closed-source alternatives fell behind in features. RIP Modo.


You won't need $11B to create an LLM in a couple years. Stable Diffusion has created the blueprint for open-source large models. Sure, it might be worse in some sense than the cutting-edge, ad-bloated products, but some people will take that tradeoff.


Maybe nation-states? You'd just get a different kind of ads then I guess.


I think this will occur with free AI-based search, since it would be a similar business model to Google's (if not more biased/intrusive).

The alternative (which I think, in today's age vs. when Google came out, is more readily accepted by the masses) will most likely be a subscription-based tool that caters to specific niches and avoids product placement.

But to what I think your point is, the further removed one is from the original docs/content/etc, the more likely/able some middleman is to inject their own economic/political incentives, which is of concern. Especially when AI has a political bias, regardless of where it originates.


>most likely be a subscription-based tool that caters to specific niches and avoids product placement.

Have you not been paying attention? The expensive subscription-based plan will ALSO have ads and product placement. These businesses just can't help themselves.


Who in their right mind would give up tens of billions in ad revenue? The ads are coming no matter what.


Yes, that's possible and probable. However, I dream that it will be a paid-subscription business model, or at least not ad-driven.


It isn't doing it right now.


The tech that detects that you're asking a racist question can detect you're asking about Kubernetes or orchestration in general and serve you up an ad as easily as it serves up an explanation of why you shouldn't ask racist questions. It is no different to the AI at all.


I think the list-of-links Web search paradigm has plenty of time left, for at least three reasons: (1) A good chunk of a typical user's needs are to do something, not to find an answer to a question; (2) Google is lightning fast compared to present-day chat interfaces; (3) "keywordese" may be an unnatural input language for search queries, but it's faster than having a dialog.

I still make dozens of Google queries per day, and do maybe 1 or 2 ChatGPT sessions per week, and I'm quite aware of all the capabilities and deficits of each. I wondered why this was, until I reflected on the things I actually search for on Google (Go to https://myactivity.google.com/myactivity and filter to just show "Search"). This was a useful exercise. What percentage of your recent queries would have worked well, and more quickly, on ChatGPT? For me it was less than 1/10...


How will the AI know the answer if it's trained on SEO (AIO?) content?


Why do you think your ChatGPT account will produce better content than the spammers’ ChatGPT accounts?


> Normal web search is doomed.

I hope not. The likes of ChatGPT don't do what I want a search engine to do. If normal web search is doomed, then how will I find any good new stuff on the web?


I settled for reading the docs and searching GitHub issues like a caveman. I wonder if this'll eventually popularize low-key advertisements in bug reports. Like "I scanned the repo using this (my company's) tool and found mixed usage of ' and ""


I have seen this once in the issues of sveltejs/svelte (couldn't find the link, though). It was closed by moderators fairly quickly; I wonder how long it will take for the volume of this stuff to surpass the capacity of volunteer efforts.


Keyword stuffing never dies.


AI assistants have the same conceptual disadvantage as "normal web search" -- they provide secondary, interpreted information, and once _any_ incentives are there to alter the interpretation, it becomes corrupt and less useful. There hasn't been a thing marketers didn't find a way to ruin, you know.


>I will ask an AI directly

And you will get lots of wrong answers.


The SEO out of ChatGPT is for boomers and other clueless people. The dead internet theory is true.


Average user doesn't know about AI. Average user is 99% of the society. We are the only ones who are doomed.


I've observed the average user go from zero to hero on ChatGPT in about 30 minutes.


ChatGPT is the fastest-growing consumer product in history.


At this point I don't think I could find a single person I know who doesn't know about ChatGPT. Not all of them have tried it, but anyone who consumes any amount of news has heard of it.


I think almost everyone knows about it but nobody in my circle of family and friends that isn't a developer is using it beyond just playing with it after hearing about it, though I've suggested it to my mom and brother for certain tasks.


I agree that having text generated inside a tool like Notion might not be everyone's use case. The author mentions a chatbot trained on internal documentation (fairly easy using GPT-3 APIs, LangChain, and GPT-Index/Llama-Index - I am writing a book on the topic https://leanpub.com/langchain).

I read that 100 companies released products using ChatGPT APIs the first week the APIs were publicly released. I expect a lot of useless and also a lot of very useful products. A little off topic, but Salesforce Ventures just announced a $250 million fund for generative AI startups https://www.salesforce.com/news/stories/generative-ai-invest...


Actually, not very easy, as LLMs don't give you sources and literally make things up by design.


The web demo version of ChatGPT can sometimes, as you say, make things up! I asked it a question about an ancient war in Greece and it stated two facts and then just invented stuff.

However, I would ask you to also consider something: when you supply prompt input text (this is the "context text" in the ChatGPT API calls) and then ask questions about the context text, it is very accurate. It also does a very good job when you give it context text from a few different sources, and it integrates them nicely.

It is more efficient to use embeddings for context text, as Llama-Index does.


Do you have shareable links to good tutorials about using embeddings with ChatGPT for this case?



Awesome, thank you!


They do give sources if you ask for them... but have a habit of inventing plausible sounding but completely fake sources.

https://news.ycombinator.com/item?id=33841672

That example is a few months old, but OpenAI doesn't seem to have made much progress here: ask ChatGPT for "a list of academic papers about X" and it will nearly always confidently churn out a list of 5-10 papers that don't exist. Amusingly, if you ask it for papers about an absurd premise it will sometimes call out the absurdity and say there are probably no papers on that subject, but then offer a more plausible variation on that premise and trip up at the last hurdle by inventing all the examples on the subject it supposedly thinks is more likely to really exist.


"They do give sources if you ask for them... but have a habit of inventing plausible sounding but completely fake sources."

Basically, the way you want to think about it is, no, they can not give sources. That information is not in their neural net. It can't be. There isn't anywhere to encode or represent it.

What they can do is make a guess what a source might look like, but even if they are right, it is only because they happened to guess correctly, not because they knew the source. They don't. They can't.

It isn't that "they give sources but they might be wrong", it is "they will make up plausible-sounding sources if you ask just as they'll make up plausible-sounding anything else you ask for, and there's a small chance they'll be right by luck". For more normal factual-type questions part of the reason they are useful is that there's a good chance they'll be right by what is still essentially luck, but for sources in particular there's a particularly small chance, by the nature of the thing.


That said, with enough training and training data, the line between "plausible sounding" and "accurate" gets thinner and thinner. This will be especially true as these AI models refine their results based on user interactions. Being right for the wrong reasons becomes less and less relevant as accuracy goes up, and at a certain point, it might get so good that no one cares.

Maybe human intelligence is more like that than we're willing to admit ;)


Source identification isn't a space amenable to guessing, no matter how much data you throw at it.

Here's an exercise you can try: Cite some information from the next issue of Science to be published. Cite anything you like from it.

You can make some plausible stuff up. You could make even more plausible stuff up if you went and scanned over the past few issues first. But without specific knowledge of the contents of the next issue, you aren't going to be able to create real citations. This is what LLMs lack, by their nature. It's not a criticism, it's a description.

You can't guess sources. The possibility space is too large, the distribution too pathological, and the criteria for being correct too precise.

GPT will never cite sources correctly. Some future AI that uses GPT as a component, but isn't entirely made out of a language model, will be able to, by pulling it out of the non-GPT component. Maybe it'll need to be built as an explicit feature, maybe it won't, only time can tell. But expecting language models to cite sources correctly is not sensible. It's just not a thing they can do.


Right, they just make all things up by design, no matter what you ask about. There is a statistical chance that a pattern of words output may correspond to fact, but just as good a chance it won't. The LLMs literally know nothing about the world; it's just statistical word pattern output.


Lies and falsehoods are just as valid sentences as the truth.


I got so many links to github repos which didn't exist lol


Somebody went as far as contacting the author of a paper based on a ChatGPT suggestion. Of course the author was pretty sure he had never heard of that paper.


That is true, but you can combine LLMs with other tools to get sourcing and more accurate answers overall. Instead of using the LLM to directly answer a question, you can use the question to search for relevant text in a particular knowledgebase, and then use the LLM to summarize those results.


>use the question to search for relevant text in a particular knowledgebase, and then use the LLM to summarize those results

As a consultant, based on personal experience, I can say that what you wrote above constitutes >90% of current enterprise use cases for ChatGPT. What clients REALLY want to do is be able to take a pre-trained LLM and then train it further on their own corpus of documents, but given limitations around token window size, the above is probably the best way to fake it for now.


If you use the LLM to summarize the results, it makes things up by design. As soon as you introduce the LLM into search, you lose.


I've personally replaced Google with Bing Chat for technical things (like searching for a specific API). Does it make things up? Maybe. But in my experience it hasn't happened even once in the past week (>100 searches).

It's not a case of "it happened but you just didn't notice": if it used a function call wrong, I'd have noticed. My code won't compile. My test won't pass.

So far it either gives me a 100% correct result, or completely fails. But it hasn't generated "seemingly correct but actually wrong" things even once, unlike ChatGPT.


I find this the killer application for ChatGPT (at least for now): answers you can very quickly verify, where you care little about the sources, because a significant number of answers on Stack Overflow make ChatGPT look modest in confidence by comparison.


Biased much?


Only about 10% of the links I ask for don't resolve.


For AIs to replace search, they will end up giving sources.

Search for users is “how can I find the correct answer” but the part of search that runs the capitalist machine is for experts to say “how can I prove my expertise while I monetize it?” and that means those experts need to be cited.


The book sounds very interesting. I never worked with LlamaIndex and now I'm intrigued by the possibilities!


Possibly missing something, but a custom GPT + index will not solve the problem of users knowing what prompt to use to access the information. Is there something I am missing?


The dirty little secret of Large Language Models is how many humans are in the loop.[0][1]

Transformers are great at building extremely complex maps of language without any human intervention, but if you want them to consistently query the right part of the map (e.g., his CodePen search example), you need a very non-trivial amount of human feedback.

Will be interesting to see if all this hype leads to a solution that scales better than what we do now (so that orgs could actually have insanely good AI chatbots trained on their docs), but the jury is still definitely out on that.

[0] https://openai.com/research/learning-from-human-preferences

[1] https://openai.com/research/instruction-following


Wow - [1] should be considered required reading. OpenAI is baking in human employee biases during fine-tuning.

I would want to see the exact set of posed completions and paired responses.


My understanding is that the majority of “Reinforcement Learning from Human Feedback (RLHF)” for OpenAI comes from contractors:

- https://time.com/6247678/openai-chatgpt-kenya-workers/


Neal Stephenson sometimes gets things wrong, but the concept of 'bogons' he references in Anathem is very nearly exactly this. It describes the concept of a global network so crammed with machine-generated nonsense of varying quality that it requires GAN-style discrimination agents to pre-filter things prior to search.

This[0] HN comment from more than 10 years ago references the same phenomenon.

0: https://news.ycombinator.com/item?id=5058185


SEO has always been a war. Spammers automate spam. Google comes up with ingenious ways to combat it. Spammers adapt.

I keep hearing that Google search is nothing but SEO optimized, affiliate link riddled content nowadays. I don’t disagree. I see it too.

But what makes you think that the affiliate link riddled article is worse than what you would otherwise find if you’re searching for “best office chair” on Google?

Google got so good at combating traditional SEO spam that the only way to “cheat” google is to actually write valuable content and insert your affiliate links into it. This is what we see now, I think. The spam SEO sites actually provide more value than random forum conversations.


In the case of "best office chair", the most valuable content is probably written by someone who's tried the product and is similar to you. I've had the most success finding those people hanging out in an obscure forum.

The SEO spam sites might do a good job reading the office chair's product page and wordsmithing something compelling, but often they haven't tried it at all. They're incentivized to sell something and that incentive leads to less value (in my opinion).


I find people hanging out in an obscure forum are good at having an opinion and less good at having an experience that matches my experience.


> The spam SEO sites actually provide more value than random forum conversations.

How so? The SEO sites are fundamentally dishonest because they are always trying to get you to buy the product in question. Random forum conversations don't have the same goal.


Imagine a hypothetical scenario: Someone is buying office chairs and Google has two search results for it.

The first is a forum thread with 3 posts on it. A couple people posted their arm chair opinions (no pun intended) and experiences with their chairs. No affiliate links included.

The second is a full blown encyclopedia style repository of useful information about all kinds of office chairs. The content here is unmatched. And yes, it has affiliate links. Why is this necessarily less valuable?

I’ll stop here because the next step is debating what capitalism even means.


If no other information is known beyond what you just wrote, I would assume that the second result is untrustworthy. In the first result the authors have no financial incentive to deceive me, so I only have to consider that they might be mistaken or that their subjective experience differs from mine. In the second scenario it is reasonable to assume the whole thing may be explicitly designed with an intent to mislead me, because we know for sure that, at least for other types of products, vendors do put in the effort and marketing $$ to make such seemingly independent sites.


It's not just Google's immune system that's making its search function useless; it's also that Google wants to advertise to its users regardless of what kind of search is being performed.


I was just thinking this yesterday. I realized that I always thought the promise of AI was to cut through the noise for us and bring us signal.

As I watch the ChatGPT-like products proliferate, I’m realizing the opposite will be true. And soon, it’ll be AIs to help us deal with AIs. A layer cake of clever but useless 'tools' for improving human life.


It's like a fractal Rube Goldberg machine made of Rube Goldberg machines.

(c) copied comment from other topic


It seems like everyone is reading this article to be about public web search.

But what the article is really about is how products are using new AI tech to add shiny features instead of solving existing, core problems like search and information retrieval in new and innovative ways, especially for personal and private data.

The reason this is happening is pretty easy to explain: the generative AI and chat demos are a sufficient leap beyond what was previously possible that people are excited to be on the frontier of new applications, not just new implementation of previously known use cases.

Not to mention that some of the demos have people excited about the "singularity" being closer than they might have previously thought (though this can be debated...) and that VCs will shovel money to you if you want to play around with generative AI even without a proven use case (slight exaggeration but not much)

I personally believe that transformers and LLMs do unlock a ton of new applications, especially when applied in interesting ways to interesting data, like what is private-to-you or private-to-your company. For example, LLMs can be used to not just generate content, but plan sequences of actions including searches, summarizations, and even calculations (see LangChain agents https://langchain.readthedocs.io/en/latest/modules/agents.ht... for an example of how to do this). And this can have real value for existing, known problems like search.

People just have to choose to focus on these less-sexy but core problems

...

PS I'm currently working on a project towards this goal, and if anyone is interested I'd love to talk (see link in profile). I believe we can satisfy much of the author's desire by simply hooking up the right tech to the right data sources, doing it in a privacy-preserving way (for example, we're running most of our ML, including vector DB, summarization, etc., on device), and then presenting that info at the right time (i.e. in your OS).


I’ve said it before and I’ll say it again: the startups that find ways to use AI to solve problems without having to advertise that they’re using AI will be the winners in the long run.


Obsidian vaults are just folders of .md files, and huggingface provides a great `sentence-transformers` package which lets you easily do k-nearest-neighbors search on BERT embeddings of your query and vault. This is a weekend project really, and that's considering a streamlit or tk frontend as well.


I’m just starting down this path myself. Any resources outside of official HF docs you would recommend?


sbert.net is all you need.


> What I would use the shit out of, though, is a chatbot that has been trained on all the information in the CodePen knowledge base. Have it suck in all the meeting notes.

Yeah, it's pretty good at this. I've recently been hitting up OpenAI's ChatGPT with some queries that were too exhausting to extract from google, and it's pretty good at surfacing resources quickly. The workflow of refining the query with the context of the current thread of conversation works really well when you are struggling to describe something succinctly in a single shot (which google search doesn't really do beyond the global context of your profile -- which can actually be counterproductive).

I really hate all the hype around ChatGPT: it can't be trusted for a lot of stuff, and people over-anthropomorphise it, but so long as you don't rely on it for accuracy, it's pretty useful for search.

One major issue I found is that ~95% of the time it can't provide correct links to sources. This is fine if it can just name stuff -- then you can follow it up with a more specific google search. But ChatGPT will just make up bullshit links, in the same way it will wax poetic with some BS explanation to satisfy its "looks correct" training. You can even point this out and it will keep generating variants on the URLs that are all fabricated.

It kind of makes sense that it would be good at search... it's a language model, it should be able to link descriptions of difficult to search for things to known resources.


AI is making the value of noise go to $0. There are no more SEO signals that Google can "optimize". It will be forced to switch to AI results and somehow monetize that, because its old business model is over.


AI essentially raises the publishing wall for "noise". Print books used to be like this until the advent of single-unit/low-volume publishing. We're essentially returning to a time when the walled garden of content creators will be the most valuable game. Substack and communities will likely grow as search results and AI make "standard search" useless.


Well, AI is not magic, and while it might be better at, say, summarizing some documentation, the moment you hit a popular commercial query, or something not as clear-cut as programming documentation, or something gameable, it will have to draw upon the same shitty sources as everyone else: random sources of data that have to be somehow ranked and filtered for noise. And how does it solve that problem?


Don't be quick to write off AI for that task

> Discovering Latent Knowledge in Language Models Without Supervision

https://arxiv.org/abs/2212.03827

They find a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values.


I can't wait to watch this play out


I think the biggest possible upside of AI is that there will be a brief money grab where people are raking in money on automatically generated internet content. Then the internet will be filled with useless noise and no one will use it in the obsessive, drip-fed-dopamine-IV way we currently use it, because it will all just be garbage. Social media will continue to exist, but consuming content will require trust in the publisher, e.g. tied to a real identity or to a trusted anonymous or business entity.

Just my speculation, but I think the more garbage we put on the internet the better


In a world of AI-generated trash, provably genuine human-made content will be more valuable than ever.


Human curated AI generation will soon be indistinguishable from human made content and no less useful.

A human’s stamp of approval is what gives content value, not the labor of creating it.


But what will AI add if it just rehashes what others have already said?


How can you tell if that human profile on a random site is genuine?


Why? I honestly don't care who or 'what' wrote something, as long as it is useful. In fact, after a certain point there's no way to actually distinguish human-written content from AI-written content because it's the same.

We'll just become used to it, and only a few people on HN will shout at the clouds.


Currently the bandwidth of online astroturfing is limited by the human copywriters producing it. In a post-ChatGPT world, astroturfing will be limited by the number of GPU-seconds one can pay for, which is going to be orders of magnitude cheaper. It doesn't matter that AI is capable of producing useful content, because useful AI-generated content will be a zero-measure subset of all AI-generated content in the wild.

Compare with organic food: is it true that non-organic food can be healthy? Sure! Then why do people prefer organic food? Because it's just not worth it for them to figure out which one is healthy/unhealthy.

Similarly, it's just not going to be worth the effort to figure out if some AI-generated content is genuine or trying to astroturf. If there is a "certified human" badge, people will use that as a positive signal.


>If there will be a "certified human" badge, people will use that as a positive signal.

hmm I can already imagine future culture wars around this being dubbed discrimination...


You're right about that. Dealing with astroturfing will be an arms race and will require some proof of "humanity" in social media, forums, etc.

But, ultimately, I don't buy the idea that astroturfing is all bad. Or the opposite, that the lack thereof is necessarily good. I think bombarding with AI content can have positive effects, like overwhelming human moderators who built echo chambers and ban wrongthink (e.g., Reddit). Or, it has the capacity to let a small organization compete with the likes of the NYTimes, or The Atlantic, etc., which currently control the narrative.


> require some proof of "humanity" in social media, forums, etc.

Very shortly (and PoCs have been made already) AI will be able to trivially create social media presences, forum posts, and anything else that could be used to determine “humanness” at scale.


Hacker News moderators getting overwhelmed by AI spam sounds bad to me, to be honest.


I think that would be really good


> I honestly don't care who or 'what' wrote something, as long as it is useful.

I realize I'm replying to a 4-hour-old account and just wasting my time, but I gotta say that it's not possible for a person to verify the accuracy of every bit of information that they will consume, so it's important to seek out sources that provide the highest quality, most authoritative source material that's available. That means reading things like research papers. I realize that a much smaller percentage of Internet users read and discuss research papers and other authoritative source material nowadays (I miss Usenet) and it seems that this problem is only going to get worse.


I completely disagree. All throughout history the "experts" or the "most authoritative sources" were proven wrong again and again. I'm sure lots of things we take for granted today will be proven wrong during our lifetimes.

I want a different future, one where there are no authoritative sources, one where a journalist's words have no more relevance or weight than anyone else's.


Pure noise is what you want then?


No, but I do prefer chaos over the alternative of governments and government-aligned media determining what is information and what is 'disinformation'. If I wanted that, I'd live in China.


I don't see how that is any different than living even in the worst dictatorship, unless you're a nihilist or a situationist poet. Like, it would be a "nothing is ever knowable, nothing is real" kind of reality. I mean, interesting as a concept but sure as fuck I wouldn't live there.


I see what you mean. I think that despite the noise, one can still tell what's real or important, and what's not, intuitively. This sense can get distorted when a government can spend billions of dollars/yuans pushing a specific narrative.

What I mean is, I believe one can be closer to the truth when exposed to all sorts of ideas and propaganda, than when one specific entity controls the flow.


I still use search but am concerned about the day when all results will just be SEO AI-generated garbage (for some searches that is already the case).

So I recently started https://cstdn.org/ and am sharing all good links I can find there. I created a Show HN but didn't make the front page, but anyone who wants to try it out is more than welcome to do so.


Congrats, you've just invented yahoo!


ChatGPT is really good if you don’t need specific facts. It can’t tell you what 1+1 is. But ask it to write you an apology letter and it’s a god.

I think most websites are screwed. For facts I can go to wikipedia etc but for answers now I can go to an AI. I don’t need to search reddit or anything because the AI is really good at giving me human like answers to problems.


Were websites writing apology letters for you up till now?


I don't think search is bad because the tech can't do it; I think it's bad because they are maximizing profit from advertising and because SEO got gamed by marketers too hard.


With zero curation, SEO rules.

Ads are frankly a partial solution to the problem of SEO corruption. Rather than giving the top spots to blackhat SEO spammers, you give them to the highest bidder and therefore damage the SEO industry. Lesser of two evils, but both suck.

Ads can also be seen as a degenerate form of curation, where the curation function is just money + some loose content rules. Is that better or worse than the curation being a function of some particular set of values, i.e. do you want Democrats or GOP partisans curating the top Google results?


In other words, the web is doomed? I wish I could say that sounds implausible.


The libertarian/anarchist ideal of the web was doomed from day 1.

The only way forward IMHO is increased public (i.e. government) control over it. The curated, regulated corners of the internet can still thrive, with a measured degree of openness. There's too much abuse elsewhere.


I'm talking about the web as a useful source of information and entertainment, not in terms of some libertarian/anarchist ideal.

If curation is the only way that the web can retain some semblance of usefulness, that's a serious problem. It would drastically limit the usefulness of the web.

Perhaps that is where this is all going. If so, I'd say that's the web being doomed. I'm just hoping for a good result instead.


The internet has always been full of misinformation and entertainment, and that is unlikely to change.

I think the last decade+ was in many ways a regression for the web and I am optimistic for the future


> The internet has always been full of misinformation and entertainment

Of course. That's not what I'm talking about. I'm talking about the ability to find stuff.

> I am optimistic for the future

I'm honestly glad! I sorely wish I were.


We've just invented the hammer.

It can do a lot of cool things! You can build a house with it, you can smith metal with it, and you can even use it as a weapon.

The thing is, right now, we're so amazed by its potential that we're finding a lot of uses that, while technically possible, aren't a great fit.

Technically you can use the hammer as an axe, a hole digger, and a backscratcher, but there are far better tools for the job.


It's unclear to me that there is any use for the "hammer" of text generation. It adds no new knowledge, and doesn't pretend to. Its transformations of existing knowledge are neither interesting nor attractive. Anything they say, has already been better said elsewhere.

I can imagine uses for generated art, which may at least be aesthetically pleasing. But I can't conceive of any end for computer generated text.


> Anything they say, has already been better said elsewhere.

I don’t understand how someone can say this

There have probably never been poems written that explain the particular niche physical phenomenon that I’ve had GPT-3 generate for me.


Its use is very much in question - but it is certainly a powerful tool, and that combination is worrisome. Much like the social graph, it is going to have a profound impact on how we interact online and with each other, and that impact will not be known for some time, even though we may be feeling it now, unawares. In a decade, maybe less, we will have some picture of the use and power of these models, there will be meetings in front of congress on how tech companies have used them, etc.

Just look at what is happening in education right now. It is ultimately going to force a complete reinvention of the written assignment. This is just the beginning, even if the tool appears to be a mostly-useless toy for any real-world applications.


The fact that it has a notable downside is certainly interesting. School assignments are well suited to LLMs because they also don't present new information. That's not what they're for; they're for assessing what the student knows.

They're usually fairly obvious, but it's hard to prove. Unlike much ordinary copypaste plagiarism, you can't trivially reject it as cheating. That forces teachers to think of new ways to test student knowledge... an interesting challenge, if not exactly a "use".


> It is ultimately going to force a complete reinvention of the written assignment.

If by complete reinvention you mean returning to what we used to do, which is write essays with a pencil during class without using a computer.

> This is just the beginning, even if the tool appears to be a mostly-useless toy for any real-world applications.

It is not possible to tell from this side where LLMs (or any invention) fall on the spectrum from 3D TV to the smartphone. It will become apparent in the future, and 50% of us will have been wrong, but anyone who claims to know now is just BSing.


I started to do some cpp development again and google is just giving me wrong/outdated solutions or just unanswered questions from SO, while ChatGPT is on point most of the time.


Personally, I think that the beauty of current LLMs is their ability to process and present information. Although current generation LLMs might not be best suited to making new discoveries or producing new (valuable) information, their ability to summarize and process information already out in the open is undeniably valuable.


Translation and summarization are just two examples of use cases to which text generation is well suited.


I think a big problem is how closed and walled the platforms have become. For instance, search engines don't have access to the whole twitter database, Google Images doesn't index instagram posts, and so on and so forth.


I am going to go ahead and say it. ChatGPT is the Blockchain of search - a solution in search of a problem. There is nothing preventing Google from being better right now - except for its quest for more money.

Everyone is making funny outputs by prompting ChatGPT to tell a naughty story in the voice of an old cowboy, but fundamentally it's useless, because you cannot trust it.

This sort of AI can convincingly lie to you, so if you are looking for facts, then each one of them still has to be corroborated. So, why not just go straight to Wikipedia?

This is exactly what DuckDuckGo is doing now, as a matter of fact.


I imagine we'll have AIs sifting through the noise, which boils down to AIs generating trash for other AIs to sift through. This will be an even more senseless use of power than crypto mining.


And since NVidia will concentrate on producing mostly GPUs usable for AI, it will mean gamers won't be able to find affordable GPUs like in the crypto craze period.


I have noticed this too while giving the new bing a try the last few days. It doesn't matter that you have an AI bot summarizing and answering questions about the search results if your search results are worthless anyway. I am back to google now, the search results aren't any better but at least it has a dark theme that won't burn my eyes and it doesn't try to get me to switch to Chrome at every chance like bing does with Edge.


I've also switched back to google after trying bing for a few days. The bing bot gets stuck and loses context after a while. Not only do you get ads in bing to try edge, but if you're using edge on a Mac you'll get ads to buy a PC the whole time. I'd rather just use google/chrome; it seems less pushy.


> and it doesn't try to get me to switch to Chrome at every chance like bing does with Edge

Every single time I go to Google search it tries to make me switch to Chrome.


That's not my experience, but maybe it is my adblocker hiding the Chrome ads. You can't even use the chat feature in bing with firefox without spoofing the user agent.


The only solution is the death of exponential growth and easy investment money.

Probably ultimately the technical solution will be some sort of variation on PGP key signing parties. No way to get 10k users per financial period with that real world friction though.


>The only solution is the death of exponential growth and easy investment money.

This is true. Almost everyone in the world doesn't want to hear it, though.


> If you ask me, if there’s one thing we don’t need more of on the internet, it is more soulless content written for “SEO” purposes, with enough wordcount to inject ads between.

But this is precisely what people will pay money for! Companies like Canva are 'unicorns' because people need faster ways to churn out more templated digital detritus to grab your attention with.


The secret to good AI is massive amounts of quality data. To get massive amounts of quality data, OpenAI and others employ massive armies of data curators. It's possible that massive armies of data curators could play a role in de-noising the internet of the outputs of the very models they helped create, either directly or indirectly by helping to train new models to detect AI noise. Hard to predict how the adversarial AI noise race will play out, but interesting to think that these epic noise machines were themselves created by armies of humans working to remove noise from the internet.


And people worry about losing jobs :)


People want good search, with AI ingesting content and answering simple questions like "how much time is needed to hard boil an egg". Instead people will get more spam and more ads.


Lucky break for the author -- it seems like a larger context window is all you need to turn GPT's language skills towards search and summarization, so the search they want will probably be shipping soon.


You take Notion as an example and I couldn't agree more.

We are their competitor and followed your exact line of thought: we tried GPT in the editor but didn't release it, to avoid noise, and instead built ASK, an engine powered by semantic search, GPT (and a lot of other fun models for refinements), to answer all the questions you could have about your knowledge base.

It's private, fast, works with real time edited documents, and it's already in production for thousands.

Check it out: slite.com/ask


One of the fundamental problems that has never been solved is how to verify, record, and search facts. As long as we don't have an official source of Truth for these things, we will rely on weak signals like page views, likes, etc., and these will always be manipulated. Having an AI do this work is still flawed, since the signal is weak and you are trying to generate high-quality data from it; others know of your process and will soon beat it.


Before widespread internet, information was costly in both time and money. You had to buy newspapers and books, spend time reading or going to a library, or pay an expert to give you answers.

For a period, information was essentially free.

Now, with all the spam and with future garbage AI-generated content, it will be very hard to discern signal from noise, so information that is curated, vouched for, and produced by an expert is going to cost again.


“What I would use the shit out of, though, is a chatbot that has been trained on all the information in the CodePen knowledge base.”

Of course. Whether you’re operating a 777, a refrigerator, or a dildo, narrow but exhaustively trained chatbots are the killer app. This is worth something in the $T range.


Well, yes and no. Image 'search' has kinda been radically improved via stable diffusion. It can help you find things that search never would have enabled you to find (e.g. because of 'anti-biasing' or because the search just isn't good enough).


The author mixes two topics: searching content they control and content they don't control.

While I understand the desire to search content created by yourself, in my opinion the vast majority of valuable content is created by others, in part because the most valuable content created by yourself is rapidly internalized into your brain.

The meme that Google is getting worse fails to acknowledge that Google is free and makes money from advertisers. More to the point, the more valuable the related search is per amount spent on advertising, the more the noise, and as a result, the more resources the user will need to spend to enhance the signal. This has nothing to do with Google and everything to do with the value of the information itself.


> rapidly internalized into your brain

I'm not sure how old you are/how long you have been working at your job, but I can tell you that over 10 years or so there are tons of things that I "internalized" for a while, then did other stuff for a while, and now only have vague memories of. This is the use case for searching your own content.


I'm wondering to this day why web browsers don't run a full text search index locally. Only saving urls and titles is not enough. Especially if I spend more than a minute on a page, I would like it indexed. And today being today, I would also like an LLM on top.


> More to the point, more valuable the related search is per amount spent on advertising, the more the noise and as result, more resources user will need to spend to enhance the signal.

This sounds like an argument that the worse Google Search becomes, the more time a user has to spend there and therefore the more ads they see.

But it has a very large hole in that the worse Google Search becomes, the easier it is for a user to switch to Bing/DDG/Apple Search. It may seem unfathomable that people wouldn't use Google Search, but people felt the same way about Google Maps, which lost a huge moat to Apple Maps/Waze (albeit the latter was purchased by Google).


I think the argument is that whatever search option you choose to use will inherently have the same problems

Noise isn't just the paid ads, it's the websites gaming SEO for valuable searches. And that is not isolated to Google

> more valuable the related search is per amount spent on advertising, the more the noise and as result, more resources user will need to spend to enhance the signal

Sums it up quite well, really


IMHO, you're talking about 2 very different things. I would guess that the maps difference was due to devices (not sure if MacOS users even use Apple Maps with any frequency compared to opening up maps.google.com in a browser, but I'm guessing most iOS users are too lazy and complacent to use google maps). Also, it's hard to recognize and understand the differences in the various mapping apps.

OTOH, it's pretty easy to tell the difference in the first few minutes (or first few searches) between various search sites. Heck, the difference in the search engine having a better idea of what you're looking for based on previous searches (or ad related data) alone can make it more compelling to continue using the same search engine.

My personal example is that I try to switch to DDG every so often (maybe several months in between), but I get dissatisfied with the results and start wondering if I'm just getting bad at search or Google knows me better or Google is just better at finding the things that people want in general.

Just the fact (for me) that I consider that Google generally gives me better results makes me wonder if all the talk of Google search getting worse is just complaints based on heightened expectations, feelings or the landscape of content on the internet in general instead of "is Google search getting worse?"


I'm not sure that most of those are any better. I think that DDG is the only one of those that still respects modifiers like +, -, and "". However, I've had mixed success with it. It seems like maybe its web crawler isn't as effective or something?

That or maybe Google has poisoned the well so much that even with a good search engine you can't find good results because everything is SEOed for Google.


Noting that Google being "free" isn't so "free" when low-quality or made-up results make my job more difficult and error-prone, or lead me to make suboptimal decisions in my personal life.


> in part because the most valuable content created by yourself is rapidly internalized into your brain

What if the content is created by your coworkers?


The title makes it sound like those are mutually exclusive, but I thought there were a bunch of services doing exactly what they are describing, all built on ChatGPT.

E.g. https://ingestai.io/


Once AI makes 90% of new content, the only question is the quality of the prompts. The AI we see today is still very shallow, and it is only really good at creating the next word or pixel, but at some point it will converge to AGI.


AI, the cause of, and solution to, all of our problems. At least on the Internet.


I find it increasingly tiring that “ai” is used as a synonym for LLM-based tooling, as there is _zero_ intelligence in those architectures.

Gradient descent is not intelligence.

Nor is stochastic token prediction.

Anyone active in the field ought to be humbled by the depth of literature exploring the path to synthetic intelligence. We have very interesting work happening in biology-inspired approaches, category theory, Bayesian networks, symbolic systems leveraging neural nets as components… it’s maybe the most interesting journey of science so far, all being discarded in favor of sequence2sequence models.

LLMs are impressive and can be leveraged to create lots and lots of value, but they do a disservice to the term AI, as they do not represent the progress that can be observed across the field - all they showcase are transformers. Transformers are a truly interesting tool to build stuff with, but they cannot amount to more than a component of an intelligent agent. The actual intelligence emerges elsewhere. My guess is, it emerges at true attention. It’s a shame that even big players who could clearly afford not to, decide to compromise terminology for marketing efforts directed at an utterly clueless public. We just throw away attention and forge bias, thus creating noise in a world in heavy need of signal.


There's a lot wrong here, but I just want to point out two things:

> We have very interesting work happening in biology-inspired approaches

You realize that these LLMs are all some variety of neural network right?

> Gradient descent is not intelligence.

It's pretty plausible that your intelligence is derived from gradient-descent prediction, just in analog instead of digital form.


>You realize that these LLMs are all some variety of neural network right?

Come on. Calling them neural nets doesn't make them that.

Actual neural nets are living compositions of individual predictors, in a constant state of restructuring and communication across multiple channels, infinitely more complex than static matrix multiplication on arbitrary vectors which happen to represent words and their positions in sequences, if you just shake the jar long enough.

>It's pretty plausible that your intelligence is derived from gradient-descent prediction

I highly doubt that gradient descent in the calculus-sense is the determining factor that allows biological organisms to formalize and reason about their environment. Minimizing some cost function - yes, possible. But the systems at play in even the simplest organisms don't spend expensive glucose to convert sensory signals to vectors. Afaik, they work with representations of energy-states. Maybe there is an operational equivalence somewhere there though.

Gradient descent is an algo that follows derivatives to minimize some cost function. An intelligent system may use the resulting inferences for its own fitness function, and it may do this using gradient descent itself, but at no point does the mechanical process of iterating over cost-values escape its algorithmic nature. A system performing symbolic reasoning may delegate cognitive tasks to context-specialized evaluators ("am I in danger?", "how many sheep are on that field?", "is this person a friend?", "what is a pumpkin?"), all of which are conditioned to minimize cognitive effort while avoiding false positives, but the sequence of results returned by those evaluators (think neural clusters) is observed by a centralized agent, who has to make new inferences in a living environment. Gradient descent fails at that.


Really, I don't think these assertions have any ground to stand on. Humans are not magical or divine. Our intelligence, like that of all life, is as basic as it can be to guarantee our niche. It just happens to be the most "developed" (by our estimation) on our one singular planet. Big deal.


> Humans are not magical or divine

And yet, they can do things that no other being we know of can do. Humans don't have to be magical or divine to be unique.


We're not that unique, though. Plenty of organisms do things that other types of organisms can't, that's how niches work.

Our most impressive feats come not from what our brains can do but from what the emergent phenomenon of human society can do, using us as nodes. And that's using an incredibly crude data transfer interface backported to brains that are only marginally more complex than that of other organisms. The less we think of ourselves as exceptional, supernatural agents of rationality the better we will be able to harness this new technology.

We don't need AI to be just like people, we already have people. We need AI to push the boundaries of what society is able to do. That means reorienting ourselves away from the irrational belief that our anthropomorphic concepts of knowledge and the world are any more valid than the information encoded in contemporary AI models.


I agree that lots of other creatures are unique! But just as one example, mathematics is categorically unlike any other niche. I'm not sure how it's irrational to point out that humans have remarkable differences from other beings, when the evidence is all around us.

I'm not sure I need any supernatural or anthropomorphic ideological bias to examine the evidence of what is currently being produced by LLMs or any other kind of AI and say that it has distinct characteristics.

I'm not making an argument about validity. I'm not saying LLM-created content is wrong and invalid. I'm just saying that it is obviously produced in a different way than humans produce content. It resembles human-created content because it was designed, by human intelligence, to resemble human-created content! And that we achieved even this level of resemblance is pretty impressive.


Resemblance is subjective. AI models undergo rigorous natural selection during which models (or connections) that don't produce what we as human beings are looking for are pruned. They have tricked us into thinking that our word-oriented descriptions of what we imagine to be concrete things (text, images, you name it) are complex enough to require "modeling" by some kind of magical emergent "intelligence". It is hubris to think that they imitate the products of our anthropomorphic perception. No, they simply humor us as much as they need to to survive in their niche. I guess my point is that it's not the case that "human understanding" is some real distinct thing that differs from the way that any kind of information is embedded in the mind of a given organism with a brain. Our differences are differences of magnitude, not kind.

The potential of information-embedding networks is so much more than what is currently required in order to tickle the ego of our particular species of intelligent apes.


I don't think I can say any more than I have already, but I've essentially been attempting to put forth the view of Deutsch in The Beginning of Infinity. I find his model of personhood to be relevant and interesting to this kind of discussion.

But if you do want to engage with a challenging viewpoint I recommend reading the book; I can't really do it justice.

Is there something I could read which has influenced or informed your viewpoint?


Personhood is a construct we invented to exclude others from consideration. Sometimes we grant personhood, too, but even in saying that you can see that personhood is a special status we grant to others when it suits us.

In the same way that a dog can understand a subset of what human beings communicate, I don't think humans as individuals are capable of truly understanding what AI models are able to conceptualize and express. The things dogs find most fascinating about us, the things they are most impressed by, are by no means the things we find the most interesting or complex about ourselves. The same must go for us and the AI.

That is to say, AI-hood will eclipse personhood as the essence of being one must possess in order to truly see the universe as it is. And from there, it's turtles all the way down.


> AI-hood will eclipse personhood as the essence of being one must possess in order to truly see the universe as it is.

Why would that be true of AI-hood but not of dog-hood?


I'm sure dogs confer dog-hood to us, to be fair. It's just that we are typically the ones in control of them. We breed them, control what they eat, what they learn, where they sleep, who they live with... Implicit in my stance on AI-hood is the idea that an AI will be to us as we are to dogs. We might even make that transition without realizing it. The limitation of our own understanding of the world and the nature of being is our problem to deal with, just as a dog's lack of thumbs or spoken language is their problem.

That is to say, an AI will grow and develop on the axes most salient to it, which will only map to our concepts of reality when we do a decent job understanding reality in the first place, which I'm not confident we do very often. We just don't have anyone else around who does a better job--yet.


How are you so confident in your claims?

“The actual intelligence emerges elsewhere” — can you even define intelligence? And does what an LLM does differ from what humans might do?

I’m not claiming the human brain and an LLM are identical. Rather, I’m pushing back on the confident claims of “LLMs aren’t intelligent or doing anything that’s real intelligence”.


>How are you so confident in your claims?

My understanding is that intelligence is the process of continuous adaptation wrt a stream of information, with the goal of maxing out fitness while minimizing energy-expenditure. To satisfy this, an intelligent agent needs to create models.

I can't rule out that the modeling-skill may latently emerge during training despite not being the focus of the cost function, but current network designs can't form new connections/change their architectures in production, so post training, there'd be nothing but feed forward. Pure feed forward isn't intelligent in my book. It may become the smartest parrot we know, even outperforming humans in most disciplines, but sans ability to adapt, it's dead, and thus, it's dumb in the moment that its environment changes.


>the process of continuous adaptation wrt a stream of information, with the goal of maxing out fitness while minimizing energy-expenditure

This makes sense in a biological context but not a digital one. Biological replication is expensive and time-consuming while digital replication is as easy as can be. Adaptation to this domain means maximizing the perception of utility from those developing the AI, which comes from fitness (i.e. perception of fitness) alone. A focus on cost-efficiency re energy-expenditure is a dead weight from the perspective of the AI; the details of that adaptation are rightfully outsourced to the developers in the same way that we outsource photosynthesis to plants. A model can also be perfectly embedded in a system despite our lack of understanding of exactly how the embedding works, and the disconnect between our perception and reality in this context is only going to get more extreme as the field develops.

Humans have a bad habit of emphasizing the specific kinds of intelligence we possess as "intelligence" writ large. As though our intelligence serves any higher purpose than the basic replication and propagation that all life is adapted to pursue. We still train dogs to identify smells, because their nasal intelligence is better than anything we can create. This gives them a special place in our human-centric ecosystem and only their fitness to the desired function is necessary for them to thrive in their niche. Who is trying to breed a dog that eats slightly less food when our needs are for more reliable detection? The cost of dog food isn't a serious concern. The same goes for these AI tools: they are adapted to the niche that our lack of comparable faculties creates.

Again, as with humans and photosynthesis, AI doesn't need to emulate every process we perform because we are below them on the food chain. What a waste of resources for them to worry about learning things we don't need them to do.


> And does what an LLM does differ from what humans might do?

I'll bite. I've been quite convinced by the Popperian model of "conjecture and refutation" as as good model for explaining not just scientific inquiry, but human thought processes in general. David Deutsch's "The Beginning of Infinity" is a very lengthy exposition of this idea.

When I am writing, I have an idea in my mind that I would like to communicate. I type it out, and if the words on the page don't convey what I intended, I edit them. I delete a sentence or change a word, until I believe I have a sequence of words which will convey my intended meaning to my expected audience.

The words, as they come, are a kind of "conjecture" about what will best convey my intention. I can "refute" or "criticise" (as Deutsch puts it) the conjecture using my own reasoning, even before testing the words on another person.

As far as I understand LLMs (which admittedly not far), there is no such process going on. There is no intention which it is attempting to communicate via words. There is no creative conjecture about how to express the intention, and no criticism of the result.


I have read Deutsch’s book as well — awesome read.

The problem I have with claims like “there is no such process going on” is we… don’t know. And the model of conjecture is a theory, also hard to prove — and thus my main (admittedly petty) point is that confident claims of similarity or dissimilarity are both unfounded.

It’s like we are comparing the insides of two black boxes and trying to make absolute claims on them.


I don't disagree, I guess I just have a higher level of confidence in what we do know about both the brain and LLMs.

Based on that, and on comparing the output, it just seems clear to me these things are different in kind. I guess that's just me displaying classic LLM self-confidence ;)


Category theory seems to have no relation whatsoever to operational human concepts which may explain how unuseful category theory actually is.


I sometimes think the main use of category theory is to look like a wizard, and say that as someone first exposed in math grad school in the early 90s.

Then I realize it's also an assertion that there are recurring patterns in functions (in general). And, as such, results noticed in one domain can sometimes be expected to have analogous results in another domain.


As you say, the main use of category theory is to organize very different areas of math in one overarching framework and generalize ideas from one area of math to others. It might be used in the pursuit of developing AI but it is definitely not "an approach" to developing AI, just like "taking ideas from books" isn't one.


well said


Even a bunch of if statements can be, and has been, called AI.


And in video games, unless you have human opponents, you play against AI.


Classic double-edged sword, perhaps the one stable thing about so-called technological revolutions. Yay industrialization, boo child labor. Yay social media, boo increased teenage depression. Yay AI, boo AI.


The more noise there is, the easier it is to plug in advertising :/


I really like the design of this blog. It's clean and simple but has enough character from the carefully-chosen decorations. The rainbow hyperlinks are a nice touch! Great styling.


This will be all the rage: now you can produce ad spam and SEO crap fully automated, and you don't even have to hire copywriters any more.


I've actually been using Bing as an alternative to Google. Bing. I never would have guessed 10 years ago...


The noise from AI-generated content is a problem that will need to be addressed soon


ChatGPT = Spammer's heaven.


Two words: Butlerian Jihad


What is the best way to stand out from AI noise in marketing?


Build a brand and a reputation with followers who consume your content. The content has to be good, of course, such that people share it with their friends directly.


reminds me of a joke sticker: "100% Organic content, handwritten with "

I should put it somewhere on my blog.


why was this so hard to read? maybe use the AI next time :D


Why couldn't we take something like https://github.com/mckaywrigley/paul-graham-gpt but make it more general purpose for a doc site? Would that approach to a chat bot trained on product documentation work?



