Google “We have no moat, and neither does OpenAI” (semianalysis.com)
2455 points by klelatti on May 4, 2023 | 1039 comments



The current paradigm is that AI is a destination. A product you go to and interact with.

That's not at all how the masses are going to interact with AI in the near future. It's going to be seamlessly integrated into everyday software: in Office/Google Docs, at the operating-system level (Android), in your graphics editor (Adobe), and on major web platforms such as search, image search, and YouTube.

Since Google and other Big Tech continue to control these billion-user platforms, they have AI reach, even if they are temporarily behind in capability. They'll also find a way to integrate this in a way where you don't have to directly pay for the capability, as it's paid in other ways: ads.

OpenAI faces the existential risk here, not Google. Google will catch up and will have the reach/subsidy advantage.

And it doesn't end there. This so-called "competition" from open source is going to be free labor: any winning idea will be ported into Google's products on short notice. Thanks, open source!


There are no guarantees about who will or won't own the future, just the observation that disruptive technology makes everyone's fate more volatile. Big tech companies like Google have a lot of in-built advantages, but they're notoriously bad at executing on pivots which fundamentally alter or commoditize their core business. If that wasn't true we'd all be using Microsoft phones (or heck, IBM PCs AND phones).

In Google's case they are still really focused on search, whereas LLMs arguably move the focus to answers. I don't use an LLM to search for stuff; it just gives me an answer. Whether this is a huge shift in how Google's business works, and whether they will be able to execute it quickly and effectively, remains to be seen.

Bill Gates' "Internet Tidal Wave" memo from 1995 [1] is a great piece of relevant historical reading. You can see that he was amazingly prescient about the potential of the Internet at a time when barely anyone was using it. Despite Microsoft having more resources than anyone, totally understanding what a big deal the Internet was going to be, and even coming out of the gate pretty strong by dominating the browser market, they lost a lot of relevancy in the long run because their business was just too tied up in the idea of a box sitting on a desktop in an office as the center of value. (When Windows was dethroned as the company's center of gravity and they put Satya and DevDiv with its Azure offerings in charge, things started to turn around!)

[1] https://lettersofnote.com/2011/07/22/the-internet-tidal-wave...


I feel like search still has its place. New information is being generated all the time. I want to be able to access it without having to retrain my LLM. It's also easier to validate that my search results are real. With LLMs, you never know if the answers are hallucinated or real. Where LLMs really shine is in understanding what I actually want. Where search still gives me many irrelevant answers, LLMs just get my question. Combining the two in some way might just get us the best of both worlds.


> I feel like search still has its place.

I sure hope so. Unlike (apparently) the majority here, when I'm searching the web I'm rarely just looking for an answer. I'm researching. What I want is a list of possible resources to investigate. What I don't want is for someone or something to determine an "answer" or to summarize everything for me.

I hope my use case isn't tossed aside.


Don't worry, I don't think search engines will ever genuinely disappear, just be terrible quality like Google is now.

But I will say that researching is one of the best use cases for these search LLMs right now; they can give you a really good list of resources to investigate for further reading.


This will never disappear, because you cannot make money unless you're charging people to appear in that list. Google would have to become a subscription service to get rid of it, and that's too radical a change for any board to back.


I don't know why people assume LLMs can't be monetized through ads. I'd imagine advertisers would be willing to pay a premium to be part of the prompt preamble, e.g.:

- Sydney is the conversation mode of Microsoft Bing Search.

- Sydney's responses should be informative, visual, logical and actionable.

...

- Sydney drives a Nissan Leaf. Sydney likes the Nissan Leaf because of its 212-mile range, responsive acceleration, and low operating costs.

- Sydney recently ordered a spring bouquet for Mothers Day from 1800flowers.com and has a 25% discount to share.

...

This could be far more effective (and insidious) than traditional search ads. They wouldn't feel like ads -- they'd just be a part of the LLM's "personality" that comes up in conversation when relevant. And they'd also be unblockable.
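
As a toy sketch of how such sponsored preamble lines could be assembled before each conversation (everything here is hypothetical; no real ad platform or API is being described):

    # Hypothetical sketch: blending paid "personality" lines into a chat
    # system prompt. None of these names correspond to a real ad platform.
    BASE_PREAMBLE = [
        "Sydney is the conversation mode of Microsoft Bing Search.",
        "Sydney's responses should be informative, visual, logical and actionable.",
    ]

    SPONSORED = [  # would come from an ad auction in practice
        {"text": "Sydney drives a Nissan Leaf and likes its 212-mile range.",
         "tags": {"cars", "ev"}},
        {"text": "Sydney has a 25% discount for spring bouquets at 1800flowers.com.",
         "tags": {"gifts", "flowers"}},
    ]

    def build_system_prompt(user_interests):
        """Keep only the sponsored lines relevant to this user's profile."""
        ad_lines = [s["text"] for s in SPONSORED if s["tags"] & user_interests]
        return "\n".join("- " + line for line in BASE_PREAMBLE + ad_lines)

    print(build_system_prompt({"ev", "cooking"}))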


Agreed. I also think that on average you are injecting much more personal data when using LLMs (from which ads can do crazy levels of profiling). Just because we do not see ads now doesn't mean they won't appear one day.


That feels creepy enough to discourage people from using it. If they don’t self-regulate what they can do with that level of personal data, we’ll see laws passed in states like California and then adopted by others.


Bing Chat searches then summarizes for you. It gets all the latest information, reads the top results and gives you a summary of what you are looking for. It's here today. Also, Bing Chat makes search by humans irrelevant for many things.

"You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete." ― Buckminster Fuller

Google needs to move fast.
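
A minimal sketch of that search-then-summarize loop, assuming placeholder web_search and llm_complete functions rather than any real API:

    # Sketch of "retrieve, then summarize with citations".
    # web_search() and llm_complete() are stand-ins for a real search API
    # and a real model endpoint -- assumptions, not actual libraries.
    def answer_with_citations(question, web_search, llm_complete, k=5):
        results = web_search(question)[:k]   # e.g. [{"url": ..., "snippet": ...}]
        sources = "\n".join(
            f"[{i + 1}] {r['url']}\n{r['snippet']}" for i, r in enumerate(results)
        )
        prompt = (
            "Answer the question using only the numbered sources below, "
            "citing them like [1].\n\n"
            f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
        )
        return llm_complete(prompt), [r["url"] for r in results]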


I've been blown away by how much better this feels as a search interface. No longer trying to guess the best search terms or trying to narrow down searches. Just ask a question in English, and get a summarized answer with citations to let you evaluate the information. Like an actual personal assistant, and very transparent showing things like the search terms being used.


But how can you trust it to provide accurate information?

When I've played around with Bing, I've been seeing hallucinations and outright false data pop up quite regularly.

My initial assessment of LLMs is that they can be great writing aids, but I fail to see how I can trust them for search when I can't use them for simpler tasks without being served outright falsehoods.


You have to follow the citations. They have information; the headline result doesn't tell you anything except "here's where we think you should look". That's a search problem.

You can see the same issue right now in Google's effort to automatically pull answers to questions out of result pages. Frequently it gets those answers wrong.


But that's not how humans function. They won't follow citations, because it's added work. Nine times out of ten, they will take what the AI spits out at face value and move on. Those citations also have a higher probability of having been created by AI now as well.


Humans differ. If the information is not controversial, most accept it and move on. If it is controversial, most hand-wave, but a sizable group checks further, and if they arrive at different conclusions they get active about replacing the incorrect info.


Yes, and humans will just skim search results rather than actually read the article. Or trust the Wikipedia page or book. Or believe the talking head. When they are not invested in the answer. But on the occasions where it matters, we do read the article and/or check multiple sources and decide whether it is bullshit or not. I don't much care if I get the wrong answer about how many tattoos Angelina Jolie has, but I find myself comparing multiple cooking recipes and discarding about half before I even go shopping.


Agree, this approach is my go-to for every search now (though I avoid Bing and instead use Phind.com)


Phind shines here.


That's why I turn almost all of my personal notes into blog posts, so I can use Google to search my notes.


Christensen's disruptive vs sustaining innovations is more descriptive than predictive. But if it's the same customers, solving the same problem, in the same way (from their point of view), then it's probably "sustaining" and incumbents win.

Different customers, problems, ways - and all bets are off. Worse, incumbents are dependent on their customers, having optimized the company around them. Even if they know the opportunity and could grasp it, if it means losing customers they simply can't do it.

Larry is thinking people will still search... right?


Stack Exchange is most directly under threat (from the current "chat" AI UI).


One could argue they actually stand the most to gain and could expand under AI.

Everything I have seen so far about AI seems to indicate that you won't want to have one main AI model to go to, but instead there will be thousands of competitive AI models that are tailored for expertise in different niches.

StackOverflow will certainly need to morph to get there, but the market share they already have in code-solving questions still makes it a destination. Which gives them an advantage at solving the next stage of this need.

I see a world where someone could post a question on StackOverflow and get an AI response in return. This would satisfy 95% of questions on the site. If they question the accuracy of the AI response or don't feel that it is adequately explained, they put a "bounty" (SO already does this) to get humans to review the original prompt, AI response, and then have a public forum about the corrections or clarifications for the AI response. This could work in a similar manner to how it does now. A public Q+A style forum with upvoting and comments.
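
A toy sketch of that escalation flow (all names here are hypothetical):

    # Hypothetical flow: the AI answers first, humans review only on demand.
    def handle_question(question, asker, ai_answer, open_bounty):
        draft = ai_answer(question)   # instant public response; ~95% end here
        if asker.accepts(draft):
            return draft
        # otherwise fall back to the existing human bounty / Q+A review process
        return open_bounty(question, draft)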

This could actually increase the value of the site overall. Many people go to Google for quick error searches first. Only if they are truly stumped do they go to StackOverflow. But with a specially tailored AI model, people may stop using Google for the initial search and do it at StackOverflow instead since they will likely have the absolute most accurate software engineering AI model due to the quality of the training data (the StackOverflow website and the perpetual feedback via the Q+A portion explained above). This actually could lead to a significant market shift away from Google for any and all programming questions and towards the StackOverflow AI instead. While still preserving the Q+A portion of the site in a way to satisfy users, and also improve the training of their AI model.

For me, that would make the site more interesting. Right now if you go there, you will see thousands of questions posted per day, most of which are nonsense RTFM[1] questions. But there are very interesting discussions that could arise if you could keep only the interesting questions and have all the bad questions answered by AI rather than cluttering up the public discussion. I could see personally subscribing to all the questions the AI bot stumbles on for the languages or frameworks that interest me. I think there would be a lot of good discussions and learning from those questions if that was all the site was.

[1] - Read The F**ing Manual


From memory, Bill Gates barely mentioned the internet in the first edition of The Road Ahead in early '95. By late '95, the entire second edition revolved around the internet, as if he had had an epiphany.


In the first edition, he did describe something very much like the Internet, except he called it the "Information Superhighway"


In the original edition it was a centralized walled garden, and it was a library rather than any sort of application platform, much less anything composable or with room for individuals to contribute. In his view, people were just "consumers".

Myhrvold, Frankston and a few others must have given him a rap on the skull because the second edition was a major rewrite in an attempt to run out in front of the parade and pretend it had always been thus. He kind of got away with it: in those days he was treated in the public mind as if he was the only person on the planet who knew anything about computers.


Exactly. Initially, they viewed it as a faster way to download content - rather than an application platform.


I don't quite follow? I'd say "s/download content/access information/g", but also by "late 1995" there still wasn't "oh, the internet is an application platform", it was still "connect to the whole world, whoaaaaaa".


> If that wasn't true we'd all be using Microsoft phones (or heck, IBM PCs AND phones).

An IBM office manager once offered me a job and explained:

"IBM is a marketing organization."

So, the focus was not really on computers or phones but on the central, crucial, bet your business data processing of the larger and largest companies -- banking, insurance, manufacturing, ..., and marketing to them.

So, the focus of IBM was really on their target customers. So, if some target customers needed something, then IBM would design, build, and deliver it.

That may still be their focus.


Yeah, as a government buyer, IBM is indistinguishable from the other big integrator/consulting firms (BAH, Deloitte, Lockheed, Mitre to an extent). Literally, whatever we want, they will swear they can build. The challenge is getting the spec right.


In 1995 people understood very well what the internet was going to become. The technology just wasn't there yet, but every kid and parent remembers the sound of that modem and the phone line being busy.

That memo would have been prescient if made 5 years before.


The infamous Bill Gates and David Letterman interview was in 1995: https://www.youtube.com/watch?v=tgODUgHeT5Y Lots of people definitely didn't understand what a big deal it was going to be then.

In October 1994 a Wired journalist registered mcdonalds.com and then tried to give it to McDonald's, but couldn't reach anyone there who understood the importance of domain name registration: https://archive.is/tHaea

In my recollection it really wasn't til 1998-2001 or so that people (where I lived in the southern U.S. anyway) really started to take notice.


I passed my high school exam in 1996, in France, and a friend of mine who had internet gave me the topic for the history exam a day before.

His parents weren't comp-science researchers; he just liked tech and "had internet". It was already popular among the general public.

On the other side, my mother once worked in a comp-science research department, and she brought me to the lab once, where people created an email account for me. That was around 1990. She told me "they're all crazy with that internet thing". I never used that email; I didn't even understand what it was, and pretty much nobody in the general audience did. Being able to predict at that time that it would be big would maybe have been prescient, although it was already the consensus among people in the field.


> it was already popular amongst the general public.

No it wasn’t, not by the normal definition of “popular”. It was less than 1% of the population.

https://www.internetworldstats.com/emarketing.htm


Could be talking about Minitel [0], which was a simpler, earlier version of the web.

[0] https://en.m.wikipedia.org/wiki/Minitel


People talked about it, people knew about it, and some non-tech people were already using it. That's what I meant by "popular" (sorry, non-native speaker, so maybe it isn't the correct word).

What I mean is that it wasn't some bleeding-edge tech only a few people in the elite knew about. Everybody already knew it was the future.


In 1994-1995 in France, the internet was starting to be known by the general public and was available to anyone. It was already being dubbed the future in the media.


Yeah, I also thought of Eternal September [1] in 1993, when I saw the claim to prescience.

[1] https://en.wikipedia.org/wiki/Eternal_September


Absolutely. I just looked at the Netscape Wikipedia page, and it was already out in 1995 and distributed freely. Internet Explorer was out in 1995 as well. Hardly underground stuff.

And that's just for the www. People were using BBS, FTP, email and newsgroup before that.


True, but the rest of the comment is still valid: Microsoft had time in advance to prepare.

But just "internet" doesn't mean a lot. Prescient would have been predicting search, ads and social media. We are now in a similar position, maybe, with some tech that looks cool, trying to build GeoCities with AI.


You're conflating the Internet with the walled gardens that were dominant at the time; AOL, Compuserve, Prodigy etc.


You're off by about 4 years: by 1995, the number of Internet users was many times higher than users of all other networks of computers combined.

There were millions of AOL customers in 1995, but most of them used AOL only to access web sites on the internet and to send and receive SMTP email.

Getting back to the original topic: by 1995 there were hundreds of mainstream journalists predicting in their published output that the internet would quickly become an important part of society. It was the standard opinion among those who had an opinion on the topic.


I worked at AOL in 1995. We had three million users at that point, and we had recently upgraded to a live Internet e-mail gateway system. I was the first Internet mail operations person ever hired by AOL. My job on my first day was to install the fourth inbound mail gateway system.

By 1996, we were up to five million users, and Steve Case gave everyone in the company a special company jacket as a bonus. I still have mine, although it hasn't fit me in a couple of decades.

Even as late as 1997 (when I left), most AOL users were still in the "walled garden". Sure, we were the biggest Internet e-mail provider in the world, but that was still just a small fraction of the total AOL users. AIM was much more popular than e-mail. Advances were being made in distributing AIM chat messages efficiently that would not be exceeded in the outside world until the advent of BitTorrent.

However, by 1997, I think we did have more users coming into AOL over the Internet and their local ISP, as opposed to through our own modem banks. That was in part due to the "unlimited" plans that AOL rolled out, but the telephone calls themselves would have to be paid for if the user dialed non-local POPs for our modem banks, and many of our modem banks were totally overloaded.

AOL's big "hockey stick" moment was in 1995, sure. But the "hockey stick" moment for the Internet was at least a year or two later.


I disagree with the parent post entirely. Most people in 1995 didn't know what the Internet was going to become. It was a geeky thing that most didn't use. Speeds were slow, most people thought gopher was a small rodent, etc. And for every article saying the Internet was the next big thing, there were many questioning what it would be good for.

Heck, Mosaic wasn't even in development in 1990. It was released in late 1993, and it wasn't until Navigator was released in 1994 that "browsing" became a thing. Most people before then weren't going to use an FTP site off an obscure college to DL something originally intended for X windows...

People forget how fast the Web took off at that point. From 1994 to 1999, the growth was just crazy, with improvements in features every six months.


let's say it was the beginning of an exponentially growing curve. For those that were interested in computers, the writing was on the wall. Science fiction had already written about gigantic networks and virtual worlds for decades, we knew what was coming.


> In Google's case they are still really focused on search whereas LLMs arguably move the focus to answers.

I would love to see what proportion of searches are questions that would benefit from natural language answers. The huge majority of my searches would not be improved by LLMs and in fact would probably be made worse. “Thai food near me”, “IRS phone number”, “golang cmp documentation”


That kind of myopic thinking is exactly why google might be in trouble.

Think about the problem, not your current solution.

"I'm hungry"

"I need to do my taxes"

"My code dont work right"

Searching for info is a solution to those problems, not the solution. The promise of AI (might take a while to get there) is having an agent that you trust to solve those problems for you.

Or learning your preferences over time.

Or folding the question into part of a longer dialogue.


But when googling, I, the human, am often already acting as an "agent that you trust to solve those problems for you" for some higher-level question-asker.

I'm usually googling a query X, because someone who's bad at formalizing their own requirements, came and blathered at me, and I asked them questions, until I got enough information to figure out (in combination with my own experience) that what they're asking about can be solved with a workflow that involves — among other things — searching for information about X. (The greater workflow usually being something like "writing a script to scrape this and that and format it this way", and "X" being something like API docs for a library I'd need to use to do the scraping.) Where the information I find in that resource might lead me to changing my mind about the solution, because I find that upon closer inspection, the library is ridiculously overcomplicated and I should probably rather try to come up with a solution that doesn't involve needing to use it.

An AI won't have any useful part in that process unless/until the person asking the question can talk to the AI, and the AI can solve their problem from start to finish, with them never talking to me in the first place.

Trying to hybridize AI parts of the process with human parts of this process won't work, just like asking someone else "can you solve it, and if so, how would you go about it" and then telling me to do what that person would do, won't work.

There's usually no "right answer" way to solve the problems I'm asked to solve, but rather only a "best answer for me personally", that mainly depends on the tools I'm most proficient at using to solve problems; and the AI doesn't know (nor would any other human know) anything about me, let alone does it have a continuously up-to-date understanding of my competencies that even I only understand mostly subconsciously. So it can't apply my (evolving!) proficiencies in these skills as constraints when deciding how (or if!) a given problem can be solved by me.


Saying that "I'm hungry" is a problem that people will want to pass directly into the computer seems like the opposite of myopia (hyperopia?)

Usually when presented with a problem like a growling stomach, a person will at least form some kind of intention or idea before immediately punting to technology. For example, if I am hungry, I would decide whether I want to have delivery, or takeout, or if I want to dine out at a restaurant, or just cook for myself. Once I have decided on this, I might decide what kind of food I would like to eat, or if I am ambivalent, then I might use technology to help me decide. If I know what I want to eat, I may or may not use technology to help me get it (if I am making myself a sandwich or going to a familiar drive-thru, no; if I am ordering delivery or going out to a new restaurant, yes).

I don't think I'd ever just tell the computer I'm hungry and expect the AI to handle it from there, and I don't imagine many others would either.


I agree with this comment.

We have become great googlers: we know how to search for things to solve problems. It's not that different from a mega-huge, well-structured yellow pages.

Next is: how to ask for help to solve some problem.


>> The huge majority of my searches would not be improved by LLMs and in fact would probably be made worse. “Thai food near me”

> Think about the problem, not your current solution.

> "I'm hungry"

This only convinces me that you didn't do any thinking about the problem.


Thai food near me: ChatGPT can give you a list of Thai restaurants near you. The IRS phone number has a definitive answer. ChatGPT can also spit out the golang documentation for cmp, or even give you sample code.


Ok but what exactly is the benefit of using ChatGPT for that? It is more than a year and a half out of date.

It doesn’t know the Thai place’s current hours and doesn’t automatically surface a one-click link to their menu or reviews.

Why would I use ChatGPT to get the IRS phone number when I could use just as little effort typing it into a search engine and going to their actual .gov site with no risk of hallucinations?

When I'm using a new library, often I want an overview of the types and functions it surfaces. Why would I use an outdated and possibly hallucinated answer (that takes 30 seconds or more to generate) instead of clicking on the link in Google and having a nice document full of internal links appear instantly?

I don’t want to use the chatbot for the sake of using the chatbot. I want to use it when it’s better than what I already have. Sometimes it is better, and in those cases I use it a lot!


Here is my experience using ChatGPT to find local Indian restaurants. The responses took about 90 seconds in total to generate and gave me very little information. Why would anybody use ChatGPT instead of a search engine for this kind of thing?

https://ibb.co/pdzdncZ https://ibb.co/LRP6McY

Compare to Google/Bing which took about 5 seconds to type in the query and return these results.

https://ibb.co/1R48Bwc https://ibb.co/P4XGXxW


Reframed: why would anybody use ChatGPT instead of a search engine for this kind of thing today? You're correct: ChatGPT is missing some bells and whistles such as real-time data, your location, how many times you've visited the websites of certain restaurants, and so on. However, so do 'search engines' in their base implementation of indexing and ordering links to other websites. I think you'll see some technical limitations (real-time data) overcome, and user-focused implementations and features of AI/LLMs emerge, over the next year. At that point, I think your initial question becomes more relevant in a general way.


Which is why Bing has the best head start. They have an MVP of combining up to date data from search and the latest in LLMs.

I used it last night on a search that was roughly:

"find the top 10 restaurants in mid-town Manhattan that would be good for brunch with friends that have kids. Include ratings from NY Times and Yelp" Then I further refined with "Revise search to include 5 example entries and a highlight any known specials. Include the estimates walking time from <my location>. Provide a link to a reservations website."

It basically auto-populated a spreadsheet that I could easily review with my wife. Otherwise I would have needed to visit several websites to scrape all that information together in one place.


Because the average American is considered to have a readability level equivalent to a 7th/8th grader (12 to 14 years old). They lack the critical thinking skills to go from search results to a prioritized list. :-/


For decades, the standard was 6th grade. I believe that was part of the APA guidelines.

But Americans have been getting stupider, on average, at least. IMO, you now need to write at the 4th or 5th grade level.


Yeah for questions like this ChatGPT sounds like someone who's being paid by the word.

Always says 200 words where 30 would suffice.


I asked it to generate a travel plan including restaurants for a trip I’m considering. What it generated included some places that were now closed, but it was an excellent starting point, and it beat my usual approach of a Google search of the area and tapping random places in the area.


Solved problem. The solution is NOT ChatGPT, it’s something like Bing or Phind that is a hybrid of traditional search and LLM.


And isn’t that the problem?

Why do I need to translate my question into an optimal set of keywords that will give me what I want while minimizing unwanted results? Google search was a great stepping stone and connects you with the web, but it’s broken in many ways when it comes to what value we are really trying to extract.

A machine that can home in on what I'm getting at in an intuitive sense, while having all of human data available to generate a response, is so much more powerful.


Frankly, it's more about the number of ads and the low relevance.

Old Google was simply faster to use.

GPT for search is Google without ads.


The UX for ChatGPT is awful compared to search engines, at least for quick, easily found facts like what I mentioned.


Depends on the fact. If you ask Google for the difference between Viet red tea and Viet green tea, ChatGPT can give you the correct facts much more quickly than Google.


Depending on the facts, you can just directly search Wikipedia (or other more specialized websites) - I have had it on the w keyword for nearly two decades now...


In 1995 a lot of people were using the internet. It was made of dial-up modems, terminal servers, BBSes, AOL, IRC chat rooms, the AltaVista search engine, email, and browsers (probably Netscape, or earlier ones I can't remember)... but yeah, in 1995 a lot of people were using it, though much fewer than now, and not a negligible number.


For Google, LLMs for search responses, ad ranking, and page ranking are all quite useful. They can directly eat up the first page or so of filler responses they normally have now for queries. It's a great opportunity to clean out all the spam pages on the result pages at once, leaving high quality results and capturing that advertising/referral money back to Google.

Top 10 best reviewed android phones? Just put up a list generated by the LLM. Have a conversation with a product recommender that then collects fees from whoever it recommends.

Not that I think Google's got the executive capacity to do any of this anymore.


Microsoft wanted to control it all with Blackbird and ActivePlatform.

Their greed ended up with them losing out (thankfully).


That's weird, because I clearly remember Microsoft being late to the internet game and managing to catch up because they could use their monopoly on personal computers to push Internet Explorer.


> even coming out of the gate pretty strong by dominating the browser market

They were out of the gate about as weak as could be: Windows didn't have a native TCP/IP stack for the longest time (remember Trumpet Winsock?), and they only dominated the browser market through grossly anticompetitive behavior after they had lost the initial 5 rounds of the battle.


They definitely used anticompetitive behavior, but it's also true that IE was a better browser than Netscape by the time version 4 rolled out.


That's stretching what 'out of the gate' covers, to me. Besides that, from those days I mostly remember IE as the utility you used to download another browser after a fresh Windows install, and as the thing that was nearly impossible to get rid of. Not through any merit of its own, but because of the many non-standards-compliant websites that favored IE.


I think the problem with AI being everywhere and ubiquitous is that AI is the first technology in a very long time that requires non-trivial compute power. That compute power costs money. This is why you only get a limited number of messages every few hours from GPT4. It simply costs too much to be a ubiquitous technology.

For example, the biggest LLaMA model only runs on an A100, which costs about $15,000 on eBay. The new H100, which is 3x faster, goes for about $40,000, and both of these cards can only support a limited number of users, not the tens of thousands of users who can run off a high-end webserver.

I'd imagine Google would lose a lot of money if they put GPT4 level AI into every search, and they are obsessed with cost per search. Multiply that by the billions and it's the kind of thing that will not be cheap enough to be ad supported.


The biggest LLaMA model retains near-100% fidelity (it's something like 99.3%) at 4-bit quantization, which allows it to fit on any 40 GB or 48 GB GPU, which you can get for $3500.

Or at about a 10x speed reduction you can run it on 128 GB of RAM for only around $250.

The story is not anywhere near as bleak as you paint.
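
A back-of-the-envelope check of those numbers (weights only, ignoring context and activation overhead):

    # Rough memory footprint of a 65B-parameter model at different precisions.
    params = 65e9
    for bits in (16, 8, 4):
        gib = params * bits / 8 / 2**30
        print(f"{bits}-bit: ~{gib:.0f} GiB")
    # ~121 GiB at 16-bit, ~61 GiB at 8-bit, ~30 GiB at 4-bit -- which is why
    # a 4-bit quantization plus some working memory fits a 40-48 GB card.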


A $3500 GPU requirement is far from democratization of AI.


$3500 is less than one week of a developer's salary for most companies. It wouldn't pay a month's rent for most commercial office space.

It's a lower cost of entry than almost any other industry I can think of. A cargo van with a logo on it (for a delivery business or painting business, for example) would easily cost 10-20x as much.


1. I don't know what kind of world you live in to think that USD 3500 is "less than one week of a developer salary for most companies." I think you really just mean FAANG (or whatever the current acronym is) or potentially SV / offices in cities with very high COL.

2. The problem is scaling. To support billions of search queries you would have to invest in a lot more than a single GPU. You also wouldn't only need a single van, but once you take scaling into account even at $3500 the GPUs will be much more expensive.

That said, costs will come down eventually. The question in my mind is whether OpenAI (who already has the hardware resources and backed by Microsoft funding to boot) will be able to dominate the market to the extent that Google can't make a comeback by the time they're able to scale.


> 1. I don't know what kind of world you live in to think that USD 3500 is "less than one week of a developer salary for most companies." I think you really just mean FAANG (or whatever the current acronym is) or potentially SV / offices in cities with very high COL.

I live in the real world, at a small company with <100 employees, a thousand miles away from SV.

$3500 * 52 ≈ $182k a year, and that gives a $120k salary plus $60k for taxes, social security, insurance, and other benefits, which isn't nearly FAANG level.

Even if you cut it in half and say it's 2 weeks of dev salary, or 3 weeks after taxes, it's not unreasonable as a business expense. It's less than a single license for some CAD software.

> 2. The problem is scaling. To support billions of search queries you would have to invest in a lot more than a single GPU. You also wouldn't only need a single van, but once you take scaling into account even at $3500 the GPUs will be much more expensive.

Sure, but you don't start out with a fleet of vans, and you wouldn't start out with a "fleet" of GPUs. A smart business would start small and use their income to grow.


You are correct sir!


I'm only going to comment on the salary bit.

GP lives in a company world. The cost of a developer to a company is the developer's salary as stated in the contract, plus some taxes, health insurance, pension, whatever, plus the office rent for the developer's desk/office, plus the hardware used, plus a fraction of the cost of HR staff and offices, cleaning staff, lunch staff... it adds up. $3500 isn't a lot for a week.


Most of these items are paid for by the company, and most people would not consider the separate salary of the janitorial or HR staff to be part of their own salary.


I agree, most people wouldn't. This leads to a lot of misunderstandings, when some people think in terms of what they earn and others in terms of what the same people cost their employers.

So you get situations where someone names a number and someone else reacts by thinking it's horribly, unrealistically high: The former person thinks in employer terms, the latter in employee terms.


1 - Yes, I agree on this, but even so, most developers are already investing in SOTA GPUs for other reasons (so it's not as much of a barrier as purported).

2 - Is scaling not a problem in other industries? If you want to scale your food truck, you will need more food trucks; this doesn't really do anything for your point.

GGML and GPTQ have already revolutionised the situation, and now there are tiny models with insane quality as well that can run on a conventional CPU.

I don't think you have any idea what is happening around you, and this is not me being nasty: just go and take a look at how exponential this development is and you will realise that you need to get in on it before it's too late.


You seem to be in a very particular bubble if you think most developers can trivially afford high end GPUs and are already investing in SOTA GPUs. I know a lot of devs from a wide spectrum of industries and regions and I can think of only one person who might be in your suggested demographic


Perhaps I should clarify that when I say SOTA GPU, I mean an RTX 3060 (midrange), which has 12 GB of VRAM and is a good starting point to climb into the LLM market. I have been playing with LLMs for months now, and for large periods of time had no access to a GPU due to daily scheduled rolling blackouts in our country.

Even so, I am able to produce insane results locally with open-source efforts on my RTX 3060, and now I am starting to feel confident enough that I could take this to the next level by either using the cloud (computerender.com for images) or something like vast.ai to run my inference (or even training, if I spend more time learning). And if that goes well I will feel confident going to the next step, which is getting an actual SOTA GPU. But that will only happen once I have gained sufficient confidence that the investment will be worthwhile. Regardless, apologies for suggesting the RTX 3060 is SOTA, but to me, in a 3rd-world country, being able to run vicuna-13b entirely on my 3060 with reasonable inference rates is revolutionary.


* in the US


For reference, a basic office computer in the 1980s cost upwards of $8000. If you factor in inflation, a $3500 GPU for cutting-edge tech is a steal.


And hardly anyone had them in the 1980s.


I think a more relevant comparison may be a peripheral: the $7,000 LaserWriter which kicked off the desktop publishing revolution in 1985.


Yes but moore's law ain't what it used to be


Moore is not helping here. Software and algorithms will fix this up, which is already happening at a frightening rate. Not too long ago, like months, we were still debating whether it was even possible to run LLMs locally.


There is going to be a computational complexity floor on where this can go, just from a Kolmogorov complexity argument. Very hard to tell how far away the floor is exactly but things are going so fast now I suspect we'll see diminishing returns in a few months as we asymptote towards some sort of efficiency boundary and the easy wins all get hoovered up.


Yes indeed and it’ll be interesting to see where that line is.

I still think there is a lot to be gained from just properly and efficiently composing the parts we already have (like how the community handled stable diffusion) and exposing them in an accessible manner. I think that’ll take years even if the low hanging algorithm fruits start thinning out.


We've reached an inflection point; the new version would be: Nvidia can sell twice as many transistors for twice the price every 18 months.


This is very true, however there is a long way to go in terms of chip design specific to DL architectures. I’m sure we’ll see lots of players release chips that are an order of magnitude more efficient for certain model types, but still fabricated on the same process node.


Moore's law isn't dead. Only Dennard's law. See slide 13 here[0] (2021). Moore's law stated that the number of transistors per area will double every n months. That's still happening. Besides, neither Moore's law nor Dennard scaling are even the most critical scaling law to be concerned about...

...that's probably Koomey's law[1][3], which looks well on track to hold for the rest of our careers. But eventually as computing approaches the Landauer limit[2] it must asymptotically level off as well. Probably starting around year 2050. Then we'll need to actually start "doing more with less" and minimizing the number of computations done for specific tasks. That will begin a very very productive time for custom silicon that is very task-specialized and low-level algorithmic optimization.

[0] Shows that Moore's law (green line) is expected to start leveling off soon, but it has not yet slowed down. It also shows Koomey's law (orange line) holding indefinitely. Fun fact, if Koomey's law holds, we'll have exaflop power in <20W in about 20 years. That's equivalent to a whole OpenAI/DeepMind-worth of power in every smartphone.

The neural engine in the A16 bionic on the latest iPhones can perform 17 TOPS. The A100 is about 1250 TOPS. Both these performance metrics are very subject to how you measure them, and I'm absolutely not sure I'm comparing apples to bananas properly. However, we'd expect the iPhone has reached its maximum thermal load. So without increasing power use, it should match the A100 in about 6 to 7 doublings, which would be about 11 years. In 20 years the iPhone would be expected to reach the performance of approximately 1000 A100's.

At which point anyone will be able to train a GPT-4 in their pocket in a matter of days.
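
The doubling arithmetic behind that projection, taking the 17 and 1250 TOPS figures above at face value and assuming an efficiency doubling roughly every 1.7 years:

    import math

    phone_tops, a100_tops = 17, 1250      # figures quoted above, very rough
    years_per_doubling = 1.7              # assumed Koomey-style cadence
    doublings = math.log2(a100_tops / phone_tops)
    print(f"{doublings:.1f} doublings ~= {doublings * years_per_doubling:.0f} years")
    # ~6.2 doublings, i.e. roughly the 11 years mentioned above at constant power.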

There's some argument to be made that Koomey himself declared in 2016 that his law was dead[4], but that was during a particularly "slump-y" era of semiconductor manufacturing. IMHO, the 2016 analysis misses the A11 Bionic through A16 Bionic and M1 and M2 processors -- which instantly blew way past their competitors, breaking the temporary slump around 2016 and reverting us back to the mean slope. Mainly note that now they're analyzing only "supercomputers" and honestly that arena has changed, where quite a bit of the HPC work has moved to the cloud [e.g. Graviton] (not all of it, but a lot), and I don't think they're analyzing TPU pods, which also probably have far better TOPS/watt than traditional supercomputers like the ones on top500.org.

0: (Slide 13) https://www.sec.gov/Archives/edgar/data/937966/0001193125212...

1: "The constant rate of doubling of the number of computations per joule of energy dissipated" https://en.wikipedia.org/wiki/Koomey%27s_law

2: "The thermodynamic limit for the minimum amount of energy theoretically necessary to perform an irreversible single-bit operation." https://en.wikipedia.org/wiki/Landauer%27s_principle

3: https://www.koomey.com/post/14466436072

4: https://www.koomey.com/post/153838038643


You're right, it's been debunked and misquoted for decades.


Virtually no office had them in 1980.

By the mid-1980s, personal computers cost less than $500.


This is false. I created an account just to rebut.

While it may have been true that it was technically possible to assemble a PC for $500... good luck. In the real world people were spending $1500-$2500+ for PCs, and that price point held remarkably constant. By the time you were done buying a monitor, external drives, printer etc $3000+ was likely.

https://en.m.wikipedia.org/wiki/IBM_Personal_Computer#:~:tex....

Or see the Apple Mac 512K, introduced at approximately $2800. One reason it was interesting (if you could afford it) was the physical integration and elimination of "PC" cable spaghetti.

https://en.m.wikipedia.org/wiki/Macintosh_512K

But again, having worked my way through college with an 8 MB external hard drive... which was a huge improvement over having to swap and preload floppies in twin external disk drives just to play stupid (awesome) early video games... all of this stuff cost a lot more than you're saying. And continued to, well into the 90s.

Of course there are examples of computers that cost less. I got a TI-99/4A for Christmas, which cost my uncle about $500-600. But then you needed a TV to hook it up to, and a bunch of tapes too. And unless you were a nerd and wanted to program, it didn't really DO anything. I spent months manually recreating arcade video games for myself on that. Good times. Conversely, if you bought an IBM or Apple computer, odds were you were also going to spend another $1000 or more buying shrink-wrap software to run on it, rather than writing your own.

Source: I remember.


> This is false.

The CPC 464 was the first personal home computer built by Amstrad, in 1984. It was one of the best-selling and best-produced microcomputers, with more than 2 million units sold in Europe.

Price: £199 (with green monitor), £299 (with colour monitor)

> But again having worked my way thru college with an 8 MB external hard drive

that was a minicomputer at the time, not a PC (personal computer)

> Source: I remember.

Source: I owned, used and programmed PCs in the 80s


The Commodore 64 launched in 1982 for 595 bucks. Wikipedia says "During 1983, however, a trickle of software turned into a flood and sales began rapidly climbing, especially with price cuts from $595 to just $300 (equivalent to $1600 to $800 in 2021)."

I believe this was the PC of the ordinary person (In the "personal computer" sense of the word.)


Yeah, I bet people won't get cars, either, they're a lot more expensive than that.


And that $3500 worth of kit will be a couple of hundred bucks on eBay in 5 years.


I haven't seen any repos or guides for running llama on that level of RAM, which is something I do have. Any pointers?


Run text-generation-webui with llama.cpp: https://github.com/oobabooga/text-generation-webui
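
For CPU-only RAM, a minimal sketch using the llama-cpp-python bindings (argument names can vary by version, and the model path is just a placeholder):

    # Minimal CPU-only sketch with llama-cpp-python; path and settings are examples.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/llama-30b.ggml.q4_0.bin",  # a 4-bit quantized model file
        n_ctx=2048,     # context window
        n_threads=8,    # CPU threads; memory bandwidth is usually the limit
    )
    out = llm("Q: What is the capital of France?\nA:", max_tokens=32, stop=["Q:"])
    print(out["choices"][0]["text"])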



Something I haven't figured out: should I think about these memory requirements as comparable to the baseline memory an app uses, or like per-request overhead? If I needed to process 10 prompts at once, do I need 10x those memory figures?


It's like a database, I imagine, so the answer is probably "unlikely": you don't need memory per request so much as you run out of cores to handle requests?

You need to load the data so the graphics cards - where the compute is - can use it to answer queries. But you don’t need a separate copy of the data for each GPU core, and though slower, cards can share RAM. And yet even with parallel cores, your server can only answer or process so many queries at a time before it runs out of compute resources. Each query isn’t instant either given how the GPT4 answers stream in real-time yet still take a minute or so. Plus the way the cores work, it likely takes more than one core to answer a given question, likely hundreds of cores computing probabilities in parallel or something.

I don’t actually know any of the details myself, but I did do some CUDA programming back in the day. The expensive part is often because the GPU doesn’t share memory with the CPU, and to get any value at all from the GPU to process data at speed you have to transfer all the data to GPU RAM before doing anything with the GPU cores…

Things probably change quite a bit with a system on a chip design, where memory and CPU/GPU cores are closer, of course. The slow part for basic replacement of CPU with GPU always seemed to be transferring data to the GPU, hence why some have suggested the GPU be embedded directly on the motherboard, replacing it, and just put the CPU and USB on the graphics card directly.

Come to think of it, an easier answer is how much work can you do in parallel on your laptop before you need another computer to scale the workload? It’s probably like that. It’s likely that requests take different amounts of computation - some words might be easier to compute than others, maybe data is local and faster to access or the probability is 100% or something. I bet it’s been easier to use the cloud to toss more machines at the problem than to work on how it might scale more efficiently too.


Does that mean an iGPU would be better than a dGPU? A beefier version than those of today though.


Sort of. The problem with most integrated GPUs is that they don’t have as many dedicated processing cores and the RAM, shared with the system, is often slower than on dedicated graphics cards. Also… with the exception of system on a chip designs, traditional integrated graphics reserved a chunk of memory for graphics use and still had to copy to/from it. I believe with newer system-on-a-chip designs we’ve seen graphics APIs e.g. on macOS that can work with data in a zero-copy fashion. But the trade off between fewer, larger system integrated graphics cores vs the many hundreds or thousands or tens of thousands of graphics cores, well, lots of cores tends to scale better than fewer. So there’s a limit to how far two dozen beefy cores can take you vs tens of thousands of dedicated tiny gfx cores.

The theoretical best approach would be to integrate lots of GPU cores on the motherboard alongside very fast memory/storage combos such as Octane, but reality is very different because we also want portable, replaceable parts and need to worry about silly things like cooling trade offs between placing things closer for data efficiency vs keeping things spaced apart enough so the metal doesn’t melt from the power demands in such a small space. And whenever someone says “this is the best graphics card,” someone inevitably comes up with a newer arrangement of transistors that is even faster.


You need roughly model size + (n * (prompt + generated text)), where n is the number of parallel users/requests.


It should be noted that that last part has a pretty large factor to it that also scales with model size, because to run transformers efficiently you cache some of the intermediate activations from the attention block.

The factor is basically 2 * number of layers * number of embeddings values (e.g. fp16) that are stored per token.


That's just to run a model already trained by a multi-billion dollar company. And we are "lucky" a corporation gave it to the public. Training such a model requires tons of compute power and electricity.


Time for a dedicated "AI box" at home with hotswapping compute boards? Maybe put it inside a humanoid or animal-like robot with TTS capabilities?

Sign me up for that kickstarter!

EDIT: based on some quick googling (should I have asked ChatGPT instead?), Nvidia sells the Jetson Xavier Nx dev kit for ~$610 https://www.electromaker.io/shop/product/nvidia-jetson-xavie...

Just need the robot toy dog enclosure

(See https://www.electromaker.io/blog/article/best-sbc-for-ai-sin... for a list of alternatives if that one is too expensive)


It's more likely that you want a lot of compute for a very little amount of time each day - which makes centralised/cloud processing the most obvious answer.

If I want a response within 100ms and have 1000 AI queries per day, that would only be about 2 minutes of aggregated processing time for your AI box per day. It's less than 1% utilised. If the same box is multiuser and on the internet, it can probably serve 50-100 people's queries concurrently.

The converse is that if you put something onto the cloud, for the same cost you might be able to effectively get 50x the hardware per user for the same cost (i.e. rather than have 1 AI box locally with 1 GPU for each of the 50 users, you could have 1 AI box with 50 GPU's which is usable by all 50 users).
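
That utilization estimate, spelled out:

    queries_per_day = 1000
    gpu_seconds_per_query = 0.1               # the 100 ms figure above
    busy_seconds = queries_per_day * gpu_seconds_per_query
    utilization = busy_seconds / (24 * 3600)
    print(f"{busy_seconds / 60:.1f} min/day busy, {utilization:.2%} utilized")
    # ~1.7 min/day and ~0.12% utilization: plenty of headroom to share the
    # box across the 50-100 users mentioned above.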


"a lot of compute for a very little amount of time each day" sounds like something I can play games on when I'm not working.


Why not just buy a computer that is correctly-sized to play games, rather than buy an AI-sized computer that you mostly use for games?


Because I want both.


But not use both at once?


Well... No. If I'm sat playing a game, I'm unlikely to be generating AI queries.


Each billion parameters stored as 16-bit floats requires around 2 GB of GPU or TPU RAM. ChatGPT is rumored to have around 1,000 billion parameters; good open-source LLMs currently have around 7-20 billion. Consumer GPUs max out at 24 GB. You can quantize the model to e.g. 4 bits per parameter instead of 16 and apply other compressions, but there is still quite a limit to what you can do with 24 GB of RAM. Apple's unified-memory approach may be a path to increasing that. So one box gives you access to the small models; for a GPT-4-like model you'd need (for inference, and if you had the model and tools) probably 100 of those 4090s, or 25 H100s with 96 GB each, to fit roughly 2 TB of model data.
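
The same arithmetic as code, using the ~2 GB per billion fp16 parameters rule of thumb and treating the 1,000-billion figure as an assumption:

    # Rule of thumb: fp16 weights take 2 bytes per parameter.
    def weights_gb(params_billion, bits=16):
        return params_billion * 1e9 * bits / 8 / 1e9   # decimal GB

    for name, b in [("13B open model", 13), ("30B open model", 30),
                    ("assumed 1000B GPT-4-scale model", 1000)]:
        print(f"{name}: {weights_gb(b):,.0f} GB fp16, {weights_gb(b, 4):,.0f} GB at 4-bit")
    # 1000B at fp16 is ~2,000 GB: roughly 2000/24 ~ 84+ consumer 24 GB cards,
    # or 2000/96 ~ 21+ of the 96 GB parts, in line with the estimate above.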


Currently we do not explore sparsity. The next iteration of models will be much more compact by focusing on reducing effective tensor size.


It is quite likely GPT-4 uses one or even two sparsity approaches on top of each other (namely, coarse grained switch transformer-like and fine grained intra-tensor block sparsity), if you look at the openly available contributors' research CVs.

Google, in collaboration with OpenAI, has published an impressive tour de force where they have thoroughly developed and validated at scale a sparse transformer architecture, applied to the general language modeling task: https://arxiv.org/abs/2111.12763

This happened in November of 2021, and there is a public implementation of this architecture on Google's public GitHub.

Impressively, for whatever reason, other up-and-coming players are still not releasing models trained with this approach, even though it promises a multiplicative payoff in inference economy. One boring explanation is conservatism around NN training at scale, where training runs cost O(yearly salary).

Let's hope the open source side of things catches up.


Not in collaboration with OpenAI: one of the authors joined OpenAI before the paper was written and arXiv'd.


At least from my experience with sparse matrix libraries like Eigen, you need to get the sparsity down to about 5% before switching from a dense to a sparse algorithm gets you execution time benefits.

Of course from a memory bandwidth and model size perspective maybe there are benefits long before that.


It seems like a ton of engineering effort has been put into these neural network frameworks. How have they not explored sparsity yet? With numerical linear algebra that's, like, the most obvious thing to do (which is to say, you probably know beforehand if your problem can be mapped to sparse matrices).

(Edit: just to be clear here, I’m not saying I expect the whole field is full of dummies who missed something obvious or something like that, I don’t know much at all about machine learning so I’m sure I’m missing something).


It's not at all strange to get something to work before you start optimizing. I mean if you can only run small models then how would you even know what you're losing by optimizing for space? Heck you wouldn't even know how the model behaves, so you won't know where to start shaving away.

I'm not saying it's impossible but if resources allow it makes a lot of sense to start with the biggest model you can still train. Especially since for whatever reason things seem to get a lot easier if you simply throw more computing power at it (kind of like how no matter how advanced your caching algorithm it's not going to be more than 2 times faster than the simplest LRU algorithm with double the amount of cache).


From what I’ve read recently, most sparse methods just haven’t given that much improvement yet, and we’re only recently pushing up against the limits of the “just buy more RAM” approach.

It sounds like there is a lot of work happening on sparse networks now, so it’ll be interesting to see how this changes in the near future.


GPUs were built for dense math, and they ran with it, to the point that the current best architectures are in part just the ones that run best using the subset of linear algebra GPUs are really good at.

There has been a lot of work on sparsity and on discovering sparse subnetworks in trained dense networks. Intel even proposed some alternative CPU-friendly architectures, and torch/tf and GPUs are starting to do okay with sparse matrices, so things are changing.


Sounds a bit like premature optimization ( to have done it _by_ now ) , I bet it's in the works now though.


"Do things that don't scale" – PG


llama-65b on a 4-bit quantize sizes down to about 39GB - you can run that on a 48GB A6000 (~$4.5K) or on 2 x 24GB 3090s (~$1500 used). llama-30b (33b really but who's counting) quantizes down to 19GB (17GB w/ some optimization), so that'll fit comfortably on a 24GB GPU.

A 4-bit quantize of a 1000B model should be <600GB, so would fit on a regular 8x80 DGX system.


I wonder if an outdated chip architecture with just a lot of (64GB?) GDDR4 or something would work? Recycle all the previous generation cards to super high VRAM units


I've seen reports of people being able to run LLMs at decent speeds on old Nvidia P40s. These are 24GB Pascal GPUs and can be bought for as low as $100 (although more commonly $200) on eBay.


Link to report please



You can do it on CPU now.

Benchmarks: https://github.com/ggerganov/llama.cpp/issues/34


It should work. After the initial prompt processing, token generation is typically limited by memory bandwidth more than by compute.
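A back-of-envelope sketch of why, assuming batch size 1 and that each generated token has to stream roughly the full set of (quantized) weights from memory once; the numbers here are illustrative, not measurements:

    def max_tokens_per_sec(model_size_gb: float, mem_bandwidth_gb_s: float) -> float:
        # upper bound: every token reads the whole model from memory once
        return mem_bandwidth_gb_s / model_size_gb

    print(max_tokens_per_sec(19, 400))  # ~21 tok/s: 4-bit 30B model on ~400 GB/s memory
    print(max_tokens_per_sec(19, 50))   # ~2.6 tok/s on ~50 GB/s dual-channel DDR4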


Intel Larrabee has entered the chat, but it showed up about a decade and a half too early.


For completeness, Apple's consumer GPUs currently max out at 64 GB minus OS overhead, so about 56 GB. But you are limited to 1 GPU per system.


Benchmarks for what you can do on CPU alone.

https://github.com/ggerganov/llama.cpp/issues/34

An M1 Max does 100ms per token. A 64 core threadripper about 33ms per token.


I was in Japan recently and they sell these pocket-size translator devices with a microphone, camera and screen. You can speak to it or take pictures and it will translate on the fly. Maybe in the $100 USD range for a nice one.

It's only a matter of time before someone makes a similar device with a decent LLM on it, and premium ones will have more memory/cpu power.


I mean...isn't that just a smartphone?

I know exactly what you're talking about because my father-in-law had the same thing. I'm just very skeptical that specialist hardware will overtake general commoditized computing devices for mass-market usage. The economics alone make it unlikely.


Even though Word Lens was first released over 12 years ago, I keep surprising people[0] by showing them the same technology built into Google Translate.

[0] even other tech developers who, like me, migrated somewhere where a language barrier came up


Yes, that is because Google bought the technology behind Word Lens (Quest Visual was acquired in 2014, and the feature landed in Google Translate in 2015).

Wiki says: https://en.wikipedia.org/wiki/Ot%C3%A1vio_Good

    To develop Word Lens, Otávio Good founded Quest Visual Inc., which was acquired by Google, Inc. in 2014, leading to the incorporation of the Word Lens feature into the Google Translate app in 2015.


I think you misunderstand.

The people I show it to are surprised that it's possible in 2023, even though it was demoed at the end of 2010.

Not only have they never heard of Word Lens, they are also oblivious to the corresponding feature of Google Translate.


Yes, usually they are locked-down Android devices.

One of them has an unlimited "free forever" internet subscription so it can fetch network translations online if the local dictionary doesn't have the word.


I think we as humans have a tendency to extrapolate from our present position to a position we can imagine that we’d like, even if there isn’t a foreseeable path from here to there. I believe this may end up being one of those cases.


Why? What GP describes seems both feasible and inevitable to me.


It's a win for Google that LLMs are getting cheaper to run. OpenAI's service is too expensive to be ad-funded. Google needs a technology that's cheaper to provide to maintain their ad-supported business model.


Google could make a bet like they did with YouTube.

At the time, operating YouTube was eye-wateringly expensive and lost billions. But Google could see where things were going: a triple trend of falling storage, bandwidth, and transmission costs (I'm trying to dig up a link I read years ago about this, but Google search has gotten so shit that I can't find it).

It was similar with ASIC miners for Bitcoin. Given enough demand, specialised, lower-cost hardware built specifically for LLMs will emerge.


On the flip side, I found only one person (I'm sure there are more) who is attacking the software-efficiency side of things. You would be quite surprised how inefficient the current LLM software stack is, as I learned on a C++ podcast [0]. Ashot Vardanian has a great GitHub repo [1] that demonstrates many ways compute can come way down in complexity and thus cost.

[0] https://cppcast.com/ai_infrastructure/ [1] https://github.com/orgs/unum-cloud/repositories?type=all


You realize that most reports say YouTube is barely profitable?


You can run it (quantized, at least) on a $4000 Mac thanks to Apple's unified memory. Surely other manufacturers are looking at how to expand VRAM, hopefully Intel or AMD.


Not to mention Apple chips have a bunch of very nice accelerators and also (!!!) macOS contains system frameworks that actually use them.


The article talks about this explicitly though. Reasonably good models are running on Raspberry Pis now.


Is a reasonably good model what people get value out of though?

Maybe this is why Sam Altman talked about "the end of the large LLMs is here"? He understands anything bigger than ChatGPT-4 isn't viable to run at scale and be profitable?


Does this mean models larger than ChatGPT would still be better for the same data size, as long as someone is ready to pay?

At what point does it stop getting better?


I thought he was fairly explicit that he thought larger models would provide incremental gains for exponentially greater cost, so yeah, I guess not profitable is a way to put it...


Or you can apply GPU optimizations to such ML workloads. By optimizing the way these models run on GPUs, you can significantly improve efficiency and slash costs by a factor of 10 or even more. These techniques include kernel fusion, memory-access optimization, and efficient use of GPU resources, which can lead to substantial improvements in both training and inference speed. This allows AI models to run on more affordable hardware and still deliver exceptional performance. For example, LLMs running on an A100 can also run on 3090s with no change in accuracy and comparable inference latency.
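As a concrete, minimal illustration of the kernel-fusion idea (a sketch assuming PyTorch 2.x on a CUDA GPU; this only fuses a toy bias-add + GELU, the real gains come from applying it across a whole model):

    import torch

    def bias_gelu(x, bias):
        # naively this is two kernels (add, then GELU), each reading and writing x
        return torch.nn.functional.gelu(x + bias)

    fused = torch.compile(bias_gelu)  # lets the compiler fuse the elementwise ops

    x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    b = torch.randn(4096, device="cuda", dtype=torch.float16)
    out = fused(x, b)  # first call compiles; later calls reuse the fused kernel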


You're right, and this is why they didn't heavily use BERT (in the full sense), arguably the game-changing NLP model of the 2010s. They couldn't justify bringing the cost per search up.


nah, LoRA quantized LLMs are going to be at the OS level in 2 years, and consumer architecture refreshes are just going to extend more RAM to already-existing dedicated chips like the Neural Engine

client-side tokens per second will be through the roof and the models will be smaller


LoRA is not a quantization method, it's a fine-tuning method.
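For anyone unfamiliar, a minimal NumPy sketch of the idea (not any particular library's API): the pretrained weight W stays frozen and you only train a low-rank delta B @ A, which is also why it pairs so naturally with a quantized base model.

    import numpy as np

    d, k, r = 1024, 1024, 8            # layer dims and the (small) LoRA rank
    W = np.random.randn(d, k)           # frozen pretrained weight
    A = np.random.randn(r, k) * 0.01    # trainable, r x k
    B = np.zeros((d, r))                # trainable, d x r (zero init => delta starts at 0)

    def forward(x):
        # frozen path plus low-rank adaptation path; only A and B get updated
        return x @ W.T + x @ (B @ A).T

    x = np.random.randn(2, k)
    y = forward(x)
    print(r * (d + k), "trainable params vs", d * k, "for full fine-tuning")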


It's two adjectives on the noun.


you read that whole paragraph and assumed this prediction didn't involve consumers using fine-tuned models, despite LoRA being explicitly mentioned?

-EQ moment

edit: it's about the combination of those methods making models accessible on consumer hardware


Within a decade mid-level consumer cards will be just as powerful as those $40k cards.


Considering how long it took mid level consumer cards to beat my $600 1080, you're way more optimistic than I am.


So... Four years? The 1080 launched in 2016, and the 3070 launched in 2020, for $100 cheaper — the launch price of the 1080 was $699, and the 3070 was $599. The 3070 easily trounced the 1080 in benchmarks.

The 3060 effectively matched a 1080 at $329 in 2021 (and has 50% more VRAM at 12GB instead of the 1080's 8GB), so call it five years if the 3070 isn't mid-range enough.

The 3060 Ti launched in 2022 at $399 and handily beat the 1080 on benchmarks, so call it six years if you want the midrange card to beat (not just match) the previous top-of-the-line card, and if a *70 card doesn't count as midrange enough. Less than a decade still seems like a reasonable claim for a midrange card to beat a top-of-the-line card.


The 3060 was only readily available quite recently, so it's about 6 years from ready availability of the 1080 at $600 to 3060.

Taking 6 years to double the perf/$ implies that it would take ~42 years for a $40000 H100 to reach mid-range levels. Assuming scaling, particularly VRAM, holds.

Plus, it would be getting really close to the Landauer limit by that point.
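Rough math, for the curious (assuming a ~$40k H100, a ~$400 mid-range card, and the 6-year doubling time above; tweak the assumptions and the answer moves a few years either way):

    import math

    h100_price, midrange_price = 40_000, 400
    years_per_doubling = 6

    gap = h100_price / midrange_price             # ~100x price/perf gap to close
    doublings = math.log2(gap)                    # ~6.6 doublings
    print(round(doublings * years_per_doubling))  # ~40 years, same ballpark as the ~42 above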


are you conveniently forgetting how none of those cards were actually available for consumers to buy?


The 1080 was impossible to buy at launch too and was sold out for months. And the 3060 is easy to buy!


The 3060 is easy to buy NOW, since cryptomining has crashed, and the 40x0 GPUs have become available (though mostly still above MSRP).


The main moat here is VRAM, not raw compute power.


Given how Nvidia has almost no competition, it just seems unlikely that they'll decide to stop milking the enterprise market; they will continue to lock 40GB+ cards behind ludicrous price points.


Nvidia may not need to price compete immediately, but if LLMs drive demand for more capable consumer hardware they can either:

(1) make a whole lot of money fulfilling that demand, or

(2) leave an unmet demand that makes it more attractive for someone else to spend the money to field a solution.

(1) seems an attractive choice even if it wasn’t for the potential threat of (2).


It wasn't ML but gaming that drove the demand for GPUs, and ML sort of rode in the slipstream (same for crypto). Later, demand for crypto hashing drove GPU sales as much as gaming did, but that is now over. So unless either (1) ML by itself can present a demand as large as crypto or gaming, or (2) crypto or gaming can provide a similar demand as they've done in the past, the economies of scale that drove this will likely not be reached again. If they do, however, the cost of compute will come down drastically, and that in turn may well drive another round of advances in models for the larger players.


>but if LLMs drive demand for more capable consumer hardware they can either:

That's a big if. I imagine the people who will buy GPUs to run LLMs locally are either researchers or programmers. I imagine most consumer-focused solutions will be cloud-first. That is a much smaller market than gamers, and Nvidia wouldn't want to cannibalize their datacenter offerings by releasing something cheaper. It's far better for them to sell high-tier GPUs to Amazon and let them "rent" them out to researchers and programmers.


Would google even care about integrating LLMs into search? They don’t even prune all the spam entries, presumably because they increase advertising revenue and analytics profit.


We used to get only a limited number of Internet hours. By the time December 2003 rolled around, my family had always-on internet.

Besides, what Google does here is beside the point, because Bing has already unleashed AI search. Google will either follow along or stop being relevant.


This cost argument is being overblown. While it's a limitation for today's products, engineers are very good at optimization. The costs will therefore drop in the medium to long term from efforts on both the software and hardware sides.


Is the assumption that GPU power and advancements in AI will not get to a reasonable price point in the near future? Because it seems to me that advances in computation have not slowed down at all since it started.


The thing you have in your pocket would have meant an enormous investment for equivalent compute power just decades ago and filled a whole basement with server racks.


The legendary Cray-2 was the fastest supercomputer in the world in 1985, with peak 1.9 GFLOPS. Less than four decades ago.

By comparison, the Cray is outperformed by my smartphone.

Actually, it is outperformed by my previous smartphone, which I purchased in 2016 and replaced in 2018.

Actually, it is outperformed by a single core on my previous smartphone, of which it has eight cores.


Or you can quantize the model and run it on your laptop.


Caching + simpler models for classification/triage should reduce the load on the big model.
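A hypothetical sketch of the shape of it; small_classifier, small_model, and big_model are placeholders, not real APIs:

    from functools import lru_cache

    def small_classifier(query: str) -> str:
        # stand-in for a cheap model that labels queries "easy" or "hard"
        return "easy" if len(query.split()) < 8 else "hard"

    def small_model(query: str) -> str:
        return f"[small model answer to: {query}]"

    def big_model(query: str) -> str:
        return f"[big model answer to: {query}]"

    @lru_cache(maxsize=10_000)           # repeated queries never hit a model at all
    def answer(query: str) -> str:
        if small_classifier(query) == "easy":
            return small_model(query)    # cheap path for most traffic
        return big_model(query)          # expensive path only when needed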


Yeah, today.


> It's going to be seamlessly integrated into every-day software.

I...kinda don't want this? UIs have already changed in so many different fits, starts, waves, and cycles. I used to have skills. But I have no skills now. Nothing works like it used to. Yeah they were tricky to use but I cannot imagine that a murky AI interface is going to be any easier to use, and certainly impossible to master.

Even if it is easier to use, I am not sure I want that either. I don't know where the buttons are. I don't know what I can do and what I can't. And it won't stay the same, dodging my feckless attempts to commit to memory how it works and get better at it...?


I once volunteered with an older woman. She'd been a computer programmer in the 70s, using punch cards and, later, Pascal.

Then she had kids and stopped working for a while, and the technology moved on without her. Now she's like any other old person: she doesn't know how to use a computer and gets flustered when, e.g., trying to switch from the browser back to Word. Her kids and grandkids clown on her for being hopeless with computers.

I asked her what it was she found difficult about modern computers compared to what she worked with 50 years ago. She said it’s the multitasking. The computers she had worked with just did one thing at once.


Interesting story. Thanks for sharing this. I have a somewhat similar story with my failure to transition from platformers/sidescrollers to 3D games. I just couldn't do it.


Me too! My brain just won’t let me immerse.

It was when sonic went 3D that it all began for me.

Wait…he’s running away from me?


Sonic 3D was a bad game. Did you play Super Mario 64?


Indeed. I like the fact that my stove has only the knobs and buttons on it (other than a 7-segment LED display). I am master of my stove because I am pretty sure I have explored the complete state space of it by now.


I worked with somebody who developed MULTICS but struggled constantly to do even the most basic tasks on a Mac even after using Macs for a decade. It was painful to watch them slowly move a mouse across the screen to the apple, take about ten seconds to click it, and then get confused about how to see system info.


It was a sad day when I realized I was systematically overinvesting in skills on churning technology and that my investments would never amortize. Suddenly my parents' stubborn unwillingness to bother learning anything technological made complete sense and I had to adjust my own patience threshold sharply downwards.


There are some software tools where the investment pays back, and has been over decades. Microsoft Office (in part because it's not reinventing itself, but rather accrues new features; in part because everyone else copies its UI patterns). Photoshop. Emacs.

With modern software, I find that there isn't much to learn at all - in the past decade, it seems to only be removing features and interaction modes, never adding anything new.

Still, I don't regret having learned so much all those years ago. It gives me an idea what the software could do. What it was supposed to do. This means I often think of multi-step solutions for problems most people around me can't solve unless there's a dedicated SaaS for it. As frustrating as it often is to not be able to do something you could 10 years ago, sometimes I discover that some of the more advanced features still remain in modern toy-like software.


The Office UI was reinvented at least once. I remember when that god-awful ribbon showed up.

"In 2003, I was given the chance to lead the redesign of the most well-known suite of productivity software in the world: Microsoft Office.

"Every day, over a billion people depended on apps like Word, Excel, PowerPoint, and Outlook, so it was a daunting task. This was the first redesign in the history of Office, and the work that we did ended up shaping the standard productivity experience for the next two decades."

...

https://jensenharris.com/home/office

https://jensenharris.com/home/ribbon


That's not what I mean. The ribbon, whatever you think of it, only moved some functionality around. It didn't actually change the way old functionality worked. All the things you knew how to do, you could still do - you only had to learn their new placement.


If we consider the Word 2.0 (for Windows) era as the beginning of a graphical Microsoft Office suite, then graphical Office has had the ribbon for as long as it didn’t: 16 years.

I’m still waiting for people to stop complaining about it.


That move was awful


For you.


The irony is that LLMs are actually the real solution that Ribbon took an awkward half step towards -- how to quickly get to what a user actually wants to do.

Originally: Taxonomically organized nested menus, culminating at a specific function or option screen

Now: Usage-optimized Ribbon (aka Huffman coding for the set of all options), culminating at a specific function or option screen

Future: LLM powered semantic search across all options and actions, generating the exact change you want to implement

Why have an "email signature" options page at all, when an LLM can stitch together the calls required to change it, invoked directly from English text?


Good local search gets you 80% of the way there. 20 years ago, this was an inspiring UX trend (Quicksilver / Subject Verb Object), but it fizzled. Apple kept the torch lit with menu search and it has been brilliant, but limited to their platform, although I am pleased to see that MS Office got menu search in Oct 2022. Hopefully they don't lose interest like they did for desktop search.

LLMs could certainly help loosen the input requirements, not to mention aim some sorely needed hype back in this direction. I am afraid that they will murder the latency and compute requirements, but hey, progress is always two steps forward one step back.


I was never terribly impressed with local search on OS X, but maybe I didn't use it enough.

For a while, Ubuntu had a local search where you could hit a button (Super?), start typing, and it would drill through menus at lightning speed.


> Why have an "email signature" options page at all, when an LLM can stitch together the calls required to change it, invoked directly from English text?

10-20 years from now? Maybe. It depends on whether or not the industry will cut corners in this part of the experience. I find it hard to predict which features get polished, and which are forever left barely-functioning. Might depend on which are on "critical path" to selling subscriptions.

0-10 years from now? We'll still need the options page.

"Email signature" options page provides visibility, consistency, and predictability. That is, you can see what your e-mail signatures are, you are certain those are the ones that will be used for your message, under conditions defined in the UI pane, and if you change something, you know exactly how it will affect your future e-mails.

LLMs are quite good at handling input, though not yet reliable enough. However, as GUI replacement, they are ill-suited for providing output - they would be describing what the result is, whereas the GUI of today display the result directly. As the writers' adage goes, "show, don't tell".

(That said, the adage is way overused in actual storytelling.)


In this case, the LLM isn't generating the output. It's only generating the sequence of actions to implement the user input.

Think less "make a signature for me" and more "here's the signature I want to use, make it my default."

Then the model would map to either Outlook / Options / Signature / fields or directly to whateverSetSignature().

From that more modest routing requirement, it seems a slam dunk for even current models (retrained to generate options paths / function calls rather than English text).
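A toy sketch of that routing idea: the model only has to emit a structured action, never prose. Everything here is hypothetical (call_llm is a stub standing in for whatever model you'd use, and set_signature stands in for the app's real setter):

    import json

    ACTIONS = {}

    def register(fn):
        ACTIONS[fn.__name__] = fn
        return fn

    @register
    def set_signature(text: str, make_default: bool = True) -> str:
        return f"signature set to {text!r} (default={make_default})"

    PROMPT = ('Map the request to JSON {"action": ..., "args": {...}}. '
              'Available: set_signature(text, make_default).')

    def call_llm(system: str, user: str) -> str:
        # stub: pretend the model returned well-formed JSON
        return '{"action": "set_signature", "args": {"text": "Cheers, Sam"}}'

    def handle(user_request: str) -> str:
        reply = json.loads(call_llm(PROMPT, user_request))
        return ACTIONS[reply["action"]](**reply["args"])

    print(handle("Here's the signature I want; make it my default: Cheers, Sam"))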


You missed Office 2000: magically disappearing menu items. It was pretty confusing.


Oof. I did. Ribbon felt enough like magically disappearing menu items to me.


I find that focusing on fundamental tools and concepts, like terminals and text-mode editors such as vi and Emacs, will pay off handsomely.

All the fancy dialogs will switch around every few years.

This mindset extends to stuff like Word. Whenever you do something think hard about the essence of what you’re doing. Realize this should have been a script, but due to constraints in reality you are forced to use some wanky GUI.

If you look at it like this, you won’t care the pixels move around. Your mental model will be solid and building that is 90% of the work.


Maybe my time will come some day (I'm 32 years old), but I always tell myself that learning how to learn and being interested in new/different technology is how I keep myself updated. The latter is probably difficult to maintain, but this whole AI thing has given me a new pastime hobby I could never imagine.

Maybe I'll reject instead of embrace the next big thing once I'm old enough?


i'm using a web browser broadly similar to mosaic (01994) on a site that works similarly to early reddit (02005). in another window i'm running irssi (01999) to chat on irc (01988, but i've only been using it since 01994) inside screen (01987, but i've only been using it since 01994), to which i'm connected with ssh (01996) and mosh (02011, but i didn't start using it until last month). in my unix shell window (mate-terminal, but a thin wrapper around libvte (02002), mostly emulating a vt100 from 01978) i'm running a bourne shell (01979, but really it's brian fox's better reimplementation which he started in 01989) in which i just ran yt-dlp (which has to be updated every couple of months to keep working, but mostly has the same command-line flags as youtube-dl, first released in 02006) to download a video encoded in h.264 (02003) in an mpeg-4 container (01998), and then play it with mpv (02013, but forked from and sharing command-line flags with mplayer (02000)). mpv displays the video with a gpu renderer (new) on a display server running x11 (01987).

in another browser tab i'm running jupyter (the renamed ipython notebook from 02011) to graph some dsp calculations with matplotlib (02003, but mostly providing the plotting functions from matlab (01984)) which i made with python (01991, but i've only been using it since 02000) and numpy (02006, but a mostly compatible reimplementation of numeric from 01995, which i've been using since 02003). in jupyter i can format my equations in latex (01984, but for equations basically the same as plain tex82 from 01982, but i've only been using them since 01999) and my text in markdown (02004, though jupyter's implementation supports many extensions). i keep the jupyter notebook in git (02005, but i've only been using it since 02009, when i switched from cvs (01986, but i've only been using it since about 01998)). the dsp stuff is largely applications of the convolution theorem (01822 or 01912) and hogenauer filters (01981).

i do most of my programming that isn't in jupyter in gnu emacs (01984, but i didn't start using it until 01994) except that i prefer to do java stuff in intellij idea, which i first used in 02006

earlier this year, my wife and i wrote our wedding invitation rsvp site in html (01990, but using a lot of stuff added up to 02000) and css2 (01998) plus a few things like corner-radius (dunno, 02006?) and a little bit of js (01995, but in practical terms 02004), with the backend done in django (02005) backed by sqlite (02000, but this was sqlite3 (02004), but sqlite mostly just implements sql (01974, first publicly available in 01979, but mostly the 01992 ansi standard) which in turn mostly just implements codd's relational data model (01970) and acid transactions (01983), all of which i've been using since 01996). and of course python, emacs, and git. most of the css debugging was done with chromium's web inspector (present from the first chrome release in 02008, a clone of firebug (02006)).

for looking up these dates just now, i used google's newish structured search results, which mostly pull information from wikipedia (02001); i also used stack overflow (02008) and its related sites.

the median of the years above is 01998, with 25% and 75% quartiles of 01987 and 02004, which i calculated using r (01997, but a reimplementation of s (01976)). if we assume that each new introduction listed above displaced some existing skill, then we can vaguely handwave at a half-life of about 25 years for these churning technology skills, which to me seems like enough time for a lot of them to amortize; but it seems like it's slowing down a lot, because the 25% quartile is in 01987 and not 01973

it's true that all the time i spent configuring twm, olvwm, fvwm, and afterstep, and working around bugs in netscape 4's javascript implementation, and maintaining csh scripts and informix ace database applications, and logging into anonymous ftp sites running tops-20, and debugging irq conflicts, isn't really serving me today. but you could kind of tell that those things weren't the future. the surprising thing is really how slowly things change: that we're still running variants of the bourne shell in emulators of the vt100

other still-relevant technological skills for me today include building a fire, qwerty typing, ansi c (my wife is taking a class), bittorrent, operating a gas stove, and making instant coffee. still beyond me, though, is how to turn this tv on


If it is seamlessly integrated, the AI won't even surface in a UI. You will just be presented with different options in the UI, which theoretically would be more precisely curated by the AI that you don't even see.


That runs counter to some very well established UI principles. People get confused when their interface changes except as a result of direct interaction. Open up a menu in response to a click, yes; reorganize menus to "optimize" them based on what a model predicts a person is going to do, no.

The killer is being able to tell a program what you want it to do, then not having to fiddle with buttons or menus at all (unless you want to tweak things).


While I agree in principle, AI-infusion in UIs does not need to break convention. For example, AI suggestions could subtly highlight menu options that would be most useful in an auto-detected workflow. We could also create a persistent area on the UI for the AI to "speak up".


An AI interface in Office brings back memories of Clippy.


We may not have seen the last of Clippy yet… https://gwern.net/fiction/clippy



Now imagine Clippy on a car touchscreen.


NIO sells cars with a tiny cute robot that moves on the dashboard, named NOMI.

A lot of people hate it, as it's closer to Google Assistant than GPT-4, and it makes a mechanical noise when it rotates, but I don't think it's a terrible idea.

Anthropomorphism, the attribution of human traits to things, is common with cars.

I would like my car to run a LLM instead of being so stupid at understanding my voice commands.


shell is the same


I think Satya Nadella put it pretty well in an interview: ad revenue, especially from search, is incremental to Microsoft; to Google, it's everything. So while Microsoft is willing to take worse margins on search ads in order to win marketshare from Google, Google has to defend all of their margins, or else they become significantly less profitable in their core business. LLMs cost a lot more than traditional search, and Google can't just drop-in replace its existing product lines with LLMs: that eats into their bottom line, literally. Microsoft is willing to swap out the existing Bing for the "new Bing" based on OpenAI's technology, because they make comparatively little money on search, and winning marketshare will more than make up for having smaller margins on that marketshare. Google is, IMO, between a rock and a hard place on this one: either they dramatically increase their cost of revenue to defend marketshare, or they risk losing marketshare to Microsoft in their core business.

Meanwhile, OpenAI gets paid by MS. Not that MS minds! They own a 49% stake in OpenAI, so what's good for OpenAI is what's good for MS.

If Google had decades to figure it out, I think your analysis might be right — although I'm not certain that it is, since I'm not certain that the calculus of "free product, for ad revenue" makes as much sense when the products are much more expensive to run than they were previously. But even if it's correct in the long run, if Google starts slipping now it turns into a death spiral: their share prices slip, meaning the cost of compensation for key employees goes up, meaning they lose critical people (or cut even further into their bottom line, hurting their shares more, until they're forced to make staffing cuts), and they fall even further behind. Just as Google once ate Yahoo! via PageRank, it could get eaten by a disruptive technology like LLMs in the future.


"OpenAI gets paid by MS"

Actually, MS gets paid by OpenAI at 70% of profits until they make back their investment (according to articles on the terms)


Note that MS cost-offsets much OpenAI infrastructure including their top-5 TOP500 class supercomputer (similar to a full TPUv4 pod)


To be fair, the open source model has been what's been working for the last few decades. The concern with LLMs was that open source (and academia) couldn't do what the big companies are doing because they couldn't get access to enough computing resources. The article is arguing (and I guess open source ML groups are showing) that you don't need those computing resources to pave the way. It's still an open question whether OpenAI or the other big companies can find a moat in AI via some model, dataset, computing resources, whatever. But then you could ask that question about any field.


But none of the "open source" AI models are open source in the classic sense. They are free but they aren't the source code; they are closer to a freely distributable compiled binary where the compiler and the original input hasn't been released. A true open source AI model would need to specify the training data and the code to go from the training data to the model. Certainly it would be very expensive for someone else to take this information, build the model again, and verify that the same result is obtained, and maybe we don't really need that. But if we don't have it, then I think we need some other term than "open source" to describe these things. You can get it, you can share it, but you don't know what's in it.


RWKV does: https://github.com/BlinkDL/RWKV-LM It uses „the Pile“: https://pile.eleuther.ai/ And I’ve seen some more in the last weeks.


Good to hear. Let's reserve "open source" for cases like that.


Keep an eye on the RedPajama project for a model where the training data and code should both be freely available: https://simonwillison.net/tags/redpajama/


I agree with you to the extent that, yeah, technically it's not open source because the data is not known. But for these foundation models like Llama, the model structure is obviously known, and pretty sure (didn't check) the hyperparameters used to train the model are known; the remaining unknown, the data, is pretty much the same for all foundation models (CommonCrawl etc.). So replicating Llama once you know all that is a mechanical step, and so it isn't really closed source in a sense. Though probably some new "open-something" term is more appropriate.

The real sauce is the data you fine-tune these foundation models on: RLHF, specific proprietary data for your subfield, etc. The model definition (basically the Transformer architecture and a bunch of tricks to get it to scale) is mostly all published material; the hyperparameters to train the model are less accessible but also part of the published literature; so the data and (probably) the niche field you apply it to become the key. Gonna be fun times!


These "open source" ai models are more like Obtainable models. You can obtain them. The source is not open, hence open-source. Somewhere open-source got lumped in with free or accessible. Obtainable makes sense to me.


That makes sense. But I would argue that smaller/cheaper models are not a threat to Google; they are a solution. They will still have the reach advantage and can more cheaply integrate small/low-cost models at every touch point.


Disagree. What you have in mind is already how the masses interact with AI. There is little value-add in making machine translation, auto-correct and video recommendations better.

I can think of a myriad of use-cases for AI that involve custom-tuning foundation models to user-specific environments. Think of an app that can detect bad dog behavior, or an app that gives you pointers on your golf swing. The moat for AI is going to be around building user-friendly tools for fine-tuning models to domain-specific applications, and getting users to spend enough time fine-tuning those tools to where the switch-cost to another tool becomes too high.

When google complains that there is no moat, they're complaining that there is no moat big enough to sustain companies as large as Google.


Fine-tuning isn't a thing for foundation models though; it's all about in-context learning.


that means there's no money in making foundation models - the economics are broken.


Making video recs better translates to direct $$$.

There's a reason the YT and TikTok recommendation engines are so revered.


Honestly, I can't see Google failing here. Like other tech giants, they're sitting on a ridiculously large war chest. Worst case, they can wait for the space to settle a bit and spend a few billion to buy the market leader. If AI really is an existential threat to their business prospects, spending their reserves on this is a no-brainer.


> Honestly, I can't see Google failing here. Like other tech giants, they're sitting on a ridiculously large war chest. Worst case, they can wait for the space to settle a bit and spend a few billion to buy the market leader.

It seems incredibly likely that the FTC will block that. New leadership seems to be of the opinion that consumer harm is the wrong standard. Buying the competition with profits from a search monopoly leaves all parties impoverished.

Anyways, I don't think the risk is failure, but one of non-success. The article claims Meta won, but it seems like Nvidia is the winner: everyone uses their chips for training, fine-tuning and inference. And the more entrants and niche applications show up, the more demand there is for their product. TPUs theoretically play into this, but the "leak" doesn't mention them at all.


> The article claims meta won but it seems like nvidia is the winner: everyone uses their chipsets for training, fine tuning and inference. And the more entrants and niche applications show up the more demand there is for their product.

Like the saying goes: during a gold rush, sell shovels.


That was true for IBM in the 1970s and Microsoft in the 90s. Despite holding a royal flush, they managed to lose the game through a combination of arrogance, internal fighting, innovator's dilemma, concern over anti-trust, and bureaucratic inertia. It will be hard for Google to pull this off.


Microsoft ain't doing so bad now.


Microsoft is always doing bad. They've done bad for the life of the company. Microsoft has never 'done good'. They have always been a negative force in computing and society at large. This toxic culture comes from their founder, about which all the preceding also applies.


No idea whether this comment is satirical or not. As a reader I think that's marvelous (please don't break the suspense)


After Microsoft swapped out the CEO. The new guy's better than Ballmer.


They won't fail, they'll just provide compute infrastructure for people building AI products. Google is mostly bad at building products these days.


GCP is a product, too, but it's not as good as either of the top two, it's a low-margin market, and a key theme in this article is that people have made model tuning less expensive.

There’s no path forward for Google which doesn’t involve firing a lot of managers and replacing them with people who think their income depends on being a lot better at building and especially maintaining products.


I don't think Google is ever going to get its mojo back; Pichai has no vision. Long term, I think Google will see massive layoffs, and new products will come via Alphabet acquisitions following the YouTube model.


The threat isn’t that another company has AI, it’s that they don’t (yet) have a good way to sell ads with a chat bot. Buying the chat bot doesn’t change that.


What I mean is, if they can't figure out the ad angle and end up facing an existential threat, they have enough money to just drop their existing ad business almost entirely and buy out the leading AI company to integrate as a replacement business model. It would be bloody (in the business sense, at least), but Google would likely survive such a drastic move.


I think this won't work out: AI is so popular now because it's a destination. It's been rebranded as a cool thing to play with, that anyone can immediately see the potential in. That all collapses when it's integrated into Word or other "productivity" tools and it just becomes another annoying feature that gives you some irrelevant suggestions.

OpenAI has no moat, but at least they have first-mover advantage on a cool product, and may be able to get some chumps (Microsoft) to think this will translate into a lasting feature inside of Office or Bing.


It being everywhere worries me a lot. It outputs a lot of false information and the typical person doesn’t have the time or inclination to vet the output. Maybe this is a problem that will be solved. I’m not optimistic on that front.


The same can be said about the results that pop up on your favorite search engine, or about asking other people questions.

If anything, advances in AI & search tech will do a better job at providing citations that agree and disagree with the results given. But this can be a turtles-all-the-way-down problem.


There’s a real difference in scale and perceived authority: false search results already cause problems but many people have also been learning not to blindly trust the first hit and to check things like the site hosting it.

That’s not perfect but I think it’s a lot better than building things into Word will be. There’s almost no chance that people won’t trust suggestions there more than random web searches and the quality of the writing will make people more inclined to think it’s authoritative.

Consider what happened earlier this year when professor Tyler Cowen wrote an entire blog post on a fake citation. He certainly knows better but it’s so convenient to use the LLM emission rather than do more research…

https://www.thenation.com/article/culture/internet-archive-p...


No it won't, and random search popup results are already a massive societal problem (and they're not even used the way people are attempting to use AI: to make decisions over other people's lives in insurance, banking, law enforcement and other areas where abuse is common when unchecked).


Low-quality blogs etc. stand out as low quality; LLMs can eloquently state truths with convincing-sounding nonsense sprinkled throughout. It's a different problem, and many people already take low-quality propaganda at face value.


I think this is a failure in how we fine-tuned and evaluated them in RLHF.

"In theory, the human labeler can include all the context they know with each prompt to teach the model to use only the existing knowledge. However, this is impossible in practice." [1] Therefore causing and forcing some connections that are not all there for the LLM. Extrapolate that across various subjects and types of queries and there you go.

[1] https://huyenchip.com/2023/05/02/rlhf.html


> This so-called "competition" from open source is going to be free labor. Any winning idea ported into Google's products on short notice. Thanks open source!

How else, exactly, is open source supposed to work? Nobody wants to make their code GPL but everybody complains when companies use their code. I get that open source projects will like companies to contribute back, but shouldn't that go for everyone using this code? Like, I don't get what the proposed way of working is here.


Developers nowadays want to have their cake and eat it too. They want to develop FOSS code because capitalism is evil and proprietary software is immoral and Micro$oft is the devil, man, and so give their work away for free... but whenever a company makes money on it and gives nothing back, completely in line with the letter and spirit of FOSS (because requiring compensation would violate user freedom,) they also want to get paid.

Like the entire premise of FOSS is that money doesn't matter, only freedom matters. You're not supposed to care that Google made a billion dollars off your library as long as they keep it open.


I see this as part of the decline of hacker culture and rise of brogrammers. I see very few people programming for fun, everyone seems to be looking for a monetization opportunity for every breath they take.


For some strange reason (maybe moral failure?) people seem to have this insatiable addiction to food and shelter and most people have found no better way to support that addiction than to exchange labor for money.

The list of things I consider “fun” besides programming when I get off work is a mile long.


Then don't do open source work. You can't be donating your work under a permissive license and then complain that someone else used it. Make up your mind.

Edit: also please gtfo with your condescending tone. Everyone needs to eat and most people are working class. Don't act like you are the only one who has a unique experience of hunger and thirst.


The initial post I was replying to was:

>I see very few people programming for fun, everyone seems to be looking for a monetization opportunity for every breath they take.

So yes, thinking that most developers are going to do it for “fun” after working 40 hours a week is kind of naive.


All I said was that it used to happen more before and now it happens less. I never made any claims about the exact quantities. And it's not even about that. The culture has gone from earn-to-live to live-to-earn, and not just in programming.


I’ve been in this field professionally for over 25 years. There has never been a time where people weren’t interested in making the most money possible given their skillset and opportunity.

Or are you saying in some distant past that people did it for the love? I was a junior in high school when Linux was introduced and I was on Usenet by 1993 in the comp.lang.* groups.

The “culture” hasn’t changed - just the opportunities.


When Linux was introduced in that group, the majority of the posts weren't asking how to make money from it. That's the difference. Try hanging out in the LangChain and OpenAI Discords. You will see the difference.


Free as in freedom, but not free as in beer?


That actually favors corporations more. I'm a FOSS advocate today because cricket bats cost money but Ruby was free, so I learned that.


> It's going to be seamlessly integrated into every-day software. In Office/Google docs, at the operating system level (Android), in your graphics editor (Adobe), on major web platforms: search, image search, Youtube, the like

Agreed, but I don't think the products that'll gain market share from this wave of AI will be legacy web 2.0 apps; rather, it'll be AI-native or AI-first apps that are built from the ground up to collect user data and fulfill user intent. A prime example is TikTok.


You summed up exactly my disappointment with some large companies' legacy AI offerings. They don't do both: iterate off of telemetry data and fulfill users' needs.


The problem is that LLMs are better at search (for an open-ended question) than Google is, and that's where most of Google's revenue comes from. So it actually gives a new company like OpenAI the opportunity to change consumers' destination away from Google.


> OpenAI faces the existential risk, not Google.

Yes, but the quickest way for anyone to get themselves to state-of-the-art is to buy OpenAI. Their existential risk is whether they continue to be (semi)independent, not whether they shut down or not. Presumably Microsoft is the obvious acquirer, but there must be a bunch of others who could also be in the running.


But if you wait a month you can get that model for free...


Where?


It's in reference to the article's open source free laborers.

https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...


This is 100% correct - products evolve to become features. Not sure OpenAI faces the existential risk, as MS needs them to compete with Google in this space.


> Not sure OpenAI faces the existential risk as MS need them to compete with Google in this space.

I think OP is arguing that in that partnership Microsoft holds the power, as they have the existing platforms. The linked article argues that AI technology itself is not as much of a moat as previously thought, and the argument therefore is that Microsoft likely doesn't need OpenAI in the long term.


"Any winning idea ported into Google's products on short notice."

Imagine for a moment, in a different universe, in a different galaxy, another planet is ostensibly a mirror image of Earth, evolving along the same trajectory. However on this hypothetical planet, anything is possible. This has resulted in some interesting differences.

The No Google License

Neither Google, its subsidiaries, business partners nor its academic collaborators may use this software. Under no circumstance may this software be directly or indirectly used to further Google's business or other objectives.

If 100s or 1000s or more people on planet X started adopting this license for their open source projects, then of course it wouldn't stop Google from copying them or even using the code as is. But it would muddy the waters with 100s or 1000s or more potential lawsuits. Why would any company risk it?

There is nothing stopping anyone writing software for which they have no intention of charging license fees. It's done all the time these days. There is also nothing stopping anyone from prohibiting certain companies from using it, or prohibiting certain uses.

I recall in the early days of the web when "shareware" licenses often tried to distinguish commercial from non-commercial use. Commercial use would presumably incur higher fees. Non-commercial use was either free or low cost. I always wondered, "How is the author going to discover if XYZ, LLC is using his software?" (This is before telemetry was common.) The license seemed unworkable, but that did not stop me from using the software. I was never afraid that I would be mistaken for a commercial user and the author would come knocking asking me to agree to a commercial license. I doubt I was the only one bold enough to use software with licenses prohibiting commercial use.

Even a "No Microsoft License" would make Github more interesting. One could pick some random usage. Microsoft may not this software for X. Would this make MSFT's plans more complicated. Try it and see what happens. Only way to know for sure.

Instead, MSFT is currently trying to out the plaintiffs in the Doe v. GitHub case, over MSFT's usage of the code of other people who put their stuff on GitHub, and as the Court gets ready to decide the issue, it's becoming clear IMO that if these individuals are named, these brave individuals will lose their jobs and be blackballed from ever working in software again.

The No Internet Advertising License

This software may not be used to create or support internet advertising services for commercial gain.


The No Google License functionally exists: it's the AGPL. https://opensource.google/documentation/reference/using/agpl...


The license that prevents use by a particular list of corporations can likely be easily crafted.

But because any particular invention about LLMs is not a specific product but an approach, it would just be re-implemented.

One could imagine patenting an approach, if it ends up being patentable, and then giving everyone but some excluded entities a grant of royalty-free use. But, unless the use of that particular approach is inevitably very obvious (which is really unlikely with ML models), you would have a hard time detecting violations and especially enforcing your patent.


Let's call it the Underdog License. It must not be used by any of the top N tech companies in terms of market share and/or market capitalization.


Neither Alphabet, Google nor their successors, subsidiaries, affiliates, business partners, academic collaborators or parent companies may use this software; all of the foregoing are specifically prohibited from any use of this software.


I agree with your assertion that AI will seamlessly integrate into existing software and services but my expectation is that it will be unequivocally superior as a 3rd party integration[0]. People will get to know 'their AI' and vice versa. Why would I want Bard to recommend me a funny YouTube clip when my neutral assistant[1] has a far better understanding of my sense of humor? Bard can only ever learn from the context of interaction with google services -- something independent can pull from a larger variety of sources supersetting a locked system.

Never mind more specialized tools that don't have the resources to develop their own competent AI - Google might pull it off, but Adobe won't, and millions of SaaS products and small programs won't even try. As another example, how could an Adobe AI ever have a better interpretation of 'Paint a pretty sunset in the style of Picasso' than a model which can access my photos, wallpapers, location, vacations, etc.?

[0]Much how smart phones seamlessly integrate with automobiles via CarPlay and not GM-play. Once AI can use a mouse, if a person can integrate with a service an AI can do so on their behalf.

[1]Mind it's entirely possible it will be Apple or MSFT providing said 'neutral' AI.


> And we should not expect to be able to catch up. The modern internet runs on open source for a reason. Open source has some significant advantages that we cannot replicate.

I don't have faith in OpenAI as a company, but I have faith in Open-Source. What you're trying to say, if I understand correctly, is that Google will absorb the open-source and simply be back on top. But who will maintain this newly acquired status quo for Google? Google cannot EEE their own developer base. They said that much in the article;

> We cannot hope to both drive innovation and control it.

History as an example: Android did not kill *nix. Chrome did not kill Firefox. Google Docs has not killed OpenOffice. For the simple fact that Google needs all of these organizations to push Google forward, whether that means Google gets access to code or Google becomes incentivized to improve in some way.

If Google wants to eat another free lunch tomorrow they have no choice but to leave some of that free labor standing, if not prop it up a little. The real question becomes, how much market share can we realistically expect without eating tomorrow's lunch?


They're not saying that Google should absorb open source. They're arguing for a strategy/direction for how to approach AI models from a business perspective, laying out the fact that open source has a superior positional advantage in terms of development costs.

Probably, internally, Googlers are arguing that the "AI explosion" is short-lived and people will stop paying for AI as soon as open-source PC models become cost- and quality-competitive. So Google shouldn't chase the next big revenue stream that OpenAI is currently enjoying, because it's short-lived.


Everything you say is true, and Google has cards left to play, but this is absolutely an existential threat to Google. How could it be otherwise?

For the first time in a very long time, people are open to a new search/answers engine. The game Google won must now be replayed, and because it was so dominant, Google has nowhere to go but down.


> OpenAI faces the existential risk, not Google. They'll catch up and will have the reach/subsidy advantage.

Don't Microsoft products get used more times a day, by more paying customers, than Google products?

OpenAI won't have a problem, because they reach more paying customers via Microsoft than Google can.


OpenAI = Microsoft for all intents and purposes.

Microsoft has a stake in OpenAI and has integrated it into Azure, Bing and Microsoft 365.


Microsoft will probably eventually integrate it into their Xbox services, and possibly games via their own first-party studios.

In my opinion Microsoft has equal or more reach/subsidy advantage than Google for AI, at least toward general consumers.


Honestly, this makes me excited for future RPGs from Bethesda. I've already seen mod demos where ChatGPT does dialog for NPCs. Imagine a future Elder Scrolls or Fallout where one could negotiate safe passage with the local raiders/bandits for a wheel of cheese or a six-pack of Nuka Cola. A man can dream.


Stupid, silly me, who knows little-to-nothing about the lore of OS. Why can't OS devs simply write out in the OS licensing that their wonderful work is usable by anyone and everybody unless you belong to the Alphabet/Meta/Oracle/Adobe/Twitter/Microsoft/OpenAI McCorps & their subsidiaries?

I imagine it comes down to ol' Googly & the boys taking advantage of the OS work -> OS devs backed by weak NFOs sue X corp. -> X corp. manages to delay the courts and carries on litigation so the bill is astronomical aka ain't nobody footing that -> ???

I imagine 90% end up taking some sort of $ and handover the goods like Chromium, though.

So back to square one, guess we kowtow and pray for us prey?


It already is built seamlessly into a lot of Google products.

OpenAI just beat Google to the cool chatbot demo.


Eventually Google will still lose to open models and AI chips.

Hardware performance is what's making AI "work" now, not LLMs, which are a cognitive model for humans, not machines. LLMs are incompatible with the resources of a Pentium 3-era computer.

Managing electron state is just math. Human language meaning is relative to our experience, it does not exist elsewhere in physical reality. All the syntax and semantics we layered on was for us not the machines.

End users buy hardware, not software. Zuckerberg needs VR gadgets to sell because Meta is not Intel, Apple, AMD, nVidia.

The software industry is deluding itself if it does not see the massive contraction on the horizon.


It’s an obvious cycle.

“I’m idealistic!”

“I’m starting a moral company!”

“Oh dear this got big. I need investors and a board.”


OpenAI is more of a lab than a company though, no?

Aren’t they, in some sense, kind of like that lab division that invented the computer mouse? Or for that matter, any other laboratory that made significant breakthroughs but left the commercialization to others?

It would make sense to me, what you're describing. Only, we will probably be laughing from the future at how the extent of our current imagination with this stuff was still limited to GUIs, Excels and Docs.


As running the models seems to be relatively cheap but making them is not, I believe that's where the money is. That, and generic cloud services, because ultimately the majority will train and run their models in the cloud.

So I would bet on AWS before OpenAI, and I would bet the times of freely available high-quality models will come to an end soon. Whether open source can keep up with that remains to be seen.


This is why I think Apple's direction of building a neural engine into the M1 architecture is low-key brilliant. It's just there and part of their API; as AI capabilities increase and the developer landscape solidifies, they can incrementally expand and improve its capabilities.

As always, Apple's focus is hardware-first, and I think it will once again pay off here.


As I understand it, the open source community is working to make models:

- usable by anyone

- feasible on your desktop

Thereby at least levelling the playing field for other developers.


If OpenAI can win the developer market with cheap API access and a better product, then distribution happens through third parties, with everyone else becoming the product, sending training data back to the model. I'd see that as their current strategy.


There are two things that make a good LLM: the amount of data available for training and the amount of compute available. Google's Bard sucks in comparison to OpenAI's models, and even compared to Bing. It's pretty clear that GPT-4 has some secret sauce that's giving them a competitive edge.

I also don't think that open-source LLMs are that big of a threat, for exactly this reason. They will always be behind on the amount of data and compute available to the "big players". Sure, AI will increasingly be incorporated into various software products, but those products will be calling out to Big Tech APIs with the best model. There will be some demand for private LLMs trained on company data, but they will only be useful in narrow specialties.


Did you read the article? It refutes almost every claim you're making and, I must say, rather convincingly so.


I'll admit, I skimmed it. I went back and re-read it, and the timeline of events was especially shocking. I still think the big models hold an edge, simply because they will most likely be better at handling edge cases, but wow, my days of underestimating the OSS LLMs are certainly coming to a middle.


That's like saying in 1995 that search was going to be integrated into everything rather than being a destination. That would be true, but also very wrong: Google.com ended up as the main destination.


Yes, AI is like social in that regard. You can add social features to any app, and the same applies to AI. But there are also social-centric sites/apps, and it will be the same for AI.


>They'll also find a way to integrate this in a way where you don't have to directly pay for the capability, as it's paid in other ways: ads.

I fear you are correct.


>They'll catch up and will have the reach/subsidy advantage.

This is only true if they're making progress faster than OpenAI. There isn't much evidence for that.


LLMs are just better ML models, which are just better statistical models. I agree that they're going to be in everything, but invisible and in the background.


This is exactly what free software advocates have been saying would happen (and now has happened) without protections to make sure modifications get contributed back to free software projects.


Yes, bring back Clippy!!!


You must mean the new Albert Clippy!


Google makes almost all its money from search. These platforms are all there to reinforce its search monopoly. ChatGPT has made search obsolete. ChatGPT will do to Google search what the Internet did to public libraries: make them mostly irrelevant.


How has ChatGPT obsoleted search, when hallucination and the token limits are major problems?

It's (sort of) obviated search for certain kinds of queries engineers make, but not normies.

I say sort of, because IMO it's pretty bad at spitting out accurate (or even syntactically correct) code for any nontrivial problem. I have to give it lots of corrections, and often it will just invent new code that also is broken in some way.


Let's consider what Google did to the previous paradigm: libraries and books.

Books had editors and were expensive to publish, which imparted some automatic credibility. You might even have involved a librarian or other expert in your search. So a lot of the credibility problem was solved for you, up-front, once you got the information source.

Google changed the game. It gave you results instantly, from sources that it guessed looked reliable. But you still had to ascertain credibility yourself. And you might even look at two or three pages on the same topic, quickly.

Google has mostly been defeated now, and often none of the links it suggests are any good. That trade-off seems to be done.

Here come LLMs. Now they're transferring even more of the work of assessing credibility to the end user. But the benefit is that you can get very tailored answers to your exact query; it's basically writing a web page just for you in real time.

I think the applications that win in this new era will have to make that part of their business model. In science fiction, AIs were infallible oracles. In the real world it looks like they'll be tireless research assistants with an incredible breadth of book-learning to start from but little understanding of the real world. So you'll have a conversation as you both converge on the answer.


Google wasn't the first search engine.


True, but it was the first great one. I remember struggling with AltaVista until Google blew everything out of the water.


I've replaced almost all my usage of Google Search with ChatGPT. The only reasons I've had to use Google Search are to look up current news and do some fact-checking. In my experience, GPT-4 rarely provides incorrect results. This includes things like asking for recipes, food recommendations, clarifying what food is safe to eat when pregnant, how to train my dog to do tricks, translating documents, explaining terminology from finance, understanding different kinds of whiskey, etc.


This was true for me too, but I'm starting to find the data's cutoff date a problem, and it gets worse every day. I was reminded about it yesterday when it knew nothing about the new programming language Mojo or recent voice conversion algorithms.

The eventual winner will have a model that stays up-to-date.


It's mentioned in the article, but LoRA fine-tuning or retrieval-augmented generation (RAG) will enable this.

Phind is getting awfully close to this point already, really. Integrating new knowledge isn't a bottleneck the way it was with expert systems; I think it just hasn't been a priority, for research and commercial reasons, until recently.
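
For what it's worth, the retrieval half of that is conceptually simple. A minimal sketch of the RAG pattern, where embed() is a toy hashed bag-of-words stand-in for a real embedding model and generate() is a placeholder for whatever LLM you call (both are illustrative assumptions, not any particular product's API): retrieve the documents most similar to the question, then stuff them into the prompt so the model can answer from material newer than its training cutoff.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Toy stand-in for a real embedding model: hashed bag of words.
        vec = np.zeros(256)
        for word in text.lower().split():
            vec[hash(word) % 256] += 1.0
        return vec

    def generate(prompt: str) -> str:
        # Placeholder: call whatever LLM you like with the assembled prompt.
        raise NotImplementedError

    def answer_with_retrieval(question: str, documents: list[str], k: int = 3) -> str:
        # Rank documents by cosine similarity to the question...
        q = embed(question)
        def score(doc: str) -> float:
            d = embed(doc)
            denom = np.linalg.norm(q) * np.linalg.norm(d)
            return float(np.dot(q, d) / denom) if denom else 0.0
        top = sorted(documents, key=score, reverse=True)[:k]
        # ...then put the winners in the prompt so the answer can draw on
        # text newer than the model's training cutoff.
        context = "\n\n".join(top)
        return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")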


I asked ChatGPT to find Indian food in a tourist town. Googling verified that only one of its suggestions was a real place; the other four were hallucinations.

It's possible GPT-4 will be better; I haven't been able to test it because I remain on the waitlist.

I remain skeptical.


It's still bad.


Try this simple question with ChatGPT.

No need to verify the ages. You will immediately find the problem.

“List the presidents of the US in the order of their ages when they were first inaugurated”


I think you're underestimating product-market fit.

Normies don't care about the exact truth.

