I've been out of Google for a year now so I can share this story:
I created the AI suggestion chip at Google, which, combined with an internal project called Jane, became the Google Assistant. At the time Sundar was desperate for a narrative to counter Alexa in front of investors and wanted a HUGE I/O announcement, so off we sprinted.
But I went to the executives in charge at the time and said: look, this really isn't going to be an assistant if it doesn't have APIs into the services users need to accomplish actual tasks. Sharing some information and changing the music isn't what a butler does; a butler anticipates your needs and executes on them -- and we can't do that until we have the APIs to anticipate and execute.
I was, of course, ignored, because in my naivety at the time I didn't realize it didn't matter what the assistant DID; what mattered was the narrative to investors about its potential. So at I/O 2017 Sundar announces the assistant, it looks great inside Allo, it's a great demo -- mission accomplished.
Now, 6+ years later, it's a stagnant, overspent team across the space. Why? Because it can't do anything for users. All these years and billions later, it still can't do anything important -- it has no user journey.
And it took me equally long to realize that even that doesn't matter. FAANG has gotten so large that the stock bump from a narrative outpaces the actual revenue from working products. It makes for a crazy world... because, ironically, Google ALSO invented the truly disruptive transformer technology behind LLMs, and when we tried to launch that with MINA, Sundar blocked it! Too disruptive, no narrative, too scary...
I don't know the lesson to be learned here, but wow, it just makes me want to give up...
For what it's worth, all I ever wanted it to do was to be able to reliably turn off lights and play the right music.
Nearly a decade later, when I say "turn off the music", it'll pause Netflix on the TV I'm actively watching (and talking to it through) while the speaker in the other room keeps playing music. Or if you ask it to turn off the light, the whole house goes dark.
I don't need it to read minds, but it sure would've been nice for it to understand the difference between a TV, a music speaker, and a light bulb.
I don't know how Google works (or doesn't, apparently), but I've often wished they'd just sit down with a few interns, go through the 100 most-used commands by hand, and build explicit command chains for them.
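A rough sketch of what one of those hand-built command chains could look like, assuming a hypothetical device registry that records each device's room, its type (TV, speaker, light), and whether it's currently playing music. The phrases, device names, and rules are made up for illustration; this is not how Google or Amazon actually resolve commands.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    room: str
    kind: str                   # "tv", "speaker", or "light"
    playing_music: bool = False


# Hypothetical hand-written rules: each phrase is scoped to the device types it
# should ever touch, so "turn off the music" can never pause a TV.
COMMANDS = {
    "turn off the music": lambda d, room: d.kind == "speaker" and d.playing_music,
    "turn off the light": lambda d, room: d.kind == "light" and d.room == room,
    "turn off the tv": lambda d, room: d.kind == "tv" and d.room == room,
}


def resolve(phrase: str, speaker_room: str, devices: list[Device]) -> list[str]:
    """Return the names of the devices a phrase should act on ([] if unknown)."""
    rule = COMMANDS.get(phrase.lower().strip())
    if rule is None:
        return []
    return [d.name for d in devices if rule(d, speaker_room)]


home = [
    Device("Living room TV", "living room", "tv"),
    Device("Kitchen speaker", "kitchen", "speaker", playing_music=True),
    Device("Living room lamp", "living room", "light"),
    Device("Bedroom lamp", "bedroom", "light"),
]

print(resolve("turn off the music", "living room", home))  # ['Kitchen speaker']
print(resolve("turn off the light", "living room", home))  # ['Living room lamp']
```

The point is that a small table of type-scoped rules already separates TVs, speakers, and bulbs, with no mind reading required.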
The assistants are annoying not because they're not AGI enough, but because they can't reliably do the simple tasks that we actually want to offload to them...
I used to use Alexa for home control. Over the past 5 years or so, it got progressively smarter and more capable - and worse! It was incredibly annoying. Now I just use a locally hosted smart home solution.
My Alexa recently lost knowledge of most rooms; it can't work most lights anymore. A few months prior, it suddenly refused to work with "livingroom": apparently a new device with that name appeared after an automatic device scan, but it wouldn't respond and couldn't be removed. In the end I renamed the room "main room", which worked until the recent snafu. The only, only good thing about Alexa is the low latency. But if most things just don't work right, what good is it?
We have a ton of devices connected to Alexa and HomeKit via Home Assistant. We mostly interact with Alexa through our Sonos system.
I want to throw Alexa out the goddamn window every other week, especially when I ask it to turn off a very specific room and it responds "which device?", despite having parsed my query perfectly.
I always enjoyed when it would do something like get the wrong task altogether, but what really annoyed me was when it started giving me tips on how to do other things.
This is what bothers me about all of this "investment" into AI now. Is it the same thing? Just CEO performance art to mollify stupid and uncreative investment banking analysts so they can fill in the spreadsheet cell that makes the stock price go up? "Yes, Mister Johnson at Goldman, we also have an AI strategy for 2024, very similar to BigCoXyz, who you also model..."
It seems that companies don't have to do anything anymore--they just have to spin a story for Wall Street and make sure they lap it up.
Of course it is! Just like every company needed a "blockchain strategy" in 2018, every company now needs an "AI strategy". Because just like blockchain, the companies that use AI will crush those who don't... right?
(for what it's worth, I do think LLMs are useful for certain tasks, but it's ridiculous that every company is pivoting to be an "AI company" for no real reason...)
> I don't know the lesson to be learned here, but wow, it just makes me want to give up...
Sundar is an awful CEO, that's the lesson. That's the problem there. He exhibits no leadership abilities or vision. He's extremely weak. He's essentially the anti-Jobs.
With all due respect, push the business forward where? Google already won everything. They are on top right now. The irrational thing here is to get the size of Google and then try to promise investors growth when you should be promising dividends. The only place for Google to go is to stagnate or to go down. "Growth" is just going to get them a more serious antitrust action.
That goes for Amazon as well imo. They are another company that won about as much as anyone can but has refused to move past the startup mentality and irrational promises of growth. It's going to get them both burned.
Microsoft is the only company that has really handled this well but that's probably got something to do with them getting in antitrust trouble back in the day, when they were behaving much more like a Google or an Amazon.
I was about to give up on MSFT and sell back in the aughts. But then Nadella took over, and boy am I glad I didn't sell.
A friend of mine who worked at Microsoft in the 90s told me that Gates lamented in a meeting that nobody was coming up with new ideas on how to leverage their expertise and make money.
He's turned Google into Microsoft. The kinda CEO you get when you decide to sell out, stop innovating, and milk your captive audiences for all you can before it all explodes...
That it's the people who say the "right" things that win at life, not the people who study hard and try to make an actual difference. Maya Angelou said it best - she was trying to be inspirational, but it's actually depressing as hell for anybody who thought hard work would get them ahead: "People will forget what you said, they'll forget what you did, but they'll never forget how you made them feel".
1) Saying the “right” things does not work in the long run because people want results. Also, if the “right” things are not true, they will notice, and there are usually consequences.
2) I think Maya Angelou is right that people remember how you make them feel. I also think they remember people’s actions and statements. A good example of this is Google itself. People eventually figured out that the “Don’t be evil” motto was bullshit. They also figured out that when Google said something, you should be skeptical because of Google’s track record of canceling products.
> anybody who thought hard work would get them ahead
That's another manifestation of the Labor Theory of Value, where what something is worth is proportional to the amount of labor that went into it.
This is false. Value is only tangentially related to the labor input. Just because you work hard creating a song doesn't mean people will like the song and pay $$ for it.
Working smart is what matters. By that I mean working at creating things people want.
Some people may be misled by the term "AI suggestion chip". I think what you mean is the combination of frontend and backend that displays a suggestion chip/button [2] in the Google Search UI, which performs an action executed by what is considered the Google Assistant. It is not a physical, silicon-based AI chip.
IMO what dooms assistants is the competition over who owns the top of the funnel, and how that limits which software/hardware service APIs manufacturers like LG, Samsung, Sony, etc. expose to third parties. The Google Assistant or Alexa is useless for controlling the TV if it is only allowed to turn it on or off. I couldn't care less about the Samsung assistant universe if all I have from them is a TV, but I do want to control it with Alexa or Google.
Maybe Matter[1] will matter in the long run, but it's a little late given the momentum and size the assistant teams at every company had years ago to build these things out.
I kept selling them on this opportunity. The market of APIs front run by intelligence will be even more valuable than ads.
So I built their Business Messaging program to try to get access to those apis.
What took me too long to discover is that an API market of 100% confirmed conversions is worth less than an intent market with a 10% or 1% chance of converting. You can show more ads across the intent market, and they add up to more revenue than a 100% confirmed transaction auctioned off once. (Specifically because a single ad can’t go for more than the cost of the transaction, of course, but multiple ads can add up to more.)
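To make the arithmetic concrete, here's a toy comparison with invented numbers (nothing here comes from Google's actual ad data). Selling each confirmed purchase once caps revenue at the purchase value; selling several ads on every intent query beats that as soon as each ad slot goes for more than `conversion_rate * value / ads_per_query`.

```python
def confirmed_revenue(purchases: int, value: float) -> float:
    # One auction per guaranteed purchase; the winning bid can't exceed the
    # purchase value, so this is an upper bound.
    return purchases * value


def intent_revenue(purchases: int, conversion_rate: float,
                   ads_per_query: int, price_per_ad: float) -> float:
    # Producing the same number of purchases takes purchases / conversion_rate
    # intent queries, and every one of those queries sells several ad slots.
    queries = purchases / conversion_rate
    return queries * ads_per_query * price_per_ad


def break_even_ad_price(conversion_rate: float, value: float,
                        ads_per_query: int) -> float:
    # Above this per-ad price, the intent market out-earns confirmed sales.
    return conversion_rate * value / ads_per_query


# Invented example: 100 purchases worth $20 each, 10% of intent queries
# convert, 4 ads per query at $1 each.
print(confirmed_revenue(100, 20.0))        # 2000.0
print(intent_revenue(100, 0.10, 4, 1.0))   # roughly 4000
print(break_even_ad_price(0.10, 20.0, 4))  # roughly 0.5
```

With these made-up inputs the intent market earns about twice as much, even though every confirmed transaction is worth the full $20.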
TLDR. Google became the same as the newspaper ads they disrupted. Making more money off the failure of their ads to convert than their successful ones.
Almost 25 years ago I worked at a little startup. I started to build a network management system using... Tcl/Tk and some open-source take on HP OpenView, what was the name?
Anyway, I had a neat display showing live nodes as green. I was redirected to enhance that display with graphs and charts and links between the nodes, and we bought a TV to display it on... long story short, it was more important to show this awesome graphical tool to potential investors than to know whether our systems were actually down.
I think it was from the movie industry - just something in the background shot with lots of blinking lights and occasional beeps was known as an EBG, or "electronic bullshit grinder". They do tend to impress investors as well as audiences.
> I don't know the lesson to be learned here, but wow, it just makes me want to give up...
It's like they always say: the road to Hell is paved with good intentions.
It's good that you recognize the problem. Welcome to the other side. That just makes you more intelligent and ready for the next event. If you know what is moral and just, and if you have the ability to see the future thanks to your experience... you now know how to handle these events in the future.
For now, support the rise of interest rates and ignore the cries of the SV elite to have the Fed cut rates back down. We all know that crappy projects make less sense when money costs 5.25% / year rather than the 0%/year it cost throughout the 2010s decade.
Beyond that? Hard to say. Depends where things go.
Can anyone tell me how Alexa has evolved since its inception? To me it seems equally intellectually challenged since day one.
I can tell Alexa to "turn off my bedroom light", and I can tell it to "turn off my living room light", but if I tell it to "turn off my bedroom AND my living room light", it doesn't understand. Alexa is as bad as the voice dialing systems in cars from 15 years ago.
What have all those researchers and developers at Alexa been doing all those years that the bloody thing can't understand basic command chaining?
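Splitting a compound request is the easy part once the transcription is right. Here's a minimal sketch that treats "and" (or a comma) as a separator and looks for known room names in each piece. The room list is hypothetical and this ignores everything hard about real speech (mishearing, pauses, synonyms), so read it as an illustration of command chaining, not Alexa's pipeline.

```python
import re

# Hypothetical room names a user has configured.
KNOWN_ROOMS = ["living room", "bedroom", "den", "kitchen"]


def parse_command(utterance: str) -> tuple[str, list[str]]:
    """Turn 'turn off my bedroom and my living room light' into ('off', rooms)."""
    text = utterance.lower()
    if "turn off" in text:
        action = "off"
    elif "turn on" in text:
        action = "on"
    else:
        action = "unknown"

    rooms: list[str] = []
    # "and" and commas separate the targets of a single action.
    for part in re.split(r"\band\b|,", text):
        for room in KNOWN_ROOMS:
            # Whole-word match, so e.g. "garden" doesn't count as "den".
            if re.search(rf"\b{re.escape(room)}\b", part) and room not in rooms:
                rooms.append(room)
    return action, rooms


print(parse_command("Turn off my bedroom AND my living room light"))
# ('off', ['bedroom', 'living room'])
```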
My biggest frustration here is that it can't understand simple responses to questions it asked me. I'll ask Alexa to turn off the lights in a certain room, sometimes she'll respond "a few things share that name, which one did you mean" and I cannot say "both" or even "all of them". The command equivalent of none-of-the-above or all-of-the-above feels like table stakes for an assistant that can ask you to pick specific options when it's confused about which you meant.
You can ask to "turn off all of the lights" - but "all of the lights" or "all of them" isn't a valid response once you're into the question-and-answer phase, at least not reliably for me.
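Handling "both" or "all of them" at the follow-up prompt doesn't need anything fancy. A minimal sketch, assuming the assistant already has the list of candidate devices it asked about; the accepted phrases are a guess at the common ones, not a list from any real assistant.

```python
# Replies that mean "every candidate" or "none of them" (illustrative, not exhaustive).
ALL_REPLIES = {"both", "all", "all of them", "both of them", "all of the above"}
NONE_REPLIES = {"none", "neither", "none of them", "never mind"}


def answer_disambiguation(reply: str, candidates: list[str]) -> list[str]:
    """Map a reply to 'which one did you mean?' onto a subset of the candidates."""
    text = reply.lower().strip().rstrip(".!")
    if text in ALL_REPLIES:
        return list(candidates)
    if text in NONE_REPLIES:
        return []
    # Otherwise keep every candidate whose name appears in the reply.
    return [c for c in candidates if c.lower() in text]


lights = ["Ceiling light", "Floor lamp"]
print(answer_disambiguation("All of them", lights))     # ['Ceiling light', 'Floor lamp']
print(answer_disambiguation("the floor lamp", lights))  # ['Floor lamp']
```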
In the 1970s I tried to create an Advent-like game (Advent was the predecessor of Zork). The user interface was plain English text commands.
I rapidly discovered there were so many common ways to express the same thing in English, that trying to enumerate them was impossible. At least impossible on a PDP-10.
Oh, come on. There aren't that many common ways to express basic things in English, like telling it to turn off both lights in your house when it asks you "which one?".
"All of them" and "both" should be enough to cover most users without putting much pressure on Amazon's datacenters, and yet Alexa can't even get that right.
I figured as soon as I shared my own anecdote a bunch of others contradicting would follow. This perhaps makes it even more frustrating - I promise I enunciate well and don't verbally abuse the voice assistant!
Alexa was supposed to be a revenue stream for Amazon that never really materialized. As a developer of a now deprecated Alexa Skill, they constantly badgered me into monetizing it.
All of those folks were trying to figure out how to have Alexa make money, and my interpretation of the layoffs is they never figured out how.
Millions of people actually use it? Or keep it as a paperweight whose sole job is "Alexa, turn off the lights" once a week?
If it's the latter then you probably don't need to keep thousands of highly paid engineers since that's what Alexa could also do 10 years ago and what vehicle voice commands could do 15 years ago without internet connection and expensive data centers. Drops mic
Well, how do you define “uses it”? Because your paperweight story sure sounds like use to me. They probably don’t need all those engineers, but there’s a long tail of features people use beyond weekly lights.
Also, Alexa works better than any car control ever has, and at a fraction of the price. You obviously need internet connectivity for the majority of features people care about. Like controlling devices over the network, getting weather, streaming music, etc
Considering they’ve slowly started moving features behind a subscription already, keep watching, because I’ll bet Alexa becomes a subscription for most features beyond timers, weather, and controlling your first 10 smart home devices.
It’s just hard to charge for something that used to be free.
I feel like all of this is very solvable with current generation AI.
I can ask ChatGPT right now "What would you do if asked to make a stop at a gas station on the current navigation route" and it gives a sensible answer (in this case "I can't add a stop to your navigation"). So it's clearly possible for modern AI to work with this sort of context. They just need to integrate modern AI with voice assistants and add appropriate application hooks the AI can act upon everywhere asap.
This also makes layoffs in voice chat departments stupid. They are on the verge of a revolution. Everything stated here as a weakness of voice assistants is about to be solved.
I think there's still a huge amount of work to go through the codebase and add appropriate (and safe) hooks for the AI to act upon. That means a complete pivot in what the staff work on, but I don't think layoffs make sense, since there's an urgent need to pivot to be first in this space.
The context recognition is there, sure. But the teams working on old models of context recognition (more hardcoded methods) are still engineers, and there's a need to throw engineers at this to be first.
Imagine saying in winter "Make my living room feel like a summer beach party" leading to blinds up + lights on with a yellow hue + heat + ambient beach music turning on. That needs hooks into every first- and third-party home automation system. It needs the ability to choose specific genres of music for a playlist, etc. There's so much work to do, and it needs to be done before the competitors do it. Firing engineers, even ones who would need to pivot to achieve this, doesn't make sense to me, as this is one of those winner-takes-all technology races. You really, really want to be the first to set the standard for how third parties integrate these hooks.
It's an iPhone moment in the making and Amazon is firing staff.
It might just be my expectations were always that it would get better, and it hasn't. Or that I know they laid people off, so knowing that I expect it to get worse.
I'd switch but everyone I know with a Google setup has trouble with theirs. And we have some Apple devices so we know Siri isn't great either.
Kids love it. "Alexa, fart for me" is the best-implemented part of it.
I just tested Alexa: "turn on the living room AND the den" and it worked fine.
However, further testing revealed that I need to stress the "and" and pause a little before it so that Alexa understands that "and" means an additional command and isn't just part of "living room." Even so, in situations where Alexa didn't recognize the "and" part of the command, it still turned on the living room lights...
The system was redesigned so that ASR would handle the transcription and the "understanding." It never worked correctly, many many people complained, some people showed counter designs and data driven forecasts of catastrophic loss across multiple major areas. Those people were fired, the data was faked, and here we are today, with a broken Alexa. That said, it can "listen" and "see" quite well.
Somehow, some way, Apple is the real winner in the voice assistant battle. They realized years ago that the rise of devices like Amazon Echo and Google Home would not have an impact on the iPhone. As a result, they accepted Siri as a dumb but good-enough assistant and saved billions and billions of dollars in R&D.
They successfully kicked the can into the future. That is arguably the best approach to technical debt: wait until you can wait no longer, as long as failure isn't the result. And they have deep pockets, where millions per day in spend is immaterial ($384B in annual revenue, $167B in profit).
It would be nice if Apple dove into the open source chat assistant ecosystem, sorta-kinda like they did with Apple Maps.
Being Apple, I'm sure they are trying to do everything from scratch "better," but I think they could have already improved on Siri with Whisper, an open source LLM and some TTS out of the box.
At most they admit to spending millions of dollars a day, not that they are doing it to catch up. If the news says Apple is spending multiple millions of dollars a day on chip design, that isn't admitting they're doing it to catch up either.
> saved billions and billions of dollars in R&D
Did they though? Do we know how much Google and Amazon spend on these devices?
Apple may have won the first battle by "not fighting" but the next battle is a race to make these dumb assistants smart.
Amazon seems to be the possible long term winner, as they don't make phones and still got millions to buy their "dumb speaker" when they (consumers) already had one in their pocket!
Amazon has admitted they spent upwards of 10B a year on Alexa. Alexa is much more feature rich than Siri, so I sure hope Apple isn’t out spending them.
I don’t know if I agree that Amazon will be the long term winner either. I see your logic, but Apple also has a speaker and a phone. The HomePod doesn’t have the same market share, but it’s still a sizable market (>20%).
IMO Google should have been the clear winner, but they will probably be the loser. They had cheap and varied hardware, they had phones, they had APIs and features like Alexa. But Google can’t seem to stop being Google, and of course GAssistant leadership has absolutely no focus or consistent roadmap. Perhaps tellingly, they’re giving up on cloud “skills” and focusing on Android.
OpenAI needs to keep the hype train going, SoftBank still has too much money, and Jony Ive was probably offered more money than he's ever made with Apple, likely making him the real winner.
The apple homepod is a nice, wireless speaker that can (usually) play the music you ask it to by voice command. If that's what you expect of it, it's really quite good.
Apparently, Amazon truly believed people would get in the habit of saying "hey Alexa, buy this thing" and that this would drive revenue. I'd assume Amazon of all companies would know that most people prefer to compare products, read (fraudulent) reviews and look for deals before spending their money.
I think they imagined it would be used for well-known products the user was already very comfortable with, where they had already done all their comparisons and already knew the cost/benefit trade-offs.
I could imagine doing that myself, but the voice interface seemed so clumsy and a little scary that I never got over the hump of trying it. There are things we routinely buy on Amazon without any real comparison after, say, the 10th time we've bought them -- but we still never buy them on Alexa.
This is what it was intended for. They were pushing it at around the same time as they were pushing Amazon Dash, the buttons for ordering specific consumable products [0].
I agree fully, though: Alexa never became reliable enough for me (or many others) to trust it with a credit card. A voice assistant that constantly mishears things in the small domain in which I do trust it can't be trusted to pick the right thing out of Amazon's entire catalog.
Amazon also thought it would be a conduit for subscriptions, and a conduit for in-app purchases.
I had seen market research a few years ago that basically said people don’t like buying this via voice without seeing an explicit bill/numbers on screen. That was around the time they announced the devices with a screen.
With all the unwanted, asinine, desperate monetization suggestions, they've only barely managed to keep me from pulling the plug.
I was pleased to discover it finally figured out just last week how to give me my preferred news source when I accidentally say "Alexa news" instead of "Alexa news from..."
Honestly, for a while there I was actually doing exactly that, mostly to re-order things I order often, but sometimes to just order the top result for a certain query. I have since mostly lost faith in Amazon as a brand and no longer trust their reviews or products to be of any decent quality. So I guess I would say that it's not Alexa that failed here, but the amazon brand itself.
Alexa has been one-upped. She's little more than a voice-operated dumb terminal now. ChatGPT is seen as the "Computer" from Star Trek. Not surprised by the job cuts. Sorry to hear about it. I am surprised by Alexa's lack of evolution over the years. Maybe I'm naive as to the complexities of AI and speech-to-intent, but I remember standing in my kitchen asking Alexa if she knew about ChatGPT. She recommended the ChatGPT Skill which worked, but ultimately I was left disappointed once I exited the skill, and returned back to _plain old_ Alexa. I wonder when ChatGPT will be able to turn on my lights and set a reminder to take the bread out of the oven? I should go look into that...
> I wonder when ChatGPT will be able to turn on my lights and set a reminder to take the bread out of the oven? I should go look into that...
This is the part you’re underestimating. The language->intent stuff is not solved but it’s long been “good enough” for basic tasks. The long-tail of integrating with other services is hard and not improved by better AI (unless you have some clever new ideas).
Maintaining the actual logic that lets Alexa talk to hundreds of APIs is very human-intense even if it’s not hard.
The main problem with Alexa / Google Assistant / Siri is that the tech was not ready when they launched. We didn't have models that could understand non-trivial user requests, generate non-trivial actions or keep track of context properly. Now we do.
Amazon, Apple and Google are all working on incorporating LLMs but why is it taking so long? Why are these assistants still so bad? ChatGPT has been available for a year, GPT3 API for 3 years. I suspect some of it is legacy tech and legacy researchers from the pre-LLM era.
How often would ChatGPT hit the exact API you expect with the exact request you need in one shot? People I know still very often try 5 prompts before they get what they want from an LLM; that doesn't work in a Siri world (or it's just as frustrating).
With ChatGPT some of the limitations of the tech are handled by the user e.g. starting a new chat when you want to discuss a new topic. An assistant has to detect changes in user context somehow. Also, I think it would be harder to know what to inject in the prompt since conversations are more like context based RAG rather than topic (embedding) based.
Then you have all the usual generative issues: hallucinations, alignment, sticking within guardrails, no repeatable testing, drift. The potential for errors at that scale is pretty staggering.
No, the main problem is that these projects were grossly mismanaged and didn't really have a concrete purpose. It would've been possible to build something incredibly useful without LLMs; I just don't know why they didn't. Though I feel the top comment answers that question quite clearly: these assistants were never developed to be useful.
I think we will see a persistent divestment from the space in the next five years. Smart speakers/devices have been a tremendous financial failure and virtual assistants in general are pretty much outdated tech since the release of ChatGPT.
They work ok as a side feature. It is nice to ask Siri to play music and that wasn't possible a few years ago. But it's a value add for the Apple ecosystem - not a capital P Product.
Alexa & Google Home work fantastic in these areas:
- Playing music
- Playing audiobooks
- Telling the time
- Setting timers for cooking
- Giving the day's weather
Alexa beats Google solidly in kid-friendly material.
I subscribe to the Amazon PrimeFancyMusic thing, and that works well for my Echo Studio.
I would agree that a GPT driven voice controlled system might be great.
What I would actually prefer would be a speaker system that I could drive with a command line tool _or_ voice. Arguably, a tidy enough installation on a RPi Zero connected to a dumb speaker would be sufficient for the CLI...
A "smart" speaker that simply did the above 5 tasks reliably and was driven off of music/audiobook subscriptions would be fantastic. It wouldn't shoot for the moon, but it would be a workhorse.
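For the CLI half, here's a rough sketch of a command layer covering a few of those tasks, where a voice front end would only need to turn a transcript into the same argument list (e.g. `"timer 10m".split()`). The `speaker` program name, the subcommands, and the duration format are all hypothetical, and actual audio playback is left out.

```python
import argparse
import re
import sys
from datetime import datetime


def parse_duration(text: str) -> int:
    """Convert '10m', '90s' or '1h30m' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject input with leftover characters, e.g. '10 minutes'.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"bad duration: {text!r}")
    return sum(int(n) * units[u] for n, u in parts)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="speaker")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("time", help="say the current time")
    timer = sub.add_parser("timer", help="start a cooking timer")
    timer.add_argument("duration", type=parse_duration)
    play = sub.add_parser("play", help="play music or an audiobook")
    play.add_argument("query", nargs="+")
    return parser


def run(argv: list[str]) -> str:
    """Shared entry point for the CLI and for a voice transcript split into words."""
    args = build_parser().parse_args(argv)
    if args.command == "time":
        return datetime.now().strftime("It's %H:%M")
    if args.command == "timer":
        return f"Timer set for {args.duration} seconds"
    return "Playing " + " ".join(args.query)  # playback itself is not implemented


if __name__ == "__main__" and len(sys.argv) > 1:
    print(run(sys.argv[1:]))
```

Keeping one `run()` for both inputs means the voice side can never do something the CLI can't, which is roughly the "workhorse" property described above.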
I honestly have no idea what's taking them so long. It's tempting at this point to build a custom integration for Home Assistant that just uses function calling to trigger various actions.
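A minimal sketch of the dispatch side of that idea, assuming the model replies with a tool name plus JSON-encoded arguments (roughly the shape OpenAI-style function calling returns). The model call itself is omitted, and `turn_off`/`set_timer` are hypothetical stand-ins for real Home Assistant service calls, not HA's API. The model only proposes calls; the code refuses anything it doesn't recognize.

```python
import json
from typing import Callable


# Hypothetical handlers; a real integration would call Home Assistant here.
def turn_off(entity_id: str) -> str:
    return f"turned off {entity_id}"


def set_timer(minutes: int) -> str:
    return f"timer set for {minutes} min"


TOOLS: dict[str, Callable[..., str]] = {"turn_off": turn_off, "set_timer": set_timer}


def dispatch(call: dict) -> str:
    """Run a model-proposed call only if it names a known tool with valid arguments."""
    handler = TOOLS.get(call.get("name", ""))
    if handler is None:
        return "refused: unknown tool"
    try:
        kwargs = json.loads(call.get("arguments", "{}"))
        return handler(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, non-object arguments, or wrong parameter names.
        return f"refused: {type(exc).__name__}"


print(dispatch({"name": "turn_off", "arguments": '{"entity_id": "light.bedroom"}'}))
# turned off light.bedroom
```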
Because it costs too much. They're already losing money on every speaker sold (Google/Amazon anyway), plus the infrastructure required to maintain service. Adding an LLM is a huge expense for an already failing form factor.
Alexa is unimaginably bad considering the money and time they’ve spent on it. Someone should do a business school case study on how it could be so bad. It literally can’t handle the most basic of tasks involving playing music on Spotify without making idiotic mistakes, or even properly processing the same 3 commands we give it every day to turn on or off the lights. I’ve been muttering for the past 2 years to myself about how they should fire everyone involved in the project. I guess they finally realized it too.
You can safely say this for any user-facing product Amazon has. Look at the Prime TV interface, the Amazon Music app (and their "free" tier), etc. Everything they put out is third-rate, the only thing they know how to do well is AWS, which keeps the lights on.
I don't find the Amazon Fire interface to be nearly as horrible and frustrating as Alexa is. What I don't understand is how it is so bad. You could literally just glue together the tiny.en Whisper model and the Mistral7b LLM and add some APIs and get something that would work so much better than Alexa for most tasks. I feel like I could do that in well under a month and replicate the most essential functionality, but with something that actually works well. So what exactly have the hundreds of highly compensated devs been doing for years there? It's truly mind-boggling to me.
I was an SDM at Amazon up until last month, I can tell you that the engineers are busy arguing over minute details and choice of wording in design docs. That's essentially the job.
Not surprising, imo—voice assistants got big at a time when there was no way to implement them except by what I would call a massive catalog of hacks. Now that LLMs with really good NLU for major languages exist, Amazon must have made a determination that either voice assistants are no longer worth actively developing, or that LLMs ought to serve as the core of the NLU component of their dialogue systems.
Not surprised - I have voice assistants in every device I have owned in the past 10 years and have never found a good reason to use them. ChatGPT on the other hand I use several times a day.
Recently switched to Home Assistant to control all my Hue lights locally, along with a bunch of other devices. It was a pain in the beginning and I was cursing HA. 2 weeks later, I love it.
You’ll be cursing HA off and on for as long as you use it. The local control angle is the right one (Apple’s HomeKit is also local for its devices and its great, I wish they’d invest more in it).
The breaking changes in HA are out of control, every release is a dozen or more - a good chance at least one integration you use will have a breaking change every time you update.
If you wait too long to upgrade it’ll just implode.