There seems to be a real golden opportunity here that neither Google nor Apple has jumped on yet. And perhaps it's much harder than I imagine, as it needs a large context window.
But you don't need to beat GPT-4. A GPT-3.5-esque model that has access to your to-do list and calendar, and that I can talk to like Siri, would make the tool 10x more useful overnight.
If it even had access to your last 50 emails, I could imagine saying something like:
- Remind me to respond to that email from Bob tomorrow.
- Draft me 3 responses that are kind, but indicate clearly that we can't go forward with his request.
This could even happen in the background, and I'm happy to pay a bit extra for the compute. It could be "premium Siri".
Kinda agree with this. It seems like every company is hard at work building their vision of an AI future, but no one can get the present right. Integration of current GPT-level intelligence could do so much to improve reminders, scheduling, calendar management, email sorting, web browsing, app installs, general UI navigation and tons more, but they all just want to build better chatbots and train larger models and chase AGI.
The companies developing these new AI models are so rich that they don't need or even care about small improvements for day-to-day tasks and problems. Instead, they seem hyperfocused on looking for the next disruptive improvement, while selling shovels to anyone else interested in picking what's left.
100%. Give me an intelligent agent that integrates with calendar, reminders, notes, messages, mail, *shortcuts* and the Finder. Zapier actually has a decent UI for this which integrates with OpenAI.
>happy to pay a bit extra for the compute, it could be "premium siri"
Siri is heavily embedded in the ecosystem, yet has sorely under-delivered for far too long. Many improvements could have been made even without access to an LLM.
Apple should provide this behavior out of the box, and figure out how to make the costs work without additional subscriptions.
I think the cost of running these models is still astronomical, and unlike MS and Google, Apple just doesn't really have cloud compute at that scale. So I could imagine they might run this as a premium service first to build hype, and then figure out how to bring the cost down over time.
A bit like the Vision Pro: it's a premium product now, but I could imagine that 3-6 years down the line it will cost about as much as an iPad Pro. Apple is really good at scaling things once they see the market. But suddenly switching everyone to a GPT-3.5-class Siri overnight would be a humongous cost (I think).
Unlike Google and MS, Apple has neural processing hardware in users' hands. Every modern Apple device has a neural processor as part of its SoC, from iPhones and iPads to Macs at every price point.
Apple is the only company I know of (discounting Nvidia) that has this installed base of consumers just waiting for a killer app.
Exactly. Apple is likely to take the edge computing approach (in fact, they already do - try searching your iPhone's photo library for "sports car" or "cat"; I'm fairly certain this all happens on-device using the neural engine).
In addition to the leaps and bounds in silicon, Apple has also quietly been making acquisitions in the space (like $200M+ for Xnor.ai - focusing on low-power ML intended for edge computing applications).
They are likely relentlessly optimizing, and in typical Apple fashion will not be first to release something, but will do a killer job in distributing it to hundreds of millions of devices/people at once, with a UX that's thoughtfully polished and accessible to all.
It’s just too expensive. Using the full GPT4 context window, for example, costs almost $2.00!
Obviously it would be much cheaper at scale, and like you said, it doesn’t have to be cutting edge. But still, the compute for an interaction with Siri is a fraction of a penny.
Apple has to make sure you can’t make Siri give the offensive answers AI is famous for. Apple sells quality, they are always under a magnifying glass and they can’t play by the same rules Google can.
While I would truly be thrilled to see that, I don't think any big companies are capable of doing this. Not because of technical ability, but because of perceived antitrust concerns. Many products are more useful to users when they are tightly integrated, but companies won't do it if it will draw the ire of regulators.
I think Apple, with its app ecosystem, could just let apps offer themselves as plugins to "Siri-GPT", exposing their available function calls (like OpenAI does now).
Their own apps (Mail/Reminders/Calendar) could then play on common ground, which should be considered fair.
Apple has the perfect ecosystem to do this, IMO. They could knock OpenAI plugins out of the park, since the plugins would be apps that you already use and that know about you, so they would be far more useful.
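To make that concrete, here's a rough sketch (with every name invented for illustration) of how an app could advertise a function to a hypothetical "Siri-GPT" and have the OS dispatch the model's structured reply, mirroring OpenAI-style function calling:

```python
import json

# Hypothetical schema a Reminders-style app could register with "Siri-GPT";
# this mirrors OpenAI-style function calling. Every name here is made up.
create_reminder_tool = {
    "name": "create_reminder",
    "description": "Create a reminder for the user",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "due": {"type": "string", "description": "ISO 8601 datetime"},
        },
        "required": ["title"],
    },
}

# Instead of free text, the model emits a structured call...
model_output = (
    '{"name": "create_reminder", '
    '"arguments": {"title": "Respond to Bob", "due": "2024-05-02T09:00:00"}}'
)

# ...which the OS routes to whichever app registered that function.
def dispatch(raw: str, registry: dict) -> str:
    call = json.loads(raw)
    return registry[call["name"]](**call["arguments"])

registry = {"create_reminder": lambda title, due=None: f"Reminder set: {title}"}
print(dispatch(model_output, registry))  # Reminder set: Respond to Bob
```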
Please just fix Siri. It works so badly that sometimes I use my phone to turn things on and off instead of using my HomePod mini. I don't care if Siri can tell me what the capital of the US is. I just want to reliably control my smart devices.
Put a freaking GPT in front of Siri at a minimum. Train it on Siri's abilities and let it take what I say and translate it into something Siri can understand, so I don't have to speak like a robot. Even when I say it perfectly, sometimes Siri just loses the thread completely. It's infuriating.
Siri excels at maybe two things: setting reminders and setting timers. Anything else, I don't really trust it with.
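For what it's worth, that "translation layer" idea is simple enough to sketch. Everything below is invented for illustration, and the LLM call is stubbed with one hard-coded mapping:

```python
# The LLM's only job: rewrite messy speech into the rigid phrasing Siri
# already understands. The prompt and command list are invented; translate()
# stubs the model call for the demo.
SYSTEM_PROMPT = (
    "You translate user requests into exact Siri commands.\n"
    'Known commands: "Set a timer for <duration>", "Turn <on|off> the <device>".\n'
    "Reply with the single closest command and nothing else."
)

def translate(utterance: str) -> str:
    # A real version would send SYSTEM_PROMPT + utterance to an LLM here.
    if "lights" in utterance and ("kill" in utterance or "off" in utterance):
        return "Turn off the lights"
    return utterance

print(translate("ugh, can you kill the lights please"))  # Turn off the lights
```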
The issue with GPT-Siri is the number of people (and journalists) who will jump on it the nanosecond it's released to make it say something newsworthy (racist, factually incorrect, etc.)
Teaching an LLM is easy; making it safe for public consumption is _really_ hard.
> making it safe for public consumption is _really_ hard
"Safe" in what sense? Is an LLM output which we recognize as racist or factually incorrect going to be directly responsible for the loss of life, serious injury or significant property damage?
If LLMs can be considered unsafe for public consumption, then so are Twitter, Facebook, heck, even Wikipedia and Hacker News!
Agreed. I'm hopeful that not only will this improve with LLMs, but it will also lose its newsworthiness. Kinda like how "Tesla catches fire" used to be a constant headline, while "Regular car catches fire" wasn't.
Yeah, if you say the right things to an LLM, it might say something weird. Just like your child might swear if you swear around them, or they might make up a fact and claim it's true. It's cute/funny/interesting, but really not special or newsworthy. But because AI is the new big thing, everyone is scrambling for a headline. I think people will get bored and move on.
Seriously, I have two rooms whose names sound slightly similar, and it can never distinguish between the two; it just defaults to one. I can enunciate the syllables hard and it doesn't help.
Google's home assistants treat timers and alarms as different things entirely but I mentally don't. "Set a timer for x minutes" and "set an alarm for x minutes" are conceptually the same to me when I'm in the kitchen. But if you ask for timers, it will only give you timers, and if you ask for alarms it will only give you alarms.
I use the non-display speakers so I can't even view the stuff.
I actually do treat them very differently on my Home Mini. I want them to be different in the following ways:
- Timers can be set to the second (2 minutes 30 seconds), alarms are only to the minute (5:00) (it already works this way, I think)
- A timer noise should be immediate and jarring (so you can rush out of the other room to pull the thing out of the oven before it burns), alarms should start quiet and harmonious and only gradually increase in volume and obnoxiousness as needed to wake you up (sadly, few alarms are like that, but Google timer/alarm sounds are different)
- Ideally you want to be able to give timers names when you set them ("oven", "stockpot", "toaster oven") when you're using multiple, that are repeated when they go off (Google doesn't do this). You don't need this with alarms
- Alarms need a snooze option. Timers don't
While they both involve a sonic alert after a certain amount of time, their use cases are so different that they really are totally separate features.
The biggest headache with Google's ecosystem is lack of "Timer Groups."
They have speaker groups (that work well), but if you have two+ Google Home/Hub/Mini devices in a single space asking for a timer will add it randomly to one of the devices, not the room or a group, and it won't always display on the Hub.
For example I have an open-plan kitchen, two Google Home Max speakers, and a Hub (display) on the counter. If I ask for a timer, it MIGHT go to the Hub or it MIGHT go to either of the two Home Max speakers.
Cancelling it involves figuring out which device took it, and standing close to it (or it just won't cancel). This is a really obvious and strange oversight, as Google Home's ecosystem supports "rooms" by default.
I was recently pleasantly surprised when I said, "Siri, set a timer for 7pm" and Siri responded by a) informing me timers couldn't be set for a time, and b) finding and enabling my cooking alarm which is typically set for 7pm, and telling me so.
The truly savvy version would have calculated the time left until 7pm and set a timer for that many minutes, however.
Alarms (at least on Apple systems) are permanent, timers are ephemeral. My wife is the same as you and constantly sets alarms instead of timers - I recently looked at the settings of our kitchen HomePod and found several hundred alarms, all switched off.
I have never, ever considered timers and alarms to be the same thing. If nothing else, a timer happens after a specified period of time. An alarm happens at a specific time.
I recognise the frustration; weirdly, for me I get all the errors with Alexa and almost none with Siri. But the near-daily experience of "Alexa, Küche auf" ("Alexa, kitchen on") being met with "Ich kann nicht 'Küche' auf Spotify finden" ("I can't find 'Küche' on Spotify"), when we don't even have Spotify, means I totally empathise.
Siri is also getting worse. In my native language, I used to (often) say “Add milk to grocery list”, and it now replies “I’m sorry, but I don’t know which speaker you mean.”
Things that would be easy for GPT-4, Siri can't handle at all. Basics like "Hey Siri, add a time to my most recent reminder" aren't possible. GPT-4 could easily query a DAO to select the most recent reminder and update it. But I suppose Siri is actually a massive tree of hand-written rules, and they can't cover every case.
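As a toy illustration of the kind of structured query that request could compile down to (the reminder store and field names are made up):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# A made-up reminder store, standing in for whatever data layer Siri would
# query; "most recent" becomes a max() over creation time, then an update.
@dataclass
class Reminder:
    title: str
    created: datetime
    due: Optional[datetime] = None

store = [
    Reminder("Buy milk", datetime(2024, 5, 1, 9, 0)),
    Reminder("Respond to Bob", datetime(2024, 5, 1, 17, 30)),
]

most_recent = max(store, key=lambda r: r.created)  # "my most recent reminder"
most_recent.due = datetime(2024, 5, 2, 9, 0)       # "add a time to it"

print(most_recent.title)  # Respond to Bob
```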
It's just embarrassing that people have to hack around Siri to get proper AI on our phones. The classic "do it 5 years late" Apple approach isn't working where AI is concerned.
This will be another example of the power of distribution. It doesn't have to be better than Chat GPT. It just has to be good enough.
This is similar to Slack vs Microsoft Teams. Slack had a several year head start on product, but Microsoft had a massive lead on distribution (their Office 365 install base). Teams is good enough.
Teams is atrocious, like it's stuck back in the 90s. It's only popular because Microsoft gives it away. For all of its warts, I'd take Slack back in a heartbeat.
I'd bet that 90% of Teams users had never used Slack and never would have through their employers. To them, Teams is like all software in that everybody can complain but it's still much more advanced than... Lync.
Teams << Slack < Discord when it comes to chat but it's still Slack << Teams < Zoom for video which is where I see it displacing competitors the most.
If your office has a culture of email for written communication and the chat feature is sparingly used and mostly for quick 1-1 messages or "can I call you?" then it's passable.
I wonder how long before our IT folks switch us to using Teams for video. Mid-2020 we switched to Zoom (from Webex, before Cisco decided to make it a copy of Zoom) for video, along with Slack (from HipChat). The combo of Slack+Zoom worked fairly well. And I got very reliant on the Slack plugin for Outlook to keep me on schedule. Now we're on Teams, which I still find frustrating on a daily basis, but still using Zoom for video. And Microsoft's own calendar integration for Teams is not at all as good as the plugin for Slack was, so I'm back to hit-or-miss for my scheduling.
Not the most onerous of problems, for sure, but I'm not liking how much our IT department is clamping down on everything. Between cost cutting on software and the myriad malware packages they install to monitor our MacBooks, I'm not their biggest fan.
A large part of the value of LLMs is how good they are; a large part of the value of various chat apps is how many people already have them installed. The quality of responses is definitely important for LLMs, otherwise the FOSS ones would have eaten OpenAI's lunch already, which doesn't seem to have happened, at least not yet. The FOSS ones are still trying to catch up with GPT-3.5, which itself doesn't even come close to GPT-4.
Apple is so bad at AI that they couldn't even get autocorrect right on the iPhone keyboard until the end of 2023 (iOS 17). They willingly ignored most of the AI advances over the last 10 years. How investors think highly of this company's management is a mystery to me...
> They willingly ignored most of the AI advances over the last 10 years.
Nonsense.
CoreML, Apple Silicon (Neural Engine), and even Face ID all have AI advances built right into them.
Apple is not desperate to join the AI hype mania and neither is it a direct threat to them.
Their iDevices and services make enough tens of billions for them to eventually catch up or acquire other foundational AI companies to do that. That is a luxury reserved for extremely profitable companies that can afford to wait or enter late.
People actually analysed the WWDC keynote. Apple said "AI" exactly zero times in the whole presentation.
It wasn't an accident. They know what the current so-called "AI" is and have been using it for a long time. They're just not getting on the hype train of calling everything AI.
If Apple can leverage their silicon to run models on your physical device at speeds that are acceptable, this could be a game changer for generative AI.
Apple doesn't need a smart chatbot, per se - they should be looking at embedding AI into their existing tools to make them better.
Take iWork, for example. You can use AI to rough-in a document you need to write. Then you can fill in some of the details and make it so it's written by you. Even better if you can point to a personal archive of documents you can use to train your own personal AI so it's better able to write a more relevant document.
Consider Logic Pro. We've all seen AI-generated music, so why not provide an AI that generates your base music project that you can then tweak and embellish?
This is how I expect normal people to want to use AI - embed it in their existing tools. I also don't think most people expect AI to generate 100% of the content, instead they would expect the AI to do the heavy lifting and get them maybe 60% to 70% of the way there. This gets them over the blank page syndrome and allows them to produce a quality result.
I'm reminded of a meme I recently saw that really resonated with me, it said AI isn't going to take your job, it's somebody using AI who's going to take your job. AI is a tool and while we may be dazzled with its capabilities, we're going to be even more dazzled by what people using AI are capable of doing.
Apple should create app extensions that plug in to their LLM. This way a user can teach the system default LLM new stuff and developers can run finetuned weights/layers at the edge (on device).
Apple would instantly become the largest deployed LLM service except the users pay for the hardware and electricity and space in exchange for privacy.
Knowing how safe Apple likes to play, and how unsolved the legal side of training data is, I doubt Apple will build anything like ChatGPT, but I can see them building LMs for the iPhone. Even Tesla switched to an LM for their self-driving software.
I really hope this works. Siri is essentially useless except for very simple commands like "Call X", and even when X is a clear and unique name, it sometimes messes up. Apple is miles behind Google and Alexa :/
Here’s what I want: Siri should run locally on all the Neural Core Apple hardware I have. It should always be listening and watching my every move PRIVATELY so that it can understand questions I may have for it. It should never access the cloud unless I explicitly approve it to do so, like how HomeKit works today when opening a garage door or unlocking a door.
If there’s one company that could credibly pull off such a feat without it feeling super creepy—it’s Apple.
Maybe I'm missing something obvious, but shouldn't the Siri "platform" already be ready to integrate LLMs as just another processing API behind the scenes?
Like, why wouldn't Siri with GPT-n behind it just totally crush anything any other possible competitor could do? Distribution and adoption of Siri was complete over a decade ago.
I just don't see how anyone could compete with Apple and Google on a voice assistant tool. I mean, I WISH it were possible, but the reality is different.
GPT models are too capable. Even if you did a good job of integrating one with Siri, it would still occasionally make a mistake due to some random confluence of user input, garbled voice-to-text, and system prompt data. Over a large enough user base this is bound to happen often enough to be a problem, i.e. generate bad press. Microsoft is steaming ahead with this because they've mastered the art of dealing with bad press from buggy software; they can afford to have it go wrong occasionally. Apple, on the other hand, has their premium brand aura to preserve (whether deserved or not), and so they'll be latecomers to this, having to put safety belt on top of safety belt to keep incidents to the barest minimum.
Fully concur with this take and I think that it describes why Apple would be slower and more cautious in the rollout.
The end result is still the same in the long term, and that they’re going to eventually be able to figure out systemic ways to include LLMs and work to improve them to get around existing constraints.
Exactly. Bard and whatever the Bing AI is can be accidentally racist now and again. People will just shrug and go "it's beta / it's just buggy Microsoft" and go about their day.
Get AI Siri to say something racist and it's front page news everywhere in the world.
I'm skeptical. Like Microsoft, Apple has a lot of money to throw at super large models. But the question is whether they are committed enough to AI to do it, and whether they have the required ML talent. The latter was actually a problem for Microsoft, otherwise they wouldn't cooperate with OpenAI. Regarding commitment: This also seems iffy to me. Generative AI doesn't fit very well into Apple's hardware centric philosophy.
> Generative AI doesn't fit very well into Apple's hardware centric philosophy
They have a huge unique niche to exploit with local private models; they're probably the only company on the planet with the hardware in place to take advantage of this today, end-to-end. But unfortunately, I don't think generative AI fits with Apple's culture at all. I just think they'd be terrified of it saying something wrong, so they'd probably clip its wings to the point that it's mundane and kinda useless.
I don't think it's a very huge or exploitable niche for two reasons:
- Android, Windows, Linux and MacOS can already run local and private models just fine. Getting something product-ready for iPhone is a game of catch-up, and probably a losing battle if Apple insists on making you use their AI assistant over competing options.
- The software side needs more development. The current SOTA inferencing techniques for Apple Silicon in llama.cpp are cobbled together with Metal shaders and NEON instructions, neither of which are ideal or specific to Apple hardware. If Apple wants to differentiate their silicon from the 2016 Macbooks running LLaMA with AVX, then they have to develop CoreML's API further.
Things are already possible on today's hardware; see https://github.com/mlc-ai/mlc-llm, which allows many models to run on M1/M2 Macs, WASM, iOS and more. The main limiting factor will be finding models that are small enough yet high-quality enough. Ultimately this is hardware-limited, and they will need to improve the Neural Engine and map more computation onto it to make the mobile experience possible.
Not saying you're right/wrong, but argument from the other side: Apple's hardware ships with hardware specifically for ML, and a lot of it (maybe even all of the recently released?) has shared memory between GPU/CPU, meaning you could run larger models. Windows/Linux is currently stuck with having to run inference with either RAM or VRAM, limiting the options quite a bit.
> But the question is whether they are committed enough to AI to do it, and whether they have the required ML talent.
Given how business leaders throughout tech feel that AI is going to be transformative, I don't think commitment is really going to be a problem. Many leaders feel that "you either get good at AI or you don't exist in 10 years".
In terms of attracting talent, there are 3 main things top AI folks look for:
1. Money (they are people after all)
2. The infrastructure (both hardware and people/organization-wise) to support large AI projects.
3. The willingness to release these AI projects to a large swath of people (to have "impact" as folks like to say).
E.g. Google had 1 and 2 but their reticence to release their models and corporate infighting made many of the top Google researchers leave for gigs elsewhere. I think it remains to be seen how Apple will handle #3 as well.
To chime in: Apple already has a lot of great ML talent; they are just far more deliberate and slow to change their products. People forget that Face ID was/is one of the most cutting-edge ML features ever developed and deployed when it was released a few years ago.
Siri is sort of a red herring, because it's built by teams and tech that existed before Apple acquired most of its ML talent, and some of its inability to evolve has been due to internal politics, not an inability to build the tech. iOS 17 is an example of Apple moving towards more deep-learning speech/text work. I would bet heavily that we will see them catch up with well-integrated pieces, as they have the money, the infra, and the ability to go wide (i.e. all iOS users; again, think Face ID).
> Generative AI doesn't fit very well into Apple's hardware centric philosophy.
Well, I disagree with this take. Apple is known for playing the long game and planning many years ahead. CPU power is still growing, and Apple now has their own CPUs in every device. Sure, you won't be able to run something similar to GPT-4 in the foreseeable future, but I predict we will see multiple small, feature-oriented LLMs that can easily fit on a smartphone, or at least an iPad with an M(n) processor.
> Generative AI doesn't fit very well into Apple's hardware centric philosophy.
I think it fits pretty well since Apple controls almost 100% of their stack. If they need hardware specific tweaks to make AI models run better, they can do that. On M* Macs for example, the unified memory model lets them do a number of AI tasks even with the lower powered GPU.
I hope Apple gives us both a web API (that can be used on any platform at a low-ish cost, similar to OpenAI) and a smaller, private model that's optimally sized to run on Mac, iPad, and Phone hardware.
The only other one I know of is the Apple Maps API. I think it's really likely that this will just be an improved version of Siri and not a stand-alone product.
"Smaller" models still require excessive RAM while not producing acceptable results for conversation. LLMs will probably always run in the cloud, anything else will just be too weak for most applications.
> I think Apple will come up with some crazy hardware to run good quality LLMs.
You mean like the "Neural Engine" that has been present in their SoCs for nearly a decade? (This is also why M1/M2s can run LLMs at speeds comparable to desktop GPUs... and they weren't even designed with LLMs in mind yet.)
What are those NLP tasks, if I may ask? (I was thinking above about using it as a chatbot like ChatGPT or Bard, which currently seems the only application for end-users.)
News summarization, news data extraction, news question answering, news filtering. I can assure you that older 7B/13B models had trouble following directions and outputting (for example) JSON.
I'm pretty sure Apple won't offer those things locally on iPhones. The hardware requirement is too high and the value to average Apple customers too small.
If you use commodity GPUs, sure. If you use TPUs (which Apple is already building into their chips), the efficiency improvements are massive. Seriously, look at some Coral Edge TPUs and what they can do at power levels completely unheard of for GPUs. Then look at how much faster M1/M2 Macs are than normal desktop GPUs for machine-learning tasks, because they have an onboard accelerator.
It's not just inference time; RAM size is another bottleneck. Apple, being Apple, probably wouldn't want to offer anything less than GPT-3.5-level intelligence, which I would estimate at 220 billion parameters (per the 1/8-MoE GPT-4 rumor). That would require 220 GB of RAM at 8-bit quantization.
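The arithmetic behind that estimate, for anyone checking:

```python
# At 8-bit quantization each parameter takes one byte, so the weights of a
# 220B-parameter model need about 220 GB before activations and KV cache.
params = 220e9           # hypothesized GPT-3.5-class parameter count
bytes_per_param = 1      # 8-bit quantization
weight_ram_gb = params * bytes_per_param / 1e9
print(weight_ram_gb)     # 220.0
```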
Apple probably has the attention to detail to train the absolute shit out of their models. They will not need 8x220B parameters to do what GPT-4 does, if they ever get to that point. See LLaMA 2 7B and 13B being (subjectively) far better than LLaMA 1 even at the same parameter counts, just by having been trained on more data.
Apple is known to care a lot about stuff like this. Like, a lot. They are pedantic as heck.
What most people fail to realise is that LLMs alone are not enough to improve Siri.
Apple needs something like what Adept.ai has with action-based foundation models [0] for Siri to be useful.
LLMs are essentially overhyped for everything other than summarisation, and this just shows that OpenAI really has no moat; it is getting eroded faster than they can stop losing money on training the model.