Actual article: https://www.bloomberg.com/news/articles/2025-03-14/apple-s-s... (https://news.ycombinator.com/item?id=43365517)
Related:
Something Is Rotten in the State of Cupertino
https://news.ycombinator.com/item?id=43348891
The basic problem here for Apple is that LLMs will never actually be able to avoid prompt injection issues, and the entire "personal awareness" set of functionality they're trying to make uses LLMs. Unless somebody at Apple invents a new state of the art, it's not going to happen.
With that said, I'm surprised they haven't yet at least replaced the 'dumb' Siri commands with something that's effectively an LLM translation layer for an internal API. That would give a significantly better experience (even a dumb LLM is way better at understanding natural-language directions than Siri) with no 'personal awareness' stuff needed.
All Apple needs to do is allow 3rd-party apps to integrate with Siri, then use an LLM to convert natural language into the set of allowed API calls. Basically similar to what ChatGPT does with function calling. And on macOS they should already have a head start thanks to AppleScript integration in most apps. I have no idea why they're trying to reinvent the wheel with "Shortcuts", which is severely limited.
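For a sense of how thin that translation layer could be, here's a minimal sketch: the model only gets to pick from a fixed menu of intents, and its answer is validated before anything executes, so a bad parse (or an injected prompt) can't reach beyond the allowed calls. The callLLM helper and the intent names are placeholders, not any real Apple API.

    import Foundation

    // The fixed menu of actions the assistant is allowed to perform.
    enum AssistantIntent: String, CaseIterable {
        case setTimer = "set_timer"        // args: minutes
        case sendMessage = "send_message"  // args: contact, body
        case playMusic = "play_music"      // args: query
    }

    // Placeholder for whatever model/API you actually use; canned reply for this sketch.
    func callLLM(prompt: String) -> String {
        return #"{"intent": "set_timer", "args": {"minutes": "10"}}"#
    }

    struct ParsedCommand: Decodable {
        let intent: String
        let args: [String: String]
    }

    // Ask the model to map free-form speech onto exactly one allowed intent,
    // then validate the answer against the menu before doing anything with it.
    func route(_ utterance: String) -> (AssistantIntent, [String: String])? {
        let menu = AssistantIntent.allCases.map(\.rawValue).joined(separator: ", ")
        let prompt = """
        Map the user's request to exactly one of these intents: \(menu).
        Reply only with JSON like {"intent": "...", "args": {...}}.
        Request: \(utterance)
        """
        guard let data = callLLM(prompt: prompt).data(using: .utf8),
              let parsed = try? JSONDecoder().decode(ParsedCommand.self, from: data),
              let intent = AssistantIntent(rawValue: parsed.intent)  // reject anything off-menu
        else { return nil }
        return (intent, parsed.args)
    }

    if let (intent, args) = route("set a timer for ten minutes") {
        print("dispatching \(intent) with \(args)")
    }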
They already have everything sitting there for the taking, and they're squandering it for no reason. Siri on macOS should have been built on top of AppleScript from the get-go; then the switch to an LLM would have been easy.
For that matter, I wonder why I haven't seen any 3rd-party apps on macOS act as a Siri replacement by using AppleScript to drive applications directly. So much effort is spent on screen scraping and trying to get "agents" to use computers like humans do, but for decades macOS has had a parallel set of APIs built into every application specifically for machine consumption. And most good 3rd-party apps even make use of it.
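For illustration, this is roughly all the plumbing a third-party "Siri replacement" would need to drive a scriptable app directly. NSAppleScript ships in Foundation; the Music commands below come from its standard scripting dictionary, and any other app would use whatever its own dictionary exposes.

    import Foundation

    // Run a snippet of AppleScript from an ordinary Mac process.
    func runAppleScript(_ source: String) -> String? {
        var error: NSDictionary?
        guard let script = NSAppleScript(source: source) else { return nil }
        let result = script.executeAndReturnError(&error)
        if let error = error {
            print("AppleScript error: \(error)")
            return nil
        }
        return result.stringValue
    }

    // Ask Music what's playing: no screen scraping, just the app's machine-facing API.
    let nowPlaying = runAppleScript("""
    tell application "Music"
        if player state is playing then
            return name of current track & " - " & artist of current track
        end if
    end tell
    """)
    print(nowPlaying ?? "nothing playing")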
They already have a composable automation API with 3rd-party integrations: Shortcuts!
It's not perfect, but surely you could do natural language -> LLM -> temporary shortcut, and that gets you a decent part of the way to a smarter Siri.
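Roughly, on macOS you could lean on the existing shortcuts command-line tool instead of synthesizing a brand-new shortcut on the fly. A sketch, with the model step stubbed out (simple string matching stands in for the LLM):

    import Foundation

    // Run a command-line tool and capture its stdout.
    func shell(_ path: String, _ arguments: [String]) -> String {
        let task = Process()
        task.executableURL = URL(fileURLWithPath: path)
        task.arguments = arguments
        let pipe = Pipe()
        task.standardOutput = pipe
        try? task.run()
        task.waitUntilExit()
        return String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(), as: UTF8.self)
    }

    // 1. Enumerate the user's own shortcuts.
    let available = shell("/usr/bin/shortcuts", ["list"]).split(separator: "\n").map(String.init)

    // 2. Stand-in for the LLM: pick the shortcut whose name appears in the request.
    //    A real version would prompt a model with the list plus the utterance.
    let utterance = "run my good morning shortcut"
    let chosen = available.first { utterance.lowercased().contains($0.lowercased()) }

    // 3. Run it.
    if let name = chosen {
        _ = shell("/usr/bin/shortcuts", ["run", name])
    }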
I'm not as familiar with the Shortcuts API, but from a quick glance it seems less rich than Apple events/AppleScript. With an LLM + AppleScript you could achieve computer-use agents on easy mode. Not just one-off "what's the weather" queries but complex multi-step interactions like "send the latest photo I took to John".
To start with, Automator on the Mac would be the perfect place for LLM integration. And Script Editor too. AppleScript being perhaps one of the few read-only languages, people would probably _prefer_ that an LLM spit out AppleScript for them. And Apple probably has the highest-quality data set internally. Combined with the fact that there is a uniform way to specify the API surface (sdef), this is a task most LLMs can handle today. Just apply a little marketing spin to change the angle from "niche power-user feature" to "Apple uses computer-use agent AI to let the average Joe automate their entire workflow" and it's a smash hit.
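To make the sdef point concrete: the dictionary really is one command away, and pasting it into a prompt is all the "tooling" a first experiment needs. The prompt wording and the downstream model call are assumptions; the sdef tool itself ships with macOS.

    import Foundation

    // Dump the scripting definition (sdef) that a scriptable app publishes.
    func scriptingDefinition(forAppAt path: String) -> String {
        let task = Process()
        task.executableURL = URL(fileURLWithPath: "/usr/bin/sdef")
        task.arguments = [path]
        let pipe = Pipe()
        task.standardOutput = pipe
        try? task.run()
        task.waitUntilExit()
        return String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(), as: UTF8.self)
    }

    let sdefXML = scriptingDefinition(forAppAt: "/System/Applications/Music.app")

    // Hand the dictionary plus the request to whatever model you use, constrained to it.
    let prompt = """
    Here is the scripting dictionary for Music:
    \(sdefXML.prefix(4000))
    Write AppleScript, using only commands defined above, that plays the playlist named "Focus".
    """
    // let generatedScript = callLLM(prompt: prompt)  // then hand the result to NSAppleScript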
From there it's not much of a stretch to slap on some speech recognition (just take Whisper, it already beats whatever iOS uses), add some guardrails, and have it be orders of magnitude better than what Siri is currently capable of. And this is all possible with today's LLMs, and thanks to DeepSeek without paying a cent to anyone else. When interactive computer use does get mostly solved, it can be added as a fallback for anything where AppleScript doesn't cut it (e.g. web navigation, Electron apps, etc.). But there's a clear logical progression that lets you evolve the user experience as the technology matures without having to throw out your stack along the way.
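As one sketch of what "some guardrails" might mean in practice: never execute model-generated AppleScript without showing it and getting an explicit yes first. The transcription and generation steps are assumed to have already happened.

    import Foundation

    // Guardrail: model-generated automation is displayed and confirmed before it runs.
    func executeWithConfirmation(_ generatedScript: String) {
        print("About to run this AppleScript:\n\(generatedScript)")
        print("Proceed? [y/N] ", terminator: "")
        guard readLine()?.lowercased() == "y" else {
            print("Cancelled.")
            return
        }
        var error: NSDictionary?
        NSAppleScript(source: generatedScript)?.executeAndReturnError(&error)
        if let error = error { print("Failed: \(error)") }
    }

    // Pretend Whisper + the LLM produced this for "play some music":
    executeWithConfirmation(#"tell application "Music" to play"#)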
But to me, their fate was sealed when Shortcuts was shipped on the Mac while Automator already existed. And it's clear Apple events have been a languishing feature, with integration in native apps already breaking.
They have since added App Intents to the LLM automation stew.
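App Intents is arguably the most LLM-shaped piece of that stew, since every action is declared with a typed, enumerable signature. A minimal sketch of what an app exposes (the intent itself is made up for illustration):

    import AppIntents

    // An action the app donates to the system. Shortcuts, Siri and, in principle,
    // an LLM planner can discover its title and typed parameters.
    struct StartKitchenTimer: AppIntent {
        static var title: LocalizedStringResource = "Start Kitchen Timer"

        @Parameter(title: "Minutes")
        var minutes: Int

        func perform() async throws -> some IntentResult {
            // Start the timer in the app's own model layer here.
            return .result()
        }
    }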
Shortcuts takes precedence because it was originally created on iOS, and Tim Cook wants absolute convergence between the OSes. They don't realize (or don't care) how restricting it is to go this route. To be honest, that may be the desired result.
I have tried to use Shortcuts a few times, and if Automator could be frustrating, Shortcuts is even worse. It seems to have improved with the recent scripting additions, but in my opinion it doesn't make a lot of sense. On the phone it seems useful only to make up for lacking software and a mediocre voice assistant, and on the desktop it is very lacking and limited. AppleScript is an obtuse language, but at least it's still a programming language; with the Shortcuts toolkit it seems like you need to approach everything in a roundabout way.
In any case, it shows the failing of Apple's ideology of "easy" computing for everyone. At some point you need to admit that some things are going to need some learning and competence, and the best thing you can do is provide tools to enable the competent user, not create some Frankenstein thing that the less competent won't understand any better anyway.
Apple has really failed here, chasing "simplicity" at the expense of actual usefulness. And this has permeated even the desktop OS, which is just dumb.
What you are describing is definitely how Alexa used to work; no idea about now. My only point being that I'm sure this has been thought of before.
I’m not sure I want / need an LLM for the handful of basic commands I want Siri to do in the car with no complication. Siri is good at some straightforward command patterns. And that’s how users have been trained.
Adding an LLM feels like a solution looking for a problem
I have a pretty good grasp on Siri commands and it's still _terrible_. Here are a few from the past few weeks:
“Siri, drive to <regional grocery chain 20 minutes away>” → “Now driving to <regional grocery chain 2 hours away>.”
“Siri, send <name> a message on <popular messaging app>.” → One of _two_ responses (seems random): “Sorry, I can't do that while you're in the car” OR “What would you like to say to <name>?”
“Siri, play '<song in my Apple Music library by an artist I play every week>'.” → “Now playing <some song I've never heard of by an artist I've never played>.”
“Siri, remind me when I get home to <task>.” → “Sorry, I don't have your address.” (My contact information, with complete address, is in my Apple Contacts, and set as my card.)
The list goes on. It's actually become somewhat of a game for my wife whether Siri will actually complete a given request.
That's pretty much my experience as well. Even something as dumb and simple as setting timers gets messed up sometimes, to the point where I'm just considering buying a hardware multi-timer.
I'm not sure how good Google is at this but it seems this sort of technology has hard limits anyway. It's already hard for humans to understand each other so I wonder if all of this isn't just a giant distraction and a big waste of time.
One feature I love on my android phone is how reliably it sets timers based on a voice command:
"Ok google, set a timer for 8 minutes" has kept me from burning my food on the stove for at least a year now!
As another mentioned, many of us haven't been trained. Every time I've reached for Siri I've been let down and just did it myself or didn't do it at all.
When I bought the new iPhone (coming from Android most recently, and iPhone before that) I figured that Siri would actually become good because of this translation layer, and them just not doing that has been a huge letdown for me on the phone.
> how users have been trained
Palm's Graffiti handwriting system, for people who took the time to learn it, was strictly better and less frustrating than natural handwriting systems at the time. It lost to keyboard-based systems, which honestly are worse.
In my experience, "good enough" systems that need far less training will win *
I also only use Google Assistant to choose music and make calls in the car, but that's also because it kinda sucks at anything else and I haven't learned all the commands.
---
* except for hr systems, for some reason
LLMs are probably better at "understanding" what the user wants, e.g. "Hey Siri, I have to take my kids to the doctor tomorrow" would reply "Do you want me to make a calendar event?".
Asking ChatGPT, it even has suggestions like "Do you want me to check traffic, do you want me to make a checklist for what to bring?".
If I continue with "but I can't make it", one of the suggestions ChatGPT imagines is: "Would you like me to send a message to your partner, babysitter, or anyone else involved?"
If they used something like Whisper for transcription, the quality of understanding what I'm saying would already massively improve.
Yes, for users that have been conditioned on how to talk to Siri, there is no point adding an LLM layer. But for everyone else an LLM might add a more natural and forgiving interface. Also an interface that might be more open to exploration, if you can describe a goal and the LLM picks whatever command it feels fits that intent.
I can absolutely see an argument that it should be a feature power users can turn off to keep the "original" Siri experience
Everyone thought LLMs would scale exponentially like everything else in computer science. It's actually linear and they're all stuck holding a bag of mostly worthless goods.
This problem of non-technical product folks over-promising features is going to get much worse in the age of LLMs. The models are incredibly adept at providing a proof-of-concept. But it's often a monumental endeavour to cross the gap between 70% and 90% accuracy; 90%-95% even more. This long-tail pain isn't new, but the ability for non-technical folks to poke a model on a chat site and then assume their idea is ready to be deployed to production is a more recent phenomenon.
> The models are incredibly adept at providing a proof-of-concept. But it's often a monumental endeavour to cross the gap
That's not just with LLMs. This has been an Achilles' heel of demos for decades.
It's usually quite easy to make a demo of something sexy, but making it ship is a very, very different thing.
This is made exponentially worse by managers thinking that the demo means "we're almost ready," and even by the engineers themselves, who are often not experienced enough to understand that demo != ship. Also, it has been my experience that researchers who come up with amazing ideas have absolutely no appreciation for what it takes to turn those ideas into a shipping product. Since Apple is all about shipping product, they have to bridge that gulf. They are trying to commoditize something that is mostly buzz and raw research right now.
I'm really pretty jaded about ML and LLMs in general, right now. I feel as if a whole bunch of really sexy demos were released, creating a huge level of buzz, but that the shipping product is still a ways off.
I don’t disagree. I do think there will be a tendency to say, “We can do X using AI.” When X can happen, but it isn’t guaranteed to happen by the system.
Here, it doesn’t sound like the features promised were truly demo-able, and when they were they were “not working properly up to a third of the time”.
Having a random CRUD MVP that is 2/3rds done is different than having a SOTA LLM implementation only being 2/3rds reliable. It is a vastly different problem to get from there to the finish line.
But I think marketing types would be equally likely to make promises in both scenarios.
I’m getting this sense too, as my own employer is starting to add AI features. The pitches for what is possible are not grounded in what engineers know the system can do, they’re just assumptions about “if we give the LLM these inputs, we’d expect to see these outputs that are accurate enough for meaningful productivity gains.”
I’m not an AI skeptic, but it’ll be interesting to see how we manage the uncertainty in these projects.
> But it's often a monumental endeavour to cross the gap between 70% and 90% accuracy; 90%-95% even more.
That sounds exactly like what we went through in the late '90s with voice recognition (remember Dragon NaturallySpeaking and ViaVoice?).
https://archive.is/hj3Om
> Apple’s product marketing organization, marketing communications, software engineering team and AI organization are all culpable.. Given the nature of this collective failure, it’s unlikely to result in management changes on Cook’s executive team.. the whole mess is likely to come to a head this week at Apple’s annual Top 100 offsite. A March tradition since the days of Steve Jobs, this is a meeting of the company’s 100 most important executives.
Since no one was responsible for this collective failure, can a single leader be given responsibility for future Siri success?
Management is about persuading people to do things they do not want to do, while leadership is about inspiring people to do things they never thought they could. —Steve Jobs
I upgraded to an iPhone 16 in part because I was interested in the AI augments to iOS. My assumption was that Apple was likely to do something tasteful, and probably not nuclear-level bad in terms of privacy.
I've since disabled everything AI. My feeling is that my phone is much snappier without. Text summary features are pretty bad—in one case a summary of a text thread led me to believe my dad was having an acute health crisis when in fact it was a fairly benign discussion of news from doctors. Even custom emoji, which could have been fun, are simply bad.
Meanwhile some basic iOS features like "mute all notifications from this particular text thread" have been broken for me for several months.
Their AI summary really eats shit when it comes to slang-heavy, short messages.
It'd be genuinely useful if it kicked in on threads with a high number of unread messages.
I have an android phone and switching google assistant to gemini made it worse for the fairly basic things I actually use google assistant for, like setting reminders, so I switched it back.
Are people really clamoring for AI in the voice assistant on their phone?
> so I switched it back.
Google has helpfully decided that you don't actually want to do that:
https://blog.google/products/gemini/google-assistant-gemini-...
> later this year, the classic Google Assistant will no longer be accessible on most mobile devices
The Gemini based assistant has been getting better... It was weird to see the massive step backwards though.
+1 OMG I was going to die using that Gemini garbage! Was so glad they had a way to switch back!
Using a fully-fledged flagship model as a personal assistant actually works pretty well. You don't need a task app, just tell the AI to remember and remind you. Granted, now you have to inject prompts to work around context limitations, allow the AI to schedule wake-ups for itself or allow it to send notifications, etc. But those are issues with straightforward solutions. The issue is that for now those models are incredibly expensive and resource-intensive to run. And a quantized 3.25B model like Gemini Nano just can't perform the same.
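For the curious, the "just tell it to remember" pattern boils down to the model calling a tiny local tool rather than holding state in its context window. A sketch with the function-calling plumbing left out; the tool name and shape are made up:

    import Foundation

    struct Reminder {
        let message: String
        let fireDate: Date
    }

    // The local "tool" a model would call as something like schedule_reminder(message, seconds).
    final class ReminderStore {
        private(set) var reminders: [Reminder] = []

        func schedule(message: String, secondsFromNow: TimeInterval) {
            reminders.append(Reminder(message: message, fireDate: Date().addingTimeInterval(secondsFromNow)))
            // The "wake-up": a timer for this demo; on a phone it would be a local notification.
            Timer.scheduledTimer(withTimeInterval: secondsFromNow, repeats: false) { _ in
                print("Reminder: \(message)")
            }
        }
    }

    let store = ReminderStore()
    // What a parsed tool call from the model boils down to:
    store.schedule(message: "Take the kids to the doctor", secondsFromNow: 20 * 60 * 60)
    RunLoop.main.run()  // keep this demo process alive so the timer can fire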
Yup, had to stop using it, next step was to get rid of Gemini altogether. It's so weird to claim you give people an advanced AI when actually you are removing a whole bunch of features.
My Nest can still only get “ok google turn on the light” the second time I say it. Good shit!
This is why I still haven't pulled the trigger on home automation stuff. Bad enough keeping my laptop functioning, don't want that experience extending to the rest of my life!
I know Apple has been known to wait until a technology was very mature before putting it into a product, sometimes years after competitors had already done so.
But Apple Intelligence seems like the complete opposite… like they had to have something now.
It's not even to the level of Apple Intelligence. If I tell Siri to set a timer for 5 minutes, and when it goes off I say "Hey Siri, stop alarm," it stops my daily wake-up alarm, not the timer. It feels like Siri hasn't gotten any better in the past decade.
It is almost like Apple has an operations guy at the helm who doesn't understand software at all!
The same year the tech industry got taken by storm by LLMs, Apple announced a brutally expensive, niche VR/AR headset with no particular use case. That product was essentially DOA.
I think this is the first time in decades where Apple took a step from their back foot. The Vision Pro announcement during the initial hype cycle of ChatGPT made it look like they were just totally out of step with the industry. It was a dud product that cost billions in R&D, the market for it was unclear, and the industry zeitgeist (and talent) was going in a totally different direction.
So their response was to rush something out the door to avoid looking like they were caught flat-footed. Now it's abundantly obvious they were unprepared for AI, they've failed to ship features they promised, and they have to commit resources to support an additional platform that no one cares about because they did ship that one. Worst of all worlds.
Had they slept on the Vision Pro, I'm pretty convinced they could have credibly kept their powder dry on LLMs, like they've done with many industry hype cycles in the past. People would've assumed they were doing the Apple thing of not being first to ship something, but the first to ship something compelling.
Vision Pro is a vanity product for Tim Cook. Once you understand that, it makes a lot more sense. It's the only product he has ever pushed, and I believe it's because of the shallow wow factor, which fits TC very well. In any case it would always have been a niche product, regardless of price, but at this price it's just nonsensical.
Yep, Tim Cook = Steve Ballmer
> Had they slept on the Vision Pro
The 2025 versions of iOS, iPadOS and macOS are all being redesigned based on visionOS, despite Vision Pro production and successor cancellations.
Is there evidence Vision Pro production was ‘cancelled’ as opposed to just running its limited course as planned? Are there substantial leaks indicating its successor has been cancelled?
Early 2024: Apple supplier says planned production cut by 50%
Late 2024: Apple supplier says cheaper Vision Pro 2 (N109) cancelled
Late 2024: senior executive moved from Vision Pro to Siri/AI team
Early 2025: Bloomberg says tethered Apple AR glasses (N107) cancelled
There may be an SoC (e.g. M4/M5) refresh of existing Vision Pro 1 design.
This all doesn’t sound worth reading into. There are other interpretations to supplier leaks. Ditto re: execs moving between teams. Apple AR != Vision Pro. On the flip side Apple is still shipping updates, new content, and selling units. I’ll just wait and see.
It's true that a 50% cut in production followed by a halt in production does not mean that Vision Pro development has stopped. A smaller number of employees could continue work and something could be released in a few years, like any R&D project.
In comparison, Meta's smart glasses sold 2M units in 5 quarters. EssilorLuxottica is increasing production to 10M units by the end of 2026. Perhaps that's to be expected at 90% lower cost than Vision Pro, but those volumes have kickstarted an app ecosystem for smart glasses, which has failed to materialize for the low-volume Vision Pro.
If future 2027 Apple AR glasses appear with a subset of VisionOS, they will compete with incumbent products from Meta and Xreal. Some lessons learned and some apps developed for Apple AR could help Vision Pro and VisionOS, if Apple AR gains millions of users like Meta smart glasses.
> Late 2024: senior executive moved from Vision Pro to Siri/AI team
That executive is not having a good time...
She may help schedule realignment with reality, https://appleinsider.com/articles/25/01/24/apple-intelligenc...
> Across Vorrath's many high-profile Apple projects, she has been known for keeping work on schedule, and for implementing rigorous bug testing. Consequently, her move to the Apple Intelligence and Siri team is likely to be because the project needs to be given more impetus.
They could have waited with the vision pro too. Just keep the tech in their back pocket until they could release something that was viable for consumers.
> Had they slept on the Vision Pro, I'm pretty convinced they could have credibly kept their powder dry on LLMs
No shortage of powder at Apple, it's safe to say.
The personal context issue could have been solved long before LLMs appeared.
My foil for this has always been the simple request from Siri: "Take me home".
Apple and my device know where I live. I have even registered "home" in Apple maps. This is not a huge leap, and this is not even something that requires an LLM. But it does force the featureset and capability to be better. This was a problem in 2015. 10 years later I still can't make this simple request.
I don't know why Apple just froze Siri and put it on life support. They could have been doing far better in this space.
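To underline how small this feature is: once "home" is a stored string, the whole request is one directions URL handed to Maps. The address lookup is assumed (it would come from the user's own contact card); the maps.apple.com query parameters are the documented map-links scheme.

    import AppKit

    // "Take me home" with no LLM: fetch the saved home address, ask Maps for driving directions.
    let homeAddress = "1 Infinite Loop, Cupertino, CA"  // assume this comes from the user's contact card

    var components = URLComponents(string: "https://maps.apple.com/")!
    components.queryItems = [
        URLQueryItem(name: "daddr", value: homeAddress),  // destination
        URLQueryItem(name: "dirflg", value: "d")          // driving
    ]
    if let url = components.url {
        _ = NSWorkspace.shared.open(url)  // hands off to the Maps app
    }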
On a related note, I don’t know how Apple has managed to mess up so badly the idea of storing your own information like address, phone number, etc.
Every time I’ve moved, for months afterwards I keep having to fix my address when using Apple Pay or setting directions to home, etc. because it seems like they keep multiple copies of that information all over the place, instead of referring to a single piece of information.
Places I can think of where I’ve had to update my address one by one: each credit card I have associated with Apple Pay; Maps; my own contact that I can share with others; subscription information; shipping address for Apple Pay.
I get it that some people use separate billing and shipping addresses, but for the vast majority of people the use case is “I have a single home address where I want to direct Maps to, be billed at, and ship stuff to, please update it all at once when I update my address.”
I encounter this every time there is an Apple hardware release and I try to buy on the phone, and pay with Apple Pay.
They'll have lost my shipping and billing address, and I have to laboriously type it in again before I can pay.
Sometimes I will have purchased from them the very prior week, shipping to the same address they've now lost.
How a trillion-dollar company cannot figure out the most basic functionality of an online shop, I don't know.
Yes, it's extremely dumb and annoying. It's everywhere in Apple products; for example, your Health ID can't pull in your latest recorded weight. I mean, it's even the same app; it's like the devs can't be bothered to make a smart product.
I think this is because of the way they silo everything, the way they are paranoid about privacy/security (mostly for their marketing; they don't care that much), and a culture that isn't very competitive anymore.
It finally hurts them because the experience isn't coherent and is very frustrating for the user. It's funny, because the marketing is all about the "ecosystem", but in reality the coherence is mostly about how things look.
The rumor of a big redesign isn't surprising; instead of working on how things actually work (that would require walking back some marketing choices, which seems impossible), they will put on a coat of paint and make people focus on good looks.
The last redesign was terrible and it took an awful lot of time to get things back to half-decent. It's just one more motivation to not renew for an iPhone I guess...
FYI, I find that "navigate home" works, if you want to get home. (I agree with your complaint however that Siri is brain dead much of the time)
I just tried this, thinking that I was "holding it wrong" and no - it still doesn't work. It says it doesn't know where I live.
The most interesting question arising from this is the meta-question: who leaked this meeting and why?
And how, given Apple's Severance-grade secrecy?
My speculation is that the small on-device models are simply not useful enough for practical purposes.
The on-device transformer models are described as having 3B parameters[0]. Their own figures from 6 months ago show that humans prefer other models' output 25-40% of the time, which aligns with Gurman's reporting.
I don't know how well Apple's A-series chips can handle transformers, but if you play with their on-device diffusion models (Image Playground, Genmoji), you can watch your battery drop before your eyes. And the output of these models is also embarrassingly behind the state of the art.
If Apple can make great foundation models for Private Cloud Compute, that's great. But then what's the point of buying a new iPhone for on-device inference capabilities?
Secondly, I speculate that allowing a server-side model to query the personal context database is going to be hard to do performantly, requiring several round trips and uploading fragments of the data to the cloud for processing.
0: https://machinelearning.apple.com/research/introducing-apple...
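Some napkin math on why 3B parameters on-device is such a squeeze (the parameter count is from Apple's post above; the bytes-per-weight figures are the usual precision levels, and KV cache plus activations come on top of this):

    import Foundation

    // Memory for the weights alone of a 3B-parameter model at common precisions.
    let parameters = 3.0e9
    let precisions: [(name: String, bytesPerWeight: Double)] = [
        ("fp16", 2.0),    // ~5.6 GiB: hopeless next to running apps on an 8 GB phone
        ("int8", 1.0),    // ~2.8 GiB
        ("~4-bit", 0.5)   // ~1.4 GiB: roughly where on-device models end up
    ]
    for (name, bytes) in precisions {
        let gib = parameters * bytes / 1_073_741_824
        print("\(name): \(String(format: "%.1f", gib)) GiB for weights")
    }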
Apple's approach makes no sense but they don't have a choice because of all the past marketing nonsense.
They need to justify their hardware quality/price, which is becoming irrelevant because even the cheapest phones work just fine without lag nowadays. AI will be something done on servers for the foreseeable future, but Apple put itself in a hole with its privacy nonsense.
If instead they had truly invested in Macs and allowed their customers to build their own personal cloud, they would have a business case. Instead, they were extremely greedy and tried to double-dip with high hardware prices and "service" subscriptions (their low-storage devices are made just for that). They are going to pay for this short-term thinking, eventually...
I am curious how many people are like me and never use the voice assistants on phones. I have amazon alexa devices and only use it for timers, weather, and controlling lights. I never use my phone's voice assistant. (I'm not even sure how to turn it on)
Enterprise MDM and Apple Configurator can disable voice assistant entirely, which may be needed in some work contexts.
Siri today is capable of doing anything I really want it to.
If only it could f*ckin understand me! Whisper (or whatever the ChatGPT app uses) runs rings around it. Basically zero errors.
For me they just need to fix the goddamn voice recognition performance.
Could a 3rd-party app use Whisper to drive Siri / Shortcuts?
It seems there should be two intelligent agents: Siri, which takes commands and does them, and then some other agent, an LLM, that you can call up with a different call name. By separating the two you solve all the problems.
But that doesn't match the marketing vision of the product and brand identity. /s
It’s one thing to chat with an LLM about information/knowledge. It’s another completely as a user to understand which actions it can take on your behalf. That seems like the disconnect.
What would be nice is if an LLM existed just to tell me what Siri is capable of, as I usually blindly stumble into its capabilities.
Siri was underperforming for years. I'm a 'heavy' user - it's been terrible. It worked better in 2012 and slowly got worse. It's acceptable in 2025, but not good.
The Siri division was caught NLP-ing anonymized data and got totally sideswiped by LLMs. They have not transitioned. Generally Apple finds a Moses (like they did with the M1 chips) and hires a cracked team to build out, but... the salaries they offer don't touch OpenAI or Anthropic, and who wants to work with the nut that was building out Siri initially? The guy is terrible.
Steve would have gutted the Siri team, switched to a <100-person team, and they'd have something nice in 2025, even unique. Apple is the team I expected to have DeepSeek-type innovations on limited hardware. Instead she wants to search the internet for what my local humidity levels are.
> who wants to work with the nut that was building out Siri initially
Who are you referring to?
I only want one feature from Siri. When I say "Hey Siri, set a timer for 10 minutes" while in my kitchen, I'd like it to show up as a Live Activity on my phone. (You can see it on your phone, but it's buried deep inside the Home app, which is dumb.)
I use a Mac at work, have an Apple Watch and iPhone, and have yet to use any AI features. I just don't need them. I'm overjoyed when I add a meeting in Calendar and it shows up when I tap the date on my watch. I don't need artificial intelligence when the software is well designed and actually works!
yeah, adding "AI" to something like that really means add in random nonsensical failures
Whatever Tesla uses for its voice navigation just works every time. No idea why Siri (and Alexa) still have such poor command recognition by comparison.
This feels like a AI summary of a real article.
Not shipping something that isn't ready for prime time is hardly "dire".
The world can do without learning the merits of putting glue on pizza, visualizing black Nazis, and chat bots that advise users to commit suicide.
John Gruber's latest long post[0] comments on the oddity that this is only coming from Gurman, with zero other sources able to corroborate it.
Personally, I am ambivalent: I hope that Apple is taking the situation at least as seriously as it seems to be in the apparent leak, and I hope that they do indeed have some of this stuff working in a partial, demo-able fashion.
[0] https://daringfireball.net/2025/03/a_postscript_on_the_singu...
Been nerd-raging here on HN since October because, compared to the ChatGPT app on the same iPhone, Siri sucks!
As a nerd who has talked to ChatGPT for the past year or more while driving, to get things done and to use it as a knowledge base... I've been like: oh awesome, Siri will do just what GPT does once Apple Intelligence is released... oh OK, not until a future update... oh wait, now not at all, lol.
Further, I've been expressing my desire for a GPT phone where on the lock screen you visually see your AI assistant or agent and it sees you. Basically a H.E.R. phone that does everything for you and interacts with other AI agents (businesses, friends, families, etc.) to get things done for you via chat, text, hand and facial gestures... basically a human-like AI in your phone that does everything for you.
No doubt this gets poo-pooed, as I think only 25 to 40 percent of phone users use Siri, Alexa, etc.; a lot don't use them because they just haven't worked well for them, or due to privacy concerns. But innovation? Apple is getting reamed for not being innovative... not even matching what the ChatGPT app offers.
Just realized something funny.
Current AI is pretty janky, unpredictable, unreliable, etc.
So Microsoft actually has an enormous advantage over Apple when it comes to AI integration.
Because their whole OS is already like that!
> It feels so well integrated into the rest of the system!
- Anonymous user