Hacker News
Apple’s Shortcuts will flip the switch on Siri’s potential (techcrunch.com)
239 points by evo_9 10 months ago | 191 comments

Cognitive science has a term called "Theory of Mind" (ToM), which refers to a human cognitive ability that typically emerges around age 4 (in fact, there is a famous experiment you should look up and try if you have any kids, nieces, or nephews around that age), whereby humans are able to model the mind of another entity: the capacity it's working at, and the information it holds and doesn't hold. This is what allows us to, for example, tell a lie. It also allows us to adjust the way we communicate with dogs vs. children vs. adults for better communication efficacy (or with two different adults, one with high and one with low expertise in some domain, like programming).

In my opinion, the biggest thing preventing me from using Siri is not what it can and cannot do, but that it has been nearly impossible for me to develop a ToM for Siri. And since I simply don't know what Siri is capable of, I only use this bot for a very narrow set of tasks. Furthermore, one of my ToM priors for Siri is that she has a mind that is incapable of learning. This is a big turnoff, bigger than I think we acknowledge, since we are used to interacting with entities that can learn, even if laboriously. For instance, my Australian shepherd might not be able to bring me the newspaper, but if I really wanted her to do that task, I could slowly get her to approximate that behavior, and it would probably be satisfying to see progress in performance. With Siri, I simply assume there are things she can do and things she cannot, and it'd be pointless to try to goad her into adding even a trivial task to her functional repertoire.

The lack of learning is also what bugs me about all the new tech. It's being used for targeted ads and stuff like that, but not for trivial tasks that would actually improve my interaction with the interface. A few things come to mind. I always listen to music as full albums, and yet the UIs still try to get me to shuffle various playlists. Every day around the same time I text my partner nearly the same message, "I'm leaving work.", and yet autocomplete still suggests other phrases. If I click the share button, I then go through 6 steps to share the article via text message with the same person. Imagine if clicking share offered a one-click "Would you like to send this as a text message to Jane?"

I'm not a fan of the current AI craze, and happily like my devices somewhat stupid, but if they're going to try to be smart, at least learn my predictable behaviors and offer me shortcuts for those.

It's unclear to me what these new shortcuts really offer, but I'll be interested to try them. However, I have a hunch that what I want is actually even simpler than what these will provide.

So I currently use a combination of Launch Center Pro and Workflow (whose developer was acquired by Apple last year, and which appears to be the source of what has now become Siri Shortcuts) to do what you're describing.

I have a Workflow that when run, uses Maps to estimate the drive time between wherever I am and my house, pads the estimate by 5 minutes, then adds that to the current time to get an ETA, formats it into a text message and sends the ETA to my wife.
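The ETA arithmetic in that workflow (drive time, plus a 5-minute pad, added to the current time, then formatted as a text) is simple enough to sketch in Python. This is only an illustration, not how Workflow implements it: `drive_minutes` stands in for the Maps estimate, and `eta_message` and its message wording are hypothetical.

```python
from datetime import datetime, timedelta


def eta_message(now: datetime, drive_minutes: float, padding_minutes: float = 5) -> str:
    """Pad the drive-time estimate, add it to `now`, and format a text message."""
    eta = now + timedelta(minutes=drive_minutes + padding_minutes)
    # %I is a zero-padded 12-hour clock; drop the leading zero ("06:08" -> "6:08").
    clock = eta.strftime("%I:%M %p").lstrip("0")
    return f"Leaving work now, should be home around {clock}."


# drive_minutes would come from a Maps estimate; 23 is a placeholder value.
print(eta_message(datetime(2018, 7, 9, 17, 40), drive_minutes=23))
```

With a 23-minute drive starting at 5:40 PM, the padded ETA comes out to 6:08.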

In Launch Center Pro, I have a geo-fenced shortcut that presents a notification to run my workflow whenever it sees that I have left the area around my office building.
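The geofence trigger boils down to noticing the moment your position goes from inside a region to outside it. On iOS this is handled by the system's region monitoring rather than by app code like this; the sketch below assumes a simple circular fence and uses the haversine formula for distance. The function names and office coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def exited_geofence(prev, curr, center, radius_m: float) -> bool:
    """True only on the transition from inside the circular fence to outside it."""
    was_inside = haversine_m(*prev, *center) <= radius_m
    is_inside = haversine_m(*curr, *center) <= radius_m
    return was_inside and not is_inside


office = (37.3349, -122.0090)  # hypothetical office coordinates
print(exited_geofence((37.3350, -122.0091), (37.3400, -122.0090), office, radius_m=200))
```

Checking the transition rather than just "outside" keeps the notification from firing repeatedly while you are already away from the office.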

The UX is: halfway between my office building and my car, I get a notification from LCP. I tap the notification, Workflow thinks for a minute and presents me with a text message to my wife, and I tap "send".

As nice as this is already, I believe Siri Shortcuts may improve it in several ways:

1) Currently, apps are not allowed to send messages without explicit user interaction, so Workflow can't actually text my wife; it just presents me with an iMessage screen and an already-written message that I must tap "send" on.

2) Workflow does not support geofenced or time-based running of workflows, which is why I need LCP to launch my workflow based on a geofence. If Siri Shortcuts supports this, then I won't have to rely on LCP anymore.

3) There is no way (that I'm aware of) to trigger my workflow automatically; I must tap a notification and unlock my phone, or open an app and select the workflow within it. An improvement would be a programmable Siri Shortcut ("Hey Siri, let my wife know my ETA"), and even better would be automatic running of workflows based on defined conditions, so my workflow would run and send a message automatically when I cross the geofence, without me even needing to be aware of it (besides maybe a notification that lets me know the workflow has run).

According to this article[0], those annoyances are still there:

> It does look like certain annoyances with Workflow are showing in Shortcuts as well. For example, sending a message via a Shortcut still requires you to manually hit the send button

[0] https://9to5mac.com/2018/07/06/apple-shortcuts-hands-on/

FWIW, I switched to a new iPhone a few weeks ago, and it has learned that every workday at a particular time, when I open a Messages conversation with a particular person and type "Good," it should auto-suggest the rest of the phrase that I send at that time. It does so correctly.

I have noticed, however, that it won't always make suggestions for similar repeated texting events. I think it has to do with limited space in the iOS autocomplete bar, because once I get past the first couple of words, it fills in the rest.

If you're using iOS, is it possible that your usual phrase is too long to fit in auto-complete?

As for the other things you suggest, it appears this is the direction Apple is headed, based on the most recent WWDC keynote.

Yep, I've been thinking quite a bit about this lately as well.

I set the same alarms Monday through Friday to wake up for work (usually a few since I'm a deep sleeper). While I know I could schedule these, I usually prefer to manually set them, so I wind up doing that every night.

In the morning I can do, "Siri, turn off all alarms" and Siri will disable any that are set.

But at night time, I can't do "Siri, set my usual alarms for tomorrow morning."

Seems like a critical type of action that a predictive engine should be able to process. But perhaps I'm wrong.

For a few things ("I'm leaving work" messages, in particular), I've had good luck with Workflow[0][1]. Unfortunately, it's all manually programmed and no learning, but it is decently intuitive and powerful.

[0]: https://www.workflow.is/ [1]: No affiliation -- their app just solves a need of mine.

Isn’t that the app Apple owns now, the one Siri Shortcuts is based on?

Seems so, but you can actually use Workflow today (without running a beta version of iOS).

The famous experiment is the "Sally-Anne task", which goes like this:

The child is in a room with Sally and Anne. Sally has a basket and Anne has a box. Sally puts the marble in her basket and leaves the room. While she's out, Anne steals the marble, and puts it in her box. Sally then comes back into the room.

You ask the child, "Where will Sally look for her marble?" This tests whether or not they can see the world through Sally's eyes and recognize that her view of it will differ from their own.

The experiment was devised by Simon Baron-Cohen and colleagues.

Did you read the article? It's saying you'll be able to slowly "train" her by adding shortcuts.

Your comment seems to be a general gripe, and it doesn't address the fact that this new change may actually solve your problem.

(I agree with you, but I'm excited by this addition because it finally feels like a difference)

You're just provided with a way to automate a task and have Siri trigger it. It doesn't address the OP's key complaint that there's no clarity on what kinds of tasks Siri can do.

For instance, 9to5mac writes: "While Siri and native integration with the operating system is fantastic, this is still very much the Workflow app, with a few extra bells and whistles. It works and function[s] in a similar fashion by creating blocks that trigger one after another."

"One example of using Shortcuts with Siri is creating a Shortcut that grabs your current location, creates an ETA home, enables Do Not Disturb, and then sends a text to your roommate that you’re on your way home, and plays some music. All while you’re getting into your car, and preparing to leave. And all you have to say is “Hey Siri, I’m on my way home.” Alternatively, you could also just go into the Shortcuts app and start the request there."

All Shortcuts adds is a way to trigger all of them in one new step. It's basically a dumber IFTTT for Siri: you can't build new behaviors in it, you can only script together existing ones. Useful, but it's still hard to tell what Siri can do.
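The "blocks that trigger one after another" model can be sketched as a table mapping a trigger phrase to an ordered list of actions, where each action receives the previous one's output. This is a toy model, not Apple's implementation: the actions are hypothetical stand-ins that return canned values, and only three of the five steps from the quoted example are included.

```python
from typing import Any, Callable, Dict, List

Action = Callable[[Any], Any]


def get_location(_: Any) -> Dict[str, float]:
    return {"lat": 37.33, "lon": -122.01}  # stand-in for a real location lookup


def eta_home(loc: Dict[str, float]) -> str:
    return "6:08 PM"  # stand-in for a real routing estimate from `loc`


def compose_text(eta: str) -> str:
    return f"On my way home, ETA {eta}"


# Hypothetical registry: trigger phrase -> ordered chain of actions.
SHORTCUTS: Dict[str, List[Action]] = {
    "i'm on my way home": [get_location, eta_home, compose_text],
}


def run_shortcut(phrase: str) -> Any:
    """Run each action in order, feeding each one's output into the next."""
    actions = SHORTCUTS.get(phrase.strip().lower())
    if actions is None:
        raise KeyError(f"No shortcut for phrase: {phrase!r}")
    result: Any = None
    for action in actions:
        result = action(result)
    return result


print(run_shortcut("I'm on my way home"))
```

The point of the sketch is the limitation described above: the chain can only reorder and connect actions that already exist; it cannot create a new one.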

You can search the actions in shortcuts and quickly see what it can do though.

A voice assistant should be able to tell you what it can do. Maybe there should be a mode you can turn on where Siri will watch what you do and when there are things you could do more quickly with Siri, it could tell you.

A huge area all of the voice assistants could improve upon is in error conditions. At home I'll say something like "Alexa, turn on the bedroom light" and it will respond that it doesn't know any device of that name. It should offer to tell you device names it does know about.
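The better error response suggested here is essentially fuzzy matching against the device names the assistant already knows. As a sketch, Python's standard-library `difflib.get_close_matches` can play that role; the device list and the `resolve_device` helper are hypothetical, and real assistants presumably use their own matching.

```python
import difflib
from typing import List

KNOWN_DEVICES = ["bedroom lamp", "kitchen light", "living room light", "porch light"]  # hypothetical


def resolve_device(name: str, known: List[str] = KNOWN_DEVICES) -> str:
    """Act on an exact match; otherwise offer close names, or list every known device."""
    if name in known:
        return f"Turning on the {name}."
    # cutoff=0.6 is difflib's default similarity threshold (0..1).
    close = difflib.get_close_matches(name, known, n=3, cutoff=0.6)
    if close:
        return f"I don't know a device called '{name}'. Did you mean: {', '.join(close)}?"
    return f"I don't know a device called '{name}'. Devices I know: {', '.join(known)}."


print(resolve_device("bedroom light"))
```

Falling back to the full device list when nothing is close means the user always leaves the error with something actionable.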

>Maybe there should be a mode you can turn on where Siri will watch what you do and when there are things you could do more quickly with Siri, it could tell you.

You say that as if it's a completely non-trivial thing to implement.

I think you mean 'as if it's a completely trivial thing to implement', since non-trivial means hard and complicated.

But, assuming that's what you meant, how does that compare to, say, speech recognition and understanding? Apple, Google, and Amazon are up to their eyebrows in machine learning and AI. They're working to solve very, very hard problems. But the next step of that needs to be to make it so their hard work is discoverable by users organically.

Sorry, you're right. I did mean "trivial" rather than "non-trivial" but barfed the second out anyways. :-P

I think that problem is much harder than just plain machine learning because you have to somehow gauge what the user is doing or what tasks they're performing well enough for the AI to be able to even tell where it can insert itself. I don't know of any product or service right now that can "watch" a user's behavior and suggest places where the AI can insert itself and I suspect the reason why is because doing something like that is very, very difficult.

It's one thing to detect that you leave work at 5pm every day and that you usually drive home. It's another to say "I noticed that you set a cooking timer every day around 6pm, do you want me to set that for you automatically?" You're not setting a timer just to set a timer, you're setting a timer because maybe the recipe you're using requires it. Different recipes will have different timers. The AI doesn't know the intent behind the action just the action itself.

> You say that as if it's a completely non-trivial thing to implement.

I say that as somebody that has greatly benefited from that type of functionality in the past.

Please share any device currently out there that can watch what you do and tell you where it can insert itself. We'll wait.

IIRC, JetBrains does a great job of this inside their IDEs. They have a training mode of sorts that suggests faster ways to do things.

That might be the case but it's definitely not watching what you're doing, like you suggested. There may be prompts that are triggered by repeated actions (think hitting the Shift key 5 times in Windows and getting the Sticky Keys prompt) but there's nothing in there that's watching you and suggesting where you can make improvements.

Again, what you're asking for isn't a trivial implementation.

What I'm asking for is pretty much exactly what JetBrains has done in IntelliJ. When the system sees me do something manually that could be done with voice, it should tell me. For example, if I enter an address into Maps and then start navigation, the machine could suggest I use Siri next time with the phrase "navigate to 123 Main Street".

At the very least, it would be nice if I could say "Hey Siri, what voice commands can I use with app X?"

Why would I do that? If I want to know what Siri can do, I can just say "Siri, (do the thing)" and find out. That's -way- easier than scrolling through a list (and more accurate, to boot; just because she can do it, if I can't figure out an acceptable magical incantation, she can't, for all intents and purposes).

Just looking over the list still doesn't address the OP's point. Maybe if I memorized it, in the way that if I had an exhaustive list of everything a 4 year old child can do that I memorized I'd be able to 'intuit' what they could do, but why would I do that?

That's just it; the whole 'theory of mind' idea is basically that we can intuit what someone else can do, think, etc., without such a list. I'm able to limit my own mind to have the same limits as someone else's. That is, I can imagine what someone else is likely thinking given a subset (or even a theoretical superset! I.e., "They know if there is money in this account. If there is, they are likely to do X. If there isn't, they are likely to do Y") of the information that I have. I can determine what a child will be able to do based on exposure to other things they can do.

None of that applies to Siri. I can't infer what she has access to (both in terms of data and functionality). I can't use the capabilities of one thing to infer capabilities in another. She can order me an Uber; can she order me a pizza? Can she order me a highly detailed expandable oak table from a boutique vendor? I can infer what a real PA is likely to be able to do (even if I don't know the specifics of how), but I can't do that with Siri. It's a black box. Giving me an exhaustive list of all the things doesn't change that; it's now a black box with a manual. Yes, okay, maybe if I memorize the manual I can determine what she can do, but the point OP is making is that for a real PA I can infer what capabilities they have without memorizing a manual. Until I can do the same with Siri, she's not a replacement, feels gimmicky, and has real barriers to adoption to overcome.

>Why would I do that? If I want to know what Siri can do, I can just say "Siri, (do the thing)" and find out. That's -way- easier than scrolling through a list (and more accurate, to boot; just because she can do it, if I can't figure out an acceptable magical incantation, she can't, for all intents and purposes).

Funny, I'm the opposite. I like the ability to flip through and see "Oh, I didn't know it could do that!" Then I mentally file it away as a thing that exists.

I would never have learned that Siri (via Wolfram Alpha) can tell me what planes are overhead just by trying it except for having read it in a list somewhere, because I would never have thought to ask that. But since I read a list of interesting things that Siri knows, I now know it has that information.

Just trying to guess what capabilities are available is like trying to learn how a unix command works with no man page. "Just run it with every possible flag and see what happens!" It'd be great if Siri could do everything, but she can't, and the search space of possible actions with natural language is far too large to find everything I might use by guesswork. A black box with a manual is better than a black box without one.

In the example you gave, you learned about a specific thing Siri can do: tell you what planes are overhead. Now say you know from experience that WA also provides the current altitude and speed of those planes. If you had a strong ToM for Siri, you would have a very good intuition about whether you could rely on Siri for that info as well. As it is, I have no idea. Do you? Don't you think it'd be a much better experience if we did know what to expect?

Clearly I didn't express myself well.

In the new shortcuts app, there's a pretty clear view of what siri can do. Basically any text editing you want (someone even made a C parser)

There's a fairly concise list of areas. With a 10 second skim, you can get a sense of "oh, Siri can be taught to do stuff in these areas"

In the same way you wouldn't necessarily know what a dog could do without a little outside research (per their theory of mind dog example)

Meanwhile, with iOS 12, all apps will be able to add prominent "Add to Siri" shortcuts. So as you use an app, you'll see "ahh, Siri can be trained to work with this app in this way".

Then these can be chained together. So you can now get a fairly clear general idea of what may be possible.

You still need to learn details to do specific training, but this seems like a big step up, and closer to the dog example. You can at least form a theory of what Siri can likely be taught to do.

Not a PA yet, but to me this new iteration is at least graspable.

Just like a manual transmission that doesn't have a diagram on the shift knob. You need to take apart the engine if you want to find out. Or grind out the gearbox a few times accidentally trying to put it in reverse at 50.

Your comment makes me think of an interesting idea.

One of the problems with Siri/Google/Alexa is they try to be everything and do everything at once.

What if they split the expertise/domains they're capable of into different names?

For example, there could be a "mind" that is really good at playing music.

So you'd say, "Hey MusicBot, pause the song."

Or, you could say "Hey, InfoBot, what's the weather?"

Over time, you could get to know more "minds" that live in the cloud, and develop a theory of mind for each of them.

I just saw an ad saying that Google Home/Mini has at least one named bot for a special domain... I believe it was Toby, who is a "chef".

> For instance my Australian shepherd might not be able to bring me the newspaper; but if I really wanted her to do that task, I could slowly get her to approximate that behavior, and it would probably be satisfying to see progress in performance.

Given she's an Australian shepherd, it would probably be only a bit startling to subsequently come home and see her analyzing the finance section of the paper. :)

This is, I think, one of the failings of human ToM. We tend to underestimate entities that are different from us. The differences are not only of capacity and of information. They are differences of worldview, values, and biases that are very hard to capture, even between adult humans of different cultures. It's easy to imagine people as less intelligent and less experienced versions of yourself, but that is not an accurate theory of their mind.

The same differences, only more so, are present between humans and dogs or humans and Siri. Given that the most empathetic and social people (with, I assume, a better theory of mind) are generally not found in technical roles, I wonder if they might find it easier to make that stretch and effectively use Siri or Alexa?

Ha, yes, I'm saying it'd be difficult to get her to bring the paper because she's already so busy day trading Dogecoin.

I hear what you're saying, but given that we are the designers of Siri's 'mind', we are in a unique position to make it more human-centric. It will be easier in the long run to adapt Siri to humans than the other way around.

The thought of Siri intentionally telling white lies is pretty funny and a bit creepy

- Hey Siri, am I fat?

- No Bruce, you are just big boned

- Thanks Siri.

tangent: I'd ask anyone claiming to be "big boned" to show me a fat skeleton.

You can tell men's skeletons from those of women based on how wide the hips are versus the shoulders, and you can tell if a woman had given birth based on her skeleton just looking at the position and width of the hips.

Skeletons may not be "big boned" per se, but there are people with wider hips (as one example) who would never be able to be as slim as some supermodels based purely on their bone structure. I agree, though, that the term "big boned" is just a nonsensical platitude to avoid saying someone is overweight.


Please don't do this here.

But those people are "big boned" because they are fat and have been fat for so long that it's changed their bone structure. No one is fat because they're big boned, they're just big boned because they're fat.

Also, for what it's worth, LMGTFY is considered by many to be a very rude response.

I think you can assume anyone that sends those links knows that they are rude.

No need for down votes. Yeah it’s interesting how psychology plays a role in us interacting with a voice assistant too. I don’t think Siri will ever be able to judge you on something.

Your post implies that kids under 4 years old can't lie.

This is, of course, patently untrue. (Mine told lies at 12 months.)

I should have qualified that: "lie with a high rate of success". To clarify, at 1 year old your infant could not formulate a high-quality representation of your 'mind'. That is, he/she couldn't fully appreciate that the information held in your brain was different from the info available to themselves (the union/intersection/xor of info), nor did they have a strong intuition about how vastly different your two mental abilities were, without a theory of how minds work.

The classic experiment I mentioned above involves a child and two adults in a kitchen. One of the adults wants a piece of candy (or something; I'm going from memory here), but the other adult says they cannot have it yet and puts it in a cabinet. All three witness where the candy is placed. The one who wants the candy then leaves, but on their way out the door says "When I get back, I want the candy." So now it is just one adult and the kid in the room. The adult asks the kid to move the candy to a different location in the kitchen. The kid does this and then sits back down. After a few minutes the adult says to the kid something like, "The other adult who wants the candy will be back in a few minutes. Where do you think they will look first when they get back?" 3-year-olds reliably point to where the candy is currently located, whereas 5-year-olds point to the cabinet where the adult who left last saw it. Something happens in brain development around age 4 that gives us the capacity to understand that different minds hold different information based on their own experiences.

This task has been repeated dozens of different ways, including simply having an adult stand in a different location in the room from 3-, 4-, and 5-year-olds, where the adult's view of a wall of pictures is clearly obscured by a screen. Then they ask the kids things like: how many pictures of teddy bears will that adult say there are on that wall? Again, 3-year-olds report what they themselves see, while 5-year-olds perform like adults would on this task.

Thanks for the detailed explanation. As a parent, I can see what you imply with regard to different kinds of lying. A very young child will lie to avoid an undesirable result in a very basic way: "Did you throw your food on the floor?" (with food plainly visible to all). Child emphatically shakes head "no".

An older child, by contrast, only lies when they think I don't have enough information to know the truth for sure.

Is this unique to Siri, or do all voice assistants/computer programs exhibit this?

I think it's all of them. Their abilities are just far more rigid than what we are accustomed to. Add to that the fact that Siri is a black box that requires a query to gain each piece of knowledge. This is not how we are used to doing it. A lot of the information we bake into the ToM we have for someone or something is based on observing how they behave and interact with others, or simply by themselves. I see my nephew about twice a year, and even with his rapidly changing capacity, I can usually pin down what things he is/isn't capable of in a matter of days. Siri, on the other hand, is in my pocket almost every minute of every day, and I feel like I know nothing about her.

Long story short, I think direct vocal query to gain one piece of rote information is a terribly inefficient way to develop a ToM of someone or something. If Apple wants to bring Siri to the next level, they need to figure out a way for us to learn about her abilities passively. The catch, however, is that I don't want Siri bugging me all the time, but maybe occasionally she could give me a popup that says "Hey, I can do that for you," particularly if she has 'seen' me do it 100 times and thinks she could save me a few minutes of effort. That, IMO, would be more game-changing than adding additional Siri APIs for developers, which will only add a bunch more Easter eggs that I will never discover.

That is exactly what Shortcuts will help you do (to a degree). Developers can 'donate' (Apple's term for it) the activities that you perform in their applications. "Siri" will attempt to figure out when you perform this action most often using time/geo data and suggest it on the lock screen or search screen. This could also prompt you to add a voice trigger for this action, thereby 'teaching' Siri more.
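Apple hasn't published how Siri decides when to surface a donated shortcut, so the following is only a crude stand-in for the idea: count which donated actions happen in the current hour of the day and suggest the most frequent one once it passes a threshold. The `suggest_action` function, the hour-of-day bucketing, and the threshold of 5 are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple


def suggest_action(history: Iterable[Tuple[str, datetime]], now: datetime,
                   min_count: int = 5) -> Optional[str]:
    """Return the action most often performed during the current hour of day,
    or None if nothing has happened at least `min_count` times in that hour."""
    counts = Counter(action for action, ts in history if ts.hour == now.hour)
    if not counts:
        return None
    action, n = counts.most_common(1)[0]
    return action if n >= min_count else None


# Ten workdays of texting at 5:35 PM, plus one coffee order at 5:05 PM.
history = [("text partner: leaving work", datetime(2018, 7, d, 17, 35)) for d in range(1, 11)]
history.append(("order coffee", datetime(2018, 7, 2, 17, 5)))
print(suggest_action(history, datetime(2018, 7, 12, 17, 30)))
```

The threshold is what keeps this from "bugging me all the time": a one-off action never gets suggested.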

After reading your clarification and reflecting on this a bit more, I think Siri might actually have a chance to impress. What came to me is that I sort of have a per-app ToM for Siri (which I suppose parallels humans being good at some things and bad at others). But what I realized is that...

Siri is a cartographer. She seems to have good domain-specific mastery of maps, directions, places, etc., and she might even have a thin (but growing) representation of her own about my desires and what I know and don't know when it comes to geospatial logistics. In Maps, Siri really shines, right? I could say something like "Siri, where is the nearest Jack in the Box?" and via Maps she might say something like: "There are two Jack in the Box locations within 3.1 miles, but you might also be interested in knowing there's an In-N-Out and a Shake Shack nearby as well. In-N-Out is currently not as busy as it usually is this time of day, and examining traffic patterns I've formulated a shortcut. Should I give you those directions?"

Then a few minutes later I might be eating a Double-Double and say "Hey Siri, how much memory do I have left on my phone?" and she will reply "Sorry, I can't help you with that" (the actual response I got to that question just now). Know thyself, Siri! Anyway, it's stuff like this that makes it so difficult to pin down what she is capable of.

But I'd be satisfied if Siri were to master a few more apps in a domain-general manner. That way I can continue avoiding rote memorization of commands and simply rely on Siri being a domain-general boss at this or that app, particularly for other use cases where (as in Maps) it actually makes sense to give and receive info via voice rather than simply interacting with the screen.

This is an excellent comment.

I’m the developer of a popular timetable/schedule app for college. It seems like the perfect use case for Siri: “What class do I have next?”.

Up until now it hasn’t been possible due to the restrictive API. I’ve looked into the new shortcuts and Siri API docs and although the beta documentation is sparse, I’m confident I’ll be able to develop a natural, first class Siri integration.

Unfortunately, it won’t be trivial to implement (at least not to do well). It probably won’t be done before the iOS 12 release, and I’m hesitant to start working on it just yet due to the sparse beta docs.

Over the next year or two, I think there will be some great Siri integrations built. Hopefully, users discover and use them :)

I always wanted something like this in my uni days, but was always too lazy to make it! More important than "what" is "Hey Siri, where's my next class?" ;)

Edit: heh, a quick LinkedIn stalk reveals you went to UC at the same time as me!

Isn't that just.. a calendar?

This is what I did. Create a ‘Uni’ calendar and fill it with your class days and times - with repeats used as necessary.
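A repeating class is expressed in an iCalendar file (RFC 5545) as a single event with a weekly RRULE, which is also the format of the ready-made .ics files some schools publish. The helper below is a minimal sketch that writes one floating-time event; the UID, PRODID, course name, and dates are made up, and SUMMARY text is not escaped, so commas or semicolons in it would need backslashes.

```python
from datetime import datetime, timezone
from typing import List

ICS_FMT = "%Y%m%dT%H%M%S"


def weekly_class_ics(uid: str, summary: str, start: datetime, end: datetime,
                     byday: List[str], until: datetime) -> str:
    """Build a minimal iCalendar file with one weekly-repeating class."""
    stamp = datetime.now(timezone.utc).strftime(ICS_FMT) + "Z"  # DTSTAMP must be UTC
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//uni-timetable//EN",  # hypothetical product identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp}",
        f"DTSTART:{start.strftime(ICS_FMT)}",  # floating local time (no TZID)
        f"DTEND:{end.strftime(ICS_FMT)}",
        f"SUMMARY:{summary}",  # not escaped; keep it free of , ; and \
        # UNTIL is floating too, matching the floating DTSTART.
        f"RRULE:FREQ=WEEKLY;BYDAY={','.join(byday)};UNTIL={until.strftime(ICS_FMT)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"  # iCalendar uses CRLF line endings


# Hypothetical Monday/Wednesday lecture for one semester (3 Sep 2018 was a Monday).
print(weekly_class_ics("math101@example.edu", "MATH 101 Lecture",
                       datetime(2018, 9, 3, 9, 0), datetime(2018, 9, 3, 10, 0),
                       ["MO", "WE"], datetime(2018, 12, 14, 23, 59, 59)))
```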

Irritatingly, Siri doesn’t distinguish between calendars so you can’t reel off just classes, but you can ask about the day or the next ‘event’.

Personally I found this setup along with Apple’s ‘Up Next’ widget and Siri Apple Watch face to be way better than any other class management app I tried.

Many even publish iCalendar files ready to import. I actually ended up writing a custom scraper for my HS[1], after which the LMS provider actually wrote their own exporter fairly quickly.

I guess they weren't too happy one of the kids had to keep a database of all of their students' passwords... :P

[1]: https://github.com/teozkr/schoolsoftsync

Could you add a fake "Class" contact to each of your class/meetings, and then ask Siri "when is my next meeting with Class"? Doesn't exactly roll off the tongue I guess.

Funnily enough, I started working on something similar in uni as a class project and went embarrassingly far before realizing all the functionality I was building already existed in any of the major calendar apps. Is there a name for software developers thinking a customized software solution will solve their personal problems (time management in particular)? I suspect it is the reason there are so many to-do list apps.

I’ve been working on a Siri/Shortcuts-enabled App, and while I agree documentation is very sparse, the API is also very succinct. I only watched the WWDC session about the soup app and was able to get going from there.

Have you checked out the two WWDC talk videos on developer.apple.com about Shortcuts? They should provide a lot more context and connect the dots between a lot of the new APIs.

Thanks, I think I’ve seen them but should definitely check them out again. I might have missed a thing or two :)

I think we’ll see more useful docs before release, they usually release more and more up until the iPhone launch events.

They're already up, though a bit sparse: https://developer.apple.com/documentation/sirikit/

Can't you do that by adding to the iPhone calendar?

Do you mind sharing what the app is called?


> Also, Shortcuts don’t require the web to work – the voice triggers might not work, but the suggestions and Shortcuts app give you a place to use your assistant voicelessly. And importantly, Shortcuts can use the full power of the web when they need to.

> This user-centric approach paired with the technical aspects of how Shortcuts works gives Apple’s assistant a leg up for any consumers who find privacy important. Essentially, Apple devices are only listening for “Hey Siri”, then the available Siri domains + your own custom trigger phrases.

I don't get it. How is this different from Android? Android Actions [0] were announced before this. I think Assistant also works offline (with most voice commands + in voiceless mode).

[0] https://developer.android.com/guide/actions/

SiriKit and Siri Shortcuts are very different things.

SiriKit allows you to build Siri support into your app a la Android Actions, but Siri Shortcuts is designed to allow drag-and-drop end-user "programming" of workflows that can be triggered by Siri.

"Hey Siri, I'm on my way home" could turn your thermostat up, order you a pizza, remotely trigger your IoT-enabled kettle, and start playing your home-commute playlist.

For the more advanced of us, Workflow currently allows doing things like calling arbitrary REST APIs and parsing JSON. I've reverse engineered the API of a local coffee-ordering app so I can one-click order my morning coffee.
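Calling a REST API and parsing its JSON, as described here, looks roughly like the following in Python. The coffee app's real API isn't described, so the base URL, the `/orders` path, the request body, the bearer token, and the response fields are all hypothetical; the code builds the request but leaves the actual `urllib.request.urlopen(req)` call commented out.

```python
import json
import urllib.request

API_BASE = "https://api.example-coffee.test"  # hypothetical endpoint


def build_order_request(store_id: str, item: str, token: str) -> urllib.request.Request:
    """Build (but don't send) a POST request with a JSON body."""
    body = json.dumps({"store_id": store_id, "items": [{"name": item, "qty": 1}]})
    return urllib.request.Request(
        f"{API_BASE}/orders",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def parse_order_response(raw: bytes) -> str:
    """Pull the fields we care about out of an (assumed) JSON response."""
    payload = json.loads(raw)
    return f"Order {payload['order_id']} ready in {payload['ready_in_minutes']} min"


req = build_order_request("store-42", "flat white", token="YOUR_TOKEN")
# with urllib.request.urlopen(req) as resp:
#     print(parse_order_response(resp.read()))
print(parse_order_response(b'{"order_id": "A17", "ready_in_minutes": 6}'))
```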

Next thing I'm planning is my "I need a coffee" button which will get the nearest cafe, order me a flat white, and pull up the directions.
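The "nearest cafe" step can be approximated by picking the candidate with the smallest great-circle distance. This sketch assumes you already have cafe coordinates (a real workflow would get them from a Maps search), measures straight-line rather than walking distance, and builds an Apple Maps link using its `daddr` (destination) parameter; the cafe names and coordinates are invented.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_cafe(here: LatLon, cafes: Dict[str, LatLon]) -> Tuple[str, float]:
    """Return the name of the closest cafe and its distance in km."""
    name = min(cafes, key=lambda n: haversine_km(here, cafes[n]))
    return name, haversine_km(here, cafes[name])


def directions_url(dest: LatLon) -> str:
    """Apple Maps link with the destination given as coordinates via `daddr`."""
    return f"https://maps.apple.com/?daddr={dest[0]},{dest[1]}"


cafes = {"Cafe A": (-33.8680, 151.2100), "Cafe B": (-33.8600, 151.2000)}  # invented
name, km = nearest_cafe((-33.8688, 151.2093), cafes)
print(name, round(km, 2), directions_url(cafes[name]))
```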

Yeah, Siri Shortcuts seems more like using Tasker with the AutoVoice plugin.

Siri Shortcuts is the same as Google Assistant Routines, except Google didn't make a separate app for it. Say "I'm home" and a lot of Assistant Actions get triggered.

To me, the biggest thing here is that it takes a completely different approach than what has been the traditional path for voice assistants. In the past, it was always the game of waiting either for a custom skill or app, or it was hoping that Google or Amazon would program in some logic for handling a particular case.

Shortcuts enables basically any end user with enough devotion and dedication to short circuit this. It doesn’t require them to be an app developer and it doesn’t require them to learn code at all. The most basic shortcuts can be created without any if-then-else logic while enabling so much.

I’ve seen my mom ask her Google Assistant to do things with “and” a lot and they just don’t work because command chaining hasn’t been implemented. But with Shortcuts, she could conceivably make a chain and designate a phrase to be conversationally equivalent with an “and” in the middle.

Instead of having to explain to family why Siri can’t do X or Y, I can just make a Shortcut or show them how, and I’ve solved the problem rather than explaining why it can’t be done.

> I’ve seen my mom ask her Google Assistant to do things with “and” a lot and they just don’t work because command chaining hasn’t been implemented

it is implemented. It just doesn't work often (or at all) for foreign languages.

I'm pretty sure this is the next big thing Google is updating.

> Instead of having to explain to family why Siri can’t do X or Y, I can just make a Shortcut or show them how, and I’ve solved the problem rather than explaining why it can’t be done.

At least with GA and IFTTT you can, as of late, make your own phrases (and responses). Nothing for the nontechies, but some progress there.

Do you know if it's possible to get the Home to respond to some external event? Doesn't seem to be possible to use as a Then with IFTTT (for example "When I arrive home Then say Welcome Home")

I found this: https://www.npmjs.com/package/google-home-push

Which might let me write my own API to do something but it would be good if it were built in somehow.

> Do you know if it's possible to get the Home to respond to some external event?

I'm actually not sure this should be a feature of my Google Home assistant, though of course I see the use cases.

> https://www.npmjs.com/package/google-home-push

The dependencies show that this uses https://github.com/thibauts/node-castv2-client, so I'm guessing it only lets you use the Chromecast API of the Google Assistant device.

I'm pretty sure the Google Assistant API won't be opened up any time soon.

> Nothing for the nontechies, but some progress there.

This is the important thing. I'm a fairly competent programmer and if I'm really missing out on something, I really don't worry about it. The problem is that most users aren't programmers!

> it is implemented. It just doesn't work often (or at all) for foreign languages.

Everyone has latched onto my comment because of command chaining, but this misses the point. The point was to give a concrete example of something that a user can now accomplish that they couldn't before. The Google Assistant cannot in fact build rudimentary constructions out of arbitrary system objects like Shortcuts can -- command chaining is just the example I picked.

> it is implemented.

AFAIK only for Google Home devices, not phones

Google has command chaining, as well as Routines - i.e. Shortcuts, but with intuitive UI.

Apple is moving Automator to iOS; this isn't voice UI, it's imprecise programming: https://pbs.twimg.com/media/DhbmQJBX4AMEoho.jpg:large

The "voice UI" part isn't shown on that screen: you can assign a phrase to use to trigger these Shortcuts.

You can call Workflow/Shortcuts imprecise programming, but you're comparing apples to oranges. Routines don't let you arbitrarily group data and interact with it in a way that's friendly to non-programmers.

Have you tried Assistant Routines? They're in the Assistant Settings. Multiple actions get triggered with a single command.

> This user-centric approach paired with the technical aspects of how Shortcuts works gives Apple’s assistant a leg up for any consumers who find privacy important. Essentially, Apple devices are only listening for “Hey Siri”, then the available Siri domains + your own custom trigger phrases. Without exposing your information to the world or teaching a robot to understand everything, Apple gave Siri a slew of capabilities that in many ways can’t be matched.

Is there a good doc or video that delineates what data stays on the device vs. being sent to Apple for processing? E.g. how much of this functionality will be available if you are not signed into iCloud?

From the keynote, no information is sent to Apple for processing. Whether it'll work if you're not signed into iCloud is a different issue, but from the way it's described in various places in the WWDC Keynote, it ought to work just fine without any data connection, such as in airplane mode with no Wi-Fi.

Wouldn't the user's speech data need to be sent to Apple to convert to text, or identify the intent first?

We were doing speech recognition 20 years ago without any kind of networking. I dictated part of a term paper in 1995. Dragon Dictate, I think, is what it was called. You could even navigate the word processor menus and UI with it, or say things like “make that bold” and it usually worked! Just a bit less often than Siri works, actually.

Sure it was more of a novelty, and had to be trained on your voice. But that was 20 years ago.

Not only is Dragon still a product, it is owned by Nuance, which helped to develop Siri and drew on SRI research, https://www.forbes.com/sites/rogerkay/2014/03/24/behind-appl...

Hence Siri’s name.

Also, a purely local 'Voice Control' was standard in iOS 3-4, and it's still there in iOS12... just, buried.

Even Apple had this working a long long time ago in classic mac os. I used to launch all my applications via voice til I got bored of it.

macOS still has non-Siri voice-control under Accessibility Preferences.

And iOS has very good non-Siri dictation

I believe that still requires the use of Apple’s servers.

Having just double-checked, yes you are correct.

Is this a change for all Siri functionality? It currently requires internet on my iPhone.

I agree with this idea for one reason in particular: as a user, truly leveraging Siri requires a manual engagement - the experience is not conversational, it is not like speaking with a person. It is like using a tool - one needs to conscientiously adjust oneself to follow its rules and get it to work as desired.

Users actively taking control over how they use Siri (as in iOS 12 Siri shortcuts) will almost certainly encourage them to more conscientiously adjust their usage behavior patterns.

There exists a general assumption that a voice has a human-intelligence behind it, but obviously now not all voices do. This poses a tough learning curve, as evinced by criticism of Siri as ineffective or plain bad. Yeah, Siri won’t respond the same way your boyfriend will when you ask him to find some good Chinese food, or express some feeling. But Siri will excel at setting alarms, or adding an event to the calendar, or starting a meditation with Timeless. It comes down to matching the language to the tool / following a protocol. It comes down to manually engaging with precision.

> There exists a general assumption that a voice has a human-intelligence behind it, but obviously now not all voices do.

This is one of the reasons I've always been bothered by the over-humanisation of Siri. I feel there is a belief on the Siri team that if they make it feel more like a human with jokes and overly verbose responses then people will be more forgiving when it fails.

I've always felt the opposite: that it gives it an "incompetent super-intelligence" sort of vibe instead, and when I ask it to set a timer for the 3rd time I start to wonder if I'd be less frustrated if it just responded with a confused beep rather than a quip.

> Siri won’t respond the same way your boyfriend will when you ask him to find some good Chinese food, or express some feeling

I think Siri did pretty well when I asked her these:

Q: Find me some good Chinese food

Siri: I found four Chinese restaurants a little ways from you:

Q: How are you?

Siri: Very well, thank you!

Parent said:

> respond the same way your boyfriend will

Note that she ignored the "good" part of the query, has no memory of the place you went to last time that you did/didn't like, can't filter by your preferred mode of travel, and so on.

I think that that's the joke here.

That's about the only things Siri can handle without completely blowing it.

I use Siri the most when I’m driving. It comes in handy for a lot of things then...

Directions. - Take me to X.

Reading text messages - “Read my text messages from my wife” or “Read my last text message”

Reminders - “Remind me to call my X/do X when I get in the car/get out the car/get home/at Y”

Music/Podcasts - Play X/Play a song by Y.

Taking notes, calendar events, (What do I have to do today?)

Yeah, which is why I was surprised to see that the parent comment chose those as examples…

It's great to see Apple finally add some powerful customization, and for power users this is fantastic, but to average users this makes no difference. When it comes to normal queries, Siri is still far behind. Sure, with some extra work I can get almost any query to work, but most users don't want to manually code up all their queries.

Perhaps it's already there, but I wonder if shortcuts will be able to be shared or there will be a library for them. So someone can pre-define all sorts of stuff, then I as a "basic" user can just add the shortcut vs. having to "program" it myself.

Workflow (the mother of Siri Shortcuts, as I understand it) already has sharing functionality.

For example, this "get travel time to input destination" Workflow: https://workflow.is/workflows/ff987bcf0ad746d496415d7f4c75a8...

Shortcuts has this as well.

None of the digital assistants (I have most of them) do anything truly useful out of the box for me, so the “needing to do work to customize” isn’t an issue for me. I’m actually happy I finally can with Siri; a bit late, but better than never.

> but to average users this makes no difference

Ordinary users will just pick prepackaged workflows out of the gallery.

And some I can imagine being very popular e.g. "Post last photo to Instagram".

Does anyone know of an extensive list of useful things that people typically ask their voice assistant?

As a non-user, and as someone who types faster than they speak, I find it hard to come up with compelling use-cases.

You can type faster than you speak on your phone?

I can't type that fast and when I'm driving I don't want to type at all. For me, being able to say "navigate to 123 main street" or "play the beatles" is very convenient.

Haven't looked for a list, that'd be a good idea.

So far all I do is:

"Remind me to wake up at 6am" or "Remind me to get my laundry in 30 minutes" (sets it in the todo list reminder app, since the alerts are better than the alarm app)

I think you have a lot of company in the "only use Siri to set timers" camp. I also do alarms and appointments, but nothing else is reliable enough.

Stuff like estimates for driving time with traffic, or simple trivia questions is something I very frequently ask my google home. Otherwise when I'm using the phone I almost never use it unless I'm far from my phone and I need to do something simple.

I posted a list earlier, but hopefully you don’t type while you’re driving.

But things like Reminders are quicker with Siri. Especially things like Reminders based on external triggers - like locations and getting in and out of the car.

Two of the issues with Siri are that Apple does not like to roll out features unless they work for everyone (across all languages), and discoverability of what it can do versus disappointment in what it can't (really, you think defaulting to a Bing search is the best thing to do in this case?)

Siri Shortcuts lets you define your own local command and control phrases from presented actions, effectively solving both issues. It could (depending on per user investment) make Siri way more individually useful.

iOS 12 beta is amazing. It is extremely fast and feels like perfection. Hopefully Apple will open the OS a little more for customization and let apps do more powerful things; then it will truly be perfection.

How does it hold up for day to day use? Is it stable enough that could be installed on your main phone without many issues?

I've had my lock screen flip out a couple times, once requiring a manual reboot of my phone. The notifications list has graphical glitches. Some third-party apps are suspiciously crashy. It's probably the least problematic early iOS beta I've used, but like any beta I still wouldn't rely on it if missing a call would be a deal-breaker.

I have installed it on my one and only iPhone (X) since it was made available. No issues so far except a couple random resprings (soft restarts.)

I can’t decide how much of an improvement it is over iOS 11 though (which was already performing well enough for me.)

I think this year’s big winner is macOS, not iOS.

Fairly stable for me. Occasional freezes, sometimes requiring a reset.

Definitely feels faster, even on an iphone 8.

That’s all good, but I hope that Siri can be improved to understand voices like mine (I’m a native Greek speaker, so my English accent is not perfect). Siri understands 40-50% of what I say. Google (on iOS, so same hardware) is more like 80-90%.

Google's voice recognition is nuts. One time I fired up Google's voice search and fed it some of the MST3K names for Dave Ryder from Space Mutiny (e.g., "Punch Rockgroin", "Blast Hardcheese", "Splint Chesthair", "Big McLargehuge", etc.) and was surprised at how many of the searches it got right -- capitalization, spacing, and all.

To demonstrate what Shortcuts will be able to do, here's someone writing a C parser in Workflow (a predecessor to Shortcuts):


(One of the replies is by Ari Weinstein, who is currently on the Siri Shortcuts team)

Think of Shortcuts as a visual scripting language that leverages app functionality and ties into Siri.

Either you can use drag and drop to write your own scripts, or you can run scripts written by others.

You can run your scripts in a variety of ways, including a trigger phrase you set with Siri.

Here's a short video showing the creation of a very simple script in an older version of the program.


If you are a podcaster, you might create a more complicated script that converts a source audio file to MP3, adds MP3 tags and artwork to the resulting file and then uses FTP to upload the result to your podcasting network.

Another script idea would be to text someone a list of the blocks of free time you have open during a given workday based on your calendar data to help set up a meeting.
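The free-time idea boils down to finding the gaps between calendar events. A sketch in Python, assuming events arrive as (start, end) datetime pairs and that only gaps of at least 30 minutes are worth offering; `free_blocks` and `format_blocks` are illustrative names, not Shortcuts actions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]

def free_blocks(day_start: datetime, day_end: datetime, busy: List[Interval],
                min_len: timedelta = timedelta(minutes=30)) -> List[Interval]:
    """Gaps of at least min_len between busy intervals, within the workday."""
    free: List[Interval] = []
    cursor = day_start  # end of the latest busy time seen so far
    for start, end in sorted(busy):
        if end <= day_start or start >= day_end:
            continue  # event lies entirely outside the workday
        start, end = max(start, day_start), min(end, day_end)
        if start > cursor and start - cursor >= min_len:
            free.append((cursor, start))
        cursor = max(cursor, end)  # overlapping events extend the busy span
    if day_end - cursor >= min_len:
        free.append((cursor, day_end))
    return free

def format_blocks(blocks: List[Interval]) -> str:
    """One 'HH:MM-HH:MM' line per block, ready to paste into a text message."""
    return "\n".join(f"{s:%H:%M}-{e:%H:%M}" for s, e in blocks)

day = datetime(2018, 7, 2)
busy = [(day.replace(hour=13), day.replace(hour=14, minute=30)),
        (day.replace(hour=10), day.replace(hour=11))]
print(format_blocks(free_blocks(day.replace(hour=9), day.replace(hour=17), busy)))
```

Sorting first and tracking a single cursor handles overlapping events without any extra bookkeeping.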

The possibilities are wide open. You can even tie directly into web APIs.

But does anybody know whether it will finally be possible for me to ask Siri to give me directions to my next appointment?

I’ve been waiting for this obvious functionality for years.

Yep, you can do that. This weekend I made a shortcut that goes through your calendar, finds the next appointment, and then picks your maps app of choice to load up directions into.
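The shortcut described here is built in the Shortcuts editor rather than in code, but the logic is easy to sketch. A Python version, assuming events are (start, title, location) tuples; the helper names are hypothetical, and the only external detail relied on is Apple's Maps link format, where `daddr` is the destination address.

```python
from datetime import datetime
from typing import List, Optional, Tuple
from urllib.parse import urlencode

Event = Tuple[datetime, str, str]  # (start, title, location); location may be ""

def next_appointment(events: List[Event], now: datetime) -> Optional[Event]:
    """Earliest event starting after `now` that has a location to navigate to."""
    upcoming = [e for e in events if e[0] > now and e[2]]
    return min(upcoming, key=lambda e: e[0], default=None)

def directions_url(event: Event) -> str:
    """Apple Maps link with the event's location as the destination (`daddr`)."""
    return "http://maps.apple.com/?" + urlencode({"daddr": event[2]})

events = [
    (datetime(2018, 7, 2, 9), "Standup", ""),
    (datetime(2018, 7, 2, 14), "Dentist", "1 Infinite Loop, Cupertino"),
    (datetime(2018, 7, 2, 11), "Lunch", "123 Main Street"),
]
appt = next_appointment(events, now=datetime(2018, 7, 2, 10))
if appt is not None:
    print(appt[1], directions_url(appt))
```

Events without a location are skipped, since there is nothing to route to.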

I took it another step further and am working on a version where it looks at the prices on Lyft and Uber, presents you with the respective options, and calls a ride to your next appointment. Shortcuts in iOS 12 are a real fun little thing to play with, the most fun I've had programming in a while.

Happy to send it to you. I looked for your contact info but couldn't find any way to contact you, but feel free to reach out in my profile if you want.

I'm not running the beta, so I guess don't bother. I'm sure I'll have fun creating it myself in a few months. :)

Looks like this update still won't let you ask Siri to play a specific song on any service except Apple Music, which is definitely my biggest gripe with Siri and the iPhone at large since switching from Android. I would switch to Apple Music, except then there'd be no way to wirelessly cast/play music through my component stereo. Semi related, if anyone has solutions to this problem I'd be very appreciative.

A lot of people seem to mention Assistant Routines, but the major problem here is that it’s not available in a lot of countries outside the US (not even the UK), meaning it’s not actually an alternative for the majority of the world.

The funny bit is though, that your Google Home will still recommend that you set up routines for things, except you just...can’t.

Since I don’t have it available myself, I cannot comment much on it, but while searching for why it wasn’t appearing in the Home app, I certainly didn’t get the impression that people are impressed by it :(

This is great for developers, but pointless for users. It doesn't change user behavior, and the research shows people are using voice assistants at home and nowhere else... a place better suited to Google Home and Echo.

There've been a lot of attempts by Apple to create richer hooks into apps, like search integration, but they don't do much for engagement. There are some good ideas out there, along with anecdotal successes, but the interaction model isn't great and better app integration won't fix that.

I use Siri on my phone all the time. Do you have a link to that research? I haven’t heard that before.

Given that iOS will be prompting people with possible Shortcuts based on what they do frequently I can easily see people starting to adopt this.

I don’t think you have to use Siri to trigger these, it’s just an obvious easy/fast way. Users could still use the shortcut app or widget to do it.

As I’ve been watching some of the Apple community on Twitter since this started to go into beta they’ve already produced some fun/surprising stuff. This seems like it’s going to be a BIG deal.

This is the best I can do right now. It isn't where I originally read it but google is failing me. https://creativestrategies.com/voice-assistant-anyone-yes-pl...

What it comes down to is that the people on HN don't represent most people using iPhones.

With more laws being passed about using your phone while driving and seeing that most cars have Bluetooth integration, it should become more popular. I use Siri all of the time while driving.

Good news for HomePod and CarPlay users, and I suspect that the ability to have a short phrase trigger a sequence of complex actions will make Siri use out and about more common.

I personally turn off all notifications on my iPhone (except for a few select apps) and do not use Siri.

I can tell you that the Siri suggestions in Search really annoyed me (I usually use search to find... apps, not to be fed all the crap from random apps).

I want a clean, non-intrusive experience. Apple should focus on making the hardware better and getting the software out of the way (don’t make me think about it) instead of pulling all this crap on its users in the name of innovation.

> I can tell you that the Siri suggestions in Search really annoyed me (I usually use search to find... apps, not to be fed all the crap from random apps).

In iOS 11 the delay when searching for apps is unacceptable. Instead of displaying the apps found right away (a substring match over all apps should be instantaneous), it waits until all the crap from Siri and other random apps is fetched.

Agreed. The only way I find & launch apps (besides the 4 on the dock) is by swipe-down from the middle of the homescreen, to reveal search, then to search for the app's name. This is very analogous to how I launch apps in OSX, namely command-SPACE, then start typing the application name.

It's extremely frustrating when iOS takes forever to find an app, or worse, prioritizes all kinds of other garbage before the app's icon. If I type "waz" and I have the Waze app installed, I sure as hell expect to see the Waze icon instantly at the very top. Ideally, after typing just the "W".
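On the "should be instantaneous" point: matching a query against the names of installed apps is a purely local loop that needs nothing from the network. A Python sketch, assuming a plain list of installed app names; the prefix-first ranking and the cap of four results are illustrative choices, not how iOS actually orders its search results.

```python
from typing import List

def match_apps(query: str, installed: List[str], limit: int = 4) -> List[str]:
    """Case-insensitive substring match; earlier match positions (prefixes) rank first."""
    q = query.casefold()
    if not q:
        return []
    hits = [name for name in installed if q in name.casefold()]
    # Sort by where the query occurs (0 = prefix match), then alphabetically.
    hits.sort(key=lambda name: (name.casefold().find(q), name.casefold()))
    return hits[:limit]

apps = ["Maps", "Waze", "Weather", "Wallet", "Twitter", "Slack"]
print(match_apps("waz", apps))
print(match_apps("w", apps))
```

Local app hits like these could be shown immediately, with slower suggestion sources appended below them once they arrive.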

You'll be pleased to know that the results in iOS 12 instantly reveal the first four-matching installed apps. (Instantly meaning "I can't really determine how long it takes, because it's probably somewhere under 100ms")

All of the other stuff loads very quickly. Mail results are very very quick (over 3 inboxes containing over 100,000 mail items), and then a-bit-less-quick for the Siri search suggestions.

I'm actually shocked at how much iOS 12 improved responsiveness.

Also was very surprised; it's been one of the most notable perceived performance jumps between major releases, probably ever.

you can actually make it behave like this if you turn off "Search & Siri Suggestions" for each app: https://www.imore.com/how-access-and-use-siri-search-suggest...

Which makes it even more annoying when you try to click an icon you see and it’s replaced when Siri finishes, making you open a different app by accident. Terrible UX.

> Apple should focus on making the hardware better and getting the software out of the way

I…what? They clearly do, considering that they have the best smartphone hardware on the market and have what’s generally considered to be a good mobile OS? This feature is something that power users have been asking for years–if that’s not innovation in your eyes I’m not sure what is.

a big part of the improved Siri integrations is helping group the notifications and give you the option to shut them off. So it really is trying to get the software out of the way if you want

I switched to CarPlay and use Siri a lot more than I used to. For most things it is fine. Others... this could help.

One thing I would like to know is why Siri cannot answer basic questions about what music is on my phone. For example, “what albums by New Order do I have on my iPhone?”. That does not work but “play album Substance 1987 by New Order” does.

Siri on CarPlay is useless for me. Maybe it’s my situation (RP British English, driving in Sweden) but it usually takes 10 attempts to get it to recognise a track I want to be played, and getting routing information is easily double that for anything other than ‘drive home’. I once tried to get it to route me to ‘Malmö’ (third largest city in Sweden) and gave up after 30 attempts. I honestly have no idea what I’m doing wrong, but I suspect nothing and it’s just useless software.

I'm not sure if this has changed yet, but until recently Siri regularly failed to recognize artists, albums or titles, which often happen to be English, if the device/Siri was set to a different language. In my case that's German, and I can't count the number of times I had to pronounce the artists I wanted Siri to play in a ridiculously German way.

English is so ubiquitous that, in my opinion, all Siri queries should be checked against both the local language and English.

Yup. Mixed language is a pain. Mine is set to English by default, but when I ask for something in Japanese or Mandarin it does not work well.

As another RP speaker using CarPlay, it works well. However you do have to heavily anglicise placenames when abroad. So I just tried ‘ get directions to Mal-moe’ and it worked first time.

As someone who speaks in a strong Cultivated Australian accent, I fear Siri will never get me.

Ubicomp's colonial impulse at work. Siri may not adapt to speakers, but speakers will adapt to her.


To run an Alexa skill on iPhone you need to install the Alexa app and the Amazon app and use your Amazon app to handle the voice part. Then open the Alexa app to view the card. I hope I'm understanding the hype of the article correctly in that I can build a native app and use Siri to interact with it.

This is a great example of how owning the entire stack from CPU on up to user interface allows a company to do things that others just can't. Whether it's any good or not, well, time will tell. But it's not even possible for any other company (maybe Google with the Pixel, maybe).

I don't see why this could not be possible on Android. All Google needs to do is build an API for Assistant that other apps can use to 'donate' frequently used actions/intents. Assistant can then make suggestions to users based on its own analysis of users' activities in the same way Siri does.
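The donate-then-suggest loop described here can be sketched in a few lines. This Python version is hypothetical throughout (it is not SiriKit's or Assistant's actual API): apps record an action each time the user performs it, bucketed by hour of day, and the assistant suggests the most frequent action for the current hour once it has been seen at least `min_count` times.

```python
from collections import Counter
from typing import Dict, Optional

class SuggestionStore:
    """Hypothetical on-device store: apps donate actions, the assistant suggests by hour."""

    def __init__(self, min_count: int = 3) -> None:
        self.min_count = min_count
        self.by_hour: Dict[int, Counter] = {}

    def donate(self, action: str, hour: int) -> None:
        # Count how often this action happens in this hour of the day.
        self.by_hour.setdefault(hour, Counter())[action] += 1

    def suggest(self, hour: int) -> Optional[str]:
        counts = self.by_hour.get(hour)
        if not counts:
            return None
        action, n = counts.most_common(1)[0]
        # Only suggest habits that have repeated often enough.
        return action if n >= self.min_count else None

store = SuggestionStore()
for _ in range(5):
    store.donate("Text Jane: I'm leaving work", hour=17)
store.donate("Open Weather", hour=17)
print(store.suggest(17))
print(store.suggest(9))
```

The `min_count` threshold keeps one-off actions from turning into suggestions.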

Not only is this possible in Android, but this already exists in Android:


Apple could feasibly go further and do integration with macOS Siri as well down the line I'd imagine - Google can't do that nearly as deeply.

Why? Can you explain that reasoning? Google already shipped full integration of Assistant into their ChromeOS.

I am seeing a trend in this thread: people think that Google Assistant doesn't have something when in reality it has already implemented all of what Apple did this year and more. Either they are biased and making assumptions, or they use Siri and are again making assumptions. Another reason might be that Routines is within the Assistant Settings, so people haven't discovered it. Still, it seems odd on a site like this. Frustrating.

Especially since the next macOS release will run iOS apps. That may be the approach Apple is planning to bring this to the Mac.

The next version of macOS won’t run iOS apps. Apple is working on unifying some of their frameworks to work across platforms.

OK, thanks for that clarification. That does make more sense.

Cortana has had an API like this from the beginning, so I don't see why owning the entire stack is necessary.

I just want macros. I would love to have a single button that does in-OS functions, like a button for EQ settings, or one that launches a specific website in a browser (such as a different weather service).

So basically they exposed an API to add new skills to Siri. I wonder whether other platforms already did similar things? CMIIW, Cortana already exposed an API for us, right?

Really they’ve combined two things. They took an app called Workflow, which they bought, that you can use to automate things on iOS. They extended it dramatically, and then added the magic ingredient of making it possible to have Siri call those actions that you define. Plus some AI stuff to suggest shortcuts for things you do frequently.

Alexa is quite popular because it has so many skills that people have created for it. That’s one of the reasons people claim Siri has been “behind“ and Amazon’s Echo devices have been doing so well.

I don’t know if Cortana or Bixby support anything like this.

"hey Siri, destroy all evidence!" phone begins wipe

I’m sure you’re trying to be humorous, but just in case you’re not, this isn’t something that Shortcuts lets you do.

I also think this is going to revolutionise the TouchBar.

The ability to have contextual shortcuts could be pretty powerful.

I think this is iOS only for now, but it seems like a very obvious thing to port to the Mac.

It’s already there in the form of Automator and AppleScript.

That’s a different way of automating things. Automator seems to have been ignored, and AppleScript is quite old and people have been worried about it for years.

So, is it ready to do complex tasks just like Google and Amazon Echo?

Does Google finally allow users to configure top-level actions, instead of the silly "Ok Google, ask $someapp to do something" (where what I want is simply "Ok Google, do something")?

Or even better just "$act" when I'm alone in the room!

The "Ok Alexa ask $someapp to …" prefix is disgusting.

Saw someone else comment on a different TechCrunch article here on HN about how these articles about Apple read like press releases. I really feel that now.

As a PM, this article is what I would write as sample press coverage I'd like for a product release.

Interesting to see that the author is a former PR member of http://my.workflow.is/ + has one recent article for TechCrunch.

Oh hey, looks like they were purchased by Apple: https://techcrunch.com/2017/03/22/apple-has-acquired-workflo...

Quick Edit: I'm not saying that this isn't good technology, rather I'm more concerned that TechCrunch is taking contributors who have ties to products being reported on and may have interests that are not being communicated to readers.

TechCrunch does this all the time, most of the "news" blogs do – I worked at a company that employed a PR firm that basically put our product release collateral directly onto a bunch of those blogs – it made me very uncomfortable.

Yeah, this is native advertising and it's Everywhere.

"Native advertising" sounds very much like the MBA-term for what I would call "deception".

(I'm not accusing you of anything, just making commentary)

Workflow is the technology which is becoming the Siri Shortcuts editor.

Shortcuts is basically Workflow with private API access.

It should be noted that this is a Contributor article (common on weekends), not an article from the main TechCrunch staff.

It should also be noted that post-acquisition Workflow (where the author worked at and is noted in the author blurb) was the precursor team to Shortcuts.

Company issues press release on new product. News companies relay information to readers. Readers complain the info just looks like a press release.

But as a reader, I do want to know this stuff.

You can get it from the source. No deceptive marketing tactics required.


It's fine if they make clear it's just a press release, but this is a press release masquerading as journalism.

To be honest, I don’t feel like these sites even know what to ask. This may be the Gell-Mann Amnesia effect in action.

Where else did you think Tech content comes from?

Much of it is independent enthusiasts dissecting public information.

Specifically because this is an announcement that is a copy of functionality shown for Android already.

It's Apple catching up.. which if this was a journalistic article would be mentioned.

"Apple announces same feature Google announced 6 months ago"

Sounds like Automator for dummies.

A dumbed-down Automator, maybe. Please don’t insult the users of Workflow.

I'm not insulting anyone. "X for dummies" is a book series.

Which doesn't make it any less insulting.


It’s to highlight the “Keep”/“Manage” UI that iOS 12 will have to silence apps that abuse notifications.
