Cracking Siri (applidium.com)
526 points by nolanbrown23 on Nov 14, 2011 | 103 comments



A few weeks ago I worked a bit with @tmm1 on figuring most of this out, using an almost amazingly simple procedure. We actually got custom commands working via both proxy and on-device interposing based methods: http://mobile.twitter.com/tmm1/status/131520489049960449


A little googling shows some interesting info about the ACE request/header. From skimming, it looks like a header compression method for VOIP on cell/lossy connections.

Slide deck: http://www-rn.informatik.uni-bremen.de/ietf/rohc/ace-033100-...

Whitepaper: http://w3.ualg.pt/~bamine/B3.pdf


Looks like guzzoni.apple.com is named after Didier Guzzoni (http://www.ai.sri.com/~guzzoni/), an employee at SRI.

He's also listed on an interesting Apple patent that was only filed a few weeks ago, "INTELLIGENT AUTOMATED ASSISTANT"(http://www.wipo.int/patentscope/search/en/WO2011088053).

Some very interesting implementation details there.


I'm kinda wondering why Apple bothered using HTTP for something that really doesn't use anything recognizable as proper HTTP. Was it just for HTTPS?


Did Apple re-implement the protocol to be HTTP-like, or is it a holdover from before the acquisition of Siri?

Especially when you are a startup, building the perfect protocol isn't your biggest concern. Being able to reuse already existing components like load balancers and connection libraries allows you to get your MVP out sooner.


They're possibly just reusing their existing HTTP request libraries for creating the request. But bear in mind that with an HTTPS connection, once the secure session has been established you can send anything you like over it. There's nothing to enforce strict HTTP over an HTTPS connection.

That's how tools like Corkscrew can tunnel SSH (and practically any other TCP-based protocol) over an HTTPS connection.
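
To make the "send anything you like" point concrete, here is a minimal sketch in Python. The function names and the 4-byte length-prefix framing are invented for illustration; this is not Siri's wire format, only a demonstration that TLS itself does not care what bytes travel inside the session.

```python
import socket
import ssl


def open_tls_channel(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a certificate-verified TLS connection and return the socket."""
    context = ssl.create_default_context()  # platform CA list, hostname check on
    raw = socket.create_connection((host, port), timeout=10)
    return context.wrap_socket(raw, server_hostname=host)


def send_custom_frame(channel: socket.socket, payload: bytes) -> None:
    """Send a length-prefixed binary frame (hypothetical framing, not HTTP)."""
    header = len(payload).to_bytes(4, "big")
    channel.sendall(header + payload)
```

A server that speaks real HTTP would reject such a frame, but nothing in the TLS layer does; only an intermediary that decrypts and inspects the stream could notice.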


Probably so it'll work through strict proxies.


Since it's HTTPS, those proxies can't see the traffic anyway, so as long as they used SSL on port 443, they could use any protocol on top.


They potentially can; commercial firewalls can man-in-the-middle HTTPS traffic with a locally signed and organization-computer-trusted SSL certificate.


Yes, you're right, in fact I found a few weeks ago that even Squid can do that.


Fiddler also: http://www.fiddler2.com/fiddler/help/httpsdecryption.asp

Great for debugging third party https stuff.


How does that work? I thought all verification of certificates was done in the browser...


IT installs the corporate MITM certificate on all of their computers so the browsers accept them as valid.


Would this still affect the iPhone 4S though? If I understand this all correctly, I think that corporate IT would have to install the self-signed root cert on your phone for Siri to be MITM'd. There's no reason for your phone to trust it otherwise.


Unfortunately, Siri does not use the system wide proxy. At least it does not on my iPhone. I tried intercepting the traffic with sshmitm which did work for all other iOS services (e.g. game center) but not for Siri. I'm wondering how these guys sniffed the traffic.


Did you read the article?

When the proxy failed, they "ressorted (sic) to using tcpdump on a network gateway". They eventually had to "setup a custom SSL certification authority, add it to our iPhone 4S, and use it to sign our very own certificate"


I have read that, but they used tcpdump only to detect what kind of traffic Siri sends after failing to use a normal HTTP proxy. Setting up a custom SSL certification authority is exactly what sshmitm does - but it does not (yet) support transparent proxying. Somehow they have redirected traffic for guzzoni.apple.com to a fake server that acts as a man in the middle (probably simply by using their own DNS), but what I wanted to know is what software they used to fake that server.


Not sure what they used, but this software should be suitable:

http://www.thoughtcrime.org/software/sslsniff/


They did mention using their own DNS: "In that case, the simplest solution is to fake an HTTPS server, use a fake DNS server, and see what the incoming requests are."


It's possible to do transparent proxying using iptables on Linux. Also, as ahlatimer mentioned, pointing the phone at your local DNS server and adding records for all the relevant domains would work, too.


The question that springs to my mind is not 'how can I play with this?' but 'Are Apple bringing Siri to the desktop?', seeing as it appears there's nothing specific to the 4S hardware in how this works.

I'd quite like to be able to add calendar entries or tweet without moving to another application.


Apple has clearly had a difficult time keeping up with early demand for Siri services as it is.

I think keeping it limited to the 4S looks a lot more like an operational necessity at this time.

Given that, if Siri appears on the Mac between major OS releases, I imagine it might be only for new hardware (i.e. a MacBook Air with an exterior Siri button and purple LED) at first as well.

Eventually (once they can scale Siri well enough), it could be released as a modestly-priced Mac App Store app. I bet it would be more pricy than FaceTime ($0.99 US) though.

I presume that's what they'll end up doing for existing iOS customers, pegging Siri for iPhone 4 and recent Touches at a price that keeps 4S customers satisfied to get early access and/or "free" Siri for the life of their phone.


> an exterior Siri button and purple LED

That's really gross, and exactly the kind of design choice Apple never makes.


Ha! Very true, I didn't even picture it and indeed I can't. I don't think they would actually put any Siri hardware features on a MacBook in the first place, so I went off the deep end there.


What makes you think they've had a tough time scaling with early demand? I'd think that this scales horizontally pretty well, given that each request is largely stateless and there's no interaction between users.


> each request is largely stateless

Negative. Siri remembers the context of your conversation.


But according to this article, the server seems to only translate speech to text, with the whole natural language processing and AI happening on the device.


I'm talking about the Siri outages at peak times, which seem to have subsided for now but indicate that they weren't ready for even the demand they've had.


I think the 4S-specific hardware is the improved proximity detector. This allows you to raise the phone to your ear and speak to Siri instantly. I suspect the older phones' proximity sensor isn't fast enough to support that feature, so Apple figured they'd go 4S only. (It can't hurt that they sell more phones that way.)


Actually, Apple included an additional proximity detector to make sure Siri works when you bring the phone to your ear. Here's the iFixit report:

http://www.ifixit.com/blog/blog/2011/11/09/little-sister-sir...

This means that Siri won't provide the optimum experience (raise the phone to your ear and Siri is ready to take the command) on the iPhone 4 and older models.


You can do that already, as long as you don't mind using your antiquated fingers to do so. Alfred[1] is my personal favorite, but Quicksilver[2] will do the job too (and is free).

[1] http://www.alfredapp.com/

[2] http://qsapp.com/index.php


Thank you, I'm actually a user and big fan of Alfred already, but I second your recommendation to everyone else.

But this is the future and I want my jetpack/siri :)


I wonder how Apple is taking all of this? Is Applidium risking their developer license?


I would doubt it - there's nothing illegal about reverse-engineering a protocol.


Sure, but there's nothing illegal about Apple kicking someone out of the App Store on a whim either.

Doesn't matter if you are breaking the law or not, plenty of legal apps get rejected. Apple sets their own terms outside of US law.


Apple has to comply with its contract just as the developers do. I did just check the agreement and either party can terminate with 30 days' notice for any or no reason, so they could theoretically terminate.

Given this is completely out of the scope of the App Store or even the SDK (contrast with the security researcher who got unapproved code executing), however, I don't imagine Apple will feel the need to terminate. I guess we'll just have to wait and see.


Apple usually avoids shitstorms or backpedals if they cause one – but sometimes they don't.

It's not unreasonable to assume that Apple won't do anything, but it's risky.


I didn't see anything in this article that mentions that the natural language understanding is done in the cloud. Maybe I am missing something, but I don't understand why everyone is jumping to the conclusion that the NLU is also done in the cloud and downvoting others' comments that said so.

From what I've seen, Siri sends compressed audio to the cloud which translates that to text. What happens to the text and how does that translate to action? Where is this being handled? Is there any proof that this is done in the cloud?


It'd be interesting to see whether or not Apple changed the Siri protocol since the acquisition. Was this originally how Siri worked when it was independent?

Because Siri has roots in government contracting (it's named after SRI International, and was originally funded by DARPA) I wonder if the roots of the obfuscation start there rather than at Apple.


I don't see any obfuscation here, just a compressed binary encoding.
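
For a sense of what "just a compressed binary encoding" means in practice, here is a small sketch. It assumes, purely for illustration, a binary property list compressed with zlib; the comments here do not confirm the exact format, and the field names below are invented.

```python
import plistlib
import zlib


def encode(message: dict) -> bytes:
    """Serialize to a binary plist, then compress with zlib (assumed format)."""
    return zlib.compress(plistlib.dumps(message, fmt=plistlib.FMT_BINARY))


def decode(blob: bytes) -> dict:
    """Reverse of encode(): decompress, then parse the plist."""
    return plistlib.loads(zlib.decompress(blob))


# Hypothetical message; these keys are not taken from the real protocol.
example = {"class": "ExampleRequest", "properties": {"text": "hello"}}
blob = encode(example)
```

The bytes on the wire look opaque, yet anyone who recognizes the two standard layers can undo them with library calls, which is why this reads as encoding rather than obfuscation.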


I believe the DARPA project was purely textual, and speech recognition was only added after it was spun off to make it more consumer friendly.


Cannot upvote this enough. Stuff like this is the reason I read HN.


I wonder if there are any characteristics about the microphone in Apple devices that the servers could check the audio against to prevent this sort of thing. There should be a way to somewhat distinguish the device used to record a stream, given Apple's control over the devices on which Siri runs, and overcoming that would be hard enough that few would bother.


Maybe, but what's the point? If you try to run a service on top of this, you'll have to make so many requests that you'll either have your ID banned or you'll need to buy so many iPhones that you might as well contract a speech-to-text service to some company.

If you're just using it for personal reasons, why should Apple care?


What I had in mind were not services but rather Siri clients for non-Apple hardware, which I assume Apple would not be particularly happy about. When Siri comes to iPads and Macs, owners of a much broader range of devices could take an ID and use it, for instance, in an unofficial Siri client (should one be created) on an Android device. But then again, I may be way overthinking this.


Probably. Take note of the fact that OS X editions don't use a serial number. You can very easily share them with friends and family and online. Same goes for iWork, however some of the more expensive software does use SN.

If you already bought an iPhone/Mac/iPad (in the future) that has Siri, then I don't think Apple will care much if you use Siri on other devices. However, what is really useful with Siri is where it talks to the OS layer and other applications. That kind of integration isn't all that easy to do.

So if someone writes an app with the integration to the OS and apps (calendar, SMS, phone, phonebook...) and decides to use Siri (illegally), then I think they deserve a medal or something for their hard work porting the Siri front end to another platform.


Duplicating Siri involves much more than just speech-to-text -- the language understanding is the hard part. Heck, Google has its own speech-to-text servers.


Really interesting. I'm curious what their tools look like but the github repository the article links to is currently empty.



<spoiler> guess who doesn't verify the root CA. Think of all the fun to be had with a Siri man-in-the-middle


Missing the point, I think. There's no security bug here. The application isn't responsible for verifying the root CA in typical security models (though some, like Chrome, do something similar -- that's how the compromised Dutch CA was discovered). The idea is that the CA list is populated by your platform vendor and you trust it.

The trick here was that Siri was asking for an HTTPS connection to a named server, and you can't MitM that without having a signed cert for that server. So they added a new CA to their local (jailbroken) iPhone platform data and signed a cert for the Siri server.


No, it's not jailbroken (there's no jailbreak for the iPhone 4S). This is just a feature of the iPhone: if you embed a SSL certificate in a mobile provisioning profile, it will add it to the system list. This is mostly intended for enterprises who might have a special SSL cert for their intranet, but it also works for this purpose as well.
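
The trust model being discussed can be sketched in a few lines of Python. This is an analogy, not iOS code: `client_context` is a made-up helper, and the optional PEM string stands in for the extra CA that a provisioning profile would add to the system list.

```python
import ssl
from typing import Optional


def client_context(extra_ca_pem: Optional[str] = None) -> ssl.SSLContext:
    """Build a client context that trusts the platform CAs plus one extra CA."""
    ctx = ssl.create_default_context()  # platform CA list, verification on
    if extra_ca_pem is not None:
        # After this call, certificates signed by the extra CA also verify.
        ctx.load_verify_locations(cadata=extra_ca_pem)
    return ctx
```

The application code never changes: it still demands a valid chain for the named host. What changes is the set of roots the platform considers valid, which is why adding your own CA lets you sign a certificate the client will accept.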


And for anyone thinking about ways to fix that problem, the researchers could have hooked SSL's read/write calls using a DYLD interposing library. Once you get superuser access on the phone, you can't trust your code to be safe.


There is no jailbreak for the iPhone 4S (at least not publicly available), so any hacks like this must be done from outside the device.


Have a look at the HTTP-proxy software Charles (No affiliation of mine.) In the last question of the FAQ the monitoring of SSL-connections within iPhone-apps is explained: http://www.charlesproxy.com/documentation/faqs/

There is no bug. This is what SSL will do, when you install additional certificates.

(Oh, and it's a fun way to find new web services to play with.) :-)


Yeah - Fiddler2 http://fiddler2.com/fiddler2/ does this too. It's really neat for debugging http based API calls.


You can add a root CA to both linux and os x. No problems. Though on an iPhone you'll have to jailbreak it first...so I guess apple didn't think of that, or they don't care.


No, you don't. Installing a root CA on an iOS device is as easy as sending it via mail to the device and then clicking on it (with a few more clicks to confirm).

edit (because I can't reply): It does show a big warning and you have to enter the device unlock code to do this, so it should be reasonably safe.


Hmm. That sounds like a big security hole. Phishing attacks in particular. Though I guess the extra clicks should discourage users.


It is necessary — some places have custom non-public CA's, for things like S/MIME and internal servers.

On the other hand, I'm pretty sure Siri doesn't have to communicate with your company's internal servers (and my paranoia already suggests a malicious IT department, reckless — and probably illegal — as that would be), so the code should, in my opinion, accept only specific CAs.
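
A rough sketch of that idea, with one substitution stated plainly: instead of restricting the CA list, this pins the SHA-256 fingerprint of a certificate, which is a related and simpler technique. The function names are invented, and nothing here is known to match what the Siri client does.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as hex."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, allowed: Iterable[str]) -> bool:
    """True only if the certificate's fingerprint is in the allowed set."""
    actual = fingerprint(der_cert)
    return any(hmac.compare_digest(actual, pin) for pin in allowed)
```

On a live connection the DER bytes would come from `ssl.SSLSocket.getpeercert(binary_form=True)`. With a check like this, a user-installed root CA would no longer be enough to sit in the middle, at the cost of having to ship new pins whenever the server certificate changes.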


Compartmentalization would make sense. Installing a root CA in the email app would only work for the email app.


Anyway, this is proof that Siri is a pure cloud service and as such may work even on a 5-year-old Sagem...


Not exactly. The text-to-speech is done in the cloud, but the hard part (algorithmically speaking) is natural language processing, which apparently (I don't know for sure) is still done on the phone.

I don't know what Apple's excuse is though, but limited processing power is certainly not a problem.


I think you have it exactly backwards - the iPhone 4S/Siri speech-to-text/natural language processing are done in the CLOUD. The text-to-speech is done on the phone itself. My (non-Siri of course) iPhone 4's Voice Command stuff is COMPLETELY on the phone itself, and would do TtS of my contact list and Artist names, etc.


The article says, "The iPhone 4S really sends raw audio data". At least for Siri, TtS occurs on the cloud - not sure where the text processing > API occurs though.


Sending raw data and _receiving_ raw data are NOT the same thing. It's been clear that the iPhone 4S sends raw data to the cloud, based on people sniffing the network shortly after release.


From the article:

> The iPhone 4S really sends raw audio data. It’s compressed using the Speex audio codec, which makes sense as it’s a codec specifically tailored for VoIP.


I don't know why this is being downvoted -- I guess others are reading something different than I am?

There are three parts to Siri:

1. Speech-to-text (parent has it backwards but that's what he means, obviously)

2. Text-to-intent (referred to by parent as NLP)

3. Intent-to-API calls

Obviously, (1) happens in the cloud and (3) happens on the device. It is still unclear where (2) happens but if the cloud service only responds with text, it seems that (2) happens on the device.

And (2) is still a hard problem by itself.


I would LOVE to backward-engineer Siri's speech-analysis algorithms. Confidence scores help, but it doesn't look like any other modeling data is available?


That's all happening on the servers, so it's not the sort of thing you'll really ever get access to.


Is there a possibility to craft a Siri server reply with malicious code? Shouldn't be too hard for the applidium guys to attempt (maybe even use a fuzzer?)


Maybe, but then you need to manually add your own root CA to the iPhone, or the cert verification will fail, so it's not a security issue.


Might be a way to jailbreak the phone though, no?


Only if there's an exploitable bug in the Siri client.


"Seems like someone at Apple missed something!"

What did Apple miss? (In other words: how could they have avoided this, assuming they wanted to avoid such a crack?)


I love reading investigative coding stories. Always fun to take a peek into secret--especially high-profile--code.

Thanks!


The remote server is located at apple-compu.car1.charlotte1.level3.net.


Can the server side be a Watson-like computer cluster? Just curious...


down for me :(



> The iPhone 4S sends identifiers everywhere.

So if I'm reading this right, Apple is sending UDIDs over HTTP?


From the article it looks like UDIDs are being sent over HTTPS.


Not UDIDs. UDIDs are 40 hex digits; the Siri ID is 30.


HTTPS


No one is at all concerned that this is a hack?

I know it's interesting stuff, but I'm curious what "rights" Applidium have in publishing this information.

With this information, (if I'm not wrong) it wouldn't take long to simply DDoS Siri...

Or port Siri to Android (effectively stealing IP).

(I have no bias either way, just pointing out, if someone figured out how to reverse engineer dropbox, so you could use their space, without a dropbox account, would we all be going "wow, this is so cool!" or would we be crying out "this is such an irresponsible hack!")


Hacks are admired here, not condemned. Reverse engineering should always be allowed. This information doesn't make it possible to DDoS Siri or port it to Android as each request requires a unique iPhone ID; Apple can easily filter out unauthorized requests.


Maybe You admire them. But I don't. Are we going to hack/crack each other's apache servers from now on? Or are we going to build businesses that will solve problems for everyone?


A hacker is "[a] person who delights in having an intimate understanding of the internal workings of a system" and this site is called "Hacker News".

And yes, we are going to help each other improve the security of our systems. If we don't, someone malicious will.


"As a result, we are able to use Siri’s recognition engine from any device. Yes, that means anyone could now write an Android app that uses the real Siri!"

Are they just lying then?

Their demo said they got Siri to work with no iPhone involved (in the end).

Also... DDoS would still be effective, no? (the server still has to 'filter')

> Hacks are admired here

You sure about that? A lot of China-bashing happens here based around its 'Hacking' of U.S. targets; I've never seen admiration of such things.


They're not lying. Anyone could write an Android app that uses Siri, but it would require the ID from an iPhone to work, so distributing it would be problematic.


Yeah, unless someone happened to have a huge cache of real device IDs...

http://www.readwriteweb.com/archives/dear_iphone_users_your_...

(I hope the app store magical-vetting-code is smart enough to ensure the new hit app "Somewhat Annoyed Birds" isn't capable of fishing around in the phone for the Siri ID and sending it back to the developers website along with the high score you just got...)


Why would that be problematic? You write a server that provides clients with an iPhone ID that hasn't been banned from using Siri yet, and then you make the app contact that server to get the ID.

I'm sure Apple would send a nastygram, but they send nastygrams if you scratch your phone and don't get it repaired quickly enough. There is no law against telling other people your phone's serial number. There is no law against sending an HTTP request to an HTTP server for non-malicious reasons. So really, I don't see much of a legal problem.


Where would you get the valid IDs? You can't share the same ID between very many users, or Apple will ban it. You can't buy an iPhone for every user of your Siri app. iPhone users won't willingly give you their IDs. Are you going to somehow obtain and use the IDs of unsuspecting iPhone users without their permission? That is likely illegal and definitely will get you sued and booted from Android Market.


My guess is that Apple will ban an ID after a day or two. My other guess is that you can just keygen the ID.


You are confusing cracking and hacking.


I see, so "cracking" is admired here, "hacking" condemned.

Got it! :)


From the article:

"The iPhone 4S sends identifiers everywhere. So if you want to use Siri on another device, you still need the identfier of at least one iPhone 4S. Of course we’re not publishing ours"


> 'No one is at all concerned that this is a hack?'

You're asking that on a site called 'Hacker News' if I'm not mistaken. It is indeed a 'hack', a clever and skilled exploration of technology carried out with perfectly good or neutral intent.


That's right, Hacker News is about compromising security and cracking software.... How did I miss that all this time?

My initial post (which has been down voted out of existence) is a valid point.

I don't actually care whether Apple get hacked or not. I was curious what people thought of publishing a 'hack/crack' like this.

Lots of rationalising going on, but to me it still seems wrong. I'd hate people to leverage my work (even for 'personal use') without my permission. Interesting how 'hackers' are happy to hack other people's stuff, but cry out when it's their own stuff getting hacked.


The "hacker" part of the title is ironic. Any hacker news here is purely accidental.


Anyone with Wireshark or tcpdump could have already seen what IP address the Siri client communicates with.

Any competitors jealous of Siri aren't learning much by finding out that the client uses HTTP, compression, and binary payloads in what it sends over the wire to the Siri service. The magic is server-side. The client has to communicate with the service somehow.


> No one is at all concerned that this is a hack? > I know it's interesting stuff, but I'm curious what "rights" Applidium have in publishing this information.

In the United States, reverse engineering is entirely lawful. It is even made explicitly clear in the DMCA that reverse engineering is allowed. Which part are you specifically worried the most about?

> With this information, (if I'm not wrong) it wouldn't take long to simply DDoS Siri...

This is just scaremongering. Knowing an IP address is enough to DDoS a server. Are you suggesting that it's somehow unethical to independently publish the location of a publicly-available server? Are you also going to indict the DNS server that gave it to them?

> Or port Siri to Android (effectively stealing IP).

Theft relates to physical property. I'm not sure what would be stolen here as Apple still controls the Siri server and requires a unique iPhone 4S ID to be used. Again, though, reverse engineering for the purpose of interoperability is legal in the United States. There's no way to frame this as stealing.

> (I have no bias either way, just pointing out, if someone figured out how to reverse engineer dropbox, so you could use their space, without a dropbox account, would we all be going "wow, this is so cool!" or would we be crying out "this is such an irresponsible hack!")

This is a red herring. Your proposed situation suggests a security vulnerability of some kind wherein Dropbox hypothetically allowed someone access without paying. No such vulnerability in Siri was found; all requests to the Siri server were made using a valid phone ID and returned valid, official responses.

The only thing that's unclear to me is if the anti-circumvention portion of the DMCA extends to technology used but not created by the author e.g. Apple did not create SSL but they use it to secure transmission - does this make spoofing an SSL certificate an instance where the DMCA's anti-circumvention law would come into play?


The denizens of Hacker News are not concerned that this is a hack, no.



