What Does It Take to Track a Million Cell Phones? (thehftguy.com)
254 points by siva7891 on July 19, 2017 | 84 comments



I've designed systems that do this on continental scales (i.e. hundreds of millions of cell phones simultaneously, in real-time). The devil is in the details, and the details are non-trivial; this is not an "an intern and 6 months" job. Mobile telemetry is not nearly as ideal in practice as assumed here and it typically takes a couple years to learn how to handle the numerous peculiar artifacts of that data that will damage the quality of a naive implementation. Reconstructing a model of the population from the cleaned data that approximates the ground truth is surprisingly difficult and requires quite a bit of clever data science and maths.

It takes a lot of work and expertise to build a population model from mobile telemetry that approximately reflects reality. Far fewer people know how to do this well than you might assume by looking at the requirements for a naive implementation. Even most mobile carriers have limited ability.


Carrier iQ demonstrated GPS level tracking of 100m+ phones nearly 10 years ago.


> the numerous peculiar artifacts of that data that will damage the quality of a naive implementation.

Do you have examples (or a link/reference to something that has such examples) of those types of artifacts?


Have you posted about this at length somewhere? If not, care to elaborate on what it took to design this system?


I have not written about it. Most of the difficulty and complexity, from my perspective, is in the data science and processing required to construct an accurate population model, which requires additional data sources beyond the mobile telemetry. I designed the custom database platforms (easy for me) underneath which supported the online data processing.

It isn't that difficult technically, if you have experts doing it; it just requires far more domain expertise to do correctly than I think people expect. You also need to be willing to write some of your own tooling to deal with the data efficiently and effectively.


A great PoC is fairly doable by an intern.

Increasing the precision by tenfold will likely increase the effort by a hundred fold or more. Just because it can be made harder and more expensive doesn't mean it has to be.

At the end of the day, a bit of precision doesn't change the nature of an effective planet scale mass surveillance system.


I'm gonna take a wild guess, and say that the NSA has a monopoly on the talent for this field.


The Snowden leaks confirmed NSA has the ability to conduct co-traveler inference.[0] In other words: finding mobile devices proximate to a targeted mobile device, based on similar vectors. Perhaps even making associations in absence of targeting via patterns in device proximity over time.

It probably gets real interesting when they're trying to distinguish between various modes of transit, such as a city bus, an Uber/Lyft/taxi, and a private vehicle not participating in rideshares. Of those examples, the latter would suggest the highest degree of association.

Pure speculation, but I wouldn't doubt they take a peek at ridesharing data for co-traveler inference purposes. Knowing if a rideshare driver is on or off the clock would be incredibly valuable information in that context.

[0] https://www.washingtonpost.com/apps/g/page/world/how-the-nsa...
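A minimal sketch of the co-traveler idea described above: count how often other devices show up in the same cell and time window as a target. The record layout (device id, cell id, Unix timestamp), the 5-minute bucket, and the function name are all hypothetical choices for illustration, not anything taken from the leaked material.

```python
from collections import Counter, defaultdict

def co_travelers(events, target, bucket_seconds=300):
    """Count how often each device shares a (cell, time bucket) with the target.

    events: iterable of (device_id, cell_id, unix_timestamp) tuples (assumed layout).
    """
    seen = defaultdict(set)  # (cell_id, time bucket) -> set of device ids
    for device_id, cell_id, ts in events:
        seen[(cell_id, ts // bucket_seconds)].add(device_id)

    counts = Counter()
    for devices in seen.values():
        if target in devices:
            for device in devices:
                if device != target:
                    counts[device] += 1
    return counts.most_common()

# Toy data: phone1 follows the target through three cells, the others do not.
events = [
    ("target", "cellA", 0), ("phone1", "cellA", 60), ("phone2", "cellA", 120),
    ("target", "cellB", 900), ("phone1", "cellB", 950),
    ("target", "cellC", 1800), ("phone1", "cellC", 1810), ("phone3", "cellC", 1820),
]
ranked = co_travelers(events, "target")
```

Here phone1 ranks first because it shares all three buckets with the target. A real system would also have to deal with bucket boundaries and with crowds on shared transit, which this sketch ignores.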


Graph reconstruction from space-time event data goes far beyond the above in terms of capability. You can infer relationships between people that never co-travel, infer that people have been places that are not in the event data, etc by stitching together large numbers of orthogonal event streams over long periods of time. It is straightforward to distinguish between various modes of transit analytically. The "metadata" that simply indicates an event in space-time is far more valuable analytically than the data because it is possible to reconstruct so much with it that isn't contained within the data per se.

I was doing all of this five years ago; the capability has been around for a while.


Do you ever worry that this data might be used for bad purposes?


The handful of people I know that are real experts at the data science are all in the private sector.


Mapping is a wide and common industry, just like web or finance.

The NSA only recruits in the USA. It's a fraction of the talent pool of the planet.


It's actually smaller. At minimum you have to be a naturalized or native born citizen of the USA for many jobs in the US Government. For the NSA, add the requirement of having a current clearance or the time to get one (worst case years).


The NSA violates the Constitution on a daily basis; I'm sure they have a loophole to get whatever talent they need.


Care to elaborate?


Most of this talent is in the private sector. Primarily around traffic.


> which requires additional data sources beyond the mobile telemetry.

So it really isn't technically difficult. It's just lacking in data?

In your original post, you made it seem like it was extremely difficult. From my perspective, it seems like child's play from a technical standpoint.

> It isn't that difficult technically, if you have experts doing it

But your top comment implied it was extremely technically difficult.


There are some things that are technically very difficult if you have no domain expertise, which applies to the subject matter. Most people that try to do this without experience fail in practice. It takes a lot of time and effort to become competent at it, but once you figure it out it is repeatable without too much effort.

There is a much smaller set of things that are technically difficult to execute even if you are highly experienced at doing it -- each time is a challenge. This is not one of those cases, it just has a severe learning curve.


This article finally answered a question I've had for a while: how they can do decent triangulation with just two towers.

> "We said that a tower covers a radius around it. In practice, this is sub optimal so that’s not how it’s done.

> Instead, a station is usually split in 3 independent beams of 120 degrees."

So it's not the intersection of two circles anymore, it's the intersection of two arcs, which will likely only have one intersection point, unlike circles.
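A rough sketch of the arc intersection described above, assuming each tower reports a range estimate plus which 120 degree sector the phone is in. The coordinates, function names, and the use of math-convention angles (counter-clockwise from the +x axis, not compass bearings) are illustrative assumptions, not the article's implementation.

```python
import math

def circle_intersections(p0, r0, p1, r1):
    """Return the 0, 1 or 2 intersection points of two circles."""
    (x0, y0), (x1, y1) = p0, p1
    d = math.hypot(x1 - x0, y1 - y0)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    a = (r0**2 - r1**2 + d**2) / (2 * d)      # distance from p0 to the chord
    h = math.sqrt(max(r0**2 - a**2, 0.0))     # half-length of the chord
    mx, my = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
    ox, oy = h * (y1 - y0) / d, h * (x1 - x0) / d
    points = [(mx + ox, my - oy), (mx - ox, my + oy)]
    return points[:1] if h == 0 else points

def in_sector(tower, point, center_deg, width_deg=120.0):
    """True if the angle from tower to point lies inside the sector."""
    angle = math.degrees(math.atan2(point[1] - tower[1], point[0] - tower[0]))
    diff = (angle - center_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= width_deg / 2

def locate(tower_a, range_a, sector_a, tower_b, range_b, sector_b):
    """Keep only the circle intersections that fall inside both sectors."""
    return [p for p in circle_intersections(tower_a, range_a, tower_b, range_b)
            if in_sector(tower_a, p, sector_a) and in_sector(tower_b, p, sector_b)]

# Two towers 10 units apart; the circles meet at (5, 5) and (5, -5),
# but only (5, 5) lies inside both sectors.
candidates = locate((0, 0), math.sqrt(50), 60.0, (10, 0), math.sqrt(50), 120.0)
```

With noisy range estimates the circles may not intersect at all, so a practical version would need some tolerance or a least-squares fit instead of an exact intersection.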


A step beyond that: some can get fairly accurate with a single "tower". We had at minimum 3 BTS on every site, generally at 120 degree spacing but this could vary, but each BTS had multiple antennas and would do some rudimentary triangulation based on signal arrival times to each antenna. We could generally get within a couple of hundred feet.
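To put "a couple of hundred feet" in timing terms, here is a back-of-the-envelope conversion between arrival-time error and distance. It assumes straight-line propagation at the vacuum speed of light and ignores multipath; the function names are made up for illustration.

```python
C = 299_792_458.0        # speed of light in a vacuum, m/s
FEET_PER_METER = 3.28084

def timing_error_to_feet(dt_seconds):
    """Distance error (feet) caused by a given arrival-time error."""
    return C * dt_seconds * FEET_PER_METER

def feet_to_timing_error(feet):
    """Arrival-time precision (seconds) needed for a given distance error."""
    return feet / FEET_PER_METER / C

one_microsecond_ft = timing_error_to_feet(1e-6)   # roughly 984 feet
needed_for_200_ft = feet_to_timing_error(200.0)   # roughly 203 nanoseconds
```

So getting within a couple of hundred feet implies resolving arrival times to a few hundred nanoseconds.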



Note that a lot of the information from the BTS is already available to anyone who "asks nicely".

The mechanism that provides roaming is based on trust, so anyone connected to the SS7 network can query the location of any phone in the world and even intercept its calls. Just say to the home carrier "hey this phone is roaming on my network, would you be able to send me all of its calls and texts?".


There was a talk/demo I saw a few years ago that went into great detail about how this works. I remember it was given by a German. Anyone know what I am talking about?

Edit: It was a video.


I remember something similar, it was a presentation given at CCC in Germany. Tried searching for it on their YouTube channel, just to discover said channel was terminated for breaking YouTube ToS?!

That's really sad, their channel had videos of all the past talks from the CCC, an amazing resource that's now gone.

I think this is the one you might have been talking about: https://www.youtube.com/watch?v=lsIriAdbttc

If it's not that one then it's probably one of the "Running your own 3G/3,5G/GSM network" talks.



That's the one, nice find.



You probably were thinking of the CCCen YouTube channel? That never was an official channel, but a copy-cat that just uploaded all the CCC stuff, and was banned for it.


Yeah, I now realized this too. Still weird as CCCen had the videos organized in a more useful way, thus popping up way above in search results. At least good to know the stuff is still there.


Probably Karsten Nohl. He has done many SS7 talks, so this may not be the exact one you saw: https://youtu.be/BbPLscWQ1Bw


Those are not what you're talking about but check out the slides from P1 Security about this issue: https://www.slideshare.net/p1sec/hes2010-philippe-langlois-a...


Which is also why SMS makes a poor authentication factor.


"The mechanism that provides roaming is based on trust, so anyone connected to the SS7 network can query the location of any phone in the world and even intercept its calls."

One of the first things I did after opening the article was to search for the string "ss7" ... was disappointed to see it mentioned zero times ...


The phone companies already do this, more or less, as is shown in court cases where cell phone records are brought in as evidence

A decade ago that data was a little more iffy (i.e. it was more a good estimate, typically within half a mile or less, than a true location), but with a combination of more towers (and therefore more data points), the ubiquity of smartphones (which check in more often, are doing geolocation-related things, etc.), and better, more accessible, and better-known analytics tools, I think even 6 months would be a generous time-frame.


> The phone companies already do this, more or less, as is shown in court cases where cell phone records are brought in as evidence

You can also arrange to buy this information. I worked for a place where you could request someone's location by phone number. There were a lot of contractual obligations around us having the phone owner "allow" us to do that, but no technical ones.


Yep, this is more than possible. It's done in marketing. If you have a short code to "text for coupons", there's a high chance that they're doing a ping against your number.


How did you access the data? Via a simple REST api?


> How did you access the data?

We signed a contract, fulfilled our obligations, and paid them money.

> Via a simple REST api?

I really don't remember. It might have been SOAP or something. It was an HTTP-based API, but I don't think it was REST specifically.

There was also a 30s or so delay from request to when we'd get the location back.


Interesting, thanks!


Was it a phone operator specific system? Like, could you get the location of any phone number or only AT&T's, for example?


I believe our provider had contracts with the big 4 in the US.


What's the accuracy on that location information? Is it down to the meter/10 meter/100 meter?


It wasn't GPS accurate, but accurate enough for our usage (monitoring volunteers). I want to say it was under 10m and over 5m on average. It's been about 6 years since I've dealt with the system, so the details are a bit fuzzy.


Repo men usually have someone able to do this for $50-200.


Here's one such testimony from Aaron Hernandez's murder trial, where his cell phone pinged a tower near the scene of the crime:

https://www.youtube.com/watch?v=Am8izKu5ZSU

https://en.wikipedia.org/wiki/Aaron_Hernandez#2013_murder_of...


Q. What Does It Really Take to Track a Million Cell Phones?

A. Sell outsourced billing solutions to the mobile carrier. (See AMDOCS)


Please don't abuse this :)

https://github.com/ernw/ss7MAPer


Could you give a high level of what the ss7 network is and what this tool does? I'm not very familiar with this area.


Inrix, TomTom, and a couple of others have been providing this data as a product for at least 2 decades. There was an early provider that led the space, but the name of that company eludes me at the moment; it may actually have been purchased by Inrix.

Most of those companies focused on roughly ±10m resolution and on path data, to build traffic speed data for local news companies.

Only cost a couple million bucks and an extensive partnership agreement to get into the space.

There is a lot of data washing in those agreements, mostly related to preventing reverse identification.

Airsage has taken it to the next level in the more recent past with GPS based anonymized data, but data with EXTENSIVE history. The Airsage product is zip code and smaller resolution and can provide months to years of location history of an anonymous cell phone id.


Seems easy to mitigate with a tweak to the network connection order: https://news.ycombinator.com/item?id=10985599


To answer the 'Call for comment' about intersecting complex shapes... one simple, fast, general, approximate, discrete method is to use OpenGL to get your GPU to do it for you. Just render the shapes into an off-screen framebuffer, using appropriate logic ops or stencil planes, then read back the final buffer to get a bitmask of the possible positions. To reduce to one estimate of position, find the centroid of the largest contiguous pixel group (flood-fill different seed ids; histogram pixels; select region id with highest count).
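The same rasterize-and-intersect idea can be sketched on the CPU without OpenGL. This is a plain-Python stand-in, not the GPU approach itself: shapes are given as hypothetical point-in-shape predicates, the grid size is arbitrary, and the flood fill uses 4-connectivity.

```python
from collections import deque

def rasterize(shape, width, height):
    """Boolean mask: shape(x, y) evaluated at each pixel center."""
    return [[shape(x + 0.5, y + 0.5) for x in range(width)] for y in range(height)]

def intersect(mask_a, mask_b):
    """Pixel-wise AND of two masks (the 'logic op' step)."""
    return [[a and b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]

def largest_region_centroid(mask):
    """Flood-fill 4-connected regions; return the centroid of the largest, or None."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            region, queue = [], deque([(x0, y0)])
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                region.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(region) > len(best):
                best = region
    if not best:
        return None
    cx = sum(x + 0.5 for x, _ in best) / len(best)
    cy = sum(y + 0.5 for _, y in best) / len(best)
    return cx, cy

def disk(cx, cy, r):
    """Point-in-disk predicate, used here as an example coverage shape."""
    return lambda x, y: (x - cx) ** 2 + (y - cy) ** 2 <= r * r

# Two overlapping disks on a 64x64 grid; the overlap is a lens centered at (32, 32).
overlap = intersect(rasterize(disk(20, 32, 20), 64, 64),
                    rasterize(disk(44, 32, 20), 64, 64))
estimate = largest_region_centroid(overlap)
```

The accuracy is bounded by the pixel size, which is the "approximate, discrete" trade-off mentioned above; the GPU version would do the rasterize and AND steps in hardware and leave only the readback and flood fill to the CPU.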


I did the math a while back, don't have the notes at the moment, but scaling an AWS system I built enough to collect 600m points of data each minute, compute on the data within 100ms, and retain it for a few minutes would run a bit over $10k USD/mo to operate. I operated it at about 3m events/min with a good amount of compute per event, including IP-to-geo lookup... Zookeeper would be the only bottleneck in this case, assuming good enough partitioning.
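For a sense of scale, the rates quoted above work out as follows. This only restates the comment's own figures per second; it does not model cost.

```python
def per_second(events_per_minute):
    """Convert an events-per-minute rate to events per second."""
    return events_per_minute / 60

target_rate = per_second(600_000_000)      # hypothetical target: 10,000,000 events/s
observed_rate = per_second(3_000_000)      # rate actually operated: 50,000 events/s
scale_factor = target_rate / observed_rate # 200x the observed load
```

In other words, the $10k/mo figure is an extrapolation to about 200 times the load the system was actually run at.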


Using AWS is the problem here, and that's why it's so expensive. You could do this on bare metal WAY faster and more efficiently, and then you own the hardware forever, for the price you paid to do it for a month with a third party.

AWS does not scale this way, you can't just throw more resources at a problem and expect to be profitable.


Agreed it could be done cheaper over long term, just wanted to share about an actual prod system. This also had 3x replication via Kafka to avoid stampedes etc if anything failed and keep going with an at-least-once guarantee.

In my opinion tho, even that price point is pretty accessible to keep tabs on all citizens with that resolution which was my hypothetical case.


Put everything in Google BigQuery or Google DataProc.

Cheaper than both and hardly any maintenance required.


You would own the hardware until it died which is not forever.


Just because it quits working doesn't mean you don't still own it. Might wanna lay off the green leaf, bro. Hahaha


If my phone is powered off can I be tracked?

What if I remove the battery?


If your phone is powered off or in airplane mode it is not supposed to emit RF and thus cannot be tracked. This is a matter of trust, so if your threat model includes high end threats, the assumption that it follows the normal requirements may be invalid.

If you remove the battery, it will be unpowered and unable to emit RF and thus cannot be tracked. While it is theoretically possible to hide an auxiliary battery in your phone, that would be very hard to achieve, especially in modern thin phones. If your threat model includes highly motivated state sponsored actors, this could be achieved.

If you put your phone in an RF-tight enclosure (e.g. metal box), the RF energy cannot get out and thus it cannot be tracked.


Modern thin phones don't let you remove the battery, so they could easily continue to transmit RF.


You can put the phone inside an RF-blocking bag.


The bag is only going to attenuate any signal, not block it (block would imply infinite attenuation). Whether or not the attenuation provided is sufficient to prevent an adversary from receiving the signal I'm not sure. I definitely wouldn't bet my life on it. I'd want a pretty thick metal box with proper seam gaskets as a minimum.
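To make the attenuate-versus-block distinction concrete, here is a tiny link-budget sketch. Every number in it (transmit power, path loss, bag attenuation, receiver sensitivity) is a made-up illustrative value, not a measurement of any real bag or receiver.

```python
def db_to_power_ratio(db):
    """Convert an attenuation in dB to a linear power ratio."""
    return 10 ** (db / 10)

def received_dbm(tx_dbm, path_loss_db, shield_db):
    """Received power after path loss and shielding, all in dB terms."""
    return tx_dbm - path_loss_db - shield_db

# Hypothetical values for illustration only.
tx_dbm = 33              # assumed handset transmit power (about 2 W)
path_loss_db = 100       # assumed loss between phone and receiver
bag_db = 60              # assumed attenuation of the bag
sensitivity_dbm = -110   # assumed receiver sensitivity

ratio = db_to_power_ratio(bag_db)                       # 60 dB = factor of 1,000,000
level = received_dbm(tx_dbm, path_loss_db, bag_db)      # -127 dBm
detectable = level >= sensitivity_dbm                   # False with these numbers
```

With a closer receiver (less path loss) or a more sensitive one, the same bag would no longer be enough, which is the point about not betting your life on it.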


Yes, if you're in a city you're tracked constantly by dragnet surveillance.

Questions like this one aren't very useful without a threat model. Who are you trying to prevent tracking you? If it's just your phone carrier then obviously turning off your phone and removing the battery will render it inoperable. But now you don't have a phone, and your location info wasn't very useful to begin with anyway unless you were involved in an operation where you need to conceal your location.


This goes into I Don't Know What I Don't Know Dept. (sorry Mad magazine)

A patron was telling me that the way the GPS is so accurate is because it uses the phone's radio... Didn't know that either. (I mentioned to him that there is one spot in MA where our Google directions are off by 1/2 mile. Same place every trip.)


It's called Assisted GPS: https://en.wikipedia.org/wiki/Assisted_GPS

It speeds up getting an accurate location; it doesn't provide a more accurate location than GPS.

GPS is generally accurate to around 5 or 6 metres. That's the technology on its own.


this method will only work with GSM network because 1. GSM networks don't verify the BTS, and 2. GSM encryption keys are cracked and all over the internet. Users of other kind of networks should not worry about this kind of hack. Actually, here in China a fake BTS, a.k.a. 伪基站 ("fake base station"), can easily be purchased online.


> Users of other kind of networks should not worry about this kind of hack.

The article is not about a hack. The article is about how the cell company or state-level actor can leverage the connectivity information that is required for any modern cell service to operate.


"this method will only work with GSM network because ..."

Yes, that's true - but remember that all of our 3G/4G phones are also 2G phones and that if you disable/jam/overpower the 3G/4G signals the phone will very happily revert down to 2G, possibly with no encryption, and possibly in a way that you have to be very careful to even notice.

There are quite a few attacks that are mitigated by 3G/4G in theory, but that in practice you're still vulnerable to, because your phone can be downgraded to 2G by an outside actor.


Interestingly, the 2G networks are being (or have? I can't remember which) shut down entirely here in Australia.


It works on all generations: 2G, 3G, 3.5G, 4G LTE.


I was hoping there would be some information in here about what cell phones leak that a third party could pick up on. For example, tracking the MAC address in beacon packets, or the cell frequency equivalent of that. Of course if you can hook into the base stations you can track them.


Who else noticed the Winamp icon in one of the diagrams?

https://thehftguy.files.wordpress.com/2017/07/tdoa.png?w=300...


Now you know why my Nokia 3310 is switched off most of the time.


Nothing worthwhile ever takes an intern and six months. Ever.


One has to wonder why interns even bother. /s


A: A deeply sociopathic mindset. See the requirements section for details.


This is "off-topic" to the attention span of HN. I recently realized this when someone mentioned corruption as the main problem of some issue; yeah, that's "very general", but nevermind programmers, not even a power user would keep looking for program bugs or worry about the order they do things in when it's already been confirmed that the memory or PSU or something like that is faulty. Anyone worth their salt would stop those other debug activities to focus on correcting that, while someone who absorbed these broken parts and/or their acceptance as part of their synthetic identity will do anything but that.

Also see Hannah Arendt, Erich Fromm, et al. This other mediocre shit? This being a "hacker" in a goldfish bowl? That's for those who can't hack the adult responsibilities of the 20th and 21st centuries. Those who fell asleep, those who already fell off. They will downvote you today and look the other way as drones take care of you tomorrow, don't hold your breath for anything else. Anything else, any future worth a fuck, has to be done despite their wishes, or rather, despite where they are drifting.


I have no intention of contradicting that, but if so, that makes the organizations where this is not only standard operating procedure but just one tool among many similar ones all the more disturbing.


Somebody please explain this line from the post:

Radio waves travel at the speed of light 299 792 458 m/s.


When you turn on a lightbulb, the light coming from the bulb travels at the speed of light, which is 299,792,458 meters per second. Radio waves also travel at this same speed, since they are also a form of light.


It's worth mentioning that neither light nor radio waves travel at 299,792,458 m/s through atmosphere. That's the speed of light in a vacuum.

An interesting question is whether radio waves, gamma radiation, and visible light all travel at an identical speed through atmosphere.

The reason light slows down in atmosphere is because it hits atoms. It travels between each atom at the speed of light, but when it reaches an atom the radiation is absorbed and re-emitted, which introduces a delay. So the question that I'm wondering is: do different frequencies of radiation get absorbed and re-emitted at the same rate as every other frequency? That would give it identical speed. But if the absorption is different then presumably the speed would also be different.


> An interesting question is whether radio waves, gamma radiation, and visible light all travel at an identical speed through atmosphere.

They don't. The index of refraction tells you about how the speed of light is changed by a medium, and the fact that it's different for different colors of visible light is why you get effects like rainbows.

This stack exchange question might interest you if you'd like to read more: https://physics.stackexchange.com/questions/196803/why-is-th... .
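For a sense of how much the atmosphere matters for ranging, here is a small calculation using v = c / n. The refractive index of 1.0003 is an approximate, assumed value for air near sea level; it varies with frequency, pressure, and humidity.

```python
C_VACUUM = 299_792_458.0  # m/s, exact by definition
N_AIR = 1.0003            # assumed approximate refractive index of air

def speed_in_medium(n):
    """Phase speed of light in a medium with refractive index n."""
    return C_VACUUM / n

def range_error_m(distance_m, n=N_AIR):
    """Extra apparent distance if the vacuum speed is assumed over a path in the medium."""
    return distance_m * (n - 1.0)

v_air = speed_in_medium(N_AIR)        # about 299,702,547 m/s
error_10km = range_error_m(10_000.0)  # about 3 m over a 10 km path
```

Under these assumptions, treating the speed as the vacuum value costs only a few meters over a typical cell radius, which is small next to the other error sources discussed in this thread.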


Both radio waves and light are just differing frequencies of Electromagnetic Radiation.


GBNST (guilt by non-scientific thinking): Radio waves = sound = speed of sound, therefore wtf sound = light, but now I 'see the light.'

After having a second cup of coffee I did a doh! and realized conflating 'radio' with sound is non-sensical, but I wonder if I'm in the minority thinking this way. Or maybe it's just my non-tech background!



