This is a great idea and in my opinion should be a best practice for any company. We're actively working on enabling this functionality for our core data aliasing engine at the company where I work.
We started something similar with BreachInsider (https://breachinsider.com) to let businesses (or I guess individuals?) do this themselves with minimal overhead or resources. The idea is that they sprinkle these ‘users’ throughout their databases, see where they show up, and get alerted if those users are ever contacted or appear somewhere unusual (Pastebin, etc.)
We ran something similar ourselves, seeding ‘insiders’ across many of the top 100 sites and services to spot breaches (either traditional security incidents, or lapses in privacy for end users).
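The seeding trick above hinges on each planted ‘user’ being unique per service, so a sighting pinpoints exactly which database leaked. A minimal sketch of that idea (all names, domains, and the key-derivation scheme here are hypothetical, not BreachInsider's actual implementation):

```python
import hashlib

def make_canary_identity(service: str, secret_key: str) -> dict:
    """Derive a unique, trackable fake 'user' for one service.

    Deterministic derivation from a secret key means the same service
    always gets the same canary, and any sighting maps back to it.
    """
    tag = hashlib.sha256(f"{secret_key}:{service}".encode()).hexdigest()[:10]
    return {
        "name": f"Canary {tag[:4].upper()}",
        # Plus-addressing (or a catch-all domain) routes replies to one inbox.
        "email": f"canary+{tag}@example-monitoring.test",
        # A synthetic phone number derived from the same tag.
        "phone": f"+1555{int(tag[:7], 16) % 10_000_000:07d}",
    }

identity = make_canary_identity("acme-newsletter", "long-random-secret")
print(identity["email"])  # unique per service; a sighting pinpoints the leak
```

Any email to that alias, or any appearance of that name/number in a dump, identifies the leaking service without ambiguity.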
> Our further investigation reveals that Facebook does not fully enforce its policies [9] that require app developers to disclose their data collection and sharing practices as well as respond to data deletion requests by users. Of the analyzed apps, 6% of apps fail to provide the required privacy policies, 48% of apps do not respond to data deletion requests, and a few apps even continue using user data after confirming data deletion.
I remember suggesting to our security admin that we add a 'honeytoken' user to our production database, so that if it ever got owned, at least we'd know.
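In case it's useful, here's a minimal sketch of that honeytoken-user idea (the schema, the alert hook, and the honeytoken address are all made up for illustration): plant one account whose credentials are never used legitimately, and treat any sighting of it as evidence of a leak.

```python
import sqlite3

# Hypothetical users table with one planted honeytoken account.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, pw_hash TEXT)")
HONEYTOKEN = "ops-backup@internal.example"
conn.execute("INSERT INTO users VALUES (?, ?)", (HONEYTOKEN, "x" * 64))

def on_login_attempt(email: str) -> bool:
    """Return True for normal logins; raise the alarm for the honeytoken."""
    if email == HONEYTOKEN:
        # In a real system this would page security, not just print.
        print("ALERT: honeytoken credentials used - database likely leaked")
        return False
    return True

on_login_attempt("alice@example.com")  # normal login, no alert
on_login_attempt(HONEYTOKEN)           # fires the alert
```

Since nobody legitimate ever logs in as the honeytoken, the false-positive rate is essentially zero.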
Great idea! Decades ago, we would do this when selling lists of (voluntary) subscribers. We would seed the lists with our own secret monitoring names and addresses in order to track the use and misuse of the lists. It's a shame that these simple things need to get reinvented constantly. Sometimes, people see differences when they should see similarities.
Not that different from Android apps or Windows apps. Facebook also refers to them as apps so calling them "Facebook apps" is a pretty accurate description.
I'm not saying it's inaccurate. I'm saying it's ambiguous. Literally every search result I got on the first page for "Facebook app" was about a mobile app written by Facebook to access a Facebook-owned service (fb, WhatsApp, Instagram).
Since some people might not read the article and just the title, it seemed worth calling out.
Edit: ah, the title was edited from "Facebook app" to "third party social network app". So never mind :)
Lead author of the paper here. I am encouraged to see such insightful discussion on our work. Excited to discuss and address any questions/concerns that anyone may have.
We are also publicly sharing a disclosure page (https://github.com/shehrozef/canarytrap). This page contains details of the third-party apps that our work detected misusing user data or violating Facebook's TOS.
If Facebook really cared, they'd offer to share an anonymized email proxy when connecting for the first time, like Apple does when signing with Apple on a website that supports its SSO.
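For anyone unfamiliar with how such an email proxy works: the provider hands each third-party app a random alias, keeps the alias-to-real-address mapping server-side, and forwards mail through it, so the app never learns the real address and the alias can be revoked per app. A toy sketch of the mapping layer (the relay domain and function names are invented, and a real service would forward mail rather than just look up addresses):

```python
import secrets

relay_table = {}  # alias -> real address, kept server-side only

def issue_alias(real_email, app_id):
    """Mint a fresh random alias for one app; the app only ever sees this."""
    alias = f"{secrets.token_hex(6)}@relay.example"
    relay_table[alias] = real_email
    return alias

def forward(alias):
    """Resolve an alias to its real destination, or None once revoked."""
    return relay_table.get(alias)

alias = issue_alias("me@example.com", "some-quiz-app")
print(forward(alias))        # mail still reaches the real inbox
del relay_table[alias]       # user revokes this one app's access
print(forward(alias))        # None: the app's copy of the address is now dead
```

A nice side effect is that each alias doubles as a canary: spam arriving at a given alias implicates exactly one app.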
16 out of 1,024 apps... Surprisingly low. Lots of these are games, presumably funded by ads and/or in-app purchases, so scrounging a few more bucks from email lists fits right in.
Hey,
Lead author of the paper here. I would like to highlight that these detected apps amount to more than one percent of the monitored apps. Considering Facebook hosts millions of apps, there could be tens of thousands of apps potentially misusing users' data.
The idea is pretty cool when you start to think about adding self-destructing properties to individual pieces of data, so reasoning about data type and entropy becomes a risk modeling problem.
A concrete example: imagine if bank account numbers, credit card numbers, emails, etc. had self-destructing properties, where an outer shell "points" to the data but the underlying data is destroyed (using techniques like crypto-shredding). The outer shell would have canary properties that work in real-world systems, but since the underlying data is destroyed, all that could ever leak are the canary properties, not the data itself.
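A toy sketch of the crypto-shredding half of that idea (XOR stands in for a real cipher like AES-GCM, and the names are invented): each record is encrypted under its own key, so deleting the key "destroys" the data while the ciphertext shell survives as a monitorable canary.

```python
import secrets

def xor(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher stand-in; a real system would use AES-GCM."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

keystore = {}  # record_id -> per-record key, stored apart from the data

def store(record_id: str, plaintext: bytes) -> bytes:
    """Encrypt one record under its own fresh key; return the 'shell'."""
    key = secrets.token_bytes(32)
    keystore[record_id] = key
    return xor(plaintext, key)

def shred(record_id: str) -> None:
    """Crypto-shred: deleting the key makes the record unrecoverable."""
    del keystore[record_id]

shell = store("acct-1", b"4111-1111-1111-1111")
shred("acct-1")
# The shell still exists and can be watermarked/monitored as a canary,
# but the underlying account number can no longer be recovered.
```

The shell keeps its shape (length, format, location in the database), which is exactly what makes it useful as a canary after the real value is gone.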
A good example of some companies that offer something similar:
- https://canarytokens.org/generate
- https://github.com/thinkst/canarytokens
- https://canary.tools/
Pretty cool technology that can really go far.