
Disclosure: am a dev working in the MCN business.

The "private data" the app collects is used, for the most part, to fingerprint unique users.

In every MCN app there is a huge fake-user problem. If an app collects no identifiable fingerprint, a spammer can easily fake millions of views and manipulate ranked content. App developers are asked to think of clever ways to collect every piece of info they can, while spammers spend nights and days spoofing every parameter in a virtual machine, or even on a matrix of remote-controlled real phones.

For example, if an iPhone 11 user logs in but reports a screen resolution of only 320x240, is it legit? I have caught tens of thousands of fake users with simple checks like this. However, such tricks expire pretty quickly; you have to move on to new feature checks, together with decision trees and Bayesian networks.
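A minimal sketch of the kind of consistency check described above, assuming a small lookup table of model names and native resolutions. The table, the function name, and the choice to let unknown models pass are all illustrative assumptions, not anything the commenter described in detail.

```python
# Hypothetical lookup table: model name -> plausible native resolutions.
# The entries are illustrative, not a maintained device database.
KNOWN_RESOLUTIONS = {
    "iPhone 11": {(828, 1792)},
    "iPhone 8": {(750, 1334)},
}


def is_plausible_device(model, width, height):
    """Return False when a known model reports a resolution it cannot have."""
    expected = KNOWN_RESOLUTIONS.get(model)
    if expected is None:
        # Unknown model: this check alone cannot decide, so let it pass.
        return True
    # Accept either orientation of the reported resolution.
    return (width, height) in expected or (height, width) in expected


if __name__ == "__main__":
    print(is_plausible_device("iPhone 11", 320, 240))
    print(is_plausible_device("iPhone 11", 1792, 828))
```

As the comment notes, a check like this stops working once spammers learn to report a matching resolution, so it would only ever be one signal among many.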

Some fingerprint-collecting SDKs even use native code to check ARM-specific instructions to tell whether the device is fake. The parameter checks have to be done on every important API call; otherwise spammers can easily pretend to be good citizens during the parameter-checking process and then swap the session to a cheaper VM/phone, or spam the targeted API with scripts.
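One way to picture the "check on every important API call" idea is to bind a session to a digest of the device parameters seen at login and compare it again on later calls. This is only a sketch under that assumption: `SessionGuard`, `fingerprint_digest`, and the parameter names are hypothetical, and a real SDK would collect far more than two fields.

```python
import hashlib


def fingerprint_digest(params):
    """Hash the reported device parameters in a stable key order."""
    canonical = "|".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SessionGuard:
    """Remembers the fingerprint seen at login and re-checks it per call."""

    def __init__(self):
        self._digests = {}  # session id -> digest recorded at login

    def register(self, session_id, params):
        self._digests[session_id] = fingerprint_digest(params)

    def check(self, session_id, params):
        """True only if the session is known and its parameters are unchanged."""
        return self._digests.get(session_id) == fingerprint_digest(params)


if __name__ == "__main__":
    guard = SessionGuard()
    guard.register("s1", {"model": "iPhone 11", "os": "13.5"})
    # A later call from a swapped-in VM reports different parameters.
    print(guard.check("s1", {"model": "generic-vm", "os": "13.5"}))
```

An exact-match digest is deliberately strict; parameters that legitimately change during a session (IP address, battery level) would need to be left out or compared more loosely.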

Chinese companies all have their own teams dealing with fraud and spam on a daily basis, in the same way that everything can be faked in China.

Think cyber attacks from Chinese IPs are bad? Now imagine doing business in China where all the users of your product are bots. What methods do you have to filter out the real human users? Good luck.

Many ad network SDKs collect user data in the same way. Otherwise it's easy to spoof clicks and page views.

I'm not saying whether it's the right or wrong thing to do; I'm just saying it's how things are done in the current state of the business.

"Other apps you have installed" is also extremely revealing about the user's interests and thus very valuable for ad serving. And equally privacy invasive. Think about pregnancy tracker or religious scripture apps. The OS should not expose this data within the sandbox of other apps by a different developer.

> "Other apps you have installed" is also extremely revealing about the user's interests

Yes, ad SDKs embedded across different apps can provide detailed aggregated information. Apps also promote each other; "channel distribution" is a huge business and relies on apps acknowledging each other.

I strongly suspect that many of the TikTok reverse-engineering findings will turn out to be third-party ad or anti-fraud SDKs, which Chinese companies often use.

Yes, ad SDKs are another thing that is ripe for "disruption". And by disruption, I mean the DPAs fining each and every app developer that uses any of them until the SDKs stop doing that.

This is useful information. I assume the worst about Chinese technology. Your explanation helps me understand and forces me to examine my biases.

I think it's best to try and eliminate prejudice as much as possible, but it is weird when it turns out that the reason your server is down is because the Taiwanese version is being attacked by Chinese IPs.

What’s MCN?



It's an established term for an organization affiliated with YouTube/Twitch/TikTok/Instagram, etc.

It's an acronym which fell too deep into its niche; a check word used to advertise one's legitimacy as a person on the other side of the curtain.

Mainland China? I'm confused too...

HN seems guilty of frequently using obscure acronyms. Is this an SV culture thing? Is it that hard to type things out or use a text-expanding app?

What’s SV? (seriously)

Silicon Valley.

People deeply immersed in some technical niche or subculture tend to use acronyms and other jargon unconsciously. That said, unexplained acronyms (I also wondered and found no likely definition in a DDG search) are an exceedingly poor communication practice.

As far as I know it's an American cultural thing to use an acronym whenever possible.

Particularly military-sounding ones for some reason, MILACS perhaps.

Are fake users really a problem or is the business model fingerprinting users?

Any kind of system where you can translate HTTP requests into money will have a problem with fake users. It might be click fraud for ads, sending spam to web forums, or liking YouTube videos to get the videos recommended.

For some of these operations, you can just work off of the content. Spam messages need to advertise something, so the text needs to look very different than for legit posts.

But something like an upvote or like? It's a single bit of information, you can't say if it's legit or fraudulent in isolation. So then you need to come up with additional signals to cluster on from wherever you can.

Some of it will be behavioural (these 10k users only liked these spammy videos), but a lot of it has to be environmental.
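To make the behavioural side concrete, here is a rough sketch that flags accounts whose likes go almost exclusively to videos already marked as spammy. The function name, the thresholds, and the idea of having a pre-existing set of flagged videos are all assumptions for illustration.

```python
from collections import defaultdict


def suspicious_likers(likes, flagged_videos, min_likes=3, threshold=0.9):
    """Return users whose share of likes on flagged videos meets the threshold.

    likes: iterable of (user_id, video_id) pairs.
    flagged_videos: set of video ids already considered spammy (assumed input).
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for user, video in likes:
        totals[user] += 1
        if video in flagged_videos:
            flagged[user] += 1
    return {
        user
        for user, count in totals.items()
        # Skip users with too few likes to say anything about them.
        if count >= min_likes and flagged[user] / count >= threshold
    }


if __name__ == "__main__":
    likes = [("a", "v1"), ("a", "v2"), ("a", "v3"), ("b", "v1"), ("b", "x")]
    print(suspicious_likers(likes, {"v1", "v2", "v3"}))
```

A signal like this says nothing about a single like in isolation, which is the point made above: it only becomes useful once accounts are looked at in aggregate and combined with environmental data.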

Blatant spam messages get removed by NLP pipelines, so spammers evolve to be more implicit. For example, the spammer's avatar contains an image of a cryptocurrency ad, and the spammer "likes" your video. You get a notification, and you notice the avatar.

Had to OCR all those god damn avatars.

I can confidently say, having rubbed shoulders with the botting / gold farming / hacker community in China, that it is HUGE. Like YYYUGE! Back when the whole ride-hailing industry was still in a subsidy war, you had drivers who were spoofing their rides to get kickbacks from Didi/Uber. Pinduoduo suffered a pretty big loss a while back from bot accounts scalping subsidies. I have friends on my WeChat who used to share their huge walls of remotely controlled phones. So yes, it is a huge problem in China. There is even a phrase for it, 撸羊毛 ("pulling wool off the sheep").

I worked on a social platform some years ago. Once the population was somewhat 'mature' after the initial growth spike, the majority of new signups were fake. "Cleaning up" fake accounts was a big part of the job for the customer service team, as was building automation tools to support that task.

If visibility on your platform is somewhat commercially relevant, then you will have lots of people pushing fake accounts for various goals. And if you ignore them, then the more technically competent ones will set up offers to sell access to fake accounts on your platform, so that they will be abused also by actors who don't have the ability to create thousands of fake accounts on their own.

It's not a problem if your platform is not profitable, or if the crowd is not worth spamming.

Do you know why Apple or Google doesn't provide this kind of unique ID? Applications would be far less invasive, so it's in their interest.

> Do you know why Apple or Google doesn't provide this kind of unique id ?

This picture explains


In China, real iPhone/Android devices can be recycled at minimal cost. Anyone can rent a fleet of real devices, and RCE software can then execute any kind of task you want in a real app on a real phone. So even if Apple or Google provide some kind of unique ID (iOS already has something like identifierForVendor), the spammers emulate a real user's app download, registration, and login process, thus obtaining a real ID. So what can you do with the ID?

So the obvious solution is to check for more user information beyond a simple ID: your IP, MAC address, Wi-Fi router address, other processes the OS is running, device parameters, etc. And privacy is f??ked in the process.

They have (had?) unique IDs. But they don't solve the problem of identifying real users. You can simply fake them or take a real one from somewhere. The information density of a single value is far too thin to be a reliable indication of whether someone is a real human or just a script.
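One simple way to see why several weak signals beat a single ID is to add up log-likelihood ratios, naive-Bayes style. This is a sketch only: the signal names and ratio values are invented for illustration, and treating the signals as independent is a strong simplifying assumption that real anti-fraud systems would not rely on.

```python
import math


def fake_log_odds(signals, likelihood_ratios, prior_odds=1.0):
    """Sum log-likelihood ratios of the signals that fired.

    signals: signal name -> bool (did this check fire?).
    likelihood_ratios: signal name -> P(fires | fake) / P(fires | real).
    Both the names and the ratios are hypothetical inputs.
    """
    log_odds = math.log(prior_odds)
    for name, fired in signals.items():
        if fired and name in likelihood_ratios:
            log_odds += math.log(likelihood_ratios[name])
    return log_odds


if __name__ == "__main__":
    ratios = {"resolution_mismatch": 20.0, "datacenter_ip": 5.0}
    one = fake_log_odds({"resolution_mismatch": True}, ratios)
    both = fake_log_odds(
        {"resolution_mismatch": True, "datacenter_ip": True}, ratios
    )
    # Two independent signals push the odds further than either one alone.
    print(one, both)
```

A positive result means the evidence leans towards "fake" under these made-up ratios; a lone ID contributes nothing here because a real, recycled ID looks the same for a bot and a human.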

The worst part about this is that it does not require GDPR consent because it is genuinely necessary to store such client data (to detect fake accounts). Same as you don't need GDPR compliance to store someone's address in an online store just to deliver a package to them.

PS. Someone correct me if I'm wrong.

For any personal data collection, explicit consent from the user is required. An online store can only collect data necessary for the purpose of the service. So it has to ask for my explicit consent to legally collect personal data required to deliver a package like for example my name, address and phone number, but can't ask me about say, my marital status or religion.

Any ideas as to why Apple and Google aren't held accountable for providing trusted means of identifying users as human beings? They aren't banning TikTok because they know that there isn't malicious intent and that TikTok is doing the dirty, but necessary work.
