Socioeconomic group classification based on user features [pdf] (uspto.gov)
289 points by isp 9 months ago | 109 comments

This is written so broad... facebook basically has just patented statistical classification if its output are socioeconomics?

How can this be legal? Facebook did not invent machine learning. How can they patent it?

It hasn't been challenged yet, and it likely won't be. Basically, Facebook (and other companies) patent stuff so that if they are sued they can countersue. Essentially, this causes a kind of nuclear war between the companies (think Apple and Samsung).

The whole point is to avoid conflict. Even if it's not enforceable, it might cost a few hundred thousand to a million dollars to fight it.

and non-nuclear-powers (companies w/o a large portfolio of patents) just get arbitrarily fucked? what a system!

The best solution, given these nukes are artificial, is to get rid of them. With real world nukes, you can't put the genie back into the bottle, but in this case you can remove the government distortion that is causing this arms race.

Having legal "nukes" can also be a good thing, even for David when they're going against Goliath. I was recently involved in a legal dispute with a large corporate landlord. I was able to line up enough potentially illegal civil and criminal acts that I'm pretty sure their lawyers just decided it would be cheaper to settle with me.

that's nice. doesn't really inform the decision on software "nukes" though, does it? it might be that regulation/'distortion' is appropriate in one case, but not the other...

In some ways, patents loosely correlate with R&D spend. So when a new player enters a mature field, they'll lack patents and end up having to license patents from companies that did the hard R&D.

It's a way of allowing R&D externalities to be captured.

The system doesn't work all that well, but that is the intent.

Except when some douchebag grants a patent on the double click, or a phone with rounded corners.

I think these companies just don't get involved in nuclear confrontations of this kind - it's a costly business; this is a game for the big suckers. You still might want to get a few patents of your own in order to prop up your own valuation (if you want to be bought out by a bigger player).

These companies unwillingly get involved in nuclear confrontations of this kind as soon as one of their products ends up looking remotely like it could eat some of the big suckers' lunch.

BigCos routinely use patents against small players. It isn't a huge war only because the small player can't fight back in kind.

On the contrary, patents are one of the few mechanisms available to small players that 'BigCos' respect.

That's how MAD worked during Cold War as well, isn't it?

Can happen. I once worked for a large engineering company and a startup competitor emerged with a cool new innovation.

We very explicitly discussed that we would either acquire them or use the patent portfolio to shut them down.

This is not a patent. It is an application for a patent. The claims may be rejected during the application examination process, for example, if the examiner determines that the claims are obvious in light of the prior art.

Let's be honest, the patent will be granted. Patents are easy to get. Then everyone else ignores the vast majority of them in the real world.

It depends on the art unit processing the application[0]. If your application is classified as a 'software-related business method', your chances drop dramatically.

[0] http://www.ipwatchdog.com/2015/05/21/hardest-easiest-art-uni...

> Let's be honest, the patent will be granted.

Is there any kind of public comment period where the patent's claims to novelty could be disputed? I feel like a lot of the worst patents may be granted because no one else was watching.

You might even be able to find prior art someplace you don't expect, like some sociology academic paper.

I'm not sure if it's at all effective, but Stack Exchange has a product for that: https://patents.stackexchange.com/

That was once true about software patents, but it's no longer true. I've seen some legitimately interesting / intellectually surprising software patents get denied not because they weren't novel or industrially useful, but because they lacked an inventive step.

Inventive steps are hard to come by if math isn't patentable and the fundamental principles of software creation aren't really changing.

Even if it gets granted, there's prior art. The big TV and movie analytics firms have been extrapolating socioeconomic data from viewing patterns since at least the early 2000s

How much money does it cost to apply for a patent? Because if the cost is negligible, then an exploit would be to flood the patent system with dumb patent applications until one of them is accepted. Hell, maybe a machine learning algorithm could be trained to produce trillions of gibberish pdf files, until one of them makes sense, then you can claim the invention.

My patent cost about $10k all in. I filed the provisional application myself; then I found an investor who bankrolled hiring a lawyer.

I recently did not renew the patent and have allowed it to go into the public domain.

The patent system requires people to review the patents before they are accepted. Overloading patent reviewers with garbage patents isn't going to do you any favors. It'd be a bit like DDOS'ing yourself...

Broadness is not a bar to patentability, as long as that breadth doesn't read on other claimed art and isn't obvious. There might be other reasons to invalidate the patent, but that's not one of them.

Given that the parent said "facebook basically has just patented statistical classification if its output are socioeconomics?", I think the commenter's point is that the breadth does include things that have massive amounts of prior art and are thus obvious.

You can patent specific applications of existing technology. But this particular patent may not pass the Section 101 abstract idea hurdle. “Use machine learning to determine socioeconomic status” is pretty abstract, and could be unpatentable under the Supreme Court’s Alice case law. Under that case law, something seemingly directed at an abstract idea may be patentable if it has an inventive concept (reflected in the claims) that goes beyond the abstract idea.

If the claims pointed to specific features and specific processes that gave you more reliable determinations than prior-art methods, that'd probably be patentable. (Leaving aside whether it should be, it probably would be). But machine learning presents an odd situation. With machine learning, you don't iterate on an algorithm, improving it until it generates reliable results. You feed data to a machine learning algorithm, which then infers e.g. which features are particularly probative of the desired classifications. I would wonder whether even specific applications of machine learning would be patentable under Alice, since the computer rather than the patent applicant makes the inventive inferences.

"Use machine learning to determine socioeconomic status" is basically the prompt for every econometrics term paper.

This is analogous to patenting image classification, in the abstract. It's completely ridiculous.

Patent a very specific model for determining socioeconomic status? Sure. Patent the process in general? No fuckin way.

> Patent a very specific model for determining socioeconomic status? Sure. Patent the process in general? No fuckin way.

That's my point. If you're using machine learning, isn't the "specific model" just the output of the machine learning algorithm? And if so, even that may not be patentable, because the applicant didn't create the inventive parts, the computer did.

Not necessarily. I'm not talking about the weights of a neural network per se. I'm talking about something like a specific neural network architecture, or a specific feature engineering procedure.

You're supposed to write a patent as broadly as you can while still being a non-obvious improvement over the prior art. That lets your monopoly apply as broadly as possible and makes it more likely that you can successfully prosecute an infringer.

This seems like Facebook's trying to break more into the realm that the big credit agencies have occupied re: data products.

All the big credit agencies (Experian, Equifax, Transunion, etc) have products that cluster people into socioeconomic groups, with ridiculous names.


Examples from that brochure:

"American Royalty" "Small Town Shallow Pockets" "Full Pockets, Empty Nests"

Letting them hide behind their branding as "credit agencies" is ridiculously naive in this day and age. They are surveillance companies. Faceboot's only difference is being a relative newcomer compared to the incumbents.

This entire surveillance industry is an escalating crime against humanity.

If you're looking for the descriptions for the categories, here is one link: http://missioninsite.com/PDF_Files/Mosaic_Descriptions%20Gro...

Most commenters seem unclear on how patents work in the US, so here's a brief summary:

1. Patents start out as patent applications (what this document is). A patent application by itself doesn't give you any legal rights.

2. After waiting a year or more, your patent application gets reviewed by a USPTO examiner. The examiner either approves it and grants you a patent contingent on you paying a fee, or (more likely) tells you what's wrong with your patent application or cites prior art that already does what you claim and asks you to make changes.

3. All parts of the patent application except the claims section (which is the very last part) are for background and explanation only. They don't determine what you're patenting or whether someone is infringing your patent. These sections are supposed to be clear enough that someone with ordinary skill in whatever field your invention comes from could read these parts of your patent and reproduce your invention.

4. The claims section (which starts with "What is claimed is:") is very carefully worded to be as unambiguous as possible and carefully lay out exactly what the invention is and what its scope is. Any good patent agent or attorney will make these claims as broad as possible while still being patentable so that you get the broadest monopoly possible and have the greatest chance of successfully prosecuting an infringer.

5. The claims section is the only part that matters for determining infringement. The text in the rest of the patent and the figures show what the invention is so that someone can reproduce it, but they aren't used to determine the scope of your invention.

It appears that FB [edit: is attempting to patent] a questionnaire that allows them to guess your socioeconomic status from, for example, where you live, how old you are, and how many internet-connected appliances you have. Another argument for overhaul of the US patent system, and to delete the FB app.

Facebook patent application (US 20180032883) for "classifiers input information about a user and output a probability that the user belongs to a given socioeconomic group".

Relevant Twitter thread: https://twitter.com/WolfieChristl/status/960630738256367617

Copy-and-paste friendly version of the patent application, but without images: http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=H...

Limited news coverage, e.g., http://expressnewsline.com/2018/02/05/facebook-to-develop-sy...

Full disclosure: repost of https://news.ycombinator.com/item?id=16315606 (from 70 days ago). Reposted by explicit request of the mods. May now be exceptionally topical in light of the Facebook revelations that have occurred since.

A patent doesn’t grant the assignee the right to use a technique. It grants them the right to keep others from doing so. Whether or not this patent is granted, a separate mechanism, such as regulation or legislation, would be necessary to prevent Facebook from employing the methods it claims, unless someone else, who could afford to litigate it and wouldn’t license to Facebook (a “non-practicing non-licensing entity” (NPNLE), or “benevolent troll”?), had patented it first.

This could be really useful for targeted public health messaging and services.

TV stations need to broadcast a certain number of hours of public good programming. Online ads should show a percentage of ads for public good.

An algorithm like this could do good, since it may have better sensitivity and specificity than self-reported data, but it would need to be used by non-FB people to make sure that it’s really for the public good rather than just having the appearance of public good.

... and ketchup counts as a vegetable in a healthy school lunch. Good luck with those "public 'health' messages".

No, it doesn't. That proposed guideline, as ludicrous as it was to even be proposed, was rightly rejected and never went into force.

And this is why an expert propagandist understands that you don't need to prove anything. You just say it (preferably with some emotionally-wrapped slogan), and three to ten years later, people will simply recite what you've said as though it is fact.

Of course, you can pad it in weasel words and boring facts. Nobody will remember those parts of the message anyway.

Wait, is this controversial because of what they are doing with the data? Or because they are trying to patent something that shouldn't be patentable?

It’s controversial because people don’t understand patents.


Fig 2 basically encompasses what most machine learning production services look like, minus the socioeconomic classifier. Should we all start trying to patent our classifiers now?

The claims section (at the end of the document) is what is used to determine infringement. The rest of the document (including the figures) just illustrates what the invention is and describes how to make it.

And the claims section basically says 'collect online data, compare it to a trained socioeconomic classifier and classify said data into socioeconomic groups'.
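To make that summary concrete, here is a minimal sketch of what "input information about a user, output a probability per socioeconomic group" could look like. Everything in it is an assumption for illustration: the feature names, the group labels, and the weights are made up, and it uses a plain softmax over linear scores rather than whatever the application actually describes.

```python
import math

# Hypothetical per-group weights; invented for illustration, not taken from the application.
WEIGHTS = {
    "working": {"bias": 0.5, "owns_home": -1.0, "num_devices": -0.3, "age_over_30": -0.2},
    "middle": {"bias": 0.0, "owns_home": 0.5, "num_devices": 0.1, "age_over_30": 0.3},
    "upper": {"bias": -1.0, "owns_home": 1.0, "num_devices": 0.4, "age_over_30": 0.4},
}


def group_probabilities(features):
    """Return a probability for each group via a softmax over linear scores."""
    scores = {}
    for group, weights in WEIGHTS.items():
        scores[group] = weights["bias"] + sum(
            weights[name] * value for name, value in features.items()
        )
    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exps = {group: math.exp(score - max_score) for group, score in scores.items()}
    total = sum(exps.values())
    return {group: value / total for group, value in exps.items()}


# Hypothetical user: homeowner, four connected devices, older than 30.
probs = group_probabilities({"owns_home": 1, "num_devices": 4, "age_over_30": 1})
most_likely = max(probs, key=probs.get)
```

In a real system the weights would be learned from labeled data rather than typed in by hand, which is exactly why the "who made the inventive step" question upthread is interesting.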

take any invention patent from before computers. any.

a machinist can just say "all the steps to produce that device can be done in a lathe. should we all start to patent any part we turn in a lathe now?"

and while I agree that all patents are bad, the answer still is Yes.

Personally, even with the recent news, I'm still having mixed feelings about the whole ordeal.

On one side, I'm not against using/collecting data for the purpose of better content (=better engagement) or more targeted ads (still better than looking at ads that have nothing to do with me). I am for it being anonymized (i.e. separating the personal data, like names, from the generic things, like items bought), though that is pretty much useless from the moment you enter your name anywhere, as they can basically fingerprint the data and match it up to you anyway.

On the flip side, I don't like the direction this is all going. Sure, I like tailored content, but at what point do we reach a threshold where people just have content shoved down their throats because they can't find stuff outside these mechanisms to broaden their views and interests?

More related to this topic: as long as the data is there, someone will analyze it, and socioeconomic targeting has been done for a while already (you don't market 8K curved surround sound TVs to 90-year-olds); this is just a step up. However, I'm a bit scared of when we reach the tipping point on content in terms of targeting: what we might want to see vs what it might want us to see.

I'd rather have untargeted ads.

When I read a printed tech magazine, I get ad targeting that may be interesting for me, just because of the content I'm reading. There is no other information flowing back to online advertisers, or, to be more realistic, person tracking companies, which is what they really are. The content should be enough to determine the ad category, which would suit your desire for more engaging ads.

There's no reason why online publications can't do that as well. If I read a blog post about food, show me an ad about the coolest frying pan on earth and also one about the one with the best price-to-performance ratio.

Show me ads for a Porsche even if I can't afford it. I may be able to in a couple of years, and in the meantime the ads may have told my subconscious that this Porsche should be my goal.
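A rough sketch of that kind of contextual targeting, where the only input is the text of the page and nothing about the reader. The keyword sets and the ad inventory are hypothetical placeholders; a real publisher would presumably use something better than keyword overlap.

```python
import re

# Hypothetical topic keywords and ad inventory, for illustration only.
KEYWORDS = {
    "cooking": {"food", "recipe", "ingredients", "kitchen"},
    "python": {"python", "programming", "code"},
    "cars": {"car", "cars", "porsche", "engine"},
}
AD_INVENTORY = {
    "cooking": ["frying pan", "kitchen knife"],
    "python": ["Python book", "online Python course"],
    "cars": ["sports car"],
}


def pick_ads(page_text):
    """Choose ads from the page content alone; no user data is involved."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    overlap = {topic: len(words & kws) for topic, kws in KEYWORDS.items()}
    best_topic = max(overlap, key=overlap.get)
    if overlap[best_topic] == 0:
        return []  # Nothing on the page matches any topic, so show no targeted ad.
    return AD_INVENTORY[best_topic]


food_ads = pick_ads("My favorite food: a simple recipe with five ingredients")
no_ads = pick_ads("Nothing relevant here")
```

Note that `pick_ads` takes no user identifier at all, which is the whole point: the ad category follows from the content, the way it does in a print magazine.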

I don't mind ads like that so much because they don't hijack my thought process.

If I'm researching about food, ads about where to buy ingredients or kitchen tools will be integrated into my thinking lines without problems, and that's fine. I might even click them if they seem legit and honest and fit to what I'm trying to do.

If you start showing me, say, ads about the newest movie or holiday destinations, if I allow them, they will derail my thoughts into different areas. I hate that, and that is a large part of why I block ads too.

edit: There is even a place where I like advertisements. In the work area, please do inform me what new developments your company has made that can help me with my developments, and do inform me what your company sells, roughly, so I can get back to you when I do need that UV CCD.

This is possible for focused websites (like the website about food you mention), but not really for something generic like Facebook. Unless they start guessing what you are interested in based on your behavior, which is kind of what they do now.

Then maybe the business model for these generic sites should be toast. No entity needs the ability to know me in such a fine-grained way as to be able to target me out of context and influence me for consumerist (or even political) reasons.

If I am in the mood for car prOn - show me Porsche or stuff. If I read about Python - show me current books on the topic. Or Udemy courses. Or what not.

If I currently read about how to get rid of pimples - show me beauty products.

But don't try to influence me into buying the latest camera gear when I am interacting with my animal protection buddies over Facebook.

That is also a good view on it. I would like it the most if online marketing was flipped - no networks but the media agencies have to serve related content and make ad deals based on those, hence much less cross marketing crap. However I see also why businesses don't do it as the conversions would probably be much lower with that approach. Also I think the auto industry, since you mentioned it, is closer to this approach as they condition users from a young age to pick a brand they like.

Ads are a [D]DOS on your brain computing power. Targeted or not.

That's true, and I generally dislike them, so I run an ad blocker and some limits on JS. However, I have a bit of a broader view on the topic. For one, we wouldn't have many of the things we have without the economic backbone of the internet: ads. Google Maps? No. Email services? No. And many more, as they exist just to drive user engagement to serve more ads or to lock you into the rest of their ecosystems, hence Android and iOS and their integrations with the services. However, I don't mind at all some higher-quality marketing, like technical blogs with some real value plus a plug for their service and how it solves the problem, and similar sponsored content. There are almost always two sides to every story.

Email predates Google. https://www.openstreetmap.org/ does fine without ads. The ad-based business model makes it harder to develop competing essential information tools that are also economically viable.

Email predates Google; however, I don't think the services would be anything alike without the bigger players. Initially email was just a communication tool, but it quickly evolved to be sold as a feature by various hosting providers or was "thrown in" to a more general package just to reach feature parity in an attempt to bring in more people to use the service. This is not really directly related to online ads, but it shows that it gets commercialized one way or another.

OpenStreetMap is a great example, along with Wikipedia and numerous open source projects which are either backed by foundations or collect donations and membership fees. The broader point here is that while this is a better approach for the world, as long as there is the possibility to earn a buck on some service, there will be a commercial version of it. Guess capitalism just works that way.

1. This is just an application, not a granted patent.

2. The title barely matters at all. The claims define what they’re trying to patent.

I have 2 questions:

-Why do they feel the need to patent this? The algorithm looks trivial.

-How is this even acceptable? Making decisions based on such "predictors" should be (and often is) illegal.

>Oh, you're physically disabled? I predict this will cause you emotional distress and you will perform badly in my college. Sorry, you're rejected.

>Oh, you're black? Sorry, my predictors predict you're a criminal! Don't worry, the police will arrive soon. You are free to use other banking institutions.

>Oh, you're poor? Sorry sir, we don't sell cars to poor people, because our predictors say they typically use our free repair services! That costs us money, please buy a used car from craigslist or something.

I can't comprehend how they could publish and acknowledge using this kind of profiling. And that's without getting into politics.

Quite the incriminating patent title, and it's not even necessary. Couldn't they just title it "group classification based on user features"?

It's like FB is trying to push us in the direction of China[1] and they don't care who knows it.

[1] http://www.scmp.com/week-asia/opinion/article/2131737/chinas...

This is, literally, cyber phrenology. How can Silicon Valley employees support this kind of crap?

This is a tool to assist in automated censorship: oh, this person claiming to be black isn't clicking like a black man, he must be a troll and we must stop fake news!

You do realize how easy that is to compromise, right?


Or more likely... targeted ads.

Why do you think they are mutually exclusive?

Kinda shocked that someone would apply for a patent for a tool that is essentially a decision tree.

Wow. I can't believe they are trying to patent this...

How do I find out if I'm one of The Poors or not?

Do you Share stuff from pages designed to be viral, e.g. ladbible?

Does your profile picture contain anime/cat/dog features on your face?

If the answer to both is no, you're clear.

Want to see this applied to Warren Buffett, who lives in a reasonably modest house, in a modest city, and shuns computers.

That's not how statistics works. They're classifying groups, not individuals.

Really? Could you please elaborate?

I would think they are classifying each individual into one of many groups, each group being an arbitrarily delineated social class. If that's true, then Mr. Buffett would indeed have the potential to be misclassified, because his behaviours ("features") are atypical of people as rich as him.
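A toy illustration of that failure mode, assuming a hand-written decision tree over two made-up features. The thresholds, feature names, and both example profiles are invented; nothing here is taken from the application or from any real person's data.

```python
def classify(user):
    """Hand-written decision tree; thresholds are hypothetical."""
    if user["home_value_usd"] > 1_000_000:
        return "upper"
    if user["num_devices"] >= 3:
        return "middle"
    return "working"


# Two invented profiles with the same (unobserved) wealth but different behaviour.
typical_rich = {"home_value_usd": 5_000_000, "num_devices": 8}
atypical_rich = {"home_value_usd": 650_000, "num_devices": 0}

typical_label = classify(typical_rich)    # lands in "upper"
atypical_label = classify(atypical_rich)  # lands in "working" despite being wealthy
```

The classifier only sees the proxies, so anyone whose behaviour is unusual for their actual group gets assigned to the group they resemble.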

They may also look at the socioeconomic status of the people you associate with and make inferences.

"... socialeconomic group of users ...'

I'd assume they are already using this. FB ads already allow you to target ads based on household income.

It's just another metric for targeted advertising.

Just ask if they like goldeneye or metal gear solid

Or ask if they like Napoleon Dynamite: https://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html

Is this ethical?

Facebook and being ethical generally are mutually exclusive.

Well it's marketing, so no.

An ethical spin can be made on it, as Zuck did in his testimony. He is making advertising more efficient for small businesses, which helps those businesses grow. That is a good goal, and I believe this to be true. We just need to decide if the juice is worth the squeeze.

Mods: can you change the title, please? This is a patent application; no patent has yet been issued.

Yes, we've updated the headline. Thanks!

Is this a bad thing? Advertisers want this. It does not make sense to market luxury cars to the poor, for example.

It's easier to "clickbait" a poor person with a luxury car than a rich person.

It is not a bad thing in itself, because in the end, yes, people will be discriminated against at the counter by the amount of money they have. Of course no one should get stuff for free.

It is bad in general, because people will be boxed in by their socioeconomic status and will have fewer opportunities to see better things. It can reduce economic mobility if some businesses don't want to deal with people as classified by automatic classifiers, even though those people might have more money, or be smarter, than their FB history suggests.

> It does not make sense to market luxury cars to the poor,

Luxury cars aren't the concern. It's tumbrels we're worried about.

It's a good thing for business, yes, but there are a lot of ethical implications surrounding socioeconomic/behavioural predictions. I've been getting the feeling recently that Facebook has successfully circumvented decades of regulation on human studies, since their research isn't (correct me if I'm wrong) governed by an Institutional Review Board.

This has powerful implications, I believe, in the United States and other nations that are made to hold the tacit communal belief that they are a largely middle-class society.

Certainly, you could classify these groups in a closed process and operate to exploit their individual wants and desires, which is something existing models of capitalist industry engage in already. However, what if Facebook were to do this all transparently? For example, being sold something with the explicit revelation that your economic class is the primary consumer?

The patent seems to be based on defining class solely on the dollar value of last year's income, whereas the actual measures of class membership take other forms.

There's nothing wrong or useless about advertising based on W-2 income level; but that won't have greater sociological effects on economic class or whatever.

Of course via the magic of widely shared credit reports, everyone already knows actual class breakdowns based on class differences in personal balance sheets and other measures.

I wish people wouldn’t post direct links to patents.

The advice from the IP lawyers at every company I’ve worked for (Microsoft, Dropbox, Amazon, Google) has been to never read patents or patent applications.

It's still better to link directly than to "tech gadget news" websites (that are so popular on HN): their writers understand even less.

Yeah, it’s definitely a toss up, but because I can’t/won’t read a patent, but might read a discussion about it, in this case the techjunk article is counterintuitively preferable

I'm honestly curious what the aversion is to reading patents and patent applications. Is it to avoid suspected patent infringement by being able to claim ignorance? i.e. "I have not read any patent that relates to some work that I've created wholly on my own that out of coincidence achieves the same as a patented work." Does that actually hold up if defending IP?

> Does that actually hold up if defending IP?

Yes. You are subject to triple damages if you read the patent, so it's foolish to read patents unless you have to. It's impossible to read all patents, and patents in almost all cases are useless, so "I didn't read the patent" is the normal case for software developers.

Patents make no sense for software, and should be abolished for software.

Yeah, as osteele points out, it moves the needle from unknowing infringement to willful infringement. And willful infringement carries triple damages.

When people have asked “how would they even know?”, there are usually two answers: 1) don’t underestimate what can come out in discovery, and 2) no lawyer is going to advise perjuring yourself

Did they explain why?

This is standard legal advice. It reduces the chance that an infringer will be found guilty of a willful violation, which results in treble damages.


Why is that?

Because wilful violation of a patent can make you liable for triple the damages. It's much harder to claim it wasn't wilful if you've read the patent.

I still think Facebook is a US government tool. Look at how it was spread in 2006-2010 on all the foreign media channels financed by USAID. Most people in the US still think that USAID is free money for foreigners and want it cut.

This probably wouldn't hold up in court because it's essentially doing something marketers have been doing forever, but on a computer[1].

[1] https://en.m.wikipedia.org/wiki/In_re_Bilski

God I hate patents.

What if one country wants their poor to stay poor, and one country wants their poor to aspire to be rich, so they can take advantage of opportunity for upward mobility?

Why would this patent or the associated technology (which in my view is barely patentable) have any bearing on a government's ability to do this? Governments have been doing this very successfully for thousands of years without the need of a machine learning model.

They were doing this without competition from a machine learning model.

I was reading through the paper, and it read like a female gathering rating their boyfriends or new suitors. The algorithm depicted in figure 2 goes, how old is he? over 30 uh? Does he have a house? Yea? Nice, Where is it? Toronto, WOW.

I shouldn't be surprised that a machine learning neural model is classified on the merit of users who are the nosiest classifiers :).

We haven't patented it, but sociology has studied mating through the lens of socioeconomic classes for a while. What you describe is upper middle class mating! You just forgot to ask if he is an engineer, a physician or a lawyer.

I think this is brilliant. We might discover more about society through the algorithms that spin off from profit based ventures.
