How The New York Times Uses Software to Recognize Members of Congress (nytimes.com)
331 points by beriboy on June 7, 2018 | 127 comments

The example made me lol: "Mitch McConnell (red, almost certainly - confidence = 100.0)"

On that note, they could utilize the box color to match the party affiliation.

No no no! Snapchat filters with a donkey or an elephant!! XD

as long as you don't use that to retrain the software...

It seems like the box color would be used to tell congresspeople apart if several were in the shot.

Yup! That's what we do! (Hi, I'm Jeremy, the developer who worked on this software)
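For anyone curious how a per-person box color could be assigned, here is a minimal Python sketch. It assumes a response dict shaped like the output of Amazon Rekognition's RecognizeCelebrities call (boto3's `recognize_celebrities`), i.e. a `CelebrityFaces` list whose items carry `Name`, `MatchConfidence` and `Face.BoundingBox`. The palette and the confidence-to-phrase thresholds are hypothetical, not the Times' actual values, and a hand-built sample stands in for a real API call so it runs offline.

```python
from typing import Any

# Hypothetical palette; colors cycle if there are more faces than colors.
PALETTE = ["red", "blue", "green", "yellow", "purple"]


def hedge(confidence: float) -> str:
    """Map a 0-100 match confidence to a hedge phrase (thresholds are assumptions)."""
    if confidence >= 99.0:
        return "almost certainly"
    if confidence >= 90.0:
        return "probably"
    return "possibly"


def describe_faces(response: dict[str, Any]) -> list[str]:
    """Build one text line per recognized face, giving each face its own box color."""
    lines = []
    for i, face in enumerate(response.get("CelebrityFaces", [])):
        color = PALETTE[i % len(PALETTE)]
        conf = face["MatchConfidence"]
        lines.append(f"{face['Name']} ({color}, {hedge(conf)} - confidence = {conf})")
    return lines


# Hand-built sample in the RecognizeCelebrities shape (values are illustrative).
sample = {
    "CelebrityFaces": [
        {
            "Name": "Mitch McConnell",
            "MatchConfidence": 100.0,
            "Face": {"BoundingBox": {"Width": 0.2, "Height": 0.3, "Left": 0.1, "Top": 0.1}},
        },
    ]
}
print("\n".join(describe_faces(sample)))
```

Drawing the rectangles themselves (from each `BoundingBox`) is left out; the point is only that the color index follows the face's position in the list, so two people in one shot never share a color until the palette runs out.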

Who came up with the “who the hill” name? That’s amazing and I laughed out loud :D

I did! We had a brainstorming. It was initially (during development) called "Shazongress" but we couldn't get that one out of committee.

Shazongress sounds like something Grandpa Rick would invent!

what about right-center or left-center? ;)

I'm waiting for journalists to walk around with a Google Glass-type device to do this on the fly. Bonus: it could record what they see and hear for later use.

I always think the future of journalism will be something like Garth Ennis's 'Transmetropolitan', where there are cameras (drones) everywhere, watching everything, and an escalating tension (and maybe a technical arms race) between those hiding or burying the signal and those trying to bring it to light. Consider this a recommendation for anyone looking for an inspiring (though somewhat adult) tale of near-ish future sci-fi.

I'll one-up that with Jacek Dukaj's "Black Oceans" - cameras everywhere; you're free to turn them off if you need some privacy, but if anything happens to you while outside the view of the surveillance network, good luck with your insurance claim.

The internet seems to think this is only available in Polish. Any idea if there is a good translation out there?

Warren Ellis.

I can't wait to live in that future: glasses that let you know if you bumped into anyone famous.

Just reminding me of someone's name would be nice... I'm pretty bad at that.

It’s only impressive to remember someone’s name now because we don’t have it projected to us. Once that happens, it will be as impressive as remembering someone’s birthday is now that we have Facebook.

Insert that one Black Mirror episode.

OT, but I feel like this is the HN equivalent of an "evergreen tweet"

obligatory xkcd #<insert number>

Surely removing your need to do that will help, instead of focusing on it as a personal improvement goal!

I get the excitement, but this sounds like a terrifying future for famous people.

It sounds like a fun* game of "Fool the facial recognition!" And it's not just famous people, it's also "undesirables." So you could recognize people on sex offender lists or people wanted for any number of other crimes. You could also have real-life "gaydar."

*: fun here is used somewhat sarcastically

Also for anyone with anti-establishment or anti-Amazon opinions... Millions of feeds uploaded and analyzed by their servers, far away from all regulation. What could go wrong eh?

Also sounds like a terrifying future for child kidnappers or fugitives or anyone on some wanted list.

Whether they deserve to be on that wanted list or not.

I mean, right now we have cameras everywhere, but they’re not all HD, so the govt can’t run facial recognition reliably; plus they need access to all the cameras. But when people voluntarily put facial recognition devices on themselves, en masse? Wow.

Sounds like something Google or Amazon could benefit from practically giving away.

Am I right that sex offender registers are public in the US? And they sometimes include people convicted of very minor crimes?

Yep, definitely a Black Mirror episode in the making.

You are correct.

Cars sounded like a terrifying future for stagecoach drivers.

Just watch Anon on Netflix...

They aren't that famous if you don't recognize them after bumping into them.

Fame is all about context though. I'm reminded of a specific incident when Kanye was leaving a taxi in France, and bumped into some woman. They had a short conversation and she walked away. Paparazzi asked her if she knew who he was, she said "no," they said "that's Kanye West!" She said "who's Kanye West?"

Similarly, I wouldn't recognize the president of Brazil. Or the senators of Idaho... Or the governor of New York...

Well, what's wrong with that? Knowing I walked past the governor of New York doesn't even register. What should I feel?

Influential is probably a better word.

I was hoping to read an article about NYTimes setting up video cameras outside of popular restaurants in DC and using ML to perform facial recognition on everyone to try to find members of Congress and well-known lobbyists. Oh well... it would be like TMZ-4-DC.

Why set up video cameras when people bring their own...

> Rachel Shorey found members of Congress at an event hosted by a SuperPAC by trawling through images found on social media and finding matches.

The author says that training their own model would have been too hard due to lack of training data, but evidently Rekognition had sufficient training data to make it work? Why can't NYT use the same training set Rekognition uses? Does Amazon somehow have a secret non-public collection of celebrity photos?

It shouldn't take an intern too long to collect a representative set of Congress people and other high officials for training. Maintaining it would not be an undue burden. That would eliminate the false positive matches for all the unwanted celebs. Clearly Amazon's models aren't that great to begin with so there's little reason to stick with them.

Wrap it up into a simple native app and you can bypass the MMS BS. Even better, a sufficiently capable dev could integrate an open-source recognition library [1] to have it implemented entirely on the device.

[1] https://github.com/rudybrian/tuFace

Hi! I'm Jeremy, one of the developers.

We'll probably work on something like this for the next version. One reason it's harder than you think: We would have to buy / own rights to the photographs before we could use them to train -- most of those photos are owned by Getty or the AP. And our own photographs are perfectly lit and square, which made them awful for training face recognition.

The other hangup (which I didn't get to in the article) is having to add / remove people. New members are constantly being added and that's a maintenance burden for us. Amazon usually has the new member within a day or two. (Our team is very small and we have a lot of other responsibilities!)

But good points, definitely.

> We would have to buy / own rights to the photographs before we could use them to train -- most of those photos are owned by Getty or the AP.

I think your model would be covered by derivative art... unless you started selling the model itself.

"We would have to buy / own rights to the photographs before we could use them to train..."

Is this actually true?

In the USA I don't think it is, because the end use is transformative [1].

In the UK it would be tortious, because training relies on Fair Use to temporarily store the images in order to extract the facial structure data, and Fair Dealing is really draconian in comparison.

[1] https://www.lib.umn.edu/copyright/fairuse

Really doesn’t matter - the legal team at the NYT thinks it might be, and lawyers exist to tell people “you’d better not”.

And it's our job, as someone who knows what a computer is, to move forward with common sense if they're overreaching, which is their job.

They have every incentive to be as conservative in their advice as possible, and no incentives to "allow" risks. Doesn't increase their compensation any.

But... doesn't Congress maintain some sort of API for available bio data, possibly including photos, that would avoid the maintenance issue brought up in the piece? A quick Google search shows ProPublica has such an API, and it seems to have originally been developed by NYT... https://projects.propublica.org/api-docs/congress-api/

> We would have to buy / own rights to the photographs before we could use them to train

Please do elaborate on who's enforcing this mindset on you/your team.

Rekognition crawled and annotated millions of images of different celebrities to train their face recognition model. Once you have an accurate model for a lot of classes it's much easier to add new ones with just a few samples.

> Once you have an accurate model for a lot of classes it's much easier to add new ones with just a few samples.

This is pretty cool. Do you know of any good references for stuff like this? Not sure what the right topic name would be: online learning? streaming?

This is known as transfer learning, see [0] for an approachable example.

[0] https://www.mathworks.com/help/nnet/examples/transfer-learni...
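One simple form of this reuse is nearest-centroid classification over embeddings from a frozen, pretrained face model: adding or removing a member is just adding or deleting a reference vector, with no retraining. The Python sketch below assumes such a model exists (it is not shown) and uses toy 3-D vectors in place of real embeddings, which are typically much larger. The `FewShotGallery` class and its 0.8 cutoff are hypothetical; this is not a description of how Rekognition works internally.

```python
import math
from typing import Optional

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid(samples: list[Vector]) -> Vector:
    """Average a few normalized embeddings of one person into a single reference."""
    normed = [normalize(s) for s in samples]
    dims = len(normed[0])
    return normalize([sum(s[d] for s in normed) / len(normed) for d in range(dims)])


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class FewShotGallery:
    """Nearest-centroid classifier over embeddings from a frozen pretrained model."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # hypothetical cutoff below which we answer "unknown"
        self.centroids: dict[str, Vector] = {}

    def add_person(self, name: str, samples: list[Vector]) -> None:
        # A new class needs only a handful of example embeddings and no retraining.
        self.centroids[name] = centroid(samples)

    def remove_person(self, name: str) -> None:
        self.centroids.pop(name, None)

    def identify(self, embedding: Vector) -> tuple[Optional[str], float]:
        best_name, best_score = None, -1.0
        for name, ref in self.centroids.items():
            score = cosine(embedding, ref)
            if score > best_score:
                best_name, best_score = name, score
        if best_score < self.threshold:
            return None, best_score
        return best_name, best_score


gallery = FewShotGallery()
gallery.add_person("Member A", [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
gallery.add_person("Member B", [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])
name, score = gallery.identify([0.95, 0.05, 0.05])
print(name)  # Member A
```

The heavy lifting (learning what makes faces distinguishable) lives in the pretrained model trained on many identities; the per-person step is cheap, which is why a service can add a new member within days.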

Transfer learning

Thanks for the replies!

I can't wait to see how long it takes Congress to pass a law making it illegal to use facial recognition software on members of Congress.

(And no one else)

Thankfully, only high capacity assault facial recognition software is likely to be banned as a result.

So you should be able to send a selfie to this API and it will tell you which member of Congress you look most like.

Except in Illinois, where sending the data off device is illegal.

(See previous HN discussion)

If you're going to state something is illegal, you need to provide a source. What previous discussion?

I'm not reaperducer, and I can't find the old HN discussion. But here's an article about Illinois/Texas biometric data laws: http://abc7chicago.com/technology/why-googles-face-match-fea...

It would be fun to see which members are the most requested by NYT reporters.

Oh, that is interesting. Also, hi Jon!

I have wanted for a while to build a site which trained a machine learning system on the various data made available about Congresspeople and on information about members who were eventually found to be guilty of adultery or other similar crimes. It would then produce a score for every member of Congress rating how likely it is that they are cheating on their spouse, or taking bribes, or similar. Give them a sneak preview of the types of systems they are aiding and abetting in the creation of. I am uncertain whether it could be considered defamation to have a brainless machine learning system decide there's an 85% chance some random member of Congress is an adulterer. I don't actually believe that any such system could ever reach any reasonable level of actual effectiveness due to the fundamental complexities of human behavior and circumstance, but that's not stopping the law enforcement side of things from moving forward, so I don't see why it ought to stop the side trying to point out fundamental flaws in the strategy.

I've considered something like that, but instead of trying to figure out crimes, it would produce a score for bills.

A corruption score for bills, almost like a facebook for bills "This bill is friends with Exxon". It would figure out who spent the most getting the bill passed, and who they bought off to get it.

Just a simple thing for people to point to when they say things are corrupt. Granted in today's environment, that score would be 100% most of the time, but it would be interesting to have some idea just who bought the bill.
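The "who spent the most getting the bill passed" step is, at its core, a group-by over spending records. Here is a minimal Python sketch, assuming the records have already been joined from lobbying and contribution disclosures into dicts with hypothetical `bill`, `spender` and `amount` fields (that joining is the hard part and is not shown). The spender names are made up.

```python
from collections import defaultdict


def spend_breakdown(records, bill_id, top_n=3):
    """Rank who spent the most on one bill, with each spender's share of the total."""
    totals = defaultdict(float)
    for rec in records:
        if rec["bill"] == bill_id:
            totals[rec["spender"]] += rec["amount"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no (non-zero) spending recorded for this bill
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return [(name, amount, amount / grand_total) for name, amount in ranked]


# Hypothetical, already-joined records.
records = [
    {"bill": "BILL-1", "spender": "Acme Oil", "amount": 300.0},
    {"bill": "BILL-1", "spender": "Beta Pharma", "amount": 100.0},
    {"bill": "BILL-1", "spender": "Acme Oil", "amount": 100.0},
    {"bill": "BILL-2", "spender": "Beta Pharma", "amount": 50.0},
]
for name, amount, share in spend_breakdown(records, "BILL-1"):
    print(f"{name}: ${amount:,.0f} ({share:.0%})")
```

A share near 100% for one spender is the kind of "this bill is friends with Exxon" signal the comment describes; it says who paid, not whether anyone was bought.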

Some of the coding-friendly news orgs like ProPublica have done one-off versions of this. It'd be great to have an ongoing tool to check the scores at any time.

I’d take it a step further and ingest all public record data including using FOIA requests to find any behavior that could have a representative charged with a crime (fraud, bribery, etc).

As sibling comment said, don’t generate an adultery score. That’s not productive or decent. Find actual evidence of wrongdoing, not draconian scoring systems.

Some folks in Brazil did something like that; IIRC it's called project Serenata de Amor. It uses machine learning to process public government data in search of weird and suspicious expenses, and flags them.

Link: https://serenata.ai/en/
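A very simple way to flag "weird" expenses is a robust outlier test. The Python sketch below uses the modified z-score (median and median absolute deviation), which is a swapped-in textbook technique, not Serenata de Amor's actual method; the expense amounts are made up and the 3.5 cutoff is the conventional default for this score.

```python
import statistics


def flag_outliers(amounts: list[float], cutoff: float = 3.5) -> list[int]:
    """Return indices whose modified z-score (median/MAD based) exceeds `cutoff`."""
    med = statistics.median(amounts)
    mad = statistics.median(abs(a - med) for a in amounts)
    if mad == 0:
        # Degenerate case: most values are identical, so flag anything that differs.
        return [i for i, a in enumerate(amounts) if a != med]
    return [i for i, a in enumerate(amounts) if 0.6745 * abs(a - med) / mad > cutoff]


# Hypothetical meal reimbursements for one official.
meals = [40, 55, 38, 60, 47, 52, 900]
print(flag_outliers(meals))  # [6]
```

Median and MAD are used instead of mean and standard deviation because a single huge expense would otherwise inflate the spread and partly hide itself.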

Let's take it a step further than that.

People already know who is corrupt, who is sexually harassing, etc.

Certainly their victims and co-conspirators know it. Probably their staffers, friends and family have a pretty good idea.

Often reporters themselves know who is dirty but don't have enough corroborating sources to get past the fact checkers.

So it seems like the Wikileaks model could be improved with a crypto market. Those in the know place bets on who is dirty and get a payoff when the dirt eventually gets disclosed.

It would be a nice incentive to get more disclosure, and it would directly reward the victims and leakers.

Great idea, but let's drop the crypto and drop the gambling. Instead, crowdfund dirt that would constitute grounds for impeachment or recall for politicians.

Credible whistleblowers have additional incentive, and it would serve as a good yardstick for just how much folks detest a given official.

Alternatively we could take the crypto and go full Weimar Republic with a Kickstriker[0] clone for politicians!

[0] - http://kickstriker.com/

> let's drop the crypto

Hard to protect leakers, victims and funders when the perpetrators have the power to trace payments and seize assets.

Sure, let's get John and Jane Doe to acquire a secure crypto, securely access a website over Tor, and manage to maintain OPSEC. Then we can have them set up their own Dropbox clone with rsync and some trivial bash scripts!

Alternatively, we can create an easy-to-use site for the common man who is willing to pony up some petty cash for whatever their version of "justice" is. At sufficiently high levels, in sufficiently corrupt societies they can trace things down and disappear you. This idea isn't for those places, and likely wouldn't even work anyway.

However, in nations with reasonably sound rule of law this could potentially work.

> adultery or other similar crimes

Adultery is not a crime.

(You can argue that it's an indicator of a person's character, or lack thereof, sure. But that's something different.)

citation please. (It is in 21 states in 2017)


Today I learned!

> Note: Minnesota's statute applies to both partners if they are both married. If one partner is unmarried the law only applies to the married woman. The law does not apply to a married man and unmarried woman.

That's pretty terrible.

Also the legend is pretty amusing:


Lying about it is an impeachable offense.

Lying is not an impeachable offense, however perjury certainly is.

*to a court only though, no?

Only under oath

I believe it is in New York.

That depends on the state.

It's disturbing to me that you're so focused on adultery, which isn't a crime in most places and is a personal matter for the couple involved. More than 70% of people cheat on a significant other at some point, so you'd be casting a wide net.

Why not instead look at real crimes like pay-for-play, fraud, sexual assault, etc.?

Everyone seems to be missing the point completely. I am not talking about building an adultery detector. I am talking about an object lesson. I am talking about showing Congress that the systems they are building in order to profile, label, and categorize the public based upon spurious statistical models are dangerous. Congresspeople are myopic, selfish creatures. If it doesn't affect them personally, they see it solely as a matter of "is it more power for us? Then the vote is yes!"

Adultery was picked because it is a very common behavior which is nonetheless viewed very poorly by society. It's the kind of thing which historically gets politicians into trouble. Being named as an adulterer is a realistic thing for a politician to fear, as it could ruin their life - the way getting labelled a terrorist or a pedophile or similar will be a reasonable thing for a citizen to fear when systems are debuted that use the same technology to label people with a likelihood score. That is why adultery was picked. If you wanted to do it by pay-for-play, where would you get your training data set? Where are the people who have actually lost their office because of that? The others are the same. We have a long history of politicians getting busted for adultery, so we can have a good training dataset. And the results will be garbage and useless. That's the whole and entire point. Once you have an automated system built, it doesn't matter whether its conclusions have any merit. It will hand out judgements, no one will be able to explain what basis they actually have, but investigations will be targeted and reputations will be destroyed.

My intent isn't even to accomplish any of that reputation destruction, it is simply to show Congress the ill use such systems can and will be put to.

Even if only to prove your point correct with 1 or 2 examples, this would be a great system to create! Let congress feel the pain that so many citizens unfairly feel.

There's a non-moralistic reason to be concerned with adultery in the case of politicians and public servants, which is that knowledge of it can be used for extortion.

Well, there's two sides to that. The other being they may be using their position of power to strong arm less powerful people into an affair, then discrediting their lover to cover their ass. Even if you don't care about that person being mistreated, it tells you something about their general priorities and how likely they are to be generally corrupt.

That's only because it's a crime. Striking those laws down and removing them from the military system would reduce the attack surface for bad actors trying to suborn people into betraying the USA.

Likewise, having a spliff 18 months ago at Burning Man or Glasto isn't really a huge security risk if it's not a crime, and decriminalizing it would also help with recruitment for TLAs.

That's why it's important to keep gay people out of Congress, too?

They wouldn't care if you are gay. But if you are closeted and gay, that becomes something you can be blackmailed over and you will fail to get a security clearance. I don't know what they do when someone who can't get a clearance is elected... but the only reason they ask about lifestyle factors in background investigations is to determine if you're able to be blackmailed. If you're open, then obviously you can't be blackmailed with the fact you're gay or polyamorous or in an open marriage or whatever. It's the secrecy that's the problem. (This is all off-topic though, since I don't actually care about adultery and I don't think the system I'd build could legitimately catch anyone for adultery. It would do nothing but show that people's reputations can be so easily smeared with basically no evidence using these sorts of systems. That's the point.)

> I don't actually believe that any such system could ever reach any reasonable level of actual effectiveness due to the fundamental complexities of human behavior and circumstance...

Absolutely it could - that would all be factored into the percentage. Human behavior and chance encounters are the exact reason you could never say 0% or 100%, however.

How many nodes do you need in your neural net in order to account for the variation of human behavior, the influence of happenstance, etc? When your model fails to converge, do you tell your government customer that you can't actually produce a score and give back the $200 million or whatever you got for the contract to create such a score?

Which existing public data sets would you be using to train against?

How about scoring them on how they really vote? "You say you're a Democrat but our party detector test says that is a lie!" 83/17

That is a dangerous idea. Do you really want everything to be so black and white? Vote down the party lines or else?

No, I just want to see who says they are a hard R or D but really votes middle of the road. Perhaps the site could let me vote too, to see where and with whom I agree most.
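A party-line score like "83/17" can be computed as the percentage of roll calls on which a member voted with their party's majority. Here is a minimal Python sketch; the roll-call IDs, vote strings, and the precomputed party-majority positions are all hypothetical inputs, and abstentions ("Not Voting") are simply excluded.

```python
from typing import Optional

COUNTED = {"Yes", "No"}  # votes that express a position; everything else is skipped


def party_agreement(member_votes: dict[str, str], party_majority: dict[str, str]) -> Optional[float]:
    """Percent of shared roll calls where the member voted with their party's majority."""
    shared = [
        rc for rc, vote in member_votes.items()
        if vote in COUNTED and party_majority.get(rc) in COUNTED
    ]
    if not shared:
        return None  # no comparable votes, so no score
    agree = sum(1 for rc in shared if member_votes[rc] == party_majority[rc])
    return 100.0 * agree / len(shared)


member = {"rc1": "Yes", "rc2": "No", "rc3": "Yes", "rc4": "Yes",
          "rc5": "No", "rc6": "Yes", "rc7": "Not Voting"}
majority = {"rc1": "Yes", "rc2": "No", "rc3": "Yes", "rc4": "No",
            "rc5": "No", "rc6": "Yes", "rc7": "Yes"}
score = party_agreement(member, majority)
print(f"{score:.0f}/{100 - score:.0f}")  # 83/17
```

A raw agreement rate overstates independence on lopsided votes where both parties agree; weighting by how divided each roll call was would be a natural refinement.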

> produce a score for every member of Congress rating how likely it is that they are cheating on their spouse

Sounds like a really mean spirited thing to do. They are people too.

This is an embarrassingly bad approach to face recognition for a small set of frequently photographed people.

Several comments from the article give me concern:

- They seem to think Rekognition is a panacea for their problem, but there are many known issues with Rekognition celebrity detection. Not to mention that the cost-per-request is often highly unfavorable compared with building a higher-accuracy, situation-specific solution with extensions to pre-trained models.

- They say some interns took a “novel approach” by creating a hard coded look-up table for disambiguating similar politician-celebrity pairs. This creates awful tech debt and failure cases. I’m not knocking it too hard because it’s pragmatic, which is a good sign about those interns, but this should be seen as a necessary wart to be improved, not a point of pride.

- As others have pointed out, even considering turnover in Congress, it seems like people who report on Congress for their full time job should recognize them. It truly seems like a silly, wasteful use of resources to solve this with computer vision.
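For illustration, a lookup table like the one described might look like the Python sketch below, with one small guard against the tech-debt problem: a check that every remap target is still on the current roster, so stale entries surface after turnover. All names are hypothetical placeholders; the article does not publish the actual table.

```python
from typing import Optional

# Hypothetical confusion table: names a general-purpose celebrity model returns
# for faces that, in a congressional context, are far more likely to be members.
KNOWN_LOOKALIKES = {
    "Celebrity A": "Member of Congress A",
    "Celebrity B": "Member of Congress B",
}

# Hypothetical current roster; in practice this would be refreshed after each election.
ROSTER = {"Member of Congress A", "Member of Congress B", "Member of Congress C"}


def resolve(name: str) -> Optional[str]:
    """Map a raw recognizer name to a member of Congress, or None if it isn't one."""
    if name in ROSTER:
        return name
    if name in KNOWN_LOOKALIKES:
        return KNOWN_LOOKALIKES[name]
    return None  # a non-member celebrity: drop it rather than report it


def stale_entries() -> list[str]:
    """Remap targets that are no longer on the roster (e.g. after members leave)."""
    return sorted(t for t in KNOWN_LOOKALIKES.values() if t not in ROSTER)
```

Running `stale_entries()` in a test or at startup does not remove the wart, but it turns silent failures into loud ones when the roster changes.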

This is all consistent with what I’ve heard from colleagues at NYT data science. As well as people I’ve known in data science bootcamps around New York, like Insight, who heard recruiting pitches.

Their department seems self-aggrandizing, using highly overwrought personalization models and seeming to have 538-envy for how they want their data science work to come off, despite 538 having left, along with other important figures like Mike Bostock.

It just comes off as a place that wants to do status signalling to seem like a machine learning or data science thought-leader, but they don’t pay competitively or do what’s needed to retain good people and would rather do patchwork stuff like this with interns than to take the work a little more seriously.

I don’t get the impression it’s a place serious ML practitioners would want to go.

Isn't this the same technology that would allow surveillance on every private citizen?

> Most recently, Rachel Shorey found members of Congress at an event hosted by a SuperPAC by trawling through images found on social media and finding matches.

I bet nothing in the technology says "member of Congress" or depends on the target being a member of Congress. So anybody can mine social media and collect surveillance data on people. And that is probably already happening.

TL;DR: They use an API from Amazon that's already trained for Congressmen.

If anything this article doesn't reflect well on Rekognition

>Nope, it’s too hard! Computer vision and face recognition are legitimately difficult computer science problems.

Someone is woefully ignorant of how good facial-recognition surveillance is.

There's a difference between "difficult" and "can't be done". Yes, facial recognition has come a long way, but it's still non-trivial to set up a custom facial recognition service for your particular needs.

The obvious next step would be to build a mobile app with a built-in model that recognizes everyone deemed important using live video from the camera.

Cool. Maybe next they can tackle subscriptions without ads.

This reminds me of Casino Royale. Wow.

Hmmm ... your job is to cover the actions of 540 people elected to DC, many of whom you already recognize, and you can't remember what they look like? I'm not a journalist, but that seems like an essential thing to memorize, along with some minor metadata (locale, party, a bit of bio). Spend a weekend and do it.

Every profession has things you can look up and things you just have to memorize. 540 people isn't much - can sports journalists recognize 540 athletes? Otherwise you'll be in situations where you don't have an opportunity to look them up (e.g., can't get a photo, no time, etc.), and you'll have many false negatives: If you don't know what they look like, you won't realize it's a member of Congress at the party with the coke.

As the article states up top, there's decent churn in Congress, making this more than a one-time or annual thing. Also, it's not just members of Congress who are important to cover in a beat, but their senior staff members and aides.

Spending a significant amount of time developing a process for face memorization and undertaking it would be an example of needless/premature optimization, especially for people who may be covering Congress tangentially. Most of a Congress reporter's job does not depend on having random encounters with members of Congress.

> Most of a Congress reporter's job does not depend on having random encounters with members of Congress.

So much for my fantasy of a reporter's life; press conferences and hearings sound boring. But I will nitpick a minor point:

> there's decent churn in Congress, making this more than a one-time or annual thing

I don't remember the rate at which incumbents are re-elected, but it's pretty damn high. Fortunately, after you memorize them once, you'd only have to learn a few more at a time.

The House turns over a lot. The Senate is a different story, those guys fossilize.

This election the House will have 55 voluntary departures + 2 people who have resigned so far. Assuming ~90% of the rest get reelected, which may be high, that's easily ~100 new members. It varies quite a bit, but in 2010, for example, only 85% won their races.

> The House turns over a lot. The Senate is a different story

I respectfully refer the gentleperson from Spooky to the following:


Few things in life are more predictable than the chances of an incumbent member of the U.S. House of Representatives winning reelection.

They don't provide a number but eyeballing the chart, I think that number starts with a "9" over several decades, and is increasing. Here's an article that says it was around 96.6% in 2014;[0] it must be embarrassing to find yourself in the bottom 3.4 percent of any group.

(It also says House members are reelected more often than Senate members.)

[0] http://www.politifact.com/truth-o-meter/statements/2014/nov/...

You’re totally right, but being a rank and file congressman is kinda miserable... many transition to other offices, federal/state appointments, etc.

Representatives also face 3x as many elections.

Remembering the names of the 150 members of my fraternity, along with their hometowns, big brothers’ names, majors and several facts about them, took me nearly 8 weeks to complete. And to be candid, I still didn’t know about 50 of them. It’s difficult memorizing those things, and I was actively engaging with them on a daily basis and putting work into memorizing their names. I imagine it would be a stretch to remember that many names, many belonging to people you will never talk to in your life, along with their home states and party affiliations. Especially since, as they mentioned, the membership of Congress is constantly changing.

> can sports journalists recognize 540 athletes?

Well... I don't know if that's a fair comparison. Members of Congress don't generally walk around with their names embroidered on their shirts (but, hey, that might be a good idea!)

Also, athletes look like...athletes, with non-facial physical characteristics, such as height and muscle mass, that make it easier to pick them out from a general crowd. A member of Congress is harder to pick out in a room full of other suit-wearing, middle-to-senior-aged people.

The smell, the outstretched hand, the other one in your pocket ...

This sounds like 2018's version of "you won't always have a calculator."

Not all brains have an equivalent ability to recognize faces. Like, "face blindness" is a real thing.

A text-based interface is easiest for reporters to use, so while texting is slow, it’s superior to a web service in the low-bandwidth environment of the Capitol.

This is disturbing to hear. How can our congress make the best decisions possible if it can't access and communicate relevant information quickly? The ROI to the United States of simply having a high-bandwidth network at this global powerspot is so obvious that I had just assumed it was the case—so to hear that reporters can't even use a web interface to quickly send images is frightening if true, and perhaps even indicative of a broader issue of our government's inability to effectively execute, partially rooted in its inability to empower itself with the tools necessary to effectively execute.

* Edited at burkaman's prompt to be less sensationalist

Extremely disturbing, deeply terrifying? Please try to reserve phrases like this for the many actually terrifying situations in the world, not minor inconveniences.

You absolutely do not want members of Congress using an open network at a "global powerspot". Hopefully they use a highly secured network that is not open to anyone who can get press credentials. Seeing an open unsecured network at the Capitol might actually be deeply terrifying.

It's the 3G connection used by the public in one of the basement floors of the Capitol building that has low bandwidth. The Capitol has its own internal network for congressmen and their staffers; they just don't let random reporters connect to it.

Even worse (weirder), the Senate bans electronic devices on the floor. If a Senator wants info, they have to sprint out one of the doors to the lobby where they have an aide waiting with an iPad (usually).

The House allows iPads on the floor, and reporters are allowed to bring laptops into the gallery. It's how we get our live votes transcribed! https://www.nytimes.com/2017/05/08/insider/how-we-beat-the-h...

Interesting, quite different from the House of Commons, where the result of a division (vote) is read out quite soon after.

I have worked at large conferences (500+ delegates) using parliamentary procedure, and they now often use electronic systems for both teller and card votes, which is much faster.

Do you mean the paper work concerned with the days business or being able to look up information quickly before they speak in a debate?
