This is a current limitation, and an artifact of the data+method but not something that should be relied upon.
If we do some adversary modelling, we can find two ways to work around this:
1) actively generate and search for such data; perhaps expensive for small actors, but not for well-equipped malicious ones.
2) wait for deep learning to catch up, e.g. by extending NeRFs (neural radiance fields) to faces; a matter of time.
Now, if your company/government is on the bleeding edge of ML-based deception, they can have such a policy, and they will update it in 12, 18, or 24 months (or whenever (1) or (2) materialises). However, I don't know one organisation that doesn't have some outdated security guideline that they cling to, e.g. old school password rules and rotations.
Will "turning sideways to spot a deepfake" be a valid test in 5 years? Probably not, so don't base your secops around this.
The thing with any AI/ML tech is that current limitations are always underplayed by proponents. Self-driving cars will come out next year, every year.
I'd say that until the tech actually exists, this is a great way to detect live deepfakes. Not using the technique just because maybe sometime in the future it won't work isn't very sound.
For an extreme opponent you may need additional steps. So this sideways trick probably isn't enough for CIA or whatnot, but that's about as fringe as you can get and very little generic advice applies anyway.
It sounded to me like the parent poster wasn't saying not to use it, but simply that it cannot be relied upon. In other words, a deepfake could fail a 'turn sideways' test and that would be useful, but you shouldn't rely on a 'passing' test.
Another way to think of it might be that it can be relied on - until it can't. Be ready and wary of that happening, but until then you have what's probably a good mitigation of the problem.
I think the concern is complacency, and that the inertia of existing security practices leads to security gaps in the future. "However, I don't know one organisation that doesn't have some outdated security guideline that they cling to, e.g. old school password rules and rotations."
Or put another way, humans can't be ready and wary, constantly and indefinitely. At some point, fatigue sets in. People move in and out of the organization. Periodic reviews of security practices don't always catch everything. The reason something was implemented fades from institutional memory. And then there's the cost of retraining people.
The flip side of that is people feeling/assuming there's nothing they can really do with the resources they have, so they choose to do nothing.
Also, those that are actively using mitigations that are going to be outdated at some point are probably far more likely to be aware of how close they are to being outdated, by encountering more ambiguous cases and by seeing the state of the art progress right in front of them.
As for people sticking to outdated security practices? That's a question of whether people and organizations are introspective and examine themselves, and it is not linked to any one thing. We all have that problem to a lesser or greater degree in all aspects of what we do, so either you have systems in place to mitigate it or you don't.
Therefore, developing and customizing a proper framework for security and privacy starts by accurately assessing statutory, regulatory, and contractual obligations, and the organization's appetite for risk in balance with the organization's mission and vision, before developing the policies and specific practices that organizational members should be following.
To use a Go (the game, not the language) metaphor, skilled players always assess the whole board rather than automatically make a local move in response to a local threat. What's right for one organization is not going to be right for another. Asking the caller to turn sideways to protect against deepfakes should be considered within the organization's own framework, along with the various risks involved with deepfakes, and many other risks aside from deep fake video calls.
If that is the conclusion reached within the organization's custom security and privacy framework, sure.
If there is no such framework, this is no different than yoloing lines of code in a production app by a team that does not have at least some grasp of the architectural principles and constraints at play. Or worse, not understanding the “job to be done” and building the wrong product and solving for the wrong problem.
Exactly. Even the article gave a couple cases of convincing profile deepfakes. Admittedly they’re exceptional cases, but in general progress tends to be made.
If all the money on self driving cars would have been put into public transport (driverless on rails is a solved issue) and pushing shared car ownership instead, we might actually get somewhere towards congestion-free cities.
We can already have congestion-free cities today, no new technology nor public transport required. We have had the technology for quite a while now: congestion charging.
It works really well in Singapore to control congestion, and also worked well in London when they adopted it afterwards.
Public transport also works quite well in many places around the world.
It also used to work really well in North America in the past. A past when the continent was much poorer. (I'm mostly talking about USA plus Canada here.)
Public transport only works when after you step off the bus or train, you can get to your destination on foot. Density is outlawed in much of the USA and Canada.
https://www.youtube.com/watch?v=MnyeRlMsTgI&t=416s starts a good section about Fake London, Ontario. At great expense, they built a new train line. But approximately no one uses it, because you can't get anywhere when leaving the stations. The video shows an example of a station where the closest other building is about 150m away. And that's just a single building. The next ones are even further.
Land use restrictions and minimum parking requirements are a major roadblock. And just throwing money at public transit directly won't solve those.
Shared car ownership is an interesting idea. Uber can be seen as one implementation of this concept. It can be done profitably, but I'm not sure it has much impact on the shape of cities?
In the grand scheme of things, there's not much money being put into self-driving cars so far. A quick Googling gives a Forbes article that suggests about 200 billion USD.
In terms of this particular tech, the previous obvious limitation, namely no blinking, held up for something like a quarter after its discovery.
The Venn diagram of people someone wants to trick with this particular tech, people who read any security guidelines, and people worth applying this kind of approach to in the first place is, however, pretty narrow for the foreseeable future. It's more of a narrative framing device: talking about 'what to do to uncover a deepfake video call' is a way to present interesting current tech limitations - not that I particularly mind it.
This may be like a proof of work cryptography issue, except the burden of work is on the deep fake. Just ask a battery of questions, just like out of a Bladerunner scene or whatever. This is still the problem with AI. It depends on tons of datasets and connectivity. Human data and human code are kind of the same. Even individually, we can start with jackshit and still come up with an answer, whether right or wrong. Ah, Lisp.
> Self-driving cars will come out next year, every year.
"Come out" could mean different things in different contexts. Deepfake defence context is analogous to something like: there are cars on public roads with no driver at the wheel. And this is already true in multiple places in the world.
I think it's odd we don't think of other limitations of products the same way. Put another way, why don't we just say it can't do it?
Example, we don't say a jet ski has a current speed limitation of 80 mph, we say it can go 80, but not 81. It's a simple fact. No promise that it will be faster tomorrow, because that's not what it is, it's not its future self.
It's like they're combining the startup pitch of "it will always be better after you invest more money" with the reality of what "is" means.
One thing that I haven't seen mentioned is that many of the recent articles I've seen misuse the phrase "deep fake" and usually mean "face-swap algorithm" or "look-alike". The former, I believe, has been able to defeat this test for at least 10 years, and the latter has always been able to defeat this trick.
The only person who is promising self driving cars next year (and has done so every year for the past 5 years) is Elon Musk. Most respectable self-driving car companies are both further along than Tesla and more realistic about their timelines.
Let's take a look at some of those realistic timelines. A quick googling gave me a very helpful listicle by VentureBeat from 2017, titled Self-driving car timeline for 11 top automakers. [1]
Some examples:
Ford - Level 4 vehicle in 2021, no gas pedal, no steering wheel, and the passenger will never need to take control of the vehicle in a predefined area.
Honda - production vehicles with automated driving capabilities on highways sometime around 2020
Toyota - Self-driving on the highway by 2020
Renault-Nissan - 2020 for the autonomous car in urban conditions, probably 2025 for the driverless car
Volvo - It’s our ambition to have a car that can drive fully autonomously on the highway by 2021.
Hyundai - We are targeting for the highway in 2020 and urban driving in 2030.
Daimler - large-scale commercial production to take off between 2020 and 2025
BMW - highly and fully automated driving into series production by 2021
Tesla - End of 2017
It certainly wasn't just Tesla who was promising self-driving cars any second now. Tesla was definitely the most aggressive, but failed to meet its goals just like every other manufacturer.
There was definitely a period when everyone (for certain values of same) felt they needed to get into a game of topper with increasingly outlandish claims. Because if they didn't people on, say, forums like this one (and more importantly the stock market) would see them as hopelessly behind.
A number of European capitals seem to have managed to do driverless high-capacity underground trains. Here in the UK, we've got a number of automated trains, but for union reasons they still have drivers in the cab who press go at each station.
In the US, it looks like Detroit has a self driving line, and there are a bunch of airport shuttles. Presumably you are hitting the same union issues as us?
Let's not dismiss the point that self-driving cars are the "stone soup" of machine learning industry. Like the monk who claimed he could make soup with just a stone, machine learning claimed that with two cameras, two microphones, and steering/brake/accelerator control, a machine would someday soon drive just like a human can with that hardware equivalent.
Then it turned out well, we actually need a lot more cameras. Now we need high res microphones. Now we need magnets embedded in the road. Now we need highly accurate GPS maps. Now we need high power LIDAR that damages other cameras on the road. Now we need....
Each little ingredient in the soup "made only with a stone." Machine learning has utterly failed to deliver on this original promise of learning to operate a vehicle like a person, with no more sensors than a person.
"Machine learning has utterly failed to deliver on this original promise of learning to operate a vehicle like a person, with no more sensors than a person."
I am not aware of anyone except Musk making that claim. "Machine learning", as in the statements of the main researchers, certainly did not promise anything like it.
The problem for self driving cars is the risk tolerance. No one cares if a deep fake tool fails once every 100,000 hours because it results in a sub standard video instead of someone dying.
What about reflections? When I worked on media forensics, the reflection discrepancy detector worked extremely well, but was very situational, as pictures were not guaranteed to have enough of a reflection to analyze.
Asking the subject to hold up a mirror and move it around pushes the matte and inpainting problems to a whole nother level (though it may require automated analysis to detect the discrepancies).
I think that too might be spoofable given enough time and data. Maybe we could have complex optical trains (reflection, distortion, chromatic aberration), possibly even one that modulates in real time...this kind of just devolves into a Byzantine generals problem. Data coming from an untrusted pipe just fundamentally isn't trustable.
I wonder how good the deepfake would be for things it didn't have training data on. For example, making an extreme grimace. Or have the caller insert a ping pong ball in his cheek to continue, or pull his face with his fingers.
One thing I notice with colorized movies is the color of the actor's teeth tends to flicker between grey and ivory. I wonder if there are similar artifacts with deep fakes.
Years and years of having to do increasingly more insane things to log into banking apps until we’re fully doing karaoke in our living rooms or stripping nude to reveal our brand tattoos
If I remember correctly, the context was that Microsoft had made the Kinect mandatory for the Xbox One which wouldn't function without it. And the Kinect was being used for some silly voice/motion control crap.
The extreme reaction and copypastas like this probably led to Microsoft scrapping that idea a few years later.
Microsoft Teams developed a feature where, if you're using a background and turn sideways, your nose and the back of your head are automatically cut off.
Bug closed, no longer an issue, overcome by events.
Interesting that you bring that up. The most egregiously invasive student and employee monitoring software requires that the subject always face the camera. That seems most ripe for bypassing with the current state of deepfakes. https://www.wired.com/story/student-monitoring-software-priv...
My bank does a much better system where they ask for a photo of you holding your ID and a bit of paper with a number the support person gave you for authorizing larger transactions. It's still not bullet proof but since you already have to be logged in to the app to do this, I'd say it is sufficient.
In this case I was on the bank's text support requesting to make a transaction of $100,000 in one go, which the app would not let me do. So it was a real person on the other side. The bank was Up, in Australia.
This sounds like a good thing. An extra step in a $100,000 transaction to prevent accidents or crimes definitely feels justified if the account's not marked as normally moving heaps of money, like a billionaire's or something.
I'd trust the data with a (real, not online) bank more than most other companies like Google.
I'd be more worried about people hacking into networked security camera DVRs at stores and cafes and extracting image data from there. Multiple angles. Movement. Some are very high resolution these days. Sometimes they're mounted right on the POS, in your face. Sometimes they're actually in the top bezel of the beverage coolers.
Banks are the hardest way to get this data, not the easiest one.
Good point. I’m still wary of just assuming (if that’s what we’re doing here?) that old established organizations you’d expect to be secure are in fact secure. For example I would have expected credit rating agencies to be secure…
Mandatory reporting certainly helps IMO. Reporting should be mandatory for anyone handling PII.
No bank is going to run such a system in house. It will be a contracted service whose data is one breach away from giving fraudsters a firehose of data to exploit their victims.
Frankly, of all the personally identifying data I share with my bank, a low resolution phone video of the side of my head is the least worrying. It's like worrying the government knows my mum's maiden name!
In the eventuality that robust deepfake technology to provide fluid real-time animation of my head from limited data sources exists and someone actually wants to use it against me, they can probably find video content involving the side of my head from some freely available social network anyway.
Would you please stop posting unsubstantive and/or flamebait comments to HN? You've been doing it repeatedly, it's against the site guidelines, we end up banning such accounts, and we've had to warn you more than once before.
Not sure what to say to this one. Women can get sensitive, if requests of them can be seen in an unpleasant light. Women have also been historically tricked into posing for cameras, had their images misused, and are often quite sensitive about it.
I thought my comment was legit, and on topic. If one is going to implement a policy where people have to slowly move their camera around their body, there may be severe misunderstandings ... and an inability to clarify if a bad response runs away on Twitter and such.
Support persons should be carefully coached on how to handle this.
I guess all I can say here is, I didn't mean this to be so controversial.
We ban accounts that keep doing that, so would you please review https://news.ycombinator.com/newsguidelines.html and take the intended spirit of curious conversation more to heart? Tedious flamewar, especially the back-and-forth, tit-for-tat kind, is exactly what we don't want here.
Listen, I'm willing to try to adhere more closely. I see how some of the above posts click with what you're complaining about.
One suggestion: some news articles are just, almost, entrapment. I feel like HN having an article about racism is going to entice all sorts of comments which break the site guidelines.
Take my posts above: I am legitimately concerned that by simply labeling those we are strongly opposed to, e.g. racists, we fail. If we just label, if we therefore misunderstand their motivations, any attempts at correction become flawed, and just plain don't work.
I want positive corrective action to fix things, not divisive posturing.
Obviously in hindsight I should not have waded in, I should have realised how my post, my motivations could be misunderstood. I agree, my bad. 100%. But once there, I'm left in this horrid position of suddenly feeling as if I'm being labelled as a racist sympathizer or some such. A reputation is a hard thing to leave on the ground, with no response!
The other thing is, why is this even important, when you shouldn't be basing decisions off the other person's race or face in general?
Base everything off the work they do, not how they look. Embracing deepfakes is accepting that you don't discriminate on appearances.
Hell, everyone should systematically deepfake themselves into white males for interviews so that there is assured to be zero racial/gender bias in the interview process.
But currently, it's pretty much a guarantee that you can pick out a deepfake with this method, as there is no way for the methods currently in use to account for it.
As with any interaction with more than one adversary, there is an infinite escalation and evolution over time. And similarly, something will then come up that is unaccounted for, and so on, and so on.
Asking for entropy that's easy for a real human to comply with and difficult for a prebuilt AI is at least a short-term measure. Such as: show me the back of your head sideways, then go from head to feet without cutting the feed.
If it's a high-threat context I don't think live video should be relied on regardless of deep fakes. Bribing or coercing the person is always an alternative when the stakes are high.
What if the real person draws something on his face?
Does the deepfake algorithm remove it from the resulting image?
Can you ask the caller to draw a line on his face with a pen as a test?
> Can you ask the caller to draw a line on his face with a pen as a test?
I think if the caller did this without objection that would be a bigger indication that it is a deep fake than the alternative. What real person is going to comply with this?
“The Impossible Mission Force has the technical capabilities to copy anyone’s face and imitate their voice, so don’t base your secops around someone’s appearance.”
It is, however, a lower bound on whether something is a reasonably foreseeable/precedented area of research.
After all, if the artist can imagine and build a story around it, there'll be an engineer somewhere who'll go "Ah, what the hell, I could do that."
*By Goldblum/Finagle's Law, it is guaranteed said engineer will not contemplate whether they should before implementing it and distributing it to the world.
This is another example of why we can't have nice things.
Ask the caller to move their hand in front of the camera so the hand fully obstructs the view, and then slowly slide the hand to the side until it completely moves out of the view. Crop-resistant!
It's only a fool's errand if you give up. If you don't give up, then it's a cat-and-mouse game. Forever outsmarting the other, but neither side winning permanently.
I don't think that's very robust. The entire image could easily be fake, as the face is fake, and people already have fake backgrounds. If the entire image is fake, there's no reason for the actual camera's view to match the fake image, so a wider angle camera would keep you in view as you move, and the system could generate a tighter fake view.
It's a constant cat-and-mouse game. When I worked in this space (2019-2021), the best defense against deep fakes was looking at the microfacial behavior/kinematics of the "puppetmaster" and comparing against known standards of the deepfake subject. It works even if the fake is pixel-perfect (since it looks at the facial "wireframe" rather than the image itself). The obvious downside is you need sample data of the subject (and usually tons of it). I wonder if that general approach can be optimized. E.g. deep fakes tend to struggle with certain fine movement/detail; if you had a reflection of the subject, the algorithm would have to not just replicate the main face and the mirror image, but also be completely optically consistent.
Was a fun project, but the cat-and-mouse feeling was inescapable.
For those curious, look up the DARPA MediFor project. Siwei Lyu (in the article) did a bunch of work in this space. Also see Hany Farid and Shruti Agarwal. They've worked specifically with deep fake detection.
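For anyone who wants to play with the general idea (not the MediFor tooling itself), here is a toy sketch. It assumes a face-mesh tracker has already produced per-frame landmark coordinates for a known-genuine reference clip and for the candidate call; the array shapes and the crude distance score are illustrative assumptions only.

```python
import numpy as np

def motion_profile(traj: np.ndarray) -> np.ndarray:
    """traj: (frames, landmarks, 2) landmark tracks -> per-landmark mean/std of speed."""
    vel = np.diff(traj, axis=0)               # frame-to-frame displacement
    speed = np.linalg.norm(vel, axis=-1)      # (frames-1, landmarks)
    return np.stack([speed.mean(axis=0), speed.std(axis=0)], axis=-1)

def kinematics_distance(candidate: np.ndarray, reference: np.ndarray) -> float:
    """Crude mismatch score between two clips; higher = more suspicious."""
    return float(np.linalg.norm(motion_profile(candidate) - motion_profile(reference)))

# Synthetic stand-ins for real landmark tracks of the genuine subject and a candidate call.
rng = np.random.default_rng(0)
reference = rng.normal(scale=0.01, size=(300, 468, 2)).cumsum(axis=0)
candidate = rng.normal(scale=0.02, size=(300, 468, 2)).cumsum(axis=0)
print(kinematics_distance(candidate, reference))
```

Real systems model far richer dynamics than mean speed, but the shape of the approach is the same: compare motion statistics, not pixels.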
> On receipt of the form, we will require a photograph of you, or a trusted representative as proof of identity. You will have to get a NEW photograph taken, holding two symbol of ours. The two symbols we need you to hold are a loaf of BREAD and a FISH (the name of our church). This proves that the person in the photograph is genuine. Passport or other photographs will NOT be accepted.
> (...)
> As dumb as he looks, I'm not happy. I asked for the fish to be on his head AND a loaf of bread. I got neither!
It reminds me of the meme where guys sing over "Evanescence - Bring Me To Life" with the snapchat gender swap filter on. The female vocals are done facing the camera, showing a female face, the male vocals are done sideways. Turning sideways effectively disables the filter, showing the real (male) face.
Long term, the only robust way to solve this is going to involve a remote attestation chain i.e. video that's being signed by the web cam as it's produced, and then transformed/recompressed inside e.g. SGX enclaves or an SEV protected virtual machine that's sending an RA to the other side. Although hard to set up (you need a lot of people to cooperate and CPU vendors have to bring these features back to consumer hardware), it has a lot of advantages over what you might call trick-based approaches:
1. Robust to AI improvements.
2. Blocks all kinds of faking and tampering, not just deepfakes.
3. With a bit of work can securely timestamp the video such that it can become evidence useful for dispute resolution.
4. Also applies to audio.
5. Works in the static/offline scenario where you just get a video file and have to check it.
There are probably other advantages too. The way to do such things has been known about for a long time. The issue is not any missing pieces of tech but simply building a consensus amongst hardware vendors that there's actual market demand for [deep]fake-proof IO.
In reality, deepfakes have been around for some years now but have there been any reports of actual real world attacks using them? Not sure, I didn't hear of any but maybe there's been one or two. Problem is, that's not enough to sustain a market. Attacks have to become pretty common before it's worth throwing anything more than cheap heuristics at it.
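For concreteness, a minimal sketch of the signing half of that idea, assuming a per-device Ed25519 key and Python's `cryptography` package. The genuinely hard parts - key provisioning, the attestation certificate chain, and re-encoding inside an enclave - are exactly what this glosses over.

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Stand-in for a key fused into the webcam; its public half would be published
# via the vendor's attestation certificate chain.
camera_key = Ed25519PrivateKey.generate()
camera_pub = camera_key.public_key()

def sign_frame(frame_bytes: bytes, frame_index: int) -> bytes:
    digest = hashlib.sha256(frame_index.to_bytes(8, "big") + frame_bytes).digest()
    return camera_key.sign(digest)

def verify_frame(frame_bytes: bytes, frame_index: int, signature: bytes) -> bool:
    digest = hashlib.sha256(frame_index.to_bytes(8, "big") + frame_bytes).digest()
    try:
        camera_pub.verify(signature, digest)
        return True
    except InvalidSignature:
        return False

frame = b"\x00" * 1024                            # placeholder for raw frame data
sig = sign_frame(frame, 0)
print(verify_frame(frame, 0, sig))                # True
print(verify_frame(b"tampered" + frame, 0, sig))  # False
```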
Then you just point the webcam at a screen or microphone at a speaker?
I really don't think moving our trust to unknown, unnamed manufacturers of hardware in far away places is a solution.
The solution is not going to be high tech, imho. Just like we have learned a skepticism resulting from Photoshop, we'll learn a skepticism of live video or audio.
You could layer on IR depth mapping, available in many Windows Hello providing camera systems.
I happen to agree with the other voices here saying this is a foolish game of cat and mouse, but there are near-term methods of making this harder to fool. And that might be enough for now.
The solution you propose sounds vastly overengineered. Why would we need remote attestation, tampering resistance and enclaves when this is simply a problem of your peers being unauthenticated?
If you care about the identity of who you are speaking to remotely, the only solution is to cryptographically verify the other end, which just requires plain old key distribution and verification. It's just not widespread enough today for videocalls because up to now, there wasn't much need for this.
What are the practical scenarios where you need this that would require the introduction of overengineered, Orwellian, crippled computers described above, while at the same time not being solvable by other more realistic means?
you don't. You assume that the key distribution mechanism is secure, so that whoever has the key _is_ who they say they are.
This doesn't prevent adversarial impersonation (where you cannot trust the party that wants themselves impersonated). E.g., if you are an employer interviewing via video, you cannot tell that the person authenticated and performing on video is indeed the person you're hiring. I don't think this is a problem that _should_ be solved, tbh.
That just moves the problem around, it doesn't solve it. It's also incompatible with anti-money laundering laws that don't make it easy/safe to outsource ID verification.
Encryption and use of signed certificates has certainly been a big help against web fraud. No, it's not perfect, and can't prevent certain kinds of phishing, but it has definitely raised the bar for would-be scammers. It makes it nearly impossible to spoof "amazon.com" in the browser, and it prevents passive snooping on open WiFi.
You can't make it impossible, but you can make it very difficult.
My elderly uncle almost gave $10,000 to a scammer who had convinced him that his nephew was sitting in a jail and needed this money to be paid for his bail. Luckily, he reached out to me for help and I was able to confirm that his nephew was at home, not in jail.
I honestly can't imagine some of the scams that are coming, particularly to the tech-vulnerable, if we don't do SOMETHING to make real-time deepfake video harder than it now is.
The only solution will be in-person meetings, as it has always been. Faking audio has been around a really long time. If you needed to be absolutely sure the person you're talking to is legit, you met them in person (Mission Impossible style disguises notwithstanding).
Nothing has really changed with deepfake, other than the fact that for a brief period we could be sure the person we were having a video chat with was legit because the tech didn't exist to fake it.
I think it would be useful if news outlets signed their video content using watermarking techniques. Then social media sites where news is shared could automatically check for recognised signatures for major outlets and give it a checkmark or something. The signature could be easily removed but video without the checkmark would then be suspicious. It would also be useful if they added signed timecodes to frames so it could be checked if the video has been edited.
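One hedged way to picture the "signed timecodes" bookkeeping: hash-chain the per-segment records so that cutting or reordering footage breaks verification, and have the outlet sign only the final chain hash. Watermark embedding and key distribution are out of scope here, and the record fields are made up for illustration.

```python
import hashlib
import json
import time

GENESIS = "0" * 64

def _record_hash(rec: dict) -> str:
    return hashlib.sha256(json.dumps(rec, sort_keys=True).encode()).hexdigest()

def chain_segments(segments: list[bytes]) -> list[dict]:
    """Build a hash chain of segment records; the last record_hash is what the outlet signs."""
    records, prev = [], GENESIS
    for i, seg in enumerate(segments):
        rec = {"index": i, "timestamp": time.time(),
               "segment_sha256": hashlib.sha256(seg).hexdigest(), "prev": prev}
        prev = _record_hash(rec)
        records.append({**rec, "record_hash": prev})
    return records

def verify_chain(segments: list[bytes], records: list[dict]) -> bool:
    """Recompute the chain; any cut, swap, or edit of a segment breaks it."""
    if len(segments) != len(records):
        return False
    prev = GENESIS
    for seg, rec in zip(segments, records):
        body = {k: rec[k] for k in ("index", "timestamp", "segment_sha256", "prev")}
        if body["segment_sha256"] != hashlib.sha256(seg).hexdigest() or body["prev"] != prev:
            return False
        prev = _record_hash(body)
        if prev != rec["record_hash"]:
            return False
    return True

segments = [b"segment-0", b"segment-1", b"segment-2"]
records = chain_segments(segments)
print(verify_chain(segments, records))                    # True
print(verify_chain([segments[0], segments[2]], records))  # False: the cut is detected
```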
Deepfake models are trained on very similar data. They don't generalize well, usually. E.g. we take lots of data from YouTube videos of a single person under a specific condition (same time, same day, same haircut etc.)
I know that as I spent quite some time researching these models and worked on a deepfake detection startup. Purely looking at it from a technological side, it's a cat-and-mouse game, similar to antivirus software: a new method appears to create deepfakes, so a new detection method is required.
However, we can also exploit the models' failure to generalize properly and the limitations of their training process.
Anything that is out of distribution (a very rare occurrence in the training data) will be hard for the model:
- blinking (if the model has only ever seen single frames it will create rather random, unusual blinking behavior)
- turn around (as mentioned by the author, side views are rarer on the web)
- take off your glasses
- slap your cheek
- draw something on your cheek
- take scissors and cut a piece of your hair
The last two would be especially difficult and funny (:
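As a toy illustration of turning that list into something operational, a verifier could issue a random subset of such out-of-distribution actions along with a nonce and a short deadline, so the other side can't pre-render a response. The action strings and timings below are made up.

```python
import secrets
from random import SystemRandom

# Actions chosen because they are rare in typical training footage (see the list above).
CHALLENGES = [
    "turn your head fully sideways",
    "take your glasses off (or put a pair on)",
    "press a finger into your cheek",
    "draw a short line on your cheek with a pen",
    "cover one eye with your hand for two seconds",
]

def issue_challenge(n_actions: int = 2, timeout_s: int = 15) -> dict:
    """Pick a random subset of actions and bind them to this call with a nonce."""
    rng = SystemRandom()
    return {
        "nonce": secrets.token_hex(8),      # say it aloud / write it on paper on camera
        "actions": rng.sample(CHALLENGES, n_actions),
        "timeout_s": timeout_s,             # little time to fine-tune or pre-render
    }

print(issue_challenge())
```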
Looking at how fast dall-e is improving, and how it "understands" concepts even if you mix them in crazy ways, all of your later examples seem solvable in less than a decade.
But I don't know much about ML so I might be wrong.
I tried using NoScript a long time ago and felt it just broke everything and I didn't want to start whitelisting every site I use just to make the web usable. uBlock Origin is good enough for blocking ads and trackers.
It does not provide a cure for cancer; its job is simply to inform a user experiencing an issue that the issue is not strictly on the server side but also depends on the client side - which duly appears as a first step in troubleshooting. It informs the user that there are configurations that avoid it, so the user has an orientation to possibly check for alternatives, and to assess and reconsider the local configuration. More details (works on this and that) are avoided for obvious privacy reasons.
In the new titles I saw an "is site s down": the information "works here", which you seem to be calling avoidable, is in fact a basic troubleshooting step to reconstruct where the issue is.
Serendipitously over the weekend I was thinking about a future where, for key sensitive data access (e.g. production main), you may need to have a quick 5-minute call (a 4th factor, "3D verification") where you would be asked to turn on your camera, answer some simple questions, and do so in different positions...
My main thinking was how out of control it would get; it would probably end up looking like anti-cheat systems, where it's a constant cat-and-mouse game due to the growing sophistication of deepfake models.
in the UK this is already relatively common for online banking. I asked my bank to raise my daily transfer limit the other day for a property purchase and part of the process was recording a video of myself in their app.
And, of course, as barriers are raised it makes it very difficult for some portion of the population (and less convenient for everyone else). I have to change addresses for an older person at a couple of banks and I'm sure it's going to be a nightmare.
That being said, if instead of "use our custom app you've never seen to record a video" it's "just talk to a person with some standard video chat" then maybe it makes things a whole lot easier? But I don't see that being how it's implemented these days...
Yeah. It's still going to be a barrier for some people, but I'm guessing most could get comfortable with it if they were forced to. But getting my dad to do anything that isn't a voice call is pretty much pulling teeth. (Except for using Amazon. I think a lot of things are more "don't want to" than "can't".)
I assume the other problem is that public key infrastructure doesn't exist in a lot of places, whereas (almost) everyone has a webcam.
I had the same thought as many on this thread: all biometric identification is basically an arms race that moves along as new ways of gathering biometrics become convenient and ways of faking them are developed. But as you say, yubikeys also have problems. At some point it will probably be a hybrid, e.g. require a known acquaintance to digitally sign a video where you appear together.
"Hey! To make sure you stay secure, we require a short video. Please look straight into the camera and tap the screen."
"You look great! We just need you to blink 5 times, and you're almost done!"
"Almost done! Just show us your best side and turn your head to the left like shown above."
"Of course, you only have best sides. Just turn your head to the right like displayed above, and we can continue."
"You've almost got it! Please open your mouth and show us your teeth."
"Wow, look at you go! Just one step remaining: Tilt your head to the right like shown above."
"Now, to complete your verification, hold your national ID beside your face. Make sure it does not obstruct your head! We need to be able to see your pretty face!"
(Tongue in cheek, of course. But my banking app actually uses this kind of language, even for verification stuff, and I don't like it :D)
I think you also need to add video of occluded areas, so backs of ears and nostrils too. Shouldn't be too invasive but you have got to do this so you don't get deep faked.
It's probably not obvious to many that there's nearly a limitless source of training data on social media at this point. Your comment is eerily prescient and now all trends can become suspect as being a plant for additional training to circumvent, well, known circumventions!
Probably a more robust test would be asking the caller to run their hand through their hair a few times. Maybe you could pre-render a few samples, but it would be trivial to request the person pass their hands through their hair in a specific way, or simply do it again after their hair is already messed up a bit from the first time. It could still be defeated by the caller having the same hair style (or wearing a good wig) as the person they are imitating, but then making someone look like someone else with practical effects has been a thing forever and it has not been a huge problem.
That would have trouble passing anti-discrimination requirements: disability (no hands), medical (bandanna covering cancer-treatment hair loss), religious (burka, rasta, yarmulke, sheitel), racial (cornrows).
And trouble with: dreadlocks (can't run fingers through), bald-headed guys (as mentioned by a sibling comment), and people with hairdos (coiffures, hairspray, topknots, plaits, etcetera).
It doesn't need to be literally their hand through their hair, it just needs to be some action which is easy to perform but complicated to photo-realistically simulate in real time from an arbitrary starting condition. Have them tug on their clothes to see how the fabric moves, have them or a caretaker turn a nearby light on and off such that their illumination changes, etc.
Why wouldn't it still work? Hands in front of faces are already a huge problem for live deepfakes; whether or not the faker or the person being faked is bald shouldn't make this much easier. The only scenario this wouldn't be extra difficult for is if both the faker and the person being faked are bald, and even then the presence of a hand will likely cause some artifacts.
> ...,we need to consider the high availability of data for notable Hollywood TV and movie actors. By itself, the TV show Seinfeld represents 66 hours of available footage, the majority featuring Jerry Seinfeld, with abundant profile footage on display due to the frequent multi-person conversations.
> Matt Damon’s current movie output alone, likewise, has a rough combined runtime of 144 hours, most of it available in high-definition.
> By contrast, how many profile shots do you have of yourself?
Perhaps could ask the caller to perform some other interaction that would be difficult to fake, like drinking a can of Mountain Dew. Maybe make them sing a jingle and do a dance...
I remember someone posting a chat thread from one of the more advanced AIs within the last few years wherein they asked it who the president of the US is, and it was not able to answer.
Interestingly, this is a question my father would ask patients as a paramedic who was trying to assess people's consciousness. Another would be, "what day of the week is it?".
I'd say that these technologies are just like magic - they can seem to do things that defy your expectations, but oftentimes they fall apart when looked at from a different angle.
For current mainstream text generation models it doesn't really depend on where the bot is and what time of the day it is, that's kind of the whole point - their text generation process simply doesn't use the current time as a possible input factor, these models would provide the exact same result (or random picks from the exact same distribution of potential results) no matter when and where you run them.
They would be expected to answer with something matching the day/time distribution that was represented in the training data they used; e.g. the answer to various prompts of the "current president" question is dominated by Trump, Obama and a bit of Bush and Clinton, simply because those are the presidents in the training data and more recent events simply aren't there yet - like the many models which have no idea how to interpret the word 'Covid' simply because they were trained on pre-2020 data, even if the model was built and released later.
Many contexts in which the president is named in training data are political. And nobody's going to put a chatbot on the web without filtering out political material.
The point isn't to check if they actually know - it's to gauge the response. If they say "I don't know" that may be a valid answer, but if they say "George Bush" then something is seriously wrong.
Also, if a human has to be told a basic fact they'll generally provide an indication of embarrassment or an excuse or "why are you asking me these questions", not try to continue the conversation with interesting facts...
Here's what Meta's blenderbot replied with: "The Tokyo tower is taller than the eiffel tower. Interesting facts like that interest me. Do you know about it?"
I'm not surprised that it responded with a random unrelated fact, but it is funny that the second sentence is incredibly awkward, and the last one isn't really coherent English.
Just a total AI meltdown from one simple question.
For me, I call these Eliza-isms, since it reminds me of its simple formulas like "Can you tell me more about ___" that people got so much mileage out of.
I think you're on to something. The modern-day chat-bot/answer engines seem very susceptible to trying to answer fact-based yet obviously incorrect questions. They seem unable to parse the entire question and instead focus on the most generic terms. For instance, the "What year did Neil Armstrong land on Mars?" example that shows up on HN from time to time.
It sounds like part of this issue is that it loses tracking if it can't see both of your eyes, which of course could be defeated by using a couple of cameras spaced at 45° to one another and calibrated to work together in some way.
Instead of a "deep fake" face swap an attacker could send virtual video from a fully-virtual environment using something like an nvidia Metahuman controlled by the camera array. I think that would be pretty easily detectable today but maybe less so with an emulated bad webcam and low-res video link. The models/rigging are only going to improve in the future.
The classic "Put a shoe on your head" verification route would still defeat that, at least until someone invents a very good tool to allow those types of models to spawn and manipulate props.
Here we go again... there's a rule that describes this situation: once a measurement metric becomes the standard, the said metric is no longer indicative.
But please, I don't want to be pointing to a random bus outside my window to prove that I'm not a robot/deepfake...
The degradation of news article quality > the degradation of fact-checking scrutiny in journalism > the degradation of written article quality > people would rather watch a live-streamed event than read > the degradation of live-stream trustworthiness because of deepfakes...
What's next? Heavily scrutinised journal articles which run checks on videos with anti-deepfake AI-based algorithms?
> Arguably, this approach could be extended to automated systems that ask the user to adopt various poses in order to authenticate their entry into banking and other security-critical systems.
This approach works until it doesn't. How long before deepfakes can handle the 90-degree profile scenario? Not saying it's not a valid approach, but you'd have to weigh the time it takes to implement these other checks against the time we expect deepfakes to take to improve in this scenario.
I wonder how much work would it entail to swap one actor's face for another's in a movie. Just finished watching Fury Road, and Tom Hardy just feels a bit off to me.
That's "bread and butter" work in VFX. I used to be a stunt double actor replacement specialist. These days, ML-enhanced tools make the work for a face replacement shot exponentially faster and easier - as is needed for the huge number of superhero stunts insurance companies will not let the stars perform.
Each film he does one very public stunt for real; for the other stunt shots he does them as well, but not hanging off the side of a building or whatever, he's 5 inches off the ground with a landing pad in case someone slips. Remember that whatever is in the background was probably not there when the sequence was shot; background replacement is another "bread and butter" effect.
I was recently looking for designers for my company when I came across an interesting profile on Dribbble. I reached out and quickly scheduled a time when we could talk over Zoom. At the meeting time, in comes this person who seems to have a strange-looking, silicone-like face. I was using my own Zoom account (I rarely use other people's Zooms unless I trust them) to avoid situations like this. One thing I noticed is that when the candidate touched their face, their fingers would appear to sink into their skin - almost as if it were made of liquid. Secondly, their face appeared larger, lighter and smoother than their neck. I got spooked and immediately let the candidate know that I was not comfortable moving forward.
More interestingly, what exactly are the mechanics of getting a deepfake into a video call? How is it possible that what seems like a deepfake could make its way into my Zoom? Is Zoom enabling external plugins that alter video details?
It’s fairly trivial to have a virtual camera source and point Zoom to that as its input. It has nothing to do with integrating deeply with Zoom or getting “into” your Zoom. Check out Snap Camera[0] for an example.
I do very much hope that you told the candidate what spooked you. Ideally, you would have done this early in the interview, giving them a chance to disable any video filtering / face-beautifying software that they may have been running.
If you didn't do either one of those, perhaps you now know enough so that next time you will be able to give the interviewee a chance to demonstrate whether or not they're using a "Smooth over my facial blemishes because I'm uncomfortable with how my face looks and want it to look 'prettier'." filter.
The live-streaming software OBS has a “virtual webcam” feature that can make a generated video feed behave like a hardware webcam. Perhaps something similar is being used to feed generated video into zoom?
Input for software can be anything. Camera feed can be a generated one and the software consuming it doesn't have to be aware it isn't a real physical camera.
Admittedly, I use it, but I have it set pretty low. My face isn't lit up very well, and without it, in my webcam, my skin ends up looking a lot rougher than it really is.
If I set it to the max, then it just looks like a blurry mess.
Things like OBS (streaming software) can create a virtual camera. I am guessing it's something like that, where Zoom does not even know the camera is not actually real hardware.
This is pretty funny. We're going to run the entire gamut of different verification technologies for them all to become compromised, forcing us to return to in-person transactions for everything.
Does this mean that, given data of sufficient quality, current ML can fake any possible audio or video a real human can produce? As in, there's no possible test a real human can do which can't be faked, given the relevant data?
Heh, at some point I'm convinced that we'll use both:
* customizable 3D avatars
* customizable voices
to communicate in meetings and in communities (VR Chat style). So the origin won't be associated with your avatar or your voice, but it'll be associated with your account (like in good old chat).
An audio prompt like 'Using your <right | left> hand, repeat the numbers that I am signaling. Use <a different | the same> set of fingers from what I am using'.
I had a call with the Polish government last year to get access to one of the government portals, and they asked me to move my head to the side and also to move the palm of my hand very slowly in front of my face.
Side thought: I really enjoy how closely some of the suggestions (in TFA and comments) resemble reality checks for lucid dreaming. In general: observing something and asking oneself "is this really how reality behaves?", which is such an interesting question in itself for probing the nature of reality beyond our own initial perceptions.
What's with everyone playing who-wants-to-be-a-gymnast by saying turn sideways, put a shoe on your head, do a backflip, move out of frame?
Encrypt/sign the feed, watermark the images with a QR code containing the sig, have an app on your phone with their pubkey, and display a big green check when it matches. Every pure deepfake attempt is now easily dealt with. Boom.
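Something like the following rough sketch, assuming the `qrcode` and `cryptography` Python packages: hash a frame, sign the hash, and render the signature as a QR code to composite into the outgoing video, while the app on the other side verifies it against a previously exchanged public key. The key handling here is deliberately naive.

```python
import base64
import hashlib
import qrcode                                     # pip install "qrcode[pil]"
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()        # per-user key; pubkey shared out of band

def signature_qr(frame_bytes: bytes):
    """Return a QR code (PIL image) carrying the frame hash and its signature."""
    digest = hashlib.sha256(frame_bytes).digest()
    payload = base64.b64encode(digest + signing_key.sign(digest)).decode()
    return qrcode.make(payload)

qr_img = signature_qr(b"\x00" * 1024)             # placeholder for raw frame data
qr_img.save("frame_signature_qr.png")             # composite this onto the outgoing frame
```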
Patiently waiting for the government(s) to step in and start providing a modern ID service - a driver license with a built in private key, a fingerprint unlock, and a PIN.
The combination of the three can still be defeated by someone following you, stealing the card, lifting fingerprint from a glass, and spying the PIN, but that’s a lot of trouble to go through and online identity fraud will become extinct.
There is a tried and true method for making sure someone sending you pictures is real. Ask them to write today's date and a random word on a piece of paper and hold it up next to their face. People have been doing this on the Internet for ages. It doesn't authenticate them but it does show they are actually producing the pictures.
You don't need to do silly head movements. You could send the other person an email with a password, or a text, or a Signal message, or ask where you last had a drink together or...
If you are concerned that all methods of communication are compromised, you wouldn't suddenly trust Zoom just because they do some silly head movement.
I suppose this turning sideways trick will work until it doesn't.
I do appreciate everyone on this site contributing to my knowledge of infosec. I don't work directly in the space, but I feel the contributions on this site help educate those of us not directly working in the profession.
Sometimes, I wonder if us humans even know or care that we are taking things too far. I am all for progress and going beyond, but deepfake and all these other recent AI developments are taking us to a dystopian future which I am not super hopeful about.
If you really want to verify the other end, and if asking them to do something is allowed, you can ask them to do any number of things, can't you? The key is to not turn it into a protocol - that would just ensure it gets built into the faking software.
Media forensics algorithms do work on various forms of rebroadcast, transmission, and compression, so yes, this should be possible (for now). Look up the DARPA MediFor project. Siwei Lyu (in the article) did a bunch of work in this space. Also see Hany Farid and Shruti Agarwal. They've worked specifically with deepfake detection.
Signed up for one of those 'neobanks' (that don't have physical branches) and part of the signup required me to turn my head sideways. I wondered why they wanted me to do that. Now I know.
Ray Kurzweil: "The day it starts working, we're doomed". Reality: "We got convincing front-facing deep fakes! Sideways? Don't worry, it will be ready in just 24 months!"
Adversary keeps camera off (bad hair day, broken web cam, low bandwidth, etc). Now how do you verify their identity? (Hint, the same way you would when the camera is on.)
I had an Indian sales rep for a deepfake filter, it creeped the hell out of me when the voice totally did not match up with the pasty white Irish face.
We can already resolve the interframe wobbles that happen due to the inability to preserve spatial invariants on static meshes allowing for dynamic camera motion.
Of course, the spatial invariants of meat-suits in motion require an understanding of volumetric structure, and not just restricted depth surface meshes.
But it's not some unencodable computational enigma.
Deepfakes done in an unethical way are a real threat indeed. This paper shows how to identify some of them.
And metaphysics.ai are doing something a bit different. Let's wait and see.
People asked stuff like this 15 years ago (do bunny ears on yourself or pretend to pick your nose), usually to see if the other person is catfishing with a prerecorded video. It usually happens if the other person types instead of speaks (because it's "late" and people are sleeping).
The only thing interesting about the title is the possibility of real-time deepfakes for calls. If it's not realtime, then 15 years ago called and they want their technique back.