Using images whose licenses are restricted to non-commercial purposes is a bit more of a gray area, depending on where you draw the line between commercial and non-commercial activity. Since they share the data set with researchers at other organizations (presumably including competitors), I'd consider it non-commercial enough, because they don't gain a competitive advantage, but the details might have to be fought out in court.
That said, from the article you have this: "NBC News obtained IBM’s dataset from a source after the company declined to share it, saying it could be used only by academic or corporate research groups," which tells me that IBM has restricted distribution to non-commercial activities, and this: "To build its Diversity in Faces dataset, IBM says it drew upon a collection of 100 million images published with Creative Commons licenses that Flickr’s owner, Yahoo, released as a batch for researchers to download in 2014." So they started with a data set that someone else had re-licensed for this purpose already. (That would be Yahoo!)
Back in the day, Yahoo!'s terms of service for Flickr were that you gave Yahoo! its own license to your work and could specify the license that others got if they downloaded your work. So I can imagine it was entirely possible/legal for Yahoo! to exercise its rights under that ToS and relicense the photos as it saw fit (and remember, Verizon/Yahoo! was trying to make the asset as valuable as possible, and this would have contributed to that effort).
I expect that somewhere someone has sold the old classmates.com archives of images and/or digitized a few thousand yearbooks for images. It is not too hard to find sources of hundreds of images that include a head shot, are all equally lit, and have only a small number of backgrounds to remove to leave just the facial features.
As a few HNers are no doubt aware, IBM allegedly obtained explicit permission to use Douglas Crockford's software for Evil rather than Good, because it couldn't guarantee that its customers weren't doing Evil.
On the one hand, US Copyright is very explicitly attached to a piece of work, not the facts or ideas contained in it. Per 17 USC 102:
> In no case does copyright protection for an original work
> of authorship extend to any idea, procedure, process,
> system, method of operation, concept, principle, or
> discovery, regardless of the form in which it is described,
> explained, illustrated, or embodied in such work.
You are therefore free to create works of your own that analyze facts contained in others' copyrighted works, comment on their ideas, and so on. This is always true if you don't include any of their copyrighted work, and often true, via fair use, if you only include the small pieces needed for your commentary. Accordingly, it seems pretty clear that you could analyze a huge collection of copyrighted portraits and do whatever you want with the results (distribution of hair colors, eye positions relative to nose, how these vary within/across individuals, and so on).
The counterargument would appear to be that derivative works include the "abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted." Neural networks do seem to store their training data, at least in some form, and there's a fuzzy line between extracting some facts from data (which is fine) and omitting data to create an abridged version (which isn't).
I wouldn't want to bet much either way, but I do think it would be a little odd for copyright to limit how much you can "learn" from something, either manually or via machine.
I just spent about 15 minutes trying to confirm that and... I have no idea. I suppose it's not surprising that a software engineer would not be able to suss that out in 15 minutes. Every definition I find tends to focus on art: visual and auditory creations.
Disregarding a legal interpretation (you know, the thing that actually matters), I can see it either way. Certainly the model is based on data derived from the characteristics of these images. On the other hand, if I saw e.g. a shade of blue in one of these images that I liked, would I need to provide attribution if I measured it and used it in my own work? I have no idea; I suppose I'm just thinking out loud here. I do understand that taking something to a logical extreme (the color example) is not the be-all and end-all of legal arguments.
But I'm not a lawyer, so I have no idea how the interactions play out in that situation. The closest analogy I can think of is sampling in music: how much must the original works be atomized before they don't count anymore?
Not a lawyer and I bet you could get a case to go to court arguing otherwise, but this is my guess on what the result would be.
I suppose that even if you consider the neural network as a black box, you can generate images that bear some resemblance to the training data in some indirect way. For example by walking along a gradient of the output with respect to changes in the input.
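For a concrete sense of what walking that gradient looks like, here's a minimal sketch in Python/PyTorch. Everything in it is illustrative rather than any real system: the tiny stand-in network, the class count, and the step count are all made up, but with an actual trained face model the same loop can recover inputs that resemble training photos.

    import torch
    import torch.nn as nn

    # Stand-in for a trained face model; in practice you'd load the real
    # network under inspection. All shapes and sizes here are arbitrary.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, 100),               # pretend: 100 known identities
    )
    model.eval()

    target_id = 42                       # identity whose look we try to recover
    x = torch.randn(1, 3, 64, 64, requires_grad=True)   # start from noise
    opt = torch.optim.Adam([x], lr=0.05)

    for _ in range(200):
        opt.zero_grad()
        logits = model(x)
        (-logits[0, target_id]).backward()   # ascend the target logit
        opt.step()

    # x is now an input the model scores highly for that identity; with a
    # real trained model, it can bear a resemblance to the training data.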
First, you can't copyright a color, period. You can trademark a color, but that trademark only applies to a very specific use of that color. For instance, you can paint your house or non-delivery-service business in "UPS Brown" without fear, but you couldn't use it in conjunction with a delivery service.
The purpose of trademark is to eliminate customer confusion, where they may think they're doing business with one entity when in fact they are doing business with another. Non-confusing uses of trademarks are legal.
A paint store can legally mix paints with those colors, unless you've told them that you're going to be using the color for a trademark-infringing purpose.
Stores may voluntarily decline to mix trademark colors to avoid even the possibility of a lawsuit, but it isn't legally required.
The problem of treating a neural network as a derivative work is exactly why IBM said they wouldn't use the dataset in their products. Instead, they'll likely train various different networks, note which ones performed best, write a paper about it, and throw the trained networks away. So long as they do that, they're not infringing on anyone's copyright.
The CC non-profit was set up in 2001 according to Wikipedia, yet people were doing things like face detection in the '80s, which would have required using images of faces.
Sure, there were face detection projects before 2001, with much lower quality results, and they may not even have been known to the creators of CC.
Controlling a fetus' DNA is just getting ramped up, but of course there has been a good deal of academic knowledge of the possibilities beforehand. Do you think there may be some laws or contractual formats worked out in the last few years that might apply in the area, yet have not adequately taken the implications into consideration?
Time's arrow being what it is I expect there are.
Programming is often considered an art form. It's just that these particular artists are Palpatine's personal carbonite sculptors. Still artists.
This is a real problem, made worse by the fact that the internet has pretty well trained everyone to avoid reading licenses.
This article is clickbait: it's attempting to inflame readers by misinforming them and/or feeding common misconceptions. We should expect more of journalists. I do. We shouldn't allow ourselves the luxury of accepting "that's just the way journalism is now". It wasn't always this way, and it doesn't need to be.
I beg of readers: know when you are being played to raise agendas based on false premises. The author just wants to stir up the public so they have more to write about later. If by some chance the author believes anything they wrote, then perhaps NBC should consider moving them to the obits.
The results were terrifying and they really affected me in a very negative way. Needless to say, we moved on to other projects. The world is not ready for the harm such technologies can cause.
Our index contained 100+ million faces and the compute costs were obscene.
It would take a random new photo and give you 3 or 4 likely matches.
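The lookup step itself is conceptually simple; the expensive parts are the embedding model and the sheer index size. Here's a toy sketch of the idea (all names and sizes illustrative; numpy brute force standing in for a real approximate-nearest-neighbour index):

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for precomputed face embeddings (the real index held 100M+).
    index = rng.normal(size=(100_000, 128))
    index /= np.linalg.norm(index, axis=1, keepdims=True)

    # Stand-in for the embedding of the random new photo.
    query = rng.normal(size=128)
    query /= np.linalg.norm(query)

    scores = index @ query               # cosine similarity against everyone
    top = np.argsort(scores)[-4:][::-1]  # the "3 or 4 likely matches"
    print(top, scores[top])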
TinEye only matches already-known photos. Google/Bing image search will abstract a given photo and show you more of the same type: e.g. more white men wearing red shirts, rather than identifying the person and showing only photos of that person.
Google has access to almost all credit card transaction data so that should be easy.
Perhaps the specific use they're being put to isn't covered by the particular CC licence of each one, but until that is someone's actual claim, I don't see that this is quite the issue it's portrayed as.
These include just strangers on the street - anyone who doesn't consent to public photos. There are thousands of lovingly censored faces in her photos.
If everyone had done this I guess the situation would be different...
All of them did, however, use WhatsApp to share the photos with each other. <facepalm>
I believe the mechanism is to use a fresh per-message key to encrypt each individual message, and then use each recipient's public key to encrypt that message key before sending it (along with a link to the ciphertext) to each end user.
This is also why sending a video the first time takes forever (encrypt, upload, encrypt key, send key, send link), while forwarding it to another person (encrypt key, send key, send link) does not.
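That's the classic hybrid-encryption pattern. Here's a rough sketch of the idea in Python (RSA-OAEP and Fernet are used purely for illustration; WhatsApp actually builds on the Signal protocol, which is considerably more involved):

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # One symmetric key per message; the big payload is encrypted and
    # uploaded once, which is the slow "first send" step.
    message_key = Fernet.generate_key()
    ciphertext = Fernet(message_key).encrypt(b"...video bytes...")

    # Only the small message key is wrapped per recipient, not the payload,
    # which is why forwarding is cheap: the ciphertext is already uploaded.
    recipients = [rsa.generate_private_key(public_exponent=65537, key_size=2048)
                  for _ in range(3)]
    wrapped = [r.public_key().encrypt(message_key, oaep) for r in recipients]

    # A recipient unwraps their copy of the key and decrypts the shared blob.
    key = recipients[0].decrypt(wrapped[0], oaep)
    assert Fernet(key).decrypt(ciphertext) == b"...video bytes..."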
WhatsApp is end-to-end encrypted, so they're not connected data-wise.
> Your Messages. We do not retain your messages in the ordinary course of providing our Services to you. Once your messages (including your chats, photos, videos, voice messages, files, and share location information) are delivered, they are deleted from our servers. Your messages are stored on your own device. If a message cannot be delivered immediately (for example, if you are offline), we may keep it on our servers for up to 30 days as we try to deliver it. If a message is still undelivered after 30 days, we delete it. To improve performance and deliver media messages more efficiently, such as when many people are sharing a popular photo or video, we may retain that content on our servers for a longer period of time. We also offer end-to-end encryption for our Services, which is on by default, when you and the people with whom you message use a version of our app released after April 2, 2016. End-to-end encryption means that your messages are encrypted to protect against us and third parties from reading them.
I thought WhatsApp was e2e encrypted.
edit: guess not.
> WhatsApp end-to-end encryption ensures only you and the person you're communicating with can read what's sent, and nobody in between, not even WhatsApp.
Is your claim that Facebook is plainly lying about this? That would be a pretty high-risk thing to do, even for them: their usual MO is to cover their abuses in a couple layers of legalese and deniability.
It's a concern to you, but not to them.
If I scrape millions of photos from Facebook (including yours) then train a differentially private model that can extract features from a new face, is that a privacy violation?
A differentially private model is one in which you cannot identify the inclusion of any single datapoint, which means you cannot tell the difference between a model trained on the dataset and the same model trained on the same dataset with your one datapoint added.
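Concretely, that property is usually enforced during training by clipping each example's gradient and adding noise (DP-SGD). A minimal sketch, where the logistic-regression setup and every constant are purely illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 16))       # stand-in for scraped face features
    y = rng.integers(0, 2, size=1000)     # stand-in labels
    w = np.zeros(16)

    clip, sigma, lr = 1.0, 1.0, 0.1       # clip norm, noise scale, step size
    for _ in range(100):
        batch = rng.choice(1000, size=64, replace=False)
        grads = []
        for i in batch:
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            g = (p - y[i]) * X[i]                              # per-example gradient
            g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # bound influence
            grads.append(g)
        noise = rng.normal(scale=sigma * clip, size=16)        # mask any one example
        w -= lr * (np.sum(grads, axis=0) + noise) / len(batch)

    # Clipping plus noise bounds how much any single photo can shift the model,
    # which is the "can't tell if your datapoint was included" property above.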
You might argue it’s a privacy violation because the scraping process might involve people looking at your images, but if that was fully automated and nobody ever looked at your images, the model could be trained and then the data immediately deleted...
Facebook doesn't want you to create separate accounts for different purposes (e.g. one just for Marketplace), so they aren't going to optimise the product around your use case.
I don't know what the future is going to look like, but, man, we're going to be going through some shit to get there.
The moment someone else puts up a picture that you happen to be in (esp. family, friends) and then tags it with your name, your image and name will be scooped up and cataloged.
Even that's not strictly required. If they can obtain data from your own Facebook app (or an app using the Facebook SDK), they can place you in that area around the time the photo was taken, and given that it's friends you have connections with on Facebook, it's easy enough to surmise it's you without any explicit confirmation.
Seems very Orwellian, to be sure, but not out of reach of the end goals of data harvesting/profiling.
Not for me, as I don't have a Facebook app installed, and I firewall off all outgoing traffic just to make sure no apps are phoning home without my explicit permission.
Regardless, I think that it's a fallacy to say that because it's impossible to defend yourself perfectly, then it's not worth defending yourself as well as you can.
I think there should be no problem scraping other social networks like Facebook, Instagram, or Twitter from countries where there are no legal restrictions and the photos are considered "public data". You can outsource face recognition tasks to such countries.
I'd say there are lots of unethical use cases but also a few ethical use cases of such a trained model.
But isn't that like the point of a facial recognition algorithm? Recognizing individuals by their faces? Presumably from a reference image that has a name?
Also, it seems pretty trivial to reverse-lookup the images if they were from a public source, and some of those will have names attached, unless the images are significantly downsampled.
You can use faces without names attached to improve the engine's modeling for recognizing human faces in general (and more importantly: improve the system's ability to distinguish human and animal faces).
(Your first comment is pretty interesting by itself, incidentally: both NBC News and your comment make an assumption about the technology that is not universally true. Face recognition is a much wider space than "recognize an individual by their face." Clustering of similar faces, emotion analysis, camera targeting, and human presence/absence detection can all be done without name labels.)
This is written as if algorithms were sentient beings overcoming the next level of obstacles, rather than just being written by mostly white men who train them on photos of people who mostly look like themselves.