Hacker News
Making Visible Watermarks More Effective (googleblog.com)
292 points by runesoerensen on Aug 17, 2017 | 96 comments

This reminds me a bit of the common argument for locks: it keeps regular people honest. Watermarks are designed really to deter you from casually copying an image and pasting it on your site, unless you don't care at all about a watermark showing (e.g. a blog site with a few followers).

I'm sure a mixture of the original computer vision technique plus some smearing on the original image could wash away even these randomly perturbed watermarks.

Just as with many other security matters, you will always be trying to stay one step ahead of someone attempting to circumvent protections. So really the best protection against someone removing watermarks is to file a copyright infringement claim against the infringing party. DMCA (fixed acronym order :]) is a powerful tool in the USA.

In either case, this was a neat article.

This is exactly right, and is also what keeps the casual thief from doing things like copying software. Obviously these protections can be cracked, but the layman isn't going to deal with it, or even know about it. I don't want to get into the "well those people wouldn't have paid for it anyhow" argument. I firmly believe, as much of a hassle as it is, it's preventing proliferation to the masses. Not much to complain about really, we did it to ourselves.

Same can be applied to locks on a house. Prevents someone looking to have fun, from just walking into your home. Otherwise it's too much effort.

Regarding reporting. I've submitted various copyright claims to Instagram for others using my photography, and they responded quickly. It was a smooth transaction.

I think there's this assumption in some quarters that because there is a tradition of very intelligent people, such as MIT alums, knowing how to pick locks, it's essentially a universal skill. I think it also grossly underestimates how low the intelligence, drive, and education are of most people who are breaking into a home of an evening.

These are people who are most of the time looking for the open window, the unlocked door, or some other sign of carelessness which they can exploit. Like most predators they're not looking to take down the strongest or the fittest, they're not even looking to take down the average; they're looking for the weak, the sick, the elderly... and they are the biggest victim groups along with criminals themselves.

If you didn't have locks, then most people could break into your house without leaving any indication that the home had been broken into except that they had stolen some of your stuff. A lock means that most people have to either crudely force the lock, break a window, break down the door, etc.

The ease (or lack thereof) of picking locks isn't the issue. Forced entry isn't very hard for many (possibly most) homes. As a person with a home, I guess it's nice that forced entry makes it obvious that an unauthorized person has been in my house, but missing belongings would be pretty obvious too.

Missing belongings are obvious to the homeowner, but that only helps if the homeowner has insurance that names those items specifically and the insurance company believes the story. As it stands now, broken locks, broken windows, and broken doors also serve as evidence that a crime was committed. At that point your insurance takes over if you have insurance; otherwise that's pretty much it most of the time. I don't think you'd like to see what would happen if you tried to sell police and your insurer on the notion that the only sign of bandits having been in your home was the absence of some of your belongings.

It's pretty hard to fully protect in the first place because someone desperate enough can always inpaint (aka content-aware fill) over the watermark. Project Naptha has a pretty nice web demo showing how well that works for erasing text from images: https://projectnaptha.com/

Off-topic: Can you fill out your profile? I am not active on HN but I sometimes go by the initials fta elsewhere

Whenever they tell me my smart lock can be hacked I respond... "yeah, and someone can also just take a brick, smash my window, and go through that way."

You'll notice the broken window. Your insurance will see the evidence and pay up. Much harder to prove somebody messed with a smart lock.

Security works best in layers.

My original comment might be a bit simplistic because in my case, in order to not be caught, they would have to hack not just my lock but also my security system network (separate from my home Wifi) and disable the alarms and camera.

But you're right not everyone has layers.

Although one could argue a traditional non-smart lock is not hard to "hack" and leave no evidence either.

Think about this: What if unsafe web-based locks become more prevalent? What if someone builds a "brick smashing service"? Install the app and run it while driving through $neighborhood until a device is detected. The app comes with filters and only shows locks that haven't been unlocked in the last three days. It shows you the probability of 'someone is home'. You press the 'unlock' button, pay $some_amount in bitcoins and the service unlocks it for you.

Can you see our point now?

But a malicious person in Guatemala would have to first fly to wherever you live, then throw a brick individually through each window they wanted to break. With smart locks, that same individual can throw tens of thousands of bricks through tens of thousands of windows without even leaving their desk.

But what good does that do if they can't then walk through the door?

Also, most smart door locks are operated over a protocol called ZWave which has a limited range. When people talk about smart lock hacking they are usually talking about RF spoofing of the ZWave signal. You can, of course, hack the bridge device if the bridge device is on Wifi.

Uber, but for hackers in remote parts of the world to connect with someone who will go into the building whose smart lock was just hacked and steal the stuff.


But the practice has somewhat backfired. Some members of the general public now erroneously believe that any image lacking a watermark is a public work.

For one specific anecdote, a local popular bar lifted an image of their establishment found on Flickr and used it in their marketing materials (posters mostly). When contacted about this by the photographer, the establishment claimed due to the lack of watermark and being their establishment they were well within their rights to use the photograph. They eventually agreed to stop using it as a "gesture of goodwill" (but never paid for it, and weren't sued as the cost of even small claims was likely higher than the damages).

You read stories like this often if you hang out on photography forums. The general public seems to conflate a watermark with copyright and a lack of watermark with a public work. Basic copyright needs to be taught in schools.

It's odd that, in this day and age of ubiquitous and cheap yet very good cameras, the owner or one of the employees didn't just take the photo themselves.

As someone who both owns a DSLR and has a family member with a small business I can tell you very definitively that owning even a good camera and knowing how to take pictures that can be used in marketing materials are two very different things. (I even watched a few photography YouTube channels.... Lol)

Probably just started using a different picture without permission either :/

Adding to this: the obvious problem with relying on a security system that is difficult for the "casual" exploiter but not for the dedicated exploiter is that it only takes one dedicated exploiter to develop and distribute a tool that casuals can easily use, and then it's game over. Each casual doesn't have to crack their game's copy prevention, remove watermarks, or discover their DVD's encryption key. They just have to all sit back and wait for the first pro to do it and release a tool. I've worked in software companies where I've questioned the efforts we expended on copy prevention, and it's always "yea this is to deter the casual copier", but it's futile because tools.

>I'm sure a mixture of the original computer vision technique plus some smearing on the original image could wash away even these randomly perturbed watermarks.

Absolutely. When I did graphics for a local TV news station, I got very good at quickly cloning away the watermarks on stock photos. Visibly watermarking an image is, at best, a nuisance that deters the lazy.

It seems the problem here is that the watermark is transparent, and thus still contains information about the original image.

Similar to how if you want to censor a part of an image, you should always use a single solid color because e.g. a blur can be inverted.

But I guess that would degrade the quality of the watermarked photo too much.

People do sometimes use small, non-transparent watermarks.

If they are small enough though, then the person copying just overlays their own. Or uses automated inpainting to approximate what's underneath.

Example of content aware fill for anyone that hasn't seen it https://www.youtube.com/watch?v=NH0aEp1oDOI

The watermark has to overlay the important parts of the image (otherwise, overlaying your own watermark, or simply cropping it out, becomes trivial). It has to be transparent, because the purpose of having the image available in the first place is that potential customers must be able to evaluate it for their purposes before purchasing a license for it. Slapping a solid color bar across it makes that quite difficult.

I wonder if this approach can be combined with "deep photo style transfer" to make a watermark that is clearly visible to human eyes, but is specifically permuted and adapted to the target image, in a way that it both appears to "belong" in the image (and thus is less disruptive for the legitimate purposes of evaluating images for suitability), but also destroys enough of the original image data to be impossible to remove without significantly and visibly altering the image?


Honest question:

Recently I took a 6-week vacation. One of my cameras had a dust particle that resulted in several thousand pics having a semi-translucent watermark-like impression.

I think that the process shown here to defeat those watermarks would be ideal to batch-correct my pictures. (As of now, I have to manually correct the ones I love and leave the others as they are). Does anyone happen to know a tool that would allow me to do something like that?

As I remember, the same thing happened at one of my friends' weddings. The photographer they hired had dust on one of his lenses, spoiling a lot of pictures... It would be a really nice tool for those situations if they release that code.

Maybe having a few evenly, brightly lit photographs would be extremely helpful.

Absolutely, that's a great idea.

The researchers identify the watermarks to be removed by finding pixel patterns that persist across a large number of images, as shown on this animation: https://1.bp.blogspot.com/-cJwNoUxIBzM/WZTDpw3ru6I/AAAAAAAAB... . Their proposed solution is to randomly warp the watermarks.

Unfortunately, their solution could be quickly defeated with image-to-image generative adversarial convnets trained to... remove watermarks from image pairs. (That is, instead of training a model to change, say, image style or resolution, train it to remove artificially added watermarks.)

This seems likely to me too, but until someone demonstrates it, I'm less certain. GANs are notoriously unstable and it is still quite hard to produce useful models with them.

Sounds like we need an adversarial convnet for creating watermarks.

Seems like it is saying the following:

We can find the watermark in images and subtract it from the image. If we distort the original watermark, but subtract the average watermark, then you will not recover the original image.
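A toy sketch of that summary, assuming the simplest possible model: a 1-D "image" stored as a list of floats, one uniform opacity `alpha`, and a blend of the form J = alpha*W + (1 - alpha)*I. The names `blend` and `unblend` are made up for illustration, and the method in the article is presumably more involved than this.

```python
def blend(image, watermark, alpha):
    """Overlay a watermark on an image: J = alpha*W + (1 - alpha)*I, per pixel."""
    return [alpha * w + (1 - alpha) * i for i, w in zip(image, watermark)]


def unblend(marked, watermark, alpha):
    """Invert the blend, assuming the watermark and its opacity are known."""
    return [(j - alpha * w) / (1 - alpha) for j, w in zip(marked, watermark)]


image = [0.2, 0.4, 0.6, 0.8, 0.5, 0.3]       # hypothetical pixel values
mark = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]        # hypothetical "average" logo
shifted = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]     # same logo, displaced by one pixel
alpha = 0.5

# Case 1: the mark that was applied is exactly the mark we subtract.
exact = unblend(blend(image, mark, alpha), mark, alpha)
# Case 2: a displaced mark was applied, but we subtract the average one.
wrong = unblend(blend(image, shifted, alpha), mark, alpha)

max_err_exact = max(abs(a - b) for a, b in zip(exact, image))
max_err_wrong = max(abs(a - b) for a, b in zip(wrong, image))
```

In the first case the original comes back up to floating-point error. In the second, every pixel where the applied mark and the subtracted mark disagree is left visibly wrong.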


The point of the article is that, for sites with a huge image corpus, if they apply some randomization to the watermark for every image, it's much more challenging to recover an unmarked image.

Right now, sites like Shutterstock and other image sites apply the same watermark to every image, making it quite easy to computationally extract the watermark, and then apply the inverse transform to marked images.

If the watermark is permuted on a per-image basis, it becomes much harder, since you can't extract the watermark from a single image.

I wonder what would happen if I took 1000 copies of the same image that is using a warped mark and ran it through the algorithm....

The algorithm is calculating the mean of the set of images, so putting 1000 copies of the same image will do absolutely nothing

Couldn't I run the result through Photoshop with some "Content Aware Fill" on the remaining spots?

i mean to remove the watermark on that specific image

The idea is to use a randomly-warped per-image watermark. So in theory, there's exactly one distinct watermark per image, so 1000 of the same image would have the same watermark.

It would be easy to change that hash by modifying one pixel per image (pick one that is unaffected by the watermark).

I'm not sure what you're suggesting. Change what hash? Of the image? I can't think of how that could help with removing a watermark.

jotato: I believe this can be defeated by tricking the watermarker into generating 1000 distinct watermarks for the same image.

azdle: That won't work, because "there's exactly one distinct watermark per image, so 1000 of the same image would have the same watermark."

Kdparker: So subtly alter the image, to trick the watermarker into thinking the image isn't the same, thus generating more distinct watermarks for the (effectively) same image.

If you have the original image, the problem is already solved. If you don't have the original image, then all you will be doing is submitting the watermarked image and having the site apply another random watermark to it.

They would probably use the same warp every time on a specific image, so that you can't average it out across multiple download sessions.
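One way a site might get "the same warp every time", sketched under assumptions: seed a pseudo-random generator from a hash of the image bytes, so the offsets are random-looking but repeatable. The function `warp_params` and its parameters are hypothetical; nothing in the article says this is how it would be done.

```python
import hashlib
import random


def warp_params(image_bytes, max_shift=3.0, n_points=4):
    """Derive repeatable pseudo-random control-point offsets from the image content."""
    # Use the first 8 bytes of the SHA-256 digest as an integer seed.
    seed = int.from_bytes(hashlib.sha256(image_bytes).digest()[:8], "big")
    rng = random.Random(seed)
    return [
        (rng.uniform(-max_shift, max_shift), rng.uniform(-max_shift, max_shift))
        for _ in range(n_points)
    ]


photo_a = b"hypothetical image bytes A"
photo_b = b"hypothetical image bytes B"

same_again = warp_params(photo_a) == warp_params(photo_a)   # repeat downloads match
differs = warp_params(photo_a) != warp_params(photo_b)      # other images get other warps
```

A real service would presumably mix a secret key into the seed (an HMAC, say) so that outsiders could not recompute the warp from the image alone; that part is left out here.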

The algorithm takes a bunch of pictures, figures out what's in common between them, and then removes that part. If you give it only many copies of the same image, it will find that the whole image is in common between the images, and subtract out the entire content.
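To make that concrete, here is a small sketch that uses a plain per-pixel mean as a crude stand-in for "what's in common" (an assumption; the article's estimator is more refined). The function name `common_part` and the pixel values are made up.

```python
def common_part(images):
    """Per-pixel mean: a crude stand-in for 'what the images have in common'."""
    n = len(images)
    return [sum(column) / n for column in zip(*images)]


marked = [0.6, 0.7, 0.8, 0.4, 0.25]            # hypothetical watermarked image
copies = [list(marked) for _ in range(1000)]   # 1000 identical copies

common = common_part(copies)
# Subtracting the "common" part removes photo and watermark alike.
residual = [p - c for p, c in zip(marked, common)]
largest_residual = max(abs(r) for r in residual)
```

With identical inputs the mean is just the input, so the residual is zero everywhere and nothing has been separated.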

Surely the watermark could also be ugly and irreversible; ie clamp to white.

Nice. I wonder though, will the randomly placed watermarks distract the legit viewer, and affect their judgment of the images?

Example: When I go through Adobe stock photos, although I find the watermark initially annoying, I would quickly learn to "unsee" it in the next photos because I know what it looks like and where it is on the photo.

With varied watermarks, I'm not sure if the same mental technique can be applied. Shrugs, I may just be overthinking it.

I think the question is why we would need a protective watermark anyway if stock photo companies are already crawling (and sometimes phishing) for use of licensed stock photography on the web and then directly sending out a DMCA notice or a charge?

I've been in many situations where the copyright owners reached out for damage fees after downloading a full-res, un-watermarked photo from free stock photos sites in blog posts, so I'm sure the tech is all there already.

This is extremely time-consuming. It's enough to put smaller operations out of business. https://arstechnica.com/tech-policy/2014/09/one-mans-endless...

What if you put it into a video? Photoshop it into a meme? Drop it in to an internal presentation?

These are all situations where the copyright of a stock image owner is infringed and yet there is very little that automated processes can do to detect them.

What is the issue with dropping copyrighted images into internal presentations ?

If I put a copyrighted image into an internal presentation, I am still copying and redistributing this image. Without permission to do so, that seems like it's straight-up infringement. I don't believe it usually meets the criteria for fair use, either.

Sure, I do that all the time. But, two things to consider:

1) No one outside of the 5 people attending the presentation will ever know. The 5 colleagues following the presentation absolutely don't care where the images are coming from, as long as the point is clear and I'm speaking loud enough.

2) Redistributing ? Really ? If I'm sending a cat picture to my mom, my manager or my favorite slack channel, I'm "redistributing" ? Come on. It's not a publicly available blog post, it's my inner social circle.

I'm never going to pay $30 for the few images I used to make my presentation less boring. However, I can make a little effort and put a 12pt "credits" slide at the end (which usually no one cares about).

Couple thoughts:

a. Copyright isn't about audience, it's about author. Ask if the person whose work you're using would care, not the people seeing it

b. copying is copying every time, not just after the n-th time

c. credit acknowledgement is good, but it doesn't pay the rent for people who work to create content

d. a rough rule of thumb would be whether you're using art to support a profit motive. Anything that happens for your work would be considered in support of a profit motive.

e. "ask" in point a is not used in the figurative sense

To rely less on litigation and more on technology.

Additionally, DMCA is only valid within the United States.

Sometimes the problem is that you just can't know the copyright on an image.

I'll push for this again, https://github.com/ibudiallo/imgcopyright

HTML has a lot of meta attributes, why not one for copyright?

how can the copyright be verifiable? If somebody copied an image, then put _their own_ copyright attribute on it, who becomes the responsible party when a lawsuit happens when that image is misused?

Bits have no color.

A much better place for such a tag IMHO is in the EXIF block. In fact, such a "copyright" EXIF tag already exists, but isn't commonly used or recognized on the web. Verifiability question remains though.

I wonder if some kind of a decentralized system could be made to automatically register published images, so that first registration could be reliably proven. Not that first publication equates to copyright, but that would solve part of the problem.

This is just adding a warping effect, and I’m quite sure that if this technology had already existed, then the same team at Google that did this research would have, with a similar amount of work, been able to circumvent this technique, too.

I mean, de-warping warped imagery is something that Google’s image stabilization software used on YouTube can already do very well. Adapting it for this purpose should be possible.

If the same watermark is added to each image, then you can estimate the watermark by averaging many images and subtract it to get original image back.
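A minimal sketch of that averaging step, under loud assumptions: 1-D images with uniformly random pixel values (so their expected value is 0.5), one fixed logo, and a known uniform opacity `alpha`. Real photos do not have a known mean, so this illustrates the idea and not the article's estimator. The helper names are made up.

```python
import random


def blend(image, watermark, alpha):
    """Overlay: J = alpha*W + (1 - alpha)*I, per pixel."""
    return [alpha * w + (1 - alpha) * i for i, w in zip(image, watermark)]


def pixel_mean(images):
    """Per-pixel mean across a list of equally sized 1-D images."""
    n = len(images)
    return [sum(column) / n for column in zip(*images)]


rng = random.Random(0)
alpha = 0.5
mark = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0]     # hypothetical logo, identical in every image
photos = [[rng.random() for _ in mark] for _ in range(20000)]
marked = [blend(p, mark, alpha) for p in photos]

# The mean of the marked images is roughly alpha*W + (1 - alpha)*E[I].
# E[I] = 0.5 is an assumption that holds only for this synthetic data.
mean_marked = pixel_mean(marked)
estimate = [(m - (1 - alpha) * 0.5) / alpha for m in mean_marked]
max_err = max(abs(e - w) for e, w in zip(estimate, mark))
```

The photo content averages out while the shared logo does not, which is why the estimate lands close to `mark`. Once the logo is known, each image can be inverted pixel by pixel.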

If each image has a slightly different watermark, then simple averaging won't work. Instead you need to come up with a model that describes how the watermark is changing, then estimate parameters of that model. The more complex the change, the more images you need for parameter estimation.

Image stabilization won't work here because it relies on large features and therefore won't be sensitive to relatively weak watermark signal. Besides, it's only stabilizing in three dimensions (pitch, roll, yaw) and won't help with warping within the image.

Averaging will still provide the average watermark given a large enough sample set. Which is then a single geometric transformation from the used watermark. Estimating that transformation for a specific image should be fairly straightforward as they could easily reverse a watermark that was offset. At which point it's reversible.

Not really, because in each image parts of the watermark will be in different places. You don't even need to warp the watermark to achieve that, it's enough to randomly reposition it.

It has crisp edges. If the same watermark is repositioned it's trivial to figure out the adjustment and remove it. Warping is harder but still largely doable by finding each part of the watermark's exact location.
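A rough sketch of "figure out the adjustment" for a watermark that was only repositioned, assuming a 1-D image, a known logo shape, and smooth underlying content. It slides a mean-removed template along the marked signal and keeps the offset with the highest correlation. The function `locate` and all values are hypothetical.

```python
def locate(signal, template):
    """Return the offset where the mean-removed template correlates best with the signal."""
    mean_t = sum(template) / len(template)
    centered = [t - mean_t for t in template]
    scores = []
    for off in range(len(signal) - len(template) + 1):
        window = signal[off:off + len(template)]
        scores.append(sum(s * c for s, c in zip(window, centered)))
    return scores.index(max(scores))


matte = [1, 0, 1, 1, 0, 1]                    # hypothetical logo shape with crisp edges
alpha, true_offset = 0.5, 7
photo = [0.4 + 0.01 * x for x in range(20)]   # smooth stand-in for image content

marked = list(photo)
for k, m in enumerate(matte):
    x = true_offset + k
    # Blend toward white where the logo is present; leave other pixels alone.
    marked[x] = (1 - alpha * m) * photo[x] + alpha * m

found = locate(marked, matte)
```

Busy image content would make the correlation peak less clean than in this smooth example, and a warp (as opposed to a shift) would need a search over more than one parameter.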

Suppose you have 4 points on a box. If you randomly increase or decrease the location of each point by some random value, say +-3%, then average 100,000 of those boxes, you get a smaller box(1) surrounded by a larger, increasingly blurry box. This is true even if no single picture shows that smaller box; it will still show up in the average.

(1) Rather than a true box the midpoints are going to be slightly bulging, but with a large sample set it's very close to a box.
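A 1-D version of that thought experiment, as a sketch: a segment whose two endpoints are jittered independently, averaged over many samples. The sizes, the jitter of 3 pixels, and the sample count are arbitrary choices, and a segment stands in for the four-cornered box.

```python
import random

rng = random.Random(1)
width, left, right, jitter = 60, 20, 40, 3
samples = 5000
hits = [0] * width                               # how many samples cover each pixel

for _ in range(samples):
    a = left + rng.randint(-jitter, jitter)      # jittered left edge
    b = right + rng.randint(-jitter, jitter)     # jittered right edge
    for x in range(a, b):
        hits[x] += 1

coverage = [h / samples for h in hits]
core = coverage[left + jitter:right - jitter]         # pixels 23..36
left_fringe = coverage[left - jitter:left + jitter]   # pixels 17..22
outside = coverage[:left - jitter] + coverage[right + jitter:]
```

The core is covered in every sample, the fringe only in some of them (which is the blurry border), and nothing outside the widest possible segment is ever covered.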

Sure, you can get into cat and mouse games with ever stranger geometry. But the watermark is limited by how much it distracts from the image.

The article shows the result of this approach, and it clearly does not work well.

As I said you need a new step: "Estimating that transformation for a specific image should be fairly straightforward".

It does show a good average, and simply moving the watermark was easily reversible, suggesting that detection of a watermark in an image was easy.

But wouldn't it require you to find how the target image's particular watermark was warped? None of the other watermarks will have precisely the same warping parameters as that image's watermark. You could attempt to de-warp the watermark in place on the image, but that would seem to be likely to warp the image as well, no?

you could de-warp the watermark in place, and this would warp the image but you could reverse the warp after you're done removing the watermarks.

Straightening a plane, e.g. a whole video frame, is a bit different from removing a random warp on a watermark (where the warp is more complex than tilting/angling the watermark) while it's overlaid with partial visibility on a non-warped image.

Heck, I was just impressed by the clever method of removing watermarks. (I mean, if you think about machine learning more than I do, it probably doesn't seem very clever. But I thought it was.)

I wonder if removing watermarks from images would be something a machine learning algorithm would be well suited for. Admittedly my understanding of machine learning is rather limited, but it seems like it'd be pretty easy to generate a large training set of watermarked and unwatermarked images for the algorithm to train on. No idea how effective it'd be though.

If you read the blog post, they link to their own research that does exactly that.

Could you show me where? I saw nothing in that article mentioning machine learning or deep learning specifically, and the paper the blog post is based on seems to use non-ML based techniques.

My assumption is that a deep learning algorithm trained on multiple different styles and variations of watermarked images would be much more robust to the sort of changes Google proposed in this blog post as a way to defeat their existing algorithms.

Not all machine learning is "deep neural networks". I think that's why your comment is being downvoted. The video shows specific model parameters that are learned for each watermark.

Yes, apologies for being imprecise in my wording. In this case I am indeed specifically referring to deep neural networks.

Add a warped new MD5 for each watermark and they'll never be removable.

Rather than an MD5, put an image ID number in the watermark. Then it serves two purposes.

Could we transform the watermark in randomized distances (X and Y) to avoid the subtraction? If the training on the pattern itself isn't as robust, the method starts to fall apart a bit more.

Why do they use a transparent watermark????

It is not so easy to decide if the watermark covers an important aspect of the photo.

Yea, but that may be acceptable for eg previewing a photographer's work—you reach out for the originals.

Didn't Google invent this watermark defeating mechanism? So they're proposing protection against an attack they created?

It is a trivially reimplemented mechanism. This is not an attack that needs Google scale to be feasible and just like videogames and DRM, once the "crack" exists the technical ability needed to exploit it approaches 0.

That's how research works.

Thinking out loud here - but would this be a good use case for blockchain technology?

The problem with visible watermarks is that they detract from the image visibility. Nobody wants to look at photos or digital art pieces with huge ugly watermarks on them. Could blockchain tech help establish ownership in a way that would make watermarking obsolete?

edit: cool - downvotes for asking a question. Real nice guys.

The visible watermarks primarily aim to deter unauthorized use rather than to prove copyright claims or help people locate a copyright holder. If you want to prove copyright claims, there is a centralized registration system that has existed for over a century. Locating copyright holders is a much thornier problem, but it's not obvious that a decentralized database is somehow more effective than a centralized database.

>cool - downvotes for asking a question. Real nice guys.

I didn't downvote you, but I suspect at this point people are a bit fed up of the constant suggestions of using the Blockchain for everything.

Also, it's applied in completely the wrong way. You don't watermark to prove ownership, you watermark to prevent unpaid use.

Ugh - if that's the case this place is less open to discussion than I thought. I was honestly curious if it could be a possible application for a technology that people constantly complain has no use.

Oh well, keep em coming. The downvotes only make me stronger ;)

But what's the actual implementation? Just suggesting a hot topic might have applications without specifying how isn't really starting a discussion. As others have said, watermarks deter unauthorized use, they're not a proof of ownership.

Don't take downvotes so personally, they're more about weeding out bad ideas from good ones than a reflection of your worth to the community, this isn't reddit where we care about karma count. If you fix up your comment instead of complaining you might see those downvotes get reversed (I've had that a number of times).

Has anyone suggested using blockchains for comment voting?


But if you're going to use blockchains for voting, you might as well use them for the whole thing. Every new comment, every edit, every deletion and vote. Consensus as moderation.

Probably not but I was thinking either ML or VR could help

>VR

Yes, yes, possibly 3D printing as well.

You meant AI...

ML is a real thing, unlike AI.

No clue where the VR came from.

He's trying to synergize his core competencies to get an MVP using his AR/VR play.

Oh yeah, definitely. Just like blockchain technology should be used to remove the pot holes from the road near my house, or my computer battery dying.
