> [The challenge image] has a curious bit of coloring in it. What gives? Shouldn’t it just be black and white since the text is black?
> I’m actually not 100% sure why this happens (and sometimes doesn’t), but it’s an artifact of the rasterization process when text is rendered to screen.
This is a brilliant technique called subpixel rendering.[0] A "typical" computer screen's pixels are split into three columns of red, green and blue, instead of lighting up as a solid square. Using color fringing at non-pixel-aligned edges of characters can effectively triple the perceived horizontal resolution.
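Roughly, the glyph is rasterized at triple horizontal resolution and each sample drives one subpixel. A minimal sketch of that idea (ignoring the low-pass filtering and gamma handling real renderers add; assumes an RGB-striped panel):

```python
import numpy as np

def subpixel_pack(coverage_3x):
    """Pack glyph coverage sampled at 3x horizontal resolution (values 0..1,
    shape (h, 3*w)) into an (h, w, 3) image for an RGB-striped panel.
    Black text on white: full coverage turns that subpixel off."""
    h, w3 = coverage_3x.shape
    w = w3 // 3
    rgb = 1.0 - coverage_3x[:, :3 * w].reshape(h, w, 3)  # consecutive columns map to R, G, B
    return (rgb * 255).astype(np.uint8)

# A vertical stem whose edge is not pixel-aligned: it covers one pixel's G and B
# subpixels and the next pixel's R subpixel, so those pixels pick up red/cyan fringes.
coverage = np.zeros((4, 12))
coverage[:, 4:7] = 1.0
print(subpixel_pack(coverage)[0])
```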
Wrong assumptions about the pixel layout will show up very badly, as shown in [1].
HiDPI displays (also mobile, not just 4K) get no real benefit from this and usually have a more complex subpixel layout anyway. I remember old iPads often showing awful fringes because subpixel rendering was enabled without considering the actual orientation of the display.
It still does - for fonts. It's called "ClearType Text Tuner" and it's part of Windows. It allows you to select one of several different subpixel rendering types to pick one that looks best on your screen.
Irritatingly, some Windows applications implement their own subpixel rendering. As somebody on a 1440p BGR panel, those applications become extremely obvious to me because the subpixel anti-aliasing is exactly backwards from what it should be. One irritating example is the web version of MS Word: it doesn't use the browser's font rendering and instead implements its own (in canvas, I think?), giving all text eye-straining fringes on my display. Google Docs has the good sense to stick to black-and-white anti-aliasing at least.
Most of these issues exist on other platforms as well. Try running a Qt or GTK app on macOS, or an opposing widget-kit app on GNOME or KDE.
Things have certainly gotten better (similar to HiDPI support, multi-monitor, etc.) as developers have standardized or learned the tools, but an individual app can choose to do its own bespoke rendering on any platform.
Fair, it's not a Windows issue. I only use Windows, so I've only experienced it on Windows, and I didn't want to overstate, but obviously that was poor wording on my part.
You should try the Dark Reader plugin: it reverses all the colors and turns the web version of MS Word's subpixel rendering into BGR. I know because the text looked weird on my RGB display with the plugin on.
I was about to get a monitor that seemed too good to be true for the price. Turns out it was BGR. At the time, I didn't know what that meant. But the reviews said it was not good, so I avoided it.
I appreciate that reviewer every time I think about it.
My organization built a similar tool that can find bad redactions caused when people just use a black rectangle on top of text in PDFs: https://free.law/projects/x-ray
Yes, it's been that way since long before PDFs. Simply knowing the potential words, often names, that could appear in a document gives those with the redacted documents a chance at determining what has been hidden based on size. This might be part of the reason why, when declassifying documents, the redactions end up covering more of a sentence than is needed. The extra buffer of hidden words gives some additional protection to what actually needs to be redacted.
This reminds me of one of my proudest moments in high school.
For a test in German class (my worst class), the teacher had just used Tipp-Ex to remove some words and put them next to the text, and we had to fill them back in. I grabbed my ruler and measured all the sizes. There was one very long word, many medium-sized ones and a few smaller ones, but with this information and the context of the text I was able to get my first and last 10/10 in this class.
A malicious "redacting" algorithm submitted to the underhanded C contest used a similar idea, just on lower level.
The contest's image format (ASCII PPM) stores pixel values as decimal text, so flipping all digits to 0 creates a pixel which is graphically "masked" but leaks information about the original value: "000" means the value was larger than 99.
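A toy sketch of the leak (assuming ASCII-encoded pixel values; not the actual contest entry):

```python
def mask_value(token: str) -> str:
    """'Redact' an ASCII pixel value by flipping every digit to 0. The masked
    token keeps its length, so "000" can only come from 100..255, "00" from
    10..99, and "0" from 0..9 -- the magnitude of the pixel leaks through."""
    return "0" * len(token)

for original in ("7", "42", "187"):
    print(original, "->", mask_value(original))
```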
Nope. That's called rebroadcast. It's also used to try to "launder" photo manipulations, like compositing. I helped work on some algorithms which could pick up artifacts even after rebroadcast.
I would absolutely not trust pdf not to leak metadata. Although now you risk metadata leak from the printer or scanner, which may or may not affect your threat model.
When a coworker asked me for my recommended method of creating and publicly sharing redacted copies of documents which (in their unredacted forms) contained PII for children, I told them to do this, in no uncertain terms.
> Am I the only one who redacts info, prints it out, then scans it back in?
If you have the source document, redact from the source (by actually removing the content and replacing it with an appropriate placeholder, not obscuring it) and regenerate the static (e.g., PDF) version.
If you are working from print, I think scan and redact by digital replacement (not overlay or otherwise obscure) would be sufficient. Redact->print->scan probably helps somewhat (especially if the scan is low quality) if you are using a bad redaction method to start with, but why do that?
Not if there is a rasterization step in the process. That's essentially what printing and scanning achieves, rasterization, and we can do that without the printer and scanner.
Of course, the artifacts introduced by printing and scanning (especially with contrast turned way up) give it an air of legitimacy, although these can also be simulated.
If you print to paper and scan you are mostly safe, but if you do a software print to a PDF document, the tool might save the actual content as invisible text, or even attach the whole Word document to the PDF. I would print and scan physically if it was something important. Or just edit the Word document to remove the stuff and then print and scan, to avoid saving the edit history, since I don't know whether that gets saved somewhere.
Usually I'm in full control of the software myself so I just output X instead of the secret data.
On macOS, Preview makes a clear distinction between 'drawing on' and 'redacting' PDFs.
It is an important part of UX that shooting yourself in the foot should _not_ be the default.
Wanting to redact information is not a subset of PDF knowledge. Understanding how PDFs work is not a prerequisite of the desire to redact information. Lots of people have only the most basic, rudimentary understanding of how PDFs work, how Adobe works, and what their limits and capabilities are.
A lot of people don't even know you can print to a file instead of paper. Not sure why you're surprised about that; after all, the standard method for all formats is "save as" or "export", and it's reasonable to assume those two options cover all possible ways to save a file. It's a UI quirk that goes against user expectations.
Recently I discovered a manual for some home appliance with a visible Word comment, username included; it seems to have slipped in when the manual was translated.
Tangentially related, but I immediately recognized the person in the video in the article as "AltF4", who has done a lot of work in the Super Smash Bros. Melee scene. He has worked on Slippi [1], created a Python API library for the game [2], and coded a bot [3] to demonstrate the library, which professional players find very difficult to beat based on its inhuman reaction times (as you'd expect).
I've always wondered what some of these Melee players do as a real job, and I guess in this case I found out by accident (in hindsight, the name of the company is in his GitHub bio, which I had never checked before).
AltF4, Dan Salvato, Fizzi, and UnclePunch are all absolute legends. All of the things they've done to improve Melee are staggering. Amazing work by all of them.
Pixelation is one of those hilarious instances where a needlessly complicated approach is used to do something that can be done much more quickly, easily, and, as it turns out, way more securely.
Wanna redact text?
Put a solid bar over it. There, done. No takesies-backsies from that one.
Lol, that's really a stretch/nitpick, like saying:
> Delete it.
And that's not such good advice if you're using software that keeps history, where your deletion is just another action that can simply be reverted.
...then really you should know how to flatten? Not understanding the tools you use for sensitive tasks like hiding/redacting data is a recipe for disaster.
Seems like "flatten" ought to be a trivially easy mechanism to implement in software somewhere, and maybe even as a standalone application for the less familiar. Basically just a nicely trimmed screenshot, no?
Sure, but you should still delete it. You can always add a bar at the old position afterwards. Then it's still a visible redaction, but you don't end up having a hidden copy of the unredacted text.
>Redacting PDFs – What did the Manafort Lawyers get wrong?
Date: May 18, 2020
>With Paul Manafort being released from jail on May 13th, for those in the document space like Alfresco, it was worth revisiting the PDF redaction issue that surfaced during his trial. Back in 2019, lawyers representing Paul Manafort (a former lobbyist, political consultant, and lawyer, who chaired the Trump Presidential campaign team) filed a response to special counsel Robert Mueller's claims that he violated his cooperation agreement by repeatedly lying to prosecutors. On pages five, six, seven and nine either the lawyers or the special counsel staffers attempted to redact sensitive passages.
>Although parts of the public version of this filing appeared to be redacted by black bars at first glance, it quickly became apparent that anyone with Adobe Acrobat, or other PDF viewing tool, or even browser-based viewing tools, could easily copy and paste the text that still existed under the redaction blocks to another document to simply reveal the passages that had been redacted. From the UK, a similar incident happened back in 2011 with the Ministry of Defence.
> Most pdf editors will put a square over the text
Redaction is usually a "Pro" feature in commercial PDF editors. People who use PDF's annotation feature to draw a big filled shape over the text/image to be redacted, don't know what they're doing and should be prepared for poor outcomes.
Adobe Acrobat and Foxit have "Pro" variants that redact properly, I'd love to know exactly how they do it. The open-source approach I'm familiar with, essentially converts the PDF to a flattened image and edits it to apply a colored shape over it. Example[1].
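The flatten-then-draw approach could look roughly like this sketch (not any particular tool's code; assumes pdf2image/poppler and Pillow are installed, and the file name and box coordinates are made up):

```python
from pdf2image import convert_from_path   # wraps poppler's pdftoppm
from PIL import ImageDraw

# Rasterize every page so no text objects survive into the output.
pages = convert_from_path("filing.pdf", dpi=200)

# Hypothetical coordinates of the passage to hide on page 1.
ImageDraw.Draw(pages[0]).rectangle((100, 480, 420, 510), fill="black")

# Write the pages back out as an image-only PDF.
pages[0].save("filing_redacted.pdf", save_all=True, append_images=pages[1:])
```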
I saw a variant of that last year. Someone sent a courthouse floor plan to a contractor who needed office layouts, with security information "removed". Turns out it was just an overlay, and he could see all the security cameras by dragging the layer aside in an editor.
Would "print to PDF" be a good enough scenario for most people putting a square over the text? I admittedly haven't looked hard enough into it but theoretically it should be flattening it.
No, because “Print to PDF” preserves text (i.e. text is selectable in the resulting .pdf document), so it’s gonna be selectable behind a black square as well.
Although, I just checked: macOS Preview has a "redact" feature that (is supposed to) actually redact text. Well done!
On macOS 10.15's Preview, I drew an annotation black box over a paragraph that was recognized as text. I exported it in two ways: 1) "Export as PDF" and 2) Print -> Save as PDF. Neither result included the underlying text under the box.
That's not to say it's definitely not there, but it's at least something.
> Put a solid bar over it. There, done. No takesies-backsies from that one.
Well, mostly. But say you know the name that's redacted belongs to a small group (e.g., US presidents from 1970 to today) - you could probably rule out (and maybe rule in, i.e. determine) the redacted name, based on font size, kerning etc. in the document.
I wonder how far a machine learning model could go for longer reports - say 1000 pages with ~100 pages redacted - where the style of writing could be approximately inferred from the visible content. How many sentences/paragraphs could the AI fill in with some probability?
> But say you know the name that's redacted belongs to a small group
True, but sometimes there is no way around this, because redacting part of the text visually, so that it is clear where and how much was redacted, may be a requirement.
If there is no such requirement, one can always just replace the part of text like so:
This is the original text that I am going to redact now.
This is [REDACTED] now.
> you could probably rule out (and mabye rule-in, determine) the redacted name, based on font-size, kerning etc in the document.
Depending on what you're redacting you can eliminate this variable.
For example, when I want to redact a piece of sensitive text in a video, often I'll make the solid black bar longer than the text being hidden. This way you can't infer the length of the text based on the length of the bar. Of course this only works when you can extend the bar in such a way that it won't hide non-sensitive info that's important to see. In practice it works well: for redacting browser history, just make the bar the entire width of the browser URL bar, and for API keys or secrets, often the key exists as an env variable on its own line, so extending the black bar is no problem.
I always get a little paranoid even with a solid bar. For example, if I'm trying to block out my SSN on a PDF, I don't trust that the block won't stay a mutable block when saving it. I tend to "print" a new PDF in hopes that it flattens the document.
I want random-text-overlay-pixelation (or some lookalike) as a feature instead of plain-text-pixelation. Giant black boxes everywhere is quite the distraction, whereas blur/pixelation is not, so just make the underlying data random.
My employer makes a tool that does this! It's a screenshot tool we had been using internally for years that we released last year. It takes a sample of the colors in the image and uses them for random-looking obfuscation. So it's obvious that the data is hidden, but there's no pattern or way to un-blur that data. It doesn't work as well on low-contrast images, but we find it quite useful for hiding sensitive data. Some examples:
I think it's kinda hard to see in the Wikipedia example what has been redacted. I just do "Obfuscate" in Greenshot with pixel size 16. Call me when this algorithm reverses this https://i.imgur.com/NYf0Dpe.png! :D
I do this on my website with paywalled information. It respects upper/lowercasing and spacing but randomises each character. Anyone can disable the CSS blur but the data is still obfuscated.
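A sketch of that kind of server-side scrambling (a guess at the approach, not the site's actual code):

```python
import random
import string

def scramble(text: str) -> str:
    """Replace every letter/digit with a random one of the same class,
    preserving case, spacing and punctuation, so the blurred text has the
    right shape on screen but carries no recoverable content."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append(random.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(random.choice(string.ascii_lowercase))
        elif ch.isdigit():
            out.append(random.choice(string.digits))
        else:
            out.append(ch)
    return "".join(out)

print(scramble("Quarterly revenue was $1,234,567."))
```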
This reminds me of Manning liveBook's obfuscation strategy [0]. It scrambles the letters to keep the majority of the specific details obfuscated while somewhat revealing the gist (word length, acronyms, anagrams, code segments) presumably to encourage sales and discourage piracy.
Potentially leaking names by giving out the number of letters in the name and surname? Just wondering. If so, consider killing or randomizing spaces and letter counts.
Ideally the user wouldn't have to do anything. Whatever your editing tool is would have some "text blur" mode which functions and looks like normal blur but the end result is blurred garbage.
I think as far as culture goes it would be simpler to promote the idea that black bars are the only way to do it, than that you need to make sure you're using a special software that knows how to securely pixelate text.
It is a cool idea though, and text-shaped pixelation is much more satisfying than random noise. (And if someone's cheeky enough to decode it, they will be very disappointed ;)
I’ve seen even this fail once, although it was unusual circumstances. Black box on what turned out to be slightly off-black text using a PDF highlighting tool. Not the ideal way to do it. When printed and scanned you could just barely read it if you had a monitor with a bad viewing angle or adjusted the contrast. A paralegal caught it before it went out.
When I implemented the pixelation censorship effect in The Sims 1, I actually injected some random noise every frame, so it made the pixels shimmer, even when time was paused. That helped make it less obvious that it wasn't actually censoring penises, boobs, vaginas, and assholes, because the Sims were actually more like smooth Barbie dolls or GI-Joes with no actual naughty bits to censor, and the players knowing that would have embarrassed the poor Sims.
The pixelized naughty bits censorship effect was more intended to cover up the humiliating fact that The Sims were not anatomically correct, for the benefit of The Sims own feelings and modesty, by implying that they were "fully functional" and had something to hide, not to prevent actual players from being shocked and offended and having heart attacks by being exposed to racy obscene visuals, because their actual junk that was censored was quite G-rated. (Or rather caste-rated.)
But when we later developed The Sims Online based on the original The Sims 1 code, its use of pseudo random numbers initially caused the parallel simulations that were running in lockstep on the client and headless server to diverge (causing terribly subtle hard-to-track-down bugs), because the headless server wasn't rendering the randomized pixelization effect but the client was, so we had to fix the client to use a separate user interface pseudo random number generator that didn't have any effect on the simulation's deterministic pseudo random number generator.
[4/6] The Sims 1 Beta clip ♦ "Dana takes a shower, Michael seeks relief" ♦ March 1999:
(You can see the shimmering while Michael holds still while taking a dump. This is an early pre-release so he doesn't actually take his pants off, so he's really just sitting down on the toilet and pooping his pants. Thank God that's censored! I think we may have actually shipped with that "bug", since there was no separate texture or mesh for the pants to swap out, and they could only be fully nude or fully clothed, so that bug was too hard to fix, closed as "works as designed", and they just had to crap in their pants.)
The other nasty bug involving pixelization that we did manage to fix before shipping, but that I unfortunately didn't save any video of, involved the maid NPC, who was originally programmed by a really brilliant summer intern, but had a few quirks:
A Sim would need to go potty, and walk into the bathroom, pixelate their body, and sit down on the toilet, then proceed to have a nice leisurely bowel movement in their trousers. In the process, the toilet would suddenly become dirty and clogged, which attracted the maid into the bathroom (this was before "privacy" was implemented).
She would then stroll over to toilet, whip out a plunger from "hammerspace" [1], and thrust it into the toilet between the pooping Sim's legs, and proceed to move it up and down vigorously by its wooden handle. The "Unnecessary Censorship" [2] strongly implied that the maid was performing a manual act of digital sex work. That little bug required quite a lot of SimAntics [3] programming to fix!
This is incredible. I'd love to read more anecdotes like this. I don't want to pressure you into turning it into a blog... but I'd certainly be an avid reader :)
Thank you! Here are some Sims articles I've published on Medium, and I've written lots more stuff on HN, which I'd like to eventually clean up and publish on a blog some time:
Will Wright on Designing User Interfaces to Simulation Games (1996)
Also there's a great interview with Chris Trottier about "the toilet game", "tuned emergence", and "design by accretion", that I published on my old blog, which is still on archive.org:
>Sims Designer Chris Trottier on Tuned Emergence and Design by Accretion
>The Armchair Empire interviewed Chris Trottier, one of the designers of The Sims and The Sims Online. She touches on some important ideas, including "Tuned Emergence" and "Design by Accretion".
>Chris' honest analysis of how and why "the gameplay didn't come together until the months before the ship" is right on the mark, and that's the secret to the success of games like The Sims and SimCity.
>The essential element that was missing until the last minute was tuning: The approach to game design that Maxis brought to the table is called "Tuned Emergence" and "Design by Accretion". Before it was tuned, The Sims wasn't missing any structure or content, but it just wasn't balanced yet. But it's OK, because that's how it's supposed to work!
>In justifying their approach to The Sims, Maxis had to explain to EA that SimCity 2000 was not fun until 6 weeks before it shipped. But EA was not comfortable with that approach, which went against every rule in their play book. It required Will Wright's tremendous stamina to convince EA not to cancel The Sims, because according to EA's formula, it would never work.
>If a game isn't tuned, it's a drag, and you can't stand to play it for an hour. The Sims and SimCity were "designed by accretion": incrementally assembled together out of "a mass of separate components", like a planet forming out of a cloud of dust orbiting around star. They had to reach critical mass first, before they could even start down the road towards "Tuned Emergence", like life finally taking hold on the planet surface. Even then, they weren't fun until they were carefully tuned just before they shipped, like the renaissance of civilization suddenly developing science and technology. Before it was properly tuned, The Sims was called "the toilet game", for the obvious reason that there wasn't much else to do!
This wasn't so much because it was blurred, but because a few images showed parts of the QR code and the plain-text code unblurred. The rest was brute-forced and reconstructed from the QR code format.
Does anyone remember the Canadian paedophile who used a spiral effect to hide his face in photos? Simply reversing the spiral effect got the police a very clear image of his face and led to his arrest.
Another related piece of advice is that you should not use blur to hide sensitive information, because blur is actually a mathematically reversible operation! It's quite surprising at first, but the information is still there; it has just been spread out. If you know the shape of the kernel and have enough resolution then you can unblur the image. You can even apply this process (called deconvolution) to physical blurs like imperfections in telescope optics. In practice, however, bit depth is often too low to get good results.
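A minimal numerical sketch of the idea (idealized: the kernel is known, boundaries are circular, and nothing is quantized; real recovery from 8-bit pixels needs regularization such as a Wiener filter):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))              # stand-in for the original pixels

# A known blur kernel (small truncated Gaussian), zero-padded to the image size.
y, x = np.mgrid[-4:5, -4:5]
kernel = np.exp(-(x**2 + y**2) / (2 * 1.5**2))
kernel /= kernel.sum()
padded = np.zeros_like(image)
padded[:9, :9] = kernel

# Blurring = multiplying the spectra (circular convolution keeps the example exact).
K = np.fft.fft2(padded)
blurred = np.real(np.fft.ifft2(np.fft.fft2(image) * K))

# Deconvolution: divide the spectra again.
recovered = np.real(np.fft.ifft2(np.fft.fft2(blurred) / K))
print(np.allclose(recovered, image))      # True: the blur itself lost nothing
```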
You should also be careful with video: since it contains many frames, it might be possible to reverse rougher pixelations or blurs than would be possible with a still image. There exist various super-resolution algorithms that can extract high-resolution images from multiple low-resolution video frames.
My understanding is that Gaussian blur is technically lossless, all the data is still there in the decimal points. In practice, pixel values are truncated to 8-bit integers rather than arbitrary precision decimals, but because the blur is lossless to begin with you can sometimes deblur and retrieve enough information for whatever you were trying to do.
Hmm, the question is whether information is actually lost. In the answer linked below there is an example with a two-pixel image where information is in fact lost, which means the algorithm is not reversible in that case.
In that example he's not blurring in the usual sense; he's replacing the whole image with a single average value.
Blurring is mathematically a convolution of the image, which can be represented as a function f, with a kernel g. The key fact is that the Fourier transform of f * g is the product of the Fourier transforms of f and g. So if you know g, or can guess what it is, you can solve for f knowing f * g and g, since everything is linear. (Some blurs might not be linear transforms, but the usual Gaussian blur is.)
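In symbols, that is the standard convolution theorem (f the image, g the kernel, hats denoting Fourier transforms):

```latex
(f * g)(x) = \int f(t)\, g(x - t)\, dt,
\qquad
\widehat{f * g} = \hat{f}\,\hat{g}
\quad\Longrightarrow\quad
\hat{f} = \widehat{f * g} \,/\, \hat{g}
\quad \text{wherever } \hat{g} \neq 0.
```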
Information can be lost in two ways: 1) quantization of the data due to storage in a limited number of bits; 2) truncation of the blur at the image edges. So blurs cannot be fully reversed, but some information can be recovered. That might be enough to identify the information that was concealed in the first place. See the Wikipedia article on deconvolution for examples.
A blur filter is very different from a distortion algorithm (like the swirl filter). For a distortion algorithm you just have to put the pixels back into their original position.
Wouldn't an algorithm just like this one also be able to work on really blurry (i.e. far-away, zoomed-in) photos too? Especially with a huge corpus of trained content?
To prevent some of the issues with blurring or compression leaking sensitive info, one simple workaround I’ve used is to just put black boxes over the text as usual in any program and then take a screenshot and save it out from there. No accidental history or compression leaking.
One thing that still leaks is the exact spacing between words adjacent to the black boxes. This is a particular concern when only one or two words are blacked out -- for example a name. With a known font (with known character widths and kerning tables) and a known text engine applying those kerning rules, there are often very few combinations of letters that give the exact pixel (or subpixel) width of the blank space.
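A sketch of how that enumeration might look, assuming the font file and rendering size are known (Pillow; the font path, measured width and candidate list are all made up):

```python
from PIL import ImageFont

font = ImageFont.truetype("Calibri.ttf", 22)   # hypothetical document font and size
blank_width = 87.0                             # measured width of the blanked gap, px

candidates = ["Alice Chen", "Bob Martin", "Carol Jones", "Dave Smith"]
for name in candidates:
    w = font.getlength(name)          # total advance width of the rendered string
    if abs(w - blank_width) <= 1.0:   # small tolerance for rounding/hinting
        print(f"plausible: {name} ({w:.1f}px)")
```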
Yes, but being able to rule out "Firstname Lastname" can itself be interesting. Depending on context, the space of names is not all people, but can be very restricted.
Tbh I think there needs to be some dedicated tool for censoring images. Even with this method there is a small chance that it isn't 100% opaque. I have seen several images where you can pull them into GIMP and adjust the color levels until even the slightest color difference becomes blown out and you can read the censored text.
When redacting individual words, you probably don't want to use the same size black box as the rendered word, since someone might be able to figure out what word it was based on the length of the box and the document font.
Also, consider removing metadata from digital files using mat2:
That should work, given that the result doesn't take any actual input from the original data (except the width of the text) and is completely opaque over the actual text.
I imagine that would be the same as a black fill, just in a different color with noise.
I've used the original image data as a fill, but scaled down, so you get a mosaic effect. Then I randomise the tiles in that mosaic and then I blur. The result is that it seems like the original data was redacted but in reality the original data has been scrambled to such an extent that it can no longer be retrieved.
It's perfectly fine to use pixelation in the vast majority of scenarios where you are just trying to NOT draw attention to details that aren't part of your message. If someone gets nerd-sniped enough to actually sit down and spend a few hours deconvolving someone's name or email address, that's not the end of the world unless it's a very exceptional piece of information. Use your judgement.
In any case, manually blacking out rectangles to hide text looks ugly.
This is bizarre advice. If you care enough to redact information in an image which may linger on the internet forever, you probably care enough to want to do it properly. It's often going to be sensitive personal information like an address.
I wonder if a better (albeit slightly more involved) process to make blurred-out sensitive inputs for shared content would be to: 1) add a block over the text, but use the same color as the form input; 2) add similar-looking, but fake, text on top of the block; 3) apply the blur/pixelation.
This way you keep the nicer aesthetics but you get the benefits of using the black box approach.
Probably overkill for many cases, but when I create any sort of content I like to keep it looking fresh.
Text pixelization is only useful in highly visible projects where you want people to be engaged to the point that they figure out what the text is. For viral marketing, film, etc., it's a great tool; otherwise it has very few valid applications.
I don't know how universal this experience is (or why it happens), but if I step back a bit from the screen and squint/unsquint my eyes, my brain can unblur most of the pixelation effect.
The bottom line is that when you need to redact text, use black bars covering the whole text. Never use anything else.
That actually may not be enough if you're applying the black bar to compressed image data like JPEG, because compression artifacts surrounding the black bar can leak information about the covered data.
Might be interesting to test how plausible that is. How likely is it that a human doesn't see the artifacts, but they leak enough info to reconstruct the underlying data?
I think it's more likely the JPEG blocks which straddle the box edges simply don't cover any meaningful part of the text. 12 pt font at 96 DPI is 16 pixels tall, meaning 50% of the vertical height of a line simply wouldn't fall into the blocks straddling the edges of a line-height box. You'd get ascenders and descenders but not much else. Tops of numbers or all-caps I think is best case.
Though, web images now are being served in higher resolutions (200+ DPI) for "retina" displays, and scanned images are generally 300 DPI, in which case you'd be lucky even to get ascenders and descenders.
I'd be curious to give it a try though. If Facebook memes are any indication, many humans are totally oblivious to near-unreadable levels of artifacting.
Not necessarily good enough. In principle you need to either get the raw original, or black out every macroblock that ever contained any sensitive information.
Saving to PNG doesn't necessarily change anything (though see below) -- the issue is the artifacts that are already present.
JPEG breaks an image into 8x8 pixel blocks. Each of those blocks then has its information content reduced, so that it can be described in fewer bytes. (I.e., information is thrown away -- making JPEG "lossy", and producing visible artifacts.) This has the necessary side-effect that, when reconstituted, this 8×8 block now contains redundant information (if not, then the compression of that block was not lossy). This finally implies that at least some certain pixels of that block can be (at least partially) inferred from other pixels. That is, if lost, they can be recreated.
(It's helpful to understand also that JPEG does not encode each block on its own, but additionally factors out block commonalities into a central "dictionary".)
For the above to be useful to infer text hidden by a black box, requires:
(a) that the edges of the black box are not aligned to the 8×8 grid;
(b) that the relevant portions of text to be recovered lie near the edges of the black box (i.e., within the 8x8 blocks which straddle the edges); and
(c) that these blocks originally contained data of sufficient complexity, and/or deviating sufficiently from the rest of the image content, that the encoder decided to throw away sufficient information in these blocks to leave significant artifacts.
Finally, if the redacted image was re-encoded as a JPEG (or other lossy format), the re-encoding process must not have thrown away too much information in these blocks, else the redundant information will have been obscured and rendered all but useless for reconstituting the redacted information.
So, an easy way to avoid having redacted information extracted in this manner is simply to ensure that your black boxes extend at least 8 pixels beyond the redacted text in each direction. (And also, to force the JPEG encoder not to re-use the dictionary from the original image, as information about the statistical distribution of block data could theoretically be extracted from that. Round-tripping through PNG is one way to force this additional safety measure.)
This still isn't 100% information-theoretic secure -- there's still residual information in artifacts elsewhere in the image about what patterns the original image's dictionary contained (which could be extracted with e.g. principal component analysis), which, when combined with a prior statistical distribution of the expected uncompressed content of the image, could leak some information about the portions which were redacted -- but I suspect the amount of information available via this channel to be vanishingly small.
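Put concretely, the "extend the box past the text" advice amounts to padding the redaction rectangle and snapping it outward to the 8x8 grid; a small sketch (made-up coordinates):

```python
def pad_to_jpeg_blocks(box, margin=8, block=8):
    """Expand a redaction rectangle (left, top, right, bottom) by `margin`
    pixels, then snap it outward to the 8x8 JPEG block grid, so no block
    that ever contained redacted pixels survives only partially covered."""
    left, top, right, bottom = box
    left, top = left - margin, top - margin
    right, bottom = right + margin, bottom + margin
    left -= left % block
    top -= top % block
    right += (-right) % block
    bottom += (-bottom) % block
    return (left, top, right, bottom)

print(pad_to_jpeg_blocks((101, 483, 419, 509)))   # -> (88, 472, 432, 520)
```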
Quick note: in the examples of variable-width and monospace fonts, they both look variable-width. No monospace font is displayed (mobile Safari, iOS 15.3.1).
I usually convert PDF to raster and draw over. White or light gray box overlay will save ink/toner for everyone and is just as opaque as a black box.
I still do pixelize with huge pixel size when information is not that important. I think it better conveys that there was something there, and redacted areas look more organic.
I'm the developer behind Redact.Photo [0], instead of blurring or pixelating the image, I've added an additional step to improve data security. See "how it works" section on the page.
1. Image is scaled down to create a mosaic effect
2. Pixels are moved around by a random offset on the x and y axes
3. Blurring is applied
Adding black bars over information will always be better, but this does result in a smoother-looking redaction that I think cannot be undone, because of the randomisation step.
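A rough sketch of that kind of pipeline (not Redact.Photo's actual code; Pillow + NumPy, with a full shuffle standing in for the random x/y offsets, and a made-up file name and box):

```python
import numpy as np
from PIL import Image, ImageFilter

def scramble_region(img, box, cell=12, seed=None):
    """Mosaic the boxed region, shuffle the mosaic cells, then blur. The result
    reads like an ordinary blur, but the pixel order underneath is destroyed,
    so there is nothing meaningful left to 'un-blur'."""
    region = img.crop(box)
    w, h = region.size
    small = region.resize((max(1, w // cell), max(1, h // cell)))        # 1. mosaic
    arr = np.array(small)
    np.random.default_rng(seed).shuffle(arr.reshape(-1, arr.shape[-1]))  # 2. scramble
    big = Image.fromarray(arr).resize((w, h), Image.NEAREST)
    img.paste(big.filter(ImageFilter.GaussianBlur(cell // 2)), box)      # 3. blur
    return img

img = Image.open("screenshot.png").convert("RGB")
scramble_region(img, (40, 40, 400, 90)).save("redacted.png")
```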
I tried it out. It still leaves a lot of the original information in the redact. Is it enough to reverse? I can't say it definitely is, but I would not be willing to risk something important with it.
I'm not sure what you tested with, but it might "seem" that the original information is there?
Only the original color data remains, but the detail is gone and each pixel position behind the redacted area is randomised/mixed. This randomisation step also overwrites pixel information so there is data loss as well. Because it's then blurred it looks like the info is blurred only. But if you'd be able to reverse the blur you'd end up with pixel noise. I find it hard to believe that that noise could be reverted.
Redaction can be needed for a document, but redaction can also be needed for a video (car's license plate, credit card number, etc.). As far as I know, there's no video editing software that recommends black bars and, for example, the official Adobe documentation at https://helpx.adobe.com/premiere-pro/using/masking-tracking.... seems to recommend their Masking and Mask Tracking features.
Black bars may look very ugly in a video. Still, are video editing products recommending a process that has a high risk of leaking sensitive data? There might be reasons that attacking redaction in a video is harder than attacking redaction in a PDF. However, maybe it's actually easier in some cases, e.g., with several similar frames, the attack could take advantage of averaging across frames.
Unfortunately I don't see anything at https://hackerone.com/adobe that could get someone a bug bounty for researching this.
It's curious that in TV/film people's identities are censored by blur/pixelation/black bars over the eyes, while documents are always censored by black bars. I don't recall seeing pixelation used for text, so why do we mostly go for pixelation in the first place when we want to omit info on screen?
[0] https://en.m.wikipedia.org/wiki/Subpixel_rendering
[1] http://www.lagom.nl/lcd-test/subpixel.php