Most Digital Photocopiers Save Every Page Ever Scanned (cbsnews.com)
197 points by dpritchett 2767 days ago | 79 comments

What I am finding really absurd about this whole thing is, the photocopier companies knew about this (obviously), and a number actually had services/software available to wipe the drives - but failed utterly in actually telling anyone who might care about them.

My gf works at a hospital and there are many panties in a twist about this right now. She contacted one of their major photocopier providers and asked about what happened to the ones they had on lease that were returned in the last few years. "Oh, they were sold, or scrapped, or given away... couldn't really tell you for certain though." So the obvious question: "Did you wipe the drives first?" "A: No, that wasn't included in our contract with you." - because nobody knew about this 'feature' when the contracts were written!

Massive fail both security-wise and from the sales side on missing easy upselling. Very likely any hospital or police dept would gladly have paid for these services had they known it was necessary.

"In 2008, Sharp commissioned a survey on copier security that found 60 percent of Americans "don't know" that copiers store images on a hard drive. Sharp tried to warn consumers about the simple act of copying."

Does anyone remember this warning? I certainly heard nothing about this; and while I knew there had to be a hard drive in there to temporarily store the image, there was no reason to expect that it would store images permanently. Of course Americans "don't know" that copiers store images like that; how would we? Who had ever, EVER told us?

Why would there have to be a hard drive? I would have assumed there was enough RAM to do a job, and that powering down the copier would clear it.

Who the heck needs the copier to remember their copying jobs long-term?

Copiers these days can do so much more, they can print, have mailboxes (for storing, archiving, printing from or to), send emails, upload to and download from shared folders, integrate into your applications, send and receive faxes and so much more. And that's just the low end, I have no idea what the advanced features are.

That explains why they have hard disks, but not why they keep simple copies on the hard disk.

I'm surprised it's only 60%!

It's WAAAAAY more than 60%, I assure you. If the HN crowd is surprised (like I am!), the millions of people who don't know a hard drive from a monitor have absolutely no idea.

I'd guess that it's more like 99.999%.

Depends on how the survey question was structured. I can imagine

Do you know that photocopiers store images on a hard drive?

would get very different response rates than

Do you know that photocopiers store images on a hard drive permanently until manually wiped?

Or even:

Do you know that photocopiers store images on a hard drive? 1) Yes 2) No


When you photocopy an image, the photocopier... 1) saves the image to a hard disk permanently. 2) temporarily stores the image until the copy is finished. 3) other

The first question leads people to lie in order to not be seen as ignorant.

>Do you know that photocopiers store images on a hard drive?

Is still pretty leading. People would say yes because they don't want to appear naive.

"Do photocopiers keep any records of copies made?"

Ah, but in that case you'd get fifty-fifty since people who had no idea would just guess.

I was considering it as an open question not a yes/no. If they say yes you ask them how. That way they are demonstrating their knowledge rather than claiming it.

Like if you wanted to find out if people know that bacon flavour crisps are vegetarian you'd ask something like "do any crisps contain meat or animal products?" and be prepared with follow on questions. So if the response is "no" you go back with "not even BBQ beef or bacon flavour?". If they answer "yes" you ask "out of these, which are suitable for vegetarians: ..." or somesuch.

Makes me wonder who the hell they asked. Copy-machine security experts working at Sharp?

It's the first I've ever heard of it, and I've been around the things and dealing with the tech side of them (networking etc) for years.

At my last "office" job, the new machines they brought in were fully networkable. (The next time you're using a newer generation copier, check whether there's some cat5/6 plugged into the back.)

So, if you copied something personal during your lunch break (considered a de facto perk, as long as exercised in restraint, e.g. that tax form before dropping same in the mail), would it remain on the copier hard drive? Worse, would it be deliberately archived in a company datastore?

This place was big enough and sophisticated enough to have some technologists dedicated to managing the machines (in conjunction with a service contract). Yet I ended up having to help them with some configuration difficulties. Which led and leads me to consider the implications also raised by this story. Any organization with a halfway decent security policy should understand and address these problems when first deciding to bring the machines in. Yet they apparently don't. And manufacturers should have addressed them up front in the feature set and use/management guidelines (e.g. a setting to wipe images on job completion, whether user controlled or in overall systems settings; a clear machine management feature to securely wipe (e.g. to a clearly defined and understood DoD standard) all drive data storage). Yet they apparently haven't. Or they don't clearly steer customers to knowledge and use of those features.

The reasons for the technological features are obvious. Their mis-management, unfortunately, seems all too familiar. I'm sure there were people arguing for better, but that would have been hard.

Can someone explain the need for a copier to do that? Which engineer in his right mind thought that this would be a good idea without some routine clean up procedure?

It says in the article that he had to use forensic / recovery software to extract the images from the drive. That suggests to me that they were deleted, just not overwritten.

It's just like "deleting" a file off of your desktop PC; it's unlinked, but the data is still physically present on the drive platters until it happens to be overwritten, unless you make a special effort to overwrite it -- and most applications don't do that.
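To make the unlink-versus-overwrite distinction concrete, here is a minimal Python sketch. It is not what any copier firmware actually does; the function names `zero_fill` and `overwrite_then_delete` are made up for illustration, and it assumes the filesystem rewrites the same physical blocks in place, which may not hold on SSDs, journaling or copy-on-write filesystems, or drives that have remapped sectors.

```python
import os
import tempfile


def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing file with zeros, in place.

    Returns the number of bytes overwritten. Assumption: writes land on the
    same physical blocks the file already occupied.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the zeros out to the device
    return size


def overwrite_then_delete(path):
    """Zero the contents first, then remove the directory entry."""
    zero_fill(path)
    os.remove(path)


def plain_delete(path):
    """An ordinary delete: the name goes away, the data blocks are untouched."""
    os.remove(path)
```

The only difference between the two delete functions is the extra write pass before `os.remove`, which is the step a plain delete skips.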

I'm not sure that it's really a flaw in the copier design, because they're not intentionally retaining the images any more than your PC is intentionally designed to retain Word documents that you "delete" without overwriting.

The problem is that many users are unaware that a modern photocopier even contains a hard drive ... they are probably oblivious of the fact that modern copiers have little in common with the traditional photostatic machines they grew up with, except that they're faster and don't smell as much. Manufacturers need to better educate buyers (or lessees) about the machines they're using, and perhaps make hard drives more accessible so that when they're sold, they can have the drives removed and destroyed first.

I've done a few recoveries from Photocopiers in the past - pretty much what everyone has suggested is correct.

Once you get the HDD out it's easily treated like any other drive. Most of the documents will be deleted (there is often a buffer of non-deleted documents) but recovering them is pretty trivial. Some of the newer ones do actually scrub data at intervals now.

Do these copiers (multifunction right?) include a function for this? I would hope that these copiers include a wipe drive button.

The headline of the HN article is a bit misleading.

"using a forensic software program available for free on the Internet, he ran a scan - downloading tens of thousands of documents in less than 12 hours."

Since he had to use special software to extract the files, it looks like the software deletes the files, but not securely. Basically, it uses 'rm' instead of 'scrub'.

As explained below, storing scanned pages temporarily is good when photocopying (or faxing) multiple pages.

Also, the creator of the software mentioned (securely wipes photocopier hard drives) is a very smart marketer. The article is basically an advertisement for his software, "INFOSWEEP," and he's the one who showed the journalist this vulnerability in the first place.

You need it to do multiple copies, double sided copies, collated etc. All the major makers' built-in software does overwrites of the data to various levels of security. Otherwise the government wouldn't buy them.

They also have fairly small drives - the companies are out to make money - so at somewhere like kinkos your job would be overwritten multiple times by the end of the day. It's only an issue if the system fails and the drive needs to be replaced - but anywhere operating under any sort of security regs would destroy the drive before it went off site.

We used to destroy them along with any waste explosive/munitions. Then health and safety stopped us and we had to buy a super monster shredder - which is a lot scarier than explosives but it can shred drives/entire files/interns etc.

Those are good points and legitimate use cases. But if they have small hard drives, how did the guy in the story recover "tens of thousands of documents" from 3 copiers? Why would a copier need to be able to store that many pages?

I mean, OK, hard drives are cheap, so maybe copier manufacturers just said "eh, even if they run huge jobs, they can come back and repeat them later. Only costs us a couple more bucks." But dang - they gotta address this.

EDIT: Hmmmm. I guess I just forgot how freaking big hard drives ARE these days. Discussion below about how a "small" 20GB drive could store so many documents at such-and-such size... oh yeah. It's startling to remember how gigantic 20GB sounded 10 years ago.

Why would you need nonvolatile storage for that? Why not just store it in RAM temporarily?

A hard drive could be cheaper than the amount of RAM required to store 20 pages of uncompressed scans.

If it's only 20 pages, I'd actually guess not. HDD + controller, for a minimum of (say) a 20 gig drive (I couldn't find any smaller ones), I'd guess would be a lot more than the extra hundred megs (tops) in the RAM that it already needs to have anyway. So it only makes sense to have a drive if you want to store a LOT more images than 20.

Maybe it needs a drive anyway for some other reason (like the software that runs it lives there instead of on a CF card or something), so it ends up being cheaper and easier that way?

100 megs? A single letter-size sheet is 30 megs at 600 ppi.
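A quick way to check that figure, as a rough sketch: the comment doesn't say what bit depth it assumes, so the 8 bits per pixel below (grayscale) is my assumption, and `raw_scan_bytes` is just an illustrative helper name.

```python
def raw_scan_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Size in bytes of an uncompressed scan of a page.

    width_in / height_in are the page dimensions in inches, ppi is the scan
    resolution, bits_per_pixel is the colour depth (1 = bilevel mono,
    8 = grayscale, 24 = RGB).
    """
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bits_per_pixel // 8


# US letter (8.5 x 11 in) at 600 ppi, assuming 8-bit grayscale
letter_600_gray = raw_scan_bytes(8.5, 11, 600, 8)
print(letter_600_gray / 1e6, "MB per page")        # about 33.7 MB
print(20 * letter_600_gray / 1e6, "MB for 20 pages")
```

At 1 bit per pixel the same page would be an eighth of that, and at 24-bit colour three times as much, so the "30 megs" estimate fits the grayscale case.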

Okay, so 600 megs. An extra gig of RAM has to be cheaper than a hard drive + controller you wouldn't otherwise need, right?

Many people occasionally copy/print multi-hundred page documents. That runs to many gigs of RAM, $30 just for the DRAM chips. And then you need to design what amounts to a custom PC motherboard to talk to them. The price could easily run to $75, not counting the enormous engineering costs.

Or you buy a $40 ATA drive and connect it to your embedded processor's built-in interface.

This, actually, was the first thing that came to mind:


That was really interesting but the EFF really should put a date somewhere on that page. A "last updated" or something because right now I have no clue when they wrote that and whether they're still investigating it.

Short-term storage is useful for things like being able to make 5 photocopies of a document--- old copiers would have to literally run the copier across the document 5 times to do so. Now they're more like scanner->printer, so they scan once and print 5 times. They don't delete it immediately after the job either, because many copiers support convenience features like "ok, 2 more copies of that last document please". It does seem like they could delete once N more documents have been printed in between, though.

Reading the story, it sounds like they are removed from the file system, but not securely erased.

Not "more like" - they actually are scanners and laser printers. With some special paper handling for the printer.

i'd be quite curious about this, myself.

the answer is also necessary to figure out how many images are probably being stored. if the copier is just keeping them for debugging reasons (which would explain why nobody knows they're there), then they could very well be highly compressed 72dpi images.

Unexpected conclusion: Photocopying your rear end counts as pro-privacy activism.

If done properly, this could be part of an awesome XKCD.

What would a stick-person's scanned butt look like?


and if the local kids have been doing it, you're likely still in possession of child pornography.

From the article: "All the major manufacturers told us they offer security or encryption packages on their products. One product from Sharp automatically erases an image from the hard drive. It costs $500."

This should not cost extra. It should be included in every single hard-drive-based copier.

It would be a shame if your business were to 'happen' to catch fire, but luckily we can prevent that from happening... for a fee.

... says the insurance company, who installed char cloth in your walls as "insulation".

Isn't it ethically questionable for the manufacturers to know about the issue (as they admit in the article), but to charge extra for software to address it? At the very least, shouldn't encryption or the mentioned erase utility be sold with the machine, with the buyer having the option to opt out if they don't want it?

Lots of manufacturers charge extra for various security options, on all sorts of hardware and software. We can go back and forth all day on the "ethics" of it, but it's pretty standard practice.

And Sharp's add-on module might not be required if the copiers were treated like the computers that they are; it would be perfectly fine to just remove the hard drives when the units are being sold at their EOL, as I suspect most healthcare and government facilities do with their old PCs.

From an Infosec perspective this is terrifying, not simply because of the amount of data, but how difficult it is to scrub. Most people/businesses know that they need to scrub the hard drives from computers before donating/surplussing. Doing the same with copiers is a nightmare.

This is why I'd recommend all businesses use some kind of secure disposal service to get rid of old equipment. Hopefully the businesses that specialize in this field will add copiers to the list of items they scrub.

It's not just copiers. Most office printers these days contain their own servers, with hard drives for the spool. It would be just as simple to analyze the drive from one of them and grab whatever's left.

And there you'd just be dealing with PS and PCL, probably, rather than some weird proprietary image format that you might get on a copier's drive.

"All the major manufacturers told us they offer security or encryption packages on their products. One product from Sharp automatically erases an image from the hard drive. It costs $500."

That's just outrageous. I can't think of a reason why copiers need to store images to begin with. It's a huge fail from the producers, and they shouldn't charge for cleaning up the mess.

Instead this should evoke a scandal like the Toyota braking thing and make producers have to recall their copiers, with millions or billions of losses.

Any idea how to extract these stored images? There has been more than one occasion where I would have liked a copy of an accidentally shredded original.

You'll need to get the drive out. The connector needed is the only complication; it could be anything but is probably one of the mini IDE style connections.

Once out and connected to a computer you can recover data using any manner of tools.

I think FTK 1.8 still has a free trial you can use (http://accessdata.com/downloads.html). It's a bit of overkill for such a simple job but it will recover files fine. Otherwise just Google for programs to use on your OS of choice.

This reminds me of the hidden identifying markers put on all printed color pages for later possible "forensic" investigation by FBI/spook types:

"Government Uses Color Laser Printer Technology to Track Documents"


Copiers saving potentially (very) sensitive information to enable features most people don't use. Seems like a poor choice of default...

I've found the Readability Redux Chrome extension to be just as good and significantly faster! https://chrome.google.com/extensions/detail/jggheggpdocamnea...

In re: the article - Do you think arc90 is logging every document that is passed through Readability? Is this body of information worthwhile? I suppose it is good ammo for them to use to approach large media properties about the possibility of a paid site redesign project.

So, wait, when I push "Print + Delete" it deletes the page from the job queue, but still stores it on the hard drive? Now that's misleading.

I wish I had enough money to buy some digital photocopiers from interesting places. Government surplus sales would probably be the most fun.

In general, you can't really scrub a hard drive. No company in their right mind should ever let an old computer go out the door with a hard drive in it. Hard drives should be destroyed. I recommend something like this: http://www.youtube.com/watch?v=sQYPCPB1g3o

Not true. A single pass of 0's is sufficient to make a magnetic drive unrecoverable. Anything more is FUD.

Yeh agreed. It's a much argued situation in the Forensic world - but this paper pretty much agrees that it is impossible: http://www.springerlink.com/content/408263ql11460147/

Given that our government requires the destruction of storage devices to protect their secrets from seeing the light of day, I have to wonder why anyone would ever want to risk it: http://www.nsa.gov/ia/guidance/media_destruction_guidance/in...

Seriously, how expensive are hard drives these days?!

Yeh it's sensible if you have, say, confidential/classified information (we have to destroy most of our dead drives).

But I think that's mostly just overkill. I'm fairly sure that it is completely infeasible to recover data even for the mighty government. :P

I suspect the theory is that wiping a drive is prone to accidental failure/mistakes. Destruction is pretty unequivocally permanent :)

Wipe the drive, then destroy it.

After all, simply destroying a drive would allow you to magnetically scan the bits and pieces of the platters left over.

meh, I don't imagine anyone could do that - the positioning of the platters is important and with a load of it munched you're screwed :)

Hm... two steppers geared down quite a bit, a hard drive head. Striping between platters really is a problem, but given enough time...

That would be an interesting challenge, now to find someone to pay for it.

I don't think it would be easy, but if the information is still on the platters you should be able to recover at least some of it.

Especially if it is email or other textual information it would not take a very large fragment to contain a large chunk of text.

On a single platter drive you'd have a better chance than on a multi-platter one, the synchronization tracks would help in figuring out what went where. It's a bit like puzzling together shredded documents.

The problem with this is that it assumes it's always safe to destroy a drive, as opposed to the scenario where the mere fact you destroyed storage media is enough to implicate you.

The other problem with this is that the NSA is paranoid, and paranoia doesn't necessarily move with the times. This relates back to the first thing: If it's cheap and safe to destroy drives, and you are storing secrets, why not? That does not mean you need to destroy drives, which is what this conversation is about.

Wearing a belt and suspenders does not mean belts are prone to failure.

Well, that depends on how paranoid you are... while overwriting a sector probably renders the data on that sector unrecoverable, modern disks may 'disable' bad sectors, copying their contents to more reliable sectors.

Few people would care to try to recover these things, but if you really are working with nuclear secrets...

have any references for this? I've seen the same claim and would love to believe it.

The whole '26 pass' thing is based on a paper that created 26 different patterns of wiping the drive in order to default the different hardware schemes for the hard drives of the day. But that paper was written long, long ago, and I don't believe any of those schemes are used anymore.

Oops. s/default/'defeat'/

EVERY page? That could finally solve all my hard drive space problem. Store the images on a photocopier!

20GB Hard Drive (Cheap) / 100KB / Jpg stored = 200,000 pages.

Effectively infinite, when considering the copier lifetime. Certainly falls within most people's definition of Every.

At 300dpi (which is pretty minimally low), an 8.5x11 image is 8.4 Mpixels. At any decent compression, that's going to be more like 1MB/page, or 20,000 pages.
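The two estimates are easy to put side by side. A back-of-the-envelope sketch, using decimal units (1 GB = 10^9 bytes) and treating the 100KB and 1MB per-page sizes as the guesses they are; `pages_on_drive` is only an illustrative name.

```python
KB = 10**3
MB = 10**6
GB = 10**9


def pages_on_drive(drive_bytes, bytes_per_page):
    """How many whole pages of a given stored size fit on a drive."""
    return drive_bytes // bytes_per_page


# Pixel count of a letter-size page scanned at 300 dpi
letter_pixels_300 = round(8.5 * 300) * round(11 * 300)

print(letter_pixels_300)                  # 8415000, i.e. about 8.4 Mpixels
print(pages_on_drive(20 * GB, 100 * KB))  # 200000 pages at 100KB each
print(pages_on_drive(20 * GB, 1 * MB))    # 20000 pages at 1MB each
```

Either way the drive holds far more pages than a single job needs, so the real question is how long old jobs stay around before being overwritten.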

Maybe we're talking monochrome? That reduces image size, but I'm not sure how to calculate that.

Don't worry, it's not "every" page like the journalist said, it's only the last 20,000. Somehow, I don't think that will make anyone feel better.

It's "every" by any normal definition, particularly in this context.

So the answer is to have the work-experience kid press the copy/scan button 20,000 times! :P

Well, monochrome is 1bpp, so it makes the raw image about 8Mbit or 1MB. The rest you'd only be able to determine empirically, and it depends quite a bit on what you're scanning and the scan quality.

"Noisy" images, including poor-quality scans or scans of halftones, don't compress well, but a crisp scan of high-contrast text or line art will be a fraction of that 1MB even with a trivial compression scheme like RLE. If you store as a TIFF with LZW, or something proprietary but similar to that, you're probably talking about an average of 200kB or less for a high-resolution scan. And this is assuming a fairly generous size/speed tradeoff, since you have a local hard drive to burn. I've seen huge document archives that go the other way (scan at lower resolutions, compress hard) that average 30-50kB/page.
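The clean-versus-noisy difference shows up even in a toy run-length encoder. The sketch below works on one scanline of 1bpp pixels represented as a Python list of 0s and 1s; the scanline contents are synthetic and the helper names are mine, not anything a real copier or TIFF library uses.

```python
import random
from itertools import groupby


def rle_encode(bits):
    """Collapse a sequence of pixel values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(bits)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original pixel list."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# A "crisp" 2550-pixel scanline: white margin, one black stroke, white again
crisp = [0] * 1000 + [1] * 50 + [0] * 1500

# A "noisy" scanline of the same width: every pixel is a coin flip
rng = random.Random(0)
noisy = [rng.randint(0, 1) for _ in range(2550)]

print(len(rle_encode(crisp)), "runs for the crisp line")
print(len(rle_encode(noisy)), "runs for the noisy line")
```

The crisp line collapses to a handful of runs, while the noisy one needs a run for nearly every other pixel, which is why halftones and poor scans barely shrink under this kind of scheme.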

They don't compress the images - they are written as essentially bitmaps from the scanning engine then read back to the printing engine. Most actually record 8 bits/pixel even in mono because it lets you do lighter/darker and background level in software. So a typical page at 300dpi is something like 100 sq in * 90,000 pixels/sq in = 9MB.

Have a reference?

(boy, I'd hate to have to type that name to log in...)

Not that hard on a qwerty keyboard. Try it.

Another interesting, related fact that not many people know of: the small photo booths you find at airports, train stations, malls etc., the ones often used to take that classic, romantic strip of pictures with your loved one, or your passport photo and so on, expose each frame on a special type of transparent, photographic film laid flat onto a strip of photographic paper. The directly exposed paper is the copy you get in the slot, but the reel of film is kept in the machine for "governmental bodies' reference".

