
About seven years ago, I was at a sushi bar and struck up a conversation with an older gentleman sitting next to me. He told me he was a developer and created systems for USPS. I am always fascinated by the technology used in large scale systems so I picked his brain for a good hour.

From what I recall, he said that at the key distribution centers, USPS scans every single piece of mail (in standard envelope sizes) and, in under a second, runs OCR on the destination address. Results from OCR are matched against the address database and if the match is strong enough, the mail is automatically diverted to the correct queue. Now here's the fun part - if OCR fails or the print/handwriting is unreadable, a photograph is immediately sent to one of the hundreds of humans waiting to decipher the address and type it in (think Amazon Mechanical Turk). The humans have under 10 seconds to read, decipher, type, and submit the correct address. During this time, the letter is held in a waiting buffer and the moment the correct address is available, it is diverted to the correct queue.

I asked him if that means USPS took a photo of every single piece of mail and he said yes, they had to, otherwise nobody would ever get any mail due to the sheer volume of mail they had to manage. I asked if the photos of envelopes were saved forever and he said, well, I'm pretty sure they are but I'm not allowed to publicly admit that.

I know it's a personal anecdote but that was seven years ago. I can't even imagine what they're doing now.

-----


(Using a throwaway account)

I worked on the OCR systems. Fun fact: at one time, the USPS was the world's biggest user of Linux in a production setting. Their OCR boxes ran on Linux (until they were replaced with SGI O2 boxes at a massive cost... but I digress).

Here's the path the mail takes: it is picked up by carriers from the mail boxes. Then dump trucks bring it to the P&DCs (Processing and Distribution Centers). There are about 1,000 P&DCs in the country, I think. There, mail is dumped onto a massive conveyor belt, where the first machine (AFCS, or Automated Facer Canceller System) makes sure that the mail is facing the right way and is upright. Various heuristics are used for this. Here the mail is stacked nicely into flat boxes, vertically.

Postal workers then feed these boxes to the MLOCR (Multi-Line OCR) machines. These machines scan pieces at the rate of 13/second. After being scanned, the letter goes on a long loop before coming back to the beginning: this loop, about 3 seconds (not sure about this), is the latency: the reading machine has this much time to decode the address. Also at this time, a fluorescent barcode is sprayed on the back of the piece, giving it a unique ID. If the OCR machine can read the address, the piece is sent to a bin indexed by the first 2 digits (or so) of the ZIP code (assuming it's not local).

If the OCR can't read the mail, it is sent to a separate pile. Then a program called RCR (Remote Computer Reader) kicks in: a person sitting in some remote area gets the image, enters enough information to decode the address, and the results are collected (tagged by the ID of the fluorescent barcode). After a few hours, this separate pile is run through the sorting machine again: this time, the fluorescent barcode ID is used to match the results from the human, and a real barcode is sprayed on the front and the piece is sorted as before.
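For anyone who thinks better in code, here is a rough sketch of the two-pass flow described above: tag every piece with an ID, sort what OCR can read, and re-run the reject pile once the remotely keyed results come back. Everything in it is made up for illustration (the `Piece` class, the function names, the toy `ocr` and `human` stand-ins); it is not how the real machines are programmed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class Piece:
    image: str                      # stand-in for the scanned envelope image
    id_tag: Optional[int] = None    # hypothetical model of the fluorescent ID on the back
    zip_code: Optional[str] = None  # resolved destination, once known

def first_pass(pieces: List[Piece], ocr: Callable[[str], Optional[str]]
               ) -> Tuple[Dict[str, List[Piece]], List[Piece]]:
    """Tag every piece; bin what OCR can read, set the rest aside."""
    bins: Dict[str, List[Piece]] = {}
    reject_pile: List[Piece] = []
    for id_tag, piece in enumerate(pieces, start=1):
        piece.id_tag = id_tag
        piece.zip_code = ocr(piece.image)
        if piece.zip_code is None:
            reject_pile.append(piece)
        else:
            bins.setdefault(piece.zip_code[:2], []).append(piece)
    return bins, reject_pile

def remote_keying(reject_pile: List[Piece], human: Callable[[str], str]) -> Dict[int, str]:
    """A remote person keys in each unreadable image; results are stored by ID tag."""
    return {piece.id_tag: human(piece.image) for piece in reject_pile}

def second_pass(reject_pile: List[Piece], keyed: Dict[int, str],
                bins: Dict[str, List[Piece]]) -> Dict[str, List[Piece]]:
    """Hours later: look up each piece by its ID tag and sort it as before."""
    for piece in reject_pile:
        piece.zip_code = keyed[piece.id_tag]
        bins.setdefault(piece.zip_code[:2], []).append(piece)
    return bins

# Toy stand-ins: "OCR" only reads clean digits, the "human" strips the smudges.
ocr = lambda image: image if image.isdigit() else None
human = lambda image: image.strip("~")

pieces = [Piece("94103"), Piece("~10001~"), Piece("94110")]
bins, rejects = first_pass(pieces, ocr)
keyed = remote_keying(rejects, human)
bins = second_pass(rejects, keyed, bins)
```

The point of the ID tag is that the physical piece and the human's answer can be reunited later without anyone having to re-read the envelope.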

Now, there are variations in the above, but this is the gist of it.

Fun facts: the USPS aims to handle a piece at most 7 times. And when a piece gets jammed in the machine and is torn, it gets put in a "body bag" with an apologetic note.

-----


Great info, thanks.

How reliable is the mail delivery? Do you know how much mail is lost? One percent, more, less? (I believe one kind of failure is called UAA - undeliverable as addressed.)

I'd love to learn more, but don't know where to start.

Some of us election integrity activists are deeply concerned with the transition to vote by mail (all postal ballots, no more poll sites). One practical complaint is our assumption that 1% of all mail is lost. In a big county like mine, that's 12,000 ballots.

My FOIA requests were rebuffed. Apparently the data gathering is done by third parties, so is considered proprietary. (A nice dodge, illustrating how privatization reduces government transparency and accountability.)

The best information I found was looking at court cases, where USPS' customers (eg bulk mailers) dispute the UAA, and don't want to pay extra.

-----


In general, I think mail delivery is very reliable. But given the volume, there will be outliers. Even if we assumed 99.9999% reliability (a hypothetical number), given that they sort 300MM pieces per day, 300 pieces per day will be affected.
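If you want to plug in other numbers, the arithmetic is just volume times loss rate. A quick sketch (the function name is made up, and both rates are hypothetical: the six-nines figure from above and the 1% assumption from the parent comment):

```python
def expected_daily_failures(pieces_per_day: int, loss_ppm: int) -> int:
    """Expected pieces affected per day at a given loss rate in parts per million."""
    return pieces_per_day * loss_ppm // 1_000_000

PIECES_PER_DAY = 300_000_000  # the 300MM figure quoted above

six_nines = expected_daily_failures(PIECES_PER_DAY, loss_ppm=1)         # 99.9999% reliable
one_percent = expected_daily_failures(PIECES_PER_DAY, loss_ppm=10_000)  # 1% lost

print(six_nines)    # 300
print(one_percent)  # 3000000
```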

If you have the money, you should try an experiment: mail a large number of ballot-like pieces from different mailboxes all over the county (say, 10,000 letters) and see how many reach the destination. Sure, it'll cost $5K, but you may have a better answer.
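A caveat on reading the results: with losses this rare, the estimate from 10,000 letters comes with a fairly wide error bar. Below is a minimal sketch of a 95% Wilson score interval for the loss rate; the count of 26 lost letters is a made-up example, not a real result.

```python
import math

def wilson_interval(lost: int, sent: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for the true loss rate."""
    p = lost / sent
    denom = 1 + z * z / sent
    centre = (p + z * z / (2 * sent)) / denom
    half = z * math.sqrt(p * (1 - p) / sent + z * z / (4 * sent * sent)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical outcome: 26 of 10,000 test letters never arrive.
low, high = wilson_interval(lost=26, sent=10_000)

# Even a perfect run only bounds the loss rate from above.
zero_low, zero_high = wilson_interval(lost=0, sent=10_000)
```

If all 10,000 letters arrive, the upper bound still works out to roughly 0.04%, so the experiment could rule out a 1% loss rate but could not tell 99.99% from 99.9999%.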

-----


The Royal Mail in the UK quotes ~99.74% reliability (for delivery, not on-time delivery), FWIW.

-----


I'm curious how much mail also gets lost due to being delivered to the wrong mailbox. I average at least one piece of mail per month in my mailbox that is not addressed to me.

-----


In 2006, during my senior year of college, I worked nights at the Postal Encoding Center in Beaumont, Texas. The starting pay was $15/hr and encoding is a 24/7 operation, making it a highly pursued job by college students.

It's true that every piece of mail goes through OCR. If that fails, it's sent off to one of the encoding centers as you described. There wasn't a 10 second limit to encode an address, but every encoder's performance was continually monitored and those who didn't perform quickly would not get as many hours per week. There were random audits done of a sample of 10 responses; over time your accuracy was expected to be 99%.
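To put the audit numbers in perspective, here is a back-of-the-envelope sketch of why the 99% target only makes sense "over time". It assumes each keyed address is right or wrong independently, which is my simplification and not something the center told us.

```python
def p_clean_audit(accuracy: float, sample_size: int = 10) -> float:
    """Chance that every response in one audit sample is correct,
    assuming independent errors (a simplifying assumption)."""
    return accuracy ** sample_size

at_target = p_clean_audit(0.99)     # an encoder exactly at the 99% target
below_target = p_clean_audit(0.95)  # a noticeably weaker encoder
```

An encoder at 99% passes a single 10-item audit cleanly about 90% of the time, but one at 95% still does so about 60% of the time, so it takes many audits to tell the two apart.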

In addition to encoding scans of envelopes, there were more sophisticated systems for encoding packages and magazines. Since there is no standard place to put an address on a box or a magazine, encoders were provided with images from all sides of an item, making the encoding process have two steps: finding the address, encoding the address.

As OCR improved, the number of Postal Encoding Centers was reduced. The last I heard was that the Beaumont center shut down and there are only 5 left nationwide.

-----


Better OCR (and, cynically, perhaps a higher tolerance of error to cut costs) is making the human handwriting readers obsolete.

http://www.nytimes.com/2013/05/04/us/where-mail-with-illegib...

-----


And much less hand addressed mail. It's much easier to OCR a printed label.

-----


I wouldn't be surprised if modern domain specific OCR can give you an error rate that beats that of a time-constrained human reader.

-----


While I agree with you at the general level, it sounds like these trained individuals are ridiculously good at what they do. Even seven years ago, to be able to take a crack at an envelope in just 10 seconds and type out the result is impressive.

-----


And their jobs are getting harder:

"It used to be that we'd get letters that were somewhat legible but the machines weren't good enough to read them. Now we get letters and packages with the most awful handwriting you can imagine."

-----


It'd be more interesting if they captured the address the package is sent to; the return address (if any); and the post mark.

As I understand it postal mail is traditionally given much stronger protections than other forms of communication, especially in the US.

I'd be very surprised if postal mail was being intercepted and contents were read without very strong warrants.

EDIT: It does sound like a fascinating system though. All that mechanical stuff; all those different sizes; all that paper dust! Postal mail is amazing.

-----


It really is. I used to do embedded software for controlling mailing machines and the technology for paper handling is amazing. You mentioned paper dust: the printer we used was based on a Canon print engine with wipers added to periodically clean off accumulated dust and ink goo. IIRC, we ran a wipe cycle every two minutes to keep it clean enough to not clog the jets. It was a real problem because the ink had to be a fast-drying composition to avoid smearing as the mailpieces stacked up.

I moved on before I got to work on the sorting machines: the intricacy of that stuff is truly mindblowing for a mechanisms geek like me ;-)

-----


I'm not surprised, and I think this is general knowledge.

When I move, I tell the post office my new address. They are able to reroute my mail to me (while the new tenant gets their mail successfully). The post office reads the outsides of envelopes, and understands that mail to the same address can be treated differently.

So the fact that they send the FBI digital records of all mail shouldn't be a surprise. But it still is, somehow.

-----


> I can't even imagine what they're doing now.

Not delivering as much mail? http://en.wikipedia.org/wiki/United_States_Postal_Service#Re...

-----


I wonder if they fed the human-matched addresses back into the OCR system to train it to be better at reading handwriting.

-----


I'm sure with the right light frequency, x-ray or other, they can partially read mail through the envelope.

Not saying they're doing it, just saying it's possible.

-----


I know that these and similar cameras are in use at the Postal Service.

http://www.fairchildimaging.com/files/2kand4klvcameramanualr...

I know from experience that you can read at least the outside page of a tri-folded letter through most envelopes on the address side. According to one of the Fairchild applications engineers, it is a problem with mail sorting systems, because they have to reject that noise to read the address properly. The Osprey camera has excellent sensitivity in both UV and Visible wavelengths.

-----


Well, there are ways to pull the paper out without disturbing the envelope if they really wanted to go that far.

As it stands what you're suggesting is at least already illegal, FWIW.

-----


> As it stands what you're suggesting is at least already illegal, FWIW.

Unless a secret presidential order established a secret law (declared legal by a secret court) that made it legal (but illegal to disclose to the public), that is.

-----


The "secret order" you refer to is in accordance with the public law, otherwise the "secret court" you refer to wouldn't have said that it's legal.

Invading the mail, on the other hand, is quite explicitly illegal.

-----


An undisclosed order that is in accordance with an undisclosed nonintuitive interpretation of the public law oughtn't be legal, and regardless of conformance to the USC, seems to violate the constitution that provides the foundation for the government's existence.

-----


Edit: I wish I'd read rayiner's sibling reply elsewhere. It explains the point I'm trying to make in a fashion that's about 100x better. -----

To the extent that revealing an E.O. doesn't endanger national security or other legitimate government purposes I agree completely that it should be public.

However I don't agree that it's safe in general to rely on a given Administration's "interpretation of the law". As Snowden has pointed out, the Administration can change... you should assume that what is permissible under the law and Constitution is actually being done, if that actually worries you.

So if the law says that the Government can intercept foreign communications pursuant to a trap-and-trace it's probably a good assumption to make that the Government is actually, at some point, trapping aforementioned communications.

I mean, if this was working just like a normal law enforcement scheme then you'd already have to deal with the possibility that the government is tapping a communications channel pursuant to a regular Article III warrant to investigate communications of a terror cell for months at a time. Presumably this would theoretically still accumulate enough data (and metadata) to theoretically wreck your theoretical world should a theoretical despotism come to pass.

What an E.O. should do is to define where an Administration will focus its limited resources in enforcing the law. Perhaps they will decline to fully defend laws that are anti-homosexual in nature. Perhaps they will avoid aggressively going after marijuana usage (would be nice!). But even in that situation, if you cross state lines to buy weed you're still technically breaking the law and should be prepared for consequences of that; the E.O. could change tomorrow, after all.

And besides all that, what if the guy making an interpretation is at a much lower level? An individual cop might make a snap decision; do you expect them all to mail you a Policy & Vision Statement each month?

Even if they did, it would be hopeless to try to push the edge of 1000 different "lawful ways to enforce the law". Assume anything the law permits might be done... and even then, it's hard enough to fully comply with all the laws, even the ones that clearly fall within Constitutional guidelines.

-----


Ugh. When the news media refers to "secret courts" and "secret laws" they're taking some liberties with the definitions of "court" and "law".

The first thing to understand is that one of the basic concepts in our separation of powers system is that the executive has discretion in how it enforces the law. Take something basic like the Sherman Antitrust Act (15 U.S.C. 1). The most important piece is just one paragraph: "Every contract, combination in the form of trust or otherwise, or conspiracy, in restraint of trade or commerce among the several States, or with foreign nations, is declared to be illegal..."

The courts establish the precise contours of what is a "restraint of trade" or what is a "combination" under the law. This creates a set of boundaries for the executive. The executive is empowered to enforce the law, but has discretion within those boundaries. If it thinks some class of things is or is not a violation of the law, it is entitled to prosecute cases accordingly until the courts decide the point one way or another or Congress clarifies the law.

Presidential orders cannot create law, but they can guide the rest of the executive branch's enforcement of the law, within those boundaries of discretion. The President might issue a directive telling the DOJ: "we don't think that such and such agreement is a 'combination' under the antitrust laws, so don't prosecute such cases." Usually these interpretations are public (and are published in the form of regulations). Sometimes these interpretations are secret, in which case the media calls it a "secret law." But the key thing is that the directive only guides executive action that was lawful anyway.

Now, the FISA court has been called a "secret court" but it again serves to guide executive discretion, and is not a court of general jurisdiction. Its opinions are binding on no court other than itself, and its jurisdiction is extremely limited. The basic principle behind FISA is that, Constitutionally, the executive can do a lot of things as part of its foreign intelligence function that we don't necessarily want it to do. In particular, it can conduct surveillance of foreign agents entirely without warrants because foreign agents don't have 4th amendment rights. The purpose of the FISA court is to constrain the executive's discretion in this regard, by requiring it to get a FISA warrant for all foreign surveillance, even though such surveillance would not require a warrant under the 4th amendment.

To circle back to mpyne's point: neither "secret courts" nor "secret law" can override public courts and public law. Rather, they are internal to the executive. They guide the executive's discretionary powers within the boundaries established by public law. If they hadn't written it down, they'd still be entitled to do it, and nobody would complain about any "secret laws" or "presidential directives." The things mpyne mentioned are illegal according to public law, and thus not within the executive's power to do regardless of any secret directives or secret court opinions.

-----


This was a Linux system; I remember reading the articles talking about it (but it's been a while)...

-----


That one was for machine-printed addresses. I had the luck to hire the lead on that project in 1997 or 1998. The interview consisted of "teach me how that works."

Edit: Ah, after we worked together it looks as if he went back to pick up the hand-written addresses as well: http://www.linuxjournal.com/article/2985

-----


Oh man, I remember when those dual Pentium Pro 200's came out. So much awesome...

-----


"7 years ago they were saving 100% of the information... I can't even imagine how much higher that percentage must be by now"

-----


I meant, they were storing then. Who knows what they're doing now. Analyzing, creating graphs/networks, sharing historical records?

-----


Except I've never, ever heard of this used to solve a crime.

It's definitely never been used in court, or we'd have heard about it, but they probably wouldn't risk having the constitutionality of that tested.

So they are just collecting it for the sake of spying on everyone. Lovely.

-----


Just because you haven't heard of it doesn't mean it doesn't happen. Here's a case from 1970: http://scholar.google.com/scholar_case?case=1776184466461190...

You'll notice that early on, it mentions Ex Parte Jackson, an 1878 case which established that the contents of mail are protected by the Fourth Amendment, but that the outward form is not. Mail 'metadata' has always been fair game, just as it would be fair for a police officer to observe your comings and goings on the street without any need for a warrant.

Then there's the Postal Inspection Service, which is the law enforcement agency that specializes in mail fraud (perhaps you've heard of that?) and which predates the founding of the USA.

I don't mean to be rude, but the fact that you've 'never, ever heard of' something doesn't mean anything special. You don't strike me as terribly well-informed.

-----


> Except I've never, ever heard of this used to solve a crime.

Reading the fine submitted article would solve that problem.

-----


...by examining information from the front and back images of 60 pieces of mail scanned immediately before and after the tainted letters sent to Mr. Obama and Mr. Bloomberg showing return addresses near her [Ms. Richardson's] home.

Those are some powerful tools for investigators.

-----



