
Clean your data: What programmers can do to remove 'dirt' from databases - abennett
http://www.itworld.com/development/77023/clean-your-data
======
edw519
I'm a little confused. First, OP says

 _In most instances, there is no technical solution_

then he says

 _there's a lot you can do at least to diagnose the state of your data_

If you're going to "diagnose the state of your data", why not clean it up?
There's so much that you can do at entry time, at retrieval time, and at any
time between:

      - wash non-printable characters
      - wash illogical (depending upon context) characters
      - trim leading and trailing spaces
      - verify check digits
      - verify lookups
      - verify against standards (USPS, etc.)
      - add Soundex, Metaphone, Levenshtein, etc.
      - build a context suitable hash
      - add Soundex, Metaphone, Levenshtein, etc. to the hash
      - wash standard keywords
    
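Several items on that list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the commenter's code: it assumes ASCII personal names and US-style postal codes, uses Luhn purely as one example of a check-digit scheme, and `match_key` (its name, its three fields, and the SHA-1 choice) is a hypothetical take on a "context suitable hash" with Soundex folded in. Lookups and USPS standardization need external reference data and are not shown.

```python
import hashlib
import re
import unicodedata


def wash(text: str) -> str:
    """Drop non-printable characters (Unicode category C*: control, format,
    private-use, unassigned), collapse whitespace runs, trim both ends."""
    kept = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", kept).strip()


def luhn_valid(number: str) -> bool:
    """Verify a Luhn (mod 10) check digit, ignoring spaces and hyphens."""
    digits = number.replace(" ", "").replace("-", "")
    if len(digits) < 2 or not all("0" <= c <= "9" for c in digits):
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# American Soundex letter groups; vowels, Y, H and W get no digit.
_SOUNDEX = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX.get(c, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        if c not in "HW":  # H and W do not separate equal codes; vowels do
            prev = digit
    return code.ljust(4, "0")


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def match_key(last_name: str, first_name: str, postal_code: str) -> str:
    """Hypothetical match key for person records: Soundex of the washed
    last name + first initial + 5-char postal prefix, hashed with SHA-1."""
    parts = [
        soundex(wash(last_name)),
        wash(first_name).upper()[:1],
        wash(postal_code)[:5],
    ]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
```

With this sketch, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so `match_key("Robert ", "john", "12345-6789")` and `match_key("Rupert", "John", "12345")` collide on purpose and can be reviewed as likely duplicates, while `levenshtein` (for example, 3 for "kitten" vs "sitting") can rank candidates within a bucket. Hashing a phonetic code rather than the raw name trades precision for recall; which fields go into the key depends on the data's context.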

Has OP actually spent much time being responsible for a database? If he had,
he probably would have spent half of that time keeping it usable.

