Free-form text input may be preferred by users, but possibly a nightmare for the...

geocar · on Nov 15, 2015

This isn't always about data integrity.

I'm an American, but I live in the UK: My home address is in the UK, but I live in a flat and my neighbours are mail stealing cunts so I use a scan+email service for my mailing address which is based in the US.

Similarly, my landline is a UK number, but my mobile is a US number, and when you refuse to let me type a `+` sign I spend a lot of energy guessing whether to give you the number in NANP or international format.

The thing is, capturing information is just that: capture. By doing the traditional programmer thing of validating the field on input, you're pissing me off as a potential consumer. I guarantee that I know more about where I live than you do, so I'm more likely to abort my transaction if you tell me that my address is invalid. That means your non-profit arts association simply doesn't get my donation.

Thing is, I actually accept that my situation is the exception and not the rule, so I think this is really about programmers being unable to deal with exceptions: they treat them as nothing more than dynamic escapes or nonlocal gotos, as if the only choice were between more complexity and more rigidity.

This is nonsense.

Simply capture whatever the user types. That means all input fields are plain text or blobs or whatever. You can try to validate the input against your business model when you have an actual business need, like mailings or shipping, and attempt to extract and validate the specific fields when applying that mapping. If you end up with an array of failures, you can let the user review them at that point.
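A minimal sketch of "capture now, map later", assuming a hypothetical `MailingAddress` record and a `to_mailing` helper (neither is from the comment). Raw input is stored as-is; only when a mailing is needed do we try to map it, returning a list of failures for the user to review instead of rejecting the form.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical business-model record: only built when a mailing is needed.
@dataclass
class MailingAddress:
    name: str
    lines: list[str]


def to_mailing(raw: dict[str, str]) -> tuple[MailingAddress | None, list[str]]:
    """Map raw captured text onto the mailing model.

    Nothing was rejected when the user typed it; problems are only
    collected here, so they can be shown to the user for review.
    """
    failures: list[str] = []
    name = raw.get("name", "").strip()
    if not name:
        failures.append("name: empty")
    # Keep whatever non-blank lines the user typed, in their original order.
    lines = [ln.strip() for ln in raw.get("address", "").splitlines() if ln.strip()]
    if not lines:
        failures.append("address: empty")
    if failures:
        return None, failures
    return MailingAddress(name, lines), []
```

The mapping is deliberately lenient: it doesn't try to decide whether an address is "valid", only whether there is enough there to print a label.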

I do this with two tables: An input table, and a data table. The input table has forward pointers to the data that is extracted, and the data table has backwards pointers to the inputs. The data tables might be used for business logic like mailings, order processing, login management, or shipping things.
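One way the two-table layout might look, sketched with SQLite from Python's standard library. The table and column names (`inputs`, `data`, `data_id`, `input_id`) and the `capture`/`extract` helpers are hypothetical, and for brevity each input points at a single extracted row; a real schema might need a link table if one input yields several values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Everything the user typed, verbatim, one row per submitted field.
CREATE TABLE inputs (
    id      INTEGER PRIMARY KEY,
    field   TEXT NOT NULL,
    raw     TEXT NOT NULL,
    data_id INTEGER REFERENCES data(id)           -- forward pointer to extracted data
);
-- Structured values extracted for some business need (shipping, mailings...).
CREATE TABLE data (
    id       INTEGER PRIMARY KEY,
    kind     TEXT NOT NULL,
    value    TEXT NOT NULL,
    input_id INTEGER NOT NULL REFERENCES inputs(id)  -- backward pointer to the source input
);
""")


def capture(conn, field, raw):
    """Store raw user input without any validation; return its row id."""
    cur = conn.execute("INSERT INTO inputs (field, raw) VALUES (?, ?)", (field, raw))
    return cur.lastrowid


def extract(conn, input_id, kind, value):
    """Record an extracted value and link it in both directions."""
    cur = conn.execute(
        "INSERT INTO data (kind, value, input_id) VALUES (?, ?, ?)",
        (kind, value, input_id),
    )
    conn.execute("UPDATE inputs SET data_id = ? WHERE id = ?", (cur.lastrowid, input_id))
    return cur.lastrowid
```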

Shipping is a particularly good example: I want to maintain four shipping providers since they offer different rates. This lets me offer "free shipping" by simply selecting the cheapest provider and pushing that cost into the product price. To do this I need to know the shipping ZIP code for the US, or the shipping country for international orders. That's it. I don't need anything else, and three patterns (/(\d{5})(?:-\d{4})?/, /\b([A-Z]{2}$)/m, and maybe a list of common countries) should be enough to extract it from most orders. Anything else I can punt to my fulfilment center, who can either call the potential customer, delete the order as spam, or extract the details manually.
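A rough Python rendering of that extraction, under a few assumptions: the two-letter pattern is read as a country code alone at the end of a line, `COMMON_COUNTRIES` is a hypothetical stub, and the ZIP pattern treats the `-1234` ZIP+4 suffix as optional so plain five-digit ZIPs also match. Returning `None` means "punt to a human".

```python
from __future__ import annotations

import re

ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")
# Two capital letters alone at the end of a line, e.g. "UK" or "CA".
CODE_RE = re.compile(r"\b([A-Z]{2})$", re.MULTILINE)
# Hypothetical stub; a real list would be longer.
COMMON_COUNTRIES = {"united kingdom": "GB", "canada": "CA", "germany": "DE", "france": "FR"}


def shipping_zone(address: str) -> tuple[str, str] | None:
    """Return ("zip", code) or ("country", code), or None for manual handling."""
    if m := ZIP_RE.search(address):
        return ("zip", m.group(1))
    if m := CODE_RE.search(address):
        return ("country", m.group(1))
    lowered = address.lower()
    for name, code in COMMON_COUNTRIES.items():
        if name in lowered:
            return ("country", code)
    return None  # hand it to the fulfilment center
```

The order of checks is a judgement call: trying the ZIP first means a five-digit non-US postal code would be misread as a ZIP, which is exactly the kind of case the fulfilment center catches.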

Or maybe I just ship everything via UPS: I send it off to label making, and the 0.003% that fail I have to hand-check anyway (after all, are we verifying the city names as well?).

What am I doing with these URLs? Am I visiting them? Am I verifying someone has placed some widget on there? Or am I putting a link next to their name on a bulletin board? Verification means different things depending on the use case.

Is that crucial email address field missing? Or is there an extra space in it? What exactly am I doing emailing them? What if the email bounces? What if it gets marked as spam? Verifying an email address has less to do with the characters in it than with the use case: if this is for account recovery, I want to know that I can email you and that you will work with my system to do that.
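For the account-recovery case, "verified" can simply mean "mail sent there came back to us". A minimal sketch of that round trip using a one-time token, which is a common pattern rather than anything the comment prescribes; `pending`, `start_verification` and `confirm` are hypothetical names, and the actual sending of the email is stubbed out.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory store: address -> SHA-256 hex digest of the outstanding token.
pending = {}


def start_verification(address):
    """Create a one-time token; a real system would email a link containing it."""
    token = secrets.token_urlsafe(16)
    pending[address] = hashlib.sha256(token.encode()).hexdigest()
    return token  # stand-in for actually sending the email


def confirm(address, token):
    """The address counts as verified only if the user echoes back our token."""
    expected = pending.get(address)
    if expected is None:
        return False
    ok = hmac.compare_digest(expected, hashlib.sha256(token.encode()).hexdigest())
    if ok:
        del pending[address]  # tokens are single-use
    return ok
```

Note that nothing here inspects the characters of the address: a stray space or odd format just means the token never arrives.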

This approach also means I don't need to "edit" things, because edits are simply new inputs. Logging is free. Users are happy.
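A small sketch of "edits are just new inputs", assuming a hypothetical append-only `InputLog`: the current value of a field is its latest entry, and the history (the free log) is everything before it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class InputLog:
    """Append-only log of raw inputs; an 'edit' is just another append."""
    entries: list[tuple[datetime, str, str]] = field(default_factory=list)

    def record(self, name: str, raw: str, at: datetime | None = None) -> None:
        self.entries.append((at or datetime.now(timezone.utc), name, raw))

    def current(self, name: str) -> str | None:
        """Latest value typed for a field, or None if it was never entered."""
        for _, n, raw in reversed(self.entries):
            if n == name:
                return raw
        return None

    def history(self, name: str) -> list[str]:
        """Every value ever typed for a field, oldest first."""
        return [raw for _, n, raw in self.entries if n == name]
```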

The point is it's not a balance; avoiding the illusion is easier than you think and the hardest parts of the problem of data validity are problems you have to solve anyway.

DrScump · on Nov 15, 2015

  My home address is in the UK, but I live in a flat and my neighbours are mail stealing...

Are you sure it's your neighbors? We get mail theft here (in Silicon Valley) all the time; thieves harvesting mail for valuables, credit cards, tax data, etc. for ID theft.

In my city, police will not respond even if there is theft in progress and they have idle units at Starbucks next door, claiming that there is no state law against mail theft -- it's up to the USPS to deal with it.

I have a PO Box for everything but junk mail. The much-maligned USPS has a really nice feature nowadays: you can sign a (free) agreement allowing them to accept packages on your behalf from other carriers... so I have FedEx, UPS, etc. all going to my PO Box, using the street-address format for the Post Office proper. It's much less expensive than private services like the UPS Store and such.

Your neighbors are probably more focused on stealing your newspaper. Or spouse.

geocar · on Nov 15, 2015

The USPS operate their own police force[1] because federal laws are handled by the federal government.

[1]: https://postalinspectors.uspis.gov/

jrapdx3 · on Nov 15, 2015

To give an example, we have members living in Canada, Middle Eastern countries, etc. Their phone numbers, addresses, and zip/postal codes are unlike those typical here in the US, so these are plain-text fields in the data entry form.

If there's an entry mistake and database fields contain the wrong info, it's not a tragedy, we'll find out sooner or later and it can be corrected.

Of course, date entries by necessity can't be as "free-form". For one thing, browsers can be picky about their preferred format for type="date" inputs, accepting a "/" separator but not "-". Requiring the year to be 4 digits between 1980 and the current year doesn't seem too onerous.
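A sketch of that kind of constraint, with the error message telling the user exactly what format is expected. The two accepted layouts (YYYY-MM-DD and MM/DD/YYYY) and the function name are assumptions for illustration; Python's `%Y` directive already insists on four digits.

```python
from __future__ import annotations

from datetime import date, datetime


def parse_entry_date(text: str, today: date | None = None) -> date:
    """Parse a typed date and require 1980 <= year <= current year.

    Raises ValueError with a message the form can show to the user.
    """
    today = today or date.today()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):  # assumed accepted layouts
        try:
            d = datetime.strptime(text.strip(), fmt).date()
            break
        except ValueError:
            continue
    else:
        raise ValueError("Enter the date as YYYY-MM-DD or MM/DD/YYYY")
    if not 1980 <= d.year <= today.year:
        raise ValueError(f"Year must be between 1980 and {today.year}")
    return d
```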

These might seem obvious, but complaints about such constraints still come up. It just needs to be clear to users that certain data needs to be entered in a specific format for good reasons. I agree with your comments about street addresses, phone numbers, and personal names, which need to be unstructured input (but handled securely).

I thought by now this stuff would be appreciated as pretty basic to the developer's craft, but apparently it still isn't widely enough taught or known.

jmnicolas · on Nov 15, 2015

>when you refuse to let me type in a `+` sign

I'm not sure I understood your problem, but the '+' sign can be replaced by '00' (two zeroes) when you want to make international calls.

geocar · on Nov 15, 2015

Not from the US. In the US you dial 011 and then the country code, so when my family calls me they dial 01144...
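This is why storing the number with a literal `+` is useful: the `+` stands for "whatever international prefix the caller uses", and only the caller's side needs to know whether that is 011 or 00. A minimal sketch; `EXIT_CODES` covers only the two countries discussed here and `dial_string` is a hypothetical helper.

```python
# International access codes by caller's country (ISO 3166 alpha-2 keys).
# Only the two countries in this thread are listed; real code needs a full table.
EXIT_CODES = {"US": "011", "GB": "00"}


def dial_string(stored: str, calling_from: str) -> str:
    """Turn a stored '+<country code><number>' string into what a caller
    in `calling_from` dials for an international call."""
    if not stored.startswith("+"):
        raise ValueError("expected a number stored with a leading '+'")
    # Drop spaces and punctuation the user may have typed.
    digits = "".join(ch for ch in stored[1:] if ch.isdigit())
    return EXIT_CODES[calling_from] + digits
```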

angelbob · on Nov 15, 2015

When there's a compromise between user and developer happiness, remember the developer is usually the one picking.

So it's understandable that users often do an end-run around what developers choose.

fauigerzigerk · on Nov 15, 2015

I think it is rarely a tradeoff of happiness. Most of the time it is about finding out what the right thing is and then doing it and explaining it well.

I think the reason why it often isn't done like that is twofold. First, there are many incompetent, sloppy developers who just don't care. Second, users are unwilling to pay for the time it takes a competent, unsloppy developer to do the work properly.

For instance, it is not impossible to design an address entry form that accepts unusual but correct addresses and at the same time captures as much structure as possible. Not impossible, but surprisingly difficult if you think about it.

You would think that it's economic rationality that leads to sloppy design and hence to the need for free-form text fields. Maybe it just doesn't make sense to design everything with great care? But I don't think that is true. 90% of the times I've needed to phone a support line, it was about issues that a slightly better website could easily have handled. Bad software costs these companies hugely in support and user satisfaction.

It's plain to see that the good companies do get their data entry fields right and bad ones don't. I bet you could even predict share prices of companies based on the quality of their most trivial data entry forms.

TeMPOraL · on Nov 15, 2015

> remember the developer is usually the one picking.

I wish. Point me to such a company. Quite often it's the management that's picking, which ensures that neither users nor developers are happy. A developer would at least pick something that works and makes some sense.

jrapdx3 · on Nov 15, 2015

Of course it can happen that the developer doesn't adequately address the users' concerns: perhaps it's impossible to do, or, more likely, the developer wasn't really listening.

So yes, the end-runs could be understandable. If the developer understands the users' issues and takes the trouble to mitigate the situation, end-runs are a lot less likely to be attempted.

In any case, the smart developer knows there's a message intrinsic to end-runs: something isn't working, and it's the developer's part in the teamwork to see what can be done about it.