
The “Chad” bug - mrb
https://plus.google.com/+MarcBevand/posts/fBfCsaXReH5
======
grardb
This is remarkable. I always find it interesting when bugs like this occur.

It reminds me of a hackathon I attended where a food ordering startup (I
forget the name, but they were chosen to feed us dinner that night) had a
similar bug, which baffled me beyond belief. Without going into crazy detail
about my password, it typically follows a certain pattern but is never the
same across websites. For some reason, the website kept saying my password was
invalid. It met all the password requirements that the website asked for
(length, capital letter, etc.).

I forget the exact details, but it ended up being the exact location of a
capital letter, the location of a number, or some combination of both. I could
never figure out how a bug like that could even be coded up. My best guess is
that it was some poorly-formed regex.

> Some people, when confronted with a problem, think "I know, I'll use regular
> expressions." Now they have two problems.

~~~
0xcde4c3db
Besides the usual regex aches and pains, the grammar for email addresses is
_far_ more complex than most people realize. According to a highly-voted Stack
Overflow answer [1], the current RFC-specified grammar for addresses can't
even be matched with regex alone. Combining the edge cases of the grammar with
(say) Unicode normalization sounds like a recipe for hours of fun.

[1] https://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address
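
To illustrate the Unicode normalization point, here is a minimal Python sketch. The addresses are made up, and `normalized` is a hypothetical helper, not something from the linked answer. It shows two local parts that render identically but compare unequal until both are put into the same normalization form.

```python
import unicodedata

# Two visually identical addresses: precomposed "é" versus "e" followed by
# a combining acute accent (U+0301). Both addresses are invented examples.
composed = "ren\u00e9@example.com"
decomposed = "rene\u0301@example.com"


def normalized(address: str) -> str:
    """Return the NFC form so canonically equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", address)


same_raw = composed == decomposed
same_normalized = normalized(composed) == normalized(decomposed)
print(same_raw, same_normalized)
```

Whether a given system should normalize at all is a separate design question; the sketch only shows why a byte-for-byte comparison can miss a match.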

~~~
kccqzy
I find that quite unbelievable. When I had a similar problem last year, the
first resource I found was a W3C specification[1] about <input type=email>.
The specification clearly states that email addresses should match:

    
    
        /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/
    

Since this is an official W3C doc, I see no reason why people shouldn't use
this.

Edit: There is also a version by WHATWG[2] here:

    
    
        /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
    

It apparently does a more thorough validation than the W3C one in the domain
part, but the difference between the two is not apparent in practice.

[1]: http://www.w3.org/TR/html-markup/input.email.html
[2]: https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)
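
As a rough check of where the two patterns differ, here is a Python sketch that ports both expressions to the `re` module (with a plain ASCII apostrophe in the character class). The sample addresses are invented. Reading the patterns, the WHATWG one only disagrees on domain labels that start or end with a hyphen, or that exceed 63 characters.

```python
import re

# Python ports of the two patterns quoted above.
W3C = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
)
WHATWG = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)

samples = [
    "chad@example.com",         # ordinary address
    "chad@-example.com",        # label starts with a hyphen
    "chad@example-.com",        # label ends with a hyphen
    "chad@" + "a" * 64 + ".com",  # label longer than 63 characters
]

for address in samples:
    w3c_ok = W3C.match(address) is not None
    whatwg_ok = WHATWG.match(address) is not None
    print(f"{address[:30]!r:34} w3c={w3c_ok} whatwg={whatwg_ok}")
```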

~~~
zurn
The email input regexp is only about matching the atext@domain subpart of the
syntax. The full email address parsers are about parsing the full allowed
syntax you can put in an email to: field that includes address lists and
display names.

Also W3C produced HTML specs are hardly gospel about email-related things (but
I don't know whether there's anything wrong in this case).

~~~
kccqzy
But the question is, if a signup form is asking for your email address, will
you type in your name, a pair of angle brackets, and then your email address
in between? That's simply not what the web developer wants here.

~~~
lmm
Will I type that in? No. Will I copy-paste my email from my address book
program? Quite possibly.
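
For the copy-paste case, a form handler could accept the display-name syntax and keep only the address. A minimal Python sketch using the standard library's `email.utils.parseaddr` follows; the pasted string is an invented example, and this is one possible approach rather than what any particular site does.

```python
from email.utils import parseaddr

# What an address book might put on the clipboard (invented example).
pasted = "Chad Example <chad@example.com>"

display_name, address = parseaddr(pasted)
print(display_name)  # the part before the angle brackets
print(address)       # the bare address a signup form actually wants

# A bare address passes through with an empty display name.
bare_name, bare_address = parseaddr("chad@example.com")
```

The extracted address could then go through whatever validation the form already applies.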

------
raldi
I bet there's a hashtable involved somewhere, and Chad's address just happens
to hash to, like, 0x00000, and it turns out when that happens, there's a bug.

As a workaround, I bet you can use CHAD@... or chad+blah@...

~~~
mrb
A hashtable bug seems possible.

(Hangouts Dialer still does not see him if saved as CHAD@. It sees him when
saved as chad+bla@ but it's annoying because then his email is wrong in my
contact list as his email provider does not support + aliases.)

~~~
AnkhMorporkian
You could try chad()@. () is an empty e-mail address comment which, by the
RFC, is supposed to be ignored during delivery. Not every mail server supports
it, but it's worth a try until they fix that bug.

~~~
marcoperaza
Email address comment? What the hell were they thinking?

~~~
csours
Think about pre-Outlook days: you may want some notes to remember who you are
emailing.

~~~
Dylan16807
I'm thinking about it. Still doesn't make sense to put it _in_ the address.

------
edent
It's a pity there's no way to report bugs like this to Google.

The only way I've found of getting anything resolved is to forward issues to a
friend inside the company, or hope that you can write a blog post which gets
enough attention.

I get that filtering and testing millions of random bug reports from all
corners of the Internet is hard - but it's a problem which Google desperately
needs to solve if it wants to retain the trust of its users.

~~~
asuffield
(Tedious disclaimer: not speaking for anybody else, my opinion only, etc. I'm
an SRE at Google.)

> It's a pity there's no way to report bugs like this to Google.

This is a popular myth.

General instructions are here:
[https://www.google.com/tools/feedback/intl/en/](https://www.google.com/tools/feedback/intl/en/)

In this particular case, it's an Android app, so what you do is tap on the
hamburger menu, hit "help and feedback", then "send feedback".

~~~
edent
Having reported many bugs this way - I don't think I've ever had a response,
let alone seen anything fixed.

Looking at Android (OS) issues -
[https://code.google.com/p/android/issues/list?can=2&q=&sort=...](https://code.google.com/p/android/issues/list?can=2&q=&sort=-stars+-opened&colspec=ID%20Status%20Priority%20Owner%20Summary%20Stars%20Reporter%20Opened)
- it's clear that the majority of bugs are ignored. Even when they're well
described and affect multiple users / devices.

~~~
asuffield
We do not routinely release information about what happens to bugs, so you
should not expect a response. I've certainly seen bugs reported on these
channels be fixed. I cannot release any statistics.

> Looking at Android (OS) issues -
> [https://code.google.com/p/android/issues/list?can=2&q=&sort=...](https://code.google.com/p/android/issues/list?can=2&q=&sort=..).
> - it's clear that the majority of bugs are ignored.

A quick glance at that page appears to disprove this claim. If you flip it to
display "all issues", there are 206689 bugs in that tracker at the moment, of
which 44583 are open. That tells you that 79% of all bugs filed have been
closed - so, at least 79% of bugs were not ignored.

Note that this tracker is for the operating system only, and does not include
any of the Google apps that the feedback system covers.
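
The closed-issue share can be recomputed from the two counts quoted in this comment. A quick Python check (variable names are mine) gives roughly 78.4%, slightly under the 79% stated:

```python
# Counts quoted in the comment above.
total_issues = 206689
open_issues = 44583

closed_issues = total_issues - open_issues
closed_fraction = closed_issues / total_issues

print(f"{closed_issues} closed, {closed_fraction:.1%} of all issues")
```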

~~~
edent
> We do not routinely release information about what happens to bugs, so you
> should not expect a response.

Which goes back to my original point about customer trust. If you know I've
reported a bug, why would you deliberately not tell me that it has been fixed?

> so, at least 79% of bugs were not ignored

Well, take a look at some of the ones which have been closed -
[https://www.reddit.com/r/androiddev/comments/2on1fe/google_c...](https://www.reddit.com/r/androiddev/comments/2on1fe/google_closed_11889_android_bugs_last_48_hours/)
and
[https://news.ycombinator.com/item?id=8803118](https://news.ycombinator.com/item?id=8803118)

I know a good many people who work in Google - they're all smart and
dedicated. But there's something about the corporate culture which imposes a
"don't listen to external feedback" mindset.

It's your OS and they're your apps - you can do what you like with them. But
don't be surprised when users stop trusting you to listen to their concerns.

~~~
Eyas
> If you know I've reported a bug, why would you deliberately not tell me that
> it has been fixed?

Sounds like a typical fallacy of end-users looking at software. Assuming that
the developer deliberately is denying you a feature, rather than having simply
not spent the engineering time to make it possible.

In this case, for instance, it could be that the pipeline to get from external
feedback channels to Google's internal bug-trackers is very one-way and it's hard
to go back. Or, there's a disconnect between when the fix ships and when the
ticket is closed, and keeping track of the entire chain of data requires some
work. Or, there are no easy distinguishing features on bugs that came externally
to make them easily identifiable once fixed. Or, there's no process yet for an
automated response that says the bug is fixed (should it provide more detail).
Etc.

~~~
edent
Hence, my first comment.

> I get that filtering and testing millions of random bug reports from all
> corners of the Internet is hard - but it's a problem which Google
> desperately needs to solve if it wants to retain the trust of its users.

------
packetized
I wonder if this is related to i18n or country lookup. Chad is the only semi-
common English-language name that's also a country name, that I can think of.

~~~
benplumley
Jordan, Georgia. I feel like if this were the cause then the bug would be a
lot more common.

My guess is the dialler hashes some parts of the contact to get a UUID, but
for this contact it happens to be outside the range the dialler can look at -
perhaps off-by-one, where the dialler looks for UUIDs of 1 and above and this
happens to hash to 0.
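
To make the guess concrete, here is a purely hypothetical Python sketch of how an ID of 0 could go missing. Nothing here reflects the Dialer's actual code; `contacts`, `buggy_lookup`, and `fixed_lookup` are invented for illustration.

```python
from typing import Optional

# Hypothetical contact index keyed by a small integer ID derived from a hash.
contacts = {0: "chad@example.com", 1: "jordan@example.com"}


def buggy_lookup(contact_id: int) -> Optional[str]:
    # Bug: 0 is falsy, so a perfectly valid ID of 0 is treated as "no ID".
    if not contact_id:
        return None
    return contacts.get(contact_id)


def fixed_lookup(contact_id: Optional[int]) -> Optional[str]:
    # Only a missing ID (None) means "not found"; 0 is a valid key.
    if contact_id is None:
        return None
    return contacts.get(contact_id)


print(buggy_lookup(0))  # the contact is invisible
print(fixed_lookup(0))  # the contact is found
```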

------
ryporter
Somewhat similarly, I encountered a possible bug in Google Docs many years
back. I was reorganizing my documents, and I temporarily changed one of the
names to "delete". Poof -- I could no longer find it anywhere (or even search
for words that I knew were in it). I forget how I got back to it (maybe via my
browser history), but I changed the name a bit, and then the document was
"found".

This could have simply been a race condition unrelated to the filename, but it's
much more amusing to speculate that it was due to a hack introduced during
development. I now regret not trying to reproduce it, but I was pretty
frustrated after I found my document again. I did contact support, but didn't
hear anything back.

------
nbakshi
This reminds me of my favorite name while testing: "McNulla". I have seen quite
a few web forms with a regex that removes any NULL string, so they would not
take this name because it has a "Null" string in it.
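
A minimal Python sketch of the failure mode, assuming the form strips the substring case-insensitively. Both function names are hypothetical; the second one shows one possible alternative that only treats an exact placeholder value as missing.

```python
import re
from typing import Optional


def naive_clean(name: str) -> str:
    # Overzealous: removes "null" anywhere in the value, ignoring case.
    return re.sub(r"null", "", name, flags=re.IGNORECASE)


def careful_clean(name: str) -> Optional[str]:
    # Treat only a value that is exactly the placeholder "NULL" as missing.
    stripped = name.strip()
    return None if stripped == "NULL" else stripped


print(naive_clean("McNulla"))    # the name is mangled
print(careful_clean("McNulla"))  # the name survives intact
print(careful_clean("Null"))     # a real surname survives too
```

Even the careful version is a compromise; representing "no value" out of band avoids guessing from the string at all.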

~~~
EvanAnderson
I knew a family w/ the last name of "Null" from high school. I wonder, from
time to time, if they have suboptimal experiences using the 'net.

~~~
isp
Obligatory:
[https://news.ycombinator.com/item?id=3900224](https://news.ycombinator.com/item?id=3900224)

------
PhasmaFelis
> _I exported the contacts and looked at the raw Google CSV data. One of the 2
> problematic contacts had a whitespace character at the end of its phone
> number. I removed it. Bingo, Dialer can now find it!_

This is kind of horrifying. Google being tripped up by trailing whitespace?
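
One common defence is to normalize numbers before storing or comparing them. The Python sketch below is an assumption about how that might look, not a description of what Dialer does; `normalize_phone` and the sample number are invented.

```python
def normalize_phone(raw: str) -> str:
    """Keep a leading '+' and the digits; drop whitespace and punctuation."""
    raw = raw.strip()
    prefix = "+" if raw.startswith("+") else ""
    return prefix + "".join(ch for ch in raw if ch.isdigit())


stored = "+1 555-0100 "   # trailing space, as in the exported CSV
typed = "+1 555-0100"

print(stored == typed)                                     # raw strings differ
print(normalize_phone(stored) == normalize_phone(typed))   # normalized forms match
```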

------
timberburn
When I was setting up an account on Comcast's website, I was consistently
getting a nondescript internal server error when submitting the form.

It took me quite a while and many failed attempts to find that Comcast will throw
an error when your requested username contains "comcast".

~~~
hidroto
I wonder if that is to stop people from using names like comcastSucks or
worse.

~~~
rdancer
It may not have been the rationale, but it sure must be the most common use-
case.

------
incepted
> Some people, when confronted with a problem, think "I know, I'll use regular
> expressions." Now they have two problems.

jwz's wit was a lot of fun in the early 2000s but that quote is too often used
wrong.

This quote is not about regexps, it's about using wrong tools for the job.
Using it without context makes it sound dumb. "Well, what if regexp is exactly
the right solution for that problem?".

~~~
lmm
A regexp is almost always the wrong solution. It's a way of representing a
finite state machine _that obscures the states_ , which are the only valuable
part of the state machine abstraction (they're inherently incomprehensible
otherwise). And most implementations these days have random extensions,
meaning you have all the performance and safety issues of a turing-complete
programming language - but a much worse UX. They may have made sense in the
days of ed and the teletype, when a terse incomprehensible expression was
better than a slightly longer readable one, but they don't now.

~~~
incepted
> A regexp is almost always the wrong solution.

That's quite a sweeping statement.

Sure, complex regexps can be hard to read but state machines are just hard to
read for humans in general, whatever the form.

What do you recommend for parsing simple text entries, then?

~~~
lmm
If there are no existing libraries, parser combinators. More verbose but so
much more readable, and they make it much easier to parse into an actual
structure rather than a list of match groups.

~~~
incepted
> More verbose but so much more readable

First of all, "more readable" is extremely subjective.

Second, there are a lot of different parser combinators, all with very
different syntaxes.

Finally, parser combinators are readable by people familiar with them and
regexps are readable by people familiar with them. Regexps are also much more
widespread and approachable. And very often, writing a parser combinator to
parse a simple text entry is way overkill.

There are many good reasons why regexps are so popular.

~~~
lmm
> Finally, parser combinators are readable by people familiar with them and
> regexps are readable by people familiar with them. Regexps are also much
> more widespread and approachable

You don't have to be familiar with them to find something like:

    
    
        def emailAddress = userPart ~ "@" ~ hostnamePart ^^
          {(username, at, hostname) => EmailAddress(username, hostname)}
    

clearer than any regex. Named capture groups help a little bit but I've never
seen people using them (and they don't have a consistent syntax across regex
implementations either).

> And very often, writing a parser combinator to parse a simple text entry is
> way overkill.

Disagree. They can be very much a one-liner.
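
For readers who have not seen the technique, here is a self-contained Python sketch of the same idea, written from scratch rather than with any particular library. All names (`regex`, `literal`, `seq`, `mapped`, `parse_email`) are invented for this example, and the character sets are deliberately simplified, not the RFC grammar. Note that the leaf parsers still use small regexes; the combinators only handle composition and naming.

```python
import re
from typing import Callable, NamedTuple, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]


class EmailAddress(NamedTuple):
    username: str
    hostname: str


def regex(pattern: str) -> Parser:
    """Leaf parser: match a small regex at the current position."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse


def literal(expected: str) -> Parser:
    """Leaf parser: match an exact string at the current position."""
    def parse(text, pos):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another and collect their values in a list."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def mapped(parser: Parser, fn) -> Parser:
    """Transform a successful parse result with fn."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse


user_part = regex(r"[A-Za-z0-9._+-]+")
hostname_part = regex(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*")

email_address = mapped(
    seq(user_part, literal("@"), hostname_part),
    lambda parts: EmailAddress(parts[0], parts[2]),
)


def parse_email(text: str) -> Optional[EmailAddress]:
    """Parse the whole string as an address, or return None."""
    result = email_address(text, 0)
    if result is None or result[1] != len(text):
        return None
    return result[0]


print(parse_email("chad@example.com"))
```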

~~~
incepted
I think you'd be hard pressed to find someone who doesn't know parser
combinators tell you the snippet above is readable.

First of all, what language is this? Well, I know, but you seem to forget that
the parser combinator syntax varies per language. What's the equivalent syntax
for Java? Or for Python? What about a language that doesn't have a parser
combinator library? Or one that has several ones, all slightly different?

I think you're falling prey to the specialist fallacy: you're obviously very
comfortable with parser combinators but you've forgotten how long it took you
to get there and you now see them as an ultimate solution to all problems
without realizing their downsides.

~~~
lmm
> First of all, what language is this? Well, I know, but you seem to forget
> that the parser combinator syntax varies per language. What's the equivalent
> syntax for Java? Or for Python? What about a language that doesn't have a
> parser combinator library? Or one that has several ones, all slightly
> different?

I very deliberately didn't mention the language or the library, because I
think the snippet is readable without knowing that. Minor syntax differences
between libraries matter when writing, but not when reading, and reading is
more important. (And it's not like there aren't several slightly different
implementations of regexes)

I'm not that committed to parser combinators - I'd be happy to consider
alternatives - but anything where you a) name the things you're capturing b)
can easily combine several small parsers to make a bigger parser will have a
huge readability, testability and maintainability advantage over regexes.

~~~
omaranto
> anything where you a) name the things you're capturing b) can easily combine
> several small parsers to make a bigger parser will have a huge readability,
> testability and maintainability advantage over regexes

I think many regex libraries have named captures and in most languages you can
either concatenate regexes or build regexes from strings which in turn can be
built from concatenation. Furthermore, I thought building regexes by
concatenation of smaller pieces was a commonly recommended technique for
improving readability (I have no data on whether the recommendation is
commonly followed or not, of course).
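
A small Python sketch of that technique, using `re` named groups and string concatenation. The piece names (`USER`, `LABEL`, `HOST`) are mine and the character sets are deliberately simplified.

```python
import re

# Small named pieces, deliberately simplified rather than RFC-complete.
USER = r"(?P<user>[A-Za-z0-9._+-]+)"
LABEL = r"[A-Za-z0-9-]+"
HOST = rf"(?P<host>{LABEL}(?:\.{LABEL})*)"

# The full pattern is assembled from the pieces above.
EMAIL = re.compile(rf"{USER}@{HOST}")

match = EMAIL.fullmatch("chad@example.com")
if match:
    print(match.group("user"), match.group("host"))
```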

~~~
lmm
I've never seen either of them in real-world code. Concatenation is something
but I think it's a lot less flexible - if you have a parser for something and
want to make a parser for a comma-separated list of that thing, with parser
combinators that's one call, whereas with a regex I don't think the string-
manipulation is straightforward, and would the named capture groups still work
if they were now being hit multiple times?

~~~
omaranto
You're definitely right that concatenation is less composable, and I agree
that parser combinators are more powerful. I was just pointing out there are
some things you can do in regexes to make things more readable.

And about captures inside repetition: I don't see any reason they couldn't
capture a list of strings instead of a string in dynamically typed languages
but in all regex libraries I'm aware of they do something that seems useless
to me: they only capture the last occurrence! (They probably just overwrite
the capture on each repetition.)
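
Python's `re` module behaves this way, as the short sketch below shows. The usual workaround, assumed here, is a second pass with `findall` over the matched text.

```python
import re

text = "a,b,c"

# A named group inside a repetition: each iteration overwrites the capture.
repeated = re.fullmatch(r"(?:(?P<item>\w+),?)+", text)
last_only = repeated.group("item")

# Collecting every item takes a separate scan.
all_items = re.findall(r"\w+", text)

print(last_only)
print(all_items)
```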

------
suprjami
I didn't get the reference.
[https://en.m.wikipedia.org/wiki/Chad_(paper)](https://en.m.wikipedia.org/wiki/Chad_\(paper\))

~~~
snydly
Ohh, I didn't get it at first either. Thought of the country.

Best image to explain a hanging "chad":
http://images1.fanpop.com/images/photos/1400000/Halloween-how-i-met-your-mother-1469279-1008-758.jpg

------
abhishekash
Do people with other Android versions or phone makes face the same issue
when using this email address?

------
frik
Many sites don't support the plus in email addresses ("+" = comment, supported
e.g. by GMail). Not so funny if the register process works but the login or
password reset features are broken.

Example: a site let me register and login with the plus. But resetting the
password was hard, I had to escape the plus to get it working.
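
One plausible cause for this kind of behaviour, though only a guess for that site, is that an unescaped "+" in a URL query string decodes to a space. A Python sketch with the standard `urllib.parse` module and an invented address:

```python
from urllib.parse import parse_qs, urlencode

address = "chad+blah@example.com"

# Pasting the raw address into a query string: "+" decodes to a space.
naive_query = "email=" + address
decoded_naive = parse_qs(naive_query)["email"][0]

# Percent-encoding the value first preserves the plus sign.
safe_query = urlencode({"email": address})
decoded_safe = parse_qs(safe_query)["email"][0]

print(repr(decoded_naive))
print(safe_query)
print(repr(decoded_safe))
```

If the registration form posts the value correctly encoded but the reset link does not, the two code paths would see different addresses.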

------
db48x
Is that "chad@" or "chаd@" (homoglyphs)?

~~~
mrb
No homoglyphs.
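
For anyone wanting to rule this out on their own data, a short Python sketch that lists every non-ASCII character in a string along with its Unicode name. `non_ascii_chars` is a hypothetical helper and the addresses are invented.

```python
import unicodedata


def non_ascii_chars(text: str):
    """List (character, Unicode name) pairs for anything outside ASCII."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


latin = "chad@example.com"
lookalike = "ch\u0430d@example.com"  # U+0430 is CYRILLIC SMALL LETTER A

print(non_ascii_chars(latin))
print(non_ascii_chars(lookalike))
```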

------
Gravityloss
So, when computing power increases, we just add useless parsing at every level
of software, decreasing performance and causing bugs like these.

------
GotAnyMegadeth
I also see this with one of my brothers' names which is Olly <surname>.

------
tyingq
A potential workaround...quote the local part:

"chad"@example.com

It seems to be supported by Exchange, Gmail, and a few other MTAs I tested,
and gets routed to the right place.

------
dougdonohoe
Was it the Chad?

[https://www.youtube.com/watch?v=_79BtELxB2k](https://www.youtube.com/watch?v=_79BtELxB2k)

------
Eyas
I wonder if this is the only case where the name/first name of the contact is
exactly the same as the recipient's address.

------
tmaly
Regex bugs are bad; I have been bitten by one before. It's pure technical debt.

------
mwpmaybe
The Lando system?

------
mattbillenstein
Android is a wasteland.

