
CA Comodo used broken OCR and issued certificates to the wrong people - longwave
https://bugzilla.mozilla.org/show_bug.cgi?id=1311713
======
nneonneo
Relevant mailing list post: [https://www.mail-archive.com/dev-security-policy@lists.mozil...](https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg04654.html)

In this email, Comodo discloses the security issue to Mozilla. The email was
sent 26 days after researchers Florian Heinz and Martin Kluge of Vautron
Rechenzentrum AG informed them of the bug.

Comodo clearly states that they used OCR for .eu and .be domains because the
TLD registrars redacted their port 43 WHOIS data, and only provided an image
of an email address on their web WHOIS pages. There was apparently no other
way to obtain the email address.

Rather than flag humans to fix OCR in ambiguous situations, they had automated
heuristics to correct the OCR, as determined by the security researchers.
However, the heuristics chose the wrong output for the domain @a1telekom.at,
producing @altelekom.at (an L instead of a one). The researchers registered
altelekom.at and obtained a cert for a domain owned by A1 Telekom, a major
ISP.

~~~
Pyxl101
There is an accusation in the comment thread of the article that Comodo only
disclosed this issue to Mozilla _after_ it was reported publicly by the news
media.

> steffen 2016-10-20 08:35:58 PDT

> In fact, the linked incident report refers to the heise article I also
> linked. So Comodo chose to "publish" this immediately after it was made
> public by others. That would be quite a coincidence. This raises the
> question of whether Comodo would've informed Mozilla at all if the media
> hadn't picked up on it.

~~~
dfeart3453465uf
A lone security researcher can find a bug, write it up, and share it a lot more
quickly than a corporation.

Corp has to write, test, verify, share internally, review and approve before
it can be released. Bureaucracy. They also needed to patch their systems too.

Ryan Sleevi (Google) asked the question, and Robin Alden (Comodo) stated a
reasonable timeline. There is no conspiracy here.

------
taurath
> The OCR has a reproducible bug and has trouble differentiating small l and
> the number 1. It also has trouble differentiating the number 0 and the small
> o. Instead of fixing the bug or not using such obviously unsuitable software
> the software apparently evaluates the following characters - if there is a
> number after the small l it reads the l as the number 1. Similar issues with
> o/0.

So what they're saying is y0u can fo0l their servers with 1eetspeak?
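To make the quoted behaviour concrete, here is a minimal sketch of that kind of "look at the next character" heuristic. It is a reconstruction from the description above, not Comodo's actual code; the function names and the ambiguity table are made up for illustration.

```python
# Hypothetical reconstruction of the heuristic described in the quote:
# an ambiguous glyph (l/1 or o/0) is read as a digit only if the
# character that follows it is a digit. Not Comodo's real code.
AMBIGUOUS = {"l": ("l", "1"), "1": ("l", "1"), "o": ("o", "0"), "0": ("o", "0")}


def naive_resolve(raw: str) -> str:
    """Resolve ambiguous glyphs by peeking at the next raw character."""
    out = []
    for i, ch in enumerate(raw):
        if ch in AMBIGUOUS:
            letter, digit = AMBIGUOUS[ch]
            nxt = raw[i + 1] if i + 1 < len(raw) else ""
            out.append(digit if nxt.isdigit() else letter)
        else:
            out.append(ch)
    return "".join(out)


def needs_human_review(raw: str) -> bool:
    """Safer alternative: flag any address containing an ambiguous glyph."""
    return any(ch in AMBIGUOUS for ch in raw)


# The "1" in a1telekom is followed by "t", so the heuristic picks "l".
print(naive_resolve("admin@a1telekom.at"))  # admin@altelekom.at
```

Under these assumptions the heuristic silently turns a1telekom.at into altelekom.at, whereas `needs_human_review` would have sent the address to a person instead of guessing.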

~~~
darklajid
I mean - these 'fixes' are common as far as I can tell (working in/around
OCR). But then again - I'm not issuing certificates.

Quite often you try to eliminate uncertainty by being clever: Sure, OCR
engine: Go ahead and recognize O and l and B if you want. If I know that the
context of this text is an amount, I'll still replace those chars with 0 and 1
and 8 afterwards.

(Engines usually allow you to configure the allowed character set, but in
practice it seems to be easier/more reliable to work like a parser: Lenient in
what you accept and strict in what you pass on)
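A rough sketch of the "lenient in, strict out" approach for a field known to be an amount. The substitution table and the accepted amount format are assumptions chosen for the example, not any particular OCR engine's API.

```python
import re

# Assumed look-alike substitutions for a numeric field (O -> 0, l -> 1, B -> 8).
AMOUNT_FIXES = str.maketrans({"O": "0", "l": "1", "B": "8"})

# Assumed output format: digits with an optional two-digit decimal part.
AMOUNT_RE = re.compile(r"\d+(\.\d{2})?")


def clean_amount(ocr_text: str) -> str:
    """Accept sloppy OCR output, but only pass on a well-formed amount."""
    fixed = ocr_text.strip().translate(AMOUNT_FIXES)
    if not AMOUNT_RE.fullmatch(fixed):
        raise ValueError(f"not a valid amount after cleanup: {fixed!r}")
    return fixed


print(clean_amount("l2B.O0"))  # 128.00
```

This only works because the field type pins down the answer. An email address has no such constraint: both `l` and `1` are valid there, which is why the same trick is risky for domain validation.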

~~~
emodendroket
I'm always amazed at how many Kindle books you buy have clearly not gone
through a simple spell check to catch errors like these.

------
johnwheeler
+1 for [https://letsencrypt.org](https://letsencrypt.org)

~~~
anfedorov
To clarify, Let's Encrypt validates identity by trusting DNS resolution of a
domain and then trusting a TCP/IP connection to the IP, correct?

It's good to encrypt things, but I wouldn't be too surprised if it were
possible for folks to issue bad certs via them, as well.

~~~
AndyMcConachie
I don't know why you were down voted. Yes, Let's Encrypt does verification by
requiring a site to host a string on port 80. They discover the site via DNS,
and they do NOT require DNSSEC. Thus you can absolutely trick Let's Encrypt
into issuing a bad cert if you can serve them bad DNS responses.

This OCR issue with Comodo in TFA concerns WHOIS data, which may or may not be
more reliable than unsigned DNS data. Regardless your point remains valid.

~~~
tokenizerrr
You can also trick practically every other CA using the same techniques.

~~~
AndyMcConachie
Yes. And remember you only have to trick one of them for them all to be
useless :)

------
codegeek
I am usually not good with donations but one company that I gladly donated to
has been letsencrypt. They have made life so simple. Please donate[0] or
become a sponsor[1] if you can.

[0] [https://letsencrypt.org/donate/](https://letsencrypt.org/donate/)

[1] [https://letsencrypt.org/become-a-sponsor/](https://letsencrypt.org/become-a-sponsor/)

~~~
asymmetric
+1. Only nitpick is that LetsEncrypt is a nonprofit, not a company.

~~~
criddell
Nonprofits are still corporations. LetsEncrypt is a service of Internet
Security Research Group:

[https://letsencrypt.org/isrg/](https://letsencrypt.org/isrg/)

ISRG is a Section 501(c)(3) corporation.

------
oxguy3
For the love of God, why has Mozilla not suspended Comodo yet? Too big to
fail, my ass -- give a few months of warning before the notBefore cutoff date,
and everyone will have plenty of time to switch over to a competent CA.

~~~
__jal
Seriously. How many times has Comodo displayed terrible judgement?

Bugs happen. Stupid bugs happen. Stupid systems with stupid bugs even
sometimes happen.

What I don't see is Comodo learning anything from their serial screwups.

------
asidiali
Comodo should be put out of business. They stole $100 from me for a
certificate then gave me the run around for months while I tried to get a
refund for a certificate I never received. Still haven't gotten my money back.

~~~
Avenger42
Chargeback?

~~~
Keverw
A chargeback is when you dispute the charge with your credit card company,
like if you never received your items.

~~~
orf
> like if you never received your items.

Like paying for a SSL certificate you never received?

~~~
Keverw
Yep, sounds like it would be useful for this case.

Not sure why people are down voting my answer. The other person posted
"Chargeback?" - note the question mark. So I took that as if they were asking
what a chargeback is, and was trying to explain it as simply as possible.

~~~
jdmichal
You shouldn't be downvoted, but I do believe you misread the intent. Avenger42
was responding to a post which did not mention "chargeback", so it wouldn't
make much sense to interpret it as, "What is a chargeback?". Rather, "Why
don't you use a chargeback?" would seem to be a better interpretation.

~~~
Avenger42
Yes, apparently I should have been more specific.

To be fair to Keverw, it's easy to think that I meant to reply to dman's post
and was asking what the term meant.

~~~
jdmichal
Sure. I don't fault Keverw's response; I was just attempting to answer the
question.

------
longwave
The underlying issue here is that WHOIS is still not standardised despite
being around for over 30 years, and the registrars do not have any other
common interface that can be used to discover domain owners and other
metadata. Is there no workable solution to this problem?

~~~
nailer
> the registrars do not have any other common interface that can be used to
> discover domain owners and other metadata

There's `.well-known` HTTP resources:
[https://tools.ietf.org/html/rfc5785](https://tools.ietf.org/html/rfc5785)

I work for CertSimple and we automatically use `(site)/.well-known/pki-validation`
if we see whois privacy, because in practice customers using whois
privacy are inevitably uncontactable. We may use `.well-known` by default in
future. Our friends at LE/Certbot (ie, ACME) use
`(site)/.well-known/acme-challenge`.

Also there are various efforts to replace/standardize whois, eg WEIRDS and RDAP
[https://tools.ietf.org/html/rfc7483](https://tools.ietf.org/html/rfc7483).
However .well-known resources work well enough for the purpose and in
practice are more widely used.
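As a rough illustration of how a `.well-known` file check works, here is a sketch modelled on the ACME HTTP challenge as later standardized in RFC 8555 (the URL layout and the `token.thumbprint` body come from there). The function names are made up, and the actual HTTP fetch is left out so the sketch stays self-contained.

```python
import hmac
from urllib.parse import urlunsplit


def acme_challenge_url(domain: str, token: str) -> str:
    """URL the CA fetches over plain HTTP for an http-01 style challenge."""
    return urlunsplit(("http", domain, f"/.well-known/acme-challenge/{token}", "", ""))


def challenge_satisfied(body: str, token: str, thumbprint: str) -> bool:
    """Check that the served body is '<token>.<account key thumbprint>'."""
    expected = f"{token}.{thumbprint}"
    return hmac.compare_digest(body.strip().encode(), expected.encode())


print(acme_challenge_url("example.com", "abc"))
# http://example.com/.well-known/acme-challenge/abc
```

The point is that the CA never has to read an email address from anywhere: whoever can place the expected file on the site demonstrates control of it, subject to the DNS caveats raised elsewhere in this thread.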

~~~
eridius
Why isn't .well-known/pki-validation in the Well-Known URI registry?

[https://www.iana.org/assignments/well-known-uris/well-known-...](https://www.iana.org/assignments/well-known-uris/well-known-uris.xhtml)

~~~
tialaramex
Probably someone needs to ask IANA to do so. As far as I know (and I'm happy
to learn otherwise) the first mention of .well-known/pki-validation was Ballot
169 [https://cabforum.org/2016/08/05/ballot-169-revised-validatio...](https://cabforum.org/2016/08/05/ballot-169-revised-validation-requirements/)
to the CA/B forum.

It is possible that the people drafting that ballot didn't feel they ought to
approach IANA until after it was voted through, and then they simply forgot.

It is also possible that the subsequent IP fuss (a bunch of CA/B members turn
out to have patented some methods listed in Ballot 169) distracted everybody.

Finally it's possible IANA just didn't get around to updating the list yet.

------
djsumdog
Universities that are part of InCommon paid to get unlimited Comodo SSL certs.
Their API was pretty terrible and we ended up finding quite a few issues.

Every time I hear about these Comodo breaches, I'm not surprised. Supposedly,
Iran was able to get them to issue fake certs for some major sites:

[http://www.pcmag.com/article2/0,2817,2382518,00.asp](http://www.pcmag.com/article2/0,2817,2382518,00.asp)

------
cordite
Should being part of a CA include having a red team constantly trying to
breach things?

~~~
ethbro
Or more specifically, do security auditors have red teams authorized as part
of their audit?

~~~
thenewwazoo
In my limited experience (infosec for a big 4 firm), the answer is no. The
audits are done as cheaply and as quickly as possible. I worked alone, in
fact, and essentially did process testing (read: document review).

------
ig1
Previously from Comodo:

[http://www.pcworld.com/article/2887632/secure-advertising-to...](http://www.pcworld.com/article/2887632/secure-advertising-tool-privdog-compromises-https-security.html)

------
ComodoHacker
I'd like to know how other CAs perform domain validation for .be and .eu TLDs.

Disclaimer: not associated with Comodo in any way.

~~~
flavmartins
I work for one of the big CAs out there. For those domains a human actually
performs a manual WHOIS query for those TLDs and then manually enters in the
email address associated with the domain contact and they are required to
include a screenshot of the WHOIS details for verification of the information.
A second individual then is required to perform a verification that the email
address entered is correct per the attached screenshot.

All of this, plus the querying of the organization's legal registration,
business address and contact is all done by trained people and due to internal
efficiencies and workflows we can complete that in a matter of minutes from
the time a customer places an order.

In the end, even an organizational vetted certificate is still completed just
as fast as it takes customers usually to click on the approval email to
authorize issuance and submit the CSR for the certificate creation.
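A minimal sketch of a two-person check in the spirit of what is described above. It assumes a variant where the second person retypes the address from the screenshot independently rather than eyeballing the first entry; the function name is hypothetical.

```python
def confirm_transcription(first_entry: str, second_entry: str) -> str:
    """Return the address only if two independent transcriptions agree."""
    a = first_entry.strip()
    b = second_entry.strip()
    if a != b:
        # Any disagreement (for example l vs 1) blocks issuance until resolved.
        raise ValueError(f"transcriptions differ: {a!r} vs {b!r}")
    return a


print(confirm_transcription("admin@a1telekom.at", "admin@a1telekom.at "))
# admin@a1telekom.at
```

A mismatch stops the process instead of letting software pick a winner, which is the opposite of the automated correction discussed in the article.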

~~~
ComodoHacker
So it's a matter of cutting costs for Comodo. Thank you.

------
ungzd
So a stupid anti-spam measure (email addresses as images) led, indirectly, to
such a huge vulnerability.

~~~
Kenji
I think that is a great anti-spam measure. Most web scrapers are not gonna run
OCR on the images of your contact page and you save yourself huge pains and
loads of spam. Tell me, what is a better way? Using obscured JavaScript code
to inject the address into the page? CSS hacks? HTML comments in between parts
of the address?

~~~
m8rl
I have had my email address in the open on most websites I've designed and
programmed for decades now, and I don't receive more spam at these addresses
than at those which aren't public. Spammers have been harvesting for years
using many channels: buying addresses, hacking databases, using viruses to
steal complete address books from people.

Using images as an anti-spam measure really gives you a lot more problems than
benefits (unless you use random single-use addresses).

Don't obfuscate, but fix your spam filters or switch your email provider. What
helps a lot is server-side moving of detected spam emails to your junk folder
and looking through this folder from time to time.

------
chetanahuja
Web security based on PKI model based on 100's of "trusted" authorities is
just broken. And yet, the "security industry" continues doubling down on "moar
TLS" "moar green locks" model instead of coming up with a better model.

The tragedy is, that most of the internet access is now happening from mobile
devices and majority of _that_ is coming from native apps. The apps need
neither the same trust model nor have any "green locks". But PKI/TLS based
orthodoxy has such a death grip on the industry that people continue to use
this broken model for native apps where it makes even less sense than it does
for browsers.

~~~
corecoder
Well, unless app authors are writing their HTTPS clients from scratch, I
suppose the HTTP client API functions provided by the major mobile OSes do
actually check certificates?

~~~
advisedwang
I think chetanahuja is saying apps don't have to rely on CAs. They can
distribute a single trusted certificate, only trust a single CA or use key
pinning.
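A bare-bones sketch of what pinning can look like inside an app. It assumes the pin is a SHA-256 hash of the whole DER-encoded certificate; HPKP-style pins hash the public key instead, and real apps would normally use their platform's pinning facilities rather than hand-rolled code.

```python
import hashlib
import hmac


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """True only if the presented certificate is the one shipped with the app."""
    return hmac.compare_digest(cert_fingerprint(der_cert), pinned_hex.lower())
```

In Python the DER bytes of the peer certificate can be obtained from an established TLS socket with `ssl.SSLSocket.getpeercert(binary_form=True)`. With a check like this, a certificate mis-issued by any CA for the same hostname would simply not match.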

------
Johnny555
Did Comodo admit to using OCR for this, and that it wasn't a human
transcription mistake (humans mistake 1's and l's too)?

It just seems odd for them to use an image of a web page to transcribe
information from a web lookup when they could just scrape the text off the web
page directly without using the intermediate image and OCR.

However, I could see them using a human in the chain to look up the whois
information. It just seems strange to come up with a complicated OCR solution
(and, if they did, that they couldn't find a font that makes 1's and l's look
more distinct, like
[http://forum.high-logic.com/viewtopic.php?t=4004](http://forum.high-logic.com/viewtopic.php?t=4004))

~~~
wongarsu
>when they could just scrape the text off the web page directly without using
the intermediate image and OCR

Try looking up whois info on google.eu. Most tools will simply output `NOT
DISCLOSED! Visit www.eurid.eu for webbased whois.`. Now you can search
[https://whois.eurid.eu](https://whois.eurid.eu) for the whois information of
google.eu. You will find that the email address is only available as an image.

That's exactly the situation Comodo tried to solve.

Given that it's a simple font with no obfuscation, a small pattern-matching
Python program should give you near 100% accuracy. Apparently Comodo used
some off-the-shelf software instead, and that software seems to make
assumptions it shouldn't make.
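A toy sketch of that kind of exact pattern matching. The two 2x5 glyph bitmaps are invented for the example (they are not the EURid font), and the image is represented as rows of `#` and `.` instead of real pixels.

```python
# Invented glyph table: each glyph is a tuple of pixel rows in a fixed font.
GLYPHS = {
    ("#.", "#.", "#.", "#.", "##"): "l",
    (".#", "##", ".#", ".#", ".#"): "1",
}


def read_fixed_font(rows: list[str]) -> str:
    """Split the image on blank columns and look each glyph up exactly."""
    width = len(rows[0])
    text, start = [], None
    for col in range(width + 1):
        # The position past the right edge counts as blank to flush the last glyph.
        blank = col == width or all(row[col] == "." for row in rows)
        if not blank and start is None:
            start = col
        elif blank and start is not None:
            glyph = tuple(row[start:col] for row in rows)
            if glyph not in GLYPHS:
                raise ValueError("unknown glyph; escalate to a human")
            text.append(GLYPHS[glyph])
            start = None
    return "".join(text)


sample = [".#.#.", "##.#.", ".#.#.", ".#.#.", ".#.##"]
print(read_fixed_font(sample))  # 1l
```

Because every glyph is matched exactly, `1` and `l` can never be confused with each other, and anything that does not match raises an error instead of being guessed.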

~~~
Johnny555
Ahh interesting, so it's a case of the cure being worse than the disease --
spammers know how to use OCR too and they don't care about transcription
errors.

So it seems like there's little point in deliberately obscuring the email
address in an image, and certainly no reason to do it with a font that doesn't
more clearly distinguish between letters and numbers.

------
orf
Isn't this the same company that produced a 'secure' browser that disabled
CORS?

Doesn't surprise me.

~~~
45h34jh53k4j
They also make the best free application firewall software for Windows. It's
unrivalled for features in this space.

Nothing exists like this for Linux (however
[https://github.com/subgraph/fw-daemon](https://github.com/subgraph/fw-daemon)
is getting close).

The quality of the software, both good and bad, doesn't apply here.

~~~
orf
Is this sarcastic? I can't tell. How is it unrivalled, and why would you trust
it coming from a company with an abysmal track record of security? Just Google
'Comodo Project Zero' for a taste. Or read this[1], one of the bad ones.

Also what's wrong with ufw? Github is down so I can't view that link.

> The quality of the software both good and bad doesn't apply here.

Well it clearly does, because this post is about how their software did some
crazy roundabout stuff to validate domains that didn't work.

1. [https://bugs.chromium.org/p/project-zero/issues/detail?id=76...](https://bugs.chromium.org/p/project-zero/issues/detail?id=769)

~~~
45h34jh53k4j
ufw, and by extension iptables, lacks features such as per process rules. You
have to do hacks like assign rules to users, and run the processes under
different users. Tails does this to isolate the Tor Browser process.

When you check out the link, you'll see that nothing like this exists on Linux.
The closest thing on OSX would be Little Snitch.

GitHub isn't down; flush your DNS. It's leftover cached NXDOMAINs from this
morning's outage.

You can also clear your browser DNS cache with chrome://net-internals/#dns

I think my original meaning is that there are lots of teams and not all of
them are bad :-)

~~~
orf
> ufw, and by extension iptables, lacks features such as per process rules.

iptables --pid-owner[1]

> I think my original meaning is there are lots of teams and not all of them
> are bad :-)

Sure, you find diamonds in the rough, but when their AV and their certificate
teams are trash does that inspire confidence? When their antivirus software
bundles a 'secure browser' that _disables CORS_ (!!!) and can infect you just
by scanning a file, then why is it safe to assume that they know what the fuck
they are doing as a company?

That's like assuming it's safe for a doctor to operate on your kidney, despite
killing all their patients when operating on other organs, because you know,
it's different.

The firewall might work and have the slickest interface but if it's full of
buffer overflows and written by an idiot then it's not the best, is it.

1. [https://linux.die.net/man/8/iptables](https://linux.die.net/man/8/iptables)

~~~
catdog
> iptables --pid-owner[1]

That's broken (it only matches the exact PID, not child processes, and
according to the documentation does not work on SMP systems). It also got
removed at some point. There is a cgroup match which may be usable instead;
network namespaces may also be a good solution in some cases.

~~~
45h34jh53k4j
True, but the cgroup match and namespaces require a priori configuration. I
want something that will warn me when a new process is trying to dial out.

------
cik
And yet somehow browsers have decided that self-signed certificates are less
valuable than purchased ones. Seriously?

~~~
wolf550e
If the browser adopts a trust-on-first-use policy with self signed
certificates and the certificate is replaced (possibly because it has
expired), how do you know whether it's MiTM or benign?

~~~
LukeShu
I recall a website with a self-signed cert, but they had a non-https page that
had their TLS cert fingerprint signed with their GPG key, which effectively
moved the trust from the centralized CA system to the PGP web of trust.

I think it would be cool to have a standard URL (/.well-known/certificate or
something) that explains why you should trust their self-signed certificate,
and have the browser show that as part of the view when you encounter a self-
signed cert.

~~~
45h34jh53k4j
Then you have leaked material over http or some other cleartext protocol. If
you try to add encryption to that you just have turtles or move the key
exchange somewhere else.

The current PKI system allows you to make unsolicited encrypted connections
to an internet origin over an untrusted connection with strong server
authentication.

Self-signed certificates cannot provide this. We do not want the web to be
like SSH.

------
andrewmcwatters
Have CAs always been this sloppy or are we just hearing about it more
nowadays?

~~~
lucb1e
As the system grows I think there are more and more players, competing for
market share, trying to get low prices and high profits, and more shit happens
as a result. On top of that, since it's getting more and more important, we
also hear about it more.

------
zokier
More worrying than some OCR silliness is that Comodo is issuing certificates
based solely on WHOIS data. I don't think it is intended for such security
critical use.

~~~
45h34jh53k4j
If you read the report linked from the bugzilla, you will read:

One of the methods that Comodo uses to validate that a certificate applicant
owns or controls a domain to be included in the subjectAlternativeName of a
server authentication certificate is set out in the CA/B Forum's Baseline
Requirements document [2] at section 3.2.2.4.2.

That method may be summarized as the sending of an email to an email address
(and obtaining a confirming response) where the email is identified as
belonging to the Domain Name Registrant, technical contact, or administrative
contact as listed in the WHOIS record of the domain.
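The method quoted above boils down to a challenge/response over email. A minimal sketch, with the WHOIS record represented as a plain dict and all field and function names invented for the example (sending the mail itself is out of scope):

```python
import hmac
import secrets

# Assumed field names for the three contact types named in the quote.
ALLOWED_CONTACTS = ("registrant", "technical", "administrative")


def validation_addresses(whois_record: dict) -> list[str]:
    """Addresses the CA may send the challenge to, taken from the WHOIS record."""
    return [whois_record[k] for k in ALLOWED_CONTACTS if whois_record.get(k)]


def new_challenge() -> str:
    """Unguessable token to embed in the validation email."""
    return secrets.token_urlsafe(32)


def response_valid(issued: str, returned: str) -> bool:
    """The applicant proves receipt by returning the exact token."""
    return hmac.compare_digest(issued.encode(), returned.strip().encode())
```

The scheme is only as good as `validation_addresses`: if the address was transcribed wrongly, the token goes to whoever controls the look-alike mailbox, which is what happened in the a1telekom.at case.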

So the Browser (Google Mozilla Microsoft) and the CA (Comodo Symantec LE)
industry working group agreed that was acceptable.

~~~
zokier
Maybe it is time to review that policy. As far as I can tell, WHOIS protocol
and data are both completely unauthenticated, and as such relatively easily
manipulated by mitm.

~~~
45h34jh53k4j
RDAP over https should be fine

~~~
zokier
And if RDAP or whatever had been used here then the whole problem would
never have occurred. But it wasn't, because it wasn't required. Ergo, the
requirements should be reviewed to avoid repeating this sort of thing.
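For comparison, an RDAP response (RFC 7483) is structured JSON, so contact emails can be read as text with no image or OCR step. A small sketch that pulls email properties out of the `entities` / `vcardArray` structure; the sample record is made up, and a registry may of course still redact the email.

```python
def extract_emails(rdap: dict) -> list[tuple[tuple[str, ...], str]]:
    """Return (roles, email) pairs from the jCard data of an RDAP response."""
    found = []
    for entity in rdap.get("entities", []):
        vcard = entity.get("vcardArray", [])
        if len(vcard) != 2:
            continue
        # Each jCard property is [name, parameters, type, value].
        for prop in vcard[1]:
            if len(prop) >= 4 and prop[0] == "email":
                found.append((tuple(entity.get("roles", [])), prop[3]))
    return found


# Made-up sample in the RFC 7483 shape, not a real registry answer.
sample = {
    "objectClassName": "domain",
    "entities": [
        {
            "roles": ["registrant"],
            "vcardArray": ["vcard", [
                ["version", {}, "text", "4.0"],
                ["email", {}, "text", "hostmaster@example.com"],
            ]],
        }
    ],
}
print(extract_emails(sample))  # [(('registrant',), 'hostmaster@example.com')]
```

Fetched over HTTPS, this also addresses the point about port 43 WHOIS being unauthenticated.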

------
drumttocs8
Comodo is awful. I remember loving their original products, but it's been
downhill ever since they started trying to monetize so heavily.

------
abricot
pranjalv123 called it:
[https://news.ycombinator.com/item?id=6620467](https://news.ycombinator.com/item?id=6620467)

------
bandrami
How people still think the PKI system is actually delivering security is
beyond me.

We have _zero_ idea how many bad certs like this may be out there (the
nefarious people won't publish their results, after all), and yet a browser
will still treat a Comodo cert as better than a self-signed one (it's
identical to a self-signed cert, since Comodo is a known bad actor now). It's
better than plaintext, of course, but that's not saying much.

~~~
45h34jh53k4j
"Don't throw the baby out with the bathwater." We have certificate
transparency, the browsers are more responsive to reports, we have HPKP and
other browser countermeasures. It's not perfect but with enough eyeballs it can
get close.

It provides meaningful security. CA-signed is clearly not the same as
self-signed. Maybe you misunderstand what this means?

~~~
bandrami
_CA-signed is clearly not the same as self-signed_

How is a certificate from Comodo any different from a self-signed
certificate? There is _actual evidence_ that they gave a certificate to a
third party. That means you should treat any certificate from them as
self-signed, because you cannot trust Comodo to do their jobs.

~~~
45h34jh53k4j
Assuming you understand the structural difference between a self signed and a
CA signed certificate (ie: subject pubkey sig vs issuer pubkey sig
respectively) the difference is clear.

You cannot determine provenance of a self-signed certificate. The sig matches
the subject. With a CA-signed certificate, the holder of the CA private key is
the only possible source (with high probability), so it is attributable.

Whether you trust the company or not is up to you -- play with your trust
store. Otherwise this is apples and oranges.

The only time when this comparison would be apt would be the compromise of the
Comodo Private Key. This would allow anyone to issue Comodo certificates, thus
removing their provenance. Of course then their cert would be revoked and we
wouldn't have this conversation.

~~~
bandrami
_You cannot determine providence of a self-signed certificate._

Exactly, though I think you meant "provenance". It's _exactly_ like how we now
know we can't determine the provenance of a Comodo certificate, agreed?

~~~
45h34jh53k4j
No.

I think I understand why you are not following what I am saying.

You can prove that the certificate came from Comodo, and only Comodo. It can't
have come from anyone else. This isn't trust -- it's public key crypto. Only
the issuer could sign it. Whether you trust it or not is irrelevant; what
matters is that it could only have originated from there.

If you believe that the public key is truly owned by the subject because the
issuer said so -- this is trust.

~~~
bandrami
OK, but that's only helpful here in the sense that I _could_ remove Comodo
from my trust store, but nobody's going to do that. Not even me, and I'm the
one complaining about this. What I can't do is have any confidence in the
provenance of a CSR they signed: did it actually come from the organization
that controls that domain? (That was what I meant by "provenance")

------
garaetjjte
How does whois verification even work? Doesn't it contain the email of the
domain registrar, not the registrant?

------
retox
Yet another in the long line of fuckups.

------
omouse
Not surprised, they seem like a shady outfit.

