
Tell HN: 6.3% of HN top submissions in plain HTTP, more than half upgradable - oefrha
I was using the HN front page to test a library I was writing when I noticed
some links that probably should be HTTPS are in plain HTTP. This piqued my
interest a bit so I did a little analysis on how prevalent plain HTTP links
are on HN. I probably don't need to rehash the harm of using plain HTTP, even
for personal blogs -- they can be snooped, and they can be modified to inject
either ads or more sinister payloads. In fact, years ago I once disabled my ad
blocker by accident and saw an ISP-injected ad on my personal site; never
again, I swore.

The methodology is simple. I gathered all links from
https://news.ycombinator.com/front ("past" on the navigation bar) for each day
from 2020-01-01 to 2020-07-09. These are the top stories of each day. This is
a trivial task and resulted in 17566 links (raw data [0][1][2]). There are
<100 duplicates, which I kept. Among these are 1112 plain HTTP links,
amounting to ~6.3% out of 17566.

Next I analyzed how many of the 1112 plain HTTP links are available over
HTTPS. Methodology:

1. Check if the HTTP version redirects to the HTTPS version; if so, done,
otherwise record the HTTP response;

2. Replace http:// with https:// and see if the HTTPS URL works; if so, record
the HTTPS response;

3. Compare the HTTP and HTTPS responses. If they're identical, done. If not,
compare the length of the responses; if they differ by <=1%, record this as
HTTPS response almost identical to HTTP, and assume the HTTPS version works
(the page may not use relative URLs or omit the protocol, so the HTTPS
response may be subtly different while having the exact same rendered output).

The analysis script is available at [3].

---

To be continued in a comment since I'm hitting the 2000 char limit:
https://news.ycombinator.com/item?id=23802522
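
For readers who want the three-step check in compact form, here is a minimal sketch of the classification logic as a pure function. It is not the code from [3]: the function name, the labels, and the choice to measure the 1% difference against the longer of the two bodies are all assumptions made for illustration, and the network fetching is left out entirely.

```python
from typing import Optional


def classify_upgrade(redirects_to_https: bool,
                     http_body: Optional[bytes],
                     https_body: Optional[bytes],
                     tolerance: float = 0.01) -> str:
    """Label one plain-HTTP link following steps 1-3 (hypothetical helper)."""
    # Step 1: the HTTP URL already redirects to its HTTPS version.
    if redirects_to_https:
        return "redirect"
    # Step 2: one of the two fetches failed, so there is nothing to compare.
    if http_body is None or https_body is None:
        return "unavailable"
    # Step 3: byte-for-byte identical responses.
    if http_body == https_body:
        return "identical"
    # Step 3, fallback: lengths within the tolerance (assumed to be measured
    # relative to the longer body; the post does not say which one is used).
    longer = max(len(http_body), len(https_body))
    if abs(len(http_body) - len(https_body)) / longer <= tolerance:
        return "almost identical"
    return "different"
```

A length-only comparison is deliberately crude: two unrelated pages of similar size would also pass, which is presumably why the results are hedged with manual inspection.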
======
tjoff
The biggest crime here is ISPs that inject ads.

That is not something that one should accept. The whole concept should fall
flat in any working market.

About as absurd as getting audio adverts inserted into your phone calls.

"This call has been going on for more than 10 minutes, have a listen to our
sponsor".

Maybe that is a thing already?

~~~
belltaco
That would interrupt the call so that probably won't happen.

~~~
DanBC
There used to be a landline company that provided free calls in exchange for
playing ads during the call.

EDIT: It took way too long for me to find details, sorry.

[https://webcache.googleusercontent.com/search?q=cache:JiENVG...](https://webcache.googleusercontent.com/search?q=cache:JiENVGPhWssJ:https://www.baltimoresun.com/news/bs-xpm-1998-04-21-1998111052-story.html+&cd=1&hl=en&ct=clnk&gl=uk&client=firefox-b-d)

Would you listen to a 15-second advertisement in order to make a free two-
minute long-distance phone call?

BroadPoint Communications Inc. of Landover is betting that enough of you
would, and is launching its service in Pittsburgh this week with "thousands"
of subscribers. A Baltimore rollout could come this summer.

But before you bid good riddance to long-distance bills, be aware that --
surprise -- there are strings attached.

Before using BroadPoint's service, you'll have to register at the company's
World Wide Web site and give up personal information, such as household
income, number of children and ethnic background.

Once you've sent this information -- which BroadPoint says it will never sell
to other companies -- the only thing separating you from unlimited free chat
are a few ads, selected specifically for you based on your survey responses.
For example, if you told BroadPoint that you have seven young children, don't
be shocked to hear a diaper commercial.

------
hartator
One can argue there is no rational threat vector in accessing static content
over HTTP instead of HTTPS. (At least with an ISP in America that won't inject
things.)

~~~
oefrha
Look up coffee shop WiFi MITM.

~~~
Nextgrid
I'd be more concerned about advertising/marketing companies collecting data
about unencrypted websites visited by users on their public Wi-Fi as opposed
to rare malicious actors who are typically isolated to a single location.

~~~
oefrha
I’d rather be tracked every day than be pwned once. Nevertheless, what I
posted is just an example against the “no rational threat vector” reply.

------
notRobot
That's actually much better than I expected :)

It will be interesting to see what the results will be like in another year.

------
osamagirl69
My website is http only, and I have no intention of adding https. The content
is pure html/css with 100% locally hosted binary content. Adding encryption
gains very little for either me or my visitors, while plain http allows the
site to be viewed by anyone regardless of what browser they are using, or even
if they aren't using a browser and just telnet straight in.

It is also not reliant on having permission from a 3rd party to exist, and is
fully self contained. Furthermore, because it is self hosted there are several
physical machines at different locations which serve as backups and are
selected by updating the DNS records, which I believe is incompatible with a
basic Let's Encrypt setup.

I understand the drive to push out https for interactive sites, because it is
genuinely a bad idea to require users to submit their login credentials over
plaintext, and it is possible there are other snooping risks and whatnot, but
I really do not see the need for https on many of the simple static sites like
mine that make it to hn. The only compelling argument I have seen to force
encryption onto everyone is because of ISPs attacking users, but I am solidly
in the camp that if you believe your ISP is attacking you you need to use a
VPN for all of your traffic, because injecting ads is the least of your
worries at that point.

~~~
aloknnikhil
[https://www.troyhunt.com/heres-why-your-static-website-
needs...](https://www.troyhunt.com/heres-why-your-static-website-needs-https/)

- MITM attacks
- Ad/message injection
- Malware injection
- Censorship

> Furthermore, because it is self hosted there are several physical machines
> at different locations which serve as backups and are selected by updating
> the DNS records, which I believe is incompatible with a basic lets encrypt
> system.

Sure. But it's not that much harder to do. Use something like the Caddy
server and have it manage certificate deployment for you.

Unless there's a client that you intend to serve that absolutely cannot handle
SSL/TLS, I cannot see a reason why you'd want to stick with HTTP. And no, I
don't consider IE6 as a valid excuse. You can always use an older version of
Firefox on Windows 2k and it'll solve this problem. I honestly don't think I
can trade all the above for more compatibility with ancient machines.

> but I am solidly in the camp that if you believe your ISP is attacking you
> you need to use a VPN for all of your traffic, because injecting ads is the
> least of your worries at that point.

I don't understand this. If your reason for not switching is to make it easy
for anyone with any browser to use your web page, then expecting visitors with
a terrible ISP to use a VPN because you refuse to use HTTPS makes their life
more difficult. They don't see this problem with other websites that use
HTTPS, but now they need to go get a VPN service just to use yours. Also,
we're assuming VPNs are absolute saints here, which isn't really true. But
even if it were true, a VPN is ultimately just someone else's ISP. You're
hoping that nothing in the routing topology will ever look at your data or
bother to inject into it.

~~~
surround
And how about some privacy? Although HTTPS doesn’t hide the host, it does hide
the path.
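
To make the host-versus-path point concrete, here is a rough sketch of which URL components a passive observer on the network can read. The function name is made up for this example, and the model is simplified: it assumes the hostname leaks through DNS and the TLS handshake, and it ignores side channels such as response sizes and timing.

```python
from urllib.parse import urlsplit


def visible_to_observer(url: str) -> dict:
    """Return the URL parts a passive on-path observer can read (simplified)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Path and query travel inside the encrypted channel; the hostname is
        # assumed to remain visible via DNS lookups and the TLS handshake.
        return {"host": parts.hostname, "path": None, "query": None}
    # Plain HTTP: the whole request line and Host header go over the wire
    # in cleartext.
    return {"host": parts.hostname, "path": parts.path, "query": parts.query}


print(visible_to_observer("http://example.com/private/page?q=1"))
print(visible_to_observer("https://example.com/private/page?q=1"))
```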

------
oefrha
Continued:

Results: out of 1112 plain HTTP links, 642 are available over HTTPS; out of
those 642 entries, 143 redirect to HTTPS, 307 serve identical response from
the HTTPS version, and 178 serve almost identical response (using the <=1%
content length difference criterion); the remaining 14 entries with slightly
wider Content-Length gaps tend to be visually identical too upon manual
inspection. So we can pretty safely claim that more than half of the submitted
plain HTTP links can be HTTPS instead. (We can't be 100% confident without
further inspection though, since a page that responds over HTTPS just fine
might have mixed content issues that prevent it from working at full
capacity.)
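
The headline figures can be re-derived from the counts quoted in the post and in this comment. The snippet below is only an arithmetic check; the variable names are mine, and the 14 manually inspected entries are counted as upgradable, as the text implies.

```python
# Counts as quoted in the post and in this comment.
total_links = 17566
plain_http = 1112
redirect, identical, almost_identical, manual = 143, 307, 178, 14

# The four categories should add up to the 642 links available over HTTPS.
upgradable = redirect + identical + almost_identical + manual

http_share = f"{plain_http / total_links:.1%}"       # share of plain HTTP links
upgradable_share = f"{upgradable / plain_http:.1%}"  # share that can be HTTPS

print(upgradable)        # 642
print(http_share)        # 6.3%
print(upgradable_share)  # 57.7%
```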

Detailed results are available at [4].

By the way, I also tried to match the plain HTTP links against HTTPS
Everywhere's rulesets[5], but coverage is rather poor since the rulesets are
user-contributed; I only got around ten matches out of the 1112 links.

At the end of the day I can't say this little analysis is in any way useful...
Let's just hope more people submit HTTPS when possible. I did notice that
certain HN darlings, e.g. pg's blog, are still on plain HTTP without HTTPS
counterparts. Also, the "Legal" and "Apply to YC" links in the footer are
plain HTTP links; apparently haven't been touched in ages.

[0] [https://pastebin.com/raw/qxVjjEyA](https://pastebin.com/raw/qxVjjEyA)
links.csv.00

[1] [https://pastebin.com/raw/3ZWgTJqh](https://pastebin.com/raw/3ZWgTJqh)
links.csv.01

[2] [https://pastebin.com/raw/jxRjwfwT](https://pastebin.com/raw/jxRjwfwT)
links.csv.02

[3] [https://pastebin.com/raw/bpsq9DZG](https://pastebin.com/raw/bpsq9DZG)
analyzer.py

[4] [https://pastebin.com/raw/hkzZ0m5f](https://pastebin.com/raw/hkzZ0m5f)
upgradability.csv

[5] [https://github.com/EFForg/https-
everywhere/tree/master/src/c...](https://github.com/EFForg/https-
everywhere/tree/master/src/chrome/content/rules)

(Sorry about the pastebin.com links. I don't want to have this HN account
associated to my real world identity, including my GitHub account and personal
sites, so I had to use a non-ephemeral anonymous file host. The raw data file
containing all aggregated links is too large for an anonymous paste, so it was
split up into three pastes; the data analysis script automatically downloads
all of them and assembles them into a single file.)

~~~
woodruffw
> By the way, I also tried to match the plain HTTP links against HTTPS
> Everywhere's rulesets[5], but coverage is rather poor since the rulesets are
> user-contributed; I only got around ten matches out of the 1112 links.

FWIW, you can tell HTTPS Everywhere to operate in "Encrypt All Sites Eligible"
(EASE) mode, which unconditionally attempts to upgrade to HTTPS and errors out
(with a prompt) if the connection fails.

~~~
oefrha
I know, that doesn’t help with this particular analysis though. I basically
did just that separately.

------
davefp
I wonder what would happen if HN banned the submission of insecure links. I
bet more than a handful of 'Show HN' posters would take the time to set up
letsencrypt (or something similar) in order to post.

~~~
dewey
And we'd lose a lot of valuable obscure links.

Usually these old, forgotten or just obscure websites by someone not looking
for SEO traffic or customers that some other person stumbled upon are the most
interesting submissions.

~~~
aboringusername
They're lost anyway. The web is on a path to deprecate and remove HTTP and as
the usage of plain HTTP dwindles even further to a level Google is comfortable
with they'll announce the end of plain HTTP on Chrome (likely in a tiered
approach). We'll likely see warnings of insecure HTTP, followed by a red page
at some point (similar to a misconfigured TLS cert), followed by refusing to
connect to HTTP altogether.

This will absolutely happen by the end of this decade, and HTTP will be a
distant memory.

If you care for HTTP, you need to ensure the contents of any HTTP sites are
preserved in some capacity, because one day they will become inaccessible;
even using old software will likely not work at some point.

~~~
samaxe
You assume that all web content is designed for a browser, and uses html. This
is simply not true. There are very cool tools and tricks like getting the
weather in a terminal just ‘curl wttr.in’ and bam weather report right in your
terminal. There are other tools like ‘curl ifconfig.co’. It would make the
tools a bit more cumbersome if you had to ‘curl https:// wttr.in’, unless the
maintainers of curl had it default to https.

Edit: had to add an erroneous space because HN was doing something weird with
the https link

~~~
aloknnikhil
Or the owner can set up an http redirect to https. It's a win-win. curl
happily works with that when you ask it to follow redirects. 'curl -L ...'
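
On the server side, such a redirect can be very small. Below is a minimal sketch using only the Python standard library; the helper and class names are made up for this example, it assumes the HTTPS site lives on the same hostname at the default port, and it does not handle IPv6 literals in the Host header.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def https_location(host_header: str, path: str) -> str:
    """Build the HTTPS URL for a request (hypothetical helper)."""
    # Drop any explicit port such as ":80" or ":8080"; the redirect targets
    # the default HTTPS port. IPv6 literals are not handled in this sketch.
    host = host_header.split(":", 1)[0]
    return f"https://{host}{path}"


class RedirectToHTTPS(BaseHTTPRequestHandler):
    """Answer every GET/HEAD on the plain-HTTP port with a 301 to HTTPS."""

    def do_GET(self):
        location = https_location(self.headers.get("Host", "localhost"), self.path)
        self.send_response(301)
        self.send_header("Location", location)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_HEAD = do_GET


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for local experimentation.
    HTTPServer(("", 8080), RedirectToHTTPS).serve_forever()
```

With this running, `curl -L` follows the 301 to the HTTPS URL, while plain `curl` without `-L` just shows the empty redirect response.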

------
jrockway
Yeah, it's very easy to degrade to HTTP accidentally. I recently set up HSTS
for my personal domain, and I was quite surprised at where I was relying on
HTTP. For example, to open Gmail I go to
[http://mail.jrock.us](http://mail.jrock.us). If someone was MITM-ing me and
that led to a fake version of Gmail, I would surely have been phished. (Though
perhaps WebAuthn would have saved me.) The fact that that was HTTP and not
HTTPS was obvious in retrospect, but not until I turned on HSTS. So overall, I
thought it was a good experience that improved my personal security. And,
since jrock.us is now preload-eligible, it should help protect my readers in
the future. (Though I admit that I have approximately 0 readers!)

I wrote up the details if you want a little more info:
[https://jrock.us/posts/gmail-and-hsts/](https://jrock.us/posts/gmail-and-
hsts/)

~~~
oefrha
Yeah, HSTS preload highly recommended.

[https://hstspreload.org/](https://hstspreload.org/)
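
For anyone checking their own header before submitting: the preload list expects a max-age of at least one year (31536000 seconds) plus the includeSubDomains and preload directives. Here is a small sketch of that header check; the function names are made up, and it covers only the header, not the other submission requirements such as a valid certificate and an HTTP-to-HTTPS redirect.

```python
ONE_YEAR_SECONDS = 31536000


def parse_hsts(header: str) -> dict:
    """Split a Strict-Transport-Security value into lowercase directives."""
    directives = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    return directives


def meets_preload_header_rules(header: str) -> bool:
    """Check only the header-related preload requirements (hypothetical helper)."""
    directives = parse_hsts(header)
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        # Missing or non-numeric max-age.
        return False
    return (max_age >= ONE_YEAR_SECONDS
            and "includesubdomains" in directives
            and "preload" in directives)


print(meets_preload_header_rules("max-age=63072000; includeSubDomains; preload"))
```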

~~~
patrickmcmanus
absolutely! and some eTLDs are preloaded (like .dev) already and that of
course applies to the domains registered in them - which is a nice property.

------
jermier
I noticed in Cloudflare you are given the option of allowing the site to have
both http and https, and the visitor to the site gets to decide what version
they want. But what use case does this have other than allowing requests to be
downgraded by determined actors?

~~~
cj
If you’re providing a service like Rollbar, Bugsnag, Optimizely, etc which are
scripts embedded on customer pages, with XHR api requests, it is sometimes
necessary for the client to connect via http if you need to support certain IE
versions.

For example, there are certain versions of IE that will throw an insecure
content warning if you load https scripts within a http page.

And of course, many older clients don’t (and never will) support newer
versions of TLS.

~~~
jermier
> For example; there are certain versions of IE that will throw an insecure
> content warning if you load https scripts within a http page.

Does IE support 'schemeless URIs'? like:

    
    
        //example.com/resource.js
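
This doesn't answer the IE question, but for reference, this is how a scheme-relative reference is meant to resolve: it inherits the scheme of the page that embeds it. A quick illustration with Python's standard URL resolver (the page URLs are placeholders):

```python
from urllib.parse import urljoin

ref = "//example.com/resource.js"

# The reference picks up whatever scheme the embedding page was loaded with.
on_http = urljoin("http://shop.example/cart", ref)
on_https = urljoin("https://shop.example/cart", ref)

print(on_http)   # http://example.com/resource.js
print(on_https)  # https://example.com/resource.js
```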

------
vivekweb2013
Wow, interesting analysis. Although I don't think http is that bad when it
comes to static web sites that don't take any input from the user.

------
ecesena
DuckDuckGo has a library for this: [https://github.com/duckduckgo/smarter-
encryption](https://github.com/duckduckgo/smarter-encryption)

At the core it does what the op is proposing, but there's a bit of an extra
complexity to deal with edge cases and regressions.

------
philshem
sometimes you may _need_ http and for that there is
[http://neverssl.com](http://neverssl.com)

[https://news.ycombinator.com/from?site=neverssl.com](https://news.ycombinator.com/from?site=neverssl.com)

~~~
sillysaurusx
Huh. Thanks for this! purple.com used to be my go-to, but it stopped working.

~~~
philshem
Now it’s [http://isoldpurple.com](http://isoldpurple.com)

~~~
sillysaurusx
Hahaha. This made my weekend. Thank you.

Was purple.com really that popular of a workaround? I have no idea how I
started using it, so now I’m curious how you happened to know about this
followup site.

~~~
philshem
I used to use purple.com to test my internet connection, since it wasn't
cached by the browser back then. The site itself is from 1994. It was sold to
a mattress company in 2017(?).

first snapshot, 1998:
[https://web.archive.org/web/19981212032124/http://www.purple...](https://web.archive.org/web/19981212032124/http://www.purple.com/)

later snapshot, 2016:
[https://web.archive.org/web/20160605121833/http://www.purple...](https://web.archive.org/web/20160605121833/http://www.purple.com/)

faq & details of advertising:
[https://web.archive.org/web/20170608195418/http://www.purple...](https://web.archive.org/web/20170608195418/http://www.purple.com/faq.html)

And before the sale of purple.com, this was the advertising policy:
[https://web.archive.org/web/20170702020833/http://www.purple...](https://web.archive.org/web/20170702020833/http://www.purple.com/availability.html)

~~~
sillysaurusx
How did you find out about isoldpurple.com? Did the site mention the new url?

Those FAQ pages are beautiful, by the way. Thank you for digging them up. I
love the “to make my life simple” answers.

~~~
philshem
Alas, I don’t remember!

------
philjackson
In many places (github pages, s3, firebase hosting etc.) HTTPS is as simple as
checking a box. It didn't use to be that way; it used to be an enormous and
expensive hassle. I wonder how many people just don't realise how things have
changed.

~~~
g_airborne
Is there a way to do this easily on S3 if you’re hosting a static website with
custom domain? Last time I checked you still needed to put a CloudFront
instance in front of it.

~~~
philjackson
Ah, no, you're right. CloudFront is required.

