Hacker News
Show HN: I wrote an HN bot to suggest HTTPS URLs when people post HTTP URLs
17 points by fishywang 73 days ago | 10 comments
It's inspired by this comment I made: https://news.ycombinator.com/item?id=26599746.

I actually saw several comments with HTTP URLs posted, and that was the only one I bothered to comment on. So I thought this was something better suited for bots than humans.

I hacked this together over yesterday and today: https://github.com/fishy/https-bot.

Basically it uses the Firebase API (https://github.com/HackerNews/API) to find comments with HTTP URLs in them, tries the HTTPS version of each URL, compares the contents of the two, and posts back a comment if the contents are more than 95% similar.
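
The "find HTTP URLs, try the HTTPS version" step can be sketched roughly as below. This is a minimal illustration, not the bot's actual code: the regular expression and the `httpsCandidates` name are assumptions made for the example, and it assumes the comment text has already been HTML-unescaped (the API documents the `text` field as HTML). The loose pattern will also swallow trailing punctuation, which a real bot would need to trim.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// httpURLPattern is a deliberately loose matcher for plain-HTTP links in
// comment text. It is an assumption for this sketch, not the bot's pattern.
var httpURLPattern = regexp.MustCompile(`http://[^\s<>"]+`)

// httpsCandidates returns, for every http:// URL found in text, the same URL
// with its scheme switched to https. Links that are already https are ignored.
func httpsCandidates(text string) []string {
	var out []string
	for _, raw := range httpURLPattern.FindAllString(text, -1) {
		u, err := url.Parse(raw)
		if err != nil || u.Host == "" {
			continue // skip anything that does not parse as a URL with a host
		}
		u.Scheme = "https"
		out = append(out, u.String())
	}
	return out
}

func main() {
	text := "see http://example.com/a?b=1 and https://example.org/"
	for _, candidate := range httpsCandidates(text) {
		fmt.Println(candidate)
	}
}
```

Each candidate would then be fetched alongside the original HTTP URL so the two bodies can be compared.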

The "95% similar" part was actually the first part of the code I wrote. At first I tried a few existing Go packages implementing diff/LCS, but most of them were quite slow and did a lot of allocations when comparing two randomly generated 10KiB blobs, so I wrote my own (https://pkg.go.dev/github.com/fishy/https-bot/similarity). It is optimized for space (it does almost no allocations), and it's also faster because allocations are slow. (I know this is an unfair comparison: most of the existing implementations need to give you an output that can be used to reconstruct the two blobs, so at least some of their allocations are required and unavoidable.)
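
To illustrate why a similarity score can get away with far fewer allocations than a full diff, here is a minimal sketch that computes only the length of the longest common subsequence using two rolling rows. It is not the code in the linked package: the `lcsLen` and `similarity` names and the `2*LCS/(len(a)+len(b))` ratio are assumptions chosen for the example.

```go
package main

import "fmt"

// lcsLen returns the length of the longest common subsequence of a and b.
// Only two rows of the dynamic-programming table are kept, so the function
// makes two slice allocations regardless of how long a is.
func lcsLen(a, b []byte) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			switch {
			case a[i-1] == b[j-1]:
				curr[j] = prev[j-1] + 1
			case prev[j] >= curr[j-1]:
				curr[j] = prev[j]
			default:
				curr[j] = curr[j-1]
			}
		}
		// The finished row becomes "prev"; the old row is reused as scratch.
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// similarity maps the LCS length to a score in [0, 1]. This particular ratio
// is an assumption for the sketch, not necessarily the bot's definition.
func similarity(a, b []byte) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	return 2 * float64(lcsLen(a, b)) / float64(len(a)+len(b))
}

func main() {
	httpBody := []byte("<html><body>hello world</body></html>")
	httpsBody := []byte("<html><body>hello, world</body></html>")
	fmt.Printf("similar enough: %v\n", similarity(httpBody, httpsBody) > 0.95)
}
```

The trade-off is the one described above: the table rows are enough to get a length, but not to reconstruct which bytes differ.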

I also wrote a bug where it would find the same HTTP URL in every run and post the same comment over and over again. My apologies to dang or whoever dealt with it (or maybe the system is good enough that it blocked those repetitive comments automatically).
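
One common guard against this kind of repeat is to remember which item IDs have already been handled. The sketch below is hypothetical (the `seenSet` type is invented for the example and is not taken from the bot), and it only keeps state in memory; since the bug showed up across runs, a real fix would have to persist the set somewhere.

```go
package main

import "fmt"

// seenSet remembers item IDs that have already been handled.
// It is a hypothetical helper for this sketch.
type seenSet map[int64]struct{}

// firstTime reports whether id has not been seen before, and records it.
func (s seenSet) firstTime(id int64) bool {
	if _, ok := s[id]; ok {
		return false
	}
	s[id] = struct{}{}
	return true
}

func main() {
	seen := seenSet{}
	for _, id := range []int64{26599746, 26692588, 26599746} {
		if seen.firstTime(id) {
			fmt.Println("would reply to", id)
		} else {
			fmt.Println("skipping", id)
		}
	}
}
```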

In the end it successfully made 6 comments across ~4 hours (not including the repetitive ones). All of those comments are flagged (likely due to HN policy); https://news.ycombinator.com/item?id=26692588 is the only one that's still visible to other users at the time of writing, if you are curious.

I just killed it completely at dang's request. Although it only lived for a few hours, it was still a fun exercise. Maybe I'll convert it into a Reddit bot next? Who knows.




How does this tell when http is appropriate? For many posts, I prefer http over https [0] so defaulting to https is not good behavior.

I don’t like this “https all the things” without considering whether it’s necessary or not.

[0] For example, this great post about Feynman/Dirac was HTTPS. I wish it were HTTP to save resources, since I have no need for encryption... https://www.cantorsparadise.com/when-feynman-met-dirac-fe9cc...


You may have your own reasons for not preferring HTTPS, but having it as the default is important for the security of others, even when the page doesn't contain sensitive material (a point which is debatable as well). https://doesmysiteneedhttps.com is a good short summary of the reasons.


I think I understand the arguments for and I disagree.

There may be benefits for some users, but those users can mitigate the risks easily (e.g., with a VPN) without impacting all users.

I don’t like it being presented as a solved no-brainer when there are considerations to weigh.

The site you linked to has straw man arguments for many of the items like whether someone can impersonate a site. While possible, it’s unlikely and not reasonable for many sites to worry about.

Take my example of the Feynman article. Perhaps someone MITMs the site and changes the content. That could happen, but is it likely? Do I care?

The privacy impact is moot because there are large orgs (Google, facebook, etc) monitoring all traffic so I’m not sure why I care that ISPs and wifi hotspots can monitor traffic as well.

I wish sites would present serious arguments rather than assuming simple, weak versions.


> I don’t like it being proposed as a solved, no brainer when there should be considerations.

> I wish sites would present serious arguments rather than assuming simple, weak versions.

It seems to me that this matter has been seriously and carefully considered in recent years, and you seem to be the one going against today's consensus. The burden of providing evidence that HTTPS is not always the right thing on the Web is therefore probably on you¹.

I care about both ecology (if this is your point?) / efficiency and privacy, and I still prefer encryption by default. It seems to me that HTTPS is not a significant overhead.

But providing privacy to people who really need it (and not for the wrong reasons) without them looking suspicious is badly needed for a well-functioning society, and this includes good decisions related to ecology.

I would suggest you build an HTTP-everywhere extension, but many websites will refuse to serve your requests over HTTP so…

1: I actually can see one issue with HTTPS everywhere, which is that it makes websites unreachable from old devices. But there, outdated browsers are probably an obstacle too, and I guess it would be possible to set up some kind of HTTP proxy for them (?)


As an alternative to the similarity check, you could perhaps use the https-everywhere rulesets:

https://github.com/EFForg/https-everywhere/blob/master/src/c...


What could happen if these 10KiB blobs landed on maliciously crafted data? What could happen if the difference between two blobs was a maliciously chosen value?


My code only uses the 10KiB blobs for comparison, so as long as I don't have any buffer overflow bugs the actual content doesn't matter. Even if I had a buffer overflow bug, I'm running the code in Docker with a distroless-based image [0], so that helps a little as well. I guess I could also change Docker's runtime from runc to runsc [1] to mitigate further, but I don't really see that as necessary, as it's quite hard to have buffer overflow bugs in Go code.

[0]: https://github.com/GoogleContainerTools/distroless

[1]: https://gvisor.dev/docs/user_guide/quick_start/docker/


Thanks for publicly sharing your flagrant violation of the site's rules.

Did you expect people to appreciate this?


https://news.ycombinator.com/showhn.html

> Be respectful. Anyone sharing work is making a contribution, however modest.

> Ask questions out of curiosity.

> Instead of "you're doing it wrong", suggest alternatives. When someone is learning, help them learn more.

> don't be gratuitously negative.

https://news.ycombinator.com/newsguidelines.html

> Be kind. Don't be snarky. Have curious conversation

> Comments should get more thoughtful and substantive, not less

> Assume good faith

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.


You're right - I didn't have anything constructive to add and should have kept this thought to myself. Apologies to OP.



