I've dealt with some spammers to various degrees. I think one of the most effective ways of dealing with spammers is to "shadowban" them: allow them to use your service, but don't indicate to them that you've identified them as malicious. For instance, when dealing with chat spammers, allow them to chat, but don't show their chats to other users. Another level would be to allow them to chat, but only show their chat to other shadowbanned users. For the author's use case, perhaps something like: if the ip address that created the shortened link accesses it, they get the real redirect, and if a different ip address accesses it, they get the scam warning page. If the malicious actor doesn't know they've been marked as malicious, they don't know they need to change their behavior.
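A minimal sketch of that creator-only redirect idea. All the names here (`resolve`, the link record fields, the warning path) are illustrative, not the author's actual implementation:

```python
# Hypothetical shadowban-aware link resolution: the creator of a flagged
# link still sees the real redirect, everyone else gets the warning page.

WARNING_PAGE = "/warning"  # assumed path of the scam warning page

def resolve(link: dict, requester_ip: str) -> str:
    """Return the URL the requester should be sent to.

    `link` is assumed to carry the destination URL, the creator's IP
    (recorded when the short link was made), and a `shadowbanned` flag
    set by moderation.
    """
    if not link["shadowbanned"]:
        return link["destination"]
    # The creator still gets the real redirect, so from their side
    # nothing looks wrong and they have no reason to adapt.
    if requester_ip == link["creator_ip"]:
        return link["destination"]
    # Everyone else is diverted to the warning page.
    return WARNING_PAGE
```

In practice you'd likely key on more than the IP (account, cookie, device fingerprint), since the creator may well check the link from another address.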
The second most effective thing is making the malicious actor use some sort of resource. Such as a payment (the author uses), or a time commitment (eg new accounts can only create 1 link a day), or some other source of friction. The idea is that for legitimate users the friction is acceptably low, but for consistent spammers the cost becomes too high.
The 3rd thing I've found effective is that lots of spam comes from robots - or perhaps robots farming tasks to humans. If you can determine how the traffic is coming in and then filter that traffic effectively without indicating failure, robots can happily spam away and you can happily filter away.
IPv6 doesn’t solve this really. You’ll still ban at least /64 and you’ll switch to /48 for the particularly nasty ones. There’s zero reason to ban a specific ipv6 address.
> You’ll still ban at least /64 and you’ll switch to /48 for the particularly nasty ones.
The entire /64 will nearly always be a single ISP customer, not thousands of customers behind one address as it can be for IPv4. And you can start by banning the /64 and then widen the mask, say, 4 bits at a time if abusive traffic continues from an adjacent range. It's not that hard to automate this. Then the /48 gets blocked only if you see abusive traffic from multiple ranges within it, implying that the whole range is controlled by the attacker, or that ISP does nothing about abusive customers, which is nearly the same thing.
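Automating that "widen the mask a few bits at a time" policy is indeed straightforward. A sketch using Python's stdlib `ipaddress` module, with illustrative defaults (4-bit steps, stopping at /48):

```python
import ipaddress

def widen_ban(banned_net: str, offender_ip: str, step: int = 4, floor: int = 48):
    """Given an existing banned prefix and a new offender address, return
    the network to ban: the existing one if it already covers the offender,
    a widened prefix (in `step`-bit increments, never wider than `floor`)
    if the offender sits in an adjacent range, or a fresh /64 otherwise.
    Illustrative sketch, not a complete ban manager."""
    net = ipaddress.ip_network(banned_net)
    addr = ipaddress.ip_address(offender_ip)
    if addr in net:
        return net  # already covered by the existing ban
    new_len = net.prefixlen
    while new_len > floor:
        new_len = max(floor, new_len - step)
        # Recompute the network at the shorter prefix; strict=False since
        # the old network address may have host bits set under the new mask.
        wider = ipaddress.ip_network((net.network_address, new_len), strict=False)
        if addr in wider:
            return wider
    # Unrelated range: start a new /64 ban for the offender instead.
    return ipaddress.ip_network((offender_ip, 64), strict=False)
```

So an offender one /64 over widens the ban to a /60; repeat offenses from further-apart ranges push it toward the /48.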
That's actually a very interesting idea I hadn't seen before. It certainly makes it less obvious that one has been shadowbanned, and would probably help keep the (non-bot) spammers happy. I wonder if it'd be worth the investment to implement.
Shadowbanning is extremely hostile to users that have been mis-identified as spammers (which will happen) while spammers will quickly and easily figure out a way to determine if they've been shadowbanned. That approach needs to stop.
I've employed shadow banning on an online service to deal with some deranged ban-evading individuals. It does help a lot. Granted, some of the more savvy users may figure out what you're doing, but you're often not dealing with the brightest minds. Given that your typical online service will maybe employ one moderator per 100k people, any reduction in workload is welcome.
> Shadowbanning is extremely hostile to users that have been mis-identified as spammers (which will happen)
It should always be a manual action and moderators should continue to see messages of shadowbanned users. You can always lift it in case of a mistake.
If you're going to have a free tier on your service and your service has any sort of interaction going on between users that could be degraded by spammers and the mentally insane, you're going to need shadowbanning. It's either shadowbanning or upping the hurdle to creating an account considerably.
I don't understand why shadowbanning would be so effective. It's trivial for any competent spammer to check their submissions from different ip addresses, they will very quickly discover if they are shadowbanned.
The risk of misidentifying legit users and shadowbanning them outweighs the potential gain.
I might be wrong on how these spam bots operate, but I assume someone (human) has to write at least a few lines of scripts tailored to the form on the website, to actually submit the spam. Adding a few more lines to also check that the submission went through doesn't seem like much effort.
> I don't understand why shadowbanning would be so effective
Because if done correctly the user never knows they are shadow-banned. It sounds trivial when you know _how_ the shadowban is done. But for instance, instead of an IP check, perhaps it's a time check - after 3 days it comes into play. Or a combination of different checks. So imagine that you are accessing a service that appears to be working correctly .... you would basically need to a) determine that that service even does shadowbanning, and b) think of infinite ways that you might be shadowbanned and try to determine if that's the case.
A legit user who is told they are banned can contact the site to try to resolve why they've been misidentified; a shadowban may never get resolved.
If you had the time and inclination you could even seed their account with mock stats. I.e. when the shortened link is accessed, correctly log all of the metrics to their account so they have solid metrics indicating it's working, but fail the actual consumer requests.
Logging their metrics correctly is going to take resources. Instead, just set a flag on their account which, if true, means they just see some randomised junk stats.
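A sketch of that flag-plus-junk-stats approach. Seeding the RNG from the account id is my own addition, so repeat visits show the same "stats" and the fakery is harder to spot; all names are illustrative:

```python
import random

def stats_for(account: dict, real_stats: dict) -> dict:
    """Return real stats for normal accounts, or plausible-looking junk
    for flagged ones. Seeding from the account id makes the junk stable
    across page loads, which matters: wildly changing numbers would give
    the game away."""
    if not account.get("shadowbanned"):
        return real_stats
    rng = random.Random(account["id"])  # deterministic per account
    return {
        "clicks": rng.randint(50, 500),
        "unique_visitors": rng.randint(20, 200),
    }
```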
A big problem that came up at the domain level was what I'd call a _trustworthy domain with untrustworthy subdomains_, specifically where those subdomains represent user-generated content.
The Public Suffix List (PSL) [1] to the rescue! It can help with this kind of disambiguation.
Paraphrasing, it's a list of domains where subdomains should be treated as separate sites (e.g. for cookie purposes). So `blogger.com` on the list means `*.blogger.com` are separate "sites".
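A toy version of the lookup, using a tiny hardcoded sample instead of the full list (real code would load the actual PSL file from publicsuffix.org):

```python
# Hypothetical PSL-style matching with a tiny sample of the list.
# "co.uk" and "com" are ordinary public suffixes; "blogger.com" is an
# example of a private entry where each subdomain is its own "site".
SAMPLE_PSL = {"com", "co.uk", "blogger.com"}

def registrable_part(hostname: str) -> str:
    """Return the registrable domain: one label below the longest public
    suffix that matches. This is the boundary at which subdomains should
    be treated as separate sites."""
    labels = hostname.lower().split(".")
    # Scan from most-specific suffix to least, so the longest match wins.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SAMPLE_PSL:
            if i == 0:
                return hostname.lower()  # hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return hostname.lower()
```

So `foo.blogger.com` is its own "site" (because `blogger.com` is on the list), while `news.bbc.co.uk` rolls up to `bbc.co.uk`. For reputation scoring, that means you can penalise one user's subdomain without tainting the whole platform.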
What's the benefit of a link shortener, these days?
It made sense back before Twitter had one of their own. And I know that some people use it to get link analytics. I've also occasionally seen it used for printed materials, to get pretty URLs that are easy to hand-type.
People also use it for malicious purposes, such as hiding malware, or disguising referral links, or otherwise trying to obfuscate where a link is going. (Note: I'm not calling referral links malicious, I'm calling disguised referral links malicious.)
Other than printed materials (which need pretty URLs and thus often need a dedicated first-party URL shortener) and analytics, what are people using third-party URL shorteners for today?
I have written my own URL shortener. I do it partly to get URLs that are nice to type in printed materials.
I also use it to hedge my risks from using SaaS. For my org, we host some things that we offer to the public on different services. Sometimes a vendor doesn't work out. We use our shortened URLs in public communications, and I can redirect them to our new service if we need to switch. It was a way to address my discomfort with URLs that break too easily when you host on 3rd party services.
URL shorteners never made sense. Twitter's character cap was a dumb artificial limit. In 99.9% of cases, they're only used for tracking or obfuscation purposes. And URL shorteners die every day, leaving ArchiveTeam to clean up the mess again.
As someone who runs a small discussion forum, they're a great way for people to spam CSAM, malware, and other stuff I don't want, in a way that gets past filters.
I think a conservative estimate is that 99% of link shortener usage comes from bad actors, and if they all died out my life would be a lot easier. But every week it seems some new one pops up, and there's a new wave of spam to deal with.
At least thanks to this post I can add a new one to the filters before a wave of spam, so yay?
Sometimes reddit (and likely others) will try to parse a URL's valid characters as formatting and deadlink them (e.g. some Wikipedia links with special characters).
They are useful for links that need to outlive the infrastructure they are hosted on. Think of them as a layer of abstraction, e.g. links in a paper published in a journal like Nature. The paper might be valid for 10 years, but the links embedded in it will rot quickly as organisations change CMSes, domain names change, and organisations merge and disappear.
Also useful where the cost of changing the URL is high: bus shelter adverts, etc.
I think this is important but also hits the trust problem: open shorteners are basically training users to be phished but a controlled namespace doesn’t have that problem. Ideally you can use a domain you control for everything to get full control of your reputation while still retaining the flexibility to redirect links as needed.
As a user, I’m much more likely to click on the second link. Too many link shorteners come with ads and other annoyances that I’d rather not touch them. redirect-checker.org if I must
I use a link shortener for mailto links that include a precomposed to/subject/body. It's handy to have the customer email you so you can reply, since your reply won't ever be marked as spam. If you gathered info via a webform and then emailed the customer, it would be somewhat more likely to go to spam.
Which is most unfortunate... QR/camera apps usually just show the domain anyway, and QR codes can easily fit large URLs. I imagine shorteners are used just so that they can choose a lower QR version and include a pretty logo in the middle.
I use them for easy memorization of tools and deployment stuff I use in my day-to-day IT work. It's also nice to be able to track if someone did what they were supposed to do.
This is a valid use case, and my company does this, but I would never outsource it when a link shortener isn't difficult to build exactly to the spec you want/need.
Yep. I built my own for a similar reason. It went from "we need a URL shortener" on a Wednesday to "we have a robust URL shortener in production" the next Monday.
No disrespect to the folks at y_gy who are clearly doing their best. But link shorteners, even when used by good faith actors, are problematic because they hide the destination of the link, and of course that's an invitation for bad faith actors to exploit, so the battle will be endless. Shorteners got popular on Twitter back in the days when all the characters in the URL counted against a very short limit. But there's less need to use them these days, and I am very reluctant to click on shortened links and don't think that this is unusual.
> But link shorteners, even when used by good faith actors, are problematic because they hide the destination of the link
In a sense, Google Search is even more evil because they change the destination link on-click. So hovering on a search result link doesn't show you the true destination.
Semi related: when I worked at Visa, I developed some ideas around making QR codes slightly more resilient to malicious hijacking when used in the context of a payments or commerce use case. The idea was for the scanning app to look not just for a QR code but also for adjacent payment acceptance marks (e.g. branded Visa, MC, PayPal, or a merchant's brandmark, etc.) and then dynamically resolve URLs only to registered domains associated with those marks. QR codes are not human readable, and URLs are a lot to ask the average person to reliably parse. So instead, have the scanner also see and understand the same contextual cues that the human can see and understand. And for the human, give them the confidence to scan QRs that will take them to a domain they would expect, and not to a Rick Astley video or worse.
I was recently discussing this subject and I have to wonder if some combination of human readable symbols that is also optimized for machine scanning will emerge.
Right now any phone should be able to parse a URL if it can read the text, so what is the point of QR codes besides their ubiquity?
QR codes provide built-in error correction, so they will stand up to serious wear-and-tear, partially obscured images, etc. - and they won't confuse O with 0 or i with l.
You are raising all the right points. QR code standards come from an era when cheap digital camera sensors were MUCH worse than they are now, and similarly when OCR/image-recognition resources were much less cheaply available or built in to mobile devices.
I can really relate to this article! I created T.LY URL Shortener in 2018, and I've encountered all these issues and more! I found out the hard way when my hosting company shut down my servers for malicious content about a week into launching the site. Malicious actors will go to all sorts of lengths to achieve their goals.
Be careful relying on Stripe to prevent these users. Next they will start using stolen credit cards to create accounts then you will face disputes. If you get too many, Stripe will prevent you from processing payments.
About a year ago, I launched a service called Link Shield. It's an API that returns risk scores (0-100) on URLs. It uses AI and other services to score whether a URL is malicious. Check it out and let me know if you would be interested in trying it: linkshieldapi.com/
This. And related: I don't want to have to try your system in order to get pricing. I've seen that a couple times, particularly for things that are in beta, where you don't even see pricing until the end of the trial period.
Integrating a new system requires some effort. And there are some systems, like the one in question here, where there's a real cap on how much value they could possibly provide for me, even if they're perfect.
If I can't see whether the pricing falls in that range before I need to sign up, I'm just not going to seriously consider it for most services.
This is really one of the worst patterns in the SAAS market.
I don't want to provide my data to multiple services just to be able to compare their prices and find out which one I'm actually gonna use. At first this will lead to countless automated mails from all those "founders" asking why I haven't started paying yet, and if I'm unlucky my credentials end up on haveibeenpwned.com…
The privacy policy does not address data retention; maybe there is none. I just assume that some data would be collected by an API service.
I would not use something like this to send my customer data to a third party just to check a link, but if it were something that could be self-hosted on my VPS, with a script to attach a WordPress chat system to check against it - maybe.
But with no pricing showing, I am assuming I can't afford it anyhow.
What worries me the most about things like these is that it makes it seem like it's impossible to make "free for all" products like these anymore if you're not an established player already. You will get blacklisted and you will receive emails from your host telling you to shut it down...
Established players like bitly and tinyurl didn't have all the resources to deal with the problem when they started out either, and they arguably still don't, yet they get favored by the antivirus vendors and "safe"search blacklists, since they're well-known services. It doesn't seem fair.
Is this really the way it should be? I wonder if they could've explained the situation to the antivirus vendors: The site itself doesn't host malware and doesn't allow the discovery of said malware through its service. It requires a user to receive an exact URL, just like they could've received any other link, and the blocklists should operate on what's hidden behind it instead of the redirect in front. Maybe y.gy could've been hooked into the safesearch API to automatically nuke any URLs blacklisted already by them, or another antivirus vendor.
While "honeypot" was mentioned in the title, there seems to be no useful outcome from caught bad actors, like reporting malicious websites, so browsers could block them.
Passing malicious URL filters is crucial to operations like ransomware, phishing, etc - hiding a bad domain behind a good one is extremely valuable to hackers and relatively cheap. Though I am surprised they'd pay for it due to the payment -> identity link (maybe it's stolen CCs but Stripe is pretty good about blocking that).
> Though I am surprised they'd pay for it due to the payment -> identity...
Between gift cards, money mules, shell corporations, and "that country doesn't cooperate with investigations"...I'd guess that this is no more than a minor problem for serious criminals.
For the malicious links, did you have a chance to track whether the malware actors verify that their links do not work, e.g. by setting a cookie when they make a link and checking it later ?
I wonder if making these malicious links silently work only for the people that submitted them (and to say “no such link” for everyone else) ought to create a degree of confusion and slow them down to some extent at least…
Kudos for the great site and ethos. You still seem pretty buoyed by the experience. Ultimately, I found the article pretty depressing. Your initial free offering with a good UX and relatively little ongoing maintenance was destroyed by an army of criminals. It ended up wiping out weeks of your time developing increasingly complex cat and mouse techniques. Ultimately resulting in you abandoning most of the free plan altogether.
Kinda sad that this is what the online world has become.
And we just put up with it.
Imagine if walking down the road each day was like this – people lining up ready to swindle you or manhandle you in order to steal your things. There would be outrage. But online we have just sort of reached a weird state of acceptance I guess.
I generally always run any shortened link through a link checker before opening. So they are an inconvenience to me.
The time it took you to write all this evidences the problem with hosting the service publicly.
Yesterday I ran into problem with sharing a link to a simplex.chat group which was so long my website builder translated it incorrectly. I looked at link shorteners publicly available and now understand from your writeup why they are somewhat limited now. I found it easier to just spin up my own link shortener on my webserver using Shuri. It took less than a minute for me install. I won't publicize its availability now that I have read this.
My first thought after reading the title: I would never create a link shortener service, too complex and too much responsibility -- can it handle the traffic? what analytics can I provide? should it be a paid service (or rather, can it survive without being a paid service)? how to fight off scammers? what if some day the site goes down or permanently stops running, does that mean all those links are now useless?
My thoughts after reading the article: I was so right.
Getting a chargeback in Stripe is costly. As soon as a dispute is started there's a fixed $25 that won't be refunded even if you win the dispute.
So for a service at $4 a month which is likely to get a lot of fraudulent payments I wonder if it's really viable.
One thing he should do is immediately cancel accounts and refund subscriptions when there's an early fraud warning. They are usually accurate and help avoiding those fees.
This was my thought as well. I wonder if the author couldn’t achieve similar friction without charging. Require a card for signup but only authorize it in the case of “free tier” users. PayPal will let me do that for free, not sure about Stripe.
Is there a name for this phenomenon? It's sort of like the dark forest, but not exactly. As soon as a free service becomes discovered, it is immediately swamped by scammers and spammers.
Many many years ago I ran a small forum for a small webcomic, and one day it was just full of low effort scams and spam. For an audience of, I dunno, a dozen people? I just shut the whole thing down because it wasn't worth our time to do anything about it.
We just can't have nice things, and if you run across something that is actually nice, make sure to thank whoever runs it for all their behind the scenes effort to deal with the scumbags that clog everything, and I mean everything, up with s(p|c)am.
I had a similar thing happen with a MediaWiki site that I run. There was some "shrinkwrap" software behind the abuse, though, and a trivial captcha on account creation was sufficient to turn the abuse from a flood into a manageable trickle (I haven't had to deal with spam since December, and when I do get it, it's typically no more than once a month).
A couple years ago a client asked me for their own URL shortener service. I found YOURLS (https://github.com/YOURLS/YOURLS) and reluctantly installed it on a cheap, shared, hosting account.
Thankfully after a couple years, I convinced them (it took several tries) to use a 3rd party hosted provider.
Amusingly, I thought this website was broken in a myriad of weird ways - I kept getting incomplete response errors and bad SSL errors.
As it turns out, my ISP was simply doing a rubbish job at blocking the site. After a few 10s of tries it eventually managed to redirect me to their warning page and prompted me to turn off settings in my account config. Thanks Virgin Media.
This is a great writeup. If you are just looking to deter scammers, I bet $1 would have the same effect. I don't think scammers are worried about the price as much as having to give any amount of information to you. I could be wrong though, as I am not a scammer!
I made a link shortener in 2010 and it was such a terrible experience. Constant notices from my hosting company about child porn links, repeated ominous emails from the FBI and their counterparts in other countries, having my server temporarily shut down repeatedly. I abandoned it after 6 months because the amount of time it took to continually adapt countermeasures to all the scummy abusers was too overwhelming. In so doing, I'm sure I contributed to all the link rot out there.
Tangentially, it's kinda funny how people really don't realize how much websites/companies/social system implement user-unfriendly behavior because of scammers or other bad actors. (Admittedly, it's something that I also did not understand when I was younger and more naive. Hell, I had to explain this to my 70y-old parent just a few weeks ago!)
The price of success is you then need to deal with moderation in some form. (and on that note: "it is easier to automate bad behavior than it is to police it")
Right now, "enshittification" is (rightly) on many people's minds, but before that the reason any company makes a process difficult is because some assholes ruined it for the rest of us.