Show HN: Replace CAPTCHA with Proof-of-Work (hashcash.io)
89 points by hippich on June 25, 2014 | 92 comments

How does this stop spammers? Are they supposed to give up because it uses up too much time (and CPU and therefore electricity)?

- Botnet operations are hacked PCs. They aren't paying for CPU time.

- Low-rent spammers often use hacked or rented servers. They aren't paying for CPU time.

- Even I don't pay for CPU load on my VPS.

This won't stop spam. Simple as that. You might get money from this but you're also going to keep getting spam. Oh and you'll chronically annoy everybody on a slow computer who doesn't want to wait.

Compare it to this: $0.70/1000 for CAPTCHAs. That's really expensive because it's real money. It also takes time. It also doesn't guarantee posting (many sites have secondary spam-detection features), which might mean 90% are insta-deleted, 5% are held for moderation and only 5% make it through. At that rate, $0.014 per successful post is actually pretty expensive. Much more than they're spending to get past you.

> They aren't paying for CPU time.

There's still an opportunity cost. Would they prefer to hit a service which limits their machines to a few thousand attempts per day, or a service which does not and lets them put through several million attempts per day?

> Even I don't pay for CPU load on my VPS.

Perhaps not, but it sets a hard limit on how much you can do in a given time period.

> That's really expensive because it's real money.

If they're stealing computer time via a botnet, they're probably not going to be paying Amazon Turk with their own credit card.

Good point. In a closed community, you could ask users to wait 60 seconds to register/login/etc, or they could pay 10 bucks a year for instant access all the time (plus perks since most people wouldn't care about the wait once in a while).

This could be used by those downloader sites that make users wait.

It changes the economics. If your botnet used to be able to post a million spams per second, and now it can only do a thousand, then now you're making 99.9% less money from renting out your botnet compared to last week. The fact that your botnet cost $0 of monetary transactions to acquire and retain doesn't factor into it.

Botnets may not have a marginal cost, but they still have a limited supply and a demand for their use, which determine the cost to consumers. If they are less effective per unit of time at spamming, either the cost of spamming will increase, or botnet operators will shift into other, more profitable verticals.

But if the alternative is solving CAPTCHAs by paying real money, then the question is whether the hackers can spend less money to run the hashcash instances on EC2 - they don't have to actually run the computations on the botnet computers.

The alternative is that botnets can inject HTML in a man-in-the-middle attack and get real users to solve useless CAPTCHAs anyway. For example, when a user goes to yahoo.com, a temporary page may show up that says "Due to increased spam, we need you to enter a CAPTCHA to prove you are not a bot."

So the user enters the CAPTCHA, then some other botted computer enters it in the real webform.

Basically these types of measures are stupid, because any virused computer can be made to do them anyway. This is why CAPTCHAs are only 70 cents / 1000. It isn't because people in India are lining up to enter CAPTCHAs at an Indian minimum wage.

If you are powerful enough to MITM arbitrary sites on a user's machine, you can do /much/ better than get users to solve captchas.

At the very least, you can replace all ads on the Web with ads you make money from. And/or you could phish users and steal e-mail/bank passwords. And/or you could replace binaries the user tries to download with malicious ones.

And so on. I suspect captchas are pretty far down on the list of things you'd do if you had this capability. :)

If you do that, then you're only going to get the user to solve a few CAPTCHAs per pwned computer per day (maybe 20 or so if you make it really mean). If that's all you want, this hashcash implementation won't be an obstacle either, but that's a far cry from a million spams a second.

On top of all this, the fact that hashcash requires the browser to "work hard" means that any website using it will heat up my laptop and drain its battery unnecessarily.

Why is that a good thing?

If your laptop solves this in 10 seconds and you need to post 10 messages a day, it is under 2 minutes of working hard for your laptop. I doubt it will make a dent in your battery life.

My phone took 2 minutes to do the example one.

> and you need to post 10 messages a day

Showing a captcha each time, for a repetitive task would be a horrible UX decision.

I guess this would depend on how bad the spam situation is on a specific site. Just as you can show a CAPTCHA only once, you could show the hashcash widget only once...

It'll also force people to upgrade their browsers.

Would you rather type a captcha?

Yes, I would actually. I can do a CAPTCHA in about 5 seconds, whereas generating the proof-of-work took around 10 seconds on my computer.


"That's really expensive because it's real money"

Letting my mind go wild and wondering whether it would make sense to let visitors actually pay for the login. This could be DOGE-Coins for example. One could set up a DOGE-faucet and give back the WOWness.

Is that idea totally dumb or does it make any sense? Would love to read your opinion.

That would be pretty swaggy, but as the parent comment said, if it's cheap enough for the spammers, it would be way more annoying for your average user than effective against spam.

If they rent access to cheap vps boxes then they are effectively buying posts at whatever rate they pay for the vps / however many posts they can get out - so making posting more time consuming also means they get less out of their vps so the cost per post is much higher.

As the parent said, they don't rent. They break low security boxes / poorly secured boxes and use their cpu cycles for $0 + opportunity cost of the hack.

I really like the idea, but... I'm not sure it's possible to achieve the right kind of cost balance here. Some assumptions first: every device has to use the same work scaling factor (because you can always pretend you're running something slower), and any cloud server computes at least 10x faster than the low-end mobile phone where this captcha also has to work.

That means that even if you scale the work to take at most 10 seconds on the client machine, you'll be able to crack 1/s on a $5/mo DigitalOcean server. That's ~2.6M answers a month. That's $0.002 per 1000 captchas, instead of the $0.70 mentioned on the website, and as a bonus it removes the part where you rely on people who could mistype things.
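
The arithmetic in this comment can be checked with a short sketch. The $5/month price, the one-solution-per-second rate, and the $0.70 per 1000 figure come from the thread; the 30-day month and the function name are assumptions made only for illustration.

```python
# Rough cost per 1000 proof-of-work solutions on a rented server.
# Prices and rates are the ones quoted in the thread; a 30-day month is assumed.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def cost_per_thousand(monthly_price_usd, solutions_per_second, days=30):
    """Return the USD cost of 1000 solutions at a steady solve rate."""
    solutions_per_month = solutions_per_second * SECONDS_PER_DAY * days
    return monthly_price_usd / solutions_per_month * 1000


solutions = 1 * SECONDS_PER_DAY * 30   # one solution per second for 30 days
pow_cost = cost_per_thousand(5.00, 1)  # $5/month server
captcha_cost = 0.70                    # per 1000 human-solved CAPTCHAs

print(f"{solutions:,} solutions per month")
print(f"${pow_cost:.4f} per 1000 solutions")
print(f"human solving costs about {captcha_cost / pow_cost:.0f}x more")
```

Under these assumptions the server produces roughly 2.6 million solutions a month at a little under $0.002 per 1000, a few hundred times cheaper than paying humans.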

I really want it to work, but does the economic balance actually work here against the spammers or for them?

This is the key point I think. From my desktop (Intel Xeon CPU E5-1620) the example takes 12 seconds to run, on my phone (ARM Cortex-A7 MPCore) it's taking about 120 seconds to run.

If you don't want to annoy your visitors you're going to need to have the max runtime on any device at around 25 seconds.

Now you can't scale this per device, because you can't rule out devices spoofing what they are. So this means you're looking at barely a few seconds to break it on any sort of desktop hardware, and even less on any dedicated server.

Spammers are probably waiting longer than that just to load the page they want to spam, so it won't slow them down. For this to work you need to solve this issue, and I hope you can, but I don't see how.

If the workload is made to be very large, say 30 seconds on a recent desktop, it could fall back to a captcha for slower devices. At least people visiting with fast computers could be spared the captcha.

That's a very good idea, coupled with what other people have suggested (Increase the time on the fly for devices/IPs making repeated or spammy looking requests) I could see this working.

(haven't read the article yet, keep getting 504s)

Regarding feasibility & cost, what do you think of the old '"Proof-of-Work" Proves Not to Work' paper? [1] From what I remember, the time to hash adequately would be too long to be of practical use/benefit.

[1] http://www.hashcash.org/papers/proof-work.pdf

Further, since the computation is implemented in JavaScript, it should be possible for a spammer to take the code and make a much more efficient implementation in a lower-level language, such as C. Let's hand-wave and say that would raise performance by an order of magnitude - and that's probably very pessimistic, really.

At least it is not a fixed cost/complexity. If you see a large number of attacks/spam, you respond by increasing complexity (and the CPU cycles required to solve it).

And you remove CAPTCHA :)

Yes, you could change the scale. But the processing power ratio stays the same. At the price per 1k you listed, it's still cost-effective to crack this captcha at 1 per 10s automatically. But no one will accept waiting 100s to post anything.

If you start calculating the hash in the background, that could be useful. I might not want to wait 100 seconds after I've finished my reply, but I don't imagine I'd mind waiting 100 seconds from the time I first opened the thread or started typing.

As someone with experience in both implementing and bypassing spam protection, this idea makes no sense to me. Let's say it takes the user 30 seconds to calculate proof-of-work. 30 seconds per sign up (at effectively no cost) is quite good compared to current solutions spammers use to bypass CAPTCHAs - third-world sweatshops with people solving CAPTCHAs for <$1/hour. What's stopping a spammer from implementing your algorithm in more efficient C? Still assuming the algorithm takes ~30 seconds, all this does is limit the user/spammer to 172,800 sign-ups per CPU core per day, making it (much) less effective than rate-limiting sign-ups coming from the same IP address. Increasing the difficulty of the proof-of-work would just frustrate real users and make them leave your page for a competitor. I know cryptocurrencies are hip and all, but the problem space for proof-of-work algorithms is much smaller than some people here seem to think.

172,800 signups per cpu core per day seems... off. Since there are only 84,400 seconds per day, wouldn't that be 2,880 signups per core per day? Doesn't seem terribly profitable. Particularly when compared to, say, 20 signups per core per second with no proof of work (~1.6 million per core per day).

Even if you reduced the proof of work from 30 seconds to two seconds, that's negligible to the user when submitting a form, but limits a computer to performing a mere 42,200 signups per day, which is several orders of magnitude better than the 1.6 million figure above.

And nothing says you can't also implement rate limiting per IP, which is frankly not terribly useful when facing botnets.

Oops. You're right. I got my numbers wrong, but my point still stands. Unless your goal is denial of service (which this does nothing against), you don't need to make more than 2,880 requests a day.

Most generic spammers (wordpress/mediawiki pharma spammers) post 1-100 messages and then move on to the next site. If someone wanted to specifically target your site, they would use a botnet or write a GPU proof-of-work solver.

You're also right about IP rate limiting not being useful against botnets, but neither is this. For pretty much the same exact reason. Each machine in a botnet comes with a unique IP address, but also 2 or more CPU cores.

Just use a CAPTCHA. CAPTCHAs aren't the problem. The way we use them is. Don't show CAPTCHAs to every user. Ask some questions first. Is there something unusual about this user? Is there something unusual about the user's browser? Location? Is the user doing something unusual? Show them a CAPTCHA. Those "surprise" CAPTCHAs are less annoying to the real user and more frustrating to the spammer, since unpredictable behaviour makes testing their scripts harder. There's a reason why Facebook and Google do this.


Whelp, that's embarrassing. 86,400 is indeed the correct number of seconds per day.
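
For reference, the per-core figures discussed in this subthread can be recomputed with the corrected constant. This is only a sketch of the arithmetic; the 30-second and 2-second proof times and the 20 sign-ups per second baseline are the commenters' own hypotheticals, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def signups_per_core_per_day(seconds_per_proof):
    """Upper bound on sign-ups one core can make if each needs one proof."""
    return SECONDS_PER_DAY // seconds_per_proof


with_30s_proof = signups_per_core_per_day(30)
with_2s_proof = signups_per_core_per_day(2)
without_proof = 20 * SECONDS_PER_DAY  # 20 sign-ups per second, no proof of work

print(with_30s_proof, with_2s_proof, without_proof)
print(f"slowdown with a 2 s proof: {without_proof / with_2s_proof:.0f}x")
```

With 86,400 seconds in a day, a 30-second proof allows 2,880 sign-ups per core per day, a 2-second proof allows 43,200, and the unthrottled baseline is about 1.73 million, so the 2-second proof slows a single core down by a factor of 40.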

Effectively, this achieves the complete opposite of what CAPTCHA does. It's a test that only computers can solve.

A much more exciting development in the field of bot/spam detection is verifying the integrity of the user's browser by comparing the exposed functionality (HTML5 and JS) and quirks to the ones expected for its user-agent header. I believe this is the method that Dan Kaminsky's "White Ops" startup uses.

On a few of my sites running Drupal, which had A LOT of bogus registrations and comments, all spam was stopped. So far it works. It will be a different story once it gets big enough to warrant a specific implementation, but we are not there yet.

If I made cash money every time a bot solved a problem I'd be elated that he chose to attack me. It means I can spend my time or hire someone to moderate/delete the crap and get paid for it too.

This is the next step - implement Dogecoin payments directly to the site owner. How much it will generate tho - not sure yet.

This is the first CAPTCHA replacement that I really like.

Adding a delay that costs computing power will both slow down and add real costs to spammers (though if it's a bot net they'll not care about the latter).

I really dislike how impossible CAPTCHAs have become, and the slew of startups working on advertising based ones is not the solution the users want.

As a user, all I want is ease. I'm fine waiting a few seconds... just don't give me an impossible task to perform or use the inconvenience you've created as an opportunity to advertise. All I want is to do whatever I wanted to do, read, write... I can wait a moment, just don't make it painful.

Same here. First time in many years that I don't say to myself "bullshit" while looking at a purported solution to the captcha problem. The solution is at once obvious and very clever. What is the most unexpected is that it doesn't tell humans and robots apart at all. The CAPTCHA word may have blinded us to more general / obvious ideas.

I actually dislike it for the reasons you stated. I don't see this negatively impacting a botnet at all while still annoying regular users with the delay.

Also, even if they are paying for it, the "proof of work" is probably less expensive than captcha breaking.

What I'd really like to see is more sites that do analysis to block bots, rather than annoying users. Spammers generally need to place a link and do so quickly. It seems much easier to target people who post links in their first 5-10 posts than to do anything else.
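
A minimal sketch of that heuristic, assuming the site tracks how many posts an account has already made. The regular expression, the threshold of 10 and the function name are illustrative choices, not something specified in the thread.

```python
import re

# Very loose link detector; a real site would likely want something stricter.
LINK_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
PROBATION_POSTS = 10  # upper end of the "first 5-10 posts" from the comment


def needs_review(previous_post_count: int, text: str) -> bool:
    """Flag a post for moderation if a new account is posting a link."""
    is_new_account = previous_post_count < PROBATION_POSTS
    return is_new_account and LINK_PATTERN.search(text) is not None


print(needs_review(0, "Cheap pills at http://spam.example"))
print(needs_review(50, "Docs are at http://example.com"))
print(needs_review(0, "Nice write-up, thanks!"))
```

Established accounts and link-free posts pass straight through, so regular users never see a challenge.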

This was a reason I made this project. Thank you for great feedback! :)

Thank you for making it. (I'll beg you for an invite later.)

I'd experimented with embedding Javascript Bitcoin miners in my website as a means of monetization without ads. I see this as a more intuitive version of that which restricts the user based on usage (posting rate) instead of on viewership.

The issue I encountered was a minuscule rate of return on the JS-based mining. Do you think that if HashCash integrates with Bitcoin for monetization/mining purposes, JS speed will be a limiting factor?

To be precise, right now it uses Dogecoin proof of work. I got it running pretty well with asm.js-compiled scrypt miner code. In the future I will need to solve the scaling problem, as right now it runs with a fixed 2 threads only. Eventually it will self-adjust based on the user's computer performance.

Right now there are no payouts to webmasters, but this is the next feature I am working on. Payouts will be made directly to the webmaster's Dogecoin address from the p2pool I am running for this project.

Woww. Such clever. Much compile.

I've been trying to get Hamiyoca's JS miner to work with mupool without much success. (Stratum is straightforward. I'm just dumb.) p2pool is way easier from the cursory glance I've given it. I'm surprised I hadn't heard of it before.

Any chance I could trouble you for an invite for jo.jcat at gmail? I'm already using proof-of-work to keep spam off my blog (josephcatrambone.com), but I wouldn't mind switching early to something more conducive. Hell, if the payouts are nonzero when they roll around, I might be able to strip the ads. Alternatively, what's the official way to request an invite? Sign up for the news letter and wait?

A basic dashboard is out there, but payments are not yet. There is not much stuff behind the login form yet :) So yeah, I would rather develop it further before letting people sign up for it. Till then you can still use it as a proof-of-work solution using keys generated on the home page :)

I will definitely email invite codes to everyone once it is ready to go.

I really like that it works fine on my mobile too.

Have you done tests on many platforms to show which ones are supported?

I'd probably like it to be more automatic... a single clickable progress bar without the need for the 2 text fields or generate button.

But yeah... this is ace.

I created a jQuery plugin with that kind of widget, but really, any type of widget can be built on top of it. All the jQuery plugin does is use the API library to communicate anyway, so if people want different types of widgets, they will come.

As for support on different platforms - Safari is a tricky one right now. I'm trying to figure out why it is not working. IE before version 10 will not work, because the code uses typed arrays in JavaScript. Other than that it should work in any modern browser in theory.

To allow for larger workload values, could the user start "earning" points by doing jobs as soon as they reach the website, and then "spend" those points when they decide to post? The average user will have been on the site for a few minutes by the time they decide to make a post. When they reach the post page, they will have already earned the post privilege in the form of a payload that has been stored in local storage or cookies.

Interesting idea. It would also give people who frequent the site the most the loudest voices (which could be a good or bad thing)

    $url = 'https://hashcash.io/api/checkwork/' . $_REQUEST['hashcashid'] . '?apikey=[YOUR-PRIVATE-KEY]';
    $work = json_decode(file_get_contents($url));

Pretty sure this code is vulnerable to local file inclusion. Running file_get_contents on unchecked user input is a terrible idea, even more so coming from a "security solution".

How do you turn a prefix of "https://hashcash.io/" into a local file inclusion?

It was a quick and dirty example of how to use it. Frameworks usually expose a builder function where you can pass the query as an array/hash, and that should be used. Do you have a better example that is clear about what is happening and uses pure PHP?


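    // Reject IDs containing anything other than lowercase hex digits and dashes.
    if (strspn($_REQUEST['hashcashid'], 'abcdef0123456789-') != strlen($_REQUEST['hashcashid']))
        die('Invalid character.');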

The last time I saw this kind of proposal, it was even simpler than this.

The work was defined as: given the following salt, construct a payload such that an MD5 hash of the salt and payload would result in a hash with n consecutive 0's.

Simple to implement on both ends, and fast enough to not impede much in the user experience but still expensive enough to limit spam. And it becomes easy to increase the work proof required - just increment 'n'.
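
A minimal sketch of such a scheme, under some assumptions the comment leaves open: "n consecutive 0's" is read here as n leading zero hex digits, the salt and payload are simply concatenated, and the payload is a decimal counter. MD5 is used only because the comment names it; the function names and the example salt are hypothetical.

```python
import hashlib
from itertools import count


def has_leading_zeros(salt: str, payload: str, n: int) -> bool:
    """Server-side check: one hash, true if it starts with n zero hex digits."""
    digest = hashlib.md5((salt + payload).encode("utf-8")).hexdigest()
    return digest.startswith("0" * n)


def solve(salt: str, n: int) -> str:
    """Client side: brute-force a payload by trying successive integers."""
    for nonce in count():
        payload = str(nonce)
        if has_leading_zeros(salt, payload, n):
            return payload


salt = "example-salt"     # hypothetical; a server would issue a fresh random salt
payload = solve(salt, 3)  # needs about 16**3 = 4096 attempts on average
print(payload, has_leading_zeros(salt, payload, 3))
```

Verification costs a single hash, while solving takes about 16**n attempts on average, so each increment of n multiplies the client's expected work by 16.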

I built the same thing in terrible form a few years ago! https://github.com/007/hashcash-js

I always wanted to go back and put in a TOTP seed and properly productize it to make it easier to use, but I never made the time. It makes me happy to see someone take it seriously, and build a proper version for modern browsers.

I implemented a proof of work system to protect access to a large number of webpages before.

The problem is that a native implementation is an order of magnitude faster than the best JS interpreter, and it can be 2 or 3 orders faster than an old browser, or on mobile. It would leave IE7 hanging actually.

I concluded that the only way to use a proof of work system effectively would be native crypto primitives in the browser itself.

Another way we explored was making a very simple challenge (not computationally intensive) that you can nonetheless only realistically solve with a JS interpreter. The goal shifted from "spend CPU time" to "raise the difficulty of programming a bot".

I know there are lots of ways to include a decent JS interpreter in a headless program, but it seems this hasn't caught on with bot makers yet.

I keep getting "[fail] Proof of work not calculated" as my Public Key

If you could send me info about the browser and extensions you use to pavel@karoukin.us - I would greatly appreciate it.

I was receiving this error message in Chrome on a Mac in both regular mode and incognito mode.

Here's more info about my browser: http://aboutbrowser.com/view/JRv4J

I'm getting this on Firefox 28 on Linux Mint even after disabling privacy addons.

I'm also getting this in Firefox 31 on Mac OS X.

thanx. I will look into this issue. Do you have any privacy-related extensions installed by any chance?

So do I.

This isn't a fundamental problem with the idea here, but one thing I don't like is that it makes hashcash.io a point of failure in your form submission. If the site goes down, either you can fail open (spammers get to comment) or closed (nobody gets to comment), but either way you fail. It's also going to add additional latency to your form submission.

I don't see any reason why this needs a third party, although I do grant that makes it easy to use.

Why does this have to be a sort of CAPTCHA? It could just be the payment method for your API. I'm not sure that the current difficulty of Bitcoin mining makes this practical, but the idea would be that in order to get to my precious API you mine a little bit for me.

This way you don't need to buy API credits or stuff like that, login and billing are somewhat conflated.

Maybe someone with better knowledge of Bitcoin can comment on the feasibility.

The idea of a CAPTCHA is to tell a human from a bot. Making the bot's job harder doesn't stop it.

I think the same thing we saw at the beginning of bitcoin would happen here - a GPU implementation would be able to run hundreds of the proofs in parallel. Changing the difficulty to compensate for this would be way too much of a burden on the CPU-bound JS implementation.

This can't really be viable unless it is for something where the attackers need ridiculously large scale. Hashcash was actually invented for preventing spam emails, where attackers are used to sending billions of messages per day.

If you have a large social site, you could just set up real Turing tests between potential sign-ups and a list of the existing users, with users being removed from the list if they get too many false positives and negatives.

In a similar vein, I had an idea for a social site where everyone was invited via other people, establishing the chain of invites. Spamming from one of the leaf nodes propagates upwards as bad karma, decreasing exponentially (or by some power, depending on how strict you want to be about anti-spam ruling) with each parent step. If the inviter's karma drops below a certain threshold, he/she can't invite more people. If it drops lower still, the branch dies and everyone on it gets banned.

This is probably not a great user experience (getting banned because someone above you invited a bunch of spammers), but I think it's conceptually interesting to distribute the anti-spam onus to the users of the community instead of the administration in a form other than 'report spam'.
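
The invite-chain idea can be sketched roughly as follows. Everything numeric here is an assumption made for illustration (a penalty of 4 per spam report, halving at each step up the chain, and the two thresholds), as are the class and function names; the comment only fixes the overall shape.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DECAY = 0.5              # assumed: each step up the chain halves the penalty
INVITE_THRESHOLD = -2.0  # assumed: below this karma, no more invites
BAN_THRESHOLD = -5.0     # assumed: below this karma, the whole branch is banned


@dataclass(eq=False)
class User:
    name: str
    inviter: Optional["User"] = None
    karma: float = 0.0
    banned: bool = False
    invitees: List["User"] = field(default_factory=list)

    def invite(self, name: str) -> "User":
        if self.banned or self.karma < INVITE_THRESHOLD:
            raise PermissionError(f"{self.name} may not invite")
        child = User(name, inviter=self)
        self.invitees.append(child)
        return child


def ban_branch(user: User) -> None:
    """Ban a user and everyone they (transitively) invited."""
    user.banned = True
    for child in user.invitees:
        ban_branch(child)


def report_spam(spammer: User, penalty: float = 4.0) -> None:
    """Penalize the spammer, then pass a decayed share to each ancestor."""
    node: Optional[User] = spammer
    while node is not None and penalty > 0.01:
        node.karma -= penalty
        if node.karma < BAN_THRESHOLD:
            ban_branch(node)
        penalty *= DECAY
        node = node.inviter


root = User("root")
alice = root.invite("alice")
mallory = alice.invite("mallory")
report_spam(mallory)
report_spam(mallory)
print(mallory.banned, alice.karma, root.karma)
```

After two reports in this example, mallory is banned, alice has lost the right to invite but is not banned, and root is only mildly affected. How harsh the scheme feels depends almost entirely on the decay factor and the thresholds.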

This wouldn't 'distribute the anti-spam onus', though. No one wants spam in their community, you don't need to teach people that it's bad. What this system would do is distribute the punishment for the presence of whatever the moderators decide is spam in a way which is disproportionate to users' actual responsibility for it.

Just having an invite-only system should solve the problem just as effectively (although it creates its own issues with politics, groupthink, etc.)

I don't get it... why not just make your account creation endpoint sleep for 10 seconds? Add a little thing for users to focus on so they don't notice the delay. All set.

Because a user can create 1 million accounts in parallel.

It's a good idea, it's just that not everyone is using a top-of-the-line machine. Mobile users will take longer than desktop users, for example.

They're definitely onto something though.

Great idea! But is it necessary UX to burden the human user with manually triggering the unlock? Form onfocus or page onload should be less intrusive and still serve the purpose.

Other types of widgets can be built on top of the API. The jQuery.hashcash.io plugin is just one implementation. API docs are yet to be built tho :)

Sorry, I can't reply to everyone, as HN keeps erroring with "you're submitting too fast". Feel free to reach me with any questions at pavel@karoukin.us

We marked your account as legit. If you still have the problem, let us know at hn@ycombinator.com.

Recent versions of SpamAssassin already include support for this and subtract (varying amounts of) points for e-mails containing valid Hashcash tokens.

Equivalently, you could just make the user pay a tiny amount of BTC or similar. Then they can cache the work so there's no UI delay.

Not with the jQuery.hashcash.io plugin, but with direct use of the API you can make it transparent to the user, for example by calculating it in the background after a form focus event. I am working on complete API documentation so third-party widgets can be created.

Have you thought of the possibility of increasing the difficulty after each subsequent request from the same IP address?

You can require arbitrary complexity and therefore if you implement logic which will increase complexity every time username/password are not valid, you should be able to achieve just that.

Check out the jQuery plugin's GitHub repository for details on available options.
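
One way the server-side logic for this could look, as a sketch only: it keeps a failure counter per IP and doubles the required difficulty after each failed attempt. The class, the doubling rule, the cap and the difficulty units are all assumptions for illustration and are not part of the hashcash.io API.

```python
BASE_DIFFICULTY = 1  # hypothetical units; the thread does not specify a scale
MAX_DIFFICULTY = 64  # cap so a shared IP (NAT, proxy) is not locked out forever


class DifficultyTracker:
    """Doubles the required proof-of-work difficulty per failed attempt per IP."""

    def __init__(self):
        self._failures = {}

    def required_difficulty(self, ip: str) -> int:
        failures = self._failures.get(ip, 0)
        return min(BASE_DIFFICULTY * 2 ** failures, MAX_DIFFICULTY)

    def record_failure(self, ip: str) -> None:
        self._failures[ip] = self._failures.get(ip, 0) + 1

    def record_success(self, ip: str) -> None:
        # A valid login resets the counter for that address.
        self._failures.pop(ip, None)


tracker = DifficultyTracker()
for _ in range(3):
    tracker.record_failure("203.0.113.7")
print(tracker.required_difficulty("203.0.113.7"))   # three failures so far
print(tracker.required_difficulty("198.51.100.1"))  # never seen before
```

The value returned by `required_difficulty` would then be passed to whatever mechanism sets the complexity of the next challenge for that client.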

If the backend is based on http://www.hashcash.org/ then it should be trivial to do so.

So what's the user experience??

What's the advantage of this versus just rate-limiting the number of requests per IP?

Takes way too much time, I wouldn't wait.

I've got an idea. CAPTCHAs could be replaced with suitable items of work from Amazon's Mechanical Turk. It stands to reason that these can't be solved by current computer vision algos, or else they wouldn't be on Mechanical Turk in the first place(?), and the side benefit is that the site owner can collect the 5 cents or whatever the going rate is.

Not necessarily. Let's say a company has some computer vision problem that it needs to be solved with 99.5% accuracy. Let's say today's computers can solve it with 85% accuracy.

The computers can't solve the business need, but they are certainly right often enough to get some spam comments posted (because their cost of being wrong is zero.)
