
It appears that Bitstamp did indeed use the creation transaction API in bitcoind which returns a transaction ID and they made the incorrect decision that the transaction ID returned had properties many programmers associate with IDs, like "meaning anything at all."

What they should have done was waited an hour then done an O(n) scan of all transactions globally in history to find the transaction by inspecting for parameters which exactly matched the ones they provided. That is, the Bitcoin developers now say, the correct use of the create transaction API.

Let me use an example programmers may be familiar with. Twilio lets you do SMS messages with three parameters: from_number, to_number, message. You are given back an SMS ID, which you can query to see the results of the SMS message (like, say, was it delivered successfully or did it fail with an error like "that telephone number did not exist").

Here's a discussion with Twilio in the bizarro world where it's like Bitcoin.

Me: "Hey Twilio I created an SMS message but when I try to query it for the results it 404s."

Them: "Are you sure you created the message?"

Me: "Yep pretty sure."

Them: "Are you sure you are looking for the right message ID in /messages/:id?"

Me: "Yep, I'm using the one that I got back when I created it."

Them: "Maybe it changed."

Me: "... What?"

Them: "Message IDs can change."

Me: "They don't usually change."

Them: "Of course, they don't usually change. Why have an ID if they usually changed? They only change some of the time."

Me: "What determines if a message ID changes?"

Them: "Oh, anyone globally can change your message IDs."

Me: "That sounds a bit insecure for a system which is, by its nature, deployed in a hostile environment."

Them: "Don't worry, they can't change after about an hour. Well, probably. It would be pretty expensive for an attacker to change them after an hour. Don't worry though, you'll never need an ID."

Me: "I find IDs useful for querying things. Like, say, messages. Which I have to do. To see whether the message was successful or not."

Them: "Well you're already downloading every message ever. Just scan through for one which matches the same from number, to number, and message contents."

Me: "... You're serious."

Them: "Don't worry though: they can't touch the from number, to number, or the message contents."

Me: "... Does this sound a little problematic to anyone else?"

Them: "It's on our wiki, noob!"

[Edit: Maybe somebody thinks I'm joking. Let me point you to one of the dangerous functions.


Name: sendtoaddress

Parameters: <bitcoinaddress> <amount> [comment] [comment-to]

Comments: <amount> is a real and is rounded to 8 decimal places. Returns the transaction ID <txid> if successful.

You should naturally, upon reading this documentation, figure "I should immediately discard that transaction ID, because it could be changed instantaneously after this message call. If I instead rely on that transaction ID, I will allow malicious users to break the software I am building."]

I feel I need to clarify some of your points, as they're a little misleading.

> Don't worry, they can't change after about an hour. Well, probably. It would be pretty expensive for an attacker to change them after an hour.

You use the term "pretty expensive" here without qualifying it. Changing a transaction encoded in the blockchain would require outpacing the current hashrate of the bitcoin network. That would require a significant hardware investment, on the order of tens of millions of dollars.

> Well you're already downloading every message ever. Just scan through for one which matches the same from number, to number, and message contents.

You make it sound as if you wouldn't have to do this if you had the transaction hash. You still need to iterate through the transactions regardless. It's just a question of whether you use the transaction hash, or derive your own from the parts of the transaction that are immutable.

Let's make your example a touch more realistic:

Me: "Hey Twilio I created an SMS message but when I try to query it for the results it 404s."

Them: "Has the message been delivered?"

Me: "I don't think so. I'm querying it shortly after I create it."

Them: "How are you querying it?"

Me: "With the message hash."

Them: "Ah, that explains it, then. A pending message may be changed before it's delivered, altering the hash. This makes the hash unsuitable for identifying pending messages."

Me: "So how do I identify messages?"

Them: "Ideally you wait until they're delivered, but if you really need to check for pending messages, you can search through them looking for a message that matches on to, from and content."

Me: "That kinda sucks."

Them: "We know, but it's a difficult issue to fix. It's documented in our wiki."

Me: "What if I don't read your wiki, or follow your mailing list?"

Them: "Then should you really be running an exchange handling millions of dollars of transactions?"

Me: "... Good point."

There's one problem with this, at least if you're using the official client: the Bitcoin APIs telling you what transactions you've sent and received don't tell you where the transactions came from, which is what you need to know in order to match them. Indeed, the only way it provides to uniquely identify the transaction is the transaction ID, which can change unexpectedly.

MtGox should absolutely have known about this issue. Everyone building anything related to BTC should know not to make assumptions about the protocol, and to treat every input as hostile.

You'd think they'd have at least one guy dedicated to nothing but breaking their software. They make my salary every day with transaction fees (well, maybe until recently) so you can't say they're unable to afford it.

Edit: Rereading this, it sounds more accusatory than I intended. I think your clarification was perfect, but at the same time I think MtGox is at fault.

Sorry to ask, but if to, from and content can't be changed, why not make the transaction ID a hash of these three elements using any deterministic algorithm such as SHA-1?

That is the obvious implementation workaround, and with such a 'canonical ID' software can do its own malleability-resistant transaction-tracking.

However, the hash over the malleable part is still protocol-significant: which exact incarnation of the isomorphic transaction is being passed around or cemented into blocks. So this new stable ID would be in addition to the older one, and might not even be necessarily expressed inside the protocol: it might just be a convention, and could vary across independent implementations.

The MtGox statement was a plea for the community to converge on such a consensus identifier before MtGox commits to a local fix. But that's not strictly technically necessary, so their stance looks like a strategy for blame-shifting and further delay. The Bitcoin core people don't like to rush into things.

As I understand it, this is essentially the fix the exchanges need to implement. The exchange can generate a hash on the address, outputs and amount, and use that to confirm whether or not the money has been sent.

However they still need the full transaction hash to reference any outputs from it, as these are identified by the (txhash, index) tuple.
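A minimal sketch of the idea, assuming a toy text serialization rather than the real Bitcoin wire format. The names `Tx`, `TxInput`, `TxOutput`, `full_id` and `canonical_id` are made up for illustration and are not part of any client. The point is only that hashing everything except the signature data gives an ID that survives a re-encoded signature, while the full hash (which you still need for the (txhash, index) references) does not.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TxInput:
    prev_txid: str     # hash of the transaction being spent from
    prev_index: int    # which output of that transaction
    script_sig: bytes  # signature data: the part a third party can re-encode


@dataclass(frozen=True)
class TxOutput:
    address: str
    amount_satoshis: int


@dataclass(frozen=True)
class Tx:
    inputs: Tuple[TxInput, ...]
    outputs: Tuple[TxOutput, ...]


def _double_sha256(data: bytes) -> str:
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()


def _serialize(tx: Tx, include_signatures: bool) -> bytes:
    # Toy serialization for illustration only, NOT the real wire format.
    parts = []
    for i in tx.inputs:
        sig = i.script_sig.hex() if include_signatures else ""
        parts.append(f"in:{i.prev_txid}:{i.prev_index}:{sig}")
    for o in tx.outputs:
        parts.append(f"out:{o.address}:{o.amount_satoshis}")
    return "|".join(parts).encode("utf-8")


def full_id(tx: Tx) -> str:
    """Hash over everything, signatures included (changes if they are re-encoded)."""
    return _double_sha256(_serialize(tx, include_signatures=True))


def canonical_id(tx: Tx) -> str:
    """Hash over inputs spent and outputs paid only (signatures left out)."""
    return _double_sha256(_serialize(tx, include_signatures=False))


# Same payment, but the signature bytes differ (hypothetical values).
original = Tx(
    inputs=(TxInput("aa" * 32, 0, b"\x30\x45"),),
    outputs=(TxOutput("1ExampleAddr", 50_000),),
)
mutated = Tx(
    inputs=(TxInput("aa" * 32, 0, b"\x30\x46"),),
    outputs=original.outputs,
)

print(full_id(original) == full_id(mutated))            # False
print(canonical_id(original) == canonical_id(mutated))  # True
```

An exchange would store both: the canonical ID to answer "did this payment go through?", and whatever full hash ends up in a block to reference the outputs later.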

> "Then should you really be running an exchange handling millions of dollars of transactions?"

So what happens to people who aren't running an exchange handling millions of dollars of transactions? It doesn't matter if they get screwed by this flaw?

I read much of the wiki and never encountered any reference to transactional malleability.

Presumably those people aren't writing their own custom Bitcoin client libraries.

So if you want to set up your own shop you are stuck doing transactions by hand or you have to use a 3rd party like bitpay?


The point of bitcoin is being able to do it yourself and not rely on centralized institutions.

Well don't forget, there's a standard client that works fine. These exchanges were writing their own custom clients, but that's probably not something a one man shop would have to do. And if you have to modify the client, you should definitely be reading the wiki. And test it extensively when you're running million dollar exchanges.

Not quite fine: http://www.reddit.com/r/Bitcoin/comments/1xm49o/due_to_activ...

The bitcoin reference client seems to get confused by this. It seems to allow additional spending of the unconfirmed change addresses and forms a chain of double spent transactions. The bitcoin balance as reported by 'getbalance' also becomes unreliable as it computes the balance incorrectly. Eventually the wallet stops working.

Funnily enough, I've had a similar situation to that in my line of work.

It wasn't Twilio, but it turned out that when we submitted an SMS message of over 160 characters, the provider split it into 160-character chunks and sent it out as multiple SMS messages.

So far, so normal. But what happened when the first chunk sent successfully and the second chunk failed?

We got back a notification to say "MessageID: 4ACB-etc Result: OK" but the customer never got the message, and scanning the report on the provider's site showed the customer number, time and message as having failed.

But then the representative agreed it was a problem and set out to fix it rather than blaming our dependence on the ID!

This sums up my view on this recent turn of events.

I find it quite ridiculous that people are trying to lay the blame on not reading an obscure wiki page. I remember reading much of the bitcoin wiki myself and never seeing ANYTHING about not relying on transaction IDs. The API list doesn't even warn you about it.

Why bother returning a transaction ID if it is spoofable? That is simply misleading.

I guess it shows you how biased all the bitcoin backers are.

For those who seek to understand, rather than just mock with exaggerations:

You don't have to scan all transactions: only those from a firm reference-point of available-funds state, essentially the same point that was used to compose the outbound transaction.

Robust software already has to examine all incoming confirmed-in-block transactions for whether those transactions have consumed prior funds. If they have, even if the local software had as its design goal exclusive control of those funds, the local software must adapt to the new information. (Given the possibility of backups/virtualization-clones/private-key-exports, software must always be open to the possibility another node elsewhere has spent pending funds first.)

So safety against this particular mischief is possible with the same practice that's necessary for other reasons: it doesn't involve extra work.
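Roughly, that practice could look like the sketch below. It assumes you have already parsed each confirmed transaction into the set of outputs it spends and the set of payments it makes; `SeenTx`, `classify` and `scan_block` are hypothetical names, not calls from the reference client. A pending send is matched by the funds it consumes, so a copy confirmed under a different ID is still recognized, and a genuine double-spend is flagged.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Outpoint = Tuple[str, int]  # (txhash, output index) of funds being spent
Payment = Tuple[str, int]   # (address, amount in satoshis)


@dataclass(frozen=True)
class SeenTx:
    txid: str
    spends: FrozenSet[Outpoint]
    pays: FrozenSet[Payment]


def classify(pending: SeenTx, confirmed: SeenTx) -> str:
    """Compare one confirmed transaction against a send we are waiting on."""
    if not (pending.spends & confirmed.spends):
        return "unrelated"
    if pending.spends == confirmed.spends and pending.pays == confirmed.pays:
        # Same funds, same recipients: this is our payment, whatever its ID.
        if pending.txid == confirmed.txid:
            return "confirmed"
        return "confirmed-under-new-id"
    # Our funds were consumed by something that is not our payment.
    return "conflict"


def scan_block(pending: List[SeenTx], block: List[SeenTx]) -> Dict[str, str]:
    """Return a status for every pending send after seeing one new block."""
    results: Dict[str, str] = {}
    for p in pending:
        results[p.txid] = "still-pending"
        for c in block:
            status = classify(p, c)
            if status != "unrelated":
                results[p.txid] = status
                break
    return results
```

The lookup is against the outputs you spent, which you already track in order to know your balance, so nothing here needs the transaction ID to stay stable.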

Also, it's not "an hour" that lets a node know when it can rely on transaction-state, but block-confirmations, a precise and observable transition. One block is almost always enough, but each additional block adds more certainty. Still, all Bitcoin software already needs to handle occasional orphaned blocks and short forks, so being sensitive to periods of uncertainty is an essential part of all implementations, not extra work because of this one gotcha.
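As a small illustration of counting confirmations rather than minutes, here is a sketch. The function names are made up, and the default threshold of 6 is only an assumption: it corresponds to the "about an hour" figure at roughly one block per 10 minutes, and the right number is a policy choice for each operator.

```python
from typing import Optional


def confirmations(tx_block_height: Optional[int], tip_height: int) -> int:
    """Number of blocks that include or build on the transaction.

    tx_block_height is None while the transaction is still unconfirmed.
    """
    if tx_block_height is None:
        return 0
    # The block containing the transaction counts as the first confirmation.
    # max() guards against a reorg that leaves the tip below that height.
    return max(0, tip_height - tx_block_height + 1)


def is_settled(tx_block_height: Optional[int], tip_height: int,
               required: int = 6) -> bool:
    """True once the transaction is buried under enough blocks (assumed policy)."""
    return confirmations(tx_block_height, tip_height) >= required
```

Because the count is recomputed from the current chain tip, an orphaned block or short fork simply lowers it again, which is the "period of uncertainty" the software has to tolerate anyway.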

A better analogy than Twilio would be commercial payment systems: there you need systems that keep checking for weeks or months for chargebacks or reversals.

But an even better analogy than proprietary pay-per-use payment systems is SMTP or BitTorrent. The system is an emergent mess anyone can plug into. There are a lot of sharp edges, and even with great care, you're going to hit some painful and costly bugs. Those building billion-dollar businesses on such systems need to be experts, and will still take some arrows, but each incident that doesn't kill the software/business stacks only leaves them stronger.

> What they should have done was waited an hour then done an O(n) scan of all transactions globally in history to find the transaction by inspecting for parameters which exactly matched the ones they provided.

Oh, you mean the scanning they have to do already, to verify "all transactions globally in history"? Inspecting all parameters on all transactions since the genesis, like you'd already have to do to verify they are not stealing or creating money from nothing? The inspection you have to do just to locate even the same transaction you submitted to the network yourself, to verify it was accepted? And you have to spend like 10 whole seconds of CPU time doing this, per ~10 minutes that a new block comes out, verifying the transactions from the last 10 minutes? Golly, that is sooooo much more onerous than just running the blockchain securely! /sarcasm

I'll agree that it's embarrassing, misleading, not documented well, and not gracefully handled by the community now that everyone points their fingers at each other. But you are deliberately making it sound worse by re-describing standard parts of the bitcoin protocol, as if they are new requirements in order to get a sane ID. Anyone writing financial software should be more than capable of quickly adding a few function hooks into the existing process to get a deterministic normalized ID, and the amount of extra computing resources is negligible compared to what you already have to do, just to use bitcoin safely.

I keep asking this over and either I'm missing something or no one knows the answer: The bitcoin devs keep saying that the malleability is a known "feature" since 2011 and that it's MtGox/Bitstamp's problem for searching for transactions based on tx ids. But what the fuck is the point of a tx id, practically speaking, how is it useful at all given that it's malleable?

It's not malleable once it's been included in the block-chain at a certain depth, before that it is malleable. It appears this was known at least by some people since 2011, but it doesn't seem like the information was widely publicized in a way that would help people to understand the possible consequences. From what I gather, even the reference implementation doesn't handle these consequences particularly well.

Now THAT is the ELI5 explanation I was looking for. This analogy is so much clearer than all the articles!

btw, why is the transaction amount a real number in there? Wouldn't it be clearer to use an integer multiple of whatever the lowest denomination of bitcoins would be?

I was looking into that, but got sidetracked and found this comment from the ref client's lead dev...

sendtoaddress didn't always return a transaction id. It was changed to do that to facilitate bookkeeping. Sort of ironic.


To be fair, he made that addition before anyone, including probably Satoshi Nakamoto, knew that the Bitcoin "protocol" allowed transaction IDs to change at will for up to ~1 hour after being sent.

This is exactly how the transaction balance is encoded: as integer multiples of satoshis (the basic unit of bitcoin, 0.00000001 BTC). All known cryptocurrencies do this to prevent float/double rounding errors.
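To make the rounding concern concrete, here is a short sketch of converting a decimal BTC string to an integer number of satoshis and back. The helper names are hypothetical; only the 0.00000001 BTC unit comes from the discussion above.

```python
from decimal import Decimal

SATOSHIS_PER_BTC = 100_000_000  # 1 satoshi = 0.00000001 BTC


def btc_to_satoshis(amount: str) -> int:
    """Parse a decimal BTC string into an exact integer number of satoshis."""
    value = Decimal(amount) * SATOSHIS_PER_BTC
    if value != value.to_integral_value():
        raise ValueError(f"{amount} BTC has more than 8 decimal places")
    return int(value)


def satoshis_to_btc(satoshis: int) -> str:
    """Format an integer satoshi amount as a BTC string with 8 decimals."""
    return f"{Decimal(satoshis) / SATOSHIS_PER_BTC:.8f}"


# Binary floats cannot represent 0.1 or 0.2 exactly, so sums drift.
print(0.1 + 0.2 == 0.3)  # False

# Integer satoshis add up exactly.
total = btc_to_satoshis("0.1") + btc_to_satoshis("0.2")
print(total == btc_to_satoshis("0.3"))  # True
print(satoshis_to_btc(total))           # 0.30000000
```

Parsing from a string rather than a float matters here: passing `0.1` as a float would carry the rounding error into the conversion before it starts.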

Comment of the year award. Seriously.

I normally enjoy reading your posts here but this one I have to say I find a bit gleefully FUD-tastic. Given that you describe BitCoin as "magic Internet money" [0] I can't help but feel you simply rejoice in BitCoin failures as vindication of your belief that cryptocurrency will never work.

There seem to be two prevailing extremes of opinion which appear a lot on Hacker News and many other places, as extremes are wont to do while those in the middle don't feel strongly enough to contribute. Those are 1) BitCoin will replace government control of money and fix freedom, dude! and 2) What fucking morons, can't wait until you crash and burn.

I love this whole thing. It's fascinating, it's an interesting solution to a problem, and watching DogeCoin take off is fun. In my opinion, BitCoin is kind of like when a naïve programmer decides to rewrite an existing library themselves, and comes up against the brutal reality that led the original developers to the compromises and apparently necessary hacks to get the thing working. The analogy here being regulation, insurance, all that jazz. It's educational, and I haven't been this interested in a technology for a while.

I'm picking on your reply here because it is one of many that exemplifies a "haha told you so" rather than really digging into the interesting technical and sociological aspects.

* Message IDs from one API do not equate to IDs from the other, so your example is a bit flawed; there's no way to check with Twilio except the ID.

* Is "ID" even the term used? I don't know for sure, but "Tx Hash" seems to be more widely spread. [Edit: patio11 edited his comment while I was typing mine; I withdraw this point!]

* "It's on our wiki noob" - someone running the 3rd largest exchange should hardly be a "noob"

* "O(n) scan of all transactions globally" - that's not particularly hard, nor is it necessary (why scan all transactions from all time?), nor is it unexpected (the entire thing requires everyone to have the complete ledger, so you have the data anyway)

There are valid points to be made that the BitCoin protocol needs improvements, and these are even acknowledged by the core devs. This whole situation is a bit ludicrous. But I wish we were talking about "what have we learned", not "told you so".

When it comes to BitCoin, the conversation seems to be full of radicals and optimists when success happens, and gloaters when it doesn't. I don't feel either adds to the conversation; we could be talking about how to improve this as a currency or (as I believe the long term actual application to be) how this can influence distributed trust, especially important in the current climate.

[0] https://twitter.com/patio11/status/431347845031940096

The Bitcoin community routinely uses the phrase "magic Internet money" to describe Bitcoin. You can verify this trivially via Google. There it is often used with the millennial I'm-joking-but-not-really sensibility. You might reasonably guess that that is not the type of humor I was going for with the reference to the community's in-joke, but that in-joke was not born of ignorance on my part.

People often deploy the word FUD to describe arguments about technology which have no basis in technical fact. Can you identify statements which I've made about Bitcoin which have no basis in technical fact?

I had never heard "magical internet money" before, so I apologise there.

As for the second, FUD was then incorrect. Your points were factual. I believe they ignored certain other facts for the convenience of argument (like, most exchanges seemed to know about it). But FUD was the incorrect term.

I'm a bit sad to see my main argument derailed by semantic failures. I guess I need to learn a lot about debating on the internet.
