There's already a proposal floating around on the dev list to introduce a new address type, P2SH^2, which would allow the relevant data fields currently being used to stuff the info discussed in the article to become hashes. This would have the effect that if you wanted to stuff arbitrary data, you'd be limited to finding hash collisions. See the thread here: http://sourceforge.net/mailarchive/message.php?msg_id=307056...
I've been lurking on the bitcoin-dev list for a while to observe how they handle issues just like this. I'm confident that these problems will be transient.
While such a scheme might solve this particular method of storing data, it won't make the general problem go away.
For instance, if I want to publish an 'n' byte message, I could generate 'n' wallets, each having as the final byte of its fingerprint the n'th byte of my message. Each candidate wallet matches a given byte with probability 1/256, so constructing 'n' such wallets will require on average about "256*n" key generations, which is quite small, all things considered. I can then transfer a single bitcoin to each wallet in turn, forming a linked list of the bytes in my message. Even better, I also get my coin back at the end.
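A minimal sketch of the brute-force search described above, under loud assumptions: random 33-byte strings stand in for real public keys, and a truncated SHA-256 stands in for Bitcoin's actual address fingerprint. The names `fake_fingerprint`, `find_key_for_byte`, `encode_message` and `decode_message` are hypothetical. It only illustrates the cost of matching one byte per key; it is not a wallet generator.

```python
import hashlib
import os

def fake_fingerprint(pubkey: bytes) -> bytes:
    # Stand-in for the real address hash (assumption): 20 bytes of SHA-256.
    return hashlib.sha256(pubkey).digest()[:20]

def find_key_for_byte(target: int) -> tuple[bytes, int]:
    """Draw random stand-in keys until the fingerprint's last byte matches."""
    attempts = 0
    while True:
        attempts += 1
        key = os.urandom(33)  # stand-in for a freshly generated public key
        if fake_fingerprint(key)[-1] == target:
            return key, attempts

def encode_message(message: bytes) -> tuple[list[bytes], int]:
    """Find one key per message byte and count the total work."""
    keys, total = [], 0
    for byte in message:
        key, attempts = find_key_for_byte(byte)
        keys.append(key)
        total += attempts
    return keys, total

def decode_message(keys: list[bytes]) -> bytes:
    # The message is read back from the last byte of each fingerprint.
    return bytes(fake_fingerprint(k)[-1] for k in keys)

keys, total_attempts = encode_message(b"hello")
recovered = decode_message(keys)
```

Each attempt matches with probability 1/256, so `total_attempts` should hover around 256 per message byte, though any single run will vary.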
Entirely unnecessary. Transaction fees address the crapflooding issue, and the fact that you can embed only a very small amount of arbitrary data in each transaction means that you're unlikely to be able to harm someone else by forcing them to "possess" your various short byte sequences (which will, on disk, of course be interspersed with the rest of the transaction data).
This is to say nothing about the fact that everyone else running bitcoin will also possess these bytes in their blockchains, making the possession of them rather unextraordinary.
This whole article is just the latest in "Bitcoin doomed to fail, and here's why!" bullshit that's been going on for what feels like a decade but is really only 3 years or so.
This is actually a big issue for bitcoin; I don't think we should avoid it by saying it's the "standard bullshit".
We're also not really talking about small amounts of data (at least at the moment); a few megabytes is relatively significant...
I think the fact that it's "unextraordinary" to possess this data is the interesting thing. That may force a legal distinction which in itself pushes us toward a different understanding of "illegal data": perhaps the legal system has to give up on that and move towards accessing or "distributing with intent" being the illegal act, rather than just possession.
It doesn't seem like hashing the data fields reduces the problem to finding collisions. Say I hash 2 values. I can distribute these and say H1 corresponds to a '1', and H2 corresponds to '0'. Now, I can store 1 bit of arbitrary data per transaction. Of course, I can increase the amount of data per transaction by exponentially increasing my initial work (and the size of the lookup table). But given that transactions are cheap, I would not need to compute the complete rainbow table to make this practical, and you only need to do the hashing once.
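The lookup-table idea can be sketched as follows. This is an illustration under assumptions, not a description of any real tool: a truncated SHA-256 stands in for the hashed data field, and `build_codebook`, `encode` and `decode` are hypothetical names. With `bits_per_symbol = 1` the codebook is exactly the H1/H2 pair from the comment; larger values double the table size per extra bit.

```python
import hashlib
import os

def build_codebook(bits_per_symbol: int) -> dict[int, bytes]:
    """One-time work: pick one 20-byte hash value per symbol."""
    book = {}
    for symbol in range(2 ** bits_per_symbol):
        preimage = os.urandom(32)  # arbitrary value, hashed exactly once
        book[symbol] = hashlib.sha256(preimage).digest()[:20]
    return book

def encode(bits: str, book: dict[int, bytes], k: int) -> list[bytes]:
    # bits is a string such as "1011"; its length must be a multiple of k.
    return [book[int(bits[i:i + k], 2)] for i in range(0, len(bits), k)]

def decode(fields: list[bytes], book: dict[int, bytes], k: int) -> str:
    # Anyone holding the published codebook can invert the mapping.
    reverse = {h: s for s, h in book.items()}
    return "".join(format(reverse[f], f"0{k}b") for f in fields)

book = build_codebook(1)           # H1 <-> '1', H2 <-> '0'
fields = encode("1011", book, 1)   # one hash field per bit
decoded = decode(fields, book, 1)
```

Every field is a genuine hash output, so a rule that only checks "is this a hash?" would not stop the scheme; the information is carried by which hash is used.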
The flaw in all of this is that you're embedding 20 bytes in the blockchain. In order to do that, you have to choose an encoding method. In order to get the embedded data back out, you have to convey the encoding method and which block is encoded with it to the recipient. Which is totally pointless because if you can convey that to the recipient then you might as well just use the same communications method to convey the original message. The only plausible reason not to would be if the law prohibits the message but not the information about how to construct the message from the blockchain (and even that is not guaranteed). Even then all you would accomplish is to cause the government to pass a new law that prohibits you from telling anyone that a prohibited link is encoded in the block in the same way that you're currently prohibited from telling anyone the link itself.
The real problem here is not that child pornographers would actually use bitcoin to distribute links, it's that assholes who want to damage bitcoin would put contraband in the blockchain in order to cause legal trouble for innocent users.
But I think that's a broader problem than just bitcoin. You can encode anything into anything. Take anything anyone else has posted and xor it with something you want to encode. The output will resemble garbage rather than either input. But now you can post the "garbage" and instructions on what to xor it with to allow anyone to recover your encoded message, and the poster of the other message becomes an unwilling participant in your encoding scheme. It clearly makes no sense to punish distributors of the original message just because the encoded message is contraband. Which doesn't mean there won't be laws that will punish it anyway, but that is the fight that needs to be won -- to not allow stupid laws that would punish innocent people.
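The XOR construction is easy to demonstrate. In this sketch the cover text, the hidden message and the helper name `xor_bytes` are all made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

# Someone else's innocent public post (illustrative).
cover = b"We the People of the United States, in Order to form a more perfect Union"
# The message the encoder wants to hide (illustrative).
secret = b"meet me at the usual place"

pad = cover[:len(secret)]          # use as much of the cover text as needed
garbage = xor_bytes(secret, pad)   # what the encoder actually posts

# Anyone told "XOR it with the start of that post" recovers the message.
recovered = xor_bytes(garbage, pad)
```

The operation is symmetric: XORing `garbage` with `secret` yields the cover text just as readily, which is why neither input can sensibly be said to "contain" the other.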
The "encoding method" is ASCII text. The data can be extracted by running "strings *.dat" on the block chain on any UNIX system. After that you can grep through the output.
This is not like XORing data, or as some people have said "everything occurs somewhere in the digits of pi". The blockchain in no sense encodes all possible values, or a fraction thereof; the data is trivial to extract.
This is much more like data sitting on a webserver but not indexed by Google. It's actually even worse than that, because you can still just grep through the blockchain and find what you're interested in.
This may or may not be a problem for bitcoin, but I think it is legally problematic at the moment. This may move us toward a world where it's not illegal to store any particular data or even distribute it. The illegal act might be the viewing or "distribution with intent" of the data. I think that would be an interesting development.
Personally, as a user of Bitcoin, I've deleted the standard Qt client; I don't want that data on my computer. I now use a blockchainless client (Electrum).
An encoding method is ASCII text. You could use ASCII compressed with gzip, or bzip2, or lzma. You could use Unicode. You could use a previous block as the key and encrypt with AES, or Blowfish, or 3DES. You could store an IP address and port rather than a URL as the first six binary octets. Or encode the IP using base64, or hex.
No matter what you use, you have to convey that to the party you're trying to communicate the information with -- you at least have to convey the fact that you've encoded something in the blockchain so that the receiver knows to look for it there. How is it easier to convey "you should download the bitcoin blockchain and run strings against it and the URL is the 352nd one you find [out of the six thousand URLs various unrelated people will have encoded]" than to just send the damn URL directly to the person you're telling where to look for it?
>This may or may not be a problem for bitcoin, but I think it is legally problematic at the moment. This may move us toward a world where it's not illegal to store any particular data or even distribute it. The illegal act might be the viewing or "distribution with intent" or the data. I think that would be an interesting development.
I think it would be a welcome development. Right now people are too afraid to be distributors, which makes things difficult for whistle blowers and democracy advocates in oppressive regimes and others who have legitimate reasons to want anonymous censorship-resistant publication methods.
"you at least have to convey the fact that you've encoded something in the blockchain so that the receiver knows to look for it there. How is it easier to convey "you should download the bitcoin blockchain and run strings against it and the URL is the 352nd one you find [out of the six thousand URLs various unrelated people will have encoded]" than to just send the damn URL directly to the person you're telling where to look for it?"
So to my mind it's not that different from a search engine. The blockchain doesn't just contain URLs; it contains metadata as well. Right now, if you want to find links related to, err, certain kinds of illegal photography, you just have to search for the relevant keywords in the blockchain and you'll find them and the URLs.
"I think it would be a welcome development. Right now people are too afraid to be distributors, which makes things difficult for whistle blowers and democracy advocates in oppressive regimes and others who have legitimate reasons to want anonymous censorship-resistant publication methods."
Yes, I absolutely agree with you, we live in interesting times.
No, it's not just ASCII text. Only miners (like Satoshi's headline in the genesis block and Kaminsky's ASCII Bernanke) are able to include ASCII data in blocks; everyone else needs to somehow encode it in transaction data.
Seriously? This is analogous to writing information on the plates used to print every bill in the country, not scribbling a note onto a twenty.
You're either a troll or you haven't read the post.
They were, indirectly, drawing attention to the non-import of that particular byte sequence - it literally does not matter, and the whole thing was to illustrate that a) you can't censor tiny speech effectively and b) DRM is doomed and the keys will always be reverse-engineered.
Saying something meaningful enough to get someone jailed for possessing a drive with the string on it (the criteria for this to be actually harmful to bitcoin) is nearly impossible in 20 bytes in most parts of the developed world.
"Saying something meaningful enough to get someone jailed for possessing a drive with the string on it (the criteria for this to be actually harmful to bitcoin) is nearly impossible in 20 bytes in most parts of the developed world."
In a sensible legal regime, sure. In a legal regime looking for an excuse to shut down BitCoin? Easily done. Technically the AACS key is still illegal. I think the latter rather than the former is more accurately the threat.
But then, if a legal regime is looking to shut down BitCoin they already have plenty of avenues. It already looks an awful lot like money laundering, for instance. So this line of thought is garbage... but only because there's no way any legal system would have to stretch this far to attack BitCoin, because they've got a wide variety of far more plausible attacks. It's not exactly the BitCoin-friendly line of argument you might be hoping for.
But no need to read the other comments, since the author wrote "Some folks have exploited that feature/flaw to publish Wikileaks cables." Information about that publication is in the immediately previous article: "That publishing capability was put into use a couple of days ago when someone publish 2.5 MB of Wikileaks cables in the bitcoin blockchain. It cost a bit of money (about $500) to accomplish that, but the information that was published is now going to be public forever."
A search finds someone who wrote "The wikileaks data starts at transaction 5c593b7b71063a01f4128c98e36fb407b00a87454e67b39ad5f8820ebc1b2ad5".
Therefore, I find your claim that there is "nothing to back it up" untenable.
I really don't understand why people are saying it's 20 bytes; the Wikileaks cables are about 2 MB, with a 100+ line Python program. The latest issue is a long (at least 1000 lines) FAQ containing URLs.
I think part of the problem is that people don't want to directly point to the data due to its nature. But you can easily run strings over the blockchain and see what's there. I did so myself and then deleted it and zeroed my free space; it's unfortunately not something I would want on my HD. I moved to a blockchainless client.
I believe that there is an even more important practical issue.
What if someone manages to embed something very much like the EICAR string in it? How many people do you think would use the bitcoin client on Windows if their AV automatically deleted the blockchain as it downloaded in a misguided attempt to protect them?
Of course, first we have to know if this is possible at all. Does anyone know if there's either a) 20 bytes with a very high AV detection rate or b) some way to embed more than 20 bytes in a row in the block chain?
I think there is a lot of sensationalism in the way this issue has been approached. To inject data into the blockchain in this way is comparable (although not identical) to writing the same 'evil urls' on a dollar bill with a pen, and then passing it around.
It is a problem that exists in a different layer than the currency, even if it is to some degree 'passed on' through the currency. Likewise, the solution (imho) lies in a different layer: detect a cp link in the blockchain? Great, take down the link, problem solved.
Just as it is not the fault of TCP/IP, or its 'downfall', that it is able to transmit 'evil data', it is not Bitcoin's fault what vandals sometimes write on it.
This is wrong on all points. A dollar bill is seen by an infinitesimally small number of users, as opposed to 100% of full bitcoin nodes. Dollar bills are also transient and easy to destroy, whereas the blockchain is permanent. Finally, there is no entity that could determine unacceptable content or delete it, and if there were it would carry the standard abuse problems of any censorship program.
So while the issue may or may not be sensationalized, writing ignorant and wholly incorrect commentary is not the antidote.
The concern here is not that a URL is embedded in the block chain, but an actual resource itself. Obviously a URL can be removed, but if the raw data is encoded in the block chain, it cannot be removed. Some people are balking at the fact that you can only encode 20 bytes at a time. However, it's already a reality that a multi-part message has been embedded into the block chain.
So the real concern becomes: what happens when someone encodes something illegal in the block chain? As a trivial example, what happens when some sort of copyrighted material is embedded in the block chain? Can a country's court prevent its citizens from participating in bitcoin transactions because they hold a copy of the illegal material?
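To show why the 20-byte limit does little to prevent multi-part messages, here is a rough sketch of splitting data across fixed-size fields and reassembling it. The 4-byte length prefix and the names `split_into_fields` and `join_fields` are assumptions for illustration; this is not the scheme anyone in the thread is known to have used.

```python
CHUNK = 20  # bytes available per data field, per the discussion

def split_into_fields(data: bytes) -> list[bytes]:
    """Prefix the length, then cut into zero-padded 20-byte fields."""
    framed = len(data).to_bytes(4, "big") + data
    padded = framed + b"\x00" * (-len(framed) % CHUNK)
    return [padded[i:i + CHUNK] for i in range(0, len(padded), CHUNK)]

def join_fields(fields: list[bytes]) -> bytes:
    """Concatenate fields in order and strip the prefix and padding."""
    framed = b"".join(fields)
    length = int.from_bytes(framed[:4], "big")
    return framed[4:4 + length]

message = b"a multi-part message spread over several 20-byte fields"
fields = split_into_fields(message)
restored = join_fields(fields)

# Scale check for the 2.5 MB figure quoted earlier in the thread,
# taking 1 MB as 10**6 bytes (assumption) and ignoring the prefix.
fields_for_2_5_mb = -(-2_500_000 // CHUNK)  # ceiling division
```

Under those assumptions a 2.5 MB payload needs on the order of 125,000 such fields, which is tedious and costs fees but is clearly not impossible.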
So the call is for a system whereby we bypass any legality issues by ensuring that encoding information in this way is not possible.
Is it an overreaction? Maybe it is, but since there may be real legal consequences to letting it continue, it seems prudent to put steps in place that prevent or hamper this possibility. Particularly as it has already been proven that the system has been used to store data in larger quantities.
While you likely have a point that this is being sensationalised, the money analogy isn't quite accurate.
The problem is that such "evil" messages can be broadcast globally without the ability to remove them. A dollar bill with a message on it can be taken out of circulation; the bitcoin blockchain can't be reset without major chaos ensuing.
How is this any different than a government declaring bitcoin illegal for whatever legal reasoning they can come up with?
It's not practical to shut down everyone with a bitcoin database any more than it's practical to raid every server with wikileaks data. If they're going to declare this nuclear war on bitcoin it's not going to be on the basis of some piece of data which by the point it's in the blockchain is out of the bag anyway.
This brings up some interesting questions. One could also XOR some illegal data with the text of the US Constitution, then claim that the Constitution "contains" that data, you just need to XOR it with this particular key.
Obviously that's absurd, but where do you draw the line? You need specialized software and the 32-byte transaction ID in order to extract the data.
What other permanent public records could be manipulated like this?
Can someone post the command to create these messages or at least tell me what portion of the transaction this is being stored under? I am curious. My chain is a bit out of date, but I was able to generate the messages http://pastebin.com/wdpF4L4k
~/.bitcoin/blocks $ ls | xargs strings -n 20 | tee ~/Downloads/hiddenblockchain.txt
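For anyone without the `strings` utility, a rough Python equivalent of the pipeline above might look like this. The function names and the `*.dat` glob are assumptions, and the definition of "printable" (ASCII 0x20 to 0x7e plus tab) only approximates what `strings` does by default.

```python
import re
from pathlib import Path

def printable_runs(data: bytes, min_len: int = 20) -> list[str]:
    """Return runs of at least min_len printable ASCII bytes, like `strings -n`."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def scan_directory(blocks_dir: str, min_len: int = 20) -> list[str]:
    """Scan every .dat file in a directory (each file is read fully into memory)."""
    found = []
    for path in sorted(Path(blocks_dir).glob("*.dat")):
        found.extend(printable_runs(path.read_bytes(), min_len))
    return found

# Small self-contained demonstration on made-up bytes.
sample = b"\x00\x01short\x00this run is long enough to be reported\xff\x02"
hits = printable_runs(sample)
```

Calling `scan_directory` on the blocks directory would then play the role of the `xargs strings -n 20` step. Note that this only finds data stored as contiguous plain ASCII; anything split across fields or otherwise encoded will not show up this way.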
I would think the solution to this problem is to somehow prove to the blockchain that the address was indeed produced by the SHA256 RIPEMD160 process. I would think mathematically there is some way to do this without revealing what it was you hashed.
I mean, you can verify that you are who you say you are simply by using your private key to sign a message; I would think a comparable process would work for this.
EDIT: Facepalm; you're hashing the public key. You don't need to hide that. See my comment below.
First, crypto is hard. Unless a system is proven to have a certain property, assume the worst case scenario for your system.
Second, SHA256 and RIPEMD160 are hashes. By definition (of an ideal hashing algorithm), every output can be generated with the same probability, and changing a single bit of the input will have a 50% chance of changing a given output bit.
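The "50% chance per output bit" property can be observed directly with SHA-256 from Python's standard library. The helper names and the sample input are illustrative only.

```python
import hashlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"bitcoin address input"
changed = flip_bit(original, 0)  # inputs differ in a single bit

diff = differing_bits(hashlib.sha256(original).digest(),
                      hashlib.sha256(changed).digest())
```

For a well-behaved hash, `diff` should land somewhere near 128 of the 256 output bits. This is what makes it impractical to steer a hash output toward chosen text, and also why a hash output alone looks the same as arbitrary stuffed bytes.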
I am familiar with how hashes work (I've written a Bitcoin address generator myself). To illustrate what I mean:
The Bitcoin address is just some chain of hashes (and a checksum) applied to the public key. To prove that the address IS actually output from the hash functions [and not spam], simply provide the public key along with it. Of course, you might say that is way too much data for the blockchain to handle. So you only limit the requirement of providing the public key to "suspicious" transactions. What constitutes a suspicious transaction could be a matter of debate, but I imagine it could be done, and it would avoid the problem of a Bitcoin's value depending on its ancestry.
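The check proposed here could look roughly like the sketch below: a field counts as a genuine hash only if the spender reveals a public key that hashes to it. The names are hypothetical, the sample key is made up, and the fallback branch is an assumption added solely so the sketch runs on Python builds where RIPEMD-160 is unavailable.

```python
import hashlib

def hash160(data: bytes) -> bytes:
    """SHA-256 followed by RIPEMD-160, as used for Bitcoin addresses."""
    sha = hashlib.sha256(data).digest()
    try:
        return hashlib.new("ripemd160", sha).digest()
    except ValueError:
        # Assumption: some OpenSSL builds disable RIPEMD-160. Fall back to a
        # truncated second SHA-256 purely so this illustration still runs.
        return hashlib.sha256(sha).digest()[:20]

def is_backed_by_key(field: bytes, revealed_pubkey: bytes) -> bool:
    """True if the 20-byte field really is the hash of the revealed key."""
    return hash160(revealed_pubkey) == field

pubkey = bytes.fromhex("02" + "11" * 32)  # made-up 33-byte stand-in key
honest_field = hash160(pubkey)            # a field produced by hashing
stuffed_field = b"arbitrary text here!"   # 20 bytes of data, not a hash

honest_ok = is_backed_by_key(honest_field, pubkey)
stuffed_ok = is_backed_by_key(stuffed_field, pubkey)
```

Someone who stuffed chosen bytes into the field has no key to reveal, since that would require finding a preimage. The cost, as the comment notes, is the extra space taken by the revealed key.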
A link to [insert boogeyman here], which is also on the hard disks of a few hundred thousand other people, downloaded automatically, and stored in a file with a dozen other gigs of bitcoin transaction data.
Am I wrong in thinking that this is some sort of "freedom of speech" and that it isn't necessarily wholly different than Bitcoin's sense of "hands off, let it be, decentralization, no one can control/stifle/limit/etc"?