Back when the commercial internet was just getting its act together there were companies that would give you free online access on Windows 3.1 machines in exchange for displaying ads in the e-mail client. (I think one was called Juno.)
The hitch was that you could only use e-mail. No web surfing. No downloading files. No fun stuff.
But that was OK, since there were Usenet- and FTP-to-email gateways that you could ping and that would happily return lists of files and messages. And if you sent another mail, they would happily send you base64-encoded versions of those binaries that you could decode on your machine.
The free e-mail service became slow-motion file sharing. But that was OK because you'd set it up before you went to bed and it would run overnight.
That reminds me of the first time I accessed the World Wide Web. Back in '96 I was browsing a computer magazine and happened upon a listing of useful mailing lists, one of which returned the contents of web pages for a requested HTTP address. The same magazine had an install CD for the free Juno email service.
Being a teenager, the first web page I ever requested was www.doom.com, which returned a wall of gibberish text to Juno's email client. It was an HTML file full of IMG tags (one of those "Click here to enter" gateway pages), but I had no idea what I was looking at at the time. I somehow figured out how to open the file in IE2 and saw... a bunch of broken images :)
I still vividly remember the sense of wonder that the early Internet evoked.
EDIT: Just checked the Wayback Machine. Looks like www.doom.com was not affiliated with the game at the time, so I must have browsed to www.idsoftware.com instead.
It's really sad thinking how kids these days totally miss the wonder of the early internet.
In my case, it was at the public library. The lone internet computer was constantly booked. But by watching over a library clerk's shoulder, I was able to see the password needed to unlock the text-based library catalog terminals (which were plentiful and always available). (My parents worked at the library, or else I never could have pulled that off.) Once unlocked, I was able to use Lynx to telnet into my favorite MUD game. Unfortunately it didn't last long before a librarian caught me, which I think resulted in my being grounded from the library for a month or something like that.
Yep, and the shar command, which created a shell-script wrapper around sections of uuencoded data, so you could email a file in segments and conveniently reassemble and run them to get the file back, without needing shar at the other end. Good times.
That brings back memories. Of using an email gateway to get an Amiga fred fish disk - delivered as shar pieces to my uni email account (only staff had telnet, ftp, etc. access). Then assembling the pieces in /tmp on the departmental unix server. Then switching to a PC to use Kermit to get the contents onto a PC floppy. Then using an Amiga utility to be able to read PC format disks to copy them to an Amiga floppy.
I've no memory of what motivated me to spend so much time just to be able to view some low-res, low-fps 3-second video clip, listen to 8-bit tracker "tunes" and try out some free application that invariably crashed the machine after a minute or two of use.
The original Juno ad server proxied the ads from the internet to the email client, and the proxy was wide open for several months. The first time I ever accessed the open internet at home was by dialing into the email service and bouncing through the proxy. I believe it was closed due to it being shared in the letters section of a hacker zine.
The first time I was able to access the WWW via a graphical browser, I had a dial-in shell account at an ISP (or BBS, or whatever they called themselves back then). There was a program called "slirp" (which, amazingly enough, has a Wikipedia page at https://en.wikipedia.org/wiki/Slirp ) which allowed one to run SLIP (IP-over-serial) over the terminal connection to get IP access from my computer. Amazingly I got it to work, considering I barely knew what I was doing back then.
One big reason why I became a Linux user was that the TCP/IP stack for Win 3.1, Trumpet Winsock, was amazingly unstable and would regularly crash the entire OS. Linux had, even back then, a stable TCP/IP stack. And fantastic advancements like preemptive multitasking running in protected mode so errant user-space applications didn't crash the OS.
For anyone else who's as confused as I initially was: Google Drive allows unlimited storage for anything stored as "Google Docs", i.e. their version of Word. This hack works by converting your binary files into base64-encoded text, and then storing that text in a collection of Google Doc files.
I.e., it's actually increasing the amount of storage space needed to store the same binary, but it's getting around the drive quota by storing it in a format that has no quota.
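The encoding side is little more than this kind of thing (a minimal sketch, not the project's actual code; the ~1,000,000-character cap per doc is the figure quoted from the article elsewhere in the thread):

    import base64

    CHARS_PER_DOC = 1_000_000  # approximate character limit of a single Google Doc

    def to_doc_chunks(path):
        """Yield base64 text chunks, each small enough to fit in one doc."""
        with open(path, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("ascii")
        for i in range(0, len(encoded), CHARS_PER_DOC):
            yield encoded[i:i + CHARS_PER_DOC]

    # Decoding is just the reverse: concatenate the chunks in order and b64decode.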
It's an arms race situation. Once you give me an information channel like a "word document", I've got an endless variety of ways to encode other things into it. I can encode bits as English sentences or other things that will be arbitrarily hard to pick up by scanning.
If I were Google, I wouldn't try to pick up on the content, I'd be looking for characteristic access patterns. It's harder to catch uploads, since "new account uploads lots of potentially large documents" isn't something you can immediately block, but "oh, look, here's several large files that are always accessed in consecutive order very quickly" would be harder to hide. It's still an arms race after that (e.g., "but what if I access them really slowly?"), but while Google would have a hard time conclusively winning this race in the technical sense, they can win enough that this isn't fun or a cost-effective technique anymore (e.g. "then you're getting your files really slowly, so where's the fun in that?"), which is close enough to victory for them.
So, I'd say, enjoy it while you can. If it gets big enough to annoy, it'll get eliminated.
They can just throttle access to Google documents to something like 4 GB per hour and then block obvious abuses. If people start encoding bits as English sentences they are reducing the amount of useful data they can download within an hour which is exactly what you want.
"They can just throttle access to Google documents to something like 4 GB per hour"
No, that's not likely to work. I'm sure there are far more legitimate users moving 4GB of documents per hour than abusers right now. You have to remember things like bulk downloading, bulk scanning, bulk backing-up, shared automated accounts doing all sorts of legit things, etc. are all legitimate use cases. You can't just throw out all "big uses" or your enterprise customers are going to pitch a fit, and that's a bigger problem than people abusing your storage for a while.
(Those things will still have different access patterns than abusers, but thinking about how that will manifest and could be detected is a good exercise for the reader.)
I'd say most certainly do not try this. Do you want to lose access to your Gmail, Maps, contacts, whatever else you rely on Google for, because you were found abusing Google Drive?
Seems like more of a "can" rather than anyone actually using it. Always interesting to see how something can be broken or exploited, even though it may not be practical
I wonder why they do that. It seems to me like it would be more effort to leave the Google Docs files out of their calculation, and with no real benefit. For conventional use of Google Docs it would be hard to use a significant amount of disk space, so it's not like users would be clamoring for additional space.
Perhaps it's just marketing, trying to prize people away from Microsoft Office with a thing that doesn't actually cost them all that much?
In the same spirit, I made a few "just for fun" plugins for my (now abandoned) encrypted-arbitrary-storage Dropbox-like application Syncany:
The Flickr plugin [1] stores data (deduped and encrypted before upload) as PNG images. This was great because Flickr gave you 1 TB of free image storage, and the overhead was really small. No base64.
The SMTP/POP plugin [2] was even nastier. It used SMTP and POP3 to store data in a mailbox. Same for [3], but that used IMAP.
The Picasa plugin [4] encoded data as BMP images. Similar to Flickr, but different image format. No overhead here either.
All of this was strictly for fun of course, but hey it worked.
Anything that persists can be used to store arbitrary data... I remember (around a decade ago now, I'm not sure if these still exist) coming across some blogs that ostensibly had images of books, details about them, and links to buy them on Amazon and such... I only understood when I came across a forum posting from someone complaining that his ebook searches were clogged with such "spam blogs", and another poster simply told him to look more carefully at those sites, but not to say anything more about his discoveries. You can probably guess what you got if you saved the surprisingly large "full-size" cover image from those blogs and opened it in 7zip!
I feel less hesitant about revealing this now, given how long ago it was and that more accessible "libraries" are now available.
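For what it's worth, the trick is most likely just an archive appended to an ordinary JPEG: image viewers read the picture from the front and stop at the end-of-image marker, while ZIP tools locate the central directory from the end of the file, so one file can serve as both. A minimal sketch (filenames made up):

    # Sketch: make a JPEG that is also a ZIP archive.
    # Viewers show the picture; 7-Zip and friends see the appended archive.
    with open("cover.jpg", "rb") as img, open("books.zip", "rb") as archive:
        combined = img.read() + archive.read()
    with open("innocent_cover.jpg", "wb") as out:
        out.write(combined)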
IIRC the “mods are asleep, post […]” 4chan meme originally came from “mods are asleep, post high res” threads where to an outside observer they were just posting high-resolution images of inane things, but there was actually steganography of some sort going on to hide child porn (I think) inside the files.
I can't remember if I tried, but it's important that you get the exact data back that you put in, which is why JPEG, being lossy, obviously won't work.
BMP is the easiest to encode/decode because it's literally a bitmap of RGB, no fancy compression and such, which, if you're storing arbitrary data is obviously not necessary.
PNG was trickier, because of its "chunks" and generally more structure. And compression.
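The BMP case really is just reshaping the byte stream into pixels. A rough sketch with Pillow (the length prefix and square padding are my own illustration choices, not necessarily what the plugin did):

    from PIL import Image
    import math, struct

    def bytes_to_bmp(data: bytes, path: str):
        # prefix the payload with its length so the padding can be stripped later
        payload = struct.pack(">I", len(data)) + data
        pixels = math.ceil(len(payload) / 3)           # 3 payload bytes per RGB pixel
        side = math.ceil(math.sqrt(pixels))
        payload += b"\x00" * (side * side * 3 - len(payload))
        Image.frombytes("RGB", (side, side), payload).save(path, "BMP")

    def bmp_to_bytes(path: str) -> bytes:
        raw = Image.open(path).convert("RGB").tobytes()
        (length,) = struct.unpack(">I", raw[:4])
        return raw[4:4 + length]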
Indeed. I love the "sorry @ the guys from google internal forums who are looking at this" line at the github. All tongue in cheek and aware of the situation.
TBH this is not unlike reporting a security bug to a company as a white hat, but more like a grey hat here.
> If the few blokes using this scam their way into few hundred terabytes of free storage, so be it, it's not worth the hassle for Google, imo.
This. They probably thought of this exact scenario before adding unlimited docs. They probably even expected somebody to make a script for it. Hell, a few of them might even have a script.
As long as a lot of people don't start abusing it or make a file-sharing service based on it, then they probably won't care. Basically, not until it's a significant enough threat to their bottom line.
Ultimately, it's no different than the inevitable person that just has a script to generate garbage and upload it to Google Docs as fast as possible. That's what the 250 docs a day limit is there for.
I've seen a few stories of businesses and their employees all losing their Google accounts, just because the company hired a freelancer who had previously been banned, and Google detected the association. (Pretty sure they got the accounts back after some public outrage.) I wouldn't risk intentionally violating their terms if you're not quite ready to wake up one day 100% Google-free, or very good at hiding your tracks.
The fact that this is even a remote possibility should worry everyone about the ugly monopoly that Google has become.
I found myself in a similar situation a couple months ago. An Android app falsely charged me on the Play Store. After trying to contact Google for multiple weeks I gave up and disputed the charge on my credit card. This resulted in Google coming after me for $8.99 and threatening to close all my Google accounts, including Gmail, Calendar, Photos, Drive and everything I rely on daily from Google.
That was a wake-up call for me. I decided to move everything OUT of Google. That company has too much power; it should worry way more people.
Yeesh. I had the same happen - Except I never followed through on reversing the charge on my credit card. I spent multiple hours trying to dispute $2.99 or something. Clearly not for the monetary value - just from pure frustration!
However, I was scared for my Google account so just ended up dropping it. Ridiculous.
"An android App falsely charged me on the Play store."
I'm curious to know how this happened. Would you mind sharing more info?
As I understand it, the only way for an app to 'charge you on the play store' is to:
1) Be a paid app (in which case you pay before the app starts installing), or
2) via in-app purchases, which are handled by the app initiating the IAP, and then Play services taking over to ask for confirmation.
In either case, the transaction is only confirmed by a user action (tapping a button) with the app having no control.
Sure, it's possible for an Android app to trick you, by covering everything apart from the button with something fake, but I'd be surprised if such an app found its way into the Play Store.
Sure, I might have hit a corner case but I made an in-app purchase for a one year subscription for a service.
After using the app for a couple of days and restarting the phone, the app seemed to hit a bug and behave as if I hadn't bought the subscription, prompting me to buy another subscription, which I did, thinking that this would unblock the backend and somehow merge with the fact that I already had a subscription.
Unfortunately, Google Play charged me again for a subscription I already had. Both the app creator AND Google Play were difficult to reach. The app creators never replied to any of my emails. Google Play has an automated support website that decided that "I was not eligible for a refund" and there was nothing I could do about it. It also seems to be impossible to contact a real human being to explain the situation.
Nextcloud, preferably on a machine you own (but there are companies selling Nextcloud hosting as well). It replaces Google Drive, Contacts, Calendar, Photos (face recognition can be done with a third-party app), has an RSS reader, bookmarking service etc. Just look at its app store, you can install any of this with two clicks: https://apps.nextcloud.com/
It really is a suite that can combat Google's suite — and you can truly own it. Other than that, DDG for search, and your own domain for email (so that you could transfer it between different hostings if necessary).
I do have a Google account, but I use it for precisely two purposes: Google Play (my phone wouldn't work without one) and YouTube subscriptions (I can use an RSS reader for this, but it's a bit inconvenient). You can create a Google account without creating a Gmail account.
May I suggest using NewPipe[1] for a Google-account-free experience to follow channels? You can import them from your current subscription list, and easily export them when you switch phone or for backup purposes.
The sooner you start, the better. I've moved most of my email/contacts/calendar away [0], and the longer you give yourself to catch the things you've signed up for but forgotten, the better. YouTube was also a pain, but I transitioned my subscriptions manually to a different account. Maps seems like it'd be the trickiest if you're invested. I wasn't a heavy user, and Maps still works pretty well when you're logged out.
[0] I use fastmail + custom domain, which works great, but you have to guard the domain very closely.
I think OP means that you have to make sure you don't forget to/neglect to renew it and make sure you don't accidentally lose the domain for any reason.
Thank you for the clarification. I use a dedicated card for domain hosting (with autorenewal enabled) to prevent this specific issue but I recognize most people likely don't do the same.
spot on, basically you now have to worry about the domain being lost or hijacked also. for me, the flexibility to change email providers behind a domain is worth it though
It means if you slip up and lose your domain, nobody can send you email (including 2FA, reset password, add a new email to your account, etc). You can imagine how inconvenient that would be. I use fastmail with a custom domain and that scenario gives me nightmares.
Mostly off-topic, but related: this is one of the major reasons email needs to finally go away. It was never intended to be the backbone of people's lives in the way it has become.
Access to my email account probably gives you more access to my life and identity than my SSN [0].
I long for the day that we [1] all get assigned a public/private keypair instead of SSNs. That won't fix everything, but it's a huge step above a shared secret that is limited to 9 digits [2].
[0]: Even without signing up for a bunch of services, it's basically impossible at this point (at least in the US) to not have an email address associated with your bank account, car loan, mortgage, credit card, or even just watching TV.
[1]: "We" meaning "US citizens" or anyone else with a similar system.
[2]: I realize you also need info about the person and not just their number, but also apply that to keypairs.
Have the organization responsible for managing the PKI generate a new subkey from your primary key (kept in cold storage) and publish a certificate revocation for the previous subkey that was lost/leaked.
Most of our ID cards (health, driving license) already have an expiration date and the subkeys should have one anyway.
No reason you can't have more than one, either. You could even issue keys for people to act on your behalf (e.g. they get access to it on your death as part of your will).
I have been doing it for a long time; the hardest part for me is all the accounts I have registered around the web that are linked to the email. After a few years of changing each one that mattered, I'm finally close to zero mail on Gmail.
Search I moved to ddg, that was the easy one.
Android can work fine with just f-droid since I noticed I rarely even use the store any more and I need just a few essential apps. For storage, I tend to store only documents and I like to use mega.nz.
The only thing I haven't managed to find an even-close-to-decent alternative for is photos. Google Photos is just simply too good. I would even be willing to pay, but really, all the other apps struggle to get sync right or have some other crappy stuff that makes them barely usable.
As I wrote that comment I went on another small search as I do every so often and I found Canon Irista and I have to say, I am impressed. The sync seems to work fine, it's pretty fast and the UI both of the website and the app is pretty solid. I suggest giving it a try if you are on the lookout for a new photo hosting service.
Are they? Email, calendar, online office, cloud storage etc. are all available from various other companies (even besides the few big corporations). The only two areas where you'd really have to sacrifice features would be Android apps, and YouTube if you're running a channel.
Tell me which provider has an integrated single sign-on service for all of those. Which provider has apps for their service on all major OSes (including mobile), and is mostly free (or low cost)?
Microsoft does. OneDrive for storage, Outlook webapps does email and calendar. Office online has Word, Excel, etc. All accessed with one Microsoft account. All free.
You might not want to be tied to Microsoft but Google is not the only option.
Edit: Overlooked the comment about Apps. Microsoft offers apps for mobile, but not Linux. Although even on Windows I use the browser to access the services which will work on Linux.
Microsoft's Office offerings come very close (except for the free part, which I guess is just a bonus and not a requirement). Although I have to say, despite Microsoft's attitude toward keeping compatibility and old stuff working, they too could pull a Google Reader one day and deprecate/remove a needed service (along with all your data).
What's needed is syndication of data and interoperable apps, like how XMPP worked. But of course, vendors don't like this, because it turns them into a commodity.
That gives a new meaning to "google bombing" -- a bad actor could cultivate a terrible google rating, then hire onto a low level freelance gig at a big company they wanted to bomb by association! Let's just say Oracle as a hypothetical example -- Russia, if you're listening...
> Pretty sure they got the accounts back after some public outrage.
This may happen only if you manage to get it to the front page of HN or have many Twitter followers. In most cases you don't stand much of a chance, though.
Hehe, I was just thinking how simple it will be for Google to identify accounts using this technique from simple usage analytics. I suspect this will not work for long... but still super cool!
Base85 would probably be a better choice for storing binary as text, since it has a ratio of 5:4 instead of 4:3.
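Python's standard library has both, so the overhead difference is easy to eyeball (quick sketch, nothing project-specific):

    import base64, os

    data = os.urandom(3 * 10**6)          # a few MB of random bytes
    b64 = base64.b64encode(data)          # 4 output bytes per 3 input bytes
    b85 = base64.b85encode(data)          # 5 output bytes per 4 input bytes
    print(len(b64) / len(data))           # ~1.33
    print(len(b85) / len(data))           # ~1.25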
On the topic of "unusual and free large file hosting", YouTube would probably be the largest, although you'd need to find a resilient way of encoding the data since their re-encoding processes are lossy.
I like the "Linux ISO" and "1337 Docs" references ;-)
Back in the day of email gateways between different networks, there used to be terrible problems with all the tin-pot dictator IBM SYSADMINs at BITNET sites who maintained their own personal styles of ASCII<=>EBCDIC translation tables, so all the email that passed through their servers got corrupted.
EBCDIC based IBM mainframe SYSADMINs on BITNET were particularly notorious for being pig-headed and inconsiderate about communicating with the rest of the world, and thought they knew better about the characters their users wanted to use, and that the rest of the world should go fuck themselves, and scoffed at all the unruly kids using ASCII and lower case and new fangled punctuation, who were always trying to share line printer pornography and source code listings through their mainframes.
"HARRUMPH!!! IF I AND O ARE GOOD ENOUGH FOR DIGITS ON MY ELECTRIC TYPEWRITER, THEN THEY'RE GOOD ENOUGH FOR EMAIL! NOW GET OFF MY LAWN!!!" (shaking fist in air while yelling at cloud)
It was especially a problem for source code. That was one of the reasons for "trigraphs".
>Trigraphs were proposed for deprecation in C++0x, which was released as C++11. This was opposed by IBM, speaking on behalf of itself and other users of C++, and as a result trigraphs were retained in C++0x. Trigraphs were then proposed again for removal (not only deprecation) in C++17. This passed a committee vote, and trigraphs (but not the additional tokens) are removed from C++17 despite the opposition from IBM. Existing code that uses trigraphs can be supported by translating from the source files (parsing trigraphs) to the basic source character set that does not include trigraphs.
I don't think you want your backups in Google Docs either, given that Google may decide to ban you for TOS violations at any time.
I really do think videos would work reliably, given sufficient redundancy. Again, we have QR codes already, so this is a proven idea. You can't make QR codes unreadable without removing lots of perceptual visual detail. The risk, as with using Google Docs, isn't that Google will change their encoding, but that Google will just take down the videos for service misuse.
I think it would be comparatively more difficult for Google to detect this stuff in a video compared to a text document, because you expect some videos to be long and large. The entirety of the Encyclopedia Britannica comes out to less than 500 MB in a .txt document, so using any reasonable amount of space in a Google Doc should quickly raise red flags.
YouTube probably doesn't save the originals (though they could, perhaps, on cold-storage tape). But even still, it's not difficult to imagine that at some point a new compression pass applied to existing compressed video could change a couple of bits in whatever encoding scheme you've chosen. Depending on the file type, that could be enough to corrupt the whole thing.
Sure you can get around this by adding ECC, but that isn't implemented here.
Base64 has the advantage of relative ubiquity (though Base85 is hardly rare, being used in PDF and Git binary patches). It also doesn't contain characters (quotes, angled brackets, ...) that might cause problems if naively sent via some text protocols and/or embedded in XML/HTML mark-up.
> YouTube ... you'd need to find a resilient way of encoding the data [due to lossy re-encoding]
That should be easy enough: encode the data as blocks or lines of pixels (blocks of 4x4 should be more than sufficient) using a low enough number of colour values (I expect you'd get away with at least 4 bits/channel/block with large enough blocks, i.e. 4096 values per block). You should then easily be able to survive anything the re-encoding does by averaging each block and taking the closest code value to that result.
Add some form of error detection+correction code just for paranoia's sake. You are going to want to include some redundancy in the uploads anyway, so you can combine these needs in a manner similar to RAID5/6 or the Parchive format that was (is?) popular on binary-carrying Usenet groups.
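A toy version of that block scheme, sketched with numpy and using 2 bits per channel (64 values per block) for a comfortable decode margin; real use would want the error-correction layer mentioned above plus a length header to strip the padding:

    import numpy as np

    BLOCK = 4                                            # 4x4 pixel blocks
    LEVELS = np.array([32, 96, 160, 224], np.uint8)      # 2 bits per colour channel

    def encode_frame(data: bytes, blocks_per_row=240) -> np.ndarray:
        bits = np.unpackbits(np.frombuffer(data, np.uint8))
        bits = np.pad(bits, (0, (-len(bits)) % (2 * 3 * blocks_per_row)))
        symbols = bits.reshape(-1, 2) @ np.array([2, 1])         # 2-bit symbols, 0..3
        rgb = LEVELS[symbols].reshape(-1, blocks_per_row, 3)     # one RGB triple per block
        # blow each triple up into a BLOCK x BLOCK square of identical pixels
        return np.kron(rgb, np.ones((BLOCK, BLOCK, 1), np.uint8))

    def decode_frame(frame: np.ndarray) -> bytes:
        rows, cols, _ = frame.shape
        blocks = frame.reshape(rows // BLOCK, BLOCK, cols // BLOCK, BLOCK, 3)
        means = blocks.mean(axis=(1, 3))                          # average away codec noise
        symbols = np.abs(means[..., None] - LEVELS).argmin(-1)    # snap to nearest level
        bits = ((symbols[..., None] >> np.array([1, 0])) & 1).reshape(-1)
        return np.packbits(bits.astype(np.uint8)).tobytes()       # trailing padding included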
A few years ago I also found a backup tool that converted backups to DV videos, so that you could write them on cheap DV cassettes. It was something like more than 10 GB per cassette. Definitely not bad for a few years ago.
The nice thing about yEnc is that it only has to escape NUL, LF, CR, and the escape character itself '=', so it can use nearly all 256 possible byte values directly.
While this works over NNTP, SMTP and IMAP (and possibly POP), I'm not sure if it will work over HTTP if any of the servers use the Transfer-Encoding header.
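The core transformation in yEnc really is tiny; leaving out the =ybegin/=yend headers, line wrapping and CRC that the real format adds, it's roughly:

    CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}        # NUL, LF, CR and '=' itself

    def yenc_encode(data: bytes) -> bytes:
        out = bytearray()
        for b in data:
            o = (b + 42) % 256
            if o in CRITICAL:
                out += bytes([0x3D, (o + 64) % 256])   # escape: '=' then value+64
            else:
                out.append(o)
        return bytes(out)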
Even URL shorteners offer unlimited storage if you jump through enough hoops.
To encode ABCDEFGHIJKLMNOPQRSTUVWXYZ first get a short url for http://example.com/ABC, then take the resulting url and append DEF and run it through the service again. Repeat until you run out of payload, presumably doing quite a few more than 3 bytes at a time.
The final short URL is your link to the data, which can be unpacked by stripping off the payload bytes and then following the links backwards until you get to your initial example.com node.
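In rough code, with a made-up shortener endpoint just to show the chaining (the URL, request and response shape here are all hypothetical):

    import base64
    import requests

    SHORTEN = "https://sho.rt/api/shorten"     # hypothetical: POST {"url": ...} -> {"short": ...}

    def store(data: bytes, chunk=200) -> str:
        text = base64.urlsafe_b64encode(data).decode()
        short = "http://example.com/END"        # sentinel marking the start of the chain
        for i in range(0, len(text), chunk):
            long_url = short + "?payload=" + text[i:i + chunk]
            short = requests.post(SHORTEN, json={"url": long_url}).json()["short"]
        return short                            # this single link is "the file"

    # Retrieval walks the chain backwards: expand the short link, peel off the
    # payload parameter, follow the embedded previous link, repeat until the sentinel.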
I've lost track of the number of times I've seen variants on "Hey, a link shortener is a fun first project for this new language I'm learning; hey, $LANGUAGE_COMMUNITY, I've put this up on the internet now!... hey, uh, $LANGUAGE_COMMUNITY, I've had to take it down due to abuse." There are numerous abuse vectors. Optionally promise to get it back up real soon now, as if there are actually people depending on it.
Maybe it isn't a bad first project, but on no account should you put it up on the "real" internet and tell anyone it exists.
In 1998, the EFF and John Gilmore published the book about "Deep Crack" called "Cracking DES: Secrets of Encryption Research, Wiretap Politics, and Chip Design". But at the time, it would have been illegal to publish the code on a web site, or to include with the book a CDROM containing the "Deep Crack" DES cracker source code and VHDL in digital form.
>"We would like to publish this book in the same form, but we can't yet, until our court case succeeds in having this research censorship law overturned. Publishing a paper book's exact same information electronically is seriously illegal in the United States, if it contains cryptographic software. Even communicating it privately to a friend or colleague, who happens to not live in the United States, is considered by the government to be illegal in electronic form."
So to get around the export control laws that prohibited international distribution of DES source code on digital media like CDROMS, but not in written books (thanks to the First Amendment and the Paper Publishing Exception), they developed a system for printing the code and data on paper with checksums, with scripts for scanning, calibrating, validating and correcting the text.
The book had the call to action "Scan this book!" on the cover (undoubtedly a reference to Abbie Hoffman's "Steal This Book").
A large portion of the book included chapter 4, "Scanning the Source Code" with instructions on scanning the book, and chapters 5, 6, and 7 on "Software Source Code," "Chip Source Code," and "Chip Simulator Source Code," which consisted of pages and pages of listings and uuencoded data, with an inconspicuous column of checksums running down the left edge.
The checksums in the left column of the listings innocuously looked to the casual observer kind of like line numbers, which may have contributed to their true subversive purpose flying under the radar.
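I don't remember the book's exact checksum format, but the idea is easy to sketch: a short check value printed in the margin of every line lets the correction scripts pinpoint exactly which OCR'd lines need another look. Something in this spirit (not the actual scheme from the book):

    import zlib

    def line_checksum(line: str) -> str:
        # short per-line check value, printed down the left margin
        return format(zlib.crc32(line.encode()) & 0xFFFF, "04x")

    def suspect_lines(scanned_lines, margin_checksums):
        """Return indices of lines the OCR probably got wrong."""
        return [i for i, (line, want) in enumerate(zip(scanned_lines, margin_checksums))
                if line_checksum(line) != want]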
Scans of the cover and instructions and test pages for scanning and bootstrapping from Chapter 4:
(My small contribution to the project was coming up with the name "Deep Crack", which was silkscreened on all of the chips, as a pun on "Deep Thought" and "Deep Blue", which was intended to demonstrate that there was a deep crack in the United States Export Control policies.)
The exposition about US export control policies and the solution for working around them that they developed for the book was quite interesting -- I love John Gilmore's attitude, which still rings true today: "All too often, convincing Congress to violate the Constitution is like convincing a cat to follow a squeaking can opener, but that doesn't excuse the agencies for doing it."
The next few chapters of this book contain specially formatted versions of the documents that we wrote to design the DES Cracker. These documents are the primary sources of our research in brute-force cryptanalysis, which other researchers would need in order to duplicate or validate our research results.
The Politics of Cryptographic Source Code
Since we are interested in the rapid progress of the science of cryptography, as well as in educating the public about the benefits and dangers of cryptographic technology, we would have preferred to put all the information in this book on the World Wide Web. There it would be instantly accessible to anyone worldwide who has an interest in learning about cryptography.
Unfortunately the authors live and work in a country whose policies on cryptography have been shaped by decades of a secrecy mentality and covert control. Powerful agencies which depend on wiretapping to do their jobs--as well as to do things that aren't part of their jobs, but which keep them in power--have compromised both the Congress and several Executive Branch agencies. They convinced Congress to pass unconstitutional laws which limit the freedom of researchers--such as ourselves--to publish their work. (All too often, convincing Congress to violate the Constitution is like convincing a cat to follow a squeaking can opener, but that doesn't excuse the agencies for doing it.) They pressured agencies such as the Commerce Department, State Department, and Department of Justice to not only subvert their oaths of office by supporting these unconstitutional laws, but to act as front-men in their repressive censorship scheme, creating unconstitutional regulations and enforcing them against ordinary researchers and authors of software.
The National Security Agency is the main agency involved, though they seem to have recruited the Federal Bureau of Investigation in the last several years. From the outside we can only speculate what pressures they brought to bear on these other parts of the government. The FBI has a long history of illicit wiretapping, followed by use of the information gained for blackmail, including blackmail of Congressmen and Presidents. FBI spokesmen say that was "the old bad FBI" and that all that stuff has been cleaned up after J. Edgar Hoover died and President Nixon was thrown out of office. But these agencies still do everything in their power to prevent ordinary citizens from being able to examine their activities, e.g. stonewalling those of us who try to use the Freedom of Information Act to find out exactly what they are doing.
Anyway, these agencies influenced laws and regulations which now make it illegal for U.S. crypto researchers to publish their results on the World Wide Web (or elsewhere in electronic form).
The Paper Publishing Exception
Several cryptographers have brought lawsuits against the US Government because their work has been censored by the laws restricting the export of cryptography. (The Electronic Frontier Foundation is sponsoring one of these suits, Bernstein v. Department of Justice, et al ).* One result of bringing these practices under judicial scrutiny is that some of the most egregious past practices have been eliminated.
For example, between the 1970's and early 1990's, NSA actually did threaten people with prosecution if they published certain scientific papers, or put them into libraries. They also had a "voluntary" censorship scheme for people who were willing to sign up for it. Once they were sued, the Government realized that their chances of losing a court battle over the export controls would be much greater if they continued censoring books, technical papers, and such.
Judges understand books. They understand that when the government denies people the ability to write, distribute, or sell books, there is something very fishy going on. The government might be able to pull the wool over a few judges' eyes about jazzy modern technologies like the Internet, floppy disks, fax machines, telephones, and such. But they are unlikely to fool the judges about whether it's constitutional to jail or punish someone for putting ink onto paper in this free country.
Therefore, the last serious update of the cryptography export controls (in 1996) made it explicit that these regulations do not attempt to regulate the publication of information in books (or on paper in any format). They waffled by claiming that they "might" later decide to regulate books--presumably if they won all their court cases -- but in the meantime, the First Amendment of the United States Constitution is still in effect for books, and we are free to publish any kind of cryptographic information in a book. Such as the one in your hand.
Therefore, cryptographic research, which has traditionally been published on paper, shows a trend to continue publishing on paper, while other forms of scientific research are rapidly moving online.
The Electronic Frontier Foundation has always published most of its information electronically. We produce a regular electronic newsletter, communicate with our members and the public largely by electronic mail and telephone, and have built a massive archive of electronically stored information about civil rights and responsibilities, which is published for instant Web or FTP access from anywhere in the world.
We would like to publish this book in the same form, but we can't yet, until our court case succeeds in having this research censorship law overturned. Publishing a paper book's exact same information electronically is seriously illegal in the United States, if it contains cryptographic software. Even communicating it privately to a friend or colleague, who happens to not live in the United States, is considered by the government to be illegal in electronic form.
The US Department of Commerce has officially stated that publishing a World Wide Web page containing links to foreign locations which contain cryptographic software "is not an export that is subject to the Export Administration Regulations (EAR)."* This makes sense to us--a quick reductio ad absurdum shows that to make a ban on links effective, they would also have to ban the mere mention of foreign Universal Resource Locators. URLs are simple strings of characters, like http://www.eff.org; it's unlikely that any American court would uphold a ban on the mere naming of a location where some piece of information can be found.
Therefore, the Electronic Frontier Foundation is free to publish links to where electronic copies of this book might exist in free countries. If we ever find out about such an overseas electronic version, we will publish such a link to it from the page at http://www.eff.org/pub/Privacy/Crypto_misc/DESCracker/ .
It seems like a cute and irrelevant distinction that electronic software would be published in a book. If researchers created a computer that processed information using proteins in plant cells instead of electrons, and such a computer could execute programs from this book directly instead of “scanning” it, would not the textbook be software? When laws say “electronic versions” I don’t think they literally mean to refer to electrons, but rather to computer-consumables/executables.
Was this tested before a court and did they accept this sort of obviously subversive behavior? (Not that I personally agree with the laws restricting crypto export.)
IANAL, but if the distinction clashes with the crypto export laws, does it not follow that the crypto export laws clash with the First Amendment? That would make them unconstitutional, and then the focus should be on whether that's the desired behavior and whether the constitution should be amended, or not.
> The First Amendment made controlling all use of cryptography inside the U.S. illegal, but controlling access to U.S. developments by others was more practical
A bizarre intersection with this: I once had to prepare part of my employer's source code for registration with the US copyright office. They wanted "the first N pages" of the source code, for N of a dozen or so. After consulting with the lawyers making the filing, I ended up making a pdf that included main() and the first few functions that it called until I got up to N pages.
If you save digital data (ASCII) on an analog form like a cassette tape, is that okay? Seems you could alternatively put metallic strips in a book. What about QR codes? Could you have a massive QR code on each page which contains a section of source code? Could you use an alternative encoding like dots and lines (.||.||....|.|.|..|) to represent 1s and 0s which is easy to scan (and doesn't require OCR/checksums)?
To what extent does analog encoding fall under the illegal threshold?
This is exactly what I am getting at. For the most extreme example, consider a swarm of nano bots hovering in the atmosphere that implement a computer that can understand and directly execute algorithms spoken in human speech transmitted through pressure fluctuations. There is no distinction that can universally separate speech and computer programs.
> The checksums in the left column of the listings innocuously looked to the casual observer kind of like line numbers, which may have contributed to their true subversive purpose flying under the radar.
Are you implying there's something more interesting there than just the DES source code and related data that the book already very clearly claims to contain?
Afaik Phil Zimmermann was one of the first to do it, in '95 through MIT Press—when his PGP circulated a bit too widely for the export regulations. However, the question of whether he was protected under the First Amendment was never decided in court.
At least a totally separate account. Probably better to use a totally separate set of IP addresses and browsers and maybe even computers. Google will definitely link accounts created from the same browser and potentially ban your main account if you violate their TOS on another account also owned by you.
This is a complete hack job and probably useless if Google changes free storage for docs.
That being said, they currently allow the guys at /r/datahoarder to use gsuite accounts costing £1 for life with unlimited storage quotas. These are regularly filled to like 50TB and Google doesn't bat an eye.
Anyone remember Gdrive? I can’t find it now, but I think it was probably early or mid 2000s. It let you store files as a local disk (FUSE) via Gmail attachments.
I remember using it back in 2005 iirc, and it was amazing. The files had a label called gmailfs.gDisk which is how it could keep the "file system" separate from the rest.
Now Google generously offers Drive with 15Gigs of space.
There were a couple of different projects at the time (listed in "Other Resources" on the project page) that sought to provide a programmatic Gmail interface.
I still have a "ftp" label in Gmail (checks notes 15 years later...) from the experimental FTP server I implemented as a libgmail example. :D
The libgmail project was probably the first project of mine that attracted significant attention, including others basing their projects on it, along with mentions in magazines and books, which was pretty cool.
I think my favourite memory from the project was when Jon Udell wrote in an InfoWorld column ( http://jonudell.net/udell/2006-02-07-gathering-and-exchangin... ) that he considered libgmail "a third-party Gmail API that's so nicely done I consider it a work of art." It's a quality I continue to strive for in APIs/libraries I design these days. :)
(Heh, I'd forgotten he also said "I think Gmail should hire the libgmail team, make libgmail an officially supported API"--as the entirety of the "team" I appreciated the endorsement. :) )
The library saw sufficient use that it was also my first experience of trying to plot a path for maintainership transition in a Free/Libre/Open Source licensed project. I tried to strike a balance between a sense of responsibility to existing people using the project and trusting potential new maintainers enough to pass the project on to them. Looking back I felt I could've done a better job of the latter but, you know, learning experiences. :)
My experiences related to AJAX reverse engineering of Gmail (which was probably the first high profile AJAX-powered site) later led to reverse engineering of Google Maps when it was released and creating an unofficial Google Maps API before the official API was released: http://libgmail.sourceforge.net/googlemaps.html
Me and a friend came up with a similar idea of a sort of distributed file system implemented across a huge array of blog comment sections. Of course you’d need a bunch of replication and fault tolerance and the ability to automatically scrape for new blogs to post spammy-looking comments on, but I thought it was a pretty funny and neat idea when we came up with it.
I heard about a subreddit a while ago, where every post/comment was a random string. It was speculated at the time that something similar was going on.
It's even more interesting to think about this in the context of preserving banned information for future generations. For example, if all the countries in the world united to ban the New Testament. But you eventually realize the ephemeral nature of the net will probably prevent it from fulfilling such long-term data-archiving roles and you're better off burying manuscripts deep underground.
Even scarier than that would be a Turing-complete language where the code is stored in, and memory is written to, comment sections. The actual execution could be done by reading comments, executing a function, and writing comments back to store working memory and results. I guess with encryption you could even hide what you're doing.
Very neat, but it seems to me the issue with all wink-wink schemes like this is that you're ultimately getting something that wasn't explicitly promised, and so might be taken away at any time. So while interesting you couldn't really ever feel secure storing anything that mattered this way.
That references "for education", but it's also true for GSuite Business (and enterprise, but not basic). You'll need to be paying for at least 5 users, or $60/month.
I think that depends on their tools and how they evaluate data usage. If the reporting states that the accounts are using very little storage, because it's using the same measuring stick that the client does, then it's invisible. The question comes up during an audit of the system when the disk usage doesn't match the report. Then again, if this is used by few people it may just look like a margin of error.
It'll be more that the Google Docs "live editing" backends are expensive in terms of disk and memory. They store a complete version history with each keystroke of a document.
There's a good chance a megabyte of "document" costs Google a gigabyte of internal storage...
They don't store a complete version history. It just uses checkpoints in their timeline of real-time edits and computes the differences when you need them. Those deltas can also be compressed.
Honestly this isn't groundbreaking; we have been using base64 to convert binary to ASCII as a way of "sharing" files all the way back to the USENET days. While applications like these make it easy for the masses to participate in the idea, they don't bring anything new to the table.
That all said, this is really cool from a design perspective and I pored over the code and learned a lot.
Google Docs allows you to upload images from your computer. Why not just do that? With proper steganography no one will bat an eye at a few docs with some multi-megabyte pictures.
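Even the crudest least-significant-bit scheme would do, as long as the image is stored losslessly (a big "if" for anything that recompresses to JPEG). A sketch with Pillow, with no attempt at being statistically undetectable:

    from PIL import Image
    import numpy as np

    def hide(cover_path, data: bytes, out_path):
        img = np.array(Image.open(cover_path).convert("RGB"))
        bits = np.unpackbits(np.frombuffer(data, np.uint8))
        flat = img.reshape(-1)                                 # view into img
        assert len(bits) <= len(flat), "cover image too small for payload"
        flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | bits    # overwrite low bits
        Image.fromarray(img).save(out_path, "PNG")             # must stay lossless

    def reveal(stego_path, n_bytes: int) -> bytes:
        flat = np.array(Image.open(stego_path).convert("RGB")).reshape(-1)
        return np.packbits(flat[:n_bytes * 8] & 1).tobytes()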
I had an (evil; don't do this) idea a while back to create a Dropbox-like program that stores all your data as binary chunks attached to draft emails spread across an arbitrary number of free email accounts.
Definitely would make an interesting learning exercise--I learned way more about SMTP/POP protocols* than I did before when I implemented demonstration SMTP/POP servers for my libgmail library before Gmail offered alternate means of access.
These days there's even the luxury of IMAP. :D
[*] About the only thing I remember now is the `HELO` and `EHLO` protocol start messages. :)
I may be mistaken, but as far as I'm aware Google docs synced to your local machine are nothing more than links to documents in the Google Drive cloud. None of the data inside those docs is actually stored locally. I found this out the hard way when I decided to move away from GD and lost a lot of files.
Should you want to move from Google services, the best way of ensuring you keep your data is to use Takeout [1], which exports your documents as both doc and html files.
Reminds me of the old programs that would turn your Gmail storage into a network drive by splitting everything into 25MB chunks. Utterly miserable experience with terrible latency and reliability.
This is a favorite lunch topic at work. AFAIK we stumbled on the idea ourselves, but I'm not surprised to hear it's unoriginal. Rather than a list, our design is a tree structure where leaf nodes contain data and branch nodes contain lists of tinyurls...
Because I'm sure Google has NO data on pathological Docs file sizes. I can't wait for the follow-on 'Google banned my account with all my life's data that I didn't back up anywhere for no good reason'
I discovered that a lot of pirate stream sites are already doing something similar (but not exact) to this.
They store fragments of movies (rather than the full videos) in Google Drive files and then combine them together during playback. Each fragment could then be copied and mirrored across different accounts, so if any are taken down they can just switch to another copy. Pretty clever (albeit abusive) solution for free bandwidth.
Very cool! About a year ago I had a similar idea, but to store arbitrary data in PNG chunks[1] and upload them to "unlimited" image hosts like IMGUR and Reddit.
Although if Picasa (predecessor to Google Photos) worked with BMP, it may be better to do that because it's much easier and more space efficient to encode arbitrary data in than PNG.
But I don't understand why Google would do that. For most users, aren't Google Docs files a substantial part of their usage? Or do people mainly store backups?
FTA "A single google doc can store about a million characters. This is around 710KB of base64 encoded data."
This means that in order to reach the limit of the drive space given away for free, they'd need something like 15,000 Google Doc files (15GB) if they counted toward your space limit. I doubt a lot of paying customers even reach that.
The real limit (file size) is reached by binaries. Videos and PDFs, usually.
I bet implementing such a limit would be 3 or more months of engineering effort.
Think about the difficulties. It has to take into account shared directories. It has to know about systems which auto-create documents (like results sheets for Google forms). It has to work with gsuite sysadmins who need to take ownership of files from deleted accounts. The UI to show when you have hit the limit has to be designed. And the support team has to be trained on how to resolve that error. And you're going to have to get that error message translated into 30 languages. Users already over the limit are going to be unhappy - are you going to write extra code to give them a grace period? How will you notify them of that? Will you have a whitelist of users allowed to go over the limit? How will you keep the whitelist updated and deployed to the servers? Who will have access to add/remove people from the whitelist?
The actual system itself has race conditions: What if that 15,000th file was simultaneously created in the USA and Europe? There is no way to prevent that without a cross-ocean latency penalty. Do you want to pay that penalty for every document creation? How do you deal with a net-split where cross-ocean network traffic is delayed?
Finally, how will you monitor it? Will you have logging and alerting for users hitting the limit? Will there be an emergency override for engineers to remove the limit if necessary?
At big-web-service scale, simple engineering problems become complex problems fast...
I've wondered if someone could do the same thing with videos and jpgs. Amazon Prime, as one example, allows you to store an unlimited number of image files for "free". What if there was a program that would take a video file, split it up into its individual frames as jpgs, and store them on Amazon Prime? When you wanted to watch the video, the program would rebuild the video file from the individual jpgs on AWS.
My guess would be that the latency of this approach would be far too high to be practical. But you could probably abuse the JPEG format to stuff bits of the video into image files. I think you'd probably still need to spend a fair amount of time buffering before you could start watching without lag.
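For the frame-splitting variant, ffmpeg does the heavy lifting; the tedious part would be the chunked upload/download around it. Roughly (assuming ffmpeg is on PATH; the quality and framerate numbers are arbitrary):

    import subprocess

    def explode(video, frame_dir):
        # one JPEG per frame; the image host only ever sees ordinary photos
        subprocess.run(["ffmpeg", "-i", video, "-q:v", "2",
                        f"{frame_dir}/frame_%06d.jpg"], check=True)

    def rebuild(frame_dir, out_video, fps=24):
        # lossy twice over (JPEG + re-encode), so this is for watching,
        # not for bit-exact storage
        subprocess.run(["ffmpeg", "-framerate", str(fps),
                        "-i", f"{frame_dir}/frame_%06d.jpg",
                        "-c:v", "libx264", "-pix_fmt", "yuv420p",
                        out_video], check=True)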
To me it looks like iodine (https://code.kryo.se/iodine/): very nice as a hacking tool to prove a point, but unlikely to be actually helpful in all but very peculiar situations. As a hacker, of course, I value a lot the first part of it!
On the edonkey network, the file size would be reported raw but the clients could compress and transfer chunks to each other. Some guy had created an empty IL-2 sturmovik iso and seeded it. We lived at a government facility with ill-policed high speed (for the time) internet but even then I knew that I didn’t have a 400 Mbps connection. Maybe 2002/2003.
The whole thing only transferred a few kB. It looked like an entire disc though.
Maybe I'm missing a reference or joke here, but the size of a file means little with respect to how much it can be compressed. You can get a 1 petabyte file down to a few bytes if it's just `\0` repeated over and over.
It isn't truly an oversight, it's an abuse of the fact that Docs/Sheets/Slides are not counted toward your quota. Their storage model is a little more complicated than a standard stream of bytes like an image or a text file.
Has anyone actually tried storing a large amount of data like this? I feel like creating a new google account and using it as a backup for a 300gb folder I have.
Yes. It's called: Post to alt.binaries.* on Usenet.
It's effectively the same thing under the hood. Binaries are split and converted to text using yEnc (or base64, et al.) and uploaded as "articles". An XML file containing all of the message-IDs (an "NZB") is uploaded as well so that the file can be found, downloaded, and reassembled in the right order.
This form of binary distribution has been around since the '80s if you change some of the technical details; e.g. using UUencode rather than yEnc.
Spend $5 for a 3-day unlimited Usenet account with e.g. UsenetServer.com and upload it.
If you want it to stay up, then make another account in 3925 days (the retention period), download it, and then reupload it for another 10+ years of storage.
Couldn't you also embed data into images and upload them to Google photos, or is that discarded when they convert and compress the image in the backend?
I mean include binary data in an image file. So you would have a 300x300px jpg picture of a flower that's 20mb which you could unpack to a binary file.
Base 64 gives you 6 bits per character. Assuming a character requires 8 bits to store eg in UTF8 then yep that’s 8:6. Might be better with compression getting you closer to 1:1.
The script splits the file into small base64 chunks that are stored as "documents" (MIME type: application/vnd.google-apps.document), which apparently don't count against the Google Drive quota.
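The conversion itself is just a matter of asking Drive to import plain text as a Doc: upload text/plain bytes but request the Docs MIME type for the created file. Roughly, with google-api-python-client (assuming you already have OAuth credentials; the chunk naming is made up):

    from googleapiclient.discovery import build
    from googleapiclient.http import MediaInMemoryUpload

    def upload_chunk(creds, chunk_text: str, index: int) -> str:
        drive = build("drive", "v3", credentials=creds)
        media = MediaInMemoryUpload(chunk_text.encode("ascii"), mimetype="text/plain")
        meta = {"name": f"backup-part-{index:05d}",
                "mimeType": "application/vnd.google-apps.document"}   # convert to a Doc
        return drive.files().create(body=meta, media_body=media,
                                    fields="id").execute()["id"]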
Thank you, whoever came up with base64.