Unlimited Google Drive storage by splitting binary files into base64 (github.com/stewartmcgown)
585 points by lordpankake on May 15, 2019 | hide | past | favorite | 247 comments



Base64 is such a wonderful gift.

Back when the commercial internet was just getting its act together there were companies that would give you free online access on Windows 3.1 machines in exchange for displaying ads in the e-mail client. (I think one was called Juno.)

The hitch was that you could only use e-mail. No web surfing. No downloading files. No fun stuff.

But that's OK, since there were Usenet- and FTP-to-email gateways that you could ping and that would happily return lists of files and messages. And if you sent another mail, they would happily send you base64-encoded versions of those binaries that you could decode on your machine.

The free e-mail service became slow-motion file sharing. But that was OK because you'd set it up before you went to bed and it would run overnight.

Thank you, whoever came up with base64.


That reminds me of the first time I accessed the World Wide Web. Back in '96 I was browsing a computer magazine and happened upon a listing of useful mailing lists, one of which returned the contents of web pages for a requested HTTP address. The same magazine had an install CD for the free Juno email service.

Being a teenager, the first web page I ever requested was www.doom.com, which returned a wall of gibberish text in Juno's email client. It was an HTML file full of IMG tags (one of those "Click here to enter" gateway pages), but I had no idea what I was looking at at the time. Somehow I figured out how to open the file in IE2 and saw... a bunch of broken images :)

I still vividly remember the sense of wonder that the early Internet evoked.

EDIT: Just checked the Wayback Machine. Looks like www.doom.com was not affiliated with the game at the time, so I must have browsed to www.idsoftware.com instead.


It's really sad thinking how kids these days totally miss the wonder of the early internet.

In my case, it was at the public library. The lone internet computer was constantly booked. But by watching over a library clerk's shoulder, I was able to see the password needed to unlock the text-based library catalog terminals (which were plentiful and always available). (My parents worked at the library, or else I never could have pulled that off.) Once unlocked, I was able to use Lynx to telnet into my favorite MUD game. Unfortunately it didn't last long before a librarian caught me, which I think resulted in me being grounded from the library for a month or something like that.


> It's really sad thinking how kids these days totally miss the wonder of the early internet.

And before that, the wonder of bulletin board systems. I got my first modem in 1985!


Username checks out ;)


I used Agora [1] with Juno, too! There was a particular daemon hosted in Japan, not sure how I found it but probably in a magazine.

[1] https://en.wikipedia.org/wiki/Agora_(web_browser)


Long before base64 there was UUencode, but it was quite sensitive to whitespace and mail-client reflowing, so it didn't make it into the RFC standards.


Yep, and the shar command, which created a bash wrapper around sections of uuencoded data, so you could email a file in segments and conveniently recompose and run it to get the file back, without needing shar at the other end. Good times.


That brings back memories. Of using an email gateway to get an Amiga Fred Fish disk - delivered as shar pieces to my uni email account (only staff had telnet, ftp, etc. access). Then assembling the pieces in /tmp on the departmental unix server. Then switching to a PC to use Kermit to get the contents onto a PC floppy. Then using an Amiga utility to be able to read PC format disks to copy them to an Amiga floppy.

I've no memory of what motivated me to spend so much time just to be able to view some low-res, low-fps 3-second video clip, listen to 8-bit tracker "tunes" and try out some free application that invariably crashed the machine after a minute or two of use.


> I've no memory of what motivated me to spend so much time ...

I do :)

It was the wonder of doing something for the first time. Of feeling like you were pioneering and learning and having fun all at the same time.

Truly, fun times :)


That's really slick.

The original Juno ad server proxied the ads from the internet to the email client, and the proxy was wide open for several months. The first time I ever accessed the open internet at home was by dialing into the email service and bouncing through the proxy. I believe it was closed due to it being shared in the letters section of a hacker zine.


Mary Ann Horton[1] is probably one of the people you want to thank. She's responsible for uuencode.

[1] https://en.wikipedia.org/wiki/Mary_Ann_Horton


Yep, Juno, NetZero, and Kmart's BlueLight were all free ISPs that were super easy to manipulate. :)


Reminds me of usenet warez groups filled with uuencoded posts. If you took the time to reassemble them and decode, it worked.


The first time I was able to access the WWW via a graphical browser, I had a dial-in shell account at an ISP (or BBS, or whatever they called themselves back then). There was a program called "slirp" (which, amazingly enough, seems to have a wiki page at https://en.wikipedia.org/wiki/Slirp ) which allowed one to run "SLIP" (IP-over-serial) over the terminal connection to get IP access from my computer. Amazingly I got it to work, considering I barely knew what I was doing back then.

One big reason why I became a Linux user was that the TCP/IP stack for Win 3.1, Trumpet Winsock, was amazingly unstable and would regularly crash the entire OS. Linux had, even back then, a stable TCP/IP stack. And fantastic advancements like preemptive multitasking running in protected mode so errant user-space applications didn't crash the OS.

Good times.


For anyone else who's as confused as I initially was: Google Drive allows unlimited storage for anything stored as "Google Docs", i.e. their version of Word. This hack works by converting your binary files into base64-encoded text, and then storing the text in a collection of Google Doc files.

I.e., it actually increases the amount of storage needed to store the same binary, but it gets around the Drive quota by storing the data in a format that has no quota.
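
Roughly, the splitting step is something like this (a minimal Python sketch, not the actual UDS code; the ~710 KB chunk size comes from the README and the doc-ID bookkeeping is assumed):

    import base64
    from pathlib import Path

    CHUNK_SIZE = 710 * 1000  # assumed per-doc payload, based on the ~710 KB figure in the README

    def split_to_base64_chunks(path):
        """Yield base64 text chunks, each small enough to live in one Google Doc."""
        text = base64.b64encode(Path(path).read_bytes()).decode("ascii")
        for i in range(0, len(text), CHUNK_SIZE):
            yield text[i:i + CHUNK_SIZE]

    def join_from_base64_chunks(chunks):
        """Reassemble: concatenate the chunks in order and decode back to bytes."""
        return base64.b64decode("".join(chunks))

Each chunk would then be written into its own doc, with the doc IDs recorded in order so the file can be reassembled later.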


Seems like a good way to earn yourself a Terms-of-Service ban.

If this isn't considered an abuse of the services now, the terms could be updated to clarify that it is.

The fingerprint of big chunks of base64-encoded blobs in Google Docs would be easy to spot.

If Google cares to notice this and take action, they can and will.


It's an arms race situation. Once you give me an information channel like a "word document", I've got an endless variety of ways to encode other things into it. I can encode bits as English sentences or other things that will be arbitrarily hard to pick up by scanning.

If I were Google, I wouldn't try to pick up on the content; I'd be looking for characteristic access patterns. It's harder to catch uploads, since "new account uploads lots of potentially large documents" isn't something you can immediately block, but "oh, look, here's several large files that are always accessed in consecutive order very quickly" would be harder to hide. It's still an arms race after that (e.g., "but what if I access them really slowly?"), but while Google would have a hard time conclusively winning this race in the technical sense, they can win enough that this isn't a fun or cost-effective technique anymore (e.g. "then you're getting your files really slowly, so where's the fun in that?"), which is close enough to victory for them.

So, I'd say, enjoy it while you can. If it gets big enough to annoy, it'll get eliminated.


They can just throttle access to Google documents to something like 4 GB per hour and then block obvious abuses. If people start encoding bits as English sentences, they are reducing the amount of useful data they can download within an hour, which is exactly what you want.


"They can just throttle access to Google documents to something like 4 GB per hour"

No, that's not likely to work. I'm sure there are far more legitimate users using 4 GB of documents per hour than abusers right now. You have to remember that things like bulk downloading, bulk scanning, bulk backing-up, shared automated accounts doing all sorts of legit things, etc. are all legitimate use cases. You can't just throw out all "big uses" or your enterprise customers are going to pitch a fit, and that's a bigger problem than people abusing your storage for a while.

(Those things will still have different access patterns than abusers, but thinking about how that will manifest and could be detected is a good exercise for the reader.)


I would guess those enterprise users pay for Google docs, and could be exempted from throttling on that basis.

If they don’t, Google wouldn’t lose much by throttling them, would they?


> So, I'd say, enjoy it while you can.

I'd say most certainly do not try this. Do you want to lose access to your Gmail, Maps, contacts, and whatever else you rely on Google for, because you were found abusing Google Drive?


Why not just start counting Google Doc sizes as part of the quota? That would solve the whole problem, right?


Seems like more of a "you can" than something anyone is actually using. It's always interesting to see how something can be broken or exploited, even if it may not be practical.


I doubt it's actually unlimited behind-the-scenes and you'll hit a hidden quota for your account type or get throttled down to dozens of KB/s.


I wonder why they do that. It seems to me like it would be more effort to leave the Google Docs files out of their calculation, and with no real benefit. For conventional use of Google Docs it would be hard to use a significant amount of disk space, so it's not like users would be clamoring for additional space.

Perhaps it's just marketing, trying to prize people away from Microsoft Office with a thing that doesn't actually cost them all that much?


> Perhaps it's just marketing, trying to prize people away from Microsoft Office with a thing that doesn't actually cost them all that much?

Exactly. Never underestimate how much people love something being "free", even if it only costs a fraction of a cent.


In the same spirit, I made a few "just for fun" plugins for my (now abandoned) encrypted-arbitrary-storage Dropbox-like application Syncany:

The Flickr plugin [1] stores data (deduped and encrypted before upload) as PNG images. This was great because Flickr gave you 1 TB of free image storage. This was actually super cool, because the overhead was really small. No base64.

The SMTP/POP plugin [2] was even nastier. It used SMTP and POP3 to store data in a mailbox. Same for [3], but that used IMAP.

The Picasa plugin [4] encoded data as BMP images. Similar to Flickr, but different image format. No overhead here either.

All of this was strictly for fun of course, but hey it worked.

[1] https://github.com/syncany/syncany-plugin-flickr

[2] http://bazaar.launchpad.net/~binwiederhier/syncany/trunk/fil...

[3] http://bazaar.launchpad.net/~binwiederhier/syncany/trunk/fil...

[4] http://bazaar.launchpad.net/~binwiederhier/syncany/trunk/fil...


Anything that persists can be used to store arbitrary data... I remember (around a decade ago now, I'm not sure if these still exist) coming across some blogs that ostensibly had images of books, details about them, and links to buy them on Amazon and such... I only understood when I came across a forum posting from someone complaining that his ebook searches were clogged with such "spam blogs", and another poster simply told him to look more carefully at those sites, but not to say anything more about his discoveries. You can probably guess what you got if you saved the surprisingly large "full-size" cover image from those blogs and opened it in 7zip!

I feel less hesitant about revealing this now, given how long ago it was and that more accessible "libraries" are now available.


IIRC the “mods are asleep, post […]” 4chan meme originally came from “mods are asleep, post high res” threads where to an outside observer they were just posting high-resolution images of inane things, but there was actually steganography of some sort going on to hide child porn (I think) inside the files.

One of many Internet jokes with sinister origins.


I have a feeling PNG might work on Google Photos too, but I haven't tried it.


I can't remember if I tried, but it's important that you get the exact data back that you put in, which is why JPEG obviously won't work.

BMP is the easiest to encode/decode because it's literally a bitmap of RGB, no fancy compression and such, which, if you're storing arbitrary data, is obviously not necessary anyway.

PNG was trickier, because of its "chunks" and generally more structure. And compression.
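
The packing itself is only a few lines. Something like this works for the lossless formats (a toy Pillow sketch, not the actual Syncany plugin code; the 4-byte length header is just a convention I'm assuming here):

    import math
    from PIL import Image  # third-party: pip install pillow

    def bytes_to_image(data: bytes) -> Image.Image:
        """Pack raw bytes into an RGB image; a 4-byte length header lets us strip the padding later."""
        payload = len(data).to_bytes(4, "big") + data
        side = math.ceil(math.sqrt(math.ceil(len(payload) / 3)))
        payload += b"\x00" * (side * side * 3 - len(payload))  # pad out to a full square
        return Image.frombytes("RGB", (side, side), payload)

    def image_to_bytes(img: Image.Image) -> bytes:
        raw = img.tobytes()
        length = int.from_bytes(raw[:4], "big")
        return raw[4:4 + length]

    # bytes_to_image(open("backup.bin", "rb").read()).save("backup.png")  # PNG/BMP round-trip exactly; JPEG would not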


Sounds like a great way to lose your Google account (and all your other linked Google services) for ToS violations to me.


Indeed. I love the "sorry @ the guys from google internal forums who are looking at this" line on the GitHub page. All tongue-in-cheek and aware of the situation.

TBH this is not unlike reporting a security bug to a company as a white hat, but more like a grey hat here.


I think they might not care unless this can somehow endanger their infrastructure. (It's not really unlimited, is it?)

If the few blokes using this scam their way into a few hundred terabytes of free storage, so be it; it's not worth the hassle for Google, imo.

edit: Apparently an account can create up to 250 docs a day https://developers.google.com/apps-script/guides/services/qu...


> If the few blokes using this scam their way into few hundred terabytes of free storage, so be it, it's not worth the hassle for Google, imo.

This. They probably thought of this exact scenario before adding unlimited docs. They probably even expected somebody to make a script for it. Hell, a few of them might even have a script.

As long as a lot of people don't start abusing it or make a file-sharing service based on it, then they probably won't care. Basically, not until it's a significant enough threat to their bottom line.

Ultimately, it's no different than the inevitable person that just has a script to generate garbage and upload it to Google Docs as fast as possible. That's what the 250 docs a day limit is there for.


I think you underestimate the amount of room most warez take up, and what these distributors will go through to get free space.


It seems you can only store ~177 MB of data per account per day this way. Didn't test it myself.


The examples he lists in README.md are around 1GB though (Ubuntu .iso files).


The Google docs usage is limited to 250 docs/day: https://developers.google.com/apps-script/guides/services/qu...

The README.md says that UDS can store ~710kb per doc.
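
That works out to 250 × 710 KB ≈ 177 MB per day, which matches the estimate above, so a 1 GB ISO would take the better part of a week on a single account.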

:shrug:


Yeah, hopefully they don't ban me for this.


I would make sure to not do this in an important Google account.


I've seen a few stories of businesses and their employees all losing their Google accounts, just because the company hired a freelancer who had previously been banned, and Google detected the association. (Pretty sure they got the accounts back after some public outrage.) I wouldn't risk intentionally violating their terms if you're not quite ready to wake up one day 100% Google-free, or very good at hiding your tracks.


The fact that this is even a remote possibility should worry everyone about the ugly monopoly that Google has become.

I found myself in a similar situation a couple of months ago. An Android app falsely charged me on the Play Store. After trying to contact Google for multiple weeks I gave up and disputed the charge on my credit card. This resulted in Google coming after me for $8.99 and threatening to close all my Google accounts, including Gmail, Calendar, Photos, Drive and everything I rely on daily from Google.

That was a wake-up call for me. I decided to move everything OUT of Google. That company has too much power; it should worry way more people.


Yeesh. I had the same thing happen - except I never followed through on reversing the charge on my credit card. I spent multiple hours trying to dispute $2.99 or something. Clearly not for the monetary value - just out of pure frustration!

However, I was scared for my Google account so just ended up dropping it. Ridiculous.


"An android App falsely charged me on the Play store."

I'm curious to know how this happened. Would you mind sharing more info?

As I understand it, the only way for an app to 'charge you on the play store' is to:

1) Be a paid app (in which case you pay before the app starts installing), or

2) via in-app purchases, which are handled by the app initiating the IAP, and then Play services taking over to ask for confirmation.

In either case, the transaction is only confirmed by a user action (tapping a button) with the app having no control.

Sure, it's possible for an Android app to trick you, by covering everything apart from the button with something fake, but I'd be surprised if such an app found its way into the Play Store.


Sure, I might have hit a corner case but I made an in-app purchase for a one year subscription for a service.

After using the app for a couple of days and restarting the phone, the app seemed to hit a bug and behave as if I hadn't bought the subscription, prompting me to buy another subscription, which I did, thinking that this would unblock the backend and somehow merge with the fact that I already had a subscription.

Unfortunately, Google Play charged me again for a subscription I already had. Both the app creator AND Google Play were difficult to reach. The app creators never replied to any of my emails. Google Play has an automated support website that decided that "I was not eligible for a refund" and there was nothing I could do about it. It also seems to be impossible to contact a real human being to explain the situation.


> I decided to move everything OUT of Google

Have you been successful in it? Any guidelines / tips? How hard is it?


Nextcloud, preferably on a machine you own (but there are companies selling Nextcloud hosting as well). It replaces Google Drive, Contacts, Calendar, Photos (face recognition can be done with a third-party app), has an RSS reader, bookmarking service etc. Just look at its app store, you can install any of this with two clicks: https://apps.nextcloud.com/

It really is a suite that can combat Google's suite — and you can truly own it. Other than that, DDG for search, and your own domain for email (so that you could transfer it between different hostings if necessary).

I do have a Google account, but I use it for precisely two purposes: Google Play (my phone wouldn't work without one) and YouTube subscriptions (I can use an RSS reader for this, but it's a bit inconvenient). You can create a Google account without creating a Gmail account.


May I suggest using NewPipe[1] for a Google-account-free experience to follow channels? You can import them from your current subscription list, and easily export them when you switch phone or for backup purposes.

[1]: https://newpipe.schabi.org/


The sooner you start, the better. I've moved most of my email/contacts/calendar away [0], and the longer you give yourself to catch the things you've signed up for but forgotten, the better. YouTube was also a pain, but I transitioned my subscriptions manually to a different account. Maps seems like it'd be the trickiest if you're invested. I wasn't a heavy user, and Maps still works pretty well when you're logged out.

[0] I use fastmail + custom domain, which works great, but you have to guard the domain very closely.


> [0] I use fastmail + custom domain, which works great, but you have to guard the domain very closely.

What do you mean by guarding the domain? To prevent large volumes of spam?


I think OP means that you have to make sure you don't forget to/neglect to renew it and make sure you don't accidentally lose the domain for any reason.


Thank you for the clarification. I use a dedicated card for domain hosting (with autorenewal enabled) to prevent this specific issue but I recognize most people likely don't do the same.


Spot on. Basically you now have to worry about the domain being lost or hijacked as well. For me, the flexibility to change email providers behind a domain is worth it, though.


> you have to guard the domain very closely

I'm intrigued by this, would you kindly share more on this!?!


It means if you slip up and lose your domain, nobody can send you email (including 2FA, reset password, add a new email to your account, etc). You can imagine how inconvenient that would be. I use fastmail with a custom domain and that scenario gives me nightmares.


Mostly off-topic, but related: this is one of the major reasons email needs to finally go away. It was never intended to be the backbone of people's lives in the way it has become.

Access to my email account probably gives you more access to my life and identity than my SSN [0].

I long for the day that we [1] all get assigned a public/private keypair instead of SSNs. That won't fix everything, but it's a huge step above a shared secret that is limited to 9 digits [2].

[0]: Even without signing up for a bunch of services, it's basically impossible at this point (at least in the US) to not have an email address associated with your bank account, car loan, mortgage, credit card, or even just watching TV.

[1]: "We" meaning "US citizens" or anyone else with a similar system.

[2]: I realize you also need info about the person and not just their number, but also apply that to keypairs.


> I long for the day that we [1] all get assigned a public/private keypair instead of SSNs.

What is the remedy for when someone loses or leaks their keypair?


Have the organization responsible for managing the PKI generate a new subkey from your primary key (kept in cold storage) and publish a certificate revocation for the lost/leaked subkey.

Most of our ID cards (health, driving license) already have an expiration date and the subkeys should have one anyway.


No reason you can't have more than one, either. You could even issue keys for people to act on your behalf (e.g. they get access to it on your death as part of your will).


Report in person to an issuing authority for biometric authentication. Have them issue a new one and blacklist the old public key.


Any number of things that are better than what currently happens when a SSN is leaked.


I have been doing it for a long time; the hardest part for me is all the accounts I have registered around the web linked to the email. After a few years of changing each one that mattered, I'm finally getting close to zero mail on Gmail. Search I moved to DDG; that was the easy one. Android can work fine with just F-Droid, since I noticed I rarely even use the store any more and I need just a few essential apps. For storage, I tend to store only documents and I like to use mega.nz.

The only thing I haven't managed to find an even close to decent alternative for is photos. Google Photos is simply too good. I would even be willing to pay, but really, all the other apps struggle to get sync right or have some other crappy stuff that makes them barely usable.


I used Shoebox as a great alternative to Google Photos. The problem is they just shut down.


As I wrote that comment I went on another small search, as I do every so often, and I found Canon Irista, and I have to say, I am impressed. The sync seems to work fine, it's pretty fast, and the UI of both the website and the app is pretty solid. I suggest giving it a try if you are on the lookout for a new photo hosting service.


Yandex disk is an alternative to Google photos.


Late reply but here is what I did:

- Bought a new domain name and moved my mails in fastmail. I have been super happy with it so far.

- My Gmail address is now only for spam or very low importance emails.

- All my Pixel pictures are still uploaded to Google Photos, but I backup everything once a month or so.

- I don't use Google Drive for anything anymore. I have an Evernote account and a Dropbox account.

- Completely switched to DDG and Firefox.

- I'm still using my Pixel 2 as of now, but my next upgrade will be an iPhone or a rooted, Google-independent Android phone.


- Custom mail with your domain or use Outlook online.

- Use Yandex Maps instead of Google Maps. It's very accurate and navigation is smooth. Way better than Gmaps imo.

- Use Office 365 instead of Google Docs.

- You can use One Drive for cloud storage or buy a cheap VPS.


Surely, Google has way too much power, but as the old adage goes: don't put all your eggs into one basket :)


It's unfortunate that they are the only basket in town!


Are they? Email, calendar, online office, cloud storage etc. are all available from various other companies (even besides the few big corporations). The only two areas where you'd really have to sacrifice features would be Android apps, and YouTube if you're running a channel.


Tell me which provider has an integrated single sign-on service for all of the above? Which provider has apps for their service for all major OSes (including mobile), and is mostly free (or low cost)?


The cost of free is worrying about arbitrary closure, in this instance. Fastmail is much better than Gmail, for what it's worth.


Microsoft does. OneDrive for storage, Outlook webapps does email and calendar. Office online has Word, Excel, etc. All accessed with one Microsoft account. All free.

You might not want to be tied to Microsoft but Google is not the only option.

Edit: Overlooked the comment about Apps. Microsoft offers apps for mobile, but not Linux. Although even on Windows I use the browser to access the services which will work on Linux.


Microsoft's office offerings come very close (except for the free part - which I guess is just a bonus and not a requirement). Although I have to say, despite Microsoft's attitude of keeping compatibility and old stuff working, they too could pull a Google Reader one day and deprecate/remove a needed service (along with all your data).

What's needed is syndication of data and interoperable apps, like how XMPP worked. But of course, vendors don't like this, because it turns them into a commodity.


The point is to not put all your eggs in one basket. Doesn't matter who owns that basket.


Apple.


Last I checked, iCloud apps can only run on Macs.


Yeah. The grandparent post is such a cliche in an age when the competition authorities stopped doing their job.


That gives a new meaning to "google bombing" -- a bad actor could cultivate a terrible google rating, then hire onto a low level freelance gig at a big company they wanted to bomb by association! Let's just say Oracle as a hypothetical example -- Russia, if you're listening...


If you have a significant problem with waking up one day 100% google free, you are already in trouble.


Isn't that story debunked, if we're talking about the one coming from reddit?


> Pretty sure they got the accounts back after some public outrage.

This may happen only if you manage to get it to the front page of HN or have many Twitter followers. In most cases you don't stand much of a chance, though.


I would make sure to not have any important Google account


Google can and will shut down accounts they think are run by the same actor, even if they're not explicitly linked.


Hehe, I was just thinking how simple it would be for Google to identify accounts using this technique from simple usage analytics. I suspect this won't work for long... but still super cool!


Agreed. It also sounds like a great way to (ab)use a dumb commodity via ephemeral Google accounts to distribute data.


Base85 would probably be a better choice for storing binary as text, since it has a ratio of 5:4 instead of 4:3.
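
For a quick sanity check of those ratios (Python's standard library happens to ship both encodings):

    import base64, os

    data = os.urandom(3 * 10**6)                     # 3 MB of random bytes
    print(len(base64.b64encode(data)) / len(data))   # 1.333... -> 4 output bytes per 3 input bytes
    print(len(base64.b85encode(data)) / len(data))   # 1.25     -> 5 output bytes per 4 input bytes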

On the topic of "unusual and free large file hosting", YouTube would probably be the largest, although you'd need to find a resilient way of encoding the data since their re-encoding processes are lossy.

I like the "Linux ISO" and "1337 Docs" references ;-)


Here is an implementation of arbitrary data storage using YouTube videos: https://github.com/dzhang314/YouTubeDrive


Wouldn’t YouTube re-encode the video and mess up the data?


If you look at the example video, the data is encoded in relatively large blocks that are easily recoverable after compression.


You just need enough redundancy/error correction.

Back in the day there were systems to back up to VHS tapes and those were way more lossy than YouTube https://youtu.be/TUS0Zv2APjU


I love that this exists


I want to watch some! Got any urls?



Thank the gods for base85..


You'd be at the mercy of them potentially changing their encoding scheme unannounced and corrupting your files.


Back in the day of email gateways between different networks, there used to be terrible problems with all the tin-pot dictator IBM SYSADMINs at BITNET sites who maintained their own personal styles of ASCII<=>EBCDIC translation tables, so all the email that passed through their servers got corrupted.

EBCDIC based IBM mainframe SYSADMINs on BITNET were particularly notorious for being pig-headed and inconsiderate about communicating with the rest of the world, and thought they knew better about the characters their users wanted to use, and that the rest of the world should go fuck themselves, and scoffed at all the unruly kids using ASCII and lower case and new fangled punctuation, who were always trying to share line printer pornography and source code listings through their mainframes.

"HARRUMPH!!! IF I AND O ARE GOOD ENOUGH FOR DIGITS ON MY ELECTRIC TYPEWRITER, THEN THEY'RE GOOD ENOUGH FOR EMAIL! NOW GET OFF MY LAWN!!!" (shaking fist in air while yelling at cloud)

It was especially a problem for source code. That was one of the reasons for "trigraphs".

https://stackoverflow.com/questions/1234582/purpose-of-trigr...

https://en.wikipedia.org/wiki/Digraphs_and_trigraphs

>Trigraphs were proposed for deprecation in C++0x, which was released as C++11. This was opposed by IBM, speaking on behalf of itself and other users of C++, and as a result trigraphs were retained in C++0x. Trigraphs were then proposed again for removal (not only deprecation) in C++17. This passed a committee vote, and trigraphs (but not the additional tokens) are removed from C++17 despite the opposition from IBM. Existing code that uses trigraphs can be supported by translating from the source files (parsing trigraphs) to the basic source character set that does not include trigraphs.


I always wondered what the purpose of trigraphs was, other than to help win obfuscated code contests, haha.


You'd want to make sure the coding scheme is in the identifiable visual data. Think QR codes.

Build in a bit of redundancy, and I think it would work.


It would probably work, yes. But I don't think many people want their backups powered via a service that "will probably work".


I don't think you want your backups in Google Docs either, given that Google may decide to ban you for TOS violations at any time.

I really do think videos would work, reliably, given sufficient redundancy. Again, we have QR codes already, so this is a proven idea. You can't make QR codes unreadable without removing lots of perceptual visual detail. The risk, as with using Google Docs, isn't that Google will change their encoding, but that Google will just take down the videos for service misuse.

I think it would be comparatively more difficult for Google to detect this stuff in a video compared to a text document, because you expect some videos to be long and large. The entirety of the Encyclopedia Britannica comes out to less than 500 MB in a .txt document, so using any reasonable amount of space in a Google Doc should quickly raise red flags.


That would be tough if YouTube doesn't save the originals.


YouTube probably doesn't save the originals (though they could on some cold-storage tape drives, perhaps). But even still, it's not difficult to imagine that there may at some point exist a compression algorithm, applied to existing compressed video, that could change a couple of bits around in whatever encoding scheme you've chosen. Depending on the file type, that could be enough to corrupt the whole thing.

Sure you can get around this by adding ECC, but that isn't implemented here.


> Base85 would probably be a better choice

Base64 has the advantage of relative ubiquity (though Base85 is hardly rare, being used in PDF and Git binary patches). It also doesn't contain characters (quotes, angled brackets, ...) that might cause problems if naively sent via some text protocols and/or embedded in XML/HTML mark-up.

> YouTube ... you'd need to find a resilient way of encoding the data [due to lossy re-encoding]

That should be easy enough: encode as blocks or lines of pixels (blocks of 4x4 should be more than sufficient) in a low enough number of colour values (I expect you'd get away with at least 4 bits/channel/block with large enough blocks, so 4096 values per block) and you should easily be able to survive anything the re-encoding does by averaging each block and taking the closest value to that result.

Add some form of error detection+correction code just for paranoia's sake. You are going to want to include some redundancy in the uploads anyway so you can combine these needs in a manner similar to RAID5/6 or the Parchive format that was (is?) popular on binary carrying Usenet groups.
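
A rough sketch of that block scheme, using one bit per 4x4 block of a greyscale frame to keep it simple (per the above you could likely push several bits per channel per block; the frame size here is just an assumption):

    import numpy as np

    BLOCK = 4                  # 4x4 pixel blocks, as suggested above
    WIDTH, HEIGHT = 640, 360   # assumed frame size

    def bits_to_frame(bits):
        """Paint each bit as a black or white 4x4 block in a greyscale frame."""
        cols = WIDTH // BLOCK
        assert len(bits) <= cols * (HEIGHT // BLOCK)
        frame = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
        for i, bit in enumerate(bits):
            r, c = divmod(i, cols)
            frame[r*BLOCK:(r+1)*BLOCK, c*BLOCK:(c+1)*BLOCK] = 255 if bit else 0
        return frame

    def frame_to_bits(frame, n_bits):
        """Decode by averaging each block and thresholding, which tolerates compression noise."""
        cols = WIDTH // BLOCK
        bits = []
        for i in range(n_bits):
            r, c = divmod(i, cols)
            block = frame[r*BLOCK:(r+1)*BLOCK, c*BLOCK:(c+1)*BLOCK]
            bits.append(1 if block.mean() > 127 else 0)
        return bits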


Would be cool to use the audio for some extra bandwidth too, get some Sinclair Spectrum-esque (albeit in stereo) bleeps to accompany the video.



A few years ago I also found a backup tool that converted backups to DV videos, so that you could write them to cheap DV cassettes. It was more than 10 GB per cassette. Definitely not bad for a few years ago.


Just FYI, turn your volume way down when listening to these. Wouldn't be a good idea to blow your eardrums on this.


Why not yEnc? 1-2% overhead and it's been in use on UseNet for binary storage for a very long time.


The nice thing about yEnc is that it only has to escape NUL, LF, CR, and the escape character itself ('='), so it essentially uses all but 3 characters out of the 256 possible values.

While this works over NNTP, SMTP and IMAP (and possibly POP), I'm not sure if it will work over HTTP if any of the servers use the Transfer-Encoding header.
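
The core escaping rule is tiny. A rough sketch (real yEnc also wraps lines and adds =ybegin/=yend headers with a CRC):

    CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}   # NUL, LF, CR, '='

    def yenc_encode(data: bytes) -> bytes:
        out = bytearray()
        for b in data:
            v = (b + 42) % 256        # offset every byte by 42
            if v in CRITICAL:         # only these few values need escaping...
                out.append(0x3D)      # ...prefixed by a '=' marker
                v = (v + 64) % 256
            out.append(v)
        return bytes(out)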


Just use Unicode for the optimal highest possible base 1,114,112!


Even URL shorteners offer unlimited storage if you jump through enough hoops.

To encode ABCDEFGHIJKLMNOPQRSTUVWXYZ, first get a short URL for http://example.com/ABC, then take the resulting URL, append DEF, and run it through the service again. Repeat until you run out of payload, presumably doing quite a few more than 3 bytes at a time.

The final short URL is your link to the data, which can be unpacked by stripping the payload bytes and then following the links backwards until you get to your initial example.com node.
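
In code, the chain would look roughly like this (shorten() and expand() stand in for whatever shortener API you'd abuse; both are hypothetical, and the payload length is assumed to be a multiple of the chunk size):

    BASE = "http://example.com/"

    def encode(payload: str, chunk: int = 3) -> str:
        url = BASE
        for i in range(0, len(payload), chunk):
            url = shorten(url + payload[i:i + chunk])   # each hop stores `chunk` more characters
        return url                                      # the final short URL is the handle to everything

    def decode(short_url: str, chunk: int = 3) -> str:
        pieces, url = [], short_url
        while url != BASE:
            full = expand(url)              # reverse lookup: short URL -> long URL
            pieces.append(full[-chunk:])    # peel off the payload characters
            url = full[:-chunk]             # what remains is the previous short URL (or the starting node)
        return "".join(reversed(pieces))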


I've lost track of the number of times I've seen variants on "Hey, a link shortener is a fun first project for this new language I'm learning; hey, $LANGUAGE_COMMUNITY, I've put this up on the internet now!... hey, uh, $LANGUAGE_COMMUNITY, I've had to take it down due to abuse." There are numerous abuse vectors. Optionally promise to get it back up real soon now, as if there are actually people depending on it.

Maybe it isn't a bad first project, but on no account should you put it up on the "real" internet and tell anyone it exists.


In 1998, the EFF and John Gilmore published a book about "Deep Crack" called "Cracking DES: Secrets of Encryption Research, Wiretap Politics, and Chip Design". But at the time, it would have been illegal to publish the code on a web site, or to include with the book a CD-ROM containing the "Deep Crack" DES cracker source code and VHDL in digital form.

https://en.wikipedia.org/wiki/EFF_DES_cracker

https://www.foo.be/docs/eff-des-cracker/book/crackingdessecr...

>"We would like to publish this book in the same form, but we can't yet, until our court case succeeds in having this research censorship law overturned. Publishing a paper book's exact same information electronically is seriously illegal in the United States, if it contains cryptographic software. Even communicating it privately to a friend or colleague, who happens to not live in the United States, is considered by the government to be illegal in electronic form."

So to get around the export control laws that prohibited international distribution of DES source code on digital media like CDROMS, but not in written books (thanks to the First Amendment and the Paper Publishing Exception), they developed a system for printing the code and data on paper with checksums, with scripts for scanning, calibrating, validating and correcting the text.

The book had the call to action "Scan this book!" on the cover (undoubtedly a reference to Abbie Hoffman's "Steal This Book").

https://en.wikipedia.org/wiki/Steal_This_Book

A large portion of the book included chapter 4, "Scanning the Source Code" with instructions on scanning the book, and chapters 5, 6, and 7 on "Software Source Code," "Chip Source Code," and "Chip Simulator Source Code," which consisted of pages and pages of listings and uuencoded data, with an inconspicuous column of checksums running down the left edge.

The checksums in the left column of the listings innocuously looked to the casual observer kind of like line numbers, which may have contributed to their true subversive purpose flying under the radar.

Scans of the cover and instructions and test pages for scanning and bootstrapping from Chapter 4:

https://imgur.com/a/7pHSAT1

(My small contribution to the project was coming up with the name "Deep Crack", which was silkscreened on all of the chips, as a pun on "Deep Thought" and "Deep Blue", which was intended to demonstrate that there was a deep crack in the United States Export Control policies.)

https://en.wikipedia.org/wiki/EFF_DES_cracker#/media/File:Ch...

The exposition about US export control policies and the solution for working around them that they developed for the book was quite interesting -- I love John Gilmore's attitude, which still rings true today: "All too often, convincing Congress to violate the Constitution is like convincing a cat to follow a squeaking can opener, but that doesn't excuse the agencies for doing it."

https://dl.packetstormsecurity.net/cracked/des/cracking-des....

Chapter 4: Scanning the Source Code

In This chapter:

The Politics of Cryptographic Source Code

The Paper Publishing Exception

Scanning

Bootstrapping

The next few chapters of this book contain specially formatted versions of the documents that we wrote to design the DES Cracker. These documents are the primary sources of our research in brute-force cryptanalysis, which other researchers would need in order to duplicate or validate our research results.

The Politics of Cryptographic Source Code

Since we are interested in the rapid progress of the science of cryptography, as well as in educating the public about the benefits and dangers of cryptographic technology, we would have preferred to put all the information in this book on the World Wide Web. There it would be instantly accessible to anyone worldwide who has an interest in learning about cryptography.

Unfortunately the authors live and work in a country whose policies on cryptography have been shaped by decades of a secrecy mentality and covert control. Powerful agencies which depend on wiretapping to do their jobs--as well as to do things that aren't part of their jobs, but which keep them in power--have compromised both the Congress and several Executive Branch agencies. They convinced Congress to pass unconstitutional laws which limit the freedom of researchers--such as ourselves--to publish their work. (All too often, convincing Congress to violate the Constitution is like convincing a cat to follow a squeaking can opener, but that doesn't excuse the agencies for doing it.) They pressured agencies such as the Commerce Department, State Department, and Department of Justice to not only subvert their oaths of office by supporting these unconstitutional laws, but to act as front-men in their repressive censorship scheme, creating unconstitutional regulations and enforcing them against ordinary researchers and authors of software.

The National Security Agency is the main agency involved, though they seem to have recruited the Federal Bureau of Investigation in the last several years. From the outside we can only speculate what pressures they brought to bear on these other parts of the government. The FBI has a long history of illicit wiretapping, followed by use of the information gained for blackmail, including blackmail of Congressmen and Presidents. FBI spokesmen say that was "the old bad FBI" and that all that stuff has been cleaned up after J. Edgar Hoover died and President Nixon was thrown out of office. But these agencies still do everything in their power to prevent ordinary citizens from being able to examine their activities, e.g. stonewalling those of us who try to use the Freedom of Information Act to find out exactly what they are doing.

Anyway, these agencies influenced laws and regulations which now make it illegal for U.S. crypto researchers to publish their results on the World Wide Web (or elsewhere in electronic form).

The Paper Publishing Exception

Several cryptographers have brought lawsuits against the US Government because their work has been censored by the laws restricting the export of cryptography. (The Electronic Frontier Foundation is sponsoring one of these suits, Bernstein v. Department of Justice, et al ).* One result of bringing these practices under judicial scrutiny is that some of the most egregious past practices have been eliminated.

For example, between the 1970's and early 1990's, NSA actually did threaten people with prosecution if they published certain scientific papers, or put them into libraries. They also had a "voluntary" censorship scheme for people who were willing to sign up for it. Once they were sued, the Government realized that their chances of losing a court battle over the export controls would be much greater if they continued censoring books, technical papers, and such.

Judges understand books. They understand that when the government denies people the ability to write, distribute, or sell books, there is something very fishy going on. The government might be able to pull the wool over a few judges' eyes about jazzy modern technologies like the Internet, floppy disks, fax machines, telephones, and such. But they are unlikely to fool the judges about whether it's constitutional to jail or punish someone for putting ink onto paper in this free country.

* See http://www.eff.org/pub/Privacy/ITAR_export/Bernstein_case/ .

Therefore, the last serious update of the cryptography export controls (in 1996) made it explicit that these regulations do not attempt to regulate the publication of information in books (or on paper in any format). They waffled by claiming that they "might" later decide to regulate books--presumably if they won all their court cases -- but in the meantime, the First Amendment of the United States Constitution is still in effect for books, and we are free to publish any kind of cryptographic information in a book. Such as the one in your hand.

Therefore, cryptographic research, which has traditionally been published on paper, shows a trend to continue publishing on paper, while other forms of scientific research are rapidly moving online.

The Electronic Frontier Foundation has always published most of its information electronically. We produce a regular electronic newsletter, communicate with our members and the public largely by electronic mail and telephone, and have built a massive archive of electronically stored information about civil rights and responsibilities, which is published for instant Web or FTP access from anywhere in the world.

We would like to publish this book in the same form, but we can't yet, until our court case succeeds in having this research censorship law overturned. Publishing a paper book's exact same information electronically is seriously illegal in the United States, if it contains cryptographic software. Even communicating it privately to a friend or colleague, who happens to not live in the United States, is considered by the government to be illegal in electronic form.

The US Department of Commerce has officially stated that publishing a World Wide Web page containing links to foreign locations which contain cryptographic software "is not an export that is subject to the Export Administration Regulations (EAR)."* This makes sense to us--a quick reductio ad absurdum shows that to make a ban on links effective, they would also have to ban the mere mention of foreign Universal Resource Locators. URLs are simple strings of characters, like http://www.eff.org; it's unlikely that any American court would uphold a ban on the mere naming of a location where some piece of information can be found.

Therefore, the Electronic Frontier Foundation is free to publish links to where electronic copies of this book might exist in free countries. If we ever find out about such an overseas electronic version, we will publish such a link to it from the page at http://www.eff.org/pub/Privacy/Crypto_misc/DESCracker/ .

* In the letter at http://samsara.law.cwru.edu/comp_law/jvd/pdj-bxa-gjs070397.h..., which is part of Professor Peter Junger's First Amendment lawsuit over the crypto export control regulations.

[...]


The checksum is really interesting and would be useful even today. I have looked for the scripts but the links are all gone [1], unfortunately.

EDIT: I have found it[2], finally. It's pretty sad that so much of the internet is getting forgotten, though.

[1]: https://web.archive.org/web/19980630210313/http://www.pgpi.c...

[2]: https://the.earth.li/pub/pgp/pgpi/5.5/books/ocr-tools.zip


It seems like a cute and irrelevant distinction that electronic software would be published in a book. If researchers created a computer that processed information using proteins in plant cells instead of electrons, and such a computer could execute programs on this book directly instead of “scanning” it, would not the textbook be software? When laws say “electronic versions” I don’t think they literally mean to refer electrons, but rather, computer-consumables/executables.

Was this tested before a court and did they accept this sort of obviously subversive behavior? (Not that I personally agree with the laws restricting crypto export.)


IANAL, but if the distinction clashes with the crypto export laws, does it not follow that the crypto export laws clash with the First Amendment? Which would make them unconstitutional, and then the focus should be on whether that is wanted behavior and whether the Constitution should be amended, or not.


> The First Amendment made controlling all use of cryptography inside the U.S. illegal, but controlling access to U.S. developments by others was more practical

https://en.wikipedia.org/wiki/Export_of_cryptography_from_th...


A bizarre intersection with this: I once had to prepare part of my employer's source code for registration with the US copyright office. They wanted "the first N pages" of the source code, for N of a dozen or so. After consulting with the lawyers making the filing, I ended up making a pdf that included main() and the first few functions that it called until I got up to N pages.


If you save digital data (ASCII) in an analog form like a cassette tape, is that okay? It seems you could alternatively put metallic strips in a book. What about QR codes? Could you have a massive QR code on each page which contains a section of source code? Could you use an alternative encoding like dots and lines (.||.||....|.|.|..|) to represent 1s and 0s that is easy to scan (and doesn't require OCR/checksums)?

To what extent does analog encoding fall under the illegal threshold?


This is exactly what I am getting at. For the most extreme example, consider a swarm of nano bots hovering in the atmosphere that implement a computer that can understand and directly execute algorithms spoken in human speech transmitted through pressure fluctuations. There is no distinction that can universally separate speech and computer programs.


> The checksums in the left column of the listings innocuously looked to the casual observer kind of like line numbers, which may have contributed to their true subversive purpose flying under the radar.

Are you implying there's something more interesting there than just the DES source code and related data that the book already very clearly claims to contain?


I don't think so, I think the OP's just trying to be dramatic?


PGP already did this a few years before. It wasn't a secret what was being done; there was just no legal recourse for stopping it.

It's a bit dramatic to imply this was covert in any way.


AFAIK Phil Zimmermann was one of the first to do it, in '95 through MIT Press—when his PGP circulated a bit too widely for the export regulations. However, the question of whether he was protected under the First Amendment was never decided in court.


Quite an interesting read, however "Deep Crack" is a horrible name.


I think in the long run, users could risk their complete Google account if Google begins treating the uploads as a violation of the ToS.

I advise a totally separate account when using this tool.

But anyways, something inside me likes it. Nicely done. Good job :)


At least a totally separate account. Probably better to use a totally separate set of IP addresses and browsers and maybe even computers. Google will definitely link accounts created from the same browser and potentially ban your main account if you violate their TOS on another account also owned by you.


This is a complete hack job and probably useless if Google changes free storage for docs.

That being said, they currently allow the guys at /r/datahoarder to use gsuite accounts costing £1 for life with unlimited storage quotas. These are regularly filled to like 50TB and Google doesn't bat an eye.


As a data hoarder myself with somewhere around 300TB on G Suite Business, please tell me more about those £1 for life accounts!


Search them up on eBay! Loads of IT admins for schools will sell you a random email. Pretty scammy.


I'm kinda scared to try it, since Google could mass-ban all of the accounts if they want to, but it sure is a great job from the dev.

I didn't know this was possible.


Anyone remember Gdrive? I can’t find it now, but I think it was probably early or mid 2000s. It let you store files as a local disk (FUSE) via Gmail attachments.


I remember using it back in 2005 iirc, and it was amazing. The files had a label called gmailfs.gDisk which is how it could keep the "file system" separate from the rest.

Now Google generously offers Drive with 15 GB of space.


15 gigs would've been generous in 2005, now it's not.


Are there competitors who offer more for free?


Mega.co.nz


Yes! That’s the one.


I do! :)

"Gdrive" (here: http://pramode.net/articles/lfy/fuse/pramode.html ) and the "Gmail Filesystem"/"GmailFS" (here: https://web.archive.org/web/20060424165737/http://richard.jo... as mentioned elsewhere in this thread) were both built on top of `libgmail` (here: http://libgmail.sourceforge.net/ ) a Python library I developed.

There were a couple of different projects at the time (listed in "Other Resources" on the project page) that sought to provide a programmatic Gmail interface.

I still have a "ftp" label in Gmail (checks notes 15 years later...) from the experimental FTP server I implemented as a libgmail example. :D

The libgmail project was probably the first project of mine which attracted significant attention including others basing their projects on it along with mentions in magazines and books which was pretty cool.

I think my favourite memory from the project was when Jon Udell wrote in a InfoWorld column ( http://jonudell.net/udell/2006-02-07-gathering-and-exchangin... ) that he considered libgmail "a third-party Gmail API that's so nicely done I consider it a work of art." It's a quality I continue to strive for in APIs/libraries I design these days. :)

(Heh, I'd forgotten he also said "I think Gmail should hire the libgmail team, make libgmail an officially supported API"--as the entirety of the "team" I appreciated the endorsement. :) )

The library saw sufficient use that it was also my first experience of trying to plot a path for maintainership transition in a Free/Libre/Open Source licensed project. I tried to strike a balance between a sense of responsibility to existing people using the project and trusting potential new maintainers enough to pass the project on to them. Looking back I felt I could've done a better job of the latter but, you know, learning experiences. :)

My experiences related to AJAX reverse engineering of Gmail (which was probably the first high profile AJAX-powered site) later led to reverse engineering of Google Maps when it was released and creating an unofficial Google Maps API before the official API was released: http://libgmail.sourceforge.net/googlemaps.html

But that's a whole other story... :)

    </nostalgia>


Yeah now look for https://github.com/vitalif/grive2

AFAIK it mostly still works. The older "grive" might not.


Me and a friend came up with a similar idea of a sort of distributed file system implemented across a huge array of blog comment sections. Of course you’d need a bunch of replication and fault tolerance and the ability to automatically scrape for new blogs to post spammy-looking comments on, but I thought it was a pretty funny and neat idea when we came up with it.


I heard about a subreddit a while ago, where every post/comment was a random string. It was speculated at the time that something similar was going on.


It's even more interesting to think about this in the context of preserving banned information for future generations. For example, if all the countries in the world united to ban the New Testament. But you eventually realize the ephemeral nature of the net will probably prevent it from fulfilling such long-term data-archiving roles and you're better off burying manuscripts deep underground.


The thought of a distributed MySQL cluster accessed over various versions of WordPress-as-a-database-layer just makes me happy and confused.


Even scarier than that would be a Turing-complete language where the code is stored in, and memory is written to, comment sections. The actual execution could be done by reading comments, executing functions, and writing comments back to store working memory and results. I guess with encryption you could even hide what you're doing.


So UseNet?


Very neat, but it seems to me the issue with all wink-wink schemes like this is that you're ultimately getting something that wasn't explicitly promised, and so might be taken away at any time. So while interesting you couldn't really ever feel secure storing anything that mattered this way.


Yea but you could store unlimited backups across multiple accounts. (Not advocating this however)


You can also just use GSuite with a few users to get unlimited Google Drive storage.

https://support.google.com/a/answer/139019?hl=en#6_storage


Relevant video on this unlimited plan and the way it is capped:

https://youtu.be/y2F0wjoKEhg

tl;dw Upload is limited to 750GB per day per account


Small correction: that limit is per user, not per account!

You need 5 users for GSuite Business; if you use those alone, the limit is now 3,750 GB/day.


That references "for education", but it's also true for GSuite Business (and enterprise, but not basic). You'll need to be paying for at least 5 users, or $60/month.



From what I understand, the 5-user minimum isn't enforced.


Yup, I'm storing 42TB on there at present.


Someone is going to notice a few accounts with insanely high storage usage, and then comes the ban-hammer. Enjoy losing your Google account!


I think that depends on their tools and how they evaluate data usage. If the reporting states that the accounts are using very little storage, because it's using the same measuring stick that the client does, then it's invisible. The question comes up during an audit of the system when the disk usage doesn't match the report. Then again, if this is used by few people it may just look like a margin of error.


It'll more be that the Google Docs "live editing" backends are expensive in terms of disk and memory. They store complete version history with each keystroke of a document.

There's a good chance a megabyte of "document" costs Google a gigabyte of internal storage...


> They store complete version history with each keystroke of a document.

I would expect them to only store a diff between each version instead of storing the whole thing. Couldn't find much about this after a quick search.


They don't store a complete version history. It just uses checkpoints in their timeline of real-time edits and computes the differences when you need them. Those deltas can also be compressed.


It's naive to think they don't compress.


From the project page:

> sorry @ the guys from google internal forums who are looking at this


Honestly this isn't groundbreaking; we have been using base64 to convert binary to ASCII as a way of "sharing" files all the way back to the USENET days. While applications like this make it easy for the masses to participate in the idea, they don't bring anything new to the table.

That all said, this is really cool from a design perspective, and I pored over the code and learned a lot.


It's also how email attachments work.


Google Docs allows you to upload images from your computer. Why not just do that? With proper steganography no one will bat an eye at a few docs with some multi-megabyte pictures.


I had an (evil; don't do this) idea a while back to create a Dropbox-like program that stores all your data as binary chunks attached to draft emails spread across an arbitrary number of free email accounts.


This existed just after gmail launched. Can't recall the name of the program, but I played around with it to store a few hundred MB in a test account.



That might have been it!


Yeah, there were a couple of different ones; I wrote a bit more about them in a comment here: https://news.ycombinator.com/item?id=19917018



I had an evil idea to create a key/value storage using HN dead comments.


Definitely would make an interesting learning exercise--I learned way more about the SMTP/POP protocols* than I'd ever known before when I implemented demonstration SMTP/POP servers for my libgmail library, back before Gmail offered alternate means of access.

These days there's even the luxury of IMAP. :D

[*] About the only thing I remember now is the `HELO` and `EHLO` protocol start messages. :)
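If anyone wants to poke at that handshake today, here's a rough sketch of an EHLO exchange against a throwaway local server (the host/port are just what Python's old debugging SMTP server uses; adjust to taste):

    import socket

    # Assumes something is listening locally, e.g.:
    #   python -m smtpd -n -c DebuggingServer localhost:1025
    sock = socket.create_connection(("localhost", 1025))

    def command(line):
        sock.sendall(line.encode() + b"\r\n")
        return sock.recv(4096).decode()

    print(sock.recv(4096).decode())       # 220 greeting from the server
    print(command("EHLO example.test"))   # 250 reply listing capabilities
    print(command("QUIT"))                # 221 goodbye
    sock.close()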


Did this as a college project with a friend, was pretty fun.

Nowadays stuff like Dropbox is much more convenient and reliable.


I may be mistaken, but as far as I'm aware Google docs synced to your local machine are nothing more than links to documents in the Google Drive cloud. None of the data inside those docs is actually stored locally. I found this out the hard way when I decided to move away from GD and lost a lot of files.

So buyer beware I guess.


Should you want to move from Google services, the best way of ensuring you keep your data is to use Takeout [1], which exports your documents as both doc and html files.

[1] https://takeout.google.com


You could do the same thing with QR codes in Google Photos; the compression required for unlimited storage wouldn't affect them.
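Something like this, presumably (the qrcode and pyzbar packages are third-party, the file names are placeholders, and whether the recompression really leaves the codes readable is the part you'd have to test):

    import base64
    import qrcode
    from PIL import Image
    from pyzbar.pyzbar import decode

    chunk = open("backup.bin", "rb").read(1024)   # small chunk; QR capacity is limited
    qrcode.make(base64.b64encode(chunk).decode()).save("chunk_000.png")

    # Later: pull the image back down and recover the bytes.
    recovered = base64.b64decode(decode(Image.open("chunk_000.png"))[0].data)
    assert recovered == chunk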


Related: unlimited private incremental storage on Usenet (concept):

https://gist.github.com/retroplasma/264d9fed2350feb19f977575...

TL;DR: An alternative to NZB, RAR and PAR2. Private "magnet-link" that points to encrypted incremental data.


Reminds me of the old programs that would turn your Gmail storage into a network drive by splitting everything into 25MB chunks. Utterly miserable experience with terrible latency and reliability.


Yeah, there were a couple of projects that implemented that functionality (mentioned more in my comment https://news.ycombinator.com/item?id=19917018 if you're interested).

Also, "Utterly miserable experience with terrible latency and reliability." is such a great customer endorsement quote. :D




I couldn't find it with a quick search, but I remember many years ago someone creating a similar scheme for storing files inside of TinyURLs.

You would run the uploader and get back a list of TinyURLs that could then be used to retrieve the files later with a downloader.

But you couldn't store too much in each URL so the resulting list could be pretty big.


This is a favorite lunch topic at work. AFAIK we stumbled on the idea ourselves, but I'm not surprised to hear it's unoriginal. Rather than a list, our design is a tree structure where leaf nodes contain data and branch nodes contain lists of tinyurls...
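For the curious, the shape of it is roughly this; shorten() is a stand-in for whatever URL-shortener API you'd actually abuse (entirely hypothetical here):

    import base64

    CHUNK = 200  # characters of payload that comfortably fit in one shortened URL target

    def shorten(text):
        """Stand-in for a real URL-shortener call; returns the short URL."""
        raise NotImplementedError

    def store(data):
        # Leaves hold base64 chunks; the root branch holds the list of leaf URLs.
        encoded = base64.b64encode(data).decode()
        leaf_urls = [shorten(encoded[i:i + CHUNK]) for i in range(0, len(encoded), CHUNK)]
        return shorten(",".join(leaf_urls))   # deeper trees: branches pointing at branches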


Someone also created a filesystem using DNS caches of others to store the files: https://news.ycombinator.com/item?id=16134041


Because I'm sure Google has NO data on pathological Docs file sizes. I can't wait for the follow-on 'Google banned my account with all my life's data that I didn't back up anywhere for no good reason'.


I wonder if this could be used to create a P2P network like BitTorrent, except trackers point to blocks at Google Doc URLs instead of peers/seeds.


I discovered that a lot of pirate stream sites are already doing something similar (but not exact) to this.

They store fragments of movies (rather than the full videos) in Google Drive files and then combine them together during playback. Each fragment could then be copied and mirrored across different accounts, so if any are taken down they can just switch to another copy. Pretty clever (albeit abusive) solution for free bandwidth.


Very cool! About a year ago I had a similar idea, but to store arbitrary data in PNG chunks[1] and upload them to "unlimited" image hosts like IMGUR and Reddit.

[1] http://blog.brian.jp/python/png/2016/07/07/file-fun-with-pyh...
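The PNG chunk trick is pleasantly small; something along these lines (the chunk name "prVt" and the file names are made up; a lowercase first letter marks the chunk as ancillary, so well-behaved decoders just skip it):

    import struct, zlib

    def add_chunk(png_path, out_path, payload, chunk_type=b"prVt"):
        raw = open(png_path, "rb").read()
        iend = raw.rindex(b"IEND") - 4   # back up over IEND's 4-byte length field
        chunk = (struct.pack(">I", len(payload)) + chunk_type + payload
                 + struct.pack(">I", zlib.crc32(chunk_type + payload) & 0xFFFFFFFF))
        open(out_path, "wb").write(raw[:iend] + chunk + raw[iend:])

    add_chunk("flower.png", "flower_plus_data.png", b"arbitrary bytes go here")

Whether a given image host strips unknown chunks when it re-encodes is the real gamble, of course.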


I have a feeling PNG might work on Google Photos too, but I haven't tried it.


If you wanna give it a shot, try the code I linked here: https://news.ycombinator.com/item?id=19916126

Although if Picasa (predecessor to Google Photos) worked with BMP, it may be better to do that because it's much easier and more space efficient to encode arbitrary data in than PNG.


So, are the ~700 kB files too small to register as taking up any space?


Google Drive doesn't count docs, spreadsheets, or presentations against your quota.


Not so unlimited given these restrictions: https://developers.google.com/apps-script/guides/services/qu...


Could someone please ELI5 how Google Drive doesn't include text files toward usage?


These aren't text files, but Google Docs files, which Google doesn't count against an account's quota.


OK, I did see that.

But I don't understand why Google would do that. For most users, aren't Google Docs files a substantial part of their usage? Or do people mainly store backups?


Simple: Google wants to encourage people to use their office suite, so they indirectly subsidise it in this way.


FTA "A single google doc can store about a million characters. This is around 710KB of base64 encoded data."

This means that, to reach the 15GB of drive space given away for free, they'd need on the order of 20,000 Google Doc files if those counted toward your space limit. I doubt a lot of paying customers even reach that.

The real limit (file size) is reached by binaries. Videos and PDFs, usually.


I see.

But then, I suspect (as others note) that Google will notice when you have 50GB of Google Docs full of base64.

Seems an iffy way to store stuff.


There's a good chance they won't. For privacy reasons, engineers can't just start peering at your files.

They'd have to write a base64 detector and automate the detection and banning of accounts without an engineer ever seeing your files.

Any bugs in that code, and they'll ban innocent people.


Wouldn't it be simpler to just set a generous limit for the number of Google Docs files? Say, 15 thousand?


I bet implementing such a limit would be 3 or more months of engineering effort.

Think about the difficulties. It has to take into account shared directories. It has to know about systems which auto-create documents (like results sheets for Google forms). It has to work with gsuite sysadmins who need to take ownership of files from deleted accounts. The UI to show when you have hit the limit has to be designed. And the support team has to be trained on how to resolve that error. And you're going to have to get that error message translated into 30 languages. Users already over the limit are going to be unhappy - are you going to write extra code to give them a grace period? How will you notify them of that? Will you have a whitelist of users allowed to go over the limit? How will you keep the whitelist updated and deployed to the servers? Who will have access to add/remove people from the whitelist?

The actual system itself has race conditions: what if that 15,000th file were simultaneously created in the USA and Europe? There is no way to prevent that without a cross-ocean latency penalty. Do you want to pay that penalty for every document creation? How do you deal with a net-split where cross-ocean network traffic is delayed?

Finally, how will you monitor it? Will you have logging and alerting for users hitting the limit? Will there be an emergency override for engineers to remove the limit if necessary?

At big-web-service scale, simple engineering problems become complex problems fast...


OK, not simpler.

So what? They might end up nuking some accounts, when some symptom pops up. And there'd be no recourse, whether or not it was a false positive.


I've wondered if someone could do the same thing with videos and JPGs. Amazon Prime, as one example, allows you to store an unlimited number of image files for "free". What if there was a program that would take a video file, split it up into its individual frames as JPGs, and store them on Amazon Prime? When you wanted to watch the video, the program would rebuild the video file from the individual JPGs stored there.
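The mechanical part is easy enough with ffmpeg; a rough sketch (paths and frame rate are placeholders, and audio is conveniently ignored):

    import os, subprocess

    os.makedirs("frames", exist_ok=True)
    subprocess.run(["ffmpeg", "-i", "movie.mp4", "frames/%06d.jpg"], check=True)
    # ...upload frames/*.jpg to the photo service, download them again later...
    subprocess.run(["ffmpeg", "-framerate", "24", "-i", "frames/%06d.jpg",
                    "-c:v", "libx264", "rebuilt.mp4"], check=True)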


My guess would be that the latency of this approach would be far too high to be practical. But you could probably abuse the JPEG format to stuff bits of the video into image files. I think you'd probably still need to spend a fair amount of time buffering before you could start watching without lag.


To me it looks like iodine (https://code.kryo.se/iodine/): very nice as a hacking tool to prove a point, but unlikely to be actually helpful in all but very peculiar situations. As a hacker, of course, I value a lot the first part of it!


I do the same on a different cloud storage provider. I won't name it because I don't want to be banned from it!


I don't know if anyone remembers, but some years ago I saw a file compressed from 1GB to 1MB. And I was amazed.


On the edonkey network, the file size would be reported raw but the clients could compress and transfer chunks to each other. Some guy had created an empty IL-2 sturmovik iso and seeded it. We lived at a government facility with ill-policed high speed (for the time) internet but even then I knew that I didn’t have a 400 Mbps connection. Maybe 2002/2003.

The whole thing only transferred a few kB. It looked like an entire disc though.


Maybe I'm missing a reference or joke here, but the size of a file means little with respect to how much it can be compressed. You can get a 1 petabyte file down to a few bytes if it's just `\0` repeated over and over.
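E.g., a megabyte of zeros squeezes down to roughly a kilobyte with nothing fancier than zlib:

    import zlib

    blob = b"\x00" * 1_000_000
    print(len(blob), "->", len(zlib.compress(blob)), "bytes")   # roughly 1 KB out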


42.zip?



Nice! I did a similar hack to get unlimited Dropbox space by creating many accounts and distributing the files across the accounts.

https://github.com/WarrenGreen/InfiniteDrop


How is this different from encrypting the binary locally and storing the result as hex strings?


Base64 is 75% efficient (6 of 8 bits per character) versus hex's 50%, so the hex version would be half again as large. And it's automated.
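Quick sanity check of the two encodings (sizes are exact for a 300-byte payload):

    import base64, binascii, os

    data = os.urandom(300)
    print(len(binascii.hexlify(data)))   # 600 characters: 2 per byte (4 bits/char)
    print(len(base64.b64encode(data)))   # 400 characters: ~1.33 per byte (6 bits/char)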


Correction: unlimited as long as it takes Google to fix this oversight in quota calculation.


It isn't truly an oversight, it's an abuse of the fact that Docs/Sheets/Slides are not counted toward your quota. Their storage model is a little more complicated than a standard stream of bytes like an image or a text file.


Has anyone actually tried storing a large amount of data like this? I feel like creating a new google account and using it as a backup for a 300gb folder I have.


Yes. It's called: Post to alt.binaries.* on Usenet.

It's effectively the same thing under the hood. Binaries are split and converted to text using yEnc (or base64, et al.) and uploaded as "articles". An XML file containing all of the message-IDs (an "NZB") is uploaded as well so that the file can be found, downloaded, and reassembled in the right order.

This form of binary distribution has been around since the '80s if you change some of the technical details; e.g. using UUencode rather than yEnc.

Spend $5 for a 3-day unlimited Usenet account with e.g. UsenetServer.com and upload it.

If you want it to stay up, then make another account in 3925 days (the retention period), download it, and then reupload it for another 10+ years of storage.


I would not, in any way, consider this a backup.

If it’s only 300GB check out Backblaze B2. It would cost you $1.5 per month for that amount of data.


This is very clever, and a neat interface.

On the one hand, I think this is great, on the other, I hope it doesn't force google to add limits that bother me in the future :P


Couldn't you also embed data into images and upload them to Google photos, or is that discarded when they convert and compress the image in the backend?


Depends how you encode it. A bunch of QR codes, no problem. But encoding into the individual pixels, probably not so much.


I mean including binary data in an image file. So you would have a 300x300px JPG picture of a flower that's 20MB, which you could unpack into a binary file.


Storing the original photo will use your Google Drive space quota (if you don't have a Pixel phone).

Only "High quality" (reduced size) is unlimited.

So, if your data survives the photo compression, you can do it.


Check the issues! Some people have tried quite hard to figure that one out.


I think I already saw this a few (>6) months ago on reddit, have you changed/improved anything in the meantime?


Damn 4:3? That ain't too bad.


Base64 gives you 6 bits per character. Assuming a character requires 8 bits to store, e.g. in UTF-8, then yep, that's 8:6. Might be better with compression, getting you closer to 1:1.
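Rough check of both claims, assuming any reasonably compressible input file:

    import base64, zlib

    data = open("some_text_file.txt", "rb").read()   # placeholder path
    print(len(base64.b64encode(data)) / len(data))                  # ~1.33: the 8:6 overhead
    print(len(base64.b64encode(zlib.compress(data))) / len(data))   # often < 1.0 for text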


Already better than S3 :) Makes me think of https://www.reddit.com/r/DataHoarder/


So what makes the data not count against the usage?


It's such a weird trick that I love it!


Google UseNet.


Thinking outside the box.


Nice!


very clever, well done!


genuinely sorry if you're a Google employee (probably won't put this on my Internship CV)


yeah! let's hope no google employees see this ... oh wait


:)


ELI5 please?


The script splits the file into small base64 chunks that are stored as "documents" (MIME type: application/vnd.google-apps.document), which apparently don't count against the Google Drive quota.
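Roughly this, on the splitting side (a sketch only; actually creating the docs goes through the Drive API, the input path is just an example, and the ~1,000,000-character figure comes from the project README):

    import base64

    DOC_CHARS = 1_000_000   # approximate capacity of one Google Doc, per the README

    def split_for_docs(path):
        encoded = base64.b64encode(open(path, "rb").read()).decode()
        return [encoded[i:i + DOC_CHARS] for i in range(0, len(encoded), DOC_CHARS)]

    chunks = split_for_docs("backup.tar.gz")   # each chunk becomes one "document"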


Google, give this man a job!


Too bad there is no similar trick for atmospheric carbon.


everyone can go fuck off



