

Gmail Ran out of space - prabodh
http://daggle.com/pondering-email-conservation-hitting-gmails-storage-limit-1395

======
icey
I _hate_ the fact that I can't search / sort by size in my gmail account. I'm
sure I have at least 1gb of space that's being wasted by attachments that I no
longer care about; but I don't want to go through all my messages to see what
I want to delete and what I want to keep.

~~~
TheElder
What you can do is add your gmail account as an IMAP account in Outlook or
another email client and sort it there. Hard to believe that we must do that
because of their anti-sorting stance.
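
For what it's worth, once the account is reachable over IMAP you may not even need a full client: the IMAP SEARCH command has a LARGER key. A rough sketch in Python, assuming a connection object that behaves like the standard library's imaplib.IMAP4 (the function name, the threshold and the mailbox name in the comments are my own placeholders, not anything Gmail documents here):

```python
def find_large_messages(conn, min_bytes):
    """Return ids of messages larger than min_bytes in the selected mailbox.

    conn is expected to behave like an imaplib.IMAP4 connection on which
    login() and select() have already been called.
    """
    # LARGER is a standard IMAP SEARCH key; the size is given in bytes.
    status, data = conn.search(None, "LARGER", str(min_bytes))
    if status != "OK" or not data or not data[0]:
        return []
    return data[0].split()

# Usage sketch (needs network access and real credentials, so left commented out):
# import imaplib
# conn = imaplib.IMAP4_SSL("imap.gmail.com")
# conn.login("user@example.com", "password")
# conn.select('"[Gmail]/All Mail"', readonly=True)  # mailbox name is an assumption
# print(find_large_messages(conn, 5 * 1024 * 1024))
```

This only finds the big messages; deciding which ones to delete is still a manual job.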

~~~
drawkbox
It isn't really an anti-sorting stance. If you have worked on appengine and
with Google File System you will see that it is fast almost solely because
there is no complete dataset. So sorting is really tough on 64K or 64MB chunks
spread around many many machines. Google is so fast and scalable exactly
because there is no complete dataset easily attainable. GFS works in 1000 item
chunks and that is about the best you can get with that setup. Searching,
counts/increments and metadata about ALL of your data in the GFS or email in
this case, is a tough problem. The same underlying data system is used for
gmail, reader, etc.

~~~
icey
I wonder how difficult it would be for them to create a metadata file per
account that would only contain things like sent from, sent to, size, subject,
date sent (standard email metadata). Then if someone wants to go digging
around, send the metadata down as JSON and let the sorting happen client side
and let the client return the order the messages should render.
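
Something like this is what I have in mind for the client side. It is only a sketch: the JSON field names and the sample messages are made up, and I'm writing it in Python rather than browser JavaScript just to keep it short.

```python
import json

# Hypothetical per-account metadata payload; every field name here is an assumption.
metadata_json = """
[
  {"id": "m1", "from": "alice@example.com", "subject": "Photos", "size": 10485760, "date": "2009-06-01"},
  {"id": "m2", "from": "notify@example.com", "subject": "New comment", "size": 4096, "date": "2009-07-12"},
  {"id": "m3", "from": "bob@example.com", "subject": "Slides", "size": 2097152, "date": "2009-05-20"}
]
"""

def order_by_size(payload):
    """Parse the metadata and return message ids, largest message first."""
    messages = json.loads(payload)
    messages.sort(key=lambda m: m["size"], reverse=True)
    # Only the ordering goes back to the server, not the message bodies.
    return [m["id"] for m in messages]

print(order_by_size(metadata_json))  # ['m1', 'm3', 'm2']
```

The server never has to sort anything; it just renders messages in the order the client hands back.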

~~~
drawkbox
It is possible but it also means creating more data. I could easily see this
being a pay element to gmail in the future. Basically it could be almost like
a bot/spider/worker that gets a fairly recent snapshot of your data as
metadata and allows you to clean up or organize based on it. It could be
pretty tasking though and may need a lot of engineering because right now your
gmail data might be spread across thousands of machines. With more data/years
this is only going to get more tasking and costly.

Up to now we have been living in a relatively small sandbox in terms of data.
As our lives spread to terabytes of data and across many services, we too
will be unable to run atomic operations on the whole of our data.

------
yangyang
Kind of misleading title - it implies that the whole of GMail ran out of
storage space, whereas it was in fact just this guy.

------
Locke1689
This article is far too long for something that can be summed up in these
words: I am an abnormally large user of my email. Google's "buy more space"
feature is broken because of a bug. It should be fixed.

~~~
algorias
Not exactly. The point is that the rate of gmail's storage increase is so low
that more and more people will inevitably bump against the limit in a couple
of years.

It's not about how full the average inbox is today, but where it's headed.

~~~
Locke1689
Not really. Look at the stuff he's deleting -- hundreds of Facebook
notifications = 10s of MB. He goes through and deletes thousands of emails and
doesn't really recover more than 100 MB. My first feeling when I read this was
that he's obviously deleting the wrong stuff. What he should really be looking
for are those picture album emails at 10 MB each. That's what's taking up his
space and that's what most people aren't doing. Google is practicing the old
idea of 80/20 and doing it very well.

------
jsz0
Simple solution with an additional filter rule:

[x] Delete message after __ days

GMail's web UI is really overrated. Missing many features that desktop clients
and even other webmail platforms have offered for years. I still prefer
traditional clients with the web UI as a nice fallback.

~~~
pyre
I do enjoy the 'conversation' view though. It's nice that it will pull in the
messages that I sent into a thread, rather than just relying on people quoting
the whole message in replies.

~~~
absconditus
If you use OS X, Postbox will do this.

<http://www.postbox-inc.com/>

I'm sure clients exist for other platforms which do the same.

~~~
pyre
I usually use mutt outside of the Gmail interface, but IIRC there is a Ruby-
based console mail reader that aims to do the 'conversations' thing.

~~~
mkelly
<http://sup.rubyforge.org/>

------
marze
Regarding wasting storage on crud, I would be very surprised if the gmail
storage system does not de-duplicate large attachments that have been
forwarded around.

It would not be difficult to store one copy of every attachment or identical
newsletter and link to it. This has to decrease their storage requirements
dramatically.

~~~
patio11
Off the top of my head:

1) What if the disk your canonical copy is on dies?

2) You just introduced a dependency for cross-account reference counting
(Alice and Bob share CryptoBible.doc attachment, Alice deletes the message it
is attached to, Bob better still be able to access it). Your programmers will
love you for that one. Don't forget all sorts of fun edge cases like "Does a
suspended account peg an attachment on disk for forever, on the theory they
could be unsuspended?"

3) Define "identical newsletter". No, really, pretend this is a job interview
question. _waits_ Did you take into account the headers? _waits_ How about the
salutation (Dear $FIRST $LAST,)? _waits_ How about the tracking pixels and
unsubscribe links, which are by necessity personalized?

4) What is the performance impact of the algorithm to determine "identical",
with respect to your kinda-sorta-not-really "identical" developed in question
#3? Can it classify billions of messages in realtime to make nigh-
instantaneous retain-or-delete-and-add-pointer decisions?

5) Does this algorithm introduce multiple single-points-of-failure into Gmail
to save a fraction of a sliver of the costs of running the service?
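
To make point 2 concrete, here is roughly the bookkeeping you would be signing up for. This is a toy in-memory sketch with invented class and method names; it ignores replication, concurrency and every edge case above.

```python
class SharedAttachmentStore:
    """Toy cross-account reference counting for one shared copy of each attachment."""

    def __init__(self):
        self.blobs = {}    # attachment key -> bytes
        self.owners = {}   # attachment key -> set of accounts still referencing it

    def attach(self, account, key, data):
        # Store the bytes only once, then record that this account references them.
        self.blobs.setdefault(key, data)
        self.owners.setdefault(key, set()).add(account)

    def delete(self, account, key):
        holders = self.owners.get(key, set())
        holders.discard(account)
        if not holders:
            # Last reference gone: only now may the bytes be dropped.
            self.blobs.pop(key, None)
            self.owners.pop(key, None)

    def read(self, account, key):
        if account not in self.owners.get(key, set()):
            raise KeyError(f"{account} has no reference to {key}")
        return self.blobs[key]


store = SharedAttachmentStore()
store.attach("alice", "CryptoBible.doc", b"secret plans")
store.attach("bob", "CryptoBible.doc", b"secret plans")
store.delete("alice", "CryptoBible.doc")
print(store.read("bob", "CryptoBible.doc"))  # Bob can still read it
```

Now imagine the two dictionaries living on different machines and the attach and delete calls racing each other.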

~~~
blasdel
1) Why do you think Google has _the_ canonical copy on _a_ disk?

2) Why do you think Google ever intentionally deletes anything? Concurrent
deletes are hard and expensive no matter what, and all of Google's tools make
it much more so. They've gotten in all kinds of trouble in the EU for not
being able to provide guarantees about how long they retain data.

3) Store headers, body, and MIME chunks separately. Your datastore will make
hashes for its own reasons, and can make the de-duplication decisions
independently of your app.

4) Your datastore is using hashes as the keys for retrieving the data already.

5) No, because there is no algorithm for this feature -- you got it for free
because of other architectural decisions.
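
A minimal sketch of what I mean by points 3 and 4, assuming a content-addressed store keyed by SHA-256. The hash choice and all the names are mine for illustration; I'm not claiming this is what Google actually runs.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: the key of a chunk is the hash of its bytes."""

    def __init__(self):
        self.chunks = {}

    def put(self, data):
        key = hashlib.sha256(data).hexdigest()
        # Identical bytes hash to the same key, so a second put stores nothing new.
        self.chunks.setdefault(key, data)
        return key

    def get(self, key):
        return self.chunks[key]


def store_message(store, headers, body, attachments):
    """Store headers, body and each MIME part separately; return their keys."""
    return {
        "headers": store.put(headers),
        "body": store.put(body),
        "attachments": [store.put(part) for part in attachments],
    }


store = ContentStore()
m1 = store_message(store, b"To: alice", b"Hi Alice", [b"PDFBYTES"])
m2 = store_message(store, b"To: bob", b"Hi Bob", [b"PDFBYTES"])
print(m1["attachments"] == m2["attachments"])  # True: one stored copy
print(len(store.chunks))                       # 5: two headers, two bodies, one attachment
```

The personalized headers and salutations end up as their own small chunks, and the big attachment is shared without the mail app ever comparing messages.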

~~~
pyre
> _They've gotten in all kinds of trouble in the EU for not being able to
> provide guarantees about how long they retain data._

The thing that I don't get is why they can't just 'wipe' the data. Maybe I'm
misunderstanding GFS/BigTable here, but you don't necessarily need to 'remove'
the information from that database. Just 'zero out' the data in there (or
overwrite it with random data).

~~~
blasdel
For starters, there are tons of indexes and snippets extrapolated from the
data, and they're cached all over the place.

More fundamentally, why do you think any of the records are mutable at all?
You reap enormous benefits at that kind of scale by making all writes log-
structured.
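
To illustrate (a toy sketch with invented names, not a description of GFS or BigTable internals): in a log-structured store a delete is just one more appended record, so the old bytes are still sitting in the log afterwards.

```python
class AppendOnlyLog:
    """Toy log-structured store: records are only ever appended, never changed."""

    def __init__(self):
        self.records = []

    def put(self, key, value):
        self.records.append(("put", key, value))

    def delete(self, key):
        # A delete is a tombstone record; the earlier put is left untouched.
        self.records.append(("delete", key, None))

    def get(self, key):
        # The newest record for a key wins.
        for op, k, value in reversed(self.records):
            if k == key:
                return value if op == "put" else None
        return None


log = AppendOnlyLog()
log.put("msg1", "hello")
log.delete("msg1")
print(log.get("msg1"))    # None: readers see the message as gone
print(len(log.records))   # 2: the original bytes are still in the log
```

Actually getting rid of the old record means rewriting the log without it, which is a separate and expensive job.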

------
zck
A month or so [1] after Gmail launched, The Screensavers [2] got a Gmail
address and, on the air, invited viewers to send them emails to fill it up and
see what happened. Unfortunately, I didn't see the end of the show.

[1] I don't know what took them so long. I know that it wasn't that soon after
launch because Gmail came out in April '04, but this show was in May. (I
didn't have a television at college, but my parents did). Gmail invites were
scarce, but one would think they could've gotten one.

[2] This was on TechTV. It must've been just before TechTV was folded into G4
-- literally the last month of programming.

------
lupin_sansei
What Gmail needs is an optional rolling deletion policy. When your mail gets
to the ~7GB limit Gmail just deletes the oldest mail to stay within the limit.
It could suggest you turn on this feature when you reach your limit.

MythTV does this when you fill your hard drive up with TV shows. It's much
better than refusing to accept any new data.
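
The policy itself is only a few lines. A rough sketch, with messages modelled as (timestamp, size) tuples and a made-up quota; the function name and data shape are my own assumptions:

```python
def apply_rolling_deletion(messages, quota_bytes):
    """Drop the oldest messages until the total size fits within the quota.

    messages is a list of (timestamp, size_bytes) tuples.
    Returns (kept, deleted), both ordered oldest first.
    """
    kept = sorted(messages, key=lambda m: m[0])  # oldest first
    total = sum(size for _, size in kept)
    deleted = []
    while kept and total > quota_bytes:
        oldest = kept.pop(0)
        total -= oldest[1]
        deleted.append(oldest)
    return kept, deleted


kept, deleted = apply_rolling_deletion([(3, 40), (1, 50), (2, 30)], quota_bytes=80)
print(kept)     # [(2, 30), (3, 40)]
print(deleted)  # [(1, 50)]
```

The hard part is not the loop, it's convincing people to opt in to mail quietly disappearing.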

------
kevinpet
What we need is a "keep for 30 days" button, rather than "archive". Sometimes
I mean keep forever, but more often my reaction to mail is that I don't expect
to need it, and if I don't need it in the next month, I definitely don't need
it, but maybe I'll change my mind next week and want it.

~~~
ique
Actually that is what the Delete button does by default: it keeps the mail for 30 days.

~~~
pyre
Yeah, but sometimes people will empty their Trash. If it was archived
with a 'remove after 30 days' tag on it, it wouldn't really be possible to
accidentally purge all such emails.
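
Roughly what I'm picturing, as a sketch. The "expires" flag and "archived_on" field are names I made up, not real Gmail labels:

```python
from datetime import date, timedelta

def expired_ids(messages, today, keep_days=30):
    """Return ids of archived messages tagged to expire whose keep period is over."""
    cutoff = today - timedelta(days=keep_days)
    return [
        m["id"]
        for m in messages
        if m.get("expires") and m["archived_on"] <= cutoff
    ]


mailbox = [
    {"id": "a", "archived_on": date(2009, 6, 15), "expires": True},   # old and tagged
    {"id": "b", "archived_on": date(2009, 7, 20), "expires": True},   # tagged, too recent
    {"id": "c", "archived_on": date(2009, 1, 1), "expires": False},   # keep forever
]
print(expired_ids(mailbox, today=date(2009, 8, 1)))  # ['a']
```

Untagged mail is never touched, and emptying the Trash has no effect on tagged mail that is still inside its 30 days.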

