Effectively Managing Memory at Gmail Scale (html5rocks.com)
134 points by geobourazanas on Dec 15, 2014 | 49 comments



> Game developers, take note: to ensure a 16ms frame time (required to achieve 60 frames per second), your application must make zero allocations, because a single young generation collection will eat up most of the frame time.

And this is why GC isn't the be-all and end-all solution to memory management.

(As for people who are screaming "pauseless GC": it has throughput issues [generally due to fine-grained locking / other synchronization], and also often has a performance hit on your main thread [due to locking or read barriers, generally].)

Now, if someone combined a pauseless GC with proper cleanup (i.e. skipping GC altogether) of variables where the compiler could determine when they can be thrown away, matters would be different. (So, in other words, the compiler inserts `malloc` (or whatever) calls, and ensures that every variable created is either `free`d exactly once after it becomes unreachable or is added to the set tracked by the GC (or is a constant - especially pertinent with strings). With (hidden) local variables to track control flow when different branches cause different allocations.)


By far the best write-up about mobile app development and the true cost of GC has been in [0]. It's a very long read but well worth it.

If you took just one rule of thumb from it, let it be this: if you want speed with garbage collected languages, be sure to use at most 1/6 of the overall system memory.

0: http://sealedabstract.com/rants/why-mobile-web-apps-are-slow...


> Now, if someone combined a pauseless GC with proper cleanup (i.e. skipping GC altogether) of variables where the compiler could determine when they can be thrown away, matters would be different. (So, in other words, the compiler inserts `malloc` (or whatever) calls, and ensures that every variable created is either `free`d exactly once after it becomes unreachable or is added to the set tracked by the GC (or is a constant - especially pertinent with strings). With (hidden) local variables to track control flow when different branches cause different allocations.)

Freeing is not an issue, you don't pay for garbage in a proper GC, as the GC only scans living objects, and never frees.

The problem is allocation, as a scan is triggered when a certain number of bytes has been allocated. Go allows you to skip allocating on the heap (thus the GC), since you can define what is allocated on the stack or heap.

You potentially could do as you propose, by inserting "free" at certain points where you could prove the variable was safe to throw away. Free would basically just then say "you can now postpone allocation, you have more free memory". But this has its own drawbacks. For one, you have a minimal amount more to do (calling 'free'), but more importantly you will increase the time spent allocating objects, because you have to scan the heap for available space instead of just bumping a pointer.

The most important thing is allowing the programmer a way to avoid the managed heap, which Go does, and C# to some degree does through structs.
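
To make the "bumping a pointer" versus "scanning the heap for available space" point concrete, here is a rough sketch. Both allocators are toy models over a numbered address space, and every name in it (`BumpAllocator`, `FreeListAllocator`, `holesVisited`) is invented for illustration; no real runtime is implemented this way.

```javascript
// Bump allocation: the next free address is a single counter.
class BumpAllocator {
  constructor(capacity) {
    this.capacity = capacity;
    this.top = 0;
  }
  alloc(size) {
    if (this.top + size > this.capacity) return -1; // a real GC would collect here
    const addr = this.top;
    this.top += size;
    return addr;
  }
}

// Free-list allocation: walk a list of holes until one is big enough (first fit).
class FreeListAllocator {
  constructor(capacity) {
    this.holes = [{ addr: 0, size: capacity }];
    this.holesVisited = 0; // counts how much searching alloc() has done
  }
  alloc(size) {
    for (let i = 0; i < this.holes.length; i++) {
      this.holesVisited++;
      const hole = this.holes[i];
      if (hole.size >= size) {
        const addr = hole.addr;
        hole.addr += size;
        hole.size -= size;
        if (hole.size === 0) this.holes.splice(i, 1);
        return addr;
      }
    }
    return -1; // nothing fits
  }
  free(addr, size) {
    // Freed blocks go to the front of the list; no coalescing in this sketch.
    this.holes.unshift({ addr, size });
  }
}

const bump = new BumpAllocator(64);
const first = bump.alloc(8);
const second = bump.alloc(8);

const freeList = new FreeListAllocator(64);
const p = freeList.alloc(8);
const q = freeList.alloc(8);
freeList.free(p, 8);
const big = freeList.alloc(16); // must skip the 8-unit hole left by `p`
```

Once explicit frees exist, the allocator has to step over holes that are too small, which is the extra allocation cost being described.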


I'm not sure I understand how to make zero allocations? Does this mean you create your variables outside of a requestAnimationFrame (rAF) and update their values inside the rAF?


That's what it means, yep. Making object pools, etc. This can be harder than it sounds, to put it mildly.

In other words, completely subverting the GC altogether.
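
A minimal sketch of what "create your variables outside the rAF and update them inside" could look like. The names (`PARTICLE_COUNT`, `positions`, `velocities`, `updateFrame`) are made up for illustration; the only point is that the per-frame function mutates preallocated typed arrays and creates no new objects.

```javascript
// Hypothetical example: all storage is allocated once, up front.
const PARTICLE_COUNT = 1000;
const positions = new Float32Array(PARTICLE_COUNT * 2); // x0, y0, x1, y1, ...
const velocities = new Float32Array(PARTICLE_COUNT * 2);

for (let i = 0; i < velocities.length; i++) {
  velocities[i] = (i % 7) - 3; // arbitrary deterministic start values
}

// Per-frame work: only reads and writes existing slots, so steady-state
// frames should not hand the collector any new garbage.
function updateFrame(dtSeconds) {
  for (let i = 0; i < positions.length; i++) {
    positions[i] += velocities[i] * dtSeconds;
  }
}

let lastTime = -1;
function onFrame(nowMs) {
  const dt = lastTime < 0 ? 0 : (nowMs - lastTime) / 1000;
  lastTime = nowMs;
  updateFrame(dt);
  requestAnimationFrame(onFrame); // reuses the same function object every frame
}

// Only start the loop where requestAnimationFrame exists (i.e. in a browser).
if (typeof requestAnimationFrame === "function") {
  requestAnimationFrame(onFrame);
}
```

The hard part in practice is that many innocent-looking operations (array methods, closures, string concatenation) allocate behind your back, which is why this is harder than it sounds.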


Object pools for vectors are damned useful...both Xith3D and Ardor3D in Java had free pools for their vector classes, and using those significantly impacted memory usage and frame delays.
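
For anyone who hasn't seen one, a vector pool can be sketched roughly like this. This is not the Xith3D or Ardor3D API; `Vec3Pool`, `acquire` and `release` are hypothetical names, and it is written in JavaScript only because that is what the article is about.

```javascript
// Hypothetical fixed-size pool of 3D vectors.
class Vec3Pool {
  constructor(size) {
    this.free = [];
    for (let i = 0; i < size; i++) {
      this.free.push({ x: 0, y: 0, z: 0 });
    }
  }

  // Hand out a recycled vector; allocate only if the pool has run dry.
  acquire(x, y, z) {
    const v = this.free.length > 0 ? this.free.pop() : { x: 0, y: 0, z: 0 };
    v.x = x;
    v.y = y;
    v.z = z;
    return v;
  }

  // Return a vector so a later acquire() can reuse it.
  // The caller must not touch `v` after releasing it.
  release(v) {
    this.free.push(v);
  }
}

const pool = new Vec3Pool(2);
const a = pool.acquire(1, 2, 3);
pool.release(a);
const b = pool.acquire(4, 5, 6); // same object as `a`, now holding new values
```

The catch is the comment on `release`: using a vector after returning it is the pooled equivalent of a use-after-free, and nothing in the language stops you.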


So basically one needs to re-implement smart pointers in JavaScript, right?

What are the alternatives to JS-style GC for a language that doesn't want the developer to use manual memory allocation? I've heard about ARC. Are there other known architectures?


As I said: a pauseless GC with a smart language implementation would be fine. ARC can work, but either requires programmers to prevent loops or needs a GC regardless.
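
To illustrate why pure reference counting needs either loop-free data or a backup GC, here is a toy model of the bookkeeping. It is a simulation with invented names (`RcObject`, `retain`, `release`, `link`), not how any real ARC runtime works.

```javascript
// Toy reference-counted object: tracks a count and its outgoing strong references.
class RcObject {
  constructor(name) {
    this.name = name;
    this.count = 0;
    this.refs = [];
    this.freed = false;
  }
}

function retain(obj) {
  obj.count++;
}

function release(obj) {
  obj.count--;
  if (obj.count === 0) {
    obj.freed = true;
    // Dropping an object releases everything it pointed to.
    for (const child of obj.refs) release(child);
    obj.refs.length = 0;
  }
}

function link(from, to) {
  from.refs.push(to);
  retain(to);
}

// Acyclic case: root -> leaf
const leaf = new RcObject("leaf");
const root = new RcObject("root");
retain(root); // a local variable holds root
link(root, leaf);
release(root); // local goes out of scope: root and leaf are both freed

// Cyclic case: a <-> b
const a = new RcObject("a");
const b = new RcObject("b");
retain(a);
retain(b); // two locals
link(a, b);
link(b, a);
release(a);
release(b); // locals are gone, but each object still holds the other
```

In the cyclic case both counts get stuck at 1, so neither object is ever freed unless the programmer breaks the loop (e.g. with a weak reference) or a tracing collector cleans up.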

Object pools at the language level would be an alternative. (I.e. every object is allocated in a "pool", including pools. When a pool falls out of scope the entire pool is "freed". You can copy objects between pools as necessary.)


Or just a smart language implementation. Go makes it easier to avoid heap allocation, and thus GC, since it allows you to specify if a structure resides on the heap or stack.


thanks for your input.


> Object pools at the language level would be an alternative. (I.e. every object is allocated in a "pool", including pools. When a pool falls out of scope the entire pool is "freed". You can copy objects between pools as necessary.)

This might be called "memory regions" (region-based memory management) in the literature, or at least in the work on using this memory management scheme in an ML implementation (the ML Kit).


Also related to arenas - the main difference is that an arena is a block region of memory from which different-sized objects are allocated, while a pool is a collection of same-sized objects. Both are pretty common techniques in C, C++, and Ada, and pools are common in high-performance Java.
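
A rough sketch of the arena side of that distinction, assuming a single preallocated `Float64Array` as the backing block. `Float64Arena` and its methods are hypothetical names; "objects" here are just index ranges into the block.

```javascript
// Hypothetical arena over one preallocated Float64Array.
class Float64Arena {
  constructor(capacity) {
    this.data = new Float64Array(capacity);
    this.used = 0;
  }

  // Reserve `count` consecutive slots and return the index of the first one.
  alloc(count) {
    if (this.used + count > this.data.length) {
      throw new RangeError("arena exhausted");
    }
    const offset = this.used;
    this.used += count;
    return offset;
  }

  // Release every allocation at once; individual frees do not exist.
  reset() {
    this.used = 0;
  }
}

const arena = new Float64Arena(16);
const vec2 = arena.alloc(2); // different-sized objects share one block
const mat3 = arena.alloc(9);
arena.data[vec2] = 1.5;
arena.reset(); // e.g. at the end of a frame or request
const reused = arena.alloc(4); // starts again from the beginning of the block
```

A pool, by contrast, hands out and takes back individual same-sized objects, so it needs a free list but can recycle one object without resetting everything.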


Smart pointers won't save you.

Yes, they'll eliminate the need for collection, by moving the overhead from a single collection to every construction/destruction.

This may be worth it to get a constant framerate, but the overhead doesn't disappear. If anything, it could be greater: allocating from the heap usually costs more than an object allocation done by any sane VM.


Constant framerate, even with overhead, can be preferable to random slowdowns.


The way you guarantee constant framerate in either a GCed or a manual environment is the same: arenas/regions/pools/buffers. Manual memory management is susceptible to memory fragmentation, which can severely slow down allocation.


> And this is why GC isn't the be-all and end-all solution to memory management.

At least a GC knows where your memory blocks are and doesn't double free them.

> Now, if someone combined a pauseless GC with proper cleanup (i.e. skipping GC altogether) of variables where the compiler could determine when they can be thrown away,

Have you ever looked into ParaSail, Rust, ATS?


> At least a GC knows where your memory blocks are and doesn't double free them.

Wrong. Counterexample: https://mail-archives.apache.org/mod_mbox/subversion-users/2... (Obtained by a quick search of "garbage collector double-free bug" - there are many others out there)

> Have you ever looked into ParaSail, Rust, ATS?

ParaSail and Rust don't have a GC, period, AFAIK. ATS is too strict for my liking. (I want a programming language that will refuse to compile if it can find a "counterexample" that will assert, but will compile something (with a warning and inserted runtime checks unless explicitly marked otherwise) even if it cannot prove something is correct.)


>Obtained by a quick search of "garbage collector double-free bug" - there are many others out there

Yeah, a bug.

I'd rather worry about a double free being caused by a GC bug -- in shared core code that can be fixed, so the problem vanishes for everyone -- than about one anywhere I have to free memory myself.


> Wrong.....

I haven't touched Perl since 2004. Back then it used reference counting, not a GC.

Second, the post reads like a problem in the C code.

> ParaSail and Rust don't have a GC, period, AFAIK. ATS is too strict for my liking.

I was replying about GC alternatives for automatic memory management.


> the post reads like a problem in the C code.

Exactly. All GC does is push down the code that can cause problems like double-frees into the language implementation. It doesn't magically make problems like double-free bugs impossible, like so many people say.

> I was replying about GC alternatives for automatic memory management.

Then why did you respond to and quote something that was talking about something entirely different?


>Exactly. All GC does is push down the code that can cause problems like double-frees into the language implementation. It doesn't magically make problems like double-free bugs impossible, like so many people say.

That's like saying moving the likelihood of an event from 1/100 to 1/10000000, and only under very specific pre-conditions that are easily detectable, doesn't make it impossible.

That is, you are technically correct, which is the worst kind of correct.

The difference between double-free bugs in stuff "pushed down in the language implementation" and double-free bugs in programmer's own code is so huge, it's a total game changer.


> Exactly. All GC does is push down the code that can cause problems like double-frees into the language implementation. It doesn't magically make problems like double-free bugs impossible, like so many people say.

The difference being the compiler vendor vs all the developers using the language.

> Then why did you respond to and quote something that was talking about something entirely different?

Maybe my bad English could not decipher "where the compiler could determine when they can be thrown away, matters would be different. (So, in other words, the compiler inserts `malloc` (or whatever) calls, and ensures that every variable created is either `free`d exactly once after it becomes unreachable".


Does anyone else feel like Gmail has actually been getting slower in the past year?

It feels like it takes a lot longer to load... And once it finally does load, you still end up having to wait a reasonable amount of time for Hangouts.


The old html version is still pretty quick

https://mail.google.com/mail/u/0/h/


I've felt that all of Google's services have been getting slower and less user-friendly over the past 5 or 6 years.


Google Contacts. So slow it hurts.


This is a cool article (from last year, FWIW). In particular, I like how Google uses Chrome browser improvements (the performance.memory API) to fix bugs in Gmail, and then in turn is able to use Gmail to notice bugs in Chrome.

This sort of vertical integration that Google is able to do is really powerful (they were also able to take advantage of this to drive the development of SPDY), but a little concerning for anybody that has a browser but not a popular website, or vice versa. Although in this particular case, I don't think anybody else implements performance.memory, so there would be no easy way for Gmail to track memory usage in other browsers.

Disclaimer: I work on Firefox and am speaking for myself, etc.
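
For reference, reading the `performance.memory` API mentioned above looks roughly like this. It is a non-standard Chrome extension, so the sketch treats it as optional; `sampleHeap` is a made-up helper name, and the `perf` object is passed in so the function can be tried outside a browser.

```javascript
// Reads Chrome's non-standard performance.memory if it is present.
function sampleHeap(perf) {
  if (!perf || !perf.memory) return null; // other browsers: nothing to report
  const { usedJSHeapSize, totalJSHeapSize, jsHeapSizeLimit } = perf.memory;
  return { usedJSHeapSize, totalJSHeapSize, jsHeapSizeLimit, at: Date.now() };
}

// In a page this would be called periodically and the samples reported somewhere.
const sample = sampleHeap(typeof performance !== "undefined" ? performance : null);
```

Where the API is missing the helper just returns `null`, which is the "no easy way to track memory usage in other browsers" situation.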


> Anecdotes of Gmail tabs consuming multiple gigabytes of memory on resource-constrained laptops and desktops were being heard increasingly frequently

You know, I think I recall being able to run quite nicely featured GUI mail clients and IRC clients simultaneously on a Pentium 90 with 16 megs of RAM. And, I expect those clients had an order of magnitude less developer effort put into them compared to the GMail client-side code. Meanwhile, I'm quite confident that Gmail team is composed of very, very smart developers. So, what am I missing? Why does GMail require two orders of magnitude more resources?


I often make this kind of argument as well, but the demands we put an email client through these days are nowhere near what such a machine could handle.

I have around 80.000 messages in my inbox, most of them with attachments around 10MB, and I can browse and search them instantly.

Then again, I mostly use Gmail's backend, my preferred interface is Mail.app


I don't think it normally does. I think what probably happens is that there are memory leaks. The longer you leave the tab open, the more memory it ends up consuming.

There's also the fact that it runs in the browser, which can make it more difficult to control or understand how many resources will be used.
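
One common shape of "leak" in a garbage-collected language is a cache that only ever grows, since the GC cannot free what is still reachable. The sketch below is a generic illustration with invented names (`openMessageLeaky`, `openMessageBounded`, `MAX_CACHED`); it is not a claim about what Gmail's code did.

```javascript
// Leaky version: every opened message stays reachable forever.
const leakyCache = new Map();
function openMessageLeaky(id) {
  leakyCache.set(id, { id, body: "x".repeat(1024) });
}

// Bounded version: evict the oldest entry once the cache is full.
const MAX_CACHED = 100;
const boundedCache = new Map();
function openMessageBounded(id) {
  boundedCache.set(id, { id, body: "x".repeat(1024) });
  if (boundedCache.size > MAX_CACHED) {
    // Map iterates in insertion order, so the first key is the oldest.
    boundedCache.delete(boundedCache.keys().next().value);
  }
}

// Simulate a long-lived tab opening many distinct messages.
for (let id = 0; id < 1000; id++) {
  openMessageLeaky(id);
  openMessageBounded(id);
}
```

The longer the tab stays open, the bigger the first map gets, while the second one stays capped.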


Gmail still slow.

Unrelated: Why haven't Google implemented a Google.com-quality-level search for email?


Gmail's email search is already the best in the space, by far. I'm sure there are improvements to be made, but zero external pressure to make them.


Gmail looks particularly good after 5 mins using Apple's search in iCloud Mail.


The new Inbox has "Top Hits", which is a more relevance-based search for email. It seems to work quite nicely, for me. I like that it gives you a couple highly-relevant emails, and the rest are still time-sorted for combing through all results.


When dealing with the public web, a search engine makes a single index of the web that millions of users can hit to perform searches. That amortizes the costs of generating and maintaining the index over a lot of people.

Your email corpus is private and specific to you. If Google (or any other company) generated a high-quality search index for it, they'd spend a lot of CPU hours on it and only amortize that across a single user: you.


I find Gmail search to work pretty well for me.

The question I have is why the Google Voice iPhone app does not have a search function AT ALL.


I believe they're phasing out the Google Voice apps (although this may not have ever been announced) in favor of more integrated solutions. Now why Hangouts are not searchable, I will never know.

I could care a bit less about the colors and animation strategy Google is using, but why they don't just put a search bar at the top of every product of theirs is a mystery I'm afraid I'll have to live with for quite some time. Even the Google+ app (at least) has terrible search and takes too many clicks - I believe Facebook has actually caught up on this.


Last I checked, Gmail didn't do partial word searches. So “NewClient” won't match “GreatNewClient”. I think that's what rokhayakebe means.

It's one of the reasons I only use its backend. It baffles me that a company known for search can't do it.
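
The difference between whole-word and partial-word matching can be sketched like this. It is only an illustration of the behaviour described above, under the assumption of a simple whitespace-tokenized index; it says nothing about how Gmail's search is actually built, and all names and sample subjects are made up.

```javascript
const subjects = ["GreatNewClient onboarding", "NewClient contract", "Lunch on Friday"];

// Whole-token index: maps each lowercase word to the ids of documents containing it.
function buildTokenIndex(docs) {
  const index = new Map();
  docs.forEach((text, id) => {
    for (const token of text.toLowerCase().split(/\s+/)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  });
  return index;
}

// Fast lookup, but only exact tokens match.
function tokenSearch(index, term) {
  return [...(index.get(term.toLowerCase()) || [])];
}

// Substring search: finds partial words, but scans every document.
function substringSearch(docs, term) {
  const needle = term.toLowerCase();
  const hits = [];
  docs.forEach((text, id) => {
    if (text.toLowerCase().includes(needle)) hits.push(id);
  });
  return hits;
}

const index = buildTokenIndex(subjects);
const byToken = tokenSearch(index, "NewClient"); // only the exact-word subject
const bySubstring = substringSearch(subjects, "NewClient"); // also "GreatNewClient"
```

The token lookup is a single map access while the substring version touches every document, which is one plausible reason a large mail index might skip partial matches.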


I believe they're attacking the challenge from the opposite direction: Gmail search can be described as "okay" and Google works tirelessly to make the main Google.com search experience worse and worse -- eventually they will succeed in degrading it from its former excellence to a point where its quality matches Gmail's.


GMail always had a memory leak problem. Back in the early days of GMail (2004-2005), Internet Explorer 6 "trident" JS engine leaked memory like crazy. Manual garbage collection was always needed.


It is rather odd to think about "scaling" to a single computer, isn't it? This is more bloat management than scaling.


I was thinking much the same - the Gmail backend surely has scaling issues, but the frontend should be relatively simple. After all, you can still use the basic HTML view if you want; the Ajax view is just some icing on the cake. It can't be that resource hungry unless there are bugs.


There are always bugs in complex applications (especially when JS makes it this easy) and people leave their Gmail tab open for weeks: not an auspicious combination.


Oh yes, I wasn't trying to criticize that. Every sufficiently large program will have bugs. It was just that the title implied this was a scaling issue, rather than a debugging issue.


It seems to me that they're referring more to the size of the application. "GMail Scale" implies highly-interactive single page web applications with a lot of functionality, to me.


Maybe scaling as in "scaling a team"? But yeah, it sounds weird.


>While JavaScript employs garbage collection for automatic memory management, it is not a substitute for effective memory management in applications.

And then they continue to make use of JavaScript's GC in order to do effective memory management... at Gmail scale...


It's "no substitute". I read that as: it doesn't free you from thinking about memory. GC or not, you still need to care about it.


I always found OneDrive services to be faster than Google Drive (the PDF viewer, for instance). It seems that Google Drive recently downgraded its default PDF viewer while proposing third-party alternatives (which is an interesting system; I'm in the process of developing a good EPUB viewer for Google Drive).

AFAIK, what MS does is, as much as possible, do things server side instead of client side. For instance, online Excel calculations are done server side, and the resulting cells are just HTML fragments sent to the Ajax callback.

It makes client apps consume way less memory, since less data is stored on the client.

I have a crappy Android phone (2.x) and the OneDrive HTML app doesn't feel slow on it. On the other hand, I just can't use Gmail (Ajax versions) or the Google Drive HTML apps. They are just not responsive.



