Why Google Pixel lags 10x more than Moto Z (glek.net)
484 points by tapper on Dec 12, 2016 | 197 comments

The title is "Why Google Pixel lags 10x more than Moto Z", but the content is just a microbenchmark of fsync(), with no measurement of UI lag and no evidence that the 10x slowdown in fsync() actually means a 10x (or any) increase in said lag.

The author's previous blogpost (http://taras.glek.net/post/Laggy-phones-and-misleading-bench...) contends that 4K writes are a good proxy for phone lag, but has no evidence or measurements either, just the author's contention.

It could all well be true! And it's great that the author has dug up some concrete areas the Pixel team could potentially improve (dumb question: couldn't the Pixel be updated to use most of these FS tweaks? Could an in-use Pixel FS be converted to f2fs?). But I don't think the author has done a good job demonstrating an actual relation to UI lag or to anything about the phone's perceived performance.

disclaimer: googler, but I do iOS things.

So much clickbait :/ Under no conditions does the Pixel lag "10x as much as the Moto" in actual use. It's a single benchmark of FS performance for applications that do I/O on the main thread, something that is actively discouraged by things like StrictMode checks.

To be fair he did say the difference would be more obvious after a year of use.

Yes, this has shades of the 2012 Nexus 7 flash degradation issues...

I agree with the previous commenters that the clickbait headline really oversells this article, but this is still a useful investigation.

The 2012 Nexus 7 is a good comparison. It had slow, cheapskate flash memory, which degraded over time. But why should that cause UI jitter? As several people have pointed out, well-written apps use separate threads for UI and I/O. There were some big UI jitter problems in the OS, but there was a big effort to fix those in Jellybean, which the Nexus 7 shipped with.

The problem is that just using separate threads isn't enough to keep UI and I/O completely separate. The UI uses the GPU, and the GPU fights with the kernel over access to I/O resources. Maybe the UI thread or GPU driver has a cache miss and needs code paged in; maybe the OS is busy writing out another page.

If the Motorola phone really does have 10x lower I/O latency, that seems very significant, but we don't know if it really affects the UI without measurements.

Just setting 'nobarrier' on the file system sounds dangerous though. I'd be very wary of that unless they have really good arguments and measurements to show it's safe.

I write Android apps, and I/O reliability is a big headache. I'd put the success rate for simple file system operations at only around 99.9% -- as soon as your users number in the thousands, there are always a few users whose phones fail in bizarre ways. Most of the problems show up on Samsung phones, but that might just be because they're the most popular brand.

Apple hardware definitely ages much, much more gracefully than any Android hardware I've seen. That might be okay if everyone got a new phone every two years, but that's definitely not the case; tons of people are using old phones.

I'm skeptical that off-main-thread I/O is a panacea for high I/O latency. It's certainly a boon, to be sure, but at the end of the day you want that I/O to complete. Showing more frames while you wait for it is going to paper over (very) small delays, but even with fluid graphics, an app can still feel frustratingly slow.

In fact, smooth but overlong animations can feel more laggy simply because the time it takes for the phone to do what you actually wanted is longer.

If 4K of I/O takes 16 ms or more with great regularity (and p99 seems rather frequent to me), you're going to regularly hit high multiples of that in any kind of mildly intensive workload (which writes a lot more than once, and a lot more than 4K). And at high multiples of that, no amount of off-thread I/O is going to be able to hide that latency.
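A back-of-the-envelope sketch of that point, assuming a 60 Hz display (about 16.7 ms per frame) and writes that run strictly back to back on the thread that draws. Real schedulers overlap work, so the numbers are illustrative only, and `frames_blocked` is a hypothetical helper, not something from the article.

```python
FRAME_MS = 1000 / 60  # assumed 60 Hz display: ~16.7 ms per frame


def frames_blocked(latency_ms: float, n_ops: int) -> int:
    """Whole frames that elapse while n_ops synchronous writes, each taking
    latency_ms, run back to back on the thread that draws the UI."""
    total_ms = latency_ms * n_ops
    return int(total_ms // FRAME_MS)


# One 16 ms write fits inside a single frame...
print(frames_blocked(16, 1))   # 0
# ...but a burst of ten such writes stalls the UI for about nine frames.
print(frames_blocked(16, 10))  # 9
```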

So sure: asynchronicity will make slowness a little less ugly. And that's about it.

(On the other hand, fsync is a pretty heavy-handed tool, and he doesn't show the latency for non-synced I/O, so maybe the problem is just that software is too conservative. I mean, how often do phones crash, and how bad is losing a small amount of data then?)

> how often do phones crash, and how bad is losing a small amount of data then?

Phones crash a lot. Losing data can be anything from a minor nuisance (lost a few seconds of work) to a massive headache (some important file was subtly corrupted).

Corruption is an orthogonal issue - losing any amount of data, even a single bit, can lead to corruption. Avoiding corruption by lowering I/O latency is not a viable strategy - at least not given "realistic" latencies and realistic frequency of I/O. In theory - sure, if I/O were once per day, and latency 1ns, you might accept the low chance of corruption without mitigation.

In practice, you almost certainly need some concept of atomic actions or transactions to deal with corruption.

The more relevant question then isn't whether phones crash and cause corruption; it's whether the additional committed transactions you lose by relaxing durability requirements matter. You're always at risk of losing those in flight: does it matter if you lose the photo you just took? It depends, but I suspect the linked article's fsyncs/second rate is massively excessive. The "bad" case was, after all, still just 1/50th of a second, i.e. much less than the blink of an eye. My contention is that phones don't crash often enough to make sub-second durability worthwhile. Most phones I come across certainly crash less than once a day, and losing your past 1 second of actions once a day sounds acceptable to me.
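The trade-off described here (accept losing up to about a second of recent writes in exchange for far fewer fsyncs) can be sketched as a log that buffers appends and calls `os.fsync` at most once per interval. `PeriodicSyncLog`, the one-second default, and the injectable clock are illustrative assumptions, not anything the article or Android provides.

```python
import os
import time
from typing import Callable


class PeriodicSyncLog:
    """Append-only log that fsyncs at most once per `interval` seconds.

    A crash can lose whatever was appended since the last sync, i.e. up to
    roughly `interval` seconds of data (hypothetical design for illustration).
    """

    def __init__(self, path: str, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._interval = interval
        self._clock = clock
        self._last_sync = clock()
        self.sync_count = 0

    def append(self, record: bytes) -> None:
        os.write(self._fd, record)  # lands in the page cache, not yet durable
        if self._clock() - self._last_sync >= self._interval:
            self.sync()

    def sync(self) -> None:
        os.fsync(self._fd)  # one flush covers every record written so far
        self._last_sync = self._clock()
        self.sync_count += 1

    def close(self) -> None:
        self.sync()
        os.close(self._fd)
```

With 100 appends spread over about 25 seconds, this issues roughly 25 fsyncs instead of 100; the cost is that a crash can take the last second of records with it.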

In short: the benchmark is not directly useful. Perhaps it correlates with some relevant aspect of I/O performance (but that's not clear), or perhaps software is buggy and actually tries to fsync more than 50 times a second (but then: fix that), but it may simply be a meaningless benchmark.

> but has no evidence or measurements either, just the author's contention

He both produces some measurements he has made, and cites an academic paper that corroborates. It's not certain proof but it's a long way from 'no evidence'.

I just read the paper and it doesn't say or imply anything about lag.

Can you point me to any evidence related to the title, i.e., Pixel lag is 10x worse than Moto Z?

> couldn't the pixel be updated to use most of these FS tweaks

That would require reformatting, so probably not.

Well, fstransform can reformat filesystems in place, so theoretically it might be possible (it cleverly uses sparse files for this). Not with the filesystems mounted though, so the update should be applied at reboot and it potentially takes a long time. Also very risky, especially with a near full drive.

fsync benchmarks are stupid.

(neither a Pixel fan nor hater here)

Please tell us why. If you have expertise with it, I would like to know the flaw; I have zero experience with it.

Basically, the benchmark is testing a phone that doesn't run the code for fsync vs. a phone that does. The phone that has a no-op for fsync will clearly be faster at not running any code than the one that actually does what the programmer instructed.

It is like asking what is faster:

    import math

    def sin_noop(x):
        return 0  # "fast": skips the actual work

    def sin_real(x):
        return math.sin(x)  # actually calculates sin(x)

More generally, how often do you call fsync in your code? I very rarely call it and generally allow the operating system to write the file to disk when it is convenient.

If calling fsync is critical to your application, then the Moto Z has broken your app by making fsync a no-op. If it isn't critical to your application, then why are you calling it?

If you use a database like SQLite for storing your app's data, which you should, then you're doing an fsync on virtually every insert/update operation. One of the main reasons for using a database is that it takes away the headache of deciding when and how to fsync.

If you manage application state yourself, and the phone powers off or kills your app at an unfortunate time, you might end up with a corrupted application state. If you're a big company like Facebook or Google with millions of users, this will be happening multiple times per day and will produce a slew of complaints about your app.

Depending on the exact mechanism and implementation, making fsync a no-op is not necessarily bad. As long as it can (with highest priority) fsync all of its buffers upon notification of loss of power and be done before all capacitance is lost then there should be no trouble at all. Of course, this requires some coordination between the hardware, the OS and the filesystem which might be non-trivial even for a fully integrated company like Samsung so it could be it simply doesn't work at all like that.

>Depending on the exact mechanism and implementation, making fsync a no-op is not necessarily bad. As long as it can (with highest priority) fsync all of its buffers upon notification of loss of power and be done before all capacitance is lost then there should be no trouble at all.

I'd guess the most popular apps DO use a database on Android, right? So we could make some testable predictions. If the Moto Z's fsync is a no-op, then we should expect more database corruption. So if we don't see database corruption, we can presume a highly prioritized, efficient fsync on power loss notification. Battery pull vs. low battery poweroff event? I'm still thinking about controls and test methodology. Also, either Android app databases don't corrupt often, or they're good at recovery, because I haven't seen app errors (aside from incompatibility) in a long while. Then again, I have a Pixel, not a Moto Z.

I am going to reflash with these filesystem changes though; just not sure how to benchmark.

Benchmark some database while randomly kernel-panicking the phone every few minutes? (http://unix.stackexchange.com/questions/66197/how-to-cause-k...)

The database should never get corrupted. If it does, you have a bad fsync implementation.
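For that kind of crash test, the check after each reboot could be SQLite's built-in `PRAGMA integrity_check`, which returns the single row `ok` when it finds no problems. The panic loop itself is left out; `database_is_intact` is a hypothetical helper name.

```python
import sqlite3


def database_is_intact(db_path: str) -> bool:
    """Run SQLite's integrity check; True only if it reports exactly 'ok'."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        return rows == [("ok",)]
    except sqlite3.DatabaseError:
        # A badly damaged file may not even be recognized as a database.
        return False
    finally:
        conn.close()
```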

> If you use a database for storing your apps data like sqlite, which you should, then you're doing an fsync on virtually every insert/update operation. One of the main reasons for using a database is that it takes away the headache of deciding when and how to fsync.

It batches the fsyncs per transaction, so 10000 inserts in one transaction cost one commit's worth of fsyncs, not 10000 of them.
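A minimal Python `sqlite3` sketch of the batching point: in autocommit mode every INSERT is its own transaction and pays its own commit-time syncs, while wrapping all rows in one transaction pays them once. The exact number of fsyncs per commit depends on the journal mode and the `PRAGMA synchronous` setting, so the comments only claim "per transaction", not a specific count. The table and function names are made up for illustration.

```python
import sqlite3
from typing import Iterable, Tuple

SCHEMA = "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)"
INSERT = "INSERT INTO items (id, name) VALUES (?, ?)"


def insert_autocommit(db_path: str, rows: Iterable[Tuple[int, str]]) -> None:
    # isolation_level=None: each statement commits (and syncs) on its own.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute(SCHEMA)
        for row in rows:
            conn.execute(INSERT, row)
    finally:
        conn.close()


def insert_batched(db_path: str, rows: Iterable[Tuple[int, str]]) -> None:
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(SCHEMA)
        conn.commit()
        # One transaction: the commit-time syncs happen once, at the end.
        with conn:
            conn.executemany(INSERT, rows)
    finally:
        conn.close()
```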

> Depending on the exact mechanism and implementation, making fsync a no-op is not necessarily bad. As long as it can (with highest priority) fsync all of its buffers upon notification of loss of power and be done before all capacitance is lost then there should be no trouble at all. Of course, this requires some coordination between the hardware, the OS and the filesystem which might be non-trivial even for a fully integrated company like Samsung so it could be it simply doesn't work at all like that.

It's supposed to work if the OS crashes. Kernel panics are not incredibly uncommon.

I am willing to buy your argument, but you need to tell me why.

> The key option is nobarrier. This effectively makes fsync() a no-op and explains most of the difference in performance.

In other words, because the Moto Z cheats.

I suppose it's understandable... nobody should ever be blocking GUI animations on fsync, much less two of them in a row, but here we are.

Not exactly a cheat. A calculated risk. A friend of mine and I were having the storage discussion about phones and realized that phones never really experience a 'power outage' in normal use. People use their phones, they get low on power, and then they plug them in. Or the battery gets so low that the phone shuts itself off in an orderly manner.

In the 'computer is a phone' perspective, there is no sudden power loss unless the user yanks the battery out of the back. That is presumably a rare occurrence for your typical user. As a result, the big risk for phone file systems is that the phone crashes, but nearly every phone I've seen preserves memory contents as long as power is applied, so even a crashed phone can, on reboot, pick apart the previous bits in memory and reconstruct what was going on before it crashed.

Given those fairly phone-specific criteria, I could see nobarrier as a legit option.

> reboot can pick apart the previous bits in memory and reconstruct what was going on before it crashed

What you're proposing is possible in theory, but no general purpose operating system I've seen actually attempts to recover page cache state from uncleared RAM upon reboot.

Linux has PRINTK_PERSIST [1] to recover the printk logs after (warm) reboot.

[1] https://lkml.org/lkml/2012/3/13/45

Something related, though: battery- (or capacitor-) backed write caches on high-end servers.

> unless the user yanks the battery out of the back. That is presumably a rare occurrence for your typical user.

I've witnessed a number of phones with the "crumple zone"-alike feature of effectively ejecting the battery cover + battery out of the phone whenever you drop it.

I would hope dropping a phone is also a rare occurrence for typical phone users. Otherwise they're probably better off getting a Nokia 3210 instead of a smartphone with glass screen.

> I would hope dropping a phone is also a rare occurrence for typical phone users.

I drop my phone all the time and I don't have a case. It's fine.

Me too, though the screen did get a small crack once. If there was a 1% chance of file system corruption each time I dropped my phone, it would have happened to me by now, I think. And usually when I drop my phone, the battery comes out. It used to happen only half the time, but I think the cover got loosened as a result of all the times I dropped it, so now the battery falls out pretty much every time.

It's 2016: if you abruptly shut down a computer in the middle of a filesystem write, the chances of ending up with a corrupted filesystem are around 0%, so those are the chances of getting a brick when you drop your mobile.

If you mount with nobarrier I'd say that the chances are somewhat higher than 0%

Not with any filesystem with a proper journaling system. Filesystem corruption is almost impossible.

Barriers are what force the journal to be written consistently, and IIRC they were enabled by default after 2.6.31 because of corruption with ext4.

you could have corruption and not know. i'm pretty sure on the mac hfs+ corrupts files all the time (why? i've run into it multiple times). but it's not necessarily something you'll immediately notice.

That's upsetting. I'll try not to drop my phone while an update is in progress in that case.

I dropped my phone at the weekend without a case. I now have a shattered screen. :/ It's about the 5th time I've dropped my phone and this is the first time something like this has happened.

Apparently the glass in phone screens develops micro fractures from being dropped. (Well, from hitting the ground). You can't see them with the naked eye, but they weaken the glass and they grow with each successive drop.

It's probably not just bad luck that the glass happened to break after the Nth time you dropped it. It was bound to happen if you kept dropping the device.

Carpet or hard floor (ceramic, etc.)?

Wood floor and sometimes concrete. I don't drop it from my hand, usually from a pocket or lap, though.

What phone do you have?

Moto G, 3rd gen

Hah, I have 2nd gen. I drop it pretty often too. Slippery black plastic back.

I drop my phone with unnerving regularity, but I use a case and it's always fine.

I suspect that if I didn't have a case I wouldn't drop it so often.

I have to go with the article's position, though - I'd rather have the few seconds data loss and get the better responsiveness and a longer-lived device.

It all depends which few seconds of data are lost, though, right?

Do you just lose a new contact, or corrupt an essential database?

> Do you just lose a new contact, or corrupt an essential database?

How essential are we talking about here?

What I mean is, your smartphone is a device that can get lost, stolen or destroyed any day. All the data that is physically stored in it should either be expendable or synced elsewhere.

Something could be corrupted and then that corrupted data could be synced to the cloud, trashing what you had there.

> Do you just lose a new contact

Are people creating new contacts that often on a phone? I probably do that only a few times per week. If the phone crashed immediately following the input, I'd just ask the person for their details again.

> corrupt an essential database?

It would only corrupt application data. Android isn't completely stupid, /system is mounted read-only on every Android phone I've ever seen.

I guess it depends on how heavily you depend on application data remaining consistent.

Presumably the corruption would only apply to data that was modified but had not been flushed to flash before the power cycle, so at rest data shouldn't be affected.

> I probably do that only a few times per week.

Look at Mr. Popular over here.

> so at rest data shouldn't be affected

If your data is kept in a SQLite database or any other type of compound storage scheme it's entirely possible to lose data that wasn't being modified, due to corruption of the metadata that governs the layout of the file (actually I don't have experience with SQLite file format specifically, but I ruined my startup's launch a decade ago with almost identical reasoning--thousands of pressed CD-ROMs in the garbage).

>longer-lived device.

Define longer-lived; you'll probably have replaced the battery several times before flash wear becomes an issue.

I've seen this too, but the Moto Z is not one of these phones (the battery is not "user"-removable).

This is true for the first few years, but aging batteries can cause problems. My phone is in a situation where under higher power draws (GPS + navigation) it will simply die when the battery falls to 25%. It's been doing this pretty reliably for a month or 2.

I have noticed this with a cheap Windows tablet I acquired recently. Left alone, it will eventually hibernate, something my Android tablets have never attempted. Similarly, when it goes below 4% battery, it will basically force a hibernate.

That said, all mobile devices, laptops included, have a "force shutdown" option activated by holding the power button for x seconds.

> In the 'computer is a phone' perspective there is no sudden power loss unless the user yanks the battery out of the back.

Or the battery is a couple of years old. Or the user lives someplace where it's cold outside.

Neither of those should cause a sudden power loss. Even if your battery rapidly depletes, the phone will still shut itself down properly before it loses power.

It shouldn't yet sometimes it does.

fsync() has never been as reliable as the name implies, since flushing the disk cache kills performance system-wide. Fuzziness on what fsync does has... side effects: https://danluu.com/file-consistency/ http://blog.httrack.com/blog/2013/11/15/everything-you-alway...

Filesystems optimized for flash, and for battery-backed systems in general, (laptops, phones) have some history: https://en.wikipedia.org/wiki/Flash_file_system

The fundamental problem is that fsync is too powerful. Most of the time, what you really want is a write barrier --- you don't care about when writes happen, but you do care about their order. fsync is a write barrier, but it's also a synchronous flush.

But don't listen to me. I like TxF too.

Or, in short, calling fsync() twice is an under-known requirement: the file itself needs to be flushed, as well as its parent directory.
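A minimal POSIX-only sketch of that pattern in Python, assuming the common write-to-temp-then-rename approach: fsync the new file's data, rename it over the old one, then fsync the parent directory so the rename itself survives a crash. `atomic_write` is a hypothetical helper; opening a directory with `os.open` and fsyncing it works on Linux but not on Windows.

```python
import os


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either the old or new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"

    # 1. Write the new contents to a temporary file and flush them to storage.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # first fsync: the file's data
    finally:
        os.close(fd)

    # 2. Atomically swap the new file into place.
    os.replace(tmp_path, path)

    # 3. Flush the directory entry so the rename is durable too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # second fsync: the parent directory
    finally:
        os.close(dir_fd)
```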

> I suppose it's understandable... nobody should ever be blocking GUI animations on fsync, much less two of them in a row, but here we are.

this is the thing that confused me.. why are apps doing disk i/o on the GUI thread?

I think lots of stuff causes disk I/O that you might not even be aware of.

For example, on iOS, +[CLLocationManager authorizationStatus] (checking if the app is allowed to use location or not) has to read a file [1]. Almost no app developers are aware of this.

[1] I've been told this by guys who worked full-time on location stuff, but I haven't verified it myself and can't find it in docs, so take it with a grain of salt

fsync() in particular is usually only called on (well, after) disk writes though - I can see that being involved in all kinds of system API calls too, but hopefully the system makers would be savvy enough to use an actual write barrier instead of being fsync-happy.

Because they don't care enough not to. There are even developer tools to help applications not do work on the UI thread: https://developer.android.com/reference/android/os/StrictMod...
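Android's actual tooling here is Java/Kotlin (StrictMode and friends). As a language-neutral sketch only, this is the same idea in Python: hand the blocking write-and-fsync to a single background worker so the thread that drives the UI just gets back a future. `save_in_background` and the single-worker executor are assumptions for illustration; this hides the latency from the UI thread but does not make the I/O itself any faster.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor

# A single worker keeps writes in submission order (illustrative choice).
_io_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="disk-io")


def _write_and_sync(path: str, data: bytes) -> int:
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer into the OS
        os.fsync(f.fileno())  # block this worker, not the caller, until durable
    return len(data)


def save_in_background(path: str, data: bytes) -> Future:
    """Queue a durable write and return immediately with a Future."""
    return _io_executor.submit(_write_and_sync, path, data)
```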

Threads are hard, basically.

Even if they don't, the kernel will likely drop everything to get that fsync out of the way. So an fsync-happy app that is doing a file write in the background can cause the whole UI to lag because the kernel is busy dealing with said fsync call.

So from reading the XFS FAQ linked in the article (http://xfs.org/index.php/XFS_FAQ#Write_barrier_support.), does this mean that the Moto Z never flushes the cache to disk?

Wouldn't the cache HAVE to be flushed at some point as it fills up, to maintain coherence?

Yes, it has to be flushed from time to time. Normally this happens automatically when it is needed. fsync() just forces a direct flush.

>In other words, because the Moto Z cheats.

If you're not cheating, you're not doing it right.

I wouldn't really say Moto Z cheats so much as that Google designed the entire software ecosystem of Android without ever talking to a hardware engineer.

They simply do not care about optimizing the performance of the phone running their software. Otherwise they never would have chosen a garbage-collected language like Java for the platform in the first place (I understand they've ameliorated this concern recently). Or used ext4 like here.

> I wouldn't really say Moto Z cheats so much as that Google designed the entire software ecosystem of Android without ever talking to a hardware engineer

Not sure if you're being serious or hyperbolic - but Google bought Android, Inc (an Andy Rubin startup). The team had a lot of Danger alumni - Danger being the makers of the HipTop - aka the original T-Mobile Sidekick (also running Java). I'm certain they knew a thing or two about hardware, they just made trade-offs you disagree with.

Or developers would have had to build separate apps for x86, armv7, armv5, armv6, armv8, mips,...

Really just arm* until the last couple of years. These choices were made forever ago.

This article conflates a lot of things, but it also has the priorities somewhat wrong.

1) fsync cost. Yes, fsyncs are dangerously slow in any Android app. (SQLite for example is a common culprit. Shared Prefs are another). HOWEVER, it's possible that flushes cause reads to be queued behind them (either in the kernel or on the device itself) which is even worse because

2) Random read cost is super super important. Android mmap's literally everything and demand paging is particularly common AND horrendous as a workflow. To add insult to injury, Android does not madvise the byte code or the resources as MADV_RANDOM, so read-ahead (or read-around) kicks in and you end up paging in 16KB-32KB where you only wanted 4KB.
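For readers unfamiliar with the advice being described, this is roughly what it looks like from Python's `mmap` module on Linux (Python 3.8+): map a file read-only and tell the kernel accesses will be random, which discourages read-around on page faults. Whether Android's runtime should do this is the commenter's argument, not something this sketch demonstrates; `map_for_random_access` is a hypothetical helper, and the hint is skipped where `MADV_RANDOM` is unavailable.

```python
import mmap


def map_for_random_access(path: str) -> mmap.mmap:
    """Map a file read-only and hint that page faults will be random."""
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping stays valid after the file object is closed.
    if hasattr(mmap, "MADV_RANDOM"):
        mapped.madvise(mmap.MADV_RANDOM)  # ask the kernel not to read around faults
    return mapped
```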

Also, history has shown custom flash-based file system on Android to be a world of pain. yaffs, jffs have some pretty atrocious bugs/quirks. I'd much rather see the world unify on common file systems, optimized for flash-like storage, rather than OEMs shipping their own in-house broken file "systems" (I'm looking at you, Samsung).

Why can't f2fs be that common file system?

I just read the F2FS paper and it seems very well-designed to match the physical properties of flash, plus some interesting capabilities to keep hot/cold data separate. If there's something wrong with F2FS, let's fix it. This seems like a far better place to start from than any filesystem designed around the assumptions of a spinning disk.

It's in the mainline Linux kernel now, it's hardly some proprietary obscure vendor thing.

That's fair, it's a better state than the previous attempts.

Still, it's not as tested as, say, btrfs and ext4. Can't wait to see its particular quirks.

While it no doubt impacts performance in some cases, MADV_RANDOM probably isn't the correct choice for a lot of circumstances (the classloader tends to do a linear scan of JAR files, for example).

Except that's not how class linking works on Android :)

In particular, everything is compiled down to lookup tables and hash tables within each odex/oat. Your point still stands but the hit is much lower than you would think and given the slow speed of the superfluous reads, it ends up being a net positive for A LOT of cases.

Well, during verification, you very much do a linear scan as I described. Of course, you only verify once, so that minimizes that use case.

The way odex files are structured, there is actually a fair bit of data sequentially organized (for example dependencies), even with the indexing. The odex format does seem to have some elements that anticipate read-ahead (e.g.: those hash tables, dependencies...).

That said, there is a real question about proper tuning of read-ahead for flash memory (like, perhaps 4k or even 0-byte read-ahead is the right thing to do in general ;-). It's not like it is hard to abuse it.
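Read-ahead can also be turned down per file descriptor rather than per mapping. A minimal sketch using `os.posix_fadvise` (available on Linux and some other Unix systems) with `POSIX_FADV_RANDOM`, which tells the kernel not to read ahead on that file; system-wide tuning such as a block device's read-ahead size is a separate knob not shown here. `read_page_randomly` is a hypothetical helper and the 4096-byte page size is an assumption.

```python
import os

PAGE = 4096  # assumed page size for illustration


def read_page_randomly(path: str, page_index: int) -> bytes:
    """Read one page from `path` after advising the kernel that access is random."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0, len=0 applies the advice to the whole file.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
            except OSError:
                pass  # the advice is only a hint; ignore filesystems that reject it
        return os.pread(fd, PAGE, page_index * PAGE)
    finally:
        os.close(fd)
```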

> Android mmap's literally everything

Why is that so?

It allows code and resources to stay as clean, file-backed pages: clean pages can be evicted and re-read from the file even though there's no swap, whereas dirty anonymous pages can't be (since there's no swap to write them to).

Why would you want it NOT to be so?

This makes very little sense. The author shows an fsync() benchmark comparing an actual fsync (Pixel on ext4 without nobarrier) with a nobarrier (no-op fsync) alternative. The only thing this benchmark shows is that a no-op is faster than the real thing.

The title of the article is "Why [the] Google Pixel lags 10x more than [the] Moto Z." The author showed that the Moto Z is using a no-op fsync while the Pixel is doing the real thing. I think the author appropriately explained and answered the title prompt.

But the author fails to prove that fsync() is in any way the bottleneck for common I/O operations in everyday use of a smartphone.

Was that the hypothesis the author was trying to prove? Because I don't believe it was.

The title is literally "Why [the] Google Pixel lags 10x more than [the] Moto Z". If he doesn't show that fsync(3) is the gating factor, then the fact that the Moto Z nops out fsync doesn't mean that much.

Saying "no-op is faster than fsync" is just not as click-baity a title.

There is a good question though... do you really need fsync on Android? If you don't, why are you calling it?

"try to avoid buying devices that will slow down to point of being unusable as NAND wears out (ala Nexus 7, Nexus 6)"

Has anyone experienced this issue with their Nexus 6? My phone is more than 2 years old and I have no noticeable slowdown. The Pixel might have the slower storage option, but it has no effect on usability. From what I have read, its UI performance is the best of any Android phone yet.

"The Pixels are fast — noticeably faster than Samsung's Galaxy S7. On performance alone, these are easily the best Android phones you can buy." http://www.theverge.com/2016/10/18/13304090/google-pixel-pho...

This slowly happened to my original Nexus 7 tablet. Sad. Eventually it's just too slow to use: ten minutes or longer to boot up, etc. Of course it's a gamble: will people crack the screen or kill the battery before the memory is destroyed? If so, who cares, ship it.

You can play games for a while, stripping stuff out of memory so it can live in the fast area, but eventually its speed drops to approximately zero even if all you have installed is the Kindle app, at which point I tossed it.

There is a lot of cargo-cult advice out there about wiping the cache over and over. It supposedly helps, but only in the sense that the owner feels they're doing something; all they're actually doing is wasting the last few R/W cycles left.

The original 2012 Nexus 7 was particularly bad on that front; a few searches turn up claims that particularly bad hardware choices were made for the NAND.

Not saying that this means all later products have this issue licked, just that the original Nexus 7 is a known awful case.

So this is what's going on with my N7... it got slower and slower...

Actually, yes. At work we have a Nexus 6 testing device. It started off strong but a few months ago it started lagging badly on videos. It even struggles with gif length mp4s. The problem is notable because it is the only phone with this problem. Even our old Galaxy S4 performs better.

My old Nexus 6 had similar issues, but then the wifi chip failed and I had it rma'ed. The camera fails as well when I try to autofocus. It seems like Nexus devices are somewhat plagued with issues. I'm knocking on wood that Google/HTC get it right with the Pixel.

What kind of videos? Do you have a link perhaps where I could test this?

Giphys and rewarded video ads. We are a gaming company so you can see where our priorities lie!

My nexus 7 became utterly unusable over time, even after a complete factory reset.

The only way I was able to make it remotely usable was to install an old version of CyanogenMod and install only the bare minimum of apps.

I certainly have - I just replaced my Nexus 6 with a Moto G4, which somehow seems faster with the exact same set of apps and prefs.

As an aside, having a phone that costs $150 is sort of amazing. It removes the fetishism of "oh god, what if this breaks?" that I had with my iPhones and Nexii. It's a new and interesting feature that isn't discussed enough.

I have to wipe my Nexus 6 every couple of months for it to be bearable. My last experiment took me to franco.Kernel, and that has proven itself good over the last couple of months, but I'll keep watching.

I want to say that, unless you have a bunch of apps misbehaving in the background, wiping your phone every couple of months and seeing a difference is all placebo effect to me. At least now (since 4.3).

It's not really placebo effect when you can get a stopwatch and time how long it takes to boot, unlock the phone, finish starting an app, and so on. I had this exact phone, and the wipes took a measurable amount of time off of all these things.

I perked up when I read this part of the article. I've had my Nexus 6 for nearly two full years and it's not fun to use. Very laggy and I missed many moments because the camera wouldn't load. I ended up getting a Pixel and I'm worried that in a year I'll see this same phenomenon again.

Seriously. Am I doing something wrong (or right)? I have never experienced any slowdown on any of my newer-model Android phones.

I've got a Nexus 5x and the performance was disappointing from the start. But maybe it just feels like that because I came from an iPhone 5s.

Oh no, buddy, the Nexus 5x had some issues when initially released. Google did work these issues out, though.

From Tim Murray, performance engineer on Pixel:

>that fsync blog post floating around is pretty much bogus. also nobody should use nobarrier, it's not safe at all


I don't think this could be interpreted as a visual lag. fsync() is usually used by databases, not UI libraries. So the title is misleading.

My Chinese Android phone has a more annoying hardware lag: the delay between touching the screen and touch-event processing is over 100 ms. Any drum app is unusable. And if you try to scroll something up and down fast, it is easy to see how the content on the screen lags behind finger movements.

> I don't think this could be interpreted as a visual lag. fsync() is usually used by databases, not UI libraries.

As others have pointed out, a lot of apps do I/O related work in the UI thread, mostly b/c people don't care enough not to (and sometimes it's hard). Flipping a toggle somewhere can easily cause a write to a sqlite database to need to happen.

And it's for far more than just databases.

> So the title is misleading.

I don't think the title is that misleading: the Moto Z is significantly faster at storage operations than the Pixel, and those matter quite a bit in your daily usage.

> I don't think the title is that misleading: the Moto Z is significantly faster at storage operations than the Pixel, and those matter quite a bit in your daily usage.

All that was tested was fsync(), not "storage operations", and the Moto Z was faster at fsync() because it turns off fsync.

But it's not clear why this matters to anything. Lag is usually the result of file system reads being slow (paging in code/resources), not writes being slow, so how does fsync performance matter?

> All that was tested was fsync(), not "storage operations", and the Moto Z was faster at fsync() because it turns off fsync.

That's not all that was tested; it's only what is shown in the graph. The lack of fsync contributes to it.

I wrote a little fio benchmark driver to fill all available device storage with random 4k writes, print perf stats along the way
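The author's fio driver isn't shown, so as a rough stand-in, here is a minimal Python sketch of the same idea on a Unix-like system: random 4 KiB writes at aligned offsets, each followed by os.fsync(), with per-write latency recorded. The function name, file size and write count are illustrative assumptions, and unlike the described driver it does not fill the whole device, so it won't show wear or fill-level effects.

```python
import os
import random
import statistics
import tempfile
import time


def time_random_4k_fsyncs(path, file_size=16 * 1024 * 1024, count=200, seed=0):
    """Write `count` random 4 KiB blocks at aligned offsets inside a file of
    `file_size` bytes, calling os.fsync() after each write.
    Returns per-write latencies in milliseconds."""
    block = 4096
    rng = random.Random(seed)
    latencies = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)  # sparse preallocation, not a full-device fill
        n_blocks = file_size // block
        for _ in range(count):
            offset = rng.randrange(n_blocks) * block
            data = os.urandom(block)
            start = time.perf_counter()
            os.pwrite(fd, data, offset)
            os.fsync(fd)  # ask the kernel to push the block to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lat = time_random_4k_fsyncs(os.path.join(d, "bench.dat"))
        print(f"median {statistics.median(lat):.2f} ms, "
              f"max {max(lat):.2f} ms over {len(lat)} writes")
```

Running it on two devices (or on one filesystem mounted with and without nobarrier) gives a per-fsync latency comparison, which is a narrower claim than "UI lag".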

One of the other things clearly mentioned is that b/c of Google's use of an additional FUSE filesystem they take a 30% performance hit:

This means that on the Pixel every user IO gets a round-trip back into user-space before hitting the NAND. Fuse burns more CPU and slows down IO by up to 30%.

Sure, the nobarrier trick makes a lot of difference but it's not the only thing in that post, nor was that the actual benchmark.

Google uses FUSE for shared storage, not internal app storage. Apps are supposed to keep most of their data in internal storage since shared storage breaks the whole isolated security model. Content providers and other intents are supposed to be preferred for sharing data. It might be a good thing that doing things properly is faster... and throwing everything into the kernel means more and more attack surface.

Google maintain the kernel implementation (sdcardfs) too... they choose to use the FUSE implementation for good reasons. Of course, most other vendors only care about benchmarks, not things like robustness / security.

OK, maybe the author tested other things, but he didn't REPORT on anything but fsync results. There's no comparison at all of file system performance other than fsync() numbers.

> One of the other things clearly mentioned is that b/c of Google's use of an additional FUSE filesystem they take a 30% performance hit

Only on one mount point, which isn't used for anything lag-related. There's no code on that mount point. There's nothing UI-critical on that mount point. It means your photos might load a bit quicker, but again, given that there were no actual numbers or objective information, we don't actually know how the two compared. Good job to Motorola to handle that in the kernel, but it contributes nothing to the spectacular claim that the Pixel has 10x more lag than the Moto Z.

But given this claim: "They got Moto-Z to performing close to high-end laptop SSDs." I think it's pretty obvious the author only actually looked at fsync(). Unless somehow the Moto-Z is ~5x faster than the storage in an iPhone 7 (which is the gap between an iPhone 7's storage and a high-end laptop SSD), it's obvious bullshit.

I suspect that the kernel prioritizes getting fsync done above all else. This is because data written to disk is data safe from sudden power loss. This means that all other calls, UI draw calls included, get put aside until the fsync is done. And that depends on the IO speed of the hardware and the amount of data in the queue.

On older Android devices I have run apps that could show IO rates. And invariably, when things would lag or freeze, the IO would be maxed out (one offender would be Facebook and their Messenger, because after each update they would force a recompile or something).

Android 7.1 is supposed to have updates to the graphics stack that improve input lag. Apparently the Nexus devices went from 48ms lag to 18ms lag with the release.

Offtopic: I love my old Moto G, but I've been reluctant to buy any new phones from Motorola since its purchase by Lenovo, due to Lenovo's history of shipping computers with rootkits and malware. Should I suspect that Motorola phones are compromised, or am I just being unreasonably paranoid?

In corporations like that, don't expect a common theme. Different departments manage different devices, and unless there's a company-wide push for a culture change (which would leak to the public), I wouldn't expect changes any time soon.

But on the other hand, if you are paranoid about security in Android, you should either go with a Google phone (quickest updates), or something like Copperhead.

I think the commenter is more worried that he'll be funding a company whose practices he doesn't support.

And with the consolidation of companies, it becomes increasingly difficult to find any company that does things right and is owned by a company that does things right (and has sibling companies that do things right).

I would not recommend Motorola phones anymore simply because Lenovo has utterly trashed Moto's reputation that they acquired under google for timely OS updates. See for example http://www.androidpolice.com/2015/10/02/wtf-motorola-markete... . One of the first things Lenovo did was to lay off a bunch of Motorola's software engineers. They've adopted a Samsung-style shotgun approach to product design (I don't even know what letter they are on now -- M?).

Agreed. I hadn't been a fan of Moto for quite a while, partly thanks to their association with the obnoxious "DROOOIIIIID" phones they manufactured for Verizon. Overall they just hadn't been impressing me at all.

But when I broke my Nexus 5 I needed a replacement. The Nexus 6 was a bit larger than I was interested in buying and at $650 it cost a bit more than I wanted to pay for a device I was not really too keen on.

Enter the Moto X (2014 version) which was essentially a smaller version of the Nexus 6 for around $450. OS was all-but-stock Android and updates seemed to be almost as fast as Nexus devices thanks to Moto Mobile's Google ownership. Picked one up and it was one of the better phones I've owned. A little while later, I picked up a Moto360 on sale for $125 because I had some "play money" and the itch for a new gadget to play with.

The phone was great but over time updates slowed as Google sold them off. Then my watch started crapping out due to a defect and I had to deal with the new Lenovo-owned support team.

Dealing with that support department was miserable. Google has a bad rep in this department but it's nothing compared to the "new" Moto. It took me months to set up an RMA and get the issue resolved as their website constantly crapped out or failed to work properly while setting up the RMA. Their support techs were either unavailable or unable to offer assistance.

After that phone, their later offerings got rid of the near-stock at a good price that made the Moto X so attractive and as things went on, they continued to move toward mediocrity.

This time around, I bit the bullet and paid "iphone money" for a Pixel. So far, other than the bland design it's been an excellent device. And while I would've loved a return to the almost-flagship-for-half-the-cost of the Nexus 4 and 5, in the end, I found that I'd rather pay an extra $200 for something I use daily for 2+ years rather than suffer the delayed updates and poor support.

Motorola is a rare example and even more interesting regarding the relation with Lenovo.

Since they keep an almost pristine Android, and the hardware is rock solid, I think the paranoia is not necessary. My Moto X Force is really unbreakable.

Another interesting bit is that, instead of Lenovo "eating" Motorola, it seems Lenovo will move all its mobile handsets to the Moto name, leave plenty of room for development, and support it with lots of money.

I was really, really skeptical at the beginning and also told myself not to buy from Moto anymore under Lenovo, but in the meantime my opinion changed, and it seems they are going, let's say, not in the wrong direction, especially with the Moto Z and the mods.

The most obvious reason to avoid Motorola is that, unlike other major vendors, they didn't start doing regular security patches post-Stagefright.

No, you're not. After getting caught with like 3 backdoors, it would be unreasonable to think you can still trust Lenovo.

I wouldn't trust any Chinese company product or service, especially after the recent "cybersecurity law" that allows the government to force companies to install backdoors in their products (both local and foreign).

Perhaps all of this will change in the future, but not anytime soon. People should boycott Chinese products until that happens. It would also set an example for the US and everyone else that it's not acceptable to put backdoors in your products.

"They drove development of the filesystem specifically by Twitter/FB/etc workloads captured from the phone."

Why would Twitter or Facebook apps need to write much to persistent storage? They don't do so in a browser.

Phone memory is severely limited, and they have to unload images as you scroll to keep the scrolling smooth. Scrolling forward and backward is a common user habit, and network latency is high, so you're going to need a local cache.
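The in-memory half of that pattern is usually a bounded LRU cache: recently viewed image data stays in RAM, and anything evicted has to come back from a slower tier (disk cache or network) when the user scrolls back. A hypothetical Python sketch; the class name, the byte budget and the loader callback are assumptions, not how any particular app implements it.

```python
from collections import OrderedDict
from typing import Callable


class ImageLRUCache:
    """Bounded in-memory cache keyed by URL. When total size exceeds
    max_bytes, least recently used entries are evicted. `loader` is a
    hypothetical callback that fetches bytes from a slower tier."""

    def __init__(self, max_bytes: int, loader: Callable[[str], bytes]):
        self.max_bytes = max_bytes
        self.loader = loader
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0

    def get(self, url: str) -> bytes:
        if url in self._items:
            self._items.move_to_end(url)  # mark as most recently used
            return self._items[url]
        data = self.loader(url)  # miss: go to disk cache or network
        self._items[url] = data
        self._size += len(data)
        # Evict oldest entries, but always keep the item just loaded.
        while self._size > self.max_bytes and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)
            self._size -= len(evicted)
        return data
```

Only the slower tier behind the loader (for example a disk cache) would write to storage, and as the next replies note, a cache file like that rarely needs fsync at all.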

Lying on fsync doesn't seem a good option tho, even if developers abuse it.

But you don't need to fsync that. It's presumably in a temporary file, or you're just reading it.

You're the only one linking these two ideas together. The article expressly separates discussion of how Samsung gathered data to inform their filesystem design from discussion about Motorola's fsync implementation, and they're inherently separate concepts.

I suspect what happens is the developer just dumps everything in a sqlite database, which will fsync for you on commit.

^ I think this is it. App developers are taught to reflexively use the local db to store data, not files, but may not understand the different performance characteristics of this db. Someone (like me before reading this post) working on an app for the first time might not realize they're thrashing out fsyncs with UPDATEs.

My phone's 4GB of RAM is severely limited?

The iPhone 6s was the first iPhone with more than 1GB of RAM. So, at least in that ecosystem, there's a long history of tight memory constraints. In non-Western countries there are lots of cheap and not-so-well specced Android phones, too.

I know what you mean, but most phones don't even have 2GB. Most <$200 Android phones still have 512MB or 1GB. iPhones are notorious for lacking memory compared to their cost and a broken virtual memory system.

Also, I have an insane amount of tabs open in chrome. I'm not sure what the rule is, but it seems like it keeps ~5 tabs in memory, depending on how much is in the current tab. (Tangent: Opera has a great offline feature for saving stuff before you get on the plane.)

I'm actually a little amazed that I can scroll Pinterest or Tumblr as well as I can.

> Most <$200 Android phones still have 512MB or 1GB.

You might be talking about something under $100, since on Amazon there are tons of offers with 1GB for ~$50 and with 2GB of RAM still under $100.

My Xiaomi Redmi Note 3 Pro cost $170 and it has 3GB of RAM.

You're right, the price point has shifted. I don't know about Xiaomi, but phones like the Moto G 3rd gen and comparable Samsung/LG/HTC/Google phones with 2GB are above US$200.

I'd also caution that the prices for US consumers on Amazon are far lower than what is experienced in smaller markets, which have much less competition and fatter telcos. Massive monopoly power is at work; most tech simply never makes it to many countries at all.

Outside of the US, the telcos generally aren't controlling the market for phones; import tariffs can be pretty high though, and divergent frequency plans segment the world market as well.

In many countries people still pay for their phone as part of their phone plan. Whether that is due to a lack of credit, some perception that they get a discount, or just habit, it's a real thing. Many people only buy a phone outright after they break the one they have on a two-year contract.

I was recently in Thailand, and even in major metro areas it was obvious that most shops were owned by a cabal of monopolistic fat cats. Similar situation in many parts of the middle east. Prices for equipment which would just work with a direct import (like iphones) are ridiculous.

Moto G 4th gen is ~$150 new, I just bought one.

Not yours. Average phone. Apps aren't developed just for special snowflakes.

Also, until recently, Android had a per-process memory limit.

Yes. Your phone is limited. Most phones cap each VM at something like 128 MB. For a while you couldn't easily move to a larger per-app memory size. Even today the apps you run are probably doing so with just 128 MB caps.

The rest of the world isn't so lucky?

I'm still running my Moto G with 4GB of RAM and it performs perfectly fine, even on the latest Android. If the Moto G has aged so well, I can guarantee you the market share of phones with <= 4GB of RAM is quite large.

I think you mean your Moto G has 1GB, or 2GB at most.

My Nexus 5X only has 2GB.

Dunno about Twitter, but Facebook have gotten into the habit of forcing a full app compile on each update.

What exactly does that mean? ART will recompile (and pre-ART, dexopt) an app on every update; it's not something you do yourself.

Other comments are querying the relevance of fsync() as a 'lag' benchmark, but I want to query whether fsync() is even meaningful at multiple calls per second.

I know fsync() ensures data gets written to disk, but why does anyone care that it can happen so often? When a device crashes, some data (prior to the sync) may be lost, but do we really need multiple checkpoints per second to ensure only sub-second data loss?

I'd be content with a couple of minutes worth of loss even on my main PC, with its lack of battery backup. To enforce rapid syncs on a phone seems utterly pointless.

Keep the syncs for meaningful checkpoints, like buying something in an app or marking a message as sent. Multiple fsync() calls per second are a total waste.
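A minimal Python sketch of that policy, assuming an append-only log file: ordinary records are simply written, and os.fsync() is called only at an explicit checkpoint. CheckpointLog is a hypothetical name. Records appended after the last checkpoint can still be lost in a crash, and a torn final record is possible, so whatever reads such a file has to tolerate a partial last line.

```python
import os


class CheckpointLog:
    """Append-only log that writes records without syncing and only forces
    them to stable storage when checkpoint() is called."""

    def __init__(self, path: str):
        self._f = open(path, "ab")
        self.syncs = 0  # number of fsync calls, for illustration

    def append(self, record: bytes) -> None:
        # Lands in Python's buffer / the kernel page cache; may be lost on power failure.
        self._f.write(record + b"\n")

    def checkpoint(self) -> None:
        # A meaningful point (e.g. "message marked as sent"): make it durable.
        self._f.flush()              # hand Python's buffer to the kernel
        os.fsync(self._f.fileno())   # ask the kernel to push it to the device
        self.syncs += 1

    def close(self) -> None:
        self.checkpoint()
        self._f.close()
```

A hundred drafts followed by one checkpoint cost one fsync instead of a hundred.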

Loss isn't the same thing as corruption, which is the real fear here.

What you're looking for (I think) is SQLite in WAL mode, with PRAGMA SYNCHRONOUS=NORMAL. That's ACI but not D.
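For reference, this is how that configuration could be set from Python's built-in sqlite3 module; the helper name and file path are placeholders. Per the SQLite documentation, WAL mode with synchronous=NORMAL stays consistent after a crash, but the most recent commits may be rolled back.

```python
import sqlite3


def open_wal_normal(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # journal_mode=WAL is persistent: it is recorded in the database file.
    conn.execute("PRAGMA journal_mode=WAL")
    # synchronous is per-connection, so every connection must set it.
    # NORMAL in WAL mode: the WAL is synced around checkpoints rather than on
    # every commit, so a power loss may undo the latest commits, but the
    # database itself should remain consistent.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn
```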

One of the few times they make sense is in a database server application where you might have hundreds of thousands of connected clients.

Any real proof that this actually causes user-visible issues at all?

No. Just that it might, depending on how the Pixel's NAND memory degrades over time.

My money is on the battery crapping out first.

I don't see the hard link between GUI smoothness and IO. You can have perfectly smooth 12000fps rendering and crappy IO; everything will just be "loading...".

My old 4S runs annoyingly slow. According to the article, the NAND will slow down the phone as it ages, so would replacing my storage make my iPhone fast again?

Highly unlikely. The flash doesn't slow down, but its hidden pool of spare blocks diminishes as random blocks fail. Delete some junk apps and old photos and you will give the storage allocator more room to play with.

Also, disable background apps unless you really need them. Also Settings -> General -> Accessibility -> Reduce Motion, Reduce Transparency. Delete cookies from Chrome/Safari. Also, Settings -> Messages -> Expires -> 5 Minutes (instead of never).

You may also want to make sure you have a decent amount of free space. If your capacity is at 95% you will most likely experience an even slower 4S than what is typical.

What iOS version are you running? It might be cheaper to just buy a new one.


... And yeah, I wouldn't be surprised either, but I was intrigued by the prospect.

Some people swear that jailbreaking + installing various "speed up" tweaks (be it animation reduction or daemon killers) will help.

Personally I don't use those kinds of tweaks but it might be worth a try.

The fundamental problem sounds like it's that SQLite insists on not running in its own thread. This, coupled with the fact that Linux has no way to issue a true (i.e., non-blocking) write barrier, means that there is no way to implement write barriers in user space.

Whereas, if SQLite did issue I/O from a separate thread, one could easily implement an "async commit" function which guaranteed consistency and ordering but not necessarily durability (i.e. a write barrier). This would suffice I suspect for 95% of usage in applications: users will probably be OK if their phone loses the last few seconds of user input before an OS crash, so long as everything else is left intact.

EDIT: In fact, Postgresql has an option to permit exactly this behavior: https://www.postgresql.org/docs/9.6/static/wal-async-commit.... This is possible in Postgresql because, unlike SQLite, I/O does not run in the client thread.

> The fundamental problem sounds like it's that SQLite insists on not running in its own thread.

Pretty much all apps DO have SQLite running on its own thread, or at least a background thread pool of some sort. Android gets very cranky at the developer if they don't do this - there's tons of warnings, both runtime and in the form of lint. Database access, or anything involving fsync(), is exceptionally rare on the UI thread or any user-latency-critical thread.

It is possible to run SQLite entirely on a separate thread and avoid blocking the main thread on its I/O, though. It just requires that you're willing to ship the data you need to read and write between threads and handle the synchronization yourself. I've written applications that do this and it works very well, but it's certainly more work than if SQLite supported off-main-thread I/O natively.
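A minimal Python sketch of that arrangement: a single-worker ThreadPoolExecutor owns the SQLite connection, so every commit (and the fsync() calls behind it) runs off the caller's thread, and callers get a Future back. DbWriter is a hypothetical helper, not a library API.

```python
import sqlite3
from concurrent.futures import Future, ThreadPoolExecutor


class DbWriter:
    """All SQLite work runs on one dedicated worker thread, so commits never
    block the calling thread unless it explicitly waits on the Future."""

    def __init__(self, path: str):
        self._path = path
        self._conn = None
        # max_workers=1 serializes every job; the initializer opens the
        # connection on the worker thread itself, so it is never shared.
        self._pool = ThreadPoolExecutor(max_workers=1, initializer=self._open)

    def _open(self) -> None:
        self._conn = sqlite3.connect(self._path)

    def _execute(self, sql: str, params: tuple) -> list:
        with self._conn:  # one transaction per job; the commit happens here
            return self._conn.execute(sql, params).fetchall()

    def submit(self, sql: str, params: tuple = ()) -> Future:
        # Returns immediately; call .result() only when the answer is needed.
        return self._pool.submit(self._execute, sql, params)

    def close(self) -> None:
        # The connection is closed on the same thread that uses it.
        self._pool.submit(lambda: self._conn.close()).result()
        self._pool.shutdown(wait=True)
```

A UI thread would call submit() and move on; reads that need an answer either wait on the Future from a background context or use a completion callback (Future.add_done_callback). The cost is exactly the extra synchronization work the comment describes.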

Of course. But that's more work than I bet 95% of app developers are willing/able to put in. (Heck, I've been doing this stuff for years and still have to spend time analyzing my code to make sure I'm not inadvertently holding locks in places where I shouldn't be.)

If f2fs is so superior to ext4 then why doesn't its creator, Samsung, use it on their phones?

It's significantly better on some benchmarks and significantly worse on others. There's no clear winner and ext4 is much more widely tested and has way more development time focused on it. Looking at only a few aspects of performance can result in either of ext4 or f2fs being seen as a clear winner but it's not representative of the whole picture.

I hadn't heard about f2fs before. Sounds useful even on a notebook with SSD.

It's been around since 2012, one of many Flash-specific filesystems. And sure, it can be helpful for flash disks, as other filesystems can too. But in some benchmarks f2fs performs worse than ext4, and in others shows it is not syncing to disk as often as it should be, putting data unnecessarily at risk. Oh, and the fsck program used to crash on any filesystem errors (hopefully they've fixed that by now (edit: they appear to have fixed it in March))

The performance boost in this case is mainly due to the nobarrier option, which is potentially dangerous unless you have something like a battery-backed RAID controller to act as a persistent disk cache. If you back up all your stuff, could be nice. (A similar hack I used to use to eliminate iops-related slowdown was to implement a tmpfs mount for small files that got written a lot, and just rsync them once a minute to disk)

One of the benefits of f2fs is that you can select the allocation and cleaning algorithms that it uses based on the flash chip in use, so before using it on your system, you might want to tune it to your application. https://www.kernel.org/doc/Documentation/filesystems/f2fs.tx...

FWIW I've been using it for the past year for a root filesystem of a Debian NAS running off a USB stick and I didn't have any problems.

Thanks. I was about to ask if I could replace ext4 with f2fs on the SSD of my Ubuntu laptop. Fsyncs are still at the mercy of the SSD firmware AFAIK.

I found this https://ubuntuforums.org/showthread.php?t=2326934 I'm going to study it. Any other first hand experience here on HN?

I always wondered why my wife's Moto X is still buttery smooth after 3 years!! I love that phone. I have a Pixel but she still has her Moto X and it is really great at being smooth, plus the wave to wake and other gestures are really unmatched by the Pixel.

Love this write up/research! Hopefully it will teach the Pixel team a few things, or maybe they already knew but will now have the ammo to take to Product and change things!!

I suspect it is not the fsync directly that leads to lag, but that when an fsync comes through, Linux stops doing anything else for the duration.

Linux doesn't do any such thing. ext* filesystems do (especially ext3).

If you're calling fsync on your painting thread, you're going to have trouble hitting 60FPS no matter what. What a clickbaity post.

So this might be the reason why some mobile devices (like my ASUS TF700T tablet) degrade in performance so badly over time. Interesting.

It's certainly part of it, but there are other possible contributors. For example, the original Nexus 7 suffered from severe NAND degradation over time, as have some other devices.

Both of those devices are just the victims of Asus cutting corners with cheap flash and Tegra 3's not so great memory bandwidth.

Is there an Android sample app that show how to do I/O properly (off the main thread, and so on)?

I know there's some kind of system to prevent network I/O on the main thread... w... why... I dare to ask the obvious, w-wwhy isn't there a warning for simple disk I/O too?

Android's StrictMode can monitor and notify about both network IO and disk IO.

I found that, and looks very powerful, but that's a bit more than a simple switch in Android Studio, yet less than a warning on the Play Console, because you have to add it to your code (so not a static analyzer).

I wonder if this is why my Sony Xperia Z5 is so sluggish sometimes, despite having a Snapdragon 810 8-core CPU...

It's not about hardware. My new Android phone has twice the specs of my 4-year-old Android phone, yet I'm still using the old phone for some apps. The new phone has better specs, but it still has laggy scrolling.

Snapdragon 810 is a big.LITTLE architecture; you don't have 8 CPUs available at the same time. Only 4.

Check your running applications, the answer to sluggishness might be there.

All 8 cores can run at the same time on the 810.

You are right, 810 has HMP. I somehow missed that.

What does it matter?

> ... noatime,nodiratime, ...

noatime implies nodiratime. I shrug off when I see newbies copy pasting this to their /etc/fstab but this is in a mainstream Android device??

I personally hate options that both have their own behavior, and imply another options behaviour unnecessarily. I'll be explicit when I use these options, and call them all out. That way, when someone who's not intimately familiar with the code reads it, they can actually tell what it's doing at a glance.

I won't comment on your general remark but I don't think it is relevant to this specific case. "noatime" has a simple meaning and a clear name that represents its function: it stops updating inode access times. If you don't want your filesystem updating access times you use this mount option and it's done. Functionality of "nodiratime" is the one that's more specific & esoteric and subset of "noatime", thus it has a more specific name. It's not that "noatime" implies "nodiratime" , but more like "nodiratime" implements a subset of functionality of "noatime". Nothing to hate really.

Nexus 5X running rooted AOSP 7.1.1

/storage/emulated fuse
/dev/block/dm-0 /data ext4 rw,seclabel,nosuid,nodev,noatime,noauto_da_alloc,errors=panic,data=ordered,inode_readahead_blks=8 (no nobarrier mount option)
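To run the same check on another device, the relevant lines come from /proc/mounts, whose first four fields are device, mount point, filesystem type and comma-separated options. A minimal Python sketch that parses one such line, flags ext4's barrier-disabling options (nobarrier or barrier=0), and notes the redundant nodiratime discussed above. The function names are hypothetical, and the parser ignores the octal escapes (such as \040 for a space) that /proc/mounts uses in paths.

```python
import os


def parse_mount_line(line: str) -> dict:
    """Split one /proc/mounts-style line:
    <device> <mount point> <fs type> <options> <dump> <pass>"""
    device, mountpoint, fstype, options = line.split()[:4]
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options.split(","),
    }


def audit_mount(entry: dict) -> list:
    """Return human-readable notes about risky or redundant options."""
    opts = set(entry["options"])
    notes = []
    if "nobarrier" in opts or "barrier=0" in opts:
        notes.append("write barriers off: cache flushes are skipped, "
                     "so fsync() results may not survive power loss")
    if "noatime" in opts and "nodiratime" in opts:
        notes.append("nodiratime is redundant: noatime already covers directories")
    return notes


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for line in f:
                entry = parse_mount_line(line)
                for note in audit_mount(entry):
                    print(f"{entry['mountpoint']}: {note}")
```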

In theory Google should be able to easily change the /data ext4 mount option, why didn't Google?

Kinda makes me regret getting a Pixel to replace my Moto Z. Except the reception was terrible on the Z. Oh well.

How is it that no one has made a worthy successor to the OG Turbo. That was a monster of a phone across the board.

Is your Pixel phone noticeably slower than the Moto Z?

The author mentions it should take like a year for it to become noticeable.

I forgot my sarcasm tags

Are you sure you mean the Moto Z? It was only out a couple of months ahead of the pixel...

I am sure. What's your point? That it's difficult to get new phones in such a short time? You're correct.

Although, to be technical it was a Moto Z Force Droid Edition...

Not difficult, just unusual. As someone who buys my phones I tend to hang on to them for about 18 months on average and have just got a Z...
