Hacker News new | past | comments | ask | show | jobs | submit login
3.7X speedup from removing a call to sleep (webkit.org)
150 points by nexneo on Sept 2, 2012 | hide | past | favorite | 67 comments

I am the one who originally added this sleep call to WebKit, when we first imported TCMalloc to use as our custom allocator. It was indeed there for a reason, but that reason is not applicable to WebKit's allocation patterns. TCMalloc was designed for a server workload, over time we have adapted it more to the unique needs of a browser engine. This change may help other operations, but probably not as much as the GC benchmark in question.

The person making this change seems to have not read the original comment either. It sounds like your change made the previous iteration of that code faster.

This reminds me why I hate comments. They tell you what someone once thought the code did. They don't actually tell you what the code does. You should always trust actual code and actual measurements above comments.

The original author of this comment (not me, I just imported the code) believed that it improved performance on a given server-side workload.

The person who removed the sleep, Geoff Garen (a colleague and a friend), is a good programmer. I am confident that he read the comment before deleting it. However, his actual performance measurement trumps the comment.

BTW, here is Geoff's snarky response on reading the posts here that mention the comment above the sleep() call: <http://www.quickmeme.com/meme/3qqgwz/>. I agree with him.

True story...

Client gives me a Help Desk Ticket. User claims batch job use to run in 5 minutes but now runs for hours. The logs confirm this. The commit logs show that an offshore programmer forgot to remove a 10 second sleep command from inside an iteration (for debugging I presume) before promoting to production. I removed it and got a 100X improvement in throughput.

My client said that now his user loves him; what did I do to fix it so fast?

When I told him that I removed the Sleep, he said, "No! No! No! Change it to a 5 second Sleep so I have something to give him the next time he complains!"

" that an offshore programmer forgot..."

Unnecessarily mentioning a characteristic of an individual is a sign of selfish, biased, and/or malicious thinking. Do you mention "American programmer" whenever recounting their bugs? It appears you have an agenda of promoting negative stereotypes, presumably in your favor. This is lame. In the future, you'll appear more credible if you omit this bigotry from your comments.

I am certain there are thousands of programmers, male, female, gay, straight, onshore and offshore who are much better than you (or me.)

Hogwash. Haven't you ever read a novel or a news story or any kind of narrative? Details add color (uh-oh, I'm sure that'll get you riled up) and they help the listener understand context and picture the story in their head. How boring would life be if every time you read or heard anything negative about someone, it only referred to them as a human?

You can't solve prejudice and bias by pretending that people's race, gender, nationality, height, weight, shoe size, etc. don't exist. They do, they always will, get over it.

Sure, good stories have details, but a person's choice of detail reveals how they see the world. The comment's mention of "offshore" clearly serves no purpose beyond encouraging stereotype-based thinking. Knowing this programmer was offshore is as irrelevant as knowing their sex or their eye color. We've generally (and I think rightfully) exorcised this language in reference to other characteristics, why not this one?

In my experience, programmers who emphasize that "offshore" programmers are bad seem insecure about their own skills and job stability. The same is true of sexist men who go out of their way to mention when a mistake was made by a woman but never explicitly highlight when it was a man.

I understand what you're saying, but it's not bigotry if the "stereotype" is true.

"offshore" is synonymous with "cheap(er) labor" in the US programming world, and is an accurate statement since that's what it's generally used for ("let's send it to XXX and it will cost us 1/10 as much")...

How many times do you think outsourcing happens in the US because they are looking for higher-quality code? 1%, 10%, 50%, 90% of the time?

"cheap(er) labor" is also synonymous with a bad product/service, because you often get what you pay for.

Are there exceptions to all this? Yes. Sometimes you can't find good quality coders in the US, but I doubt this is why outsourcing happens most of the time.

He might have generalized by adding that detail ("offshore") but that's not biggotry IMO as much as it is a flavoring.

Offshore is, most of the times, about getting the same quality of work at cheaper cost. How is it cheaper? Not because it is shoddy work, but the economics in the other country work out that way. How can you equate cheaper price = lower quality. If you are looking for people in US alone, then cheaper = lower quality. Since another factor of the equation, the country, is changing, that is not the case.

> Offshore is, most of the times, about getting the same quality of work at cheaper cost.

I have yet to experience that joy.

Rather, offshore means lots of unnecessary documentation, train someone to do my job (promoting me to "lead"), watch them get it wrong, try to coach them onto the correct path, high turnover (so I have to train and coach again in a few months), do a complete rewrite at the 11th hour, then be blamed by the bean counters that I sabotaged the effort somehow.

No doubt the same thing happens in house. But with in house, if we find a good egg, we get to keep them, get some ROI back from the learning curve.

Offshore is, most of the times, about getting the same quality of work at cheaper cost.

I see you've never worked with offshore development.

I see you've never worked with GOOD offshore development.

Offshore also doesn't mean not American...

> Offshore is, most of the times, about getting the same quality of work at cheaper cost

The problem is that the person who evaluates quality has no real understanding of what is being evaluated. This leads to price being the deciding factor. Most of the code I've seen coming from price-driven software factories is awful.

Successful outsourcing stories are few and far between; stories of failures are a dime a dozen. The reason is exactly as you say: companies are not judged on the quality they deliver.

In fact, the same problem has plagued the IT industry since forever. Almost no company successfully competes on quality, because the customer chooses the cheapest offer in the majority of cases.

And here is the crux: define the quality of a delivered software project and put a number on how much higher quality would have saved the customer. Nigh impossible.

The only solution seems to be going into a specialty niche where high quality is required, so any incumbent competitor will fail, unless they deserve to win, because they deliver your level of quality.

And is that what happens? You get same quality code at cheaper prices? Like Wal-Mart gets cheaper made goods from China that last as long as their more expensive domestic counter-parts? Or how Dell has retained it's customer satisfaction ratings by using Indian call-centers?

I think it's been established that when you outsource for cost, you get back a product that cost you more in the long run.

Again, there are exceptions to this, but that's not what we are talking about.

No, it means that they went cheaper than they ought to.Of course, you get what you payed for. But the scale is different. Just because it is cheaper, it does not imply poor quality.

The race is not always to the swift, nor the battle to the strong, but that's the way to bet.

  -- Damon Runyon

I think the difference between the parent post and the grandparent post is experience with "outsourced to overseas" code.

Having worked extensively with outsourced code (4+ years) I can tell you without a doubt the vast majority is poor quality crap, more so than "insourced code" by far. But... Everyone produces bad, buggy code at times.

>Offshore is, most of the times, about getting the same quality of work at cheaper cost.

If this is true, then we're all in deep sh*t, as we'll all be working for Chinese-level wages soon.

That's the eventual result of globalisation, yes.

So you think that it's fine to pay someone less for the same work?

Please do read my comment. If you compare the pay in dollars, of course it is less. But the cost of living, the local economy, etc probably puts them at higher pay or equivalent to what people in US would get.

lower cost

Really? Because based on the story edw519 gave, it sounds an awful lot like the offshore programmer had just made a silly little mistake while the "client" was the one who was actively incompetent/manipulative. If anything, his choice of details presents offshore programmers in a good light and American programmers in a bad light.

That sounds like someone who has never got that pause or "oh" when details of their race/gender/sexuality has been revealed.

What makes you believe that the author of the parent post mentioned the "offshore" characteristic unnecessarily?

When interpreting the story, isn't it important that readers understand that the programmer who added the call to sleep was far removed from both the help desk and the end user of the software and was also unreachable at the time the call was discovered and thus that the author, who worked at the help desk, had to presume that the call was left in the code mistakenly? It's only later in the story that readers (and the author) learn about the ticket-submitter's unusual preferences for satisfying the end user. And it's only then that another possibility dawns upon readers: that, perhaps, the "offshore programmer" knew more about making the stakeholders look good than initially given credit for.

If the author of the parent post is not allowed to reveal that the programmer was working "offshore," how does the story get told at all?

Would "contractor" or "consultant" or "field tech" been unnecessary? Hardly. Offshore doesn't necessarily mean outsourced, it could just as easily mean a satellite office in Europe.

I've worked with good and bad programmers, both on-shore and off-shore. What seems to determine the quality is the standard you set when hiring. If cheap labor matters more than quality, the result is accordingly (and always ends up costing more in the long run).

I suspect the presumption that off-shore means lower quality comes from past experience working with an off-shore team selected based on their perceived lower cost. I've found that when you set the bar high for off-shore hires, regardless of cost, you end up with high-quality people.

I think you've gone and assumed that offshore means Indian or Filipino.

My guess is that he was having a jab at the offshoring mentality.

"Competence is inversely proportional to distance." Or perhaps, /perceived/ competence is.

Offshore could imply a number of things. Let's be nice at first and pretend that this means "across the Bay," or maybe on Alcatraz.

My best guess: The offshore person does not have the development culture present in the "mother ship". Best practices are often stuff that you learn in the hallway, from casual conversation, or watching over someone's shoulder. Attachment to the product's quality is probably lessened, too -- distance does this, even in the days of ten-millisecond-scale pings to the other side of the planet -- and it's easier to "go to lunch" on a problem and worry about it later when a floor-full of engineers aren't actively crowding your cubical about a lame checkin.

Hell, I have enough problems getting two /adjacent buildings/ to talk to each other.

Now, add a time-zone difference that further impedes communication. Add siestas, and mismatching holidays, and language barriers. There are more.

Even before we add in a different country's culture, we have severe issues with the average offshore developer's perception of what's important and how to work effectively.

I'm not going to say "Indian / Pakistani / Kiwi programmers can't code their way out of a paper bag", but when you add the stereotypical stuff to all of this (bad management, poor hiring practices, general slap-it-together attitude), it's /bad/, and not unwarranted to mention, even in passing.

Personally, I didn't take it that way at all. My team interacts very often with another team based in India. It has nothing to do with outsourcing, that's another location of our global company. So, to me, the term "offshore" in this context brings to mind problems of location and time difference, but certainly not some image of a cheap outsourcing shop. The parent post could have just as well meant it this way; in fact, it seems likely.

Over react much?!

I want a social norm against comments like yours, which say "you're engaging in crime-think pattern #122" instead of "you're wrong". I can't fathom why people fail to realize that such comments kill online discussions. How is one supposed to argue against an accusation of crime-think?

It's interesting how spinlocks are making a comeback in user space code. They are widely used in kernel code, but their use has previously been discouraged in application code. As this example illustrates, it's pretty easy to get them wrong...

Spin locks are discouraged in FreeBSD kernel code too, except for very low-level areas (serial console, pmap, and interrupt handling come to mind) where you can't safely ask the scheduler to do anything for you.

That said, our sleep locks are actually adaptive locks, so on SMP systems they'll spin for a bit before giving up and asking to be descheduled.

They are encouraged in the Linux kernel in places where it is nessesary to lock for a very short time. Using mutexes takes a bit of overhead, so they can't give you very short sleep times; that is when spinlocks have to be used.


In FreeBSD the vast majority of sleep lock acquisitions are uncontended and end up being a single compare-and-exchange -- very fast. Spin locks are slightly more work, since we disable interrupts during them.


Spinlocks in linux are the "last resource locks", usually in kernel code only in situations where you can't have the scheduler kicking in but can have a race condition (in SMP systems usually)

So it's basically interrupt handlers.

It sounds smart but it occurred to me that that I can only think of two situations where a spinlock needs to be promoted. 1. Too many cores access that tiny portion of the system at the same time or 2) a bug(like using a spinlock where something else should be used and a more old-fashioned bug).

Will 1) really happen often enough to not use simple spin locks?

As an example of where you might use a user space spinlock, consider:

If you have:

… a multithreaded process with shared resources that need locks

… and for the uncontested case your spinlock is much faster than your next fastest lock

… and you have resources for which the probability of a contested access is very close to zero

… and the locks are only held briefly (relative to compute quantum) (especially if they are held shorter than thread context switch)

You may find that the speed gained by using the spinlock is a win even if once in a while you end up burning your compute quantum because a thread got suspended holding a spinlock.

I agree with this assessment. Where the probability of collision is low, and spins are in that case of short duration anyway, and especially if test and set intrinsics are available, spin locks in userspace can be very valuable. They also, besides saving scheduler cycles, require only a common word access, even while that implicates barriers and cache lines.

You need to be careful though.

Indeed spinlocks should be used carefully. But there are some places where you can't avoid them.

Anyone wanna supply some context? I have no idea what is going on here

It was "only" on one particular benchmark, still it shows that profiling often give you surprising results that nobody familiar with the code anticipated.

Why did they roll their own spinlock implementation in the first place? They could encapsulate the lock primitives provided by, and optimized for, each platform: CRITICAL_SECTIONs on Windows, futexes on Linux, and pthread_spin_locks elsewhere. iOS and OSX have some Mach-specific spinlocks (OSSpinLockLock?), but I don't have any experience using them.

So if I understand this correctly the 3.7x speedup is a speedup in garbage collection and will have a positive effect on the speed of the browser which is nice.

But what sort of effect will this have on page rendering? Anyone in the know care to comment on the specifics?

edit: improvements to my shoddy wording.

My guess from spending only a few days with the WebKit source is that it will not effect page rendering at all.

Possibly not desktop performance at all since I believe GC is done in a separate thread and (from my understanding) you never wait on it, unless you are really memory constrained.

Could make a difference on IOS where I believe you actually don't have this GC memory scavenger thread.

Anyway, the source is quite readable and everyone working with WebKit on one level or another could benefit from taking a look. At the very least it gives you a feeling for the tremendous effort from a huge number of people that the WebKit project really is.

Cool, thanks for the insight.

The speedup sounds like it is very specific to GC, apparently the WebKit GC relies on this component heavily.

I would guess this is relevant only for GC in Safari, since it uses the WebKit JS engine, and not Chrome (which uses a completely different JS engine, V8).

It's also likely only a speedup in a specific GC benchmark. If it had helped in say the SunSpider benchmark, I'm pretty sure it would have been fixed a long long time ago.

This code is actually not that specific to GC - this particular spinlock is used by our custom malloc implementation. It's just that our GC creates a different concurrency pattern than most other workloads for the allocator.

Are sleeps() always necessary in multithreaded programming?

Use of sleep() in multithreaded programming is a bad code smell. There are other safer, more accurate, ways of suspending execution of the current thread until a particular event has occurred, though they may take a little more work to integrate into a program than sprinkling sleep's.

Spinlock implementations sometimes use sleep() calls in them to yield the current thread to others that may be ready to run but are currently being blocked. It's still a bad code smell though because spinlocks, at least in user space code, are used to avoid the overhead of yielding to other threads. There are also other locking mechanisms, like mutexes/futexes, that will accomplish the same yielding behavior if it's really needed, like if it's anticipated that the code waiting for the lock may be held up for a while.

Generally, unless I'm working on super low level stuff, I'll avoid sleep()'s at all possible. If I find that I need to spinlock (sleep() to avoid race conditions), there's something wrong with my implementation.

There was a long comment before the sleep. It seems likely that the sleep was there for a reason.

To be fair, it's a sleep(0), which I assume is just the easiest way to get the process to context switch. I assume this is a speed improvement only when locks become ready sooner than a context switch.

The real fix would be to make context switches faster, this is just a data-driven tweak rather than a general solution.

Awesome!! I wonder when this is going to get pushed to mainstream Safari.

The patch was committed on 2012-05-17, so it should be already present in Safari for OSX 10.8.

Well, this was four months ago so it might already be in Safari

But Safari for Windows will be forever stuck without the fix?

Reminds me of "the speedup loop": http://thedailywtf.com/Articles/The-Speedup-Loop.aspx

The code is, appropriately, in the WTF directory.

What's with the old comment "// Sleep for a few milliseconds", followed by the sleep function?

Reduced the number of lines in the code base AND improved the product? nice.

Double nice. A co-worker and I once had a joke where we wanted our code counts for the year to be negative. No one tracks code counts, of course, but it was a fun idea.

He eventually replaced a particularly ugly and complicated system with a much simpler system, and thus he met his goal of going into the negative. Less code is better code.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact