

Optimizing Real World Go - nteon
http://bpowers.github.com/weblog/2013/01/05/optimizing-real-world-go/

======
donio
I have just tried cross-compiling this with GOARCH=arm and the resulting
statically linked executable works nicely on Android. I am sure ps_mem can be
made to work too but this is easier and handy.

(Android has some other ways to get this sort of data too but the more tools
the better)

~~~
laumars
I really wish Google released an SDK for writing fully fledged Android apps in
Go.

~~~
yareally
I've been wishing for this for quite a while. Though if they didn't ditch
Dalvik as well, it would be more like syntactic sugar than anything. Native
code performance on Android is probably wishful thinking for now.

------
alec
Completely off-topic to optimizing Go, but you may want to look at smem - it
does what you implemented (including looking through a subset of processes)
but includes a few more useful measures of memory usage that help in the
presence of multiple processes from the same binary -
<http://www.selenic.com/smem/>

------
rartichoke
Nice post, I have a question on one piece of it though.

In your final version of splitSpaces() you are calculating the length of b - 1
in the condition of the for loop.

Is Go calculating len(b) - 1 in every iteration or is it smart enough to move
it out of the condition at compile time?

~~~
mseepgood
Go slices (and strings) know their length: <http://research.swtch.com/godata>
So it's not a computation, just a struct member access. And len() is a
builtin, not a real function call.
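
A minimal sketch of the point above, not taken from the article: `countSpaces` is a hypothetical stand-in for the article's splitSpaces(), kept only to show a loop whose condition uses `len(b) - 1`. Evaluating the condition reads the length stored in the slice header and subtracts one; nothing walks the data.

```go
package main

import "fmt"

// countSpaces is a hypothetical helper, not the article's splitSpaces.
// The loop condition evaluates len(b)-1 each time around, but for a slice
// len(b) is only a read of the length field in the slice header.
func countSpaces(b []byte) int {
	n := 0
	for i := 0; i < len(b)-1; i++ {
		if b[i] == ' ' {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countSpaces([]byte("a b  c ")))
}
```

Note that, as in the loop being discussed, the final byte is never examined because the bound is `len(b) - 1`.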

------
donio
Isn't bytes.Fields what you were looking for with splitSpaces?

~~~
wolf550e
<http://golang.org/src/pkg/bytes/bytes.go?s=6894:6924#L282>

It uses `unicode.IsSpace` which might be slower than necessary for this use
case (after all, how likely is the proc filesystem to use \u2000?). If this
were the bottleneck and the program wasn't IO bound anyway, I bet someone
could hand-code something clever that skipped multiple space and/or tab
characters at a time.
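
As a rough illustration of the trade-off, here is a sketch of an ASCII-only splitter next to `bytes.Fields`. The name `asciiFields` and the sample line are assumptions made for the example; this is not the article's splitSpaces(), and whether it is actually faster would need a benchmark.

```go
package main

import (
	"bytes"
	"fmt"
)

// asciiFields is a hypothetical ASCII-only alternative to bytes.Fields.
// It splits on space and tab only and returns subslices of b (no copying).
func asciiFields(b []byte) [][]byte {
	var out [][]byte
	start := -1 // index where the current field began, or -1 between fields
	for i, c := range b {
		if c == ' ' || c == '\t' {
			if start >= 0 {
				out = append(out, b[start:i])
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		out = append(out, b[start:])
	}
	return out
}

func main() {
	// Made-up line in the style of /proc/$PID/smaps.
	line := []byte("Pss:                  12 kB")
	fmt.Printf("%q\n", asciiFields(line))
	fmt.Printf("%q\n", bytes.Fields(line))
}
```

For plain ASCII input like this, both calls should produce the same fields; they differ only on input containing other Unicode whitespace.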

~~~
nteon
bytes.IndexByte is clever in that way, using SSE instructions:
<http://golang.org/src/pkg/bytes/asm_amd64.s>

I tried using it
(<https://github.com/bpowers/psm/commit/55bdd3f51c9c61a9247fec14601ea776aa547259>),
but it wasn't very helpful, I think because the lines in /proc/$PID/smaps are
relatively short.

~~~
wolf550e
IndexByte is overkill. I meant something like reading four bytes into a
register, comparing the value of the register to 0x20202020 and skipping four
bytes.
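
A sketch of that idea in portable Go, under the assumption that a 32-bit load through `encoding/binary` is an acceptable stand-in for "reading four bytes into a register". `skipSpaces` is a hypothetical helper name. Byte order is irrelevant here because all four bytes of the pattern are identical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fourSpaces is four ASCII space bytes (0x20) packed into one 32-bit word.
const fourSpaces = 0x20202020

// skipSpaces returns the index of the first non-space byte at or after i.
// It compares four bytes at a time, then finishes with a byte-wise loop.
func skipSpaces(b []byte, i int) int {
	for i+4 <= len(b) && binary.LittleEndian.Uint32(b[i:]) == fourSpaces {
		i += 4
	}
	for i < len(b) && b[i] == ' ' {
		i++
	}
	return i
}

func main() {
	// Made-up line in the style of /proc/$PID/smaps.
	line := []byte("Pss:                  12 kB")
	i := skipSpaces(line, len("Pss:"))
	fmt.Printf("value starts at byte %d: %q\n", i, line[i:])
}
```

This handles spaces only; tabs would need a second check or a fallback to the byte-wise loop.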

I see that you're using ReadLine(); it has to read the input and look for
"\n". As the person who wrote GNU grep said, avoid splitting the input into
lines.

After looking at:
<http://lxr.free-electrons.com/source/fs/proc/task_mmu.c#L549>

I suggest the following: For future proofing, first read the lines of the
first mapping, and verify that:

1. The "Pss", "Private_Clean" and "Swap" lines come in this same order.
2. The numeric values do not begin earlier than byte 17 in each line (i.e.
"KernelPageSize: " is still there).

If these assumptions hold, go to the fast-path code. If not, use your existing
code as the safe code path (and output a warning that says that since the
/proc/*/smaps format changed, your program's code needs maintenance for
performance but probably not correctness).

In the fast path, do not split into lines. Instead, use Boyer–Moore to look
for "\nPss:", "\nPrivate_Clean:" and "\nSwap:" in the input (after Pss, look up
Private_Clean; after Private_Clean, look up Swap; after Swap, look up Pss). In
each of those, skip to byte 17, and fast-skip spaces. Then read digits until
"\n" and perform the next string search.

If you verify that not only did the order of the lines not change but no new
lines were added between them, you can hard-code the offsets of
"\nPrivate_Clean:" and "\nSwap:" from "\nPss:" and not look those up. Then you
only need to look up the next "\nPss:" (because the file path is variable
length).
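
A rough sketch of the search-based fast path described above, with some stated substitutions: it uses `bytes.Index` rather than a hand-written Boyer–Moore search, it handles one key at a time, and it skips padding byte by byte instead of jumping to byte 17. The function name `sumField` and the sample data are made up for illustration; none of this is psm's actual code.

```go
package main

import (
	"bytes"
	"fmt"
)

// sumField adds up the numeric value of every line in smaps that starts with
// key (for example "Pss:"), without first splitting the input into lines.
// The leading "\n" in the needle anchors each match to the start of a line.
func sumField(smaps []byte, key string) uint64 {
	needle := []byte("\n" + key)
	var total uint64
	for {
		i := bytes.Index(smaps, needle)
		if i < 0 {
			return total
		}
		smaps = smaps[i+len(needle):]
		// Skip the padding between the key and the number.
		j := 0
		for j < len(smaps) && smaps[j] == ' ' {
			j++
		}
		// Read decimal digits until the first non-digit byte.
		var v uint64
		for j < len(smaps) && smaps[j] >= '0' && smaps[j] <= '9' {
			v = v*10 + uint64(smaps[j]-'0')
			j++
		}
		total += v
		smaps = smaps[j:]
	}
}

func main() {
	// Made-up, heavily abbreviated input in the style of /proc/$PID/smaps.
	sample := []byte("00400000-0040b000 r-xp 00000000 08:01 1 /bin/cat\n" +
		"Size:                 44 kB\n" +
		"Pss:                  12 kB\n" +
		"Swap:                  0 kB\n" +
		"0060a000-0060b000 r--p 0000a000 08:01 1 /bin/cat\n" +
		"Size:                  4 kB\n" +
		"Pss:                   4 kB\n" +
		"Swap:                  0 kB\n")
	fmt.Println("Pss total (kB):", sumField(sample, "Pss:"))
}
```

Hard-coding the offsets between fields, as suggested, would replace all but one of the searches per mapping; that step is left out here because the exact offsets depend on the kernel's output format.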

~~~
nteon
excellent suggestion, I will definitely try something like that soon.

There are in fact only 2 variable sized lines - the first VMA info line and
the last VmFlags line. In the fast path the middle hunk of map info can be
accessed as a single []byte of 392 bytes, with constant offsets for the Pss,
Private_* and Swap values.

------
willvarfar
Lovely! Thank you for sharing. I hope this gets into the standard packaging so
it doesn't die unknown.

------
cmwelsh
The viewport is set incorrectly on my iPhone. I can't seem to zoom out either.

~~~
nteon
sorry to hear that. I believe I've fixed it, but don't have any iDevices to
test with.

~~~
cmwelsh
It works perfectly now, thanks. Great article.

