

Ask HN: What is hand-coded assembly language used for these days? - bkovitz

What is hand-coded assembly language used for these days?<p>To put that another way, in the current marketplace, what kinds of program are so worthy of optimization that it's economically sensible to have a human spend several days hand-tuning machine language to squeeze out every CPU cycle?
======
chadaustin
IMVU hand-rolled its SSE skinning loops and parts of the software 3D lighting
code, because only 2/3 of our customers have GPUs. We need to run well on
five-year-old Dells with Intel graphics. (Direct3D on Intel isn't as good as a
dedicated software renderer. We chose RAD's Pixomatic.)

In addition, look at how popular netbooks are becoming. The Intel Atom is an
_in-order_ CPU. Imagine a hyperthreaded, 1.6 GHz 486...

On the iPhone it's even worse. It's got a decent vector unit, but the CPU is
very slow. You'll see great wins by doing your 3D math yourself.

As we continue to become multicore, I could imagine somebody shaving a couple
cycles out of the core message passing routines, though you're almost
certainly bus bound in those situations...

Computers are getting smaller and people want more out of them; assembly
language is back in style!

~~~
chadaustin
Oh, I forgot about another huge case for writing assembly language... These
days, we have tons of different languages talking to each other, all in the
same program. In Mozilla, for example, JavaScript talks to C++ objects via
XPCOM/XPIDL. Since the C++ objects expect data laid out on the C stack in a
certain order and JavaScript has no notion of the C stack, there is a bit of
platform-specific assembly code in the middle that takes the JavaScript
values, places them on the stack, and jumps into the C++.

I'm guessing that most languages with built-in foreign function interfaces
(like Python's ctypes) have similar thunking layers.

~~~
robin_reala
Mozilla also use assembly for heavily used computationally expensive
operations like image decoding. mmoy’s work is worth looking at:

[https://bugzilla.mozilla.org/buglist.cgi?query_format=advanc...](https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=&product=Core&long_desc_type=substring&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&status_whiteboard_type=allwordssubstr&status_whiteboard=&keywords_type=allwords&keywords=&bug_status=RESOLVED&bug_status=VERIFIED&resolution=FIXED&emailassigned_to1=1&emailtype1=substring&email1=mmoy%40yahoo.com&emailassigned_to2=1&emailreporter2=1&emailqa_contact2=1&emailtype2=exact&email2=&bugidtype=include&bug_id=&votes=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0=)

------
luu
I did some assembly optimization for an internal RTL-level simulator. We had
~1000 machines on a three-year upgrade cycle, i.e., we replaced about 333
machines / year = $333k / year. Let's say I cost the company $200k / year.
Several days = perhaps $2k, so I'd only need a 0.6% speedup for it to be worth
it, not even counting the cost of powering and maintaining our machines.

When I worked on it, our simulator was an order of magnitude faster than
commercially available simulators (Synopsys VCS and Cadence NC-Verilog), which
cost between $1k and $10k per license per year. I worked for a tiny hardware
startup; established hardware companies use a few orders of magnitude more
compute power than we did, so the equation is probably at least four orders of
magnitude further in favor of doing assembly optimization in a commercial
simulator.

~~~
bkovitz
Thanks! Especially for the numbers. I had not even heard of RTL simulation
before. Wow, extremely cool.

------
DarkShikari
Anything that's worth spending time to do fast is worth spending time writing
SIMD assembly for.

You can get 5x, 10x, 20x, or more performance increases just by using the
vector instructions given to you by the CPU. Until a magic compiler appears
that can make proper use of them (read: never), hand-coded assembly will be
critical for almost any application for which performance is critical,
especially multimedia processing.

~~~
gruseom
A compiler can't be made to make proper use of vector instructions? Why is
that?

~~~
DarkShikari
It is an extraordinarily difficult problem to transform scalar code into
vector instructions. The only way to get even passable output from a
vectorizing compiler is to write the code as vectors to begin with, such as
with cross-platform assembly tools like Orc.

And even then you'll often end up significantly worse off than if you wrote
the assembly by hand.

A run of Intel's compiler on the C versions of our DSP functions resulted in a
grand total of one vectorization, which was done terribly, too.

~~~
jrockway
The problem is that you used C, which doesn't have any syntax to represent
meta-information about the problem you're trying to solve. When you write out
C code to, say, add a list of numbers, it's hard for the compiler to optimize
that. But it's very easy for the compiler when you tell it "sum this list of
numbers".

------
a-priori
Signal processing algorithms on the phones made by a certain company I worked
at are mostly written in assembly. The cellular protocols, at least those that
use time-division (e.g. GSM), have strict real-time constraints, but mostly
they use assembly because every microsecond you can shave off those algorithms
is a microsecond you can sleep and conserve power.

------
maryrosecook
Joshua Bloch, Chief Java Architect at Google, says in Coders at Work:

"But for the absolute core of the system—the inner loops of the index servers,
for instance—very small gains in performance are worth an awful lot. When you
have that many machines running the same piece of code, if you can make it
even a few percent faster, then you’ve done something that has real benefits,
financially and environmentally. So there is some code that you want to write
in assembly language."

------
DCoder
Debugging and reverse engineering games.

When publishers/developers don't give a bleep, the fans take up the task of
fixing the bugs themselves. I happen to run one such project in my spare time
(for C&C: Red Alert 2), and it's amazing how much stuff is broken. It's not as
"serious" as other projects mentioned here, but still a reason to know ASM.
(And a good way to see bad programming practices in action :) )

~~~
smiler
People are still playing C&C: RA2?!

~~~
listic
Come on, people are playing all kinds of games!

I'm sure people still play Master of Magic, a strategy game from 1994. I've
been playing it on and off since it came out, and I began to wonder: is there
something wrong with me that I like this old game so much? Surely there must
be newer games that are better. I showed it to my teenage brother in the
mid-2000s and he loved it too. I've had non-computer-gaming friends blown away
by the original Heroes of Might and Magic (1995).

I think the world needs better means for preservation of old computer games.

~~~
percept
<http://www.infocom-if.org/downloads/downloads.html>

------
andrewf
Going the other way around - can anyone think of an open source project that
_could_ benefit from some assembly optimization, and isn't? I'd love an excuse
to play with this stuff in a useful fashion.

(I love what I do, but my twelve year old self would be disgusted that I'm not
writing games.)

~~~
jermy
Video playback and compression is sufficiently cool, and could benefit from
optimisation. Have a look to see if any help is needed in the ffmpeg tree at
the moment.

------
bilbo0s
Medical Imaging and Oil Exploration. A lot of the really fast packages are
using ARB Assembly instead of GLSL to minimize the number of instructions per
voxel. It adds up if you are doing 4D imaging in real time for instance.

------
nvoorhies
In addition to the optimization reasons, you also end up coding assembly by
hand to tickle features in the verification and bringup of new processors
and/or processor architectures.

Since a lot of the bugs therein may be dependent on a certain sequence of
instructions, doing it in a high level language doesn't make any sense.

------
reedlaw
Microcontroller firmware. There are many examples of AVR code in assembly on
the web. I learned assembly this way. It really makes sense when you're
working on bare hardware with no abstraction layers in the way. Also, it's
useful for time-critical applications such as creating video signals or audio
processing.

------
angelbob
GPGPU stuff -- that is, using your graphics processor for random programming
tasks. While something like CUDA
(<http://www.nvidia.com/object/cuda_home.html>) reduces the need to write
assembly-like code, it also reduces the available speed substantially.

For that matter, CUDA (and ATI's Bare-Metal Interface, which is similar) is
more assembly-like than C-like in many ways. So even using the higher-level
available language is still pretty much like assembly.

You tend to only write these things when you're going to be running a _lot_ of
elements through, so almost everything you do in these platforms is inner-
loop, or you'd be using a different tool. So even small speed-ups tend to
matter.

------
mfukar
Contrary to popular belief, assembly is not only used for performance. Ask the
security industry for more info.

~~~
tptacek
That has more to do with reading assembly than writing it by hand.

~~~
JoachimSchipper
There are certainly people who write shellcode. As I understand it, people
have written shellcodes that use only bytes that happen to map to ASCII, are
obfuscated to bypass intrusion detection systems, and so on. I'm sure it
requires quite a bit of (specialized) knowledge.

~~~
lallysingh
Well, more like bytecode that doesn't contain a zero-byte, which'd stop a
string dead-on.

~~~
tptacek
In '96 when I wrote the Crispin IMAP server bug, I can't remember which way it
was but you either couldn't have uppercase letters, or could only have
uppercase letters, in the shellcode. I thought I was kind of badass for
writing that code. Of course, by '99, that was a triviality.

Just saying, it's not just NUL.

------
rythie
It's used in bits of kernel programming - often because there is no other way
to do the task.

~~~
tptacek
Modern kernels have very little assembly, outside of things like locore.
They've heavily abstracted away the things you'd normally write in assembly,
like modifying MSRs; also, so much of what you do now is simply memory mapped.

In all of xnu, not counting AES, there are ~17kloc in x86 assembly, most of it
in osfmk/i386 --- where no normal developer is ever going to go. There are
over 730kloc in C.

~~~
rythie
I wasn't implying there was a lot, just that sometimes that's the only way.

~~~
tptacek
I'm just saying, even as a kernel dev, you're unlikely to need to write things
in assembly.

------
Locke1689
I work in virtual machine development, so a portion of the interface code for
hardware virtualization I wrote in straight ASM. This is not (exactly) for
speed reasons, though; it's just impossible to touch the hardware at that
level in C. :)

------
daeken
Compiler intrinsics, binary patches and hooks (although EasyHook has made
assembly a rarity here outside of the occasional shim where odd calling
conventions are used), in-process debuggers, low-level bootloaders, hardware
initialization/management, various thunking mechanisms.

Others have covered the optimization side of things well so I won't repeat it,
but there are tiny fragments of assembly all over the place -- they hold your
system together.

------
vabmit
I've done it for cryptography code and cryptanalysis code. Specifically,
optimizing code to take advantage of specific instructions available in
certain processors or to make use of vector registers and instructions. I
wrote my programs in C and then went back and wrote assembly for parts of the
code that could deliver a significant overall speedup with hand optimization.

One place I did this was various RSA Challenge attack clients.

------
bkovitz
Thanks to all for the many informed and detailed replies!

I am now assistant-teaching a college course in low-level computer
programming. It's an excellent course: the students reprogram a children's toy
robot that uses the ARM processor.
[http://www.amazon.com/Little-Tikes-Giggles-Remote-Control/dp/B000096QMU](http://www.amazon.com/Little-Tikes-Giggles-Remote-Control/dp/B000096QMU)
They're getting up to speed very quickly on how to get hardware to actually do
stuff.

Yes, I actually left Silicon Valley to do grad school. I haven't given up the
principle of "do real stuff, see real results", though. I'm looking to design
a couple fairly small homework assignments consisting of optimizing some ARM
code. I want the examples to be real. Now mulling over which to do...

------
gte910h
Lots of the time, small embedded programs, especially on underpowered micros,
see this sort of attention.

Additionally, low-level hardware interfacing is often done with hand-coded
assembly, because it is easier to "get right" than C on some of the crappy
compiler toolchains you face.

------
tptacek
Debugging and performance monitoring.

------
jonah
Inner loops of graphics algorithms - Picasa, Photoshop, etc.

~~~
onewland
Yes.

Tangentially related but not quite the same, I work at a company that makes
barcode recognition software and some of our most performance-sensitive areas
use assembly. It is mostly C, though.

------
scumola
I've re-written many perl things in C to speed up processing time - for me,
it's better than buying more/newer hardware. Also, for smallish scripts that I
invoke millions of times or a small perl script that does regexps, I can
rewrite those in C to boost speed as well. I don't do any ASM code anymore,
but C is a really good optimization step for me and my projects.

------
brg
Supporting backwards compatibility in vtable lookups. This is becoming
increasingly important with COM.

------
nirmal
Not a marketplace use, but the most recent use I've had for Assembly was in an
Atari 2600 programming class.

<http://nirmalpatel.com/hacks/atari.html>

------
njn
Some weirdos just plain like it more than high-level languages. One of those
weirdos is developing the Linoleum language:
[http://anywherebb.com/bb/index.php?l=D4JeGEdhacS6Srr6QLQgZpW...](http://anywherebb.com/bb/index.php?l=D4JeGEdhacS6Srr6QLQgZpWesA4&r=lYUhcug3l3hhX4s6lfk5)

~~~
zokier
I, for one, think that Assembly is just beautiful in its simplicity.

~~~
daeken
Having read and written assembly on a daily basis for years, I have to
disagree entirely. The only simple thing about assembly is that it happens to
map to machine code directly, but macros and quasi-instructions even make that
iffy. There are so many idiosyncrasies in every ISA, so many ways in which the
code you write has side effects. Assembly isn't just complex in practice, it's
complex in concept.

If you want simplicity, you look at lisps; homoiconicity is perhaps the most
elegant, simple concept known in computing. It may be more complex in practice
(many more layers above the bare metal), but in concept it's simply beautiful.

~~~
axod
Try ARM. x86 assembly is ugly and wart ridden. ARM is like a breath of fresh
air. Unbelievably well designed.

~~~
bensummers
And amusingly it's possible to write some significant desktop software, just
in ARM assembler.

<http://www.cconcepts.co.uk/products/publish.htm>
<http://www.cconcepts.co.uk/products/artworks.htm>

Although I wouldn't recommend choosing assembler for a desktop app today, it
made sense when they were written. And I still think Impression beats many a
word processor and DTP package available today, and it ran in 2MB of memory
with no hard disc.

