

A Problem Course in Compilation: From Python to x86 Assembly [pdf] - mahmud
http://ecee.colorado.edu/ecen4553/fall09/notes.pdf

======
jules
Very good work!

Does anyone have a guide to x86 assembly that covers the basics as well as
more advanced stuff like SSE, how the caches work (especially wrt multiple
cores), and a model of how fast the instructions are in which circumstances?
I'd like something that is pretty comprehensive and modern (you shouldn't use
x87 floating point anymore? great, then skip that and go straight to SSE)
while not being ridiculously enormous like the Intel manuals. How do experts
learn this stuff? Really by reading Intel's and AMD's manuals?

~~~
haberman
For good, up-to-date information you really have to go to the source
materials, or learn directly from other people. Everything else is probably
going to be out of date.

For a basic description of the architecture, and detailed descriptions of the
opcodes, read the Intel architecture manuals themselves. They're big because
there's a lot to know. You don't have to read it all at once.
<http://www.intel.com/products/processor/manuals/>

For very detailed information on the memory hierarchy, read Ulrich Drepper's
"What every programmer should know about memory" series on LWN. Though even
this is probably going to fall out of date soon, as it was written in 2007.
<http://lwn.net/Articles/250967/>

For a model of how fast the instructions are, the best I know of is Agner
Fog's instruction tables: <http://www.agner.org/optimize/>

I'm not an expert, but in my experience architecture experts seem to learn
through experience and by absorbing knowledge from the people around them. You
have to be willing to conduct your own experiments, but you also have to be
cautious in the conclusions you draw because there are so many interrelated
factors.

There is a lot of tribal knowledge floating around about how to optimize
assembly. The Intel manuals capture a lot of it, particularly in the
optimization reference manual. By reading that you'll learn the details of
branch prediction, things to avoid like partial register stalls, etc.

At Google (where I work) there are guys who are optimizing at the assembly
level. Some guy will post a specific routine he's trying to optimize to a
mailing list, and it will go back and forth between a handful of expert-level
people throwing in ideas. There was recently a thread wondering about the
performance difference between movdqa and movdqu. It seems to vary with the
architecture (Core2 vs Atom, etc.) and whether the read crosses a cache line.

There's just a lot to know, and I doubt anybody knows it all.

------
sammyo
You should consider posting an outline/synopsis along with the link to the
PDF. This would make it more accessible to those of us with short attention
spans, as well as the very busy.

------
kingkilr
Jeremy actually did a tutorial on this at PyCon last year, video available
here: <http://pycon.blip.tv/file/3263942/>

------
ashconnor
Thanks for your hard work.

I briefly covered ASM in my first year of university, but I don't think it
counts since we did all the input handling with C.

