Don't go there. There's really not a lot of work. But if you must, low level is more a calling than a learned skill.
You probably shouldn't be learning assembler. First, compilers are really quite good. Yes, it's possible to beat them (I do) but generally not by much. And not by much ain't gonna put bacon on the table. You can probably get what you need from gcc inline asm() calls. Take a look at the linux sources and figure out why and when assembly is used there:
http://stackoverflow.com/questions/22122887/linux-kernel-ass...
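To show the kind of thing gcc's extended asm() gives you, here's a minimal sketch (x86-64, gcc/clang only; rdtsc is just a convenient one-instruction illustration, not anything the kernel link above is about):

    #include <stdint.h>
    #include <stdio.h>

    /* Read the x86 time-stamp counter with gcc extended inline asm.
     * "=a" and "=d" bind the outputs to EAX/EDX, which is where RDTSC
     * leaves the low/high 32 bits of the counter. */
    static inline uint64_t rdtsc(void)
    {
        uint32_t lo, hi;
        __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        uint64_t start = rdtsc();
        /* ... the code whose rough cost you want to eyeball ... */
        uint64_t end = rdtsc();
        printf("delta: %llu cycles\n", (unsigned long long)(end - start));
        return 0;
    }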
Secondly, writing in assembler is not low level. You just think it is. You should really be understanding caches and you can improve your cache performance in C.
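As a minimal illustration of the cache point (the array name and size here are arbitrary): the two functions below do identical work on identical data, yet the second is typically several times slower for large N, purely because of access order.

    #include <stddef.h>

    #define N 4096
    static float grid[N][N];   /* arbitrary example array */

    /* Row-major traversal: consecutive iterations touch consecutive
     * addresses, so each 64-byte cache line is reused 16 times. */
    float sum_rows(void)
    {
        float s = 0.0f;
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                s += grid[i][j];
        return s;
    }

    /* Column-major traversal: each access strides N*sizeof(float) bytes,
     * so nearly every access misses the cache. Same instructions, same
     * data, much slower, and no assembler is involved in fixing it. */
    float sum_cols(void)
    {
        float s = 0.0f;
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                s += grid[i][j];
        return s;
    }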
Anyways, unless you deeply know what's going on inside of the microarchitecture of a modern superscalar, out of order, speculative, renaming, μop-cached, hyper-threaded, multicore beast then you shouldn't be fooling yourself by writing in assembler.
http://blog.erratasec.com/2015/03/x86-is-high-level-language...
Unless you've really read Intel's Intel 64 and IA-32 Architectures Optimization Reference Manual (and ARM's Cortex®-A72 Software Optimization Guide) and meditated on the suras of Agner Fog's Microarchitecture you won't even know what's going on with something as simple as mov RAX, RBX.
Look, most compiler writers don't even know this stuff (Intel C Compiler yes, llvm occasionally) and frankly, it isn't very useful because Intel spends a billion dollars a year to make your bad x86 code run reasonably fast. Consider a switch statement which compiles into an indirect branch, jmp reg. That branch has to be predicted by the BTB before the jmp reg instruction is even fetched, and that's really hard to do. Every year they get better and better to the point that you're not even aware of it. But if you want to help the CPU out, you could put a UD2 right after the jmp reg. This is insanely hard to understand and will help very little.
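For the curious, a dense switch like the sketch below is the sort of thing gcc and clang will often (not guaranteed; it depends on optimization level and case density) lower to a jump table plus an indirect jmp through a register, which is exactly the branch the BTB has to guess before the jump is even fetched:

    /* A dense switch over small integers. At -O2 this commonly becomes:
     * bounds check, load a target address from a table indexed by op,
     * then jmp *reg. That indirect jump is the prediction problem
     * described above. */
    int dispatch(int op, int x)
    {
        switch (op) {
        case 0:  return x + 1;
        case 1:  return x - 1;
        case 2:  return x * 2;
        case 3:  return x / 2;
        case 4:  return x * x;
        case 5:  return -x;
        case 6:  return x ^ 0xff;
        case 7:  return x << 3;
        default: return x;
        }
    }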
I agree not to go into low level programming expecting a wealth of job opportunities to suddenly open up, but I wouldn't tell people not to go there at all. Not only is assembly really fun to play around with, I feel like I've gotten a lot out of the bits of assembly I've read/written. Even though I've never written assembly code for work, being able to read the disassembly in gdb has come in handy before. Also, a lot of the subtleties of C/C++ never quite clicked for me until I had an idea of how the generated assembly would work.
It's similar to learning a functional language. I have no idea if I'll ever use Haskell professionally but learning a bit of it has been a good way to see problems and logic differently. I think both assembly and Haskell have made me a better programmer, even if I never become truly proficient in either or use them directly in my job.
Totally agree. When I taught x86 assembly in college (in the 90's), the goal of the class was to give the students a better idea of how things worked under the hood to improve their C programming, not to turn them into assembly programmers.
Just today, I helped a coworker debug a segmentation fault that occurred in a library where the debugging info was stripped out, by looking at the assembly code.
> Just today, I helped a coworker debug a segmentation fault that occurred in a library where the debugging info was stripped out, by looking at the assembly code.
Being able to debug applications which don't have debuginfo (particularly optimized builds) is highly valuable though.
It really does give you insight into some of the lower level stuff. It won't give you the full picture but it will give you ideas about how you might be able to take better advantage of your hardware by changing how you use data.
As an embedded programmer, I wholeheartedly second this.
I'm in an area where there are relatively few opportunities to do close-to-the-metal work. The ones that do exist don't pay any more than backend web developers make. In fact, I've come to assume that they can offer less, either because of the intrigue of the work or because you're often competing with computer/electrical engineers who aren't expecting outsized developer salaries.
Likewise, while I have loved the skills I've learned in embedded and low-level work, working at an embedded shop has taken a lot of the joy out of the learning. You trade an inherently rewarding development environment for the realities of developing against hardware, where project cycles are long, you're often dependent on horrible vendor APIs and support, and everything moves much slower.
The counter to that is the type of work I believe you're describing, which sounds like HFT optimization on Intel. I got into embedded because I wanted to eventually end up there, but again, it's an extremely limited market. And as someone who has indeed dabbled in the Fog optimization manuals, you quickly come to realize that the ROI of hand-written assembly usually isn't worth it. The future of speed isn't going to come from going lower down the programming stack. It's going to be in new hardware: FPGAs and ASICs, writing RTL for system-specific CPUs. And that is highly exciting, though it means a pure programmer has to learn hardware at a deep level, as you imply.
As for me, I concur with the statements others are making about using my skillset for security research and reverse engineering. Working in embedded development has bored me to tears, and taken the magic out of learning to love the skills themselves. I'd much prefer to work at a pace that isn't limited by the pace of a cross functional team of software and hardware engineers.
Learn the skills, yes, completely. The knowledge is worth it. But for the love of God, don't expect to get joy out of working in it.
Could not have said it any better. Embedded development is dominated by hardware engineers who very often make decisions without the software development team's input, which also places embedded folk square in the target of management angst. Choosing Broadcom chips without drivers, third-party experimental hardware vendors who won't release data sheets, USB chips that don't support host mode, miswired memory interfaces that mismap FPGA access, FPGAs that cause spurious bugs when consuming more power than necessary in certain modes: all of it lands squarely on the embedded devs' shoulders, and it can at first look like a pure software problem, with the schedule pressing. Not a pretty world.
Are you me? Honestly, that sums it all up even better. In an embedded role, you are always a second-class citizen. After 6-12 months of planning by the hardware folks: "The hardware is all done. Where's the software? What do you mean it's not finished, you had all year...?" You're a complete afterthought, and it's only amplified when working with a team that doesn't really "get" that without access to the actual, finalized hardware you're extremely limited in what you can do. It only gets better when they decide to go with one-off vendors who have a single product support engineer, and it's pulling teeth any time something goes wrong. It's often hardware related, but who cares -- you're the guy at the end of the pipeline who's tasked with making it work, so yeah, you feel the wrath.
I'm only extra cynical because I'm dealing with that right now. As I have many times before, though. It's part and parcel.
OK, I've worn both hats - chip designer and embedded systems programmer - and it makes me better at both. I get to be the chip guy who understands software and the software guy who understands hardware. It means I get the interrupt setting/clearing logic race-free, make sure all the registers can be read, and can make tradeoffs that minimise the amount of hardware we have to build.
The big issue is timing (and latency). The chip guys are working on long timelines - they're already working on the next chip when the first chip's silicon comes back, so their investment and attention is elsewhere. The software guys aren't going to rev up much before real hardware is in their laps, and certainly aren't going to spend any time on that second chip while they're still wrangling the first one. It's not so much a cultural gap between the groups as a gap in time.
yes, but it's not cheap and is essentially a second parallel tapeout path that slows your chip design time - if you're building a CPU you likely build a high-level software model and code to that (with some register-virtualisation layer), then test against that as part of the chip DV (QA) process
In defense of HW engineers, they're often staring at a cost of goods spreadsheet.
Earlier in my career, I had a driver that compiled to about 8500 bytes. I noticed the part was spec'd at 16K and wondered if it could get re-spec'd at 8K if I got the driver down to 8K. Yep, and I got mad props from the HW types.
Understood, but when they stare at that COG sheet they should also have the presence of mind to look for hidden costs. A toolchain and IDE with a C compiler will cost the project much less in the end than an 8K chip with only an assembler and no debug environment/support tools will save it. It's very easy to be penny wise and pound foolish in this area.
Please don't discourage people without having the full view.
There are many good low-level teams with software-first people. At least in the Bay Area there are many exciting projects for people with this skill set: self-driving cars (Waymo, Tesla, etc.), VR/AR headsets (Microsoft, Facebook, etc.), phones (Qualcomm, Google, etc.), wearables (Apple, Fitbit), GPUs (Nvidia), and startups like AirWare.
Sure, my experience may be skewed too, but there are definitely great jobs for low-level programmers, with the rewarding experience of shipping tangible products. Not many people get the joy of shipping something your friends and family use and literally say "wow" about.
There are plenty of reasons to know assembler besides performance. I am completely ignorant of all the pipelining mechanics you discussed but have gotten paid to use assembly for years.
1. Writing embedded software. You're not going to get very far if you don't know how to set up your memory, text section, etc.
2. Debugging embedded software. Good luck interpreting your JTAG output if you don't have a good knowledge of assembler. Even if you get to use gdb/Wind River Debugger/etc., you will want to be able to tell what is happening. Symbols only go so far. A backtrace will get messed up and you'll need to look at the stack to figure out what happened (see the sketch after this list). An access violation will occur and you'll need to figure out what line of code it corresponds to.
3. Debugging without source code. Same as #2, even more difficult.
4. Reverse engineering. You are trying to figure out what is happening in a proprietary driver binary blob (such as NVidia's Linux driver). You are trying to figure out how to get your software to work with an ancient piece of software whose author has long since gone out of business. Even if you have the cash for the Hex-Rays decompiler or can readily use the Hopper decompiler, this code is simply not high-level enough to interpret without knowing assembly and having a deep understanding of the stack and memory.
5. Vulnerability research / hacking. You are writing an open-source client for a proprietary protocol, looking for a memory corruption vulnerability, etc. You can pretty much forget it unless you know assembly.
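To make #2 concrete, here's a contrived sketch (function names and sizes are made up, and you'd compile with -fno-stack-protector to see the classic behaviour) of how a backtrace gets destroyed: a stack buffer overflow clobbers the saved return address, gdb's bt shows garbage "?? ()" frames, and you end up reading the raw stack and the disassembly instead of source lines.

    #include <string.h>

    /* Copies an oversized input into a small stack buffer. The overflow
     * overwrites the saved return address, so the crash happens on
     * return, far from the actual bug, and the debugger's unwinder has
     * nothing sensible to walk. At that point you examine raw stack
     * memory (e.g. x/32gx $rsp in gdb) and the disassembly. */
    static void parse_record(const char *input)
    {
        char buf[16];
        strcpy(buf, input);      /* no bounds check: the frame gets smashed */
    }

    int main(void)
    {
        char big[256];
        memset(big, 'A', sizeof(big) - 1);
        big[sizeof(big) - 1] = '\0';
        parse_record(big);       /* likely SIGSEGV with a useless backtrace */
        return 0;
    }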
I've been telling people since the late 00's that starting my career in embedded was the single best thing that happened to my career. I quickly follow that up with the fact that leaving embedded was the second best thing that happened to my career. I've dreamed about going back, because I do miss it, but the jobs aren't there (in my area and especially now that I've fallen in love with full-time remote work).
I know a hell of a lot more about what our code is actually doing up and down the stack than my colleagues, including most of the managers I've had and "Principals" I've worked with and so on. It helps, a whole lot, in fact, and I find that it's personally rewarding, but it doesn't amount to much more than being the person who gets asked the challenging profiling/performance/optimization questions.
To add to this, most micros these days are ARM-based, and those are specifically designed not to need assembly in their boot code. I basically never use it anymore. That said, you should have some understanding of what assembly your C code will map to, for the purposes of memory management and performance.
Also: to say it's a calling is accurate. I love what I do because what I create is physical and interacts with the physical world. I can hold my projects in my hands and say "I made this". I've done web and desktop stuff and can say that this work is significantly more fulfilling to me.
Someone has to write those compilers. Those with the skills to do so are vital to the industry. If we want more nice high-level languages and improved operating systems, we need new people entering this area of software engineering. Otherwise, where is the next iteration of systems software going to come from?
LLVM backend developers generally write in TableGen. Their assembly knowledge, to say nothing of their microarchitectural knowledge, is limited.
That may sound harsh but it's actually the point of a good compiler design (which LLVM has). You can write an optimizer pass without understanding the C parser. You can write a backend without understanding register allocation. You can get something working without really getting it tuned. You can specify superscalar machine instruction scheduling latencies without really understanding instruction scheduling itself.
I'll add that while full-time native work may be in short supply, it's an awesome skill to have in your belt. You may not need it often, but when you're suddenly up against limits where the current technology (Java/Python/Ruby/etc.) doesn't suffice, it can seem like magic to drop down the stack and get a 10-50x performance boost.
Having that escape hatch available is incredibly valuable and for me distinguishes the engineers I hold in high regard.
> Anyways, unless you deeply know what's going on inside of the microarchitecture of a modern superscalar, out of order, speculative, renaming, μop-cached, hyper-threaded, multicore beast then you shouldn't be fooling yourself by writing in assembler.
Assembly is useful for understanding early initialization code for embedded platforms (the startup file). Depending on how aligned the vendor's board support package is with your actual target, you might have to touch this, and since this code runs at a time when there is neither stack nor heap (you can't have a stack until the stack pointer has been set up), it is easier in assembler than in the stackless C variant you might coerce your compiler to support. The same applies to debugging issues in bootloaders, where you'll have to inspect how the reset vector looks.
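For a feel of what such a startup file does, here's a minimal Cortex-M-flavoured sketch. The linker symbol names (_estack, _sidata, _sdata, _edata, _sbss, _ebss) are just a common convention, not any particular vendor's, and it only works in plain C because Cortex-M hardware loads the initial stack pointer from the vector table before calling the reset handler. On older ARM cores you would set SP up in assembly first, which is exactly the parent's point.

    #include <stdint.h>

    /* Symbols defined by the linker script (names follow common convention). */
    extern uint32_t _estack;          /* initial stack pointer value */
    extern uint32_t _sidata;          /* .data initializers in flash */
    extern uint32_t _sdata, _edata;   /* .data range in RAM          */
    extern uint32_t _sbss, _ebss;     /* .bss range in RAM           */

    int main(void);

    /* Reset handler: runs before any C runtime exists. Copies initialized
     * data from flash to RAM, zeroes .bss, then calls main(). */
    void Reset_Handler(void)
    {
        const uint32_t *src = &_sidata;
        for (uint32_t *dst = &_sdata; dst < &_edata; )
            *dst++ = *src++;
        for (uint32_t *dst = &_sbss; dst < &_ebss; )
            *dst++ = 0;
        main();
        for (;;) { }                  /* main() should never return here */
    }

    /* Minimal vector table: initial SP, then the reset vector. */
    __attribute__((section(".isr_vector")))
    void (*const vector_table[])(void) = {
        (void (*)(void))&_estack,     /* entry 0: initial stack pointer */
        Reset_Handler,                /* entry 1: reset vector          */
    };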
> First, compilers are really quite good. Yes, it's possible to beat them (I do) but generally not by much.
Genuine question: what would you quantify “not by much” as here? In my experience, it's not uncommon to see 2x+ speedups from well-written software pipelined hand-optimised assembly in hot loops. (Especially so for in-order processors, like you might see in embedded applications or as a LITTLE core in your smartphone.)
2x in a hotspot is not much compared with better cache management in the rest of the program. But if that 2x is important, great. I know this sounds really boring, but you should write it in C first, find that hot loop in a profiler like VTune, and then rewrite. 2x of something really unimportant is still unimportant.
Also, while software pipelining is possible on something like Haswell ... its limited register set makes it a limited technique. Modulo variable expansion is tough with not so many registers, but renaming does help somewhat.
I think tools like VTune are awesome for finding hotspots and reading assembler is like reading Latin. But programming in assembler? I think it's best to disassemble and rewrite your C accordingly.
I should have mentioned this in the first post: if you're not a VTune ace, if you're not looking at the Intel PMRs and scratching your head, you probably should not be writing in assembler in 2017. Also, VTune deals with C (and Java) quite nicely (just not on OS X).
Anyways, you may get 2x+ on something, but that approach won't work with a GPU. Similarly, Apple doesn't provide microarchitectural information on the A10, and Nvidia doesn't on Denver. Even though assembly will still be with us, this low-level approach is going away. Apple, Intel, ARM, Nvidia, ... really want you to write in C.