I rather fancy getting back into writing asm, just to sharpen that skill. I haven't really written any since MASM 6.x on DOS, 20-ish years ago. I actually found it quite enjoyable and it's surprising how complex an application you can write from scratch in assembly without it becoming unmanageable, so long as you get into the right mindset and make effective use of macros.
Of course, any significant piece of assembly code is likely to contain considerably more bugs than just about anything else of the same complexity. You'll also experience a lot more segfaults during development than perhaps most are comfortable with, but there's something rewarding about controlling precisely what the machine is doing at that level. This is especially true if you manage to find a novel solution that just wouldn't exist when the hardware capabilities are abstracted away by a high level language.
In the same way that everyone should learn a Lisp to think in terms of ASTs and code-as-data, everyone should write at least one whole application in assembly just to appreciate how the hardware really works. Also to see how often there are many ways to solve the same problem (especially with an x86 instruction set), sometimes with wildly different performance characteristics.
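To make the "many ways" point a bit more concrete, here is a small sketch of my own, assuming Linux x86-64 and GNU as in Intel syntax (the file name ways.s is just a placeholder). It multiplies a value by 5 three different ways and exits with status 0 if they all agree. It says nothing about which is fastest on a given CPU; it only shows that the instruction set offers several routes to the same result.

```asm
# ways.s: three ways to compute edi * 5 (Linux x86-64, gas, Intel syntax)
# as --64 -o ways.o ways.s && ld -o ways ways.o && ./ways; echo $?
.intel_syntax noprefix

.section .text
.global _start
_start:
    mov edi, 8                  # input value

    imul eax, edi, 5            # 1) multiply instruction: eax = edi * 5

    lea ecx, [rdi + rdi*4]      # 2) address arithmetic: ecx = edi + edi*4

    mov edx, edi                # 3) shift and add: edx = (edi << 2) + edi
    shl edx, 2
    add edx, edi

    sub ecx, eax                # ecx = 0 if result 2 matches result 1
    sub edx, eax                # edx = 0 if result 3 matches result 1
    or  ecx, edx                # ecx = 0 only if both matched

    mov edi, ecx                # exit status
    mov eax, 60                 # exit syscall number
    syscall
```

A compiler will usually pick one of these for you; writing them out by hand is how you find out the others exist.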
This was many years ago, but I still see the same today. I do RE so I've read a lot of compiler output, and I've seen some isolated instances where a compiler did something "clever" (Intel's is not bad at this), but it tends to be rare and it's easy to see the rest of the code still has that "compiler-generated" feel to it.
I said "really learning" above because I think there are two ways people learn Asm. The first, which is probably more common, is to learn only the ways in which compilers generate instructions. Those who learn this way would likely not do any better than a compiler if asked to write a program, and would not see the inefficiency of compiler-generated code, so they wouldn't find any particular advantage in using Asm.
On the other hand, I believe that if you learn Asm by starting with the machine itself, independent of any HLL, then you don't get any preconceived notions of what it can and cannot do, which leads to what I'd call "real Asm programming." Then you can see the inefficiencies in compiler-generated code and what HLL abstractions introduce, and can easily beat the compiler in size or speed (often both). Good hand-written Asm has a very different look to it than compiler output.
For some entertaining examples of what Asm can do that compilers cannot, look at the sub-1k categories in the demoscene:
One of my favourites: http://www.pouet.net/prod.php?which=3397
Unfortunately, with out-of-order execution and instruction-level parallelism, I doubt learning assembly teaches you much about how the hardware really works.
Edit: To the downvoter, care to comment?
Concretely, when learning assembly you might assume each core has a set of physical registers that correspond to the registers you see, and that isn't the case.
NASM and FASM are really the only up-to-date and cross-platform capable assemblers. MASM is up to date, but Windows only. TASM is not up to date. Others appear to have been abandoned.
NASM: Is written in C and generates object files. Requires a linker to produce executables. Slow, inefficient compilation. Has some syntax quirks. May be more flexible in some cases due to the multiple object formats available.
FASM: Written in FASM. Very fast compilation. Cleaner syntax, better debugging tools. Produces executables directly without a linker. Possibly limited due to smaller number of output formats, but likely good enough for most projects that would be written in pure asm anyway.
FASM looks like the best option to learn first and then move to NASM for any specific requirement that FASM cannot meet. The syntax is mostly compatible between the two, so porting code shouldn't be too much trouble in the worst case.
# Translated to gas syntax.
# assemble with:
# as --64 -o hello.o hello.s
# link with:
# ld -o hello hello.o
# Modifications to original code considered trivial and to be
# public domain.
# Use Intel syntax instead of AT&T, and don't use % before register names
.intel_syntax noprefix
.section .data
msg: .asciz "hello, world!\n"
.section .text
.global _start
_start:
# write syscall
mov rax, 1
# file descriptor, standard output
mov rdi, 1
# message address
mov rsi, OFFSET FLAT:msg
# length of message
mov rdx, 14
# call write syscall
syscall
# exit syscall, status 0
mov rax, 60
mov rdi, 0
syscall
msg db "hello, world!",`\n`
;; Remember to use 14 for string length!
# String is read only.
.section .rodata
msg: .asciz "hello, world!\n"
# Put string length in a variable instead
# (subtract 1 so the NUL that .asciz appends is not counted)
.set STR_SIZE, . - msg - 1
mov rdx, STR_SIZE
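For anyone who wants the computed-length version as one complete file, here is a sketch assuming Linux x86-64 and GNU as (the file name hello.s follows the commands earlier in the thread). One deliberate difference from the snippet above: `.asciz` appends a NUL byte that `. - msg` would count, so this version uses `.ascii` to keep the length at 14. The constant is defined before it is used, same as in the snippet.

```asm
# hello.s: Linux x86-64, GNU as, Intel syntax
# as --64 -o hello.o hello.s && ld -o hello hello.o && ./hello
.intel_syntax noprefix

.section .rodata
msg: .ascii "hello, world!\n"     # .ascii adds no trailing NUL
.set STR_SIZE, . - msg            # "." is the current address, so this is 14

.section .text
.global _start
_start:
    mov rax, 1                    # write syscall number
    mov rdi, 1                    # file descriptor: standard output
    mov rsi, OFFSET FLAT:msg      # message address
    mov rdx, STR_SIZE             # length computed by the assembler
    syscall
    mov rax, 60                   # exit syscall number
    mov rdi, 0                    # exit status
    syscall
```

Changing the text of the message no longer requires remembering to update a hard-coded 14.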
One possible other tool to consider is 'terse': http://www.terse.com/howdoes.htm
It's got a lot of issues, and you probably don't want to actually use it. It's unmaintained, proprietary, DOS only, and according to the website, still distributed on a 3.5" floppy. But the syntax has a lot of appealing things about it. You can't actually read the real manual without buying the product, but a short-lived open source clone, "nega", used a very similar one: http://webcache.googleusercontent.com/search?q=cache:7E6Ddug...
It's easier to update an external assembler than the system assembler. A lot of distros don't ship with updated binutils so you can't reliably compile for newer CPU extensions on them.
Earlier versions of clang's integrated assembler (which clang uses instead of as) weren't fully compatible with as, e.g. no .intel_syntax support.
Different operating systems can have subtly different behavior, e.g. the ancient as that ships with OS X uses $name for macro parameters while most? other systems use \name. I think gcc on OS X is intentionally forgotten so everyone will switch to clang.
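For reference, this is roughly what the \name form looks like with GNU as on Linux. It's a minimal sketch: the macro name exit_with and the file name macro.s are made up for illustration, and I have not checked what the OS X assembler would want for the same macro.

```asm
# macro.s: Linux x86-64, GNU as, Intel syntax
# as --64 -o macro.o macro.s && ld -o macro macro.o && ./macro; echo $?
.intel_syntax noprefix

# exit_with is a hypothetical macro name; \status refers to its parameter
.macro exit_with status
    mov edi, \status            # first syscall argument: exit status
    mov eax, 60                 # exit syscall number on Linux x86-64
    syscall
.endm

.section .text
.global _start
_start:
    exit_with 3                 # expands to the three instructions above
```

Any source file that uses macros like this is tied to the assembler's parameter syntax, which is one reason a single external assembler is less painful across platforms.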
Cross platform x86 asm is a real headache no matter what. NASM/Yasm/fasm just make it less of one.
I don't count its fast compilation speed as much of a plus because you've got to write a heck of a lot of assembler before you'd ever notice much of a difference, I'd suspect.
Back in the day I wrote about simple Java string concatenation. I still get people quoting it now, even though I'm sure it is completely outdated by newer compilers.
It gets even more interesting when you see what x64 (or whatever!) assembly is generated by the JVM.
- it will not be understood, you might get downvoted
- it will be understood and you'll definitely get downvoted
Sure, there are differences: register names, the C ABI convention, system calls, memory modes, etc. But that information can be found easily in references, and you need a reference for x86 anyway. Otherwise all the mechanisms are the same.
An easy tutorial like this injects just the right amount of motivation to at least dip my feet back in. Having to wade through Intel's 1000-page system manuals to check whether my past knowledge is still useful would require a lot more motivation than I can muster.
I bought Peter Norton's Assembly Language Book for the IBM PC. This book is pretty awesome and so relevant even 25 years later!! It covers the basics really well and stops at 386 (the latest proc then) so I don't feel inundated with hundreds of CPU architectures. Yes, I'll eventually get to those.
The other two books (in case anyone else is interested):
X86 Assembly Language and C Fundamentals by Joseph Cavanagh.
Operating Systems Design and Implementation (3rd Edition).
Also, while tinkering with ASM I decided to install MSDOS since the ASM book uses Debug.exe and I couldn't seem to find it for newer OS's. While looking to download MSDOS, I discovered the source code for MSDOS 1.1 & 2.0 @ http://www.computerhistory.org/atchm/microsoft-research-lice....
Calling convention, word size and the r- prefixes. If you complete the PGU book (you could just write 32-bit code if you want), these are trivialities.
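To show how small those differences are, here is a sketch of hello world as 32-bit Linux code in the same gas Intel syntax used elsewhere in the thread (hello32.s is a placeholder name, and this assumes a binutils build that can target i386). The comments note what changes on x86-64.

```asm
# hello32.s: 32-bit Linux, GNU as, Intel syntax
# as --32 -o hello32.o hello32.s && ld -m elf_i386 -o hello32 hello32.o
.intel_syntax noprefix

.section .data
msg: .ascii "hello, world!\n"
.set STR_SIZE, . - msg          # 14 bytes, computed by the assembler

.section .text
.global _start
_start:
    mov eax, 4                  # write is syscall 4 here (1 on x86-64)
    mov ebx, 1                  # args go in ebx, ecx, edx (rdi, rsi, rdx on x86-64)
    mov ecx, OFFSET FLAT:msg    # message address
    mov edx, STR_SIZE           # message length
    int 0x80                    # 32-bit kernel entry (the syscall instruction on x86-64)
    mov eax, 1                  # exit is syscall 1 here (60 on x86-64)
    mov ebx, 0                  # exit status
    int 0x80
```

The structure of the program is identical; only the register names, the syscall numbers and the kernel entry instruction differ.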