Russ Cox gave a really good answer to this question. I have to paraphrase because I can't find the original comment.
It was done for implementation speed. The early Go team had a compiler backend they were all familiar with. It was small, so they could quickly change it to suit Go, rather than the other way around.
Russ Cox expressed the sentiment that if they had tried to use LLVM there would have been a lot of up-front work that would have taken a long time, and the Go project probably would have fizzled out. Using the Plan 9 toolchain they were up and running very quickly, allowing them to focus on the most important work: designing the language itself.
(Probably shouldn't attribute this to Russ Cox; I've likely read a lot into his comments, which were rather briefer than what I wrote above.)
In my opinion this was the right choice. If LLVM turns out to be the best backend for Go, it's easy to switch to it later. The language itself, however, is largely fixed after 1.0. Better to focus on the language than on the compiler backend, which can be swapped out later.
LLVM does take a long time to compile, but four hours is an exaggeration. A mid-range computer can compile it in 30 minutes with -j4, and a good computer can compile it in 10 with -j8.
Heh, I was in this same boat two years ago. Servo used to build its own Rust back then, and that involved building LLVM. I hadn't yet discovered ccache[1], and I had an old laptop with 4 GB of RAM and a Core something processor. A full build took 4-6 hours total, IIRC. It was a really old laptop, so that was understandable, but it was very annoying: sometimes I'd need to rebase over a rustup and then wait hours for the build to finish. This was a large part of my impetus for moving Servo to a state where we download Rust snapshots.
[1]: Btw, if LLVM compile times are getting to you, Use. Ccache. It is amazeballs.
Those compilers don't do nearly as much optimization as LLVM does. It's hard to overstate how wide the gap between the classical compilers you like and LLVM/GCC truly is. LLVM has no fewer than three IRs, each with a full suite of compiler optimizations (four if you count ScheduleDAG). LICM, just to name one example, runs at each stage (see MachineLICM.cpp). Turbo Pascal did nowhere near that level of optimization, and it produced unacceptably slow programs by modern standards.
Maybe the biggest issue is optimizing register allocation while providing the information about all roots for the garbage collector... For non-GC languages you don't need to care about this and can just aggressively optimize register allocation. For GC languages you have to be careful, because the GC must know 1) which registers are holding references and which are not, and 2) which of those references are still live and which are no longer in use (and whose referents are therefore eligible for collection)...
Anyway, the most difficult part of writing the compiler was probably the garbage collector.