In my experience, there is no better way than to build it. You learn quickly from your failures and become aware of subtle issues that no textbook or article will ever touch, since they are so domain-specific.
The best way to learn by doing, IMO, is to join an organization that works on a large and complex code base. It won't be as exciting as writing new code, but adding to an existing complex system will force you to take into consideration things that you otherwise would not.
Parent has some great points. You will only really learn by doing.
One thing you can do is start reading up on well-designed systems and the gotcha stories. Personally, I track highscalability.com since I spend a lot of time working in distributed systems. They tend to have interesting insights and will link to tech talks and blogs that contain a wealth of information. Also go read some of the papers explaining the designs behind systems you use (e.g. Amazon's Dynamo) and why they made the choices they did. It'll give you some insight into how people weigh tradeoffs and figure out what's important.
To your comment about working at a big company not being a mistake (I agree), I think the art of learning through 'apprenticeship' is getting lost in the whole 'hacker' hullabaloo.
My first job out of college was at a large company, where I worked with some old-timers who had been in the industry for years, on code parts of which had been written almost 20 years earlier. Learning from my colleagues' experiences and through the code was incredible, and I am certainly better for it. These sorts of experiences can rarely be had working at a startup.
The big question for me is whether Julia will be able to maintain its "purity" as it gains adoption.
R probably started out "beautiful" and "thought out" but has lost that edge with years of community-driven development. That's also what makes it so damn useful -- you can find pretty much anything on CRAN, often in multiple implementations.
R is actually one of the purest languages out there; it basically says "I have vectors; they can have missing values, be nested, and can have other vectors as attributes. And I have functions with lexical scoping. Now go and build the rest as you like." So people did, some better, some worse -- but the beautiful part is that all those approaches work together and just do the job.