It is that simple but the more complex story is that when the number of hidden layers exceeds 2, training becomes difficult. Also convnets for example cheat by having the connections between layers be incomplete bipartite graphs (not every node is connected to every other node), usually chosen because of some physical property - for computer vision nearest neighbors - eg.
Use another deep learning network to supervise training of your DLN. You can also use it to supervise itself. It is simple idea invented about decade ago (at least I heard it about decade ago here, in Ukraine).