Great read! I would like to kindly suggest the following:
"The minimum values the error would take can be computed using gradient descent, an optimization algorithm that is capable of finding local minima of a given function"
I think it's better to clarify that gradient descent "can find" global minima. Some neural network cost functions are non-convex, and in that case gradient descent can get stuck in a local minimum. As Andrew Ng explains in his free Coursera course, these local minima normally aren't a big problem for neural networks, but as a general concept the goal of an optimization algorithm applied to a cost function is to find the global minimum. For example, in linear regression and logistic regression the cost functions are convex, so gradient descent can get very close to the global minimum.
I would also suggest that the main task of gradient descent is not to compute the minimum value the error would take; it is to find the values of W that give you the minimum error. A small sketch of this idea follows below.
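To illustrate both points, here is a minimal sketch of gradient descent on a toy linear-regression problem (the data, learning rate, and iteration count are illustrative choices, not anything from the post). Because the squared-error cost is convex, the algorithm converges close to the global minimum, and note that what it actually produces is the weight W; the error value is just a by-product:

```python
import numpy as np

# Toy linear-regression data: y = 3x + noise. The squared-error cost
# J(W) = mean((W*x - y)^2) is convex, so gradient descent converges
# near the global minimum rather than getting stuck in a local one.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + rng.normal(scale=0.1, size=100)

W = 0.0    # the parameter we are actually solving for
lr = 0.1   # learning rate (illustrative value)

for _ in range(200):
    # Gradient of J(W) with respect to W: 2 * mean((W*x - y) * x)
    grad = 2 * np.mean((W * x - y) * x)
    W -= lr * grad  # step downhill

# Gradient descent returns the weight W (~3.0), not the minimum error itself.
print(W)
```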
Thank you so much for the comments! I indeed agree with every word you said. Regarding your suggestion about the main purpose of gradient descent, you are right, so I will rephrase that part of the post to clarify it. :)
The tips in your article are indeed very helpful for someone who is just starting out with deep learning. It took me some time to arrive at most of these conclusions myself, and I would have felt very lucky to have read your article back then. Thanks for sharing!
What a great read; thanks for sharing your thoughts. I believe there's a "T" missing at the start of the sentence "To fix that problem" and a space missing between "overfitting,but".