
This is a fun blog post, but I thought it was a little hard to follow. A few observations:

  When we added the features a new problem emerged: their 
  ranges are very different from X1, meaning that a small 
  change in θ2, θ3, θ4 has a much bigger impact than 
  changing θ1. This causes problems when we are fitting the 
  values θ later on.
This was a little confusing because you reference θ2, θ3, θ4 without explicitly showing them in h(x).
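To make the range problem concrete, here is a minimal sketch of feature scaling with NumPy. The feature matrix and column names are hypothetical (assuming the post's polynomial features X1, X1², X1³); standardizing each column puts all features on a comparable scale, so each θ has a similar effect.

```python
import numpy as np

# Hypothetical polynomial features: X1 in [0, 5], plus X1^2 and X1^3.
# The cubic column ranges up to 125 while X1 stays below 5.
x1 = np.linspace(0.0, 5.0, 50)
X = np.column_stack([x1, x1**2, x1**3])

# Standardize each column: subtract its mean, divide by its std.
# After this, every feature has mean ~0 and std ~1, so a small
# change in any theta_j has a comparable effect on h(x).
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std
```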

  Because we will be using the hypothesis function many 
  times in the future it should be very fast. Right now h 
  can only compute the prediction for one training 
  example at a time. We can change that by vectorizing it
What does it mean for _h_ to compute something? Why is vectorizing better? Context about the computation is needed to determine whether vectorizing will speed it up.
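For context, a vectorized linear-regression hypothesis is just a single matrix-vector product: instead of looping over training examples, one `X @ theta` produces every prediction at once. This is a sketch, assuming the usual setup where each row of X is a training example with a leading bias column of ones.

```python
import numpy as np

def h(theta, X):
    """Vectorized hypothesis: predictions for all m rows of X at once.

    X: (m, n) matrix, one training example per row (first column = 1 for bias)
    theta: (n,) parameter vector
    Returns an (m,) vector of predictions instead of one scalar per call.
    """
    return X @ theta

theta = np.array([1.0, 2.0])
X = np.array([[1.0, 3.0],
              [1.0, 5.0]])
preds = h(theta, X)  # both predictions in one call: [7.0, 11.0]
```

The speedup comes from NumPy dispatching the whole product to optimized BLAS routines rather than executing a Python-level loop per example.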

Why do you use gradient descent when you can use a closed-form solution (the normal equation) to solve the regression? It would be nice to discuss both gradient descent and the closed-form solution, and when each is preferable.
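As a sketch of the comparison, here is the normal equation next to batch gradient descent on synthetic data (the data, learning rate, and iteration count are illustrative assumptions); both should recover essentially the same θ for ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
# Synthetic data: bias column of ones plus one feature; true theta = [2, -3].
X = np.column_stack([np.ones(m), rng.uniform(0.0, 1.0, m)])
y = X @ np.array([2.0, -3.0]) + rng.normal(0.0, 0.01, m)

# Closed-form solution: solve (X^T X) theta = X^T y (the normal equation).
theta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the same least-squares cost.
theta_gd = np.zeros(2)
lr = 0.5
for _ in range(5000):
    grad = X.T @ (X @ theta_gd - y) / m  # gradient of mean squared error / 2
    theta_gd -= lr * grad
```

The usual trade-off: the normal equation is exact and has no learning rate to tune, but costs O(n³) in the number of features; gradient descent scales to large n and to models with no closed form.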

You cover a lot of topics in this blog post which have a lot of nuance and depth (e.g. random initial weights) that merit whole posts on their own.



Thank you so much for your feedback!

You are completely right, and I have updated the post. (It should be online within a few minutes.)

> You cover a lot of topics in this blog post which have a lot of nuance and depth (e.g. random initial weights) that merit whole posts on their own.

I agree the post is quite long. The reason is that I wrote the initial version for Google Code-in, a programming competition for high schoolers. It had to cover a list of concepts, and I wanted to explain them well instead of just giving a quick introduction, so it ended up being quite long.

It would definitely be interesting to write another article on symmetry breaking some time.



