> seems to be mostly irrelevant

That assessment may come from a perspective that is too local in time. But:

> ReLU activation functions

Why did you pick ReLU, of all choices? The sigmoid makes sense on aesthetic grounds (with reference to its derivative), but from that perspective ReLU is an information cutoff. And with respect to the goal, I am not aware of a theory that defends it as "the activation function that makes sense" (beyond its empirical effectiveness). Are you saying that working applications overwhelmingly use ReLU? If so, which ones?
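
To be concrete about the derivative point, here is a minimal NumPy sketch (the function names are mine, not from any particular library):

    import numpy as np

    def sigmoid(x):
        # Logistic sigmoid: smooth everywhere, with the tidy derivative s(x) * (1 - s(x)).
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)   # small but nonzero for every input

    def relu_grad(x):
        # ReLU gradient: exactly 0 for negative inputs -- the "cutoff" I mean.
        return np.where(x > 0.0, 1.0, 0.0)

    x = np.array([-2.0, -0.5, 0.5, 2.0])
    print(sigmoid_grad(x))   # roughly [0.105 0.235 0.235 0.105]
    print(relu_grad(x))      # [0. 0. 1. 1.]

The everywhere-nonzero sigmoid gradient is what I meant by the aesthetic; the hard zeros on the ReLU side are the information cutoff.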



