
They use a random forest classifier, an ensemble model that returns the consensus of several decision trees. One common way to reach that consensus is majority voting. Random forests are commonly used to build chemical models like this one (and in QSAR) because they are quite robust. Chemical data sets are typically small (dozens to thousands of data points), so more sophisticated methods are not usable and do not perform better.
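
To make the voting idea concrete, here is a minimal sketch in plain Python. It is not the model from the article: it assumes two numeric features, swaps full decision trees for one-split "stumps" to stay short, and every name in it (`fit_stump`, `fit_forest`, `majority_vote`, `predict`) is made up for illustration. Each stump is trained on a bootstrap resample using one randomly chosen feature, and the forest's answer is whichever label gets the most votes.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


def fit_stump(rows, labels, rng):
    """Fit a one-split tree (a stump) on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        # Each side predicts its most common label; an empty right side
        # falls back to the left label.
        left_label = majority_vote(left)
        right_label = majority_vote(right) if right else left_label
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap resample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, rng))
    return forest


def predict(forest, row):
    """Collect one vote per stump and return the consensus label."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority_vote(votes)


# Toy, made-up data: two descriptors per compound and an activity label.
rows = [(0.1, 1.0), (0.2, 1.2), (0.3, 0.9), (2.1, 5.0), (2.3, 5.5), (1.9, 4.8)]
labels = ["inactive", "inactive", "inactive", "active", "active", "active"]
forest = fit_forest(rows, labels)
```

Any single stump here can be poor (a bootstrap resample might contain only one class), but the vote across 25 of them smooths that out, which is roughly where the robustness comes from. A real random forest grows deeper trees and resamples candidate features at every split.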



Even then, random forest is the wrong choice for this type of data. It should be what you try in your first hour with the data, before choosing something more appropriate.



