FWIW, I think the title change removes a lot of information about what the post is about. -----
 So this strategy is really only effective when you'd like to raise rent by 9.1% per year? -----
 Realize that in many areas, rental increases have legislated maximums, or increased notice periods for any raise above a certain percentage. This is a cunning way around that. -----
 No it's not. Discounts are still governed by rent laws. It's just a standard one-month-free signup discount followed by annual increases. -----
 There are more options than that. For instance, to obscure a planned increase from \$550/month to \$660/month, the offer could be two months free: 12 x \$550 = 10 x \$660.

There's no reason the discount has to be for a full month's rent, either. To obscure an increase from \$550/month to \$575/month, the offer could be \$300 off for January: 12 x \$550 = \$275 + 11 x \$575. -----
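The arithmetic in the comment above generalizes: the signup discount that hides a rent increase is just the yearly difference between the two rates. A minimal sketch (the helper name is mine, not from the thread):

```python
def equivalent_discount(old_rent, new_rent, months=12):
    """One-time signup discount that makes `months` payments at `new_rent`
    total the same as `months` payments at `old_rent`."""
    return months * (new_rent - old_rent)

# One free month at the new rate hides $550 -> $600:
print(equivalent_discount(550, 600))  # 600
# Two free months hide $550 -> $660:
print(equivalent_discount(550, 660))  # 1320 (= 2 x $660)
# A partial $300 discount hides $550 -> $575:
print(equivalent_discount(550, 575))  # 300
```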
 A 9.1% price increase per year is more than most businesses pull off, and would make a lot of them very happy. -----
 Every two years, it seems. -----
 It's still every year, but with a one-year lag on the first increase. -----
 Agreed. Here's a previous HN discussion on that topic: http://news.ycombinator.com/item?id=4281630 -----
 Another way to say what you're proposing is to use a linear predictor, and to train it by globally optimizing 0-1 loss (just the number of mistakes that you make) on the full set of data that you have. Even ignoring computational issues (this can be shown to be NP-hard), you seem to be making several mistaken assumptions. I'd really recommend reading some basic material on generalization, but a couple of the mistaken assumptions are as follows:

1. That the particular input features you've chosen are somehow the only possible choice. But who's to say that you shouldn't add new features which are the square of each original feature? Or maybe some cross-product terms, like the product of the ith feature times the jth feature. Or maybe some good features to add would be the distance from each point you've seen so far. Etc. Continuing down this path, you basically get to the question discussed in the OP about choosing a kernel for SVMs. This is just one example where hyperparameters come into play, and you need some method for choosing them.

2. That a linear predictor is impervious to overfitting. Consider the extreme case (which comes up often) where you have millions or billions of features and far fewer examples (e.g., if features are n-gram occurrences in text, or gene expression data). Then it's likely that there are many settings of the weights that fit the data perfectly, but there's no way to tell if you're just picking up on statistical noise, or if you've learned something that will make good predictions on new data that you encounter. In both theory and practice, you need some form of regularization, and along with this come more hyperparameters, which need to be chosen.

Finally, by your reasoning, it seems like you would always choose a 1-nearest-neighbors classifier [1] (because it will always end up with 0 error under the setting you propose). But there's no reason why this is in general a good idea. -----
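The 1-nearest-neighbor point is easy to demonstrate: on continuous data, 1-NN always achieves zero training error (each point is its own nearest neighbor), yet that says nothing about how it predicts on new data. A quick sketch using scikit-learn on synthetic noisy data (the data setup here is my own illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Labels depend only weakly on the single feature; most variation is noise.
X = rng.normal(size=(400, 1))
y = (X[:, 0] + rng.normal(scale=2.0, size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print(knn.score(X_tr, y_tr))  # 1.0 -- zero training error, by construction
print(knn.score(X_te, y_te))  # well below 1.0 -- memorization != generalization
```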
 As another commenter pointed out, the accuracy really needs to be evaluated using a validation set, not the test set--the approach described in the post is training with the testing data. In the field, we call this "cheating".

The basic idea of automatically tuning hyperparameters (the things this post discusses tuning with genetic algorithms) is cool, though, and is becoming a popular subject in machine learning research. A couple of recent research papers on the topic are pretty readable:

Algorithms for Hyper-Parameter Optimization: http://books.nips.cc/papers/files/nips24/NIPS2011_1385.pdf

Practical Bayesian Optimization of Machine Learning Algorithms: -----
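The validation/test discipline the comment above describes can be sketched in a few lines: hold the test set out first, tune hyperparameters against a separate validation split, and touch the test set exactly once at the end. (The dataset, model, and C grid here are illustrative choices, not from the post.)

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, random_state=0)
# Hold out the test set first; it must never influence tuning.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.25, random_state=0)

best_C, best_acc = None, -1.0
for C in (0.01, 0.1, 1.0, 10.0):
    acc = SVC(C=C).fit(X_train, y_train).score(X_val, y_val)  # tune on validation only
    if acc > best_acc:
        best_C, best_acc = C, acc

# Refit on train+val with the chosen hyperparameter, then report test accuracy once.
final = SVC(C=best_C).fit(X_dev, y_dev).score(X_test, y_test)
print(best_C, final)
```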
 Also, for people who are interested in the application of predicting college basketball with machine learning, there's a Google group that is worth joining: -----
 Thanks for the information! I've updated the article to reflect this.

Here's a question: where does "the field" hang out? Is there a cohesive community of any sort? -----
 I'd say the closest thing to a cohesive community would be the MetaOptimize Q&A forum, but maybe others have other suggestions: -----
 The Kaggle forums can be a good resource, and competing there is a good way to polish your skills.Also, you might check out Random Forest algorithms--they're high performance but still very beginner friendly, as there aren't many parameters to tune. There's a nice implementation in the excellent scikit-learn python library. -----
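The "few parameters to tune" point is easy to see in practice: a random forest in scikit-learn performs well with essentially default settings, with the number of trees as the main knob. A minimal sketch on synthetic data (illustrative, not from the thread):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# n_estimators (number of trees) is the main hyperparameter; more trees
# rarely hurt, so there is little delicate tuning to do.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```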
 Hmm, so Gaussian distributions are easy to use and ubiquitous and all (they're the basis functions used in SVMs), except that I don't see any reason for them to be priors here. But since 2012 > 2008 I feel like I'm obligated (and I'm semi-trolling) to point out the obvious about lazy assumptions based on "flexibility and tractability", which is that they can implode hilariously in your face. Cf. the financial crisis. -----
 I think you're confusing the Gaussian "process" used in Bayesian optimization with a standard Gaussian distribution. They are very different things - as are Gaussian copulas and what is referred to as the 'Gaussian kernel' (which is not actually a distribution at all) in the SVM. The Gaussian process is a distribution over functions, the properties of which are governed by the covariance function - so the prior over the function, or the assumption about its complexity and form, is determined by the choice of covariance function. Of course it is very important to choose a prior that corresponds to the functional form you are interested in, which is actually discussed and empirically validated in the literature referred to in that post. It's a bit ironic that you are claiming to point out the dangers of making lazy assumptions by doing exactly that. -----
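The point that the covariance function determines the prior over functions can be made concrete with scikit-learn's GP implementation: swapping the kernel changes what kind of functions the model believes in, before and after seeing data. (The toy data and kernel settings below are my own illustration.)

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 8)[:, None]
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.05, size=8)

# Same data, two priors: RBF assumes very smooth functions, while
# Matern(nu=0.5) assumes rough, Ornstein-Uhlenbeck-like functions.
for kernel in (RBF(length_scale=0.2), Matern(length_scale=0.2, nu=0.5)):
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-3).fit(X, y)
    # Draw sample functions from the posterior over functions.
    samples = gp.sample_y(np.linspace(0, 1, 50)[:, None],
                          n_samples=3, random_state=0)
    print(kernel, samples.shape)
```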
 Of particular note is the use of Theano for the machine learning heavy lifting. If you do machine learning and haven't looked into Theano, you're probably making things harder on yourself than they need to be. -----
 Last year there was a discussion where some HNers suggested that they'd be interested in participating in the competition if we allowed other forms of data: http://news.ycombinator.com/item?id=2321009

This is the early announcement this year, where we're soliciting suggestions about other forms of data to include. -----
 I did research with Daphne (and co-authored a paper with her) in my senior year of undergrad, and she demanded a high standard of work, yes, but she was an excellent supervisor. Everybody knows how brilliant she is, but she also put a lot of effort into teaching my (also undergrad) project partner and me how to formulate a research problem, how to do research, and how to present research. The primary concern appeared to be our personal growth, not the research machine (though that's not to say the research wasn't important).

Working with her was one of the highlights of my undergrad education, and her class was great, too. -----