I write tests not to convince myself that my code is correct (it often is, and REPL and ad-hoc testing are more than enough to make sure it does the right thing now), but to keep myself and future maintainers from breaking it as it evolves.
Regarding the perceptron: most modern texts also give a regret analysis for it, which, coupled with an online-to-batch conversion, tells you how well you can expect a perceptron to perform on unseen data after a single pass. It's usually a very good estimate (the answer: on average, about as well as it did on the examples in the training data).
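To make the single-pass claim concrete, here's a sketch (my own illustration, not from any particular text) of a one-pass perceptron that also reports its online mistake rate, the quantity the online-to-batch argument turns into an estimate of error on unseen data:

```python
import numpy as np

def perceptron_single_pass(X, y):
    # X: (n_samples, n_features) array; y: labels in {-1, +1}.
    w = np.zeros(X.shape[1])
    mistakes = 0
    for x_i, y_i in zip(X, y):
        if y_i * np.dot(w, x_i) <= 0:  # online mistake on this example
            w += y_i * x_i             # standard perceptron update
            mistakes += 1
    # The mistake rate over the single pass is the online regret quantity
    # that online-to-batch conversion relates to generalization error.
    return w, mistakes / len(y)
```

On a linearly separable toy set like `X = [[1,1],[2,2],[-1,-1],[-2,-2]]`, `y = [1,1,-1,-1]`, the pass makes a single mistake (on the very first example, when `w` is still zero).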
ML PhD student here. The reason this is different is that the parallel Monte Carlo simulations are run on different subsets of the data, one per machine, and the results are then averaged.
It is not obvious that this can work at all in some cases. Consider, for example, a clustering model: if there are two clusters, but one machine calls them A, B and the other machine calls them B, A, averaging the parameters will give you useless results.
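A toy numeric version of that failure mode (my own illustration, not from the paper): two machines fit the same two-cluster model but happen to assign the labels in opposite order, and naive parameter averaging destroys both clusters.

```python
import numpy as np

# Hypothetical cluster centers learned on two data shards.
# Both machines found the same two clusters, near (0,0) and (10,10),
# but labeled them in opposite order (the "label switching" problem).
centers_machine1 = np.array([[0.0, 0.0], [10.0, 10.0]])   # labels: A, B
centers_machine2 = np.array([[10.0, 10.0], [0.0, 0.0]])   # labels: B, A

# Naive coordinate-wise averaging collapses both clusters to the midpoint.
averaged = (centers_machine1 + centers_machine2) / 2
print(averaged)  # both rows become [5., 5.] -- a "cluster" matching neither mode
```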
So the contribution of this paper is identifying a class of models for which naive averaging works, and showing an efficient MapReduce implementation of it.
That said, I don't find the paper particularly interesting.
Ph.D. student in Stats here.
That we can sample the full posterior with this much data at all is interesting in and of itself. The method isn't particularly revolutionary, especially compared to one Zaiyang Huang and Andy Gelman came up with eight years ago, but it's practical (for a restricted class of models). Steve Scott spoke to my department about a month ago, describing the problem. It's not a solution that works for all posteriors, but it certainly allows more freedom than restricting oneself to conjugate priors, and it allows computation on large datasets.
Academics tend to trivialize the implementation (his talk drew some pretty strong critics in my department), but some kudos are in order for that, even if the algorithm itself isn't revolutionary.
Fun fact: this is pretty much the same dynamic program as CKY, the parsing algorithm that can parse any context-free language in the appropriate normal form (only productions of the form A -> BC or A -> a) in time cubic in the length of the sentence.
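For reference, here's a minimal CKY recognizer sketch (my own toy Python, with a made-up grammar encoding: unary rules as `(lhs, terminal)` pairs and binary rules as `(lhs, (B, C))` pairs). The three nested loops over span length, start position, and split point are where the cubic time comes from:

```python
def cky_recognize(words, unary_rules, binary_rules, start="S"):
    """Return True iff `words` is derivable from `start` under a CNF grammar."""
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                  # fill in A -> a rules
        for lhs, terminal in unary_rules:
            if terminal == w:
                chart[i][i + 1].add(lhs)
    for span in range(2, n + 1):                   # O(n) span lengths
        for i in range(n - span + 1):              # O(n) start positions
            j = i + span
            for k in range(i + 1, j):              # O(n) split points -> cubic
                for lhs, (b, c) in binary_rules:   # try each A -> B C rule
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(lhs)
    return start in chart[0][n]
```

With the toy grammar S -> A B, A -> 'a', B -> 'b', the string "a b" is accepted and "b a" is rejected.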
The fallacy behind arguments like this: when someone says that we're "proposing to push people to speak at conferences who aren't necessarily the most knowledgeable speakers," it is implicitly assumed that we know how to rank the speakers, and that doing so is easy.
In practice, however, while one can often tell a really bad speaker and a really good speaker apart, there are plenty of borderline cases and the variance is large, so using minority-favoring criteria to "break ties" will not necessarily lead to worse speakers on average, as long as those criteria are uncorrelated with actual skill.
(And I'm not sure they are: the mean quality of woman-presented talks at conferences I've attended has been consistently higher than that of man-presented talks.)
The above construction is not a breakthrough; it is how I was taught in my approximation algorithms class. Also, the optimization problem is usually defined not as "give a circuit with minimal cost" but as "what is the cost of the minimal-cost circuit?", which is easier to reason about approximately.
Same here. Intuitively, a decider that can answer "Is there an answer less than X?" will give you one bit of the lowest cost each time you query it. Query it repeatedly until you have all of the bits, and you're done.
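That bit-by-bit extraction is just binary search on the answer. A sketch, assuming integer costs in a known range [lo, hi) and a hypothetical decider `exists_below(x)` answering "is there a tour with cost < x?":

```python
def min_cost(exists_below, lo, hi):
    """Recover the optimal value from a yes/no decision oracle.

    Assumes the optimum is an integer in [lo, hi); each query to the
    oracle halves the interval, i.e. yields one bit of the answer.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if exists_below(mid):   # optimum < mid: keep the lower half
            hi = mid
        else:                   # optimum >= mid: keep the upper half
            lo = mid
    return lo
```

With an oracle whose hidden optimum is 42 and range [0, 100), this returns 42 after about log2(100) queries.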
There is no guarantee that the number of bits in "all the bits" is finite, and hence no guarantee that such an algorithm terminates. The simplest counterexample is N=2 with a distance of 1/3 between the nodes: the shortest (and only) tour has length 2/3.
You don't need to represent the actual length of the solution in a finite number of bits; you just need to represent the solution itself. The solution can be represented as the set of edges it contains. Each edge is either present or not, so a solution can be represented as a bit set with one bit per edge in the graph, or at most n^2 bits where n is the number of vertices.
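A toy version of that encoding (my own sketch, not from the thread): a tour given as a vertex order becomes a bit set indexed over the n*n possible vertex pairs, so any tour fits in at most n^2 bits regardless of how irrational the edge lengths are.

```python
def tour_to_bitset(tour):
    """Encode a tour (a permutation of vertices 0..n-1) as an integer bit set.

    Bit i*n + j is set iff the undirected edge (i, j), i < j, is on the tour.
    """
    n = len(tour)
    bits = 0
    for a, b in zip(tour, tour[1:] + tour[:1]):  # consecutive edges, wrapping
        i, j = min(a, b), max(a, b)
        bits |= 1 << (i * n + j)
    return bits
```

A 4-vertex tour 0-1-2-3-0 sets exactly 4 of the 16 possible bit positions.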
For a practical idea of how you'd do this: modify your binary search based on the possible circuit values. That is, figure out which number excludes half of the remaining solutions, rather than just half of the range, and use that as your pivot.
Yes, any algorithm that halves the set of potential solutions in each iteration will work, but I do not see how one gets such an iteration step from a decider that can answer "Is there an answer less than X?". I suspect (but cannot prove, or even argue solidly) that the "figure out which number excludes half of the remaining solutions" step is at least as hard as the problem itself.
Now, if I had such an oracle, I would let it loose on all graphs equal to the target graph with one edge removed. I think (but cannot prove (yet?)) that this is a way to determine, for each edge, whether it is by necessity part of the shortest tour. I also suspect that, for most graphs, the set of remaining "might be in the shortest tour" edges will be so small that an exhaustive search will be easy.
That's exactly what floating-point numbers are: representations of real-number intervals as finite-length bit strings. The number of bits simply determines the granularity of the representation (i.e., the width of the intervals).
In the case of this problem the set of possible paths is finite, so it can, in principle, be sorted in order of increasing cost. The smallest cost delta between any two paths in the sorted list is the lowest-cost edge in the graph (this probably assumes that all edge costs are positive), so you only need to continue the binary search until intervals of that size are reached. At that point the exact solution has been found.
EDIT: There might be multiple paths with the same minimal cost but binary search will still find this value.
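A sketch of that stopping rule, assuming the minimal gap between distinct tour costs (`min_gap`) is known in advance, and again assuming a hypothetical decider `exists_below(x)` for "is there a tour with cost < x?":

```python
def min_cost_real(exists_below, lo, hi, min_gap):
    """Binary search over real-valued costs.

    Maintains the invariant that the optimum lies in [lo, hi) and stops
    once the interval is narrower than the smallest possible gap between
    two distinct tour costs, at which point lo pins down the optimum.
    """
    while hi - lo >= min_gap:
        mid = (lo + hi) / 2
        if exists_below(mid):   # optimum < mid
            hi = mid
        else:                   # optimum >= mid
            lo = mid
    return lo  # within min_gap of (and not exceeding) the true optimum
```

For the N=2 counterexample above (optimum 2/3, gap at most 1/3), the search terminates after a handful of queries instead of running forever.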
That depends on how you define your representation. For some problems it is useful to represent an interval by its first bits, e.g. to take 0.101 as representing all numbers in the interval [0.101, 0.110).
One built from the ground up around the idea that people can hire online. Can I rent a clown for my kid's birthday party through a temp agency? Can I hire someone from that same temp agency to clean my house? Can I hire someone to detail my car?
It's interesting, but you're going to run into a lot of human complication. You'll need administrators with the expertise to vet all those types of people. If money is flowing through you, customers will expect you to guarantee quality, and to guarantee you don't send a criminal into their house to entertain their kids. I did essentially this, just for math tutors for the math portion of the SAT for high schoolers. By far the most work was in hiring and managing staff.
If money is not flowing through you then the product is Angie's List.
Citation, please? First, what do quantum computers have to do with finding the Higgs boson? Second, isn't D-Wave standing on the shoulders of decades of academic research in designing their computer?