mjw 18 days ago | link | parent | on: A Fervent Defense of Frequentist Statistics

My attempt to summarise the difference in language familiar to computer scientists: you can look at the frequentist vs Bayesian debate as being about when a worst-case analysis is preferable to an average-case analysis for the unknown parameters of a statistical model.

There's something you don't know (the parameters). Are you looking to make statements which bound how bad things could be under the worst-case setting of those parameters? Or do you have some idea upfront about how likely different parameter settings are, and want to make statements about them in the "average" case?

Rather like with worst-case vs average-case analysis of algorithms, which is more appropriate depends on what you're trying to do, and sometimes both are interesting.

-----
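The worst-case vs average-case contrast can be made concrete with a toy sketch (entirely my own example, not from the comment): estimating a coin's bias p from n flips under squared-error loss, comparing the MLE against the classical minimax estimator for this problem.

```python
import math

# Toy setting: estimate the bias p of a coin from n flips (squared-error loss).
# The unknown "parameter" is p. Frequentist minimax ~ worst case over p;
# Bayesian ~ average case under a prior on p (uniform here, for illustration).
n = 25

def mle(x):
    # Sample mean: the usual frequentist point estimate.
    return x / n

def minimax_est(x):
    # The classical minimax estimator for binomial p; its risk is constant in p.
    return (x + math.sqrt(n) / 2) / (n + math.sqrt(n))

def risk(est, p):
    # Exact expected squared error under Binomial(n, p).
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) * (est(x) - p)**2
               for x in range(n + 1))

grid = [i / 200 for i in range(201)]
worst = {name: max(risk(e, p) for p in grid)
         for name, e in [("mle", mle), ("minimax", minimax_est)]}
avg = {name: sum(risk(e, p) for p in grid) / len(grid)
       for name, e in [("mle", mle), ("minimax", minimax_est)]}
```

The minimax estimator wins on worst-case risk, while the MLE wins on average risk under the uniform prior: which one you want depends on which question you're asking.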
mjw 67 days ago | link | parent | on: K-means Clustering 86 Single Malt Scotch Whiskies

This is awesome! I wonder if anyone's done something similar with beers?

Anyway, a few "next thing to try" suggestions from a machine learning perspective:

The model selection process used here is by its own admission quite ad hoc, based on a gut feel about diminishing returns. There are various more principled methods you can use to find the sweet spot between over- and under-fitting with these kinds of models, many of them based on held-out validation data.

One way to do this would be leave-one-out cross-validation (LOO-CV): hold out one whisky, fit the model, see how 'surprised' the model is by the held-out whisky, then repeat for the next whisky and average over all the folds. Because the dataset is tiny this should be quite feasible.

To measure 'surprisal' you could e.g. look at the distance from the held-out data point to the nearest cluster, although something better motivated would be to switch to a probabilistic model and use the likelihood of the held-out data. Probably the simplest next thing to try in that direction would be a Gaussian mixture model (GMM) trained using EM; k-means is actually a degenerate limiting case of this.

A probabilistic model would also allow you to use Bayesian model selection criteria, which can get quite interesting (and might lead you eventually to things like Dirichlet process mixture models). It would also make it easier to compare the model's explanatory power with other unsupervised probabilistic models. For example, some kind of latent factor model like factor analysis or pPCA would be quite interesting to investigate too, whether taken alone or in combination with clustering as a dimensionality reduction step, as tlarkworthy is suggesting.

Also, concur that doing multiple runs with different randomised initialisation is generally a good idea for k-means or EM, since they can get stuck in poor local minima.
It's perhaps more common practice to pick the best of multiple runs than to average them, though.

-----
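A minimal pure-Python sketch of the LOO-CV idea, on synthetic stand-in data (toy 2-D blobs rather than the real whisky features; 'surprisal' here is just squared distance to the nearest centroid, a crude stand-in for held-out likelihood):

```python
import random

random.seed(0)

# Toy 2-D stand-in for the whisky flavour profiles (the real data would be
# the tasting-note scores per distillery).
data = ([(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(20)] +
        [(random.gauss(4, 0.5), random.gauss(4, 0.5)) for _ in range(20)])

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=25):
    # Plain Lloyd's algorithm with random initialisation.
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: dist2(p, centroids[j]))].append(p)
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids

def loo_surprisal(k):
    # Leave one point out, refit on the rest, score the held-out point by its
    # squared distance to the nearest centroid; average over all folds.
    total = 0.0
    for i in range(len(data)):
        rest = data[:i] + data[i + 1:]
        cents = kmeans(rest, k)
        total += min(dist2(data[i], c) for c in cents)
    return total / len(data)

scores = {k: loo_surprisal(k) for k in (1, 2, 4)}
```

With two well-separated blobs, k=2 scores far better than k=1; in practice you would also do multiple restarts per fold and keep the best run, as discussed above.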
 _deh 67 days ago | link Something on beer: http://bit.ly/JLNgZA (wisc.edu)-----
ced 90 days ago | link

I agree. As a Bayesian hoping to understand my data, P(X|M1) is useful: it's the probability I have for X under M1's modelling assumptions. Of course M1 is an approximation, but that's how science is done. You get to understand how your model behaves, and you may say "Well, X is a bit higher than it should be, but that's because M1 assumes a linear response, and we know that's not quite true".

Bayesian model averaging entails P(X) = P(X|M1)P(M1) + P(X|M2)P(M2). It assumes that either M1 or M2 is true. No conclusions can be derived from that. It might be useful from a purely predictive standpoint (maybe), but it has no place inside the scientific pipeline.

There is a related quantity, which is P(M1)/P(M2). That's how much the data favours M1 over M2, and it's a sensible formula, because it doesn't rely on the abominable P(M1) + P(M2) = 1.

-----
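The quantities in question on a made-up toy example (two hypothetical coin models and data that are my own, not from the thread; note that with equal priors the posterior odds P(M1)/P(M2) reduce to the likelihood ratio P(X|M1)/P(X|M2)):

```python
from math import comb

# Toy data X: 7 heads in 10 flips. Two hypothetical models:
# M1 says the coin has p = 0.5, M2 says p = 0.8.
heads, n = 7, 10
lik = {"M1": comb(n, heads) * 0.5**heads * 0.5**(n - heads),
       "M2": comb(n, heads) * 0.8**heads * 0.2**(n - heads)}

# Bayesian model averaging, with prior P(M1) = P(M2) = 0.5 -- i.e. the
# questionable assumption that one of the two models is true:
# P(X) = P(X|M1)P(M1) + P(X|M2)P(M2)
p_x = 0.5 * lik["M1"] + 0.5 * lik["M2"]

# The likelihood ratio compares the models without needing P(M1) + P(M2) = 1.
bayes_factor = lik["M1"] / lik["M2"]
```

Here the ratio comes out below 1, i.e. the data favour the p = 0.8 model, and that comparison stands regardless of whatever M3, M4, ... were left out of the average.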
mjw 90 days ago | link

Yeah, good perspective -- I guess I was thinking about this more from the perspective of predictive modelling than science.

Model averaging can be quite useful when you're averaging over versions of the same model with different hyperparameters, e.g. the number of clusters in a mixture model. You still need a good hyper-prior over the hyperparameters to avoid overfitting in these cases though; as an example, IIRC Dirichlet process mixture models can often overfit the number of clusters.

Agreed that model averaging could be harder to justify as a scientist comparing models which are qualitatively quite different.

-----
ced 89 days ago | link

> Model averaging can be quite useful when you're averaging over versions of the same model with different hyperparameters, e.g. the number of clusters in a mixture model.

Yeah, but in this case there's a crucial difference: within the assumptions of a mixture model M, N = 1, 2, ... clusters do make an exhaustive partition of the space, whereas if I compute a distribution for models M1 and M2, there is always M3, M4, ... lurking unexpressed and unaccounted for. In other words,

P(N=1|M) + P(N=2|M) + ... = 1

but

P(M1) + P(M2) << 1

Is the number of clusters even a hyperparameter? Wiki says that hyperparameters are parameters of the prior distribution. What do you think?

-----
avaku 90 days ago | link

Great explanation. I would like to add that held-out data is often used in Bayesian learning too -- for example, in cases where you intentionally over-specify the model (adding more parameters than might be needed) because you don't really know what the best model might be. Inference continues for as long as the likelihood on the held-out data keeps increasing. An example is gesture recognition in Kinect. If someone finds this info useful, I also recommend the Coursera course on Probabilistic Graphical Models.

-----
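A sketch of that early-stopping recipe, assuming a deliberately over-specified 1-D Gaussian mixture fitted by EM on toy synthetic data (not the Kinect setup; all numbers are illustrative):

```python
import math, random

random.seed(1)

# Synthetic 1-D data from two Gaussians; we deliberately over-specify
# the model with k = 4 components.
draw = lambda: random.gauss(0, 1) if random.random() < 0.5 else random.gauss(5, 1)
train = [draw() for _ in range(200)]
held = [draw() for _ in range(100)]

def pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_lik(xs, params):
    return sum(math.log(sum(w * pdf(x, m, v) for w, m, v in params)) for x in xs)

k = 4
params = [(1 / k, random.choice(train), 1.0) for _ in range(k)]
init_ll = log_lik(held, params)
best = init_ll

for _ in range(200):
    # E-step: responsibilities of each component for each training point.
    resp = []
    for x in train:
        ps = [w * pdf(x, m, v) for w, m, v in params]
        z = sum(ps)
        resp.append([p / z for p in ps])
    # M-step: re-estimate weights, means and variances.
    new = []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mu = sum(r[j] * x for r, x in zip(resp, train)) / nj
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, train)) / nj + 1e-6
        new.append((nj / len(train), mu, var))
    ll = log_lik(held, new)
    if ll <= best:
        break           # stop once held-out likelihood stops improving
    params, best = new, ll
```

The training likelihood would keep creeping up indefinitely with an over-specified model; the held-out likelihood is what tells you when to stop.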
 mailshanx 90 days ago | link What are some good resources to understand Bayesian model averaging?-----
mjw 90 days ago | link

These slides have a bit on this (although the material is quite dense): http://www.gatsby.ucl.ac.uk/teaching/courses/ml1-2011/lect5b... as part of http://www.gatsby.ucl.ac.uk/teaching/courses/ml1-2011.html

I quite like "Bayesian Reasoning and Machine Learning" too: http://web4.cs.ucl.ac.uk/staff/D.Barber/textbook/090310.pdf

-----
mjw 135 days ago | link | parent | on: Cheaper to rent in Barcelona and commute to London

Yes, the London market is a bit nuts, but if you're paying anything close to 25k/year (that's 480/week!) for a 2-bed in a "not great" part of London then either you have very high standards or you're being seriously ripped off.

For example, see what this gets you in N1, a pretty central and desirable postcode: http://www.nestoria.co.uk/n1/property/rent/bedrooms-2/maxpri...

-----
csomar 135 days ago | link

The links you suggested are just as expensive. I don't think the OP meant a miserable apartment, just one that's not great considering the money paid.

Don't forget there are other costs that the OP might have added to the equation: water, electricity, internet, phone...

-----
mjw 146 days ago | link | parent | on: D3 visualization of San Francisco BART employees' ...

The coloured dots are cute and all, but if the goal is to make the relationship between salary and union membership visually apparent, some more traditional visualisations might have made this clearer -- for example, boxplots broken down by union.

-----
glaugh 146 days ago | link

Here's a different perspective on the data. I threw the data into Statwing and related Salary to Union Membership: https://www.statwing.com/open/datasets/6019645769abb7de64d3e...

You can play around in there a bit, too. I thought just running descriptives on everything was pretty interesting.

Disclosure: I work at Statwing. Thanks to OP for making the original data easy to get to.

-----
 taeric 146 days ago | link Yeah... I'm honestly not sure what I'm supposed to be seeing with this visualization. :(-----
tpurves 146 days ago | link

Yes, it's indeed visual. Is the visualization insightful? No.

-----
mjw 165 days ago | link | parent | on: What's the Most Concave State in the U.S.? Using R...

So, with a few minor complications, convexity generalises to Riemannian manifolds like the earth. You need to replace "straight line" with "minimising geodesic", i.e. shortest path, which doesn't depend on the choice of coordinate chart, just on the Riemannian manifold structure (which includes an inner product and hence a metric).

This is complicated slightly when there isn't a unique shortest path between any two given points (e.g. the earth's north and south poles), leading to definitions of strongly convex, convex and weakly convex. See http://en.wikipedia.org/wiki/Geodesic_convexity and the debate at http://en.wikipedia.org/wiki/Talk:Geodesic_convexity#Dispute...

-----
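For concreteness, on a sphere model of the earth the length of the minimising geodesic (a great-circle arc) is given by the standard haversine formula (a sketch; 6371 km is an assumed mean radius). Note the pole-to-pole distance is still well defined even though the geodesic achieving it isn't unique:

```python
import math

def great_circle_km(a, b, radius_km=6371.0):
    # Haversine formula: great-circle distance between two
    # (latitude, longitude) points given in degrees.
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))

quarter = great_circle_km((0, 0), (0, 90))        # a quarter of the equator
pole_to_pole = great_circle_km((90, 0), (-90, 0))  # many geodesics, one length
```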
mjw 165 days ago | link | parent | on: What's the Most Concave State in the U.S.? Using R...

Maths challenge: I wonder if you could prove this "probability of a random line segment violating convexity" definition equivalent to something given in terms of a ratio of areas, like the area-to-convex-hull-area suggestion below.

-----
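A sketch computing both measures on a hypothetical L-shaped "state" (the polygon and all numbers are mine, purely for illustration): the area-to-convex-hull-area ratio exactly, and the segment-violation probability by Monte Carlo.

```python
import random

random.seed(42)

# Hypothetical L-shaped "state": the square [0,2]x[0,2] with its
# top-right quarter removed -- clearly non-convex.
poly = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]

def inside(pt, poly):
    # Even-odd ray-casting point-in-polygon test.
    x, y = pt
    hit = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            hit = not hit
    return hit

def shoelace(p):
    # Polygon area via the shoelace formula.
    return abs(sum(p[i][0] * p[(i + 1) % len(p)][1] -
                   p[(i + 1) % len(p)][0] * p[i][1]
                   for i in range(len(p)))) / 2

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    # Andrew's monotone-chain convex hull.
    pts = sorted(set(pts))
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    return half(pts)[:-1] + half(pts[::-1])[:-1]

# Measure 1: area divided by convex hull area (here 3 / 3.5).
area_ratio = shoelace(poly) / shoelace(convex_hull(poly))

# Measure 2: Monte Carlo estimate of the probability that the segment
# between two uniform random interior points leaves the shape.
def sample_point():
    while True:
        p = (random.uniform(0, 2), random.uniform(0, 2))
        if inside(p, poly):
            return p

def stays_inside(a, b, checks=20):
    return all(inside(((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1]), poly)
               for t in (i / checks for i in range(1, checks)))

trials = 2000
violation_prob = sum(not stays_inside(sample_point(), sample_point())
                     for _ in range(trials)) / trials
```

Both come out strictly between 0 and 1 for this shape, but proving a general functional relationship between the two (if one exists) is exactly the open challenge.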
mjw 165 days ago | link | parent | on: What's the Most Concave State in the U.S.? Using R...

Yeah, that's what I was expecting they'd do too.

Maybe you'd want to compute the convex hull on a per-connected-component basis, though, to avoid letting small islands have undue influence. Either that, or say that any area of sea within the convex hull gets treated as part of the state's area; this approach differs in that it special-cases sea over other kinds of land.

-----
 lessnonymous 165 days ago | link Would you not get the mean height above sea level of all points along the border and divide that by the mean height above sea level of the entire state (which I guess is really an average of some sampling of points)?-----
 mjw 165 days ago | link For that suggestion I was just considering it a two-dimensional problem with a "sea or not sea" distinction, not using heights above sea level.-----
mjw 175 days ago | link | parent | on: How To Deconstruct Almost Anything (1993)

An aside, but:

> A Novel Method for Applying a Trivial Modification of an Already-Known Algorithm to Some Type of Specific Data

Papers sharing empirical findings on applications of existing research can be very useful to those of us also, you know, looking to apply said research. Don't underestimate the amount of value (and legwork!) involved in figuring out how to adapt and apply theoretical work to real problems in some CS-related fields.

Obviously that kind of thing isn't accepted at the top conferences, so you know where not to look if you're not interested in it.

-----
mjw 185 days ago | link | parent | on: Probabilistic Scraping of Plain Text Tables

On the subject of scraping data from OCR'd tables: I heard from a colleague who moved into finance that there's a mini arms race going on between some funds(?) which are subject to regulatory requirements to release financial performance metrics but for a variety of reasons would rather not (and certainly would rather not make the data machine-readable), and other hedge funds which want to run automated trading strategies off said released figures.

They keep obfuscating the tables to make them harder and harder to parse algorithmically while still remaining theoretically human-readable. Great fun, no doubt, for everyone involved.

-----
 gwu78 185 days ago | link The regulations should require that the disclosures be provided in both human and machine readable format.-----
 volokoumphetico 181 days ago | link any resources to back this up?-----
 mjw 174 days ago | link Entirely anecdotal.-----