Information Geometry (ucr.edu)
79 points by clebio on May 14, 2015 | 20 comments

It is a wonderful series. For a somewhat simpler take on entropy (from the same author), there is a series on biodiversity (some of my favourites: http://stats.stackexchange.com/a/144235/6552). E.g.:
https://johncarlosbaez.wordpress.com/2012/06/21/the-mathemat...
https://johncarlosbaez.wordpress.com/2012/07/02/the-mathemat...

Such cool-sounding things make me wish I had gone to graduate school for mathematics... I really wish one could learn such things in one's own free time, and maybe even publish without being affiliated with a university or other educational institution.

Well, you can (at least, learn). I've been reading and practicing ever since finishing college. It takes an enormous amount of time, you have to practice (not just read), and it really helps to have people to share discussions and ideas with (Physics Forums seems like a good place to go, for example). But I certainly think it's possible. The alternative is not trying, which you know won't succeed (though of course, it may be a conscious choice to do other things).

It's a lot like quantum mechanics, but with all real numbers.

>Information geometry is the study of 'stochastic manifolds', which are spaces where each point is a hypothesis about some state of affairs.

First sentence and I am already lost.

In statistics you often think about having a "space of possible states of the world" and then seeking the point in that space which is in best concordance with the data you observe.

[For instance, we might assume that a basketball player's score in a game is a linear function of the number of shots they make. The possible states of the world are the multiplicative factors (points scored)/(shots made). This space is simple: it's just a line, probably even just a ray, since negative points are impossible.]

One major trick is in the "seeking".
To do so we often assume that the space is parameterizable, that is, that it has something like latitude and longitude, and then we scan over all possible choices of parameters looking for the parameters of the optimal point. Depending on the kind of model of the world you're working with, these parameterizations change.

[In the running example, the space is a line and the actual assignment of positive numbers along that line is a suitable parameterization. Higher-dimensional models or curved models make parameterization tougher.]

Information geometry uses the tools of differential geometry, the same ones used to characterize general relativity in physics, to characterize this "state of the world" space more completely. It provides new tools for parameterization and for understanding where older parameterization tools fail.

[In the running example we don't much need differential geometry to understand the geometry of our ray. In high dimensions, or in models with interactions, curvature, or both discrete and continuous parts, intuition breaks down.]

It also provides a rich geometric vocabulary for visualizing the "state of the world" space, which can be instrumental in understanding statistics, building new models, and evaluating how models compare with one another.

Thanks, tel. It was your reply to my comment on the Tensor discussion the other day that sent me searching for a general reference, and I found this link this morning. Cheers!

Hey, I'm glad! This is a really good find. I love Baez's writing so I'm pretty eager to run through this as well.

Papers-we-love G+ Hangouts? Guess I just need to start my local Papers We Love meetup.

That's a fantastic idea. Send me an email with a link if you ever do launch something like that.

If I get around to organizing such a thing, I suppose I'll throw together a quick single-page site and share it on HN or something.
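Returning to the basketball example earlier in this subthread, the "scan over all possible choices of parameters" step can be sketched in a few lines. This is only an illustration: the function name `fit_points_per_shot`, the grid bounds, and the game data are all made up, and a real fit would use the closed-form least-squares solution rather than a grid.

```python
def fit_points_per_shot(shots, points, grid_size=3001, max_factor=3.0):
    """Scan candidate factors k in [0, max_factor] (the "ray" of possible
    states of the world) and return the one with the smallest squared error."""
    best_k, best_err = 0.0, float("inf")
    for i in range(grid_size):
        k = max_factor * i / (grid_size - 1)  # parameter value on the ray
        err = sum((p - k * s) ** 2 for s, p in zip(shots, points))
        if err < best_err:
            best_k, best_err = k, err
    return best_k

# Hypothetical data: shots made and points scored in four games.
shots = [4, 7, 10, 5]
points = [9, 15, 21, 11]
k = fit_points_per_shot(shots, points)
print(k)  # close to sum(s*p) / sum(s*s), the least-squares optimum
```

With one parameter a brute-force scan is fine; the point made above is that this stops being easy once the space is high-dimensional or curved.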
I've been trying to piece this field apart for a while now, so here's my take on this: Each manifold is just a collection of points that can be locally mapped to some coordinate system. Furthermore, distances between points can be characterized by a given metric, and paths across these points define tangent spaces of valid directional derivatives.

But that's not as important as the idea that the points composing the manifolds in question represent distinct probability distributions. For example, a 2D surface could be made that represents all normal distributions. One dimension is the first parameter (the mean, over all reals) and the other is the second parameter (the variance, over the positive reals).

This lets you think about quantifiably describing the differences between probability distributions of a given parameterization. It's cool and way more complicated than what I've described here. Check it out! Go slow and write out the definitions as you go along.

In my mind the trick is always in training models, as it amounts to picking an optimal probability distribution from your manifold. Even in relatively simple mathematical statistics we quickly encounter curvature in exponential families, but the tools to handle it are ad hoc. Differential geometry provides exactly the right tools both to talk about curvature and to talk about traversing the manifold, and that is an appropriate place to talk about maximization, as is needed.

Well, a (topological) manifold is a topological space X that is locally homeomorphic to some d-dimensional Euclidean space R^d. A topology on a set X is a way of defining which points are "close together" in some sense; more precisely, it is a collection of subsets of X, called the open sets with respect to the topology. The collection has to contain both the set X and the empty set, any union of sets in the collection, and any finite intersection of sets in the collection.
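As a small illustration of the definition above, here is a sketch that checks the topology axioms for a finite set. The function name `is_topology` and the example collections are made up for this note; on a finite collection it is enough to test pairwise unions and intersections, which is the assumption the code relies on.

```python
from itertools import combinations

def is_topology(X, opens):
    """Return True if `opens` is a topology on the finite set X."""
    X = frozenset(X)
    opens = {frozenset(U) for U in opens}
    # Every open set must be a subset of X.
    if any(not U <= X for U in opens):
        return False
    # The whole set and the empty set must be open.
    if X not in opens or frozenset() not in opens:
        return False
    # For a finite collection, closure under pairwise unions and pairwise
    # intersections implies closure under all unions and finite intersections.
    for U, V in combinations(opens, 2):
        if (U | V) not in opens or (U & V) not in opens:
            return False
    return True

X = {1, 2, 3}
good = [set(), {1}, {1, 2}, {1, 2, 3}]
bad = [set(), {1}, {2}, {1, 2, 3}]  # the union {1} | {2} = {1, 2} is missing

print(is_topology(X, good))
print(is_topology(X, bad))
```

The interesting topologies in this discussion live on infinite sets, where no such brute-force check is possible, but the axioms being checked are the same.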
Continuous maps between topological spaces are then maps f: X -> Y with the property that the preimage f^{-1}(U) of every open set U of Y is an open set of X. And homeomorphisms are continuous maps with a continuous inverse.

Euclidean spaces have a natural topology induced by the standard Euclidean norm. Whenever you have a notion of distance or metric d: X × X -> R defined on a space X, you can speak of the "open balls" with respect to this metric: for a fixed point P and a constant c > 0, the set of all points P' such that d(P,P') < c. Those generate a topology.

For some arbitrary topological space X to be locally homeomorphic to R^d then just means that every point p of X has an open neighborhood that maps homeomorphically to some open ball in R^d. That is to say, it locally "looks like" Euclidean space.

One usually wants to do differential geometry on manifolds, that is, to have some notion of first/second/higher-order geometry; in other words, to do all the things one knows and loves from calculus/analysis on manifolds. For that to work one needs coordinate changes to be differentiable: given two open neighborhoods U and V of a point p in X and homeomorphisms f: U -> U' in R^d and g: V -> V' in R^d, one needs the composition g . f^{-1} (defined on the overlap f(U ∩ V)) to be differentiable.

A stochastic manifold is then simply such a (differentiable) manifold together with a density p that integrates to one: $\int_X p = 1$.

I start from Principal Component Analysis. PCA is finding the axes of maximal eccentricity within a multidimensional space -- where the space is the variables describing some data set.

This is taking that _sort_ of approach and running it thoroughly out. What are the topological and differential properties of a sample space, what can we do with the geometrical forms within that space, given what we know about (diffeomorphic) mappings, etc.

The space here is the space of probability distributions (each point is a distribution), not the sample space.
So, no real connection with PCA.

I'm thinking at least now I know the name of something else I don't know.

Way to look on the bright side! I recently got a book on Linear Mathematics. My level of education in mathematics is midway through 8th Grade Geometry.

One of the very first chapters mentioned the word "eigenvalue".

I then spent several hours using Google as a dictionary, jumping from one mathematical term to the next just to define eigenvalue in a way I could understand with my very limited mathematical knowledge.

It is really tiring to try to fully understand the math when the vocabulary used to describe it may as well be Elvish to me. So I understand how you feel 100% when the opening sentence mentions 'stochastic manifolds'.

Agreed.
