
If you're looking to run a .exe file, there are a few options on the market (sometimes found on the high seas). I've tried these on a couple of obscure .exe files:

- Parallels

- VMware Fusion

- Apple's Game Porting Toolkit

Parallels has had the highest coverage, but it's been locked down pretty well, and there is no way to outright own a copy (it's a monthly subscription). I think the Game Porting Toolkit project is the most exciting of the three, though!


Parallels still offers a one-time purchase for the most limited edition. If you need to give VMs more than 8 GB of RAM, you'll need a subscription-only edition, but you can get pretty far with 8 GB and 4 cores for a VM that's only running Windows 11 and one application.

I encourage everyone to read this paper. It's well written and easy to follow. For the uninitiated, SR is the problem of finding a mathematical (symbolic) expression that most accurately describes a dataset of input-output examples (regression). The most naive implementation of SR is basically a breadth-first search over program trees, starting from the simplest: x -> sin(x) -> cos(x) ... sin(cos(tan(x))), until timeout. However, we can prune equivalent expressions and, in general, the problem is embarrassingly parallel, which offers some hope that we can solve it pretty fast in practice (check out PySR[1] for a modern implementation). I find SR fascinating because it can be used for model distillation: learn a DNN approximation and "distill" it into a symbolic program.
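To make that naive breadth-first search concrete, here's a toy sketch (my own code, not PySR's algorithm; the unary-only operator set, depth cap, and squared-error scoring are all illustrative choices):

    import math

    # Toy breadth-first enumeration of unary compositions of x, smallest trees first.
    UNARY = {"sin": math.sin, "cos": math.cos, "tan": math.tan}

    def enumerate_exprs(max_depth):
        frontier = [("x", lambda x: x)]
        for _ in range(max_depth):
            yield from frontier
            frontier = [
                (f"{op}({name})", (lambda f, g: lambda x: f(g(x)))(fn, expr))
                for op, fn in UNARY.items()
                for name, expr in frontier
            ]

    def best_fit(xs, ys, max_depth=4):
        # Pick the enumerated expression with the lowest squared error on the data.
        return min(
            enumerate_exprs(max_depth),
            key=lambda ne: sum((ne[1](x) - y) ** 2 for x, y in zip(xs, ys)),
        )

    xs = [i / 10 for i in range(50)]
    print(best_fit(xs, [math.sin(math.cos(x)) for x in xs])[0])  # -> sin(cos(x))

A real system prunes equivalent trees, handles binary operators and constants, and parallelizes the search; this is just the "until timeout" loop described above.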

Note that the paper talks about the decision version of the SR problem, i.e., can we discover the globally optimal expression? I think this proof is important for the SR community but not particularly surprising (to me). However, I'm excited by the potential future work following this paper! A couple of discussion points:

* First, SR is technically a bottom-up program synthesis problem where the DSL (math) has an equivalence operator. Can we use this proof to impose stronger guarantees on the "hyperparameters" for bottom-up synthesis? Conversely, does the theoretical foundation of the inductive synthesis literature [2] help us define tighter bounds?

(EDIT: I was thinking a bit more about this and [2] is a bad reference for it... Jha et al. give proofs for synthesis with CEGIS, where the synthesizer queries an SMT solver for counterexamples until there are none... kind of like a GAN. Apologies.)

* Second, while SR itself is NP-hard, can we say anything about approximate algorithms (e.g., distilling a deep neural network to find a solution[3])? Specifically, what does the proof tell us about the PAC learnability of SR?

Anyhow, pretty cool seeing such work get more attention!

[1] https://github.com/MilesCranmer/PySR

[2] https://susmitjha.github.io/papers/togis17.pdf

[3] https://astroautomata.com/paper/symbolic-neural-nets/


Random question, if you don't mind. What does symbolic fitting give you that normal regression doesn't? Usually in physics, an analytic solution (our term) yields physical insight, although not everyone is a physicist or cares about that, which is partly why machine learning in general is popular.


You might find (Cramer, 1985) interesting! IIRC it goes into exactly this problem. However, I can't find an open PDF to link to, unfortunately. I'll edit this comment tomorrow morning in case I misjudged (Cramer, 1985) or if I find a link.

Here are two hand-wavy arguments that may not be 100% correct:

* Structured bias: Symbolic regression allows you -- the scientist -- to control exactly what sort of expressions you expect to see. If I'm looking at data coming from a spring, I expect to see a lot of damped sinusoids and very little quantum physics. SR gives you control over the "programming language", while parametric regression will only allow you to change the number of parameters (not useful in this context).

* Generality: Parametric regression guarantees the best-fit parametric equation as long as you have a comprehensive sample of your data range. A symbolic expression (most of the time) extrapolates beyond the provided data range. In fact, this is one of the constraints in the main proof in the paper (f* should generalize)! Basically: if I only have data for sin(x) from 0 to \pi, parametric regression will find the best fit on that range, but there is no guarantee it will also work from \pi to 2\pi (sketched below).
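A toy numerical sketch of the generality point (mine, not from the paper; the cubic degree and the sin(x) target are arbitrary choices):

    import numpy as np

    # Fit a cubic polynomial to sin(x) on [0, pi], then test it on [pi, 2*pi].
    x_train = np.linspace(0, np.pi, 100)
    coeffs = np.polyfit(x_train, np.sin(x_train), deg=3)   # parametric "best fit"

    x_test = np.linspace(np.pi, 2 * np.pi, 100)
    poly_err = np.max(np.abs(np.polyval(coeffs, x_test) - np.sin(x_test)))

    print(f"cubic fit, max error on [pi, 2*pi]: {poly_err:.2f}")  # large
    # The symbolic hypothesis sin(x) would be exact on the test range by construction.

The parametric fit is excellent in-sample and falls apart out of sample, while the symbolic expression extrapolates for free (if, and only if, it's the right expression).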

I want to stress that these aren't established facts, and each of these pros actually introduces a lot of cons along the way (what if you introduce the wrong structured bias? what if the "general/simple" solution is actually a little imprecise, like Newton's laws vs. Einstein's theory?). This just means there is plenty of exciting work to be done!


> while parametric regression will only allow you to change the number of parameters

I suspect you mean "the value of parameters" here.


The regression will 'set the value of the parameters'. OP's point is that when you want to change the regression, your main 'hyper-parameter' (i.e. one that you get to choose rather than one that the system determines for you) is the number of parameters.

Most neural networks have other hyper-parameters, but the ones in SR are probably quite interpretable and intuitive.


In "regular" regression, you have to come up with an input -> output equation. The regression only figures out the best parameters for it.

This might be difficult, especially if you can't visualize the data (many dimensions etc).

For example, a numeric regression for a1 * x1 + a2 * x2 + a3 = y cannot fit a relationship like y = x1/x2.

In symbolic regression, the system can automatically discover the equations themselves, not just the coefficients.
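A quick sketch of that example (my own toy code): a least-squares fit of a1*x1 + a2*x2 + a3 to data generated by y = x1/x2 is stuck with a large residual, because the true relationship isn't in the model family at all:

    import numpy as np

    # Generate data from y = x1 / x2, then fit the linear model a1*x1 + a2*x2 + a3.
    rng = np.random.default_rng(0)
    x1 = rng.uniform(1, 10, 1000)
    x2 = rng.uniform(1, 10, 1000)
    y = x1 / x2

    A = np.column_stack([x1, x2, np.ones_like(x1)])   # design matrix for a1, a2, a3
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)    # best possible linear fit
    rmse = np.sqrt(np.mean((A @ coeffs - y) ** 2))

    print(f"best linear fit RMSE: {rmse:.2f}")  # stays large, whatever the coefficients

A symbolic regressor that has division in its operator set can recover x1/x2 exactly.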


My initial guess is that regression determines how well you fit an existing pattern or group of patterns, but doesn't find out whether there is an even better-fitting pattern you didn't try. Linear regression will determine how well you fit a linear pattern, and while you can use it to search for non-linear patterns (like a regression of home prices against area squared instead of area), it only tests the patterns you put in.


Potentially much better extrapolation from the data, for one thing. The hope with SR is that the equation it finds will in some sense have a deeper connection to the process that generated the data than a neural network made of lots of ReLU units does.


Non-linear forms, most importantly.


A Fourier expansion is a symbolic fit of data to a curve, yet it provides no information about the system being sampled, as shown by the fact that it fits a geocentric model just as well as a heliocentric one.

The idea that an arbitrarily large expression is somehow more understandable than the Fourier coefficients of the first few large terms could only come from someone who hasn't looked at the vast array of semi-empirical formulas out there, which are as clear as mud.


Well, I think the hope is that formulae reveal relationships. Notable applications of SR have mostly been to physics and other scientific phenomena. So you do some SR to understand the mathematical form of the relationship, and then carry on by trying to understand it causally.


The algorithms are generally from the intersection of machine learning and programming languages, and they're pretty flexible. This paper talks about how we "bias" these algorithms for applications in the hard sciences (it specifically discusses behavioral neuroscience, but the approach has been/is being applied in other areas as well).


Hello! I'm one of the authors on this. Feel free to ask me any questions. Happy to clear up any confusion about what we're talking about as well!

Here's a survey paper describing the algorithms used under the hood: https://www.cs.utexas.edu/~swarat/pubs/PGL-049-Plain.pdf

