Couldn't the overdetermination problem be solved by tagging intervals with their ids? E.g., "x*x" should be resolved as "(x, i[-3, 3])*(x, i[-3, 3])" and "i[-3, 3]*i[-3, 3]" as "(nil, i[-3, 3])*(nil, i[-3, 3])". Since the tags match in the first expression but not in the second (nil != nil), different rules could be used.
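A minimal sketch of that tagging idea (the Tagged wrapper and the rule dispatch are made-up illustrations, not a worked-out design):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Tagged:
        tag: object          # variable id, or None for an anonymous interval
        lo: float
        hi: float

    def mul(x, y):
        # same non-None tag: both operands are the same variable, so this
        # is really squaring and the result can't go below zero
        if x.tag is not None and x.tag == y.tag:
            squares = [x.lo * x.lo, x.hi * x.hi]
            lo = 0.0 if x.lo <= 0.0 <= x.hi else min(squares)
            return Tagged(None, lo, max(squares))
        # tags differ (and nil != nil): usual four-corner interval product
        corners = [x.lo * y.lo, x.lo * y.hi, x.hi * y.lo, x.hi * y.hi]
        return Tagged(None, min(corners), max(corners))

    x = Tagged("x", -3.0, 3.0)
    print(mul(x, x))                    # [0, 9] -- squaring rule fires
    anon = Tagged(None, -3.0, 3.0)
    print(mul(anon, anon))              # [-9, 9] -- generic rule fires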
Btw this comes up a lot in compiler design for dynamic languages since constraining the value ranges of variables means that you can implement them more efficiently. In static languages it is not as important since the domains of most variables are either given to you or inferred by the type system.
Also an interesting head-scratcher is how to implement interval arithmetic for modulus: i[3, 6] % i[3, 6] = ???
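For positive operands a sound (if loose) answer falls out of two facts: x % y <= x, and x % y < y. A sketch, with a made-up helper name and no outward rounding:

    def interval_mod(xl, xh, yl, yh):
        # conservative x % y for 0 <= xl and 0 < yl; not the tightest bound
        if xh < yl:
            return (xl, xh)          # x < y always, so x % y == x exactly
        return (0.0, min(xh, yh))    # 0 <= x % y <= x and x % y < y

    print(interval_mod(3, 6, 3, 6))  # (0.0, 6.0)

Over the reals the tight answer for i[3, 6] % i[3, 6] is [0, 6): 5.99 % 6 = 5.99 creeps arbitrarily close to 6, while anything mod itself is 0, so the closed bound above is sound but slightly loose.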
this is basically how affine arithmetic works. but in interval arithmetic or modal interval arithmetic the representation of a float is two floats, and in affine arithmetic the representation of a float is a potentially unboundedly large set of float coefficients, none of which can be safely ignored, so there are some drawbacks to this approach
i'm not sure why wayne doesn't mention affine arithmetic and modal interval arithmetic in this post; i'd think they were relevant
Or, more accurately speaking, they can be ignored, but at some expense of accuracy. (You can always approximate every variable with a (-inf, +inf) interval!) There are several working ways to "compress" affine variables when there get to be too many of them.
Probably a worse problem with AA is that every multiplication introduces a new variable, so it is easy to run into whatever cap on the number of variables you set to keep performance reasonable...
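for the curious, a toy sketch of the affine-arithmetic core (rounding ignored; a real implementation must round coefficients outward): each value is a center plus coefficients over noise symbols in [-1, 1], shared symbols encode correlation, and multiplication mints a fresh symbol for the nonlinear part

    import itertools

    _fresh = itertools.count(1)          # supply of new noise-symbol ids

    class Affine:
        # represents c + sum(coef_i * eps_i), each eps_i in [-1, 1]
        def __init__(self, c, terms=None):
            self.c = c
            self.terms = dict(terms or {})

        @classmethod
        def interval(cls, lo, hi):
            return cls((lo + hi) / 2, {next(_fresh): (hi - lo) / 2})

        def __add__(self, other):
            t = dict(self.terms)
            for i, v in other.terms.items():
                t[i] = t.get(i, 0.0) + v     # shared symbols combine exactly
            return Affine(self.c + other.c, t)

        def __sub__(self, other):
            neg = Affine(-other.c, {i: -v for i, v in other.terms.items()})
            return self + neg

        def __mul__(self, other):
            t = {i: self.c * v for i, v in other.terms.items()}
            for i, v in self.terms.items():
                t[i] = t.get(i, 0.0) + other.c * v
            # the nonlinear cross term is pushed into a brand-new noise
            # symbol; this is where the variable count grows per multiply
            rad = sum(map(abs, self.terms.values())) * \
                  sum(map(abs, other.terms.values()))
            if rad:
                t[next(_fresh)] = rad
            return Affine(self.c * other.c, t)

        def range(self):
            r = sum(map(abs, self.terms.values()))
            return (self.c - r, self.c + r)

    x = Affine.interval(-3, 3)
    print((x - x).range())   # (0.0, 0.0): correlation tracked exactly
    print((x * x).range())   # (-9.0, 9.0): basic mul rule; a dedicated square rule gives [0, 9]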
this requires all terms to be represented symbolically, which 'works', but now you need a full-blown symbolic rewrite engine to try to simplify things. very reasonable for an optimiser but not so much for numerics. for numerics a more common approach is interval mincing. we have a function f(x) = x*x, and try to apply f([-3 3]). we can 'mince' the inner interval and say that (e.g.) [-3 3] = [-3 -1] U [-1 1] U [1 3]. so we have f([-3 -1] U [-1 1] U [1 3]); distribute function application over U (which is sound) to get f([-3 -1]) U f([-1 1]) U f([1 3]), which is obviously tighter than f([-3 3]). we picked a subinterval size of 2, but can shrink it arbitrarily until we're satisfied with the result. this technique is essentially a form of case analysis, and can also be applied to optimisers/static analysers (complementarily to simplifying rewrites)
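a quick sketch of the mincing above, reusing the f(x) = x*x example (naive float endpoints, no outward rounding):

    def mince(lo, hi, n):
        # split [lo, hi] into n equal-width subintervals
        w = (hi - lo) / n
        return [(lo + i * w, lo + (i + 1) * w) for i in range(n)]

    def square(lo, hi):
        # naive interval x*x, oblivious to the dependency between operands
        corners = [lo * lo, lo * hi, hi * hi]
        return (min(corners), max(corners))

    def minced(f, lo, hi, n):
        # distribute f over the union of subintervals, then take the hull
        parts = [f(a, b) for a, b in mince(lo, hi, n)]
        return (min(p[0] for p in parts), max(p[1] for p in parts))

    print(square(-3, 3))               # (-9, 9), the loose answer
    print(minced(square, -3, 3, 3))    # (-1, 9), the [-3 -1] U [-1 1] U [1 3] split
    print(minced(square, -3, 3, 300))  # approaches the true range [0, 9]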
so a middle ground (as with many pseudo-symbolic approaches). glanced at wikipedia; this seems not dissimilar conceptually to the abstract domain of polyhedra, in that it's symbolic but has a flat structure and expresses only linear relationships. of course, polyhedra are exponential where affine arithmetic is quadratic (overall space in number of terms), and polyhedra are global where affine is local (therefore easier to implement as a library, but no sharing). more missed connections between numerical analysis and program analysis? (meh, but no sharing probably means affine generally loses for a given resource budget. would still be interesting to try though)
i think aa is linear in the number of terms, and reduced aa is linear in the number of independent variables, but possibly i don't understand your meaning
feel free to ping on irc if interested in chatting more on the topic, though i'm not sure i have a ton more original thoughts atm
if you have an expression with n terms, then you will end up with O(n) terms each taking up O(n) space, so the overall space usage is quadratic. (that's the fair way to compare to polyhedra since they're always global)
what's the scenario where you'd materialize the values of all those terms at once? maybe cache-invalidation-based incremental evaluation, like acar's self-adjusting computation? affine arithmetic (or raa) seems like it could be a good fit for that kind of thing, because it can handle small input changes within the affine approximation (with just a multiply-accumulate) and sometimes would benefit from being able to recursively subdivide the input space into smaller intervals without recomputing unaffected parts of the expression
on the other hand, if i am merely evaluating a polynomial with n terms, such as 3x⁴ - 2x³ + 17x² + x - 4 for n = 5, i only have three aa objects at any one time during evaluation of the expression, regardless of n. in this case the program might look like this
a ← x; a ← a⁴; b ← 3; a ← a · b
b ← x; b ← b³; c ← -2; b ← b · c; a ← a + b
b ← x; b ← b²; c ← 17; b ← b · c; a ← a + b
b ← x; a ← a + b
b ← -4; a ← a + b
this assumes we have three registers for aa objects, creatively called a, b, and c; that we can only do arithmetic on these registers; and that the power operations are unary (otherwise they require an additional register)
this is true of any polynomial
with horner's-rule evaluation we can cut this down to two registers
a ← 3
b ← x; a ← a · b; b ← -2; a ← a + b
b ← x; a ← a · b; b ← 17; a ← a + b
b ← x; a ← a · b; b ← 1; a ← a + b
b ← x; a ← a · b; b ← -4; a ← a + b
obviously there do exist expressions with deeper minimum evaluation stacks, but i think they're only worst-case logarithmic in expression size, aren't they?
(dynamic) computation forms a dag, not a tree. i think a sum scan (n inputs -> n outputs) will trigger the worst case. it might be that computations tend to be tree-shaped, so you rarely hit the worst case, but that helps everybody out, not just affine arithmetic
that's a good point; i was thinking of single-output expressions, where i think always evaluating the deepest subtree first limits your space usage to worst-case logarithmic?
even single-output is a dag; you can explode the dag into a tree, but then you pay in time. suppose some expensive term x is used to compute n other terms. after computing x, assuming we want to share it, we have to compute all n terms before killing x; hence x sticks around longer than it might. (this doesn't necessarily lead to explosion, but i'm pretty sure you can use this to construct a scenario that needs a large number of live terms to avoid compromising time complexity; at least n live terms if the dag has n log n nodes)
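for the tree-shaped (no-sharing) case upthread, the minimum register count is the classic sethi-ullman number; a tiny sketch with trees as nested pairs, purely illustrative:

    def min_regs(node):
        # fewest registers to evaluate an expression tree when the
        # needier subtree is always evaluated first (sethi-ullman numbering)
        if not isinstance(node, tuple):
            return 1                      # a leaf just loads into a register
        l, r = min_regs(node[0]), min_regs(node[1])
        return max(l, r) if l != r else l + 1

    leaf = 1.0
    chain = ((((leaf, leaf), leaf), leaf), leaf)    # horner-like left chain
    balanced = (((leaf, leaf), (leaf, leaf)),
                ((leaf, leaf), (leaf, leaf)))
    print(min_regs(chain))     # 2, matching the horner evaluation above
    print(min_regs(balanced))  # 4 for 8 leaves: logarithmic in tree size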