I did my master's thesis on the Gaussian elimination part of Gröbner basis computation, making multiple optimizations to the Faugère-Lachartre method (working with Faugère himself), and had the first public C (C++-ish) implementation at the time. The thesis can be found here: https://hpac.imag.fr/gbla/data/bib/Mart12.pdf.
I was at a conference in Germany where Faugère presented his work. The room was stunned, but didn't understand. Hendrik Lenstra asked me to spend several days with Faugère, then give an expository talk. My marching orders: Assume nothing: groups, rings, fields.
I ended up lecturing on Gröbner bases and Faugère's algorithm to an audience that included Buchberger and Faugère. Faugère appreciated my efforts. Buchberger was like a shaken-up soda bottle; he made a twenty-minute speech during the question period.
I coauthored the computer algebra system Macaulay, which first introduced Gröbner bases to algebraic geometry, opening up the field to applications. At the time there was a separate community in France studying "standard bases", from which I learned a lot. Their focus was on power series rather than polynomials, but the translation is easy.
Algebraic geometry is a great subject, and computerizing it was a noble cause, but it thrust me into a backwater where the difference between polynomials and power series is considered significant. I got out.
Author here. Gröbner bases are powerful tools for understanding polynomials. I started on this article when I encountered them during research and decided to write down some notes. After a few months, those notes turned into this blog post. It's quite a long read, so let me know if you have any questions.
A tip for your LaTeX: I would use \langle and \rangle to surround the generators of an ideal, which looks better than using < and >. And I think they are called angle brackets, not square brackets (which would be [,]).
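For example (the generator names f_1, ..., f_k here are just placeholders, not from the post):

  % relational signs: too tight and semantically the wrong symbol
  $I = < f_1, \ldots, f_k >$
  % angle brackets: proper delimiters for a generated ideal
  $I = \langle f_1, \ldots, f_k \rangle$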
Inequalities can be transformed into equations quite easily, by adding variables:
Consider a strict inequality with real variables x_i:
f(x_1,...,x_n) > 0
By adding the variable y we obtain an equivalent equation:
f(x_1,...,x_n) = y^-2
This is correct because y^-2 is always strictly positive, and conversely any strictly positive value of f can be written as y^-2 by taking y = 1/sqrt(f).
If f is a polynomial, the above can be written as a polynomial equation like so:
f(x_1,...,x_n) * y^2 = 1
To transform a non-strict inequality into an equation, on the other hand, the procedure is the same, just with exponent 2 instead of -2. That is, the inequality f(x_1,...,x_n) >= 0 becomes the equation f(x_1,...,x_n) = y^2.
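A quick sanity check of the trick (a sketch using SymPy; the particular f and the sample points are made up for illustration):

  from sympy import symbols, solve

  x, y = symbols('x y')
  f = x - 1                               # example polynomial; f > 0 means x > 1

  # Strict inequality f > 0 encoded as the polynomial equation f*y**2 = 1.
  eq_strict = f*y**2 - 1
  # At a point where f > 0 (say x = 5) there is a real y solving it...
  print(solve(eq_strict.subs(x, 5), y))   # [-1/2, 1/2]
  # ...and at a point where f <= 0 (say x = 0) there is no real y.
  print(solve(eq_strict.subs(x, 0), y))   # [-I, I], i.e. no real solutions

  # Non-strict inequality f >= 0 is encoded as f = y**2 instead:
  eq_nonstrict = f - y**2
  print(solve(eq_nonstrict.subs(x, 1), y))  # [0]: x = 1 satisfies f >= 0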
Trouble starts when your field doesn't always have a solution for f(x_1,...,x_n) = y^2, like rational numbers (but I'm sure you can find more examples).
But maybe that can be mitigated as well?
I'm not an expert in this area, but, precisely for the reason you mention, I'd expect it's easier to solve a polynomial inequality over ℚ by solving it over ℝ and intersecting down to ℚ, rather than by working directly over ℚ.
For systems of polynomial inequalities, the appropriate tool is Cylindrical Algebraic Decomposition (this tool generalizes to systems of exponential inequalities as well).
I tried using Gröbner bases in grad school to solve implicit polynomial equations (part of an optimization problem) but quickly found that it didn't scale. I ran into trouble with a system of about 300 polynomials, and this was with Maple, which implemented state-of-the-art algorithms at the time.
The worst-case computational complexity of Gröbner basis computation is doubly exponential in the number of variables.
I have been looking into them lately in the context of ML and I have a hunch that in the future, all ML will be running a Grobner basis algorithm on a large scale.
Hopf algebras subsume essentially all currently fashionable architectures (transformers, convnets, SSMs, diffusion models) and then some. Hopf convolution is a surprisingly general concept, and Gröbner bases are the tool used to calculate the antipode.
A Hopf algebra is a tensor bialgebra, i.e. a tensor with a built-in feedback (which subsumes autodiff).
I have a question about the claim in 6.2 that attention matrices are SPD, if you don't mind my asking.
It seems to me that accepting the empirical result that the eigenvalues are positive isn't enough to get a Fourier Transform interpretation. Specifically, I don't understand the assumption that all attention matrices are symmetric. (I'm sure you know that positive eigenvalues are not enough by themselves, but for other folks reading, [[1 1/2] [1/3 1]] is a simple concrete example.)
Consider Fig. 17 here: https://lilianweng.github.io/posts/2018-06-24-attention/ (this is Fig.1 in Attention is all you need). I understand that you get symmetric attention matrices for the self-attention matrix in the input stream, as well as the masked attention matrix in the output stream (the first block). But I don't understand how you claim symmetry for the final attention mechanism that combines input and output.
And if you don't get symmetry, you don't get the Fourier Transform interpretation and all the nice algebra that follows.
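To make the point concrete (a small check with NumPy; the matrix is the 2x2 example above):

  import numpy as np

  # Positive eigenvalues but not symmetric, so the eigenvectors are not
  # orthogonal and there is no orthogonal "Fourier-like" change of basis.
  A = np.array([[1.0, 1/2],
                [1/3, 1.0]])

  eigvals, eigvecs = np.linalg.eig(A)
  print(eigvals)                          # approx 1 +/- 0.41, both positive
  print(np.allclose(A, A.T))              # False: not symmetric
  print(eigvecs[:, 0] @ eigvecs[:, 1])    # nonzero: eigenvectors not orthogonal

  # A symmetric matrix, by contrast, is diagonalized by an orthogonal matrix,
  # which is what gives the forward/inverse-transform interpretation.
  S = (A + A.T) / 2
  w, Q = np.linalg.eigh(S)
  print(np.allclose(Q @ np.diag(w) @ Q.T, S))   # True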
Sorry. You do mention a linear systems response, and that's what I meant.
In that setting, the eigenvectors work as a generalized forward and inverse Fourier transform, and the eigenvalues form the transfer function you allude to in the bold sentence: "The attention mechanism's role is the same as that of a transfer function in a linear time-invariant system, namely it calculates the frequency response of the transformer model."
Specifically, it seems to me that this requires a _symmetric_ attention matrix, which you get from the self-attention mechanisms (two of the three places where they're used in transformers), but not from all of them, notably not the one that combines the output of the first two attention mechanisms (one input, and one output).
I think that the magic comes from the antipode which makes things symmetric. The Ising model is somewhat similar.
Rereading the paper, quite a bit has changed in my understanding of all this. My conclusions still stand, but some of the reasoning needs to be explained better.
Gröbner basis methods do not scale well as the number of variables increases. The computational complexity becomes too high. Numerical Algebraic Geometry methods give the correct answer in many cases where there is a practical reason to solve the equations, but not a provably correct answer.
One random thing I wonder: in the lexicographic ordering, is there any reason why the convention is that x_2 < x_1, while x^2 > x^1? (Here _ denotes subscripts and ^ denotes exponents.) In other words, x > y uses the opposite of alphabetic ordering, while the exponents are compared in the usual numeric order. I don't think the naming of the variables matters for the mathematical result (you can use any subscript or letter for any variable as long as the corresponding ones remain the same between the equations), so the convention x_1 < x_2 would work just as well, and be less confusing I think.
This is a good point. I think it's just convention: people tend to put terms involving x first, and the textbook by Cox et al. that I referenced uses this convention as well.
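A small illustration (a sketch using SymPy; the polynomial is made up): with the generators listed as (x, y), lex treats x > y, i.e. earlier-listed variables are "bigger", while exponents compare numerically as usual.

  from sympy import symbols, Poly

  x, y = symbols('x y')
  p = Poly(y**5 + x*y + x**2, x, y)

  # Terms sorted from largest to smallest monomial under lex with x > y:
  # any term containing x beats any pure power of y, and x**2 beats x**1.
  print(p.terms(order='lex'))
  # expect the exponent vectors in the order (2, 0), (1, 1), (0, 5),
  # i.e. x**2 > x*y > y**5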
In doctoral school I spent some time applying the state-of-the-art methods to trying to break lightweight symmetric ciphers. The idea was that the system of polynomials generated from a number of plaintext/ciphertext pairs might be solvable via Gröbner bases methods if the number of rounds of the cipher was low enough.
It quickly ran out of steam after a couple of rounds and ~200 polynomials or thereabouts (doubly exponential complexity).
Naive question, but gotta ask: can Gröbner basis get over the "in general, a polynomial of degree >=5 can't be solved algebraically" hump?
Given that the above is a theorem (from Galois IIRC) I'd assume the answer is no.
And given that any system of polynomial equations in multiple variables, even of low degree, eventually reduces to a univariate polynomial of very high degree, then... what are Gröbner bases actually useful for?
You know how someone walking down the street might give you a collection of vectors over a field, and ask questions about the subspace the vectors generate? You might make an orthonormal basis set of vectors which generate the same subspace, as a means to more easily answer questions about the original vectors.
Groebner bases play a role similar to a minimal orthonormal basis for a set of arbitrary vectors in a vector space, except the set of things to be linearly combined, and about which the linear combinations will be asked questions, are not tuples of floats (vectors) but instead are polynomials.
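To make the analogy concrete (a sketch using SymPy; the polynomials are made up): just as an orthonormal basis lets you check by projection whether a vector lies in a span, a Groebner basis lets you check by reduction whether a polynomial lies in the ideal generated by the original set.

  from sympy import symbols, groebner

  x, y = symbols('x y')
  F = [x**2 + y, x*y - 1]                  # the "collection handed to you"

  G = groebner(F, x, y, order='grevlex')   # the better-behaved generating set

  # Membership test: p is built as x*F[0] + y*F[1], so it lies in the ideal,
  # and reduction modulo G detects that (p reduces to zero).
  p = x*(x**2 + y) + y*(x*y - 1)
  print(G.contains(p))                     # True
  print(G.contains(x + y))                 # False: x + y is not in the ideal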
>You know how someone walking down the street might give you a collection of vectors over a field, and ask questions about the subspace the vectors generate?
they can be very insistent but legally you don't have to answer them.
> You know how someone walking down the street might give you a collection of vectors over a field, and ask questions about the subspace the vectors generate
What on earth...? I can't make sense of the last 2/3rds, and the first 1/3rd is just weird. What does any of it mean anyway?
Most of these concepts (except Groebner bases) are basic stuff you would learn in a first course in linear algebra. If you don't know what they mean, this post is likely not relevant to you.
Galois actually proved that you cannot express the solutions using a certain class of expressions (compositions of polynomials, division, and taking roots). Using Gröbner bases, it is possible to construct an algorithm that computes the solutions (now, I'm not sure if it's always possible; I'm not an expert. But this distinction is important nonetheless).
If you can't algebraically solve the univariate polynomial, you can still get some solutions to the system as long as you can obtain some solutions to the univariate polynomial numerically.
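Concretely (a sketch using SymPy; the system is an arbitrary example, not from the thread): a lex Gröbner basis puts the system in triangular form, its last element is univariate, and numeric root-finding plus back-substitution recovers the solutions even when no solution in radicals exists.

  from sympy import symbols, groebner, Poly

  x, y = symbols('x y')
  F = [x**2 + y**2 - 4, x*y - 1]           # example system

  # With lex order and x > y, the basis is triangular:
  # the last element involves only y.
  G = groebner(F, x, y, order='lex')
  univ = Poly(G.exprs[-1], y)

  solutions = []
  for y0 in univ.nroots():                 # numeric roots, no radicals needed
      # Back-substitute y0 and solve the remaining equations for x numerically.
      for g in G.exprs[:-1]:
          gx = Poly(g.subs(y, y0), x)
          if gx.degree() >= 1:
              solutions += [(x0, y0) for x0 in gx.nroots()]

  print(solutions)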
Many ideas from the thesis ended up in the GBLA library used for fast Gaussian Elimination over these large matrices: https://dl.acm.org/doi/10.1145/2930889.2930914
This is a nice presentation from Christian Eder explaining how the algorithm works and the different optimization techniques implemented in GBLA https://www-user.rhrk.uni-kl.de/~ederc/download/gbla-aca-201...