

How to multiply numbers in a column in MySQL - physcab
http://lists.mysql.com/mysql/166184

======
joshu
switching to log-space is insanely useful for compounding returns.

inside mysql, i used it all the time.

for example, perhaps i want to get a sense of the distribution of users; i
have a table "users" with columns "user" and "count"

i could get a log-scaled histogram like so:

select pow(10, floor(log(count)/log(10))) as bin, count(*) as ct from users group by bin

this would produce a table like this (forgive my ascii art)

    
    
        bin | ct
        ----+---
        1   | 12
        10  | 27
        100 | 35
    

another thing you can do is use the order of magnitude with repeat(char, num)
to make a sort of visual bar chart.
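the same binning-plus-bars idea can be sketched outside sql too; here's a small python sketch (the counts are made up, standing in for the "users" table), with '#' repetition playing the role of repeat():

```python
import math
from collections import Counter

# Hypothetical per-user counts standing in for the "users" table.
counts = [3, 7, 12, 45, 80, 150, 320, 999, 4]

# Bin each count by its power of ten, as the SQL query does.
bins = Counter(10 ** int(math.log10(c)) for c in counts)

# Draw one bar per bin, repeat()-style.
for bin_, ct in sorted(bins.items()):
    print(f"{bin_:>4} | {'#' * ct}")
```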

that said, i'm sorta glad i don't have to use mysql professionally anymore.

------
physcab
I was surprised that MySQL lacked a Product() aggregate function with its
GROUP BY.

After much searching I came to this little nugget:

"exp(sum(log(coalesce( _the field you want to multiply_ , 1))))

The coalesce() function is there to guard against trying to calculate the
logarithm of a null value and may be optional depending on your
circumstances."

MySQL is full of hacks.

~~~
crux_
I'm torn as to whether to call that a 'hack' or not, since it's just a bit of
math. [ exp(log(x)) == x; log(ab) == log(a) + log(b) ]

~~~
kogir
It loses precision in scenarios where the desired aggregate would not. It's a
hack.

~~~
crux_
I'd be more inclined to call it a hack because it probably either crashes or
gives the wrong answer when it encounters a zero. ;)
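a quick python check of the zero case (python's math.log raises here; MySQL's LOG(0) instead returns NULL, so SUM() would silently skip the row and report a nonzero "product"):

```python
import math

# The true product is 0, but log(0) is undefined, so the
# exp(sum(log(x))) trick cannot produce it.
values = [2.0, 0.0, 5.0]

failed = False
try:
    math.exp(sum(math.log(v) for v in values))
except ValueError:
    failed = True
print("log(0) blew up:", failed)
```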

------
jwr
This solution is not equivalent numerically to a multiplication and I hope
nobody uses this where precision is important.

Remember that floating point isn't the same as the math you were taught in
high school.
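the precision loss is easy to demonstrate, since python integers give an exact baseline while the log-space round trip goes through floats:

```python
import math

# Exact integer product vs. the floating-point log-space trick.
values = list(range(1, 26))  # 1 * 2 * ... * 25

exact = math.prod(values)  # Python ints are arbitrary precision: exact
approx = math.exp(sum(math.log(v) for v in values))  # float round trip

print(exact)
print(f"{approx:.6e}")
print("relative error:", abs(approx - exact) / exact)
```

the relative error is tiny but generally nonzero, which is exactly the problem when you need the true integer product.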

------
huntse
Relational databases really are for doing relational operations. If you're
using them for doing bits of arithmetic then you're going to come up against
all sorts of limitations. Trivially, if you use this computed value in a where
clause the query optimiser will have no choice but to do a full table scan. It
can't index by all possible computed values.

Horses for courses. Do the query in whatever programming language, then just
multiply the numbers together. I don't try to do left outer joins on my HP12-C
calculator either.

~~~
nimbix
I agree that using a precomputed value in a where clause is a good idea. But
it's very likely that computing that value inside a DB engine and fetching a
single value is going to be faster than fetching 50 values and calculating the
product in your app.

Also, consider the following case: UPDATE huge_table SET someval = (SELECT
PRODUCT(x) ...). Splitting this into 50 million queries wouldn't be a very
good solution.

------
kogir
Limitations like this are among the most trivial of the reasons I pay for MS
SQL. Even though no such built-in aggregate is available, I'm free to write my
own in C/C++/.NET.

I'm pretty sure PostgreSQL lets you extend it however you want as well.

~~~
joshu
You can write plugins in mysql as well.

<http://dev.mysql.com/doc/refman/5.1/en/plugin-creating.html>

Please do not make things up and claim your imagination as truth.

~~~
nimbix
The Postgres way is much simpler:

    
    
        CREATE OR REPLACE FUNCTION product_sfunc(state NUMERIC, value NUMERIC) 
        RETURNS NUMERIC AS $$ 
        BEGIN
            IF value IS NOT NULL THEN
                IF state IS NULL THEN 
                    RETURN value;
                ELSE
                    RETURN state * value;
                END IF;
            END IF;
            RETURN state;
        END; $$ LANGUAGE plpgsql;
        
        CREATE AGGREGATE PRODUCT (NUMERIC) (
            SFUNC = product_sfunc,
            STYPE = NUMERIC
        );
    

This took only a couple of minutes to write and didn't require any extra
tools. Now I can use the PRODUCT() aggregate in queries, like this:

    
    
        SELECT SUM(n), PRODUCT(n) from generate_series(1, 5) as n;
         sum | product
        -----+---------
          15 |     120
    

In Postgres 8.4 it also works as a window function, which means I can use it
to get a running product (is that the correct expression?):

    
    
        SELECT n, PRODUCT(n) OVER (ORDER BY n) from generate_series(1, 5) as n;
         n | product
        ---+---------
         1 |       1
         2 |       2
         3 |       6
         4 |      24
         5 |     120
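
the same running product is easy to sketch outside the database too; python's itertools.accumulate plays the role of the window-function scan:

```python
from itertools import accumulate
from operator import mul

# A client-side sketch of PRODUCT(n) OVER (ORDER BY n):
# accumulate() folds mul over the sequence, keeping every prefix.
ns = [1, 2, 3, 4, 5]
running = list(accumulate(ns, mul))
for n, p in zip(ns, running):
    print(n, p)
```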

~~~
joshu
You presented some made up limitation as fact; you said it CANNOT be extended.
Now you're trying to support your argument further by saying it's EASIER
elsewhere. You should have just said that. I hate it when the opinionated use
made up facts to justify their opinions. Please don't do that.

All these database engines are pretty weak. I prefer to use SAS for this kind
of thing (including seeing stuff from previous/future rows.) It may not be
client/server, but it is astoundingly fast for large datasets nonetheless.

~~~
kogir
The person to whom you replied did no such thing. For someone so caught up in
correctness, it's surprising you can't be bothered to read user names :)

------
annoyed
i think this originated from joe celko's sql for smarties:

http://books.google.com/books?id=P1o4lxVnExsC&pg=PA340...

------
mahmud
Is there any good reason why you want to do computation like this with a
database query, instead of, say, middle-ware?

For the product

    
    
      (reduce #'* (select 'value :from 'product :flatp t))
    
    

If you want the sum change #'* to #'+.

~~~
garethm
If you do it in the database, you don't have to shunt all the values across
the wire from the database server. If the middle tier is on a different
machine, this can make a significant difference to the time taken to do the
query.

~~~
mahmud
good point, but as engineering trade-offs go: do you want to tie your
application's performance to one RDBMS's non-standard hacks, or do you want to
throw cheap hardware, RAM, fat pipes, and caches at the problem? I can scale my
lousy middleware-as-calculator solution with something as cheap as running
data consumers and producers on the same machine, or as "expensive" as a
gigabit switch and a fat cable. What would you gain from a beefy RDBMS that
you couldn't get done with a sharded db wisely spread across machines?

