Hacker News new | comments | show | ask | jobs | submit login
Sparse matrix multiplication using SQL (notes.mindprince.in)
22 points by mindprince on June 30, 2013 | hide | past | web | favorite | 7 comments



Sparse matrix multiplication using SQL was also a homework assignment in the latest coursera course "Introduction to Data Science" by Bill Howe


Correct me if I'm wrong, but the only reason the matrix being sparse is mentioned is because the author stuck a bunch of 0s in the matrices... this solution should work regardless of the sparsity.


Not sure if I completely understand what you mean, but the benefit of a sparse matrix is that zero-valued elements are made implicit by the absence of that (row, column) relation in the database. The drawback is that you need to store a (row, column) for every non-zero-value.

It would be wasteful to use this approach for non-sparse matrices, but it would still "work".


Any idea how fast this is compared to java / python ?


Depends on the application really. The operation itself would surely be slower than a specialised mmul, but there can be big advantages to keeping it inside your database if you can phrase most of your problems in terms of simple linear algebra operations.

In other words, numpy is damn-fine with matrices, but if you can replace DBMS->network->ORM->numpy->ORM->network->DBMS with just some SQL, it's pretty clear which will be faster.

On the other hand, if your data isn't already in an RDBMS in matrix-like tables, there is little point in moving it to one just so you can use tricks like the one mentioned in this article.


It's not a speed optimization. It just allows you to process a much larger dataset than you can fit in main memory. It will be slow because it's hard disk I/O, but you can do a sparse matrix multiplication on terabytes worth of data in a database with SQL. You couldn't do that in Python or Java without running out of memory, unless you used a MapReduce job distributed to many worker machines--and then you'd get a bill for the EC2 time or whatever.


Very fast, and very slow, very rarely - the same




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: