Froid: Optimization of Imperative Programs in a Relational Database [pdf] (vldb.org)
64 points by rodionos 3 months ago | 12 comments

I am a co-author of the Froid paper, and am around if people have any questions/comments/feedback.

Froid is now available as a feature of the SQL Server 2019 preview. The feature is called "Scalar UDF Inlining": https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018...

Available to try out for free here: https://www.microsoft.com/en-us/sql-server/sql-server-2019

I've only read part of it, but it seems great so far! I always appreciate the clarity and practicality of the work from y'all at the JGL.

I'm amazed that the implementation was under 1500 LOC! Was that the research prototype or the shipped preview?

Congratulations on the VLDB paper! Hopefully I'll come say "hi" in LA :)

Thank you.

The shipped preview has only a bit more than 1500 LOC.

The VLDB paper was already presented in Rio in August this year, but I'll try to come over to LA anyway :)

Karthik, I'm no Spark expert but almost all advice I read is to avoid UDFs if at all possible. Examples below:

- https://medium.com/teads-engineering/spark-performance-tunin...
- https://www.inovex.de/blog/efficient-udafs-with-pyspark/

Thank you for those pointers.

There are definitely some differences between the kind of UDFs that Spark supports and the kind that Froid handles. For one, Spark UDFs cannot invoke a Spark SQL query in their definition AFAIK, whereas TSQL functions can. But still, some techniques might be applicable. Definitely worth digging further!

Doh! Guess I should've checked. I didn't make it to Rio last year... Figured I was gonna miss a bunch of good stuff.

Thank you for the paper - it is well-written and succinct. Karthik, do you think this approach can be applied to Apache Spark as well (given its well-known slowness with UDFs)?

Thank you. Conceptually, the ideas behind Froid follow from relational algebra, so they can be applied to other relational engines as well. However, the details still need to be figured out before making any concrete statement.

If you could share any pointers about UDFs and their performance problems in Spark, I would love to investigate more.
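To make the relational-algebra point above concrete, here is a minimal sketch of the general idea of replacing a per-row scalar UDF with an equivalent expression the engine can see. It uses SQLite through Python's standard `sqlite3` module rather than SQL Server or Spark, and it is not Froid's implementation. The table `orders`, the function `discounted_total`, and the discount rule are all hypothetical, made up for illustration.

```python
import sqlite3

def discounted_total(price, qty):
    # Hypothetical imperative UDF: evaluated once per row, opaque to the engine.
    total = price * qty
    if qty >= 10:
        total = total - total // 10  # 10% bulk discount, integer arithmetic
    return total

conn = sqlite3.connect(":memory:")
conn.create_function("discounted_total", 2, discounted_total)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, price INTEGER, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders (price, qty) VALUES (?, ?)",
    [(100, 5), (100, 10), (250, 12)],
)

# Version 1: the query calls the UDF for every row.
udf_rows = conn.execute(
    "SELECT id, discounted_total(price, qty) FROM orders ORDER BY id"
).fetchall()

# Version 2: the same logic written by hand as a CASE expression,
# so the whole computation is visible to the query engine.
inlined_rows = conn.execute(
    """
    SELECT id,
           CASE WHEN qty >= 10
                THEN price * qty - (price * qty) / 10
                ELSE price * qty
           END
    FROM orders
    ORDER BY id
    """
).fetchall()

print(udf_rows)
print(inlined_rows)
```

Both queries return the same rows. The difference is that the second one hands the engine a single expression it can optimize as a whole, which is the kind of rewrite one would do by hand here.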

You might want to check out this related work: http://casper.uwplse.org

Thank you. Casper is very interesting work, and I am aware of it. Program synthesis offers an alternative approach to such problems, with different trade-offs and characteristics.

The paper includes a brief discussion on synthesis-based techniques, and the reasoning behind Froid's design choices.

Why does the first example return price as a char? Looking forward to reading the paper fully. I just scanned it.

It returns a formatted string including the price and the currency code, e.g., "5000 USD".
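As a rough illustration of why the return type is a string rather than a number, a function of that shape could look like the sketch below. The name `price_with_currency` and its parameters are hypothetical and are not taken from the paper.

```python
def price_with_currency(price: int, currency_code: str) -> str:
    # Hypothetical helper: joins the numeric price and the currency code
    # into one display string, so the result has to be a character type.
    return f"{price} {currency_code}"

print(price_with_currency(5000, "USD"))  # prints "5000 USD"
```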
