Hacker News new | past | comments | ask | show | jobs | submit login
Why LINQ beats SQL (linqpad.net)
98 points by douche on Feb 16, 2017 | hide | past | favorite | 137 comments

> And if you need to support older databases (prior to SQL Server 2005), it gets worse: [some bad SQL]

If you're stuck using a database version that first came out a decade ago, you've got bigger problems than legacy syntax for pagination.

> Not only is this complicated and messy, but it violates the DRY principle (Don't Repeat Yourself). Here's same query in LINQ. The gain in simplicity is clear: var thirdPage = query.Skip(20).Take(10);

If you're using OFFSET / LIMIT to do paging then you're doing it wrong. See here for the real way to handle paging for large result sets: http://use-the-index-luke.com/blog/2013-07/pagination-done-t...

Sometimes you have no choice. Like when an employer looks at the license cost of going from SQL Server 2005 to 2016 and figures that if it isn't broken, why fix it.

Great link, thanks. Is PostgreSQL still the only db supporting that?

The right charts on slide 45 says "X" for SQLite. That changed recently: https://www.sqlite.org/releaselog/3_15_0.html

DB2 LUW has also decent row-values support: http://use-the-index-luke.com/blog/2014-11/seven-surprising-...

Need to update the slides...

I know many are going to disagree but I have come to despise almost all SQL generators unless they are as close to 1-1 as possible (aka Lukas Eder's jOOQ is acceptable in my book).

Even more controversial... I don't want to make it any easier for developers to interact with the database unless it is absolutely correct and transparent. I know.. I know... how could I say such I thing! But you see I want every interaction with the our data repository... our precious data repository... that is usually the bottleneck... I want you to know WTF you are doing when you touch it.

We (my company and probably others) are past the days of monolithic rapid prototype apps with ORMs. Now you pick a database and you learn the hell out of it because you can stick to it. Back in the day one of the reasons ORMs were so popular because companies would have to ship on premise ware that would have to work with multiple databases. With the cloud that is no longer the case. However I still think a lot of .NET shops come from the on premise mentality. Thus performance is less of concern (speed of release is more desired).

For me there are two primary reasons for having an ORM or ORM-like framework

1. Compile time checking. I hate when I'm writing a straight SQL query in a string literal and misspell one of the column names, because everything will compile and unit tests will pass, and the mistake isn't caught until I run integration tests. I want a short edit-compile-debug cycle. jOOQ is a good example of something that helps here.

2. Arbitrary filters. Often I have to expose a REST API where the user can add an arbitrary number of filters to their query. You can't write SQL literals for these because there's a combinatorial explosion of possibilities. So you wind up dynamically generating SQL yourself by concatenating strings in the WHERE clause, hoping and praying you don't accidentally allow SQL injection. I want an ORM to deal with that shit for me.

3. Modularity and abstraction. Views only get you so much, because they don't let abstract away the table or columns that you are targeting. Also, I typically don't want my views to clutter the main namespace if I'm just trying to refactor some queries. I can use the "WITH" syntax in some cases, but bizarrely it comes with an unavoidable optimiser fence on most RDBMSs (i.e. your subquery will always be materialised), which severely limits its utility.

jOOQ helps with both 1 and 2. With jOOQ you can write something like (totally off-the-cuff code incoming, written outside of an IDE with only basic syntax highlighting, be warned, etc)

    List<Condition> conditions = new ArrayList<>();

    if (notNull(lessThan)) {

    if (notNull(greaterThan)) {

    if (notEmpty(name)) {

(This is obviously really contrived but I hope it illustrates the ideas)

We use constructions like this with jOOQ relatively frequently, especially for cases where we offer all sorts of arbitrary filter knobs (useful for internal tools especially).

In general, you can do some pretty fancy manipulation with `Condition` in jOOQ, like you can build a list of conditions and then call `DSL.or(conditionList)` to OR them all together. Stuff like that.

Source: use jOOQ in production code -- been very pleased so far, other than sometimes poor documentation

I much prefer:

    var query = from c in db.MyTable
                where lessThan    == null ? true : c.MyColumn < lessThan    
                where greaterThan == null ? true : c.MyColumn > greaterThan 
                where name        == null ? true : c.Name == name        
                select c;
Although IQueryable supports your approach also:

    var query = db.MyTable;

    if (lessThan != null) 
        query = query.Where(c => c.MyColumn < lessThan);

    if (greaterThan != null) 
        query = query.Where(c => c.MyColumn > greaterThan);

    if (!String.IsNullOrEmpty(name)) 
        query = query.Where(c => c.Name == name);

    var results = query.ToList();

> other than sometimes poor documentation

Thanks for your feedback. We're currently collecting API that is not documented well enough yet. Your feedback would be very welcome: https://github.com/jOOQ/jOOQ/issues/5816


there is at least one way to write plain sql and get compile time checking, with f#

> that is usually the bottleneck... I want you to know WTF you are doing when you touch it.

Why don't you hire a DBA and have all major queries go through them if it's that important instead of asking developers to master yet another thing because you got burned in the past?

I'm all for learning database stuff, but you and everyone else are pulling developers in all different directions by simultaneously claiming "YOU NEED TO KNOW THIS SHIT" about <THING THAT HURT YOU PREVIOUSLY>.

"you need to know this shit" is something I say often to developers displaying complete ignorance of SQL. I think if you're an embedded systems programmer or an games developer focusing on the rendering pipeline or something then sure be blissfully ignorant of SQL. If you are a web application developer, where 99.999% of web applications use a backing RDBMS and a huge part of the heavy lifting and most likely bottlenecks and nearly all performance tuning is in the RDBMS - then you need to know this shit.

Performance doesn't necessarily kill a product or service. And since they were too busy figuring out all of the odds and end of X database, the UX went to shit because they didn't learn about some quirk of Chrome, Firefox, Safari or something on mobile and suddenly a deployment breaks stuff for one of the browsers/platforms.

If you can't afford a DBA when DB performance is critical to your application, you probably don't have good QA for all of those platforms either. But suddenly your developer is supposed to stretch over all three areas?

When people talk about hilariously long laundry lists of technical requirements, it's an aggregation of demands like these over time that has inflated those requirements.

in my experience 99/100 major debilitating webapp performance issues are from SQL-oblivious developers making basic variations on the "n+1" problem, executing a thousand queries instead of one, failing to think in sets, treating the database like a dumb key-value store and paying extreme taxes on roundtrips and connection overheard. Bring a DBA into this situation, he might look at your queries, look at your indexes, and say "everything is fine". It's the developers who need to know how to use a database.

But how do you draw the line on required knowledge? At some point you need to say "This should be DBA territory", and that's the soft wall where responsibility gets thrown over to the other side. Do I spend time poking around at a problem I've got, or do I hand it off?

The thing is that soft wall is wayyyy too vague for anyone to prepare themselves before actually joining a team. I may not want to join your team if you require everyone to be a database expert because I don't like databases. But because the communication around this is so damned vague, there's no way to know before joining.

If you somehow have a SQL-oblivious developer, he should be identified early on because it's not compatible with the level of competence you've established for your business.

Why is this guy suddenly writing performance-critical queries for your webapp and not working with his strengths?

Does he have any strengths? No? Why was he hired? Shouldn't someone be reviewing this guy's code?

The SQL-oblivious developer is a perfectly natural occurrence from a time past when front and back end job postings were not wrapped into a full stack developer position for less pay than those positions combined. It's just that it's so easy to pile on requirements that it just outpaces some people.

Can we identify those people reliably? Yes? Congrats, you've probably got a way to actually test developer competence without invoking another 300-post threadnought on HN and you will likely be very, very rich.

But see, you can't go off studying every single thing mentioned in the context of "Things important for developers to know" even if you wanted to try and keep up. There have been tons and tons of blog posts and articles and angry posts about terrible situations created due to a lack of knowledge and everyone just says "Hey, you should know this, get to it". Even if I wanted to filter through all of those posts, how do I know who to listen to?

If you want to specialize in javascript and html that's fine. The middle/backend/API then ought to be developed by someone knowledgeable in SQL - but it's still a developer, not a DBA. DBAs are sysadmins not programmers. If you want to specialize in middle/backend development and think you should be able to defer all your SQL problems to a DBA then no your dead wrong. You simply cannot be a good backend developer for web applications utilizing RDBMSs and not know this shit.

If you do sane things with your DB then you really don't need to mess with all the edge case crap that breaks ORM's.

Great DBA's are rare, but vastly more important than most people think.

I constantly pick up the metaphorical phone to call the DBA then remember that I am the DBA...

I've found that specialization is an emergent property of teams and organizations. Each developer naturally has propensity for and interest in some areas over others. Imagine a completely flat company with twenty developers, where everyone is supposed to be a full-stack, you-need-to-know-everything-about-everything developer. Well, in even a short amount of time, you're naturally going to end up in a situation where some developers specialize in certain sets of frameworks and tools, and become the go-to experts in that.

So something like a DBA, i.e. an expert in your software's database, will emerge naturally over time on a product. In my current project, there are definitely some areas that I am the "go to" guy on, and likewise there are other people on my team that are the "go to" people for other areas. And yes, we even have two people who are especially knowledgeable about our backing datastore, and the library we use to interact with it.

That a DBA can make almost every query faster and every db schema better is often an unknown unknown for developers. Combine that with the need of having the DBA at hand all the time, because developers create new queries and possibly new tables or suboptimal solutions all the time. Small companies might get a DBA only when it's too late and they would need a major rewrite of some code and of the db schema. I've even seen companies with plenty of DBAs in Operations and none in development and no chances for them to affect the development process, both because the unknown unknown and because rigid compartmentalization between company departments.

SQL is itself an abstraction over the "language" of query execution plans. Why do you regard SQL as particularly more fundamental than any other abstraction?

I don't think it's conceptually right to regard LINQ as a "SQL generator". I think it's better to think of both LINQ and SQL as just different database control languages.

(It might be the case that behind the scenes LINQ is implemented by generating SQL, but that's an implementation detail. Just like Haskell is implemented by generating C behind the scenes but it's more reasonable to think of Haskell as its own separate programming language than as a "C generator")

> SQL is itself an abstraction over the "language" of query execution plans. Why do you regard SQL as particularly more fundamental than any other abstraction?

> I don't think it's conceptually right to regard LINQ as a "SQL generator". I think it's better to think of both LINQ and SQL as just different database control languages.

Thats the thing. There is nothing theoretical about this. It is about what is actually closer to the real thing which in this case is the data.

I agree LINQ is far easier to understand and is more productive but those benefit just don't outweigh my performance, safety, and concurrency concerns. Understanding the database along with its native SQL gets you that.

For others the productivity is worth it.

> (It might be the case that behind the scenes LINQ is implemented by generating SQL, but that's an implementation detail. Just like Haskell is implemented by generating C behind the scenes but it's more reasonable to think of Haskell as its own separate programming language than as a "C generator")

Doesn't GHC backend compile directly to assembly?

> It is about what is actually closer to the real thing which in this case is the data.

Why is SQL closer to the data?

> Understanding the database along with its native SQL gets you that.

My whole point is that SQL is not any more "native" than LINQ. The database takes SQL and computes an execution plan in its own internal representation. This can look totally different from the SQL as the other commentator pointed out. SQL does not in any way correspond to the "native" way the data is stored or how query results are computed.

> Doesn't GHC backend compile directly to assembly?

Maybe now but at some point it generated C.

> My whole point is that SQL is not any more "native" than LINQ. The database takes SQL and computes an execution plan in its own internal representation. This can look totally different from the SQL as the other commentator pointed out. SQL does not in any way correspond to the "native" way the data is stored or how query results are computed.

Yes please point a reference to me to a database that speaks LINQ or how LINQ directly generates to an execution plan (I admit I haven't touched MS SQL Server in some time).

In theory you are right but not in reality. To give another analog you still have to know Javascript to know TypeScript or Coffeescript (or whatever else in vogue) because WASM isn't a reality yet.

SQL is an abstraction over relational algebra. Relational algebra is, for the most part, easy to grasp. You can use it to define a query tree trivially.

Any tool that can map to that query tree has the same level of abstraction as SQL.

Repeat after me, SQL is nothing special. Its semantics are extremely limited. Nothing it does can't really be done by LINQ or a similar DSL that maps to relational algebra operators.

> SQL is itself an abstraction over the "language" of query execution plans.

I don't agree with this at all. SQL does not abstract a specific execution plan. The execution plan is calculated by the RDBMS to efficiently compile the results required by a given SQL statement. The plan depends on available indexes, table statistics, input parameters and so on. The execution plan might change from time to time as new indexes are built or as table statistics change. When you write an SQL statement, you are defining what you want and how the tables are related in your context. You are NOT telling the RDBMS what the actual execution plan should look like.

> When you write an SQL statement, you are defining what you want and how the tables are related in your context. You are NOT telling the RDBMS what the actual execution plan should look like.

Which is the exact same thing you do when you write a LINQ query against a database context...

Both are just languages that describe what you want to achieve, not how to achieve it.

How does this contradict my point?

C is an abstraction over machine language, in the same way; but so is Prolog.

C statements map to very specific ASM, you can look at a C statement and be pretty sure what the ASM will end up looking like.

SQL does not map any specific execution plan. The execution plan will be very different depending on external factors such as the contents of the tables and so on. Have you looked at some actual execution plans?

Related rhetorical question for you: What is the difference between a 3rd generation and a 4th generation programming language? (SQL is a 4th GL and C is a 3rd GL)

> C statements map to very specific ASM, you can look at a C statement and be pretty sure what the ASM will end up looking like.

This is totally not true unless you have optimizations turned off. Have you ever looked at what clang or gcc generates with -O2 or -O3 ?

Anyway, regardless of whether you agree with me on this point, I don't see how it relates to the general point I'm making. This seems like it's just disagreement over the definition of the word "abstraction". It doesn't change (and in fact, actually supports!) my main point, which was that SQL is not properly regarded as "native" to database engines in some special way that can't be true of LINQ or any other language.

You still seem to think that LINQ to SQL is an 'implementation detail' and it would be possible to do LINQ to DB Execution Plan directly.

This is just plain wrong and you are missing important details of how LINQ and SQL and RDBMSes work in the real world.

I can't see a reason why LINQ to DB Execution plan couldn't exist. If it's wrong, please tell us why.

Today's db optimizer builds a set of execution plans and chooses the best one based on "cost". Number of rows, memory access vs. disk access, and other things factor into this cost. Also, the higher the number of joins, the greater the number of plans. 7 joins would be 7! possible plans +. This has to happen fast, so db optimizer cannot generate every possible plan and pick the optimal one. There's a body of research to delve more into how it does this. The point is that the optimizer does real work, as fast as possible to generate the execution plan (they called this dynamic programming in one paper I read).

The LINQ interpreter on the client would have to retrieve the statistics from the server's tables and indexes of interest and perform the same operation as the optimizer (costing more network and some db cpu). Next it would have to send back the plan to the db executor in some way. This plan "marshalling" would cause more network traffic, requiring the executor to "unmarshall" the plan costing more CPU. This is less efficient than the current scheme.

Alternatively the LINQ language could be implemented on the database but with its own inherent difficulties, but in reverse.

Once a plan is cached and is reused, this inefficiency goes away to some degree. So there could be a possible mechanism for the server to send back to LINQ client a hash identifying the plan if it is cached and then have LINQ only send the hash with parameters to the DB on next execution. (You would have to see if this isn't covered by some patent of course!)

The constraint is the network connection. If the network connection between client and server were faster and bigger than the data bus on the computers, then it would change the equation significantly. But longer distances means slower communication all things being equal (lightspeed and all). So a network connection being better than the data bus would be inefficient and quickly remedied in a competitive marketplace.

How is implementing LINQ on the database any different implementing SQL on the database?

Linq is compiled once. The database will continually monitor performance of the query and recompile the execution plan as needed. Linq runs on the client machine. The query planner runs on the database machine. Many clients will connect to the same database server, and they may not even know beforehand what exactly the RDBMS is capable of, what indexes are available, what hardware the database server is running on, etc. Therefore LINQ can't compile an optimal execution plan before sending the request off to the database server.

In short - LINQ runs on the client machine, the execution plan happens on the database server.

Think of it this way: Lets say you have a database of clients and contacts. Lots of systems in your company connects to this database to access this data. Each of those systems will submit SQL in the form of 'select clientname from clients where id = 123' or whatever. Now lets say the client list grows and the old execution plan is not optimal any more. Our smart RDBMS can just dynamically fix the execution plan and performance goes up for every system accessing the database. If the RDBMS instead received a rigid execution plan, EVERY client system will need to recalculate the execution plan.

Also: lets say your client app connects to several different database servers. What's a good exection plan on one server it not going to be a good execution plan on another server, so LINQ would need to keep a list of database servers with table statistics, indexes, etc etc and continually monitor all of those for changes. It's massive duplication of work. It's much more efficient to have each database server look after its own execution plans.

Sorry,I should have been clearer. The GP post mentioned 2 ways to implement a LINQ to execution plan transform. The first way would be to do it on the client. This is problematic because the client doesn't have enough information to generate an efficient execution plan. That makes sense to me. The second way would be to do it on the server. This is the part I don't understand. Why can't we do it that way? This is how it is done for SQL.

Interesting discussion as it considers how SQL is implemented, what is an abstraction, etc.

On the topic of concept-rightness you quickly veer into a language game. gcc transforms c code into machine code (and may also output error messages, warnings and such). For some, this is not important, and may not be conceptually right. But if you are interested in say the ABI for instance, you do care.

On the point of "fundamentalness of abstraction", I interpret it as a measure of how many layers of abstraction (Ab) can be decomposed into the concrete thing of interest. LINQ generates another layer, that of SQL. If it generated an execution plan and sent that to the database optimizer, then I would say it does not generate another layer and would be equal in level of Ab to SQL.

But if LINQ can generate SQL that runs on the database, then it is also possible for the database to support LINQ natively. And maybe that was the original intent.

However, as it stands today LINQ to SQL costs CPU resources. So in adopting LINQ one of the tradeoffs you are adopting is CPU resources for developer use/benefit of LINQ.

>>you can look at a C statement and be pretty sure what the ASM will end up looking like.

Perhaps 20 years ago, perhaps more. Optimizing compilers have gone long way.

Compiling an execution plan is a significantly easier task than actually guessing how exactly an optimizing compiler would schedule the code, down to assembly. (and cheating w/ hints in SQL- [looking at Oracle]just reinforces that.)

Have you ever written or edited an execution plan directly?

They are all abstractions, but not at the same level of abstraction.

"SQL is itself an abstraction over the "language" of query execution plans."

This is completely wrong. SQL is a declarative language, it tells database what to do, not how to do it (aka query plan).

I'm not sure what you guys think an abstraction is, but a declarative language on top is just another kind of abstraction.

Yes, hence it's an abstraction.

I didn't say SQL is a way of writing query execution plans but it seems some people are interpreting my post that way.

I think the problem is that too many developers think of ORMs as a black box that lets them avoid SQL and the DB, rather than a powerful tool for working with them.

Vlad Mihalcea, a Hibernate developer/evangelist, has been doing amazing work improving the project's documentation and providing educational material about how to use it properly without sacrificing performance[1]. I can't recommend his book High-Performance Java Persistence enough: https://leanpub.com/high-performance-java-persistence

For me, his key lesson is that ORMs can be incredibly useful if, and only if, your understanding of SQL and DB performance is good enough that you can use them in a DB-friendly way. It's really simple things like expressing relationships between your domain entities such that the resulting tables can be easily joined. You still get the productivity that comes from skipping loads of JDBC boilerplate, but you don't feel like there's a load of magic between you and the DB.

He also pushes the idea that you should use Hibernate in combination with other DB tools: Liquibase/Flyway for migrations, and JOOQ for optimised queries with lots of joins,or DB-specific work. By having a decent ORM, SQL builder, and migration tool, you can use be flexible depending on the situation. Complex, performance-insensitive business logic can use the domain model from the ORM, whereas API-backing queries can be expressed a single, tightly constrained JOOQ call, and the actual creation and maintenance of the schema can be done via pure SQL scripts that are managed by your migration tool.

[1] https://vladmihalcea.com/

How many apps even deploy to the larger web and not just some intranet? It would surprise me if 99% of database backed applications weren't tiny-to-medium intranet and LOB type apps with very modest performance requirements. Hell, a pretty large number of Db apps are single user!

I half agree with you and half totally disagree with you.

The happy medium is a flexible ORM that automates the repetitive stuff but let's you call out to the higher level stuff...because if you don't have that your developers are going to end up building their own awful ORM-ish layer to automate the repetitive stuff anyway.

And I sort of half agree with you as well :) except I think the repetitive part is the mapping of the objects of which I think the automation should be of generating objects that you are going to map into.

In Java there are host of libraries that do this. I have even used Hibernate to do the mapping at times as I do think it does a pretty good job of that.

What I mean by mapping is minimal code generation (either host language or SQL) of Class <-> Entity and Row result <-> Object.

I think jOOQ approaches that medium you talk about fairly well btw. JDBI and MyBatis are also fairly good as well (both mapping technologies).

I agree with you. I haven't found an ORM that doesn't end up making a mess of my data structure eventually. Data is becoming one of the biggest, most important currencies of our times. People wanting to automate what their data looks like baffles me.

Personally I like creating an API of stored procedures. Make users that can _only_ execute those sprocs with zero dynamic SQL in them. Users can only do what you tell them they can do via your sproc API.

20 year old oracle dba best practices...

Lukas Eder here. Absolutely in favour of 1-1 SQL. In some cases jOOQ is great help, writing SQL queries with it. In other cases jOOQ is also great help accessing your stored procedures where the SQL queries are.

Here's a linq to Lukas Eder's JOOQ https://www.jooq.org/

I do like the syntax also though.

I love that mindset. By your logic we should all be coding in assembly.

It seems like an apples to oranges comparison - it is looking at the readability of code returning similar results, but it doesn't consider how these results are going to be generated, which is a major part of writing queries.

Do the LINQ examples, especially the more complex ones, result in the same or better execution plans? Maybe they do, but the article doesn't tell, and without it it's kind of inappropriate to make judgments or even recommendations.

In my experience, the SQL that LINQ generates typically is good, but LINQ may run part of the query or, rarely, even all of it client-side.

For example, if you want the top N items ordered by f(x), and LINQ doesn't know how to map f(x) to a SQL Server built-in, it has to pull the entire set over the wire, and find the top N client-side. Obviously, if your table is huge, performance won't be good.

That can even affect more than performance. For example, SQL Server orders GUIDs differently than .NET, and SQL Server's decimal isn't quite a IEEE float (https://msdn.microsoft.com/en-us/library/bb738633(v=vs.110)....)

Actually, i think it's more about:

- Prototype and use Linq first ( faster to develop and good enough queries in production)

- Static Typed LINQ reduces mistakes ( a domain object changes, an error appears in your code), definatly later on the road when changing the domain.

- Iterate and improve when i goes slow. Try to keep LINQ, but off course, SQL is faster.

This was my concern as well. When I saw the "query.Skip(20).Take(10)" syntax, I was wondering if the previous line that built the query variable downloaded all 15 bajillion rows in my Customers table of people whose names started with 'A' so it could get rows 21 through 30.

If it actually only ends up creating some complex sql behind the scenes to get those ten rows, how horribly convoluted did it become and wouldn't it be better to write a straight SQL query to do the same thing, even if it's a little more complicated than the pseudo code? I guess I prefer more control. If my SQL query turns out to be slow, then I can learn from that and fix it. If the generated query from LINQ is slow, I have no recourse over it.

> I was wondering if the previous line that built the query variable downloaded all 15 bajillion rows in my Customers table of people whose names started with 'A' so it could get rows 21 through 30.

That would be a silly thing to do. LINQ is made to be lazy. At worst it would pull the first thirty rows, twenty in Skip and ten in Take.

I was pretty bewildered by the convoluted way they posited for selecting a subrange of rows, but then I googled around and learned that LIMIT is a nonstandard MySQL extension. Huh.

(Apparently, while LIMIT n,k is MySQL-only, MySQL, PostgreSQL and sqlite all support LIMIT k OFFSET n. So I assume this blog post is only really relevant for people who live in a Microsoft ecosystem?)

It took a while, but SQL Server 2012 introduced OFFSET/FETCH:

    SELECT * FROM items
    ORDER BY key

Was talking to another person at work who is non technical but starting to dip his toes in the waters. He asked how to do just this and I had to go on a fairly long explanation, prompted by repeated questions, into the SQL standard, how that want added until near 3 decades after SQL's invention, vendor specific extensions and syntaxes etc.

It's a pretty bizarre situation.

Part of it may be due to a deep problem with OFFSET in that although it can be convenient, there is no reliably efficient way to implement it. In fact, the only way it can possibly be efficient is if your data is fully ordered, and that ordering is indexed. As a result, I find it's better to just stick to the following pattern:

    select * from data where data.indexed_column > last_seen order by data.indexed_column desc limit 100

"SQL is a very old language—invented in 1974."

Not a good start. SQL is still in widespread use 43 years later because of how good it is. It's not a legacy weighing us down. Rather it has consistently proven its usefulness over and over again. Some of the constructs can feel awkward to be sure. But part of our impression of awkwardness is really just due to SQL's declarative nature, which makes the structure of queries look a lot different than the procedural languages we spend most of our time in.

But it's not clear to me from this article that LINQ is even anything different that what already exists in most every modern application platform. It provides a cleaner syntax for some subset of common query patterns. Okay. There's a ton of SQL generators out there. Is LINQ doing something _else_ for us? This article doesn't say. I don't know, and I didn't learn that from this article.

If you use C# or VB to write apps against relational databases, then by all means, use this to write your SQL. But don't pretend the SQL isn't there or isn't important. LINQ just places you one more potentially leaky abstraction away from your dataset.

> SQL is still in widespread use 43 years later because of how good it is. It's not a legacy weighing us down.

I'm rather dubious of this claim. This is kind of like claiming the reason we use Javascript is because of how good it is. The awkwardness of SQL is not because it is declarative, otherwise LINQ would have the same awkwardness because it is also declarative. I think the real reason SQL is because database vendors have not made an effort to add support for other languages.


It's a language that can barely express subqueries and query aliases.

Every ORM has shorthand syntax for expressing filters and order by statements that's way better than the SQL equivalents.

It's barely composable. Do you want to build a query dynamically based on a set of conditions. Tough luck, use a query builder that's prettier anyway.

Its syntax is not amenable to autocompletion. Join syntax is ugly on the eyes and hard to indent.

Seriously, let's kill the love for SQL. I'd rather read queries written in EF Codd's original relational algebra syntax (https://en.wikipedia.org/wiki/Relational_algebra) rather than embedding the abortion that is SQL in my code.

Only it doesn't. The examples given _compile_ to SQL. So you're not comparing LINQ with SQL but LINQ-to-SQL (or the entity stuff) with SqlCommand.

SQL's a pretty grim language, but it's the only language the database understands as a first-class citizen. Until someone designs a new foundation (like WebASM is doing), the rest is just syntactic sugar.

> SQL's a pretty grim language, but it's the only language the database understands as a first-class citizen. Until someone designs a new foundation (like WebASM is doing), the rest is just syntactic sugar.

SQL is an ugly COBOLesque veneer over an elegant relational calculus core.

But the "new foundation" you speak of already exists, and has for decades. Competing query languages based on relational calculus are less ugly – QUEL, D, etc.

Sadly, those better looking query languages have failed in the marketplace–ugly SQL won.

I have looked at D, and read some of CJ Date's books. As far as I can tell, D doesn't solve any of the actual problems with SQL, like composibility and code reuse, and just solves Date's imagined problems about not following relational theory close enough.

I'm kind of curious why there aren't wrappers / APIs / libraries for these other query languages that convert them to SQL at execution time. Arguably that is what LINQ is anyway..

Perhaps these options exist and I've just never encountered them? It's surprising that what I most often seem to encounter in casual discussion is this sophie's choice dichotomy between using SQL directly versus an ORM.

Yeah, in my db class at Uni we learnt QUEL before we touched on SQL.

It was clear then that QUEL was better in a number of ways.


I believe the first thing to understand about SQL is that everything is a table (aka relation). Once one understands that, the idea of modelling entities as relations and relationships as references starts making sense. Also, the tenets of normalization become understandable on an intuitive level.

Sadly, many IT professionals lack the intuition on this.

SQL is a highly sophisticated language since many years. Having done insane amounts of data prepping and validation for analytics, I will attest to this.

However, if I am writing software in .NET, I would trade away SQL for LINQ any day of the week. (Preferrably by means of some sort of intuitive DAL.)

This may just be because I'm not a database expert, but it seems weird to me that after all these years Microsoft hasn't written a native LINQ driver for SQL Server.

It's funny how async/await has influenced so many other languages but LINQ, that in my opinion is much more interesting (specially the DB stuff) for solving down to earth problems is sistematically ignored by other languages (Python, Java, JavaScript, etc...)

I guess you can argue that Haskell's do notation is exactly what LINQ (the grammar) is. F#'s computation expressions is a super-charged LINQ also.

I would like to see C#'s LINQ get an update. It seems to have been left to rot since Eric Meijer left the team. It doesn't even support the new values tuples where it's been fitted to every other part of the language:

    from (x,y) in Some((1,2))   // (x,y) will error
    select x + y;
The from x in y, let x = y, where x, select x, and, join x in y on a equals b, could and should be extended. Even better would be to allow custom operators like F#'s computation expressions.

To be fair, that feature is in the next version of C#; not release yet.

For me, LINQ is fundamentally list comprehensions on steroids. Way better than Pythons, about the same as using list functions in Haskell or F#, Java's finally caught up with its stream library.

The LINQ to SQL stuff annoys me, because while it works, it's a painful disaster to implement yourself against your own data stores. Believe me, I've tried.

In the case of Python and Java, I think it's because generator expressions and Streams, respectively, accomplish much the same thing.

Honestly, if you've seen what LINQ compiles to, it looks an awful lot like a Java 8 Stream.

Using Wikipedia's example of LINQ translation (https://en.wikipedia.org/wiki/Language_Integrated_Query#Lang...):

Written LINQ is:

    var results = from c in SomeCollection
                  where c.SomeProperty < 10
                  select new {c.SomeProperty, c.OtherProperty};
It compiles to:

    var results =
            .Where(c => c.SomeProperty < 10)
            .Select(c => new {c.SomeProperty, c.OtherProperty});
And in Java 8:

    Stream<> results = someCollection.stream()
                                     .filter(c -> c.getSomeProperty() < 10)
                                     .map(c -> new AbstractMap.SimpleEntry<>(c.getSomeProperty(), c.getOtherProperty()));
(of course, iterating over them is different; you just use a for loop in C#, but in Java you have to either use .forEach() or .collect() to a collection and then for over that)

I think Streams have more LINQ in them than most people think.

Disclaimer: it's been about a year since I've written Java 8, and this is off the top of my head (and the last time I used it, my employer's codebase had a class for pairs that was better than just SimpleEntry), so the code could be wrong.

Edit: and just for completeness... the same in Python generator expressions:

    results = ((c.some_property, c.other_property)
               for c in some_collection
               if c.some_property < 10)
Not very LINQ-like, but it serves the same purpose.

The thing you're missing is the way that LINQ creates an anonymous type, which is used to close over the values in the expression:

    var results = from a in x
                  from b in y
                  from c in z
                  select a * b * c;
Is very much more attractive than:

    var results = x.SelectMany(a => y.SelectMany(b => z.Select(c => a * b * c)));
If you use LINQ for more than just SQL queries (for monadic types like Option, Either, etc.) then these types of expression are commonplace. I'd certainly rather use C#'s LINQ grammar over its fluent API.

What you miss is that with LINQ you can extract the expression tree : https://msdn.microsoft.com/fr-fr/library/mt654263.aspx This is what is used to generate the SQL.

In the real world, MOST people are using the "it compiles to" version directly in the code. I certainly don't write the "from x in y select" version, except in the case of multiple hairy joins. Even those have become second nature at this point.

It's just more readable, and turns it into an object pipeline/functional programming style instead. The Java 8 streams were pretty much directly taken from LINQ and given their traditional functional names.

> The Java 8 streams were pretty much directly taken from LINQ and given their traditional functional names.

That was my point. I was disagreeing with olmo's suggestion that Java "systematically ignored" LINQ.

Ah okay, I misunderstood. Indeed.

> In the real world, MOST people are using the "it compiles to" version directly in the code

What's your source for this claim? Because anecdotally I don't see that at all.

Our codebase at work has hundreds if not thousands of LINQ queries, mostly using method syntax. I generally prefer it over query syntax except in a few cases, such when there's multiple joins or subqueries, or the when the statement is more than a few dozen lines. Occasionally a very complex LINQ statement requires using a `let` declaration which also is easier with query syntax. Luckily Resharper has an action to convert back and forth on the fly, so it's easy to use that to switch over when necessary.

I also don't use LINQ to SQL at all. It's all on in-memory stuff. I pretty much never write code like this:

    var result = new List<string>();
    foreach (var item in input) 
    return result; 
instead, using LINQ:

    return items.Select(x => x.Value);
The real power comes when you start mixing conditions:

    return items
        .Where(x => IsValidKey(x.Key))
        .Select(x => x.Value);
Or doing quick checks:

   if (items.Any(x => x == null || x.SomeValue == null)) 
       throw new InvalidArgumentException(nameof(items));

And now, with C#6 that last bit becomes:

if(items.Any(x => x?.SomeValue == null))

I think it depends on what the Linq to is. I find the sql like proper linq form only natural when databases are involved (and just barely).

Most uses of where/order/select in C# I'm pretty sure is linq to objects, not sql. I very rarely see the linq proper form in any C# code neither in OSS or in my dat job.

I never use the query syntax, or ever see it in any of the books or blogs I read, to the point that it threw me for a loop a little bit when I was studying for the C# certification exam, to see how emphasized it was. I learned that style long ago when I first started, but the fluent, extension method style is easier and less ugly, IMO, particularly when you start getting into more complex queries, or LINQ-to-SQL, where you end up wrapping your multi-line LINQ query in parentheses and materializing it by calling ToList() or ToDictionary() at the end.


Closest thing we have to real data, but it's been my experience in OSS and in professional life.

.net lambdas are in my opinion cleaner than linq syntax. java 8 implemented lambdas following .net success.

Linq is used pretty loosely to include also the equivalent form so

       .Select(d => d.Id)
       .Where(d => d.Age > 3)
would be considered just as much "linq" by most c# devs even though this isn't the query language integrated - simply because it's the same exact thing as the regular linq code.

I too find the "real" linq form mostly distracting.

The weird SQL-ish query syntax is an abomination. But using the LINQ extension methods, making use of lambdas as you say, is gold.

Most of it boils down to plain old functional map/fold/filter/zip constructs under different names.

Array.reduce(), Array.filter(), Array.map(), in JavaScript are pretty LINQ-ish if you ask me!

The thing with LINQ is that it can represent itself (reified or 'quoted code), so you can transform it into SQL, GPU instructions, whatever. This isn't a new idea, and F# had it before C# did (and LISP had it before, and others).

map/filter/reduce are just basic functional programming elements.

They are nothing even remotely likely linq. Those functions return the resulting arrays. Linq expressions are not executed until enumerated.

This allows you to compose linq expressions without computing intermediate results.

Sounds like Java 8 Streams, then.

Streams bear a pretty strong resemblance to translated LINQ.

Elixir's Ecto is very similar to this.


I couldn't help by roll my eyes at that title. ORMs and the like are great, but they still end up using SQL under the hood. LINQ is by no means the most aesthetic ORM, either (and half of the article is about much better LINQ looks than SQL).

The headlines may as well read: "Why [interpreted language] beats assembly".

Someone seems to write one of these every few years about whatever the flavor of the month is. I recall reading one from sometime in the early 2000s about Hibernate that read the same way. And another one for one of the Perl ORMs before that.

Yet somehow, dirty, ugly, hard-to-write SQL is still here while ORMs keep coming and going. And I have yet to see one without an escape hatch allowing raw SQL.

>"And I have yet to see one without an escape hatch allowing raw SQL."

I don't know where you've been looking, but from what I've seen it's an easy to find feature. To give two examples:



Unless my reading ability is broken, we seem to be agreeing. I said I have never seen one that doesn't have one.

My point being that ORMs can't fully replace SQL, and the proof is they all provide an escape hatch.

They provide an escape hatch because the language has support for features such as temporary tables, native DB functions and transaction control that are nonstandard and have a hundred gotchas, and are therefore not amenable to implement in an ORM.

I have 30k lines of business logic written for Django that does not use that hatch even once.

Again, I see point of disagreement, other than an odd bit of defensiveness about ORMs.

LINQ isn't an ORM though, is it? At least not in the traditional sense.

LINQ is a set of language features and libraries that help enable better ORMs, among other things.

LINQ to SQL and Entity Framework are examples of ORMs that use the LINQ features

It's really an EDSL

If the title says some piece of tech "beats" some other piece of tech can we just all act like proper engineers and not post it here at all and save us all some time?

I think sometimes it's worth looking at the usernames of who posts and consider whether you want to contribute to the thread.

Judging the username, or the person behind the username?

I agree with the article and try to use write my queries with Linq instead of SQL almost always.

The article says to avoid using Linq for bulk inserts, but for years I've been using an extension method to translate Linq to a bulk insert and it works fine. I forget where I found it.

    public static class DataContextExtension
        public static void BulkInsertAll<T>(this DataContext dc, IEnumerable<T> entities)
            using (var conn = new SqlConnection(dc.Connection.ConnectionString))

                Type t = typeof(T);

                var tableAttribute = (TableAttribute)t.GetCustomAttributes(
                    typeof(TableAttribute), false).Single();
                var bulkCopy = new SqlBulkCopy(conn) 
                    BulkCopyTimeout = 1200,
                    DestinationTableName = tableAttribute.Name

                var properties = t.GetProperties().Where(EventTypeFilter).ToArray();
                var table = new DataTable();

                foreach (var property in properties)
                    Type propertyType = property.PropertyType;
                    if (propertyType.IsGenericType &&
                        propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
                        propertyType = Nullable.GetUnderlyingType(propertyType);

                    table.Columns.Add(new DataColumn(property.Name, propertyType));

                foreach (var entity in entities)
                        property => property.GetValue(entity, null) ?? DBNull.Value


        private static bool EventTypeFilter(System.Reflection.PropertyInfo p)
            var attribute = Attribute.GetCustomAttribute(p,
                typeof(AssociationAttribute)) as AssociationAttribute;

            if (attribute == null) return true;
            if (attribute.IsForeignKey == false) return true;

            return false;

I'm actually doing something else when updating a webshop from a remote source..

I convert all objects that i import to a big SQL Query.

Something like

First query: UPDATE table SET active=0;

Second ( big ) query: UPDATE table SET properties=values IF ROWCOUNT =0 INSER INTO table(properties)VALUES(values)

What i mostly like about LINQ is IQueryable. It's awesome!

I just design the IQueryable with if's, switches, ... and when i need to execute it. I use .ToList()

The IQueryable is then translated to SQL and executed it.

Follow up on my usage on Linq ( fyi)

Now i design most of my pages through 1 service/method with a lot of if's and else and re-use that method all the time

For example, a ecommerce-site i made has one function for querying the products, paging, searching, sorting by X, ...). Although, it get's rather complex lately, i wouldn't have it any other way. ( with complex, i mean i now also grouping them by category and category code, a category order number, a product order number, by product title ( varying by the current language), ...

And yes, it' seperated in my service layer ;) and no, i won't change it because this is the fastest way untill now.

Alternatively, you could use Extensions on IQueryable and IOrderedQueryable. Then you do something like db.Products.FilterBySearchTerm(searchTerm).OrderAllBy(sortPropertyName);

But i like ProductHelper.GetProducts(filterWithSettings,SortOnSettings,GroupBySettings); more, mostly because it's easier when i need to customize all the methods because of some "sudden" Change Request that changes all the internal workings :(

Also, if you are new to Linq To Sql. Don't forget DbFunctions / SqlFunctions for built-in SQL Functionality ( depending the EF version you are using). Eg. The LIKE operator has been added in the new EF.

Also, you can always using DataContext.Database.Sql() for using standard SQL.


Pro Tip: DbContext is already a Repository Pattern if you are using DDD !


PS. If you don't know LINQ. It generates a long SQL query and it's a bit slow, BUT it's very productive in development . Here's an example with both Generated SQL (commented between /* */ ) and Linq in one of my subbmitted issues on EF : https://github.com/NicoJuicy/EF-ContainsAndStartsWith/blob/m...

I like LINQ: Develop fast and improve when something executes slow :)

And then time goes by some guy comes along rips out the LINQ and fixes all that so it's performant under load with hand crafted sql queries.

And then you have no more errors in the VS Error Window ( LINQ is static typed), when someone changes the domain layer and everything goes to * when going into production and now you have to manually check all strings if they are adjusted..

As said before, change to pure SQL WHEN something goes too slow. You don't have to change everything. It will bite you in the * later ;)

Gotta love static typed :)

I currently use a bit of LINQ, SQLAlchemy, Blaze/Odo, and a few other tools, depending on the project. However, primarily I'm still writing SQL in 2017 despite want of a prettier alternative. Here's the main reasons:

* In pretty much every project I've ever worked on, there's a small, un-standardised set of database functions I rely on. I need to parse strings and dates, round to the start or end of the month or week, group and categorise values in order to build a dimension table. I've never found a SQL generator that can provide this, although I believe SQLAlchemy Core folks are working on it.

* Planners are still not so good that you can ignore them altogether. I often rely on EXPLAIN to see why things are taking seconds instead of milliseconds, and I often end up looking for ways to trick the planner. If I'm using a SQL generator, I'm now trying to trick both the planner and the generator, neither of which I have direct control over.

* If you're building a lot of complicated queries, setting up indexes and views, and so on, it's hard to get past the advantages of an IDE. Currently, the best tools are pretty much all database specific. Even just being able to quickly see your data in a tabular format is huge.

To your first point, LINQ-to-SQL can bind database functions for use directly in LINQ queries. I've used this first hand for parsing comma delimited string parameters into table values and joining from there in LINQ.

I haven't used entity framework or any newer tech beyond LINQ to SQL but I'd have to believe such a trivial feature is present in them.

You are right, and just about every ORM product gives you some way to get at raw database functions. It would be great, however, if they also offered a small RDBMS-agnostic library of functionality that would generate the correct database functions.

The outer join example is not what I'd want LINQ to do. Most of the time it would be preferable to send two queries, one to retreive customers, the other to retreive high value purchases, then do the "joining" locally.

Otherwise we're potentially fetching the customer data N times per customer (depending on the average number of high value purchases)

Sigh. It's hard to know where to begin critiquing this.

I guess the TL;DR would be something like: "For constrained but not atypical data models (no N:M relationships, foreign keys explicit in the schema, etc), it's possible to have a more concise syntax than SQL, with less impedance mismatch."

But even that underwhelming statement is generous.

In the 1980s, the object database claim to fame was less impedance mismatch, and ODI's "coding by deletion", but at least ODBs coupled that with claims to superior performance for network data models.

Maybe it's just the grandiose title "LINQ beats SQL" that's so annoying. Simpler syntax would certainly be welcome for the relatively rare cases of humans generating SQL (I think most SQL is machine generated these days)... but only with a host of etceteris parabis conditions, prominently including performance.

In the 70s, when I was all bushy-tailed, I cornered Gerry Sussman at MIT to tell him about "this new relational data model and SQL language". About two minutes into the conversation I saw him looking at me like "God, why have you chosen to torture me with this idiot."

Or maybe it was the look you have when you are SURE that someone is telling you an elaborate joke and suddenly realize that no punch line is coming.

LINQ: I'm looking at YOU :-/

In most of the examples, it will still execute the more complex SQL against the database... however, it's somewhat easier to reason about.

That said, I tend to prefer to use the collection extension methods, over the LINQ syntax, which make more sense in my mind.

There will always be 1-2 queries that can be proven to work better in an alternative query language, other than SQL. For instance, I can surely find a couple of awesome graph queries in the Cypher query language (Neo4j). Or some JSON queries in N1QL (CouchBase). Back in the days, they tried to advocate that QUEL was a BetterThing™. Where's QUEL now?

Same will be said of LINQ in a decade. Whereas SQL will still be there.

SO LINQ is more terse but there were no performance metrics given. Given the same RDBMS with proper indexes, statistics, etc., all functioning properly which is quicker? I think it'd be murky if LINQ just pulls raw data to the client and then does the transformation there vs. on the server with SQL.

There is a bit of an art to writing LINQ queries, but if you know what you're doing (i.e. spend a little time experimenting, with the traces turned on in EF or Linq to SQL that output the generated SQL, or a good profiler), you can write queries that are exactly the same as hand-tuned SQL. There's a little reflection overhead mapping results to types, but, you either make use of that, or have to write all that boilerplate yourself, so that's the tradeoff.

Of course, if you don't know what you're doing, you can write some monstrous Linq queries that will be dog slow and exhibit all the worst n+1 behavior, pretty easily.

I guess not many came across this guide.

For LINQ to other languages: Swift Java Kotlin Clojure Dart Elixir


So I'll admit I'm not at all familiar with LINQ and MSSQL, but from reading this, it looks like LINQ is just an ORM?

And if that's the case, then of course it is easier and looks nicer than SQL -- that's kind of the point of an ORM.

But I could show you a whole bunch of articles about Python and Ruby and Go ORMs too.

Seems like an Apples to Oranges comparison.

It's not an ORM, more like an alternate way to write queries that's kind of embedded in the language itself (as opposed to SQL where you're putting together a string). If you're familiar with the Rails world, I'd say it's more like "Arel" than "ActiveRecord".

But Linq literally generates SQL before sending it to the server:


Doesn't that make it an ORM?

That link is about LINQ to SQL, not LINQ proper

* LINQ is a set of language features and libraries that help enable better ORMs, among other things

* LINQ to SQL and Entity Framework are examples of ORMs that use the LINQ features

No, because you can use linq without even having a server or even a database. ORMs are one application of linq, but it's a more general concept.

As I understand, it's ORM, but without 'mapper'. Every ORM has querying DSL, some of them are not producing objects at all. I think LINQ is that part of ORM. Some ORMs have poor querying ability: Rails, Django (more OOP approach), but some have powerful: Sequel, Sqlalchemy (more data processing approach).

For example, that's Sequel querying DSL: http://sequel.jeremyevans.net/rdoc/files/doc/querying_rdoc.h...

And I think LINQ is "language integrated" because C#, like Java, has poor DSL building possibilities. So they're added ORM-specific syntax right into language.

But LINQ is too slow comparing to vanilla SQL.

There's one BIG problem with LINQ. I'll agree that the syntax is marvelous, a game changer. The problem is performance.

It's gotten slightly better over the years but compared to an ORM with less abstraction like Hibernate it's still dog slow. Every MS project I've worked on was mostly LINQ...then a folder called "dirty SQL" for the heavy stuff.

I'm not sure if it's due to the highly abstracted nature or just not making performance a priority but in my experience sane Hibernate queries are about 1/2 the speed of native SQL and LINQ is closer to 1/50.

I hope and pray they can make the performance at least within an order of magnitude of raw SQL or even Hibernate so I can say goodbye to SQL forever.

At a previous job I worked hard to prove that I could make LINQ queries perform within small tolerance of the window of hand-written SQL alternatives.

Performance tuning LINQ is a science that I think escapes far too many people. Often it seemed to simply be mistakes that they wouldn't make in hand-written SQL, but they were writing LINQ too much like C# instead of enough like SQL.

Use LINQ for development and production, when you need performance use Dapper. It shouldn' be too hard if you follow the DDD pattern.

You can always use DataContext.Database.SQL(query) ofc

> You can always use DataContext.Database.SQL(query) ofc

that's pretty much our MO :) . Never used Dapper, how reliable is it? I ask because we use StackExchanges' Redis client for C# and it mysteriously crashes even after untold hours of debugging

Actually, haven't had any issues with it. So, it's safe to say that i find it very stable.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact