
> What should we make of it?

People are idiots! (I am sorry to say that, I don't really mean it, I empathize, everyone sometimes is.)

Yes, a dataframe is pretty much just a table. (And yes, GraphQL is a poor reinvention of SQL.) However, to be fair, there are different considerations. A database needs to know things like storage constraints and foreign keys (so you have many different column types); when you're doing just analytics (i.e. pandas), you pretty much only need two types: number and (trimmed) string (and sometimes a datetime, but that's just a conveniently formatted number). (I think SAS got that right.)

Anyway, I think the way out of this mess is to have a functional, Haskell-like language for processing tables (not necessarily Turing complete) that would subsume the declarativeness of SQL, and that could be compiled into different targets. The language would basically specify some limited processing of tables of known types (limited in that recursion would not be allowed), and you could then compose these processing pieces, or integrate them with custom processing in another language.
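Lacking such a language, the idea can be approximated today by composing pure table-to-table functions; here is a sketch in Python with pandas (the schema and helper names are invented for illustration):

```python
import pandas as pd

# Each "processing piece" is a pure function from table(s) to a table,
# so pieces compose like ordinary functions, with no recursion needed.

def only_adults(people: pd.DataFrame) -> pd.DataFrame:
    return people[people["age"] >= 18]

def spend_per_person(people: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    totals = orders.groupby("person_id", as_index=False)["amount"].sum()
    return people.merge(totals, left_on="id", right_on="person_id", how="left")

people = pd.DataFrame({"id": [1, 2, 3], "age": [15, 30, 40]})
orders = pd.DataFrame({"person_id": [2, 2, 3], "amount": [10.0, 5.0, 7.0]})

# The "query" is just function application, so it composes by construction.
result = spend_per_person(only_adults(people), orders)
```

Because each piece is typed on its input and output tables, a compiler for such a language could, in principle, translate the same composition into SQL, pandas, or something else.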

I understand why people hate SQL, it is on some level hard to write and compose. I think a correctly designed functional language would help here. Or maybe just tear LINQ out of the .NET ecosystem.




In the re-inventing SQL department, I'd take a look at EdgeQL/EdgeDB. It's not perfect, but much closer to a functional language and composes well.

I would not consider GraphQL a poor reinvention of SQL, since its niche of decoupling and simplifying untrusted, high-latency clients is too different from the flexible queries created by a trusted server. It competes with REST and RPC, not SQL.

GraphQL's native operations are limited to following a foreign-key link, which has predictable performance (quasi-linear in the number of objects in the response), and selecting a subset of fields (reducing the response size and enabling field-level deprecation and usage tracking). These limitations prevent both malicious clients and less performance-conscious front-end developers from putting excessive load on the database. They also allow GraphQL to work with both data stored in a database and data generated by an application, while supporting SQL pretty much requires that queries be processed by a database.


The way I see it, SQL and GraphQL are solving somewhat complementary problems. In SQL, I have a structure (all these tables that possibly have to be joined) in the database and I want to pick something out as a simple result table. In GraphQL, I create the more complex structure on the output.

But I do consider GraphQL somewhat unnecessary, because if those REST APIs composed just like tables do in the database, then you wouldn't need GraphQL, and you could run a normal query. (There is also a problem of externalities: putting the processing costs on the client is cheaper.)

And thanks for pointing out EdgeDB.


>you wouldn't need GraphQL, and you could run a normal query.

You're making the assumption that there is something to run a normal query on. As soon as you write even a single line of server side code this assumption is broken. What if the GraphQL query doesn't actually use an SQL database and just reads something from a file or another service? What if the server is responsible for granting row level access under very complicated rules that cannot be implemented with just SQL and would be completely insecure if it was done on the client? What if you actually need to do things like implement business logic?

What you're talking about is downright nonsensical within that context.


These counterpoints are probably valid for most database systems, but with Postgres it's actually far more efficient to use it as the substrate in which everything else is embedded.

* Postgres has a robust, battle tested role based security model with inheritance.

* Postgres has foreign data wrappers that let you encapsulate external resources as tables that behave the same way as local tables for most use cases.

* Postgres has plugins for most of the popular programming languages.

If you really like GraphQL, the Postgres approach can still give you that too, using Hasura or PostGraphile.


That's a distinction without a difference; you can execute SQL against CSV with the right tool, nothing you need to write yourself, likewise with programmatic data sources.

Implementing significant business logic on your data access path has other ramifications, but they're orthogonal to the query syntax chosen. And I say that as someone who thinks GraphQL is a fine idea, especially for reducing churn in API specs when you have separate teams for front and back end dev.


I'm not trying to move the goalposts -- you've mentioned SQL specifically, and not relational databases in general -- but RDBMSes offer these features, and in a way that's compatible with access via SQL. Foreign data wrappers can present external data sources as if they were SQL views, allowing cross-source relations to be constructed and queried. Complex row-level access rules and business logic can be implemented in stored procedures and triggers, if a simpler method can't be found. People will debate the extent to which these features ought to be used, but they certainly exist.
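For a toy feel of logic living next to the data while staying reachable from plain SQL: here sqlite3's `create_function` stands in for a stored procedure (the schema and access rule are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, owner TEXT, body TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?, ?)",
                 [(1, "alice", "a"), (2, "bob", "b")])

def may_read(current_user: str, owner: str) -> int:
    # An arbitrarily complicated row-level rule lives here, not in the client.
    return int(current_user == owner or current_user == "admin")

# Registered as a SQL function, so ordinary queries can invoke it.
conn.create_function("may_read", 2, may_read)
rows = conn.execute(
    "SELECT id FROM docs WHERE may_read(?, owner)", ("alice",)).fetchall()
print(rows)  # only alice's rows come back
```

In Postgres the same shape would be a stored procedure or a row-level security policy rather than an application-registered function, but the point stands: the rule is enforced on the data access path, not in the client.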

My point is that it isn't "downright nonsensical" at all.


You're talking a lot about implementation, when you compare 2 languages. In principle - it's worth inventing a new language when you can express some common patterns clearer/ more naturally. Otherwise - just make a query engine that's very performant for a subset of SQL (SELECT + following foreign-key link), and outright reject all other kinds of SQL.


OK, so, disclaimer, I haven't ever used GraphQL with live ammunition, but I really don't get the impression that it can be dismissed so easily.

TFA does a good job of showing that, while SQL is a cleaner design and theoretically better than the hot mess that is Pandas dataframes in every way, the dataframes API offers a lot of conveniences and ergonomic affordances that can make it more pleasant to program against for certain tasks. As someone who spends a heck of a lot of time using Pandas to prep data for loading into an RDBMS, I agree 1,000%. As someone who, once upon a time, did assembly language programming on both RISC and CISC architectures, and, once upon another time, bounced back and forth between Scheme and Objective-C, this sort of situation doesn't surprise me at all. It's just the standard "worse is better" story that we all know and love to hate.
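A tiny illustration of those ergonomic affordances, with toy data: each step is an ordinary value you can inspect and reuse, instead of one monolithic query.

```python
import pandas as pd

df = pd.DataFrame({"city": ["NY", "NY", "SF"], "sales": [1, 2, 5]})

# Roughly: SELECT city, SUM(sales) FROM df GROUP BY city ORDER BY 2 DESC
# ...but built up incrementally, with each intermediate result inspectable.
top = (df.groupby("city", as_index=False)["sales"].sum()
         .sort_values("sales", ascending=False))
print(top.iloc[0]["city"])  # prints SF
```

Neither version is "better" in the abstract; the dataframe chain just fits the interactive, poke-at-the-data workflow the parent describes.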

I suspect it's similar for GraphQL. For example, the JSON basis for the language does make it awful to write by hand, but, if you're talking APIs here, you may not be doing too much hand-authoring of queries. And it's going to be a lot easier than SQL to manipulate programmatically, since, particularly in a language like JavaScript or Python, its syntactic structure maps trivially to basic data structures that already exist in the language.


>> if you're talking APIs here, you may not be doing too much hand-authoring of queries

No experience with GraphQL myself, but this is a very good point. A lot of the practical problems that come with SQL queries boil down to one piece of code constructing a string to be used as the query, which different code then deconstructs to execute it. A lot of mistakes happen right there.


And this is the pain I'm thinking of. Doing dynamically generated SQL that isn't susceptible to SQL injection can get tricky, and ORM frameworks generally only partially solve the problem for a certain subset of possible schemata.
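The standard mitigation is to keep values out of the query string entirely via placeholders; a minimal sqlite3 demonstration of what goes wrong otherwise (toy schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

evil = "alice' OR '1'='1"

# String-building is where injection creeps in: the value becomes SQL grammar.
unsafe_count = conn.execute(
    f"SELECT count(*) FROM users WHERE name = '{evil}'").fetchone()[0]

# Placeholders pass the value out-of-band, so it is only ever data.
safe_count = conn.execute(
    "SELECT count(*) FROM users WHERE name = ?", (evil,)).fetchone()[0]

print(unsafe_count, safe_count)  # 1 0: the OR clause matched everything
```

This handles values, but not dynamically chosen table or column names, which is exactly the "tricky" part the parent is pointing at: those can't be parameterized and must be whitelisted or quoted by hand.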

I do think SQL is a well-designed language. But it was designed, first and foremost, as a language for human analysts to bang into a terminal. Having computers flexibly communicate using it is an off-label use.


Have you had a look at Datalog? It's a carefully selected subset of prolog that corresponds to primitive recursive functions.

So: plenty of computational expressivity, but solidly removed from Turing completeness.
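For a feel of that expressivity, here is naive bottom-up evaluation, sketched in Python, of the classic Datalog transitive-closure program (data invented):

```python
# path(X,Y) :- edge(X,Y).
# path(X,Y) :- edge(X,Z), path(Z,Y).
edge = {("a", "b"), ("b", "c"), ("c", "d")}

path = set(edge)
while True:
    new = {(x, y) for (x, z) in edge for (z2, y) in path if z == z2} - path
    if not new:   # fixpoint reached; termination is guaranteed because the
        break     # domain is finite and the rules only ever add facts
    path |= new

print(("a", "d") in path)  # True
```

Recursion like this is exactly what plain SQL historically lacked (recursive CTEs bolt it on), while Datalog's restrictions still keep every query terminating.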

Adding types for relational algebra to Haskell is a bit of a slog. Not because it's impossible, but just because the natural way to express those data types is not what comes naturally in Haskell.


I think you are describing something close to the Dataset API in Spark. Spark is built on the RDD, a novel data structure that creates transparent concurrency and distribution. Additionally, you can access the same data with 4 APIs, one of which is SQL, and another which is a typed functional API.

The RDD paper is one of my favorite papers, and is great bed time reading.

https://www.usenix.org/system/files/conference/nsdi12/nsdi12...


As a big fan of the relational model (but not so much of SQL), I just wish that pandas didn't treat indices specially, but just as normal columns. Also, multi-indices should just be indices on multiple columns. And I should be able to have as many indices as I want.
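In the meantime this view can be emulated by hand, since pandas can move columns in and out of the index (toy data):

```python
import pandas as pd

df = pd.DataFrame({"year": [2020, 2020, 2021],
                   "city": ["NY", "SF", "NY"],
                   "sales": [1, 2, 3]})

# A "multi-index" treated as what the parent wants it to be:
# an index over multiple ordinary columns.
indexed = df.set_index(["year", "city"])
print(indexed.loc[(2020, "SF"), "sales"])  # 2

# ...and reset_index turns the index back into plain columns, losslessly here.
assert df.equals(indexed.reset_index())
```

The limitation stands, though: only one such index can be "active" at a time, which is presumably the parent's complaint.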


> Anyway, I think the way out of this mess is to have a functional, Haskell-like language for processing tables

I agree, and in a Haskell-like setting we can have any structured types (rather than just primitives).

That’s why I built hobbes: https://github.com/Morgan-Stanley/hobbes


> And yes, GraphQL is a poor reinvention of SQL.

No. GraphQL is an RPC specification. There is not a single comparison operator defined in the GraphQL spec. There is no way to join two separate collections. GraphQL was never meant to be an alternative to SQL, and people mainly try to compare them because they both end in "QL".


The main similarity is the whole idea of declaratively saying what you want in a single request. In SQL, you use joins or subqueries, and in GraphQL you use nested edge/node blocks.

Either way, you define what you want in a nested/tree-like manner and submit one big-ass request to the server.

The difference is that GraphQL is usually way less verbose and tedious to type out, but they’re fundamentally the same idea.
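A small sketch of that sameness in Python: one declarative ask, two output shapes. SQL hands back a flat join, which the client folds into the nested tree a GraphQL response would deliver directly (invented toy schema):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER, name TEXT);
    CREATE TABLE posts (author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (1, 'p1'), (1, 'p2'), (2, 'p3');
""")

# The SQL answer: one flat table, authors repeated per post.
flat = conn.execute("""
    SELECT a.name, p.title FROM authors a
    JOIN posts p ON p.author_id = a.id
    ORDER BY a.id, p.title
""").fetchall()

# The GraphQL-shaped answer: posts nested under each author.
nested = defaultdict(list)
for name, title in flat:
    nested[name].append(title)
print(dict(nested))  # {'ann': ['p1', 'p2'], 'bob': ['p3']}
```

Same join, same information; the only real difference is which side of the wire does the reshaping.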


I don't think it needs a full-blown language; it's just an API that allows you to feed data in and pull it out for any arbitrary variety of data (including types/classes and links between data), but then has special commands to tell it how to optimise data storage (for speeding up a particular kind of access, for minimising data storage, etc.).


Sounds like Datalog could fit this bill.


Everything is RMDB

I advocate building a relational data model on top of a hash-map, to combine the advantages of NoSQL and an RMDB. This is effectively a reverse implementation of PostgreSQL.

[Clojure is a functional programming language based on relational database theory](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)

[Everything is RMDB](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)

[Implement relational data model and programming based on hash-map (NoSQL)](https://github.com/linpengcheng/PurefunctionPipelineDataflow...)
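My reading of the idea, as a minimal Python sketch (not the linked project's actual design): each "table" is a hash-map from primary key to a row, itself a map, and a join is just a lookup through the foreign key.

```python
# Two "tables" as hash-maps keyed by primary key (data invented).
users = {1: {"name": "ann", "dept": "eng"},
         2: {"name": "bob", "dept": "ops"}}
depts = {"eng": {"floor": 3}, "ops": {"floor": 1}}

def user_floor(uid: int) -> int:
    # A join is a lookup chain through the "foreign key" column.
    return depts[users[uid]["dept"]]["floor"]

print(user_floor(1))  # 3
```

You keep NoSQL's O(1) point lookups and schema flexibility, while the relational structure (keys, references) is a convention layered on top.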


I agree, but prefer a tree map in combination with natural keys where possible, to get ranges and sorting without complicating the design. Then I embed additional tree maps in records to model has-many relations. I know SQL very well, but this setup often provides the capabilities I need and is much more convenient to deal with.
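Python has no built-in tree map, so in this sketch a sorted key list plus `bisect` stands in for one; natural composite keys sort as tuples, which gives range scans for free (data invented):

```python
import bisect

# Natural key: (customer, date). Tuples sort lexicographically,
# so all rows for one customer are contiguous in key order.
rows = {("ann", "2024-01-05"): 10,
        ("ann", "2024-03-01"): 20,
        ("bob", "2024-02-01"): 30}
keys = sorted(rows)

def range_scan(lo, hi):
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return [rows[k] for k in keys[i:j]]

# All of ann's rows, without any separate index:
print(range_scan(("ann", ""), ("ann", "~")))  # [10, 20]
```

A real tree map (e.g. a B-tree or red-black tree) would also make inserts cheap; the sorted list is just the simplest stand-in for the range-scan idea.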


Can you elaborate on the topic of a "relational data model on top of a hash map"?

Are there any books that cover the concepts?


"a relational data model on top top of hash-map" is my original idea, and then combine [The Pure Function Pipeline Data Flow v3.0 with Warehouse/Workshop Model](https://github.com/linpengcheng/PurefunctionPipelineDataflow), can perfectly realize the simplicity and unity combination of system architecture.


CockroachDB is built on top of a key/value store, I believe.

https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin...


Not the parent (but I like their thinking); I think the key concept is to treat the hash map as just a (potentially primary) index over your data. Of course, keeping secondary indices up to date is now the job of the application (or the abstraction layer).
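A minimal sketch of that division of labour, assuming plain dicts: the hash map is the primary index, and the application must touch every secondary index on every write.

```python
primary = {}    # primary index: id -> row
by_email = {}   # secondary index: email -> id, maintained by hand

def insert(row_id, email, name):
    primary[row_id] = {"email": email, "name": name}
    by_email[email] = row_id          # must be updated together...

def delete(row_id):
    row = primary.pop(row_id)
    del by_email[row["email"]]        # ...and torn down together

insert(1, "a@x.com", "ann")
insert(2, "b@x.com", "bob")
delete(1)
print(by_email)  # {'b@x.com': 2}
```

Forgetting one of those paired updates is precisely the class of bug a real database's index maintenance (and its transactions) exists to prevent.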


But isn't that just a SQL DB?


But it is also NoSQL (a hash-map). It is flexible according to your needs: as a SQL DB, a NoSQL store, or a plain data structure.


The idea is: if for whatever reason you do not have access to a proper DB, can you build your own relational model on top of the bits you have?



