Hacker News
Show HN: EdgeDB – Next generation database (edgedb.com)
224 points by 1st1 on April 11, 2019 | 85 comments



I'm a pretty experienced Data Engineer working at one of the bigger data companies around. I read your copy and landing page, here's what I took away:

1) New database product that's an alpha. My first thought is "will be interesting to follow for 5 years to see if they make it". Database choice is possibly the single biggest decision I make in my role; I don't make it lightly.

2) It looks like it's trying to bridge the NoSQL/SQL gap -- more consistent for SQL fans, more flexible for NoSQL fans. (My judgement: "serving 2 masters is not a great signal for me to choose this DB" but I'm a SQL curmudgeon.)

3) GraphQL is really neat out of the box. Some neat features overall.

And, finally, I'm still not really sure what situation could possibly arise that would make me choose this DB, mostly because that choice isn't something I'd select a <5 year old solution for.

I do like that it's based on Postgres (Citus taught me how powerful the "build on pg" model can be). I really like the graphql and schema introspection features. I don't think I would put this on my vendor/technology comparison matrix.

Hope this is helpful, please don't read too much into the negative stuff -- if I were choosing any other technology pillar besides a db, I wouldn't be nearly as strict. It's an uphill climb for your team, but I wish you the best of luck!


Thank you for your insightful comment!

I agree, people do choose a database based on its track record, and this is where EdgeDB will have to prove itself. Overall, I think that the decision to build on PostgreSQL is a great competitive advantage over other new databases: we can confidently say that the data will be safe with EdgeDB. And we can actually spend more time improving the ergonomics of the data model and the query language.


DB marketing is an interesting case. It's easier to market on features that devs want, which is what Mongo did. Your serious users (DE w/ years of experience) are more into the reliability and guarantees, the building of which tends to slow down feature launches. (Or, sometimes, the features developers want conflict with more "traditional" DB systems.)

One thing I've rarely seen mentioned, but which I believe is a force behind MySQL's success: metadata manipulation. DESCRIBE TABLE, SHOW CREATE TABLE, ... that stuff ends up being a lot more useful than I would have thought as an outsider. Going from MySQL to pgsql was a steep learning curve because the commands to display table listings (\dt, I think) felt very arcane. Also, being able to rename, add, and remove columns. Finally, can I create a table with the same schema (data optional) as another table? I do that a lot.


Since you actually forked Postgresql, how is it realistic to assume that the data will be safe?


We maintain the number of modifications to the minimum, and we don't touch critical paths (there's simply no need for that).


The following areas are pretty tough when building a new database:

- transactions
- proper paging
- indexes
- query engine
- and more ...

However, if they use PostgreSQL as a base, it significantly improves the stability and feature completeness, especially if it is only a different frontend (query language and schema).


Very well said.

I don't want to touch a database product until it has a proven production use for several years.

Creating a database is hard, very hard, as history has shown us recently.

That it sits on top of Postgres is a great start though.


Same here. The only thing to add is that my number of years before I can consider a database production ready is closer to 10. I am willing to make an exception only if a major company with scale and engineering resources uses it and is willing to talk about it publicly. RocksDB falls into this category, for example.


I really like the re-imagination of the type system, data model, and the user interface. This is an area where databases could be significantly better than they currently are.

My only concern is that there is nothing about the backend that suggests that this will be scalable beyond tiny data sets. Even on single machines, there is a performance cliff that arrives very quickly when doing these kinds of data model traversals unless you are using something exotic under the hood.


> I really like the re-imagination of the type system, data model, and the user interface. This is an area where databases could be significantly better than they currently are.

Thank you!

> Even on single machines, there is a performance cliff that arrives very quickly when doing these kinds of data model traversals unless you are using something exotic under the hood.

PostgreSQL is used under the hood. We compile EdgeQL to efficient SQL behind the scenes, similarly to how you compile a high-level language to a lower-level language. All our fancy data types fully utilize the relational model, so what's achievable with scaling Postgres will be achievable with EdgeDB. I know quite a few massive deployments of Postgres that work just fine; there's simply no reason why EdgeDB would be different.


Can you point an EdgeQL 'server' at, say, an Amazon RDS Postgres cluster?


No. Some of the features required us to slightly modify Postgres, but we'll be trying to upstream the changes.


> An occasional post tracking the status of that upstreaming effort would be very welcome.

We've already found a few bugs in Postgres and upstreamed fixes for them. There is a patch or two from us in their review pipeline right now.


> May I assume that the slight modifications you were required to make to postgres in order to support your features weren't just bug-fixes?

Yes. Some of them, like strict parsing of timezone-aware vs naïve datetimes can be upstreamed (as a configurable option) if the Postgres community accepts it. It's in our interest to upstream things like that :) We'll see how it goes.
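
To illustrate the aware/naïve distinction mentioned here, a minimal Python sketch of "strict" parsing follows. It is only an analogy, not how EdgeDB or Postgres implement it; the function names `parse_aware` and `parse_naive` are hypothetical.

```python
from datetime import datetime


def parse_aware(text: str) -> datetime:
    """Parse an ISO 8601 string and require an explicit UTC offset."""
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        raise ValueError(f"expected a timezone-aware datetime, got {text!r}")
    return value


def parse_naive(text: str) -> datetime:
    """Parse an ISO 8601 string and reject any UTC offset."""
    value = datetime.fromisoformat(text)
    if value.tzinfo is not None:
        raise ValueError(f"expected a naive datetime, got {text!r}")
    return value


aware = parse_aware("2019-04-11T10:00:00+02:00")
naive = parse_naive("2019-04-11T10:00:00")
```

Passing the naive string to `parse_aware` (or the aware one to `parse_naive`) raises `ValueError` instead of silently guessing a timezone.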


> We've already found a few bugs in Postgres and submitted fixes for them. There is a patch or two from us in their review pipeline right now.

May I assume that the slight modifications you were required to make to postgres in order to support your features weren't just bug-fixes?


> Some of the features required us to slightly modify Postgres, but we'll be trying to upstream the changes.

An occasional post tracking the status of that upstreaming effort would be very welcome.


So, EdgeQL is closer (than SQL) to relational algebra as it was originally envisioned? Did Tutorial D have any influence on the language and the system?


You got it! We designed EdgeQL from the first principles of relational algebra and set theory, with a focus on practicality. There are a lot of similar ideas in Tutorial D, but EdgeQL is not a descendant of it.


Well, NOW I am curious. And I do a lot of data engineering these days ;-)

btw, do you implement any edgeql-specific optimizations? I know that the db shares most of the code with psql, and its optimizer is okay. But being closer to the math foundation opens up lots of opportunities..!


> btw, do you implement any edgeql-specific optimizations?

Not at the planner level. Most of the EdgeQL-specific low-level stuff is currently built as an extension with helpers. Our strategy is to upstream as much as possible, including any possible improvements to the planner that benefit EdgeDB.


> EdgeQL has no NULL.

So how is a missing value in a column represented? A column value is a scalar value, not a "set". As much as I accept that NULL values are a pain to deal with, I don't see a way to represent the absence of information in another way. You can't store an "empty set" in a column.

A NULL _value_ in SQL is something completely different from a NULL reference or pointer in a programming language. Although both have the same name, the underlying concept is completely different.

> Functions in SQL can return NULL to signal an error condition.

That's plain wrong. Functions in SQL signal errors by throwing an exception.


> So how is a missing value in a column represented?

Explained in this thread: https://news.ycombinator.com/item?id=19641060

> > Functions in SQL can return NULL to signal an error condition.

> That's plain wrong. Functions in SQL signal errors by throwing an exception.

Well, just as an example: take a look at the 'to_number()' function. It will happily return NULL if the format string is an empty string. An empty format is an error condition (it's not documented to return NULL, btw, see [1]) at least in some contexts, so the returned NULL breaking the query can be very unintuitive.

So yes, throwing an exception is what happens in 99% of situations, until you hit a weird case where a NULL is returned unexpectedly for you. And I'm not saying that returning NULL is bad for 'to_number()'; in SQL that's OK. It's just not how we approach things in EdgeQL.

[1] https://www.postgresql.org/docs/11/functions-formatting.html


So this is basically a fork of Postgresql? Interesting. Why not just have an extension? I do like the idea of object oriented queries (without having to go through ORM), and simplified JSON usage. Lots of good ideas! I'm excited about this project.

Btw I suggest coming up with a better name than EdgeQL. Maybe just EQL?


> So this is basically a fork of Postgresql? Interesting. Why not just have an extension?

EdgeQL as a language is so intuitive and advanced only because it operates on a very specific data model that can be compiled to a relational model, but isn't really compatible with it. So naturally, EdgeDB wants to be in absolute control of the underlying Postgres backend, which goes beyond what's acceptable for a Postgres extension.

> Btw I suggest coming up with a better name than EdgeQL. Maybe just EQL?

There are other languages called EQL (at least that's the impression I got when I last googled it), whereas EdgeQL is unique :)


Just go with EQL; the other EQL people will understand.


I like the name. It sounds like everything is one fast and painless join away.


Hello HN!

This is a blog post about EdgeDB 1.0 Alpha 1. It's a new object-relational database, focused on type safety, usability, and developer ergonomics. We hope you will try it out!


I like this a lot! What a great evolution of PostgreSQL/SQL. I also love what you do for the Python/asyncio ecosystem. Congratulations on the release!

There is one point I'd like to address, though. It's still a real pain to write queries from within other programming languages. You either write them within strings, which sucks, because you lose all support from the editor (highlighting, auto-formatting, auto-suggestions, etc.), or you use an ORM, which I think is not a good solution to the problem. If you want to boost "developer ergonomics", make editor plugins for EdgeQL first class citizens. My 2 cents. :)


Thank you!

> You either write them within strings, which sucks, because you lose all support from the editor (highlighting, auto-formatting, auto-suggestions, etc.), or you use an ORM, which I think is not a good solution to the problem. If you want to boost "developer ergonomics", make editor plugins for EdgeQL first class citizens. My 2 cents. :)

Yes, that's exactly what we are going to do. We'll obviously add a query builder to edgedb-python and other language bindings. We also plan to reflect the DB schema to your language of choice so that the query builder is fully typed. Then IDEs will have no problem with auto-completing your code.

Another thing we're considering is implementing the Language Server Protocol for EdgeDB.


That sounds awesome! :)


Can you publish benchmark code?


Yes, it's linked from the article: https://github.com/edgedb/webapp-bench


Congrats on your launch. While I like the idea, I believe that it would take at least a couple of years to convince people to try it out in a real production environment. This is such a competitive space and companies prefer staying in the safe zone.

I'm sure you would need to cut off many features but I would start small and release it as a Postgresql extension. Even better, you can sit in front of Postgresql and develop your own client libraries for DDL and queries. That way, you could support RDS and Cloud SQL so companies can confidently try out EdgeDB. Once you prove that the new query language works, you can extend and build a new database from Postgresql.


I've got lots of questions; I got a bit excited while glancing through the docs.

What are the minimal system requirements? And what about partitioning, replication, event log, etc.?

Is there a native way to hook into GraphQL resolvers? I really like it either way; there are some tools for composing and stitching schemas.

Any plans for a hosted and managed solution?

Did you consider a "pageInfo" type? Maybe it is just me, but it proved very useful on the frontend.

Instead of `Movies: [Movie]`:

    Movies {
      pageInfo: PageInfo # { hasNextPage, endCursor }
      nodes: [Movie]
    }
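
For readers unfamiliar with this shape, here is a rough Python sketch of how a `nodes` / `pageInfo` response could be produced from a plain list. It is an illustration only, not an EdgeDB or GraphQL library API: the `paginate` helper is hypothetical, and the cursor is simply the stringified index of the last returned item, where a real API would normally use an opaque, stable cursor.

```python
from typing import Any, Dict, List, Optional


def paginate(items: List[Any], first: int, after: Optional[str] = None) -> Dict[str, Any]:
    """Return one page of `items` in a nodes/pageInfo shape."""
    # Resume right after the item the cursor points at, or start from the top.
    start = int(after) + 1 if after is not None else 0
    nodes = items[start:start + first]
    end = start + len(nodes)
    return {
        "nodes": nodes,
        "pageInfo": {
            "hasNextPage": end < len(items),
            "endCursor": str(end - 1) if nodes else None,
        },
    }


movies = ["Alien", "Blade Runner", "Contact", "Dune", "Gattaca"]
page = paginate(movies, first=2)
next_page = paginate(movies, first=2, after=page["pageInfo"]["endCursor"])
```

The frontend only needs `hasNextPage` to decide whether to show a "load more" control and `endCursor` to request the next page.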


> What are the minimal system requirements?

Minimal requirements are similar to those of PostgreSQL. EdgeDB should run comfortably on an average server.

> And what about partitioning, replication, event log, etc.

Tooling for that will be coming in the next few alpha releases.

> Is there a native way to hook into GraphQL resolvers? I really like it either way; there are some tools for composing and stitching schemas.

Not right now. Although I think it should be possible to do what you want via one of the existing GraphQL proxies, like Apollo.

> Any plans for a hosted and managed solution?

Yes, but no ETA right now. This is one of our highest priorities.

> Did you consider a "pageInfo" type? Maybe it is just me, but it proved very useful on the frontend.

We can add that. One of the first-priority things on our list w.r.t. GraphQL though is to implement Database Views, which will allow selectively exposing your schema (and controlling how it is exposed).


This is really good to hear. Thank you


Is it correct to say that EdgeDB is an attempt to make Postgres better in terms of query language and abstractions? By "better" I mean the idea of being more flexible and convenient for users and generating efficient SQL for them?

Could you clarify why the benchmarks show lower performance than vanilla Postgres? Is it because of less efficient generated SQL? (I am not a database engineer and probably missing something obvious)


Yes, one of the main goals of EdgeDB is to provide improved ergonomics and productivity through a modernized data model and query language. And do that without a significant performance cost.

Lower performance in benchmarks is explained by an I/O bottleneck in the EdgeDB server; the queries are not less efficient. There will be performance improvements in subsequent releases.


Congratulations on launching. I've been following EdgeDB after finding out about the company behind uvloop. Excited to test this out on a side project.


Thank you! Please let us know how it goes.


Every time I see a new database announced I expect to see the database of my dreams: a simple JSON store that runs arbitrary computations and stores the results so they can be queried fast (and reruns just what's needed when some piece of data is updated).

Still nothing so far.


You can do this in EdgeDB. The fact that the data is normalized doesn't really matter, because you can request it as JSON.


Can you automatically compute and store computed data?


What benefits does EdgeDB have over Hasura, which isn't built on top of PostgreSQL but rather runs as a layer on top of it and exposes a GraphQL API along with access control, event triggers and other features?


The main advantages are EdgeQL and the Data Model.

EdgeQL is a fully featured query language, with support for subqueries, aggregation, transactions, rich datatypes like tuples/named tuples, arrays, JSON, etc.

The Data Model is object-relational, with support for multiple inheritance of object types and links between them. GraphQL really shines when it's used to query EdgeDB; it just feels natural.

That said, GraphQL can only do relatively simple queries, like "fetch me a hierarchy of objects with this shape with some basic filters". EdgeQL, on the other hand, allows you to build queries as advanced as what you can do with SQL.

Finally, there's only one schema in EdgeDB. You don't need to maintain a separate GraphQL schema, like you need to with Hasura and other GraphQL solutions.


You started out with the claim that your database is fast, but you added almost no info to support this claim. Don't get me wrong but if you remove the initial claim the article would be more trustworthy.


The benchmark results are right in the blog post.


I wonder how this stacks up against a dedicated graph database like Neo4J.

Are arbitrary-depth graph queries a design goal, or are the nested results just for optimizing network traffic?


EdgeDB is not a graph database, i.e. we didn't build it for traversing super deep schemaless graphs.

On the contrary, we optimized it for the kind of applications that are usually built with relational or document databases. Those scenarios frequently involve a relatively complex schema with queries that fetch hierarchies of objects 2-3 levels deep. Doing that efficiently in SQL or NoSQL isn't as simple as it sounds (and the blog post makes a point about that).

That said, I expect EdgeDB to perform on par with neo4j on moderately deep object hierarchies. We also plan to add support for recursive queries at some point, although it's not a priority right now.


Thanks for the answer.

> queries that fetch hierarchies of objects 2-3 levels deep [...] efficiently in SQL or NoSQL isn't as simple as it sounds

It doesn't sound simple and it isn't simple. :-)


I love the class of databases that are built on top of Postgres. It goes to show how far people can extend Postgres. I hope others can follow suit.


Postgres is an amazing foundation to build on!


To return your challenge to Django, do you have an easy way to compose queries as neatly?

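```
issues_query = Issue.objects.filter(location__contained=bounding_box)
if all_of:
    # Chain one filter per tag so an issue must carry every tag.
    for tag in all_of:
        issues_query = issues_query.filter(issuetag__tag=tag)
if any_of:
    # A single __in filter matches issues carrying at least one of the tags.
    issues_query = issues_query.filter(issuetag__tag__in=any_of)
```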
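
For readers who don't use Django, the composition above can be approximated in plain Python. This is only a sketch of the filtering semantics (every tag in `all_of` must match, at least one tag in `any_of` must match). The `Issue` dataclass and the `filter_issues` helper are hypothetical, and the bounding-box filter is left out.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Issue:
    title: str
    tags: Set[str] = field(default_factory=set)


def filter_issues(issues: Iterable[Issue],
                  all_of: Iterable[str] = (),
                  any_of: Iterable[str] = ()) -> List[Issue]:
    """Apply the optional tag filters one after another, like chained .filter() calls."""
    result = list(issues)
    for tag in all_of:
        # Each pass narrows the result, so an issue must carry every tag.
        result = [issue for issue in result if tag in issue.tags]
    wanted = set(any_of)
    if wanted:
        # Keep issues that share at least one tag with any_of.
        result = [issue for issue in result if issue.tags & wanted]
    return result


issues = [Issue("a", {"bug", "ui"}), Issue("b", {"bug"}), Issue("c", {"docs"})]
both_tags = filter_issues(issues, all_of=["bug", "ui"])
either_tag = filter_issues(issues, any_of=["ui", "docs"])
```

The point of the original question is that each optional condition is added independently, without rewriting the query built so far.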


We are working on a similar query builder syntax for edgedb-python.


What will the API look like?


Similar in spirit to Django and SQLAlchemy Core SQL expressions.


What are the chances that in the future, EdgeDB can upstream their changes to Postgres so that EdgeQL could be run on top of Postgres as an extension rather than the current way of basically forking the whole of Postgres?

I really like that somebody is finally doing something to "SQL" in terms of the language itself. I hope this catches on and others will consider doing something similar.


Not sure if this is correct, but it seems like the only client APIs are Python or hitting the GraphQL API for anything else. Obviously it's an alpha, so I'm not sure what baseline one should set, but it's a complete non-starter for me even to play with until it proves it can / will interoperate with a broader range of clients (e.g. JDBC).


We want to provide high-quality bindings for other languages as soon as we can. In the meantime both EdgeQL and GraphQL can be exposed via an HTTP endpoint.


For someone looking for a GraphQL solution, which benefits does EdgeDB have over https://github.com/graphile/postgraphile ?


Key benefits: easier data model to deal with, integrated migrations, using EdgeQL (not SQL) to write advanced queries and expose them as GraphQL views. In the future we'll have integrated access control layer, but that's a subject for another great blog post :)


Regarding the migrations, any more information on that?


The reference page has some details [1]. Our goal is to match and exceed the experience of managing the migrations with a high-level ORM like Django. This is one of the reasons behind the SDL/DDL duality. EdgeDB has to grow some tooling around it before we get to that point, though.

[1] https://edgedb.com/docs/edgeql/ddl/migrations


Any plans to make migrations able to roll back as well?


How do you think you compare to Cockroach? Besides the query semantics, I’m curious about the depth of the thought around cross region / cross continent and horizontal scaling concerns?


EdgeDB is based on Postgres. There're ways of scaling it as Citus Data has shown, and there's a lot of ongoing work in PostgreSQL itself to improve the scalability. We'll be using that as well as actively contributing to further improve it.


The database itself looks neat, but wasn’t at all what I expected from the name EdgeDB. At the moment, “edge” usually refers to edge computing, i.e. running code in many data centres around the world, near to their users.

I expected EdgeDB to be a database suitable for edge computing, namely able to run globally, near the edge compute nodes, and replicate data with eventual consistency.


I don’t know, I first thought of graph databases and computing edges between nodes.


Me too. If anyone has any pointers to such a database, I'd be interested. I have some devices that are looking like they'll need a custom filesystem based key store, but others running linux with a need to sync the two types of devices.


Realm is probably one of the most well known.


Are there any plans to support full-text search?


Yes. EdgeDB actually used to have it earlier on, but we felt that the design wasn't good enough, so we dropped the FTS support for now. The idea is to have a datamodel/QL UI such that we are not just copying the Postgres FTS design, so things like a different search index backend (e.g. Elastic) are possible.


Are there any plans regarding GIS?


Yes, we will have it (via abstracted away PostGIS)


Free


The folks at magicstack have written some seriously cool libraries. Lots of neat cython stuff, postgres, libuv, http parsing. I really like reading their code.


Thank you :)


Why would you replace nulls with empty sets?

Now you can't tell if it's empty or just not set.

Now I have to track another thing just to tell me if I have touched a thing.

Please don't tell me this is some kind of nulls were a mistake cargo culting?

Nulls are just option types with good usability. Not the language's fault if you decide to put a triple meaning on it.


You seem to treat NULLs as if they have a specific meaning. So then the question is this: what should count(NULL) be?

Here are some options:

1) count(NULL) = 0, so apparently NULL is just like an empty set at least some of the time. This means that some of the code will treat NULLs as empty sets and other code will not, leaving the burden on the programmer to keep in mind these implicit differences.

2) count(NULL) = 1 because NULL is a value, albeit a sentinel value. This can lead to tricky problems where a count() suggests that there's some data, while, in fact, there is none.

3) count(NULL) = NULL. If the idea behind this option is that sentinel value cannot be operated on, then this is pretty much like throwing an error at every NULL, which, in turn, will result in the necessity to guard many expressions with some error (NULL) handling code.

One of the things to note is that an empty set has rather unambiguous semantics, while a NULL presents options each of which can be justified depending on how a particular person thinks about this special value. The line between "a value has not yet been assigned", "no value has been assigned" and "a value has been unassigned" is very blurry. On the other hand, the line between "there is no value" and "there is a value" is pretty clear.
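
For reference, standard SQL's `count(expr)` behaves like the first option, while `count(*)` still counts the row. A small sketch using Python's built-in `sqlite3` module shows the difference; SQLite is used here only as a convenient stand-in for SQL engines, and the `answers` table is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (value INTEGER)")
conn.execute("INSERT INTO answers VALUES (NULL)")

# count(expr) skips NULLs, while count(*) counts rows regardless of content.
non_null_count, row_count = conn.execute(
    "SELECT count(value), count(*) FROM answers"
).fetchone()

# The "empty set" reading has only one sensible answer: nothing to count.
empty_set_count = len(set())

print(non_null_count, row_count, empty_set_count)  # prints: 0 1 0
```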


> You seem to treat NULLs as if they have a specific meaning.

Null is a word, it has a meaning. In programming that meaning is the absence of a value.

For ergonomic reasons dynamic languages will cast it to 0 or false in some situations. But that doesn't change its meaning and it doesn't mean it has multiple meanings.

That would be like complaining about how most languages treat 1 == true and 0 == false.

> So then the question is this: what should count(NULL) be?

No, the question isn't that.

The question is what is null. It is null.

Count(null) does not change what null is. Null is still null.

If count wants to treat null like a 0 for ergonomics it can do that. If it wants to treat it like an empty set it can do that too.

If count wants to treat null as uncountable and throw an error it can do that as well.

If it wants to treat it as a null operation and return null it can do that too.

Personally, I find 2 and 3 fucking useless, so if it were my language, I would define a countable interface for null and set null = 0, because there are no use cases where you want that to error at runtime.


> Now you can't tell if it's empty or just not set.

Those are the same things in EdgeDB. You can have a "required link", in which case a relationship between two objects is required, and therefore the link cannot ever point to an empty set. Alternatively you can also declare your link as "optional", in which case you have it pointing to an empty set (essentially to "nothing"). You can coalesce that empty set with a non-empty set, if you need.

That is similar to SQL; the key differences here are:

* The boolean logic in EdgeQL is two-valued, it's "true and false", not "true, false, and NULL". Some people find it hard to guess what "true OR NULL" returns in SQL, because unless you are a DBA, this isn't something you deal with on a daily basis.

* Functions/operators in EdgeQL never return empty set to indicate an error condition. This is one of my personal pet peeves with SQL; in some situations it's hard to understand why a big and complex query yields slightly wrong results sometimes.
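
The three-valued logic mentioned in the first point can be observed directly. The sketch below uses Python's built-in `sqlite3` module as a stand-in SQL engine (an assumption made for convenience; it is not Postgres or EdgeDB), with `1` and `0` for true and false and Python's `None` for SQL `NULL`. The `evaluate` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression: str):
    """Evaluate a scalar SQL expression; SQL NULL comes back as Python None."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]


results = {
    "1 OR NULL": evaluate("1 OR NULL"),    # true: one true operand is enough
    "0 OR NULL": evaluate("0 OR NULL"),    # NULL: the outcome is unknown
    "1 AND NULL": evaluate("1 AND NULL"),  # NULL: the outcome is unknown
    "0 AND NULL": evaluate("0 AND NULL"),  # false: one false operand is enough
}

for expression, value in results.items():
    print(f"{expression:>10} -> {value!r}")
```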

> Now I have to track another thing just to tell me if I have touched a thing.

I suggest you try writing EdgeQL queries. The experience is usually quite the opposite, and I hope you'll end up liking it!

> Please don't tell me this is some kind of nulls were a mistake cargo culting?

While I do think that NULL is a hard issue to deal with in many languages and I admire languages like Rust without NULL, EdgeQL is slightly different. As you noticed, we do have a concept of empty set, which in some ways is similar to NULL in SQL (with the caveats I explained above).


I may not be following correctly, but if a Boolean field doesn’t support null, how would you handle a use case like: a survey question that hasn’t been answered, vs setting as false by default? Would you use an enum with 3 states vs a Boolean type?


This goes a bit beyond what fits in a comment reply, but I'll try to be brief.

- You can define a `bool` property as "optional" and then it can have an empty set as its value.

- The 'exists' operator converts sets into bools; you can use it with the 'if..else' operator to handle empty sets where needed.

- You can coalesce (the "??" operator) your potentially empty property with a non-empty set, say a "{false}".

- An empty set isn't some magical value like NULL in SQL. Functions and operators in EdgeQL are strictly defined as to whether they accept sets as their arguments or not. In the case of logical operators -- they don't; they are defined as element-wise operators, therefore no three-valued boolean logic. You learn the rules once and they apply to all functions and operators in all contexts.
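
The points above can be modelled loosely in Python. This is only a toy model of the semantics described here, not EdgeQL syntax or any EdgeDB client API: the helpers `exists`, `coalesce` and `elementwise` are hypothetical, Python's `set` is a simplification, and the rule that an element-wise operator applied to an empty set yields an empty set is an assumption of this model.

```python
from itertools import product
from typing import Callable, Set, TypeVar

T = TypeVar("T")


def exists(values: Set[T]) -> bool:
    """Model of an 'exists' check: True when the set has at least one element."""
    return len(values) > 0


def coalesce(values: Set[T], default: Set[T]) -> Set[T]:
    """Model of the '??' operator: fall back to `default` when `values` is empty."""
    return values if values else default


def elementwise(op: Callable[[bool, bool], bool],
                left: Set[bool], right: Set[bool]) -> Set[bool]:
    """Apply `op` to every pair of elements; an empty operand yields an empty result."""
    return {op(a, b) for a, b in product(left, right)}


answered: Set[bool] = set()  # an optional bool property with no value
status = "answered" if exists(answered) else "unanswered"
with_default = coalesce(answered, {False})
combined = elementwise(lambda a, b: a and b, answered, {True})
```

In this model the logical operator never sees a third "unknown" value: it either runs on real booleans or does not run at all.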

This documentation page (and subsequent sections) explains how exactly EdgeQL is defined: https://edgedb.com/docs/edgeql/overview.


> how would you handle a use case like: a survey question that hasn’t been answered

Very simple, you leave it out! Then you can query for this.


> While I do think that NULL is a hard issue to deal with in many languages and I admire languages like Rust without NULL,

Why would you admire this? Dropping .unwrap().wrap() on all the things is not any better than doing the same thing on nullable types.

"duh you're meant to handle them straight away so its not an issue" - no shit? Do you think maybe the same could be said about null values?

Removing optionality from a type system is a mistake, it just leads to more gibberish to handle undefined/notset.

> EdgeQL is slightly different. As you noticed, we do have a concept of empty set, which in some ways is similar to NULL in SQL (with the caveats I explained above).

Yes, it is slightly different, no it is not similar to null. It is what you get when you don't have null and don't have options. i.e. no sentinel, just a maybe.



