Hacker News new | comments | ask | show | jobs | submit login
Why We’re Building Flux, a New Data Scripting and Query Language (influxdata.com)
91 points by pauldix 7 months ago | hide | past | web | favorite | 113 comments



> I don't want to live in a world where the language I speak evolved slowly over the past 1600 years.

Also, this? This is the illustrative example you chose for your amazing new query language?

    square = (table=<-) => {
      table |> map(fn: (r) => r._value = r._value * r._value)
    }
So you're saying you're combining the expressiveness of SQL with the readability of, what, perl?


Agreed. The reason SQL rules is that nothing has really come into existence that is as or more powerful AND easier.

I used to hate writing SQL (I still do somewhat) but I took the time some years ago to really learn the language and once you do that you appreciate a few things:

1. Yes it requires a high mental load to craft something sophisticated

2. I can’t think of a ‘better’ alternative to SQL that isn’t a compromise.

3. You can do a crap load of stuff with joins, procedures and CTEs once you understand it.


> 3. You can do a crap load of stuff with joins, procedures and CTEs once you understand it.

Yes, or you can skip that part and do the tricky part in an actual programming language ;) For instance in Haskell or JavaScript. There is then even a chance that the code is both fast and actually works. :D

I mean, in the end this is just a cheap marketing trick to pull people into the platform; people can put it on their CVs. The whole Influx ecosystem looks great, although the UI (Grafana) is slow as hell once there are enough graphs in it, and if you need fail-over or scaling, you have to buy an expensive license.

Also, the existing query language is already obscure and counter-intuitive.


> Yes, or you can skip that part and do the tricky part in an actual programming language

But then

- you miss out on the power of the query optimiser

- you have to transfer a lot of unnecessary data

- you have a harder time enforcing consistency

- you have a harder time enforcing access rules / security

There are good reasons to do as much as possible in the database and not in a programming language in the backend or even worse, the frontend/browser.
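The data-transfer point can be made concrete with a small sketch (plain Python with the bundled sqlite3 module; the table name and values here are made up for illustration):

```python
import sqlite3

# In-memory table with 10,000 hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, float(i % 100)) for i in range(10_000)],
)

# Push the filter into the database: only the count crosses the wire.
in_db = conn.execute(
    "SELECT COUNT(*) FROM events WHERE value > 95"
).fetchone()[0]

# Filter in the application: every row is transferred first.
all_rows = conn.execute("SELECT value FROM events").fetchall()
in_app = sum(1 for (v,) in all_rows if v > 95)

assert in_db == in_app          # same answer...
print(len(all_rows), in_db)     # ...but 10,000 rows moved vs. 400
```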


I don't think that follows.

Many of the optimizations you mention (for example the first two) may in fact be removing baggage that probably would not have been there in the first place.


> Yes, or you can skip that part and do the tricky part in an actual programming language

That works great up to a point, but it breaks down in environments that need to scale. I’ve worked on a number of systems where they implemented Excel export or bulk import components and wondered why their application locked up after a few thousand rows.

Usually the developers turn around and blame their SQL engine for being deficient and then try Mongo or similar - not because it’s better necessarily but it allows them to continue working the way they want to (on application code).

Not to sound mean but the developer types who shy away from writing proper SQL queries also seemed to be oblivious to asynchronous programming techniques and using the right data structures in their code - indicating that they were generally less interested in finding the most efficient methods.


> Yes, or you can skip that part and do the tricky part in an actual programming language

Are you proposing pulling in an entire dataset into memory and post-processing it vs. creating efficient SQL to get what you need?


As crazy as this sounds, yes. Not the entire dataset of course, a selection of it and then tie things together on the fly.

I've seen a handful of systems (one of them I wrote myself :D) that did complex queries. In part and in certain situations this can be very efficient, no doubt. But what I observed is:

- dev-wise these systems become one-man-shows

- implementing feature over feature becomes exponentially slower with time

- initial performance is there but it turns into slow performance over time

I prefer to do things on the application side.


Just so you are aware: in reality, all the points you listed are exactly the consequences of the approach you are describing.

It becomes a one man show because anyone competent is going to run away from that mess right away.

New features become exponentially slower with time because you are re-implementing some ad-hoc SQL implementation in JavaScript of all things.

Initial performance is fine on your dev setup, but it drastically slows down over time as you pull in more data and do more complex things.

As someone else pointed out, you could benefit from actually learning SQL. There is a reason it's used.


Sounds like you could benefit from learning SQL.


The Influx ecosystem actually offers https://github.com/influxdata/chronograf for graphing instead of Grafana.


An improvement would be a slight reordering of clauses to improve IntelliSense, à la LINQ, though.


What resources did you use to master SQL?


I used a number of online resources - don’t recall them all but notable ones were:

https://use-the-index-luke.com/

http://www.postgresqltutorial.com/

Postgres documentation

Postgres conference videos and slides (can find on Postgres website and on YouTube)

...you can guess I don’t rely on books too much!


Thank you!


I recommend everything by Joe Celko and CJ Date. Start with Celko.

The most advanced book on SQL I have read is https://www.goodreads.com/book/show/1255518.SQL_Design_Patte...


I have to say...I definitely laughed harder at this than I should have...


TXR Lisp:

  1> (op mapcar [dup *])
  #<interpreted fun: lambda #:arg-rest-0011>
  2> [*1 '(1 2 3 4 5)]
  (1 4 9 16 25)
Let's do it with a vector of objects that need their value slot accessed:

  1> (defstruct (silly-wrapper value) () value)
  #<struct-type silly-wrapper>
  2> (defun square (table)
       (mapcar [chain .value [dup *]] table))
  square
  3> (square (vec (new (silly-wrapper 2)) (new (silly-wrapper 5))))
  #(4 25)


As someone who actually glares at Perl quite frequently, can I gently request that you take that slight back? Or at least qualify it to be "Perl regex". Unless you've used Perl 6 (a different language) and not found it readable, you'd do well to say Perl 5 too. Or perhaps you've never used either language but use it as a de facto meme to make a point?

The bigger argument is not readability but where the abstraction lies. From the examples I can't see much sugar beyond what an ORM often provides.


I write Perl5 every work day and find Perl6 less readable due to double symbols and increased symbol madness. Like the new ternary operator, which is probably much more visible but looks just crazy. Or private/public variables. In some respects it's worse than Perl5, in others it's exponentially more powerful. It's like giving you the equivalent of a swiss army knife for each and every tool inside your toolbox.

https://docs.perl6.org/type/Signature#index-entry-Type_Captu...

That said, I am more tempted to try Ruby, Crystal and Dlang for a new project, rather than Perl6. The latter looks like it was designed by a mad scientist on LSD.


> less readable due to double symbols

Double symbols? What do you mean by that?

> increased symbol madness

Could you elaborate what you mean by that?

> It's like giving you the equivalent of a swiss army knife for each and every tool inside your toolbox.

Please note that the link you've given is about being able to restrict calling a subroutine by making sure that the first 2 parameters are of the same type (regardless of which type). Generally one knows which types to expect. And even so, typechecking is an optional thing in Perl 6 (hence the term "gradual typechecking").

> The latter looks like it was designed by a mad scientist on LSD.

I think that's uncalled for. But that's your opinion. The same mad scientist who gave you Perl 5, by that reasoning, by the way.


I mean $.x and $!x or such craziness:

     submethod BUILD(:&!callback, :@!dependencies) { }
https://docs.perl6.org/language/015-classtut#Private_Methods

or the quite useful Z=> operator.

    my %hash = @keys Z=> @values;
or the Bat range operators:

    (1^..^5).list; # (2 3 4)
I know. It was meant as a Perl 6 example chosen at random. It looks like something taken out of the Discworld novels:

https://en.wikipedia.org/wiki/Unseen_University#Octavo


You don't understand what the following means

  1 ^..^ 5
It means a Range of values from 1 to 5 excluding 1 and 5.

  1         ~~  1^..^5; # False
  5         ~~  1^..^5; # False
  1.000001  ~~  1^..^5; # True
  4.999999  ~~  1^..^5; # True
It is short for

  Range.new( 1, 5, excludes-min => True, excludes-max => True )
One of the benefits if you want to get the first 10 items from an array is that you can use the numbers 0 and 10

  @a[ 0..^10 ]
It also means you don't have to do so many +1 or -1 when generating a sequence

  $n+1  ..  $m-1
  $n   ^..^ $m
We don't have to declare ranges with a number past the end like Python

  range(0,n+1)

  0..$n
… but we can if it increases clarity

  range(0,m)

  0..^$m
Since we don't use prefix ^ for anything else, we have it as shorthand for the above

  0..^$m
     ^$m
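For comparison, Python's own slices and range() already use the half-open convention described above; this is just an illustrative sketch, not Perl 6:

```python
# Python slices and range() are half-open: start included, end excluded,
# i.e. the same convention as Perl 6's 0..^$m.
a = list(range(100, 200))
first_ten = a[0:10]           # exactly ten items, no off-by-one
assert len(first_ten) == 10

# Perl 6's 1^..^5 excludes both endpoints; in Python a predicate does it:
exclusive = [n for n in range(1, 6) if 1 < n < 5]
assert exclusive == [2, 3, 4]
```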
---

Perl 6 doesn't have a Z=> operator. It has a Z meta-operator and a => operator.

  my %hash = @keys Z[=>] @values;
short for

  my %hash = zip( @keys, @values ).map: -> ( $key, $value ) { Pair.new( $key, $value ) }
I would like to know which of those you can scan over and be sure it is correct without much thought.

We also don't have a += operator, instead = is also a meta-operator

  $i  +=  2;
  $i [+]= 2; # more explicit, but identical
Which means that if you modify an existing infix operator or add a new one, you get something like += for free.

  sub infix:< +++ > ( $l, $r ) { $l + 1 + $r }

  $i +++= 1;
Note that since operators are so simple, we don't have to re-use operators for completely different operations. Just add a new one.

---

Sigils and Twigils allow us to know the scope of a variable and that it is a variable, at a glance.

  sub value () { 10 }

  submethod BUILD ( $value ) {
    $!value = $value || value
  #   ^          ^        ^
  #   |          |         \_ not a variable
  #   |           \__________ subroutine scoped
  #    \_____________________ class scoped
  }
There are also compile time "variables"

  say $?LINE;
and dynamic variables (thread local)

  say %*ENV;
This makes it so that when you modify a dynamic variable, it is immediately obvious that is what you are doing.

  sub foo () {
    $*foo = 42;
  }

  my $*foo = 0;
  foo();
  say $*foo; # 42



  sub bar () {
    $bar = 42;  # compile-time error
  }

  my $bar = 0;
  bar();
  say $bar;


>can I gently request you take that slight back. Or at least qualify it to be "perl regex".

Regex has the same complexity in any language; that can't be a valid reason to denigrate Perl. My biggest issue with Perl is that functions don't even have proper parameters, and you need somewhat cryptic syntax to work with arrays, scalars and dictionaries.


Define "proper parameters". Perl subroutines are delightfully flexible in that regard compared to most other languages, and they also happen to be closer to your machine's reality (x86 doesn't care about what's in your stack when you JMP). With the Perl approach, you have the tools to implement multiple dispatch, variable numbers of arguments, etc. rather easily and elegantly. Yeah, it looks weird from the outside, but once you're used to it you start to want it in other languages.

That's one example of what makes Perl a great teaching tool for general programming: it doesn't hide the mechanics of function/method calls behind the language, but instead shows it off to you and encourages you to explore and experiment.


I don't want a great teaching tool, I can use ASM for that. I want something that's easy to read and maintain.

I think Perl 6 got it right, but a little too late.


People keep saying that Perl 6 is too late, but they don't say that about other new languages.

I mean nobody says the same about Julia, which has been in development for the better part of a decade.

The only way it could be too late is if it doesn't do anything better than a single other language. By that I mean if there is a language that does everything that Perl 6 does, and does it just as well.

Even if all it does is bring a collection of useful features that haven't already been collected together into a single language, then it isn't too late.

It could also create a new design of an existing feature that goes on to influence the design in future languages. That would also make it so that it isn't too late. (I think that in the future we may be able say that about Perl 6 grammars.)

I think the real reason people keep saying that about Perl 6 is that they want it to be true. We can only really make that determination years from now when we have perspective.


It has had subroutine signatures since 5.20. And there's autobox.


OK, I didn't know that — I haven't used Perl in a long time. But I don't think any of my Perl programming colleagues have heard of it or like using it. I still think their code is harder to read because of this ingrained bad practice.


> I don't want to live in a world where the language I speak evolved slowly over the past 1600 years.

English is ultimately derived from Proto-Indo-European (as far as we can tell), which goes back at least six thousand years.


people this cool obviously shoot for the readability of J.


Perl is actually readable compared to that.


SQL definitely has weaknesses, but I wish people wouldn't use "straw man" examples to crap all over it. The flux example from his blog post would look something like this:

    select 
        lag(value, 0) over recent * 1 + 
        lag(value, 1) over recent * 0.5 + 
        lag(value, 2) over recent * 0.25 + 
        ... as exp_moving_avg
    from telegraph
    where time > datetime_sub(current_datetime(), interval 1 hour)
    and measurement = 'foo'
    window recent as (order by time rows 10 preceding)
Here, the main difference between Flux and SQL is that Flux has a built-in exponential moving average function, whereas in SQL we have to actually write out the formula.
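For readers unfamiliar with either form, here is a minimal Python sketch of what an exponential moving average computes; the 0.5 smoothing factor echoes the geometric 1, 0.5, 0.25 weighting in the SQL above (up to normalization), and the function name is made up:

```python
# Minimal exponential moving average: each output blends the new value
# with the previous smoothed value, so old samples decay geometrically.
def ema(values, alpha=0.5):
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out

print(ema([1.0, 1.0, 1.0]))       # a constant series stays constant
print(ema([0.0, 1.0, 0.0, 0.0]))  # an impulse decays: 0.5, 0.25, 0.125
```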


> Here, the main difference between flux is that flux has a built-in exponential moving average function, whereas in SQL we have to actually write out the formula.

The possibility to provide arbitrary data processing functions is one of the core features of contemporary query languages. In SQL it has always been a huge problem (and your example is a good demonstration). In the example they provide, they also rely on a built-in (exponential smoothing) function, and therefore it is not clear whether I can really perform arbitrary ad-hoc computations within the query itself (without built-in or externally defined functions).


> The possibility to provide arbitrary data processing functions is one of the core features of contemporary query languages.

Yes, and this is actually a big selling point of PostgreSQL-based solutions. It's important that people don't confuse modern SQL solutions with the ones they may have encountered decades earlier, or the comparison with NoSQL will indeed be unfair.


Are you referring to windowing functions?


SQL has a very solid grounding in research — a lot of it in relational algebra. If you try to make a query language that is a DSL for anything without a really different data model underneath, you will accomplish nothing great.


In the world of "theory meets engineering" SQL is pretty embarrassing too. Read this: https://www.dcs.warwick.ac.uk/~hugh/TTM/HAVING-A-Blunderful-...

Just being built on the relational model is a tiny part of what makes SQL what it is (which is largely catastrophic, as languages go.) It says little about how queries are executed, indexes, syntax, data types, yada yada...

And the relational calculus itself isn't super lovely as a base... Turing completeness is a nice thing to have, after all.


> It says little about how queries are executed, indexes, syntax, data types, yada yada...

I believe that this is the whole idea of SQL or any declarative language.

You want to express your query without saying anything about indexes, how data are retrieved, how joins are executed and so on.

That is (supposed to be) a feature.


Sure, those things aren't specified in SQL, but again -- that's a feature of SQL, not the relational model. You could just as easily have an imperative relational language as a declarative one, so those theoretical foundations can be used to justify very little of the query language.


> Turing completeness is a nice thing to have, after all.

It is, and plain old non-procedural SQL has it.


Hmm, that's an interesting argumentative fork to run into -- either

a) SQL is Turing complete and departs from the relational algebra that the earlier post said gave it a solid theoretical grounding, or

b) SQL is not Turing complete, and so loses useful expressive power.

For the actual fact of the matter, you're right. I had no idea. See https://wiki.postgresql.org/index.php?title=Cyclic_Tag_Syste... for concrete code.

I think the specific departure from the "traditional" relational model that gets them there is "WITH RECURSIVE". I don't know whether there are other ways to get there.
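A recursive CTE is easy to try out locally, since SQLite (bundled with Python) supports WITH RECURSIVE. This sketch computes factorials rather than anything Turing-complete, but it shows the unbounded-iteration mechanism:

```python
import sqlite3

# A recursive CTE: the UNION ALL branch keeps extending the table
# until its WHERE clause stops matching.
conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH RECURSIVE fact(n, f) AS (
        SELECT 1, 1
        UNION ALL
        SELECT n + 1, f * (n + 1) FROM fact WHERE n < 5
    )
    SELECT n, f FROM fact
""").fetchall()

print(rows)   # [(1, 1), (2, 2), (3, 6), (4, 24), (5, 120)]
```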

Personally I think it's the right side of the fork to land on.


>And the relational calculus itself isn't super lovely as a base... Turing completeness is a nice thing to have, after all.

And then you have C++ templates all over again...


Which is actually the case for InfluxDB: there are no relations in it. Remember that what's used in InfluxQL is the syntax, not the data model. However, I must admit I don't see a particularly big need to change the query language to something completely different.


Holy shit. The language name alone is a really, really stupid idea.

Protip: If you're inventing a new esoteric programming language (and until you have other people implementing non-trivial projects in/using your language, it is an esoteric programming language), google the fucking name first.

If googling your intended <thing's> name results in more than 1000 hits, CHANGE YOUR <thing's> FUCKING NAME.

If you don't, trying to find any resources about the <thing> on the internet will be a huge pain in the ass. Name your project something unique.

Googling "flux" results in "About 197,000,000 results". If you just make it a little more specific as "fluxql", you get ~142 results.

People looking for language resources will actually find the shit they're looking for, and the name actually tells you something about what it does, which is nice.


fluxlang is what people should be searching for and we'll continue to point that out in future blog posts, documentation, #fluxlang on Twitter and SO and everywhere else. People got there with Go, so I assume people will get there with Flux if the language is successful.


People got there with Go because google is huge.

If you're going with fluxlang, don't call it anything but "fluxlang". Referring to the same thing multiple ways is even worse.


> People got there with Go

Just because being Google can compensate for a bad choice doesn't mean that it's a choice to emulate.


I'd be interested in seeing a language that can take full advantage of the architecture in Out of the Tarpit.

It's always touted as a must-read paper, but I haven't seen many inroads towards something truly Functional-Relational.

I don't think a new query language is the solution to the problem. SQL isn't that bad. Sure, for sanity, every language has a reimplementation of SQL syntax using whatever abstractions available. But even if you don't go SQL, you've got datalog.

A language being 40 to 50 years old doesn't make it a problem. I'm speaking English - that's been through 2000 years of iteration. I don't think that makes it a good candidate for a ground up rethink when so much thought by great thinkers has already gone into it.


You can check out Bistro [1], which is an alternative to SQL-like languages and to set-oriented approaches in general. It focuses on column operations (formally, functions) as opposed to having only set operations. That is precisely why it works well for time series, and it is why it has been used for stream processing.

[1] https://github.com/asavinov/bistro


SQL is a bad solution to the wrong problem.

After 50 years it remains non-intuitive and confusing and extracts a tax penalty far greater than its value add. Worse, the problem it claims to solve, "I have all my data in one database now let me transact over it", isn't viable and was never actually viable at scale.

Unfortunately this proposal doesn't understand the real problems with SQL (lack of types, lack of distribution, and no separation between the data write model and the data read model). It actually doesn't seem to introduce anything really new that can't be done using CEP Engines already.


isn't viable and was never actually viable at scale.

And yet the entire global economy somehow works. Maybe viable and scale don’t mean what you think they mean... I mean, I run relational databases of tens of TB and know people doing hundreds. And people with mere GB tell me they’re doing amazing things with scalability... lol


Do you have any recommendations on where to read about the problems of SQL? When you put it this way, clearly my assertion that "SQL isn't bad" needs a review.


There have been many critiques of SQL for example: https://sigmodrecord.org/publications/sigmodRecord/1306/pdfs...


Thanks for this :)

I ask for recommendations because I don't know what I don't know. I've read a few criticisms of SQL where some conclude "just use mongodb". A good recommendation from an expert helps avoid faulty understanding.


I've been writing Datalog for the last week or so. It took me a bit to adapt to the syntax (also, [1] helped), but now I find myself enjoying the combination of terseness and expressiveness.

It's probably a matter of familiarity, but looking at Flux, it seems both noisier than Datalog and less readable than SQL.

Given that readability is a goal for Flux, I guess it's a matter of subjectivity: readable for whom? What background do you have in order for Flux to look readable?

1: http://www.learndatalogtoday.org/


On some level, readability is a subjective thing. So is expressiveness and general feel for a language. It's about aesthetics and reasonable people can disagree about language and API design choices. We're making our choices and we hope that a good number of people come to agree with them. However, part of the engine design is to decouple the language from the actual execution. The engine takes a DAG represented as a JSON object. We'll have parsers that create that DAG from Flux or from other languages like PromQL or anything that people might think of.


> Writing a SQL equivalent example of that query is, at this point, beyond my SQL capabilities.

Let me get this straight, someone who doesn't know SQL is going to solve all of the deficiencies of SQL by inventing something new?

InfluxDB is awesome and the syntax looks great, but please spare me the "problems" you're solving when you can't write a simple moving-average query.


They are just building a subset of the semantics that SQL supports, with no joins. Nothing to see here.


I'm the dir. of eng. for the team building flux. Before that, I built a transactional SQL system (during peak NoSQL hype). I like SQL.

After a year+ of watching people use InfluxQL and thinking about the types of user experiences that timeseries specific platforms can offer - I'm eager to see flux enter the world.

People like exploratory and notebook like environments. Building a language that integrates with those workflows and even supports a REPL for writing queries is a nice fit to this space.

InfluxDB chooses a non-relational data model. Timeseries queries almost all filter on terms, partition, window, group — and then apply a sequence of functions to those groups. Most queries end up using SQL analytic functions that many users aren't experienced with... while mapping an only vaguely-relational (and very non-normalized) data model to boot.
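That filter → partition → window → aggregate shape can be sketched in plain Python over hypothetical (timestamp, measurement, value) tuples, to show how little of it is relational:

```python
from itertools import groupby

# Hypothetical points: (timestamp_seconds, measurement, value).
points = [
    (0, "foo", 1.0), (5, "foo", 3.0), (7, "bar", 9.0),
    (12, "foo", 5.0), (14, "foo", 7.0), (21, "foo", 2.0),
]

# 1. Filter on a term (measurement == "foo"), ordered by time.
foo = sorted((p for p in points if p[1] == "foo"), key=lambda p: p[0])

# 2. Window into 10-second buckets, then 3. aggregate each bucket.
means = {}
for window, grp in groupby(foo, key=lambda p: p[0] // 10):
    vals = [v for _, _, v in grp]
    means[window] = sum(vals) / len(vals)

print(means)   # {0: 2.0, 1: 6.0, 2: 2.0}
```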

The timeseries space is visual - visual tooling really matters. SQL isn't an easy fit there, either. It is hard to write SQL incrementally or to interpret just part of a SQL query to show intermediate results. Additionally, users expect a large set of non-standard SQL functions to be builtin.

There are competing systems betting explicitly on SQL and others choosing a functional approach, a strong competition of ideas and practices which should be a win for end-users.

(And to correct the above comment, flux expresses select, project, and join operators.)


When making a new project, I honestly believe the best way of doing it is documenting other projects — in this case SQL, datalog, etc. — why the choices they made were not what you wanted, what the alternatives were, and why $x was chosen. That way, if people disagree with your design, you can refer to the research you did.


"I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s. I refuse to let that be my reality."

---- re-play the same sentence for c/SQL/ENGLISH ---

I don’t want to live in a world where the best language humans could think of for "communicating" was invented in the (c. 550–1066 CE). I refuse to let that be my reality.

So, I'm starting new language...

sdflakfj lsjkfaldfj sdfkjaslf dflkasjdfk sldfkjaslf laskfdas....

------ P.S.: Any technology is built over time with sedimentary layers... every layer has played a key role in where we are today... I'd not discount any...


The post doesn't mention datalog, but Dedalus/Bloom[1] makes a good case for why datalog is a good starting point for a data query language.

1. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-...

2. https://www.youtube.com/watch?v=R2Aa4PivG0g&t=2295s


The ultimate fantasy of every programmer is to a) invent a new language and b) force other people to use it. It’s OK, we all get it, it’s fine. But let’s be honest about our motivations...

A previous employer had RQL, "relational query language". It was between 10,000 and a million times slower than SQL depending on what you were doing (under the covers it was just generating really bad SQL). But the engineer who invented it was sufficiently well connected to get it declared the corporate standard, so...


I love this story. Please tell us more.

Years before I ever read Thinking Fast and Slow I clicked on to the fact that humans are predisposed to making exceedingly poor decisions. This is such a fun example.

I almost want to start a side-project based on collecting these...


I too lived the nightmare that was RQL... it was truly horrible.


You two made me interested, but I couldn’t find anything about this RQL...


If the OP is talking about the same RQL as I am, it was an in-house solution that basically tried to be a GraphQL, Kafka, and Spark solution all in one, built at the same time as the app meant to consume it. Horrible experience.


Author here, I just noticed that this got picked up so I'm late to the party. I suppose I'll take the bait and aim to clarify one thing that I think is funny people are getting hung up on. My line: "I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s"

Read in context, the meaning of that sentence isn't that things invented decades (or centuries or millennia) ago are all bad. I even state that SQL is a great and powerful tool. If you took from the post that I think SQL is shit and needs to be replaced, you weren't paying attention.

The point of that line (and really the point of us creating Flux) is that we think there can be a more elegant and understandable language (read: API) for working with time series data. But that we won't get there by trying to improve SQL. You don't build an automobile by creating better wheels for your horse and buggy.

Also, SQL isn't a language like English. SQL is an API and APIs change all the time. Yes, code is communication, but its form evolves much more quickly than spoken and written language between humans.


>> If you took from the post that I think SQL is shit and needs to be replaced, you weren't paying attention.

Why, why, why, why, why???? I didn't even have to look it up and I could already tell from your attitude in your blog post and this comment that your company is based in SF and backed by a bunch of VC money. And why is it that the systems software startups always have the worst attitudes to boot? Usually, when a bunch of people take something from an article that wasn't intended by the author it's because the author did a poor job not because a whole bunch of people don't get it or aren't paying attention. The arrogance is unbelievable.


All the words in the English language (not to mention place names, words in other languages/transliterations, possible acronyms, etc.), and they chose one that already has a Facebook-promoted architectural pattern using it.

Not sure if arrogance or ignorance, perhaps just apathy?


It seems more complicated than necessary. Why not a syntax like...

    from("telegraf")
      .range(-1h)
      .where(_measurement, "foo")
      .exponentialMovingAverage(-10s)


What's wrong with a pipeline operator? JS is getting one as well.

https://github.com/tc39/proposal-pipeline-operator

Chaining calls in JS means that a function returns the original object; piping actually passes the result along as an argument, so it's semantically different.
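The semantic difference is easy to show in plain Python: chaining relies on each call's return value exposing the next method, while piping hands the previous result to an ordinary function. The pipe helper here is a made-up three-liner, not a library API:

```python
from functools import reduce

# Chaining: each call returns an object exposing the next method.
result_chain = (
    "  Hello, World  "
    .strip()
    .lower()
    .replace(",", "")
)

# Piping: each function receives the previous *result* as its argument.
def pipe(value, *fns):
    return reduce(lambda acc, fn: fn(acc), fns, value)

result_pipe = pipe(
    "  Hello, World  ",
    str.strip,
    str.lower,
    lambda s: s.replace(",", ""),
)

assert result_chain == result_pipe == "hello world"
```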


> What's wrong with a pipeline operator?

Honestly, for the use case, the same thing that's wrong with a (mandatory) visible function application operator in Haskell.

> Chaining calls in JS means that a function returns the original object, piping is actually passing over the result as an argument, so it's semantically different.

The syntax Flux uses for creating the result is the same as JS would use for mutating an inbound object, so it actually would be consistent if pushing the result used the same syntax as passing the (mutated) original object would in JS.

Though I’d prefer whitespace for piping, just like Haskell does for application. If you are going to specialize a language for a domain, don't be timid about it.


I totally agree. Building a language without an elegant/concise syntax won't go very long in my view.


Exactly — I came here to write the same comment. I'd optionally keep the named arguments because they self-document the code:

    from("telegraf")
     .range(start:-1h)
     .filter((r) => r._measurement == "foo")
     .exponentialMovingAverage(size:-10s)
No need to write fn, because that's obviously a function.

Maybe it's a tradeoff between ease of implementation and ease of use.


This looks a lot like SQLAlchemy.


>I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s

Maybe we just got it right relatively early!



There seems to be a lot of misunderstanding of SQL in the SQL criticism, and the design of the new language seems to have a lot of excess noise.

1. If arity-1 functions are as dominant in use as the examples suggest, mandatory named args are excessive noise.

2. From the examples, the |> operator also looks like needless noise; code would be cleaner and more readable if this operator was whitespace without any other character (like function application in Haskell, but newlines should also be acceptable) and there was a different punctuation for when that isn't intended.

3. This seems really noisy:

  square = (table=<-) => {
    table |> map(fn: (r) => r._value = r._value * r._value)
  }
Map is a transform applied to an input, so is square, so why can't it be:

  square = map(fn: (r) => r._value = r._value * r._value)
Or, better:

  square = map(_._value *= _._value)
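That terser form falls out naturally if `map` is curried; here is a rough JS sketch of the idea (hypothetical, not Flux's actual semantics):

```javascript
// A curried map: calling it with just the transform returns a reusable
// table -> table function, so `square` needs no explicit table parameter.
const map = (fn) => (table) => table.map(fn);

// Build the transform once; apply it to any table later.
const square = map((r) => ({ ...r, _value: r._value * r._value }));

const out = square([{ _value: 2 }, { _value: 3 }]);
// out is [{ _value: 4 }, { _value: 9 }]
```

The boilerplate `(table=<-) => { table |> ... }` wrapper is exactly what currying would eliminate.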


Based on all the reflexively negative comments, I'll assume that most of the participants in this thread write SQL for a living. Notwithstanding certain ridiculous assertions on the part of the author attempting to correlate the value of a technology with the year in which it was created, this seems like a pretty compelling idea. We use InfluxDB at my company to manage time series data, and its specialization for that use case has been a big benefit. I don't see any reason to be dismissive by default of a language designed for interacting with data having these specific characteristics in a manner explicitly suited to it.


> We use InfluxDB at my company to manage time series data, and its specialization for that use case has been a big benefit

You should check out https://www.timescale.com . There is a reason everyone is ditching InfluxDB for Postgres now.


No indeed, so why don't they do that instead of pulling a PR stunt claiming they reinvented the wheel?


I don't write sql for a living, and the little I do in my free time is pretty miserable.

I still think this is a really dumb idea.

As hard as it can be to do stuff with SQL, at least you can use Google to get help. An obscure, single-database-specific language with a completely un-googleable name is going to be a complete clusterfuck to do anything with.

The name alone will make trying to get help with the language a disaster. The fact that it will have such limited market penetration (it only works on one specific time series database) doesn't make the unsearchable name any better.

------

If the author would come out and just admit they want to spend time intellectually masturbating over query language design (I think about inventing a "better" language in my free time too!), I'd have a lot more respect for the project.


Isn't that the problem though? For even a mildly interesting problem you have to google and google for a correct and usually non-obvious solution. And when you find it, it usually works in only one dialect but not others (e.g. MS SQL vs MySQL). It's elitism at its finest; people probably get good money writing obscure queries too, no wonder they are so defensive. I think your comment shows short-sightedness.


....what?

So you're arguing that the solution to the annoyance of the platform-specific nature of SQL is to create a platform-specific language?

Or do you think that this language won't heavily depend on the internal implementation of InfluxDB? If you believe that, I want to know what you're smoking. It's gotta be some good stuff.


I'm arguing that SQL is not the be-all and end-all just because it can be googled.


I'm not claiming SQL is great (it's not). I'm arguing that the alternative presented is worse.


>I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s

Yeah, because languages go stale... especially those based on mathematical abstractions like SQL.


It was a bad idea to focus on Flux vs SQL.

It would be better compared to pig or pandas.

I like the approach. I find it annoying that in SQL I need more than select privileges to write my own function or view (aliases and WITH are all that you can use to structure a big query). Also, macros that work on all tables that have certain columns are at best difficult to write in SQL, so there is room for improvement.

OTOH in Flux you write something that looks a lot like the output of a planner, so if things change in your DB you might have to modify your scripts instead of adding an index.


I've gone down a similar line of thinking when writing a lot of SQL over timeseries or otherwise ordered datasets recently. Going in a more functional and composable direction while keeping it limited so that hopefully the query execution engine can still make good optimizations seems like the right idea.

Making it a separate and open language outside of Influx is also a great approach - I'd love to see other databases try adopting this. I'll definitely be keeping an eye on this project.


> I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s

You don't; the most popular one is. The best one is XQuery.


XQuery has both a power and a limitation in its data model being XPath. SQL and the relational model have proven very capable of being optimized as well as extensible. The schema that relational systems impose has at times been thought too restrictive, but it makes it possible to reason about the query and the data, and that makes them powerful; it also helps the user understand the shape of their data. With systems like Google's BigQuery showing how you can have schemaless, non-1NF, powerful and scalable systems still queried with SQL, there needs to be a powerful and innovative system to justify moving on from SQL. Having lambda-like syntax for WHERE-clause filters just looks like LINQ to SQL.


> xquery has a power and limitation with it's data model being xpath.

You're conflating things, they share a data model.

https://en.wikipedia.org/wiki/XQuery_and_XPath_Data_Model


They took the name of the popular Julia neural network package? That makes it confusing.


Flux has been used for many, many projects and product names.



In ML? What other ML and data science software? I'm curious because I couldn't find any. Of course a generic name is used in many ways, but we're talking about this specific context.


Probably! A generic fictional software product name is something like “Acme Flux,” which is analogous to John Smith for people. Given its common use as a placeholder, there must be a few; here is one from 1984: https://www.sciencedirect.com/science/article/pii/0743731584....

It is difficult to search for, though, since it is such a widely used name, probably one of the most widely used ones.


What you link to isn't ML software. If it's so easy to find, why not link to one? http://fluxml.ai/ comes to mind without having to search, of course, but I am still not seeing any others.


They should use Dash instead. Oh wait.


Why We’re Building Flux, a New Data Scripting and Query Language? Who knows why?

Before pedantically saying "I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s", show us the breakthrough that makes us think your article deserves reading all the way through.


It seems the argument behind many ”X reinvented” posts is the age of X, not its flaws.

A more cynical observer might also guess people are shooting for a place in history. If you successfully launch the better mouse trap, fame and riches await. See also: intense churn in JS land.


Ironically, the fundamentals of why computers even work were established 70 years before the 70s and haven't really changed a bit. I think his quote is definitely tongue-in-cheek.


Uhm let me think: if the world (or even a small niche really) switched to a tool I have authored, and they become fundamentally dependent on it, I’ve basically secured myself a significant and perpetual stream of income. Not bad eh! /s


I don't know what "breakthroughs" you are talking about, but there are examples and comparisons between Flux and SQL just below that sentence.

If you had used the same amount of time to finish the article as you used to write this rant, you wouldn't have had to write it in the first place.


I skimmed through the crap and felt misled by the example. So I think he would still have ranted. They chose a very specific example that made their product look good.

Besides, the article starts by stating that they rebranded their product InfluxQL → Flux precisely because power users found major features lacking compared with SQL.

Very bad PR move.


"We started with poorly-implemented SQL, which frustrated people, so we decided to make something without any of the advantages of SQL instead."


From TFA: "This is kind of like the worst part of Lisp (nested function calls), but even less readable. Not only was the Flux example more terse, it was more readable and understandable."

It's kinda funny, because his forward pipe operator (suspiciously similar to Elixir's) is the same as a threading macro, which you have in lisp (or can trivially write if you don't)
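For anyone who hasn't seen a threading macro: a thread-first helper is a one-liner in most languages. A small JS sketch of the equivalence:

```javascript
// The "worst part of Lisp" reading order: the innermost call runs first.
const nested = JSON.stringify(Object.keys({ b: 1, a: 2 }).sort());

// A tiny thread-first helper makes the same steps read top-to-bottom,
// much like Flux's |>, Elixir's pipe, or Clojure's -> macro.
const thread = (x, ...fns) => fns.reduce((acc, fn) => fn(acc), x);

const threaded = thread(
  { b: 1, a: 2 },
  Object.keys,          // -> ["b", "a"]
  (ks) => ks.sort(),    // -> ["a", "b"]
  JSON.stringify        // -> '["a","b"]'
);
// nested and threaded are the same string: '["a","b"]'
```

So the "unreadable nesting" criticism is about defaults and syntax sugar, not about anything Lisp can't express.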


"I don’t want to live in a world where the best language humans could think of for working with data was invented in the 70’s"

I don't want to live in a world where the best language humans could think of for communicating with other humans was invented in the ${CENTURY_WHEN_ENGLISH_WAS_INVENTED}'s.


Seems more like the XKCD comic on standards.

I really loved the tenets given, i.e. Usable, Readable, Composable, Testable, Contributable, Shareable.

But beyond that, the article really fails to show how the new language is going to achieve the above in a way no other language has.

Also, I agree with the rest on the point about SQL: it doesn't matter that SQL was invented in the 70s; so were many of the programming languages and paradigms we use today.

PS. I don't make a living writing SQL.


These things are very easy to do and clear to write using R's tidyverse.


It looks indeed a lot like Graphite, and since you explicitly mention in your talk that your objective is to reimplement all the functions present in Graphite, why not instead present your work as a port of the Graphite language, with some extensions to work on other data sources and sinks (and dots replaced by the fat pipe)?

This is interesting to me as I'm currently working on something close: a lightweight stream processor to allow system engineers to manipulate some large streams of data while in flight to a database. And I've been wondering (and still am) about the trade-offs between simple and expressive. Very early, I decided not to be TS specific at all (since we were prevented from using an off-the-shelf product for the reason that our data does not look enough like a TS -- no single time field nor single value field). Eventually, after a few detours, we ended up favoring an SQL-like language because it's field-agnostic.

Regarding the language itself, the main differences I can see are that you query over a time range while we process infinite streams, with the consequence that we must explicitly tell each operation when it has to output values (windowing); the other is that you have an implicit key and one TS per "group" with the same key, which makes piping many operations easier (but JOINing harder), while we have to be more specific about how to group.

So for instance, where you have:

  from(db:"foo") |> window(every:20s) |> sum()
we would have the more SQL-alike:

  select sum value from foo group by time // 20
("//" being the integer division).

Or, if you needed the start and stop additional columns added by window():

  select sum value, (time // 20)*20 AS start, start+20 AS stop group by start
But then, because fluxlang processes a range of time while we stream "forever", we would also have to tell when to output a tuple, for instance after 20s have passed:

  select sum value, (time // 20)*20 AS start, start+20 AS stop group by start commit after in.time > group.stop
which gets verbose quickly.

But to us this constraint imposed by streaming (as opposed to querying a DB for the data to process) is essential since our main use case is alerting from a single box, so querying every minute the last 10 minutes of data for thousands of defined alerts would just not work.
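The tumbling-window arithmetic above (`group by time // 20`) can be sketched in JS with made-up sample points:

```javascript
// Tumbling 20s windows: each point is assigned to the bucket
// floor(time / 20) * 20, then values are summed per bucket --
// the same grouping that `group by time // 20` expresses.
const points = [
  { time: 3,  value: 1 },
  { time: 18, value: 2 },
  { time: 21, value: 4 },
  { time: 39, value: 8 },
];

const sums = {};
for (const p of points) {
  const start = Math.floor(p.time / 20) * 20; // window start
  sums[start] = (sums[start] ?? 0) + p.value;
}
// sums is { 0: 3, 20: 12 }
```

The streaming twist discussed above is orthogonal: the bucket arithmetic stays the same, but a stream processor must additionally decide *when* each bucket's sum is final enough to emit.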

Another interesting difference is the type system. One thing I both like and hate in SQL is NULL. It's convenient for missing data, but it's also the SQL equivalent of the null pointer. So we have a type system that looks closely at it: we support the special case of algebraic data types where a "type?" is a NULLable "type", and NULLs must be dealt with before they reach a function that does not accept NULLs. For instance, there is no way to compile a filter whose condition can be NULL; one would have to COALESCE it first. What are your thoughts about missing data? Do you manage to avoid the issue entirely, including after a JOIN operation?
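To make the coalesce-before-filter discipline concrete, a small JS sketch (the row shape and `coalesce` helper are hypothetical):

```javascript
// Rows where the value may be null (missing data).
const rows = [{ v: 5 }, { v: null }, { v: 12 }];

// COALESCE-style default: replace null with a fallback value.
const coalesce = (x, dflt) => (x ?? dflt);

// The predicate itself is total: it never sees null, because
// the coalesce runs before the comparison.
const filtered = rows.filter((r) => coalesce(r.v, 0) > 4);
// filtered is [{ v: 5 }, { v: 12 }]
```

A type checker that distinguishes `type?` from `type` can reject `r.v > 4` outright and force the `coalesce`, which is exactly the compile-time guarantee described above.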

The other difference I noticed is how nice your query editor is. For now our query editor is $EDITOR, but my plan is to build a data source plugin for Grafana. What do you think of this approach?



