GoPlus – The Go+ language for data science (github.com/qiniu)
143 points by dx034 10 months ago | hide | past | favorite | 61 comments

IMHO these conveniences should just be in the language. List comprehensions, dictionary comprehensions, and short-hand for literals are the kind of syntactic sugar that have almost no downsides, save you some RSI, and make code more readable. I'm surprised comprehensions in particular haven't spread to more modern languages.

> I'm surprised comprehensions in particular haven't spread to more modern languages.

I've come to dislike list comprehensions. Simple ones are OK, but they don't handle incremental complexity well. Invariably, code slowly becomes unreadable because nobody wants to rewrite the list comprehension when stuffing one more little tweak into it will do the job.

I much prefer object-functional chaining style ala Scala, Groovy, Ruby etc:

      some_giant_list
      .findAll { it.foo > 20 }
      .groupBy { it.bar }
      .countBy { it.value.size() > 5 }
They scale much better over time as new constructs get inserted as functional additions in the sequential pipeline.

This assumes some_giant_list is an object that has "findAll", that "findAll" returns an object that has "groupBy", which returns an object that has "countBy".

Looking at that code, I don't even know what the intermediary objects are, but given the names of the operations, they can't be all flat lists.

Besides, you don't have to put your list comprehension inline. It is often more readable to make it span several lines, especially if it's a complex one:

    banned_ip = {
        con.ip
        for con in connections
        if con.ip in blacklist
        and con.type == "internal"
    }
This is exactly how I write mine; I find it much more readable personally.

The real challenge I've encountered is that you can typically only use one expression. If I need to write a slightly more complex mapping, I am forced to write a function, which I normally define just before the comprehension. Even though this works, it introduces boilerplate I'd rather not write.

I will admit this doesn't happen often, but it happens enough to bother me.
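A small, hypothetical sketch of that boilerplate; the helper name, record shape, and data are invented for the example:

```python
# A mapping too involved for a single expression, so a helper
# function is defined just before the comprehension that uses it.
def normalize(record):
    name = record["name"].strip().title()
    return {"name": name, "length": len(name)}

records = [{"name": "  ada lovelace "}, {"name": "alan turing"}]
cleaned = [normalize(r) for r in records]
```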

> This assumes some_giant_list is an object that has "findAll", that "findAll" returns and object that has "groupBy", which returns an object that has "countBy".

> Looking at that code, I don't even know what the intermediary objects are, but given the names of the operations, they can't be all flat lists.

Regarding this, I would say: sure, but does that really matter? Most IDEs will give you enough inference to list the operations you can perform and the return types they give. If anything, the "super collections" make a developer's life far easier. I think Kotlin does a particularly good job of this with a vast set of extension functions. Have a scroll through this documentation: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collecti... (mapIndexedNotNull is a great example).

Functional chaining (aka “fluent interfaces”) is very bad for dependency injection / mocking.

In f(g(x)) you can directly access or patch f and g as top level names.

In x.g().f() you have to patch methods internal to other structures, and this can lead to problems if the patching should only happen in a certain local scope.

I encountered this recently with differences between pathlib.Path and os in Python.


    def my_mkdir(pathname):
        pathlib.Path(pathname).mkdir()
        # or
        os.mkdir(pathname)
From the point of view of testing and decomposition, the second option is much nicer, because I can patch os.mkdir directly, rather than patching pathlib.Path.mkdir (and also worrying about controlling when instance creation happens so the patch applies when I need it, while letting other pathlib.Path objects my tests interact with be constructed normally). mkdir is even a very simple example, since pathlib.Path.mkdir is an instance method but only relies on the string data the instance holds. Imagine how much harder it would be if pathlib.Path.mkdir had complex interactions with the internal object structure or other instance methods.

Obviously you _can_ solve it either way, but the fluent interface does nothing except require more code.
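A minimal sketch of the testing difference, assuming a hypothetical wrapper my_mkdir that calls the module-level function:

```python
import os
from unittest import mock

def my_mkdir(pathname):
    os.mkdir(pathname)

# Patching the top-level name os.mkdir is one line, and the patch is
# scoped to the with-block; no pathlib.Path internals are touched and
# no directory is actually created.
with mock.patch("os.mkdir") as fake_mkdir:
    my_mkdir("/tmp/example")
    fake_mkdir.assert_called_once_with("/tmp/example")
```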

On balance, I think list comprehensions (or just fmap, which is all a comprehension is) are much, much better than chaining.

Also if you want to operate on data structures just operate on them with module functions.

I think pandas really messes up on this.

    agg(groupby(df, cols), funcs)
is way better than

    df.groupby(cols).agg(funcs)
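For concreteness, a sketch of the contrast with toy data; note that pandas only ships the method-chaining form, so the free-function spelling is how the comment wishes it looked, not an actual pandas API:

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "a", "b"], "score": [1, 2, 5]})

# What pandas actually provides: methods chained on the DataFrame.
chained = df.groupby("team").agg({"score": "sum"})

# agg(groupby(df, cols), funcs) would need free functions that
# pandas does not export; it is shown in the comment as the ideal.
```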
A very valuable insight, thank you.

Obviously bad practice can ruin any good thing, but I much prefer list comprehensions for smaller operations and will refactor them as they become unwieldy. If you're in an environment where "nobody really wants to" rewrite stuff that's becoming hard to work with (for whatever reason) it probably doesn't matter which language feature dooms you.

They're great until you find something like this in code:

[x for y in z for x in zz if y in x]

Then you begin to wonder how great they really are.

To me that looks super easy to read. You just go left to right.

    for y in z:
        for x in zz:
            if y in x:
                yield x
It’s extremely easy to sight read. If the lines become long, just split them up and it’s even more obvious.

     [x
     for y in z
     for x in zz
     if y in x]
There’s nothing tricky about multiple loops and conditions in comprehensions in most languages that support them. Just go left to right and it mirrors outer to inner loops/conditionals.
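A runnable version of the equivalence, with made-up data:

```python
z = ["ab", "cd"]
zz = ["abc", "xyz", "cde"]

# Comprehension: read left to right, outer loop first.
flat = [x for y in z for x in zz if y in x]

# The same thing as explicit nested loops:
out = []
for y in z:
    for x in zz:
        if y in x:
            out.append(x)

assert flat == out
```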


Also, most data scientists presumably have some advanced math background so it's akin to reading set-builder notation (albeit with listcomps the sets are sometimes nested). Your listcomp reads:

{x : y ∈ z, x ∈ zz | y ∈ x}

If you've read enough statistics/ML papers, this becomes second nature.

> If you've read enough statistics/ML papers, this becomes second nature.

As someone who breathes that style of notation, there is excellent support in vim and emacs for custom display for readability. Conceal syntax in vim¹(builtin for rust/help/$few_others and external such as vim-cute-python²), and pretty-mode³ in emacs. I'm sure similar things exist in other editors too, but I don't use those ;)

You don't get the exact representation in your example with the things I've mentioned, but if you're used to a more "mathy"-style they really are quite nice. YM[and taste]MV.

1. http://vimdoc.sourceforge.net/htmldoc/syntax.html#conceal

2. https://github.com/ehamberg/vim-cute-python (moresymbols branch for er… more symbols)

3. https://github.com/akatov/pretty-mode

Interesting. I didn't know that!

This is only delightful when you are doing synchronous computation or you have async/await in the language. Until Loom is available, using this same pattern with CompletableFuture breaks in the face of try-with-resources.

Also, I don't think this works in Python with await, since await is a prefix keyword: in your example, .groupBy would be called on the Future rather than its result, so it won't work.

I love ReactiveX as much as anyone but these issues cheese me off.

Rust has await as a postfix operator so this pattern works flawlessly when mixing async and sync code.

Yes, as a dev who started in Python and has done more JS recently, I’ve really come to believe that the chaining of in-line functions is a better approach in the end. Python theoretically has the advantage of working with lists or iterators, but in practice, all you really use are lists.

> all you really use are lists.

The more experience you get with Python, the less you use lists specifically.

In fact, you can spot people getting comfortable with the language when they start importing itertools, unless they are data scientists, of course.

To be fair many tools use iterator types that are not lists like Django querysets.

I feel the nice thing in Python is generator expressions: for lots of code you can decrease the memory footprint and speed things up through easy lazy evaluation.
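A quick sketch of that tradeoff (exact getsizeof numbers vary by platform):

```python
import sys

nums = range(100_000)

squares_list = [n * n for n in nums]  # materializes 100k ints up front
squares_gen = (n * n for n in nums)   # lazy: yields values on demand

# The generator object stays tiny no matter how much it will yield.
assert sys.getsizeof(squares_gen) < sys.getsizeof(squares_list)

total = sum(squares_gen)  # consumed one element at a time
```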

I do appreciate the clarity of function chaining, and I think Rust made particularly good decisions around iteration in that way.

Function chaining ideally needs good ways to break iteration, skip elements, and so on, and JS isn't very feature-rich beyond the basic map, reduce, and forEach. It feels very incomplete compared to a lot of languages that offer that stuff; hence the popularity of underscore and then lodash.

If you’re doing a list comprehension with a Django queryset, you’re risking having a million accidental DB requests

> I'm surprised comprehensions in particular haven't spread to more modern languages.

Comprehensions become unreadable quite quickly - even simple ones have to be read inside-out, and complex (especially nested) ones are much worse.

Most modern languages seem to prefer functional-style methods like `map`, `filter` etc, which are more readable in all but the simplest cases.
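A side-by-side with invented data, for comparison:

```python
nums = [3, 14, 15, 9, 26]

# Comprehension style:
evens_squared = [n * n for n in nums if n % 2 == 0]

# Functional-method style (Python's map/filter standing in for the
# fluent .filter().map() chains of other languages):
evens_squared_fn = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, nums)))

assert evens_squared == evens_squared_fn
```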

> which are more readable in all but the simplest cases.

Except in practice, the vast majority of comprehensions are simple.

It doesn't really help that the Go developers are still stuck in the early '80s.

I hope this succeeds and forces some changes.

I'm here to tell you that comprehension in this form is not everyone's thing. Scala's comprehension is more generic: it's actually monad binding. Comprehensions as seen in Python are only readable for simple computations, so you end up having to use map/reduce sometimes. That makes two ways to do the same thing, and makes the "zen of Python" a lie in this instance.

I love the idea of main-less Go scripts for times where you just want to do something simple, fast.

Neugram [1] aimed to make Go a better scripting tool, but unfortunately it seems the project is dead. Although Go+ says its focus is on data science, I think it could fill this niche too.

By the way: does Go+ have shebang support?

1: https://github.com/neugram/ng

Go doesn't need to have shebang support; the kernel will provide that. But for the record, it will work:

        frodo ~ $ cat t.go 
        //opt/go/bin/go run $0 $@ ; exit
        package main
        import( "fmt" )

        func main() {
           fmt.Printf("Hello, world\n" );
        }

        frodo ~ $ ./t.go
        Hello, world
Otherwise there are a bunch of go-interpreters out there, which can be used for adhoc scripting. I'm sure that in some circumstances they can be useful, but I've only used them for providing extensions / scripting to host-applications, rather than trying to use them interactively.

I’m specifically asking about Go+ (not Go) “having” shebang support in that you can legally use the standard #! shebang. Yes, the kernel handles the shebang, but if the shebang target doesn’t ignore the line and errors then that’s where the problem lies.

The workarounds mentioned by you and the other commenter’s linked article are sub-optimal in that they require a system-wide modification, require wrapper scripts, or are non-standard hacks.

Not main-less but you can run go scripts using a shebang with some elbow grease. https://blog.cloudflare.com/using-go-as-a-scripting-language...

> I love the idea of main-less Go scripts for times where you just want to do something simple, fast.

What do you mean by a main-less script? And what would be "simple and fast" about it? If I want to try out something fast, I just do everything in a main.go and run it with "go run main.go", and that works well as a scripting language.

> What do you mean by a main-less script?

probably something that's common in "scripting" languages, where you don't have to wrap your code in main(), it just executes from the top:


  func main() {
      // your code here
  }

I have a feeling languages like this have their niche cases; however, I'm not seeing a good reason why this is needed compared to Python for data science. Maybe I'm missing something, but this is essentially a wrapper for Golang code to make it feel more like Python.

Agreed, but it might be useful for a full stack data scientist that is forced to work in a Go systems environment.

That's why Python+PyData has had so much success. There are packages to support data science, but the language itself can also be used to implement a system, so integration is rather seamless. That's not true for, say, R.

I guess it depends on what you’re trying to accomplish (I’ve worked heavily with both R and Python).

If you’re trying to create ETL pipelines that integrate with BigQuery, Mongo, or whatever other database, I think it’s fair to say that the Python packages are generally better documented than their R counterparts.

For most other things, IMO it’s hard to really separate the two languages. Is standing up a Flask API really easier than in plumber?

For dashboarding, it is as quick (if not much quicker) to create a decent prototype with Shiny vs Plotly Dash or Bokeh.

For simple linear and logistic model training, R’s built-in stats package has much more interpretable output than sklearn, and directly inspired statsmodels. Wes McKinney has acknowledged that pandas draws heavily from R’s native data frame. And so on and so on.


Also forgot to mention that with R packages like reticulate, you can also directly run Python code within an R environment now. So if there happens to be some Python package that doesn’t have an R equivalent, you can still work in R (though I’ve found the opposite situation to be far more common).

I use Python and R for data science, and I've never had any issue with R. In fact, I find that many tasks are much simpler in R than in Python.

I am referring to using R to build systems. It's not common.

What would you consider a system? Python definitely has more market share than R, but there's still name brand companies of various sizes that use an R stack for data science.

RStudio lists dozens of example clients here: https://rstudio.com/about/customer-stories/.

Use cases include collaborative model development, EDA tools, dashboarding, printed report generation (PDFs and HTML), public facing websites, etc.

> never had any issue

The official GitHub repo of Go+ is hosted in the team repo of a PaaS company called Qiniu, with their founder & CEO as the top contributor[1][2]. He is a Golang enthusiast[3], and my guess is that this is probably his favorite pet project ;)

[1]: https://github.com/qiniu/goplus/graphs/contributors

[2]: https://github.com/xushiwei

[3]: https://twitter.com/xushiwei

We introduce rational numbers as native Go+ types. We use the -r suffix for a rational constant. For example, (1r << 200) means a big int whose value is equal to 2^200. And 4/5r means the rational constant 4/5.

Nice! Coming from Python, I miss being able to use the normal arithmetic operators on bigints in Go.
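For reference, the Python behavior being missed, with fractions.Fraction as a rough analogue of Go+'s r suffix:

```python
from fractions import Fraction

big = (1 << 200) + 1    # Python ints are arbitrary precision;
                        # ordinary operators just work on them

ratio = Fraction(4, 5)  # roughly analogous to Go+'s 4/5r literal
fifth = Fraction(1, 5)
one = ratio + fifth     # exact rational arithmetic
```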

I'm the founder of the Go+ project; thanks for your attention. We take this project very seriously. It will get weekly releases until version 1.0 is reached.

Thank you so much for this! I submitted it here because I stumbled across it and was really surprised that it hadn't been on HN yet! If we somehow get Go+ in a Jupyter Kernel that would be awesome!

I feel where they're trying to go with this; I wrote https://github.com/aunum/gold in Go because of all the nice parts of Go.

This solves some of the pain points but is still fatally flawed, as is any other Go ML tool, in that it can’t accelerate due to the Go<->C FFI.

Until that issue is resolved Go simply won’t be broadly accepted in data science.

Would you please explain?

So, due to how Go handles memory, it needs to do something called a "trampoline" over to the C stack. This gives Go->C calls a substantial overhead of 70-200 ns/op; in most other languages that value is closer to 1 ns. This means that every call to a GPU suffers this latency.

This isn't too much of a problem if you're doing all supervised learning with batch operations, because the speedup of a GPU on a bigger operation outweighs the FFI latency.

However, it's a problem that doesn't appear to have a solution, due to Go's memory-management choices, and it will hamper Go ever being used for accelerated computing problems. This is one of the reasons Rust moved to ownership rules.

You can read a bit more at https://dave.cheney.net/2016/01/18/cgo-is-not-go

Nice improvement, but given the community's reception of this kind of change, why not just use Julia, OCaml, or F#?

This is very cool! I wonder what the chances of this being pulled into Go proper is, and whether the lack of cgo is an intractable issue, which would rule it out entirely.

The error handing is similar enough to a go proposal that was rejected: https://github.com/qiniu/goplus/wiki/Error-Handling

    foo()? // for goplus
    try(foo()) // the go proposal

Why not gosci? Anything with "plus" in the name is awful to Google. Then again, "go" is similarly awful to Google, so I guess you're in good company.

I'm not a Go user, but is the important idea here that it can be run from a scriptingish environment like Jupyter or something?

It's significant syntactic sugar that brings it closer to Python, but no notebook support from what I can see.

Do any notebooks support a compiled language?

There is support for many languages:

Haskell: https://github.com/gibiansky/IHaskell

Rust: https://github.com/google/evcxr

https://github.com/jupyter/jupyter/wiki/Jupyter-kernels will give you a more detailed list

There is a Go kernel for jupyter/nteract.


R notebooks support a bunch. I count about 41 languages in my install, including Go, C++, Fortran and Julia, as well as Python, SQL, Bash, etc.

Absolutely, see eg: BeakerX [1]

[1] http://beakerx.com/

The Jupyter project has a kernel called Xeus for C++.

Go+ is planned to support Jupyter and nteract.

Is there any talk on changing garbage collection from standard go? Is that needed in the data science world?

Surprised to see my former employer on HN lol. Better than qiniulang in many ways.

Can we try our best not to fork languages? Someone in 8 years is going to have to refactor a bunch of this GoPlus code to be compatible with Go 2 or Go 3, and it’s going to suck.

We already have Python and Scala

You're barking up the wrong tree.

I love it!

It will end up being like Google plus

This project is not owned by Google.
