GoPlus – The Go+ language for data science (github.com/qiniu)
143 points by dx034 10 months ago | hide | past | favorite | 61 comments

IMHO these conveniences should just be in the language. List comprehensions, dictionary comprehensions, and short-hand for literals are the kind of syntactic sugar that have almost no downsides, save you some RSI, and make code more readable. I'm surprised comprehensions in particular haven't spread to more modern languages.

> I'm surprised comprehensions in particular haven't spread to more modern languages.

I've come to dislike list comprehensions. Simple ones are OK, but they don't handle incremental complexity well. Invariably, code slowly becomes unreadable because nobody wants to rewrite the list comprehension when stuffing one more little tweak into it will do the job.

I much prefer object-functional chaining style ala Scala, Groovy, Ruby etc:

      some_giant_list
      .findAll { it.foo > 20 }
      .groupBy { it.bar }
      .countBy { it.value.size() > 5 }
They scale much better over time as new constructs get inserted as functional additions in the sequential pipeline.

This assumes some_giant_list is an object that has "findAll", that "findAll" returns an object that has "groupBy", which returns an object that has "countBy".

Looking at that code, I don't even know what the intermediary objects are, but given the names of the operations, they can't be all flat lists.

Besides, you don't have to put your list comprehension inline. It is often more readable to make it span several lines, especially if it's a complex one:

    banned_ip = {
        con.ip
        for con in connections
        if con.ip in blacklist
        and con.type == "internal"
    }
This is exactly how I write mine; I find it much more readable personally.

The real challenge I've encountered is that you can typically only use one expression. If I need to write a slightly more complex mapping, I am forced to write a function, which I normally define just before the comprehension. Even though this works, it introduces boilerplate I'd rather not write.

I will admit this doesn't happen often, but it happens enough to bother me.
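A small, hypothetical sketch of that boilerplate; the helper name, record shape, and data are invented for the example:

```python
# A mapping too involved for a single expression, so a helper
# function is defined just before the comprehension that uses it.
def normalize(record):
    name = record["name"].strip().title()
    return {"name": name, "length": len(name)}

records = [{"name": "  ada lovelace "}, {"name": "alan turing"}]
cleaned = [normalize(r) for r in records]
```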

> This assumes some_giant_list is an object that has "findAll", that "findAll" returns and object that has "groupBy", which returns an object that has "countBy".

> Looking at that code, I don't even know what the intermediary objects are, but given the names of the operations, they can't be all flat lists.

Regarding this, I would say: sure, but does that really matter? Most IDEs will give you enough inference to list the operations you can perform and the return types they give. If anything, the "super collections" make a developer's life far easier. I think Kotlin does a particularly good job of this with a vast set of extension functions. Have a scroll through this documentation: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collecti... (mapIndexedNotNull is a great example).

Functional chaining (aka “fluent interfaces”) is very bad for dependency injection / mocking.

In f(g(x)) you can directly access or patch f and g as top level names.

In x.g().f() you have to patch methods internal to other structures, and this can lead to problems if the patching should only happen in a certain local scope.

I encountered this recently with differences between pathlib.Path and os in Python.


    def my_mkdir(pathname):
        pathlib.Path(pathname).mkdir()
        # or
        os.mkdir(pathname)
From the point of view of testing and decomposition, the second option is much nicer, because I can patch os.mkdir directly, rather than patching pathlib.Path.mkdir (and also worrying about controlling when instance creation happens so the patch applies when I need it, while letting other pathlib.Path objects my tests interact with be constructed normally). mkdir is even a very simple example, since pathlib.Path.mkdir is an instance method but only relies on the string data the instance holds. Imagine how much harder it would be if pathlib.Path.mkdir had complex interactions with the internal object structure or other instance methods.

Obviously you _can_ solve it either way, but the fluent interface does nothing except require more code.
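A minimal sketch of the testing difference, assuming a hypothetical wrapper my_mkdir that calls the module-level function:

```python
import os
from unittest import mock

def my_mkdir(pathname):
    os.mkdir(pathname)

# Patching the top-level name os.mkdir is one line, and the patch is
# scoped to the with-block; no pathlib.Path internals are touched and
# no directory is actually created.
with mock.patch("os.mkdir") as fake_mkdir:
    my_mkdir("/tmp/example")
    fake_mkdir.assert_called_once_with("/tmp/example")
```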

On balance, I think list comprehensions (or just fmap, which is all a comprehension is) are much, much better than chaining.

Also if you want to operate on data structures just operate on them with module functions.

I think pandas really messes up on this.

    agg(groupby(df, cols), funcs)
is way better than

    df.groupby(cols).agg(funcs)
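For concreteness, a sketch of the contrast with toy data; note that pandas only ships the method-chaining form, so the free-function spelling is how the comment wishes it looked, not an actual pandas API:

```python
import pandas as pd

df = pd.DataFrame({"team": ["a", "a", "b"], "score": [1, 2, 5]})

# What pandas actually provides: methods chained on the DataFrame.
chained = df.groupby("team").agg({"score": "sum"})

# agg(groupby(df, cols), funcs) would need free functions that
# pandas does not export; it is shown in the comment as the ideal.
```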
A very valuable insight, thank you.

Obviously bad practice can ruin any good thing, but I much prefer list comprehensions for smaller operations and will refactor them as they become unwieldy. If you're in an environment where "nobody really wants to" rewrite stuff that's becoming hard to work with (for whatever reason) it probably doesn't matter which language feature dooms you.

They're great until you find something like this in code:

[x for y in z for x in zz if y in x]

Then you begin to wonder how great they really are.

To me that looks super easy to read. You just go left to right.

    for y in z:
        for x in zz:
            if y in x:
                yield x
It’s extremely easy to sight read. If the lines become long, just split them up and it’s even more obvious.

     [x
     for y in z
     for x in zz
     if y in x]
There’s nothing tricky about multiple loops and conditions in comprehensions in most languages that support them. Just go left to right and it mirrors outer to inner loops/conditionals.
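A runnable version of the equivalence, with made-up data:

```python
z = ["ab", "cd"]
zz = ["abc", "xyz", "cde"]

# Comprehension: read left to right, outer loop first.
flat = [x for y in z for x in zz if y in x]

# The same thing as explicit nested loops:
out = []
for y in z:
    for x in zz:
        if y in x:
            out.append(x)

assert flat == out
```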


Also, most data scientists presumably have some advanced math background so it's akin to reading set-builder notation (albeit with listcomps the sets are sometimes nested). Your listcomp reads:

{x : y ∈ z, x ∈ zz | y ∈ x}

If you've read enough statistics/ML papers, this becomes second nature.

> If you've read enough statistics/ML papers, this becomes second nature.

As someone who breathes that style of notation, there is excellent support in vim and emacs for custom display for readability. Conceal syntax in vim¹(builtin for rust/help/$few_others and external such as vim-cute-python²), and pretty-mode³ in emacs. I'm sure similar things exist in other editors too, but I don't use those ;)

You don't get the exact representation in your example with the things I've mentioned, but if you're used to a more "mathy"-style they really are quite nice. YM[and taste]MV.

1. http://vimdoc.sourceforge.net/htmldoc/syntax.html#conceal

2. https://github.com/ehamberg/vim-cute-python (moresymbols branch for er… more symbols)

3. https://github.com/akatov/pretty-mode

Interesting. I didn't know that!

This is only delightful when you are doing synchronous computation or you have async/await in the language. Until Loom is available, using this same pattern with CompletableFuture breaks in the face of try-with-resources.

Also, I don't think this works in Python with await, since await is a prefix keyword: in your example, .groupBy would be called on the Future rather than its result, so it won't work.

I love ReactiveX as much as anyone but these issues cheese me off.

Rust has await as a postfix operator so this pattern works flawlessly when mixing async and sync code.

Yes, as a dev who started in Python and has done more JS recently, I’ve really come to believe that the chaining of in-line functions is a better approach in the end. Python theoretically has the advantage of working with lists or iterators, but in practice, all you really use are lists.

> all you really use are lists.

The more experience you get with Python, the less you use lists specifically.

In fact, you can spot people getting comfortable with the language when they start importing itertools, unless they are data scientists, of course.

To be fair many tools use iterator types that are not lists like Django querysets.

I feel the nice thing in Python is generator expressions: for lots of code you can decrease the memory footprint and speed things up through easy lazy evaluation.
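A quick sketch of that tradeoff (exact getsizeof numbers vary by platform):

```python
import sys

nums = range(100_000)

squares_list = [n * n for n in nums]  # materializes 100k ints up front
squares_gen = (n * n for n in nums)   # lazy: yields values on demand

# The generator object stays tiny no matter how much it will yield.
assert sys.getsizeof(squares_gen) < sys.getsizeof(squares_list)

total = sum(squares_gen)  # consumed one element at a time
```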

I do appreciate the clarity of function chaining, and I think Rust made particularly good decisions around iteration in that way.

Function chaining ideally needs good ways to break iteration, skip elements, and so on, and JS isn't very feature-rich beyond the basic map, reduce, and forEach. It feels very incomplete compared to a lot of languages that offer that stuff; hence the popularity of underscore and then lodash.

If you’re doing a list comprehension with a Django queryset, you’re risking having a million accidental DB requests

> I'm surprised comprehensions in particular haven't spread to more modern languages.

Comprehensions become unreadable quite quickly - even simple ones have to be read inside-out, and complex (especially nested) ones are much worse.

Most modern languages seem to prefer functional-style methods like `map`, `filter` etc, which are more readable in all but the simplest cases.
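A side-by-side with invented data, for comparison:

```python
nums = [3, 14, 15, 9, 26]

# Comprehension style:
evens_squared = [n * n for n in nums if n % 2 == 0]

# Functional-method style (Python's map/filter standing in for the
# fluent .filter().map() chains of other languages):
evens_squared_fn = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, nums)))

assert evens_squared == evens_squared_fn
```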

> which are more readable in all but the simplest cases.

Except in practice, the vast majority of comprehensions are simple.

It doesn't really help that the Go developers are still stuck in the early '80s.

I hope this succeeds and forces some changes.

I'm here to tell you that comprehension in this form is not everyone's thing. Scala's comprehension is more generic: it's actually monad binding. Comprehensions as seen in Python are only readable for simple computations, so you end up having to use map/reduce sometimes. That makes two ways to do the same thing, and makes the "zen of Python" a lie in this instance.

I love the idea of main-less Go scripts for times where you just want to do something simple, fast.

Neugram [1] aimed to make Go a better scripting tool, but unfortunately it seems the project is dead. Although Go+ says its focus is on data science, I think it could fill this niche too.

By the way: does Go+ have shebang support?

1: https://github.com/neugram/ng

Go doesn't need to have shebang support; the kernel will provide that. But for the record, it will work:

        frodo ~ $ cat t.go 
        //opt/go/bin/go run $0 $@ ; exit
        package main
        import( "fmt" )

        func main() {
           fmt.Printf("Hello, world\n" );
        }

        frodo ~ $ ./t.go
        Hello, world
Otherwise there are a bunch of go-interpreters out there, which can be used for adhoc scripting. I'm sure that in some circumstances they can be useful, but I've only used them for providing extensions / scripting to host-applications, rather than trying to use them interactively.

I’m specifically asking about Go+ (not Go) “having” shebang support in that you can legally use the standard #! shebang. Yes, the kernel handles the shebang, but if the shebang target doesn’t ignore the line and errors then that’s where the problem lies.

The workarounds mentioned by you and the other commenter’s linked article are sub-optimal in that they require a system-wide modification, require wrapper scripts, or are non-standard hacks.

Not main-less but you can run go scripts using a shebang with some elbow grease. https://blog.cloudflare.com/using-go-as-a-scripting-language...

> I love the idea of main-less Go scripts for times where you just want to do something simple, fast.

What do you mean by a main-less script? And what would be "simple and fast" about it? If I want to try out something fast, I just do everything in a main.go and run it with "go run main.go", and that works well as a scripting language.

> What do you mean by a main-less script?

probably something that's common in "scripting" languages, where you don't have to wrap your code in main(), it just executes from the top:


  func main() {
      // your code here
  }

I have a feeling languages like this have their niche cases; however, I'm not seeing a good reason why this is needed compared to Python for data science. Maybe I'm missing something, but this is essentially a wrapper for Golang code to make it feel more like Python.

Agreed, but it might be useful for a full stack data scientist that is forced to work in a Go systems environment.

That's why Python+PyData has had so much success. There are packages to support data science, but the language itself can also be used to implement a system, so integration is rather seamless. That's not true for, say, R.

I guess it depends on what you’re trying to accomplish (I’ve worked heavily with both R and Python).

If you’re trying to create ETL pipelines that integrate with BigQuery, Mongo, or whatever other database, I think it’s fair to say that the Python packages are generally better documented than their R counterparts.

For most other things, IMO it’s hard to really separate the two languages. Is standing up a Flask API really easier than in plumber?

For dashboarding, it is as quick (if not much quicker) to create a decent prototype with Shiny vs Plotly Dash or Bokeh.

For simple linear and logistic model training, R’s built-in stats package has much more interpretable output than sklearn, and directly inspired statsmodels. Wes McKinney has acknowledged that pandas draws heavily from R’s native data frame. And so on and so on.


Also forgot to mention that with R packages like reticulate, you can also directly run Python code within an R environment now. So if there happens to be some Python package that doesn’t have an R equivalent, you can still work in R (though I’ve found the opposite situation to be far more common).

I use Python and R for data science, and I've never had any issue with R. In fact, I find that many tasks are much simpler in R than in Python.

I am referring to using R to build systems. It's not common.

What would you consider a system? Python definitely has more market share than R, but there's still name brand companies of various sizes that use an R stack for data science.

RStudio lists dozens of example clients here: https://rstudio.com/about/customer-stories/.

Use cases include collaborative model development, EDA tools, dashboarding, printed report generation (PDFs and HTML), public facing websites, etc.

> never had any issue

The official GitHub repo of Go+ is hosted in the team repo of a PaaS company called Qiniu, with their founder & CEO as the top contributor[1][2]. He is a Golang enthusiast[3], and my guess is that this is probably his favorite pet project ;)

[1]: https://github.com/qiniu/goplus/graphs/contributors

[2]: https://github.com/xushiwei

[3]: https://twitter.com/xushiwei

We introduce rational numbers as native Go+ types. We use the -r suffix for a rational constant. For example, (1r << 200) means a big int whose value is equal to 2^200. And 4/5r means the rational constant 4/5.

Nice! Coming from Python, I miss being able to use the normal arithmetic operators on bigints in Go.
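For reference, the Python behavior being missed, with fractions.Fraction as a rough analogue of Go+'s r suffix:

```python
from fractions import Fraction

big = (1 << 200) + 1    # Python ints are arbitrary precision;
                        # ordinary operators just work on them

ratio = Fraction(4, 5)  # roughly analogous to Go+'s 4/5r literal
fifth = Fraction(1, 5)
one = ratio + fifth     # exact rational arithmetic
```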

I'm the founder of the Go+ project; thanks for your attention. We take this project very seriously. It will get weekly releases until version 1.0 is reached.

Thank you so much for this! I submitted it here because I stumbled across it and was really surprised that it hadn't been on HN yet! If we somehow get Go+ in a Jupyter Kernel that would be awesome!

I feel where they're trying to go with this; I wrote https://github.com/aunum/gold in Go because of all the nice parts of Go.

This solves some of the pain points but is still fatally flawed, as is any other Go ML tool, in that it can’t accelerate due to the Go<->C FFI.

Until that issue is resolved Go simply won’t be broadly accepted in data science.

Would you please explain?

So, due to how Go handles memory, it needs to do something called a "trampoline" over to the C stack. This gives Go->C calls a substantial overhead of 70-200 ns/op; in most other languages that value is closer to 1 ns. This means that every call to a GPU suffers this latency.

This isn't too much of a problem if you're doing all supervised learning with batch operations, because the speedup of a GPU on a bigger operation outweighs the FFI latency.

However, it's a problem that doesn't appear to have a solution, due to Go's memory-management choices, and it will hamper Go ever being used for accelerated computing problems. This is one of the reasons Rust moved to ownership rules.

You can read a bit more at https://dave.cheney.net/2016/01/18/cgo-is-not-go

Nice improvement, but given the community's reception of this kind of change, why not just use Julia, OCaml, or F#?

This is very cool! I wonder what the chances of this being pulled into Go proper is, and whether the lack of cgo is an intractable issue, which would rule it out entirely.

The error handing is similar enough to a go proposal that was rejected: https://github.com/qiniu/goplus/wiki/Error-Handling

    foo()? // for goplus
    try(foo()) // the go proposal

Why not gosci? Anything with "plus" in the name is awful to Google. Then again, "go" is similarly awful to Google, so I guess you're in good company.

I'm not a Go user, but is the important idea here that it can be run from a scriptingish environment like Jupyter or something?

It's significant syntactic sugar that brings it closer to Python, but no notebook support from what I can see.

Do any notebooks support a compiled language?

There is support for many languages:

Haskell: https://github.com/gibiansky/IHaskell

Rust: https://github.com/google/evcxr

https://github.com/jupyter/jupyter/wiki/Jupyter-kernels will give you a more detailed list

There is a Go kernel for jupyter/nteract.


R notebooks support a bunch. I count about 41 languages in my install, including Go, C++, Fortran and Julia, as well as Python, SQL, Bash, etc.

Absolutely, see eg: BeakerX [1]

[1] http://beakerx.com/

The Jupyter project has a kernel called Xeus for C++.

Go+ is planned to support Jupyter and nteract.

Is there any talk on changing garbage collection from standard go? Is that needed in the data science world?

Surprised to see my former employer on HN lol. Better than qiniulang in many ways.

Can we try our best not to fork languages? Someone in 8 years is going to have to refactor a bunch of this GoPlus code to be compatible with Go 2 or Go 3, and it’s going to suck.

We already have Python and Scala

You're barking up the wrong tree.

I love it!

It will end up being like Google plus

This project is not owned by Google.
