But Nim is only one of a whole suite of languages that easily cruise to a 10x performance win over Python. And that isn't counting multicore - if you count that you quickly get to a 100x improvement.
Personally I use Groovy for much of what I do for similar reasons (which is somewhat unusual), but it's just a placeholder for "use anything except Python".
From my experience in using Python at my last job, I'll also add that Python is decent at tasks that aren't CPU-bound.
I wrote a lot of scripts that polled a large number of network devices for information and then did something with it (typically upsert the data into a database, either via direct SQL or a REST API to whatever service owns the database). All these tasks were heavily network-bound. The amount of time the CPU was doing any work was minuscule compared to the amount of time it was waiting to get data back from the network. I doubt Nim or any other language would have been a significant performance improvement in this case.
For what it's worth, that made these scripts excellent candidates for multithreading. I'd run them with 20+ threads, and it was glorious. At first I did multiprocessing, because of all the GIL horror stories, but multiprocessing made it very difficult to cache data, so eventually I said "well, all this is network-bound so the GIL doesn't even apply" and switched over to multiprocessing.dummy (which implements pools using the same API as multiprocessing but with threads instead of processes), and I never looked back.
Edit: For what it's worth, Nim sounds like a really cool language, and it's right up my alley in several ways, I just don't think Python is particularly slow at network-bound tasks that use very little CPU.
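For the curious, the switch is tiny in code terms. A minimal sketch of that pattern, with a hypothetical fetch_device function standing in for the real polling logic:

    import urllib.request
    from multiprocessing.dummy import Pool  # same API as multiprocessing, but thread-backed

    def fetch_device(host):
        # stand-in for the real network call (SNMP, REST, SSH, ...)
        with urllib.request.urlopen(f"http://{host}/status") as resp:
            return resp.read()

    hosts = ["device1.example.com", "device2.example.com"]  # hypothetical inventory
    with Pool(20) as pool:  # 20 threads is fine here: the work is network-bound
        results = pool.map(fetch_device, hosts)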
And suddenly you need to introduce quite a bit more technical complexity into this story that's gonna be hard to explain to management - all they see is that you can now insert a couple of million DB rows, and their Big Data consultants[TM] told them that this is nowadays not even worth thinking about.
Point being: If your performance ceiling is low, you're gonna hit it sooner.
My team and I have been using Python for web, scripting, and ETL development since 2007. I don't recall the last time Python wasn't "fast enough" for anything I needed to do. I'm sure it's legitimately too slow for plenty of use cases and classes of programming domains. But for a general purpose language that makes our developers incredibly productive (which is what we optimize for organizationally), I've not "often" found the point where it becomes bothersome. On the contrary, I'd say I've rarely found it. And in those cases, the workaround to make it fast enough is there.
That's not to dispute your experience. I just want to provide a counterexample.
Python developers are easier to hire (certainly than Nim), have a monolanguage experience (meaning less onboarding), but are not magically more productive. Developer availability is still a great reason to choose Python.
And that isn't the only culprit. Large code bases will be hard to structure, maintain and organize. And you will spend more time writing tests than writing productive code, because you don't have many defenses against errors and because once a bug hits production it will be very difficult to debug.
So for me it's really performance being the main thing to think about with Python. If one anticipates tighter latency or throughput requirements, I'd go for something else.
IO-bound tasks are almost by definition outside of your Python application's control. You yield control to the system to execute the actual task, and from that point on - you're no longer in control of how long the task will take to complete.
In other words, Python "being fast" by waiting on a socket to complete receiving data isn't a particularly impressive feat.
But as demonstrated, Nim is fast to write and fast to compile, so Python has little edge. Just its huge ecosystem.
Self-contradiction at its best. Kindly be reminded that the same advantage was the only thing that kept Java alive for so long, until it finally started to enter the 21st century a couple of years ago.
You just can't discount an ecosystem, especially if it's huge.
E.g. random example:
Sprinkle some cdefs in your Python and suddenly you're faster than C++.
25.8 seconds down to 1.5
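For reference, the "sprinkle some cdefs" idea looks roughly like this - a minimal Cython sketch of a GC-counting loop, hypothetical and not the code behind those numbers:

    # gc_count.pyx - build with cythonize; a sketch, not the benchmarked code
    def gc_count(bytes seq):
        cdef Py_ssize_t i, n = len(seq)
        cdef long gc = 0
        cdef unsigned char c
        for i in range(n):
            c = seq[i]
            if c == 71 or c == 67:  # ord('G'), ord('C')
                gc += 1
        return gc

The cdef declarations let Cython compile the loop down to plain C instead of going through the Python object protocol on every iteration.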
Still, getting Java-level performance out of Python is a huge improvement and should be enough for most cases.
For a C++ comparison, Python would do much better pointing out the productivity advantage it has over the notoriously low productivity of C++ development, rather than competing on execution performance.
Some may consider JAX and its XLA compiler, but unless you require gradients, numba will be significantly faster; an instance of this is available here.
XLA operates at a higher level than LLVM and therefore can't achieve the same optimizations as numba does using the latter. IIRC numba also has a Python-to-CUDA compiler, which is also very impressive.
CPython's slowness doesn't boggle my mind at all. It's a bytecode interpreter for an incredibly dynamic language that states simplicity of implementation as a goal. I would say performance is actually pretty impressive considering all that. What _does_ boggle my mind is the performance of cutting-edge optimizing compilers like LLVM and V8!
At least there is a benefit to a simple implementation: Someone like me can dive into CPython's source and find out how things work.
No, Nim is truly among the top fastest languages when writing idiomatic code as shown in many benchmarks.
> But Nim is only one of a whole suite of languages that easily cruise to a 10x performance win over Python
...while also being very friendly to Python programmers, intuitive and expressive. Unlike many other languages.
Granted, but inside its optimised numerical science ecosystem, Python is, in fact, fast enough. If most of your program is calls into numpy, Python will get you where you need to go. In my experience, one scalar Python math operation takes about the same amount of time as the equivalent numpy operation on a million-element array. Linked against a recent libblas, numpy will even distribute work across multiple cores. So much for the GIL.
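A rough way to sanity-check that scalar-vs-vectorized claim on your own machine (just a sketch; absolute numbers will vary with hardware and BLAS build):

    import math
    import timeit
    import numpy as np

    xs = np.random.rand(1_000_000)
    t_scalar = timeit.timeit(lambda: math.sqrt(0.5), number=100_000) / 100_000
    t_vector = timeit.timeit(lambda: np.sqrt(xs), number=100) / 100
    print(f"per call - scalar sqrt: {t_scalar:.2e}s, numpy sqrt over 1M elements: {t_vector:.2e}s")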
Also, "awful" is too harsh. Probably 90% of Python code just doesn't need to be faster than it is.
Also I don't know how anyone could design a language in the 21st century and make basic mistakes like this:
> Nim treats identifiers as equal if they are the same after removing capitalization (except for the first letter) and underscore, which means that you can use whichever style you want.
If that's any indication of the sanity of the rest of Nim then I'd say steer well clear!
Nim's underlying, perhaps understated philosophy is that it lets you write code the way you want to write code. If you like snake case, use it. If you want camel case, sure. Write your code base how you want to write it, keep it internally consistent if you want, or don't. Nim doesn't really care.
(That philosophy extends far beyond naming conventions.)
Your code is inconsistent because someone else's code was inconsistent - that's simply not a problem in Nim.
Could Nim have forced everyone to snake_case naming structures for everything from the start? Well, sure, but then the people that have never actually written code in Nim would be whining about that convention instead and we'd be in the same place. After having actually used Nim, my opinion, and I would venture to say the opinion of most, is that its identity rules were a good decision for the developers who actually write Nim code.
This is a serious design flaw. It absolutely should be front and center when Nim is discussed.
> Nim's underlying, perhaps understated philosophy is that it lets you write code the way you want to write code. If you like snake case, use it. If you want camel case, sure. Write your code base how you want to write it, keep it internally consistent if you want, or don't. Nim doesn't really care.
I want to write code where "myFoo != my_foo". Evidently, Nim doesn't allow that, so this argument seems pretty hollow.
> Now you're stuck with screaming linters or random `# noqa` lines stuffed in your code, and that one variable that you're using from a library sticks out like a sore thumb.
This is because we have crappy linters. If a linter can't tell that an identifier is camel case because it is third party, it's a bad linter.
> Could Nim have forced everyone to snake_case naming structures for everything from the start?
This would have been preferable to this madness.
Spot on. I wrote plenty of Nim and the style-insensitivity is a feature and not a bug.
Not allowing the use of 3 different variables named userName, user_name and username only encourages readable and robust code.
Not entirely. Nim's benefit here is that it's superficially similar enough to Python that it's easy for people from that world to pick up and start using Nim.
> Also I don't know how anyone could design a language in the 21st century and make basic mistakes like this:
> If that's any indication of the sanity of the rest of Nim then I'd say steer well clear!
It may seem like a design mistake at first glance but it's surprisingly useful. Its intent is to allow a given codebase to maintain a consistent style (e.g. camel vs snake) even when making use of upstream libraries that use different styles. Excluding the first letter avoids most of the annoyance of wantonly mixing all-caps constants with lowercase names, and linters keep teams from mismatching internal styles. Though mostly I forget it's there, as most idiomatic Nim code sticks with camel case. I'd say not to knock it until you've tried it.
The rest of Nim’s design avoids many issues I consider actual blunders in a modern language such as Python’s treatment of if/else as statements rather than as expressions, and then adding things like the walrus operator etc to compensate.
That doesn't sound right at all. It sounds like a design choice aimed at achieving the exact opposite: inconsistency without any positive tradeoff in return.
> Not including the first letter avoids most of the annoyance of wantonly mixing all cap constants or lower case and linters avoid teams mismatching internal styles.
That does not sound right at all. At most, it sounds like the compiler does not throw errors when stumbling on what would otherwise be syntax errors, but you still have all the mismatches in internal styles and linters complaining about code and teams wasting time with internal pissing matches, and more importantly a way to foster completely futile nitpicking discussions within a language community.
This doesn't make sense. For an entirely new language you can just have the entire ecosystem use the same style, e.g. like Rust does. Or even Python!
I prefer snake_case too, but avoiding a language because of a minor convention like that seems weird.
With respect to the identifier resolution in Nim, it strikes me as more of a matter of preference. Especially given the universal function call syntax in Nim, at least it's consistent. For example, Nim treats "ATGCA".lowerCase() the same as lowercase("ATGCA"). I do appreciate the fact that you can use a chaining syntax instead of a nesting one when doing multiple function calls but this is also a matter of style more than substance.
* It makes searching for identifiers harder. For Nim you can't even use case insensitive search because of the underscore thing! Better practice your regexes.
* The case insensitivity rules are usually super complicated and don't apply to everything, so now it's an extra thing you have to mentally compute when coding. This is probably the biggest problem and I'm sure it has led to bugs, e.g. in SQL.
* Do you enjoy the tabs vs spaces debate? How about single vs double quotes? Ugly, inconsistently styled code? Well, you'll love this!
* Unicode case insensitivity is actually really really complicated (this mostly applies to filesystems).
You're basically opening yourself up to an array of annoyances and gotchas for essentially no benefits.
I've literally never seen anyone use two identifiers that only differ by case, but if that were actually a big problem it could be solved just by making that illegal. You don't have to resort to the insanity of case insensitivity.
Nim solved the tabs/spaces debate by allowing only spaces. Single and double quotes are also completely separate (chars vs strings), so there is no inconsistency there either. About "inconsistently styled code" - due to style insensitivity, the effects are not viral. If you depend on a library that uses `get_name` but your project adopted `getName()`, you don't need to suffer.
> Unicode case insensitivity is actually really really complicated
Which is why Nim only handles case insensitivity for ASCII identifiers and not Unicode. Which makes sense, because 99.9% of code is written in ASCII.
> The case insensitivity rules are usually super complicated and don't apply to everything
Quoting from the manual - "only the first letters are compared in a case-sensitive manner. Other letters are compared case-insensitively within the ASCII range, and underscores are ignored." Unicode is not handled in a style-insensitive manner, so the rule is pretty simple.
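In fact, the quoted rule fits in a couple of lines of Python (a sketch of the behavior as described, not code from the Nim compiler):

    def nim_ident_eq(a: str, b: str) -> bool:
        # first character compared as-is; the rest lowercased with underscores dropped
        norm = lambda s: s[0] + s[1:].replace("_", "").lower()
        return norm(a) == norm(b)

    assert nim_ident_eq("getName", "get_name")
    assert not nim_ident_eq("building", "Building")  # first letters differ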
> It makes searching for identifiers harder. For Nim you can't even use case insensitive search because of the underscore thing!
Technically true, but in reality this comes up so rarely, I didn't have any issues with this ever. Nim projects usually adopt camel/pascal case.
For some reason, people often assume that allowing style insensitivity instantly throws the whole language ecosystem in complete disarray and everyone starts writing code mixing every possible style of writing at once, swapping styles on every second identifier just for their amusement.
So it's only identifiers - what about keywords, compiler directives? And only ASCII characters... And it randomly doesn't affect the first character. Sure, very simple.
> If you depend on a library that uses `get_name` but your project adopted `getName()` your don't need to suffer.
Yeah because nobody ever imports code from other projects, copy/pastes from StackOverflow etc. /s
> Nim projects usually adopt camel/pascal case.
So why do you need case insensitivity??!
Trust me this is a decision they will regret.
If we are going to dismiss any nontrivial behavior as "sure very simple" then sure, having two distinct incompatible ways to write get_name (or getName) looks better.
It does not "randomly" not affect first character. Most style guides for camel/pascal case treat first character differently, with types being capitalized and regular variables starting in lowercase. Making whole identifier style-insensetive would significantly reduce number of possible names. So this rule makes 'building' and 'Building' different, but at the same time does not differentiate between 'get_name' and 'getName'.
> Yeah because nobody ever imports code from other projects, copy/pastes from StackOverflow etc. /s
I don't understand your point (even considering /s) - if I do import code from my other projects or copy-paste it from somewhere it simply doesn't matter to me what style they used - I'm not required to fix copied code, or adapt to the style my dependencies use.
> So why do you need case insensitivity??!
If their code gets into the larger ecosystem it will not affect anything else, or at least its effects will be minimized. It is a win-win solution - the library author uses their preferred style, I (and everyone else in the ecosystem) stick to the common convention, and everyone is happy.
> Trust me this is a decision they will regret.
I've been using Nim for multiple years now, and I have never seen anyone actually regret this. Of course, when new people come to the language they are sometimes really surprised by it, but they rarely complain about it in the long run.
EDIT: forgot to mention that Nim is a more-than-decade-old language, so the "decision they will regret" should have materialized years ago, yet that does not seem to be the case.
Not at all, unless you decide to mix different styles in the same codebase.
One of the big, big things for improving performance on DNA analysis of ANY kind is converting these large text files into binary (4 letters easily convert to a 2-bit encoding), which massively improves basically any analysis you’re trying to do.
Not only does it compress your dataset (2 bits vs 16 bits), it allows absurdly faster numerical libraries to be used in lieu of string methods.
There’s no real point in showing off that a compiled language is faster at doing something the slow way…
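As a sketch, 2-bit packing is only a few lines, assuming a pure A/C/G/T alphabet with no ambiguity codes:

    CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

    def pack(seq: str) -> int:
        packed = 0
        for base in seq:
            packed = (packed << 2) | CODE[base]  # 2 bits per base
        return packed

    print(bin(pack("GATTACA")))  # 0b10001111000100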
I’m surprised you need the full 4 bits to deal with ambiguous bases, but it probably makes sense at some lower level I don’t understand.
I'll also read through your links, thanks for posting them.
(As in GATTACA might be read as is, but might be read as GAT?ACA.)
Still, that's a minimum of 3 bits versus much longer.
[Edit: I see another commenter with the same observation, more thoroughly explained!]
Because we use it as a nice syntactic frontend to numpy, a large and highly optimized library written in C++ and Fortran (sic). That is, we actually don't use "Python-native" code much, and numpy is essentially an APL-like, array-oriented thing where e.g. you don't normally need loops.
For native-language data processing, Python is slow; Nim or Julia would easily outperform it, while being comparably ergonomic.
The funny thing is that Nim and Julia libraries are still wrapping the Fortran numerical libraries, while D beat the old and trusted Fortran library on its home turf five years back:
You say that, but Julia is rapidly acquiring native numerical libraries that outperform OpenBLAS:
For Nim, there’s also NimTorch which is interesting in that it builds on Nim’s C++ target to generate native PyTorch code. Even Python is technically a second class citizen for the C++ code. Most ML libraries are C++ all the way down.
So a stiff ODE solve is pure Julia, LU-factorizations and all. This is what allows it to outperform the common C and Fortran libraries very consistently. See https://benchmarks.sciml.ai/html/MultiLanguage/wrapper_packa... and https://benchmarks.sciml.ai/html/Bio/BCR.html
https://news.ycombinator.com/item?id=28506531 - this project allows creating pythonic bindings for your Nim libraries pretty easily, which can be useful if you still want to write most of your top-level code in Python but leverage Nim's speed when it matters.
If you want to make your Nim code even more "pythonic" there is https://github.com/Yardanico/nimpylib, and for calling Python code from Nim there is https://github.com/yglukhov/nimpy
However, in any case I would never replace Python with Nim, as it is too niche of a language and you would struggle with recruiting. I could consider Julia if its popularity keeps growing.
That is the ultimate challenge of a language. It either needs a large backer (Go and Google) or has to be so good that it gets natural market adoption (Julia). As a manager I am reluctant to adopt yet another language unless there is a healthy job market for it.
Not all technologies require the full cycle and the normal risk management.
with open("orthocoronavirinae.fasta") as f:
    text = ''.join(line.rstrip() for line in f.readlines() if not line.startswith('>'))
gc = text.count('G') + text.count('C')
total = len(text)
gc = 0
total = 0
with open("orthocoronavirinae.fasta") as f:
    for line in f.readlines():
        if not line.startswith('>'):
            line = line.rstrip()
            gc += line.count('C') + line.count('G')
            total += len(line)
Yes, you can implement a faster Python version, but notice also:
* This faster version reads the whole file into memory (except comment lines). The article mentions the data being 150MB, which should fit in memory, but for larger datasets this approach would be infeasible
* The faster version is actually delegating a lot of work to Python's C internals by using text.count('G'). All the internal looping and comparisons are done in C, while in the original version they go through Python
So yes, you can definitely write faster Python by delegating most of the work to C.
The point of the article is not about how to optimize Python, but about how, given almost identical implementations in Python and Nim, Nim can outperform Python by 1 or 2 orders of magnitude without resorting to C internals for basic things like looping or comparing characters.
To make it streaming, take the second version and remove the readlines (directly iterate over f).
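That is, something like this (the second version with the readlines() call dropped; an untested sketch):

    gc = 0
    total = 0
    with open("orthocoronavirinae.fasta") as f:
        for line in f:  # lazily yields one line at a time
            if not line.startswith('>'):
                line = line.rstrip()
                gc += line.count('C') + line.count('G')
                total += len(line)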
Delegating work to Python's C internals is fine IMO because "batteries included" is a key feature of Python. "Nim outperforms unidiomatic Python that deliberately ignores key language features" is perhaps true, but less flashy of a headline.
And to be honest, I mainly wrote this because the other top level Python implementations for this one were terrible at the time of the post.
import io

f = io.StringIO(...)  # file contents elided in the original comment
total = sum(map(lambda s: 0 if s.startswith(">") else s.count('G') + s.count('C'), f.readlines()))
Your first example takes 3.1 seconds, my previous comment takes 2.3 seconds, this one takes 1.4 seconds.
import time

start = time.perf_counter()
with open("orthocoronavirinae.fasta", "rb") as f:
    total = sum(map(lambda s: 0 if s.startswith(b">") else s.count(b"G") + s.count(b"C"), f.readlines()))
end = time.perf_counter()
print(total, " total")
print(end - start, " seconds")
In my use case, I don't really see how Nim would make my life easier right now.
The main places you find it the other way are spreadsheets and shells.
Is there an explanation from the Nim authors as to why they made such an odd choice?
I don't write code only for myself.
How would I convince my employer to let me use Nim instead of a better known language?
And even if I could convince my employer, if we want to start a new project, how could we find programmers well-versed in Nim?
And even if we can find those people, it would mean we would have to write many things ourselves which, in other languages, we can take for granted as they have libraries for almost anything.
So having a nice, performant and good language is just a small part of achieving your goals. You also need the people and the ecosystem.
Go, Rust, Kotlin, Swift and even Julia have the luck of having some industry heavyweights behind them, pushing the ecosystem and contributing with money and developers. Nim has only a bunch of passionate people behind it.
If a programmer can't pick up a language like Nim in a few weekends (from what I gather, it's similar to Python and not much different from most common languages, i.e. not something relatively exotic like Haskell) then I don't know. Our mainly PHP shop transitioned to Go quite effortlessly. Today we hire PHP juniors without any Go experience (easier to find), we teach them, and then they work on Go codebases already after a month of internship. So lack of "professional Nim programmers" doesn't look like a problem to me.
Lack of libraries is a good point but from what I read, Nim compiles to C, so I understand they have access to tens (hundreds?) of thousands of C libraries without writing everything from scratch.
However, indeed, if you have to choose between, for example, Nim and Go for a new project, then I am not sure why anyone would prefer Nim. I'm really interested to know.
Same here, curious to know what HN crowd recommends between Nim vs Go for new projects.
This makes it really easy to reduce boilerplate and create low- or 0-cost abstractions for your problem domain. Example of this done in a microcontroller project here:
Async/await is also implemented by metaprogramming, rather than as a "core" part of the language:
DSLs have some use cases but generally I'd avoid being too clever and creating basically entire sublanguages for the sake of it - unless I really need to. Most projects don't need it, it's harder to pick up for novices (especially if there's a zoo of DSLs, different between projects). In Go the last resort is usually code generation, there are built-in tools for it.
>Async/await is also implemented by metaprogramming, rather than as a "core" part of the language
I guess most programmers don't really care if a feature is implemented in the language or in the library, provided it's easy to use. In Go, it's not extendable however; but so far, I've never had a need to extend the default goroutine scheduler.
Go can be compiled to WASM, but practically I've never seen code that could be reused between backend/frontend because they solve different problems using different principles, so the idea of using one language for frontend/backend was never compelling to me. Maybe it's just me.
Hiring for Nim skills can be a signal that a company has people who learn languages beyond the run-of-the-mill ones. A bunch of passionate people you might say. That would make the company promising to work for.
Why the phrase "only a bunch of passionate people"? This is how software gets written, parasitical corporations and their unproductive developers who are installed in existing OSS projects come later and mainly associate themselves with the result (speaking of Python again).
This is a rephrasing of "nobody ever got fired for buying IBM".
Some organizations prioritize innovation and technical acumen.
> So having a nice, performant and good language is just a small part of achieving your goals. You also need the people and the ecosystem.
Many applications don't need a large ecosystem. People can learn.
> Go, Rust, Kotlin, Swift and even Julia have the luck of having some industry heavyweights behind them
Python was never corporate-driven, thankfully, and it is successful.
Nim's easy to learn if you have any experience with any compiled language and can understand anything along the lines of C#, Kotlin or Python syntax. Also, because it compiles to C and JS, it makes it easy to add it to a project incrementally in many cases.
The answer for the latter is programmer time, and some things can be scaled easily using `joblib` or `dask`. Now, it isn't as trivial as importing parallel iterators with Rust and changing `.into_iter` to `.into_par_iter`, but it still takes less time, and once it is done, I don't need to think about it again.
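For instance, a parallel loop with joblib is roughly this much code (process_item is a hypothetical stand-in for the real per-item work):

    from joblib import Parallel, delayed

    def process_item(x):
        return x * x  # stand-in for the real per-item computation

    # n_jobs=-1 uses all available cores
    results = Parallel(n_jobs=-1)(delayed(process_item)(x) for x in range(1000))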
That's horrifyingly slow for a compiler. The author mentioned "modern languages look like Python but run as fast as C", which is a common promise those languages make that never really materializes except for a few very happy-path cases they heavily optimised the language for. Julia, for example, makes this promise too, but compiles even slower than that and takes ridiculous amounts of RAM even for hello world.
Did the author post the data set they used for the examples? Would be nice to try it out on a few languages to see how fast that can compile and run on a mature language like Common Lisp (which is just as easy to write) or even node.js.
Nim's advantage is that it uses a good old C compiler for the backend (which has been hyperoptimized for decades), but the frontend (transpiler) is also pretty fast. Nim's compilation speed should improve a bit when incremental compilation support is added (which would probably solve a lot of other current issues for Nim, for example better IDE tooling)
Here's a comparison with Common Lisp:
~/fasta-dna $ time python3 run.py
~/fasta-dna $ time sbcl --script run.lisp
~/fasta-dna $ ls -al nc_045512.2.fasta
-rw-r--r-- 1 156095639 2021-09-25 11:15
So, almost as fast as Nim (the time includes compilation time)?
Here's the Common Lisp code:
(with-open-file (in "orthocoronavirinae.fasta")
  (loop with gc = 0 with total = 0
        for line = (read-line in nil)
        while line
        do (unless (or (zerop (length line)) (eql (char line 0) #\>))
             (loop for i from 0 below (length line)
                   for ch = (char line i)
                   do (setf total (1+ total))
                      (when (or (eql ch #\C) (eql ch #\G))
                        (setf gc (1+ gc)))))
        finally (format t "~f~%" (/ gc total))))
EDIT: compiling the Lisp code to FASL and annotating the types brings the total runtime to 2.0 seconds. Running it from source increases the time very slightly, to 2.08 seconds, showing how the SBCL compiler is incredibly fast. Taking 0.7 seconds to compile a few lines of code is crazy; imagine when your project grows to many thousands of lines.
The Lisp code still can't really match Nim, which is really C at runtime, in speed when excluding compile time, but if you need a scripting language, CL is great (especially when used with the REPL and SLIME).
Also, @benjamin-lee this version of the Nim program is a bit lower level, but probably much faster:
import memfiles as mf

var gc = 0
var total = 0
var f = mf.open("orthocoronavirinae.fasta")
for line in memSlices(f):
  let n = line.size
  let cs = cast[cstring](line.data)
  if n > 0 and cs[0] != '>': # skip comment lines
    for i in 0 ..< n:
      let letter = cs[i]
      if letter == 'C' or letter == 'G':
        gc += 1
      total += 1
echo(gc.float / total.float)
mf.close(f) # not really needed; process about to end
Alternatively, the exact file I used for the post is available for one week here with MD5 sum 3c33c3c4c2610f650c779291668450c9 . Anyone who wants the file is free to reach out to me directly (email is on site).
Last time I used it, I liked it but didn't use it long enough to have a strong opinion.
It's a compromise, but I always prioritise _my_ time over my computer's time, so if I can write something quickly and just go and get a coffee while it runs - I will do that. I won't spend twice as long writing a single-run script just because it'll finish before the kettle has boiled.
Static types help for basic data munging when you haven't touched a script for months and need to get back up to speed and make tweaks.
It’s a shame because I think Nim has some neat features that allow it to present as a serious competitor to Rust but it will ultimately have to compete against Python instead to secure its niche.
Well I believe there's room between Rust and Python where Nim can grow. It made the TIOBE top 50 lately even. Likely it can eat enough market share from the edges of both Rust & Python to become more well known (more libs, tools, etc).
Rust is fantastic but tedious to program in (to me at least), and its community focuses on more formal type traits etc., making "scripting" trickier. Python is great for a mix of quick scripts, web dev, and data science, but it's slow enough (and getting complex enough!) for many to want something faster and more stable yet still easy to write. Nim lives in between them and is more enjoyable to write than either for many. Also, Nim _could_ add Rust as a backend target and be relevant even if Rust displaces C/C++. ;)
Nim is great for embedded systems too! I've been using it a fair bit and it's really nice. There's a lot of room to grow in that field.
So is Golang.
My point, which apparently wasn't evident enough, is that you can get most of the benefits by doing nothing - just trying a different Python implementation, without the hassle of learning a niche language, as easy as it might be.
BTW, if you take into account compilation times the difference is even more meager, and in all fairness the PyPy warmup period should have been discounted.
The general guideline has always been that Python is ideal for glue code and non-performance-critical code, and when performance became an issue then Python code would simply be used as glue code to invoke specialized libraries. Perhaps the most popular example of this approach is numpy, which uses BLAS and LAPACK internally to handle linear algebra stuff.
This Nim advertisement sounds awfully desperate with the way it resorts to what feels like a poorly assembled strawman, while giving absolutely nothing in return.
> A nice feature of the lines function is that it automatically strips newline characters such as LF and CRLF so we no longer need to do line.rstrip().
Python has never been one of my favorite languages, but with easy support in Google Colab, AWS SageMaker, etc., as well as most of my professional deep learning work using TensorFlow + Keras, Python is a go-to language for me. If you want a Lisp syntax on top of Python, you can try Hy (and get a free copy of my Hy book at https://leanpub.com/hy-lisp-python by setting the price to $0.00).
That said, for unpaid experiments I like Julia + Flux, which also solves the author's preference to avoid slow programming languages. Julia is really a nice language but no one has ever paid me to use it.
When you write C++, you kind of cheat because even code with high computational complexity is pretty fast. Whereas the equivalent code in Python will be awfully slow.
So, while it's true that Python requires less development time, this statement can't be used generally. I have spent hours optimizing Python code when in C++ I would have just moved on to my next task.
If Nim had cloud SDKs I would use it as my default language for pretty much everything.
If I were writing something from scratch that dealt with data, I would probably use Nim though. It's super easy to write something fast in and is more pleasant than pretty much any other compiled language.
"Benchmarking programming languages/implementations for common tasks in Bioinformatics"
In cases where what you want to do doesn't exactly fit standard operations, Cython can be pretty nice, e.g. 200x to 1000x speedups for translating C-oriented number-crunching code from Python to Cython. But if you do want performance, you have to think about it while writing the code (avoid needlessly allocating memory in tight loops, data-oriented programming with simple arrays, statically type all of your variables, ...).
lines = (line for line in lines("orthocoronavirinae.fasta") if not line.startswith(">"))
gc_lines = (1 if ('G' in line or 'C' in line) else 0 for line in lines)
gc = sum(gc_lines)
total = len(list(gc_lines))
# Alternatively, a more "memory efficient" total would be:
total = sum(1 for _ in lines)
My point is: this is a highly I/O bound program. The implementation matters. With the correct implementation there shouldn’t be much difference between the languages.
That won't work properly; you've already exhausted the gc_lines generator in the previous line.
EDIT: you can do it with functools.reduce and a generator of tuples:
from functools import reduce

with open("orthocoronavirinae.fasta") as f:
    lines = (line.strip() for line in f if not line.startswith(">"))
    sums = ((len(line), sum(1 for ch in line if ch in "CG")) for line in lines)
    total, gc = reduce(lambda x, y: (x[0] + y[0], x[1] + y[1]), sums)
Or an arithmetic trick. Since there's the `1 if ('G' in line or 'C' in line) else 0 for line in lines` construct in the code, if you simply replace 1 with (len(line)<<K)+1 and 0 with len(line)<<K (for a suitable value of K), you can extract the two results using div/mod.
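Concretely, the trick looks something like this (a sketch mirroring that generator; K = 40 is an arbitrary choice that just needs to leave enough low bits for the count):

    K = 40  # low K bits hold the G/C-line count, high bits hold the total length
    with open("orthocoronavirinae.fasta") as f:
        packed = sum(
            (len(line) << K) + (1 if ('G' in line or 'C' in line) else 0)
            for line in (l.rstrip() for l in f if not l.startswith(">"))
        )
    total = packed >> K             # the "div" part: summed line lengths
    gc = packed & ((1 << K) - 1)    # the "mod" part: count of lines containing G or C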
From what I gather, the author is a researcher in a bioinformatics-related field. This may indicate that they tend to work either alone or in a relatively small group. The domain is small-scope data processing/manipulation and research/exploratory code, likely short-lived or even one-off.
Progress in this context will likely be governed by sheer processing speed (e.g. it's unlikely anyone will delve deep into the code; lots of iterations to 'just get it done' instead of testing, etc.).
If this is more or less correct, the point that Nim might be more useful than Python for the author sounds very sensible to me. It’s a nice spot between command line tools and more functionality-loaded languages.
cat test.py | py2many --nim=1 -
Yes, that is the Achilles' heel of Python.
I am always torn between Python and PHP for new projects because of this.
The Python syntax plus its import system are huge advantages over PHP. On the other hand, you suffer a 6x slowdown if you go with Python. Decisions, decisions. I so dearly wish I could have the good parts of both worlds.
And for data processing of all things ...
PHP is also very slow, on top of being many other kinds of unpleasant and broken.
That doesn't make it fast at all. Just faster than if it wasn't jitted.
And that certainly does not eradicate the vast ocean of other problems PHP has, the first of which being that it was never, ever "thought out" and instead grew like a cancerous mushroom.
I think that is pip.
Python is good where speed of development matters, where you write throw-away code testing some ideas and you want to do it fast, where you write glue code, for prototypes, for small code bases.
Once you are getting outside of that area, you'd better use a language more suited to the task.
As for myself, even if I can use Python in some cases, I can churn out C# code almost as fast, so I prefer doing it that way in case I want to grow the code later or use it somewhere else. Being lazy, I dislike rewriting code.