Unless you go all-in on typing -- which is difficult today with any meaningfully sized existing codebase -- maintenance largely becomes "hope manual testing and unit tests catch anything resembling a type error", which is a major challenge for structural changes to a codebase, like refactoring. Plus typing is still young, the tooling somewhat immature, and it can lend a false sense of security if you aren't very careful and don't opt into the strictest modes. This makes a large number of developers, and a large codebase, a major stumbling block.
The interpreter performance and GIL are fundamental issues as well. Multiprocessing and hacks around the GIL are quite painful if you even glance at any native threading code (say in a C++ library) and even when you stick to pure Python, you have a debugging mess when anything goes wrong.
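To make the GIL pain concrete, here's a minimal sketch of the usual workaround for CPU-bound work: pushing it into separate processes, each with its own interpreter. The workload function is a hypothetical stand-in.

```python
# Sketch: the usual GIL workaround for CPU-bound work is to move it into
# separate processes, each with its own interpreter (and its own GIL).
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n: int) -> int:
    # Stand-in for real CPU-bound work (pure-Python loop, holds the GIL).
    return sum(i * i for i in range(n))

def run_parallel(inputs: list[int]) -> list[int]:
    # Threads would NOT speed this up: only one thread at a time can run
    # Python bytecode. Processes sidestep the GIL, at the cost of pickling
    # arguments/results and much harder debugging when something breaks.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_bound, inputs))

if __name__ == "__main__":
    print(run_parallel([10_000, 20_000, 30_000]))
```

The pickling boundary is exactly where the "debugging mess" tends to show up: anything that crosses it must be serializable, and tracebacks from worker processes are harder to follow.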
But if someone can improve performance, that'd be great, and it has massive impact potential. It is incremental, though, and doesn't solve the fundamental issues with the language. I'm also skeptical of the "5x" plan referenced in the blog, and very skeptical we will ever see meaningful removal of the GIL, due to libraries and design decisions baked into existing code. This means performance will fall further and further behind compiled languages.
(I write all of this having been part of supporting Python at massive scale for over two decades, including at two FAANG companies that invest heavily in it. I've seen the curve and the pain it's caused, and would never use it for any code that needs to be performant or actively developed on a multi-month or multi-year timescale.)
Because I have a video streaming service with 1 million unique users still running Python 2.7, using a few servers. The thing has a mobile version serving different media on the fly, encodes user-uploaded videos, features comments and tagging, and even has machine-learning content detection now.
It is still maintained by one single person, and he is not a professional dev.
And I'd say it's already very rare to reach that size in any project, in any company. Hell, my last 3 paid projects as a freelancer had fewer than 20 concurrent users. And that's professional work. That's people's real life.
2 out of 3 IT projects fail; no code is even shipped. I would worry about scaling later. Once you do have the problem, pay the price for more hardware or a rewrite. You made it!
And if you do know for sure you will have the problem (you already work for a GAFAM), then not choosing Python is fine: it's not a trap, it's a trade-off.
But for the vast majority of us out there, having an easy, solid, battle-tested and clean language with a huge ecosystem is worth 10,000 times more than some scaling potential 5 years down the road.
I have seen so many big C++ or Java projects fail that I don't associate any language with "big" or with failure anyway. I've seen devs arguing about the purity of this in Haskell, or that in OCaml, and nothing gets to the end user because it's never perfect, or it's a toy that doesn't handle real-world cases.
But what powers most companies? Rails apps that you won't ever upgrade, but are still running. Spaghetti PHP code and WordPress sites that just won't die. Horrible SAP scripts, VBA macros and Excel sheets that actually deal with your real data.
Because they shipped. They did the job.
But solve the problem first, and solve it fast. It's why so many startups from the 2010s on used Ruby: they were productive in it, solving a real problem, rolling out features fast.
I've worked on a few projects where they dove into the cargo cult, wishing they had the kind of requests and load that, say, a microservices architecture might help with. Massively overpriced projects too, because they hired lots of consultants and self-employed people.
One time there was a guy who rocked up in a Maserati (used, lol); he took two weeks to build a page that was just a centered bit of text and a button, and it didn't even work.
When rewriting you already know how the project is now, so you can design your app around that, which makes things simpler and faster.
It doesn't require any extra work either. Just requires knowing what you're doing.
•How many servers: 2, 5, 10?
•How many pure python libraries did you have to replace with a Cython extension after you realized that it cannot support your scale?
•How many external dependencies? How many of them did you replace after they were abandoned? Surely not all actively developed libraries are still supporting Python 2.7.
> It is still maintained by one single person, and he is not a professional dev.
So he cannot upgrade the project to Python 3.x along with all the matching dependencies either, so the project stays with all its bugs and vulnerabilities.
Not everyone would be happy with such a setup and the aforementioned drawbacks; that's why I think the parent comment says Python is a trap at scale, and I concur. I've faced issues at scale (a few hundred thousand users) and there's no way a Python project can meet that scale without relying on Cython dependencies.
> A trap for your dream world where you suddenly get Google size?
So only Google size is a benchmark for scale? If I were to start a web project which has just 10 concurrent users, I would choose Go because I know I can develop the entire project with little to no dependencies, it can scale to N users without touching the code, and I can be confident the number of servers required will be smaller than with Python.
I still use Python every day, for where it's good, i.e. small scripts for personal use and prototyping algorithms. For professional products I use Python only as a machine-learning endpoint (with all the bandages) because I have no other choice.
But the question is rather: what's the cost/benefit ratio? The 7 servers cost 7 times less than the money they bring in. The solo dev makes $150k/year with it, doesn't need to work full time, has flexible hours and his own business.
> How many pure python libraries did you have to replace with a Cython extension after you realized that it cannot support your scale?
Many. Being extensible with compiled extensions is a feature of Python. NumPy and co are part of the ecosystem. That's an awesome thing about it: I can code in Python and get C speed anyway.
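As a small illustration of that "C speed from Python" point, here's a sketch of the same computation written twice: as a pure-Python loop, and pushed down into NumPy's compiled code.

```python
# Sketch: the same mean-of-squares, once as a pure-Python loop and once
# delegated to NumPy, where the loop runs in compiled C.
import numpy as np

def mean_square_py(xs):
    total = 0.0
    for x in xs:              # every iteration executes interpreter bytecode
        total += x * x
    return total / len(xs)

def mean_square_np(xs):
    a = np.asarray(xs, dtype=np.float64)
    return float(np.mean(a * a))   # element-wise multiply and mean in C

# Same answer; on large arrays the NumPy version is typically one to two
# orders of magnitude faster.
```

That's the trade the ecosystem made: keep Python as the glue, move hot loops into extensions.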
> •How many external dependencies? How many of them did you replace after they were abandoned? Surely not all actively developed library are still using Python2.7.
Many. And the project still runs, with so few resources, brings the money in and serves its purpose, after 10 years of service.
> So he cannot upgrade the project to Python3.x along with all matching dependencies either, So the project stays with all bugs and vulnerabilities.
Sure he can, it's just an expense he won't take on. Bugs are worked around, vulnerabilities are shielded by other parts of the system. Essential things like injections are dealt with. Some trade-offs are made, like always.
> So only Google size is a benchmark for scale? If I were to start a web project which has just 10 concurrent users I would choose Go because I know I can develop the entire project with little to no dependencies,
No you can't. Not if you want the first version to come out in a few months. Because you need a scraper, and a database, and registration, and auth, and permissions, and screenshotting, and upload, and video encoding, with background tasks, and load balancing for all the videos and pics, tagging, i18n, XSS protection, search, etc. So either you pay for cloud services to do all those things for you, or you leverage a rich ecosystem. Or you take a year to do it. Even if Go were the best language in the world, it can't beat 10 years of Django plugins, 25 years of Python modules and the iteration speed of a dynamic syntax. Well, it can, with AWS :)
In fact, Go is a fairly niche language. Take it out of its I/O and concurrency specialty and it's quite average. Rust is better at raw performance. At the same performance, Java has a better ecosystem. For simplicity and easy compilation, Nim and Zig are more expressive. Resilience? Erlang. And for speed of development, Python is better.
So for our use case it is not where it would shine, because:
> it can scale to N users without touching the code and with the confidence that the number of servers required will be < than that of python.
Only 1 server runs the Python code; the others are for the video hosting and the db. Python is not the bottleneck in this site. It almost never is in web dev. nginx serves the content, postgres and redis deal with the data.
There is also the stdlib. JS doesn't have simple things built in like left-trimming a string, CSV loading, UUID generation, etc.
Then there is the API. Sorting in JS requires that weird comparator returning -1/0/1. Getting the last element, or relying on truthiness, means fiddling with length.
Suddenly not only do you have to install tons of things, but libs and frameworks can't rely on common ground; nothing seems to integrate well. You glue a lot of things together manually.
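For what it's worth, the stdlib coverage meant here looks like this in Python, with no installs at all:

```python
# "Batteries included" sketch: left-trim, CSV parsing and UUID generation
# using nothing but the standard library.
import csv
import io
import uuid

trimmed = "  hello".lstrip()                      # left-trim is a str method

rows = list(csv.reader(io.StringIO("a,b\n1,2\n")))  # CSV parsing built in
# rows == [['a', 'b'], ['1', '2']]

token = str(uuid.uuid4())                          # random UUID, 36 chars
```

None of this needs a package manager or a framework; it's the common ground libraries can assume.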
The popularity of JS is because it has a monopoly on the browser: a lot of people must use it, so a lot of devs know it by default and look for it elsewhere.
But when you have Ruby, Python, or even modern PHP frameworks in front of it, it's not a great deal.
Of course, not all programming tasks are about the Web. And this is where JS collapses completely. Sysadmin, data analysis, pen testing or automation using JS is a pain compared to the fluent versatility of alternatives. You know, things like connect to this ftp, download the excel files every hour, put the average of every column in the mysql db then export the thing as a PDF and send that to the boss.
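A minimal sketch of that kind of glue job, staying in the stdlib (CSV instead of Excel to avoid third-party libs; the FTP host, filename and scheduling are hypothetical placeholders):

```python
# Sketch of the "download files, average the columns, ship the result"
# automation described above. The fetch step is a hypothetical placeholder;
# the column-averaging core is real and stdlib-only.
import csv
import io
from ftplib import FTP
from statistics import mean

def column_averages(csv_text: str) -> dict[str, float]:
    """Average every numeric column of a CSV that has a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: mean(float(r[col]) for r in rows) for col in rows[0]}

def fetch_and_average(host: str, filename: str) -> dict[str, float]:
    # Hypothetical fetch step; in practice this runs on a schedule (cron).
    buf = io.BytesIO()
    with FTP(host) as ftp:
        ftp.login()
        ftp.retrbinary(f"RETR {filename}", buf.write)
    return column_averages(buf.getvalue().decode())
```

The DB insert and PDF export would bolt on the same way; the point is that each step has a mature library one import away.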
The best testament to this is the reality that popular web services were able to limp along on a slow language for a long time before they had to rewrite their code.
But if one is doing something that requires raw power, precise control over resources, or precise timing, then not choosing Python is not merely a tradeoff to be considered, but an obvious architecture decision.
If you're a millionaire by that time, or have a strong established business, then it doesn't matter, it's a nice problem to have. Have your engineers have a go at it.
See: Dropbox, Disqus, AirBnB, and others...
And if you're not, then going from 0->100 in 2 years instead of 0->1 in a few months with Python wasn't worth it anyway (and you also don't have enough customers/users to make use of those 100x improvement).
I wrote a few things in what I’m pretty sure was correct, idiomatic, best-practices Python 3 and almost immediately I could see some familiar problems from my Perl days looming on the horizon, especially around testing, packaging and maintainability.
I enjoyed writing Python code and for the right kind of startup or hobby project I might use it in the future, but there is so much black-boxed weirdness in the ecosystem, so much inconsistency in paradigms, and so much temptation to “cheat” that I’d definitely think twice before committing a team to it. And I didn’t even get as far as the performance issues.
On the other hand, of course, you can opportunistically “move fast and break things” within a larger org too. But I worry that if you get too committed to Python, you will be unreasonably dependent on your lead engineers sticking around.
Python might not be the most performant, but once you need the scale, it's a good problem to have because at that point, your product has already made it.
Making good engineering choices from the beginning matters a lot.
If the choice is between a sub-optimally performant but working system, or a boondoggle that never gets off the ground, the “better engineering choice” is probably the former.
At what point do you embrace the trap and happily live with it?
For example Dropbox runs many millions of lines of Python, has massive traffic and their server costs aren't completely out of line for the service they offer. Their code base is also over 10 years old.
It seems to be working very well for them.
I talked to one of their engineers a few months ago. He gave me a complete run down of how they build and deploy Dropbox at https://runninginproduction.com/podcast/82-dropbox-gives-you....
Like many companies, they had initial scaling successes with Python, hit performance bottlenecks and looked for a way out with faster languages.
In a ton of cases you'll be completely fine serving a few million monthly page views of your SAAS app on a single $20-40 a month server using Flask, Django or whatever Python web framework you prefer. Performance will be really good too. Just talking about most apps where it's mainly reading and writing data to a DB.
I write all my backend code in Go (I used Python before Go existed) because I can deploy a Go executable on the cheapest render.com server and it runs using a fraction of the available resources (~50 MB out of the 512 MB available and literally 0.01% CPU use).
If I used Python (or Ruby, Node or even Java) I would likely have to go higher (and pay more), because those languages are not only slow, they are also memory hogs.
I'm as productive in Go as I used to be in Python, I see no reason not to use Go.
As a frame of reference, ASP.NET services in C# can exceed the throughput and performance of an equivalent Django app, but Django (and Python) may still be the best choice for some people. What's not considered here is the experience of the developer and whether they can take advantage of the platform's capabilities quickly. Do they need to rewrite libraries? How much work is needed to become as proficient as you are with Go? The language and platform choice is rarely the differentiator here.
Over the long term you can see some patterns. Python companies are able to scale up their teams more quickly and cheaply in some markets than Go ones. You choose to optimize hosting cost while they optimize other costs. They may need libraries like Pandas, and those are readily available. I don't know what the situation is with Go today, but similar libs weren't available originally and the Go community didn't have a big foothold in data science.
If you were NOT able to do that, that would have actually been a reason to not build your app in Python.
Even 5-10 years ago, that's not a "huge success story"; that's table stakes for any serious language.
I did not mean "huge win" as in it is something unique to Python. I meant it in the sense that it greatly adds to the value proposition of the language.
it's always the same
prototype in an easy language
tighten the bolts closer to the metal after some time
... but I think you're forgetting the real motivations that drive people to use Python: it's easy to learn, it's very readable, there's a large (and not especially toxic) community behind it, and it's versatile.
No one is saying that it will replace every other language, but it still can find its space even among large projects.
Python already does so much for you, and provides so much tooling to do even more, it's hard to write bad code if your daily job is to be a coder.
The iterator protocol means you rarely get off-by-one errors, context managers and GC deal with resource handling, and the GIL, for all its faults, prevents a lot of concurrency errors...
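A quick sketch of those first two claims: iterating directly removes the index bookkeeping where off-by-one errors live, and a context manager ties cleanup to scope.

```python
# Iterator protocol: no i, no len(), no "i <= n vs i < n" to get wrong.
import io

def total_line_length(f) -> int:
    total = 0
    for line in f:        # the file object hands out lines itself
        total += len(line)
    return total

# Context manager: the buffer is closed even if the body raises.
with io.StringIO("ab\ncd\n") as f:
    n = total_line_length(f)
# n == 6  (two lines of 3 characters each, newlines included)
```

The same pattern covers files, locks, DB transactions and sockets, which is a large share of the resource-handling bugs in other languages.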
Now add a good IDE, and syntax errors, name errors and attribute errors go away. Play a little in the REPL, and use cases become more obvious, behaviors clearer.
Want more security? You can add type hints and unit tests (which are incredibly easy to write, thanks to pytest).
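To show how cheap that combination is, here's a minimal type-hinted function with the kind of plain-assert test pytest collects (the function itself is a made-up example):

```python
# A type-hinted function plus a pytest-style test: no classes, no
# boilerplate, just asserts in a test_* function.
def slugify(title: str, sep: str = "-") -> str:
    """Lower-case a title and join its words with `sep`."""
    return sep.join(title.lower().split())

def test_slugify() -> None:       # pytest auto-collects test_* functions
    assert slugify("Hello World") == "hello-world"
    assert slugify("One  Two", sep="_") == "one_two"
```

Run `pytest` in the directory and it finds and executes the test; the annotations meanwhile let mypy or an IDE flag bad call sites.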
Even without all that, the size of the ecosystem means you won't write most of the code anyway, and less code means less potential for mistakes. Add to that the language's bare-bones syntax, and you really have something that actively helps with writing good code.
Not to say you can't write bad code in Python, but unless you are not a professional dev (in which case you can write bad code in any language), you have a lot helping you there.
Writing bad code is easy in most statically typed languages too.
I’ve used Python for more scripting-type tasks and creating basic APIs to get specific jobs done. I’m very wary of going “all in” on Python for an app, based on all the feedback about performance.
All that being said I’ve never heard someone say we used X language and it scaled remarkably without issues. I’ve noticed lots of companies eventually go to Java once they get big but nobody really likes to brag about Java apps.
Here's a benchmark where JS is 50x faster than Python: https://github.com/kostya/benchmarks
Note that PyPy does much better.
For example, here are filters for both JS and Python: https://www.techempower.com/benchmarks/#section=data-r20&hw=...
Do note that for most of the "realistic" stacks out there, you'd probably want to filter out all of the micro or no framework approaches, to compare something like Express and Koa against Django and Flask, like in the following link: https://www.techempower.com/benchmarks/#section=data-r20&hw=...
Here's the summary page with composite scores across all tests (though you might also want to filter by data storage solution, for example, MySQL): https://www.techempower.com/benchmarks/#section=data-r20&hw=...
To summarize, the performance from the best to worst currently is:
Composite scores: Koa (JS) > Express (JS) > Flask (Python) > Django (Python)
> "more real world scenario" "realistic"
I'm of the opinion that "Real programs may not be representative either".
We have to show that they are or discover they are not.
Of course, if you can, take benchmarks with a grain of salt and ideally do some prototyping and load testing.
Is "exactly" really the situation, or is it that the benchmark does something kind-of like one of the use cases?
For example, look at a full platform Java Spring Boot example here: https://github.com/TechEmpower/FrameworkBenchmarks/tree/mast...
The controller code, the model code and even the repository code are all very reflective of the primitives you'd find in production code.
The only differences would be when you add additional complexity, such as validations and other code into the mix, such as additional service calls etc., but if even the simpler stuff in Python (as an example) would be noticeably slower, then it stands to reason that it wouldn't get magically faster on the basis of less overhead alone.
The exceptions to this which I could imagine would be calls to native code, since the aforementioned Python has amazing number-crunching and machine-learning support, but that is hardly relevant to web development in most cases.
Otherwise we risk running into the "No true Scotsman" fallacy: https://en.wikipedia.org/wiki/No_true_Scotsman
In such a case, we'd adamantly claim that these benchmarks are simply not good enough even for educated guesses and we'd have to build out our full system to benchmark it, which no one actually has the resources for - in practice you'd develop a system and then would have to do a big rewrite, as was the case with PHP and Facebook, as well as many other companies.
On the other hand, premature optimization and dogmatic beliefs are the opposite end of the spectrum: you don't need to write everything in Rust if using Ruby would get your small application out the door in a more reasonable amount of time.
There are no easy answers, everything depends on the context (e.g. small web dev shop vs an enterprise platform to be used by millions) and the concerns (e.g. resource usage and server budget vs being the first to market, as well as developer knowledge of tech stacks, for example market conditions of PHP vs Rust).
Regardless, in my eyes, there's definitely a lot of value in these attempts to compare idiomatic interpretations of common logic for web development primitives (serving network requests in a known set of formats).
> … which no one actually has the resources for…
iirc they instrumented a web browser to collect "real-world" data and demonstrated that it wasn't like the behavior of benchmark code.
That webpage references "A comparison of three programming languages for a full-fledged next-generation sequencing tool" — "reimplemented … in all three languages and benchmarked their runtime performance and memory use."
> … you don't need to write everything in Rust if using Ruby…
Performance doesn't matter until it matters.
As-in the home page quote and reference — "It's important to be realistic: most people don't care about program performance most of the time."
Python basically using C types and GMP to write C in Python...
It’s not really what you’d expect for a site called the Computer Language Benchmarks Game. Each example just calls out to a very fast C library (pcre2) to perform the heavy lifting, regardless of which language is being “benchmarked”.
Seems a pretty pointless site.
The other examples have similar nonsense.
> "... a pretty solid study on the boredom of performance-oriented software engineers grouped by programming language."
I think that's probably the best way to describe it.
Edit to answer yours: There may be less Python programmers that are bored, or less performance-oriented Python programmers.
Your Python 3 one isn't the fastest Python 3 solution; it's this one, which uses pcre2
The slower program is an example from that website of a regex program which does not " just calls out to a very fast c library (pcre2)…".
from re import sub, findall
pcre2 is not —
Here's some of a comment from _operator.c
"This module exports a set of functions implemented in C corresponding\n\
to the intrinsic operators of Python. For example, operator.add(x, y)\n\
is equivalent to the expression x+y."
Why wouldn't you expect that CPython regex would "ultimately" be written in C?
Did you? :-)
That Node pidigits program also uses GMP.
Besides those 2, did you notice other JS programs not 50x faster than the corresponding Python programs?
But everyone knows it is significantly faster than Python. Probably at least a few times for pure JS vs. pure Python programs. Not that it matters, Python is basically glue to run mostly C and C++ programs.
(Java probably is also fine; I've less personal experience there so I can't say).
A lot of them have huge amounts of code that runs in browsers, so they essentially have to invest in JS whether they like it or not.
> All that being said I’ve never heard someone say we used X language and it scaled remarkably without issues.
Sometimes you have to go by what people don't say. Humans tend to talk most about problems or surprises. The former are actionable and the latter are more interesting to talk about because of their rarity. That means when things work well as expected, you get silence.
I remember when people constantly talked about how Java was too slow. Then there was a period of time where people talked a lot about how Java was fast enough because that was an interesting change. Now people don't talk about Java performance much at all, which is a good sign (but easy to overlook), that Java performance is now consistently reliably good for most users.
I think the rewriting part is only necessary if there’s some characteristic in your use case that you need that Python doesn’t provide — like raw computational speed, low latencies etc. If you’re a senior engineer familiar with Python, these trade offs are always front of mind.
For the majority of software I’ve developed, these characteristics were not essential. Python was a good choice for 80-90% of all of the important software products I’ve ever written.
For many data science projects, maintaining large code bases in Python is often the optimal decision (rather than reinventing the wheel and writing your own data frame and machine learning libraries, or using immature poorly maintained ones in other languages — for data manipulation and scientific algorithms, these libraries are highly optimized in Python anyway since the underlying code is in C or Fortran). Python is also a good choice for data engineering pipelines.
Where I might hesitate to recommend Python is when you have to write web services or desktop apps or any kind of application which requires speed or scale that cannot be handed down to a lower level library in Python.
Also for very large code bases, static typing truly helps to keep things sane, especially when you have different teams working on different parts of the codebase — with statically typed languages, no type checks in unit tests are needed, and refactoring is much more solid and error free. Python’s type annotations are an attempt to move in this direction, but static languages truly excel at type integrity (which are sometimes the cause of subtle errors in Python).
Otherwise my experience has often been that Python is a good first or second choice, depending on what you’re doing. Making that choice correctly is what sets senior/principal engineers apart from junior folks.
I wouldn't call it an attempt. I mean, sure, it has no chance against, say, Rust, but Python's type system is actually quite decent, and I think it is more powerful than Go's. Also, you have the freedom to use both nominal and structural typing if you choose.
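The structural side is `typing.Protocol`, which behaves much like a Go interface; here's a small sketch (the class names are invented for illustration):

```python
# Structural typing in Python: Protocol matches any object with the right
# methods, no explicit "implements" declaration needed (like Go interfaces).
from typing import Protocol, runtime_checkable

@runtime_checkable
class Closeable(Protocol):
    def close(self) -> None: ...

class TempResource:            # never declares that it is Closeable
    def __init__(self) -> None:
        self.closed = False
    def close(self) -> None:
        self.closed = True

def shutdown(res: Closeable) -> None:
    res.close()                # checkers accept anything with close()

r = TempResource()
shutdown(r)                    # structural match: it just has close()
```

For nominal typing you'd use ordinary classes or `abc.ABC` instead, and the checker enforces the explicit inheritance relationship.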
The only problem I have with it is when I have to use a package that doesn't have types, but fortunately that is happening less and less.
Some package authors refuse to add types, but others provide stubs to solve that problem. For example boto3-stubs provides types for boto3.
I really like that thanks to types I can easily refactor code without worrying about breaking something in the process. I suspect that if types had existed and been popular in Python 2, the whole Python 2 -> Python 3 migration would have been a non-issue.
Or you could use R, instead of the half-baked immature clones in Python :) I'm only slightly joking here, even if Python is much much better for string processing, it's much less useful for DS tasks (unless it's NLP or DL, to be fair).
That being said, Python is the second best DS language, as well as second best for everything else, so it can be a good choice for these projects.
For our use case of deep learning, REST APIs, and image processing (often a mix of these), Python seems like the best choice we have. The GIL isn't a problem at all because all our computations are done with NumPy or PyTorch, which release the GIL during most of their operations.
Personally I've got a lot of mileage out of Hypothesis for property-based testing. It's good at exercising edge-cases, and works particularly well when we sprinkle assertions through a codebase (where the "property" we're testing is simply "calling Foo doesn't throw an exception").
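To make the idea concrete without depending on Hypothesis itself, here's a hand-rolled sketch of a property-based check: generate random inputs and assert a round-trip invariant. (Hypothesis does this far better: it targets edge cases and shrinks failing examples automatically.)

```python
# Hand-rolled property-based sketch: check decode(encode(s)) == s on
# many random inputs, including the empty string.
import random

def run_length_encode(s: str) -> list[tuple[str, int]]:
    out: list[tuple[str, int]] = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in pairs)

rng = random.Random(0)          # seeded for reproducibility
for _ in range(200):
    s = "".join(rng.choice("ab") for _ in range(rng.randrange(0, 20)))
    assert run_length_decode(run_length_encode(s)) == s
```

With Hypothesis the loop collapses to a `@given(st.text())` decorator on a test function, and a failure comes back minimized to the smallest input that breaks the property.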
One of the benefits of typing (that is rarely discussed) is that it deters people from writing code whose type would otherwise be crazy (e.g., "if someone passes the string 'foo' in for a parameter, then the return type is a string, otherwise it's a bytestring"). In other words, it deters a lot of crappy code. When you annotate after the fact, the annotation becomes really difficult and people get crabby that they're having to make this really complicated annotation and they blame typing (rather than their own poor coding). To your point, typing after-the-fact is still better than nothing, but you're missing out on a lot by waiting. Moreover, Python's typing story is still really immature with respect to syntax and tooling.
The technical solution is a very useful way of preventing mistakes by well-intentioned people though.
The claim is very narrow: typing can inhibit certain kinds of bad code and guide people toward better solutions. Whether someone is writing bad code because they're lazy or inexperienced, typing keeps some of the bad code from entering the code base.
Of course it's not a replacement for good management, hiring practices, etc.
I argue that somebody who would try something like your example above is either lazy or trying to wreak havoc. In my experience, people like that won't be deterred by strict typing, they'll keep typing nonsense and copying dangerous lines from StackOverflow until the thing outputs what they want. Having them write code that's complicated or important enough to require typing checks will result in disaster anyway, the only solution is to keep them out of the codebase, or review (and often rewrite) all their work.
I agree that typing is a very good guardrail for well-intentioned people with a minimum of competence, but in that case there's nothing to lose by making typing optional: they'll follow it anyway, and the added flexibility is often useful when well-used.
An inexperienced dev who wants to learn also won't need to be forced to follow typing guidelines, we just have to explain it to them...
Any engineer who would be qualified to write these kinds of guidelines would tell you not to waste time drafting and enforcing guidelines because type checkers exist. And if they're a really good engineer, they'll politely rebuke you for attributing poor code quality to the author's moral character, as this is a self-limiting outlook and generally toxic behavior.
- As for the GIL thing, I'm with you that it's never going to be lifted in any meaningful way, but wouldn't the subinterpreters idea along the lines of Ruby's ractors fix most of the pain in most cases?
Also, why are you skeptical about "the 5x plan"? I don't really have an opinion about it, but I'm interested in hearing any input.
Type inference & hinting at the IDE level is just not as developed either
I used Python, and I noticed over the years that things like type inference drastically improved.
Similarly, the type checker in the Rust extension for IntelliJ was lacking, even though Rust has a superior type system.
I assume you're saying this out of experience, so can you give a practical example of what you call 'notable scale'? And are you talking about desktop or web or mobile, or just all of them? Just so that we know what you are talking about in a concrete way. Not the 'large software with large number of developers is always a problem' way.
edit I also see others here using 'scale' in a 'performance / number of requests per second' sense -- I thought we were talking about codebase size though. I mean, if you know beforehand that this type of performance is a requirement, it seems weird to turn to Python, of all things?
The codebase is being slowly migrated to static typing. On one hand, as the parent says, the typing module is still immature and there are still some Python constructs (not too weird ones, see  for an example) that you can't type-check correctly. On the other hand, I like the fact that you can include typing slowly and not all at once, it makes the effort much easier to tackle. And, if typing works, it works well.
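The "include typing slowly" workflow looks something like this in practice (function names are invented for illustration): annotate one function at a time, and unannotated code keeps working beside it.

```python
# Gradual typing sketch: one function fully annotated, its legacy caller
# untouched. The checker enforces only what's annotated, so migration
# can proceed module by module.
def parse_port(value: str, default: int = 8080) -> int:
    """Fully annotated: mypy can now verify every call site."""
    try:
        return int(value)
    except ValueError:
        return default

def legacy_handler(config):          # still unannotated, still runs fine
    return parse_port(config.get("port", ""))
```

Tools like mypy's per-module strictness settings let you ratchet up the guarantees as coverage grows, instead of converting everything at once.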
Regarding performance, well. Parallelism is pretty hard to do well, and the language itself is not the fastest thing. Some parts are migrated to a C extension but that's costly in terms of development and debugging time.
Despite all of that, I do think that Python was the best choice for our situation, and still is. Maybe from this point onwards another language would make things easier, but without Python's library support, ease and speed of development and expressiveness the cost of just getting to market would have been far higher, and probably we wouldn't have reached this point with other languages. And migrating the codebase to another language is just not worth it at all, as there are still a lot of areas we can improve quite a lot with far less effort than a full rewrite.
Would be cool if some experienced dev could share some estimates when the Python scaling issues start.
When you build a backend using Python + Django/FastAPI, I assume in most cases the DB and not Python is the limiting factor. Moreover, you could always spin up more workers to mitigate scaling issues.
When you train ML models, your Python code just calls C++ functions. Python is not a limiting factor here either.
For us, the scaling problems started when we had two independent teams working in the same codebase. The extreme dynamism of the language meant that classes and data structures were being mutated willy-nilly in ridiculous ways across the execution flow. The lack of static typing made onboarding new developers difficult as they had to parse generations of excessively clever code and magic left behind by departed developers. This problem has only gotten worse in Python 3, which keeps piling on more ways to accomplish the same task.
The deployment story was also awful, but I don't think that's a surprise to anyone who has deployed Python at scale.
In terms of raw performance, at one point we estimated that our Python stack was adding 300-400ms of request latency compared to a comparable system written in Go. With the Python 3 deadline coming, we convinced management to invest in rewriting performance-critical parts of the service in Go, instead of the migration to Python 3. I left before the project was completed but we were already seeing massive improvements.
Edit: code example https://gist.github.com/asemic-horizon/2830ed3637cfd278e7937...
Otherwise you’re right — whenever you have “for” loops you’re always going to end up ahead if you rewrite those in C, Rust, Julia or Fortran. And that’s exactly what authors of high performance numerical codes do.
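A minimal sketch (illustrative, not a benchmark) of what "rewriting loops into array calculations" looks like in practice: the per-element work moves from the interpreter into NumPy's compiled kernels.

```python
# The same sum of squares, written as a pure-Python loop and as a single
# vectorized call; the latter runs the loop in compiled C code.
import math

import numpy as np

x = np.arange(100_000, dtype=np.float64)


def loop_sum_of_squares(values):
    # Pure-Python loop: every iteration pays interpreter overhead.
    total = 0.0
    for v in values:
        total += v * v
    return total


# Vectorized form: one call, the loop runs inside NumPy.
vectorized = float(np.dot(x, x))

assert math.isclose(loop_sum_of_squares(x), vectorized, rel_tol=1e-9)
```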
I did my dissertation on symplectic exponential Runge-Kutta schemes with stable numerics for the big matrix exponentials needed (which have this specific structure that allow for theorems to be proven). But I didn't have time to write any code at all. I wonder if open source ODE solvers are getting good high-order symplectic methods by now...
Up to 10th order https://diffeq.sciml.ai/stable/solvers/dynamical_solve/#Symp... . Also Magnus methods https://diffeq.sciml.ai/stable/solvers/nonautonomous_linear_... and exponential integrators https://diffeq.sciml.ai/stable/solvers/split_ode_solve/#Ordi... . And the expmv implementations are highly optimized as well: https://github.com/SciML/ExponentialUtilities.jl . A lot of this specialized matrix exponential stuff is rather fun, for example see https://github.com/SciML/ExponentialUtilities.jl/pull/64
That said, matrix notation can sometimes be the more readable way of expressing a calculation.
Anyways, my point was more in response to this from the person I responded to: "my instinct is to look for opportunities to vectorize by rewriting loops into matrix notation on paper and then expressing them as array calculations". In some languages (Julia in particular) that is slower than the equivalent loop based code.
1 - https://nim-lang.org
This lets you gradually transition hot path Python modules to Nim, get compiled performance generally on par with C and Rust, whilst enjoying strong, static typing with great type inference.
In my experience (6-7 years 4 of which are full time) Nim strikes the perfect balance of the productivity you get with Python with high performance at the same time.
Also the metaprogramming features are incredible and, importantly, don't use a language subset but use the base Nim language itself.
There's also a QML wrapper here: https://github.com/filcuc/nimqml
But sometimes the productivity benefit is all that matters and you accept you may have to throw it all out later (or invest insanely in making it work)... it's all about making an informed decision.
To an outsider who knows a bit of python because of Airflow and Spark, it seems Python has become a popular metaprogramming language that various disparate ecosystems have all adopted.
Pyspark is python but kind of its own language and much of the processing is happening outside the python runtime. I think you could say the same for people doing Numpy or Pandas.
Tensorflow probably even more so where I believe Python is the most popular interface, but you're really just programming a program to run somewhere else. Again this is the same with Airflow conceptually, though I believe the runtime is python.
If my hypothesis is overall correct, and that a majority (or large share) of python programmers aren't sharing the same ecosystem, libraries and packages, then shifting to a different runtime or language that just encapsulates a subset of the language is much less challenging.
This is the opposite problem of the JVM, where no one really likes Java the language, so everyone tries to create enjoyable (and productive) languages for the jvm to keep using the libraries and the runtime optimizations.
But now in hindsight, I'm shocked that Python 3.0 wasn't seized as an opportunity to lose more of its baggage - the under-optimized interpreter, the poor parallelization story, the excessive surface area of the interpreter exposed to Python itself, making it impractical to re-implement in other interpreters, better typing, etc.
They broke backwards compatibility and all we got for it was Unicode?
Personally I refuse to touch our Python services with a ten-foot pole. My local app hits the testing environment and that's that.
For most python projects, that can be as simple as `pip install tox && tox`
That's certainly my impression. I'm not a professional Python programmer, though I have used it in a few projects that went to prod. I really want to like Python, and it is really nice for some quick one-offs. However, in my project, where the codebase was 30-40% Python, I estimate that 80-90% of bugs and time spent were in Python. The other side of things was more or less write once, never revisit.
This is exactly as it should be: the nature of the interpreter is that it has to do a lot of the work a compiler does when it compiles whenever a new command is run (and in Python's case, that is before you cover the whole "everything is an object" part). Yes, some of it can be mitigated, but the expectation that an interpreter keep pace with compiled code is somewhat of a pipe dream. What I'd love is a language that can be compiled (and be highly optimized) and interpreted without changing its behavior. The developer experience with an interpreter is fantastic, the performance of compiled code is... better.
You mean something like a byte code like language with a JIT system? Like Java and C#? C# in particular now supports AOT compilation, but can still be run as “interpreted” CIL.
The VM abstraction is not a barrier to fully optimizing for your architecture, but rather how much time you can spend converting IR/bytecode into assembly, whether you do it at runtime, and whether runtime information lets you optimize even further.
This has been a great feature of many lisps for 4+ decades now. Not unique to them of course.
By the way, there are some (high-level) compiled languages with REPLs like Lisp, Haskell and Elixir.
Does elixirc compile to "native code" ?
Does the Haskell REPL use an interpreter not the GHC compiler?
Anything done at significant enough scale /must/ be multi-threaded to take full advantage of modern hardware which makes languages which don't provide a reasonable threading model effectively non-starters or at best a toy language for internal prototyping. I shifted from Python to Ruby, simply because Ruby provided a realistic way to interface with real threads, and then to Go, and haven't looked back. Simple applications are massively more performant in Go than in Python, and it's not just due to typing and compilation.
I love how simple Python is to learn and how it brings more people into the fold in approaching solving complex problems, but at the end of the day you should treat it like runnable pseudocode to get shared understanding so you can implement in a real language.
PyPy is 4x faster (on benchmarks), no xp with Jython.
But when Cython fits, it's great, and a good tool to have in your toolbox if you're working with Python.
Typed python is safer and more maintainable than Go. Slow as a dog, but it's really a hard choice when choosing performance and a bad type system with a half baked language vs a good language with a good type system that is a bit of a drag on performance.
And you really don't have to go all in typing. I have worked on gradually typing code bases and it pays off from day one almost.
* specifically, this happened long after the tragically many-ways-to-do-it of packaging systems and virtual environments, which survive to this day.
For example, mypyc is a python to C compiler that is part of mypy. It uses types for both correctness and optimization:
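A sketch of code written with mypyc in mind: a fully annotated function runs unchanged under CPython, and mypyc can compile the same module to a C extension (`mypyc module.py`), using the annotations both to type-check and to emit faster code.

```python
# Hypothetical example: fully annotated code that mypyc can compile.
# The annotations serve double duty: mypy checks them for correctness,
# and mypyc uses them to generate unboxed, specialized C code.
def mean(values: list[float]) -> float:
    total: float = 0.0
    for v in values:
        total += v
    return total / len(values)


print(mean([1.0, 2.0, 3.0]))  # 2.0
```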
Micropython seems to make embedded code more readable and bug free than Arduino C-based programs, judging from a few projects I looked into.
Yes, it would probably be a mistake to write a large distributed, high performance database (say, for a login system handling billions of users) in python, but many, if not most, projects never hit the wall you are referring to.
In that sense, I feel that writing something in the most scalable language (if such a thing existed) would just be premature optimization at the highest level.
To use a metaphor: Python is the duplo of programming languages.
So I still look suspiciously at all mid-to-large projects that are written in Python. It takes some serious effort and good engineering to maintain good quality, and still the tooling for error detection is not there the way it would be in other, better-typed languages. I wouldn't feel confident relying on such a project for a critical infrastructure piece. The idea should have been validated quickly with Python, then implemented in a more appropriate language.
EDIT: Obviously that's just my possibly wrong opinion. There are huge projects out there proving otherwise, but the issues discussed here are _very_ real.
But I've found that I really dislike maintaining larger Python codebases. Whenever I've tried to develop packages or libraries, or create executables it always felt really clunky. And when it comes to personal growth as a programmer, I find it's just too easy to stack ready made libraries on top of libraries, without ever coding things from scratch.
But I'm glad to hear that there has been more focus on improving CPython's performance. I don't disagree with the things that the Python Devs have chosen to focus on, and I don't think anyone can really say it hasn't been successful for them. I use Python for work, so I won't forget it anytime soon, but for programming at home, I'll use other languages that align more with my opinions of programming that I've developed after having used Python for an extended period of time.
This is working as intended. This is how everything should be.
Is that the problem? I thought I read somewhere that the C/C++ extension API would have to be broken to allow for removing the GIL, for example. The API, not just internals that shouldn't be touched by user code.
If it were only a rebuild, I don't think anyone would object; pip packages are built once per Python version anyway. Breaking the API would be a different story for libraries that make heavy use of C++ extensions.
And it doesn't matter if the compatibility was broken for some thing you actually care about: if your code can't run at all until the broken third-party dependency is fixed by its maintainer, you simply won't upgrade, will you?
What's their beef with PyPy?
I think I would like to get into some of the optimization work on Python. I've contributed to some of the DS/ML/NLP libraries in the past, but the core language would be super interesting.
Anyone have good resources on getting started on that sort of thing?
However, with pip's native freeze and install -r abilities, I'm also curious what you mean by hunter gatherer style installs. I've not yet had a problem with installing Python modules.
And heaven help you if you accidentally get conflicting versions installed in the same venv. (By conflicting, I mean two libraries that both depend on different versions of the same underlying library. Odds of this increase dramatically the more you use.)
It's another story for conda, which flat-out refused to install my list of around 10 packages (that seems to stem from the fact that most packages are in conda-forge, and requirements are often inconsistent with the default repo).
Installs are now an scp.
Now of course, you gotta remember that installing something is usually more than copying the code, and there is nothing to help you with that, in any language.
Welcome to the world of deployment, which is why things like ansible and docker exist.
And this is demonstrated in the thread: people start listing ways you can streamline installs and try to avoid version conflicts. And there are, as you can see, several. Some don't even anticipate what kinds of problems the user will encounter with versioning conflicts, old Python versions coming with the OS etc.
What is a bit disheartening is that people actually thought they were providing helpful suggestions for solutions when what they did was only to prove my point. But they won't necessarily see it that way.
This is why I prefer using the target OS dependent package management - there is usually just a single one such official system, which is also language agnostic and much more robust than the various per language kludge.
Then you have languages that are "half way there", like Java. I suppose there are still people who make what I refer to as "splat style projects", where you unpack some horror-zip containing thousands of files and litter your surroundings with JAR files, config files and other detritus. But you can build everything into a single JAR file that contains all the dependencies and can be run without any other requirement than a sufficiently new Java runtime. And when shown how to do this, most people tend to adopt it as their default build product.
Languages like Python don't have a well-established build product that is self-contained. This places an undue burden on those who package software for distribution. It forces them to involve themselves in concerns that should be contained within the project - not leak out onto everyone's floor and potentially cause accidents.
Packagers have to make sure that "all arms and legs are inside the vehicle" for all possible permutations of system configurations. Which is easy when you have a statically linked binary. Less so when the software in question is more like a cranky, writhing toddler.
What really made the problem visible for me was when I worked in an organization where suddenly the place was filled with "data scientists". Mostly math or statistics people. Or more precisely: when more non-engineers started writing code and proved to be not only unable, but unwilling, to learn sufficient software engineering to ensure other people than themselves could actually run the code they wrote.
Which was kind of unfortunate because researchers being unable to reproduce their own computations was a _regular_ occurrence. They simply couldn't figure out how to get their old code to run on their new computer, for instance.
And it isn't because they are jerks. It is because they use tools to get work done. They are not as interested in the tools as the people who make those tools are. To them the rest is noise that wastes their time.
This highlighted the fact that you shouldn't have to be a software engineer in order to produce programs that are easy to distribute - it has to be part of the path of least resistance.
The fact that Python says "not my problem" isn't helpful.
(I think there are interesting lessons to learn from Go. There are a lot of things I don't like about Go, but almost every instance where the language developers put their foot down and said "this is how we do it" actually made things better for everyone. It might not be your favorite way of doing things, but someone has made a choice - which is better than "I don't know...do whatever you like")
I blame the publish-or-perish mechanics and the grant approval process for not actually motivating the participants in any way to write maintainable or reusable code. Like, I don't say they should make it good enough for industry to consume, but at least good enough for the follow-up scientist building on the work to run it!
But thinking about it, even in the computer science course I studied there was hardly any emphasis on software maintenance, code versioning or even how to collaborate with others - all non technical courses were basically about being an analyst and planning how to write a project, with zero interaction with others.
As for self contained deployment mechanisms - I do agree that effectively packing everything into a single static compiled binary like AFAIK Go and Rust effectively do has a lot of benefits. But coming from the distro background (Fedora) it also terrifies me quite a bit!
With dynamic linking you can patch CVEs by rebuilding the system library everyone uses (and not just encryption libraries can get CVEs!), you can recompile shared libraries with hardening flags and you actually know if something no longer builds from source next time rebuilding one of the parts fails.
Compared to that, a developer-provided massive static binary is quite a significant black box that can have multiple unpatched CVEs, compiled-in fixed versions of patched libraries, or even non-publicly available code (meaning the binary can't be independently rebuilt from source).
This scenario kinda describes a fully open source project - I guess for proprietary stuff a lot of these downsides don't really apply, though, out of necessity.
Ideally one should make the software as robust as possible so that it runs with high probability even in unforeseen environments - and if it fails, it should do that noticeably so that the developers will learn about it and can fix it.
It pretty much works like this in Fedora - lot of the various API breakages and regressions show up when the bits and pieces first land in the rolling distro called Rawhide, but thats fine as hardly anyone runs Rawhide as a daily driver. By the time the next Fedora release is branched from Rawhide and goes via Beta and RC all these issues are ironed out and the end result is pretty stable yet very up to date.
It's usually all the proprietary or heavy-bundling software that has the biggest problems with dependency updates, as they don't go with the flow, or avoid updating the dependency for so long that they end up with an insane jump to make.
And this brings us back to where this discussion started: it takes real effort to write non-trivial Python code that will run just by installing the program itself. The path of least resistance is to dump all the problems in the lap of the user, which many Python programmers tend to do in practice. That's not very nice to the consumer.
I think Python could benefit from developing empathy with the consumer.
I guess Numba would be the closest thing to this at the moment.
[disclaimer: I work on Cinder]
One of the difficulties is of course to not break existing C extensions.
But you still do have to jump through some hoops then. But as the example in the docs shows, there are shareable primitives, you can share numpy arrays, etc.
Multiprocessing can work if you have all your logic in Python already, and there is no notable shared state between the processes, and you don't need to coordinate much at all. So pretty much just the very simplest problems in concurrency. Everything else might technically be possible, but will likely be very slow and annoying.
It's basically fine anywhere you need a function call that you can dispatch out to like 50 or 500 workers on a queue and then do something after that returns, but any shared memory or IPC between the workers is up to you.
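The "dispatch a function to N workers and collect the results" pattern can be sketched with the standard library's `multiprocessing.Pool` (the task here is a stand-in for real CPU-bound work):

```python
# Minimal sketch of the fan-out pattern: each call to crunch() runs in a
# separate worker process, so one worker's GIL doesn't block the others.
from multiprocessing import Pool


def crunch(n: int) -> int:
    # Stand-in for a CPU-bound task.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(crunch, [10, 100, 1000])
    print(results)  # [285, 328350, 332833500]
```

Anything beyond this — shared state, coordination between workers — is where the pain described above begins.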
Python is also fine for webserving because most web servers pre-fork workers or whatever, so this doesn't come into play there either.
It's harder if you want to do something different where you want threaded workflows with synchronized/protected like constructs that folks might be familiar with from say Java.
Firing up multiprocessing (forking) has some costs in bringing up the interpreters, so it's not something you want to start up and shut down a lot; better if you can start things and leave them running. Once it's up it is pretty fast.
I guess mainly it changes the style of your program too much - it's basically just glue around forks.
There is this:
> It's harder if you want to do something different where you want threaded workflows with synchronized/protected like constructs that folks might be familiar with from say Java.
- https://docs.python.org/3/library/threading.html - which you can use as a context manager (i.e. `with lock`)
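The lock-as-context-manager pattern looks like this: `with lock:` acquires on entry and releases on exit, even if the body raises.

```python
# threading.Lock used as a context manager to protect a shared counter.
import threading

counter = 0
lock = threading.Lock()


def increment(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:              # equivalent to acquire()/release()
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 with the lock; often less without it
```

Note that because of the GIL this buys correctness, not parallel speedup, for CPU-bound work.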
That's not to say it's useless, it's a nice tool to have in the standard library, but if you're coming from one of the many mainstream programming languages with lightweight threading libraries which make it much easier to take advantage of parallelism it's easy to get frustrated at both python's standard threading and multiprocessing libraries for their respective shortcomings, at least in their CPython implementations.
There is a project to effectively internalise multiprocessing by running separate interpreters inside one process. It’s meant to be cheaper than having separate processes too.
Not on Windows though, as it lacks fork.
Basically, since the reference count is kept right before the object data, as soon as the child process touches it - even just to look at it! - you immediately trigger a copy on write on the nearest 4k of memory, which tends to add up fast if you're not careful. Even if you never touch 99% of them, the garbage collector is happy to do it for you.
At my previous job it was bad enough that I ended up writing a small patch to be able to set some refcounts to 0xFF...FF and treat them specially, never changing their value. (Yes, this also meant that they never got destroyed properly, and the extra checks made our codebase around 4% slower, but it was an acceptable tradeoff. No, the patch was no longer small by the time it hit production.)
The real problem with Python multicore is that the main problem it solves is the one ctur is talking about, namely, "Oh crap, I used Python and it doesn't run fast enough for my needs... maybe I can run more processes?" Using a language that is already in the slowest class of languages and realistically 40-50x slower than other languages means that just to recover the performance you'd get from switching to a compiled language, you need ~50 perfectly parallel processes solving an embarrassingly parallel problem. If you can't do that... and that's a fairly common case, things that are a full, true 50x embarrassingly parallel aren't actually that common on real hardware because you'll get some sort of contention... then you can't even work your way up to the performance that you could have gotten by starting with Go or C# or some other reasonable language.
And that's ignoring the serialization overhead between the processes. If that starts costing you noticeable amounts the number of processes you need to recover goes up really fast, due to the way the math works.
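The arithmetic can be sketched in a few lines (the numbers here are illustrative assumptions from the comment above, not measurements): if each worker is some factor slower per core and loses a fraction of its time to serialization, the worker count needed just to match one compiled-language core grows quickly.

```python
# Back-of-envelope: workers needed for a slow-language pool to match the
# throughput of a single fast core, given per-worker overhead.
def workers_needed(slowdown: float, overhead_fraction: float) -> float:
    """Workers required so useful parallel throughput matches 1 fast core."""
    useful = 1.0 - overhead_fraction   # share of each worker doing real work
    return slowdown / useful


print(workers_needed(50, 0.0))   # 50.0 -- no overhead: need 50 workers
print(workers_needed(50, 0.2))   # 62.5 -- 20% serialization overhead: even more
```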
Note the only thing about Python causing this effect is its performance; it is equally true of anything else that is the noticeably slower performance league. So, also, note that if you are using Python in one of the ways where it doesn't have this performance problem, such as NumPy with almost all your compute in native code, this analysis doesn't apply.
In the 1990s when Python was born, programming in Python was massively easier than programming in the static languages of the day. The landscape has shifted... Python is now only incrementally better than modern static languages in some dimensions, and I tend to agree with ctur, it just plain isn't better once you pass a certain size and the stereotypical problems with dynamically-typed code start hitting you harder and harder. There are now a lot of good statically-typed languages where you can start a new project, writing something just incrementally harder to write than Python, with the type inference, built-in associative types, tons more libraries, etc. all the advances of the past 25 years, and get static-typed performance. The gulf between "easy Python" and "pull your hair out static language" is not quite closed. Not quite. But it's a lot closer than it used to be; less "Grand Canyon" as it was in the 1990s and more "can I jump that creek? I mean, it's pretty close... I think I can jump it...".
The upshot is, if you're reaching for multiprocessing for performance reasons, not convenience reasons, you've very nearly already lost. There's a narrow window where it might still make a bit of sense, but you're getting perilously close to "You need to switch languages" just by reaching for it at all.
What I'm getting at is, let's say you have some system where you need highly performant real time processing of some data. Perhaps 10 years ago, you might have needed a complex multi-threaded java/c++ app to handle it. But perhaps now, you use, say, DynamoDB, kinesis, kafka, lambdas, or some other collection of services that you're basically gluing together with... python.
I'm not saying you're wrong. I'm just wondering, if you could truly factor in the costs in both developer time -- developing multi-threaded apps is usually pretty tricky after all -- and infrastructure costs and whatnot, where the boundary between "python is adequate" vs. "we really need go/java/c++" lies, and in which direction it's moving.
In retrospect, I guess it's sort of a silly question, since at the core, the services we'd be gluing together are no doubt written in faster languages. Perhaps a better way of framing it is: what % of problems can be adequately solved (cheaply) by just using language X, and how has that changed over time?
"It depends.", of course. Despite the significant slow down of single-core performance improvements, 1 CPU is a lot of power in a lot of use cases, even if you divide it by 50. I do frequently find myself reminding some people who get a little too deeply into the cloud mindset and the belief that any non-trivial problem needs clusters of systems that you can still do an awful lot with one CPU. And I often wonder how many "clusters" are out there drinking down the watts doing work that if somebody would just spend a week or two optimizing their code to get the O(n^2.5) algorithm out of their system could be comfortably done on a mid-grade laptop... if somebody just realized you shouldn't need a "cluster" to do this task.
However, flipping your perspective around is probably more interesting... multi-threading is still not "easy", per se, but it is also way easier than it used to be. Threading hell is true and exists, but a non-trivial amount of it was down to the architecture being attempted. It's a bad idea to try to coordinate everything with piles of mutexes everywhere. If you program with more agent-like approaches to resource management, even if you don't do it 100% like Erlang forces you, multithreading gets much easier. What kinds of designs does that open up, now that it's much easier than it used to be?
I still use Python for certain tasks where it is suited. But I'd have a hard time going back to it for most of my programming. I've internalized the ability to say "and I need a server here with its own thread of control managing this resource, and I need to set up a worker pool there for this data processing task, and I can set up a recurring, independent process to check this other thing periodically without it interfering with anything else" whenever it is necessary to ever be able to go back to architecting systems without that capability. I'm not going back to cooperative scheduling unless basically forced. There's just so many places where you ought to have this capability available to you and it's actually easier to work multithreaded rather than do the work to try to thread a single execution context through all the things that shouldn't be that tightly coupled together.
It is generally underappreciated that having to have two bits of code share an execution context is a deep level of coupling, and languages like Python force all your code into one execution context.
Having multithreading easily available has its downsides, yes, but it also has its legitimate upsides for architecture.
The main benefit of Python now over those newer languages is its ecosystem.
For example, macOS can't safely default to fork (its system frameworks aren't fork-safe, so Python defaults to spawn there), while fork was (is?) the default start method on Linux.
I wish I knew what went on in modern CPUs when it comes to branch prediction and inline caching because that is absolute magic.
Seems spot on, and I really wonder as well. Always had the feeling that some decades ago it was still possible to outsmart the CPU (well, and the compiler/optimizer) and get performance improvements by thinking like a CPU, but these days this seems to have become impossible, for this reason I guess?
This could mean that people working on the Python interpreter encounter more problems with CPU magic than most people do, since changes they make move it away from what the manufacturers optimized for.
E.g. tackling performant numerical computations via numpy and other such libraries seems to be a workable pattern. Even with compiled languages like C/C++ and fortran these types of computations are best handled with tuned libraries.
Instead, we got the "stone soup" where everyone wanted their favorite feature from some other language added in. You'll see this in most languages, where people want something from another language they are more comfortable with. Privately, I refer to this as the California problem: people move away from California (or anywhere else really, I am picking on California) but then bring all of the baggage and voting habits that made California unpalatable to them eventually.
Python is especially vulnerable to this because it conflicts with "There should be one -– and preferably only one –- obvious way to do it." The more features added to the language, the more ways there are to do something, and we then must invent new idioms and lean on "convention."
- iterators everywhere
- context managers
- powerful function arguments
- a good stdlib
- unicode handling
- string formatting
It's easy to get started, and then to get going.
Turns out a lot of people value that.
How difficult is it to develop a language with (mostly) python syntax while keeping the performance of go?
I guess most people who use python use it because of its aesthetics and might never even have heard of the GIL issue in python, so I guess among all languages the python syntax is the most liked one.
Python is often 1-2 orders of magnitude slower than java. That really is a lot yet Python and Java are about comparable in adoption.
- speed differences of that magnitude matter only for a limited number of use cases. Most programming use cases are fine with python speed. They were fine with it, in fact, 20 years ago, before we got today's hardware speed.
- people are starting to complain about speed in python only because of the rise of data science. Suddenly, there, speed matters. Before, you had to get to google levels of infra to find use cases where python was too slow.
- devs are expensive. Hardware is cheap. If you have a slow website, paying $200 more on your server is not a big deal. But taking 3 more months to dev that feature with your $300k pro is another thing entirely.
Python could do what most other languages do and allow programmers to explicitly add locks to the code that needs to be locked, rather than locking everything, all the time.
Python has a fundamental design decision that is difficult, if not impossible, to fix: its dynamic typing. It allows users to run code without having to declare what type of data their objects receive, giving up any type-safety guarantees on their data.
This can lead users into building applications that are prone to errors in the future due to unforeseen changes in the application's structure.