Related to this, if you set the env var `PYTHONPROFILEIMPORTTIME=1` or run python with `-X importtime`, it will print out the cumulative and self times to import various modules.
Highly recommended for finding the worst imports affecting your program's startup time.
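For instance, here's a quick way to capture and rank that output from inside a script (nothing beyond the stdlib; `import json` is just a stand-in target):

```python
import subprocess
import sys

# Re-run the interpreter with -X importtime; the timings are written to
# stderr as "import time: <self us> | <cumulative us> | <module name>".
res = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True, text=True,
)

rows = []
prefix = "import time:"
for line in res.stderr.splitlines():
    if not line.startswith(prefix):
        continue
    parts = line[len(prefix):].split("|")
    try:
        rows.append((int(parts[1]), parts[2].strip()))
    except (ValueError, IndexError):
        pass  # the header line has no numbers; skip it

# Worst offenders by cumulative import time first.
for cum_us, mod in sorted(rows, reverse=True)[:5]:
    print(f"{cum_us:>8} us  {mod}")
```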
In general, the Python community's values tend towards functionality over performance. For example, large packages (looking at networkx here) will often import a bunch of their submodules in their __init__.py, which means all of those modules end up loaded even if you didn't need them.
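One mitigation on the library side is PEP 562's module-level `__getattr__`, which lets an `__init__.py` expose submodules without importing them eagerly. A minimal sketch (the package and submodule names are hypothetical):

```python
# pkg/__init__.py
import importlib

_lazy_submodules = {"heavy", "plotting"}  # hypothetical heavy submodules

def __getattr__(name):
    # Called only when normal attribute lookup fails, so "import pkg"
    # stays cheap and pkg.heavy is only imported on first access.
    if name in _lazy_submodules:
        return importlib.import_module(f".{name}", __name__)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

After this, `import pkg` does no work for `heavy`, but `pkg.heavy` still behaves as if it had been imported eagerly.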
At the other extreme, checkpointing the whole process once all imports have been resolved and restoring it for every execution can be used for frequently-run tools: https://github.com/albertz/python-preloaded
I've been writing Python for going on 20 years now, and while it was a good language to cut my teeth on, this sort of analysis brings only horror. Many thanks to the author for bringing it into plain view.
I'm going to go back to learning more C and Forth... And shake my fist at passing clouds :)
Yeah. I recently worked on a small web project being developed at a university. The project is written in Flask, and it presents a reasonably simple UI on top of some data living in a MySQL database.
When I started on the project, page loads often took 10 seconds or more. The web application is used by about 20 people and that was enough to bring their single beefy server to its knees. Someone in NY tried scraping the site the other week and the site became completely unresponsive. They resorted to banning the IP to keep the website up. The reasons it was slow were all the usual culprits - a misused ORM being the main one.
It’s a nice language, but I really felt like I’d been transported back in time a few decades working in it. It feels like I’m using a computer from the 90s where performance choices matter again because the language is so slow. And where dependency management is a circus of half-working tools and half-hearted attempts at versioning. Packages conflict with one another. Some “pinned” package versions have apparently rusted and won’t actually install on my computer. And the system to install packages locally was obviously bolted on, badly, long after the horse had left the gate.
It reminds me of working in C in the early 2000s. I never thought I’d say this but it makes server side JavaScript with npm look positively modern and fast by comparison.
That sounds like there are some quick wins to be had.
I use a dev machine that's quite archaic compared to a modern server (a 2nd-gen i5 ThinkPad, to be precise). It struggles to top 20ms for a request that includes loading a user and a data object, joined tables and all, via the ORM from Postgres running locally with a few hundred thousand records in those tables, before touching anything like explicitly adding caching.
Check your indexes, joins, general DB design and in-app looping. Flask's not your problem. You'll have equal or worse woes (if lower level with less hand holding) with anything else.
Yeah I spoke in past tense about the performance problems because, as you said, there were an awful lot of easy wins to be made. The site is about 2 orders of magnitude faster now, which is incredibly satisfying.
> Flask's not your problem. You'll have equal or worse woes (if lower level with less hand holding) with anything else.
I’m not so sure about that. It’s hard to run the experiment, but I’ve never seen a nodejs app run anywhere near that slowly. The default-synchronous nature of Python combined with its mediocre performance for straight code magnifies the impact of any bad design choices. At least in a nodejs application your server can happily run many sql queries at the same time, or do other work while it waits for the database. I’m sure sufficiently mediocre web server code can bring nodejs to its knees. But in a decade of working with node, I’ve never seen it done. Certainly not in a web app with only 20 users.
I once worked on a small web project, at a university, in Python, using WSGI IIRC. It loaded a lot faster than any of the big expensive apps the university had written.
Well, there was one exception. The little import statement to import the Oracle database client took maybe 15 seconds. MySQL for the win :)
(I would not recommend MySQL for new applications today, although I might recommend it over Oracle…)
I have developed a lot of Python based websites (mostly Django), some quite complex, and I have very rarely seen anything that takes seconds to load - sometimes some database queries have been slow. In most cases load time is dominated by loading JS and images.
> The reasons it was slow were all the usual culprits - a misused ORM being the main one.
So, slow queries.
I have not had such bad issues with dependency management either. Not even with old stuff someone else wrote years ago.
I'm so confused by this; Python is really fast. This isn't to say that other languages aren't a lot faster, but I can afford to be ungodly wasteful with CPU-bound tasks (in "leaf" programs; don't worry, I'm not doing this in libraries to be consumed by others) because it literally doesn't matter. The IO to call print(), write a log, or read a file on disk dwarfs the time actually spent running Python code, and this is before using the new JIT.
I wouldn't number crunch in Python without something like numpy because you'll pay the cost of Python's dynamism for nothing but a lot of work has gone into making Python's primitives and standard library performant. I steal algorithms from CPython all the time.
I'm curious what the ORM misuse was, because Python can obviously handle much higher loads than that. Perhaps the ORM is to blame for offering some footgun, or maybe the developer did something impossibly idiotic.
One big problem they had was that the system checked the user’s access permissions on every request. Access control in this application is quite complex, and so the access control code ended up issuing multiple queries and doing a lot of over fetching to do its job. (The classic ORM problem.)
It turned out that this was also happening for all static assets. Oops. And the site is covered in very small images. Double oops.
All told, to load a single page the server was making over 150 sql queries. And because it’s Python, those queries were all issued with blocking code. More than enough to keep the server busy for ages.
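For anyone who hasn't hit it: the N+1 pattern that produces query counts like that can be sketched with plain sqlite3 (the schema here is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE images (id INTEGER PRIMARY KEY, page_id INTEGER, path TEXT);
INSERT INTO pages  VALUES (1, 'home'), (2, 'about');
INSERT INTO images VALUES (1, 1, 'a.png'), (2, 1, 'b.png'), (3, 2, 'c.png');
""")

# N+1: one query for the pages, then one more per page. This is what
# lazy ORM relationship access does silently on every request.
queries = 1
pages = conn.execute("SELECT id, title FROM pages").fetchall()
for page_id, _title in pages:
    conn.execute("SELECT path FROM images WHERE page_id = ?", (page_id,))
    queries += 1
print("N+1 round trips:", queries)

# The fix: a single JOIN fetches everything in one round trip.
rows = conn.execute(
    "SELECT p.title, i.path FROM pages p JOIN images i ON i.page_id = p.id"
).fetchall()
print("rows from single JOIN:", len(rows))
```

Multiply the per-page version by a complex permission check and a page full of small images and 150 queries per load is easy to reach.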
It doesn’t take 20k syscalls to print a plot; the 20k syscalls are for the import call. I would hope that drawing plots takes a lot less.
To engage with your point: loading a dynamic library in a regular language takes significantly less than 20k syscalls. Probably 20-40 for C on Linux. Python is uniquely inefficient. On most plots comparing resource use by different languages, in order to even show python together with regular languages like Java and C, either you use the log scale, or everything but Python is shown as a single point.
Of course, most people use Python to glue together stuff written in C, so it’s not that big of a deal, but it becomes a problem when people forget that pure Python code is literally hundreds or thousands of times slower than a “regular” program doing the same thing.
Well, duh. Seaborn is a plotting lib for research — you will probably make a couple of hundred calls to it in a week. In a week. After that, you will save your figures for the report and forget about your code. It's plainly obvious that you don't use it in a high load production scenario.
I just don't see how a person could spend 20 years using python and still can't figure out that you shouldn't hammer nails with a microscope.
Oh absolutely, for some tasks Python is amazing. I use Jupyter notebooks a lot, for example, and the flexibility is an incredible feature.
It just worries me when I sometimes see those same Jupyter notebooks running in production, crunching 100s of terabytes of data. Maybe I’m wrong, but I didn’t get the impression everyone realizes exactly how wasteful that is. I guess AWS credits are easy to come by.
One thing Google did well back in the day was reporting resource costs in SWE-hours, the idea being that you could see whether you should go and rewrite something. If something cost 100 SWE-hours to run, and it only took you a day to cut that in half, you should do it.
Numpy is fine. But people write a lot of complicated code to pull JSON from somewhere, transform it in Python, and write it to parquet somewhere else, for example. JSON, the dict type and parquet are all implemented in C, but a comprehension on top of a Python iterable is just gonna be pure Python “bytecode”. It has been my experience that rewriting such things in C++, or even Go or Java is an easy way to quickly save truly incredible amounts of compute.
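The gap is easy to demonstrate with the stdlib alone: the same reduction run as a C-implemented loop (the builtin sum) versus interpreted bytecode:

```python
import timeit

data = list(range(100_000))

def py_sum(xs):
    # Every iteration here is interpreted bytecode: fetch, dispatch, box.
    total = 0
    for x in xs:
        total += x
    return total

t_c = timeit.timeit(lambda: sum(data), number=100)      # loop runs in C
t_py = timeit.timeit(lambda: py_sum(data), number=100)  # loop runs in Python
print(f"pure-Python loop is ~{t_py / t_c:.0f}x slower")
```

The exact ratio varies by version and hardware, but it is reliably several-fold even for this trivial loop; for glue code that touches every record of a large dataset, that factor is the whole bill.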
A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team. And I really think if they’d had better intuition about Python’s performance under different scenarios, they could have saved a year of effort. This is why I feel it’s worth having frank discussions about trade offs when it comes to this language.
It’s incredibly useful, but people in the community aren’t clearly told about its limitations. (Especially wrt performance, but also maintainability.)
> It has been my experience that rewriting such things in C++, or even Go or Java is an easy way to quickly save truly incredible amounts of compute.
Sure, but there's a trade-off, no? Go is typically 3x the code of Python. And C++ is easily 10x the complexity.
There was one point back when I stopped coding C++ where one coder might not understand what another C++ coder was doing because the standard was so large.
> A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team.
You know, I have horror stories about C++ and Java as well. Usually that kind of blame goes to management for not understanding the issues up front. Pretty soon, I'll have a slew of stories about Go misusage as well.
Well, not to split hairs, but it depends on what you mean by complexity. I would describe Python as possibly the most complex programming language in existence - it’s built from a large number of abstractions, many of which are leaky, and it behaves very differently from version to version and environment to environment.
Python is certainly very terse and expressive. I like writing Python, it’s fun. And it hides a lot of problems from the programmer, but that’s not the same as being simple.
Go is simple, that’s why it’s verbose. It has no syntax sugar and it’s not fun to write Go, but you can read it and see what it’s doing really quickly.
Anyway, it’s about picking the right set of trade-offs, as you say. But the trade off in performance is 1:100, and that’s so punishing at scale that all other considerations kind of fall by the wayside.
> I would describe Python as possibly the most complex programming language in existence.
You haven't lived until you've argued with a C++ language lawyer.
> Go is simple, that’s why it’s verbose. It has no syntax sugar and it’s not fun to write Go, but you can read it and see what it’s doing really quickly.
Python is great. It has a lot of syntax sugar, but it's also easy to read and understand what it's doing. They teach it to elementary school kids, but they use it in F500 companies. And it has made huge strides in scientific computing, because it's relatively easy to call existing C/Fortran libraries.
Go's experience by comparison is awful. Their community is an anti-social gate-keeping echo chamber. Their FFI is awful. Their language design is awful as well.
Edited to add: I feel like Go got popular because Rob Pike had no problem bad mouthing other languages. "Python/C++ are so terrible...".
Consider Rust, on the other hand, where Python and Rust seem to be getting along quite well. Rust seems to care about the coding experience. I think that makes a difference.
If they had done performance testing from the start, they could have saved a year. A pipeline that has not been performance tested was in no way "finished". Performance is not something that can be tacked on later. In any language...
I've seen bad Java/C++ code go to production before, and cost many more hours to fix it than it was just to replace the code with a working python script using the built in libraries.
Can you cite a source/example for that? I cannot imagine an optimized C program that doesn't blow python with numpy out of the water. Even a poorly written C program is likely to be 2x faster simply because it doesn't have to round trip operations from C to python and back.
Yes, because you're importing a library that does a lot more than just print a plot. A purpose-built Python program that just printed the plot, nothing else, would need a lot less than 20k syscalls too.
You're missing the point - importing a module in other languages takes ~100x fewer system calls. It's a rare example of Python doing something that's mostly written in pure Python, rather than invoked via an FFI, and it shows some of the inefficiency of the language laid bare. That makes it an interesting case to study.
(Of course an import call in Python does a lot more, but the end result is roughly the same as calling `dlopen` in, e.g., Swift.)
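For comparison, the closest thing Python itself offers to a bare `dlopen` is ctypes, which loads a shared library with none of the import machinery (the library lookup here assumes a Unix-like system):

```python
import ctypes
import ctypes.util

# ctypes.CDLL is a thin wrapper over dlopen(3): no sys.path traversal,
# no bytecode cache, no module objects. Just the dynamic loader.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))  # 1.0
```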
Python will do a lot under the hood that a hand-rolled C solution wouldn’t. So I wouldn’t expect the C equivalent to make the same number of syscalls as Python.
> I'm going to go back to learning more C and Forth
Why would you expect that to decrease the number of syscalls you need? The syscalls are there because the program needs the OS to do things. That need is driven by the application domain, not by the programming language you use.
> The syscalls are there because the program needs the OS to do things.
Maybe some of them are. Many of those syscalls are there because Python (not the core program someone is creating, but rather its platform) needs the OS to do things.
Importing an empty python file takes 28 syscalls (30 measured by their tool, but the last two are closing out the trace not actually related to the import). 29 syscalls if you have any text in it (presumably more for larger files).
The logical equivalent in C for many portions of the import process in Python happen at compile + linker time, not during execution. So while it might not be a pleasant experience to develop, a C equivalent of many Python programs would involve far fewer syscalls at execution time.
Python's module importing / $PYTHONPATH lookup/traversal is incredibly inefficient, especially with cold FS caches...
I've worked at places where we've significantly patched the logic (in a way which breaks compatibility in some cases, so couldn't be up-streamed) which makes Python startup / module loading with hundreds of paths in $PYTHONPATH orders of magnitude faster...
I also wrote a somewhat similar tool. I call it deep-ast. It's pretty flexible in what it can track. I used it when refactoring some code in urllib3, to see what Exceptions could get raised along a given code path.
Today I asked a devops engineer to tell me how much time a long (3 seconds avg) API call was spending on database queries, application logic, network, etc. He couldn’t understand the request and instead opened up the Azure console and recommended we increase the number of CPU cores / memory if performance is an issue.
I keep telling people that a hundred buses will let you take a hundred times more people, but nobody will get to their destination a hundred times faster.
Inevitably this comment is followed by quiet blinking as they digest this and then this question: “Are you saying we need to scale up one hundred times bigger?”
On two occasions I have been asked, — “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
— Charles Babbage, Passages from the Life of a Philosopher (1864), chapter 5, Difference Engine No. 1
Disagree. DevOps teams should be looking for resources that are being hit hard unnecessarily and requesting moves to better solutions when possible.
DevOps teams should be looking at CPU spikes, they should be performing RCAs, they should be maintaining resources in a healthy state, and they should reject/revert changes and flag problem areas in code written by product-focused devs.
Product devs, for the most part, are only adding log traces to debug business logic when it arises. Product devs are not equipped with the knowledge to identify system errors that are not "bugs in the code", i.e. they will not be good at telling you why SPROC_LXC1 fails as a result of making an ExcelParserFactoryFactory
If it's a large project, I'll use local imports to defer this cost to just the places where I'm plotting. That way, if I have another entry point that only does computation, or is part of a larger system like a web application, it won't have this sort of overhead.
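A minimal sketch of the pattern, using a cheap stdlib module as a stand-in for a heavy plotting dependency:

```python
import sys

def plot_summary(values):
    # Local import: the heavy dependency (imagine matplotlib/seaborn here;
    # statistics is just a cheap stand-in) is loaded on first call, not at
    # program startup, so non-plotting entry points never pay for it.
    import statistics
    return statistics.mean(values)

assert "statistics" not in sys.modules  # nothing imported at startup
print(plot_summary([1, 2, 3]))
assert "statistics" in sys.modules      # cost paid on first use only
```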
There is then this neat tool to visualize the data. https://kmichel.github.io/python-importtime-graph/
I've never tried https://pyoxidizer.readthedocs.io/en/stable/oxidized_importe..., but it compiles all the imports into one memory-mapped file, which _may_ speed up importing.
Having everything compiled to bytecode also helps a bunch.
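Even without PyOxidizer, you can get part of that ahead of time with the stdlib's compileall, so the first import of a tree never pays the parse/compile step (the directory here is a throwaway example):

```python
import compileall
import pathlib
import tempfile

# Pre-compile a source tree to .pyc so later imports skip parsing.
src = pathlib.Path(tempfile.mkdtemp())
(src / "mod.py").write_text("ANSWER = 41 + 1\n")
compileall.compile_dir(str(src), quiet=1)

pycs = list((src / "__pycache__").glob("*.pyc"))
print([p.name for p in pycs])
```

Distributions do the same thing at package-install time for the stdlib, which is part of why `import json` is cheaper than importing your own uncompiled code.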