
I think in general Python's biggest challenge is that it doesn't scale well. This is an agglomeration of issues around the same theme: bad packaging when there are lots of cross-cutting dependencies, slow performance, no concurrency, typing as a second-class citizen, etc. All of that is barely noticeable when you're just getting started on a small experimental project, but incredibly painful in large production systems.

I strongly suspect that devs' satisfaction with Python is strongly correlated with the size of the codebase they're working on. Generally people using Python for one-off projects or self-contained tools tend to be pretty happy. People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

What I've observed a lot is that many startups or greenfield projects start with Python to get an MVP out the door as fast as possible. Then as the scope of the software expands they feel increasingly bogged down and trapped in the language.



I work at Instagram, which is a O(millions) LOC Python monorepo with massive throughput and a large engineering team. It's actually quite nice — but our code is heavily, heavily typed. It would be miserable without the type system. Some of the older parts of the codebase are more loosely typed (although they're shrinking reasonably quickly), and those sections are indeed a huge pain to ramp up on.

Part of the success of IG's large Python codebase is FB's investment into developer tooling; for example, FB wrote (and open-sourced, FWIW) our own PEP-484 compliant type checker, Pyre [1], because mypy was too slow for a codebase of our size.

1: https://github.com/facebook/pyre-check


That's my major complaint about Python...

For its age and popularity, the tooling is abysmal.

I have to rewrite too many things that I expected to just be there, given the age of the project.

And some things were only fixed now! Merging dicts with an operator (`dict | dict`) only started to work in 3.9!


`dict1 | dict2` is syntactic sugar; you have been able to do `dict1.update(dict2)` since forever.


No, it's completely different: `dict1 | dict2` creates a new dict and leaves both inputs unchanged, while `dict1.update(dict2)` modifies dict1 in place.
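A quick sketch of the difference:

```python
a = {"x": 1}
b = {"x": 2, "y": 3}

merged = a | b          # Python 3.9+: builds a new dict; b wins on conflicts
assert merged == {"x": 2, "y": 3}
assert a == {"x": 1}    # a is untouched

a.update(b)             # mutates a in place and returns None
assert a == {"x": 2, "y": 3}
```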


You've been able to use splat forever! https://dpaste.org/yU7s
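In case the paste link rots, the splat form looks like this (available since Python 3.5, PEP 448):

```python
a = {"x": 1}
b = {"y": 2}

merged = {**a, **b}   # copy-merge without mutating either input
assert merged == {"x": 1, "y": 2}
```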


Yeah, you could use my hairline to tell when I’m working with Python... I still haven’t figured out a good typed framework for it yet.


I also struggle with weak(er) typing in Python, compared to strong(er) typing in C++, Java, or C# -- or even VBA(!).

Frustrated like you, I wrote my own open source type checking library for Python. You might be interested to read about it here: https://github.com/kevinarpe/kevinarpe-rambutan3/blob/master...

I have used that library on multiple projects for my job. It makes the code run about 50% slower, on average, because all the type checking is done at run-time. I am OK with the slow down because I don't use Python when I need speed. My "developer speed" was greatly improved with stricter types.
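The linked library's internals aren't shown here, but the general shape of run-time checking (and where the overhead comes from) can be sketched with a hypothetical decorator that validates annotated arguments on every call:

```python
import functools
import inspect

def check_types(func):
    """Hypothetical run-time checker: validate annotated args on every call."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            ann = func.__annotations__.get(name)
            # This sketch only handles simple class annotations.
            if isinstance(ann, type) and not isinstance(value, ann):
                raise TypeError(
                    f"{name} must be {ann.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper

@check_types
def scale(x: float, factor: int) -> float:
    return x * factor

assert scale(2.0, 3) == 6.0
```

Every call pays for the signature binding and the isinstance checks, which is plausibly where a slowdown on the order of 50% comes from.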

Finally, this isn't the first time I wrote a type checking library/framework. I did the same for Perl more than 10 years ago. Unfortunately, that code is proprietary, so it's not open source. :( The abstract ideas were very similar. Our team was so frustrated with legacy Perl code that I wrote a type checking library, and we slowly applied it to the code base. About two years later, it was much less painful!


Try using Pyre! It's open-source. I use it daily at IG.


Have you guys published any whitepaper on this subject? These last few years working on moderately large Python codebases with dynamic typing have been less than idyllic.



"devop" here

> doesn't scale well.

Nothing scales well. Scaling requires lots of effort. It doesn't matter what language you use; you'll rapidly find all its pain points.

> bad packaging when there's a lot of cross-cutting dependencies

Much as I hate it, Docker solves this. Failing that, Poetry, or if you must, venv. (If you're being "clever", statically compile everything and ship the whole environment, including the interpreter.) Its packaging is a joy compared to Node's. Even better, enforce standard environments, which stops all of this: one version of everything. You want to change it? Best upgrade it for everyone else too.

> slow performance

Meh, again it depends on your use case. If you're really into performance, then drop down to C/C++ and pybind it. Fronting performance-critical code with Python is a fairly decent way to let non-specialists handle and interface with performance-critical code. It's far cheaper to staff it that way too: standard Python programmers are cheaper than performance experts.
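pybind11, as mentioned, needs a compiled extension module. As a self-contained stand-in for the same "drop down to C" idea, here is the ctypes flavor, calling sqrt straight out of the system C math library (assumes a Unix-like system where libm is resolvable):

```python
import ctypes
import ctypes.util

# Load the C math library; path resolution is platform-dependent.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts floats correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

assert abs(libm.sqrt(2.0) - 2.0 ** 0.5) < 1e-12
```

The hot loop lives in compiled code; Python is just the front end, which is the staffing argument above.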

If we are being realistic, most Python programs spend 80% of their time waiting on the network.

Granted, Python is not overly fast, but most of the time your bottleneck is the developer, not the language.

> no concurrency

Yes, this is a pain. I would really like some non-GIL-based threading. However, it hasn't really been that much of a problem. multiprocessing Queues are useful here, if limited. Failing that, spawn more processes and use an RPC system.
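A minimal sketch of that pattern: fan work out to another process and collect results through a multiprocessing Queue (on Unix this forks by default):

```python
from multiprocessing import Process, Queue

def worker(q, items):
    # Runs in a separate interpreter process, so the GIL is not shared.
    for x in items:
        q.put(x * x)

def run_parallel(n=5):
    q = Queue()
    p = Process(target=worker, args=(q, range(n)))
    p.start()
    results = sorted(q.get() for _ in range(n))
    p.join()
    return results

if __name__ == "__main__":
    print(run_parallel())  # [0, 1, 4, 9, 16]
```

The Queue handles pickling and IPC, which is exactly the "useful here, if limited" part: everything crossing the boundary must be picklable.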

> typing as second-class citizens

The annotation system is underdeveloped. Being reliant on dataclass libraries to enforce typing is a bit poor.

> People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

I work with a _massive_ monorepo. Python isn't the problem; it's programmers being "clever" or making needless abstractions of abstractions. None of that is a Python issue; it's egotistical programmers not wanting to read other people's (undocumented) code, and not wanting to spend time making other people's code better.


>Nothing scales well. Scaling requires lots of effort. It doesn't matter what language you use; you'll rapidly find all its pain points.

This is very important. A lot of people think that just using go or rust or whatever other language is new fixes all of this. But with a big enough project, you'll find all the issues. It's just a matter of time.


Do not miss that one will find all of the language's pain points. I'd wager that a dynamically typed language such as Python has quite a few more pain points at scale than a more principled language such as OCaml.

I love Python's bignum arithmetic when I write small prototypes for public key cryptography. I love Python's extensive standard library when I'm scraping a couple of web pages for easier local reading. But I would never willingly choose it for anything bigger than a few hundred lines. I'm simply not capable of dealing with large dynamically typed programs.
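Arbitrary-precision ints do make such prototypes pleasant; a textbook RSA round trip (toy numbers, nothing remotely secure) needs no libraries at all:

```python
# Toy RSA with textbook-sized primes -- illustrative only, not secure.
p, q = 61, 53
n = p * q                        # modulus
phi = (p - 1) * (q - 1)
e = 17                           # public exponent, coprime with phi
d = pow(e, -1, phi)              # modular inverse (Python 3.8+)

msg = 42
cipher = pow(msg, e, n)          # encrypt
assert pow(cipher, d, n) == msg  # decrypt round-trips
```

The same three-argument pow works unchanged with 2048-bit numbers, which is the appeal for crypto prototyping.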

Now if people try Rust or OCaml with the mentality of an early startup's Lisp developer, they're going to get hurt right away ("fighting the language" and "pleasing the compiler" is neither pleasing nor productive), and they're going to get hurt in the long run (once you've worked around the language's annoying checks, you won't reap as much benefit).

If you'll allow the caricature, don't force Coq down Alan Kay's throat, and don't torture Edsger Dijkstra with TCL.


Though OCaml's tooling pain points hurt at least as much as Python's, even though I adore the language.


This is somewhat true - scaling is hard no matter what - but some things scale much better than others. I have been miserable working with ruby on rails codebases that are much smaller than java codebases I have been content working on. This is despite personally enjoying the ruby language far more than the java language.


> Much as I hate it, Docker solves this. Failing that, Poetry, or if you must, venv. (If you're being "clever", statically compile everything and ship the whole environment, including the interpreter.) Its packaging is a joy compared to Node's. Even better, enforce standard environments, which stops all of this: one version of everything. You want to change it? Best upgrade it for everyone else too.

No, Docker doesn't solve the fact that some packages just won't play nicely together. NPM actually does this better than the Python ecosystem too, since it will still work with different versions of the same dependency. You get larger bundle sizes, but that's better than the alternative of it just flat-out not working.


Scalability is not just runtime, it's also developer time scalability. The larger the project, the more you have to split it up and write interface documentation between libraries - which adds complexity.

As for processing scalability - Python is OK, but it's considerably hampered by the BDFL's own opinions. The result is a few third-party libraries that implement parallelism in their own ways. That functionality should be integral to the standard library by now. The worst part is the lack of a standard API for data sharing between processes.

> packaging

Python's packaging issues only start with package management. Setuptools is an unholy mess of a system that literally gave me headaches for its lack of "expected features". I hate it with every single cell in my body.

And then there are systems and libraries, where you literally cannot use docker (Hello PySpark!).

>read other people's (un documented) code

I LOLed! Seriously... We get Python fanboys moaning about how indentation makes everything more readable and what a pleasure it is to write code in Python. Give me a break!


its programmer being "clever"

When I have to revisit old code I've written, I occasionally encounter my "cleverness" at the time. I always hate that past version of me. I think I've mostly learned my lesson. I guess I'll know in a few years.


"When I have to revisit old code I've written, I occasionally encounter my "cleverness" at the time. I always hate that past version of me. I think I've mostly learned my lesson. I guess I'll know in a few years."

...I feel attacked.


I'm sorry, but as a fellow Python "devop" too, this really reads like empty apologism.

>Nothing scales well. Scaling requires lots of effort.

Sure, just like all PLs have their flaws, and most software has security vulnerabilities. But it's a question of degree and of the tendencies of the language. Different languages work better in different domains, and fail in others, and what Python is specifically bad at is scaling.

If only for the lack of (strong/static) typing and the relatively underpowered control flow mechanisms (e.g. Python often using exceptions in their stead)... While surely all languages have pain points that show up at scale, Python still has a notably large number of significant ones precisely in this area.

>docker, poetry, venv...

Yes, and this is exactly the point. There are at least three different complex solutions, none of which can really be considered a "go-to" choice. What is Rust doing differently? Hell, what are Linux distros doing differently?

>If you're really into performance then dump out to C/C++ and pybind it.

If you want performance, don't use Python - was the parent's point.

>If we are being realistic, most Python programs spend 80% of their time waiting on the network.

This really, really doesn't apply to all of programming (or even those domains Python is used in). Besides, what argument is that? If it were true for your workload, then it would be so for all other languages too, meaning discussion or caring about performance is practically meaningless.

>Granted, Python is not overly fast, but most of the time your bottleneck is the developer, not the language.

Once again, this applies to all languages equally, yet, for example, Python web frameworks regularly score near the bottom of all benchmarks. I doubt it is because of the lack of bright programmers working in Python, or the lack of efforts to make the frameworks faster.

>Python isn't the problem; it's programmers being "clever" or making needless abstractions of abstractions.

Just as C isn't the problem, it's the programmer forgetting to check for the size of the buffer, and PHP isn't the problem, it's the programmer not using the correct function for random number generation.

You can always trace any given error to a single individual making an honest mistake, that's really not a useful way to think about this. It's about a programming language (or an environment) leading the programmer into wrong directions, and the lack of safety measures for misguided "egotistical programmers" to do damage. You can blame the programmers all you want, but at the end of the day, the one commonality is the language.

Now Python is still one of my favorite languages, and I think that for a lot of domains it really is the right choice, and I can't imagine doing my work without it. But performance and large, complex systems are not among those domains, and I honestly feel like all you've said in Python's favor is that other languages are like that too, and that it's the fault of the programmers anyway.


I have thought about what you've written, and I broadly agree. I didn't mean for my post to be a "Python is great, really" piece; it was more to illustrate that all programming languages have drawbacks.

There is a point that I think I've failed to get across:

> Just as C isn't the problem, it's the programmer forgetting to check for the size of the buffer, and PHP isn't the problem, it's the programmer not using the correct function for random number generation

I don't think I was arguing that point; of course all languages have their USPs. The point I wanted to get across is that large Python projects are not inherently hard to manage. That kind of scaling is really not that much of an issue. I've worked on large repos in C, C++, Python, Perl, Node and, as a punishment, PHP. The only language that had an issue with a large codebase was Node, because it was impossible to build and manage security. The "solution" to that was to have thousands of repos hiding on GitHub.

The biggest impediment to growth was people refusing to read code, followed swiftly by pointless abstractions. This led to silly situations where there were 7-12(!) wrappers for S3 functions. None of them had documentation, and only one had test coverage.


Very much agree. I oversee a relatively small Python codebase, but getting good quality, safe code out of the developers in a controlled way is really hard - there are so many ways in which the language just doesn't have enough power to serve the needs of more complex apps. We have massive amounts of linting, type hinting, and code reviews spotting obvious errors that would simply be invalid code in other languages.

It's like getting on a roller coaster without a seat belt or a guard rail. It's fun at first, and you will make it around the first few bends OK ... then get ready ...

Of course, with enormous discipline, skill and effort you can overcome all this. But it just leaves the question - really, is this the best tool for the job in the end? Especially when you are paying for it with horrifically bad performance and other limitations.


Have you ever seen an O(million)-line enterprise codebase that didn't suck?


This is surely anecdotal and very subjective, but I have (in Java and in C++; IIRC the exact versions were Java 7 and C++03), and the level of pain was lower than with a Python code base that was about one order of magnitude smaller. In the case of C++, the pain was mostly associated with an ancient build system we used; the code itself was relatively manageable. There was almost zero template code, and maybe that helped (although on other occasions I've worked with smaller C++03 codebases that relied heavily on templates, and I didn't find them that bad).

Not all codebases are equal and maybe I was lucky, but in my experience, using dynamic languages (or, to be exact, any language where the compiler doesn't nag you when there is a potential problem) doesn't scale well.


I've worked with an O(100k)-line code base in Python that was pure torture. Honestly, I was so desperate for static typing by the end that I would have preferred it all to be written in C++.

Large codebases are really hard to reason about without types. I'm glad we now have projects like Pyre that are trying to bring typing to Python.


I've worked with Python for more than 15 years, usually with code bases 50-100k lines per app. The only time I have had real issues with types was a codebase I inherited where previous developers were using None, (), "", [], and {} all to mean roughly the same thing and sometimes checking for "", sometimes () etc. I couldn't handle it, so I put asserts everywhere and slowly found out where those things were coming from and sanitized it to be consistent.
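A sketch of that kind of cleanup: funnel every "empty-ish" value through one normalizer (the names here are invented for the example) so downstream code only ever has to check one sentinel:

```python
def normalize(value):
    """Collapse the zoo of empty sentinels (None, (), "", [], {}) to None."""
    if value is None or value in ((), "", [], {}):
        return None
    return value

assert normalize("") is None
assert normalize({}) is None
assert normalize(()) is None
assert normalize("x") == "x"
assert normalize(0) == 0   # falsy but meaningful values pass through
```

Pairing this with asserts at the producers (as described above) lets you flush out where each variant was coming from.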


There are some confounding issues that often get conflated, though.

Large Python code bases _could_ be written with good modularization, clean separation of concerns, and composability. Or they could be written as spaghetti.

Using types _could_ help keep a code base from becoming spaghetti, but it's not the only way. I think the understandability and maintainability of a code base has more to do with the person writing it than with the availability of a type system, tbh.


No, they can't, at least not with the same amount of effort. Of course, you can make anything good by throwing enough time and money at it, but that's not the point.

The issue is that to have a nice and well-architected code base, you have to constantly refactor and improve - sometimes you need to rearrange and refactor huge parts of the code. Without types _and_ tests, this is just not going to happen. It will be unproductive and scary, so people will stop touching existing code and work around it instead.

> I think the understandability and maintainability of a code base has more to do with the person writing it than the availability of a type system tbh.

That is the same thing. Because someone who wants great maintainability will also want a great type system (amongst other things).


A good carpenter never complains about his tools. He works around their limitations or uses something else.

The quality of the product is down to the skill of the worker either way.


We can assume buffer overflows are less common in Java than in C and I doubt that Java programmers are better craftsmen.

The same with types: they make some kinds of errors much less likely, though there is no silver bullet in general. E.g., I much prefer a general-purpose language such as Python for expressing complex requirements in tests over any type system (even if your type system is Turing complete and you can express any requirement in it, that doesn't mean it is a good idea).


How often is a carpenter told to use this particular rusty saw or their work won't be compatible with everyone else's?

Everything interlocks in such intricate ways that you can't meaningfully choose your own tools, and working around problems only goes so far. And you can't repair your own tools.


There are also failures of the community to provide good guidance.


> desperate for static-typing

Can you explain why? I honestly don't know, because my experience with C++ was during school ~20 years ago, and since then professionally I've used mostly python in relatively small codebases where it's all my own code (mostly for data processing/analysis). Thanks!

(Although I did have to write some C code to glue together data in a very old legacy system that didn't support C++, much less python. It took a lot more effort to do something simple, but it was also strangely a really rewarding experience. Kind of similar to feeling when I work with assembly on hobby projects)


The main problem with duck-typing like python has is the lack of consistency between different objects that code has to work on. Different callers may pass objects with different sets of methods into a function and expect it to work. You run into the case where the object that was passed in is one with subtly-mismatched behavior from what your method expects, but you don't know who created it - it was probably stored as a member variable by something 10 callstack levels and 5 classes distant from what you're currently working on.

Static typing prevents that by telling you early where the mismatch is happening - some method calls into another with a variable of the wrong type, and that's where the bug is. It also allows tooling to look up the types of variables and quickly get information about their properties.
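A small illustration of the point, with names invented for the example: the annotation documents the expectation and lets a checker such as mypy or Pyre reject bad call sites before runtime:

```python
class FileResource:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

def shutdown(resource: FileResource) -> None:
    resource.close()

r = FileResource()
shutdown(r)
assert r.closed

# shutdown("not a resource")
# A type checker flags the line above at the call site; without the
# annotation, the bad object would only blow up deep inside shutdown()
# at runtime, far from whoever created it.
```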


Got it, that makes sense. It also makes sense why I haven't been much bothered by it in Python, since my relatively small code bases don't have that many layers of abstraction laid on top of each other. I'm generally not working with more than 2,000-3,000 lines, and I can just about keep the basic structure in my head. (Unless it's been a while since I've had to revisit it... then I often hate my past self for getting "clever" in some way.)


For these small code bases, static typing is still great (if you are used to it already) but the adverse effects of not having it usually show much stronger with a team (and not a single person). And yeah, if you keep the structure in your head, then you are good anyways.


> it was probably stored as a member variable by something 10 callstack levels and 5 classes distant from what you're currently working on

Just because you can define methods on an object dynamically in Python doesn't mean that you should. Monkeypatching is not culturally encouraged in Python. Most often it is seen in tests; otherwise, it is rare.

Nobody forbids using ABCs to define your custom interfaces or using type hints for readability/IDE support/linting (my order of preference).
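For instance, a hypothetical interface defined with an ABC: an incomplete implementation fails at instantiation, immediately, rather than at some distant call site:

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Hypothetical custom interface."""

    @abstractmethod
    def save(self, key: str, value: str) -> None: ...

class MemoryStorage(Storage):
    def __init__(self) -> None:
        self.data = {}

    def save(self, key: str, value: str) -> None:
        self.data[key] = value

store = MemoryStorage()
store.save("k", "v")
assert store.data == {"k": "v"}

try:
    Storage()  # abstract class: cannot be instantiated
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError")
```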


Funny, I'm working on a project about the same size, and the overly aggressive type and value restrictions are the main problem that I struggle with daily.


I am literally working on two projects that are roughly 100kLOC each.

The Scala Spark project I can navigate, understand, test and consider to be average complexity... with some failures, unique to Scala.

The Python Spark project is barely readable.

The people who built the Python Spark codebase are "experienced Python devs", while the Scala codebase was built by people who were using Scala for the first time.

(Take this anecdote as evidence of the poor tooling and guidance present in the Python community... and the BDFL's own failures.)


I've worked on several separate projects of that size in C++ and Go. None of them managed to become the kind of mess that Python codebases of merely one or two dozen thousand lines seem to. OTOH, all the typing developments in Python should have helped? I don't have that much experience with them in an enterprise setting.


I have - and it's not that bad. The key is you have to have someone coordinating and driving a shared vision for the codebase and patterns. But it's hard to find people with that sort of passion and drive to follow-through as it's a multi-year endeavor with politics all over.

Otherwise it's a thousand implementations of the same 100-line piece of code interspersed everywhere.


It seems like quality code management gets passed over by (bad) management because it looks like it doesn't directly move the project forward.

Which is strange because those same managers may be full adherents to micro tasking projects in a project management system whose purpose is basically to do for the project what code management does for the code itself.

In my workplace, we've recently had leadership that appreciates these things, and the difference is night & day. Simple requests from "stakeholders" (I hate that term) are often filled in days, or same day, instead of weeks. I think it helps tremendously that the primary manager is also a coder herself, and still codes ~25% of her job.


That's the problem with some languages - they lack a visionary who drives the overall understanding of how things should be structured.

I believe it was Guido who basically said: if you don't like how Python does it, then implement it in C. And that's how you end up with great C-based libraries bound to Python... and Python often being used as a messy orchestration language.


LOL!!

Or even worse - could you imagine how many lines that would be in C++ ?

Yowza!


n!


And how many of those problems are an artifact of moving fast and getting things done?

I've seen the exact same scenario with other languages. The problem is that in a startup environment you are likely adding and retiring "features" at a speed that layers on so much complexity that you can no longer reason about which business rules are actually valid any more.


I think that's part of it. There is a convention over configuration issue as well. A language like Go forces some patterns like package management and formatting unless you actively try to subvert it.

It wouldn't surprise me if many of these issues are self-selecting in the language communities as well.


I work on Python every day on a reasonably large code base and have none of the issues you're talking about. I'm 10x more productive than on similar C or Java projects.

Dependency management is about as easy as it is going to get. We have problems with our dependencies breaking stuff, but who doesn’t?

People talk as if packaging is a solved problem. It isn’t in any language. And then they complain that Python packaging changes too much. That’s because folks are iterating on a hard problem.


Do you handle deployment of this Python application? For me, that's where the pain points arise. I love writing Python, but deploying it does not spark joy at all, at all.


Here are some of the ways to deploy Python code:

- `curl -L https://app.example.com/install | sh` that downloads an installer and runs, for instance, apt/yum install <your-package>

- in CI environment on a VM: `git checkout` & `pipenv install --deploy`

- `pipx install glances` on a home computer

- just `pip install` e.g., in a docker container [possibly in a virtualenv]. For pure Python packages, it can work even in Pythonista for iOS (iphone/ipad)

- just copy a python module/archive (PyInstaller and the like)

- give a link to a web app (deployed somewhere via e.g., git push)

- for education: there are python in the browser options e.g., brython, repl.it, trinket.io, pythontutor.com

- just write a snippet in my favourite editor for literate devops tasks/research (jupyter-emacs + tramp + Org Babel) or give a link to a Jupyter notebook

- a useful work can be done even in a REPL (e.g., Python as a powerful calculator)


The fact that there are 9 different ways, each with its own problems, is exactly the problem here.


Do you use a single program on all of your devices for all possible computer-related tasks? Do you see a fault in such logic? Why do you think the deployment space is any different: do you use Kubernetes for everything?

I dare you. Do mention any tool/any language that handles all the above use cases without sacrificing the requirements for each use-case.


> Do you use a single program on all of your devices for all possible computer-related tasks? Do you see a fault in such logic?

No. But if I talked about how I used 9 different word-processing programs, you'd see that as a problem, or at least an indictment of those programs. Deployment isn't that complicated.

> I dare you. Do mention any tool/any language that handles all the above use cases without sacrificing the requirements for each use-case.

I use Maven/Scala and as far as I can see it covers all of them other than "give a link to a web app" which isn't actually deploying at all (and I'd still have used maven to deploy the webapp wherever I was deploying it).

I don't think there's any legitimate case for curl|sh, and I don't think there's any real reason for separate pip/pipenv/pipx (did you make that one up? Have I fallen for an elaborate troll?) - rather pipenv exists to work around only being able to install one version of a library at a time. Nothing's gained by having "just copy a module/archive" be different from what the tool does. Running in browser, notebook, or REPL can and should still use the same dependency management tooling as anything else.

If I want to deploy my code, I use maven. You can use curl (since maven repositories use standard HTTP(S)) or copy files around by hand, if you have a use case where you need to, but I can't think what that would be. If you want to bundle up your app as a single file, you can configure things to do that when publishing, but the dependency resolution, repository infrastructure, and deployment still look the same. Even if you want to build a platform-level executable, it's the same story, all the tooling just works the same. If I want a REPL or worksheet, I can start one from maven (and use the same dependency management etc. as always), or my IDE (where it's still hooked up to my maven configuration). If I want to use a Zeppelin notebook then there's maven integration there too.

Ever wonder why you don't hear endlessly about different ways of doing dependency management in non-Python ecosystems? Because we have tools that actually work, and get on with actually writing programs. It baffles me that Python keeps making new tools and keeps repeating the same mistakes over and over: non-reproducible dependency resolution, excessively tight integration between the language and the build tools, and tools and infrastructure that can't be reused locally.


My core problem is with C/C++ dependencies. Can you describe to me how you handle these when you deploy Python?


  - system packages (deb/rpm/etc)
  - binary wheels (manylinux)
  - building from source
plus some caching if appropriate


God, I wish that would work for me.

To take your examples in order:

1) system packages: almost always out of date for my needs

2) Binary wheels: I actually haven't investigated this much, maybe it will work (and if it does, I'll buy you a drink if we ever meet in person).

3) Building from source: this kinda proves my point about Python having poor dependency management tools if this is a serious response. In general, this would be much further down the rabbit hole than I want to go.


I use Anaconda exclusively and deployments (with virtual environments) have been fairly ok.

That said, I do run into trouble when I have a dependency that requires compilation on Windows (like the popular turbodbc) because, say, a wheel isn't available for a particular Python version. Any time a compilation is needed, it's a headache. Windows machines don't come with compilers, so one has to download and install a multi-gigabyte Visual Studio Build Tools package just to compile. Sometimes the compilation fails for various reasons.

Requiring gcc compilation is a headache for installing dependencies inside Docker containers too -- you have to install gcc in order to install the Python dependencies and then remove gcc afterwards.

I think requiring local compilation (instead of just delivering the binary) is a UNIX-mindset holdover that is holding back many packaging solutions. I think a lot of pain would be alleviated if we could somehow mandate centralized wheel creation for all Python versions; otherwise the package manager marks a package as broken or unavailable and defaults to the last available wheel.

Also, if only we applied some standards like R's CRAN repo does -- i.e., if it doesn't pass error checks or doesn't build on certain architectures (institute a centralized CI/CD build pipeline in the package repo), it doesn't get published -- the Python packaging experience would be much improved.


Yeah, if PyPI was as annoying as CRAN with respect to new versions, then a lot of this pain would go away.

For those who don't realise, when there's a new version of R, anything that doesn't build without errors/warnings is removed from the archive.

This is really annoying if you want something to keep running, but it prevents the kind of dependency rot common to Python (recently I found a dependency that was four years out of date).


Curious to know what issues you have with deploying Python codebases. Out of all of the minor and major gripes I have with Python, deployment is not one of them.


To me, Python deployments are painless, as long as you can stick to pure-Python dependencies and prebuilt wheels.

Once a pip install needs to start compiling C, things go south very quickly. At that point you get to install the union of all common C development tools and kernel headers, and prepare for hours of header hunting.

I've done that too much to like Python anymore.


Yup, yup. I deploy statistical models with Python, and these always have C dependencies.

Additionally, they are part of a larger application that is mostly managed by pip, which means I need both pip and conda -- and that's where things get really, really hairy.

I actually blame Google and FB here, as neither of them use standard python dependency management tools, and many of their frameworks bring in the world, thus increasing the risk of breakage.


Adding data files via setuptools....

And putting them into a common shared directory.

Try doing that without writing convoluted code in your setup.py.


It is funky, but the stdlib importlib.resources package helps.
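For the record, a minimal sketch of that approach (the helper name is mine, and the `files()` API needs Python 3.9+; the data files still have to be declared via `package_data` or `include_package_data` so setuptools actually ships them):

```python
from importlib import resources  # files() API is available from Python 3.9


def read_package_file(package: str, filename: str) -> str:
    # Locates a data file shipped inside an installed package, without
    # fragile __file__ manipulation or convoluted setup.py code.
    return resources.files(package).joinpath(filename).read_text()


# Demo against a stdlib package: read the json package's own __init__.py
source = read_package_file("json", "__init__.py")
print(len(source) > 0)
```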


Production deployment is Docker all the time.

Deployment for development is just pyenv and virtualenv.


No concurrency? asyncio is great for I/O bound network stuff!
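A toy illustration of why it works well for I/O-bound code (`fake_fetch` is a stand-in for a real network call):

```python
import asyncio
import time


async def fake_fetch(name, delay):
    # Stands in for an I/O-bound call (HTTP request, DB query, ...)
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # The three 0.1 s waits overlap on a single thread,
    # so the total elapsed time is ~0.1 s, not 0.3 s.
    results = await asyncio.gather(
        fake_fetch("a", 0.1), fake_fetch("b", 0.1), fake_fetch("c", 0.1)
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, round(elapsed, 2))
```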


"No parallelism" is probably what was meant.


> "No parallelism" is probably what was meant.

Which is still wrong, of course, but "no in-process (or in-single-runtime-instance) parallelism" would be correct, as would "parallelism only through inconvenient forking".


Posix fork() doesn't really count, if that's what you mean...


Why doesn't Python's multiprocessing module (which uses fork by default on Unix) count? It literally exists for parallelism.
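A minimal sketch of what I mean (names are illustrative; on Unix the workers are fork()ed by default):

```python
from multiprocessing import Pool


def square(n):
    # Runs in a worker process: a separate interpreter with its own GIL.
    return n * n


def parallel_squares(values, workers=4):
    with Pool(workers) as pool:
        return pool.map(square, values)  # preserves input order


if __name__ == "__main__":
    print(parallel_squares(range(8)))
```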


It's understood that you can have "parallelism" by running two copies of your program using basic system facilities like fork(), or even by buying several computers and running one instance of your program on each of them. That's not what is meant by a language "supporting parallelism". If it was, then every language ever designed supports parallelism and so the term is meaningless.

To claim that a language "supports parallelism", it has to do something more to facilitate parallel programming. I would say that parallel threads of computation with shared memory and system resources is the bare minimum. You can go the extra mile and support transactional memory or other "nice" abstractions which make parallel programming easier.

Saying that Python supports parallelism because it has a fork() wrapper is like saying that POSIX shell is a strongly typed language because it has strings, and string is a type.


It doesn't use fork() on macOS anymore, because some of Apple's own APIs get broken by its use.

Pretty much any app that uses both fork and threads, has to jump through many hoops to make the two work together well. And this applies to all the libraries that it uses, directly or indirectly - if any library spawns a thread and does some locking in it, you get all kinds of hard-to-debug deadlocks if you try to fork.

So unless you have very good perf reasons to need fork, I would strongly recommend multiprocessing.set_start_method("spawn") on all platforms. No obscure bugs, and it'll also behave the same everywhere, so things will be more portable. Code using multiprocessing that's written to rely on fork semantics can be very difficult to port later.
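Concretely, that looks like this (toy worker function; spawn requires everything the workers use to be importable and picklable):

```python
import multiprocessing as mp


def work(x):
    # Must live at module level: "spawn" pickles a reference to the
    # function and re-imports this module in a fresh interpreter.
    return x + 1


if __name__ == "__main__":
    # Fresh interpreters avoid fork-plus-threads deadlocks and behave
    # the same on Linux, macOS, and Windows (the latter two already
    # default to spawn).
    mp.set_start_method("spawn")
    with mp.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))
```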


You wouldn't fork() for performance, but for security reasons.


It's not wrong. If running two processes counts as parallelism, then everything does parallelism, and it becomes pointless to talk about it.


Then one should talk about how convenient the related abstractions are. I like the concurrent.futures library.


Concurrency is not the same as parallelism. Python has good concurrency support, I agree. Python (CPython) does not support parallelism, however, due to its Global Interpreter Lock, which actively prevents any parallelism in Python code.

This was probably a conscious design decision on the part of the CPython implementers, and perhaps a good one. But we should not claim that Python is something which (actively and by design) it's not.


I use `concurrent.futures.ProcessPoolExecutor` fairly often. I handle inter-process communication usually through a database or message queue, expecting that someday I'll want to go clustered instead of just single-machine multiprocessing. I've been burned by implementing multithreading and then needing to overhaul to clustered enough times to stop doing it.
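A bare-bones version of that pattern (illustrative names; in my setup the results would go through the database or queue rather than being returned directly):

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(n):
    # CPU-bound work; each call runs in its own process, so the parent
    # interpreter's GIL is not a bottleneck.
    return sum(i * i for i in range(n))


def run_jobs(jobs, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # executor.map preserves input order, like the builtin map
        return list(executor.map(sum_of_squares, jobs))


if __name__ == "__main__":
    print(run_jobs([1_000, 2_000, 3_000]))
```

Swapping `ProcessPoolExecutor` for a clustered task queue later is mostly a matter of replacing `run_jobs`, which is the point of keeping the IPC behind a narrow interface.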


At O(million) lines, the difficulty of wrangling it has more to do with how well it's architected and written than with it being Python. Python is at least easy to read. Its major deficiency is the lack of annotation of parameters, and that's something that could now be fixed... but it isn't going to be fixed in that much historical code.

If you are trying to get performance out of it (which doesn't really hinge on whether it's a million lines of code), then Python might be the wrong choice. But you can always write the hot paths in Rust or C and give Python an API to the functionality.

I agree that packaging is a mess. Fixing that mess with modularization in Java took a long time, and most other languages have that problem, too.


I disagree. Python is not inherently easy to read.

Explicitness and naming standards screw up the clarity of any code... Not to mention the complexity when you get into OOP.


Also, the lack of switch-case statements doesn't help. (The workaround is either if/elif chains or a dict of functions to call.)
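For what it's worth, the dict-of-functions workaround looks like this (all names illustrative; Python 3.10 later added `match` for structural pattern matching):

```python
def start():
    return "starting"


def stop():
    return "stopping"


# Map case keys to callables instead of writing a switch statement.
HANDLERS = {"start": start, "stop": stop}


def dispatch(command):
    # dict.get with a fallback callable plays the role of `default:`
    return HANDLERS.get(command, lambda: f"unknown: {command}")()


print(dispatch("start"))
print(dispatch("reboot"))
```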


>People stuck in sprawling enterprise codebases, with O(million) lines of code to wrangle, seem almost universally miserable with the language.

This seems to be the case with most languages, especially if good code control isn't practiced, and unfortunately that's not uncommon.


Is concurrency really an issue? Yes, you don't get truly parallel threads, but you can launch multiple processes. Do you really need shared memory for your concurrency needs? (I think it is much easier to introduce subtle bugs with shared-memory concurrency, i.e. threads.)


Could I ask you what language you use instead then?


This describes it perfectly.



