
If your project has any third-party dependencies, and so (nowadays) you're going to set up requirements.txt and virtualenv and whatever anyway, I can see that you're going to think things like "this XML parser in the standard library is just getting in the way; I can get a better one from PyPI".

But I think a lot of the value of a large standard library is that it makes it possible to write more programs without needing that first third-party dependency.

This is particularly good if you're using Python as a piece of glue inside something that isn't principally a Python project. It's easy to imagine a Python script doing a little bit of code generation in the build system of some larger project that wants to parse an XML file.
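
For example, a minimal sketch of that kind of glue, using only xml.etree.ElementTree from the stdlib (the file names and element names here are made up):

    # gen_constants.py - hypothetical build-time glue, stdlib only
    import xml.etree.ElementTree as ET

    tree = ET.parse("constants.xml")  # e.g. <constants><const name="FOO" value="1"/></constants>
    with open("constants.h", "w") as out:
        out.write("/* generated - do not edit */\n")
        for const in tree.getroot().iter("const"):
            out.write("#define %s %s\n" % (const.get("name"), const.get("value")))

Nothing for the surrounding build system to install first: no requirements.txt, no virtualenv.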




I think the biggest problem is going from zero third-party dependencies to one or more. Adding that very first one is a huge pain, since there are many ways of doing it with many different trade-offs. It is also time-consuming and tedious. The various tools you mention are best at adding even more dependencies, but are hurdles for the very first one.


Not true. The more external dependencies you add, the more likely it is that one of them will break. I try to have as few external dependencies as possible, and to pick dependencies that are robust and reliably maintained. There is so much Python code on GitHub that is just broken out of the box. When people try your software and it fails to install because your nth dependency is broken or won't build on their system, you're lucky if they open an issue. Most potential users will just end up looking for an alternative and not even report the problem.


Adding one more third-party dependency when you already have some is as simple as adding one more entry to whatever solution you are already using (e.g. another line in requirements.txt, or running a command).

When you have no third-party dependencies, adding the first one requires picking amongst trade-offs and doing a lot of work. A subset of the choices includes using virtualenv, using pip, using higher-layer tools, copying the code into the project, using a Python distribution that includes the dependency, or writing code to avoid needing it at all ... And whichever you pick:

* You have to document to humans and to the computer which of the approaches is being used

* Compiled extensions are a pain

* You have to consider multiple platforms and operating systems

* You have to consider Python version compatibility (e.g. the third-party package could support fewer Python versions than the current code base)

* And the version compatibility of the tools used to reference the dependency

* And a way of checking license compatibility

* The dependency may use different test, doc, type checking etc tools so they may have to be added to the project workflow too

* It makes it harder for collaborators, since there is more complexity than "install Python and you are done"

I stand by my claim that the first paragraph (adding another dependency) is way less work than all the rest, which is what adding the very first one involves.


They're typically broken out of the box because they don't pin their dependencies. pip-tools[1] or pipenv[2], and tox[3] if it's a lib, should be considered bare minimum necessities - if a project isn't using them, consider abandoning it ASAP, since apparently they don't know what they're doing and haven't paid attention to the ecosystem for years.

[1] https://github.com/jazzband/pip-tools [2] https://docs.pipenv.org/en/latest/ [3] https://tox.readthedocs.io/en/latest/
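
For anyone unfamiliar, the pip-tools workflow is roughly: list only the direct dependencies in requirements.in, then let pip-compile produce a fully pinned requirements.txt. A sketch (package names and versions are just illustrative, and the exact output format varies by pip-tools version):

    # requirements.in - direct dependencies only
    requests

    $ pip-compile requirements.in

    # requirements.txt (generated) - every transitive dependency pinned
    certifi==2019.3.9    # via requests
    chardet==3.0.4       # via requests
    idna==2.8            # via requests
    requests==2.21.0
    urllib3==1.24.1      # via requests

Committing both files means collaborators and CI all resolve to exactly the same versions.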


It's trickier than just pinning dependencies, because some libraries also need to build C code, etc. Once you bring in external build tools, you have that many more potential points of failure. It's great. Also, what happens if your dependencies don't pin their dependencies? Possibly, uploading a package to PyPI should require freezing dependencies, or should do it automatically.


Modern Python tooling like Pipenv pins the dependencies of your dependencies as well. This is no longer an issue.
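
Roughly: you declare only the direct dependencies in a Pipfile, e.g. (a minimal, illustrative example):

    [[source]]
    url = "https://pypi.org/simple"
    verify_ssl = true
    name = "pypi"

    [packages]
    requests = "*"

    [requires]
    python_version = "3.7"

and pipenv lock writes a Pipfile.lock that pins exact versions (with hashes) for requests and everything it pulls in.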


I used a requirements.in file to list all the top-level direct dependencies, then used pip-compile from pip-tools to convert that into a frozen list of versioned dependencies. pip-compile is also nice because it doesn't upgrade anything unless explicitly asked to, which makes collaboration really pleasant. I then used the requirements.txt plus various supporting tooling to auto-create and keep updated a virtualenv, so that my peers didn't need to care about Python details and just running the tool was reliable on any machine.

It was super nice, but there's no existing tooling out there that does anything like this, and it took about a year or two to get ours into a nice place. It's surprisingly hard to create Python scripts that work reliably out of the box on everyone's environment without the user having to do something (which, in my experience, always means something doesn't work right).

C modules were more problematic (needing an Xcode installation on OSX, and potentially precompiled external libraries not available via pip but also not installed by default), but I created additional scripts to bring a new developer's machine to a "good state" and take manual config out of the equation. That works in a managed environment where "clean" machines all share the same known starting state and configs - I don't know how you'd tackle this problem in the wild.

I do think there's a lot of low-hanging fruit here: Python could bake something in that auto-sets-up a virtualenv for a script entrypoint, lets the developer just list the top-level dependencies, and keeps the frozen dependency list version-controlled too (and if the virtualenv and the frozen, version-controlled dependency list disagree, rebuild the virtualenv).
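
A rough sketch of that kind of bootstrap is already possible with only the stdlib - this is just an illustration, the file names are made up, and a real version would need rebuild and error handling:

    # bootstrap.py - hypothetical: create a venv next to the script,
    # install the pinned deps, then re-exec the script inside it.
    import os, subprocess, sys, venv

    HERE = os.path.dirname(os.path.abspath(__file__))
    VENV = os.path.join(HERE, ".venv")
    PY = os.path.join(VENV, "bin", "python")  # Scripts\python.exe on Windows

    if sys.executable != PY:
        if not os.path.exists(PY):
            venv.create(VENV, with_pip=True)
            subprocess.check_call([PY, "-m", "pip", "install", "-r",
                                   os.path.join(HERE, "requirements.txt")])
        os.execv(PY, [PY] + sys.argv)

    # ... the actual script logic runs here, inside the venv ...

The missing piece is the part where a mismatch between the frozen list and the venv triggers a rebuild, and having this blessed in the language rather than reinvented per project.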


I don't know if it'd work the same way, but I've had a lot of success with Twitter's Pex files. They package an entire Python project into an archive with autorun functionality. You distribute a Pex file and users run it just like a Python file and it'll build/install dependencies, etc. before running the main script in the package.

I used it to distribute dependencies to YARN workers for PySpark applications and it worked flawlessly, even with crazy dependencies like TensorFlow. I'm a really big fan of the project; it's well done.

https://github.com/pantsbuild/pex


Unless your dependency is a C-header file updated by your distro as part of a new version.


Requiring people to "pay attention...for years" is not the way to build long-term robust software.


The problem is it can fall apart quickly. The XML parsing in the standard library is limited and slow, so most people consume lxml instead [0]. So it depends on the case. Counterpoint: e.g. pathlib being included is great. It was at least inspired by 3rd-party libraries, but the features are relatively stable, the scope is defined, and it has relatively few dependencies, so moving it into the standard library is a win IMO - not only for import ease, but for consistency.

[0] https://pypi.org/project/lxml/


ElementTree is in the stdlib. It isn't slow and has incremental parsing and so on.

It's also a nice API for dealing with XML.
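
For reference, the incremental bit is iterparse, which streams the document without loading it all into memory. A small sketch (the file name and tag are hypothetical):

    import xml.etree.ElementTree as ET

    for event, elem in ET.iterparse("big.xml", events=("end",)):
        if elem.tag == "record":       # hypothetical element of interest
            print(elem.findtext("id"))
            elem.clear()               # drop the parsed subtree as we go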


> ElementTree is in the stdlib. It isn't slow and has incremental parsing and so on.

I had enough trouble using it efficiently that I went and wrapped Boost property tree[0] and can happily churn out all sorts of data queries (including calling into python for the sorting function from the C++ lib) in almost no time.

I was taking daily(ish) updates of an RSS feed and appending them to a master RSS file, but sorting was pretty slow using list comprehensions, so now I convert it automagically to JSON and append it as is. No more list comprehensions either; I just hand it a lambda and it outputs a sorted C++ iterator.

Though I probably should've just thrown the data into a database and learned SQL like a normal person...

[0] https://github.com/eponymous/python3-property_tree


> and so (nowadays) you're going to set up requirements.txt and virtualenv and whatever anyway

If only that were the norm amongst long-tail Python users. Heck, I don't do it; I have the Anaconda distribution installed on Windows, and when I need to do a bit of data analysis I just hope I have the correct versions of the packages installed.

Making this core to the Python workflow (bundling virtualenv? updating all docs to say "Set up a virtualenv first"?) is the first required change, before thinking about unbundling the stdlib.


I have been developing in Python for 10+ years, and setting up a virtualenv only happens when developing patches for 3rd-party packages (because the commercial environment I otherwise work in doesn't work with venvs).

And it is a totally miserable experience on Windows, every single time.


The stdlib would still ship with Python, I presume; it would simply be updatable. But that doesn't mean it wouldn't work without a "requirements.txt" (use Pipenv, which is light-years ahead).


Last week I wrote a Python script relying purely on the stdlib. Basically, a coworker's shell script had to be adjusted to account for newer data files I was processing, and I am not that experienced with shell-script magic. I typed out 15 lines of Python code relying only on the stdlib and was happy; it ran on the server's 2.x Python without an issue. This is the primary selling point of a larger standard library.

But I do wonder whether it needs to grow. Python is at a stage where adoption of new stdlib features is inherently slow, not only in third-party libraries like Twisted, but also in applications, even if they use newer Python versions.

What kills Python here is that it is most commonly bundled with the Linux distribution or the OS (true also on Mac). This drastically slows down the upgrade cycle. Compare this to newer language platforms that people regularly install in newer versions on older platforms.

Some recent additions would have been fine at an early stage of language development, but surely not for Python at this point.


> What kills Python here is that it is most commonly bundled with the Linux distribution or the OS (true also on Mac)

It doesn't kill it; quite the contrary, it makes it ubiquitous. If you want another version you just install virtualenv. It's the same with Perl. We use the Perl version shipped with the distribution (openSUSE) and deploy to that. It's older, but it's stable and it works. On our dev environment (Mac) we have the same version with all of the modules installed in plenv. We also chose a framework with as few dependencies as possible (Mojolicious). It looks like it was a great choice.


What they need is an Apache Commons or Guava of Python. They're both de facto part of the standard Java library.


I try to avoid Guava because they have a habit of making incompatible breaking changes, and because so many libraries depend on it, it's likely to cause version conflicts. The way Apache Commons puts the major version in the package is much better in that regard.


I have not experienced this running Guava 16-23 in various apps. Maybe they're incompatible, but they're good about security patches for old versions. I have never seen a version conflict between Guava releases.


It's very easy to get a Guava version conflict because (a) Guava frequently adds new stuff, and (b) Guava semi-frequently deprecates and removes stuff a couple versions later.

So all you need is one dep that needs Guava version X with a method M that is removed in version X+2 (say), and another dep that needs something new introduced in version X+2, and you have a Guava version conflict. That is, Guava releases are not backwards compatible, due to the removal of classes and methods.

You can sometimes fix this with a technology like Maven shading or OSGi or whatever to allow private copies, but it does not always work.


Transitive dependencies on Guava 19, 20, or 21 can lead to runtime crashes if your dependencies differ in which Guava versions they expected when they were compiled:

https://www.google.com/search?q=guava+nosuchmethoderror



