'New' Python modules of 2015 (rtwilson.com)
591 points by ColinWright on Dec 23, 2015 | 131 comments

Hands down, my favorite new library is schema:


Here's a schema I use in production; see how readable it makes the parameters of the API and how quick all the validation and normalization is: https://www.pastery.net/mhwwnv/


At the end, you get an object called data, and you can do data.title, data.language, etc, and be sure that everything is as you expect.
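For readers who cannot open the paste, the general idea of validating and normalizing in one pass can be sketched with the standard library alone. This is not the schema library's API: the `validate` helper, the rule functions, and the `title` and `language` rules are made up for illustration.

```python
from types import SimpleNamespace

# Hypothetical rules: each callable returns the normalized value
# or raises ValueError when the input is unacceptable.
def non_empty_text(value):
    text = str(value).strip()
    if not text:
        raise ValueError("must not be empty")
    return text

def language_code(value):
    code = str(value).strip().lower()
    if len(code) != 2 or not code.isalpha():
        raise ValueError("expected a two-letter code")
    return code

RULES = {"title": non_empty_text, "language": language_code}

def validate(raw, rules=RULES):
    """Return a namespace of normalized values or raise ValueError."""
    clean = {}
    for key, rule in rules.items():
        if key not in raw:
            raise ValueError("missing key: %s" % key)
        try:
            clean[key] = rule(raw[key])
        except ValueError as exc:
            raise ValueError("%s: %s" % (key, exc))
    return SimpleNamespace(**clean)

data = validate({"title": "  Hello  ", "language": "EN"})
print(data.title, data.language)
```

After the call, `data.title` and `data.language` hold the cleaned values, so the rest of the code never has to re-check them.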

For those of us more dictionary oriented, there is https://pypi.python.org/pypi/voluptuous (which is OK for the most part, as long as you are only trying to do validation, and nothing too crazy)

Similar, I've used this library with a lot of success: https://marshmallow.readthedocs.org/en/latest/

+1 for marshmallow - most of the serialization libraries are 80% there, but marshmallow has it 95% down - all the weird corner cases about nested models and lists of nested models and all that. Plus the dev is very helpful and courteous on github.

I've tried truckloads of Python serialisation libs over the last few years and marshmallow is the one that finally makes me feel like I don't need to look for another one.

It's very easy and does everything I've needed it to!

Looks extremely similar to https://github.com/schematics/schematics . Always good to see two projects approach the problem the same way independently - higher chance that the solution is right :)

Schematics has a very useful feature in "roles" - i.e. a good way of hiding certain fields in certain situations (e.g. admin views vs self vs other vs anonymous). Does marshmallow have something similar?

Marshmallow is also my favourite lib of the year. Pyramid guys love it too! The developer is responsive, and the library moves really fast!

That looks great. I've built similar things myself, and it's great to find someone else's that I can use.

hear hear. Steve Loria, the author, has done tremendous work this year. Thanks Steve!

On the binary side Construct is quite nice: https://construct.readthedocs.org

I've used it for parsing various network protocols to good effect.

Huh, voluptuous looks pretty much exactly the same as schema. Or at least a subset of it, as it doesn't have And/Or/Use, as far as I can see.

I believe it may, perhaps analogous to its Any and All, etc.

voluptuous is simple and useful

There is also the JSON-Schema protocol: http://json-schema.org/implementations.html Curious how schema may be superior :)

Yup, this one is way easier to use and very cross-language compatible.

> Here's a schema I use in production, see how readable it makes the parameters of the API and how quick all the validation and normalization is: https://www.pastery.net/mhwwnv/

Thank you so much for providing this example. I couldn't grok what schema did, and your code made it make sense.

If people want to provide examples for the other libraries in this thread, you'll be popular :)

Has anyone been using schema after using 'rx' (https://pypi.python.org/pypi/Rx/)?

I generally like the simplicity of Rx and the fact that it's language agnostic (I've used it with JSON, YAML, and other serialization libraries), with the schemas themselves being written in both. However, the lacking documentation has always been a problem.

Any relation to the Clojure Schema? https://github.com/Prismatic/schema

Possibly? It certainly looks similar.

Optional being an attribute of the key rather than the value is pretty bizarre to me. I've seen that pattern a couple of times. Feels like the wrong way to go about it.

Think about it in terms of composability of validators; "having a key" is a property of the dict/object, not the value that is stored there. I feel your intuition, but experience says otherwise.

I see what you mean, but it's the key that's optional, not the value.

Ahh. That makes sense then. The value of the key being optional is indeed a different concept. Fwiw, this is one huge downside of Swagger APIs. Swagger supports the concept of optional keys but not optional values >.>
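The distinction in this subthread can be made concrete with a small standard-library sketch. `OptionalKey`, `check`, and `int_or_none` are hypothetical names, not the API of schema or voluptuous: the marker on the key says the key may be absent, while the validator decides whether a present value may be empty.

```python
class OptionalKey:
    """Hypothetical marker wrapping a key that may be absent."""
    def __init__(self, key):
        self.key = key

def check(spec, data):
    """Validate `data` against `spec` and return the validated dict."""
    result = {}
    for spec_key, validator in spec.items():
        optional = isinstance(spec_key, OptionalKey)
        key = spec_key.key if optional else spec_key
        if key not in data:
            if optional:
                continue  # the key itself may be missing
            raise KeyError("missing required key: %r" % key)
        result[key] = validator(data[key])
    return result

def int_or_none(value):
    # An optional *value*: the key is present but may hold None.
    return None if value is None else int(value)

spec = {"name": str, OptionalKey("age"): int_or_none}
print(check(spec, {"name": "Ada"}))               # key absent
print(check(spec, {"name": "Ada", "age": None}))  # key present, value empty
print(check(spec, {"name": "Ada", "age": "36"}))  # key present, value normalized
```

Because presence is handled by the dict-level loop and emptiness by the value validator, the two concerns compose independently.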

I have yet to find a validation library that supports all the sorts of things I expect it to. One place they tend to fail is the ways they can fulfill default values.

Say a field has an error and I want to just give it a default value when it's broken? That particular feature doesn't exist in any library I've found so far (for Python) =/

Granted, this sort of blends into the usual "It's a validation library not a serialization library". But they all make a half-assed attempt at the other side in my experience.
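The behaviour asked for here, falling back to a default when a field fails validation, is easy to sketch as a wrapper around any validator callable. `with_default` is a hypothetical helper, not a feature of any library named in the thread.

```python
def with_default(validator, default):
    """Wrap `validator` so that a failing value is replaced by `default`."""
    def wrapped(value):
        try:
            return validator(value)
        except (TypeError, ValueError):
            return default
    return wrapped

port = with_default(int, 8080)
print(port("9000"))          # valid input is converted
print(port("not a number"))  # invalid input falls back to the default
print(port(None))            # so does a missing value
```

Whether silently replacing bad input is wise depends on the application; the sketch only shows that the mechanism is a few lines.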

I like Schema and Voluptuous enough that as a polyglot who mostly works in PHP I'm working on a PHP version at the moment.

In similar vein is valideer: https://pypi.python.org/pypi/valideer

What's the difference between these sorts of libraries and an ORM? Don't ORMs provide built-in validation and serialization?

They are useful when you want a backend-independent ORM (say you want to save your objects as JSON in Ceph in deployment, and in SQLite in development).

I thought most ORMs were back-end independent. Isn't that 1/2 of their value proposition? The other half being accessing a persistence layer in native code.

The Python module I learned to love this year is Click. Gets me better command line interfaces fast. URL is http://click.pocoo.org.

It does progress bars too...

Wow, thanks for this...I was literally in the middle of writing a CLI tool to fetch a URL and parse out metadata and was getting a little tired of the argparse route. I browsed the documentation and honestly can't say that I immediately grok the advantages but given who its author is, I'm more than happy to switch libraries in mid-coding :)

I'm looking for a new CLI library.

I like Click because it does very simple things very well. It gets hairy when you want to build more complex CLIs. For example, value options and validators don't play nicely because Click doesn't distinguish between the absence of a value and an invalid value, so you wind up dropping Click features and rewriting your own plumbing for that kind of stuff. The alternative is writing a more verbose CLI grammar, which leads to a really clunky UI.

docopt was inspiring in its own way, but click was 'simpler' to toy with.

I find click to be much more natural too. Docopt was always touchy about more complicated CLIs when I used it (probably because a lot of "magic" happens behind the scenes), whereas click lets you drill down and arrange things just so. It also feels very well designed, hats off to Ronacher as usual for being good at designing Python libraries.

I also find myself using click even when I don't want a CLI. The pretty-printer (`secho`) and progress bars are extremely handy, plus some of the other stuff in utilities. It's quite nice that they handle detecting when output is an interactive terminal versus piping to a file.
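The terminal-versus-pipe detection mentioned above can be approximated with the standard library. This is a rough sketch of the idea, not Click's implementation; `styled` and the color constants are hypothetical names.

```python
import io
import sys

GREEN = "\033[32m"  # ANSI escape code for green text
RESET = "\033[0m"

def styled(text, stream=None):
    """Return `text` wrapped in color codes only for interactive streams."""
    stream = sys.stdout if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    if isatty is not None and isatty():
        return GREEN + text + RESET
    return text

print(styled("done"))                        # colored only on a real terminal
print(repr(styled("done", io.StringIO())))   # a non-terminal stream gets plain text
```

The point of the check is that escape codes never end up in a log file or in the input of the next program in a pipeline.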

I'm loving Click too.

I'm feeling a lot of love for Pandas. Any (biology related) project I work on starts with multi-headered dataframes and ends in beautiful Seaborn graphs. In combination with Jupyter notebook I breeze through large data sets while leaving a perfect trail of what goes on in the data pipeline. Python is great.

Seaborn is great for visualization; it basically packages up some of the more specialized R plots for MPL. The other thing I really love about it is plotting contexts, which make it really easy to properly size and format the same plot for e.g. a poster and a paper.

Interestingly enough, my biggest use of Pandas is to serialize to and from HDF5. I work with a lot of large datasets and Pandas simplifies using HDF5 quite a lot.

For those interested, you can also get nice styles with just matplotlib (including seaborn style) using stylesheets:


I had no idea it existed, very useful, thanks!

I've been loving Pandas used alongside Seaborn as well. It's really just so easy to manipulate/visualize my data (and make it look gorgeous) with the combo.

Pandas are great. Looking forward to Dask.

Not really 2015 but Q! https://pypi.python.org/pypi/q

Print-debugging on steroids. This really does make things so much easier, especially when dealing with huge apps you don't have time to learn. Not just useful as a dev but also as a sysadmin.

What's the advantage of Q over pdb?

First of all, pudb is fantastic, just use it over pdb all the time.

q is for when you want to log data, pudb is for when you want to step through and evaluate lines in-context. It's very possible that you'll want to use both together.

They don't do the same thing. Q does printing. import q; q(var) -> prints var to /tmp/q, with syntax highlighting, separate files for big output, etc. It can also do lots of other cool things, cf 1-page documentation in the link. :)
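The "log to a file instead of stepping through" idea can be sketched in a few lines of standard-library Python. This is not how q is implemented; `qlog` and the log file name are hypothetical, and returning the value so the call can sit inside an expression is a design choice of this sketch.

```python
import os
import tempfile
import time

# Hypothetical log location in the system temp directory.
LOG_PATH = os.path.join(tempfile.gettempdir(), "q_sketch.log")

def qlog(value, path=LOG_PATH):
    """Append a timestamped repr of `value` to `path` and return `value`."""
    with open(path, "a") as handle:
        handle.write("%s %r\n" % (time.strftime("%H:%M:%S"), value))
    return value

# The call can be dropped into the middle of an expression.
total = qlog(2 + 3) * 10
print(total)
```

Watching the file with something like `tail -f` in another terminal gives print-debugging without touching the program's own output.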

Wow, there are some great new tools to explore. Thanks!

Some of the new libraries I'm using this year that I've found really handy include:

Odo - http://odo.readthedocs.org/en/latest/ It is ridiculously handy for converting data from one format to another - especially for transforming a table from a database or csv into a DataFrame and back.

Arrow - Makes for quick datetime processing. - http://crsmithdev.com/arrow/

Xlsxwriter - http://xlsxwriter.readthedocs.org - I'm building beautiful reports, with charts, using this tool. As someone who moves data around a lot, but has to work with less technical business and analyst folks, this is becoming my goto for handing them some data to play with.

Blessings - https://pypi.python.org/pypi/blessings - as I get older staring at simple black and white text on the screen seems to be getting harder. Putting a little color and flair in my command line interfaces cheers me up even if it doesn't do much in the way of actually getting the job done.

Lastly, switching from curl to httpie was a huge help in working with APIs of all sorts. It solved a problem I didn't even know I had. https://pypi.python.org/pypi/httpie

I am using Scrapy a lot. http://scrapy.org/ It is a very well designed web crawling library.

I found out, through reddit a couple of days ago, about Pomp:


It looks like a much cleaner Scrapy-inspired spider framework, without the twisted dependency. And it's python 2+3 compatible. I'm very excited to try it out.

It looks very promising. I will give it a try.

I love that the Python community is coming up with so many robust frameworks. I have used Scrapy, and it provides most boilerplate functionality out of the box. Gone are the days I would use wget for my scraping tasks. This being said, it's a bit disappointing that Scrapy still doesn't have native support for dynamic pages. I am hoping to see this feature in the upcoming releases. With more and more of the web becoming dynamic, this should be a priority feature. Other than that, I have nothing but praise for Scrapy. Pomp is apparently a simplistic take on scraping, but it doesn't handle redirects, caching, cookies, authentication etc. I wonder if it provides parallel processing out of the box; most likely not. This is a bit strange, because these are the features for which I would prefer using a framework over writing my own code, which makes me less inclined to try Pomp. Scrapy for the win :D

Just seeing this for the first time. Got a proof-of-concept demo working in no time (after fussing with install requirements...). Looks like a great tool for non-devs like me who still need to scrape things occasionally for data collection and analysis.

My vote goes to pyspider[1]

[1] - https://github.com/binux/pyspider

Another vote for scrapy, massively useful for web crawling. Particularly love the shell for debugging things.

No Python 3 support yet?

unfortunately not yet, just tried it yesterday. There has been some progress but it looks like it might still take a while [1].

[1] https://github.com/scrapy/scrapy/issues/263

Interesting list. I love Anaconda!

A few years ago I tried to set up a Mac with a scientific computing stack and it took me days to hack my way through all the various dependencies and incompatible versions. Anaconda now lets me do that in minutes.

Anaconda is very underrated, I use it both on my linux and mac systems. Also check out: Simple, Clean Python Deploys with Anaconda http://blog.stuart.axelbrooke.com/deployment-with-anaconda

I love it on Windows but I wonder why you use it on Linux. What does it do better than the alternative (native packages and pip)?

One word - isolation.

One issue with using your system's packaging system is that a lot of system utilities are written in Python which makes it harder to play around with new versions, bleeding edge libs, etc.

Virtualenv is pretty good for isolation.

Not for binary packages, which usually end up requiring libraries installed into /usr/local (and let's not get started about the mess that is Python binary deployment on Windows).

I use pyenv on my Mac https://github.com/yyuu/pyenv

infinite8s beat me to it, exactly what he/she said.

Me too...I taught a Python class by making everyone download Anaconda's distribution of 3.x...and everyone could do the assignments no matter what kind of computer they used. Anaconda does a little too much for me to have it be my own default install, but it does quite well at onboarding beginners. I use pyenv to install and maintain Anaconda on my own machine when I need to replicate student work.

What does it do that ends up being a little too much for your use?

It takes precedence in the path over everything...and in the last version I used (before I upgraded to OS X El Capitan and wiped out everything), things like `curl` were provided [1] ...which I completely understand for Anaconda's use case, but it caused me a lot of confusing grief when I hadn't expected that and OpenSSL was having its rough times.

I don't know if that's the case (curl being part of the package) now, with Anaconda 3 2.4.0+? It certainly isn't so when installed via pyenv, so I'm happy with that. But there were other issues in the past build...BeautifulSoup was inexplicably broken. I mean that it simply did not correctly parse non-trivial HTML pages and yet threw no errors. The results could be replicated for all of my students but I never could isolate the issue... I installed Python 3 and the same version of BS4 from scratch and had no problems, but I can't imagine where the Anaconda build would have gone wrong. It ended up being OK since I just switched to lxml, which I now happily use over BS4 on any day, but it was frustrating to not be able to diagnose the problem (I didn't get a response in the support forums either). I'm assuming this problem has gone away in subsequent versions of Anaconda, though I haven't tried since lxml is perfectly fine to me.

And finally...well, I have to admit it, but I use Python like a goddamned moron in that I still don't know how to use virtualenv/venv to do proper dev isolation. And from the brief research I did, I see that Anaconda has its own conventions, or workflow...something with the conda utility. Again, I can see why it's necessary for Anaconda's use case (people who want to do data science and not hand-tweak their environment every time they upgrade a package over pip), but it added too many layers for me at the time.

[1] https://groups.google.com/a/continuum.io/forum/#!topic/anaco...

> And finally...well, I have to admit it, but I use Python like a goddamned moron in that I still don't know how to use virtualenv/venv to do proper dev isolation.

I was the same way for quite a while, until I bumped into pyenv-virtualenv[1]. Just install that plugin, and you can do, eg,

    pyenv virtualenv 3.5.1 my-project
to get a virtual environment called `my-project` based off of Python 3.5.1 (assuming that you've installed 3.5.1 via pyenv, of course). Or, you can just do

    pyenv virtualenv my-project
to make a virtualenv called `my-project` based off of the current version of Python that you're using.

Once you do that, pyenv treats `my-project` just as another installation of Python. In fact, `my-project` will show up in the list of installed versions (`pyenv versions`), and you can switch to it:

    pyenv global my-project
(Or you can switch at the local or shell levels. Whichever.)

And voila! You have your own virtual environment that can contain its own list of libraries.

And no, I'm not a shill for the creator of pyenv, I just really like the software.

[1]: https://github.com/yyuu/pyenv-virtualenv

Thanks for this...wrapping it up in pyenv is a lot more familiar to me. And why would you apologize for shilling for pyenv?...it's amazing :) (as is rbenv, its inspiration)

You're very welcome.

And ehhh, I've been downvoted and bitched at about evangelizing pyenv before. Just thought I'd preempt that. But yes, it's an amazing piece of software. :)

FYI, I've put together a bash function for my .bash_profile that adds an indicator to my prompt showing the current Python version/virtualenv in use[1]. That's saved me a bit of frustration when going into a directory where a local pyenv version overrides the global version.

[1]: https://github.com/jackmaney/bash-profile/blob/ddb57091aab44...

A few years back I made the mistake of allowing Anaconda to prepend its path to .bashrc without realizing it. I was a bit of a novice back then, but I had a number of existing projects in virtualenvs on my system and was rather upset when everything stopped working because my default python had changed. For those out there that would like to test out Anaconda but already have a lot of projects using their default installation, I would recommend using this installation guide to keep things separate:


Yep, on Windows, Anaconda is a godsend.

While there are certainly advantages to Anaconda, I've never encountered any troubles installing Pandas, NumPy, SciPy, or scikit-learn on any OS X or linux system. In my experience, getting GCC up and running is far more of a pain in the ass (and it usually isn't even that bad).

I use pyenv[1] and pyenv-virtualenv[2] to easily keep track of Python versions and virtual environments. I keep one virtual environment for each project I'm working on, and things pretty much Just Work.

[1]: https://github.com/yyuu/pyenv

[2]: https://github.com/yyuu/pyenv-virtualenv

No kidding! Thankfully brew makes this more pleasant.

I've got a lot of love for py.test right now. http://pytest.org/latest/

I feel I'm able to write much more concise test scripts than I could with unittest.
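To illustrate the difference in verbosity, here is the same check written both ways. `slugify` is a toy function invented for the comparison; the second form relies on py.test collecting plain functions whose names start with `test_` and reporting failed `assert` statements in detail.

```python
import unittest

def slugify(title):
    """Toy function under test (hypothetical)."""
    return "-".join(title.lower().split())

# unittest style: a class, a method, and assert* helper methods.
class SlugifyTest(unittest.TestCase):
    def test_spaces_become_dashes(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

# py.test style: a plain function and a bare assert statement.
def test_spaces_become_dashes():
    assert slugify("Hello World") == "hello-world"
```

py.test can also run the `unittest.TestCase` class unchanged, so an existing suite does not have to be rewritten to try it.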

Looks down (503), here is a cached version: http://webcache.googleusercontent.com/search?q=cache:Bf9Iv63...

Pyrasite. When you have a running python app that is behaving oddly and you can't replicate the bug elsewhere, you can run python code inside the running process - without any preparation beforehand - to display stack trace, output vars,...
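As a rough idea of the kind of code one might run inside a live process to see what it is doing, the standard library can already dump the stack of every thread. This shows only a possible payload, not Pyrasite's own API; `dump_all_stacks` is a hypothetical helper.

```python
import sys
import threading
import traceback

def dump_all_stacks():
    """Return the current stack of every thread as one string."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps each thread id to its topmost frame.
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (id %s)\n" % (names.get(ident, "unknown"), ident))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)

print(dump_all_stacks())
```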

Anaconda has been a lifesaver, because it can be installed and managed quite easily without root privileges (it even installs pip). Some of the sysadmins where I work are slower than molasses when it comes to installing python packages (as in, it takes months of repeated emails from multiple people to get anything done), and what is installed is often years out of date.

While we're on the subject of python modules: sqlalchemy is, and will probably continue to be, my favorite library for any language.

> I went looking for a pure-Python NoSQL database and came across TinyDB…which had a simple interface, and has handled everything I’ve thrown at it so far!

Why would anyone need a simple NoSQL? Why would you go the NoSQL route if it isn't a HUGE complex database?

You've already got a perfectly fine KVS built into the language.

Could you name it?

This is probably what he meant:

    $ py
    Python 2.7.8 |Anaconda 2.1.0 (64-bit) ... on win32
    >>> import bsddb
    >>> print bsddb.__doc__
    Support for Berkeley DB 4.3 through 5.3 with a simple interface.

    For the full featured object oriented interface use the bsddb.db module instead. It mirrors the Oracle Berkeley DB C API.

bsddb is deprecated as of python 2.6 and removed in python 3

That particular back-end is deprecated, but the same API is provided by the dbm/gdbm/dumbdbm modules. Those still exist in Python 3, although they've been consolidated under one top-level module.
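In Python 3 the consolidated top-level module is `dbm`. A minimal sketch of using it as an on-disk key-value store follows; the temporary path and the `greeting` key are arbitrary choices for the example.

```python
import dbm
import os
import tempfile

# A throwaway location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "cache")

# "c" opens the database for reading and writing, creating it if needed.
with dbm.open(path, "c") as db:
    db[b"greeting"] = b"hello"   # keys and values are stored as bytes

# "r" reopens the same database read-only.
with dbm.open(path, "r") as db:
    value = db[b"greeting"]

print(value)
```

`dbm.open` picks whichever backend is available on the system, so the files it creates on disk can differ between machines.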

also, no unicode keys for this or shelve in 2.7 which really causes some coding pain. (not sure about 3.X)

Ahhh... Thanks, never knew that!


dict type

The stdlib has various ways to persist it, if needed.
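A minimal sketch of that suggestion: a plain `dict` as the store, with `json` as one of several standard-library options for saving it. The file location and the keys are made up for the example.

```python
import json
import os
import tempfile

store = {"visits": 1, "user": "ada"}   # the in-memory key-value store
store["visits"] += 1

# Hypothetical location for the saved copy.
path = os.path.join(tempfile.mkdtemp(), "store.json")
with open(path, "w") as handle:
    json.dump(store, handle)

with open(path) as handle:
    restored = json.load(handle)

print(restored == store)
```

JSON only round-trips string keys and basic types; `pickle` or `shelve` are the usual standard-library alternatives when the values are richer objects.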

It's really good for prototyping things, actually.

Folium + Geopandas is my new goto GIS toolkit this year

We've transitioned our local/dev/prod instances to use conda on Heroku, and couldn't be happier. It was a tiny bit of work to get it set up, but now everything is consistent, and we can set up new local environments in seconds.

So I have been considering this. Does conda track PyPI or does it lag it? I have been concerned about moving over my requirements.txt for a webapp with lots of dependencies.

It slightly lags, but you can include pip requirements in an environment.yml file, and they install normally.

I really only use conda for the non-python bits of our stack: numpy/scipy/pandas etc - packages that are a pain to install on Heroku.

It's also pretty straightforward to set up your own Conda package tree. Nice for packaging your app for deployment or making sure you have very precise dependencies.


I think deployment is a solved problem with Docker. It's libraries like BLAS, etc. that are a huge pain. I'm not sure why a statically linked numpy is not possible - even Anaconda could not achieve it.

If you've ever tried to dive into the NumPy build process you'd see why. It's unbelievably complicated... not that they really could do it better given that they are compiling about a billion scientific libraries and support alternatives and optimizations (like MKL).

Yes - unfortunately I have and I failed miserably. These days I'm trying to see if there's a docker build that can build a great numpy (with all optimizations). Interestingly there are even docker images to call cuda APIs from python.

They are a pain to install on a desktop as well!

Especially with all the blas linking. Was there anything special you had to do or was it simply conda install numpy-blas or something like that ?

For Heroku we created/modified a custom buildpack (https://github.com/joshowen/conda-buildpack), and use that with the multibuildpack.

We use conda-env (https://github.com/conda/conda-env) which makes it really simple to manage environments locally and in remote environments.

We have to use a mix of pypi and conda since quite a few of our dependencies are not in conda. We have a script which checks conda first, then falls back to pypi, all from one requirements.txt

Any chance you can share that script? I'm looking for something similar, since I too am using both conda and pypi for dependencies.


Instead of requirements.txt we use py.prereqs, but otherwise this script should get you close.
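The poster's script is not reproduced here. As a rough sketch of the conda-first, pip-fallback idea only: the function below splits a list of requirement names into two install commands without running them. `plan_installs` is a hypothetical name, and `conda_available` is an assumed, hand-supplied set; a real script would have to query conda to find out what it can provide, and would need to handle version specifiers.

```python
def plan_installs(requirements, conda_available):
    """Split requirement names into a conda command and a pip command.

    `conda_available` is a set of package names assumed to exist in the
    configured conda channels.
    """
    conda_pkgs = [r for r in requirements if r in conda_available]
    pip_pkgs = [r for r in requirements if r not in conda_available]
    commands = []
    if conda_pkgs:
        commands.append(["conda", "install", "--yes"] + conda_pkgs)
    if pip_pkgs:
        commands.append(["pip", "install"] + pip_pkgs)
    return commands

reqs = ["numpy", "pandas", "some-pure-python-lib"]
for command in plan_installs(reqs, conda_available={"numpy", "pandas"}):
    print(" ".join(command))
```

Returning the commands as lists keeps the sketch side-effect free; they could be handed to `subprocess.check_call` once the plan looks right.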

Why not conda all the way? Anything of concern?

Incidentally is the buildscript of anaconda itself opensource? Couldn't find it anywhere.

Not sure if this is what you're looking for: https://github.com/conda/conda

Obiwan pypi.python.org/pypi/obiwan/

validating JSON

also: type checking on function signatures

I really like lists like this. I get updates daily on which of my github friends (is that what they're called?) have starred and there is no real reason why they're following a project. I can look at the README and guess. I did see someone start following this project the other day https://github.com/elastic/elasticsearch-dsl-py, which seems pretty interesting. Has anyone used it?

It's something like Django models but with Elasticsearch. You can create object classes and then save them to Elasticsearch, query them, etc. It's built on the lower-level https://github.com/elastic/elasticsearch-py. Very handy.

tqdm looks super promising. progressbar and progressbar2 end up being complicated and weird enough to use that my company ended up making wrappers. Why maintain that when you can just use a library that works out of the box?

It would be great if it had ipython notebook support. I often end up doing long operations that scrape services for data but have no idea what their progress is.

For me 2015 has been the year of tox. It is a great tool and worth using for just about any python project.

tqdm is failing for me on Windows at the moment. To be fair, it might not be its fault (I'm mixing it with Blinker signals), but still I'm slightly disappointed.

A lot of these "magic" tools fall apart when you're trying to do something slightly more structured than "throwaway bunch o' functions".

tqdm does have ipython notebook support :). Progress is displayed in stderr field under cell. But no progress bar AFAIR, just textual information.

progressbar2 can also take iterables as input, for easy display of progress bars.

There's something to be said for the nearly non existent api provided by tqdm.

"python-bond" also came out in 2015: https://pypi.python.org/pypi/python-bond allows a simple interface between python runtimes and php/perl/nodejs.

There is a lot of overlap between these languages, and I wonder what any of them can do that can't be done in pure Python.

It's not much about language, it's mostly about code/ecosystem re-use (that is: if you have a library available in system X and you're writing for Y you can still take advantage of it).

Easy platform-independent GUI as a webpage. Used it on my PC, a robot, and a Raspberry Pi.


I haven't used Python in a while, but I shall once more look into it. Python was always fun. Guess I'll do that over christmas, see where it takes me.

I might even get around to learning Python 3 after only ... what? seven-or-so years?

Just write the next thing in Python 3, there's not really that much to learn right away as much as there are minor surprises that you can very quickly get up to speed on as you encounter them.

I've been dragging my heels on this for a long time, but I'm finally starting to take the plunge into 3.

I think for any greenfield project, there's very little reason to use CPython 2 anymore. If you want performance, use PyPy. If you want features, use CPython 3. From now on, that's the philosophy I'm following whenever I write something new.

I have been a bit 'off Python' for a while, but this list article prompted me to take a renewed look at it because of Jupyter Notebook, and I have to say I'm quite impressed .. this is a really nice way of working on code, wow .. especially using folium this way is very cool.

Are there any ruby alternatives to tqdm?

Also interested in the answer, because if there are none, it's something I could look into doing over a weekend!

or node?

It's only around 100 lines of code. Should be easy to replicate in language of choice: https://github.com/noamraph/tqdm/blob/master/tqdm.py
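To show why a port to another language is a weekend job, here is a stripped-down sketch of the core idea: a generator that wraps any iterable and reports progress as items are consumed. It is not tqdm's code and omits rate and time estimates; `progress` is a hypothetical name.

```python
import sys

def progress(iterable, total=None, file=sys.stderr):
    """Yield items from `iterable`, writing a running count to `file`."""
    if total is None:
        try:
            total = len(iterable)
        except TypeError:
            total = None  # generators and iterators have no length
    count = 0
    for item in iterable:
        yield item
        count += 1
        if total:
            file.write("\r%d/%d (%3d%%)" % (count, total, 100 * count // total))
        else:
            file.write("\r%d" % count)
        file.flush()
    file.write("\n")

squares = [n * n for n in progress(range(5))]
print(squares)
```

Writing to stderr with a carriage return keeps the counter on one line and out of the way of whatever the program prints to stdout.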

This is a tangent, but the most annoying change in the latest Python versions is you can no longer write print "foo". Now it has to be print("foo"). Damn kids ruining my language.

> This is a tangent, but the most annoying change in the latest Python versions is you can no longer write print "foo". Now it has to be print("foo").

The statement-to-function migration for print is, IMO, generally an improvement, but in any case it's not a change in the latest versions of Python, except with an unusually broad interpretation of latest; it's a change in Python 3.0, which was released a little over 7 years ago.

I switched to Python three a few months ago. And it still gets me and I end up typing "print variable", only to have Python complain. It's probably stuck with me because of its simplicity, though I do understand why they've removed it. It's an aberration in the syntax, for lack of a better way of putting it.

I was always way more annoyed having to remove/add parens when I swapped a logging, Exception, or write to a print.

It was a glaring inconsistency.
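A short Python 3 example of what the function form buys: options become ordinary keyword arguments, and `print` can be passed around or swapped like any other callable. The `run` function and its `report` parameter are made up for the illustration.

```python
import io

buffer = io.StringIO()

# Keyword arguments replace the old special-case statement syntax.
print("a", "b", "c", sep=", ", end="!\n", file=buffer)

# Because print is an ordinary function, it can be a default argument
# and be replaced by any other callable.
def run(report=print):
    report("working")

messages = []
run()                 # prints "working"
run(messages.append)  # collects the message instead

print(buffer.getvalue(), end="")
print(messages)
```

Swapping `print` for a logger method or a list's `append` needs no change to the call site, which addresses the parenthesis juggling described above.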

You just wake up Rip Van Winkle?
