Here's a schema I use in production; note how readable it makes the API's parameters and how quick all the validation and normalization is:
At the end, you get an object called data, and you can do data.title, data.language, etc, and be sure that everything is as you expect.
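The original schema snippet didn't survive the paste; here's a toy sketch of the pattern (hypothetical field names and a hand-rolled mini-framework, not the poster's actual code or any particular library's API):

```python
# Toy sketch of the declarative-schema pattern: declare fields once,
# then validate + normalize raw input into an attribute-access object.
class Field(object):
    def __init__(self, type_, required=False, default=None):
        self.type_, self.required, self.default = type_, required, default

class Schema(object):
    fields = {}

    def load(self, raw):
        data, errors = {}, {}
        for name, field in self.fields.items():
            if name not in raw:
                if field.required:
                    errors[name] = "missing required field"
                else:
                    data[name] = field.default
                continue
            try:
                data[name] = field.type_(raw[name])  # normalize, e.g. "42" -> 42
            except (TypeError, ValueError):
                errors[name] = "expected %s" % field.type_.__name__
        if errors:
            raise ValueError(errors)
        return type("Data", (), data)()  # data.title, data.language, ...

class ArticleSchema(Schema):
    fields = {
        "title": Field(str, required=True),
        "language": Field(str, default="en"),
        "views": Field(int, default=0),
    }

data = ArticleSchema().load({"title": "Hello", "views": "42"})
print(data.title, data.language, data.views)  # → Hello en 42
```

Libraries like marshmallow and Schematics do essentially this, plus nesting, serialization, and much better error reporting.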
Schematics has a very useful feature in "roles" - i.e. a good way of hiding certain fields in certain situations (e.g. admin views vs self vs other vs anonymous). Does marshmallow have something similar?
I've used it for parsing various network protocols to good effect.
Thank you so much for providing this example. I couldn't grok what schema did, and your code made it make sense.
If people want to provide examples for the other libraries in this thread, you'll be popular :)
I generally like the simplicity of Rx and the fact that it's language agnostic (I've used it with both JSON and YAML, among other serialization formats), with schemas themselves writable in either format. However, the lacking documentation has always been a problem.
I have yet to find a validation library that supports all the sorts of things I expect it to. One place they tend to fall short is how they handle default values.
Say a field has an error and I just want to fall back to a default value when it's broken. That particular feature doesn't exist in any library I've found so far (for Python) =/
Granted, this sort of blends into the usual "It's a validation library not a serialization library". But they all make a half-assed attempt at the other side in my experience.
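For what it's worth, the fall-back-to-default-on-error behavior described above is easy to bolt on by hand; a minimal sketch (hypothetical helper, not part of any library):

```python
def coerce_or_default(value, convert, default):
    """Try to normalize a value; fall back to a default when it's broken."""
    try:
        return convert(value)
    except (TypeError, ValueError):
        return default

print(coerce_or_default("42", int, 0))    # → 42
print(coerce_or_default("oops", int, 0))  # → 0
print(coerce_or_default(None, int, 0))    # → 0
```

The annoying part, of course, is wiring this into a library's declarative field definitions rather than calling it by hand per field.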
It does progress bars too...
I like Click because it does very simple things very well. It gets hairy when you want to build more complex CLIs. For example, value options and validators don't play nicely because Click doesn't distinguish between the absence of a value and an invalid value, so you wind up dropping Click features and rewriting your own plumbing for that kind of stuff. The alternative is writing a more verbose CLI grammar, which leads to a really clunky UI.
I also find myself using click even when I don't want a CLI. The pretty-printer (`secho`) and progress bars are extremely handy, plus some of the other stuff in utilities. It's quite nice that they handle detecting when output is an interactive terminal versus piping to a file.
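The absence-vs-invalid plumbing mentioned above can be sketched with a sentinel object (shown here with stdlib argparse rather than Click, and a made-up `--port` option, just to illustrate the distinction):

```python
import argparse

# Sentinel: unlike a plain default, it lets us tell "option omitted"
# apart from "option given an invalid value".
_UNSET = object()

parser = argparse.ArgumentParser()
parser.add_argument("--port", default=_UNSET)

def resolve_port(args):
    if args.port is _UNSET:
        return 8080  # truly absent: apply the default
    try:
        return int(args.port)  # present: validate it strictly
    except ValueError:
        raise SystemExit("--port must be an integer")

print(resolve_port(parser.parse_args([])))                  # → 8080
print(resolve_port(parser.parse_args(["--port", "9000"])))  # → 9000
```

This is exactly the kind of plumbing you end up rewriting once a framework's defaults and validators are coupled together.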
Interestingly enough, my biggest use of Pandas is to serialize to and from HDF5. I work with a lot of large datasets and Pandas simplifies using HDF5 quite a lot.
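A minimal round-trip sketch (assumes pandas with the PyTables package installed; the data is made up):

```python
import os
import tempfile

import pandas as pd

# Round-trip a DataFrame through HDF5; tempfile keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "readings.h5")
df = pd.DataFrame({"sensor": ["a", "b", "c"], "reading": [1.0, 2.5, 3.7]})
df.to_hdf(path, key="readings", mode="w")
back = pd.read_hdf(path, "readings")
print(back.equals(df))  # → True
```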
Print-debugging on steroids. This really does make things so much easier, especially when dealing with huge apps you don't have time to learn. Not just useful as a dev but also as a sysadmin.
q is for when you want to log data, pudb is for when you want to step through and evaluate lines in-context. It's very possible that you'll want to use both together.
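The real q library logs to /tmp/q; here's a toy imitation of its pass-through idea (not q's actual implementation), to show why it drops into the middle of expressions:

```python
import sys

def q(value, label=""):
    """Toy q-style logger: record a value and pass it through unchanged,
    so it can wrap any subexpression without changing behavior."""
    sys.stderr.write("q: %s%r\n" % (label and label + "=", value))
    return value

# Wrap a value mid-expression; the computation is unaffected.
total = sum(q(n, "n") for n in [1, 2, 3])
print(total)  # → 6
```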
Some of the new libraries I'm using this year that I've found really handy include:
Odo - http://odo.readthedocs.org/en/latest/ It is ridiculously handy for converting data from one format to another - especially for transforming a table from a database or csv into a DataFrame and back.
Arrow - Makes for quick datetime processing. - http://crsmithdev.com/arrow/
Xlsxwriter - http://xlsxwriter.readthedocs.org - I'm building beautiful reports, with charts, using this tool. As someone who moves data around a lot, but has to work with less technical business and analyst folks, this is becoming my go-to for handing them some data to play with.
Blessings - https://pypi.python.org/pypi/blessings - as I get older, staring at simple black and white text on the screen seems to be getting harder. Putting a little color and flair in my command line interfaces cheers me up, even if it doesn't do much in the way of actually getting the job done.
Lastly, switching from curl to httpie was a huge help in working with API's of all sorts. It solved a problem I didn't even know I had. https://pypi.python.org/pypi/httpie
It looks like a much cleaner Scrapy-inspired spider framework, without the twisted dependency. And it's python 2+3 compatible. I'm very excited to try it out.
 - https://github.com/binux/pyspider
A few years ago I tried to set up a Mac with a scientific computing stack, and it took me days to hack my way through all the various dependencies and incompatible versions. Anaconda now lets me do that in minutes.
One issue with using your system's packaging system is that a lot of system utilities are written in Python which makes it harder to play around with new versions, bleeding edge libs, etc.
I don't know if that's still the case (curl being part of the package) now, with Anaconda 3 2.4.0+? It certainly isn't when installed via pyenv, so I'm happy with that. But there were other issues in past builds... BeautifulSoup was inexplicably broken. I mean that it simply did not correctly parse non-trivial HTML pages and yet threw no errors. The results could be replicated for all of my students, but I never could isolate the issue... I installed Python 3 and the same version of BS4 from scratch and had no problems, but I can't imagine where the Anaconda build would have gone wrong. It ended up being OK since I just switched to lxml, which I now happily use over BS4 any day, but it was frustrating not to be able to diagnose the problem (I didn't get a response in the support forums either). I'm assuming this problem has gone away in subsequent versions of Anaconda, though I haven't tried since lxml is perfectly fine for me.
And finally...well, I have to admit it, but I use Python like a goddamned moron in that I still don't know how to use virtualenv/venv to do proper dev isolation. And from the brief research I did, I see that Anaconda has its own conventions, or workflow...something with the conda utility. Again, I can see why it's necessary for Anaconda's use case (people who want to do data science and not hand-tweak their environment every time they upgrade a package over pip), but it added too many layers for me at the time.
I was the same way for quite a while, until I bumped into pyenv-virtualenv. Just install that plugin, and you can do, e.g.,
pyenv virtualenv 3.5.1 my-project
pyenv virtualenv my-project
Once you do that, pyenv treats `my-project` just as another installation of Python. In fact, `my-project` will show up in the list of installed versions (`pyenv versions`), and you can switch to it:
pyenv global my-project
And voila! You have your own virtual environment that can contain its own list of libraries.
And no, I'm not a shill for the creator of pyenv, I just really like the software.
And ehhh, I've been downvoted and bitched at about evangelizing pyenv before. Just thought I'd preempt that. But yes, it's an amazing piece of software. :)
FYI, I've put together a bash function for my .bash_profile that adds an indicator to my prompt showing the current Python version/virtualenv in use. That's saved me a bit of frustration when going into a directory where a local pyenv version overrides the global version.
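Mine isn't shown here, but a minimal version of such a function might look like this (assumes pyenv's `version-name` subcommand; the function name is made up):

```shell
# Show the active pyenv version/virtualenv in the bash prompt.
__pyenv_ps1() {
  local v
  v="$(pyenv version-name 2>/dev/null)" || return
  printf ' (py:%s)' "$v"
}
PS1='\w$(__pyenv_ps1)\$ '
```

Because `pyenv version-name` respects local `.python-version` files, the indicator updates as you cd between projects.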
I use pyenv and pyenv-virtualenv to easily keep track of Python versions and virtual environments. I keep one virtual environment for each project I'm working on, and things pretty much Just Work.
I feel I'm able to write much more concise test scripts than I could with unittest.
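For anyone who hasn't tried it: a pytest test file is just plain functions and bare asserts, no TestCase boilerplate (hypothetical example):

```python
# pytest discovers test_* functions automatically and rewrites bare asserts
# to produce detailed failure messages -- no self.assertEqual needed.
def inc(x):
    return x + 1

def test_inc():
    assert inc(3) == 4

def test_inc_negative():
    assert inc(-1) == 0
```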
Why would anyone need a simple NoSQL? Why would you go the NoSQL route if it isn't a HUGE complex database?
Python 2.7.8 |Anaconda 2.1.0 (64-bit) ... on win32
>>> import bsddb
>>> print bsddb.__doc__
Support for Berkeley DB 4.3 through 5.3 with a simple interface.
For the full featured object oriented interface use the bsddb.db module instead. It mirrors the Oracle Berkeley DB C API.
The stdlib has various ways to persist data, if needed.
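For example, shelve gives you a persistent dict with zero dependencies (hypothetical keys; tempfile just to keep the example self-contained -- sqlite3, dbm, and pickle are the other usual stdlib suspects):

```python
import os
import shelve
import tempfile

# shelve: a dict-like persistent store from the stdlib (backed by dbm + pickle)
path = os.path.join(tempfile.mkdtemp(), "store")
with shelve.open(path) as db:
    db["answer"] = 42

with shelve.open(path) as db:
    value = db["answer"]

print(value)  # → 42
```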
I really only use conda for the non-python bits of our stack: numpy/scipy/pandas etc - packages that are a pain to install on Heroku.
Especially with all the BLAS linking. Was there anything special you had to do, or was it simply `conda install numpy-blas` or something like that?
We use conda-env (https://github.com/conda/conda-env) which makes it really simple to manage environments locally and in remote environments.
Instead of requirements.txt we use py.prereqs, but otherwise this script should get you close.
Incidentally, is the build script of Anaconda itself open source? I couldn't find it anywhere.
also: type checking on function signatures
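A toy sketch of what annotation-based checking can look like (a hand-rolled decorator, not any particular library's API):

```python
import functools
import inspect

def typechecked(func):
    """Toy decorator: check call arguments against the signature's annotations."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError("{} must be {}, got {}".format(
                    name, expected.__name__, type(value).__name__))
        return func(*args, **kwargs)
    return wrapper

@typechecked
def greet(name: str) -> str:
    return "hello " + name

print(greet("bob"))  # → hello bob
```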
Using JSON Schema with Python to validate JSON data:
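A minimal example with the jsonschema package (the schema and document are made up):

```python
from jsonschema import ValidationError, validate

schema = {
    "type": "object",
    "required": ["title"],
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

doc = {"title": "Hello", "tags": ["python", "validation"]}
validate(instance=doc, schema=schema)  # passes silently; raises on bad input

try:
    validate(instance={"tags": []}, schema=schema)
except ValidationError as exc:
    print(exc.message)  # → 'title' is a required property
```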
It would be great if it had ipython notebook support. I often end up doing long operations that scrape services for data but have no idea what their progress is.
For me 2015 has been the year of tox. It is a great tool and worth using for just about any python project.
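A minimal tox.ini, as a sketch of the workflow (the interpreter list and deps here are assumptions, not from any particular project):

```ini
# tox.ini -- run the test suite in isolated virtualenvs, one per interpreter
[tox]
envlist = py27,py34,py35

[testenv]
deps =
    pytest
commands =
    pytest {posargs}
```

`tox` then builds each environment, installs your package plus deps, and runs the commands, which is exactly the sort of thing that catches 2-vs-3 breakage before CI does.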
A lot of these "magic" tools fall apart when you're trying to do something slightly more structured than "throwaway bunch o' functions".
I might even get around to learning Python 3 after only ... what? seven-or-so years?
I think for any greenfield project, there's very little reason to use CPython 2 anymore. If you want performance, use PyPy. If you want features, use CPython 3. From now on, that's the philosophy I'm following whenever I write something new.
The statement-to-function migration for print is, IMO, generally an improvement, but in any case it's not a change in the latest versions of Python, except under an unusually broad interpretation of "latest"; it's a change in Python 3.0, which was released a little over 7 years ago.
It was a glaring inconsistency.
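Concretely, a few things the function form buys you:

```python
import sys

# As a function, print composes like anything else: keyword args replace
# the old `>>stream` and trailing-comma special syntax.
print("a", "b", sep="-", end="!\n")  # → a-b!
print("to stderr", file=sys.stderr)  # the target stream is just an argument

row = ["x", 1, 2.5]
print(*row)  # unpacking works, since it's an ordinary call → x 1 2.5
```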