Save yourself time and effort down the road and read through both libraries' documentation; they're well worth the effort:
I tend to use defaultdict, deque (thread safe), namedtuple, imap, izip, dropwhile/takewhile. In Python 3, the built-in map and zip are lazy and take the place of itertools' imap and izip.
I blame Haskell for all the lazy evaluation influence. :P
I'm in the process of converting a middle-sized PHP codebase completely over to Scala. It is not uncommon for me to see functions that run between 15 and 30 lines shrink to 3 or 4 line functions thanks to the Collections API. On top of that, its performance is something you'll never see in PHP or Python. I love Python for its readability and wonderful syntax, but Scala is starting to have an even greater pull on me for its built-in concision and the ability to use the entire Java ecosystem without submitting to Java's imperative style.
Not only in Python, but programming languages in general.
I still find people writing Java or .NET code who aren't aware of all the nice classes that are part of the runtime and end up creating their own half-baked solutions to their problems.
Nowadays developers seem to code without reading.
When your standard library documentation is so vast that it would take weeks to read and understand it all, and you'd never remember most of it anyway without context and experience using it, I don't think "coding without reading" is really a fair complaint.
We as an industry need to get better at documentation, and in particular about separating tutorial/overview documentation that presents a map and summary of what's available from reference documentation, or we're going to keep reinventing wheels like this.
Python is a particularly unfortunate example, because while its documentation is vast, it has very little tutorial/overview material beyond the very basics. For example, given that a substantial proportion of Python's standard library actually doesn't work very well in practice, it would be helpful to have a deeper tutorial/map document somewhere that introduced the various areas of the standard library and that also promoted the good ones and suggested popular alternatives for the not so good ones where they exist.
As an example, an old colleague wanted to dump some data from Python to a CSV file and did this by for-looping through each row and each item and concatenating each cell and a semicolon to a string. Even after I pointed out to him that Python already has a built-in csv writer that handles all the issues of escaping etc., he didn't want to use it because he didn't know what it did and didn't want to learn anything new. His version didn't even do escaping inside the for-loop and he didn't see the issue with not doing it. To him the for-loop gave exactly the same result and didn't require any learning and was thus better, and why change something that works... My last suggestion was to at least use ";".join(...) but that was also a bit too magic, so he stuck to his well-known for-loop.
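To make the escaping issue concrete, here is a minimal sketch of the difference, assuming Python 3 and some made-up rows (the data and variable names are purely illustrative, not the colleague's actual code). A cell that itself contains a semicolon breaks the hand-rolled join, while the csv module quotes it so it survives a round trip:

```python
import csv
import io

# Hypothetical sample data; the last cell contains the delimiter and quotes.
rows = [
    ["id", "comment"],
    [1, "plain text"],
    [2, 'contains ; and "quotes"'],
]

# Naive approach: join cells with semicolons, no escaping at all.
naive = "\n".join(";".join(str(cell) for cell in row) for row in rows)

# csv.writer quotes any cell that contains the delimiter or a quote character.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";")
writer.writerows(rows)
escaped = buffer.getvalue()

# Reading the csv output back recovers the original cells (as strings).
parsed = list(csv.reader(io.StringIO(escaped), delimiter=";"))

# The naive line splits into three fields instead of two.
naive_fields = naive.splitlines()[2].split(";")
```

The naive output looks fine until someone's data happens to contain the separator, which is exactly the kind of bug that shows up months later.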
Usually standard libraries are quite reliable, but in some cases, and especially when adding third party libraries, bugs and performance issues inside the library can really give you hell. If the library is supposed to just perform a simple task, maybe you would rather implement it yourself, because then you are also able to fix those issues yourself later. Experiences like this can scare you away from even the most reliable libraries in the future.
I think part of the problem is that the statement above is maybe not as true as it used to be.
Let's stick with Python as an example, though it's far from the only culprit so I hesitate to single it out here. I have a growing list of areas of the standard library that today I just assume won't work acceptably. I have tried to use them before, and I have found them to be either bug-ridden or not robustly portable or so slow as not to be worthwhile or missing enough basic functionality that you need to add something else anyway or just write everything from scratch. The everyday stuff in Python is pretty good, the basic data structures and common supporting functions like itertools, but when you start getting into the less common areas I have a very low opinion of the design and quality of the Python standard library, and that opinion is born of direct personal experience.
On top of the quality and robustness, there's also usability to consider. Even if some of Python's built-in libraries do work, there might be much neater, easier ways to achieve the same result that are only a `pip install` away. Libraries like Kenneth Reitz's Requests come immediately to mind; if I were teaching a newbie to program Python tomorrow, somehow I doubt urllib[N] would feature much.
I'm not sure how that hypothetical newbie is supposed to discover these things today without someone experienced to guide them, though. Whether it's Python and PyPI or Perl and CPAN or C++ and Boost or whatever other language and library repository you like, there's a lot of collective wisdom about the easiest/safest/fastest ways to get things done, but it lives in the combined experience of veterans rather than in comprehensive tutorials to follow once you've got the basics down. And that's only when there is already a recognisable place to look for general use third party libraries, not even considering all the third party libraries that might be out there but for whatever reason aren't incorporated into any de facto standard repository to make discovery (relatively) easy if you at least know what you're looking for.
Is it any wonder that newbies reinvent wheels under these conditions? It seems almost inevitable to me.
The subprocess system is fairly awful in both usability and portability.
The shutil filesystem tools had bugs and documentation issues the only time I ever tried to use them.
The various compression libraries had horrible performance problems last time I tried them; shelling out to various command-line equivalents was around 4-5x faster.
The command-line parsing tools are OK if you want to write a *nix-style command line tool, but not quite flexible enough for more advanced/customised uses.
I have yet to discover any decent GUI library for Python, standard or otherwise, so I'm not sure whether this one counts.
Logging is flexible but can be awkward to configure, particularly across an application that wants various logging itself but also uses libraries that offer to log.
But what if I want something that isn't like Git? I'm slightly amused that anyone would suggest Git as some sort of example of a good CLI, but in any case, not all platforms share the command line conventions of *nix shells.
Suppose I'm running on Windows (where options conventionally start with '/') and I don't want all the magic that argparse does with initial '-' characters. If I set prefix_chars to '/', does that also disable the '--' pseudo-argument? We were originally talking about documentation, and as far as I'm aware, the documentation for argparse doesn't actually specify this either way.
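For reference, this is roughly what the Windows-style setup looks like, as a minimal sketch assuming Python 3. The program name, option names and arguments are made up for illustration. It only shows that prefix_chars switches the option prefix; it doesn't settle the '--' question, which is the part I can't find documented:

```python
import argparse

# prefix_chars="/" makes "/" the option prefix instead of "-".
parser = argparse.ArgumentParser(prog="tool", prefix_chars="/")

# dest is given explicitly so the attribute name does not depend on
# how argparse derives it from the option strings.
parser.add_argument("/v", "/verbose", dest="verbose", action="store_true")
parser.add_argument("path")

# Hypothetical command line: tool /v report.txt
args = parser.parse_args(["/v", "report.txt"])
print(args.verbose, args.path)
```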
Suppose I want to have a set of basic choices, each setting a flag to say it's there. What if I also want some shortcut choices that represent combinations of the basic ones and set all of the corresponding flags? As far as I'm aware, you can't quite do this with any of the standard actions, so you have to start writing an entire new class to define a custom action instead. At least you can do that, but what was wrong with accepting a simple function, and where does anything say how argparse.Action is actually defined and why it's necessary instead?
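To show what I mean by having to write a whole class, here is a sketch of one way to do the shortcut option, assuming Python 3. SetFlags and its flags keyword are names I made up for this example, not part of argparse; the only real API used is argparse.Action with its __init__ and __call__ hooks:

```python
import argparse


class SetFlags(argparse.Action):
    """Hypothetical action: switch on several boolean flags at once."""

    def __init__(self, option_strings, dest, flags=(), **kwargs):
        self.flags = tuple(flags)
        # nargs=0: the shortcut option takes no value of its own.
        super().__init__(option_strings, dest, nargs=0, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # Set every flag this shortcut stands for.
        for name in self.flags:
            setattr(namespace, name, True)


parser = argparse.ArgumentParser(prog="build")
parser.add_argument("--compile", action="store_true")
parser.add_argument("--test", action="store_true")
# default=SUPPRESS keeps a stray "all" attribute out of the namespace.
parser.add_argument("--all", action=SetFlags, flags=("compile", "test"),
                    default=argparse.SUPPRESS)

args = parser.parse_args(["--all"])
only_test = parser.parse_args(["--test"])
```

That's a dozen lines of ceremony for what is conceptually a three-line function, which is really my point.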
Suppose I want to present the same data as the automatic help option, but reformat it in some completely different way that makes more sense for my program before it gets printed? There are assorted functions to display or return formatted help strings, but nothing seems to just give back a neat bundle of the relevant information for further processing. Collecting the data and rendering it for output are conflated.
Argparse, like much of the Python standard library, has a lot of power as long as you want to do things exactly its way, but it's not designed in a way that is particularly easy to extend. IMHO, a better strategy for designing standard libraries for languages is to create templates/frameworks/whatever you want to call them, and then to provide some specific implementations for basic cases. This way, when inevitably someone needs to go beyond the out-of-the-box functionality, they can still fit in with established conventions instead of starting over from scratch, which is generally better both for compatibility and for minimising the amount of extra logic that must be built on top of the tried and tested standard library. Of course you do have to be careful not to go too far and make simple cases look artificially complicated, but no-one ever said designing good APIs was easy. :-)
I taught high school for a while, and I had kids that refused to stop using their fingers for addition, regardless of the fact that it was preventing them from learning how to do more abstract math. What you're describing is pretty much the same attitude.
I'm really not trying to bash the students. As a teacher, my job was to invest the students in wanting to learn, and I admittedly wasn't always effective.
I am old enough to remember the days when the only way to learn how to program was to go through books and manuals, some of them very dry. There was no Internet in those days.
Young developers seem like spoiled kids who want to do something right away, without setting aside the time to learn how to do it properly.
Join the club. We're getting T-shirts made. :-)
The thing is, in those days we really could learn all the commands of an operating system shell by reading the manual cover to cover in an afternoon, or play with graphics demos or write low-level system utilities after reading the Pink Shirt Book.
Today's systems are so vast and complicated that anything offering similar coverage in book form would be the size of an encyclopaedia, so the way we were able to learn doesn't scale to modern needs.
The trend over the years has definitely been towards writing glue code and joining up ready-made components for a lot of professional work rather than reinventing things from scratch, and in some ways that's no bad thing. However, I think it only works if you know what you've got available in your toolbox, and so does being the person who understands and creates new components. Either way, it comes back to needing a way to navigate the vast amounts of information now available and pick out the bits you need to achieve whatever it is that you're trying to do.
As someone who started programming 4 years back (which from the conversation seems to be pretty much "nowadays" :-)), I think it is a bit of a generalization. I am not saying it's not true. There are certainly places such as StackOverflow, mailing lists etc. that attract newbies early on because they provide quick answers or even code, but at some point every developer who is serious about computer programming as a long term profession does need to start reading the docs. There is no other alternative, and one eventually comes to realize that it's much faster than arbitrarily hunting for code and asking questions on mailing lists and IRC.
It also depends upon the style of programming language (imperative/functional) and the previous experience of the developer IMO. For example, I find myself reading the docs significantly more in Erlang/Scala than in Python, and more in Python than in PHP/JS. That is also the reverse of the order in which I learnt these languages. Of course that is my personal experience.
- processing files with imap and ifilter to rapidly grab data, find a subset of it, then process it with a function
- defaultdict(list) is incredibly useful for collecting data, arranging it by a certain key (like date or object id), then collecting into a list
- namedtuple is occasionally useful for efficiently stuffing data into an object with a few named attributes.
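A rough sketch of how these pieces fit together, assuming Python 3 (where the built-in filter and map are lazy and stand in for ifilter and imap). The Event type and the sample lines are invented for the example:

```python
from collections import defaultdict, namedtuple

# Hypothetical record type with a few named attributes.
Event = namedtuple("Event", ["date", "user", "action"])

# Stand-in for lines read from a file.
lines = [
    "2013-01-01,alice,login",
    "2013-01-01,bob,login",
    "# comment line to skip",
    "2013-01-02,alice,logout",
]

# Lazy pipeline: nothing is processed until the loop below pulls items.
data_lines = filter(lambda line: not line.startswith("#"), lines)
events = map(lambda line: Event(*line.split(",")), data_lines)

# Group users by date; missing keys start out as empty lists.
by_date = defaultdict(list)
for event in events:
    by_date[event.date].append(event.user)
```

With a real file you'd pass the open file object in place of the list, and the whole thing still runs one line at a time.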
def flatmap(f, items):
    return itertools.chain(*map(f, items))
2. it's being eagerly unpacked through *. itertools.chain also provides a from_iterable method, introduced in 2.6, which doesn't have that issue (and can be used to flatten infinite streams)
So `flatmap` would probably be better as:
def flatmap(f, items):