Hacker News
PyFormat – Practical examples of old and new style string formatting (pyformat.info)
159 points by hgezim on Apr 15, 2015 | 65 comments



A good reference.

My takeaway is that every example that can be written with old-style formatting is longer in new-style formatting, even more so if the old-style examples were written without the extraneous 1-tuples.

    '%4d' % 42
vs

    '{:4d}'.format(42)
It isn't surprising that so much code is still being written in old style. The advantages of the new style are really quite minor: padding with arbitrary characters, value reuse (particularly for i18n, though production systems that need it tend to use a different mechanism), and nested display.
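A quick check of the claims above, as a sketch using only built-in formatting (the values are arbitrary). The simple width case produces identical output in both styles. Arbitrary fill characters and reuse by position need the new style: '%' pads only with spaces or zeros, and reuses values only through named %(key)s mappings.

```python
# Old and new style agree on simple width padding.
old = '%4d' % 42
new = '{:4d}'.format(42)

# New style only: pad with an arbitrary fill character ('*' here).
starred = '{:*>6}'.format(42)

# New style only: reuse a positional argument by its index.
reused = '{0}-{1}-{0}'.format('ab', 'cd')

print(repr(old), repr(new), starred, reused)
```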

This feeds into the other python issue on HN today - why does py3 have so little uptake? Because it is a solution to a problem few people have.


Yes, very nice, helpful site.

.format() is mostly too verbose for my tastes. Even Guido himself put it into the "meh" category in a talk. To be clear, I do use it when I need quick lightweight templating, say building a very complicated string.

But for most simple output, printf style is shorter and simpler, and I know it by heart. There was a guy at work who insisted on using .format() for everything, sometimes even for simple string concatenation, and that soured me on it:

    print 'result: {long_var_name}'.format({long_var_name=long_var_name})
Sometimes lines reached 200 chars. If lines were broken up, each debug statement could stretch to 4 lines, adding tons of clutter. He said it was more readable because you could ignore the end. :/

As sago said, printf is more concise in the majority of use cases. Not to mention the logging niceties:

    log.info('result: %s %s', one, two)  # shorter and note that fmt is deferred!
A _big_ one in my eyes. Too bad logging doesn't support {} internally.
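To make the deferral concrete, here is a small sketch using a hypothetical `CountingArg` object that counts its own `__str__` calls. When the argument is passed separately, a suppressed DEBUG call never formats it. Pre-formatting with '%' always does.

```python
import logging

class CountingArg:
    """Hypothetical argument that counts how often it is converted to str."""
    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return 'expensive'

log = logging.getLogger('lazy_demo')
log.setLevel(logging.INFO)  # DEBUG messages are filtered out

arg = CountingArg()
# The level check fails before any record is built, so %s is never applied.
log.debug('result: %s', arg)
lazy_calls = arg.calls

# Formatting eagerly with % calls __str__ even though nothing is logged.
log.debug('result: %s' % arg)
eager_calls = arg.calls - lazy_calls

print(lazy_calls, eager_calls)  # 0 1
```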

I've also wondered if .format could be improved if the % operator could be used with it, given appropriate precedence rules. Or perhaps use another char... dunno & or /?

    print 'results: {} {}' %  results
    print 'results: {} {}' %% results
    print 'results: {} {}' /  results
Pycon, are you listening? ;)


This is what your co-worker does now:

  'result: {long_var_name}'.format(long_var_name=long_var_name)
A similar option is possible with '%'-style formatting:

  'result: %(long_var_name)s'%{"long_var_name":long_var_name}
    -or-
  'result: %(long_var_name)s'%dict(long_var_name=long_var_name)
The first of these alternatives is indeed shorter, so long as there's only a single parameter. If there are multiple parameters then the .format() solution will be shorter.

My experience with '%(name)s' % dict formatting is that I would often forget the terminal 's', and write it as '%(name)'. For template-like strings, where I want named substitutions, I am very much in favor of {}-style formatting.

So if your co-worker insists on named substitutions, then s.format() is less verbose than s % dict.

Regarding logging, there's a section on how to use format templates in the logging system, at https://docs.python.org/3/howto/logging-cookbook.html#format... . In addition to using a LogRecord factory, you could reduce the overhead in old-style logging to just one extra constructor call:

  log.info(M('result: {} {}', one, two))
where M.__str__ forwards to str.format.
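Here is a minimal sketch of such an `M` wrapper. `M` is just an illustrative name; the logging cookbook describes a similar class called `BraceMessage`. `LogRecord.getMessage()` calls `str()` on the message object and applies '%' only when extra args were passed. As a result, `str.format` runs only if the record is actually emitted.

```python
import io
import logging

class M:
    """Defers str.format until the logging machinery asks for the message."""
    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)

stream = io.StringIO()
log = logging.getLogger('m_demo')
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.INFO)
log.propagate = False

log.info(M('result: {} {}', 1, 2))
print(stream.getvalue(), end='')  # result: 1 2
```

If a message is filtered out by level, only the constructor cost is paid.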

Personally, I would like a way to pre-compile the format string:

  fmt = format_compile("result: {} {} {}")
  for result in results:
    print(fmt(*result))
because currently (in Python 3.3) "%d %d %d" is about 40% faster than "{} {} {}".
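`format_compile` is not a standard function. The closest thing available today is binding the `str.format` method once, sketched below under that assumption. This avoids the repeated attribute lookup but not the parsing of the template, so don't expect it to close the gap with '%'.

```python
def format_compile(template):
    """Hypothetical helper: return a callable that formats with `template`.

    This only pre-binds str.format; the template is still parsed on
    every call, so it saves an attribute lookup, not the parse.
    """
    return template.format

fmt = format_compile("result: {} {} {}")
results = [(1, 2, 3), (4, 5, 6)]
lines = [fmt(*result) for result in results]
print(lines)  # ['result: 1 2 3', 'result: 4 5 6']
```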

EDIT: Never mind: It can be faster to interpolate the string each time than to call a constructor:

  % python3.3 -mtimeit 's="%s %s %s"' 's % (1, 3, 9)'
  1000000 loops, best of 3: 0.36 usec per loop
  % python3.3 -mtimeit 'class Spam:pass' 'Spam()'
  10000 loops, best of 3: 23.6 usec per loop
I did not expect a factor of 65!


Your timeit calls in the edit need the -s parameter before the first statements, so they aren't executed on each loop iteration. The overhead is in creating the class, not the instances.

  $ python3 -mtimeit -s 's="%s %s %s"' 's % (1, 3, 9)'
  1000000 loops, best of 3: 0.343 usec per loop
  $ python3 -mtimeit -s 'class Spam:pass' 'Spam()'
  10000000 loops, best of 3: 0.106 usec per loop


Thanks! That's a big forehead smack on my side.

The numbers are more in line with what I expected. That means it is possible to defer some of the time cost by using a single constructor call rather than formatting the string.


I don't like all that verbosity in short debug text at all, no matter which syntax is used. I would have done, e.g.:

    print 'result:', long_var_name
I do like your idea of a custom formatter, though. Perhaps it could be installed into the logger beforehand, giving the slightly improved syntax without frequent object creation.


Right, but the verbosity here is primarily due to your co-worker's habits; he would likely have used %(name)s instead of {name} if he were using '%'.

Regarding logging, the docs suggest that setting your own LogRecord factory is the way to go if that's what you want.
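Here is a sketch of the LogRecord-factory route, assuming Python 3.2+ (where `logging.setLogRecordFactory` exists). `BraceLogRecord` is a hypothetical name. It overrides `getMessage()` so the message's arguments are merged with `str.format`. The factory is process-wide, which is why the demo restores the previous one. Otherwise any library logging with %-style arguments would stop having its messages formatted correctly.

```python
import collections.abc
import io
import logging

class BraceLogRecord(logging.LogRecord):
    """LogRecord that merges args with str.format instead of %."""
    def getMessage(self):
        msg = str(self.msg)
        if self.args:
            # LogRecord.__init__ unwraps a single mapping argument into
            # self.args, so treat that case as keyword arguments.
            if isinstance(self.args, collections.abc.Mapping):
                msg = msg.format(**self.args)
            else:
                msg = msg.format(*self.args)
        return msg

previous_factory = logging.getLogRecordFactory()
logging.setLogRecordFactory(BraceLogRecord)
try:
    stream = io.StringIO()
    log = logging.getLogger('brace_demo')
    log.addHandler(logging.StreamHandler(stream))
    log.setLevel(logging.INFO)
    log.propagate = False
    log.info('result: {} {}', 1, 2)
    log.info('user: {name}', {'name': 'alice'})
finally:
    # Restore the global factory so %-style logging elsewhere keeps working.
    logging.setLogRecordFactory(previous_factory)

print(stream.getvalue(), end='')
```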


Still, with % you can skip the .format(), and when logging you can even skip the %. If {} were more "built-in", via an operator or logging support, I think it'd be a clearer win.


My point is that even with the ".format()", if you use named arguments, .format() is shorter than '%' once you have two or more terms:

    "%(name1)s %(name2)s"%{"name1":"ABC","name2":"DEF"}
    "{name1} {name2}".format(name1="ABC",name2="DEF")
To get what you want, you need to convince your co-worker to 1) switch to positional arguments, and 2) switch to '%' instead of .format.

Even if Python had never gained the .format() option, so that #2 weren't an issue, you would still be stuck with #1, because '%' supports named arguments as well.

As for logging, I pointed out two possible ways to address the problem. Personally, I don't log in performance-critical loops since Python's function call overhead is so high. Testing now, you can see that 3/4ths of the overhead for logging a simple format statement is just in calling the logger:

  % python3.3 -mtimeit -s 'import logging' 'logging.info'
  10000000 loops, best of 3: 0.0525 usec per loop
  % python3.3 -mtimeit -s 'import logging' 'logging.info("abc")'
  1000000 loops, best of 3: 1.59 usec per loop
  % python3.3 -mtimeit -s 'import logging' 'logging.info("abc %d %d %d", 1, 2, 3)'
  1000000 loops, best of 3: 1.71 usec per loop
  % python3.3 -mtimeit -s 'import logging' 'logging.info("abc %d %d %d"%(1, 2, 3))'
  1000000 loops, best of 3: 1.64 usec per loop
  % python3.3 -mtimeit -s 'import logging' 'logging.info("abc %d %d %d".format(1, 2, 3))'
  1000000 loops, best of 3: 1.99 usec per loop
According to this, calling info() with more than one argument is actually slower than calling '%' every time(!).

Is your objection based on a measured performance impact (eg, more complicated formats will be worse) or a theoretical evaluation?


Neither he nor I would ever write out the full dictionary; we'd use a single arg or a tuple.

Why? dunno... probably habit, sample code never showed it, mostly because it would be ridiculous. Why is it not ridiculous with {}? It is, though he would never agree.

I usually don't care about performance either, but the third example is easier to read and type. Looks just like a printf.


Going back a few posts:

> There was a guy at work who insisted on using .format() for everything that soured me on it, sometimes even simple string concatenation

It sounds like you are soured on new-style .format() because someone else doesn't use it appropriately. You concluded that the verbosity is intrinsic to .format(), even though s%{} is even more verbose, because of what seems to be the idiosyncratic preference of one person.

I empathize with your other arguments. I too am much more familiar with printf-style % arguments than the Python-specific .format() syntax, which still causes me to reach for the documentation. But I do not think that it's reasonable to complain about its verbosity.


It is; Guido said it himself. Looks like I added an extra pair of curly brackets to the example I first mentioned. That was a typo, sorry about that. It shouldn't have been there, and it's probably why you think someone would write 's%'%{...}, though to my knowledge no one ever has.


I use format because you can break lines within the parentheses. You also get to name parameters, which is nice and explicit.

I also import print_function from the future, mainly because you can break lines inside the parentheses.

I also use ' '.join([...]) for the same reason when building a large-ish string, you can break lines inside parentheses (or square brackets in this case).

I like breaking lines inside parentheses and other delimiters.
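Here is a small illustration of breaking lines inside delimiters; the variable names are made up. Python continues a statement implicitly inside parentheses and brackets. Both the `.format()` call and the `' '.join([...])` list can therefore span several lines without backslashes.

```python
name, count = 'results.txt', 3

# Implicit line continuation inside the .format() parentheses.
summary = 'file {name} has {count} lines'.format(
    name=name,
    count=count,
)

# The same trick with ' '.join over a bracketed list.
joined = ' '.join([
    'file',
    name,
    'has',
    str(count),
    'lines',
])

print(summary)
print(joined)
```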


Should be able to do both of those with % as well:

    print '%(foo)s %(bar)s' % dict(
        foo='foo',
        bar='bar',
    )


huh. I like it when I learn something new every day.


Value reuse/positional parameter references could easily have been added to old-style formatting in the same way POSIX added them to C's printf(), using the $ specifier (although printf() unusually indexes from 1, not 0):

    '%1$s %0$s' % ('one', 'two', )
Changing the padding character would probably not be too difficult either - just add the appropriate specifier to the formatting DSL; and I'm not so keen on supporting nested structures in the format string, as it seems to be going in the direction of allowing arbitrary expressions/Python code into a formatting DSL - one that I think shouldn't be more than a regular language.
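For comparison, here is a sketch of what actually works today. New-style formatting already has explicit positional indexes, counting from 0. '%' can only reorder through named mapping keys.

```python
# Explicit indexes: arguments can be reordered or reused.
swapped = '{1} {0}'.format('one', 'two')
print(swapped)  # two one

# '%' has no positional-index syntax, but mapping keys give a similar reorder.
swapped_old = '%(b)s %(a)s' % {'a': 'one', 'b': 'two'}
print(swapped_old)  # two one
```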


I had to read your statement a couple of times to understand it. You're saying that the old syntax is totally incompatible with the new syntax? It's a shame, but I'm sure it was necessary.

The various other improvements are all things I've made use of in real code. Format is now a tool that you can reach for before you need to go to a more heavyweight templating lib (like Jinja2).

    '{p.type}: {p.kinds[0][name]}'.format(p=Plant())
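`Plant` is not defined in the thread. The sketch below assumes a minimal hypothetical class, just enough to make the one-liner runnable:

```python
class Plant(object):
    # Hypothetical class: any object with these attributes would do.
    type = 'tree'
    kinds = [{'name': 'oak'}, {'name': 'maple'}]

# Attribute access (.type), list indexing ([0]) and dict lookup ([name]);
# the dict key inside the field is written without quotes.
text = '{p.type}: {p.kinds[0][name]}'.format(p=Plant())
print(text)  # tree: oak
```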


Most of the examples of the brevity of the old style seem to be quite unconcerned with actual formatting. Honestly, I just do a simple print "Hello: " + a when I'm not too picky, and use .format when I actually want nice output. The time spent typing a few extra chars for format is unimportant compared to the time it takes to choose good field widths in the first place, and I find the old syntax more cryptic.


I almost always use format because it looks nicer, and hence is more readable. Shorter is not always better. It is also mandatory in i18n strings, because when there is more than one argument, they might be in a different order in another language.
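Here is a sketch of the reordering problem, using a hypothetical two-language `TEMPLATES` table and `render()` helper. Named fields let each translation place the values wherever its grammar needs them. Named %(key)s mappings can also be reordered this way.

```python
# Hypothetical translation table: each language orders the fields differently.
TEMPLATES = {
    'en': '{count} files in {folder}',
    'de': 'In {folder} liegen {count} Dateien',
}

def render(lang, **values):
    """Look up the template for `lang` and fill in the named fields."""
    return TEMPLATES[lang].format(**values)

print(render('en', count=3, folder='docs'))  # 3 files in docs
print(render('de', count=3, folder='docs'))  # In docs liegen 3 Dateien
```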


I think the following makes more sense:

    '{}'.format(format_decimal_with_prepended_spaces(42, 4))


This approach makes it really hard to change the formatting template. I might have several, which change at runtime (or are even user-configurable).

With this approach, my own app needs to do the actual formatting, while `format()` merely concatenates.


They missed my favorite trick! locals() will give you a dict of, well, local variables, which generally coincides with what you want, so:

    a = 4
    'A is: {a}'.format(**locals())
works as expected.


You can also:

    a = 4
    'A is: %(a)d' % locals()
Perhaps it's my experience programming in C, but "format" looks less familiar, and I tend to use old-style formatting.


Might not work in some special cases though. Try:

  def foo():
      a = 4
      def bar():
          print("a = {a}".format(**locals()))
      bar()
        
  foo()
which will raise a KeyError, while it will work fine if you add a

  print(a)
to the end of bar().


Actually, if you reference 'a' at all in bar, it will work:

    def foo():
        a = 4
        def bar():
            a
            print("a = {a}".format(**locals()))
        bar() 
    foo()
successfully prints

    a = 4
which is a bit confusing, but makes sense if you think about it.


In this case, 'a' isn't a local variable binding though.


Right, but it's accessible in scope though. It's just a "gotcha" for that method that might be easy to overlook if you aren't paying attention (or if someone with less experience stumbles onto this 'trick' in production code).


I've seen this and tried to avoid it since getting bitten by moving code around and not catching which variables were used. It might have been even more obscured with something like TEMPLATED_STRING.format(**locals())


Why not just

    a = 4
    'A is:{}'.format(a)


I think the point would be to do something like:

    a = 3
    b = 4
    c = 6
    "A is {a}, C is {c}, B is {b}".format(**locals())
Then you don't have to figure out specifying order or definitions like ".format(a=a, b=b, c=c)". I'd probably go the more explicit route in production code anyway, but for debugging / unstable development it seems useful.


I'm still not convinced. In order to write the string portion of the code, I need to know the names of the variables in play, and I have to put them in some order in the string regardless.

So I still don't think

    "A is {a}, C is {c}, B is {b}".format(**locals())
is better than

    "A is {}, C is {}, B is {}".format(a, c, b)


str.format_map takes a mapping directly (added in 3.2).
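Here is a short sketch of `str.format_map` (Python 3.2+). It takes the mapping as-is, so `'...'.format_map(locals())` works like `.format(**locals())`. A dict subclass with `__missing__` (the hypothetical `Default` below) can also supply values for missing keys.

```python
class Default(dict):
    """Hypothetical mapping that leaves unknown field names in place."""
    def __missing__(self, key):
        return '{' + key + '}'

a, c = 3, 6
values = {'a': a, 'c': c}

# format_map uses the mapping directly instead of unpacking it with **.
print('A is {a}, C is {c}'.format_map(values))           # A is 3, C is 6

# Because the mapping is not copied, __missing__ still applies.
print('A is {a}, B is {b}'.format_map(Default(values)))  # A is 3, B is {b}
```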


My last remaining complaint with new formatting is with Pylint and logging messages. The old-style way is to write:

    logging.debug('some value: %d', 123)
which avoids the cost of string interpolation if that log message wasn't going to be emitted, say because the log level is higher than DEBUG. If you instead write:

    logging.debug('some value: {}'.format(123))
then Pylint will complain that I "Use % formatting in logging functions but pass the % parameters as arguments (logging-format-interpolation)".

Yes, I can disable that particular warning, either by maintaining a pylintrc file or by adding annotations to the source code. Yes, this is a dumb complaint. But yes, it still bugs me.


I really don't mind these apparent (small) improvements to Python.

However, I do mind that Python 3 just went off and did its own thing rather than following the usual cycle of:

    1) improve a language feature
    2) deprecate the old way of doing it
    3) give people time to update code (usually a couple of point releases)
    4) remove features that have been replaced
Python 3 should not have dropped lots of little changes with no backwards compatibility. They should still make 2.8 and 2.9 releases that remove features and add new ones until most Python code works in Python 3.


Somebody should write a HN plugin that does something like this:

    if is_about_python3(post):
        insert_token_random_complaint_from_2008()
(edit: snake-case)


Can _someone_ write a filter for snarky unconstructive comments while they are at it?


Just make sure not to filter tedious unconstructive 7-year-old lists of grievances. We can't live without those.


I really don't see what I said that wasn't true; Debian, for example, still won't escape Python 2 for many years. Are you saying the move to Python 3 was smooth and a resounding success, or are you just pretending that it's done and no one is still using Python 2.x?

Sorry my comment annoyed you so much, but you haven't really made any constructive points to disprove it.


they've already done it:

improve a language feature & deprecate the old way of doing it -> Python 2.6+ and 3.2+

remove features that have been replaced -> 3.x

Did I also mention it is not hard to have a single code base that is compatible with both 2.x and 3.x?


There should be two-- and preferably only two --obvious ways to do it.

-- The Zen of Python


Python 2.7 also added a non-locale-aware thousands separator that can be combined with other options:

    >>> "{:>40,.2f}".format(sys.maxint)
    '            9,223,372,036,854,775,808.00'
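Here is the same spec in Python 3, as a sketch. `sys.maxint` no longer exists there (`sys.maxsize` is the nearest equivalent), so the 64-bit value is written out directly. Note that the result ends in ...808.00 rather than ...807, because the `f` type converts the integer to a float first.

```python
value = 2**63 - 1  # same as sys.maxint on a 64-bit Python 2 build

# '>40' right-aligns in 40 columns, ',' inserts thousands separators,
# '.2f' converts to float and shows two decimals.
formatted = '{:>40,.2f}'.format(value)
print(repr(formatted))
```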


    Basic formatting
    Old: '%d %d' % (1, 2)
    New: '{} {}'.format(1, 2)
It should rather be `'{:d} {:d}'.format(1, 2)`, but even that isn't strictly equivalent (try both styles with a float or a Decimal).
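Here is a sketch of the float case mentioned above; the Decimal case is left out. '%d' silently truncates, '{}' keeps the full value, and '{:d}' refuses a float outright.

```python
# '%d' truncates a float to an int before formatting.
old = '%d' % 3.7          # '3'

# '{}' uses the float's own str(), so nothing is truncated.
plain = '{}'.format(3.7)  # '3.7'

# '{:d}' is only defined for integers; a float raises ValueError.
error = None
try:
    '{:d}'.format(3.7)
except ValueError as exc:
    error = str(exc)

print(old, plain, error)
```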


I'm a C dinosaur, so I've always used old-style formatting because I know it off by heart. Having said that, the alignment options (<, ^, >) remind me of my joy when learning Python: power and simplicity!
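Here is a small sketch of the alignment options, including a fill character, next to the closest '%' equivalents. '%' supports left and right alignment via `-` and a width, but has no centering.

```python
# '<', '^' and '>' left-align, center and right-align within the width.
row = '[{:<6}][{:^6}][{:>6}]'.format('a', 'b', 'c')
print(row)  # [a     ][  b   ][     c]

# A fill character goes before the alignment symbol.
centered = '{:*^7}'.format('mid')
print(centered)  # **mid**

# The % operator: '-' left-aligns, a bare width right-aligns.
old_style = '[%-6s][%6s]' % ('a', 'c')
print(old_style)  # [a     ][     c]
```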


This fails to mention the str.format_map() shortcut method (new in Python 3.2). It's useful if you already have a dictionary of all your values!


Really cool, and the site looks great too! Was a CSS framework used for the styling, or is it hand-written? Can't tell from the minification.


    '%d %d' % (1, 2)
I would have done this as

    '%s %s' % (1, 2)
Because we're turning everything into a string at the end of the day anyway!

Yeah. Having finished the article, .format() just isn't really needed. If I'm at the point where I'm doing templating with key:values, I'll be using jinja.


%d converts the value to int first which might be useful:

    >>> '%d' % 3.14
    '3'
    >>> '%d' % 'foo'
    TypeError: %d format: a number is required, not str


I personally still prefer the old style, and I still don't see the value of deprecating it in favor of the new style. I started Python in 2008 (Py3k was brand new), and I used new-style formatting for about 2 years. After realizing that I was the only one using the new style, I switched, and now I cannot imagine going back.


I use the new style for three reasons:

• it is supposedly significantly faster

• it's new, which means eventually we'll have to use it, so I might as well get used to it now!

• it's a method on an object, which makes my brain happy. I never managed to get my brain wrapped around the old style syntax, it's inconsistent with the rest of the python syntax.


My timing tests have never come across a case where it's faster, for those cases that can be expressed directly in a '%' format string:

  % python3.3 -mtimeit 's="%s %s %s"' 's % (1, 3, 9)'
  1000000 loops, best of 3: 0.361 usec per loop
  % python3.3 -mtimeit 's="{} {} {}"' 's.format(1, 3, 9)'
  1000000 loops, best of 3: 0.593 usec per loop
  % python3.3 -mtimeit 'f="{} {} {}".format' 'f(1, 3, 9)'
  1000000 loops, best of 3: 0.569 usec per loop
The construct "x % y" has special support in CPython. Quoting from ceval.c:

        TARGET(BINARY_MODULO)
            w = POP();
            v = TOP();
            if (PyUnicode_CheckExact(v))
                x = PyUnicode_Format(v, w);
            else
                x = PyNumber_Remainder(v, w);
            Py_DECREF(v);
            Py_DECREF(w);
            SET_TOP(x);
            if (x != NULL) DISPATCH();
            break;
That is, in the '%' case, if the left-hand side is a string, then go directly to the string format. Otherwise, go through the normal binary operation resolution mechanism.

The "".format(x) path, by contrast, goes through normal method invocation; it gets no special treatment.
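The same comparison can be scripted with the `timeit` module, as sketched below. `setup=` plays the role of `-s`, so unlike the command lines above, the string assignment is kept out of the timed loop. No numbers are shown, because they depend heavily on the Python version and machine.

```python
import timeit

N = 100_000

# setup= runs once, outside the timed loop (the equivalent of -s).
percent = timeit.timeit("s % (1, 3, 9)", setup='s = "%s %s %s"', number=N)
fmt = timeit.timeit("s.format(1, 3, 9)", setup='s = "{} {} {}"', number=N)
bound = timeit.timeit("f(1, 3, 9)", setup='f = "{} {} {}".format', number=N)

for label, seconds in [("%", percent), ("str.format", fmt), ("bound .format", bound)]:
    print(f"{label:>14}: {seconds / N * 1e6:.3f} usec per call")
```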


• it's a method on an object, which makes my brain happy. I never managed to get my brain wrapped around the old style syntax, it's inconsistent with the rest of the python syntax.

uhh... so you'd prefer to do something like `1.add(2)` ???


While arithmetic syntactic sugar is useful since it's much clearer, I agree that I'd often prefer a function call to some unintuitive symbol syntax.

Another example for me, in Python, is:

    print >> sys.stderr, 'foo' 
vs.

    print ('foo', file=sys.stderr)


Ah yes, I've been doing

    from __future__ import print_function
pretty much since it has been available. flush=True, end='' etc.... much more intuitive.


I'd be happy with `int().add(1, 2)`, it'd be more consistent.

If we carry on this discussion lispers are going to tell us we're the ones with the funny syntax!


None of those seem like very good reasons, except for the speed argument, and that is surely just an implementation issue.


One caution that this document does not mention is that the new formatting is available only in recent versions of Python. One of my projects runs on a CentOS 5.x server, which has Python 2.4, so I had to convert all of the new style formatting to old in order to get it to run there.


Wow, Python 2.4 was last updated on October 18, 2006.

Since CentOS 5 still has about 2 years of life but fewer and fewer Python packages support 2.4, it might be worth looking at Pyenv:

https://github.com/yyuu/pyenv

Good overview of what it lets you do:

http://fgimian.github.io/blog/2014/04/20/better-python-versi...


No, it's not. It's available in quite old versions of Python.

Python 2.4 is actually pretty ancient. To put it into perspective, it's from when we had things like Firefox 2.0, and IE6 was the latest IE.


Can't tell you how many times I've Googled this and had to dig through the docs. Thank you!


I use this format:

    foo = 'foo'
    bar = 'bar'
    print "%(foo)s%(bar)s" % locals()
which I guess is similar to this.

Basically, I find locals(), vars(), and globals() very useful for string formatting.


In Dash.app (on Mac), there's actually a nifty file that you can download that shows you how all the different formatting options work.

It's good, but I think this is even better.


It's very lame that the new format kept the bad design from C hacks.

Why not 'field: {:center :minsize=8 varname}'.format(varname=123) ?


Once you start formatting more than one variable, isn't your proposed syntax going to be horribly long?

    '{:center :minsize=8 varname} {:center :minsize=8 varname2} {:center :minsize=8 varname3}'.format(varname=123, varname2=123,varname3=123)
instead of

    '{:^8} {:^8} {:^8}'.format(varname, varname2, varname3)
Concision is not always a good thing, but still...


top one is still much more readable.

Also, if you are always going to repeat the same spec, you can store it in a string variable and use that instead of repeating it.

and if you are NOT, then it beats '{:^8} {:*8} {:@8} {:☃8}' anytime, because now you have to know and notice the change in those cryptic chars.
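New-style formatting can also take the spec itself from an argument via nested replacement fields. That is one way to store it in a variable, as suggested above. Here is a sketch, with `SPEC` as an illustrative name:

```python
# Store the spec once and splice it in with a nested field.
SPEC = '^8'
row = '{:{spec}}|{:{spec}}|{:{spec}}'.format('a', 'b', 'c', spec=SPEC)
print(repr(row))  # '   a    |   b    |   c    '

# Nested fields can also supply individual parts of the spec.
print('{:{fill}{align}{width}}'.format('x', fill='-', align='>', width=5))  # ----x
```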


Hey, very nice. I knew about the "new" format() but didn't know it was so powerful.



