Python coding tips #2: string formatting

This tip covers a newer way of doing string formatting. Say we want to print a simple message like "Error: Test case 'test_string_formatting' has failed", where test_string_formatting comes from a variable named test_case_name. We have a few options:

String concatenation

The simplest method, and the one most likely to be used by beginners, is to simply concatenate the strings together:

def print_error(test_case_name):
    print "Error: Test case '" + test_case_name + "' has failed"

Easy to understand, although a bit messy in the above example because of the single quotes. The main issue with this method is that it only works if the variable you are using is a string. For example:

print "The port number " + port_number + " is in use"

If port_number is a string, this works fine. If port_number is an integer, Python raises a TypeError, because it will not add a string and an integer together. The workaround is to always convert the variable to a string with str():

print "The port number " + str(port_number) + " is in use"

It also gets messy quickly when many strings are concatenated:

def print_error(test_case_name, error, admin_name, admin_email):
    print "Error: Test case '" + test_case_name + "' has failed with message " + error + ". Please re-run '" + test_case_name + "' or contact " + admin_name + " at: " + admin_email
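
Here is a minimal sketch of the problem and the str() workaround. It returns strings rather than printing them, so it runs under both Python 2.7 and Python 3; port_message and naive_port_message are hypothetical names used only for this example.

```python
def port_message(port_number):
    # str() turns any value (int, float, str, ...) into text first,
    # so this works whether port_number is 8080 or "8080".
    return "The port number " + str(port_number) + " is in use"


def naive_port_message(port_number):
    # No conversion: this only works if port_number is already a string.
    return "The port number " + port_number + " is in use"


if __name__ == "__main__":
    print(port_message(8080))
    try:
        naive_port_message(8080)
    except TypeError as exc:
        # Python refuses to add a str and an int implicitly.
        print("TypeError: " + str(exc))
```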

Modulo formatting

The second approach, which is more common in Python 2.x, is modulo-style formatting with the % operator. It behaves much like the printf family of functions in C/C++.

def print_error(test_case_name):
def print_error(test_case_name):
    print "Error: Test case '%s' has failed" % test_case_name

This makes the string a lot cleaner than the concatenation approach. It also lets you use non-string values without converting them first:

print "The port number %d is in use" % port_number

The above example uses %d to indicate that it expects an integer, instead of the usual %s string placeholder. Multiple substitutions in one string are also easier: pass the values as a tuple, in order. For example:

def print_error(test_case_name, error, admin_name, admin_email):
    print "Error: Test case '%s' has failed with message %s. Please re-run '%s' or contact %s at: %s" % (test_case_name, error, test_case_name, admin_name, admin_email)

This is a little more readable than string concatenation, but it gets hard to follow once you have more than a handful of placeholders, or when the same value appears more than once (above, test_case_name has to be passed twice). Named placeholders make it a bit easier to read:

def print_error(test_case_name, error, admin_name, admin_email):
    print "Error: Test case '%(tc)s' has failed with message %(error)s. Please re-run '%(tc)s' or contact %(admin_name)s at: %(email)s" % {"tc": test_case_name, "error": error, "admin_name": admin_name, "email": admin_email}

This makes it easier to see which variable appears where in the string, but it requires you to build a dictionary.
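
As a sketch, the two modulo patterns above can be wrapped in functions that return the formatted string. The function names are hypothetical, and the code runs under Python 2.7 and 3. It also shows that %d really does expect a number:

```python
def modulo_port_message(port_number):
    # %d asks for an integer; passing a str raises TypeError.
    return "The port number %d is in use" % port_number


def modulo_error_message(test_case_name, error, admin_name, admin_email):
    # Named placeholders: each %(key)s is looked up in the dictionary,
    # so the "tc" entry can be used twice without passing it twice.
    values = {
        "tc": test_case_name,
        "error": error,
        "admin_name": admin_name,
        "email": admin_email,
    }
    return ("Error: Test case '%(tc)s' has failed with message %(error)s. "
            "Please re-run '%(tc)s' or contact %(admin_name)s at: %(email)s"
            % values)


if __name__ == "__main__":
    print(modulo_port_message(8080))
    print(modulo_error_message("test_login", "timeout", "Alice", "alice@example.com"))
```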

format() method

Python 2.6 introduced a built-in string method, format(), intended to replace the modulo-style formatting above.

def print_error(test_case_name):
    print "Error: Test case '{}' has failed".format(test_case_name)

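Each {} is replaced, in order, by the next argument passed to format(). (Empty {} fields need Python 2.7 or later; in Python 2.6 you have to number them, as in {0}.) The main benefit is that format() handles each argument according to its own type, so you don't need str() or a type-specific code such as %d: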

print "The answer to {} is {} + {} = {}".format("Question 1", 4, 6.578, 4 + 6.578)

In the above, you don't have to specify which value is a string, an integer, or a float; format() takes care of that. If you want to enforce or control the type, you can add a format spec after a colon in each field. Passing a value of the wrong kind, such as a float to {:d}, raises a ValueError:

print "The answer to {:s} is {:d} + {:f} = {:f}".format("Question 1", 4, 6.578, 4 + 6.578)
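
The format specs above can be checked with a small sketch. The answer function is a hypothetical helper; note that {:f} pads to six decimal places, and that a float passed to {:d} raises a ValueError rather than being silently converted:

```python
def answer(label, x, y):
    # :s, :d and :f request string, integer and fixed-point output.
    # :f shows six digits after the decimal point by default.
    return "The answer to {0:s} is {1:d} + {2:f} = {3:f}".format(label, x, y, x + y)


if __name__ == "__main__":
    print(answer("Question 1", 4, 0.5))
    try:
        answer("Question 1", 4.5, 0.5)  # a float cannot use :d
    except ValueError as exc:
        print("ValueError: " + str(exc))
```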

You can also use positional indexes so you don't have to repeat arguments. For example:

def print_error(test_case_name, error, admin_name, admin_email):
    print "Error: Test case '{}' has failed with message {}. Please re-run '{}' or contact {} at: {}".format(test_case_name, error, test_case_name, admin_name, admin_email)

This is equivalent to:

def print_error(test_case_name, error, admin_name, admin_email):
    print "Error: Test case '{0}' has failed with message {1}. Please re-run '{0}' or contact {2} at: {3}".format(test_case_name, error, admin_name, admin_email)

This uses indexes to determine what is substituted where. The number in the braces is the position of the argument passed to format(), starting at zero. In the example above, we don't have to repeat the test_case_name argument because we reference it as {0} both times.
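
A quick way to convince yourself that the two versions match is to build both strings and compare them. The function names error_auto and error_indexed are hypothetical, and the empty {} fields need Python 2.7 or later:

```python
def error_auto(test_case_name, error, admin_name, admin_email):
    # Automatic numbering: test_case_name has to be passed twice.
    return ("Error: Test case '{}' has failed with message {}. "
            "Please re-run '{}' or contact {} at: {}").format(
                test_case_name, error, test_case_name, admin_name, admin_email)


def error_indexed(test_case_name, error, admin_name, admin_email):
    # Explicit indexes: {0} refers to the first argument wherever it appears.
    return ("Error: Test case '{0}' has failed with message {1}. "
            "Please re-run '{0}' or contact {2} at: {3}").format(
                test_case_name, error, admin_name, admin_email)


if __name__ == "__main__":
    args = ("test_login", "timeout", "Alice", "alice@example.com")
    print(error_auto(*args) == error_indexed(*args))
```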

You can also use keywords to reference each substitution:

def print_error(test_case_name, error, admin_name, admin_email):
    print "Error: Test case '{tc}' has failed with message {error}. Please re-run '{tc}' or contact {admin_name} at: {email}".format(tc=test_case_name, error=error, admin_name=admin_name, email=admin_email)

Of course, there is much more you can do with this method than I can describe here, so please check out the official Python documentation on format string syntax.
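
To give a flavor of that extra functionality, here is a short sketch of a few specs from the standard format specification mini-language. The values are made up for illustration, and the thousands separator needs Python 2.7 or later:

```python
# Width and alignment: '>' right-aligns, '<' left-aligns, '^' centres.
right = "[{:>6}]".format("ab")     # '[    ab]'
centred = "[{:^6}]".format("ab")   # '[  ab  ]'

# Precision: round a float to two decimal places.
price = "{:.2f}".format(10.578)    # '10.58'

# Zero padding for integers.
padded = "{:05d}".format(42)       # '00042'

# Thousands separator.
big = "{:,}".format(1234567)       # '1,234,567'

# Item access inside the braces: look up keys on the first argument.
admin = "{0[name]} <{0[email]}>".format({"name": "Alice", "email": "alice@example.com"})
# 'Alice <alice@example.com>'

if __name__ == "__main__":
    for line in (right, centred, price, padded, big, admin):
        print(line)
```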

Which to use

There is no real downside to any of these approaches, but the format() method will most likely make your life easier because it handles argument types for you. Mixing Unicode and byte strings in Python 2 is another example:

# coding=utf-8
a = u'text'
b = "日本のフォルダ"
print "> {0} {1}".format(a, b)
print "> %s%s" % (a, b)

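The second (modulo) line fails with a UnicodeDecodeError. Because a is a unicode object, the % operator produces a unicode result and tries to decode the byte string b with the ASCII codec, which cannot handle the Japanese characters. The format() call on a plain byte string instead converts a to a byte string, which works here because u'text' contains only ASCII characters.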
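
A common fix, sketched here assuming you control both values, is to make every piece Unicode, including the format string itself. With the u'' prefix this runs under Python 2 and under Python 3.3 or later (where str is Unicode anyway), and both formatting styles then give the same result:

```python
# -*- coding: utf-8 -*-
# Keep every piece Unicode: the u'' prefix is valid in Python 2 and 3.3+.
a = u'text'
b = u'日本のフォルダ'

with_format = u"> {0} {1}".format(a, b)
with_modulo = u"> %s %s" % (a, b)

if __name__ == "__main__":
    # Compare instead of printing, so the output does not depend on the
    # terminal's encoding.
    print(with_format == with_modulo)  # True
```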