
Python Gone Bad

Not that long ago perl was the most common go-to language for developers, and many claimed its syntax was nothing more than line noise. Python should have fixed all that. But did it?

Line Noise

The choice of programming language is among the least significant factors affecting the readability or maintainability of code. Good developers write good code everywhere, bad developers write bad code most of the time, and the cool kids will find too-clever ways to act cool in every language.

When perl was the go-to language it was easy to find real-world programs featuring code like this:

$_="  ".<>;s@.@"$&"^chr($'=~/^  |^$/?0:3)@ge;print

Python promised to change all that with a clean syntax that makes the above invalid. That worked, but it didn't solve the problem: the unreadability of code is ultimately determined by its semantics, not its syntax. Here are some things to avoid when writing Python.

Common Module

A first code smell you see in many designs is the "common" or "utils" module. It holds everything we couldn't find a better place for. And the sooner you write a "common" module, the more functions you'll find that seem to belong there.

Specifically in Python, if a function is really common there's probably already a published module that provides it; and if there isn't one, the code is probably not that common and should be part of your class hierarchy.

A recent common function I encountered in a common module was:

from collections import OrderedDict

def namedtuple_asdict(obj):
    """ Converts a nested namedtuple to dict """
    if hasattr(obj, "_asdict"):  # detect namedtuple
        return OrderedDict(zip(obj._fields, (namedtuple_asdict(item) for item in obj)))
    elif isinstance(obj, str):  # iterables - strings
        return obj
    elif hasattr(obj, "keys"):  # iterables - mapping
        return OrderedDict(zip(obj.keys(), (namedtuple_asdict(item) for item in obj.values())))
    elif hasattr(obj, "__iter__"):  # iterables - sequence
        return type(obj)((namedtuple_asdict(item) for item in obj))
    else:  # non-iterable cannot contain namedtuples
        return obj

It converts a namedtuple to a dictionary recursively. This looks very clever but is completely unnecessary: the simplejson module already knows how to JSON-encode nested namedtuples.

More often, however, such a common function hides the fact that it's used only once in your code, and only in a very specific situation. Here's another common function, called lowercase_value:

def lowercase_value(d, key):
    return {k: v.lower() if k == key else v for k, v in d.items()}

The function may look common, but it was actually written with a very specific use case in mind: in my case, lowercasing the email value after parsing a JSON payload. A better approach would be to keep the entire parsing logic in one place.
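A sketch of that approach (parse_user and the field names are hypothetical, not from any real codebase):

```python
import json

def parse_user(payload):
    """Parse a user record; all normalization lives here, in one place."""
    data = json.loads(payload)
    return {
        "name": data["name"],
        "email": data["email"].lower(),  # lowercased inline, no generic helper needed
    }

parse_user('{"name": "Ana", "email": "ANA@Example.COM"}')
# {'name': 'Ana', 'email': 'ana@example.com'}
```

A reader of parse_user sees every transformation applied to the data, instead of chasing a one-off helper in a "common" module.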

If your common function is really common, search for a module that already provides it (or write one and publish it). More often, though, it'll be used just this once, which probably means your class hierarchy needs more thought.

Abstract Classes

This Java idiom takes a completely wrong turn when translated to Python:

class DogInterface:
    def bark(self):
        raise NotImplementedError

    def fetch(self, stick):
        raise NotImplementedError

This worked in Java (or C++) because those languages have typed variables, while Python only has typed data. So you'll always use the real dog class and never mention the abstract class after declaring it.

Calling an undefined method in Python raises an error anyway, and without compile-time checking the interface adds no value over not writing it. It also makes maintenance harder, because now the interface must be documented and updated alongside every change in the real code.

Python uses implicit interfaces by design. Let's stick with it.
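In practice that means writing the concrete class and passing it around; any object with the right methods will do. A sketch, with hypothetical names:

```python
class Dog:
    def bark(self):
        return "woof"

    def fetch(self, stick):
        return f"fetched {stick}"

def play(animal):
    # no interface declared: any object with bark() and fetch() qualifies
    return animal.bark(), animal.fetch("a stick")

play(Dog())  # ('woof', 'fetched a stick')
```

If a method is missing, Python raises AttributeError at the call site, which is exactly the check the NotImplementedError boilerplate was simulating.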

Too short or too long functions

We can probably agree that functions should do just one thing, but it's hard to agree on the resolution. Python code can go bad by having too many functions, each doing a fraction of an interesting task. Alternatively, it can go bad by having one function doing a huge task that should have been split across several.

As a rule of thumb I like to look at the number of lines in a function: ideally it should be 15-20. Here's a function that is too short:

async def tweets(self):
    server_response = await self._http_request('tweets')
    formatted_lists = server_response['items']
    return self._parse_tweets(formatted_lists)

It declares two new variables and calls another function to do the actual parsing. The overhead of this function is far greater than its usefulness.
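Folding it into its caller keeps the logic together. A sketch with a stubbed HTTP call (the class and method names here are hypothetical):

```python
import asyncio

class Client:
    async def _http_request(self, endpoint):
        # stand-in for a real HTTP call
        return {"items": ["hello", "world"]}

    def _parse_tweets(self, items):
        return [t.upper() for t in items]

    async def timeline(self):
        # the former three-line wrapper, folded into its caller
        response = await self._http_request("tweets")
        return self._parse_tweets(response["items"])

asyncio.run(Client().timeline())  # ['HELLO', 'WORLD']
```

One function now tells the whole story: fetch, extract, parse.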

The most notable downside of having many short functions is that each one sees only a small slice of the data. When we need to change the logic, the change will probably span many of them, and is thus harder to test.

I'll save you the trouble of pasting the opposite, as I'm sure you've all seen a function with hundreds of lines. It didn't start that way, of course. Every long function started as a normal function, and then reality happened.

Writing code that's easy to write and maintain is not a simple task. Although Python does not permit perl-style line noise, its class-based semantics can lead to complex architectures that are not on par with the task at hand.

Keep it simple, keep it concise, and enjoy the good parts of the language. Good code grows only when it needs to.