Python and Django Logging in Plain English

If you’ve ever written a program and printed out a value to see what’s going on during execution, then you understand at some level why logging is so valuable. Knowing what’s happening in your code at a point in time is enormously useful from both technical and business perspectives. This knowledge lets developers and product managers make smart choices about what systems to fix or modify and lets them see what actions users take when they use your software.

Thankfully, Python has a much more powerful, built-in library for logging than the print() function, and it’s named, simply enough, logging. The library is flexible and customizable but has a reputation for being hard to understand at first because there are multiple moving pieces. Django uses this library by default and makes it easy to integrate into your project.

So while this is technically an article about logging with Django, the best way to explore that topic is to understand the logging library in general, and how Django uses it specifically. This is part two of a four-part series about what happens when the setup function in __init__.py gets executed.

  1. How Django accesses project settings.
  2. How Django and Python manage logging.
  3. Why Django allows the script_prefix to be overridden and how this is used.
  4. How Django populates and stores applications.

Since this piece is quite long, feel free to jump to whatever section is most relevant to you. If you’re new to logging you may want to start with A Short Overview of Logging

A Short Overview of Logging

Logging is how you can see inside of your code while it’s running. At a high level, adding logging to your code involves three main tasks:

  1. Deciding what information to output and where in the code to do so.
  2. Deciding how those logs get formatted (literally, what text gets output).
  3. Deciding where to send those logs (e.g. stdout or syslog).

Let’s first look at how a simple print() statement would be used to address these three tasks. As an example, we’ll use a simple recursive function that finds the nth Fibonacci number (which follows the sequence 1, 1, 2, 3, 5, 8, 13, …).

def find_nth_fibonacci(n):
    if (n < 0):
        print("Invalid entry: {}".format(n))
        return
    if n <= 1:
        return n

    print("Find f({}) + f({})".format(n - 1, n - 2))

    return find_nth_fibonacci(n - 1) + find_nth_fibonacci(n - 2)

This algorithm does three things.

  1. If n is negative it prints out an error message and returns.
  2. If n <= 1 it triggers the base case so it returns the value n and doesn’t recurse.Otherwise, it recurses to calculate find_nth_fibonacci(n - 1) + find_nth_fibonacci(n - 2) and returns the sum.

find_nth_fibonacci(-2) would yield:

Invalid entry: -2

Running find_nth_fibonacci(6) would print out the following:

Find f(5) + f(4)
Find f(4) + f(3)
Find f(3) + f(2)
Find f(2) + f(1)
Find f(1) + f(0)
Find f(1) + f(0)
Find f(2) + f(1)
Find f(1) + f(0)
Find f(3) + f(2)
Find f(2) + f(1)
Find f(1) + f(0)
Find f(1) + f(0)

In the first case, we check for an invalid entry and notify the user before quitting. In the second case, we print out the recursive calls about to take place. Let’s compare the print statements in this example to the tasks required for a logging system:

Deciding what information to log and where in the code to do so.

We do this by placing the print statements in the appropriate location based on the surrounding code and adding a message that is contextually appropriate (e.g. “Invalid entry: -2”).

Deciding how those logs get formatted (literally, what text gets output).

In this case, the message is formatted as plain text, but it would be easy to output the message as JSON, XML, etc.

Deciding where to send those logs (e.g. the command line or syslog).

Since print() sends the output to the command line via stdout (Standard Output) we don’t have to explicitly define the output. But as an example, we could make a function named send_log_to_email(message, email_address) that would take the encoded message and then email it to the specified email address.

There are two important takeaways from looking at print() through the requirements for a logging system:

  1. A lot is happening implicitly here. If we want to get custom outputs or behavior, we’ll need to start defining actions, encodings, and destinations explicitly.
  2. If you were trying to add logging to production software based on the print() function things would quickly become hard to manage.

In other words, the print() function doesn’t scale. That brings us to Logging, an industrial-strength logging library.

The logging Library

logging is a powerful library that is the standard for logging in Python codebases. It’s the default logging library in Django, so we’ll use Django as a real-world way of understanding how logging works.

The logging funnel

The key to understanding how logging works is knowing that it encourages you to log a lot of data, and then gives multiple mechanisms for filtering out that data. It’s similar to a funnel. Lots of data goes in, but most of it gets filtered out before leaving the system as formatted logs (for the nitpicky, yes I know that funnels don’t actually filter, but work with me here. It’s a metaphor).

LogFunnel (1)

Why does it work this way? Because that leads to simple application code (just log everything!) that stays the same from environment to environment. Filters can then be configured via settings and config files to match the environment. If you’re debugging locally you may want to filter nothing out and create log records of everything. But on production, you only want to know about errors, in which case you can add more filters (i.e. the funnel gets much more narrow).

Logging Objects

Now that we’ve looked at the logging funnel at a high level, we have the necessary context to examine logging as a whole by looking at Django’s default configuration. We’ll cover how the different filters work together to customize behavior by environment and also look at what happens to logs once they’ve left the logging funnel.

This JSON blob outlines Django’s default logging configuration.

DEFAULT_LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'require_debug_false': {
            '()': 'django.utils.log.RequireDebugFalse',
        },
        'require_debug_true': {
            '()': 'django.utils.log.RequireDebugTrue',
        },
    },
    'formatters': {
        'django.server': {
            '()': 'django.utils.log.ServerFormatter',
            'format': '[{server_time}] {message}',
            'style': '{',
        }
    },
    'handlers': {
        'console': {
            'level': 'INFO',
            'filters': ['require_debug_true'],
            'class': 'logging.StreamHandler',
        },
        'django.server': {
            'level': 'INFO',
            'class': 'logging.StreamHandler',
            'formatter': 'django.server',
        },
        'mail_admins': {
            'level': 'ERROR',
            'filters': ['require_debug_false'],
            'class': 'django.utils.log.AdminEmailHandler'
        }
    },
    'loggers': {
        'django': {
            'handlers': ['console', 'mail_admins'],
            'level': 'INFO',
        },
        'django.server': {
            'handlers': ['django.server'],
            'level': 'INFO',
            'propagate': False,
        },
    }
}

If we look at the JSON blob above we’ll see four types of objects defined, which match the four main objects in logging.

Loggers

Loggers are the application-level interface to the logging system. A system may have multiple different loggers, each with their own name and rules.

'loggers': {
    'django': {
        'handlers': ['console', 'mail_admins'],
        'level': 'INFO',
    },
    'django.server': {
        'handlers': ['django.server'],
        'level': 'INFO',
        'propagate': False,
    },
}

Here we see two loggers, named django and django.server, respectively. They each have extra keys which we’ll get back to shortly.

So instead of calling print(message), you would request a specific Logger object and then use that to log the message:

logger = logging.getLogger(‘django.server’)
logger.warning(message)

This would emit a log as a LogRecord object (don’t worry about warning yet, we’ll get there).

Handlers

Handlers answer the question “now that we have a log, where should we send it?” It could go to a file, to the console, get sent as an email, or other possible destinations. The exact behavior could differ depending on context as well. While on your local computer you may want the logs to go to the console so it’s easier to debug, if your code is in production you’ll want a more robust logging setup.

To allow different behaviors, a single logger can send a LogRecord to multiple different handlers, each with their own rules for how to handle the log. If we take another look at the django logger we’ll see that it has two handlers, console and mail_admins.

'loggers': {
    'django': {
        'handlers': ['console', 'mail_admins'],
        'level': 'INFO',
    },
}

 

OneLogTwoHandlers

These handlers are also defined in the DEFAULT_LOGGING configuration.

'handlers': {
        'console': {
            'level': 'INFO',
            'filters': ['require_debug_true'],
            'class': 'logging.StreamHandler',
        },
        'django.server': {
            'level': 'INFO',
            'class': 'logging.StreamHandler',
            'formatter': 'django.server',
        },
        'mail_admins': {
            'level': 'ERROR',
            'filters': ['require_debug_false'],
            'class': 'django.utils.log.AdminEmailHandler'
        }
    },

We’ll return to the level and filters keys shortly, but make note of the class keys. These define the actual classes that handle the log. The console log uses logging.StreamHandler, which comes with the logging library. mail_admins uses the django.utils.log.AdminEmailHandler class, which is a custom class defined by Django. It has an emit() method that accepts a LogRecord, and a send_mail() method that sends an email containing log data to the site admin.

One logger. Two handlers. Two different destinations.

Filters

Filters allow granular rules for whether or not a record should be emitted. Both loggers and handlers can have filters linked to them, which are called every time they receive a LogRecord. In the example of the handlers above we see that the console handler is linked to the require_debug_true filter, and mail_admins is linked to the require_debug_false filter.

'filters': {
    'require_debug_false': {
        '()': 'django.utils.log.RequireDebugFalse',
    },
    'require_debug_true': {
        '()': 'django.utils.log.RequireDebugTrue',
    },
},

These filters do exactly what the names suggest. When called, RequiredDebugTrue() checks to make sure that the app is in debug mode. If so, it returns True, indicating that logging can proceed. Otherwise, it returns False and the record isn’t logged. The opposite is true for RequiredDebugFalse().

 

OneLogTwoHandlersWithHandlers

 

Why can both loggers and handlers have filters?

We don’t see it in this example, but both handlers and loggers can have filters. This confused me when first learning about logging. If something shouldn’t be logged, wouldn’t only one filter be needed to block the record from being emitted?

The reason that both loggers and handlers have filters is that sometimes you want to configure what gets logged and sometimes you want to configure where that log gets sent.

Consider the example of logging an HTTP response. Perhaps you use an external API and want to log any 4XX or 5XX response statuses. You could add application-side code that checks the response status and only logs 4XX or 5XX responses. But that adds unnecessary logic to your codebase that has little to do with application behavior.

By using a filter you can easily separate concerns. You just log every response regardless of status and modify your logging configuration to add a filter to that logger. This filter checks the response status and stops anything with a 2XX or 3XX status from being emitted. Afterward, any logs that don’t get filtered out will be sent to the handlers, which have their own filters to determine whether or not this is the right scenario for them to emit this log record.

Formatters

Formatters are responsible for receiving a LogRecord object and outputting text based on predetermined rules. Think of them as Python’s String.format() method on steroids. As an example, in the event of an error, a formatter may receive a LogRecord that includes a stack trace. Depending on the rules, the formatter may then truncate the stack trace to a certain number of characters, add a timestamp, and output this data as a JSON-encoded string.

In the django.server handler we see that it has a formatter named django.server.

'handlers': {
    ...
    'django.server': {
        'level': 'INFO',
        'class': 'logging.StreamHandler',
        'formatter': 'django.server',
    },
    ...
},

If we look at the django.server formatter we’ll see three keys, (), format, and style:

'formatters': {
    'django.server': {
        '()': 'django.utils.log.ServerFormatter',
        'format': '[{server_time}] {message}',
        'style': '{',
    }
},

Let’s look at each key in detail.

  1. () – This is a special key that indicates an absolute path to a user-defined formatting class. If this key doesn’t exist then a standard logging.Formatter object is created. Since it’s explicitly defined for django.server, the logging library will instantiate a django.utils.log.ServerFormatter object and pass in the remaining keys as keyword arguments.

    So in this case, the dictionary above is functionally the same as the following code.
django.server = ServerFormatter(format='[{server_time}] {message}', style=’{‘)
  1. format – The actual format of the log message with placeholders for server_time and message. These will be replaced with the data from the actual logs.
  1. style – This lets the logger know what character will be used to identify placeholders within the string, as Python provides multiple patterns.  This formatter is using curly braces, which can be seen in the format string.

Logging Object Summary

We covered the main objects used in Python’s logging library, which include the following:

  1. Loggers are used by application code to create LogRecords.
  2. Handlers control where LogRecords are sent or output to.
  3. Filters are responsible for filtering out LogRecords that we don’t want to keep.
  4. Formatters determine what the output data actually looks like.

We know what individual objects are responsible for and how they connect to one another, but there’s still key pieces of information that we need to understand.

Logging Structure

The logging library has this handy (and slightly overwhelming) flowchart to explain the logging logic and how these objects fit together. If you look carefully, you’ll notice that there are a few topics we haven’t covered, which we’ll get to next.

 

logging_flow

 

Observations:

  1. Log levels are apparently a thing?
  2. There are parent and child loggers.
  3. Propagation is apparently a thing too?

Let’s look at each of these in more detail:

Log Levels

If we look back at our previous find_nth_fibonacci method we’ll see that our ‘logs’ have two different purposes.

def find_nth_fibonacci(n):
    if (n < 0):
        print("Invalid entry: {}".format(n))
        return
    if n <= 1:
        return n

    print("Find f({}) + f({})".format(n - 1, n - 2))

    return find_nth_fibonacci(n - 1) + find_nth_fibonacci(n - 2)

print("Invalid entry: {}".format(n)) should only be executed if an error has occurred but print("Find f({}) + f({})".format(n - 1, n - 2)) gets executed during normal execution as a way to see what’s happening inside the code.

The value of using logs for different purposes is even more pronounced inside of a large application running in production. That’s why logging actually has five different types of logs (in increasing order of severity) and lets you create custom levels if desired.

  1. DEBUG – Very detailed system information. You’d usually only want this for low level debugging.
  2. INFO – Output that occurs during standard operations. For example, a program making SQL queries might use this to log different queries when they occur.
  3. WARNING – From the logging docs: “An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.”
  4. ERROR – A problem has occurred that prevents normal behavior from executing properly.
  5. CRITICAL – A serious enough error has occurred that may cause the entire program to shut down.

If we consider the case of an application that makes SQL queries, we can imagine that normal queries might be logged at the INFO log level, but a bad connection to the DB or a failed request might be logged at the WARNING or ERROR levels. As a developer, you may also want different behaviors for alerting you of these different types of logs. You may want an email alert if there’s an error but normal INFO-level logs should be sent to a log file.

logging handles this by assigning a minimum log level to each logger and handler. If we create our logger with a minimum log level of ‘WARNING’, then only ‘WARNING’, ‘ERROR’, and ‘CRITICAL’ log levels will be passed through. ‘DEBUG’ and ‘INFO’ logs will not be passed on.

We can see this in Django’s logging config JSON.

'handlers': {
    'console': {
        'level': 'INFO',
        'filters': ['require_debug_true'],
        'class': 'logging.StreamHandler',
    },
    'django.server': {
        'level': 'INFO',
        'class': 'logging.StreamHandler',
        'formatter': 'django.server',
    },
    'mail_admins': {
        'level': 'ERROR',
        'filters': ['require_debug_false'],
        'class': 'django.utils.log.AdminEmailHandler'
    }
},
'loggers': {
    'django': {
        'handlers': ['console', 'mail_admins'],
        'level': 'INFO',
    },
    'django.server': {
        'handlers': ['django.server'],
        'level': 'INFO',
        'propagate': False,
    },
}

The console and django.server handlers, as well as the django and django.server loggers, all have a minimum log level of INFO. The mail_admins handler has a minimum level of ERROR. This intuitively makes sense; it would be extremely annoying if you got emails for every DEBUG or INFO LogRecord, but it may be helpful to get emails if an error occurs.

Log levels can be saved as environment variables or read from a configuration file. This is one of the best parts about log levels because this allows you to modify the log handling rules without changing any application code.

How do logs get their level?

Earlier we looked at this snippet of code:

logger = logging.getLogger(‘django.server’)
logger.warning(message)

In this scenario, message doesn’t include the level. Once we’ve looked up the logger, we use the logger method that matches the name of the desired log level. So in this case, warning() creates a WARNING level log. The same applies for the other standard log levels.

Inheritance

To understand inheritance, we need to understand how loggers are structured. If you look at most functional Django apps you won’t see the django.server logger getting looked up in application code. You’ll see something similar to the following:

logger = logging.getLogger(__name__)

__name__ is a built-in Python variable that evaluates to the current module. So in the module project/app/tests, __name__ will evaluate to project.app.tests. That doesn’t mean that the project has explicitly defined a logger named project.app.tests. This is where inheritance comes into the picture.

Python loggers are organized in a parent-child structure with a hierarchy similar to module namespacing. Every application has a main logger that is never (or should never be) used directly. Child loggers point back to parent loggers, which eventually point to the main logger. Names are defined using dot-notation, so the name ‘parent_logger.child_logger’ denotes a child_logger which “inherits” from a logger named parent_logger.

The child can inherit a log level from the parent, and if we look up a logger that hasn’t explicitly been defined in the logging configuration then logging will create that logger and move up the inheritance tree until one is explicitly defined. The new logger will then inherit that logger’s log-level. So if project.app.tests isn’t defined, logging will check if project.app is defined and use that log-level. If that isn’t defined it will try project. If this still isn’t defined then the root logger will be used.

Benefits of the parent-child architecture:

  1. General behavior can be added to parent loggers and specific behavior can be added to child loggers.
  2. Naming and relationships follow the structure that Python already uses for imports, so we don’t need a new mental model.
  3. It allows us to use module-specific loggers without explicitly configuring them.

Propagation

If we create a new logger dynamically at runtime then that logger won’t have any handlers of its own. It also doesn’t inherit them from its parent. So how does that LogRecord get to a handler?

This problem is solved via propagation. Propagation means passing a log from a child logger to a parent logger’s handlers, which is the default behavior. This means that a LogRecord created by child_logger will first get sent to child_logger’s handlers and then if child_logger.propagate == True it will get passed to the parent logger. The parent logger will then bypass its own filters and send the LogRecord directly to its handlers. If parent_logger.propagate == True this passing will continue until the logger reaches the root logger, or until an ancestor has propagate set to False.

InheritancePropagation

Let’s take another look at Django’s loggers.

'loggers': {
    'django': {
        'handlers': ['console', 'mail_admins'],
        'level': 'INFO',
    },
    'django.server': {
        'handlers': ['django.server'],
        'level': 'INFO',
        'propagate': False,
    },
}

Based on what we just covered about logger hierarchy and dot-notation, we can see that django is the parent of the django.server logger. But in this case, it means relatively little because propagate is set to False on django.server. If propagate was set to True, then something logged by django.server would first be handled by the django.server handler, and then propagate to django’s handlers, console and mail_admins.

A common design that takes advantage of this hierarchy involves only adding handlers to the root or project-level logger and then adding filters to the application’s child loggers. Module-level loggers are then responsible for what gets logged but the decision of where to send the final LogRecord is controlled at the root-level.

Propagation is useful because, without it, a module-level logger that doesn’t have it’s own handlers wouldn’t actually do anything with the logs it creates. Because of propagation, we can create a new module, lookup the logger for that specific module via the __name__ variable, and immediately have a new logger that inherits a log-level and propagates its logs up to its parent’s handlers.

Django Logging

We’re finally done covering Python’s logging library and we’ve looked at the settings that Django uses to configure its own logging setup. The final piece is to look at how this is all wired up inside of Django.

Let’s return to the __init__.py file that this series focuses on:

from django.utils.version import get_version

VERSION = (2, 2, 0, 'alpha', 0)

__version__ = get_version(VERSION)

def setup(set_prefix=True):
    """
    Configure the settings (this happens as a side effect of accessing the
    first setting), configure logging and populate the app registry.
    Set the thread-local urlresolvers script prefix if `set_prefix` is True.
    """
    from django.apps import apps
    from django.conf import settings
    from django.urls import set_script_prefix
    from django.utils.log import configure_logging

    configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
    if set_prefix:
        set_script_prefix(
            '/' if settings.FORCE_SCRIPT_NAME is None else settings.FORCE_SCRIPT_NAME
        )
    apps.populate(settings.INSTALLED_APPS)

For the purposes of logging we care about two lines of code:

from django.utils.log import configure_logging

and

configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)

The configure_logging() function is imported from django.utils.log and gets called with settings.LOGGING_CONFIG and settings.LOGGING as arguments (If you’re curious how settings are looked up in Django we covered that in the last post.

Let’s lookup settings.LOGGING_CONFIG and settings.LOGGING, which live in /django/conf/global_settings.py

# The callable to use to configure logging
LOGGING_CONFIG = 'logging.config.dictConfig'

# Custom logging configuration.
LOGGING = {}

So log.configure_logging gets called with the arguments logging.config.dictConfig and {} passed in.

That doesn’t tell us much. Let’s keep digging.

If we look in /django/utils/log.py we’ll find the following:

  1. Our configure_logging() function (reproduced below).
  2. The DEFAULT_LOGGING dictionary from the start of this post.
  3. Multiple classes that inherit from logging.Handler, logging.Formatter, and logging.Filter.
  4. A function named log_response().

Let’s look at these below:

The configure_logging() Method

def configure_logging(logging_config, logging_settings):
    if logging_config:
        # First find the logging configuration function ...
        logging_config_func = import_string(logging_config)

        logging.config.dictConfig(DEFAULT_LOGGING)

        # ... then invoke it with the logging settings
        if logging_settings:
            logging_config_func(logging_settings)

This is called from init.py with the following default arguments:

configure_logging(logging_config='logging.config.dictConfig', logging_settings={})

The default value for the logging_config argument is logging.config.dictConfig. logging provides multiple different ways to configure its settings. dictConfig is one of these, and as the name suggests, it accepts configs that are formatted as a dictionary.

Notice that the first line of the method is a conditional:

if dict_config:

This gives us the option of not using Django’s default logging configuration if we want to set up something custom elsewhere.

A few lines later we’ll see the following:

logging_config_func = import_string(logging_config)

import_string is a package that allows you to import an object from a path that’s encoded as a string. Here the function logging.config.dictConfig gets imported and stored as logging_config_func.

So this dictConfig method must be used with the DEFAULT_LOGGING dictionary from the start of this post, right? Nope, that actually isn’t what happens.

Let’s look at the final few lines of the method:

logging.config.dictConfig(DEFAULT_LOGGING)

# ... then invoke it with the logging settings
if logging_settings:
   logging_config_func(logging_settings)

DEFAULT_LOGGING is always configured by dictConfig; the user isn’t given control over this unless they opt of logging entirely. The logging_config_func function that the user can change (but which defaults to dictConfig) is used to configure the user-specified logging_settings value, which defaults to an empty dictionary.

This makes sense if you think about it. Django always uses its preferred function to import default logging settings, and then lets the user override or add additional logging with their own settings and configuration function.

Customizing logging

We saw that the default value of logging_settings is an empty dictionary? But we can define settings.LOGGING to a custom value in our project’s settings.py file. This could be used to add additional loggers, handlers, filters, or formatters, and then the logging_config_func method will be used to configure these logging settings right after DEFAULT_LOGGING is configured.

Inheritance and Propagation in Action: Django Logging Extensions

We covered inheritance and propagation in addition to looking at Django’s default logging configuration. But does the default implementation actually make use of these features?

We can see them in practice with Django’s logging extensions. Since Django handles much of the request-response cycle in the background, it makes sense that Django would use its own loggers to log relevant web server data. In addition to django.request, which is defined explicitly, it dynamically uses the following loggers in code:

  • django.server
  • django.template
  • django.db.backends
  • django.db.backends.schema
  • django.security.*

By this point, you should have the knowledge to understand how these can be used (and how they are useful) without being explicitly configured.

What’s next?

If you’ve made it this far you may be wondering how to actually use logging effectively. While that wasn’t the purpose of this piece, there are some quite helpful articles online about how to set up logging in a manner that is intuitive and will extend into production well. I particularly like the post Django Logging, The Right Way by Peter Baumgartner at Lincoln Loop. I think it covers the main topics, such as naming loggers and how to send them to Sentry to check for errors, in a clear and opinionated manner.

Finally, if you really have made it this far, I’d love to hear your thoughts or comments below or by shooting me a tweet at @JamesTimmins. This piece is obviously quite long and I’d love to know if it was useful to anyone other than myself! I’d also encourage you to signup for email alerts when I post new articles. You’ll occasionally get deep dives into Django’s codebase and you’ll provide me with a small amount of the internet affirmation which we all so desperately crave.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.