Preventing SQL Injection in Python

I've seen a rash of people asking questions in #python lately about code that incorrectly parameterizes data for use in SQL queries. The asker is typically a new programmer, or is at least new to Python, and the question tends to be something like "Why do I get a SQL syntax error with my code?". Other times, the question is unrelated, but I notice the problem in their code and warn them about it. Here's an example.

# cursor is, say, a Cursor object from MySQLdb
cursor.execute(
    # DON'T DO THIS!
    "SELECT name, email FROM customer WHERE customer_id = %s" % (1234,)
)

This code is susceptible to one of the more common security vulnerabilities in applications that talk to a database: SQL Injection

Fixing the code is pretty simple to avoid the vulnerability is pretty straightforward. We call the DB-API-standardized cursor.execute method with a second parameter instead of doing string formatting:

cursor.execute(
    "SELECT name, email FROM customer WHERE customer_id = %s", (1234,)
)

The difference is small, but quite significant.

Why Should I Care?

SQL injection is one of the most common types of security vulnerabilities. If you use an application that's susceptible to SQL injection, the impact can be quite large. Attackers could delete or edit data at will, or even get full administrator access to the application without you realizing it.

Here's an example of some real damage we can do if someone uses string formatting instead of parameterization.

from __future__ import print_function  # Python 2 compat
import sqlite3
import pprint

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)')
conn.commit()

cursor = conn.cursor()
for name in ['alice', 'bob', 'carol', 'david']:
    cursor.execute('INSERT INTO user (name) VALUES (?)', (name,))
conn.commit()

def print_all_rows(context_message):
    print('here are the current rows', context_message)
    pprint.pprint(list(cursor.execute('SELECT * FROM user')), width=20)

def execute_safe_query(name_to_delete):
    print('executing safe query')
    # The ``?`` placeholder used is "qmark" style supported by sqlite3,
    # we'll take a look at that a little later.
    cursor.execute(
        "DELETE FROM user WHERE name = ?", (name_to_delete,)
    )

def execute_injectable_query(name_to_delete):
    print('executing injectable query')
    cursor.execute(
        # DON'T DO THIS!
        "DELETE FROM user WHERE name = '%s'" % (name_to_delete,)
    )

if __name__ == '__main__':
    print_all_rows('before running queries')
    name_from_attacker = "alice' OR name NOTNULL--"
    execute_safe_query(name_from_attacker)
    print_all_rows('after safe query')
    execute_injectable_query(name_from_attacker)
    print_all_rows('after injectable query')

The output:

[(1, 'alice'),
 (2, 'bob'),
 (3, 'carol'),
 (4, 'david')]
executing safe query
here are the current rows after safe query
[(1, 'alice'),
 (2, 'bob'),
 (3, 'carol'),
 (4, 'david')]
executing injectable query
here are the current rows after injectable query
[]

As we can see, the attacker's attempt failed on with proper parameterization, but succeeded in deleting all rows in the table when we used string formatting to build the query.

Now you might think "But I would have checked that the name was actually in the database before issuing the query!" or "But I would have sanitized the name!". These are the wrong approaches, used alone. Validation and sanitization can be reasonable ways to fail fast and inform the user their input was bad, but you don't have to prove your sanitization techniques correct if you use parameterization as well.

The Source of the Problem

I believe this problem trips up newbies so often because of what I've come to consider a design flaw in the DB-API: allowing printf-style %s placeholders for parameterizing queries. This "format" paramstyle makes the difference between the vulnerable and safe queries below difficult to see by an untrained (or sleepy, or distracted) eye. Let's compare the initial examples:

# UNSAFE:
cursor.execute(
    "SELECT name, email FROM customer WHERE customer_id = %s" % (1234,)
)
# SAFE:
cursor.execute(
    "SELECT name, email FROM customer WHERE customer_id = %s", (1234,)
)

The difference is only two characters... quite easy to overlook if you're skimming or not looking very closely during a code review.

The situation is similar using named parameters in the "pyformat" paramstyle:

# UNSAFE:
cursor.execute(
    "DELETE FROM customer WHERE customer_id = %(id)s" % {'id': 1234}
)
# SAFE:
cursor.execute(
    "DELETE FROM customer WHERE customer_id = %(id)s", {'id': 1234}
)

Not all DB-API adapters use this paramstyle. The standard library's sqlite3 module supports the "qmark" paramstyle for positional parameterization as well as the "named" paramstyle:

## "qmark" with sqlite3

# UNSAFE:
cursor.execute(
    "SELECT name, email FROM customer WHERE customer_id = %s" % (1234,)
)
# SAFE:
cursor.execute(
    "SELECT name, email FROM customer WHERE customer_id = ?", (1234,)
)

## "named" with sqlite3

# UNSAFE:
cursor.execute(
    "DELETE FROM customer WHERE customer_id = %(id)s" % {'id': 1234}
)
# SAFE:
cursor.execute(
    "DELETE FROM customer WHERE customer_id = :id", {'id': 1234}
)

These make it harder to mistakenly write vulnerable code, or pass over it in a code review. If you see any %s or other printf-style placeholders in a query involving the sqlite3 module, it should set off alarm bells in your head immediately. Unfortunately, only some DB-API drivers support this style, and most drivers support at most two different styles (typically one positional and one named style).

Since the standard has been around for a long time, and many database adapters have been written to support it, it's unreasonable to expect this flaw to be corrected. It's now a problem of education. Seasoned developers have a duty to our less-experienced colleagues to correct these mistakes as soon as possible.