The Difficulty of Fixing Python Indentation Errors

by Bob Lyons
April, 2023

Indentation errors in Python programs are usually easy to fix. For example, we can fix the indentation error in the following Python program by removing the space before the print statement:

total = 1 + 2 + 3 + 4
 print(total)

Recently, I fixed the indentation errors in dozens of online Python programs that I didn't write. For each of the Python programs, I knew the expected output. For most of these programs, it was easy to fix the indentation errors. However, some of the programs were difficult to fix, because they had multiple indentation errors (e.g., a program with all the indentation removed!), and because I was unfamiliar with the programs.

Let's take a look at a Python program that has indentation errors that might be difficult to fix. I removed all the indentation of the following Python program that finds the shortest path from two vertices in a graph using the breadth-first search (BFS) algorithm.

from collections import deque

def bfs_shortest_path(graph, start, end):
if start == end:
return [start]

visited = set()
queue = deque([(start, [])])

while queue:
node, path = queue.popleft()
visited.add(node)

for neighbor in graph[node]:
if neighbor not in visited:
if neighbor == end:
return path + [node, neighbor]
queue.append((neighbor, path + [node]))

return None

It would be painful to restore the indentation if you were not familiar with the BFS algorithm!

Here's the program with the indentation errors fixed. (The program probably has some logic errors, even though I didn't write it 😀; ChatGPT wrote it.)

from collections import deque

def bfs_shortest_path(graph, start, end):
    if start == end:
        return [start]

    visited = set()
    queue = deque([(start, [])])

    while queue:
        node, path = queue.popleft()
        visited.add(node)

        for neighbor in graph[node]:
            if neighbor not in visited:
                if neighbor == end:
                    return path + [node, neighbor]
                queue.append((neighbor, path + [node]))

    return None

After fixing the indentation errors in those online Python programs, it occurred to me that, in the worst case, fixing indentation errors in Python programs is an NP-hard problem.

We can use the subset sum problem, which is NP-hard, to prove that the problem of fixing indentation errors in a Python program is NP-hard.

We'll use the following instance of the subset sum problem:

  • integers = [3, 34, -314, 4, 12, 5, 2]
  • target = 9

The solution is [4, 5], because 4 + 5 = 9.

The following Python program prints the solution to this instance of the subset sum problem:

#
# Program that prints the solution for the following instance of the subset sum problem:
#   integers = [3, 34, -314, 4, 12, 5, 2], target = 9
#
integers = [ 3, 34, -314, 4, 12, 5, 2 ]
target = 9
subset = []

if False:
    pass
    subset.append(integers[0])
if False:
    pass
    subset.append(integers[1])
if False:
    pass
    subset.append(integers[2])
if False:
    pass
subset.append(integers[3])
if False:
    pass
    subset.append(integers[4])
if False:
    pass
subset.append(integers[5])
if False:
    pass
    subset.append(integers[6])

if len(subset) > 0 and sum(subset) == target:
    print(f"The sum of subset {subset} is {target}.")

Now we introduce indentation errors, so that none of the integers in the integers list are added to the subset list:

#
# Program that prints the solution for the following instance of the subset sum problem:
#   integers = [3, 34, -314, 4, 12, 5, 2], target = 9
#
integers = [ 3, 34, -314, 4, 12, 5, 2 ]
target = 9
subset = []

if False:
    pass
    subset.append(integers[0])
if False:
    pass
    subset.append(integers[1])
if False:
    pass
    subset.append(integers[2])
if False:
    pass
    subset.append(integers[3])
if False:
    pass
    subset.append(integers[4])
if False:
    pass
    subset.append(integers[5])
if False:
    pass
    subset.append(integers[6])

if len(subset) > 0 and sum(subset) == target:
    print(f"The sum of subset {subset} is {target}.")

With these indentation errors, the program will not print the solution for the instance of the subset sum problem.

It's easy to correct the indentation in this program, because we know that the solution subset is [4, 5].

However, if the instance of the subset sum problem in this Python program were much larger, then it would be very difficult to fix the indentation errors.

The following is a brief sketch of the proof that the problem of fixing indentation errors in a Python program is NP-hard:

  1. The subset sum problem is NP-hard.
  2. An instance of the subset sum problem can be encoded (in polynomial time) into a Python program like the one above (where all the subset.append() statements are indented).
  3. Fixing the indentation in the Python program is as least as hard as finding the solution to the instance of the subset sum problem.
  4. Finding the solution to an instance of the subset sum problem is as least as hard as deciding whether or not the instance has a solution.
  5. Deciding whether or not the instance of the subset sum problem has a solution is an NP-hard problem.
  6. Thus, the problem of fixing the indentation in a Python program is NP-hard.

There are other programming languages that have significant indentation, including the following:

  • ABC
  • CoffeeScript
  • Nim
  • occam
  • SageMath
  • Scala (optional mode)
  • etc.

It's a safe bet that the problem of fixing indentation errors in these other languages is also NP-hard.

Thanks to N. J. A. Sloane for his helpful feedback on an earlier version of this article.