The Difficulty of Fixing Python Indentation Errors
by Bob Lyons
April, 2023
Indentation errors in Python programs are usually easy to fix.
For example, we can fix the indentation error in the following Python program by removing the space before the print statement:
total = 1 + 2 + 3 + 4
print(total)
Recently, I fixed the indentation errors in dozens of online Python programs that I didn't write.
For each of the Python programs, I knew the expected output.
For most of these programs, it was easy to fix the indentation errors.
However, some of the programs were difficult to fix, because they had multiple indentation errors (e.g., a program with all the indentation removed!), and because I was unfamiliar with the programs.
Let's take a look at a Python program that has indentation errors that might be difficult to fix.
I removed all the indentation of the following Python program that finds the shortest path from two vertices in a graph using the breadthfirst search (BFS) algorithm.
from collections import deque
def bfs_shortest_path(graph, start, end):
if start == end:
return [start]
visited = set()
queue = deque([(start, [])])
while queue:
node, path = queue.popleft()
visited.add(node)
for neighbor in graph[node]:
if neighbor not in visited:
if neighbor == end:
return path + [node, neighbor]
queue.append((neighbor, path + [node]))
return None
It would be painful to restore the indentation if you were not familiar with the BFS algorithm!
Here's the program with the indentation errors fixed. (The program probably has some logic errors, even though I didn't write it 😀; ChatGPT wrote it.)
from collections import deque
def bfs_shortest_path(graph, start, end):
if start == end:
return [start]
visited = set()
queue = deque([(start, [])])
while queue:
node, path = queue.popleft()
visited.add(node)
for neighbor in graph[node]:
if neighbor not in visited:
if neighbor == end:
return path + [node, neighbor]
queue.append((neighbor, path + [node]))
return None
After fixing the indentation errors in those online Python programs, it occurred to me that, in the worst case,
fixing indentation errors in Python programs is an NPhard problem.
We can use the subset sum problem, which is NPhard, to prove that
the problem of fixing indentation errors in a Python program is NPhard.
We'll use the following instance of the subset sum problem:

integers = [3, 34, 314, 4, 12, 5, 2]

target = 9
The solution is [4, 5], because 4 + 5 = 9.
The following Python program prints the solution to this instance of the subset sum problem:
#
# Program that prints the solution for the following instance of the subset sum problem:
# integers = [3, 34, 314, 4, 12, 5, 2], target = 9
#
integers = [ 3, 34, 314, 4, 12, 5, 2 ]
target = 9
subset = []
if False:
pass
subset.append(integers[0])
if False:
pass
subset.append(integers[1])
if False:
pass
subset.append(integers[2])
if False:
pass
subset.append(integers[3])
if False:
pass
subset.append(integers[4])
if False:
pass
subset.append(integers[5])
if False:
pass
subset.append(integers[6])
if len(subset) > 0 and sum(subset) == target:
print(f"The sum of subset {subset} is {target}.")
Now we introduce indentation errors, so that none of the integers in the integers list are added to the subset list:
#
# Program that prints the solution for the following instance of the subset sum problem:
# integers = [3, 34, 314, 4, 12, 5, 2], target = 9
#
integers = [ 3, 34, 314, 4, 12, 5, 2 ]
target = 9
subset = []
if False:
pass
subset.append(integers[0])
if False:
pass
subset.append(integers[1])
if False:
pass
subset.append(integers[2])
if False:
pass
subset.append(integers[3])
if False:
pass
subset.append(integers[4])
if False:
pass
subset.append(integers[5])
if False:
pass
subset.append(integers[6])
if len(subset) > 0 and sum(subset) == target:
print(f"The sum of subset {subset} is {target}.")
With these indentation errors, the program will not print the solution for the instance of the subset sum problem.
It's easy to correct the indentation in this program, because we know that the solution subset is [4, 5].
However, if the instance of the subset sum problem in this Python program were much larger, then it would be very difficult
to fix the indentation errors.
The following is a brief sketch of the proof that the problem of fixing indentation errors in a Python program is NPhard:

The subset sum problem is NPhard.

An instance of the subset sum problem can be encoded (in polynomial time) into a Python program like the one above (where all the
subset.append()
statements are indented).

Fixing the indentation in the Python program is as least as hard as finding the solution to the instance of the subset sum problem.

Finding the solution to an instance of the subset sum problem is as least as hard as deciding whether or not the instance has a solution.

Deciding whether or not the instance of the subset sum problem has a solution is an NPhard problem.

Thus, the problem of fixing the indentation in a Python program is NPhard.
There are other programming languages that have significant indentation, including the following:

ABC

CoffeeScript

Nim

occam

SageMath

Scala (optional mode)

etc.
It's a safe bet that the problem of fixing indentation errors in these other languages is also NPhard.
Thanks to N. J. A. Sloane for his helpful feedback on an earlier version of this article.
