# Is It Codeable?

Years ago, I worked with my good friend Jim at a telecom company. He wrote the requirements for new network features that I developed. I remember when he burst into my office one day to describe a great idea for a new feature. After he finished telling me about the feature, he asked "Is it codeable?" I recall being impressed that Jim, who majored in engineering rather than computer science, had an intuitive understanding that not all programming problems are solvable.

I was a computer science student when I first learned that there are math and computer science problems that are undecidable (i.e., unsolvable). Like many students, I was surprised to learn that mathematicians and computer scientists have proved the undecidability of a variety of problems. A problem is undecidable if it is not possible to devise an algorithm that solves the problem. It's widely believed that if a problem is undecidable, then it's not "codeable" (i.e., it can't be solved using software). Perhaps the most famous undecidable problem is Alan Turing's Halting Problem.

While surfing the web recently, I learned about the Post Correspondence Problem (PCP), which is another famous undecidable problem, which we'll examine below. The proof that the PCP is undecidable can be built upon the undecidability of the Halting Problem. The undecidability of the PCP has often been used to prove the undecidability of other problems. It occurred to me that the undecidability of the PCP could be used to prove the undecidability of the Schematron Conformance Problem.

In the remainder of this paper, we'll describe the Schematron Conformance Problem (SCP) and Post Correspondence Problem (PCP). We'll then prove that the Schematron Conformance Problem is undecidable (even if we restrict ourselves to Schematron schemas that don't use the `document` and `key` functions). We assume that you are familiar with the basics of Schematron, which is a powerful XML schema language in which validity constraints are defined using XPath expressions. For information about Schematron, please see the Schematron resources that are listed in the Further Reading section below.

The following is the outline of this paper:

# What is the Schematron Conformance Problem (SCP)?

The Schematron Conformance Problem (SCP) is defined as follows:

Given a Schematron schema, is there an XML document that conforms to this schema?

When we say that the SCP is undecidable, we mean that it's not possible to devise an algorithm that accepts any Schematron schema as input and responds with one of the following two outputs:

• "Yes" if there is an XML document that conforms to the Schematron schema.
• "No" if there is no XML document that conforms to the Schematron schema.

The fact that the SCP is undecidable doesn't mean that you can't devise a specific algorithm that determines if there is an XML document that conforms to a particular schema. The undecidability of the SCP does mean that it is impossible to devise a general algorithm that can make this determination for any Schematron schema.

Note that the SCP presumes that there are Schematron schemas for which there are no conforming XML documents. In other words, the SCP presumes that you can find a Schematron schema, such that there are no XML documents that conform to this Schematron schema. For example, the following is a Schematron schema that has no conforming XML documents:

```<?xml version="1.0"?>
<schema xmlns="http://www.ascc.net/xml/schematron">

<!-- This schematron does not match any XML document.
-->

<pattern name="check-root">
<rule context="/">
<assert test="count(*) != 1">
Error: The number of document elements is
equal to 1.
</assert>
</rule>
</pattern>

</schema>
```

An XML document would conform to this Schematron schema only if the document does not have exactly one document element. However, all XML documents have exactly one document element. So, there are no XML documents that conform to this Schematron schema.

The fact that the SCP is undecidable means that a Schematron authoring tool (i.e., an editor) would not be able to analyze the user's schema and always let the user know whether or not there are any XML documents that conform to the schema.

# Why is the SCP is Unsolvable?

To prove that the SCP is undecidable, we show, by way of contradiction, that if the SCP is decidable, then the PCP is decidable. But before delving into the proof, let's describe the PCP. (If you are already familiar with the PCP, then you may want to jump ahead to the Back to the Proof section.)

## The Post Correspondence Problem (PCP)

In 1946, Emil Post proved that the following problem is undecidable:

Let A be an alphabet with at least two letters, and let L and M be two lists of nonempty strings over A, such that L and M have the same number of strings. We can represent L and M as follows:

• L = ( s1, s2, s3, ..., sk )
• M = ( t1, t2, t3, ..., tk )

Does there exist a sequence of one or more integers, which we represent as ( i1, i2, i3, ..., im ), that meet the following requirements?

• Each of the integers is greater than or equal to one.
• Each of the integers is less than or equal to `k`. (Recall that each list has `k` strings).
• The concatenation of si1, si2, si3, ..., and sim is equal to the concatenation of ti1, ti2, ti3, ..., and tim.

Let's look at some examples. We'll look at three instances of the PCP; the first two have solutions and the third one has no solution.

### Example #1 of a PCP Instance

Consider the following instance of the PCP:

• Alphabet A = { a, b }
• List L = ( a, ab )
• List M = ( aa, b )

To enhance the readability of the solution to this PCP instance, we use different colors for the strings in each list; note that the color of the `i`th string of list L is the same as the color of the `i`th string of list M.

We see that ( 1, 2 ) is a sequence of integers that solves this PCP instance, since the concatenation of "a" and "ab" is equal to the concatenation of "aa" and "b" (i.e., aab = aab).

Other solutions include:

• ( 1, 2, 1, 2 )
• ( 1, 2, 1, 2, 1, 2 )
• and so on

### Example #2 of a PCP Instance

Consider the following instance of the PCP (from David Matuszek's Post's Correspondence Problem course notes):

• Alphabet A = { a, b }
• List L = ( a, abaaa, ab )
• List M = ( aaa, ab, b )

A sequence of integers that solves this problem is ( 2, 1, 1, 3 ), since the concatenation of "abaaa", "a", "a" and "ab" is equal to the concatenation of "ab", "aaa", "aaa" and "b" (i.e., abaaaaaab = abaaaaaab).

### Example #3 of a PCP Instance

Consider the following instance of the PCP:

• Alphabet A = { a, b, c }
• List L = ( a, b, c )
• List M = ( aa, bb, cc )

There is no solution to this problem, since for any potential solution, the concatenation of the strings from list L will contain half as many letters as the concatenation of the corresponding strings from list M.

## An Alternative Description of the PCP

The following is a less abstract description of the PCP:

We have a finite number of rectangles. Each rectangle consists of an upper half and a lower half. On each half of each rectangle, there is a non-empty string (e.g., "aabb"). The following is an illustration of such a rectangle:

 aabb ccab

Can one or more of these rectangles be arranged in a row (left to right), such that the following requirements are met?

• A rectangle may appear in the row once, multiple times or not at all.
• When constructing the row, you are free to place the rectangles in any order. However, you can't rotate any of the rectangles (e.g., you can't flip one upside down).
• The concatenation of the strings in the upper half of the row is equal to the concatenation of the strings in the lower half of the row.

### Reformulation of Example #1

So, we can reformulate our first example as the following set of rectangles:

 a aa

 ab b

The strings in the top halves of the rectangles are the strings in list L, and the strings in the bottom halves of the rectangles are the strings in list M. The first string in list L (i.e., "a") is the string in the top half of the first rectangle, and the second string in list L (i.e., "ab") is the string in the top half of the second rectangle. Likewise, the first string in list M (i.e., "aa") is the string in the bottom half of the first rectangle, and the second string in list L (i.e., "b") is the string in the bottom half of the second rectangle.

This instance of the PCP can be solved with the following row, since the concatenation of the strings in the top half of the row (i.e., "aab") is equal to the concatenation of the strings in the bottom half of the row:

 a ab aa b

### Reformulation of Example #2

We can reformulate our second example as the following set of rectangles:

 a aaa

 abaaa ab

 ab b

This instance of the PCP can be solved with the following row, since the concatenation of the strings in the top half of the row is "abaaaaaab" and the concatenation of the strings in the bottom half of the row is also "abaaaaaab":

 abaaa a a ab ab aaa aaa b

## Back to the Proof

Now that we understand the PCP, we can get back to our proof that the SCP is undecidable.

Recall that we assumed that the SCP is solvable (i.e., we assumed that there is an algorithm for solving the SCP). Now we extend this algorithm into a new algorithm that solves the PCP.

The input to the new algorithm is an instance of the PCP (i.e., two lists of nonempty strings over an alphabet containing at least two letters, such that the two lists have the same number of strings).

The following is the new algorithm that solves the PCP:

1. We assume that the strings in the two lists do not contain any whitespace characters. (For PCP instances where this assumption is not true, we could always transform the PCP instance into an equivalent instance by replacing each whitespace character with a non-whitespace character.)
2. We transform the PCP instance into a Schematron schema, such that only XML documents that contain a solution to the PCP instance will conform to the Schematron schema. We'll demonstrate in the next section how to construct such Schematron schemas.
3. We use the algorithm that solves the SCP to determine whether or not there are any XML documents that match our Schematron schema. If there is an XML document that matches our Schematron schema, then our new algorithm responds with "Yes" (i.e., yes, there is a solution to the PCP instance). If there are no XML documents that match our Schematron schema, then our new algorithm responds with "No" (i.e., no, there is no solution to the PCP instance).

So, we now have an algorithm that solves the PCP. However, the PCP is unsolvable. So, our initial assumption (that there is an algorithm for solving the SCP) must be false. Therefore, the SCP is undecidable.

Now we'll show how a PCP instance can be transformed into a Schematron schema, such that an XML document conforms to the schema only if the XML document contains a solution to the PCP instance.

### XML Document Containing a Solution from Example #2

First, we describe the format of the XML documents that contain solutions to PCP instances. The following XML document contains a solution for the PCP instance of example #2:

```<?xml version="1.0"?>
<pcp-instance-solution>
<list-1>
<string id="2">abaaa</string>
<string id="1">a</string>
<string id="1">a</string>
<string id="3">ab</string>
</list-1>
<list-2>
<string id="2">ab</string>
<string id="1">aaa</string>
<string id="1">aaa</string>
<string id="3">b</string>
</list-2>
</pcp-instance-solution>
```

Recall that a solution to this PCP instance is the following sequence of integers: ( 2, 1, 1, 3 ). This sequence of integers is specified by the `id` attributes of the `/pcp-instance-solution/list-1/string` elements; the solution is specified again by the `id` attributes of the `/pcp-instance-solution/list-2/string` elements.

The `/pcp-instance-solution/list-1` element contains the 2nd string of list L, followed by the 1st string of list L, followed by the 1st string of list L, followed by the 3rd string of list L. The `/pcp-instance-solution/list-2` element contains the 2nd string of list M, followed by the 1st string of list M, followed by the 1st string of list M, followed by the 3rd string of list M.

Notice that the concatenation of the values of the `/pcp-instance-solution/list-1/string` elements is equal to the concatenation of the values of the `/pcp-instance-solution/list-2/string` elements. (In other words, "abaaaaaab" is equal to "abaaaaaab".)

You may have noticed that this XML document contains much more information than the sequence of integers that solves the problem. The extra information makes it easier for the reader to verify the solution. Also, if the document merely specified a list of integers that solved the PCP instance, then I would not have been able to write a Schematron schema that verifies that the solution is correct (unless the schema used the `document` function to invoke a CGI script, which would be cheating!).

### Schematron Schema for Example #2

The following is a Schematron schema for example #2. An XML document will match this schema only if it provides a solution for the PCP instance of example #2.

```<?xml version="1.0"?>
<schema xmlns="http://www.ascc.net/xml/schematron">

<!-- This Schematron schema is used to verify that
an XML document contains a solution to the
following instance of the Post Correspondence
Problem (PCP):

Alphabet A = { a, b }

List L = ( a, abaaa, ab )

List M = ( aaa, ab, b )

An XML document conforms to this Schematron
schema if it contains a solution to this
PCP instance.

If an XML document conforms to this Schematron
schema, then the schema will not issue any
messages when validating the document. If an
XML document does not conform to this Schematron
schema, then the schema will issue one or more
error messages when validating the document.
-->

<pattern name="check-root">
<rule context="/">
<assert test="pcp-instance-solution">
The /pcp-instance-solution element was not
found.
</assert>
</rule>
</pattern>

<pattern name="check-pcp-instance-solution">
<rule context="/pcp-instance-solution">
<assert test="list-1">
The /pcp-instance-solution/list-1 element
is missing.
</assert>
<assert test="list-2">
The /pcp-instance-solution/list-2 element
is missing.
</assert>
<assert test="count(*) = 2">
The /pcp-instance-solution element has an
incorrect number of subelements.
</assert>
<assert test="translate(
normalize-space(list-1),
' ',
'' ) =
translate(
normalize-space(list-2),
' ',
'' )">
The concatenation of the
/pcp-instance-solution/list-1/string/text()
nodes is not equal to the
concatenation of the
/pcp-instance-solution/list-2/string/text()
nodes.
</assert>
</rule>
</pattern>

<pattern name="check-list-1">
<rule context=
"/pcp-instance-solution/list-1/string[@id='1']">
<assert test=". = 'a'">
The value of the
/pcp-instance-solution/list-1/string[@id='1']
element is not equal to 'a'.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-1/string[@id='2']">
<assert test=". = 'abaaa'">
The value of the
/pcp-instance-solution/list-1/string[@id='2']
element is not equal to 'abaaa'.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-1/string[@id='3']">
<assert test=". = 'ab'">
The value of the
/pcp-instance-solution/list-1/string[@id='3']
element is not equal to 'ab'.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-1/string">
<assert test="@id = '1' or
@id = '2' or
@id = '3'">
The value of a
/pcp-instance-solution/list-1/string/@id
attribute is not equal to '1', '2' or '3'.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-1/text()">
<assert test="normalize-space( . ) = ''">
Found a non-whitespace text node as
child of the
/pcp-instance-solution/list-1 element.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-1/*">
<assert test="name( . ) = 'string'">
The /pcp-instance-solution/list-1 element
has a subelement other than a string
element.
</assert>
<assert test="@id">
A subelement of the
/pcp-instance-solution/list-1 element
has no id attribute.
</assert>
</rule>
<rule context="/pcp-instance-solution/list-1">
<assert test="string">
The /pcp-instance-solution/list-1 element
does not contain any string subelements.
</assert>
</rule>
<rule context="/pcp-instance-solution/list-1">
<assert test="count( string ) =
count( ../list-2/string )">
The /pcp-instance-solution/list-1 element
does not have the same number of string
elements as the
/pcp-instance-solution/list-2 element.
</assert>
</rule>
</pattern>

<pattern name="check-list-2">
<rule context=
"/pcp-instance-solution/list-2/string[@id='1']">
<assert test=". = 'aaa'">
The value of the
/pcp-instance-solution/list-2/string[@id='1']
element is not equal to 'aaa'.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-2/string[@id='2']">
<assert test=". = 'ab'">
The value of the
/pcp-instance-solution/list-2/string[@id='2']
element is not equal to 'ab'.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-2/string[@id='3']">
<assert test=". = 'b'">
The value of the
/pcp-instance-solution/list-2/string[@id='3']
element is not equal to 'b'.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-2/string">
<assert test="@id = '1' or
@id = '2' or
@id = '3'">
The value of the
/pcp-instance-solution/list-2/string/@id
attribute is not equal to '1', '2' or '3'.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-2/text()">
<assert test="normalize-space( . ) = ''">
Found a non-whitespace text node as the
child of the
/pcp-instance-solution/list-2 element.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-2/*">
<assert test="name( . ) = 'string'">
The /pcp-instance-solution/list-2 element
has a subelement other than a string
element.
</assert>
<assert test="@id">
A subelement of the
/pcp-instance-solution/list-2 element
has no id attribute.
</assert>
</rule>
<rule context=
"/pcp-instance-solution/list-2">
<assert test="string">
The /pcp-instance-solution/list-2 element
does not contain any string subelements.
</assert>
</rule>
</pattern>

<pattern name="check-correspondence">
<rule context=
"/pcp-instance-solution/list-1/string">
<assert test=
"@id = ../../list-2/string[
count( current()/preceding-sibling::* )
+ 1 ]/@id">
The value of an id attribute in a
/pcp-instance-solution/list-1/string
element is not equal to the value of the
id attribute of the corresponding
/pcp-instance-solution/list-2/string
element.
</assert>
</rule>
</pattern>

</schema>
```

The validations performed by this schema include the following, which verify that the solution solves the PCP instance:

• The concatenation of the values of the `string` elements in the `list-1` element must be equal to the concatenation of the values of the `string` elements in the `list-2` element.
• The value of each `list-1/string` element must be equal to the `i`th string in list L, where `i` is the value of the `id` attribute of the `list-1/string` element.
• Likewise, the value of each `list-2/string` element must be equal to the `i`th string in list M, where `i` is the value of the `id` attribute of the `list-2/string` element.
• The value of the `@id` attribute in the `n`th `string` element of the `list-1` element must be equal to the value of the `@id` attribute in the `n`th `string` element of the `list-2` element.
• The `list-1` element must contain at least one `string` subelement.
• Likewise, the `list-2` element must contain at least one `string` subelement.
• The `list-1` element must have the same number of `string` subelements as the `list-2` element.

The schema also performs the following validations of the structure of the document:

• The document element must be `pcp-instance-solution`.
• The subelements of the `pcp-instance-solution` must be the `list-1` and `list-2` elements.
• The subelements of the `list-1` and `list-2` elements must be `string` elements.
• Each `string` element must contain an `id` attribute.
• Each `string` element must not have any subelements and must not contain any attributes other than the `id` attribute.
• Etc.

# The Schematron Intersection Problem

A corollary of the undecidability of the Schematron Conformance Problem (SCP) is that the Schematron Intersection Problem (SIP) is undecidable. The SIP is defined as follows:

Given two Schematron schemas, is there an XML document that conforms to both schemas?

To prove the corollary, we show, by way of contradiction, that if the SIP is decidable, then the SCP is decidable. If the SIP is decidable, then we can devise an algorithm that accepts any two Schematron schemas as input and responds with one of the following:

• "Yes" if there is an XML document that conforms to both schemas.
• "No" if there is no XML document that conforms to both schemas.

Now we extend this algorithm into a new algorithm that solves the SCP.

The input to the new algorithm is a Schematron schema, which we'll call S. We pass S and another Schematron schema, which we'll call T, as inputs to the algorithm that solves the SIP. T is the following Schematron schema, which matches all XML documents:

```<?xml version="1.0"?>
<schema xmlns="http://www.ascc.net/xml/schematron">

<!-- This schematron schema matches any XML document.
-->

<pattern name="check-root">
<rule context="/">
<assert test="count(*) = 1">
Error: The number of document elements is
not equal to 1.
</assert>
</rule>
</pattern>

</schema>
```

Our new algorithm responds with one of the following:

• "Yes" if the algorithm that solves the SIP determines that there is an XML document that conforms to both S and T. Thus, the new algorithm responds with "Yes" if there is an XML document that conforms to S.
• "No" if the algorithm that solves the SIP determines that there is no XML document that conforms to both S and T. Thus, the new algorithm responds with "No" if there is no XML document that conforms to S, since all XML documents conform to T.

Thus, the new algorithm solves the SCP (i.e., the SCP is decidable). However, since we showed that the SCP is undecidable, our initial assumption (that the SIP is decidable) must be false. Therefore, the SIP is undecidable.

# What About Other XML Schema Languages?

Is the schema conformance problem undecidable for other XML schema languages, such as XML Schema, DTD and RELAX NG? In their paper entitled What's Hard about XML Schema Constraints?, Marcelo Arenas, Wenfei Fan and Leonid Libkin show that the XML Schema Conformance Problem is undecidable.

The DTD Conformance Problem looks solvable. It seems easy to construct an XML document that conforms to a DTD, as long as the DTD contains at least one element type declaration and contains an element type declaration for each child element type that is mentioned in the element type declarations.

It is possible to write a RELAX NG schema, such that there are no XML documents that conform to the schema. The trick is to use a data type that includes two contradictory constraints (e.g., a string must consist of an even number of 'a' characters, but must be 3 characters in length).

For example, there are no XML documents that conform to the following RELAX NG schema:

```<?xml version="1.0" encoding="UTF-8"?>
<element name="conflicted"
xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary=
"http://www.w3.org/2001/XMLSchema-datatypes">
<data type="string">
<param name="pattern">(aa)+</param>
<param name="length">3</param>
</data>
</element>
```

So, the RELAX NG Conformance Problem is an interesting problem, but it doesn't seem to be unsolvable.

# Summary

In this paper, we showed that the Schematron Conformance Problem (SCP) is undecidable. This is true even if we restrict ourselves to Schematron schemas that do not use the `document` and `key` functions of XPath. To prove the undecidability of the SCP, we showed that if the SCP is decidable, then the Post Correspondence Problem is decidable; however, the Post Correspondence Problem is undecidable. Therefore, the SCP must be undecidable.

The fact that the SCP is undecidable means that it's impossible to build a Schematron schema editor that evaluates the user's schema and always lets him/her know whether or not there are any XML documents that conform to the schema.

# Acknowledgements

Thanks to John Cowan for his comments on DTD conformance.