Building a Turing Machine Simulator With Ruby (Part 1)

To better understand Turing’s machine and its contributions to mathematics and computer science, we will build a simulator in Ruby. This simulator will take a description of the machine’s configuration, create a model of the machine, and run through its steps. We will be able to observe the action of the head and changes to the tape at each step. Hopefully this will serve to illuminate some aspects of Turing’s machine that might otherwise be less accessible.

Although the Turing machine is often thought of as prefiguring later computer science innovations such as the state machine, its original purpose was quite different. Turing created his machine in an attempt to understand the limits of mechanical computation, i.e., the limits of computability. The paper that introduced the Turing machine, published in 1936, refers to “computable numbers” and their use in solving David Hilbert’s Entscheidungsproblem, or “decision problem”.

Turing’s Marvelous Machine

Informally, a Turing machine is a mathematical model of a machine that mechanically operates on a tape. This tape contains squares where the machine can read or print a symbol using a tape head. The machine can also move left and right over the tape, one square at a time. The machine’s operation is fully determined by a list of elementary instructions such as “in state 42, if the symbol seen is 0, print a 1; if the symbol seen is 1, shift to the right and change into state 17; in state 17, if the symbol seen is 0, print a 1 and change into state 6;” etc. Turing called these instructions m-configurations. Modern computer scientists typically refer to them as states (as the Turing machine is a type of finite-state machine). I will refer to them interchangeably as “instructions” or “states”. Turing, in his paper, labeled these instructions using gothic script lower-case letters. We will simply use lower-case letters.

A bit more formally, a Turing machine can be specified as the 3-tuple $M= \langle Q, \Gamma, I \rangle$. $Q$ is a finite, non-empty set of states where $q_0 \in Q$ is the first state; $\Gamma$ is a finite, non-empty set of tape symbols (the tape alphabet) with $S_0 \in \Gamma$ representing a blank, also called $None$ (the only symbol allowed to occur on the tape infinitely often at any step during the computation); and $I$ is a finite, non-empty set of instructions. An instruction $i \in I$ can be defined with the 5-tuple, or quint, $\langle q_i, S_j, S_k/N/E, L/R, q_m \rangle$, which consists of the following:

• $q_i \in Q$, the current state.
• $S_j \in \Gamma$, the symbol scanned.
• The symbol to be printed, one of:
  • $S_k \in \Gamma$, print the symbol $S_k$.
  • $N$, equivalent to $S_j$, indicates a “noop” (alternatively, print the current symbol again).
  • $E$, equivalent to $S_0$, print a blank (erasure).
• The head movement, either:
  • $L$, move the head left.
  • $R$, move the head right.
• $q_m \in Q$, the new state.

This is a bit less formal than the definition used by Hopcroft and Ullman, for instance, but should suffice for our purposes.

Our First Turing Machine

With the formal definition out of the way, let’s turn to Turing’s first machine, which computes the sequence 0 1 0 1 0 1…. This machine’s 3-tuple looks like this:

• Q, the states: { b, c, e, f }
• Γ, the symbols: { 0, 1 }
• I, the instructions: $\{ \langle b, None, 0, R, c \rangle, \langle c, None, N, R, e \rangle, \langle e, None, 1, R, f \rangle, \langle f, None, N, R, b \rangle \}$

In Turing’s original table form, these same instructions are represented as:

Configuration                    Behavior
m-configuration   Tape symbol    Tape operations   Final m-configuration
b                 None           P0, R             c
c                 None           R                 e
e                 None           P1, R             f
f                 None           R                 b

Turing describes the above table as follows:

This [example] table (and all succeeding tables of the same kind) is to be understood to mean that for a configuration described in the first two columns the operations in the third column are carried out successively, and the machine then goes over into the m-configuration in the final column.

Once Again, With ASCII

For our Turing machine simulator, I decided to use an ASCII representation of the Turing machine’s instructions. In this format, the instructions look like this:
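As a sketch of that format (exact whitespace within the operation field is my assumption, since the grammar below does not pin it down), the four instructions of our first machine might read:

```
b, None, P0 R, c
c, None, R, e
e, None, P1 R, f
f, None, R, b
```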

The m-configurations (instructions) in my machines will begin with b, just as Turing’s did. (I believe he used b for begin.)

The BNF grammar for this representation is:

<instruction> ::= <ident> "," <symbol> "," <operation> "," <ident> <EOL>
<ident>       ::= <char>
<symbol>      ::= <char> | "None"
<operation>   ::= <movement> | <print> <movement>
<print>       ::= 'P' <char>
<movement>    ::= 'R' | 'L'


Representing An Instruction

To begin modelling our simulated Turing machine, we need some way to represent instructions internally. We’ll use Ruby arrays as tuples and Ruby symbols as, well, symbols (with the exception of None, 0 and 1, which will be represented as nil and the integers 0 and 1, respectively). Tape operations will be represented as another array of tuples, to wit: Pn as [ :print, n ], E as [ :erase ], L as [ :left ], and R as [ :right ].

Here is our Ruby representation of the above instruction list:
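Following the conventions just described, the instruction list would look something like this (the constant name INSTRUCTIONS is my own):

```ruby
# The four instructions of Turing's first machine, as tuples of
# [current_state, scanned_symbol, [operations...], next_state].
INSTRUCTIONS = [
  [:b, nil, [[:print, 0], [:right]], :c],
  [:c, nil, [[:right]],              :e],
  [:e, nil, [[:print, 1], [:right]], :f],
  [:f, nil, [[:right]],              :b]
]
```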

Parsing An Instruction

Writing a parser for the ASCII instruction format is a fun little exercise. As a Turing machine is a finite-state machine, it should be no surprise that this specification grammar forms a regular language (which can be recognized by a finite-state machine). This means that we can parse it with regular expressions—without having two problems! (Actually, we’ll use the comma separation to make tokenization a bit easier with String#split, but we’re getting ahead of ourselves.)

We’ll use a Turing::Parser class to parse the entire specification. It will split lines and delegate to a Turing::Parser::Line class to parse individual instruction lines. This in turn will need to parse (lex) individual tokens into a Ruby representation. We’ll start there:
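A minimal sketch of the token lexer (the regex details and the bare method form, rather than a method on the Line class, are my assumptions):

```ruby
# Convert a single token into its Ruby representation:
# "None" becomes nil, a digit becomes an Integer, and any
# other character is interned as a Symbol.
def parse_token(token)
  case token
  when "None"    then nil
  when /\A\d+\z/ then Integer(token)
  else                token.to_sym
  end
end
```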

This will take a token like “None” and return its Ruby representation, in this case nil. Bits will be converted to Ruby Integers (close enough). Characters will be interned (converted to Ruby Symbols).

Next, we’ll deal with the operations. We can write a simple string scanner that will lex the operation string and turn it into a list of operation tuples (as described above).
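Here is one way such a scanner might look, using Ruby’s StringScanner from the standard library (the implementation details are my assumption; parse_token is repeated so the example stands alone):

```ruby
require "strscan"

# Token lexer as sketched earlier: "None" -> nil, digit -> Integer, else Symbol.
def parse_token(token)
  token == "None" ? nil : (token =~ /\A\d+\z/ ? Integer(token) : token.to_sym)
end

# Lex an operation string such as "P0 R" into a list of
# operation tuples: [[:print, 0], [:right]].
def parse_operations(string)
  scanner = StringScanner.new(string)
  operations = []
  until scanner.eos?
    if scanner.scan(/P(\S)/)
      operations << [:print, parse_token(scanner[1])]
    elsif scanner.scan(/E/)
      operations << [:erase]
    elsif scanner.scan(/L/)
      operations << [:left]
    elsif scanner.scan(/R/)
      operations << [:right]
    else
      scanner.getch # skip whitespace between operations
    end
  end
  operations
end
```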

Now we can parse each instruction line by converting tokens with parse_token and converting the operations with parse_operations (which, of course, uses parse_token itself to lex the tokens it scans):
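Combining the two helpers, a line parser could look roughly like this (comma-separated fields per the grammar; the helpers are repeated so the example runs standalone):

```ruby
require "strscan"

# Helpers sketched in the previous steps.
def parse_token(token)
  token == "None" ? nil : (token =~ /\A\d+\z/ ? Integer(token) : token.to_sym)
end

def parse_operations(string)
  scanner = StringScanner.new(string)
  operations = []
  until scanner.eos?
    if    scanner.scan(/P(\S)/) then operations << [:print, parse_token(scanner[1])]
    elsif scanner.scan(/E/)     then operations << [:erase]
    elsif scanner.scan(/L/)     then operations << [:left]
    elsif scanner.scan(/R/)     then operations << [:right]
    else  scanner.getch # skip whitespace
    end
  end
  operations
end

# Split an instruction line on commas and convert each field.
def parse_line(line)
  state, symbol, operations, next_state = line.split(",").map(&:strip)
  [parse_token(state), parse_token(symbol),
   parse_operations(operations), parse_token(next_state)]
end
```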

Using A Struct For Instructions

We can add a bit of semantic value to the tuples we use to represent machine instructions by wrapping them in a struct. In this way, the struct is functioning like an intermediate data type. The struct will also make it easier to unpack specific values out of the tuple. Structs also have the useful property of duck typing as arrays (through the use of to_ary). You can even splat them, which will come in handy later.
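A plausible version of such a struct (the field names are my assumption; the to_ary alias makes the array duck typing explicit):

```ruby
# Wrap the instruction tuple in a Struct for semantic access.
Instruction = Struct.new(:state, :symbol, :operations, :next_state) do
  alias_method :to_ary, :to_a # lets the struct duck type as an array
end

inst = Instruction.new(:b, nil, [[:print, 0], [:right]], :c)
inst.state                          # => :b
state, _symbol, ops, _next = *inst  # splatting works, too
```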

Parsing The Machine Specification

Now that we can parse an individual line, parsing the entire specification is just a matter of mapping lines to their parsed version:
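The mapping might look like this (blank-line handling is my assumption; the earlier helpers are repeated so the example runs standalone):

```ruby
require "strscan"

# Helpers from the previous sections.
def parse_token(token)
  token == "None" ? nil : (token =~ /\A\d+\z/ ? Integer(token) : token.to_sym)
end

def parse_operations(string)
  scanner = StringScanner.new(string)
  operations = []
  until scanner.eos?
    if    scanner.scan(/P(\S)/) then operations << [:print, parse_token(scanner[1])]
    elsif scanner.scan(/E/)     then operations << [:erase]
    elsif scanner.scan(/L/)     then operations << [:left]
    elsif scanner.scan(/R/)     then operations << [:right]
    else  scanner.getch
    end
  end
  operations
end

def parse_line(line)
  state, symbol, operations, next_state = line.split(",").map(&:strip)
  [parse_token(state), parse_token(symbol),
   parse_operations(operations), parse_token(next_state)]
end

# Parse a full machine specification: one instruction per line,
# skipping any blank lines.
def parse(specification)
  specification.lines.map(&:strip).reject(&:empty?).map { |line| parse_line(line) }
end
```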

That’s (Not) All, Folks

In upcoming parts, we’ll use these instructions to form the basis of our Turing machine’s configuration system; we’ll implement the machine’s tape and tape head, which will allow us to step through its execution; and we’ll add a simple renderer that will print out the machine’s operations so we can follow along.