cl: Combinatory Logic Interpreter

version 1.6

last revision: 2011-06-15

Bruce Ediger

Table of contents

  1. Introduction
  2. Starting
  3. Using
  4. Built-in Primitives
  5. Bracket Abstraction
  6. Building and installing

Introduction

This document describes how to build and use cl v1.6. cl interprets a programming language with a lot of similarities to various "Combinatory Logic" (CL) formal systems. It doesn't exactly interpret any "Combinatory Logic" in that it runs on computers with finite CPU speed and a finite memory. Most or all formal systems fail to take these limits into account.

Release 1.6 fixes a few bugs, tries to improve performance by eliminating code from the most common execution path and adds a way to stop a reduction when the result matches a pattern.

I started writing this interpreter as a way to test the "graph reduction" implementation of a lambda calculator. Along the way, I bought a copy of Raymond Smullyan's To Mock A Mockingbird. While working that book's problems, I ended up making cl into a reasonably complete and usable system.

Another page documents the design and implementation of an earlier version of cl. That document still holds as a good description of the internals of cl.

Starting the interpreter

After building the interpreter's executable, you can start it from the command line:

 7:57AM 87 % ./cl
CL>

The interpreter uses "CL>" as its prompt for user input. cl has a strict grammar, so you must type either a term for reduction, or an interpreter command, or a command to examine a term.

You have to use keyboard end-of-file (usually control-D) to exit cl.

Giving the interpreter a CL term causes it to execute a read-eval-print loop. After reading and parsing the input, the interpreter prints the input in a minimally-parenthesized representation, reduces it to normal form, and prints a text representation of the normal form. It prints a prompt, and waits for another user input.

cl does "normal order" evaluation: the leftmost outermost redex gets evaluated first. This seems like the standard for CL, unlike for lambda calculus, where a lot of ink gets expended distinguishing between "normal order" and "applicative order".

Command line options

-cenable reduction cycle detection
-ddebug contractions
-L filenameInterpret a file named filename before reading user input
-mon exit, print memory usage summary
-N numberperform up to number contractions
-pDon't print any prompt.
-ssingle-step contractions
-T numberevaluate an expression for up to number seconds
-ttrace contractions
-C combinatortreat combinator as a non-primitive. Combinator one of S, K, I, B, C, W, M, T, J
-B algorithmUse algorithm as default for bracket abstraction. One of curry, curry2, church, tromp, grz, btmk, turner

The -s option has no use without the -t option, but -t alone can perform handy tasks.

You can use one or more -C combinator options to confine yourself to a given basis, or turn off the K combinator to work in λI calculi.

-L filename can occur more than one time. cl will interpret the files in the order they appear on the command line. After the last (or only) file, it prints the CL> prompt, then waits for interactive user input.

Using the interpreter

Interactive input

I designed cl for use as an interactive system, with a user typing CL expressions at a prompt. The interpreter reduces the expression to a normal form (if it has one), or hits some other limit, like a pre-set timeout, count of allowed contractions or the user's patience.

The built-in prompt for input is the string "CL>". It appears when the interpreter starts up, or has finished reducing whatever expression the user gave it last, or it has executed an interpreter command.

You have to type an end-of-file character (almost always control-D) to quit, as it has no built-in "exit" or "quit" command.

A keyboard interrupt (almost always control-C) can interrupt whatever long-running reduction currently takes place. A keyboard interrupt at the "CL>" prompt will cause the interpreter to exit.

Non-interactive input

The -p command line option causes the interpreter to not print a prompt. This might only have use for non-interactive input. The interpreter does read from stdin and write to stdout. You can use it as a non-interactive "filter", with input and output redirection.

Grammar, briefly

Expressions consist of either a single term, or two (perhaps implicitly parenthesized) terms. Terms consist of either a built-in primitive or a variable, or a parenthesized expression.

Built-in pimitives have names consisting of a single upper-case letter. Variables (which can also serve as abbreviations) can look like C or Java style identifiers: a letter, followed by zero or more letters or underscores. You cannot define a variable (or an abbreviation) with the same name as a built-in pimitive.

The interpreter treats primitives and variables as "left associative", the standard in the Combinatory Logic literature. That means that an expression like this: I a b c d ends up getting treated as though it had parentheses like this: ((((I a) b) c) d)

To apply one complex term to another, the user must use parentheses. Applying W (W K) to C W would look like this: (W (W K)) (C W).

Parentheses

Users can parenthesize input expressions as much or as little as they desire, up to the limits of left-association and the meaning they wish to convey to the interpreter. The grammar used by cl does not allow single terms inside paired parentheses. It considers strings like "(I)" as syntax errors. You have to put at least two terms inside a pair of parentheses, and parentheses must have matches.

The interpreter prints out normal forms in minimal-parentheses style. Users have the ability to cut-n-paste output back into the input, as output has valid syntax. No keyboard shortcuts exist to re-use any previous output.

Built-in Primitives

I built-in 9 primitive combinators. They contract like this:

I a → a
K a b → a
S a b c → a c (b c)
B a b c → a (b c)
C a b c → a c b
W a b → a b b
T a b → b a
M a → a a
J a b c d → a b (a d c)

This set of built-ins lets you use {S, K}, {S, K, I}, {B, W, C, K}, {B, M, T, K} bases for λK calculi, or {J, I}, {B, C, W, I} and {I, B, C, S} as bases for λI calculi.

Built-in combinators require a certain number (one to four) of arguments to cause a contraction. They just sit there without that number of arguments.

You can "turn off" any of the nine combinators as built-ins with a -C X command line option (X is any of the nine built-in combinators). No interpreter command exists to turn off or on a combinator during a session.

Bracket Abstraction

"Bracket abstraction" names the process of creating a CL expression without specified variables, that when evaluated with appropriate arguments, ends up giving you the original expression with argument(s) in the place of the specified variables.

The cl interpreter uses the conventional square-bracket notation. For example, to create an expression that will duplicate its single argument, one would type:

CL> [x] x x

You can use more than one variable inside square brackets, separated with commas:

CL> [a, b, c] a (b c)

The above square-bracketed expression ends up working through three bracket abstractions, abstracting c from a (b c), b from the resulting expression, and a from that expression. You can nest bracket abstractions: [a][b][c] a (b c) should produce the same expression as the example above.

A bracket abstraction makes an expression, so you can use them where ever you might use any other simple or complex expression, defining an abbreviation, a sub-expression of a much larger expression, as an expression to evaluate immediately, or inside another bracket abstraction. For example, you could create Turing's fixed-point combinator like this:

CL> def U [x][y] (x(y y x))
CL> def Yturing (U U)

Note the use of nested bracket abstractions. The abstraction of y occurs first, then x gets abstracted from the resulting expression.

You could express [x][y] (x(y y x)) as [x,y] (x(y y x)). The same nested abstraction occurs.

Algorithms

cl offers seven bracket abstraction algorithms:

You can set a "default" algorithm with the this command: abstraction name. For name, substitute one of the seven algorithm names above. cl starts with curry as the default bracket abstraction algorithm.

You can specify the abstraction algorithm next to the abstracted variable: CL> [x]btmk (x (K x))

cl's grammar allows mixing algorithms, one for each pair of square brackets. The resulting expression may not make any sense.

Bracket Abstraction Algorithms

curry

  1. [x] x → I
  2. [x] N → K N    x does not appear in N
  3. [x] M N → S ([x] M) ([x] N)

curry2

  1. [x] x → I
  2. [x] N → K N    x does not appear in N
  3. [x] M x → M    x does not appear in M
  4. [x] M N → S ([x] M) ([x] N)

Hindley & Seldin call this "Definition 2.18" in the 2008 edition of Lambda Calculus and Combinators: An Introduction. This algorithm has the distinct advantage of giving back a single combinator when you ask for that combinator's definition as a bracket abstraction. [p,q,r] p r (q r) works out to S, and so forth.

btmk

I put in an algorithm that uses B, T, M and K combinators. I devised it, but presumably someone else has come up with this same algorithm in the past. The {B, T, M, K} basis is not unknown.

  1. [x] x → B (T M) K
  2. [x] Z → K Z          x not appearing in Z
  3. [x] Q x → Q           x not appearing in Q
  4. [x] Q P → B Q ([x] P)     x appears only in P, not in Q
  5. [x] Q P → B (T P) ([x] Q)     x appears only in Q, not in P
  6. [x] Q P → B (T (B (T [x]P) (B B [x]Q))) (B M (B B T))      x appears in both Q and P

grz

cl includes an implementation of M. Price and H. Simmon's reconstruction of Grzegorcyzk's "g" algorithm, using B, C, K, W, I combinators.

  1. [x] x → I
  2. [x] Z → K Z          x not appearing in Z
  3. [x] Q x → Q           x not appearing in Q
  4. [x] Q P → B Q ([x] P)     x appears only in P, not in Q
  5. [x] Q P → C ([x]Q) P     x appears only in Q, not in P
  6. [x] Q P → W((B(C([x]Q)))([x]P))     x appears in both P and Q

If you consider W K as a replacement for I, you can get away with using B, C, K and W.

church

Abstraction for I,J basis from A. Church, A Proof of Freedom From Contradiction, Proceedings of National Academy of Sciences, Vol 21, 1935, page 275.

  1. [x] x → I
  2. [x] U V → J (J I I) ([x]V) (U I J) where x appears only in V
  3. [x] U V → J (J I I) V [x]U where x appears only in U
  4. [x] U V → J (J I I) (J I I) (J I (J (J I I) (J I I) (J (J I I) [x]U (J (J I I) [x]V J)))) where x appears in U and V.

The {I,J} basis constitutes a "λI" basis. All well-formed abstractions contain the abstracted variable in the "body" of the abstraction. The above algorithm will not work given input like [x,y]x, where the abstracted variable y does not appear in the expression out of which y gets abstracted. In fact, that input crashes the interpreter.

tromp

John Tromp created a nine rule algorithm (in Binary Lambda Calculus and Combinatory Logic) which optimizes for the minimum number of S and K combinators in the abstracted expression. Tromp wants to get a small self-interpreter, and he confines himself to just S and K combinators. For example, the "S K M → S K" rule could change to "S K M → K I", if you have an I primitive.

  1. [x](S K M) → S K (For all M)
  2. [x] M → K M (x not appearing in M)
  3. [x] x → I
  4. [x] M x → M (x not appearing in M)
  5. [x] x M x → [x] (S S K x M)
  6. [x] (M (N L)) → [x] (S ([x] M) N L) (M, N combinators)
  7. [x] ((M N) L) → [x] (S M ([x] L) N) (M, L combinators)
  8. [x] ((M L) (N L)) → [x](S M N L) (M, N combinators)
  9. [x] M N → S ([x] M) ([x] N)

Phrases like M, N combinators in rules 6, 7 and 8 mean "M, N have no variables, free or otherwise". This abstraction algorithm has the goal of smallest resulting expression. For Tromp's purposes, any variable exists for the purpose of abstracting it out of the expression. These rules don't produce the smallest resulting expression if you abstract another variable from them.

I implemented rule 8 by doing a depth-first traversal of parts of parse trees to determine if the potential "L" portions of the rule equate. A faster method of doing that may exist.

turner

  1. [x] x → I
  2. [x] N x → N              x not appearing in N
  3. [x] N → K N              x not appearing in N
  4. [x] M N → C ([x]M) N       x appears only in M, not in N
  5. [x] M N → B M ([x] N)       x appears only in N, not in M
  6. [x] M N → S ([x]M) ([x]N)   x appears in both M and N

Defining abbreviations

define and def allow a user to introduce "abbreviations" to the input. Each time the abbreviation appears in input, cl makes a copy of the expression so abbreviated, and puts that copy in the input. No matter how complex the expression, an abbreviation still comprises a single term. Effectively, the interpreter puts the expanded abbreviation in a pair of parentheses.

def makes an easy-to-type abbreviation of define.

A name in interpreter input has the same format as a Java or C identifier: a letter, followed by zero or more letters, digits or underscores.

The reduce command produces an expression, just like a bracket abstraction. Unlike define or def, you can use reduce anywhere an expression would fit, as part of a larger expression, as part of an abbreviation, or as part of a bracket abstraction.

Information about expressions

print lets you see what abbreviations expand to, without evaluation, as does printc. The "=" sign lets you determine lexical equality. All combinators, variables and parentheses have to match as strings, otherwise "=" deems the expressions not equivalent. You can put in explicit reduce commands on both sides of an "=", otherwise, no evaluation takes place.

size and length seem redundant, but authorities measure CL expressions different ways. These two methods should cover the vast majority of cases.

Intermediate output and single-stepping

You can issue any of these commands without an "on" or "off" argument to inquire about the current state of the directive.

trace and debug make for increasingly verbose output. trace shows the expression after each contraction, debug adds information about which branch of an application it evaluates.

detect causes trace to also print a count of possible contractions (not all of them normal order reductions), and mark contractable primitives with an asterisk.

step on causes the interpreter to stop after each contraction, print the intermediate expression, and wait, at a ? prompt for user input. Hitting return goes to the next contraction, n or q terminates the reduction, and c causes it to quit single-stepping.

Reduction information and control

You can turn time outs off by using a 0 (zero) second timeout. Similarly, you can turn contraction-count-limited-evaluation off with a 0 (zero) count.

timer on also times bracket abstraction.

Some CL expressions end up creating a cycle: M M or W W W or B (W K) (W (W K)) (B (W K) (W (W K))). After a certain number of contractions, the interpreter ends up with an expression it has already encountered. If you issue the cycles on command, the interpreter keeps track of every expression in the current reduction, and stops when it detects a cyclical reduction.

detect on causes the interpreter to count and mark primitives eligible for contraction (with an asterisk), regardless of reduction order. It does "normal order" reduction, but ignores that for the contraction count. This only has utility with trace on.

Turning cycle detection on will add time to an expression's reduction, as will possible contraction detection.

Matching a pattern during reduction

cl v1.6 adds a way to control reduction: stop when the (partially) reduced expression matches a pattern. cl can perform pattern matching after each contraction, when the user sets a pattern using the match command. unmatch destroys the internal state created by match, so the user can't recall a previous match except by complete re-entry of the command.

Patterns look like expressions with one exception. An asterisk ('*') acts as a wildcard, matching any expression. Any other variable or built-in primitive ocurring in the pattern matches itself literally.

For example, issuing the command match S K K would cause the reduction of S (I K) (S K) K to stop after a single contraction.

Reading in files

You have to double-quote filenames with whitespace or non-alphanumeric characters in them. You can use absolute filenames (beginning with "/") or you can filenames relative to the current working directory of the cl process.

Canonical Expression Representation

The printc command, and cycle detection output use a canonical form of representing a CL expression.

In this case, "canonical" means: pre-order, depth-first, left-to-right traversal of the parse tree, output of a period (".") for each application, and a space before each combinator or variable.

A simple application (K I, for example) looks like this: .K I

A more complex application (P R (Q R)) looks like this: ..P R.Q R

The advantage to this sort of notation is that every application appears explicitly, and variant, semantically-equivalent parentheses don't appear.

Building and installing

I developed this program on Slackware and Arch Linuxes. I used versions of the C compilers GCC, LCC, PCC, TCC and Clang. I tried to make it as ANSI-C as possible by using various compilers, not allowing any compiler warnings when building, and using as many warning flags for GCC compiles as possible.

I expect it will build on any *BSD-based system, but I haven't tried that.

  1. Get the source code. You can:
  2. Unpack source code. From the command line: tar xzf cl-1.6.tar.gz
  3. Change directory. From the command line: cd cl-1.6
  4. Compile source code. From the command line: make gnu.
    That target ("gnu") should work for almost every linux or BSD computer. Other make targets might work better. make cc uses traditionally-unix-named tools, and may work better on Solaris. Traditional, AT&T derived lex will require a minor edit to file lex.l
  5. At this point, you can test the newly-compiled executable. From the command line: ./runtests. Most of the tests should run quite rapidly, in under a second. At least two of the tests run for 30 seconds or so, and at least one of the tests provokes a syntax error message from the interpreter.
  6. Install the interpreter if you want, or you can execute it in-place. To install, use the cp or mv commands to move or copy the executable to where ever you want it. It does not care what directory it resides in, and it does not look for configuration files anywhere.