Bruce Ediger

November 3, 2004


srf (Simple Recursive Functions) interprets a very simple programming language similar to Stephen Kleene's recursive functions. You can use srf to help understand recursive functions, or Peano arithmetic.

Command line

srf [-L filename [-L otherfile ...]] [-t timeout]

Once you've built and installed srf, you can start it at a shell prompt:

 9:54PM 3 % ./srf

For better or for worse, srf doesn't provide a prompt. You just enter things after starting it.

You can give it command line options. The -L option causes srf to read in a file or files before reading interactive input:

 9:55PM 4 % ./srf -L examples/numbers
load file named "examples/numbers"

You can use more than one -L option - srf reads them in order before allowing interactive input.

The -t option allows you to get srf to interrupt execution of long-running calculations after a certain number of seconds:

10:02PM 12 % ./srf -t 5 -L examples/functions
Unset value

srf reads from stdin, so you can also use it with shell redirection, or, bless your heart, as a filter.


The world needs another programming language like it needs a hole in the head. Why on earth did I choose to write this?

I wrote srf because I wanted to try out the assertions various text books made about recursive functions and Peano Arithmetic. The text books say that you can start with a few functions (zero, successor, identity or projection), a few ways to combine them (composition or substitution, primitive recursion, minimization) and end up with a fully functional system of arithmetic like we have all grown accustomed to. But the proofs never seem satisfying, depending on reductio ad absurdum and other abstract techniques. The proofs never build the real thing. I wanted to do what the text books said you can do with the primitive functions and methods of producting new functions. In this sense, you can use srf to make constructive proofs of the simple recursive nature of operations like addition, subtraction, exponentiation and so forth by implementing them.

The srf interpreter ends up right at the intersection of "programming language" and "computability".

When I was 12 or 13 I used to steal passwords out of trash cans at the local university, and log in to the Honeywell 1640 to play tic-tac-toe, or "lunar lander". Eventually I had the idea of doing my algebra homework in BASIC. The answers didn't come out as exact as I desired, so I wrote a program that looked like this:

10 A = 1
20 B = 3
30 C = A/B
40 D = C * 3
50 IF D <> 1 GOTO 30

I wanted to have a program calculate the exact value of one-third. The program above just looped endlessly. I had the crushing realization that I did not write a program that gave me the answer I wanted, and in fact I did not know how to write a program that gave me the answer I wanted. As I walked out of terminal room, I had the further crushing realization that whoever wrote the BASIC interpreter had known enough to write a programming language that allowed me to write some program that the interpreter writer would never even have thought to write. How did he/she/it/them know what to put into BASIC to allow this?

I've pondered this ever since. srf constitutes my cut at learning what to put in a programming language.


I chose to license srf under the GNU General Public License (GPL).

Download source code

I don't want to provide binaries. If you want to use srf you have to download the source code, build an executable and install it. Luckily, you don't have to do much to perform those actions.

Download source code


I used the GNU Autoconf/Automake system to help with portability. I tested srf with several lex and yacc variants, on at least three instruction set architectures. Nevertheless, I don't believe that srf will work under Windows.

The installation follows the usual, easy steps:

  1. gunzip -c srf-1.0.tar.gz | tar xf -
  2. cd srf-1.0
  3. ./configure
  4. make
  5. make check
  6. su
  7. make install

You don't really have to do the final two steps: srf makes no assumptions about the directory in which you have installed it. You can just copy the srf executable anywhere.

Development environment

I developed srf under SuSE 7.3 linux and Slackware 9.1 linux. I have compiled and tested srf with gcc 2.95, gcc 3.2.3 and lcc 4.1. I generated the lexer with flex and generated the parser with both byacc and yacc. I have compiled and tested srf under Solaris 9 with a SPARC CPU, using genuine lex and yacc, using Sun C 6U2. I compiled and tested both 32 and 64-bit versions. I have compiled srf on an HP PA-RISC CPU, under HP-UX 11.0. The variety of CPUs (x86, SPARC, PA-RISC) have both big- and little-endian addressing. The Sun compilers allowed me to test with 32-bit (ILP32) and 64-bit (LP64) code. The PA-RISC processor has the unusual convention of stack growing up (towards higher addresses) and heap growing down (towards lower addresses). I feel reasonably certain that no endianness or pointer size assumptions exist in the code.

Similar Work

One or more of the following may fit what you want or need to do better than srf does:

I found Dave Boozer's interpreter wanting: you have to have numerals to follow his identity (projection) function naming convention. My interpreter allows the programmer to do without numerals - he or she uses the return value of z() or s() or a built-up function.

Mike DePristo's Scheme definitions and Terry McConnell's C implementation seem very interesting but have the same underlying flaw: you can't truly verify that you only worked with built-in or built-up functions. You could mistakenly slip in some ordinary Scheme or C.

If you actually want to do proofs, rather than work with functions, perhaps Johan Granstrom's tt program would work for you.

Programming language


This programming language has next to no "design". I let Boolos, Burgess and Jeffrey, authors of "Computability and Logic" guide me nearly completely.

I did choose to make "statements" end in a semicolon, rather than just plain end-of-line, and I chose not to have symbols except at the very top level of interpretation. I made the latter decision based on enforcing purity on the programmer: I did not want to discover that I'd used a number symbol somewhere, violating basic axioms of Peano Arithmetic.

As a programming language, what srf interprets seems fairly crude. The interpreter understands three types: numbers, functions, and functors. Functions take zero or more numbers as arguments, and return a single number. Functors take functions as arguments, and return a single function.

Syntax and Semantics


srf considers all strings of non-whitespace and non-reserved characters as identifers. Reserved characters consist of left and right parentheses, left and right square brackets, equals sign, semicolon, comma, pound sign (octothorpe), and double-quotes. This lets the srf user name numerals the way that some writers do in expositions on Peano Arithmetic: 0, 0', 0'', 0'''.

Formal Grammar

Since yacc, byacc and bison all compile it, the following grammar counts as no more than LALR(1). It may fit in a much less complicated category, like LL(1), but I haven't investigated that at all.


TK_CONSTRUCTOR one of the strings "Mn", "Cn", "id" or "Pr"
TK_LOAD "load", used to read in a file of statements
TK_DEBUG "debug", used to turn debugging on in a specific function
TK_TIMEOUT "timeout", sets the interval srf interpreter lets a function run
TK_MODIFIER one of the strings "time" or "count"
STRING_CONSTANT A quoted string, used as a file name

The TK_CONSTRUCTOR terminal exists solely to keep function construction pure. Production 17 below forces the use of one of the "constructor" identifiers with square brackets in order to build up new functions from old. This use enshrines "Cn", "Pr", "Mn" and "id" as reserved keywords.


  1. program —> stmnt TK_SEMI

  2. program —> program stmnt TK_SEMI

  3. program —> error

  4. stmnt —> TK_IDENTIFIER TK_EQUALS value

  5. stmnt —> TK_IDENTIFIER

  6. stmnt —> function_call

  7. stmnt —> constructor


  9. stmnt —> TK_DEBUG value

  10. stmnt —> TK_TIMEOUT value

  11. stmnt —> ε empty production to allow for lonely ';'

  12. function_call —> TK_IDENTIFIER TK_LPAREN value_list TK_RPAREN

  13. function_call —> TK_MODIFIER TK_IDENTIFIER TK_LPAREN value_list TK_RPAREN

  14. function_call —> constructor TK_LPAREN value_list TK_RPAREN

  15. function_call —> TK_CONSTRUCTOR TK_LPAREN value_list TK_RPARENcatch error

  16. function_call —> TK_IDENTIFIER TK_LBRACKET value_list TK_RBRACKET

  17. constructor —> TK_CONSTRUCTOR TK_LBRACKET value_list TK_RBRACKET

  18. value_list —> value

  19. value_list —> value_list TK_COMMA value

  20. value_list —> ε    empty argument list

  21. value —> TK_IDENTIFIER

  22. value —> function_call

  23. value —> constructor

Productions 1 and 2 allow yacc-generated parsers to interpret input of more than 1 statement. Production 3 allows yacc-generated parsers to have a spot they can "unwind" to when they hit a syntax error. Production 3 basically allows usually correct recovery from syntax errors when using srf interactively.

Production 4 assigns a value (either numerical or a function) to an identifier. No numerals exist in srf without the programmer creating them. This production allows the creation of numerals.

Using production 4 to do something like this: "a = b;" causes srf to find the value previously assigned to "b" and then assign that value to a symbol "a". If "b" evaluates to a function, the function gets copied before assignment to "a".

Production 5 allows srf to print out the value of an identifier - it really has no mathematical reason for existance. If an identifier has a function as a value, srf prints out a representation that you can paste back in, provided you have assigned values to the right numerals.

Productions 8, 9 and 10 constitute the non-mathematical commands that srf understands.

Production 13 has great similarity to production 12. Production 13 turns on either a timer to measure function evaluation elapsed time, or a counter, to count the number of primitives (s(), z() and id[]) called during function evaluation.

Production 16 just catches a likely error: mistakenly using square brackets in an ordinary function call. srf only allows the use of square brackets when constructing new functions, handled by production 17.

Productions 18, 19 and 20 build up lists of formal parameters to functions. If an identifier appears in a parameter list, it gets evaluated almost immediately, and its value gets passed to the function in question.

Comments and whitespace in the interpreted language

The srf interpreter ignores everything from a pound sign ('#', a.k.a. octothorpe) to an end-of-line. You can have a comment line, or put comments at the end of a line of "code". srf ignores white space characters (space, tab, newline, etc). You can indent srf input as you please. You just have to end statements with semi-colons.

Built-in Functions

The srf interpreter starts with only 3 built-in functions:

One of the major flaws in srf lives in the these simple functions. z() should allow no arguments (or constitute a family of functions, like id), and s() should allow only one argument. With those additions, the interpreter could work out how many arguments a built-up function requires.

Many authors use some notation like id32 to name the function that returns the second of three arguments. As I limited myself to in-line text, I had to use some other notation. I chose to go along with composition and primitive recursion (below), using square brackets to set off the two "arguments" of id.

Even though the z and s functions don't check the number of arguments they get, s and id complain if they don't get enough arguments to do their defined tasks. In the case of calling s with no arguments, or a member of the id with too-few arguments, they return a value of 0, and print a message on stdout.

Ways to combine functions to produce other functions

All of the following take functions or identifiers of functions as arguments, and return a new function.


Also known as "substitution". srf has a way to make a function that calls one function, and then calls a second function with the result of the first. For example, typing s(s(X)); into srf adds two to the value of "X".

Typing in +2 = Cn[s, s]; yields a function named "+2" that adds 2 to the value of any argument you pass it.

Mathematicians often get sloppy about notation when talking about composition - they write f(g(x)) to mean "the function resulting from the composition of functions f and g". srf forces the programmer to make a distinction between composing functions f and g, and applying f to the result of g.

Primitive Recursion

Primitive recursion comprises one of two methods of "flow control" in the programming language interpreted by srf.

You need two functions to put together by primitive recursion: the function called when the recursed-on argument has zero as a value, and the function called with other arguments.

Note that srf's primitive recursion "recurses" on the final argument. Some authorities (Klaus Suttner, in his Computationally Discrete Mathematics) have the recursion take place on the first argument. Watch out.

You combine two functions with the Pr functor. One function gets called when the final argument of the created function has a value of zero. The second function gets called on all the other values.

Boolos, Burgess and Jeffery have it like this:

h(x, 0) = f(x, 0)
h(x', y) = g(x, y, h(x, y))
h = Pr[f, g]

So, for a subtract-as-much-as-you-can function, sub(x, y):

decr=Cn[pred,id[3,3]];  # This function decrements the third of three arguments
sub=Pr[id[2,1],decr];   # id[2,1](x, 0) yields value of x


Minimization comprises the second of two methods of "flow control" in the programming language interpreted by srf.

Basically, minimization involves find what arguments of a funtion cause that function to return zero. Most authorities have a minimization operator that takes a function of N arguments, and returns a function of N - 1 arguments. The N'th argument of the original function varies from 0 to whatever value makes the original function return zero. If the original function never returns zero, the minimized function never terminates.

Textbooks and class notes on the web seem singularly lacking in examples of why the system needs minimization, and what you can do with it. I develop a division algorithm below that uses minimization.

Minimization has a rather simple syntax. It requires a function of one or more arguments, and gives back a function that accepts one fewer arguments.

f = absdiff;
F = Mn[f];

Non-mathematical commands

The srf interpreter understands 3 non-mathematical commands:


You can have srf time a given function evaluation, or count the number of built-in primitives a function ends up calling. Just use time or count immediate preceding a function call:

11:13PM 39 % ./srf -L examples/numbers
load file named "examples/numbers"
+3=Cn[s, Cn[s, s]];
count +3(1);
primitive count: 3

The function named +3 ends up calling the s() built-in function 3 time, returning a value of 4.


These examples demonstrate aspects of the programming language interpreted by srf.

Set a symbol's value

Set a symbol's value to the numerical value of 3:

3 = s(s(s(z())));

Set a symbol's value to a function that computes the predecessor (1 less than) its argument:


Build up a function

If you have an algebraic function that looks as follows (pred and sub examples come in the source code distribution):

y = (x-3)2-1

You can build up a function in srf like this:

Create a function that square its argument:

square = Cn[ prod, id[1,1], id[1,1]];

First, pre-define a function that always returns three, and a function that always returns one:

three = Cn[s, Cn[s, Cn[s, z]]];
one = Cn[s, z];

Observe that the original algebraic function squares the quantity x-3, and remember that primitive recursive modified subtraction only returns values as small as zero. Choose to use absolute difference:

intermediate1 = Cn[square, Cn[absdiff, id[2,1], id[2,2]]];

Put intermediate1 together with a function that always returns 3:

intermediate2 = Cn[intermediate1, id[1,1], three ];

Finally, subtract 1 from whatever intermediate2 returns, but remember that you have to use a function returning 1:

y = Cn[sub, Cn[intermediate2, id[1,1]], one ];

Division using Minimization

The usual arithmetic definition of division goes like this:

N/D = Q + R/D

You know natural numbers N (numerator), D (denominator), up front, and you find Q (quotient) and possibly R (remainder), where R < D.

You can manipulate the above equation algebraically to get this:

0 = QD + R - N

You still want to find Q and R, but now you can see the possibility of using srf's Mn functor. In Peano arithmetic, you can only really define a modified subtraction (variously referred to a "subtract as much as you can", "modified subtraction" or "monus") that returns 0 when either a = b or when b > a. Luckily, you can easily calculate an absolute difference: (a-b)+(b-a), which will return the absolute value of the difference between a and b. Phrasing the rearranged definition of division with absolute difference:

0 = absdiff(QD + R, N)

If we decide we only care about finding Q, we can let primitive recursion take care of the R < D requirement.

We define two functions for use in a Pr functor:

f(N, D, Q) = absdiff(Q*D, N)
g(N, D, Q, R, y) = min(absdiff(Q*D+R, N), y)
h(N, D, Q, R) = Pr[f, g]

R has to stay less than D, so we wrap h like this:

j(N, D, Q) = h(N, D, Q, D)

At this point, we can apply Mn to function j. Mn will give back a function that, when invoked, supplies values 0, 1, 2, 3... for the third argument of j until j returns 0. The caller supplies values for N and D.

div = Mn[j];

When you invoke div(10,5);, the srf interpreter starts with a guess of 0 for the quotient. It finds the minimum value of absdiff(Q*D+R, N), starting with R of 0 and working up to the value of D. That ends up as 6, when R has 4 as a value. N always has a value of 10. Six does not equal 0 numerically, so srf (via its C function minimizor()) tries 1 as a guess for the quotient. absdiff(Q*D+R, N) ends up with a minimum value of 1, again when R has 4 as a value. 1 does not equal 0, so srf tries a guess of 2. Finally srf ends up with a minimum value of 0, which causes Mn to terminate and return 2, the correct answer.

Interpreter Design

I sincerely doubt that srf breaks any ground in interpreter design or implementation. I just wish to give a sketch of its implementation, and note pieces that worked well, and peices that didn't.

I wrote the srf interpreter in as close to ANSI C (C89) as I could get. I believe it uses some signals not defined for ANSI C, but you should still find it fairly portable (except to Windows).

flex and byacc/bison/yacc

The interpreter uses lex to break up its input stream into tokens, and it uses an LALR(1) grammar (yacc, byacc or bison compatible) to parse the input.

I consider this part mainly a success. The lexer seemed very easy to write. I feel that writing the regular expression to recognize "identifiers" constituted the worst part. I wanted to match the different forms of "numeral" that various text books and papers used: 0, 0', 0'', 0''' ... or 0, s0, ss0, sss0 ....

I let the yacc-style grammar structure the internals of the interpreter. All the work of the interpreter happens in the actions associated with the productions of the grammar. srf has such a simple syntax that the productions don't build up a parse tree before execution. I find that the productions necessary to cause the generated parser to keep reading input after a single statement (productions 1, 2 and 3 above) constitute the only confusing parts of a grammar.

Data structures


Internally, srf uses the "Atom" interface from C Interfaces and Implementations, by David Hanson, ISBN 0-201-49841-3. I implemented that interface via a hashtable: all strings used by the program as identifiers for values get stored in the hashtable.

The general idea of the "Atom" interface has C programs treating pointers to strings as the strings themselves by always looking up a string in some data structure. After the program looks up a string, it gets back the pointer to that string The pointer gets used to do numerical compares etc instead of byte-wise compares via strcmp. I didn't get much mileage out of the "Atom" interface: srf doesn't do many strcmp() calls internally. The "Atom" interface does centralize memory allocation and deallocation for strings.

Hash table

srf uses a C implementation of Per Ake Larson's dynamically expandable hashtable. You can find a description of this hashtable in: "Dynamic Hash Tables" by Per-Ake Larson, Communications of the ACM, April 1988, volume 31, number 4.

I used Dan Bernstein's DJB2 hashing algorithm.

A single instance of this hashtable holds both strings (as Atoms) and structs value named by strings.

Representation of values

Given the grammar above, I had to represent identifiers, numerical values, filenames and functions with a single struct to make any of the yacc family (yacc, byacc, bison) of parser generators happy. I ended up with struct value, below.


struct value {
    enum valueType type;
    union {
        const char *identifier;
        const char *filename;
        int value;
        struct function *function;
    } v;
    int assigned;
    struct value *next;

The assigned element marks a value as bound to an identifier. Internally, the C functions that evaluate srf functions or identifiers call a deallocate function on whatever structs value the C functions get passed. The deallocation function evaluates the assigned element to decide whether or not to truly deallocate the struct value in question.

The next element allows a "fast" allocation of structs value. The function internal to srf that returns new structs value keeps a stack of previously allocated (from malloc()), and now unassociated with either an identifier or an intermediate calculation, structs value. The next element chains those unassociated structs value in srf internals.

Representation of functions


struct function {

	struct value *(*function)(struct function *, struct value_list *);

	int debugflag;

	enum builtinType type;

	struct {
		int count;
		int index;
	} identity_data;

	struct {
		struct function *first;  /* pointer to "outermost" function */
		struct function **rest;   /* array (not list!) of inner function(s) */
	} composition_data;

	struct {
		struct function *zero;
		struct function *increment;
	} recursion_data;

	struct {
		struct function *function;
	} minimization_data;

	/* unused function pointers kept on a "free list" */
	struct function *next;

Any function of interest will comprise a vast tree of structs function. Even something as simple as "add 2" (Cn[s, s]) will involve 3 structs function, one to hold each of two copies of s(), and another to hold those two in its composition_data field.

Function evaluation takes place via mutually-recursive calls to the C-language functions composer, recursor, minimizor, funneled through evaluate_function. Any built-up function in srf consists of a tree, where leaf nodes consist of calls to built-ins, and interior nodes consist of calls to composer, recursor and minimizor. No matter how elaborate a forest of structs function comprises some function, evaluation consists of a depth-first traversal of the tree.

In retrospect, it seems like an error that I didn't use a union to make struct function smaller in-memory. All the data for the different "types" of functions got spread out all over the place. The type element of struct function allows distinguishing between different "types" of functions, should one choose to make them all members of a union element. I believe that I added the type element late in development. Early in development, I used NULL values to distinguish which "type" of function should execute.

Again, the next element allows the interpreter to keep unused structs function on a "free list", to avoid having to call malloc every time the interpreter needs a struct function.

The values of enum builtinType should go far in explaining how srf keeps straight whether to call a "builtin function" (SUCCESSOR, ZERO, IDENTITY), construct a new function (CONSTRUCTOR) or run one of the "flow-of-control" constructs (COMPOSITION, PRE, MIN). The function member of struct function contains a pointer to a C function for each of these things to do.

Recursion by iteration

srf does primitive recursion by iteration, internally. I believe that the definition of primitive recursion used by Boolos, Burgess and Jeffrey encourages this, it amounts to tail-call recursion. Primitive recursive calculations get performed "on the way back up" the recursion stack - primitive recursion works its way down to zero, and then does all calculation starting with zero. In C, this would amount to performing any calculation with the return value of the recursive call. This seems contrary to usual C-programmer's practice of performing some calculation, then passing the value to the recursive call, and returning the return value of the recursive call.

Error productions in grammar

The grammar has a few productions to catch common errors like trying to construct a function from a user-defined function, or calling a constructor as a regular function.

The interpreter passes "null" values when an identifier of a numerical value gets used in a function-call context, and vice versa, or if it finds that an identifier doesn't exist in its hashtable. The interpreter represents a "null" value by a struct value that has a type element with value of UNSET.

rfunc - simple compiler

rfunc constitutes a very simple compiler, basically transliterating ordinary algebraic statements, plus an if-then-else construct, into function definitions understood by srf.

rfunc grammar

  1. program —> stmnt ';'
  2. program —> program stmnt ';'
  3. stmnt —> additive_expr
  4. stmnt —> "if" '(' relation ')' stmnt "else" stmnt
  5. relation —> additive_expr LOGICAL_OP additive_expr
  6. additive_expr —> multiplicative_expr
  7. additive_expr —> additive_expr '+' multiplicative_expr
  8. additive_expr —> additive_expr '-' multiplicative_expr
  9. multiplicative_expr —> value
  10. multiplicative_expr —> multiplicative_expr '*' value
  11. multiplicative_expr —> multiplicative_expr '/' value
  12. function_call —> identifier '(' value_list ')'
  13. value_list —> additive_expr
  14. value_list —> value_list ',' additive_expr
  15. value —> constant
  16. value —> identifier
  17. value —> function_call
  18. value —> '(' additive_expr ')'

You use C-style identifiers in rfunc input. During translation, rfunc turns these into formal arguments to recursive functions. For example, if you write an algebraic statement using variables a, b, c, rfunc assumes that you want them in alphabetical order as arguments, and that the recursive function takes 3 and only 3 arguments.

I wrote rfunc in a most horrible fashion: it leaks memory, it should share code with srf etc etc. It merely solved a problem I had at hand, writing the prime() predicate for srf.

rfunc examples

The following illustrates how to write a "max" function. Note that rfunc does not produce a prompt when before it waits for input.

10:46PM 13 % ./rfunc
if (a > b) a else b;
Cn[ sum,Cn[ prod,Cn[greaterthan, id[2,1], id[2,2]], id[2,1]], Cn[ prod,
Cn[lessthan_or_equalto, id[2,1], id[2,2]], id[2,2]]];

The conditional has to contain "<", ">", "<=", ">=", "==" or "!=". The file examples/relations in the source distribution has functions that match what rfunc compiles those conditionals to.

Need to write some kind of algebraic function? rfunc can do that:

10:51PM 14 % ./rfunc
(m*m + 2*m*n + n*n -m - 3*n + 2)/2;
Cn[ div,Cn[ sum,Cn[ sub,Cn[ sub,Cn[ sum,Cn[ sum,Cn[ prod,id[2,1], id[2,1]],
Cn[ prod,Cn[ prod,Cn[s, Cn[s, z]], id[2,1]], id[2,2]]], Cn[ prod,id[2,2],
id[2,2]]], id[2,1]], Cn[ prod,Cn[s, Cn[s, Cn[s, z]]], id[2,2]]],
Cn[s, Cn[s, z]]], Cn[s, Cn[s, z]]];

The algebraic operators +, -, * and / get transliterated into calls to functions named "sum", "sub", "prod" and "div" respectively. The examples/ subdirectory of the source distribution has such functions. You should take care when using "-" in rfunc, as the primitive recursive subtraction function only returns values as small as zero - it does not return negative numbers. Sometimes using absdiff makes the difference between a correct and an incorrect function.

Further work

In no particular order:

Recursive Function Resources

I'm open to suggestions about other books, I just don't have much money to spend on them. I see that Computability and Logic gets a lot of criticism for its plethora of typos.