A Programmer's Introduction to Russell

H. Boehm, A. Demers, J. Donahue

Our intent is to provide a highly informal, but hopefully comprehensible, description of the language Russell. No attempt is made to provide a complete definition of the language; it is assumed that the online rhelp facility [Boe 85] is available for this purpose. This description is aimed primarily at a reader who is about to start using the language. It includes discussion of both fundamental concepts and relatively superficial (but nonetheless important) syntactic issues. The reader who is interested only in conceptual issues might prefer to read [Don 85].

Russell is a very compact expression language which is nevertheless extremely general and uniform in its treatment of functions and, more importantly, data types. By the term expression language we mean that there is no distinction between expressions and statements. Any executable construct in the language returns a value. In particular, assignment statements and conditionals return values, which may or may not be used. By the same token, most constructs in the language can also potentially modify the value of some variable, and thus play the role of statements.

We'll start with a very simple example which will serve to illustrate a few of the points which follow. In keeping with tradition, we will use a declaration of the factorial function:

    1   fact ==
    2       func [ n : val Short ] val Short
    3       {
    4           if
    5               n > 0 ==> n * fact[ n - 1 ]
    6           #   n = 0 ==> 1
    7           fi
    8       }

Russell has a completely uniform declaration mechanism. An identifier a can be bound to a simple value, a function, a type, or a variable with a declaration of the form a == b. This has the effect of binding the identifier a to the value produced by the expression b. We could use b == 0 to declare b to be equivalent to the integer constant 0. (This is completely distinct from declaring an integer variable. The identifier b is purely a value; it does not make sense to assign to it.) In this case, the expression appearing in the declaration is an explicit function construction (abstraction). Its value is the factorial function whose body consists of lines 4-7.

The heading of the function construction (line 2) indicates that it is a function mapping a single short (16 bits in the Vax implementation, 32 bits in the Sun implementation) integer value n to a short integer value. If we had wanted a factorial function that could produce arbitrarily large results without overflowing, the heading should have been:

    2       func [ n : val Long ] val Long

The body of the fact function is an expression whose value will be returned as the value of a function call. In this case the expression produces a simple integer value. In general, the body of a function may be any expression; it could just as well produce another function or a type. Similarly, it might accept a function or type argument.

Here the body consists of a conditional, written in a slightly extended version of Dijkstra's guarded command notation [Dij 76]. (More conventional, Algol-like syntax is also allowed.) Since Russell is an expression language, conditionals return a value; in this respect Russell conditionals resemble Algol conditional expressions. The conditional here should be interpreted as follows: if the first guard is true, i.e. n is greater than 0, then the value returned is the value of the expression n * fact[n - 1]. Note that arguments to function applications (calls) are enclosed in square brackets rather than parentheses.
Function application syntax is actually quite flexible; brackets may frequently be omitted. This is discussed below. If the second guard is true, then the integer 1 is returned. If neither guard is true, a runtime error occurs. If two or more guards of some conditional are all true, then any one of the corresponding expressions may be chosen to be evaluated. Assuming n >= 0, the conditional could also have been written as

    if n > 0 ==> n * fact[ n - 1 ] # else ==> 1 fi

or as

    if n > 0 then n * fact[ n - 1 ] else 1 fi

So far, we have not introduced any variables or assignments. The identifier n denotes a fixed value, which cannot be changed. In order to look at variables, we first have to examine the notion of type.

Russell was designed to explore a particular view of data types. Traditional programming language designs typically fall into one of two categories. Languages like LISP impose essentially no type structure which could be checked by a compiler; a typical LISP compiler would allow any argument to be passed to any function or operator. (Executing the function call may of course result in a runtime error.) This allows a great deal of flexibility, but many trivial errors are likely to go undetected, at least until that section of the program is actually executed. It also makes the compiler's job much more difficult, since it cannot necessarily tell in advance whether a + should be compiled to, say, an integer or a floating point add instruction. Languages like Pascal solve these problems by imposing a very rigid type structure. A Pascal compiler would object to an expression such as

    3 * False

Unfortunately this results in a loss of generality. Such languages make it particularly difficult to write general purpose, reusable software. It does not suffice to write one implementation of binary search trees; a different one must be provided for each kind of data to be stored in the leaves. This can usually be avoided by defeating the type structure, which, at least partially, negates the purpose of having the type structure in the first place. A number of more recent language designs try to combine the flexibility of dynamically typed languages, such as LISP, with the reliability and efficiency resulting from compile-time type checking. Russell is one such polymorphically typed language.

The fundamental view underlying Russell is that all objects manipulated by a Russell program are in fact elements of one universal domain or set. We may, if we wish, think of this set as being the set of all bit sequences. Thus we can identify an integer with its binary representation, a character string with a sequence of bits representing the ASCII codes for the characters, and a function with a sequence of bits representing the code to be executed to evaluate the function, possibly followed by another sequence of bits describing the environment in which the code is to be executed. Thus the object itself gives us no information about how to interpret it. The same bit string could represent either a character string or a function mapping integers to integers. (For a much more abstract view of such a domain see, for example, [Sto 77].) What distinguishes one data type from another is not the set of its elements, but rather the set of operations (or functions) that may be applied to interpret elements of the universal set. In fact, a data type is just that collection of operations. Thus the Boolean type is just the collection of functions and, or, not, ... .
What makes a value an integer is not any characteristic of the value itself, but the fact that it is intended to be interpreted by exactly these functions. Many programming languages allow users to treat functions as data objects in at least some contexts, e.g. as parameters. Since Russell data types are just collections of functions, they can similarly be treated as data objects. Furthermore, since both functions and data types are again part of the same universal data domain as other objects, this can be done uniformly in all contexts.

Compile-time type checking is performed by associating a syntactic type, or signature, with each expression. Such a signature specifies in which contexts an expression may appear. Signature checking (type checking translated to Russell terminology) insures that an expression does not appear in a context in which it is not meaningful. Thus the value produced by an expression cannot be misinterpreted. For example, assume we are representing integers using their binary representation. Furthermore we represent the Boolean constants True and False as the bit strings 1 and 0 respectively. Signature checking in Russell will prevent us from writing the expression

    3 * False

in spite of the fact that it produces a well defined answer. Clearly the 0 value produced by the expression False is intended to be used in a Boolean context, and is not intended to be treated as an integer. This is reflected by assigning the expression False the signature val Boolean (Boolean value). The signature of * is

    func [val Short; val Short] val Short

(Actually there are several instances of * with different signatures. Each of the types Short, Long, and Float has a separate * operation. They provide 16-bit integer, unlimited size integer, and floating point multiplication, respectively.) This signature indicates, among other things, that * is a function whose arguments must have signatures val Short, and thus must be interpretable as short (i.e. 16 bit) integers. Since the parameter signature for * (val Short) does not match the signature of the second argument expression (val Boolean), the above expression will be rejected by the compiler.

It is important to distinguish signatures, which are associated with expressions and are purely syntactic objects, from types, which are collections of functions. Short is a builtin type, which contains operations such as *. It is convenient to use the same identifier (or expression, in general) inside signatures, to indicate appropriate uses of an expression.

It is much easier to write applicative (variable-free) programs in Russell than in most conventional languages. Our experience indicates that small Russell programs tend to naturally be applicative. Nonetheless, it is easy to introduce variables if one prefers. We illustrate this by rewriting the fact function so that it uses a loop rather than recursion. (Clearly loops are not useful without variables.) Variables, like functions and types, are themselves data objects. They identify a location where other values can be stored. A short integer variable N can be declared by means of a declaration such as

    N == Short$New[ ]

The type Short is, like all Russell types, nothing but a collection of functions. One of these functions is New. It allocates a new integer variable and returns it. Thus its signature is

    func [ ] var Short

The $ symbol is used to indicate a selection from a type.
Short$New explicitly specifies the New function from the type Short, and Short$New[ ] is an expression which returns a new integer variable. N is then simply bound to the result of this expression.

The loop construct has the same syntax as the conditional, except that if and fi are replaced by do and od respectively. (See [Dij 76].) It indicates that while one of the guards is true, the corresponding expression should be executed. A loop always returns the special value Null. An imperative version of the factorial function can be written as:

    fact ==
        func [ n : val Short ] val Short
        {
            let
                N == Short$New[ ];
                F == Short$New[ ]
            in
                N := 2;
                F := 1;
                do
                    N <= n ==> F := F * N; N := N + 1
                od;
                F
            ni
        }

A block can be used anywhere an expression is legal. Its general syntax is

    let declaration-list in expression-list ni

where both declarations and expressions are separated by semicolons. The value of the block is the value of the last expression. Thus the value returned by the block in the example is the final value of the variable F. As before, this becomes the value returned by the function. There is no separate notion of procedure in Russell. Functions may modify variable parameters. The assignment operator := is an example of such a function. It expects two arguments, a variable on the left and a value on the right.

Both preceding examples have been written using syntax similar to that of other programming languages. This is usually the case for real Russell programs. Formally this syntax is only an abbreviation. Full Russell syntax reflects the concepts underlying the language much more clearly, but is usually unpleasant to write. It is useful to look at the full language, both to simplify explanations, and to understand the generality of the language. Consider the assignment N := N + 1 in the preceding example. We stated previously that operations on Short values are components of the type Short. Thus the operation + does not exist by itself; it is one element in the Short collection of operations. In full Russell, + would need to be replaced by Short$+. Similarly, := is an operation provided by many types, including Short. The instance provided by Short has signature

    func [var Short; val Short] val Short

indicating that it expects a location as its left argument, a value as its right argument, and returns a value (namely that of its right argument). The value returned by the operation is discarded in this case. As in other programming languages, the two occurrences of N actually have different meanings. The occurrence on the left side denotes a variable, or location, at which the value produced by the right side is to be stored. The right side occurrence simply produces a value. As declared, N has signature var Short, which is appropriate for the left side. On the right side it appears as an argument to Short$+, which expects a val Short subexpression. To get from one to the other, we have to go from the location N to the value stored there. This is done by calling the V operation provided by most types. In particular, Short$V has signature

    func [var Short] val Short

A more complete version of the assignment would be

    N Short$:= Short$V[N] Short$+ Short$1

We have still used conventional infix notation for := and +. These are just function calls. Thus a real purist could write the assignment in fully expanded Russell as

    Short$:=[N, Short$+[Short$V[N], Short$1[]]]

and have it be correctly recognized by the compiler. (The interpretation of Short$1 is discussed in a later section.)
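To summarize the uniform declaration mechanism, here is a small sketch (the identifier names are ours; only constructs already introduced are used) in which the same == form binds a value, a variable, a function, and a type:

    let
        two  == 2;                                            (* a value *)
        N    == Short$New[ ];                                 (* a variable *)
        succ == func [ n : val Short ] val Short { n + 1 };   (* a function *)
        Int  == Short                                         (* a type *)
    in
        N := succ[two];
        Short$V[N]                                            (* value of the block: 3 *)
    ni

As in the earlier examples, the abbreviated forms (+ for Short$+, := for Short$:=) rely on selections that the compiler infers from the arguments.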
Appendix 2 contains a short summary of the differences between full Russell and the abbreviated version which is normally used.

Now that we have hopefully conveyed some rough feeling for the language, we finally start over at the beginning. A Russell program consists of a sequence of identifiers, keywords, strings, comments, and punctuation characters. Upper versus lower case distinctions are significant in all contexts. An identifier may have one of three forms:

(1) Any letter followed by some number of letters and digits. "_" is considered to be a letter.

(2) Any sequence of characters enclosed in single quotes. These identifiers are most commonly used for constants, as will be described in the next section. A quote symbol can be included in such an identifier by doubling it, or by preceding it with a \ character.

(3) Any sequence of operator characters. These are:

    ! % & * + - . / : < = > ? @ \ ^ | ~

By convention these are used as function names.

Some of the sequences formed using (1) and (3) are actually reserved for use as keywords and may not be used for other purposes. Note that none of these include upper case letters. These are:

    Keywords                                            Purpose
    cand, cor                                           conditional and, or
    do, od, if, fi, else, elsif, ==>                    conditionals, loops
    enum, record, prod, union, extend                   building types
    export, hide, with, constants                       modifying types
    let, use, in, ni, ==, ===                           declarations
    val, var, func, type, field, characters, readonly   signatures
    extern                                              separate compilation
    <<, >>                                              explicit overload resolution
    :                                                   parameters, declarations

The syntax of strings is complementary to that of identifiers. The following are legal strings:

(1) Any sequence of characters enclosed in double quotes.

(2) Any sequence of letters and digits starting with a digit.

The following are examples of strings:

    123
    "123"
    2A1FB
    "A1FB"
    "Hello"
    """"        ("" represents a single " inside the string)
    ""
    "\""        (same as """")

The sequences \n, \r, and \t may be used to denote linefeed, carriage return, and tab characters inside strings and quoted identifiers. The next section discusses the meaning of strings in the language. Usually they are interpreted as in other programming languages. In particular, 123 will usually represent the integer and "Hello" the corresponding character string. The formal definitions differ however, both to allow treatment of infinite sets of constants within the Russell framework, and to allow the user the same mechanism in constructing his own types.

Russell comments are delimited by (* and *). Comments may be nested. Thus the following is legal:

    (* This is a comment (* This is a nested comment *) *)

On the other hand,

    (* (* This is an improperly constructed comment *)

is illegal.

As we pointed out before, types in Russell are just collections of functions. Thus the only way one can talk about particular values is by having functions in that type that produce them. Consider the builtin type Boolean. It contains two functions named True and False, with signatures (roughly speaking) func [ ] val Boolean. These are functions with no arguments which produce a Boolean result. Thus we can always get the Boolean value false by writing Boolean$False[ ] in full Russell, or, more commonly, just False. This allows us to deal with finite sets of constants for a given type. Since we always want to consider types as finite sets of operations, we need to extend this idea to handle infinite sets of constants (such as those in the builtin Short type).
This is done by treating strings, in the sense described above, as abbreviations. The only Short constants provided explicitly (i.e. in the same way as True and False for the Boolean type) are '0' through '9'. (The quote symbols are part of the identifiers.) Also provided is a concatenation operation ^+ which, in this case, gives the value of the integer obtained by writing an integer next to another digit. Roughly what happens then is that the expression 123 is treated as an abbreviation for:

    ('1'[ ] ^+ '2'[ ]) ^+ '3'[ ]

If the concatenation operator were not already built in, it could be declared as:

    ^+ == func[ x, y : val Short ] {
              let '10' == 5 + 5 in '10' * x + y ni
          }

Note that using 10 instead of '10' would result in infinite recursion. In the above example we have again omitted the selections of the constants '1', '2', and '3', as well as the ^+ operator, from the Short type. Formally, strings are always selected from a type. Thus we should have written Short$123. This selection would normally be inferred, so that, in practice, writing 123 is usually sufficient.

All this allows us to define the meaning of (unquoted) strings a little more precisely. In general, T$a1a2...an is expanded to

    ((...((T$'a1'[ ] T$^+ T$'a2'[ ]) T$^+ T$'a3'[ ]) ...) T$^+ T$'an'[ ])

Quoted strings are treated only slightly differently. Since we want "" to be legal, we have to generalize the treatment to handle this in a reasonable way. We do this by agreeing that '' (two single quotes, i.e. the empty quoted identifier) will implicitly be concatenated onto the left of any such string. Furthermore, to distinguish such strings more explicitly from unquoted ones, ^* will be used as the concatenation operator. Thus T$"ab" is expanded to

    (T$''[ ] T$^* T$'a'[ ]) T$^* T$'b'[ ]

In accordance with this scheme, the builtin type ChStr (character strings) has constants, that is, nilary functions, with names '' and 'c', for all characters c in the character set. It also includes a ^* concatenation function which, unlike the one for integers, really does character string concatenation.

We have gained something besides uniformity in this approach. It is possible to make use of the string mechanism for user defined types. For example, we can define an octal integer type by simply modifying the builtin type Short in the following two ways: first, the constants '8' and '9' have to be deleted; secondly, ^+ now has to multiply by 8 rather than 10. As will be described later, Russell allows the user to easily construct such new types.

Russell provides the following primitives for building expressions:

Selection from a type. A component function f of a type T may be selected by writing T$f.

Function constructions (lambda abstraction in lambda calculus terminology). Any expression E can be turned into a function by writing function_signature { E }. The result signature may be omitted if it can be inferred. Any identifiers appearing in E can be treated as parameters by including them as such in the signature. The above declarations of factorial functions are simple examples of this.

Function application. A function f may be applied to arguments a1 ... an by writing

    [a1, ... , aj] f [aj+1, ... , an]

Which arguments go before the function and which go after is a choice that can be made arbitrarily (though hopefully consistently) by the programmer. If one of the argument lists is empty it can always be omitted. (As mentioned above, if both are empty they can usually, but not always, both be omitted.) Thus any function can be treated as an infix, prefix, or postfix function.
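As a small illustration of this flexibility, the following applications of Short$+ are all equivalent (a sketch; the bracket-free last form relies on the infix abbreviation discussed below):

    Short$+[ 2, 3 ]        (* prefix: both arguments follow the function *)
    [ 2 ] Short$+ [ 3 ]    (* infix *)
    [ 2, 3 ] Short$+       (* postfix: both arguments precede the function *)
    2 + 3                  (* abbreviated: brackets and the selection are inferred *)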
(We could introduce more reasonable syntax for the factorial function by declaring it as "! == func ... ".)

Blocks. The use of blocks to introduce declarations was illustrated above. Russell allows two other kinds of blocks. The block let in expression-list ni (i.e. one with no declarations) can be abbreviated as ( expression-list ). The construct use type-list in expression-list ni tells the compiler that it should use the types given in the first list when inferring omitted selections in the second list of expressions. Otherwise it is equivalent to ( expression-list ). Note that in inferring such selections the compiler will first try to use the types of the arguments, and then search surrounding "use" lists inside out. (As it turns out, this means that "use" lists are primarily useful for implicit selections of constants, which have no arguments.) Furthermore, in abbreviated Russell, any user program is treated as if it were embedded in the following construct:

    use Float in
        use ChStr in
            use Boolean in
                use Short in
                    user_program
                ni
            ni
        ni
    ni

Sequence control constructs. Loops and conditionals were described above. (Conditionals can also be written in a more Pascal-like if-then-else form, not described here.) Conditionals can sometimes be abbreviated by the conditional and/or constructs:

    expression1 cand expression2
    expression1 cor expression2

Here all expressions return Boolean values. Function calls in Russell always pass arguments by value, and thus must evaluate the arguments first. This is not always desirable with the Boolean functions and and or. The cand and cor constructs are similar to and and or, except that they are not call-by-value operations. (This is hinted at by the absence of [ ] around the arguments.) The second argument is only evaluated if necessary. Thus e1 cand e2 is actually equivalent to

    if e1 ==> e2 # else ==> Boolean$False[ ] fi

Type constructions and modifications. These provide ways to build new types out of existing ones. They are discussed below.

A few more remarks on the syntax of applications and selections are in order at this point. First, an expression such as [a]b$c is currently ambiguous. The function b could produce a type, and thus the expression could be interpreted as ([a]b)$c. This is resolved by having selections bind more tightly than application. Thus the correct interpretation is [a](b$c). The second problem is that the above syntax would require us to write the statement x := x + 1 as

    [x] := [[x] + [1]]

This would, at best, be acceptable only to LISP programmers. Thus the actual syntax allows dropping the brackets for functions used as either binary infix or unary prefix operators. A standard FORTRAN style precedence scheme is used to disambiguate the resulting expressions. It is worth noting that this scheme relies purely on the identifiers appearing in the expression, and not at all on any semantic or signature information.

As previously mentioned, each Russell expression has a signature associated with it. This signature describes how the result produced by that expression should, and should not, be interpreted. Russell expressions always produce objects which are intended to be used as either simple (i.e. non-function, non-type) values, variables, functions, types, or signatures. In order to distinguish between these, there are five different kinds of signatures, corresponding to the above five categories. They are described somewhat informally below.
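As a preview sketch, here are a few expressions together with the signatures they turn out to have (New is the allocation function from the builtin Short type; the function construction follows the factorial example above):

    3 + 4                                        (* val Short *)
    Short$New[ ]                                 (* var Short *)
    func [ n : val Short ] val Short { n + 1 }   (* func [val Short] val Short *)
    Short                                        (* a type signature, listing the operations of Short *)
    val Short                                    (* a signature used as an expression *)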
The general form of a value signature is val T, where T is an expression denoting a type. Informally it indicates that the value produced by the corresponding expression should be interpreted as a value of type T. More formally it means only that the result produced by the expression should only be interpreted by (that is, passed as a parameter to) a function that expects an argument of signature val T. Thus one way of viewing the whole issue is that the expression T is just a tag used to match up functions with proper arguments. An alternate view is the following: we want to guarantee that the first-class value in question is passed only to functions that know how to interpret it. An obvious way to do that is to keep track, in its signature, of all the functions that can be applied to it. Since usually all these functions are components of a type, we use the expression representing that type as a shorthand. (In fact some functions not in T may also expect val T arguments. These however are usually built out of the primitive functions in T.)

As an example, consider the following expressions, where it is assumed that all identifiers have the natural meanings:

    a)  1 + 2
    b)  (BoundedStack[Short, 10])$push[ S, 3 ]
    c)  (BoundedStack[Short, 5+5])$push[ S, 3 ]

(a) is a simple integer expression. It has signature val Short. In (b) and (c) we assume that BoundedStack is a function which takes two arguments. The first is the type of the individual elements to be pushed onto the stack. The second is the maximum size of the stack. Its result is the corresponding type of bounded stacks. We assume that everything works in an applicative fashion, so that the push operation returns a stack value. The signature of (b) then is val BoundedStack[Short,10]. That of (c) is val BoundedStack[Short,5+5]. It should be emphasized that signatures are purely syntactic objects, which are determined using some simple rules described below. In particular, nothing in the whole signature mechanism knows anything about the semantics of +. Thus the signatures of (b) and (c) are completely distinct, and the functions in BoundedStack[Short,5+5] cannot be used to interpret objects of signature val BoundedStack[Short,10]. This restriction rarely causes any real inconvenience. (If we want to do any syntactic type checking at all, some such restriction is clearly essential. Semantic equivalence of Russell type expressions is in general undecidable.)

Conversely, the above interpretation requires that two syntactically identical type expressions should always denote the same set of functions. It does not make sense to say that S has signature val BoundedStack[Short,x], where x is a variable; the value of x may change, but certainly the collection of functions used to access S should not. In order to insure that this property holds, Russell generally enforces the following import rule: no expression with function or type signature may mention a nonlocal variable. (As described below, the precise rule can be somewhat more relaxed.) It follows from this that no expression may depend on (or alter) variables it does not explicitly mention. Thus no function or type valued expression can depend on the values of variables.

The general form of a variable signature is var T, where T is a type expression, as above. This indicates that the result of the expression is a variable (or location) which can hold an object to be interpreted by the functions of type T. Usually (though not always) the only thing that can be done with an object of signature var T is to pass it as an argument to T$:= or T$V, or to bind an identifier to it.
The most common example of an expression of signature var T (other than a simple variable) is

    T$New[ ]

where New is the function in T that allocates a new variable. Another example is A.1, where A is a variable of an appropriate array type, and . is the name of the subscription operator in the (builtin) array type. It produces the location of the first element of the array. (Array types usually have two versions of the . operation. In addition to the one mentioned here, there would normally be an operation mapping an array value and an index into a component value.)

In order to insure that functions are passed only appropriate arguments, it is clearly necessary that their signature include the signatures of the arguments, as well as the signature of the result. Function signatures may also include the names of the formal parameters. Although these are not important in determining the correctness of a particular application, there are nevertheless two reasons for putting them here. First, the syntax of function constructions requires it. (There's nowhere else to put them.) The second, more important one, should become apparent when we state the rule for determining the signature of an application. The syntax for function signatures is

    func[param1; ... ; paramn] result_signature

where each parami is a list of parameter names with identical signatures, followed by the signature itself:

    id1, ... , idm : parameter_signature

If one of the parami includes only a single parameter name which is not otherwise used, then both it and the : can be omitted. A signature such as

    func[x, y: val Short] val Short

is considered to be identical to

    func[x: val Short; y: val Short] val Short

Function signatures are usually written explicitly only in function headings. Even in this case, the result signature may be omitted whenever it can be easily determined from the body of the function. Some examples of function signatures were already given above. More interesting examples will be presented below in conjunction with type signatures.

If we want to specify how a type expression can be used, we need two kinds of information. First, we need to know what operations can be selected from it. Second, we need to know how those operations themselves can be used. Thus a type signature consists of operation names, and of the signatures corresponding to those names. The syntax is:

    type { op1; ... ; opn }

The syntax for the individual opi is the same as that used for parameters in a function signature (except that no names can be omitted, and all signatures have to be either type or function signatures). The simplest example is the builtin type Void. It has no operations as part of the type. Its signature is therefore:

    type {}

There is a builtin function Null, which has signature func[ ] val Void. (Recall that a Void value is also produced by the do ... od loop.) Now consider a type VOID which has the function Null as its only component. Such a type could be declared as

    VOID == Void with { Null == Null }

It would make sense to use this as the builtin type. The other alternative was chosen only so that we could write Null[ ] instead of Void$Null[ ]. It would have signature

    type { Null: func [ ] val VOID }

Unfortunately, this notion of a type signature won't get us very far. It is frequently useful to build a new type which behaves like an existing one, but is nevertheless distinct from it. For example, we may wish to have two types, Meters and Feet, with the same operations of addition, subtraction, etc.
By keeping them distinct we can use the signature checking mechanism to guarantee that we don't get the two mixed up. We should be able to build both of these types by first constructing Meters, and then simply using the declaration:

    Feet == Meters

to get the type Feet. (Note once more that the signature mechanism is purely syntactic. Thus val Feet and val Meters are still distinct signatures, and we can thus achieve the desired protection. In fact, this is probably too much protection. In reality, we would want to introduce some conversion functions at some point. Again the signature checking mechanism can assure that these are used exactly where they are appropriate.)

To illustrate what goes wrong in a simple context, let's try the analogous exercise with the type VOID. We use the declaration

    VOID2 == VOID

to get a second type with identical characteristics. Certainly its signature has to be the same as that of VOID, namely

    type { Null : func [ ] val VOID }

But what we wanted was

    type { Null : func [ ] val VOID2 }

The problem is that Null should return a value of whatever type it is a component of, not the specific type VOID. Therefore we need to give this type a name in the signature. Such a local type name is written immediately after the type keyword in the signature. Thus a more appropriate signature for VOID would be

    type V { Null : func [ ] val V }

Now the construction of VOID2 works properly. This version of VOID is harder to obtain. It might be declared as

    VOID == extend { Void } with V { Null == func[ ] { V$In[Null] } } export { Null }

These constructs are discussed below.

We conclude with two more examples. The builtin type Short will serve as the first. Its signature is:

    type S {
        New; :=; V;
        <; >; =; <=; >=; <>;
        '0'; '1'; '2'; '3'; '4'; '5'; '6'; '7'; '8'; '9';
        -  : func [ val S ] val S;
        +, -, *, /, **, %, ^+ : func [ x, y : val S ] val S;
        +=, -= : func [ var S; val S ] val S;
        get  : func [ var Void ] val S;
        put  : func [ val S ] val S;
        puts : func [ val S ] val ChStr;
        shift : func [ what, how_much : val S ] val S;
        New : func [ ] var S;
        New : func [ val S ] var S;
    }

(The function get returns an integer read from the standard input file, put prints an integer, and puts converts an integer to its decimal character string representation. The second New function initializes the freshly created variable.) The above makes use of yet another form of abbreviation. Operation names such as V tend to occur in many types with the same signature. They therefore have default signatures associated with them, which can be omitted from a type signature. Such names and the associated signatures are given in appendix 2.

As a final illustration, the following is a possible signature of the BoundedStack function used above:

    func [ T : type { New; V; := }; val Short ]
        type S {
            push  : func [ val S; val T ] val S;
            top   : func [ val S ] val T;
            pop   : func [ val S ] val S;
            empty : func [ ] val S
        }

The signature indicates that the BoundedStack function expects an integer and a type argument. The type argument used must include the functions New, V, and :=. The signature checking rules described below allow the actual type argument to have additional components as well. Thus the builtin Short type would be an acceptable argument. In fact, if we were willing to be slightly clever about the implementation of the BoundedStack function, we would not even have to require that the parameter include the three functions specified. The parameter signature would then be given simply as type {} and any type whatsoever could be passed to it.
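BoundedStack is not a builtin, but assuming only the signature just given, it could be used applicatively as in the following sketch:

    let
        IntStack == BoundedStack[ Short, 100 ]
    in
        IntStack$top[ IntStack$push[ IntStack$empty[ ], 3 ] ]    (* yields 3 *)
    ni

Here IntStack$empty[ ] has signature val IntStack, push then yields another val IntStack, and the outer application has signature val Short, all by the selection and application rules described below.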
(If the three operations New, V, and := are provided by the element type, then an array implementation could be used.)

Signatures themselves may appear as expressions. Such expressions do not compute interesting values. (Since there are no interesting operations that are applicable to signatures, there is not much we can do with such a value in any case.) Being able to write signatures in expression contexts does however allow us to write functions that operate uniformly on values, variables, functions, and types. This will be illustrated below. It also allows us to give names to long signatures so that they don't need to be repeated. For example, we could write

    let
        s == = val Short;
        fact == func[n: s] { ... }
    in ...

We use === instead of == to indicate that s and val Short should be considered to be completely identical, even for purposes of checking signature correctness.

The Russell type checking system can be described by a pair of rules for each language construct. The first rule gives constraints on the signatures of the subexpressions. These constraints must be satisfied for the expression to be signature correct. The second gives the signature of the construct itself. Signature constraints usually require that two or more signatures be the same. This means that they should be syntactically identical, with the following exceptions (the present compiler allows some other minor differences, such as reordered guards in conditionals; we list only those that are likely to be of interest):

- Parameter names and local type identifiers may differ, provided they are uniformly renamed.
- Type signatures may list components in different order. (The components must, however, have the same names.)
- Parameters with identical signatures may, or may not, be grouped together.
- The left side of a === declaration is viewed as an abbreviation for the right side.

As we pointed out earlier, type expressions may not depend on the values of variables. Thus, if two signatures are the same, then all type expressions they contain must actually denote the same types.

As an example of the general approach, consider a function construction such as

    func [x: val Short] val Short {
        if x%2 = 0 ==> x/2 # else ==> 3*x+1 fi
    }

(x%2 yields the remainder of dividing x by 2.) A large number of constraints needs to be checked to insure that this is signature correct. We will elaborate on a few of them. The checking rule for the conditional requires that all clauses have the same signature. (This constraint is not enforced when the value of the conditional is immediately discarded. Specifically, if x, y, and z are Short variables, then

    if x > 0 ==> y := x # else ==> put["error\n"] fi;
    z := y;
    ...

is signature correct, in spite of the fact that the first clause of the conditional has signature val Short, and the second has signature val ChStr. A similar situation occurs if the conditional is the body of a function with val Void signature.) In our example this is satisfied, since x/2 and 3*x+1 both have signature val Short (for reasons discussed below). The checking rule also requires all guards to have signature val Boolean. The guard x%2 = 0 satisfies this constraint. The second guard abbreviates not x%2 = 0 and thus also has the appropriate signature. The signature of the conditional itself is that of the arms. In our example, the conditional, and thus the body of the function, has signature val Short.

Similar rules exist for function constructions. We require that the specified result signature (if any) match the signature of the body.
The signature of the construction is that specified by the heading (with the possible addition of an inferred result signature). Thus the above function construction has signature

    func [x: val Short] val Short

which may be abbreviated to

    func [val Short] val Short

since the parameter name is not significant for signature checking. Signature calculus rules for the other language constructs are given in [Boe 85]. Only two of them are non-obvious, and they are described below.

We first examine function applications. Given our examples so far, an obvious set of rules for function applications would be:

(1) The parameter signatures in the function signature must be the same as the signatures of the corresponding argument subexpressions.

(2) The signature of the function application is the result signature of the function.

Unfortunately, this is no longer adequate once we start taking advantage of the Russell type structure. As a simple example, consider writing an identity function. A first attempt might be:

    identity == func [x: ?] ? { x }

Clearly, it should be possible to apply this function to any simple value, any function, any type, or any variable. Substituting, say, val Short for ? would restrict it to one argument signature, and is thus not acceptable. This situation is dealt with in Russell by passing the signature associated with the argument as another argument. Thus we would rewrite the identity function as

    identity == func [x: S; S: signature] S { x }

Alternatively, if we wanted to restrict the function to only simple values, we could write it as

    identity2 == func [x: val T; T: type {}] val T { x }

(This was necessary in earlier versions of the language.) If we wanted to apply the identity function to the (short) integer 13, we would write identity[13, val Short], which can be abbreviated as identity[13]. (Trailing arguments may be omitted if they can be inferred from the explicit ones.) This application is not signature correct by rule (1) above. First, 13 has signature val Short, and not S. Clearly, the parameter signature S is intended to denote the second parameter, and should not be used literally in the comparison. Thus the checking rule needs to be adapted to read:

(1') The signatures of the argument subexpressions must match the parameter signatures, after any parameter names in the parameter signatures have been replaced by the corresponding actual arguments.

Thus we first replace the S in the signature of x by val Short, and then check that the resulting parameter signature matches the signature of 13. In the case of identity2, we would write the function call as identity2[13, Short]. Here another problem occurs when we check that the argument Short matches the second parameter signature. The type Short contains a large number of operations, but the parameter signature calls for none. This should not matter; the signature type{} was intended to indicate that we did not require any operations from the type T. Thus the word "match" in our revised rule should be interpreted to mean that the two signatures are either the same, or they are both type signatures and the argument signature contains a superset of the operations in the parameter signature. Any type signature matches the signature type{}. Finally, rule (2) needs to be updated to correspond to the revised checking rule. The application identity[13, val Short] should have signature val Short, rather than S. Thus we write:

(2') The signature of the function application is the result signature of the function, with parameter names replaced by the corresponding actual arguments.
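To see the revised rules in action on a slightly larger example, here is a sketch of a function that applies its function argument twice (twice, succ, and add_two are our own names, not builtins):

    let
        twice ==
            func [ f : func [val T] val T; T : type {} ] {
                func [ x : val T ] val T { f[ f[ x ] ] }
            };
        succ    == func [ n : val Short ] val Short { n + 1 };
        add_two == twice[ succ, Short ]
    in
        add_two[ 5 ]      (* yields 7 *)
    ni

In the application twice[ succ, Short ], the parameter signature func [val T] val T becomes func [val Short] val Short once T is replaced by the actual argument, so succ is acceptable; by rule (2'), add_two then has signature func [val Short] val Short. (If it can be inferred from succ's signature, the trailing Short argument could be omitted.)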
A situation similar to that occurring for applications occurs for selections of an operation from a type. The + component of the type Short has signature

    func [val S; val S] val S

where S is the local type identifier. But the signature of Short$+ should be

    func [val Short; val Short] val Short

(In particular, we need to insure that 1 + 2 has signature val Short, and not val S.) Thus the signature of a selection is the signature of the component, with the local type identifier replaced by the type expression preceding the $ sign.

As was pointed out above, we require that two syntactically identical type expressions appearing inside signatures refer to the same type value. The signature val T implies that operations from the type T are applicable. But this only makes sense if the type expression T evaluates to a well-defined type value. This requires that a type expression may not depend on the value of a variable. In the original version of the language [Boe 80], this was enforced by requiring that neither type valued expressions nor function valued expressions (which could, after all, be called from type valued expressions) may mention variables. Unfortunately, the restriction that functions be pure, i.e. not mention variables, has proved to be impractical. It forbids functions that update debugging or profiling information. More importantly, it prevents functions from remembering values that they previously computed, to reduce the running time of subsequent calls. In order to avoid these problems, we optionally allow a more liberal restriction on functions. (This is invoked with the L option to the compiler.) This is based on the introduction of a variable (called FS for historical reasons) that represents the entire state of the machine and its file system. It has signature var Void. There are no operations that allocate Void variables, so this is the only such variable available. (The use of the Void type is rather arbitrary. A new and different type could have been introduced instead.) A function that has access to a var Void variable, that is, a function that was passed a var Void variable as a parameter, has access to the entire machine state, and thus may access global variables. (We currently insist that the var Void parameter be the last parameter of the function.) Such functions are known as impure functions. This scheme still insures that type valued expressions always produce the same value, since a type expression may not mention the variable FS, and there is no possibility of allocating a local variable with var Void signature.

This scheme is made syntactically much more pleasant by the following two conveniences:

- The entire user program is implicitly surrounded by

    let impure === var Void in ... ni

- Arguments with var Void signatures may be omitted. They are automatically inferred to be the trailing var Void argument of the innermost surrounding function (or FS at the outer level).

The first point allows reasonable syntax for impure functions. A function that increments the global variable x by a value y passed as a parameter can be written as

    func[ y : val Short; impure ] { x += y }

The second point assures that it is essentially never necessary to explicitly pass FS to an impure function; calls to impure functions look like calls to pure functions.
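Putting these pieces together, the following sketch declares a variable x in an enclosing block (playing the role of the global variable in the example above) and an impure function that updates it. It assumes the more liberal restriction on functions described above is in effect, since bump mentions the nonlocal variable x:

    let
        x    == Short$New[ ];
        bump == func [ y : val Short; impure ] { x += y }
    in
        x := 0;
        bump[ 3 ];        (* the trailing var Void argument is inferred *)
        Short$V[ x ]      (* value of the block: 3 *)
    ni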
Much of Russell is devoted to the creation and manipulation of types. This is what gives the language most of its flexibility. Rather than discussing the necessary facilities in detail, we give an overview of what's available, and defer to [Boe 85] for the details. Types in Russell are obtained: as builtin types, by applying builtin type-producing functions, through the use of type constructors provided by the language, or by modifying some existing type.

The builtin types provided by the current implementation are:

Void - Provides no operations. Void values are used where the value of an expression is not of interest.

Boolean - Provides standard Boolean (logical) operations.

Short - Provides 16 bit integer operations.

Float - Provides (64 bit) floating point operations.

Long - Provides operations on integers of size limited only by machine memory constraints.

Builtin type producing functions are:

Array - Requires a (Short) size, and an element type, as arguments. The result is a type containing operations to access arrays of the given size and element type. A two-dimensional array of 100 by 100 floating point numbers might be declared as:

    matrix == Array[100, Array[100, Float]];
    A == matrix$New[ ];

The element in row i and column j can then be referenced as:

    A.i.j

List - Provides operations on linear lists containing elements of the specified argument type. Operator syntax is chosen so that it is relatively easy to build functions on a variable number of arguments by writing them as functions on lists.

A lazy variant of List - Similar to List, except that an efficient lazy cons operation, i.e. one that defers evaluation of its second argument, is provided.

Ref - Provides operations on references or pointers.

The following type constructions are provided. Note that the above types and functions are simply predeclared identifiers, and are not otherwise special. Type constructions, on the other hand, are really language primitives, with associated syntax.

prod - Provides Cartesian product operations, that is, functions to manipulate tuples of objects. In particular, the resulting type includes a function Mk to build a tuple, functions to select a component of a tuple, and New, :=, and V functions to manipulate product variables. For example,

    prod { i: val Short; x: val Float }

contains operations on pairs, the first of which represents an integer, and the second of which represents a floating point number. These would include Mk to build a pair, i to obtain the first component, and x to obtain the second component. They would have signatures

    func [val Short; val Float] val t
    func [val t] val Short
    func [val t] val Float

where t is the local type name. The signature of one component may depend on another component. We may build a product such as:

    prod { x: val T; T: type { ... } }

Values corresponding to such a type are, in a sense, self-describing. They contain both a value (x) and information about how to interpret it (T). Objects in the sense of object-oriented programming can be built in this manner. Since the result of an expression with product type can be assigned to a variable, and a product may have function or type components, product types may also be used simply to convert between functions and assignable objects. As an example, the function fact defined earlier cannot be directly assigned to a variable. (The assignment operations provided by the language require the second subexpression to have val, and not func, signature. It can be argued convincingly that both this and the distinction between Ref and var should and could have been avoided by changing the design of the signature calculus; we agree, and we may eventually make the necessary changes.)
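A quick sketch of the pair type just described in use (Pair and p are names chosen here):

    let
        Pair == prod { i : val Short; x : val Float };
        p    == Pair$Mk[ 3, 2.5 ]
    in
        Pair$i[ p ]       (* yields 3, the first component *)
    ni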
Returning to the problem of storing fact in a variable: if we let T be

    prod { x: func [val Short] val Short }

then T$Mk builds a product with one field, that is, it has signature

    func [func [val Short] val Short] val T

Thus T$Mk[fact] has signature val T, and can thus be assigned to a var T variable (using the assignment operation provided by the product). The T$x operation can be used to convert back.

union - Informally, a union construction is similar to a product construction, except that an object of the type contains only one of the kinds of objects specified, rather than all of them at once. In the Russell view, a union type contains operations to convert back and forth between the union type and a number of other types. The names of these operations are built by adding standard prefixes to the component names specified in the union construction. Thus if T is defined as

    T == union { i: val Short; x: val Float }

then T will contain operations to convert between val Short (the signature of the i field) and the union type, as well as between val Float and the union type. More precisely, T$from_i will convert from val Short to val T, T$to_i will convert in the other direction, and T$is_i will test whether a val T expression actually represents a val Short, and thus whether T$to_i may be safely applied. Analogous functions will be included for the x component. It is allowable to give the union (or a product) a local name to facilitate recursion. For example:

    union U {
        end: val Void;
        other: val prod { first: val Short; rest: val U }
    }

operates on elements which are either Null or consist of an integer and another similar element. Thus such an object is effectively a list of integers.

record - This is similar to the prod construction, except that all fields must have val signature, and individual fields of record variables may be assigned to.

enum - Builds enumeration types (scalar types in Pascal terminology).

extend - Creates a copy of an existing type, with conversion operations between the new and old types.

Russell types may be modified with the following language constructs:

with - Adds operations to, or replaces operations in, an existing type. The syntax is

    old_type_expression with local_type_name { new_operation_declarations }

New operations may refer to old ones (or to themselves) by selecting from the local type name.

export - Removes any unmentioned operations from an existing type.

hide - Removes the specified operations from an existing type.

Russell provides for a rather simple-minded, but effective, separate compilation facility. Any function that refers only to builtin identifiers (and parameters to the function) may be placed in a separate file and compiled by itself. The function in file filename.r can be referenced in other files with the expression:

    extern { "filename" }

Typically, separately compiled functions will produce types. For example, stacks could be implemented by placing the following Russell program in a file stack.r:

    func [ element_type: type{} ] {
        (List[element_type]) with stack {
            empty == stack$nil;
            top   == stack$head;
            pop   == stack$tail;
            push  == func [S: val stack; x: val element_type] { cons[x, S] };
        } export { New; :=; V; top; pop; push }
    }

The body of this function takes the builtin type producing function List and applies it to the type of stack elements (the parameter to the stack implementation). The with construct is then used to add operations appropriate for stacks. Finally, the export clause removes all operations from the resulting type except those that are appropriate to stacks.
(For example, the cons, head, and tail operations from the original list type are removed. But the original assignment operation is preserved, since it applies equally well to stacks.) The above implementation, once compiled, can be referenced from a main program using, for example,

    let
        stack == extern { "stack" }
    in
        ... (stack[Short])$push ...
    ni

Compilation of the main program will examine information produced by the compilation of stack.r in order to check signature correctness of the main program (and to produce better code for it). We require that subsidiary functions be compiled before those functions that reference them. This restriction can be effectively circumvented (sometimes at the cost of some execution efficiency) by parametrizing different program sections with respect to all other units they reference. For example, we could rewrite the above main program (in file main.r) as

    func [ stack: func [ t: type{ } ]
                  type s { New; :=; V; empty: func [ ] val s; ... } ] {
        ... (stack[Short])$push ...
    }

It is now possible to compile it without first compiling the stack program. We do need to subsequently compile a main program that specifies how to assemble the various program pieces. In this case, it would read simply

    extern { main } [ extern { stack } ]

We conclude with some more examples. The following is the factorial function we saw before, modified to compute results of essentially unbounded size, and embedded in enough context to turn it into a complete program:

    let
        ! == func [ n : val Short ] {
                 if n > 0 ==> Long$In[n] * ((n - 1)!)
                 #  n = 0 ==> Long$1
                 fi
             };
        x == Short$New[ ]
    in
        do
            (put["Factorial of?"]; x := get[FS]) >= 0 ==>
                put[x!]; put["\n"]
        od
    ni

The program will calculate factorials until it is given a negative input. Note that the do ... od construct in Russell can be used to simulate a number of other loop constructs, including a repeat ... until loop, by moving more of the loop body into the guard. (Dijkstra's original construct does not allow side effects in guards, so this comment does not apply to his language.)

This, and the following example, illustrate the use of functions as objects to be manipulated by the program. Certain operations are naturally viewed as mapping functions to functions. Many programming languages force us to modify this view, and to recast them in a different framework. Russell allows them to be represented directly. Here we look at (a naive view of) numerical differentiation. We give a function derivative which returns an approximation to the derivative of a given function. We illustrate its use by embedding it in a program which uses it to multiply 13 by 2, the hard way:

    let
        epsilon == 0.0001;
        derivative ==
            func [f: func[val Float] val Float] {
                func[x: val Float] val Float {
                    (f[x] - f[x - epsilon]) / epsilon
                }
            };
        square == func[x: val Float] { x * x };
        double == derivative[square]
    in
        put[double[13.0]]; put["\n"];
    ni

The last illustration is an unusual implementation of binary trees. The following is a function which expects a type describing values stored at leaves as an argument, and produces a corresponding binary tree type as its result. The result type contains functions to obtain the left or right subtree of a given tree, to obtain the value stored at a leaf, to build a leaf containing a given value, to combine two subtrees into a new tree, and to inquire whether a tree consists solely of a leaf. (The latter is provided directly by the union construction and is not explicitly implemented.)
A non-leaf tree could be represented as an explicit product or record type. We instead use a function which maps {left, right} to the left and right subtrees:

    func [L: type {}] {
        let
            lr == enum { left, right }
        in use lr in
            union B {
                leaf: val L;
                interior: func [val lr] val B
            } with B {
                left_sub_tree  == func [x: val B] { B$to_interior[x][left] };
                right_sub_tree == func [x: val B] { B$to_interior[x][right] };
                leaf_value == B$to_leaf;
                make_leaf  == B$from_leaf;
                make_tree  ==
                    func [l, r: val B] val B {
                        B$from_interior [
                            func [x: val lr] {
                                if x = left  ==> l
                                #  x = right ==> r
                                fi
                            }
                        ]
                    }
            } export { New; :=; V; left_sub_tree; right_sub_tree;
                       leaf_value; make_leaf; is_leaf; make_tree }
        ni ni
    }

We would like to thank Hausi Muller for his constructive criticism of an earlier draft of this paper.

[Boe 80] Boehm, H., A. Demers, and J. Donahue, An Informal Description of Russell. Technical Report 80-430, Computer Science Department, Cornell University, 1980.

[Boe 84] Boehm, H., A Logic for the Russell Programming Language. Thesis, Cornell University, 1984.

[Boe 85] Boehm, H., Russell online rhelp facility. Distributed with the Russell compiler. (A version of this paper is available as rhelp intro.)

[Boe 86] Boehm, Hans-J., and Alan Demers, Implementing Russell. Proceedings of the SIGPLAN '86 Symposium on Compiler Construction, SIGPLAN Notices 21, 7, July 1986, pp. 186-195.

[Dem 79] Demers, A., and J. Donahue, Data Types are Values. Technical Report TR 79-393, Department of Computer Science, Cornell University, 1979.

[Dem 80a] Demers, A., and J. Donahue, Data Types, Parameters, and Type Checking. Proceedings, Seventh Annual Principles of Programming Languages Symposium, 1980, pp. 12-23.

[Dem 80b] Demers, A., and J. Donahue, Type-Completeness as a Language Principle. Proceedings, Seventh Annual Principles of Programming Languages Symposium, 1980, pp. 234-244.

[Dem 80c] Demers, A., and J. Donahue, The Semantics of Russell: An Exercise in Abstract Data Types. Technical Report 80-431, Computer Science Department, Cornell University, 1980.

[Dem 83] Demers, A., and J. Donahue, Making variables abstract: an equational theory for Russell. Proceedings, Tenth Annual Principles of Programming Languages Symposium, 1983.

[Don 85] Donahue, J., and A. Demers, Data Types are Values. ACM Transactions on Programming Languages and Systems 7, 3 (July 1985), pp. 426-445.

[Dij 76] Dijkstra, E., A Discipline of Programming. Prentice-Hall, 1976.

[Hoo 84] Hook, Jim, Understanding Russell: A First Attempt. Semantics of Data Types, Proceedings, Springer Lecture Notes in Computer Science 173, 1984, pp. 69-86.

[Sto 77] Stoy, J., Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. MIT Press, 1977. See esp. chapter 7.

The language Russell was developed at Cornell, primarily by the latter two authors. The concepts involved are discussed more fully in [Dem 79], [Dem 80a], [Dem 80b], and [Don 85]. The purpose of the language design is to test the practicality of the type system in the context of a very small language (at least as measured by the number of grammar productions). As a result, no attempt was made to address certain issues, notably that of concurrency. It seems possible to graft a notion of concurrency onto the implemented language, but only in a very inelegant fashion. Later designs can deal with the issue more easily, though the proposed formal semantics generally do not. We described the implemented version of the language.
We would like to thank Hausi Muller for his constructive criticism of an earlier draft of this paper.

The works cited in this description are the following:

[Boe 80] Boehm, H., A. Demers, and J. Donahue, An Informal Description of Russell. Technical Report 80-430, Computer Science Department, Cornell University, 1980.
[Boe 84] Boehm, H., A Logic for the Russell Programming Language. Thesis, Cornell University, 1984.
[Boe 85] Boehm, H., Russell online rhelp facility. Distributed with the Russell Compiler. (A version of this paper is available as rhelp intro.)
[Boe 86] Boehm, Hans-J., and Alan Demers, Implementing Russell. Proceedings of the SIGPLAN 86 Symposium on Compiler Construction, SIGPLAN Notices 21, 7, July 1986, pp. 186-195.
[Dem 79] Demers, A., and J. Donahue, Data Types are Values. Technical Report TR 79-393, Department of Computer Science, Cornell University, 1979.
[Dem 80a] Demers, A., and J. Donahue, Data Types, Parameters, and Type Checking. Proceedings, Seventh Annual Principles of Programming Languages Symposium, 1980, pp. 12-23.
[Dem 80b] Demers, A., and J. Donahue, Type-Completeness as a Language Principle. Proceedings, Seventh Annual Principles of Programming Languages Symposium, 1980, pp. 234-244.
[Dem 80c] Demers, A., and J. Donahue, The Semantics of Russell: An Exercise in Abstract Data Types. Technical Report 80-431, Computer Science Department, Cornell University, 1980.
[Dem 83] Demers, A., and J. Donahue, Making Variables Abstract: An Equational Theory for Russell. Proceedings, Tenth Annual Principles of Programming Languages Symposium, 1983.
[Don 85] Donahue, J., and A. Demers, Data Types are Values. ACM Transactions on Programming Languages and Systems 7, 3 (July 1985), pp. 426-445.
[Dij 76] Dijkstra, E., A Discipline of Programming. Prentice-Hall, 1976.
[Hoo 84] Hook, Jim, Understanding Russell - A First Attempt. Semantics of Data Types, Proceedings, Springer Lecture Notes in Computer Science 173, 1984, pp. 69-86.
[Sto 77] Stoy, J., Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. MIT Press, 1977. See esp. chapter 7.

The language Russell was developed at Cornell, primarily by the latter two authors. The concepts involved are discussed more fully in [Dem 79], [Dem 80a], [Dem 80b], and [Don 85]. The purpose of the language design is to test the practicality of the type system in the context of a very small language (at least as measured by the number of grammar productions). As a result, no attempt was made to address certain issues, notably that of concurrency. It seems possible to graft a notion of concurrency onto the implemented language, though only in a rather inelegant fashion; later designs can deal with the issue more easily, though the proposed formal semantics generally do not. We have described the implemented version of the language.

The implementation differs from the language of [Boe 80] primarily in that it corrects some problems with the signature calculus, adds the prod type construction in place of the more restrictive image construct, revises the semantics of extend and record for the sake of efficiency, provides explicit constructs to support separate compilation, and provides a different collection of builtin types and operations (which now includes Call/cc). Unfortunately it does not incorporate a number of more recently proposed design changes, such as Algol 68 style ref types, and an explicit notion of purity in the signature calculus. It also has not been brought into conformity with the ideas in [Hoo 84].

A number of language descriptions more formal than [Boe 80] have also been published. [Dem 80c] gives a denotational/operational definition, using a Russell subset as the metalanguage. [Dem 83] gives a purely equational definition of much of the language. [Boe 84] performs the same task in an axiomatic framework. [Hoo 84] provides a more mathematical interpretation of the type structure of the language.

Currently, implementations exist for Sun 3 and Vax machines. The Vax implementation is described in [Boe 86]. For details on obtaining a copy, please send mail to boehm@rice.edu.

The following kinds of abbreviations are allowed by the current implementation in at least some contexts. Selections from a type T can usually be omitted if any of the arguments has signature val T or var T. It is also possible to specify a list of default types for selections; types such as Short normally appear on this list. Applications of V functions can usually be omitted. Square brackets [ ] denoting function application with no arguments can be omitted if their presence can be inferred from signature information. Trailing arguments to function applications can usually be omitted if their identity can be determined from the remaining arguments, if their signature is var Void, or if the function is named "." and the application looks like a floating point constant. Function result signatures can be omitted if they can be ascertained from the body of the function construction. Application brackets can be omitted if they can be reconstructed from the precedences described above. Strings may be used to abbreviate certain kinds of repeated concatenations, as described above.

Type components with certain names have associated default signatures. If such components have the indicated signature, then the signature specification may be omitted from a type signature. The following is a complete list of such names:

    Identifier                   Default Signature

    New                          func [ ] var T
    :=                           func [ var T; val T ] val T
    =, <, >, <=, >=, <>          func [ x, y: val T ] val Boolean
    ^+, ^*                       func [ x, y: val T ] val T
    V                            func [ var T ] val T
    any quoted identifier        func [ ] val T

In all of the above, T represents the local type name. If none is explicitly specified, it is generated by the compiler.

Selection types and omitted arguments are inferred primarily in a bottom-up manner. That is, arguments to function applications are used to infer selections of the function and to infer missing arguments for the application. The context in which the application appears is not used. This means that selections of functions from types other than the types of their arguments frequently need to be specified in some explicit manner. In particular, the compiler is very poor at choosing the types from which to select constants and the Mk functions of prod types. The use ... in ... ni construct can be used to suggest types from which to infer such selections, as in the sketch below.
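For instance (a minimal sketch; the enumeration color and the identifiers c, red, green, and blue are introduced here only for illustration, modeled on the use lr construct in the binary tree example above), the bare constant red in the guard below has no arguments from which its selection could be inferred, so without the use it would have to be written color$red:

    let
        color == enum { red, green, blue };
        c == color$red;
    in
        use color in
            if  c = red  ==> put["red\n"]
             #  else     ==> put["something else\n"]
            fi
        ni
    ni

This is exactly the role played by use lr in the tree implementation, which allows left and right to appear there unqualified.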
The following help files are available through the rhelp facility. A few of them are actually duplicates of each other. A number of topics are not even touched on in this introduction, so this list should also be useful in providing an overview of omissions from this document.

    Function to ungracefully terminate execution.
    Function to test whether two variables refer to the same location.
    Function calls.
    Access to command line arguments.
    Function for constructing array types.
    Unlimited size integers.
    Builtin type.
    Capture the current continuation associated with a computation; can be used to implement exception handling, coroutines, etc.
    Partial evaluation "and".
    Builtin character string type.
    Acceptable abbreviations.
    How to invoke the compiler.
    Russell conditionals.
    Partial evaluation "or".
    Blocks and declarations.
    The Russell loop construct.
    Enumeration type constructor.
    Test for end of file on the standard input.
    Explicitly increase heap size.
    Removing components from types.
    Types of expressions, and how they are parsed.
    Building new types with conversion functions.
    Access to separately compiled Russell and non-Russell programs.
    File I/O.
    Builtin double precision floating point data type.
    The machine state variable.
    Function constructions.
    How to get started; also printed if no arguments are given.
    Common problems.
    Lexical rules.
    Restriction on the use of global variables in functions.
    The builtin 16 bit integer data type.
    A copy of this introduction.
    Limits imposed by the compiler.
    Function for producing linear list types.
    The val Void constant.
    Function for producing pointer types.
    Product type constructor.
    Record type constructor.
    Accessing a component of a type.
    An unusual interface to UNIX (trademark of AT&T Bell Labs) signals.
    Brief description and syntax.
    The unorthodox treatment of character strings and numeric constants.
    Some simple debugging facilities.
    The type constructor.
    The builtin type.
    Adding operations to types.