ML (Meta Language)
|Paradigm||multi-paradigm: functional, imperative|
|Designed by||Robin Milner & others at the University of Edinburgh|
|Typing discipline||strong, static|
|Standard ML, Caml Light, OCaml, F#, LazyML, OcaMl|
|Miranda, Haskell, Cyclone, Nemerle, C ++|
The Functional Programming Language ML was originally developed by Robin Milner (Turing Award 1991) and was the first language to include Polymorphic Type Inference, together with a type-safe exception-handling mechanism. ML stands for Meta Language.
it was conceived to develop proof tactics in the LCF theorem prover (whose language, pplambda, a combination of the first-order predicate calculus and the simply typed polymorphic lambda calculus, had ML as its metalanguage). ML provides the following features:
- Automatic Type Inference: Don't write types yourself, let the compiler do it for you!
- Mostly declarative syntax using Pattern Matching
- Generic Types: Somewhat like C++'s templates
- Parametric Modules (functors): Modules can take other modules as an argument.
Unlike Haskell (and like Lisp Language and Scheme), ML is strict: the arguments to a function are always evaluated before evaluating the body of the function.
There are three ML dialects in wide use:
- Standard ML
The programming language Standard ML, also known as SML, is inspired by certain fundamental concepts of Computer Science, making them directly available to the programmer. In particular: Trees and other recursive datatypes may be declared and used without mention of pointers;
Functions are values in Standard ML; functions can take functions as arguments and return functions as results;
Type polymorphism (Milner, 1978) makes it possible to declare functions whose type depends on the context. For example, the same function may be used for reversing a list of integers and a list of strings.
Standard ML emphasises safety in programming. Some aspects of safety are ensured by the soundness of the Standard ML type discipline. Other aspects of safety are ensured by the fact that Standard ML implementations use automatic memory management, which precludes, for example, premature de-allocation of memory.
Standard ML has references, arrays, input-output, a modules system, libraries and highly sophisticated compilers, all in order to give it the convenience and efficiency required for large-scale applications.
Usage of Standard ML
Standard ML is or has been taught to undergraduate and graduate Computer Science students at University of Edinburgh, University of Cambridge, Carnegie Mellon University, University of Princeton, and University of Copenhagen, among others.
Standard ML is used as a tool in research on theorem proving, compiler technology and program analysis. For example, the HOL theorem prover from Cambridge University is written in Standard ML.
The IT University of Copenhagen has around 100,000 lines of SML in web-based self-service systems for students and staff, including the personnel roster, a course evaluation system and work-flow systems for student project administration.
Tens of man-years of research and development have gone into developing mature compilation technology for Standard ML. The resulting compilers include Standard ML of New Jersey, Moscow ML, MLWorks, SML.NET, SML Server and the ML Kit with Regions.
A Standard ML compiler contains a type checker, which checks whether the source program can be elaborated using the elaboration rules. The static semantics only says which inferences are legal, it is not an algorithm for deciding whether a given source program complies with the inference rules. The type checker is such an algorithm, however. It employs type unification to infer types from the source program (Damas and Milner, 1982).
The basic types available are integer, real, string, boolean. From these we can construct objects using tuples, lists, functions and records, we can also create our own base types - more of this later. A tuple is a sequence of objects of mixed type. Some tuples:
(2,"Andrew") : int * string (true,3.5,"x") : bool * real * string ((4,2),(7,3)) : (int * int) * (int * int)
While a tuple allows its components to be of mixed type and is of fixed length, a list must have identically typed components and may be of any length. Some lists:
[1,2,3] : int list ["Andrew","Ben"] : string list [(2,3),(2,2),(9,1)] : (int * int) list [,,[1,2]] : int list list
Note that the objects [1,2] and [1,2,3] have the same type int list but the objects (1,2) and (1,2,3) are of different types, int*int and int*int*int respectively. It is important to notice the types of objects and be aware of the restrictions. While you are learning ML most of your mistakes are likely to get caught by the type checking mechanism.
Unlike most other languages ML allows the left hand side of an assignment to be a structure. ML "looks into" the structure and makes the appropriate binding.
- val (d,e) = (2,"two"); val d = 2 : int val e = "two" : string - val [one,two,three] = [1,2,3]; std_in:0.0-0.0 Warning: binding not exhaustive one :: two :: three :: nil = ... val one = 1 : int val two = 2 : int val three = 3 : int
Note that the second series of bindings does succeed despite the dire sounding warning - the meaning of the warning may become clear later.
The list is a phenomenally useful data structure. A list in ML is like a linked list in C or PASCAL but without the excruciating complexities of pointers. A list is a sequence of items of the same type. There are two list constructors, the empty list nil and the cons operator ::. The nil constructor is the list containing nothing, the :: operator takes an item on the left and a list on the right to give a list one longer than the original. Examples
nil  1::nil  2::(1::nil) [2,1] 3::(2::(1::nil)) [3,2,1]
In fact the cons operator is right associative and so the brackets are not required. We can write 3::2::1::nil for [3, 2, 1]. Notice how :: is always between an item and a list. The operator :: can be used to add a single item to the head of a list. The operator @ is used to append two lists together. It is a common mistake to confuse an item with a list containing a single item. For example to obtain the list starting with 4 followed by [5,6,7] we may write 4::[5,6,7] or @[5,6,7] however 4@[5,6,7] or ::[5,6,7] both break the type rules.
:: : 'a * 'a list -> 'a list nil : 'a list
To put 4 at the back of the list [5,6,7] we might try [5,6,7]::4 however this breaks the type rules in both the first and the second parameter. We must use the expression [5,6,7]@ to get [5,6,7,4]
Pattern matching and recursion
When defining a function over a list we commonly use the two patterns
fun lfun nil = ... | lfun(h::t) = ... lfun t ...;
However this need not always be the case. Consider the function last, which returns the last element of a list.
last [4,2,5,1] = 1 last ["sydney","beijeng","manchester"] = "manchester"
The two patterns do not apply in this case. Consider the value of last nil. What is the last element of the empty list? This is not a simple question like "what is the product of an empty list". The expression last nil has no sensible value and so we may leave it undefined. Instead of having the list of length zero as base case we start at the list of length one. This is the pattern [h], it matches any list containing exactly one item.
fun last [h] = h | last(h::t) = last t;
This function has two novel features.
When we enter the function as above ML responds with a warning such as std_in:217.1-218.23 Warning: match non exhaustive
h :: nil => ... h :: t => ...
The function still works, however ML is warning us that the function has not been defined for all values, we have missed a pattern - namely nil. The expression last nil is well-formed (that is it obeys the type rules) however we have no definition for it.
Overlapping left hand sides
As the pattern [h] is identical to the pattern h::nil we might rewrite the definition
fun last(h::nil) = h | last(h::t) = last t;
Examining the patterns of the left hand side of the = we note that there is an overlap. An expression such as 5::nil will match with both the first equation (binding h to 5) and the second equation (binding h to 5 and t to nil). Clearly it is the first line which we want and indeed ML will always attempt to match with patterns in the order that they appear. Note that this is not really a novel feature as all of our first examples with the patterns x and 0 had overlapping left hand sides.
A function may be defined with being named. The syntax is as follows
fn <parameters> => <expression>
- fn x => 2*x; > it = fn : int -> int - it 14; > 28 : int
This can be particularly useful when using higher order functions like map
map (fn x=> 2*x) [2,3,4];
Exceptions are raised with the raise keyword, and handled with pattern matching handle constructs.
exception Undefined fun max [x] = x | max (x::xs) = let val m = max xs in if x > m then x else m end | max  = raise Undefined fun main xs = let val msg = (Int.toString (max xs)) handle Undefined => "empty list...there is no max!" in print (msg ^ "\n") end
The exception system can be exploited to implement non-local exit, an optimization technique suitable for functions like the following.
exception Zero fun listProd ns = let fun p  = 1 | p (0::_) = raise Zero | p (h::t) = h * p t in (p ns) handle Zero => 0 end
When the exception Zero is raised in the 0 case, control leaves the function p altogether. Consider the alternative: the value 0 would be returned to the most recent awaiting frame, it would be multiplied by the local value of h, the resulting value (inevitably 0) would be returned in turn to the next awaiting frame, and so on. The raising of the exception allows control to leapfrog directly over the entire chain of frames and avoid the associated computation. It has to be noted that the same optimization could have been obtained by using a tail recursion for this example.
A Standard ML compiler also generates code which, when executed, will give the result prescribed by the dynamic semantics of the language definition. Some compilers generate byte code, others native code. Most Standard ML compilers perform extensive program analysis and program transformations in order to achieve performance that can compete with what is obtained in languages like C. All Standard ML compilers can compile source programs into stand-alone programs. The compiled programs can be invoked from a command line or as a web-service.
Run-time memory management
All Standard ML implementations provide for automatic re-cycling of memory. Standard ML of New Jersey and Moscow ML use generational garbage collection. The MT Kit with Regions and SML Server instead employ a static analysis, called region inference(Tofte and Talpin, 1994), which predicts allocation at compile-time and inserts explicit de-allocation of memory at safe points in the generated code.
Separate compilation, libraries and tools
All Standard ML Compilers allow for separate compilation, to deal with large programs. The Standard ML Basis Library (Gansner and Reppy, 2004) consists of a comprehensive collection of Standard ML modules. Some of these give the programmer access to efficient implementations of text-book data structures and algorithms. Other modules provide support for advanced input-output. Still others give access to the operating system level, so that one can do systems programming in Standard ML. Tools include generators of lexical analysers and parsers.
History of Standard ML
"ML" stands for meta language. ML, the predecessor of Standard ML, was devised by Robin Milner and his co-workers at Edinburgh University in the 1970’s, as part of their theorem prover LCF (Milner, Gordon and Wadsworth, 1979). Other early influences were the applicative languages already in use in Artificial Intelligence, principally LISP, ISWIM, POP2 and HOPE. During the 1980’s and first half of the 1990’s, ML inspired much programming language research internationally. MacQueen, extending ideas from HOPE and CLEAR, proposed the Standard ML modules system (MacQueen, 1984). Other major advances were mature compilers (Appel and MacQueen, 1991), a library (Gansner and Reppy, 2004), type-theoretic insight (Harper and Lillibridge, 1994) and a formal definition of the language (Milner, Tofte, Harper and MacQueen, 1997). Further information on the history of Standard ML may be found in the language definition (Milner et al., 1997).
The following program "hello.sml":
print "Hello world!\n";
can be compiled with MLton:
$ mlton hello.sml
$ ./hello Hello world!
Insertion sort for lists of integers (ascending) is expressed concisely as follows:
fun ins (n, ) = [n] | ins (n, ns as h::t) = if (n<h) then n::ns else h::(ins (n, t)) val insertionSort = List.foldr ins 
This can be made polymorphic by abstracting over the ordering operator. Here we use the symbolic name << for that operator.
fun ins' << (num, nums) = let fun i (n, ) = [n] | i (n, ns as h::t) = if <<(n,h) then n::ns else h::i(n,t) in i (num, nums) end fun insertionSort' << = List.foldr (ins' <<) 
Quicksort can be expressed as follows. This generic quicksort consumes an order operator <<
fun quicksort << xs = let fun qs  =  | qs [x] = [x] | qs (p::xs) = let val (less, more) = List.partition (fn x => << (x, p)) xs in qs less @ p :: qs more end in qs xs end