A Hindley-Milner (HM) type system is essentially an extension of typed lambda calculus with parametric polymorphism. This polymorphism is usually implicit and can importantly only be used at the 'top-level'. It has some extremely nice properties such as the type of any term being inferable from its use, and a fairly simple, almost built-in, method for inferring said types.
HM deals with 2 kinds of type: Monotypes (aka. simple types) and polytypes (aka. type schemes).
Monotypes are simple, non-polymorphic types such as
Polytypes, or type schemes, are universally quantified monotypes, such as
We can convert between monotypes and polytypes using instantiation and generalization.
Instantiation takes a polytype and returns a monotype by removing the quantifier and replacing all usages of it with some monotype. For example,
Generalization takes a monotype and returns a polytype by quantifying over all free type variables in the monotype. For example,
Importantly, the types
HM only allows universal quantification of types at the "topmost" layer. IE, any polytype must have the form
The meat of HM can be specified concisely with a small set of inference rules. These look scary, but are fairly straight forward once you understand how to read them. Every inference rule has a set of premises and a conclusion, which are put on the top and bottom of a horizontal bar, like so:
Which reads as "if I know P, and I know Q, then I know P and Q". The rule has 2 premises and a single conclusion.
When dealing with actual type-theoretic inference, we often carry around an environment, usually written as
The '
Putting the previous knowledge to use, let's look at the inference rules for HM, one by one.
The first rule describes how to assign types to usages of variables. It is written as:
In english: "If the environment actually contains the information that
Interpretation: The environment keeps track of all the variables we have inferred the type of so far.
The next rules tells us how to type usages of functions in applications (function calls):
In english: "If the environment proves that
Interpretation: Giving the correct type of input to a function should result in the correct type of output.
This rule tells us how to type function abstractions (declarations of functions, concretely as lambdas):
In english: "If the environment and the assumption that
Interpretation: A lambda expression is a function taking an input of some initially unknown type, returning a value of the type we infer body to, and the body may make use of the input.
This rule implies something about how we should infer the type of lambda expression. First, we make up a new, fresh type variable for the input parameter. Then we add info to the environment stating that the input parameter has that new type, and then we infer the type of the lambda body in that altered environment.
This rule tells us how to type let bindings of the form let x = e1 in e2
where x
is an identifier and e1
and e2
are both expressions.
In english: "If the environment proves that
Interpretation: Let bindings bind a name to a value in a following expression, and evaluate to a value of the type we infer that following expression to be. The following expression can naturally make use of the bound name.
As for lambda expressions, this implies that we need to alter the environment before inferring the type for the following expression.
This rule usually isn't included in the minimal presentation of HM, but I include it since it is useful:
In english: "If the environment proves that the condition expression of an if-then-else is of type
Interpretation: The condition expression in an if-then-else must be a boolean, and the expression in both conditional branches must match it in type. The entire construct evaluate to said type.
What I have described so far is not the full set of inference rules. We are missing the rule for instantiation and generalization. I find these easier to explain and understand informally, so I have omitted them. Refer to the previous section on the topic for more info.
When actually performing inference, unification is a process that often comes up. Unification is a process that takes as input 2 types, and returns a substitution that when applied to either type, will make the 2 types equal. We use unification whenever we want to constrain 2 inferred types to be the same.
A substitution is a list of bindings from names (of type variables) to concrete types. It can be applied to any structure containing types, which will replace the type variables in said type with the type in their entry of the substitution, where applicable.
As an example, the types
The typical algorithm for unifying 2 types is fairly simple, written below as F# code:
let rec unify (t1: Type) (t2: Type) : Map<string, Type> =
match t1, t2 with
| a, b when a = b -> Map.empty
| TypeVar a, b when not (occurs a b) -> Map.ofList [(a, b)]
| a, TypeVar b when not (occurs b a) -> Map.ofList [(b, a)]
| TypeArrow (l1, r1), TypeArrow (l2, r2) ->
let s1 = unify l1 l2
let s2 = unify (apply s1 r1) (apply s2 r2)
compose s1 s2
| _ -> failwith "Could not unify types"
In English, this roughly means:
The so-called 'occurs check' is there to prevent pairs of types such as
The apply
function takes a substitution and a type, and applies that substitution to a type.
In HM, generalization happens only in 1 place: When inferring let bindings. This is known as let-polymorphism or let-generalization. This lets the type system handle expression such as:
let id = fun(x) -> x in (x true, x 3)
Where the name id
has been bound to a polymorphic value of type
This means concretely that, while we are inferring the type of a let expression, when we are about to put a new binding into the environment before inferring the body of the let, we generalize the type of that binding, and put the generalized type into the environment instead. Remember that generalization is just universally quantifying free type variables.
Disclaimer: I don't really know what I am talking about
HM is hard to extend without introducing a bunch of soundness holes. It can be done, though.
HM (as with many type systems) is essentially just glorified propositional logic.
Haskell-style typeclasses under this interpretation are predicates. The Haskell type Num a => a -> a -> a
written as predicate logic is
One can extend the type scheme, which was previously the most general way to describe a type at our disposal, to also include a list of predicates. Each predicate could be represented as a tuple of an identifier and a type, indicating that the type 'is a member of' the typeclass described by the identifier. If one accumulates these predicates during inference, just as is done for substitutions, one can solve these constraints in the end to see if the program typechecks. The problem basically becomes a typical predicate-logic problem, and one that languages such as Prolog interestingly seem like they were made to solve.
Much more about these ideas in the "Typing Haskell in Haskell" paper linked to below.