

2 Syntax


2.1 Syntax overview

A Mercury program consists of a set of source files, each of which contains a module. A module consists of a sequence of tokens, each of which is a variable, name, literal, or punctuation symbol. Tokens may be separated by any amount of whitespace, comments, and line number directives. These separators are mostly ignored by the parser, but in some cases whitespace may be required to separate tokens that would otherwise be ambiguous. In other cases whitespace is not allowed, e.g., before the open-ct token, or after a ‘.’ operator that would otherwise be interpreted as an end token.


2.2 Character set

Mercury program source files must be written using the UTF-8 encoding of the Unicode character set. In the rest of this chapter, “letters”, “digits”, “underscore” and other kinds of punctuation refer to characters in the Basic Latin code block.


2.3 Whitespace

Whitespace is defined to be the following characters:

Unicode name            Unicode code point    Notes

SPACE                   U+0020
CHARACTER TABULATION    U+0009                Horizontal-tab
LINE FEED               U+000A
LINE TABULATION         U+000B                Vertical-tab
FORM FEED               U+000C
CARRIAGE RETURN         U+000D

2.4 Comments

The ‘%’ character starts a comment that continues to the end of the line. The ‘/*’ character sequence starts a comment that continues until the next occurrence of ‘*/’. For example:

% Calculate the answer.
Result = 42     % This is the answer!

/*
omit this declaration for now
:- mode append(out, in, in) is semidet.
*/

2.5 Line number directives

A line number directive consists of the character ‘#’, a positive integer specifying the line number, and then a newline. Line number directives set the current line number; they are used in conjunction with the ‘pragma source_file’ declaration (see Source file name) to indicate that errors in the Mercury code following the directive should be reported relative to the line number set by the directive. This is useful if the code in question was generated by another tool, in which case the line number can be set to the corresponding location in the original source file from which the Mercury code was derived. The Mercury compiler can thereby issue more informative error messages using locations in the original source file. A ‘#line’ directive specifies the line number for the immediately following line. Line numbers for lines after that are incremented as usual, so the second line after a ‘#100’ directive would be considered to be line number 101.
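As a rough sketch of how a generated file might use these directives, consider the fragment below. The file name ‘grammar.y’ and the predicates are hypothetical, and the string argument form of the ‘pragma source_file’ declaration is assumed here (see Source file name for the authoritative description).

```mercury
% Fragment of a generated file; not a complete module.
% "grammar.y" and the predicate names are made up for illustration.
:- pragma source_file("grammar.y").
#100
expr(X) :- term(X).         % treated as line 100
expr(X) :- factor(X).       % treated as line 101
```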


2.6 Variables

A variable is an uppercase letter or underscore followed by zero or more letters, underscores, and digits. For example, ‘Sum’, ‘_NotNeeded’, ‘_a’, and ‘_123’ are variables, whereas ‘x’ and ‘_#’ are not. A variable token consisting of a single underscore is treated specially: each instance of ‘_’ denotes a distinct variable.

Variables starting with an uppercase letter are expected to occur more than once; the compiler will issue a warning if this is not the case, as it often indicates a simple error. Variables starting with an underscore are presumed to be “don’t-care” variables; the compiler will issue a warning if a variable starting with an underscore, excluding single underscore variables, occurs more than once in the same scope.
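The following minimal sketch shows the three kinds of variable in use. The module and predicate names are invented for this example.

```mercury
% Hypothetical module, for illustration only.
:- module variable_names.
:- interface.
:- import_module list.

    % first(List, Head): Head is the first element of List.
:- pred first(list(T)::in, T::out) is semidet.

    % Succeeds if the list has at least two elements.
:- pred has_two(list(T)::in) is semidet.

:- implementation.

    % X occurs twice, as expected for an ordinary variable.
    % _Rest occurs once; the leading underscore marks it as don't-care.
first([X | _Rest], X).

    % Each '_' is a distinct variable, so the first two elements
    % are not required to be equal.
has_two([_, _ | _]).
```

Writing ‘Rest’ instead of ‘_Rest’ in the first clause would be expected to draw the single-occurrence warning described above.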


2.7 Names

A name token is either an unquoted name, a quoted name, a graphic name, or a single semicolon character. An unquoted name is a lowercase letter followed by zero or more letters, underscores, and digits. A quoted name is any sequence of zero or more characters enclosed in single quotes ('). Within a quoted name, two adjacent single quotes stand for a single single quote. Quoted names can also contain backslash escapes of the same form as for strings. A graphic name is a sequence of one or more of the following characters

! & * + - : < = > ? @ ^ ~ \ # $ . /

where the first character is not ‘#’.

As a special case, the character sequences ‘<<u’ and ‘>>u’ are also graphic names. (They are intended to denote left and right shifts by unsigned amounts respectively.)

An unquoted name, graphic name, or semicolon is treated as equivalent to a quoted name containing the same sequence of characters.


2.8 Literals

The different literals in Mercury are as follows.

string

A string is a sequence of characters enclosed in double quotes (").

Within a string, two adjacent double quotes stand for a single double quote. For example, the string ‘ """" ’ is a string of length one, containing a single double quote: the outermost pair of double quotes encloses the string, and the innermost pair stand for a single double quote.

Strings may also contain backslash escapes. ‘\a’ stands for “alert” (a beep character), ‘\b’ for backspace, ‘\r’ for carriage-return, ‘\f’ for form-feed, ‘\t’ for tab, ‘\n’ for newline, ‘\v’ for vertical-tab. An escaped backslash, single-quote, or double-quote stands for itself.

The sequence ‘\x’ introduces a hexadecimal escape; it must be followed by a sequence of hexadecimal digits and then a closing backslash. It is replaced with the character whose character code is identified by the hexadecimal number. Similarly, a backslash followed by an octal digit is the beginning of an octal escape; as with hexadecimal escapes, the sequence of octal digits must be terminated with a closing backslash.

The sequences ‘\u’ and ‘\U’ begin a Unicode escape. ‘\u’ must be followed by the Unicode character code expressed as four hexadecimal digits. ‘\U’ must be followed by the Unicode character code expressed as eight hexadecimal digits. The highest allowed value is ‘\U0010FFFF’.

A backslash followed immediately by a newline is deleted; thus an escaped newline can be used to continue a string over more than one source line. (String literals may also contain embedded newlines.)
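The escapes described above can be sketched as follows. The module name is made up for this example, and the strings are chosen only to exercise each form.

```mercury
% Hypothetical module, for illustration only.
:- module string_escapes.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.

main(!IO) :-
    % Two adjacent double quotes stand for one double quote.
    io.write_string("She said ""hi"".\n", !IO),
    % Hexadecimal and octal escapes end with a closing backslash;
    % the Unicode escape takes exactly four hexadecimal digits.
    % All three escapes here denote the letter A.
    io.write_string("\x41\ \101\ \u0041\n", !IO),
    % A backslash immediately before a newline is deleted,
    % so this literal contains no line break before "line".
    io.write_string("one logical \
line\n", !IO).
```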

integer

An integer is either a decimal, binary, octal, hexadecimal, or character-code literal. A decimal literal is any sequence of decimal digits. A binary literal is ‘0b’ followed by any sequence of binary digits. An octal literal is ‘0o’ followed by any sequence of octal digits. A hexadecimal literal is ‘0x’ followed by any sequence of hexadecimal digits. A character-code literal is ‘0'’ followed by any single character.

Decimal, binary, octal and hexadecimal literals may be optionally terminated by a suffix that indicates whether the literal represents a signed or unsigned integer and what the size of that integer is. These suffixes are:

Suffix            Signedness    Size

i or no suffix    Signed        Implementation-defined
i8                Signed        8-bit
i16               Signed        16-bit
i32               Signed        32-bit
i64               Signed        64-bit
u                 Unsigned      Implementation-defined
u8                Unsigned      8-bit
u16               Unsigned      16-bit
u32               Unsigned      32-bit
u64               Unsigned      64-bit

For decimal, binary, octal and hexadecimal literals, an arbitrary number of underscores (‘_’) may be inserted between the digits. An arbitrary number of underscores may also be inserted between the radix prefix (i.e. ‘0b’, ‘0o’ and ‘0x’) and the initial digit. Similarly, an arbitrary number of underscores may be inserted between the final digit and the signedness suffix. The purpose of the underscores is to improve readability; they do not affect the numeric value of the literal.
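A short sketch of the integer literal forms follows. The module name and variable names are invented for this example.

```mercury
% Hypothetical module, for illustration only.
:- module integer_literals.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module string.

main(!IO) :-
    Million = 1_000_000,    % decimal, with underscores between digits
    Ten = 0b1010,           % binary
    Perm = 0o755,           % octal; the value is 493
    CodeA = 0'a,            % character code of the letter a; the value is 97
    Byte = 0xff_u8,         % hexadecimal, unsigned 8-bit; the value is 255
    io.format("%d %d %d %d\n",
        [i(Million), i(Ten), i(Perm), i(CodeA)], !IO),
    io.write_line(Byte, !IO).
```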

float

A floating point literal consists of a sequence of decimal digits, a decimal point (‘.’) and a sequence of digits (the fraction part), and the letter ‘E’ (or ‘e’), an optional sign (‘+’ or ‘-’), and then another sequence of decimal digits (the exponent). The fraction part or the exponent (but not both) may be omitted.

An arbitrary number of underscores (‘_’) may be inserted between the digits in a floating point literal. Underscores may not occur adjacent to any non-digit characters (i.e. ‘.’, ‘e’, ‘E’, ‘+’ or ‘-’) in a floating point literal, with one exception: underscores may occur between a digit and an ‘e’ or ‘E’ that introduces the exponent part of the number. The purpose of the underscores is to improve readability; they do not affect the numeric value of the literal.
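For instance, the following sketch uses each permitted shape of floating point literal. The module and function names are hypothetical.

```mercury
% Hypothetical module, for illustration only.
:- module float_literals.
:- interface.

:- func avogadro = float.
:- func half = float.
:- func tiny = float.

:- implementation.

avogadro = 6.022_140_76e23.     % fraction part and exponent
half = 0.5.                     % exponent omitted
tiny = 1e-9.                    % fraction part omitted
```

In each clause the final ‘.’ is an end token, not part of the literal, because it is followed by whitespace.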

implementation-defined-literal

An implementation-defined literal consists of a dollar sign (‘$’) followed by an unquoted name.
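As an illustration only, the sketch below uses three such literals. This section does not list the literals that exist; ‘$file’, ‘$line’ and ‘$module’ are assumed here to be provided by the implementation, with ‘$file’ and ‘$module’ being strings and ‘$line’ an integer. The module name is made up.

```mercury
% Hypothetical module, for illustration only.
:- module where_am_i.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module list.
:- import_module string.

main(!IO) :-
    % Assumption: $file and $module are strings, $line is an int.
    io.format("%s:%d in module %s\n",
        [s($file), i($line), s($module)], !IO).
```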


2.9 Punctuation symbols

The following punctuation symbols are used in Mercury’s syntax.

open-ct

A left parenthesis, ‘(’, that is not preceded by whitespace.

open

A left parenthesis, ‘(’, that is preceded by whitespace.

close

A right parenthesis, ‘)’.

open-list

A left square bracket, ‘[’.

close-list

A right square bracket, ‘]’.

open-curly

A left curly bracket, ‘{’.

close-curly

A right curly bracket, ‘}’.

backquote

A backquote character, ‘`’.

ht-sep

A “head-tail separator”, i.e. a vertical bar, ‘|’.

comma

A comma, ‘,’.

end

A full stop (period), ‘.’.


2.10 Operators

An operator is either a builtin operator or a user-defined operator. A user-defined operator is a name, module qualified name (see The module system), or variable, enclosed in backquotes (grave accents). User-defined operators are left-associative infix operators that bind more strongly than most other operators (see below).

The builtin operators, with the exception of comma, are all names, and as such they can be used without arguments supplied. For example, ‘f(+)’ is syntactically valid. In some cases parentheses may be required to limit the scope of an operator without arguments, e.g. if it appears as an argument to another operator. The comma operator is not a name and therefore requires single quotes in order to be used without arguments. An operator in single quotes is still an operator, so any requirement for parentheses will remain unchanged.

Operators are a syntactic concept. The ‘+’ infix operator, for example, is only a symbol; it does not mean addition unless you write or import code that defines it as addition. Modules in the Mercury standard library, such as int, uint and float, provide such arithmetic definitions. Other, non-arithmetic definitions can also be provided. For example, the ‘-’ infix operator is defined as subtraction by those modules but is defined as a pair constructor by the pair module.

The following table lists all of Mercury’s builtin operators, as well as user-defined operators of the form `op`. Operators with a higher priority bind more tightly than those with a lower priority. (This is a recent change; previously, Mercury followed the Prolog tradition in using higher priorities to denote operators that bind less tightly.) For example, given that + has priority 1000 and * has priority 1100, the term 2 * X + Y parenthesises as (2 * X) + Y. Note that the module qualification operator, ‘.’, binds more tightly than any other operator. Therefore, operator terms using builtin operators need to be parenthesized in order to be module qualified. For example, integer subtraction can be written as ‘int.(A - B)’, whereas pair construction can be written as ‘pair.(A - B)’ (see The module system).

The “Specifier” field indicates what structure terms constructed with an operator are allowed to take. “f” represents the operator and “x” and “y” represent arguments. “x” represents an argument whose priority must be strictly higher than that of the operator. “y” represents an argument whose priority is higher than or equal to that of the operator. For example, “yfx” indicates a left-associative infix operator, while “xfy” indicates a right-associative infix operator.

Operator                        Specifier         Priority

.                               yfx               1490
!                               fx                1460
!.                              fx                1460
!:                              fx                1460
@                               xfx               1410
^                               xfy               1401
^                               fx                1400
event                           fx                1400
:                               yfx               1380
`op`                            yfx               1380
**                              xfy               1300
-                               fx                1300
\                               fx                1300
*                               yfx               1100
/                               yfx               1100
//                              yfx               1100
<<                              yfx               1100
<<u                             yfx               1100
>>                              yfx               1100
>>u                             yfx               1100
div                             yfx               1100
mod                             xfx               1100
rem                             xfx               1100
for                             xfx               1000
+                               fx                1000
+                               yfx               1000
++                              xfy               1000
-                               yfx               1000
--                              yfx               1000
/\                              yfx               1000
\/                              yfx               1000
..                              xfx               950
:=                              xfx               850
=^                              xfx               850
<                               xfx               800
=                               xfx               800
=..                             xfx               800
=:=                             xfx               800
=<                              xfx               800
==                              xfx               800
=\=                             xfx               800
>                               xfx               800
>=                              xfx               800
@<                              xfx               800
@=<                             xfx               800
@>                              xfx               800
@>=                             xfx               800
\=                              xfx               800
\==                             xfx               800
~=                              xfx               800
is                              xfx               799
and                             xfy               780
or                              xfy               760
func                            fx                700
impure                          fy                700
pred                            fx                700
semipure                        fy                700
\+                              fy                600
not                             fy                600
when                            xfx               600
~                               fy                600
<=                              xfy               580
<=>                             xfy               580
=>                              xfy               580
all                             fxy               550
arbitrary                       fxy               550
atomic                          fxy               550
disable_warning                 fxy               550
disable_warnings                fxy               550
promise_equivalent_solutions    fxy               550
promise_equivalent_solution_sets fxy              550
promise_exclusive               fy                550
promise_exclusive_exhaustive    fy                550
promise_exhaustive              fy                550
promise_impure                  fx                550
promise_pure                    fx                550
promise_semipure                fx                550
require_complete_switch         fxy               550
require_switch_arms_det         fxy               550
require_switch_arms_semidet     fxy               550
require_switch_arms_multi       fxy               550
require_switch_arms_nondet      fxy               550
require_switch_arms_cc_multi    fxy               550
require_switch_arms_cc_nondet   fxy               550
require_switch_arms_erroneous   fxy               550
require_switch_arms_failure     fxy               550
require_det                     fx                550
require_semidet                 fx                550
require_multi                   fx                550
require_nondet                  fx                550
require_cc_multi                fx                550
require_cc_nondet               fx                550
require_erroneous               fx                550
require_failure                 fx                550
trace                           fxy               550
try                             fxy               550
some                            fxy               550
,                               xfy               500
&                               xfy               475
->                              xfy               450
;                               xfy               400
or_else                         xfy               400
then                            xfx               350
if                              fx                340
else                            xfy               330
::                              xfx               325
==>                             xfx               325
where                           xfx               325
--->                            xfy               321
catch                           xfy               320
type                            fx                320
solver                          fy                319
catch_any                       xfy               310
end_module                      fx                301
import_module                   fx                301
include_module                  fx                301
initialise                      fx                301
initialize                      fx                301
finalise                        fx                301
finalize                        fx                301
inst                            fx                301
instance                        fx                301
mode                            fx                301
module                          fx                301
pragma                          fx                301
promise                         fx                301
rule                            fx                301
typeclass                       fx                301
use_module                      fx                301
-->                             xfx               300
:-                              fx                300
:-                              xfx               300
?-                              fx                300
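To make the table concrete, here is a minimal sketch of how a few entries affect parsing. The module name is invented for this example, and ‘max’ is assumed to resolve to the function of that name in the int module.

```mercury
% Hypothetical module, for illustration only.
:- module operator_demo.
:- interface.
:- import_module io.

:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int.
:- import_module list.
:- import_module string.

main(!IO) :-
    A = 10 - 4 - 3,     % yfx: parsed as (10 - 4) - 3, which is 3
    B = 2 * 3 + 4,      % '*' (1100) binds tighter than '+' (1000): 10
    C = 7 `max` 9,      % user-defined operator: the same term as max(7, 9)
    D = '+'(1, 2),      % an operator name used as an ordinary functor: 3
    io.format("%d %d %d %d\n", [i(A), i(B), i(C), i(D)], !IO).
```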


2.11 Terms

Terms are the basic construct used in Mercury syntax. The term syntax is summarized by the following rules. (All of this information can be found in the descriptions below the rules.)

term = core-term | special-term

core-term = variable | literal | functor-term

literal = string | integer | float | implementation-defined-literal

functor-term = name | name open-ct functor-args close

functor-args = functor-arg | functor-arg,functor-args

functor-arg = arg | arg::arg

args = arg | arg,args

arg = term, where the term is not an operator term with priority <= 500

special-term = operator-term | list-term | tuple-term | apply-term | paren-term

operator-term = term operator term | operator term | operator term term,
    where the term is constructed according to the requirements of the operator
    (see Operators)

list-term = ‘[’ list-body? ‘]’

list-body = arg | arg,list-body | arg ht-sep term

tuple-term = ‘{’ args? ‘}’

apply-term = term open-ct args close,
    where the term is not a name or operator term

paren-term = ‘(’ term ‘)’

Terms can be described in the following way.

term

A term is either a core term or a special term. A term normalization procedure, given below, translates terms that may contain special terms into terms that are only constructed from core terms; two terms are considered syntactically equivalent if they translate to the same term. Syntactically equivalent terms can be used interchangeably anywhere in a module (e.g. operator syntax can be used in declarations and clauses, in particular those that define an operator).

Note that there can be further equivalences in some contexts, e.g. an if-then-else can be written in either of two equivalent forms. Such equivalences will be covered in the relevant chapters.

core-term

A core term is a variable, a literal, or a functor-term.

literal

A literal is a string, an integer, a float, or an implementation-defined-literal.

functor-term

A functor term is either a name or a compound term. A compound term is a name followed without any intervening whitespace by an open parenthesis (i.e. an open-ct token), then followed by a functor argument list and a close parenthesis. E.g., ‘foo(X,Y)’ is a compound term, whereas ‘foo (X,Y)’ and ‘foo()’ are not (the first because the space after ‘foo’ is not allowed, the second because the parentheses must be omitted if there are no arguments).

The principal functor of a functor term is the name and arity of the term, separated by a slash, where the arity is the number of arguments (or zero if there are no arguments). For example, the principal functor of ‘foo(bar,baz)’ is ‘foo/2’, while the principal functor of ‘foo’ is ‘foo/0’. The principal functor of a special term is determined after term normalization. For module qualified terms, the principal functor is defined slightly differently (see The module system).

Note that the word “functor” has a number of definitions, but in Mercury it just means a symbol to which arguments can be applied, and which has no intrinsic meaning of its own. It is a syntactic concept that applies to all functor terms. In specific contexts functors may also be referred to as type constructors, data constructors (or just constructors), predicates, functions, etc. The principal functor may also be referred to as the “top-level constructor”.

functor-args

A functor argument list is a sequence of one or more functor arguments, separated by commas.

functor-arg

A functor argument is either a single argument or two arguments separated by a ‘::’ operator (the latter form is for mode qualifiers; see Different clauses for different modes).

args

An argument list is a sequence of one or more arguments, separated by commas.

arg

An argument is any term, except operator terms where the operator does not bind more tightly than comma (i.e., where the priority is less than or equal to 500). In such a situation parentheses can be used, e.g. ‘f((A,B))’ is a compound term with one argument that is a parenthesized operator term, whereas ‘f(A,B)’ is a compound term with two arguments (and no operators).

special-term

A special term is an operator term, a list term, a tuple term, an apply term, or a parenthesized term. The term normalization procedure, below, defines how these terms are represented internally as core terms.

operator-term

An operator term is a term constructed using an operator, which complies with the rules for constructing terms using that operator (see Operators). Operator terms can be infix, such as ‘A + B’, unary-prefix, such as ‘not P’, or binary-prefix, such as ‘some Vars Goal’.

list-term

A list term is an open square bracket (an open-list token), followed by an optional list body, followed by a close square bracket (a close-list token). If the list body is omitted it is the empty list. If present, the list body is an argument list, optionally followed by a vertical bar (a ht-sep token) followed by a term. E.g., ‘[]’, ‘[X]’, and ‘[1, 2 | Tail]’ are all list terms. The argument list gives the elements appearing at the front of the list. The term following the vertical bar, if present, gives the tail of the list (i.e. the remaining elements), otherwise the tail is the empty list. Note that technically the tail does not have to be a list for this to be syntactically valid, although generally it would need to be in order to be type correct.

tuple-term

A tuple term is an open curly bracket (an open-curly token), followed by an optional argument list, followed by a close curly bracket (a close-curly token). If the argument list is omitted it is the empty tuple, otherwise the arguments give the components of the tuple. E.g., {} and {1,'2',"three"} are tuple terms.

apply-term

An apply-term is a “closure” term, which can be any term other than a name or an operator term, followed without any intervening whitespace by an open parenthesis (an open-ct token), an argument list, and a close parenthesis (a close token). E.g., ‘A(B,C)’ is an apply-term. An apply-term represents the closure (i.e. a higher-order value) applied to the arguments.

Note that although the closure term cannot be an operator term, it can be a parenthesized term. Thus ‘(Var ^ foo)(Arg1, Arg2)’ is a valid apply-term, whereas ‘Var ^ foo(Arg1, Arg2)’ is not (it is an operator term whose second argument is a compound term).

paren-term

A parenthesized term is just a term enclosed in parentheses. E.g., (X-Y) is a parenthesized term.
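Several of the term forms described above can be seen together in the following sketch. The module and function names are hypothetical.

```mercury
% Hypothetical module, for illustration only.
:- module term_forms.
:- interface.
:- import_module list.

:- func apply_twice(func(T) = T, T) = T.
:- func swap({A, B}) = {B, A}.
:- func push_two(T, T, list(T)) = list(T).

:- implementation.

    % The head is a compound term. F(F(X)) is an apply-term whose
    % argument is another apply-term: F is a variable, not a name.
apply_twice(F, X) = F(F(X)).

    % {X, Y} and {Y, X} are tuple terms.
swap({X, Y}) = {Y, X}.

    % [X, Y | Tail] is a list term with an explicit tail.
push_two(X, Y, Tail) = [X, Y | Tail].
```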

The term normalization procedure works by rewriting special terms that occur anywhere within a term (i.e. at the top level or as some descendant) according to a set of rewriting rules, and repeating until no rules can be further applied. The rules are as follows.

term1 `name` term2 → name(term1, term2)
term1 `var` term2 → var(term1, term2)
term1 operator term2 → 'operator'(term1, term2)
operator term → 'operator'(term)
operator term1 term2 → 'operator'(term1, term2)

[ ] → '[]'
[ arg ] → '[|]'(arg, '[]')
[ arg , list-body ] → '[|]'(arg, [list-body])
[ arg | term ] → '[|]'(arg, term)

{ } → '{}'
{ args } → '{}'(args)

term(args) → ''(term, args)

( term ) → term

For example, the following terms are all syntactically equivalent (i.e. they are equal after term normalization). The last is constructed from core terms; the others all normalize to this term. The last one shows that the principal functor of all of them is '[|]'/2.

[1, 2, 3]
[1, 2, 3 | []]
[1, 2 | [3]]
[1 | [2, 3]]
'[|]'(1, '[|]'(2, '[|]'(3, '[]')))

Similarly, the following terms are all syntactically equivalent. The principal functor in this case is '+'/2.

A * B + C
(A * B) + C
'+'('*'(A, B), C)
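Because syntactically equivalent terms are interchangeable, a clause can be written in either form. The sketch below, with made-up module and function names, defines each pair of functions identically.

```mercury
% Hypothetical module, for illustration only.
:- module normal_forms.
:- interface.
:- import_module list.

:- func sugared(int, int, int) = int.
:- func desugared(int, int, int) = int.
:- func sugared_list = list(int).
:- func desugared_list = list(int).

:- implementation.
:- import_module int.

    % Operator syntax and its normalized core-term form.
sugared(A, B, C) = A * B + C.
desugared(A, B, C) = '+'('*'(A, B), C).

    % List syntax and its normalized core-term form.
sugared_list = [1, 2, 3].
desugared_list = '[|]'(1, '[|]'(2, '[|]'(3, '[]'))).
```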

2.12 Items

Mercury modules are parsed as a sequence of items. Each item is a term followed by an end token (a period). If the principal functor of the term is ‘:-/1’, it is a declaration item and the argument is the declaration. Otherwise it is a clause item and the term is the clause. Note that we often use “declaration” and “clause” informally to refer to the items themselves (i.e., including the end token).

Declarations are used in relation to a number of features. Details of their syntax are covered in the relevant chapters.

A clause provides part of the definition of a function or predicate, and takes one of the following forms. The first form is a DCG-rule and is not discussed further here (see Definite clause grammars).

DCG_Head --> DCG_Body.

Head :- Body.
Head.

Head is the head of the clause and Body, if present, is the body of the clause. If the principal functor is ‘:-/2’, the clause is a rule and the body is a goal (see Goals). If the principal functor is not ‘:-/1’, ‘:-/2’, or ‘-->/2’, the clause is a fact. A fact is equivalent to a rule that has the same head and a body of ‘true’.
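The equivalence between a fact and a rule with body ‘true’ can be sketched as follows. The module and predicate names are invented; two separate predicates are used so that each has exactly one clause.

```mercury
% Hypothetical module, for illustration only.
:- module fact_and_rule.
:- interface.

:- pred is_zero_fact(int::in) is semidet.
:- pred is_zero_rule(int::in) is semidet.

:- implementation.

is_zero_fact(0).            % a fact: the principal functor is is_zero_fact/1
is_zero_rule(0) :- true.    % a rule: the principal functor is ':-'/2
```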

A clause head takes one of the following forms.

FunctorTerm = Result
FunctorTerm

FunctorTerm is a functor term whose arguments are expressions (see Expressions), optionally annotated with mode qualifiers (see Different clauses for different modes). If the principal functor is ‘=/2’, then the clause is a function rule or a function fact, and Result is an expression, optionally annotated with a mode qualifier. Otherwise, the clause is a predicate rule or a predicate fact. The principal functor of FunctorTerm determines which function or predicate is being defined.

For example, the following three items are clauses. The first is a function fact that defines a function named ‘loop/1’, a not particularly useful function. The second is a predicate fact and the third is a predicate rule, that between them define a predicate named ‘append/3’.

loop(X) = 1 + loop(X).

append([], Bs, Bs).
append([X | As], Bs, [X | Cs]) :-
    append(As, Bs, Cs).

The following example contains a number of declaration and clause items, and forms a syntactically valid module. (The semantics of the clauses will be covered in the next chapter. Note that the length/1 function in the standard library is implemented more efficiently.)

:- module slow_length.
:- interface.
:- import_module list.

:- func length(list(T)) = int.

:- implementation.
:- import_module int.       % for '+'

length([]) = 0.
length([_ | Xs]) = 1 + length(Xs).

:- end_module slow_length.
