[m-rev.] for review: Overhaul of the Syntax chapter of the reference manual

Julien Fischer jfischer at opturion.com
Sun Sep 11 23:37:00 AEST 2022


Hi Mark,

On Wed, 7 Sep 2022, Mark Brown wrote:

> Attached is another diff - it's the full diff as the relative diff is
> no shorter. Also attached is the html that results from
> --split=chapter.

I think that mostly achieves what you wanted.  (The Next and Previous
links look a little weird -- I couldn't find a way to omit them.)

...

> diff --git a/doc/reference_manual.texi b/doc/reference_manual.texi
> index fb2b23eac..6b8a55f01 100644
> --- a/doc/reference_manual.texi
> +++ b/doc/reference_manual.texi

...

>  @node Syntax overview
>  @section Syntax overview
> 
> -Mercury's syntax is similar to the syntax of Prolog,
> -with some additional declarations for types, modes, determinism,
> -the module system, and pragmas,
> -and with the distinction that function symbols
> -may stand also for invocations of user-defined functions
> -as well as for data constructors.
> -
> -A Mercury program consists of a set of modules.
> -Each module is a file containing a sequence of items
> -(declarations and clauses).
> -Each item is a term followed by a period.
> -Each term is composed of a sequence of tokens,
> -and each token is composed of a sequence of characters.
> -Like Prolog,
> -Mercury has the Definite Clause Grammar (DCG) notation for clauses.
> +A Mercury program consists of a set of source files,
> +each of which contains a module.
> +A module consists of a sequence of tokens,
> +each of which is a variable, name, literal, or punctuation symbol.
> +Tokens may be separated by any amount of
> +whitespace, comments, and line number directives.
> +These separators are mostly ignored by the parser,
> +but in some cases whitespace may be required to separate tokens
> +that would otherwise be ambiguous.
> +In other cases whitespace is not allowed,
> +e.g., before the @var{open-ct} token,
> +or after a @samp{.} operator
> +that would otherwise be interpreted as an @var{end} token.
> +
> +Mercury's syntax is similar to that of Prolog,
> +with some notable distinctions:
> +
> +@itemize
> +@item String constants are atomic;
> +they are not abbreviations for lists of character codes
> +(@pxref{Literals}).
> +@item
> +The operator table is fixed and cannot be modified.
> +Some operators differ in priority from those used in Prolog
> +(@pxref{Operators}).
> +@end itemize

Perhaps this could be moved to the Prolog Transition guide?

...

> +@node Line number directives
> +@section Line number directives
> 
> -@item line number directive
>  A line number directive consists of the character @samp{#},
>  a positive integer specifying the line number, and then a newline.
> -A @samp{#@var{line}} directive's only role
> -is to specifying the line number;
> -it is otherwise ignored by the syntax.
> -Line number directives may occur anywhere a token may occur.
> -They are used in conjunction with the @samp{pragma source_file} declaration
> -to indicate that the Mercury code following was generated by another tool;
> -they serve to associate each line in the Mercury code
> -with the source file name and line number
> -of the original source from which the Mercury code was derived,
> -so that the Mercury compiler can issue more informative error messages
> -using the original source code locations.
> +Line number directives specify a current line number;

I suggest:

    Line number directives set the current line number;

> +they are used in conjunction with the @samp{pragma source_file} declaration
> +(@pxref{Source file name})
> +to indicate that errors in the subsequent Mercury code should be
> +reported as coming from a different location.

I suggest:

    ... that errors in the Mercury code following the directive should be
    reported relative to the line number set by the directive.

> +This is useful if the code in question was generated by another tool,
> +in which case the line number can be set to the corresponding location
> +in the original source file from which the Mercury code was derived.
> +The Mercury compiler can thereby issue more informative error messages
> +using locations in the original source file.
>  A @samp{#@var{line}} directive specifies
>  the line number for the immediately following line.
>  Line numbers for lines after that are incremented as usual,
>  so the second line after a @samp{#100} directive
>  would be considered to be line number 101.
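To check that I read the last few sentences the way you intend, here is a small Python model of the renumbering. It is only a sketch under my own assumptions, not the compiler's lexer: each directive is assumed to sit on a line of its own, and the name reported_line_numbers is made up.

```python
import re

# Assumption: a directive line is '#' followed by decimal digits,
# and the integer must be positive (so "#0" is not a directive).
DIRECTIVE = re.compile(r"#([0-9]+)")

def reported_line_numbers(lines):
    """Pair each non-directive line with the line number that errors
    would be reported against, applying '#N' directives as we go."""
    reported = []
    current = 1                          # physical numbering until a directive
    for text in lines:
        m = DIRECTIVE.fullmatch(text)
        if m and int(m.group(1)) > 0:
            current = int(m.group(1))    # applies to the immediately following line
            continue
        reported.append((current, text))
        current += 1                     # later lines are incremented as usual
    return reported
```

For ["a", "#100", "b", "c"] this gives b line 100 and c line 101. That is, the second line after #100 is line 101, as the text says.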
> 
> +@node Variables
> +@section Variables
> +
> +A variable is an uppercase letter or underscore
> +followed by zero or more letters, underscores, and digits.
> +A variable token consisting of single underscore is treated specially:
> +each instance of @samp{_} denotes a distinct variable.
> +(In addition, variables starting with an underscore
> +are presumed to be ``don't-care'' variables;
> +the compiler will issue a warning
> +if a variable that does not start with an underscore occurs only once,
> +or if a variable starting with an underscore
> +occurs more than once in the same scope.)

I suggest just making the parenthetical bit into a separate (un-parenthesized)
paragraph.

Also, I suggest giving some examples of valid and invalid variable names.

(Another issue here is that in the context of Unicode, "letters" and
"digits" cover more characters than Mercury actually supports in variable names.)
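For the valid/invalid examples, something like the following Python sketch of the rule may help. It restricts letters and digits to ASCII, which is my assumption given the Unicode issue above. It also encodes the singleton and "don't-care" warning conditions from the parenthetical. The function names are invented for illustration.

```python
import re
from collections import Counter

# ASCII-only reading of the rule (an assumption): an uppercase letter or
# underscore, followed by zero or more letters, underscores, and digits.
VARIABLE = re.compile(r"[A-Z_][A-Za-z0-9_]*")

def is_variable(token):
    return VARIABLE.fullmatch(token) is not None

def quantification_warnings(occurrences):
    """Given the variable tokens occurring in one scope, return (sorted)
    the names that the warning rules in the diff would complain about."""
    warnings = []
    for name, n in Counter(occurrences).items():
        if name == "_":
            continue                     # each '_' is a distinct variable
        if not name.startswith("_") and n == 1:
            warnings.append(name)        # singleton variable
        elif name.startswith("_") and n > 1:
            warnings.append(name)        # "don't-care" variable used twice
    return sorted(warnings)
```

Under this reading, X, Xs1, _Foo and _ are variables; x, 1X and X-Y are not.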

> +@node Names
> +@section Names
> +
> +@c In these paragraphs, we use @t instead of @samp or @code
> +@c to wrap examples as @t will not put quotes around its argument,
> +@c and we cannot expect readers to figure out which quotes
> +@c are subject matter and which are decoration.
> +A name is either an unquoted name, a quoted name, a graphic name,
> +or a single semicolon character.

I suggest:

     A name token is either ...

> +An unquoted name is a lowercase letter followed by zero or more letters,
> +underscores, and digits.

(Ditto here with regard to "letters" and "digits".)

> +A quoted name is any sequence of zero or more characters
> +enclosed in single quotes (@t{'}).
> +Within a quoted name,
> +two adjacent single quotes stand for a single single quote.
> +Quoted names can also contain
> +backslash escapes of the same form as for strings.
> +A graphic name is a sequence of one or more of the following characters
> +@example
> +! & * + - : < = > ? @@ ^ ~ \ # $ . /
> +@end example
> +
> +@noindent
> +where the first character is not @samp{#}.
> +
> +An unquoted name, graphic name, or semicolon
> +is treated as equivalent to a quoted name containing
> +the same sequence of characters.
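A rough Python model of the name-token rules quoted above may make the final equivalence concrete. It is a sketch under my assumptions: ASCII letters only, and backslash escapes in quoted names are not handled. The helper name name_value is hypothetical.

```python
import re

UNQUOTED = re.compile(r"[a-z][A-Za-z0-9_]*")
GRAPHIC_CHARS = set("!&*+-:<=>?@^~\\#$./")   # the list in the @example above

def is_graphic(token):
    # One or more graphic characters, the first of which is not '#'.
    return (len(token) > 0 and token[0] != "#"
            and all(c in GRAPHIC_CHARS for c in token))

def name_value(token):
    """Return the character sequence a name token stands for, or None if
    the token is not a name. Backslash escapes are NOT modelled."""
    if token == ";" or UNQUOTED.fullmatch(token) or is_graphic(token):
        return token
    if len(token) >= 2 and token[0] == token[-1] == "'":
        body = token[1:-1]
        # Inside the quotes, '' stands for one quote; a lone ' is invalid.
        if re.fullmatch(r"(?:[^']|'')*", body) is None:
            return None
        return body.replace("''", "'")
    return None
```

With this, foo and 'foo' both give foo, + and '+' both give +, and 'it''s' gives it's.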

...

> +@node Punctuation symbols
> +@section Punctuation symbols
> +
> +The following punctuation symbols are used
> +in the description of Mercury's syntax.

I suggest:

    The following punctuation symbol tokens are used in Mercury's
    syntax.

...

> +@node Operators
> +@section Operators
> +
> +An operator is one of the builtin operators or a user-defined operator.

I suggest:

    An operator is either a builtin operator or a user-defined operator.

> +A user-defined operator is a name,
> +module qualified name (@pxref{The module system}), or variable,
> +enclosed in backquotes (grave accents).
> +User-defined operators are left-associative infix operators
> +that bind more strongly than most other operators (see below).
> +
> +The builtin operators, with the exception of comma, are all names,
> +and as such they can be used without arguments supplied.
> +For example, @samp{f(+)} is syntactically valid.
> +In some cases parentheses may be required
> +to limit the scope of an operator without arguments,
> +e.g. if it appears as an argument to another operator.
> +The comma operator is not a name and therefore requires single quotes
> +in order to be used without arguments.
> +An operator in single quotes is still an operator,
> +so any requirement for parentheses will remain unchanged.
> +
> +Note that operators are a syntactic concept.
> +The @samp{+} infix operator, for example, is only a symbol;
> +it doesn't mean addition unless you write or import code

s/doesn't/does not/ (and elsewhere).

> +that defines it as addition.
> +Modules in the Mercury standard library,
> +such as @code{int} and @code{float},
> +provide such arithmetic definitions.
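On the left-associativity point, a tiny Python model of how a chain of backquoted operators groups may be worth keeping in mind. It is my sketch only: it ignores priorities relative to other operators, does not validate what is inside the backquotes, and parse_backquote_chain is a made-up name.

```python
def parse_backquote_chain(tokens):
    """tokens alternate operands and `op` tokens, e.g. X `f` Y `g` Z.
    Returns nested (op, left, right) tuples, grouping to the left."""
    if len(tokens) % 2 == 0:
        raise ValueError("expected operand (`op` operand)*")
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        op = tokens[i]
        if not (len(op) > 2 and op.startswith("`") and op.endswith("`")):
            raise ValueError(f"expected a backquoted operator, got {op!r}")
        result = (op[1:-1], result, tokens[i + 1])   # left-associative
    return result
```

So X `f` Y `g` Z groups as g(f(X, Y), Z), not f(X, g(Y, Z)).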

...

> +@node Terms
> +@section Terms
> +
> +Terms are the basic construct used by most syntactic forms in Mercury.

Or just:

     Terms are the basic construct used in Mercury syntax.

...

> +@noindent
> +Terms can be described in the following way.
> +
> +@table @var
> +@item term
> +A term is either a @var{core term} or a @var{special term}.
> +A term normalization procedure, given below,
> +translates terms that may contain special terms
> +into terms that are only constructed from core terms;
> +two terms are considered syntactically equivalent
> +if they translate to the same term.
> +Syntactically equivalent terms can be used interchangeably
> +anywhere in a module
> +(e.g.@: operator syntax can be used in declarations and clauses,
> +in particular those that define an operator).
> +
> +Note that there can be further equivalences in some contexts,
> +e.g.@: an if-then-else can be written in either of two equivalent forms.
> +Such equivalences will be covered in the relevant chapters.
> +
> +@item core-term
> +A core term is a @var{variable}, a @var{literal}, or a @var{functor-term}.
> +
> +@item literal
> +A literal is
> +a @var{string},
> +an @var{integer},
> +a @var{float},
> +or an @var{implementation-defined-literal}.
> +
> +@item functor-term
> +A functor term is either a name or a compound term.
> +A compound term is a name followed without any intervening whitespace
> +by an open parenthesis (i.e.@: an @var{open-ct} token),
> +then followed by a functor argument list and a close parenthesis.
> +E.g., @samp{foo(X,Y)} is a compound term,
> +whereas @samp{foo (X,Y)} and @samp{foo()} are not
> +(the first because the space after @samp{foo} is not allowed,
> +the second because the parentheses must be omitted if there are no arguments).
> +
> +The @dfn{principal functor} of a functor term
> +is the name and arity of the term, separated by a slash,
> +where the arity is the number of arguments
> +(or zero if there are no arguments).
> +For example, the principal functor of @samp{foo(bar,baz)} is @samp{foo/2},
> +while the principal functor of @samp{foo} is @samp{foo/0}.
> +The principal functor of a special term is determined
> +@emph{after} normalization.

I suggest: s/normalization/term normalization/.  The latter is unambiguous.

...

> +( @var{term} ) @expansion{} @var{term}
> +@end example
> +
> +@noindent
> +For example, the following terms are all syntactically equivalent
> +(i.e.@: they are equal after normalization).

Ditto: s/normalization/term normalization/

> +The last is constructed from core terms;
> +the others all normalize to this term.
> +From the last one it can be seen that
> +the principal functor of all of them is @t{'[|]'/2}.
> +@example
> +[1, 2, 3]
> +[1, 2, 3 | []]
> +[1, 2 | [3]]
> +[1 | [2, 3]]
> +'[|]'(1, '[|]'(2, '[|]'(3, '[]')))
> +@end example
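A small Python model of the list part of term normalization reproduces the example above. It uses two made-up classes: Functor for core functor terms and ListSyntax for the special list term. Integers stand in for literals. This is a sketch of the idea, not the compiler's data structures. All five terms normalize to the same core term, whose principal functor is '[|]'/2 (printed without the quotes here).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Functor:            # core functor term: name(args...)
    name: str
    args: tuple = ()

@dataclass(frozen=True)
class ListSyntax:         # special term [E1, ..., En | Tail]
    elements: tuple
    tail: object = None   # None means no "| Tail" part, i.e. Tail = []

def normalize(term):
    """Translate list syntax into nested '[|]'/2 and '[]'/0 core terms."""
    if isinstance(term, ListSyntax):
        result = Functor("[]") if term.tail is None else normalize(term.tail)
        for element in reversed(term.elements):
            result = Functor("[|]", (normalize(element), result))
        return result
    if isinstance(term, Functor):
        return Functor(term.name, tuple(normalize(a) for a in term.args))
    return term           # literal

def principal_functor(term):
    """Name/arity of a functor term, determined after normalization."""
    t = normalize(term)
    return f"{t.name}/{len(t.args)}"
```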

...

> +@noindent
> +@var{Head} is the @dfn{head} of the clause
> +and @var{Body}, if present, is the @dfn{body} of the clause.
> +If the principal functor is @samp{:-/2},
> +the clause is a @dfn{rule} and the body is a goal
> +(@pxref{Goals}).
> +If the principal functor is not
> + at samp{:-/1}, @samp{:-/2}, or @samp{-->/2},
> +the clause is a @dfn{fact}.
> +A fact is equivalent to a rule that has the same head
> +and a body of @samp{true}.
> 
> -Mercury's rules for implicit quantification (@pxref{Implicit quantification})
> -mean that variables are often implicitly existentially quantified.
> -There is usually no need to write existential quantifiers explicitly.
> +A clause head takes one of the following forms.
> 
> -@item @code{all @var{Vars} @var{Goal}}
> -A universal quantification.
> -@var{Goal} must be a valid goal.
> - at var{Vars} must be a list of variables
> -(they may @emph{not} be state variables).
> -This goal is an abbreviation for @samp{not (some @var{Vars} not @var{Goal})}.
> +@example
> +@var{FunctorTerm} = @var{Result}
> +@var{FunctorTerm}
> +@end example
> +
> +@noindent
> +@code{@var{FunctorTerm}} is a functor term
> +whose arguments are expressions
> +(@pxref{Expressions}),
> +optionally annotated with mode qualifiers
> +(@pxref{Different clauses for different modes}).
> +If the principal functor is @samp{=/2},
> +then the clause is a function rule or a function fact,
> +and @code{@var{Result}} is an expression,
> +optionally annotation with a mode qualifier.

s/annotation/annotated/

> +Otherwise,
> +the clause is a predicate rule or a predicate fact.
> +The principal functor of @code{@var{FunctorTerm}}
> +determines which function or predicate is being defined.
> +
> +@noindent
> +For example, the following three items are clauses.
> +The first is a function fact that defines a function named @samp{loop/1},
> +a not particularly useful function.
> +The second is a predicate fact and the third is a predicate rule,
> +that between them define a predicate named @samp{append/3}.
> +
> +@example
> +loop(X) = 1 + loop(X).
> +
> +append([], Bs, Bs).
> +append([X | As], Bs, [X | Cs]) :-
> +    append(As, Bs, Cs).
> +@end example
> +
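To make sure the classification by principal functor is unambiguous, here is a Python sketch of it. It assumes a hypothetical Functor class for terms, with plain strings standing in for variables. I have lumped :-/1 and -->/2 together as "other", since this part of the text only says they are not facts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Functor:            # hypothetical term representation; variables are str
    name: str
    args: tuple = ()

def classify_clause(item):
    """Return (kind, defined) where kind is e.g. 'function fact' and
    defined is the name/arity of the function or predicate defined."""
    if item.name == ":-" and len(item.args) == 2:
        head, kind = item.args[0], "rule"
    elif (item.name, len(item.args)) in {(":-", 1), ("-->", 2)}:
        return ("other", None)           # neither a fact nor a plain rule
    else:
        head, kind = item, "fact"
    if head.name == "=" and len(head.args) == 2:
        functor_term, sort = head.args[0], "function"
    else:
        functor_term, sort = head, "predicate"
    return (f"{sort} {kind}", f"{functor_term.name}/{len(functor_term.args)}")
```

On the three example items this gives ("function fact", "loop/1"), ("predicate fact", "append/3") and ("predicate rule", "append/3").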

...


> +@node Overview of Mercury semantics
> +@section Overview of Mercury semantics
> +
> +There is no agreed upon definition of ``declarative programming''.
> +One notable characteristic of Mercury as a declarative language, however,
> +is that it has both a @dfn{declarative} and an @dfn{operational} semantics.
> +The declarative semantics is conceptually the simpler of the two:
> +it is only concerned with the relationship between inputs and outputs,
> +and not the steps taken to execute a program.
> +The operational semantics is additionally concerned with these steps.
> +This is often expressed by saying that
> +the declarative semantics is about ``what''
> +whereas the operational semantics is about ``how''.
> +
> +In the remainder of this section we introduce
> +each of these semantics.
> +
> +@subheading Declarative semantics
> +
> +The declarative semantics is concerned with ``truth''.
> +For example, it's true that 1 plus 1 is 2,

s/it's/it is/

> +and that the length of the list [1, 2, 3] is 3.
> +Statements like this that may be either true or false
> +are known as @dfn{propositions},

I suggest:

     Statements that are either true or false like this are
     called @dfn{propositions}.

> +e.g., 1 + 1 = 2 and 1 + 2 = 5 are both propositions;
> +if + is interpreted as integer addition
> +then the first proposition is true and the second is false.

I suggest:

     For example, ...

> +
> +Mercury clauses say things that are true about

s/say/state/

> +the function or predicate being defined.
> +To illustrate we will use an example from the previous chapter.
> +(Note that, here and below,
> +some declarations would need to be added to make this compile.)
> +
> +@example
> +length([]) = 0.
> +length([_ | Xs]) = 1 + length(Xs).
> +@end example
> +
> +@noindent
> +Both of these clauses are facts about the function @code{length/1}.
> +The first simply states that the length of an empty list is zero.
> +The second states that no matter what expressions we substitute for
> +the variables @samp{Xs} and @samp{_},
> +the length of @samp{[_ | Xs]} will be one greater than
> +the length of @samp{Xs}.
> +In other words, the length of a non-empty list
> +is one greater than the length of its tail.
> +
> +These two statements are true according to our intuitive idea of length.
> +Furthermore, we can see that the clauses cover every possible list,
> +since every list is either empty or non-empty,
> +and every non-empty list has a tail that is also a list.
> +Perhaps surpisingly,

s/surpisingly/surprisingly/

> +this is enough to conclude that
> +our implementation of list length is correct,
> +at least as far as arguments and return values are concerned.

    ... at least as far as its arguments and return values are concerned.

...

> +@subheading Operational semantics
> +
> +The declarative semantics doesn't tell us

s/doesn't/does not/

> +whether our program will terminate, for example,
> +or what its computational complexity is.
> +For that we need the operational semantics,
> +which tells us how the program will be executed.
> +
> +Execution in Mercury starts with a @dfn{goal}.
> +This is a proposition that may contain some variables,
> +and the aim of execution is to find a substitution
> +for which the proposition is true.

Make that into two sentences.

> +If it does, we refer to this as @dfn{success},
> +and we refer to the substitution that was found as a @dfn{solution}.
> +If execution determines that there are no such substitutions,
> +we refer to this as @dfn{failure}.
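A concrete instance might help readers with the terminology. The goal append(X, Y, [1, 2]) has three solutions, while append(X, [3], [1, 2]) fails. The Python below just enumerates substitutions by brute force to show this. It is emphatically not how Mercury executes goals, and solve_append is an invented name.

```python
def solve_append(zs, y=None):
    """Enumerate substitutions making append(X, Y, Zs) true for a given
    list zs. If y is given, Y is already bound and only X is solved for.
    An empty result corresponds to failure."""
    solutions = []
    for i in range(len(zs) + 1):
        x, rest = zs[:i], zs[i:]
        if y is None:
            solutions.append({"X": x, "Y": rest})
        elif rest == y:
            solutions.append({"X": x})
    return solutions
```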

...

> +@node Goals
> +@section Goals
> +
> +A goal is a term that takes one of the following forms.
> +
> +@table @code
> +@item @var{Call}
> +Any goal which is a functor term
> +that does not match any of the other forms below
> +is a first-order predicate call.
> +The principal functor determines the predicate called,
> +which must be visible (@pxref{Modules}).
> +The arguments, if present, are expressions.
> +
> +@item call(Closure)
> +@itemx call(Closure1, Arg1)
> +@itemx call(Closure2, Arg1, Arg2)
> +@itemx call(Closure3, Arg1, Arg2, Arg3)
> +@itemx @dots{}
> +@itemx Closure
> +@itemx Closure1(Arg1)
> +@itemx Closure2(Arg1, Arg2)
> +@itemx Closure3(Arg1, Arg2, Arg3)
> +@itemx @dots{}
> +A higher-order predicate call.
> +The closure and arguments are expressions.
> +@samp{call(Closure)} and @samp{Closure} just call the specified closure.
> +The other forms append the specified arguments
> +onto the argument list of the closure before calling it.
> +A higher-order predicate call written using
> +an apply term with @code{N} arguments
> +is equivalent to the form using @code{call/N+1}.
> +@xref{Higher-order}.
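The "append the specified arguments onto the argument list of the closure" wording reads fine to me. Here is a Python sketch of that behaviour in case an example is wanted. It models a closure as a (predicate, bound arguments) pair and a predicate as a Python function returning True or False; both are my simplifications, roughly a semidet predicate with all-input arguments. make_closure and between are hypothetical names.

```python
def make_closure(pred, *bound_args):
    """A closure: a predicate plus the arguments already supplied."""
    return (pred, bound_args)

def call(closure, *extra_args):
    """Model of call/N+1: append extra_args to the closure's argument
    list, then call the predicate with the combined arguments."""
    pred, bound_args = closure
    return pred(*bound_args, *extra_args)

def between(lo, hi, x):
    """Hypothetical test predicate: succeeds iff lo =< x =< hi."""
    return lo <= x <= hi
```

So call(make_closure(between, 1, 10), 5) behaves like between(1, 10, 5), and call/1 on a closure with all three arguments bound just calls it.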

...

>  The declarative semantics of an if-then-else is given by
> -@code{( @var{CondGoal}, @var{ThenGoal} ; not(@var{CondGoal}), @var{ElseGoal})},
> -but the operational semantics are different,
> +@code{( @var{CondGoal}, @var{ThenGoal} ; not(@var{CondGoal}), @var{ElseGoal} )},
> +but the operational semantics is different,
>  and it is treated differently for the purposes of determinism inference
>  (@pxref{Determinism}).
>  Operationally, it executes the @var{CondGoal},
>  and if that succeeds, then execution continues with the @var{ThenGoal};
> -otherwise, i.e.@: if @var{CondGoal} fails, it executes the @var{ElseGoal}.
> -Note that @var{CondGoal} can be nondeterministic ---
> -unlike Prolog, Mercury's if-then-else does not commit
> -to the first solution of the condition if the condition succeeds.
> +otherwise, i.e.@: if @var{CondGoal} fails without producing any solutions,
> +it executes the @var{ElseGoal}.
> +Note that @var{CondGoal} can be nondeterministic---if the @var{CondGoal}
> +succeeds more than once then
> +the @var{ThenGoal} is executed once for each of the solutions.
> +(Unlike Prolog,
> +Mercury's if-then-else does not commit
> +to the first solution of the condition if the condition succeeds.)

Parenthetical remarks about Prolog can go in the Prolog transition guide.
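The new wording about a nondeterministic condition is clearer than before. A Python generator model of the operational reading shows both cases: the then part runs once per solution of the condition, and the else part runs only if the condition produced no solutions at all. In the model, each goal is a function yielding its solutions; that representation, and the name if_then_else, are my assumptions for the sketch.

```python
def if_then_else(cond, then, else_):
    """cond() yields the condition's solutions; then(s) and else_() yield
    the solutions of the then and else parts respectively."""
    succeeded = False
    for solution in cond():
        succeeded = True
        yield from then(solution)    # once for each solution of the condition
    if not succeeded:
        yield from else_()           # only if the condition had no solutions
```

Note that if the condition succeeds but the then part fails for every solution, the else part is still not executed.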

>  If @var{CondGoal} is an explicit existential quantification,
>  @code{some @var{Vars} @var{QuantifiedCondGoal}}, then the variables @var{Vars}
>  are existentially quantified over the conjunction of the goals
> -@var{QuantifiedCondGoal} and @var{ThenGoal}.
> +@var{QuantifiedCondGoal} and @var{ThenGoal}
> +(see existential quantifications, below).
>  Explicit existential quantifications that occur as subgoals of @var{CondGoal}
>  do @emph{not} affect the scope of variables in the ``then'' part.
>  For example, in
> @@ -876,60 +1362,108 @@ the variable @var{V} is only quantified over @var{C}
>  because the top-level goal of the condition
>  is not an explicit existential quantification.

...

> +@item event @var{Goal}
> +An event goal.
> +@var{Goal} is a predicate call.
> +Event goals are an extension used by the Melbourne Mercury implementation
> +to support user defined events in the Mercury debugger, @samp{mdb}.

s/user defined/user-defined/

> +See the ``Debugging'' chapter of the Mercury User's Guide for further details.
> +
> +@end table
> +

...

> +@node Expressions
> +@section Expressions
> +
> +Syntactically, an expression is just a term.
> +Semantically, an expression is
> +a variable,
> +a literal,
> +a functor expression,
> +or a special expression.
> +A special expression is
> +a conditional expression,
> +a unification expression,
> +a state variable,
> +an explicit type qualification,
> +a type conversion expression,
> +a lambda expression,
> +an apply expression,
> +or a field access expression.
> +
> +A literal is a string, an integer, a float,
> +or an implementation-defined literal
> +(note that character literals are just single character names; see below).
> +

...

> +A functor expression is a name or a compound expression.
> +A compund expression is a compound term

s/compund/compound/

> +that does not match the form of a special expression,
> +and whose arguments are expressions.
> +If a functor expression is not a character literal,
> +its principal functor must be the name of
> +a visible function, predicate, or data constructor
> +(except for field specifiers,
> +for which the corresponding field access function must be visible;
> +see below).
> +
> +Character literals in Mercury are names with a single character,
> +possibly quoted.

I suggest:

    Character literals in Mercury are single character names, possibly
    quoted.

> +Since they sometimes require quotes
> +and sometimes require parentheses,
> +for code consistency we recommend
> +writing all character literals with quotes and
> +(except where used as arguments)
> +parentheses.
> +For example, @code{Char = ('+') ; Char = ('''')}.


> +
> +Special expressions
> +(not including field access expressions,
> +which are covered below)
> +take one of the following forms.

...

> @@ -1282,11 +2116,13 @@ whether they are defined by clauses or lambda expressions.
>  @samp{!@var{X}} is not a legal function result,
>  because it stands for two arguments, rather than one.
>  @item
> -@samp{!@var{X}} may not appear as an argument in a function application,
> +Neither @samp{!@var{X}} nor @samp{!:@var{X}}
> +may appear as an argument in a function application,
>  because this would not make sense
>  given the usual interpretation of state variables and functions.
> +@c XXX it appears the implementation does actually allow !:X
>  (The default mode of functions is that all arguments are input,
> -while in the overwhelming majority of cases, @samp{!:@var{X}} is output.)
> +while in typical usage, @samp{!:@var{X}} is output.)
>  @end itemize
>
>  Within each clause, the compiler

...

> +Variables occurring in types are called type variables.

And in type classes and instances.  (The existing text on variable
scoping presumably predated them.)

> +Variables occurring in insts or modes are called inst variables.
> +Variables that occur in expressions,
> +and that are not inst variables or type variables,
> +are called ordinary variables.

...

> +The three different variable sorts occupy different namespaces:
> +there is no semantic relationship between two variables of different sorts
> +(e.g.@: a type variable and an ordinary variable)
> +even if they happen to share the same name.
> +However, as a matter of programming style, it is generally a bad idea
> +to use the same name for variables of different sorts in the same clause.

Julien.

