[m-rev.] for post-commit review: better diagnostic for missing higher order insts

Peter Wang novalazy at gmail.com
Tue Jul 25 15:10:45 AEST 2023


On Tue, 25 Jul 2023 07:35:35 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> Rewrite the manual section on higher order insts.
> 
> doc/reference_manual.texi:
>     Rewrite the manual section on higher order insts.
>     This used to be in subsection 9.3.1 "Builtin higher-order insts and modes",
>     but from a user's perspective, the insts and modes they want to use
>     and need education about are NOT themselves builtin. (The forms they
>     may take are builtin, but that is true for every part of the syntax
>     of Mercury.) Therefore this diff deletes the old section header for 9.3.1,
>     making the updated text part of 9.3 itself.

> 
>     Make the new text show examples before showing the general case.
> 
>     Move the paragraphs about the prohibition on unifying/comparing
>     higher order terms to its own section; it did not fit into the
>     section on higher order insts and modes.
> 
>     Fix some old spelling errors.

> diff --git a/doc/reference_manual.texi b/doc/reference_manual.texi
> index 7535c137f..e86989928 100644
> --- a/doc/reference_manual.texi
> +++ b/doc/reference_manual.texi
...
> - at menu
> -* Builtin higher-order insts and modes::
> -* Default insts for functions::
> -* Combined higher-order types and insts::
> - at end menu
> +Consider @samp{list.foldl}, one of the standard fold predicates on lists.
> +The types of its arguments are given by this predicate declaration:
> + at example
> +:- pred foldl(pred(L, A, A), list(L), A, A).
> + at end example
> +
> +The first argument is a higher order value (a predicate in this case),
> +whose types are the types bound by the caller
> +to the type variables @var{L}, @var{A} and @var{A} respectively,
> +where @var{L} is the type of the elements in the list in the second argument,
> +and @var{A} is the type of the accumulator
> +whose initial and final values are the third and fourth arguments.

I suggest T as the variable for the elements.

> +A @emph{higher order mode} is a mode in which
> +the initial and/or final inst is a higher order inst.
> +These @emph{can} be written like this:
> +
> + at example
> +pred(in, in, out) is det >> pred(in, in, out) is det
> + at end example
> +
> + at noindent
> +for a predicate being passed to another predicate or function,
> +and like this
> +
> + at example
> +free >> pred(in, in, out) is det
> + at end example

>  
> - at node Builtin higher-order insts and modes
> - at subsection Builtin higher-order insts and modes
> + at noindent
> +for a predicate being returned from another predicate or function,
> +but in practice, it is far more convenient

Start a new sentence: In practice, ...

> +to use the builtin mode constructors @samp{in/1} and @samp{out/1} to write
>  
> -The language contains builtin @samp{inst} values
> + at example
> +in(pred(in, in, out) is det)
> +out(pred(in, in, out) is det)
> + at end example
> +
> + at noindent
> +which are each equivalent to the corresponding example just above.
> +

examples

> +You can use higher order insts and modes in mode declarations like this:
> +
> + at example
> +:- mode foldl(in(pred(in, in, out) is det), in, in, out) is det.
> + at end example
> +
> + at noindent
> +The @samp{in()} wrapper around the higher order inst here
> +declares the first argument of @samp{list.foldl} to be an input,
> +but the higher order inst inside the wrapper goes further,
> +and tells the compiler the argument modes and the determinism
> +of the predicate that @samp{list.foldl} takes @emph{in this mode}.

I suggest:

    ... goes further,
    by specifying the argument modes and the determinism of the
    predicate that @samp{list.foldl} takes @emph{in this mode}.

> +
> +That last qualification is important,
> +because @samp{list.foldl} has several modes,
> +each differing in the argument modes, in the determinism, or both.
> +The set of mode declarations for @samp{list.foldl} (at least in July 2023) is
> +
> + at c Should we show just a subset of the modes?

Yes, delete the parenthetical and just show a subset of the modes.

    The set of mode declarations for @samp{list.foldl} includes

> + at c
> + at c This list is a good argument for (possibly constrained) mode variables
> + at c and determinism variables, which would allow all of the modes above
> + at c to declared with just
> + at c :- mode foldl(pred(in, InMode, OutMode) is Det, in, InMode, OutMode) is Det.
> + at example
> +:- mode foldl(in(pred(in, in, out) is det), in, in, out) is det.
> +:- mode foldl(in(pred(in, mdi, muo) is det), in, mdi, muo) is det.
> +:- mode foldl(in(pred(in, di, uo) is det), in, di, uo) is det.
> +:- mode foldl(in(pred(in, in, out) is semidet), in, in, out) is semidet.
> +:- mode foldl(in(pred(in, mdi, muo) is semidet), in, mdi, muo) is semidet.
> +:- mode foldl(in(pred(in, di, uo) is semidet), in, di, uo) is semidet.
> +:- mode foldl(in(pred(in, in, out) is multi), in, in, out) is multi.
> +:- mode foldl(in(pred(in, in, out) is nondet), in, in, out) is nondet.
> +:- mode foldl(in(pred(in, mdi, muo) is nondet), in, mdi, muo) is nondet.
> +:- mode foldl(in(pred(in, in, out) is cc_multi), in, in, out) is cc_multi.
> +:- mode foldl(in(pred(in, di, uo) is cc_multi), in, di, uo) is cc_multi.
> + at end example
> +
> +As you can see, you can pass predicates
> +with several different argument mode/determinism combinations
> +as the first argument.

s/As you can see, /That means/

> +In the case of @samp{list.foldl},
> +the modes of the accumulator arguments
> +and the determinism of the whole predicate will follow
> +the argument modes and the determinism of the higher-order argument,
> +but one can create predicates (and functions) where this would not be true.
> + at c An example of that may be nice, but putting it here
> + at c would probably be more confusing than helpful.
> +
> +You can give names to such insts and modes
> +using declarations such as
> +
> + at example
> +:- inst std_fold_pred == (pred(in, in, out) is det).
> +:- mode std_fold_pred_in == in(pred(in, in, out) is det).
> + at end example
> +
> + at noindent
> +Note that the parentheses around the right hand side

of the inst declaration

> +are required by the relative precedences of the operators involved,
> +such as @samp{==} and @samp{is}.
> +

are required due to the relative precedences of the
@samp{==} and @samp{is} operators.

> +Given these definitions, the declarations
> +
> + at example
> +:- mode foldl(in(std_fold_pred), in, in, out) is det.
> +:- mode foldl(std_fold_pred_in, in, in, out) is det.
> + at end example
> +
> + at noindent
> +will each tell the compiler exactly the same information as

are both equivalent to

> +
> + at example
> +:- mode foldl(in(pred(in, in, out) is det), in, in, out) is det.
> + at end example
> +
> + at noindent
> +but they may be more convenient to write,
> +especially in the presence of more arguments.

> + at footnote{If there is a single inst or a single mode
> +that all instances of a given type are expected to use,
> +programmers will often give that inst or mode
> +the same name as the name of the type.
> +The compiler looks up names in its type table when it expects a type,
> +in its inst table when it expects an inst,
> +and in its mode table when it expects a mode,
> +so this does not confuse the compiler.
> +However, this practice @emph{can} confuse Mercury programmers
> +who (a) do not know this fact, or (b) are not familiar with this convention.}

I'm not sure we should encourage the use of trivial mode definitions.
I think writing a mode with in() or out() around a named inst is
clearer, and convenient enough.

> +
> +As a convenience, the language allows the @samp{in()} wrapper
> +around higher order insts, which turn them into higher order modes,
> +to be omitted in the mode declarations of predicates and functions.
> + at c ZZZ or is that true more generally, of all places that expect modes
> + at c but find insts?
> + at c I (zs) don't know the answer.

> +In those parts of these declarations
> +where the compiler expects to find a mode, if it finds an inst instead,
> +it implicitly inserts the @samp{in()} around that inst.

I wouldn't describe it like that -- see below.

> +This is why the usualy mode of @samp{list.foldl},

usual

> +which we introduced above as being
> +
> + at example
> +:- mode foldl(in(pred(in, in, out) is det), in, in, out) is det.
> + at end example
> +
> + at noindent
> +can also be written as
> +
> + at example
> +:- mode foldl(pred(in, in, out) is det, in, in, out) is det.
> + at end example
> +
> + at noindent
> +and is actually written in the source code like this.
> + at c ZZZ *Should* it be written like that?
> + at c Arguably, it makes the code harder to read for novices.
> +

Yes, I think so. The fact that (pred(in, in, out) is det)
can both be an inst or a mode could be a source of confusion,
and it's only slightly more convenient than using the in() wrapper.

We might want to remove the builtin higher-order modes eventually.
I will admit it is tempting to use them to make a predicate's mode
declarations align with the type declaration...

> + at c ZZZ can you omit the in(...) wrapper around named insts?
> + at c Answer: NO. it is not even true of the inst is an inst name that expands
> + at c to a higher order inst, rather than being an *explicit* higher order inst.
> + at c ZZZ Next question: *why* is the answer NO?

My understanding is that, by design, the syntax of a builtin
higher-order mode makes them look very much like (i.e. exactly like)
a higher-order inst. But an inst is an inst, and a mode is a mode.
The compiler isn't "expecting to find a mode, but finding an inst",
it's just "expecting to find a mode".

I suggest restoring the idea/description of builtin insts.
They are "builtin" in the same way that predicate/function/tuple types
are described as "builtin".

As for builtin higher-order modes, we could describe them as "syntactic
sugar", a "shorthand" or "convenience" (as before), and refrain from
calling them "builtin".

> +
> +The general form of higher order insts follows one of two patterns,
> +one for predicates, and one for functions.
> +
> +The pattern for predicates is
>  
>  @example
>  (pred) is @var{Determinism}
>  pred(@var{Mode}) is @var{Determinism}
>  pred(@var{Mode1}, @var{Mode2}) is @var{Determinism}
>  @dots{}
> + at end example
> +
> +while the pattern for functions is
> +
> + at example
>  (func) = @var{Mode} is @var{Determinism}
>  func(@var{Mode1}) = @var{Mode} is @var{Determinism}
>  func(@var{Mode1}, @var{Mode2}) = @var{Mode} is @var{Determinism}
>  @dots{}
>  @end example
>  
> -These insts represent the instantiation state
> -of variables bound to higher-order predicate and function terms
> -with the appropriate mode and determinism.
> -For example, @samp{pred(out) is det} represents the instantiation state
> -of being bound to a higher-order predicate term which is @code{det}
> -and accepts one output argument;
> -the term @samp{sum([1,2,3])} from the example above
> -is one such higher-order predicate term which matches this instantiation state.
> +In the case of zero-argument predicates and functions,
> +the parentheses around @samp{pred} and @samp{func}
> +are required to tell the compiler that these words
> +are not being used as operators in this case.
> +And, as explained about, one will usually need parentheses

above

> +around any instances of these patterns in Mercury code.
> +
> + at c ZZZ Note that nontrivial examples for functions
> + at c are much harder to find than for predicates, because
> + at c - any functions with the default arg modes and determinism need no decl
> + at c - we don't want to encourage people to write functions which are not det
> + at c - we don't want to encourage people to write functions whose args
> + at c   have uniqueness requirements
> + at c This leaves functions whose arg modes restrict the allowed function symbols
> + at c (which would be better expressed using subtypes) and functions whose arg
> + at c modes include higher order insts/modes, which would be too complex to
> + at c be useful as an example to novices.

I don't think an example for functions is necessary.

>  
> -As a convenience, the language also contains
> -builtin @samp{mode} values of the same name
> -(and they are what we have been using in the examples up to now).
> -These modes map from the corresponding @samp{inst} to itself.
> -It is as if they were defined by
> +Consider the declaration of a function
> +that computes the intersection of two maps from keys to values:
>  
>  @example
> -:- mode (pred is @var{Determinism}) == in(pred is @var{Determinism}).
> -:- mode (pred(@var{Inst}) is @var{Determinism}) ==
> -    in(pred(@var{Inst}) is @var{Determinism}).
> - at dots{}
> +:- func intersect(func(V, V) = V, map(K, V), map(K, V)) = map(K, V).
>  @end example
>  
>  @noindent
> -using the parametric inst @samp{in/1} mentioned in @ref{Modes}
> -which maps an inst to itself.
> -
> -If you want to define a predicate which returns a higher-order predicate term,
> -you would use a mode such as @samp{free >> pred(@dots{}) is @dots{}},
> -or @samp{out(pred(@dots{}) is @dots{} )}.
> -For example:
> - at c XXX The space after the dots{} above works around a bug in texi2html
> +One could declare the mode of this function
> +either using a separate mode declaration, like this:
>  
>  @example
> -:- pred foo(pred(int)).
> -:- mode foo(free >> (pred(out) is det)) is det.
> -
> -foo(sum([1,2,3])).
> +:- mode intersect(in(func(in, in) = out is det), in, in) = (out) is det.
>  @end example
>  
> -In the above mode declaration, the current Mercury implementation
> -requires parentheses around the higher-order inst.
> -(This is because the operator @samp{>>} binds more tightly then the operator
> - at samp{is}.)
> -
> -Note that in Mercury it is an error to attempt to unify two higher-order terms.
> -This is because equivalence of higher-order terms
> -is undecidable in the general case.
> -
> -For example, given the definition of @samp{foo} above, the goal
> + at noindent
> +or by combining the mode declaration with the function declaration, like this:
>  
>  @example
> -foo((pred(X::out) is det :- X = 6))
> +:- func intersect((func(V, V) = V)::in(func(in, in) = out is det),
> +    map(K, V)::in, map(K, V)::in) = (map(K, V)::out) is det.
>  @end example
>  
>  @noindent
> -is illegal.
> -If you really want to compare higher-order predicates for equivalence,
> -you must program it yourself;
> -for example, the above goal could legally be written as
> -
> - at example
> -P = (pred(X::out) is det :- X = 6),
> -foo(Q),
> -all [X] (call(P, X) <=> call(Q, X)).
> - at end example
> +In both cases, just as for the predicate examples above,
> +the @samp{in()} wrapper around the higher order inst is optional,
> +though in the combined declaration the parentheses must stay.
>  
> -Note that the compiler will only catch
> -direct attempts at higher-order unifications;
> -indirect attempts (via polymorphic predicates,
> -for example @samp{(list.append([], [P], [Q])}
> -may result in an error at run-time rather than at compile-time.
> + at c ZZZ should we have an example of a pred or func of arity zero?
> + at c If so, we could use the force function from lazy.m.

I don't think it's necessary.

The other diff is okay.

Peter


More information about the reviews mailing list