[m-rev.] for tryout: type_spec_constrained_preds

Julien Fischer jfischer at opturion.com
Wed Jan 31 20:08:47 AEDT 2024


On Tue, 30 Jan 2024, Zoltan Somogyi wrote:

> For tryout now and later review by Julien. A tryout on typeclass-heavy
> code would either confirm or disprove that
>
> - this design does what it is supposed to do, and that
> - what this design is supposed to do is what its intended users
>  actually need.
>
> (For term_io.m, a single type_spec_constrained_preds pragma seems to be
> sufficient to replace all of its 52 type_spec pragmas, *and* several more
> that *should* be there but aren't.)

I have tried the new pragma with my CSV and JSON libraries. The
following concerns the CSV library (specifically commit c4fd9d7 of
the CSV library).

Source: https://github.com/juliensf/mercury-csv

There is one opportunity to use a type_spec_constrained_preds pragma,
namely the following:

:- pragma type_spec_constrained_preds(
         [
          stream.line_oriented(Stream, State),
          stream.unboxed_reader(Stream, char, State, Error)
         ],
         apply_to_superclasses,
         [subst([
             Stream = io.text_input_stream,
             State = io.state,
             Error = io.error])]).

The aim is that the predicate read_from_file/6 would call the
specialised versions that result. Compiling with the pragma above
results in the following error:

     Making Mercury/asm_fast.gc/x86_64-pc-linux-gnu/Mercury/opts/csv.typed_reader.opt
     In `:- pragma type_spec' declaration for function `csv.make_error_message'/1:
       error: variables `V_1, V_2' do not occur in the `:- func' declaration.

Changing apply_to_superclasses to do_not_apply_to_superclasses, gives
the following error:

     In `:- pragma type_spec' declaration for function
       `csv.record_parser.get_client_quotation_mark_in_unquoted_field'/1:
        error: variable `V_3' does not occur in the `:- func' declaration.

get_client_quotation_mark_in_unquoted_field/1 drops the unboxed_reader/4
constraint and is only constrained by line_oriented/2. This was almost
certainly a mistake since there's no reason for that function to have
either constraint. Deleting the redundant constraint allows the libray
to compile and the type specialisations are present in the C code.

That said, I'm not convinced that most of them are actually called,
since there are number of class method calls remaining in the HLDS
dumps.  (In the case of read_from_file/6, the call to the get/4
method is still present and that would presuambly call the unspecialised
versions.)

I'll post some findings about the JSON library separately.

...

> I am attaching the log and the diff, but a review would be worthwhile
> only after we discuss and settle the three design questions below,
> and I update the diff to implement the chosen solutions. I would also
> add the required documentation then, as there is no point in
> writing it before. However, I would like feedback now on the proposed
> documentation of a new option in compiler/options.m.
>
> -------------------------------------------------------------
>
> The first is about syntax. At the moment, this diff supports
> pragmas like this:
>
>    :- pragma type_spec_constrained_preds(
>        [stream.line_oriented(Stream, State),
>            stream.unboxed_reader(Stream, char, State, Error),
>            stream.putback(Stream, char, State, Error)],
>        apply_to_superclasses,
>        [subst([Stream = io.text_input_stream,
>            State = io.state, Error = io.error])]).
>
> The third argument here uses the syntax we agreed about two or three
> months ago,

For the benefit of other readers, the thread in question is
<http://lists.mercurylang.org/archives/reviews/2023-October/024417.html>.
(It goes on into November 2023.)

> but I wonder whether we should replace the equal signs
> with an operator that is more like an arrow, such as -> or =>.
> Unlike the existing syntax, this would make it clear that the
> two sides of the operator are NOT equivalent; they have different
> purposes, and therefore different restrictions.

I don't have a preference as to whether "=" is used or if we swap to
some other operator. (As proposed in the original thread, type_spec
pragmas should eventually move to whatever syntax is chosen as well.)

> Some terminology for the discussion below: I call
>
>    subst([Stream = io.text_input_stream,
>        State = io.state, Error = io.error])
>
> a substitution, and I call its component
>
>    Stream = io.text_input_stream
>
> a type var substitution.
>
> -------------------------------------------------------------
>
> The second question has to do with possible restrictions on what
> variables may appear on the left hand sides of those equal-signs/arrows.
> It is a given that each left hand side must be a variable.
> In the example above, all of these variables also occur in the
> list of constraints in the first argument. This means that
> when we apply that pragma to a predicate which uses those constraints
> (or a subset of them), we replace the variable names on the LHSs
> of the third arg of the type_spec_constrained_preds pragma
> with the names of the variables in the predicate declaration
> that occur in corresponding positions of those constraints
> when we generate the type_spec pragma. For example, given
>
>    :- pred p(....) is det
>        <= (stream.line_oriented(Strm, St),
>            stream.unboxed_reader(Strm, char, St, E),
>            stream.putback(Strm, char, St, E)).
>
> the type type_spec pragma we would generate for this predicate
> would map Strm to io.text_input_stream, St to io.state, and
> E to io.error.
>
> If some of the variables on the LHSs of the third arg did NOT
> occur in the first arg, as with this pragma and pred p,
>
> :- pragma type_spec_constrained_preds(
>    [stream.line_oriented(Stream, State),
>        stream.unboxed_reader(Stream, char, State, Error),
>        stream.putback(Stream, char, State, Error)],
>    apply_to_superclasses,
>    [subst([Stream = io.text_input_stream, Unit = char,
>        State = io.state, Error = io.error])]).
>
> we would not know what variable in the tvarset of this predicate
> declaration Unit corresponds to. And indeed, the code in the attached
> diff would generate a type_spec pragma that maps an unnamed variable
> to char because of this, which makes that generated type_spec pragma
> invalid.
>
> I can see four possible approaches to solving this problem.
>
> Approach one would simply disallow the occurrence of such variables
> on the LHSs of the third arg, and generate an error message if they
> nevertheless occur.

Making such variables an error is my preference.

> Approach two would allow such variables, but ignore them.

No!

> Approach three would allow such variables, but ignore them *unless*
> the predicate declaration contains a type var with that name,
> in which case it would use that name in the generated type_spec pragma.
>
> Approach four would disallow such variables in the LHSs of the
> third argument of the type_spec_constrained_preds pragma,
> but would allow such variables to specified in a new fourth
> argument of that pragma. This fourth arg would be a single
> substitution, and the rule would be:
>
>    if a type var occurs in the first arg, then
>        it may occur on the LHS of a type var substitution
>        in the third arg, but not in the fourth
>    if a type var does not occur in the first arg, then
>        it may occur on the LHS of a type var substitution
>        in the fourth arg, but not in the third
>
> But apart from this syntactic difference, the semantics would match
> approach three: we would pay attention to a given type var substitution
> in the fourth arg only in predicates whose declarations have
> a type variable with the name on its LHS.
>
> I would prefer approach four, though I could also live with approach one.
> I view approach two as violating the law of least astonishment. As for
> approach three, it could work, but any documentation of its semantics
> would be signicantly more complicated than a documentation of the
> semantics of approach four, and its use in practice would require
> more attention as well. With approach four, the compiler can help
> diagnose e.g. unintentional spelling mismatches between the first arg
> and the later args; with approach three, it cannot.

I am struggling to see the use case for approaches (3) and (4).
Is it that they also allow you to specialise unconstrained type
variables?

> -------------------------------------------------------------
>
> The third question has to do with possible duplicates.
> At the moment, the compiler does not try to diagnose duplicate
> type_spec pragmas, and if any occur, it has to be a user-written
> pragma duplicating another user-written pragma, so any problem
> is clearly the user's fault. After this diff, there can also be
> compiler generated pragmas duplicating user pragmas (or vice versa),
> and compiler generated pragmas duplicating other compiler generated
> pragmas. The latter can occur e.g. when two type_spec_constrained_preds
> pragmas specify distinct but overlapping sets of constraints in their
> first args, and they are both applied to a predicate whose class context
> consists of just that overlap.
>
> I believe that
>
> - we should report duplicates when both type_spec pragmas are user-provided,
> - but not if one or both pragmas are compiler generated.

That seems sensible.

> This raises the question of how exactly we can test whether one type_spec
> pragma duplicates another. Such pragmas consist of two arguments,
> a predicate or procedure specification and a substitution, and two pragmas
> clearly differ if they differ in one or both arguments.
>
> I believe the rules should be the following.
>
> - The first args of two type_spec pragmas differ if and only if
>  they specify distinct sets of procedures. If they both
>
>  - specify the same predicate,
>  - specify the same procedure,
>  - one specifies a predicate and the other specifies a procedure
>    in that procedure,
>
>  then the procedures they specify overlap, and this makes the two pragmas
>  duplicates (if the second args also match).
>
> - The second args of two type_spec pragmas differ if and only if they differ
>  after
>
>  - sorting the type var substitutions of which they consist
>    on the name of the LHS variable, and
>
>  - replacing all type variables that occur in the types on the RHSs
>    of those type var substitutions with distinct anonymous type vars.
>    (In fact, we should consider whether we should allow ANY non-anonymous
>    type variables to occur in the RHS types.)

Looking at what the standard library and compiler do, the RHS are nearly
always ground types. (The one exception is var(_) in the standard
library.)  In practice, we do use non-anonymous type variables in the 
RHS types.)

The above seems reasonable.

Julien.


More information about the reviews mailing list