[m-rev.] for review: add "did you mean" reminders for mistyped predicate names

Julien Fischer jfischer at opturion.com
Mon Sep 25 21:14:20 AEST 2023
Previous message: [m-rev.] for review: add "did you mean" reminders for mistyped predicate names
Next message: [m-rev.] for review: add "did you mean" reminders for mistyped predicate names
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Mon, 25 Sep 2023, Zoltan Somogyi wrote:

> For review by anyone. I am particularly looking for feedback
> on whether any part of the change to edit_seq.m should stay.
> I originally intended to base the new error message functionality
> on new code in edit_seq.m, but this turned out to be significantly
> harder than writing new code, in a new module (edit_distance.m)
> using the gcc code Julien linked to as the basis.
>
> I can see three options:
>
> - Keeping edit_seq.m as it was.
>
> - Keeping only the addition of the new predicate for finding the
>  closest candidate sequence, but undoing the generalization
>  of the edit_params type.
>
> - Keeping both the changes to edit_seq.m in the attached diff:
>  the new predicate, *and* the generalization.
>
> Note that the generalization requires changes in modules
> that use this existing module if they pass parameters around,
> so it is a breaking change, and the breakage does not buy you
> anything helpful if your use case is the existing use case for
> this module, i.e. the generation of diffs. This is why I would
> prefer one of the first two options.

1 or 2 is fine. I don't see a particularly strong reason for 2,
but don't have any objections to it.

> WIth either of the last two options, I would of course document
> the chosen option in NEWS.md, and with any of the options,
> I would add a note to edit_seq.m about the existence of
> edit_distance.m. (The form of the note would depend on
> the chosen option, which is why I haven't written it yet.)
> The fate of the new test cases for edit_seq.m also depends on this
> decision.
>
> There is also a new XXX in typecheck_error_undef.m
> I would like your opinions on: should we print "did you mean"
> hints in the presence of hints about pred vs func and/or
> missing module qualification?


> Add "did you mean ..." to undef pred reports.
> 
> compiler/typecheck_error_undef.m:
>     If we are reporting an error for a reference an undefined predicate,

     *to* an undefined predicate

>     check whether any predicates exist whose names are "close enough"
>     to the name of the referenced predicate, and if yes, add to the
>     error message a line containing "(Did you mean x, y or z?)".
>
>     (Doing the same for references to undefined functions is future work.
>     The two are separate because things that look like function references
>     can also refer to e.g. data constructors.)

There's quite a bit of scope for doing something similar for other undefined
entities. (Although, I'm sure you are aware of that.)

> library/edit_distance.m:
>     Add this new module to implement the "close enough" check.
>     This new module is similar to the existing edit_seq.m module,
>     but it is designed to serve different requirements, and it seems to me
>     to be far from trivial to write code to meet both sets of requirements
>     at once.
> 
> library/edit_seq.m:
>     Expand this module to serve one of the two requirements
>     now met by edit_distance.m that the the old edit_seq.m did not meet.
>     This is the requirement to treat case changes differently than
>     other replacements. The other requirement, the need to treat transpositions
>     differently than insert/delete pairs, is what is hard to reconcile
>     with the need to generate output that looks like it came from "diff -u".

...

> diff --git a/compiler/typecheck_error_undef.m b/compiler/typecheck_error_undef.m
> index bb93c2694..96d96e9ae 100644
> --- a/compiler/typecheck_error_undef.m
> +++ b/compiler/typecheck_error_undef.m
> @@ -346,11 +349,11 @@ report_apply_instead_of_pred = Components :-
>
>  %---------------------%
> 
> -:- func report_error_pred_wrong_name(type_error_clause_context, prog_context,
> -    predicate_table, pf_sym_name_arity, list(module_name), list(format_piece))
> -    = error_spec.
> +:- func report_error_pred_wrong_full_name(type_error_clause_context,
> +    prog_context, predicate_table, pf_sym_name_arity, list(module_name),
> +    list(format_piece)) = error_spec.
> 
> -report_error_pred_wrong_name(ClauseContext, Context, PredicateTable,
> +report_error_pred_wrong_full_name(ClauseContext, Context, PredicateTable,
>          PFSymNameArity, MissingImportModules, AddeddumPieces) = Spec :-
>      InClauseForPieces = in_clause_for_pieces(ClauseContext),
>      InClauseForComponent = always(InClauseForPieces),
> @@ -379,7 +382,51 @@ report_error_pred_wrong_name(ClauseContext, Context, PredicateTable,
>          PossibleModuleQualsSet0, PossibleModuleQualsSet),
>      QualMsgs = report_any_missing_module_qualifiers(ClauseContext,
>          Context, "predicate", PossibleModuleQualsSet),
> -    Msgs = [UndefMsg] ++ KindMsgs ++ QualMsgs,
> +    KindQualMsgs = KindMsgs ++ QualMsgs,
> +    ( if
> +        AddeddumPieces = [],
> +        KindQualMsgs = []
> +    then
> +        KindQualMsgs = [],
> +        % It seems that regardless of missing qualifications or equal signs,
> +        % the reference is to the wrong name. See if we can mention some
> +        % similar names that could be the one they intended.
> +        get_known_pred_names(PredicateTable, KnownPredNames),
> +        BaseName = unqualify_name(SymName),
> +        % Note: name_is_close_enough below depends on all costs here
> +        % except for case changes being 2u.
> +        Params = edit_params(2u, 2u, case_sensitive_replacement_cost, 2u),
> +        string.to_char_list(BaseName, BaseNameChars),
> +        list.map(string.to_char_list, KnownPredNames, KnownPredNamesChars),
> +        find_closest_seqs(Params, BaseNameChars, KnownPredNamesChars,
> +            Cost, HeadBestNameChars, TailBestNamesChars),
> +        BestNamesChars = [HeadBestNameChars | TailBestNamesChars],
> +        list.map(string.from_char_list, BestNamesChars, BestNames),
> +        ( if
> +            % Don't offer a string as a replacement for itself.
> +            Cost > 0u,
> +            % Don't offer a string as a replacement if it is too far
> +            % from the original, either.
> +            list.filter(name_is_close_enough(Cost, BaseName),
> +                BestNames, CloseEnoughBestNames),
> +            CloseEnoughBestNames = [_ | _]
> +        then

I suggest imposing a limit on the number of close enough names that are
printed, around a dozen should suffice; I can't imagine the resulting error
messages being particularly meaningful if there are more.

> +            SuggestionPieces = list_to_quoted_pieces_or(CloseEnoughBestNames),
> +            SuggestedNamePieces =
> +                [words("(Did you mean")] ++ SuggestionPieces ++
> +                [suffix("?)"), nl],
> +            SuggestedNamesMsg = simplest_msg(Context, SuggestedNamePieces),
> +            Msgs = [UndefMsg, SuggestedNamesMsg]
> +        else
> +            Msgs = [UndefMsg]
> +        )
> +    else
> +        % The AddeddumPieces part of UndefMsg and/or KindQualMsgs
> +        % offer hints about the error that allow for the base name being right.
> +        % Print just those.
> +        % XXX Should we print SuggestedNamesMsg as well even in this case?

My inclination would be not to.

> +        Msgs = [UndefMsg] ++ KindQualMsgs
> +    ),
>      Spec = error_spec($pred, severity_error, phase_type_check, Msgs).

...

> diff --git a/library/edit_distance.m b/library/edit_distance.m
> index e69de29bb..9ec45a54b 100644
> --- a/library/edit_distance.m
> +++ b/library/edit_distance.m
> @@ -0,0 +1,451 @@
> +%---------------------------------------------------------------------------%
> +% vim: ft=mercury ts=4 sw=4 et
> +%---------------------------------------------------------------------------%
> +% Copyright (C) 2023 The Mercury team.
> +% This file is distributed under the terms specified in COPYING.LIB.
> +%---------------------------------------------------------------------------%
> +%
> +% File: edit_distance.m.
> +% Stability: medium.
> +%
> +% This module computes the edit distance between two sequences of items.
> +% Its job is both similar to, and distinct from, the job of edit_seq.m.
> +% It is similar in that both modules work out the simplest, cheapest way
> +% to transform one sequence into another. It is distinct because the two
> +% modules aim to solve different problems.
> +%
> +% - edit_seq.m aims to solve the problem of displaying the difference
> +%   between two given sequences in a way that makes their differences
> +%   as easy to understand as possible.
> +%
> +% - edit_distance.m aims to solve the problem of finding in a pool of
> +%   candidate sequences the candidate that is closest to a given query
> +%   candidate sequence.

Delete that last occurrence of "candidate" there.

> +% Doing a second job with the second problem requires a mechanism that
> +% allows callers to specify that a transposition of two elements (such as
> +% replacing "bc" with "cb", thus transforming e.g. "abcd" into "acbd")
> +% has a different cost than deleting one element in one place in the sequence
> +% and inserting it back at another place. This mechanism does not help
> +% with the first problem at all (since the simplest way to display
> +% a transposition *is* as a delete/insert pair), and in fact its presence
> +% in the system would unnecessarily complicate the algorithm.

Somewhere here, I suggest we should say that this module computes the
Damerau–Levenshtein instance. I suspect potential users of this moulde  who are
aware of the different types of edit distance may be looking out for this.

> +%---------------------------------------------------------------------------%
> +%
> +% The code in this module was derived from the spellcheck.cc in gcc.

I suggest moving this comment to the beginning of the implementation;
I would also word it as follows:

      The code in this module uses a similar approach to that of
      spellcheck.cc in gcc.

...

> +%---------------------------------------------------------------------------%
> diff --git a/tests/invalid/qual_basic_test2.err_exp b/tests/invalid/qual_basic_test2.err_exp
> index 65e27c4a1..249d29c01 100644
> --- a/tests/invalid/qual_basic_test2.err_exp
> +++ b/tests/invalid/qual_basic_test2.err_exp
> @@ -1,2 +1,3 @@
>  qual_basic_test2.m:019: In clause for predicate `main'/2:
>  qual_basic_test2.m:019:   error: undefined predicate `io.io__write_string'/3.
> +qual_basic_test2.m:019:   (Did you mean `write_string'?)

That seems a little confusing; the actual error is that the module io.io
doesn't exist.

The diff looks fine otherwise.

Julien.
Previous message: [m-rev.] for review: add "did you mean" reminders for mistyped predicate names
Next message: [m-rev.] for review: add "did you mean" reminders for mistyped predicate names
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the reviews mailing list