[m-rev.] for review: add "did you mean" reminders for mistyped predicate names

Zoltan Somogyi zoltan.somogyi at runbox.com
Mon Sep 25 23:39:39 AEST 2023


On 2023-09-25 21:14 +10:00 AEST, "Julien Fischer" <jfischer at opturion.com> wrote:
> 
> On Mon, 25 Sep 2023, Zoltan Somogyi wrote:
> 
>> For review by anyone. I am particularly looking for feedback
>> on whether any part of the change to edit_seq.m should stay.
>> I originally intended to base the new error message functionality
>> on new code in edit_seq.m, but this turned out to be significantly
>> harder than writing new code, in a new module (edit_distance.m)
>> using the gcc code Julien linked to as the basis.
>>
>> I can see three options:
>>
>> - Keeping edit_seq.m as it was.
>>
>> - Keeping only the addition of the new predicate for finding the
>>  closest candidate sequence, but undoing the generalization
>>  of the edit_params type.
>>
>> - Keeping both the changes to edit_seq.m in the attached diff:
>>  the new predicate, *and* the generalization.
>>
>> Note that the generalization requires changes in modules
>> that use this existing module if they pass parameters around,
>> so it is a breaking change, and the breakage does not buy you
>> anything helpful if your use case is the existing use case for
>> this module, i.e. the generation of diffs. This is why I would
>> prefer one of the first two options.
> 
> 1 or 2 is fine. I don't see a particularly strong reason for 2,
> but don't have any objections to it.

I am implementing option 1, since this makes the decision
about which module to use the simplest for users.

>> > compiler/typecheck_error_undef.m:
>>     If we are reporting an error for a reference an undefined predicate,
> 
>     *to* an undefined predicate
> 
>>     check whether any predicates exist whose names are "close enough"
>>     to the name of the referenced predicate, and if yes, add to the
>>     error message a line containing "(Did you mean x, y or z?)".
>>
>>     (Doing the same for references to undefined functions is future work.
>>     The two are separate because things that look like function references
>>     can also refer to e.g. data constructors.)
> 
> There's quite a bit of scope for doing something similar for other undefined
> entities. (Although, I'm sure you are aware of that.)

Yep, though it is worth doing only for the entities that people generate
undefined references to reasonably often. Besides predicates and function
symbols, I think only type names, and maybe module names, may be
frequent enough. I am pretty sure that inst, mode and event names are not.
(Field names count as function names, due to field access syntax.)
Can you think of any others?

>> +    ( if
>> +        AddeddumPieces = [],
>> +        KindQualMsgs = []
>> +    then
>> +        KindQualMsgs = [],
>> +        % It seems that regardless of missing qualifications or equal signs,
>> +        % the reference is to the wrong name. See if we can mention some
>> +        % similar names that could be the one they intended.
>> +        get_known_pred_names(PredicateTable, KnownPredNames),
>> +        BaseName = unqualify_name(SymName),
>> +        % Note: name_is_close_enough below depends on all costs here
>> +        % except for case changes being 2u.
>> +        Params = edit_params(2u, 2u, case_sensitive_replacement_cost, 2u),
>> +        string.to_char_list(BaseName, BaseNameChars),
>> +        list.map(string.to_char_list, KnownPredNames, KnownPredNamesChars),
>> +        find_closest_seqs(Params, BaseNameChars, KnownPredNamesChars,
>> +            Cost, HeadBestNameChars, TailBestNamesChars),
>> +        BestNamesChars = [HeadBestNameChars | TailBestNamesChars],
>> +        list.map(string.from_char_list, BestNamesChars, BestNames),
>> +        ( if
>> +            % Don't offer a string as a replacement for itself.
>> +            Cost > 0u,
>> +            % Don't offer a string as a replacement if it is too far
>> +            % from the original, either.
>> +            list.filter(name_is_close_enough(Cost, BaseName),
>> +                BestNames, CloseEnoughBestNames),
>> +            CloseEnoughBestNames = [_ | _]
>> +        then
> 
> I suggest imposing a limit on the number of close enough names that are
> printed, around a dozen should suffice; I can't imagine the resulting error
> messages being particularly meaningful if there are more.

The code at the moment prints all the candidate strings that have
the same edit distance from the undefined reference. This means that
*if* there were ever more than 12 of them, we would have no useful
way to pick the 12 most likely to be the one the programmer wants.
I would punt this problem to the first time we ever encounter it for real.
I don't think that will ever happen on non-contrived code.
 
>> +            SuggestionPieces = list_to_quoted_pieces_or(CloseEnoughBestNames),
>> +            SuggestedNamePieces =
>> +                [words("(Did you mean")] ++ SuggestionPieces ++
>> +                [suffix("?)"), nl],
>> +            SuggestedNamesMsg = simplest_msg(Context, SuggestedNamePieces),
>> +            Msgs = [UndefMsg, SuggestedNamesMsg]
>> +        else
>> +            Msgs = [UndefMsg]
>> +        )
>> +    else
>> +        % The AddeddumPieces part of UndefMsg and/or KindQualMsgs
>> +        % offer hints about the error that allow for the base name being right.
>> +        % Print just those.
>> +        % XXX Should we print SuggestedNamesMsg as well even in this case?
> 
> My inclination would be not to.

Mine too, which is why the code does what it does. I will delete the XXX comment.

>> +% The code in this module was derived from the spellcheck.cc in gcc.
> 
> I suggest moving this comment to the beginning of the implementation;

Done.

> I would also word it as follows:
> 
>      The code in this module uses a similar approach to that of
>      spellcheck.cc in gcc.

I did write my code looking at spellcheck.cc, so I don't think
that would be fully honest.

>> +%---------------------------------------------------------------------------%
>> diff --git a/tests/invalid/qual_basic_test2.err_exp b/tests/invalid/qual_basic_test2.err_exp
>> index 65e27c4a1..249d29c01 100644
>> --- a/tests/invalid/qual_basic_test2.err_exp
>> +++ b/tests/invalid/qual_basic_test2.err_exp
>> @@ -1,2 +1,3 @@
>>  qual_basic_test2.m:019: In clause for predicate `main'/2:
>>  qual_basic_test2.m:019:   error: undefined predicate `io.io__write_string'/3.
>> +qual_basic_test2.m:019:   (Did you mean `write_string'?)
> 
> That seems a little confusing; the actual error is that the module io.io
> doesn't exist.

I agree, but the problem is not in the new code; it is given the sym_name
as qualified(unqualified(io"), "io__write_string"), i.e. it is given a string with
a double underscore in the *base* name. Given that __s as module separators
have gone the way of the dodo in  current code, I don't really feel like
chasing down the reason for that.

> The diff looks fine otherwise.

Thanks. I followed your other suggesions, and will add "did you mean" prompts
for function and type names over the next few days.

There was a question about edit_distance.m that I forgot to ask in my
original email. Should we add a version of the find_closest_seq predicate
that explicitly works on strings? It would do all the required conversions
between strings and char lists, so users wouldn't have to. We could call it
find_closest_string.

Zoltan.


More information about the reviews mailing list