[m-rev.] for review: Add a tool to help maintain Copyright lines.

Zoltan Somogyi zoltan.somogyi at runbox.com
Fri Feb 9 15:55:00 AEDT 2024


On 2024-02-09 15:16 +11:00 AEDT, "Peter Wang" <novalazy at gmail.com> wrote:
> tools/update_copyright.m:
> tools/.gitignore:
>     Add the update_copyright program. It is deliberately not integrated
>     into the build system as it should be compiled once, manually, then
>     left in the workspace across checkouts, mmake clean, etc.

The update_copyright program should go into its own directory.
I wouldn't want to start mixing Mercury code with sh code.

> +++ b/tools/update_copyright.m
> @@ -0,0 +1,356 @@
> +%---------------------------------------------------------------------------%
> +% vim: ft=mercury ts=4 sw=4 et
> +%---------------------------------------------------------------------------%
> +% Copyright (C) 2024 YesLogic Pty. Ltd.
> +% Copyright (C) 2024 The Mercury team.
> +% This file may only be copied under the terms of the GNU General
> +% Public License - see the file COPYING in the Mercury distribution.
> +%---------------------------------------------------------------------------%
> +%
> +% update_copyright [OPTIONS] [FILES]...
> +%
> +% Options:
> +%
> +%   -s, --suffix STR    Match only Copyright lines with STR in the suffix.
> +%
> +% This program updates the first matching Copyright line of each file to
> +% include the current year. If one or more file names are specified on the
> +% command line, then those files will be updated in-place (files already
> +% containing up-to-date Copyright lines will not be modified).
> +% If no file names are specified, the input will be read from standard input,
> +% and the output written to standard output.
> +%
> +%---------------------------------------------------------------------------%

Put the usage info *after* the paragraph that explains
what the program does.

> +:- type mod_state
> +    --->    unmodified
> +    ;       found_unmodified
> +    ;       found_modified.

I would add something to "mod_state" to indicate that what it is
talking about is the copyright line; that is what is found/not found
and modified or not modified.

> +:- pred option_default(option::out, option_data::out) is multi.
> +
> +option_default(suffix, maybe_string(no)).
> +option_default(dummy, int(0)).

I would add a comment about dummy being there to make the pred multi,
or, even better I would replace that "option" with another, -q for quiet
(since -s for silent would clash with suffix) which shuts up any non-error output.

> +:- pred process_file(int::in, maybe(string)::in, string::in, bool::out,
> +    io::di, io::uo) is det.
> +
> +process_file(CurrentYear, MaybeExpectSuffix, FileName, Continue, !IO) :-
> +    io.open_input(FileName, OpenInputRes, !IO),
> +    (
> +        OpenInputRes = ok(InputStream),
> +        read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
> +            [], RevLines, unmodified, ModState, !IO),
> +        io.close_input(InputStream, !IO),

Is there some reason why you didn't use read_named_file_as_lines?

> +        (
> +            ( ModState = unmodified
> +            ; ModState = found_unmodified
> +            ),
> +            io.format("Unchanged: %s\n", [s(FileName)], !IO),
> +            Continue = yes
> +        ;
> +            ModState = found_modified,
> +            % It would be better to write a temporary file then atomically
> +            % rename over the original file, but that would require extra work
> +            % to preserve the file ownership and permissions. For simplicity,
> +            % we forgo atomicity.
> +            io.open_output(FileName, OpenOutputRes, !IO),
> +            (
> +                OpenOutputRes = ok(OutputStream),
> +                write_rev_lines(OutputStream, RevLines, !IO),
> +                io.close_output(OutputStream, !IO),
> +                io.format("Modified: %s\n", [s(FileName)], !IO),
> +                Continue = yes
> +            ;
> +                OpenOutputRes = error(Error),
> +                report_io_error("Error opening " ++ FileName, Error, !IO),
> +                Continue = no
> +            )
> +        )
> +    ;
> +        OpenInputRes = error(Error),
> +        report_io_error("Error opening " ++ FileName, Error, !IO),
> +        Continue = no
> +    ).
> +
> +%---------------------------------------------------------------------------%
> +
> +:- pred read_lines_loop(io.text_input_stream::in, int::in, maybe(string)::in,
> +    list(string)::in, list(string)::out, mod_state::in, mod_state::out,
> +    io::di, io::uo) is det.
> +
> +read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
> +        !AccLines, !ModState, !IO) :-

Rename !AccLines to !RevLines. The name for this data structure should be
the same in caller and callee, and the lines *are* reversed.

> +    io.read_line_as_string(InputStream, ReadRes, !IO),
> +    (
> +        ReadRes = ok(Line),
> +        ( if
> +            !.ModState = unmodified,
> +            parse_copyright_line(Line, MaybeExpectSuffix,
> +                Prefix, Ranges0, Suffix)
> +        then
> +            % We could normalise ranges here.
> +            sort(Ranges0, Ranges1),
> +            ( if add_to_ranges(CurrentYear, Ranges1, Ranges) then
> +                make_copyright_line(Prefix, Ranges, Suffix, NewLine),
> +                !:AccLines = [NewLine | !.AccLines],
> +                !:ModState = found_modified
> +            else
> +                !:AccLines = [Line | !.AccLines],
> +                !:ModState = found_unmodified
> +            )
> +        else
> +            !:AccLines = [Line | !.AccLines]
> +        ),
> +        read_lines_loop(InputStream, CurrentYear, MaybeExpectSuffix,
> +            !AccLines, !ModState, !IO)
> +    ;
> +        ReadRes = eof
> +    ;
> +        ReadRes = error(Error),
> +        report_io_error("Error reading", Error, !IO)
> +    ).
> +
> +:- pred parse_copyright_line(string::in, maybe(string)::in,
> +    string::out, list(year_range)::out, string::out) is semidet.
> +
> +parse_copyright_line(Line, MaybeExpectSuffix, Prefix, Ranges, Suffix) :-
> +    string.sub_string_search(Line, "Copyright ", AfterCopyright),
> +    find_prefix_end(Line, ' ', AfterCopyright, PrefixEnd),
> +    find_suffix_start(Line, PrefixEnd, PrefixEnd, SuffixStart),

You use prefix and suffix to refer to the parts of the line before
and after the year ranges, but this is quite unusual terminology.
I would use something more like "find_start_of_range_sequence"
and "find_end_of_range_sequence". But this is moot, because ...

I would also convert the part of Line starting at AfterCopyright
into a list of chars, and process that. It should be simpler, easier
to read and understand, and since it would be done only for one line,
the perf impact would be negligible. And in that case, you would
need only one pred, named maybe find_year_ranges, and it could
convert the digits into integers and accumulate the ranges of integers
as it goes along. In short, I would rewrite this part of the program from scratch.

If you want, you can commit the code as is, and I would rewrite it along
the above lines.

> +:- pred write_rev_lines(io.output_stream::in, list(string)::in,
> +    io::di, io::uo) is det.
> +
> +write_rev_lines(Stream, RevLines, !IO) :-
> +    list.foldr(io.write_string(Stream), RevLines, !IO).

I would use list.reverse and then list.foldl, to avoid bad perf for e.g. options.m.
And I wouldn't create a predicate for them.

> +++ b/tools/update_copyright.pre-commit
> @@ -0,0 +1,32 @@
> +#!/usr/bin/env bash
> ...
> +if [ ! -x "$update_copyright" ]; then

We should use "test" instead of "[".

> +changed_files=($(git diff --name-only --cached --diff-filter=ACMR -- \
> +    '*.[mchly]' '*.java' ))

Don't we have some .cs files under git control?

I can never remember whether bash supports *.{m,c,h,l,y,java,cs} syntax,
but if it does, I would use it.

Note that we also have some shell scripts under git control, and they
do NOT have suffixes. I guess we could find them by looking for an
initial #!.

> +if [ "${#changed_files}" -eq 0 ]; then

As above.

> +"$update_copyright" --suffix 'Mercury team' -- "${changed_files[@]}" &&
> +git add -u -- "${changed_files[@]}"

I saw Peter's comment on "git add" while writing this, and I agree with him.

Zoltan.


More information about the reviews mailing list