[m-rev.] for review and benchmarking: speed up file name creation in write_deps_file.m

Peter Wang novalazy at gmail.com
Thu Sep 7 15:05:25 AEST 2023


On Thu, 07 Sep 2023 11:16:55 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> For review and benchmarking by Peter. Can you please
> check out the effect of this diff on the benchmark you used
> to evaluate the effectiveness of the cache that this diff modifies?

The benchmark I used is to make a copy of the compiler directory,
then in that directory run the same command as mmake depend:

    mmc --generate-dependencies --grade hlc.gc --mercury-linkage static --flags COMP_FLAGS mercury_compile

with different values of MERCURY_COMPILER.

I see no difference without or with the patch:

    Benchmark #1: ./test.stock
      Time (mean ± σ):      2.149 s ±  0.014 s    [User: 2.039 s, System: 0.106 s]
      Range (min … max):    2.128 s …  2.165 s    10 runs

    Benchmark #2: ./test.wdf
      Time (mean ± σ):      2.148 s ±  0.041 s    [User: 2.036 s, System: 0.107 s]
      Range (min … max):    2.092 s …  2.237 s    10 runs

And also no difference with --use-subdirs added to command line options.

> Julien, can you please have a look at the comment on
> dir.relative_path_name_from_components? It implies
> that the predicate allows the construction of pathnames such as
> a/b/c/d from ["a", "b", "c", "d"], but does NOT allow
> the construction of the same pathname from ["a/b/c", "d"],
> even though the code does the right thing anyway.
> The relationship of that code, which merely adds
> the dir separator char between the components,
> and the / predicate, which does a lot more,
> should also be better documented.

I introduced relative_path_name_from_components specifically to be fast
when you KNOW the list elements are simple directory components, and you
only want a relative path name. Directory separators, drive letters, ., ..,
and any other stuff that dir./ tries to deal with are out of scope as it
would slow down relative_path_name_from_components for the case that is
it designed for. We can make that clearer in the documentation as you
say.

It so happens that on Windows if you call
relative_path_name_from_components(["a/b/c", "d"]), it will return
"a/b/c\d" which IS valid in that Windows accepts both slashes as
directory separators, but may not be want you want, e.g. to present to users.

> diff --git a/compiler/write_deps_file.m b/compiler/write_deps_file.m
> index d223f652c..5f4eee6f7 100644
> --- a/compiler/write_deps_file.m
> +++ b/compiler/write_deps_file.m
...
> @@ -1583,22 +1545,24 @@ generate_dv_file(Globals, SourceFileName, ModuleName, DepsMap,
>          ModuleMakeVarName ++ ".dep_errs",
>          list.map(add_suffix(".dep_err"), SourceFiles)),
>  
> -    make_module_file_names_with_suffix(Globals,
> +    % ZZZ
> +    make_module_file_names_with_ext(Globals,

ZZZ

> +:- pred make_module_file_name_in_loop(globals::in, ext::in,
> +    maybe(dir_name)::in, string::in, module_name::in, file_name::out,
> +    module_file_name_cache::in, module_file_name_cache::out,
> +    io::di, io::uo) is det.
> +
> +make_module_file_name_in_loop(Globals, Ext, MaybeDirPath, ExtStr,
> +        ModuleName, FileName, !Cache, !IO) :-
> +    % Keep this code in sync with make_module_file_name below.
> +    % NOTE This could be a complete switch on Ext, with an explicit decision
> +    % for each category of extension about whether it is worth caching files
> +    % with those extensions. However, the statistics below, gathered from
> +    % the execution of a bootcheck, show that such discrimination is not
> +    % neecessary for good performance.
> +    ModuleNameExt = module_name_and_ext(ModuleName, Ext),
> +    ( if map.search(!.Cache, ModuleNameExt, FileName0) then
> +        trace [
> +            compile_time(flag("write_deps_file_cache")),
> +            run_time(env("WRITE_DEPS_FILE_CACHE")),
> +            io(!TIO)
> +        ] (
> +            record_cache_hit(Ext, !TIO)
> +        ),
> +        FileName = FileName0
> +    else
> +        trace [
> +            compile_time(flag("write_deps_file_cache")),
> +            run_time(env("WRITE_DEPS_FILE_CACHE")),
> +            io(!TIO)
> +        ] (
> +            record_cache_miss(Ext, !TIO)
> +        ),
> +        % This code performs the job of
> +        %   module_name_to_file_name(Globals, From, Ext, ModuleName, FileName)
> +        % but with the directory path and extension string parts done
> +        % just done once in make_module_file_names_with_ext, instead of
> +        % being repeated each time here.

done just once

> +        BaseNameNoExt = module_name_to_base_file_name_no_ext(Ext, ModuleName),
> +        BaseName = BaseNameNoExt ++ ExtStr,
> +        (
> +            MaybeDirPath = no,
> +            FileName = BaseName
> +        ;
> +            MaybeDirPath = yes(DirPath),
> +            % XXX This cheats, since DirPath may contain directory separators.
> +            % The documentation of relative_path_name_from_components forbids
> +            % this, but its code does not care, and does the right thing
> +            % anyway.
> +            FileName =
> +                dir.relative_path_name_from_components([DirPath, BaseName])

In this case I think you should just use string.join_list.

> +        ),
> +        trace [compile_time(flag("sanity_check_write_deps_file"))] (
> +            module_name_to_file_name(Globals, $pred, Ext,
> +                ModuleName, FileNameB),
> +            expect(unify(FileName, FileNameB), $pred,
> +                "FileName = FileNameB")
> +        ),
> +        % We cache result of the translation, in order to save on
> +        % temporary string construction.
> +        map.det_insert(ModuleNameExt, FileName, !Cache)
> +    ).
> +

The diff looks fine.

Since I didn't measure any difference, I think there's no need to keep
make_module_file_names_with_ext and make_module_file_name_in_loop.
It's up to you.

Peter


More information about the reviews mailing list