[m-rev.] for review and benchmarking: speed up file name creation in write_deps_file.m

Zoltan Somogyi zoltan.somogyi at runbox.com
Fri Sep 8 12:03:15 AEST 2023


On 2023-09-07 15:05 +10:00 AEST, "Peter Wang" <novalazy at gmail.com> wrote:
> I see no difference without or with the patch:
> 
>     Benchmark #1: ./test.stock
>       Time (mean ± σ):      2.149 s ±  0.014 s    [User: 2.039 s, System: 0.106 s]
>       Range (min … max):    2.128 s …  2.165 s    10 runs
> 
>     Benchmark #2: ./test.wdf
>       Time (mean ± σ):      2.148 s ±  0.041 s    [User: 2.036 s, System: 0.107 s]
>       Range (min … max):    2.092 s …  2.237 s    10 runs
> 
> And also no difference with --use-subdirs added to command line options.

Thanks. Based on this result, I have undone the optimization that turned
out not to be one, and replaced with a long comment describing both
the "optimization" and why it is not being used. I also updated the
commit message accordingly.

> I introduced relative_path_name_from_components specifically to be fast
> when you KNOW the list elements are simple directory components, and you
> only want a relative path name. Directory separators, drive letters, ., ..,
> and any other stuff that dir./ tries to deal with are out of scope as it
> would slow down relative_path_name_from_components for the case that is
> it designed for. We can make that clearer in the documentation as you
> say.

At the moment, dir./ has a complicated structure, that does not make clear
exactly what requirements it imposes on its two operands. I think we should

- create a predicate for checking whether the second arg of dir./
   already meets the requirements
- create a predicate for mangling a string into meeting those requirements
- create two more such predicates for the first arg
- create a predicate for checking whether a string meets the requirements
  of *both* args in dir./, if this would not duplicate one of the earlier ones
- create a predicate for mangling a string into meeting both requirements
- document that dir.relative_path_from_components mangles the
  first dir name on its list to meet the requirements of both args of dir./
  and mangles all the other list elements to meet the requirements of
  the second arg
- add dir.relative_path_from_unchecked_components (which would be
  the current relative_path_from_components, renamed) which is
  documented to require its callers to ensure those same conditions.

We could also add a version of dir./, named maybe make_path_name_unchecked, 
which would assume the args satisfy its requirements, instead of ensuring that.

I assume that the differences between the requirements of the first
and second args on Unix would just be that the first arg is allowed
to start with "/", while the differences on Windows would be more
extensive.

Opinions?

> It so happens that on Windows if you call
> relative_path_name_from_components(["a/b/c", "d"]), it will return
> "a/b/c\d" which IS valid in that Windows accepts both slashes as
> directory separators, but may not be want you want, e.g. to present to users.

That would not have been a problem, because the "a/b/c" also was
constructed by dir.relative_path_name_from_components, and so
would have used \ as the separator on Windows.

>> diff --git a/compiler/write_deps_file.m b/compiler/write_deps_file.m
>> index d223f652c..5f4eee6f7 100644
>> --- a/compiler/write_deps_file.m
>> +++ b/compiler/write_deps_file.m
> ...
>> @@ -1583,22 +1545,24 @@ generate_dv_file(Globals, SourceFileName, ModuleName, DepsMap,
>>          ModuleMakeVarName ++ ".dep_errs",
>>          list.map(add_suffix(".dep_err"), SourceFiles)),
>>  
>> -    make_module_file_names_with_suffix(Globals,
>> +    % ZZZ
>> +    make_module_file_names_with_ext(Globals,
> 
> ZZZ

Deleted.


>> +        % This code performs the job of
>> +        %   module_name_to_file_name(Globals, From, Ext, ModuleName, FileName)
>> +        % but with the directory path and extension string parts done
>> +        % just done once in make_module_file_names_with_ext, instead of
>> +        % being repeated each time here.
> 
> done just once

Fixed.

>> +        BaseNameNoExt = module_name_to_base_file_name_no_ext(Ext, ModuleName),
>> +        BaseName = BaseNameNoExt ++ ExtStr,
>> +        (
>> +            MaybeDirPath = no,
>> +            FileName = BaseName
>> +        ;
>> +            MaybeDirPath = yes(DirPath),
>> +            % XXX This cheats, since DirPath may contain directory separators.
>> +            % The documentation of relative_path_name_from_components forbids
>> +            % this, but its code does not care, and does the right thing
>> +            % anyway.
>> +            FileName =
>> +                dir.relative_path_name_from_components([DirPath, BaseName])
> 
> In this case I think you should just use string.join_list.

Moot, since this code won't be kept.

>> +        ),
>> +        trace [compile_time(flag("sanity_check_write_deps_file"))] (
>> +            module_name_to_file_name(Globals, $pred, Ext,
>> +                ModuleName, FileNameB),
>> +            expect(unify(FileName, FileNameB), $pred,
>> +                "FileName = FileNameB")
>> +        ),
>> +        % We cache result of the translation, in order to save on
>> +        % temporary string construction.
>> +        map.det_insert(ModuleNameExt, FileName, !Cache)
>> +    ).
>> +
> 
> The diff looks fine.

Thank you.

Zoltan.


More information about the reviews mailing list