[m-rev.] for review: improve dir./

Peter Wang novalazy at gmail.com
Mon Sep 11 11:42:24 AEST 2023


On Sun, 10 Sep 2023 18:39:30 +1000 "Zoltan Somogyi" <zoltan.somogyi at runbox.com> wrote:
> Improve the structure of DirName/FleName.
> 
> library/dir.m:
>     Factor out common code in the code that canonicalizes filenames
>     by defining and using is_directory_separator_return_canon.
>     Improve the variable names used in the canonicalization predicates
>     (which previously used different names for the same concepts).
> 
>     Add a predicate that tests whether a dirname or filename is already
>     in its canonical form. and if so, skip the canonicalization process.
> 
>     Clarify the code of the / predicate by moving some parts of it
>     to separate predicates whose names describe their purpose.
>     Some of these predicates are called more than once, so they also
>     factor out common code.
> 
>     Switch from using unsafe to safe code to test whether filenames
>     name a drive letter on Windows.

> diff --git a/library/dir.m b/library/dir.m
> index a73fb42b8..b71bf320e 100644
> --- a/library/dir.m
> +++ b/library/dir.m
...
> +:- pred is_path_string_canonical(string::in) is semidet.
> +
> +is_path_string_canonical(Path) :-
> +    is_path_string_canonical_loop(Path, 0, prev_char_is_not_separator).
> +
> +:- type canon_prev_char
> +    --->    prev_char_is_not_separator
> +    ;       prev_char_is_separator.
> +
> +:- pred is_path_string_canonical_loop(string::in, int::in, canon_prev_char::in)
> +    is semidet.
> +
> +is_path_string_canonical_loop(Path, CurIndex, PrevChar) :-
> +    ( if string.index_next(Path, CurIndex, NextIndex, Char) then

The problem is string.index_next incurs a call to string.length for each
call, unlike string.unsafe_index_next. It's quite easy to use
unsafe_index_next safely, though.

> +        ( if dir.is_directory_separator_return_canon(Char, CanonChar) then
> +            % Two consecutive directory separators may not occur
> +            % in a canonical path.
> +            PrevChar = prev_char_is_not_separator,
> +            % A directory separator may not occur in a canonical path
> +            % without itself being a canonical separator.
> +            Char = CanonChar,
> +            is_path_string_canonical_loop(Path, NextIndex,
> +                prev_char_is_separator)
> +        else
> +            is_path_string_canonical_loop(Path, NextIndex,
> +                prev_char_is_not_separator)
> +        )
> +    else
> +        % We have come to the end of Path.
> +        true
> +    ).

> @@ -868,43 +916,32 @@ dotnet_path_name_is_absolute_2(_) :-
>  %---------------------------------------------------------------------------%
>  
>  DirName0/FileName0 = PathName :-
> -    DirName = string.from_char_list(canonicalize_path_chars(
> -        string.to_char_list(DirName0))),
> -    FileName = string.from_char_list(canonicalize_path_chars(
> -        string.to_char_list(FileName0))),
> -    ( if
> -        dir.path_name_is_absolute(FileName)
> -    then
> +    ( if is_path_string_canonical(DirName0) then
> +        DirName = DirName0
> +    else
> +        DirName = string.from_char_list(canonicalize_path_chars(
> +            string.to_char_list(DirName0)))
> +    ),
> +    ( if is_path_string_canonical(FileName0) then
> +        FileName = FileName0
> +    else
> +        FileName = string.from_char_list(canonicalize_path_chars(
> +            string.to_char_list(FileName0)))
> +    ),
> +    ( if dir.path_name_is_absolute(FileName) then
>          unexpected($pred, "second argument is absolute")
> -    else if
> -        % Check that FileName is not a relative path of the form "C:foo".
> -        use_windows_paths,
> -        Length = length(FileName),
> -        ( if Length >= 2 then
> -            char.is_alpha(string.unsafe_index(FileName, 0)),
> -            string.unsafe_index(FileName, 1) = (':'),
> -            ( if Length > 2 then
> -                not is_directory_separator(string.unsafe_index(FileName, 2))
> -            else
> -                true
> -            )
> -        else
> -            fail
> -        )
> -    then
> +    else if dir.path_name_is_cur_drive_relative(FileName) then
>          unexpected($pred, "second argument is a current drive relative path")

Simon Taylor called these types of paths "current drive relative paths"
or "drive current paths" in commit 89a4d190c, but that terminology is
confusing. Continued below.

>      else if
> -        DirNameLength = length(DirName),
>          (
>              % Check for construction of relative paths of the form "C:foo".
> -            use_windows_paths,
> -            DirNameLength = 2,
> -            char.is_alpha(string.unsafe_index(DirName, 0)),
> -            string.unsafe_index(DirName, 1) = (':')
> +            path_name_names_drive_letter(FileName, ThirdCharIndex),
> +            not string.index_next(FileName, ThirdCharIndex, _, _)
>          ;
>              % Do not introduce duplicate directory separators.
> -            % On Windows \\foo (a UNC server specification) is not equivalent
> +            % On Windows, \\foo (a UNC server specification) is not equivalent
>              % to \foo (the directory X:\foo, where X is the current drive).
> +            string.length(DirName, DirNameLength),
>              ( if DirNameLength > 0 then
>                  ends_with_directory_separator(DirName, DirNameLength, _)
>              else
> @@ -923,6 +960,43 @@ DirName0/FileName0 = PathName :-
>              FileName])
>      ).
>  
> +    % Is FileName a relative path of the form "C:foo"?
> +    %
> +:- pred path_name_is_cur_drive_relative(string::in) is semidet.
> +

These kinds of paths are not relative to the *current* drive,
but relative to the current directory on the specified drive.
On Windows, each process has a current drive, and also a current
directory per drive.

I'm not sure there is a short, standard name for these kind of paths.
I suggest "drive relative path" or "drive-letter relative path".

> +path_name_is_cur_drive_relative(FileName) :-
> +    % In the following, C stands for any drive letter, and xyz for
> +    % non-special characters that can occur in filenames.
> +    ( if path_name_names_drive_letter(FileName, ThirdCharIndex) then
> +        ( if string.index_next(FileName, ThirdCharIndex, _, ThirdChar) then
> +            ( if is_directory_separator(ThirdChar) then
> +                % FileName is "C:/xyz", which is a cur drive path,
> +                % but an absolute one.
> +                fail

Write the example the canonical way, i.e. "C:\xyz".

> +            else
> +                % FileName is "C:xyz", which is a cur drive relative path.
> +                true
> +            )
> +        else
> +            % FileName is "C:", which is a cur drive relative path.
> +            true
> +        )
> +    else
> +        % FileName cannot start with "C:".
> +        fail
> +    ).
> +
> +:- pred path_name_names_drive_letter(string::in, int::out) is semidet.
> +
> +path_name_names_drive_letter(FileName, ThirdCharIndex) :-
> +    use_windows_paths,
> +    string.index_next(FileName, 0, SecondCharIndex, FirstChar),
> +    string.index_next(FileName, SecondCharIndex, ThirdCharIndex, SecondChar),
> +    % SecondChar is *much* more unlikely to be ':'
> +    % than FirstChar is to pass is_alpha.
> +    SecondChar = (':'),
> +    char.is_alpha(FirstChar).

Peter


More information about the reviews mailing list