[m-rev.] for post-commit review: improve string.m

Zoltan Somogyi zoltan.somogyi at runbox.com
Mon Mar 25 17:56:19 AEDT 2024


On 2024-03-25 12:35 +11:00 AEDT, "Peter Wang" <novalazy at gmail.com> wrote:
>> -from_utf8_code_unit_list(CodeList, String) :-
>> +from_utf8_code_unit_list(CodeUnits, String) :-
>>      ( if internal_encoding_is_utf8 then
>> -        from_code_unit_list(CodeList, String)
>> +        from_code_unit_list(CodeUnits, String)
>>      else
>> -        decode_utf8(CodeList, [], RevChars),
>> +        decode_utf8(CodeUnits, [], RevChars),
>> +        % XXX This checks whether RevChars represents a well-formed string.
>> +        % Why? The call to decode_utf8 should have ensured that already.
>>          semidet_from_rev_char_list(RevChars, String)
>>      ).
> 
> decode_utf8 doesn't reject code points in the Unicode surrogate range
> (D800-DFFF, strictly speaking not valid to encode in UTF-8),
> which ARE rejected by semidet_from_rev_char_list.
> The change to semidet_from_rev_char_list was made later.

Ok, I have modified acc_rev_chars_from_utf8_code_units to reject surrogates.
The diff is attached for your review.
 
> If decode_utf8 (acc_rev_chars_from_utf8_code_units) detected surrogates,
> then there could be a variant of semidet_from_rev_char_list that assumes
> validity. I think it would not be worthwhile.

Agreed.

Thanks for both this review, and the module qual catch.

Zoltan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Log.sc
Type: application/octet-stream
Size: 533 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20240325/f1888805/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DIFF.sc
Type: application/octet-stream
Size: 2894 bytes
Desc: not available
URL: <http://lists.mercurylang.org/archives/reviews/attachments/20240325/f1888805/attachment-0001.obj>


More information about the reviews mailing list