[m-rev.] io.{read,write}_binary

Julien Fischer jfischer at opturion.com
Tue Apr 25 12:53:24 AEST 2023


Hi Zoltan,

On Tue, 25 Apr 2023, Zoltan Somogyi wrote:

> These two predicates are implemented using a form of type
> punning: they take the text_{in,out}put_stream wrapper off a stream
> and put a binary_{in,out}put_stream wrapper around it.
> This works because at the moment, these wrappers all wrap
> the same type, but this won't be the case soon. We therefore
> need a new way to implement these two predicates.
>
> The approach I propose is that
>
> - write_binary should call string.string to convert the term
>  to be written to a string,
>
> - it should count how many code units (UTF-8 or UTF-16, depending
>  on the target) the string has,
>
> - it should write out the length as a binary integer, followed by
>  the code units of the string, again as binary data.
>
> read_binary would then reverse the process.
>
> This should work. It should even work for Java, for which the
> comments on these predicates say the current code does not work.
>
> Opinions? Objections?

IMO, the above predicates should be removed from the standard library
entirely.

That said, if they remain and keep using the term-to-string approach, we
should always write the string to the binary stream in UTF-8 encoding,
regardless of the backend (e.g. using io.write_binary_string_utf8/{3,4}).
(Although we don't yet have a convenient mechanism for going in the
other direction.)
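
For concreteness, the write side could then look something like the
following untested sketch. The predicate name is mine, and I'm assuming
the io.write_binary_uint64_le, string.count_utf8_code_units and
uint64.det_from_int interfaces here:

    :- import_module io.
    :- import_module string.
    :- import_module uint64.

        % Untested sketch: serialise Term via string.string/1, then
        % emit a 64-bit little-endian length prefix (counted in UTF-8
        % code units) followed by the string itself in UTF-8,
        % regardless of the backend's native string encoding.
    :- pred sketch_write_binary(io.binary_output_stream::in, T::in,
        io::di, io::uo) is det.

    sketch_write_binary(Stream, Term, !IO) :-
        Str = string.string(Term),
        NumCodeUnits = string.count_utf8_code_units(Str),
        io.write_binary_uint64_le(Stream,
            uint64.det_from_int(NumCodeUnits), !IO),
        io.write_binary_string_utf8(Stream, Str, !IO).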

> Note that I think the obvious size of the length prefix is 64 bits.
> 32 would work in virtually all cases, but it is not future-proof, and
> the savings are as tiny as the chance that anyone will ever want
> to invoke these preds on a >4 GB term.

That's fine. Both C# and Java impose maximum lengths on strings, but we
can check whether the size exceeds those limits. There's no point in
artificially limiting the C backend here.
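
On the read side, a guard along these lines (again, just an
illustrative sketch) would let us reject length prefixes that cannot be
represented as an int on the current backend, which on the Java and C#
backends also rules out lengths beyond their maximum string sizes:

    :- import_module int.
    :- import_module uint64.

        % Illustrative sketch: succeed only if the 64-bit length
        % prefix fits in this backend's int type.
    :- pred length_prefix_is_representable(uint64::in) is semidet.

    length_prefix_is_representable(Len64) :-
        Len64 =< uint64.det_from_int(int.max_int).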

Julien.

