Author: Jan Nijtmans <jan.nijtmans@gmail.com>
State: Draft
Type: Project
Tcl-Version: 8.7
Tcl-Branch: tip-575
Abstract
In the original TIP #389 implementation Tcl_UtfNext()
/Tcl_UtfPrev()
were able to jump 4 bytes forth and back. This
is contrary to the common expectation and history when Tcl_UtfNext()
/Tcl_UtfPrev()
where only able to jump
maximum TCL_UTF_MAX
bytes. Even though his was never documented, but - given other functions handling Tcl_UniChar
's -
a logical expectation. However, it is problematic to allow Tcl_UtfNext()
/Tcl_UtfPrev()
to jump to a byte
within a valid 4-byte UTF-8 byte sequence: It means that the pointer points to a continuation byte, which
the caller could interpret as an invalid byte sequence.
Specification
Implement new functions Tcl_UtfCharComplete()
/Tcl_UtfNext()
/Tcl_UtfPrev()
, which can jump 4 bytes forward resp. back,
so it is possible to jump over characters > U+FFFF in one step in stead of two.
Those 3 functions will get their own new entries in the stub table. So, extensions (however rare) using
Tcl_UtfNext()
/Tcl_UtfPrev()
but compiled against Tcl 8.6 headers will keep their original behavior.
The new Tcl_UtfCharComplete()
will behave almost identical to the old one. The only difference is when it encounters
a starting byte between 0xF0 and 0xF5: It will return true only when at least 4 bytes are available.
If an extension is compiled with -DTCL_UTF_MAX=4
or with -DTCL_NO_DEPRECATED
, then Tcl_UtfCharComplete()
will
start behaving like described in this TIP, if not then it will behave exactly as in Tcl 8.6.
TODO: Describe whatever results from this experiment.
Rationale
TODO
Implementation
Implementation is in development (highly expremental) in branch tip-575
Copyright
This document has been placed in the public domain.