Author: Jan Nijtmans <jan.nijtmans@users.sf.net>
Author: Jan Nijtmans <jan.nijtmans@gmail.com>
Author: Don Porter <donald.porter@nist.gov>
State: Draft
Type: Project
Vote: Pending
Created: 23-Jan-2018
Post-History:
Discussions-To: Tcl Core list
Keywords: Tcl
Tcl-Version: 9.0
Abstract
This TIP proposes to add full support for all characters in Unicode 10.0+, inclusive the characters >= U+010000, even the adaptation in the regexp engine. Also, the caveats remaining from TIP #389 will be handled here.
Summary
TODO
Rationale
TODO
Specification
This document proposes:
Add a new objType "UTF-32", which is able to store a string in 32-bits per character.
Adapt the regexp engine to start using the "UTF-32" objType: Any string handled by regexp will first be converted to "UTF-32".
Modify all API using Tcl_UniChar: If the string contains surrogate pairs, the "UTF-32" objType will used.
Modify all functions using or producing an index: "string length
" should return 1 for all Unicode characters, even the ones >= U+010000
TODO: everything else that comes up
Compatibility
TODO
Reference Implementation
A reference implementation is not started yet.
Copyright
This document has been placed in the public domain.