Author: Alexandre Ferrieux <alexandre.ferrieux@gmail.com>
State: Draft
Type: Project
Vote: Pending
Created: 02-Feb-2009
Post-History:
Keywords: Tcl,encoding,convertto,strict,Unicode,String,ByteArray
Tcl-Version: 8.7
Abstract
This TIP proposes to raise an error when an encoding-based conversion loses information.
Background
Encoding-based conversions occur, for example, when writing a string to a channel: Unicode characters are converted to sequences of bytes according to the channel's encoding. Similarly, a conversion occurs whenever the ByteArray internal representation of an object is requested, the target encoding then being ISO8859-1. In both cases, for some combinations of Unicode character and target encoding, the mapping is lossy (non-injective). For example, the "e acute" character, like many of its cousins, is mapped to "?" in the 'ascii' target encoding. Likewise, Unicode characters above \u00FF are 'projected' onto their low byte in the ISO8859-1 ByteArray conversion.
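The two lossy mappings described above can be modelled with a short standalone C sketch. It is an illustration only, not Tcl's actual conversion code: the function names low_byte_project and ascii_lossy are hypothetical, and code points are represented as plain unsigned integers.

```c
#include <stddef.h>

/* Hypothetical model of the two lossy conversions; NOT Tcl's source. */

/* ByteArray case: keep only the low 8 bits of the code point. */
static unsigned char low_byte_project(unsigned int cp) {
    return (unsigned char)(cp & 0xFFu);
}

/* 'ascii' case: any code point above 0x7F is replaced by '?'. */
static unsigned char ascii_lossy(unsigned int cp) {
    return cp <= 0x7Fu ? (unsigned char)cp : (unsigned char)'?';
}
```

Under this model, low_byte_project(0x0141) and low_byte_project(0x0041) both return 0x41, and ascii_lossy(0x00E9) returns '?', the same byte as a genuine question mark. In neither case can the result be mapped back to its source, which is what non-injective means here.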
In the first case, this loss of information causes internationalization (i18n) errors that go unnoticed. In the second case, it makes pure-ByteArray operations unreliable on any object that has a string representation. As a consequence, byte-array manipulations suffer unwanted and hard-to-debug performance hits as soon as someone adds a debugging puts, since printing a value gives it a string representation.
Proposed Change
This TIP proposes to make this loss conspicuous.
For the first use case, the idea is to introduce a -strict option to encoding convertto that raises an explicit error when a non-mappable character is encountered. Lossy conversions during channel I/O would also fail if a -strictencoding true fconfigure option is set. For the second case, we simply want the conversion to fail, just as the conversion of an ill-formed list to a list does. In both cases, the change consists of letting the relevant internal conversion routine, such as SetByteArrayFromAny, return TCL_ERROR.
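The proposed behaviour for the second case can be sketched as a standalone C model. This is an assumption-laden illustration, not the SetByteArrayFromAny implementation: strict_to_bytes, CONV_OK, CONV_ERROR and the bad_index parameter are all hypothetical names, with the status codes merely playing the role of TCL_OK and TCL_ERROR.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; they play the role of
 * TCL_OK / TCL_ERROR but are not taken from tcl.h. */
enum { CONV_OK = 0, CONV_ERROR = 1 };

/*
 * Strict model of a code-point-to-byte conversion (NOT Tcl's source).
 * Converts n code points from src into dst, which must hold n bytes.
 * On the first code point above 0xFF it stops, stores the offending
 * index in *bad_index (if non-NULL), and returns CONV_ERROR instead
 * of silently truncating to the low byte.
 */
static int strict_to_bytes(const unsigned int *src, size_t n,
                           unsigned char *dst, size_t *bad_index) {
    for (size_t i = 0; i < n; i++) {
        if (src[i] > 0xFFu) {
            if (bad_index != NULL) {
                *bad_index = i;
            }
            return CONV_ERROR;
        }
        dst[i] = (unsigned char)src[i];
    }
    return CONV_OK;
}
```

Reporting the index of the first non-mappable character is a design choice of this sketch; it would let a caller build an explicit error message rather than a generic failure.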
Rationale
The second case does imply a potential incompatibility, since SetByteArrayFromAny is currently documented to always return TCL_OK. However, virtually all code that is sensitive to this change is believed to be only half-working today, in a completely hidden manner. Hence the overall effect is a healthy one.
Reference Example
See Bug 1665628: https://sourceforge.net/support/tracker.php?aid=1665628
Copyright
This document has been placed in the public domain.