TIP 465: Change Rule 8 of the Dodekalogue to Cut Some Corner Cases

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Andreas Leitgeb <avl@logic.at>
State:          Draft
Type:           Project
Vote:           Pending
Created:        03-Mar-2017
Post-History:   
Tcl-Version:    8.7

Abstract

This TIP proposes to make $-substitution more conforming to naive expectations and just rule out certain odd-ball uses that can safely be assumed to not appear in serious use, but only in crafted examples "serving" for confusion, or as accidentally legal interpretation of mistyped Tcl code.

Rationale

Back in the days where $-substitution was added to Tcl, it was designed to be as syntactically simple as possible. Back then, Tcl was still an interpreted language, so optimising parse time was top priority. For this, it was designed that a sequence starting with ${ would end at the next close brace no matter how many open braces or backslashes are passed by, and $arr( would end at the next close paren no matter how many open parens are found before. This enables odd-ball corner cases that work in an interactive shell at top level, as tried by newbies:

   set "{{{" 42; puts ${{{{}

but fail within a braced block, unless a comment like:

   # }}} }}}

follows within the same block. These are just strange parts of Tcl, that nobody can seriously claim to use for good.

Another surprising part is with arrays, where parens are treated asymmetrically, in that any number of bare open parens, but also quote chars or braces may be part of the index in a $arr(...) substitution, but first bare close paren terminates the token. Quote characters or braces have no significance, apart from that bare close braces might pre-maturely finish the enclosing braced block, which may only be evident to seasoned Tclers - and not even always to them.

The final motivation for writing up this TIP came while discussing one part of [282]: assignment to array elements.

An informal poll showed a clear preference towards bare array(...) naming on left hand side of proposed assignment, without sympathy for any need of explicit disambiguation by quoting or tagging.

Generally disallowing bare open parens, quotes and braces within array indices would mean that array indices on left hand side of an assignment could follow same rules as on right hand side, and parsing an array by these new rules would make sure, that where parsing as a function call and parsing as an array are both successful, then both parses would end up consuming the same portion of the expression body - a prerequisite for making a sound decision about following assignment operator.

Without having array parses and function parses agreeing on close paren, then it is possible that parsing as an array will see a trailing assignment operator that would otherwise have been nested in a subexpression, or even part of a quoted literal value.

Because of the low expected impact on real code, a target of 8.7 is considered feasible.

Implementation

A full implementation of this TIP is now checked in on branch tip-465.

Alternatives

The following points show alternatives that would make sense, but would make the currently rather simple implementation of this TIP ways more complicated:

  • Allow bare parens in array indices if properly paired. Quotes and braces are still disallowed (even if paired), to avoid cases of bad nesting: {(}). This might save users two backslashes in some rare cases where a close paren in the index is already backslash-escaped.

  • Specifically add backslash-quoting to the "body" of ${...}. After all, the "body" is a variable name, and not a nested structure like most braced words in Tcl. This would make some odd variable names once again possible, but now in a consistent syntax that doesn't affect enclosing blocks.

  • ${...} syntax could also be further restricted by disallowing open braces and final backslashes, just to enforce "well-behaving" tokens.

  • A much stricter alternative would disallow unbalanced braces even within "-quoted and unquoted literals. This would disallow common but dangerous idioms like append var "{", which may be followed by an append var "}" in the same block and work, until one of these two commands gets moved into a nested block. The correct and safe way is, of course, backslash-escaping bare braces within string literals. Good Code(tm) wouldn't be affected by this alternative change.

Copyright

This document has been placed in the public domain.

History