TIP 480: Type and Alias Assertions for Tcl

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:		Kevin B. Kenny <kevin.b.kenny@gmail.com>
State:		Draft
Type:		Project
Vote:		Pending
Created:	25-Oct-2017
Tcl-Version:	8.7
Keywords:	assertion, pragma, type, alias, compilation
Post-History:

Abstract

This TIP proposes a new ensemble in the ::tcl namespace, ::tcl::pragma, that will provide a place to install commands that make structural assertions about Tcl code. Initially, two subcommands will be provided: ::tcl::pragma type, which asserts that Tcl values are lexically correct objects of a given data type, and ::tcl::pragma noalias, which describes the possible aliasing relationships among a group of variables. The assertions are provided in an ensemble, so that the set of available assertions can be expanded in the future as additional opportunities are discovered to make useful claims about program and data structure.

Motivation

Tcl, of course, is a typeless language: every value is a string. Moreover, it is an intensely dynamic language: the association of names with commands and variables is made very late, sometimes only when code is executed that searches for a variable by name.

Nevertheless, often a programmer's intention is to have values from a restricted set of strings, or to make restrictions on what names may address what variables. For instance, it may be known that a given piece of code is prepared to accept only numeric data, well-formed lists, Boolean values, or some other restricted type of data as its input.

Similarly, a great many programs that import variables using forms such as global, variable, upvar, namespace upvar, and the custom variable resolutions of systems like TclOO cannot function correctly if two or more of their variable names actually designate the same variable. A procedure like

proc collect {inputVar} {
    upvar 1 $inputVar inputs
    variable collection
    for {set i 0} {$i < [llength $inputs]} {incr i} {
        lappend collection [lindex $inputs $i]
    }
}

will surely yield surprising results if called with collection as its parameter!

Giving the programmer the capability to specify restrictions on data types and alias relationships would have multiple advantages:

  1. It documents what is expected. In particular, procedure, method and lambda parameters can have assertions about their structure early in a procedure, informing callers what preconditions must be met.

  2. It fails early. Rather than having mistaken values or unexpected aliases run some way into a procedure and then fail mysteriously or even silently, it can yield an informative message at the first sign of a violated condition.

  3. It aids with code optimization. While data type restrictions can be deduced by a compiler with considerable effort (1), making them explicit can still lead to more performant code. Alias restrictions are considerably harder to deduce, and the problem is Turing-complete in general. Unexpected aliases can be created at points in the program far remote from a procedure. Code like

    uplevel #0 {upvar 0 ::path::to::variable ::some::other::thing}
    

will create an alias without any procedure accessing one or another of the variables being any the wiser.

Proposal

The ::tcl::pragma ensemble will be added. Initially, it will have two members: ::tcl::pragma type and ::tcl::pragma noalias.

tcl::pragma type

The ::tcl::pragma type command will have the syntax:

::tcl::pragma type typeName $value1 $value2...

In this usage, typeName is a description of the acceptable type of the given values. The values will be checked for whether they are instances of the given type, and a run-time error will be thrown if any value is not. Initially, the following types will be supported:

  • boolean: Indicates that the value is a Boolean: 0, 1, off, on, true, false, yes, no: in general, a value that will pass the test of string is boolean -strict.

  • integer: Indicates that the value is an integer, small enough to fit in a C 'int' value on the current platform.

  • wide: Indicates that the value is an integer, small enough to fit in a Tcl_WideInt value on the current platform.

  • entier: Indicates that the value is an integer, without constraint on its size.

  • double: Indicates that the value is representable as a double-precision floating point number (including the special values for Infinity and Not-a-Number).

It is anticipated that further TIP's will be proposed that expand the available set of types. In particular, lists and dictionaries are foreseen as being useful things to include. The reason that they are not yet being specified in this proposal is that they will also benefit from specification of the types of the contained objects, and the syntax for this specification is still being discussed.

Note that this command operates on values, not variables. A command like:

::tcl::pragma int $a

does not declare that a is an integer variable, and require future assigmnents to it to have the given type. It merely asserts that at the current point in the program, the value of a will be an integer small enough to fit in a C int.

One may think of this assertion as syntactic sugar for the longer codeburst:

if {![string is integer -strict $a]} {
    return -code error -level 0 "expected an integer but got $a"
}

and in fact the bytecode compiler will be free to compile that, or similar code. (The description is slightly oversimplified, since other error options must also be manipulated.)

tcl::pragma noalias

The syntax for the ::tcl::pragma noalias command shall be:

::tcl::pragma noalias set1 set2...

In this usage, set1, set2, ... are lists of variable names. The syntax expresses the assertion that variables that are mentioned in the call are not aliases of each other at the time the command is executed, except that variables in the same set are permitted to alias.

The most common usage will be simply to use singleton sets. For instance, the collect procedure above might contain

::tcl::pragma noalias inputs collection

following the command

upvar 1 $inputsVar inputs

This command would have the effect of asserting that inputs and collection designate distinct variables, avoiding strange behaviour of modifying the inputs while an iteration is in progress.

It is possible for any combination of aliases to be permitted by including the possibility on the command line. For instance to assert that a may be an alias of b or c, but b and c must not alias each other, the command:

::tcl::pragma noalias {a b} {a c}

might be used. (The program could specify, redundantly, b and c on the command line, but the noalias command will enforce that any variable mentioned anywhere in its arguments is not aliased to any other, except as specified.

As a final note, it is anticipated that

::tcl::pragma noalias [info locals]

will be a common usage - most programs do not tolerate any unexpected aliasing at all. It is therefore further anticipated that this specific usage may receive special handling in the implementation.

As with type, noalias is an assertion of the state of the program at a given point in the flow of execution. It does not establish a permanent constraint. A subsequent command such as upvar may change the aliasing relation, and there will be no prevention of such a change.

It is worth noting that the necessary interfaces to implement this command are not yet available at the Tcl level at all. A Tcl script has no easy way to determine whether one variable is an alias for another. This command has no counterpart in today's Tcl.

A quick view may lead one to suspect that noalias will require quadratic time to check the relationships at runtime. In at least the common cases, though, it is to be expected that noalias will run in time O(N), where N is the number of included variables. Instead of comparing all pairs, it will be easier to maintain a hash table of variable addresses, and check for collisions by looking for existing hash entries.

Discussion

The Naming of Names

An appropriate name for this ensemble is a difficult choice. A very early draft of this proposal, circulated privately, suggested ::tcl::assume (since it was seen as a claim that it is safe for a compiler to make a given assumption). This name was roundly rejected by the reviewers. An alternative that was counterproposed was ::tcl::assert. The disadvantage to the latter name is that it is easy to imagine a piece of code wanting to [[namespace import]] both ::tcl::assert and ::control::assert leading to a name collision. Moreover, ::tcl::assert does not take a Boolean expression but rather a different sort of expression of a constraint. The similarity of the names would therefore be confusing. In names, as in many other aspects of life, "the good ones are already taken."

Runtime behaviour

The assertions described in this TIP are not without cost at runtime. In an interpreted environment, it may be desirable to control, on a per-namespace basis, whether the assertions are enforced. In a compiled environment, many of these assertions will either enable more aggressive optimization, be removable themselves with appropriate analysis to prove they are unnecessary, or both. For this reason, the proponent wishes to consider enabling and disabling of structural assertions to be Out Of Scope at the present time. If it does prove to be necessary, it can be done with a mechanism analogous to the way that today's ::control::assert works.

References

  1. Kenny, Kevin B. and Donal K. Fellows. 'The State of Quadcode 2017.' Proc. 24th Annual Tcl/Tk Conf. Houston, Tex.: Tcl Community Association, October 2017. https://core.tcl.tk/tclquadcode/raw/doc/tcl2017/kbk-dkf-state-of-quadcode.odt?name=ed72b79c8b
History