Author: Kevin B. Kenny <kevin.b.kenny@gmail.com>
State: Draft
Type: Project
Vote: Pending
Created: 25-Oct-2017
Tcl-Version: 8.7
Keywords: assertion, pragma, type, alias, compilation
Post-History:
Abstract
This TIP proposes a new ensemble in the ::tcl namespace, ::tcl::pragma, that will provide a place to install commands that make structural assertions about Tcl code. Initially, two subcommands will be provided: ::tcl::pragma type, which asserts that Tcl values are lexically correct objects of a given data type, and ::tcl::pragma noalias, which describes the possible aliasing relationships among a group of variables. The assertions are provided in an ensemble, so that the set of available assertions can be expanded in the future as additional opportunities are discovered to make useful claims about program and data structure.
Motivation
Tcl, of course, is a typeless language: every value is a string. Moreover, it is an intensely dynamic language: the association of names with commands and variables is made very late, sometimes only when code is executed that searches for a variable by name.
Nevertheless, often a programmer's intention is to have values from a restricted set of strings, or to make restrictions on what names may address what variables. For instance, it may be known that a given piece of code is prepared to accept only numeric data, well-formed lists, Boolean values, or some other restricted type of data as its input.
Similarly, a great many programs that import variables using forms such as global, variable, upvar, namespace upvar, and the custom variable resolutions of systems like TclOO cannot function correctly if two or more of their variable names actually designate the same variable. A procedure like
proc collect {inputVar} {
    upvar 1 $inputVar inputs
    variable collection
    for {set i 0} {$i < [llength $inputs]} {incr i} {
        lappend collection [lindex $inputs $i]
    }
}
will surely yield surprising results if called with collection as its parameter!
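The failure mode can be made visible with a small experiment. The sketch below is a hypothetical variant of collect, named collectCapped here, with an iteration cap (maxIter) added purely so that the demonstration terminates; neither name is part of this proposal. Fully qualified names are used so that the behaviour does not depend on where the script is evaluated.

```tcl
# Hypothetical capped variant of collect. The maxIter parameter is an
# assumption added for this demonstration only.
proc ::collectCapped {inputVar {maxIter 10}} {
    upvar 1 $inputVar inputs
    variable collection
    for {set i 0} {$i < [llength $inputs]} {incr i} {
        if {$i >= $maxIter} {
            # With aliased variables the list grows as fast as i does,
            # so the loop condition never becomes false.
            return "gave up after $i iterations"
        }
        lappend collection [lindex $inputs $i]
    }
    return "finished after $i iterations"
}

set ::data {x y}
set ::collection {}
puts [::collectCapped ::data]        ;# distinct variables: finished after 2 iterations
puts [::collectCapped ::collection]  ;# aliased variables: gave up after 10 iterations
```

Without the cap, the second call would keep appending to the list it is iterating over until memory is exhausted.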
Giving the programmer the capability to specify restrictions on data types and alias relationships would have multiple advantages:
It documents what is expected. In particular, procedure, method and lambda parameters can have assertions about their structure early in a procedure, informing callers what preconditions must be met.
It fails early. Rather than having mistaken values or unexpected aliases run some way into a procedure and then fail mysteriously or even silently, it can yield an informative message at the first sign of a violated condition.
It aids with code optimization. While data type restrictions can be deduced by a compiler with considerable effort (1), making them explicit can still lead to more performant code. Alias restrictions are considerably harder to deduce, and the problem is undecidable in general. Unexpected aliases can be created at points in the program far remote from a procedure. Code like
uplevel #0 {upvar 0 ::path::to::variable ::some::other::thing}
will create an alias without any procedure accessing one or another of the variables being any the wiser.
Proposal
The ::tcl::pragma ensemble will be added. Initially, it will have two members: ::tcl::pragma type and ::tcl::pragma noalias.
tcl::pragma type
The ::tcl::pragma type command will have the syntax:
::tcl::pragma type typeName $value1 $value2...
In this usage, typeName is a description of the acceptable type of the given values. The values will be checked for whether they are instances of the given type, and a run-time error will be thrown if any value is not. Initially, the following types will be supported:
boolean: Indicates that the value is a Boolean: 0, 1, off, on, true, false, yes, no; in general, a value that will pass the test of string is boolean -strict.
integer: Indicates that the value is an integer, small enough to fit in a C 'int' value on the current platform.
wide: Indicates that the value is an integer, small enough to fit in a Tcl_WideInt value on the current platform.
entier: Indicates that the value is an integer, without constraint on its size.
double: Indicates that the value is representable as a double-precision floating point number (including the special values for Infinity and Not-a-Number).
It is anticipated that further TIPs will be proposed that expand the available set of types. In particular, lists and dictionaries are foreseen as being useful things to include. They are not yet specified in this proposal because they will also benefit from specification of the types of the contained objects, and the syntax for this specification is still being discussed.
Note that this command operates on values, not variables. A command like:
::tcl::pragma type integer $a
does not declare that a is an integer variable, nor does it require future assignments to it to have the given type. It merely asserts that at the current point in the program, the value of a will be an integer small enough to fit in a C int.
One may think of this assertion as syntactic sugar for the longer codeburst:
if {![string is integer -strict $a]} {
    return -code error -level 0 "expected an integer but got $a"
}
and in fact the bytecode compiler will be free to compile that, or similar code. (The description is slightly oversimplified, since other error options must also be manipulated.)
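To illustrate the intended semantics for all five initial types, the check can be approximated in pure Tcl. This is only a sketch under stated assumptions: the procedure name pragmaType, the mapping from type names to string is classes, and the wording of the error messages are all hypothetical, and the exact range accepted by string is integer differs between Tcl releases, so the integer case may not match "fits in a C int" everywhere.

```tcl
# Hypothetical pure-Tcl stand-in for "::tcl::pragma type".
# Maps each proposed type name to an assumed "string is" class.
proc pragmaType {typeName args} {
    set classes {
        boolean boolean
        integer integer
        wide    wideinteger
        entier  entier
        double  double
    }
    if {![dict exists $classes $typeName]} {
        return -code error "unknown type \"$typeName\""
    }
    set class [dict get $classes $typeName]
    # Every value must pass the strict test; fail on the first that does not.
    foreach value $args {
        if {![string is $class -strict $value]} {
            return -code error "expected $typeName but got \"$value\""
        }
    }
    return
}

pragmaType boolean yes off 1
pragmaType entier 123456789012345678901234567890
if {[catch {pragmaType integer 1 abc} msg]} {
    puts $msg
}
```

A built-in implementation would differ in at least one respect: as noted above, the bytecode compiler is free to inline the test rather than call a procedure.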
tcl::pragma noalias
The syntax for the ::tcl::pragma noalias command shall be:
::tcl::pragma noalias set1 set2...
In this usage, set1, set2, ... are lists of variable names. The syntax expresses the assertion that variables that are mentioned in the call are not aliases of each other at the time the command is executed, except that variables in the same set are permitted to alias.
The most common usage will be simply to use singleton sets. For instance, the collect procedure above might contain
::tcl::pragma noalias inputs collection
following the command
upvar 1 $inputVar inputs
This command would have the effect of asserting that inputs and collection designate distinct variables, avoiding the strange behaviour of modifying the inputs while an iteration is in progress.
It is possible for any combination of aliases to be permitted by including the possibility on the command line. For instance, to assert that a may be an alias of b or c, but b and c must not alias each other, the command:
::tcl::pragma noalias {a b} {a c}
might be used. (The program could specify, redundantly, b and c on the command line, but the noalias command will enforce that any variable mentioned anywhere in its arguments is not aliased to any other, except as specified.)
As a final note, it is anticipated that
::tcl::pragma noalias [info locals]
will be a common usage - most programs do not tolerate any unexpected aliasing at all. It is therefore further anticipated that this specific usage may receive special handling in the implementation.
As with type, noalias is an assertion of the state of the program at a given point in the flow of execution. It does not establish a permanent constraint. A subsequent command such as upvar may change the aliasing relation, and there will be no prevention of such a change.
It is worth noting that the necessary interfaces to implement this command are not yet available at the Tcl level at all. A Tcl script has no easy way to determine whether one variable is an alias for another. This command has no counterpart in today's Tcl.
A quick view may lead one to suspect that noalias will require quadratic time to check the relationships at runtime. In at least the common cases, though, it is to be expected that noalias will run in time O(N), where N is the number of included variables. Instead of comparing all pairs, it will be easier to maintain a hash table of variable addresses, and check for collisions by looking for existing hash entries.
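The hash-table approach can be sketched in Tcl, with one important caveat: since a script cannot obtain variable addresses, the sketch takes them as an input. The procedure name checkNoAlias and the addrOf dictionary (variable name to an opaque address token) are hypothetical stand-ins for what an implementation in C would obtain from variable lookup.

```tcl
# Hypothetical model of the noalias check. addrOf maps each variable
# name to an opaque token standing in for the variable's address.
proc checkNoAlias {addrOf args} {
    # Record, for every name, the indices of the sets that mention it.
    set member {}
    set idx 0
    foreach s $args {
        foreach name $s {
            dict lappend member $name $idx
        }
        incr idx
    }
    # seen maps an address to the names already found at that address.
    set seen {}
    dict for {name sets} $member {
        set addr [dict get $addrOf $name]
        if {[dict exists $seen $addr]} {
            # Collision: permitted only if both names share some set.
            foreach other [dict get $seen $addr] {
                set shared 0
                foreach i $sets {
                    if {$i in [dict get $member $other]} {
                        set shared 1
                        break
                    }
                }
                if {!$shared} {
                    return -code error \
                        "variables \"$other\" and \"$name\" are aliases"
                }
            }
        }
        dict lappend seen $addr $name
    }
    return
}

# a aliases b (permitted by the set {a b}); c is distinct.
checkNoAlias {a 0x10 b 0x10 c 0x20} {a b} {a c}
# inputs and collection share an address and no set: reports an error.
catch {checkNoAlias {inputs 0x30 collection 0x30} inputs collection} msg
puts $msg
```

When no variables alias, each name costs one dictionary lookup and one insertion, which gives the linear behaviour described above. The inner loops run only on a collision, so the extra work is confined to cases where aliases actually exist.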
Discussion
The Naming of Names
An appropriate name for this ensemble is a difficult choice. A very early draft of this proposal, circulated privately, suggested ::tcl::assume (since it was seen as a claim that it is safe for a compiler to make a given assumption). This name was roundly rejected by the reviewers. An alternative that was counterproposed was ::tcl::assert. The disadvantage of the latter name is that it is easy to imagine a piece of code wanting to namespace import both ::tcl::assert and ::control::assert, leading to a name collision. Moreover, ::tcl::assert does not take a Boolean expression but rather a different sort of expression of a constraint. The similarity of the names would therefore be confusing. In names, as in many other aspects of life, "the good ones are already taken."
Runtime behaviour
The assertions described in this TIP are not without cost at runtime. In an interpreted environment, it may be desirable to control, on a per-namespace basis, whether the assertions are enforced. In a compiled environment, many of these assertions will either enable more aggressive optimization, be removable themselves with appropriate analysis to prove they are unnecessary, or both. For this reason, the proponent wishes to consider enabling and disabling of structural assertions to be Out Of Scope at the present time. If it does prove to be necessary, it can be done with a mechanism analogous to the way that today's ::control::assert works.
References
1. Kenny, Kevin B. and Donal K. Fellows. 'The State of Quadcode 2017.' Proc. 24th Annual Tcl/Tk Conf. Houston, Tex.: Tcl Community Association, October 2017. https://core.tcl.tk/tclquadcode/raw/doc/tcl2017/kbk-dkf-state-of-quadcode.odt?name=ed72b79c8b