TIP 401: Comment Words with Leading {#}

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Lars Hellström <Lars.Hellstrom@residenset.net>
State:          Draft
Type:           Project
Vote:           Pending
Created:        29-Apr-2012
Post-History:   
Tcl-Version:    8.7

Abstract

The basic syntax rules of Tcl (the "dodekalogue") are modified to allow words that are comments. In analogy with the argument expansion {*}, such comment words will begin with {#} (left brace, hash sign, right brace).
The change concerns both words in scripts and, more importantly, words in lists.

Rationale

Tcl is special in that comments appear at the "statement" (command) level of the language syntax rather than the "token" level (as is the case in e.g. the ALGOL language family: C, Pascal, Java, etc.). This means a Tcl program has fewer places in which a comment can be placed than many other languages, but this has not been a serious problem in traditional Tcl programming as commands tend to be short (except when they have arguments that are themselves Tcl scripts) and places where a comment can be inserted thus frequent enough. Recent developments in the language have however changed this.

Various new features -- particularly dictionaries, ensembles, and argument expansion (all of which were introduced with Tcl 8.5) -- have encouraged a coding style where occasionally fairly large blocks of code can be spent setting up data structures as explicit values. For example, an ensemble that relies on the -map option typically has the entire mapping dictionary as a braced word in the code, and this can grow rather large if subcommands are being mapped to command prefixes of length greater than one. string map mappings can grow very large if there are many cases to deal with. Dictionaries have greatly simplified the use of values with inner structure, and as a result the complexity of the data that routinely gets passed to package commands has increased; a controlling data structure (e.g. a grammar, in the case of a parser) that once required an entire API of constructors to build might now have been redesigned as a simple nested dictionary that can be written raw in the code. And on top of it all, the various examples given above are not excluding each other, but may rather nest to form even larger blocks of code without any part that is a script. Adding a class of comments that can begin anywhere a word can begin undoes the forced "comment desert" status of such code blocks, because words are syntactic units not just in commands but also in lists, and all of the structured explicit values mentioned above are syntactically either plain lists or lists with additional restrictions.

A parallel development is that the gap between what some piece of Tcl looks like at coding time and at runtime has begun to widen, because an increasing number of APIs call for fragments of commands (particularly command prefixes) rather than full commands or scripts. This change is often good from a correctness and efficiency point of view, but can make code harder to maintain on account of being more obscure; the evocation of runtime calls resulting from a particular piece of code sometimes requires a considerable effort of imagination, even if all APIs involved are well documented. Comment words that appear in the position(s) where additional material is inserted at runtime can help the mind here, by letting the eyes see in the code those things that will be there when it is evaluated, e.g. rather than

  socket -server [list ::some::handler $settings] $port

one might write

  socket -server [
      list ::some::handler $settings {#}chan {#}client {#}clientport
  ] $port

to emphasize the fact that this ::some::handler command takes those four things as arguments, even though only one of them is present in the command prefix.

For command words to not change the interpretation of any presently legal syntactic construction, they must be something which is not valid as a list element word. The brace-something-brace-something syntax region where argument expansion was given a home is pretty much the only possibility there is for this (if one rules out unbalanced words), since there is very little that the Tcl parser finds outright wrong. The # character is the normal way of starting a command-level comment, so it is natural that it occurs also in the syntax of word-level comments.

Specification

Clause 5 (argument expansion) of the Tcl.n manpage is to be amended with the following conditions, and the language parser is to be modified accordingly.

If a word starts with the string "{#}" followed by a non-whitespace character, then the leading "{#}" is removed and the rest of the word is parsed and substituted as any other word. The result of this substitution is not used for anything, and no word is added to the command being substituted. For instance, "cmd a {#}{b c} d {#}e f" is equivalent to "cmd a d f".

Moreover, the analogous modification shall be made to the list parser; a word with {#} proper prefix is recognised as a comment also in a list, where the initial substitution phase only performs backslash substitution.

Note 1: The point of doing substitution is to stick as close to the behaviour of {*} as possible. Of the three steps involved in argument expansion - parse and substitute word, reparse result without substitution as a list of words, and append those words to command being built -- only the middle one need to be different for {#}, and thus a lot of the code can be shared.

Note 2: The comment prefix is typically most useful with words like

 cmd {#}{some [text]} {#}bareword {#}"Comment goes here"

but things like this are also legal:

 cmd {#}$var {#}$x,\n {#}[foo bar]

Note 3: It is very important for serveral use-cases that comment words are recognised as such also in lists. One might (as I would) argue that this should really follow automatically, since outside the source itself the only (and not very explicit) documentation of the string representation of lists is found in the lindex.n manpage which merely says that:

In extracting the element, lindex observes the same rules concerning braces and quotes and backslashes as the Tcl command interpreter; however, variable substitution and command substitution do not occur.

As only variable and command substitution are mentioned as things which differ between lists and commands, they therefore must treat {#} the same. However, presently the argument expansion {*} is not recognised in lists, despite there not being a stated exception for that either. (This state of affairs is reasonable, since it would be very complicated to extend present mechanisms to support argument expansion in lists and the benefit of doing so is slim at best, but it should be more clearly documented.)

Use Cases

More comments in switch

Beginning with an old issue, one may consider the placement of comments to switch cases. The current advice is to place them first in the bodies (which of course works), but it can often be exposition-wise more natural to place them before the pattern, especially if there is not a 1-1 correspondence between comments and bodies.

  switch -regexp [string trimleft $number +-] {

      {#}"Integer formats"
      {^0$} - 
      {^[1-9][0-9]*$} {
           # ...
      }
      {^0o[0-7]+$} {
           # ...
      }
      {^0x[0-9A-Fa-f]+$} {
           # ...
      }

      {#}"Float formats"
      {^[0-9]*\.[0-9]+(e[+-][0-9]+)$} -
      {^[0-9]+\.[0-9]*(e[+-][0-9]+)$} -
      {^[0-9]+e[+-][0-9]+$}  {
           # ...
      }

  }

Inline comments in long commands

Some commands can be very long simply because they require a lot of arguments to express what one wants, and then comments can help clarify what a particular argument contributes to.

  $canvas create polygon 0 0 {#}"left top" 30 0 {#}"bend down" 30 30 \
    {#}"concave part" 60 30 {#}"bend up" 60 0 {#}"right top" 90 0 \
    {#}"curving back" 90 30 45 67 0 30 {#}"done" -smooth true \
    -width 2 -fill orange -outline green {#}"Official colours" \
    -tags {button {#}"for clicking" buoyant {#}"affects movement"}

If there are several ideas in such a command, it might be nice to put the comments pertaining to each next to where that idea actually shows up in the code.

Inline argument descriptions

The Tcl style guide for C code suggests that comments describing function arguments appear inline with the argument declarations. Comment words would permit the same style in Tcl code.

 proc tcl::Pkg::CompareExtension {
    fileName      {#}"name of a file whose extension is compared"
    {ext {}}      {#}"The extension to compare against; you must
                      provide the starting dot. Defaults to the 
                      info sharedlibextension."
 } {
    # ...
 }

(Whether that style would be regarded as an improvement or not probably depends on one's taste.)

Filling in words that will be there at runtime

At one point, I found myself wanting to do some calculations with matrices whose elements were polynomials (with integer coefficients) of four noncommuting variables A, B, C, and D. Having previously implemented some basic algebraic constructions, I could quickly set up a command that implemented arithmetic with such matrices through the two interp aliases

 interp alias {} Z<A,B,C,D> {} \
   ::mtmtcl::rings::semigroup_algebra ::mtmtcl::rings::integers::all\
     ::mtmtcl::groups::string_free_monoid
 interp alias {} matrices {} \
   ::mtmtcl::matprop::trivial dummy ::Z<A,B,C,D>

This is however not a very readable definition even if one is familiar with the commands it uses (and realises that there is nothing magical about the command name Z), because the way that things appear in this piece of code is visually quite different from the context in which they will appear when evaluated. If instead comment words are inserted as visual placeholders for the missing material, then the overall appearance becomes much closer to that of a script calling these commands.

 interp alias {} Z<A,B,C,D> {#}method {#}args {} \
   ::mtmtcl::rings::semigroup_algebra {
      ::mtmtcl::rings::integers::all {#}method {#}args
   } {
      ::mtmtcl::groups::string_free_monoid {#}method {#}args
   } {#}method {#}args
 interp alias {} matrices {#}method {#}args {} \
   ::mtmtcl::matprop::trivial dummy {
      ::Z<A,B,C,D> {#}method {#}args
   } {#}method {#}args

"K combinator"

In the Tcl community, the K combinator idiom is when you first produce an argument for the main command (through variable or command substitution), then evaluate some other command which has beneficial side-effects but whose result is of no interest, and finally evaluate the main command. (The original K combinator, as found combinatory logic, is more about getting rid of unwanted arguments supplied to a command prefix than about exploiting side-effects, but there is a continuum connecting the two.) This is most often employed to have a variable release its reference to a Tcl_Obj, so that the latter becomes unshared and possible for a command to modify directly. {#} introduces the new form

 set stack [lreplace $stack {#}[set stack whatever] end end]

of this, providing yet another alternative to such old forms as

 set stack [lreplace $stack [set stack end] end]
 set stack [lreplace $stack[set stack ""] end end]
 set stack [lreplace $stack {*}[set stack ""] end end]

Of course, any

 foo $apa {#}[bar baz]

can trivially be rewritten to achieve the same effect using argument expansion instead:

 foo $apa {*}[bar baz; list]

Commenting out list elements

If a programmer needs to temporarily disable some functionality, then a standard technique is to comment out the corresponding code. However, if the code that needs to be commented out amounts to some elements in a long list (e.g., a list of commands to [interp expose] in a slave interpreter) then there is presently no way of commenting out less than the whole command containing that list. Comment words provide a more specific alternative.

   set tclCommands {
       after append array binary break case catch cd clock close concat
       continue dde else elseif encoding eof error eval exec exit expr
       fblocked fcondict figure fcopy file fileevent flush for foreach 
       format gets glob global history if incr info interp join lappend
       lassign lindex linsert list llength load lrange lrepeat lreplace
       lsearch lset lsort namespace open package pid proc puts pwd read
       regexp regsub rename resource return scan seek set slave socket
       source split string subst switch tell time trace unknown
       unset update uplevel upvar variable vwait while
       {#}{ {#}"Auto-loaded commands"
       auto_execok auto_import auto_load auto_mkindex auto_mkindex_old
       auto_qualify auto_reset parray pkg::create pkg_mkIndex tcl_endOfWord
       tcl_findLibrary tcl_startOfNextWord tcl_startOfPreviousWord
       tcl_wordBreakAfter tcl_wordBreakBefore
       }
   }

The same is true in dictionaries.

 namespace eval ::tcl::info {
     namespace ensemble create -command ::info -map {
        exists             ::tcl::info::exists 
        globals            ::tcl::info::globals 
        locals             ::tcl::info::locals 
        vars               ::tcl::info::vars 
        args               ::tcl::info::args 
        body               ::tcl::info::body 
        default            ::tcl::info::default 
        commands           ::tcl::info::commands 
        procs              ::tcl::info::procs 
        functions          ::tcl::info::functions 
        cmdcount           ::tcl::info::cmdcount 
        complete           ::tcl::info::complete 
        script             ::tcl::info::script 
        level              ::tcl::info::level 
        frame              ::tcl::info::frame 
        errorstack         ::tcl::info::errorstack 
        patchlevel         ::tcl::info::patchlevel 
        tclversion         ::tcl::info::tclversion 
        {#}{ {#}"Commented out for bug #nnnnnn."
        hostname           ::tcl::info::hostname 
        sharedlibextension ::tcl::info::sharedlibextension 
        loaded             ::tcl::info::loaded 
        library            ::tcl::info::library 
        nameofexecutable   ::tcl::info::nameofexecutable 
        }
        coroutine          ::tcl::info::coroutine 
        object             ::oo::InfoObject 
        class              ::oo::InfoClass
     }
 }

Line continuation

Comment words also provide an alternative to backslash--newline line continuations, namely as in

  a command which continues well beyond the normal line width {#}{
  } and which one therefore might want to split over two or more {#}{
  } lines of code

Being several times longer than the simple backslash, this is unlikely to replace it, but there could be cases where one wants to avoid the backslash because that character would be intercepted by something else.

Alternatives

Using the author's docstrip package http://tcllib.sourceforge.net/doc/docstrip.html , one can place comments between any two lines of code (whether there is a command separator there or not) and also comment out arbitrary code lines, even if that comes at the price of working with sources that are not raw Tcl code. However, such comments have a tendency to become more of a separate commentary track than an integrated part of the program narrative, and sometimes (for example to show things that are invisible in the code) one specifically desires comments to be an integral part of the Tcl code.

It is also possible to use regsub or something similar to the end of removing comments from a block of code before it is evaluated or put in a variable; instead of having the core language recognise some pieces of code as comments, one preprocesses the code as a string before telling the core that it actually is Tcl code (or a list/dict/etc.). Thus instead of (assuming comment words):

   set tclCommands {
       after append array binary break case catch cd clock close concat
       continue dde else elseif encoding eof error eval exec exit expr
       fblocked fcondict figure fcopy file fileevent flush for foreach 
       format gets glob global history if incr info interp join lappend
       lassign lindex linsert list llength load lrange lrepeat lreplace
       lsearch lset lsort namespace open package pid proc puts pwd read
       regexp regsub rename resource return scan seek set slave socket
       source split string subst switch tell time trace unknown
       unset update uplevel upvar variable vwait while
       {#}{ {#}"Auto-loaded commands"
       auto_execok auto_import auto_load auto_mkindex auto_mkindex_old
       auto_qualify auto_reset parray pkg::create pkg_mkIndex tcl_endOfWord
       tcl_findLibrary tcl_startOfNextWord tcl_startOfPreviousWord
       tcl_wordBreakAfter tcl_wordBreakBefore
       }
   }

one might write

   set tclCommands [regsub -all {#[^\n]*\n} {
       after append array binary break case catch cd clock close concat
       continue dde else elseif encoding eof error eval exec exit expr
       fblocked fcondict figure fcopy file fileevent flush for foreach 
       format gets glob global history if incr info interp join lappend
       lassign lindex linsert list llength load lrange lrepeat lreplace
       lsearch lset lsort namespace open package pid proc puts pwd read
       regexp regsub rename resource return scan seek set slave socket
       source split string subst switch tell time trace unknown
       unset update uplevel upvar variable vwait while
       ## Auto-loaded commands
       # auto_execok auto_import auto_load auto_mkindex auto_mkindex_old
       # auto_qualify auto_reset parray pkg::create pkg_mkIndex tcl_endOfWord
       # tcl_findLibrary tcl_startOfNextWord tcl_startOfPreviousWord
       # tcl_wordBreakAfter tcl_wordBreakBefore
   } \n]

A problem with the latter is that it destroys code line correspondences for [280]. The boilerplate code for performing the preprocessing may also need to be inserted far from the actual comment, which discourages use of commenting using such a mechanism. But the main problem with it is that quick fixes like this have a tendency to be half-baked, and would break some otherwise legal code. (Everything works fine in the example until the day someone needs to insert an element which contains the # character into the list.)

Since the canonical list-quoting of # is precisely {#}, one could (in analogy with the argument against {} as expansion prefix) argue that {#}something is too likely to arise as a typo for {#} something, which would suggest using some other character than # in the comment prefix. \# is however a shorter way to list-quote #, so it seems unlikely that {#} should be common in manually written code (unlike {}, which is very common).

Additional alternatives from tcl-core discussion

That (in script words) the initial substitution round performs variable and command substitution is by several seen as problematic. Only performing backslash substitution there is no great loss, as the only use-case above which relies on other substitutions is the K combinator one that anyway has several alternatives already, but it complicates the implementation by introducing a new parsing mode for scripts (grab word without variable or command substitution, but with backslash substitution). In particular, one would have to decide on the exact rules for this; it would probably be best to use the same rules as for command words in lists, which however implies that

 list {#}[foo bar]

would produce a list with one element, namely the string "bar]".

A different way of fitting comments into the brace-something-brace-something syntax region would be to put the comment material in the first something rather than the second. This could take the form of comment words which begin with {# and end with }#, as in

  $canvas create polygon 0 0 {# left top }# 30 0 {#bend down}# 30 30 \
    {# concave part}# 60 30 {#bend up }# 60 0 {# right top }# 90 0 \
    {# curving back }# 90 30 45 67 0 30 {# done }# -smooth true \
    -width 2 -fill orange -outline green {# Official colours }# \
    -tags {button {# for clicking }# buoyant {#affects movement}#}

This is similar to Cloverfield's current #{ and }# comment delimiters, and perhaps also reminicient of C's /* ... */ and Pascal's (* ... *). It should however be stressed that the nesting here would still be that of braces rather than brace+hash combinations, so

  {# a b { c }# d {# e f } g }#

would be one comment word, not two with a non-comment word d in between. Using this syntax could thus lead to false expectations about where a comment ends.

A technical disadvantage of {# and }# for comment word delimiters is that they make TclFindElement slightly more expensive than for the proposed {#} prefix. The reason is that TclFindElement needs to peek at the word after the element it finds to check if that is a comment (and if so scan past it and peek at the word after that too). If the comment status of a word can be determined from whether it carries a {#} prefix, then it is at worst necessary to peek at the first four characters, but if it matters how the word ends then it might be necessary to parse the whole of the next element in a list before being able to tell that it indeed was not a comment. From a usability point of view, this syntax lends less support for the argument placeholder use-case than the proposed prefix.

Donald G Porter has pointed out that this proposal has two functionally unrelated parts, which indeed are easy to discern in the reference implementation: comment words in lists and the like (changes in TclFindElement) and comment words in scripts (all other changes). It would be possible to propose implementing either one of these parts alone, and the one which is most interesting is then comment words in lists. (Given just that, one could even use

   {*}{{#}"Comment goes here"}

as an ugly substitute for word comments in scripts.) However, I believe it is more natural to introduce both together.

Suggested Stylistic Considerations

Like ordinary words, comment words come in three varieties: brace-delimited, quote-delimited, and barewords. A style rule which has been used in the above examples is that:

  • Natural language comment words are of the quote-delimited kind. This is for consistency with the common practice of quote-delimiting text strings within Tcl code, even if said string does not require substitution.

  • Comment words that serve as placeholders for actual program code are of the bareword variety, e.g. {#}method.

  • Comment words which serve a programmatic purpose, essentially that of the "K combinator", are barewords too.

  • Comment words which do not fall into any of the above categories, for example those used to comment of a block of words, are brace-delimited.

One reason for having a style rule about this is that prettyprinting and syntax colouring utilities might want to highlight these cases differently. Arguably, syntax colouring of Tcl code tends to do at least as much harm as it does good since few engines are able to keep track of enough context to found their claims about the code on reasonably accurate interpretations thereof, but comment words is one of the few things that they might actually manage to get right most of the time, precisely because a comment word is a comment word throughout so many contexts.

Reference Implementation

A reference implementation is available as SF Tcl patch #3522426 https://sourceforge.net/support/tracker.php?aid=3522426 . It includes some tests, but more could be needed. DGP has also created a branch tip-401 http://core.tcl.tk/tcl/timeline?r=tip-401 for it in the core fossil respository; further development is best conducted in the latter.

Implementing comment words in lists and the like could be achieved by modifications only in TclFindElement. Comment words in scripts are implemented as argument expansion that does not contribute any word.

Copyright

This document has been placed in the public domain.

History