TIP 176: Add String Index Values

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Damon Courtney <damon@unreality.com>
Author:         Don Porter <dgp@users.sf.net>
Author:         Damon Courtney <damon-tcl@tclhome.com>
State:          Final
Type:           Project
Vote:           Done
Created:        16-Mar-2004
Post-History:   
Tcl-Version:    8.5

Abstract

This TIP proposes extended index formats to be recognized by TclGetIntForIndex, supporting simple index arithmetic for string and list indices.

Rationale

Most of Tcl's commands that accept an integer index of some kind also accept a string index in the format of end-N. This is an extremely useful feature which I hope to extend just slightly by allowing simple addition to the standard index forms to be done during index processing. The change is mostly just syntactic sugar, but it extends a functionality that is already there and does not change any other syntax.

The aim of the proposal is to allow shorthand list and string operations like so:

set x [lsearch $list $elem]
set new [lrange $list $x-1 $x+1]

set x [string first $string "foo"]
set new [string range $string $x $x+5]

Proposal

The description of the supported index formats documented for the string index command will be updated to read:

  • integer

    * For any index value that passes string is integer -strict, the char specified at this integral index (e.g. 2 would refer to the c in abcd).

  • end

    * The last char of the string (e.g. end would refer to the d in abcd).

  • end-N

    * The last char of the string minus the specified integer offset N (e.g. end-1 would refer to the c in abcd).

  • end+N

    * The last char of the string plus the specified integer offset N (e.g. end+-1 would refer to the c in abcd).

  • M+N

    * The char specified at the integral index that is the sum of integer values M and N (e.g. 1+1 would refer to the c in abcd).

  • M-N

    * The char specified at the integral index that is the difference of integer values M and N (e.g. 2-1 would refer to the b in abcd).

  • In the specifications above, the integer value M contains no trailing whitespace and the integer value N contains no leading whitespace.

The internal routine TclGetIntForIndex will be updated to implement the parsing specified by the documentation above.

The documentation of all Tcl commands that call TclGetIntForIndex to parse index values will be updated to refer to the documentation for string index for a description of the index format. These commands include lindex, linsert, lrange, lreplace, lsearch, lset, lsort, and string. This documentation update will remove any mention that the index values e and en are supported (as prefixes of end). Their support will be continued for compatibility, but that support will now be undocumented and deprecated.

The implementation of the commands regexp -start and regsub -start will be updated to call TclGetIntForIndex so that the full set of index formats will be supported. (Currently only integer values are accepted).

The error message produced by TclGetIntForIndex when parsing an invalid index will be updated to give the advice must be integer?[+-]integer? or end?[+-]integer?, in agreement with the extended set of index formats.

Implementation

Tcl Patch 1165695 is a draft implementation. It accomplishes everything proposed above except the update to the error message. If this proposal is accepted, that change, plus the large number of test suite updates it would require, will be added. Tests of the new functionality are also still to come.

Compatibility

Support for the index format end-N where N is an integer with leading whitespace is removed by this proposal. For example the string end- 1 will no longer be recognized as a valid index. Most programmers are surprised to learn that this format is supported currently, so this incompatibility is not expected to cause much of a problem.

Discussion Summary

This feature is added as a quick, simple convenience to having to always use a full-blown expression for a minor task.

It was briefly discussed that since indexes already allowed a space between the - and the N we should continue to support this, but it makes the implementation harder, and no one seemed to care that much. The fact that it breaks backwards compatibility is a small issue since no one seems to use indexes in this manner anyway.

The idea of making indexes extend a list beyond its length by doing something like [linsert $list end+10 $elem] was discussed but tabled since it is much harder to implement and is really outside the scope of this TIP. This may be something to discuss for a future TIP though.

Copyright

This document has been placed in the public domain.

History