Author: David Cuthbert <dacut@kanga.org>
Author: Kevin Kenny <kennykb@acm.org>
Author: Don Porter <dgp@users.sourceforge.net>
Author: Donal K. Fellows <fellowsd@cs.man.ac.uk>
State: Final
Type: Project
Vote: Done
Created: 19-Jan-2001
Post-History:
Discussions-To: news:comp.lang.tcl,mailto:kennykb@acm.org
Keywords: lindex,multiple arguments,sublists
Tcl-Version: 8.4a2
Abstract
Obtaining access to elements of sublists in Tcl often requires nested calls to the lindex command. The indices are syntactically listed in most-nested to least-nested order, which is the reverse from other notations. In addition, the nesting of command substitution brackets further decreases readability. This proposal describes an extension to the lindex command that allows it to accept multiple index arguments, in least-nested to most-nested order, to automatically extract elements of sublists.
Rationale
The heterogeneous nature of Tcl lists allows them to be applied to a number of useful data structures. In particular, lists can contain elements that are, themselves, valid lists. In this document, these elements are referred to as sublists.
Extracting elements from sublists often requires nested calls to lindex. Consider, for example, the following Tcl script that prints the center element of a 3-by-3 matrix:
set A {{1 2 3} {4 5 6} {7 8 9}}
puts [lindex [lindex $A 2] 2]
When these calls are deeply nested - e.g., embedded in an expr arithmetic expression, having results extracted through lrange, etc. - the results are difficult to read:
# Print the sum of the center indices of two 3x3 matrices
set p [expr {[lindex [lindex $A 2] 2] + [lindex [lindex $A 2] 2]}]
# Get all but the last font in the following parsed structure:
set pstruct {text {ignored-data
{ ... }
}
{valid-styles
{justifiction {left centered right full}}
{font {courier helvetica times}}
}
}
return [lrange [lindex [lindex [lindex $pstruct 1] 2] 2] 0 end-1]
Note that the list of indices in the latter example is listed in the reverse order of vector indices. In most other languages/domains, the last line might take on one of the following forms:
return list_range(pstruct[2][2][1], 0, end-1);
return pstruct[[2, 2, 1]][[0:-1]]
temp = pstruct(2, 2, 1);
result = range(temp, 0, length(temp) - 1);
Allowing the lindex command to accept multiple arguments would allow this more-natural style of coding to be written in Tcl.
Specification
Under this proposal, the syntax for the lindex command is to be modified to accept one of two forms:
lindex list indexList
or
lindex list index1 index2... > In either form of the command, the _list_ parameter is
expected to be a well-formed Tcl list.
> In the first form of the command, the _indexList_ argument is
expected to be a Tcl list comprising one or more list indices, each of which must be an integer, the literal string end, or the literal string end- followed by an integer with no intervening whitespace. The existing lindex command is a degenerate form of this first form, where the list comprises a single index.
> In the second form of the command, each of the _index_
arguments is expected to be a list index, which again may be an integer, the literal string end, or the literal string end- followed by an integer with no intervening whitespace.
In either form of the command, once the indexList parameter is expanded, there is a single list parameter and one or more index parameters. If there is a single index parameter N, the behavior is identical to today's lindex command, and the command returns the Nth element of list; if N is less than zero or at least the length of the list, a null string is returned.
If more than one index parameter is given, then the behavior is defined recursively; the result of
lindex $list $index0 $index1 ...
or
lindex $list [list $index0 $index1 ...]
is indentical to that of
lindex [lindex $list $index0] $index1...
or, equivalently,
lindex [lindex $list $index0] [list $index1...]
(This specification does not constrain the implementation, which may be iterative, recursive, or even expanded inline.)
When an invalid index is given, an error of the form, bad index "invalid_index": must be integer or end?-integer?, where invalid_index is the first invalid index encountered, must be returned.
If the list argument is malformed, the error resulting from an attempt to convert the list argument to a list must be returned. This behaviour is unchanged from the current implementation.
Side Effects
- Whether the result of the lindex operation is successful, the underlying Tcl_Obj that represents the list argument may have its internal representation invalidated or changed to that of a list.
Discussion
Some attention must be paid to giving the lindex command adequate performance. In particular, the implementation should address the common case of
lindex $list $i
where $i is a "pure" integer (that is, one whose string representation has not yet been formed). If the above specification is followed naively, the flow will be as follows.
Since objc is three, objv[2] is expected to be a list. Since it is not, it must be converted from its string representation. It does not have one yet, so the string representation must be formed. Now the string representation is parsed as a list. An array (of length one) of Tcl_Obj pointers is allocated to hold the list in question, and a Tcl_Obj is allocated to hold the single element. Memory is allocated to hold the element's string representation. Now the lindex command converts the first element of the list to an index (in this case an integer).
This elaborate ballet of type shimmering requires converting the integer to a string and back again. It also requires four calls to ckalloc:
Allocate a buffer for the string representation.
Allocate the array of Tcl_Obj pointers for the list representation.
Allocate the Tcl_Obj that represents the first (and only) element of the list.
Allocate a buffer for the string representation of that element.
And at the end, the result is the same integer that was passed as a parameter originally.
To avoid all this overhead in the common case, the proposed implementation shall (in the case where objc==3)
Test whether objv[2] designates an object whose internal representation holds an integer. If so, simply use it as an index.
Test whether objv[2] designates an object whose internal representation holds a list. If so, perform the recursive extraction of indexed elements from sublists described above.
Form the string representation of objv[2] and test whether it is end or end- followed by an integer. If so, use it as an index.
Attempt to coerce objv[2] to an integer; if successful, use the result as an integer.
Attempt to coerce objv[2] to a list; if successful, use the result as an index list.
Report a malformed index argument; the indexList parameter is not a well-formed list.
This logic handles all the cases of singleton lists transparently; it is effectively a simple-minded type inference that optimizes away needless conversions.
Assuming that the related [33] is approved, this logic will most likely be combined with the identical logic required in that proposal for parsing index arguments to the lset command.
Comments
Don Porter dgp@users.sourceforge.net
I agree that it would be helpful to many programmers to provide a multi-dimensional array data structure that can be accessed in the manner described in this TIP. In the struct module of tcllib, several other data structures are being developed: graph, tree, queue, stack. I would support adding another data structure to that module that provides an interface like the one described in this TIP, with the intent that all of these helpful data structures find their way into the BI distribution.
I don't see any advantage to adding complexity to [lindex] as an alternative to development of a multi-dimensional array structure. Without a compelling advantage, I'm inclined against making [lindex] more complex. I like having Tcl's built-in commands provide primitive operations, and leave it to packages to combine the primitives into more useful, more complex resources.
This TIP should also consider how any changes to [lindex] mesh with the whole [listx] overhaul of Tcl's [list] command that has been discussed.
Dave Cuthbert dacut@kanga.org responds
Don makes a good point -- with a good set of data structures in tcllib, the need for this TIP is lessened or even eliminated. Nonetheless, I see this as a way of implementing the structures he describes. In other words, a more powerful primitive (which, in reality, adds fairly little complexity when measured in number of lines of code changed) would benefit these structures.
As for the [listx] overhaul, there are many competing proposals for the specification it is difficult to come up with a metric. In writing this TIP, I assumed a vacuum -- that is, a listx command would not be added to the core in the near future.
Donal K. Fellows fellowsd@cs.man.ac.uk points out
Although there is tcllib and [listx] to think about, they are certainly not reasons for rejecting this TIP out of hand. The availability of tcllib is not currently anything like universal (not stating whether this is a good, bad or ugly thing) and all the [listx] work will need its own TIP to make it into the core (you tend to have availability problems if it is an extension.) It is not as if the core is short of syntactic sugar right now (the [foreach] command is ample demonstration of this.)
Don Porter dgp@users.sourceforge.net follows up
I'll leave the discussion above in place so the history of this TIP is preserved, but I have withdrawn my objection.
There was quite a discussion on news:comp.lang.tcl about using lindex to return multiple arguments. For example:
% set list {a {b1 b2} c d e}
% lindex $list 1
b1 b2
% lindex $list 1 0
{b1 b2} a
% lindex $list {1 0}
b1
In other words, the list index arguments can, themselves, be lists. Only when the argument is a list would the "recursive selection" procedure of the TIP be used. For multiple arguments, the behaviour is akin to
lindex $list a b c ->
list [lindex $list a] [lindex $list b] [lindex $list c]
Summarised by Dave Cuthbert dacut@kanga.org
Donal K. Fellows fellowsd@cs.man.ac.uk points out
The problems with the above version of multiple indexing are that it loses the property that [lindex] always returns a single element (making writing robust code harder) and that it forces use of the [list] constructor a lot or inefficient type handling when some of the indices must be computed. Then there is the whole question of what happens when you have indexes that are lists of lists, which is a major can of worms.
Luckily, we could always put this sort of behaviour into a separate command (e.g. called [lselect]) which addresses at least the majority of my concerns, and which (in my opinion) need not even form part of this TIP.
Dave Cuthbert dacut@kanga.org adds
I intentionally left [lselect] out of the original TIP (and it is still not present in the 02-April-2001 version). As Donal points out, it is a major can of worms and, though there was general agreement on c.l.t that such a command would be useful, people had differing opinions on what form it should take.
Perhaps my view on TIPs is incorrect, but I try to include only sure-fire "yeah, we ought to have done that a few versions ago" items.
Notes on History of this TIP
This TIP was originally written by Dave Cuthbert dacut@kanga.org, but ownership has passed (beginning of April) to Kevin Kenny kennykb@acm.org.
This TIP underwent substantial revision in May of 2001, to add the syntax where all the index parameters could be grouped as a list rather than placed inline on the command line.
See Also
[33].
Copyright
This document has been placed in the public domain.