TIP 230: Tcl Channel Transformation Reflection API

Author:         Andreas Kupries <andreas_kupries@users.sf.net>
Author:         Andreas Kupries <akupries@shaw.ca>
Author:         Andreas Kupries <andreask@activestate.com>
State:          Final
Type:           Project
Vote:           Done
Created:        02-Nov-2004
Post-History:   
Tcl-Version:    8.6

Abstract

This document describes an API which reflects the Channel Transformation API of the core I/O system up into the Tcl level, for the implementation of channel transformations in Tcl. It is built on top of [208] ('Add a chan command') and is an independent companion to [219] ('Tcl Channel Reflection API') and [228] ('Tcl Filesystem Reflection API'). Just as the latter TIPs bring the ability to write channel drivers and filesystems in Tcl itself, this TIP provides the facilities for implementing new channel transformations in Tcl. This document specifies version 1 of the transformation reflection API.

Background and Motivation

The purpose of this and the other reflection TIPs is to provide all the facilities required for the creation and usage of wrapped files (i.e. virtual filesystems attached to executables and binary libraries) within the core.

While it is possible to implement and place all the proposed reflectivity in separate and external packages, doing so means that the core itself cannot make use of wrapping technology and virtual filesystems to encapsulate and attach its own data and library files to itself. Being able to do so is desirable, as it can make the deployment and embedding of the core easier, due to having fewer files to deal with and a higher degree of self-containment.

One possible application of a completely self-contained core library would be, for example, the Tcl browser plugin.

It is also possible to create a special purpose filesystem and channel driver in the core for this type of thing, but it is my belief that the general purpose framework specified here is a better solution as it will also give users of the core the freedom to experiment with their own ideas, instead of constraining them to what we managed to envision.

Another use for reflected transformations is as a helper for testing the generic I/O layer of Tcl, by creating transformations which forcibly return errors, bogus data, and the like.

An implementation of this TIP is already present in the core, as part of the special test commands for exercising the various internals of the Tcl core during testing. This TIP asks to make that mechanism publicly available to script and package authors, with a bit of cleanup regarding the Tcl level API. The roots of that mechanism can be traced back to the Trf package, which implemented channel transformations first and provides a similar command.

Transform Management API Specification

The Tcl level API consists of two new subcommands added to the ensemble command chan specified by [208]. Both subcommands are completely generic, i.e. they can be applied to any type of channel, without restrictions. There is no C API to specify. The Tcl core already has a standard API for the creation of channel transformations from the C level.

The push Subcommand

chan push channel cmdprefix

This subcommand creates a new script level transformation using the command prefix cmdprefix as its handler and attaches it to the specified channel. The new transformation is always added on top of any other transformations which may be present.

The handle of the new transformation is returned as the result of the command. This handle is the first argument given to all handler methods, to allow the code to distinguish between the various instances of the same transformation, if necessary.

Use the new chan pop command to remove the transformation. See below.

The API this handler has to provide is specified below, in the section "Command Handler API Specification".

We have chosen to use late-binding of the handler command. See the section "Early- versus Late-Binding of the Handler Command" for more detailed explanations.

The command invokes the handler method initialize to determine the supported methods before it pushes the transformation. It will throw an error if a read-only transformation is pushed on a write-only channel, or vice versa. More generally, an error is thrown whenever the r/w-modes of transformation and channel together would leave the result neither readable nor writable.
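As an illustration of the push step, the following is a minimal sketch of an identity transformation attached to a temporary file. The handler name identity and its layout as a namespace ensemble are assumptions made for this example; the TIP does not prescribe how the handler command is built.

```tcl
# Hypothetical identity handler, built as a namespace ensemble in the
# global namespace so that it can be resolved from the global scope.
namespace eval identity {
    namespace export initialize finalize read write
    namespace ensemble create

    # Report the supported methods; both directions are handled.
    proc initialize {handle mode} {
        return {initialize finalize read write}
    }
    proc finalize {handle} {}

    # Pass the data through unchanged in both directions.
    proc read  {handle buffer} { return $buffer }
    proc write {handle buffer} { return $buffer }
}

# file tempfile yields a channel open for reading and writing.
set chan [file tempfile path]
chan configure $chan -translation binary

chan push $chan identity        ;# initialize is invoked here
puts -nonewline $chan "hello"
flush $chan                     ;# the data passes through the write method
chan pop $chan                  ;# remove the transformation again

seek $chan 0
set content [read $chan]
close $chan
file delete $path
```

After the run, content should hold the five bytes that were written, since the identity handler changes nothing on the way down.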

The pop Subcommand

chan pop channel

This subcommand removes the topmost transformation from the channel, if there is any. This command is equivalent to the builtin command close if the channel had no transformations added to it.

Note: If the removal of the topmost transformation uncovers inactive transformations (See section "Interaction with Threads and Other Interpreters"), then these will be removed now as well.
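The stacking order can be sketched as follows. The two write-only handlers upper and twice are hypothetical and exist only for this example. Data written by the script passes through the topmost transformation first, and each chan pop removes exactly one layer.

```tcl
# Hypothetical write-only handler: upper-cases what passes through it.
proc upper {method handle args} {
    switch -- $method {
        initialize { return {initialize finalize write} }
        finalize   { return }
        write      { return [string toupper [lindex $args 0]] }
    }
}

# Hypothetical write-only handler: emits every byte twice.
proc twice {method handle args} {
    switch -- $method {
        initialize { return {initialize finalize write} }
        finalize   { return }
        write {
            set out ""
            foreach c [split [lindex $args 0] ""] { append out $c $c }
            return $out
        }
    }
}

set chan [file tempfile path]
chan configure $chan -translation binary

chan push $chan upper                     ;# stack: upper
chan push $chan twice                     ;# stack: twice on top of upper

puts -nonewline $chan "ab"; flush $chan   ;# through twice, then upper
chan pop $chan                            ;# removes twice
puts -nonewline $chan "cd"; flush $chan   ;# through upper only
chan pop $chan                            ;# removes upper
puts -nonewline $chan "ef"; flush $chan   ;# plain channel again
close $chan

# Read the file back through a fresh channel.
set in [open $path r]
set content [read $in]
close $in
file delete $path
```

The file should then contain AABBCDef: the first write saw both layers, the second only upper, and the third none.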

Command Handler API Specification

The Tcl-level handler command for a reflected channel transformation is an ensemble that has to support the following subcommands, as listed below. Note that the term ensemble is used to generically describe all command (prefixes) which are able to process subcommands. This TIP is not tied to the recently introduced 'namespace ensemble's (though they may be used to implement such handlers.)

The initialize Subcommand

handler initialize handle mode

This method is called first, and then never again (for the given handle). Its responsibility is to initialize all parts of the transformation at the Tcl level. The mode argument is a list containing any of read and write.

  • write contained in mode implies that the channel is writable.

  • read contained in mode implies that the channel is readable.

The return value of the method has to be a list containing the names of all methods which are supported by this handler. Any error thrown by the method will prevent the creation of the transformation. The thrown error will appear as an error thrown by chan push.

The current version is 1.

This method has no equivalent at the C level.
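A handler may derive its method list from the mode it is given. The sketch below assumes a hypothetical handler named adaptive that offers read only on readable channels and write only on writable ones; the handle h1 used in the direct calls is a placeholder, not a real transformation handle.

```tcl
# Hypothetical handler whose initialize method adapts to the channel mode.
proc adaptive {method handle args} {
    switch -- $method {
        initialize {
            # The single remaining argument is the mode list.
            lassign $args mode
            set methods {initialize finalize}
            if {"read"  in $mode} { lappend methods read }
            if {"write" in $mode} { lappend methods write }
            return $methods
        }
        finalize { return }
        read - write {
            # Identity in both directions.
            return [lindex $args 0]
        }
    }
}

# Direct invocations, in the same shape the I/O system would use.
set forReadOnly  [adaptive initialize h1 {read}]
set forReadWrite [adaptive initialize h1 {read write}]
```

Here forReadOnly is the list initialize finalize read, and forReadWrite additionally contains write.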

The finalize Subcommand

handler finalize handle

This method is called last for the given handle, just before the destruction of the C level data structures. It is now its responsibility to clean up all parts of the transformation at the Tcl level.

Any result returned by the method will be ignored. The same is true for errors thrown by the method.

This method has no equivalent at the C level.

The flush Subcommand

handler flush handle

This method is called whenever data in the transformation 'write' buffer has to be forced downward, i.e. towards the base channel. The result returned by the method is taken as the binary data to write to the transformation below the current transformation. This can be the base channel as well.

In other words, when this method is called the transformation cannot defer the actual transformation operation anymore and has to transform all data waiting in its internal write buffers and return the result of that action.

The method is optional. However if this method is supported then write has to be supported as well. The reverse is not true.

The write Subcommand

handler write handle buffer

This method is optional. If it is not present it means that the transformation does not support writing, and the channel it was pushed on becomes non-writable as well.

It will be called whenever the user, or a transformation above this transformation writes data downward. The buffer contains the binary data which has been written to us. It is the responsibility of this method to actually transform the data.

The result returned by the method is taken as the binary data to write to the transformation below this transformation. This can be the base channel as well. Note that the result is allowed to be empty, or less than the data we got. The transformation is not required to transform everything which was written to it right now. It is allowed to store this data in internal buffers and to defer the actual transformation until it has more data.
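The interplay of write and flush can be sketched with a transformation that only passes on whole blocks of four bytes and keeps the rest in an internal buffer. The namespace blocks, the block size, and the per-handle array pending are assumptions of this example, not part of the specification.

```tcl
# Hypothetical handler: passes data on in blocks of four bytes only.
# Per-handle state lives in the array "pending", keyed by handle.
namespace eval blocks {
    variable pending
    namespace export initialize finalize write flush
    namespace ensemble create

    proc initialize {handle mode} {
        variable pending
        set pending($handle) ""
        return {initialize finalize write flush}
    }
    proc finalize {handle} {
        variable pending
        unset -nocomplain pending($handle)
    }
    # Return all complete blocks, keep the incomplete tail for later.
    proc write {handle buffer} {
        variable pending
        append pending($handle) $buffer
        set n [expr {[string length $pending($handle)] / 4 * 4}]
        set out [string range $pending($handle) 0 [expr {$n - 1}]]
        set pending($handle) [string range $pending($handle) $n end]
        return $out
    }
    # Nothing can be deferred anymore: hand out whatever is left.
    proc flush {handle} {
        variable pending
        set out $pending($handle)
        set pending($handle) ""
        return $out
    }
}
```

Called directly with a placeholder handle, blocks write h1 abcde returns abcd and retains e, while a later blocks flush h1 returns the retained bytes.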

The drain Subcommand

handler drain handle

This method is called whenever data in the transformation input (i.e. read) buffer has to be forced upward, i.e. towards the user, i.e. the script. The result returned by the method is taken as the binary data to push upward to the transformation above this transformation. This can be the script as well.

In other words, when this method is called the transformation cannot defer the actual transformation operation anymore and has to transform all data waiting in its internal read buffers and return the result of that action.

The method is optional. However if this method is supported then read has to be supported as well. The reverse is not true.

The read Subcommand

handler read handle buffer

This method is optional. If it is not present it means that the transformation does not support reading, and the channel it was pushed on becomes non-readable as well.

It is called whenever the base channel, or a transformation below this transformation pushes data upward. The buffer contains the binary data which has been given to us from below. It is the responsibility of this method to actually transform data. The result returned by the method is taken as the binary data to push further upward to the transformation above this transformation. This can be the user, i.e. the script as well.

Note that the result is allowed to be empty, or even less than the data we got. The transformation is not required to transform everything given to it right now. It is allowed to store incoming data in internal buffers and to defer the actual transformation until it has more data.
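For the input side, read and drain can be sketched with a transformation that hands complete lines upward and holds back an unterminated tail until more data arrives. The namespace lines and its array partial are hypothetical names chosen for this example.

```tcl
# Hypothetical handler: delivers only complete lines to the layer above.
# Per-handle state lives in the array "partial", keyed by handle.
namespace eval lines {
    variable partial
    namespace export initialize finalize read drain
    namespace ensemble create

    proc initialize {handle mode} {
        variable partial
        set partial($handle) ""
        return {initialize finalize read drain}
    }
    proc finalize {handle} {
        variable partial
        unset -nocomplain partial($handle)
    }
    # Return everything up to the last newline, keep the rest.
    proc read {handle buffer} {
        variable partial
        append partial($handle) $buffer
        set end [string last \n $partial($handle)]
        if {$end < 0} { return "" }
        set out [string range $partial($handle) 0 $end]
        set partial($handle) [string range $partial($handle) $end+1 end]
        return $out
    }
    # Deferral is over: push the unterminated tail upward.
    proc drain {handle} {
        variable partial
        set out $partial($handle)
        set partial($handle) ""
        return $out
    }
}
```

Called directly with a placeholder handle, a read of "ab\ncd" yields "ab\n" and keeps cd; the kept bytes come out with the next newline or, at the latest, through drain.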

The limit? Subcommand

handler limit? handle

This method is optional. If it is not present it means that the transformation is allowed to read ahead as much as it likes.

This method is called during input processing and allows the Tcl level part of the transformation to restrict the number of bytes read from the downward transformation or base channel before its method read is called with the resulting buffer.

The result of the method has to be an integer number not equal to zero. A negative result signals that the transformation allows the I/O system to read an unlimited number of bytes. A positive number on the other hand is interpreted as the maximum number of bytes the I/O system is allowed to read from the downward transformation or base channel.

This method is necessary for transformations where the data is bounded at the end in some way. In that case the transformation has to prevent the system from reading beyond the boundary as otherwise data behind it will be given to the transformation and then lost when the transformation is removed. This is a limitation of the current I/O core as it does not allow a transformation to push unused data back into the I/O core when the transformation is removed from the channel.

Fixing this limitation requires a separate TIP, as either the public API of the I/O core has to be extended, or the public structure of channel drivers.
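One way to use limit? is to keep reads aligned to boundaries in the data. The sketch below assumes a hypothetical stream of fixed-size frames of 8 bytes and assumes the I/O system honors the limit as described above, so that read never receives more than the number of bytes last reported. The result is always between 1 and the frame size, never zero.

```tcl
namespace eval frames {
    variable size 8     ;# frame length in bytes (assumed for this example)
    variable have       ;# array: bytes of the current frame seen, per handle
}

# Hypothetical handler: identity on the data, but never lets the I/O
# system read past the end of the current frame.
proc frames::handler {method handle args} {
    variable size
    variable have
    switch -- $method {
        initialize {
            set have($handle) 0
            return {initialize finalize read limit?}
        }
        finalize {
            unset -nocomplain have($handle)
        }
        limit? {
            # Bytes still missing from the current frame, 1..$size.
            return [expr {$size - $have($handle)}]
        }
        read {
            set buffer [lindex $args 0]
            set n [string length $buffer]
            set have($handle) [expr {($have($handle) + $n) % $size}]
            return $buffer
        }
    }
}
```

With this arrangement the transformation can be removed at any frame boundary without bytes of the following frame having been consumed on its behalf.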

The clear Subcommand

handler clear handle

This method is called to clear any data stored in the internal input buffers of the transformation. This happens only when the user seeks the channel the transformation is attached to.

Any result returned by the method is ignored.

The method is optional. I.e. a transformation not having any internal buffers to clear can leave out its implementation.

Hardwired Behavior of Reflected Transformations

Not all functions of the channel driver implementing the reflected transformation are directly reflected into the Tcl level. Their behavior is hardwired in the C level implementing the reflection and specified now.

BlockModeProc: Records the chosen blocking mode in the C-level data structures. This influences the low-level write and read behavior. However the Tcl level is shielded from this, so a handler method is not required.

CloseProc: Invokes the methods drain and flush to clean up any buffers managed by the Tcl level of the transformation, and then finalize. drain is not called if a previous call to it has not been invalidated yet.

InputProc: Tries to satisfy the incoming read request from the input result buffer first. If that is not enough it invokes the limit? method to establish the current read limit, reads data from the downward transformation or base channel per the limit and feeds the data it got to the Tcl level of the transformation, via an invocation of the method read. The result of that call is added to the input result buffer and used to further satisfy the read request.

If the channel is blocking the system will iterate until the request is fulfilled completely or EOF has been signaled from below.

In non-blocking mode however the loop will stop if either the request was fulfilled completely, or if we would block. Note that reaching EOF in this situation causes a flush of the Tcl side input buffers via an invocation of the method drain. Otherwise the data stored in them would be lost when the transformation is removed or the channel closed completely.

OutputProc: Simply forwards the written buffer to the method write for processing and then writes any returned result to the transformation or base channel below.

SeekProc: Recognizes requests made by tell and passes them down without doing anything else. The result generated by the base channel is then passed back up unchanged.

Any other request causes it to flush all write buffers on the Tcl side via an invocation of the method flush, and clear all input buffers on the Tcl side via an invocation of the method clear before passing the request down. Note that the calls mentioned above are not made if the channel is not writable, or not readable. Further note that the results of the flush are discarded, not written, as they would otherwise move the current location and throw off our expectations regarding where we are now and will end up after the seek.

This implements the most simple seek behavior as it was found in the very earliest incarnation of the transform reflection functionality, i.e. passing down any seek request unchanged until the base channel is reached. This also means that the internal state of transformations is not adjusted after a seek and may generate bogus results.

The reflection (actually any transformation) provided by the package Trf has much more complex seek behavior. This was left out for now to keep the scope of this TIP relatively focused. A follow-up TIP can be written for a deeper discussion of the interaction between seeking and any type of transformation, not only reflected ones.

See http://www.oche.de/~akupries/soft/trf/trf_seek.html for the description of the complex seeking model used by the transformation reflection in Trf.

SetOptionProc, GetOptionProc, GetFileHandleProc: The calls are passed down without change, any results are passed back to the caller, again without any changes.

Transformations have no options which can be configured when they are attached to a channel, hence the pass-through and no handler method at the Tcl level.

WatchProc: Remembers the interest mask provided by the caller and uses this to manage the internal timer used to generate fake file events when data is buffered.

NotifyProc: Manages the internal timer. Will pass the incoming mask of triggered events upward without change.

Interaction with Threads and Other Interpreters

Adding a reflected transformation to a channel does not create any restrictions on the sharing of the channel with other interpreters, nor with moving the channel to different interpreters or threads.

Like for reflected channels (See [219] ('Tcl Channel Reflection API')), the implementation ensures that the handler command is always executed in the original interpreter and thread. The latter is done by posting specialized events to the original thread, essentially forwarding driver invocations to the correct thread.

When a thread or interpreter is deleted all transformations it owns are deleted as far as possible, and any remnants are marked as dead. The latter occurs only if the channel using the deleted transformation is outside of the deleted thread or interpreter. Such channels will throw errors when accessed, until the offending transformation is removed from them via chan pop (multiple pops will be necessary if the deleted transformation sits in the middle of a stack).

Interaction with Safe Interpreters

The new subcommands push and pop of chan are both safe and therefore made accessible to safe interpreters.

While pushing a transformation arranges for the execution of code, this code is always executed within the safe interpreter, even if the channel was moved (See previous section).

That the data flowing through the channel is modified by the transformation is no problem either. To attach the transformation, the safe interpreter must already have been given the channel. In other words, the interpreter that handed it over already trusted the safe interpreter in some way, and the fact that a transformation can now be added does not change this.

Equivalent reasoning applies if the channel was created by the safe interpreter and then shared/moved into the trusted interpreter. The transformation has no effect on the trust already given to the safe interpreter through the share/move operation.

The same holds for the subcommand pop. If the safe interpreter can execute it on a channel it has the channel already in its possession, either because it created the channel, or because the channel was shared/moved into it.

Event processing

It is specified that reflected transformations do not support any type of user-visible event handling.

The only event handling done is the invisible passing of current interest from higher to lower layers, and the (again invisible) behaviour needed to drive the flushing of transformation buffers to higher layers. See the descriptions of WatchProc and NotifyProc in the section Hardwired Behavior of Reflected Transformations.

This differs from the previous revisions of this TIP, up to and including revision 1.11, which included language and definitions to support event processing by transformations, i.e. allowing transformations to register interest in events, and then process them.

The main example use case for this was that it would give a transformation the ability to initialize itself with data coming from the channel (like a key exchange) before switching to the plain transformative behaviour. While working on the design of how to support this on the Tcl script level I ran into lots of complicated edge cases and convolutions.

Based on this I have now come to the belief that this has been a misfeature from the beginning and that the use case this was based on (see above) is an example of commingling concerns which should be separate, i.e. an example of what should not be done.

The transformation's concern is transforming data, nothing else. Any parameters needed for this, like keys, seeds, etc. should come from the outside, at the time it is pushed on the channel stack. Initializations like key exchanges, cipher negotiations, etc. should not be, and are not, a concern of the transformation. They should and can be done before the transformation is configured and pushed.

This last means that the removal of user-visible event handling support is by no means a restriction on the type of things we can do. Even so, should we find other use cases for event handling by transformations we can always write another TIP within which we specify how to extend reflected transformations with that feature.

Early- versus Late-Binding of the Handler Command

We have two principal methods for using the handler command. These are called early- and late-binding.

Early-binding means that the command implementation to use is determined at the time of the creation of the channel, i.e. when chan push is executed, before any methods are called. Afterward it cannot change. The result of the command resolution is stored internally and used until the channel is destroyed. Renaming the handler command has no effect. In other words, the system will automatically call the command under the new name. The destruction of the handler command is intercepted and causes the channel to close as well.

Late-binding means that the handler command is stored internally essentially as a string, and this string is mapped to the implementation to use for each and every call to a method of the handler. Renaming the command, or destroying it means that the next call of a handler method will fail, causing the higher level channel command to fail as well. Depending on the method the error message may not be able to explain the reason of that failure.

Another problem with the late-binding approach is that the context for the resolution of the command name has to be specified explicitly to avoid problems with relative names. Early-binding resolves once, in the context of the chan push. Late-binding performs resolution wherever channel commands like puts, gets, etc. are called, i.e. in a random context. To prevent problems with different commands of the same name in several namespaces it becomes necessary to force the usage of a specific fixed context for the resolution.

Note that moving a different command into place after renaming the original handler allows the Tcl level to change the implementation dynamically at runtime. This however is not really an advantage over early-binding as the early-bound command can be written such that it delegates to the actual implementation, and that can then be changed dynamically as well.

However, despite all this late binding is so far the method of choice for the implementation of callbacks, be they in Tcl, or Tk; and has been chosen for the reflection as well.

The context for all handler method invocations is the global scope.
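The effect of late binding can be observed by replacing the handler command while a transformation is attached. The following sketch uses a hypothetical handler named shout; the name is resolved again for each method call, so the second write is processed by the replacement.

```tcl
# Hypothetical write-only handler: upper-cases what passes through it.
proc shout {method handle args} {
    switch -- $method {
        initialize { return {initialize finalize write} }
        finalize   { return }
        write      { return [string toupper [lindex $args 0]] }
    }
}

set chan [file tempfile path]
chan configure $chan -translation binary
chan push $chan shout

puts -nonewline $chan "abc"
flush $chan                      ;# handled by the original shout

# Replace the implementation while the transformation stays attached.
proc shout {method handle args} {
    switch -- $method {
        initialize { return {initialize finalize write} }
        finalize   { return }
        write      { return [string tolower [lindex $args 0]] }
    }
}

puts -nonewline $chan "DEF"
flush $chan                      ;# resolved anew: the replacement runs
close $chan

# Read the file back through a fresh channel.
set in [open $path r]
set content [read $in]
close $in
file delete $path
```

The file should contain ABCdef. Because resolution happens in the global scope, a handler living in a namespace is best named by a fully qualified prefix such as ::pkg::handler (a hypothetical name).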

Limitations

The method limit? is required to limit reading from below to prevent reading over transformation specific boundaries. This is a direct consequence of not being able to push unused data back into the I/O core when a transformation is removed.

The reflection implements the very simple seek behavior found in the earliest incarnation of this functionality, i.e. passing down any seek request unchanged until the base channel is reached. This also means that the internal state of transformations is not adjusted after a seek and may generate bogus results. The reflection (actually any transformation) provided by the package Trf has much more complex seek behavior. This was left out for now to keep the scope of this TIP relatively focused. A follow-up TIP can be written for a deeper discussion of the interaction between seeking and any type of transformation, not only reflected ones.

See http://www.oche.de/~akupries/soft/trf/trf_seek.html for the description of the complex seeking model used by the transformation reflection in Trf.

Miscellanea

The transform reflection API reserves the driver type "tclrtransform" for itself, and uses it to detect its own transformations. Usage of this driver type by other transformations is not allowed.

Examples

Transformation Implementations

A simple way of implementing new transformations is to use any of the various object systems for Tcl. Create a class for the transformation. chan push the transformation in the constructor for new objects and store the transformation handle. Make the new object the command handler for the transformation. This automatically translates the subcommands of the command handler into object methods. Implement the various methods required. When the object is deleted, deactivate the transformation, and delete the object when the channel announces that the transformation has been chan popped. This part is a bit tricky: flags have to be used to break the potential cycle.

Another possibility is to implement the command handler as a regular command, together with a creation command wrapping around chan push and a backend which keeps track of all handles created by it and their state, associated data, etc.
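The second approach might look like the sketch below: a byte counter with a creation command that wraps chan push and a small backend holding the state. All names in the counter namespace are hypothetical. As a design choice for this sketch, the state is keyed by a token carried in the command prefix, so the handler receives it ahead of the method name and the transformation handle.

```tcl
namespace eval counter {
    variable bytes      ;# array: bytes written so far, keyed by token
    variable next 0     ;# source of unique tokens
}

# Creation command: pushes the transformation and returns a token
# identifying its state in the backend.
proc counter::attach {chan} {
    variable bytes
    variable next
    set token counter[incr next]
    set bytes($token) 0
    # Fully qualified prefix, as resolution happens in the global scope.
    chan push $chan [list ::counter::handler $token]
    return $token
}

# Handler: token from the command prefix first, then method and handle.
proc counter::handler {token method handle args} {
    variable bytes
    switch -- $method {
        initialize { return {initialize finalize write} }
        finalize   { return }
        write {
            set buffer [lindex $args 0]
            incr bytes($token) [string length $buffer]
            return $buffer
        }
    }
}

# Backend accessor for the statistics gathered so far.
proc counter::count {token} {
    variable bytes
    return $bytes($token)
}

set chan [file tempfile path]
chan configure $chan -translation binary
set token [counter::attach $chan]
puts -nonewline $chan "hello"
flush $chan
set n [counter::count $token]
close $chan
file delete $path
```

After the five bytes have been flushed through the transformation, n should be 5. The count stays available after the channel is closed because finalize leaves the backend entry in place here; a fuller version would offer a command to discard it.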

Possible Transformations

  • Identity

  • Gathering statistics about the channel. The simplest would be to count bytes, for example.

  • Divert a copy of all data to a separate channel. In other words, spying on the channel.

  • Give a channel which normally cannot seek backward the ability to do so through buffering (a limited amount of) the data read from it.

  • Limit reading from a channel to a specific finite number of characters (This can be done via a suitable implementation of the method "limit?" in the transformation handler).

Reference Implementation

A variant implementation of this TIP is already present in the core, as part of the special test commands for exercising the various internals of the Tcl core during testing.

The relevant files are

  • "tclIOGT.c" (base implementation) and

  • "tclTest.c" (command interface).

The command interface specified here is different from the current interface.

 test transform channel -command command

versus

 chan push channel cmdprefix
 chan pop channel

The handler methods are different as well. This TIP consolidated a number of methods, gave them better names and removed unnecessary arguments. It also removed some limitations.

The actual reference implementation is provided at SourceForge http://sourceforge.net/support/tracker.php?aid=1163274 .

Comments

[ Add comments on the document here ]

Copyright

This document has been placed in the public domain.