Author: Andreas Kupries <akupries@shaw.ca>
Author: Jean-Claude Wippler <jcw@equi4.com>
Author: Jeff Hobbs <jeffh@activestate.com>
Author: Don Porter <dgp@users.sourceforge.net>
Author: Larry W. Virden <lvirden@yahoo.com>
Author: Daniel A. Steffen <das@users.sourceforge.net>
Author: Don Porter <dgp@users.sf.net>
State: Final
Type: Project
Vote: Done
Created: 24-Mar-2004
Post-History:
Tcl-Version: 8.5
Abstract
This document describes a new mechanism for the handling of packages by the Tcl Core which differs from the existing system in important details and makes different trade-offs with regard to flexibility of package declarations and to access to the filesystem. This mechanism is called "Tcl Modules".
Background and Motivation
The current mechanism for locating and loading packages employed by the Tcl core is very flexible, but suffers from a number of drawbacks as well. These are at least partially the result of the flexibility, and thus not easily solved without giving up something.
One problem with the current mechanism is that it extensively searches the filesystem for packages, and that it has to actually read a file (pkgIndex.tcl) to get the full information for a prospective package. All of these operations take time. The fact that "index scripts" are able to extend the list of paths searched tends to heighten this cost as it forces rescans of the filesystem. Installations where directories in the auto_path are large or mounted from remote hosts are hit especially hard by this (network delays). All of this together causes a slow startup of tclsh and Tcl-based applications.
"Tcl Modules" on the other hand is designed with less flexibility in mind and to allow implementations to glean as much information as possible without having to perform lots of accesses to the filesystem.
Additional benefits of the proposed design are a simplified deployment of packages, akin to the way starkits made application deployment simple, and from that an easier implementation and management of repositories.
It does not come without penalties however.
The simplified design has no "index scripts". While this does away with extending the list of paths for searching, it also does away with the ability of packages to check preconditions, like the version of the currently executing Tcl interpreter. Dependencies of packages (in module form) on particular versions of Tcl have to be managed differently and outside of them.
"Tcl Modules" is defined to be an extension of the existing package mechanism and does not replace it. This means that any failure to find a package as a module has to cause a fall back to the regular package mechanism. It also sets a limit on how much of our goals we can reach: searching for packages which are not installed will stay relatively slow, and dominated by the filesystem scan of the regular search. This implies that "Tcl Modules" will be best suited in installations where the number of regular packages is low, and contained in a small part of the overall filesystem.
On the gripping hand, the only regular packages required will be packages supporting the virtual filesystems employed by modules (more on that later), so a transformation of a installation based on a set of regular packages to the form above is quite feasible.
Specification
Introduction
Modules are regular Tcl Packages, in a different guise. To ease explanations, first a summary of the existing mechanism:
Packages are identified through "pkgIndex.tcl" files and the "index script" they contain. These files are read and define the "provide script", which tells Tcl how to actually load the package. In other words, the provide script tells whether to use the "source" or "load" command, which file to specify as an argument to that command, etc. However as "pkgIndex.tcl" contains a regular tcl script, it can do more than that and actually influence the environment, i.e., the package search itself, in several ways:
* It may choose to not register the package if conditions for the package are not met, like being run by a too old version of Tcl.
* It may extend the list of paths used to search for packages. This implies that a package is able to modify the behaviour of the package search (usually extending the search) even before it is loaded, and even if it will not be loaded at all.
The above is very flexible, but comes at a price. The filesystem is not only searched, but files have to be read as well to build up the in-memory index of packages. And this is iterated if index files change/extend the list of paths to search.
Tcl Modules simplifies the above considerably, by cutting down on the number of indirections involved. It only searches for module files and records their location, but does not read them. The search is only performed when required, on a limited part of the filesystem. This makes locating and importing packages in module form easier and faster. The price is that packages in module form cannot prevent registration in an interpreter not of their choice, nor can they influence the package search itself before they are actually used.
The remainder of this document will cover the following topics
What constitutes a Tcl Module ?
How are they found ?
How are they indexed, i.e. entered into the package database ?
Module Definition
A Tcl Module is a Tcl Package contained in a single file, and no other files required by it. This file has to be sourceable. In other words, a Tcl Module is always imported via:
source module_file
The "load" command is not directly used. This restriction is not an actual limitation, as we may believe. Ever since 8.4 the Tcl source command reads only until the first ^Z character. This allows us to combine an arbitrary Tcl script with arbitrary binary data into one file, where the script processes the attached data in any it chooses to fully import and activate the package. Please read [190] "Implementation Choices for Tcl Modules" for more explanations of the various choices which are possible.
The name of a module file has to match the regular expression
([[:alpha:]_][:[:alnum:]_]*)-([[:digit:]].*)\.tm
The first capturing parentheses provides the name of the package, the second clause its version. In addition to matching the pattern, the extracted version number must not raise an error when used in the command
package vcompare $version 0
This additional check has several benefits. The regular expression pattern is a bit simpler, and the full version check is based on the official definition of version numbers used by the Tcl core itself.
Finding Modules
Remember the check for a valid module in last section, and notice that any filename matching this name pattern is going to be treated by the TM system as if it's a Tcl module, whether it really is or not. This means it's a bad idea for any non-Tcl module files that might match that pattern to end up in a directory where TM will be scanning. This suggests that the directory tree for storing Tcl modules ought to be something separate from other parts of the filesystem. This further implies that a new search path over just these separate storage areas would be better than Yet Another Use of $::auto_path.
Therefore: Modules are searched for in all directories listed in the result of the command "::tcl::tm::path list" (See also section 'API to "Tcl Modules"'). This is called the "Module path". Neither "auto_path" nor "tcl_pkgPath" are used.
All directories on the module path have to obey one restriction:
- For any two directories, neither is an ancestor directory of the other.
This is required to avoid ambiguities in package naming. If for example the two directories
foo/
foo/cool
were on the path a package named 'cool::ice' could be found via the names 'cool::ice' or 'ice', the latter potentially obscuring a package named 'ice', unqualified.
Before the search is started, the name of the requested package is translated into a partial path, using the following algorithm:
- All occurrences of '::' in the package name are replaced by the appropriate directory separator character for the platform we are on. On Unix, for example, this is '/'.
Example:
The requested package is encoding::base64. The generated partial path is
encoding/base64
After this translation the package is looked for in all module paths, by combining them one-by-one, first to last with the partial path to form a complete search pattern. The exact pattern and mechanism is left unspecified, giving the implementation freedom of choice as to what glob searches to perform, how much of them, and when.
Independent of that, the implemented algorithm has to reject all files where the filename does not match the regular expression given in the previous section. For the remaining files "provide scripts" are generated and added to the package ifneeded database.
The algorithm has to fall back to the previous unknown handler when none of the found module files satisfy the request. If the request was satisfied no fall-back is required.
Provide and Index Scripts
Packages in module form have no control over the "index" and "provide script"s entered into the package database for them. For a module file MF the "index script" is
package ifneeded PNAME PVERSION [list source MF]
and the "provide script" embedded in the above is
source MF
Both package name PNAME and package version PVERSION are extracted from the filename MF according to the definition below:
MF = /module_path/PNAME'-PVERSION.tm
Where PNAME' **is the partial path of the module as defined in section 'Finding Modules' before, and translated into **PNAME by changing all directory separators to '::', and module_path is the path (from the list of paths to search) that we found the module file under.
Note that we are here creating a connection between package names and paths. Tcl is case-sensitive when it comes to comparing package names, but there are filesystems which are not, like NTFS. Luckily these filesystems do store the case of the name, despite not using the information when comparing.
Given the above we allow the names for packages in Tcl modules to have mixed-case, but also require that there are no collisions when comparing names in a case-insensitive manner. In other words, if a package 'Foo' is deployed in the form of a Tcl Module, packages like 'foo', 'fOo', etc. are not allowed anymore.
Regular packages have no problem with the names of their files, as their entry point has a standard name ("pkgIndex.tcl") and its contents can be adjusted according to the filesystem they are stored in.
API to "Tcl Modules"
"Tcl Modules" is implemented in Tcl, as a new handler command for package unknown. This command calls the previously installed handler when its own search fails, thereby ensuring proper fall-back to the regular package search.
All code and data structures implementing "Tcl Modules" reside in the namespace "::tcl::tm".
A namespace variable holds the list of paths to search for modules, but is not officially exported. All access to this variable is done through the following public commands:
::tcl::tm::path add PATH
The path is added at the head to the list of module paths.
The command enforces the restriction that no path may be an ancestor directory of any other path on the list. If the new path violates this restriction an error will be raised.
If the path is already present as is, no error will be raised and no action will be taken.
Paths are searched in the order of their appearance in the list. As they are added to the front of the list they are searched in reverse order of addition. In other words, the paths added last are looked at first.
::tcl::tm::path remove PATH
Removes the path from the list of module paths. The command is silently ignored if the path is not on the list.
::tcl::tm::path list
Returns a list containing all registered module paths, in the order that they are searched for modules.
::tcl::tm::roots PATH_LIST
Similar to path add, and layered on top of it. This command takes a list of paths, extends each with tclX/site-tcl, and tclX/X.y, for major version X of the tcl interpreter and minor version y less than or equal to the minor version of the interpreter, and adds the resulting set of paths to the list of paths to search.
This command is used internally by the system to set up the system-specific default paths. See section Defaults for their definition, and that their structure matches what this command does.
The command has been exposed to allow a buildsystem to define additional root paths beyond those defined by this document.
We do not provide APIs for rescanning directories, clearing internal state and such. The official interface to this functionality is "package forget" and special interfaces are neither required nor desirable.
Discussion
Restriction to "source"
This has already been discussed in the specification above.
For more discussion I again refer to [190] "Implementation Choices for Tcl Modules" which explains the various implementation choices in much more detail.
Preconditions
It has already been mentioned in section 'Background and Motivation' that preconditions in "index scripts" are lost, one of the penalties of the simplified scheme specified here.
Their existence was most important to installations with multiple versions of Tcl coexisting with each other as they could share the directory hierarchy containing packages between the various Tcl cores. This is not possible anymore, at least not in a simple manner.
For the majority of installations however, i.e. those without only one version of Tcl installed, or controlled environments like the inside of starkits and starpacks, this loss is irrelevant and of no consequence.
For more discussion please see [191] "Managing Tcl Package and Modules in a Multi-Version Environment" which explains the various choices a sysadmin has in much more detail.
Package Metadata
An area possibly made harder by Tcl Modules is the storage and query of package metadata. [59] was one way of handling such information, by storing them in the binary library of packages which have such. Another approach was to store them in the package index script, using a hypothetical package about command.
The latter approach has the definite advantage that it was possible to query the database of metadata for a particular package without having to actually load said package, as a load may fail if the Tcl shell used to query the database does not fulfil the preconditions for that package.
Both approaches listed above assume that it makes sense to query the database of metadata for all installed packages from a plain Tcl shell. In other words, to use the standard Tcl shell also as the tool to directly manage an installation.
It is possible to extend the proposal made in this document to handle metadata as well. We already reserved the namespace ::tcl::tm for use by us, so it is no problem to extend the public API with commands to locate all installed packages, their metadata, and to perform queries based on this. This will require an additional specification as to how metadata is stored in/by Tcl Modules, and it will have to be understood that these extended management operations can take considerably more time than a package require, as they will have to scan all defined search paths and all their sub directories for Tcl Modules, and have to extract the metadata itself as well.
Deployment
The fact that a Tcl Module consists only of a single file makes its deployment quite easy. We only have to ensure correct placement in one of the searched directories when installing it locally, but nothing more.
Regarding the usage of Tcl Modules in a wrapped application, please see [190] "Implementation Choices for Tcl Modules". This is highly dependent on the implementation chosen for a specific Tcl Module and thus not discussed here, but in the referred document.
Package Repositories
At a very basic level, the physical storage, any directory tree containing properly placed files for a number of modules can serve as a package repository for the modules in it. In other words, from that point of view an installation is virtually indistinguishable from a repository, and their creation and maintenance is very easy
Note however that the higher levels of a repository, like indexing package metadata in general, or dependence tracking in particular, licensing, documentation, etc. are not addressed here and by this.
This requires standards for package metadata, format and content, topics with which this document will not deal.
Defaults
The default list of paths on the module path is computed by a tclsh as follows, where X is the major version of the Tcl interpreter and y is less than or equal to the minor version of the Tcl interpreter.
System specific paths
* file normalize [info library]/../tclX/X.y
In other words, the interpreter will look into a directory specified by its major version and whose minor versions are less than or equal to the minor version of the interpreter.
Example: For Tcl 8.4 the paths searched are
* [info library]/../tcl8/8.4
* [info library]/../tcl8/8.3
* [info library]/../tcl8/8.2
* [info library]/../tcl8/8.1
* [info library]/../tcl8/8.0
This definition assumes that a package defined for Tcl X.y can also be used by all interpreters which have the same major number X and a minor number greater than y.
* file normalize EXEC/tclX/X.y
Where EXEC is [file normalize [info nameofexecutable]/../lib] or [file normalize [::tcl::pkgconfig get libdir,runtime]]
This sets of paths is handled equivalently to the set coming before, except that it is anchored in EXEC_PREFIX. For a build with PREFIX = EXEC_PREFIX the two sets are identical.
Site specific paths.
* file normalize [info library]/../tclX/site-tcl
User specific paths.
* $::env(TCLX.y_TM_PATH)
A list of paths, separated by either : (Unix) or ; (Windows). This is user and site specific as this environment variable can be set not only by the user's profile, but by system configuration scripts as well.
These paths are seen and therefore shared by all Tcl shells in the $::env(PATH) of the user.
Note that X and y follow the general rules set out above. In other words, Tcl 8.4, for example, will look at these 5 environment variables
* $::env(TCL8.4_TM_PATH)
* $::env(TCL8.3_TM_PATH)
* $::env(TCL8.2_TM_PATH)
* $::env(TCL8.1_TM_PATH)
* $::env(TCL8.0_TM_PATH)
All the default paths are added to the module path, even those paths which do not exist. Non-existent paths are filtered out during actual searches. This enables a user to create one of the paths searched when needed and all running applications will automatically pick up any modules placed in them.
The paths are added in the order as they are listed above, and for lists of paths defined by an environment variable in the order they are found in the variable.
Installation
The installation of a Tcl module for a particular interpreter is basically done like this:
#! /path/to/chosen/tclsh
# First argument is the name of the module.
# Second argument is the base filename
set mpaths [::tcl::tm::path list]
... remove all paths the user has no write permissions for.
... throw an error if there are no paths left.
... provide the user with some UI if more than one path is left
... so that she can select the path to use.
set selmpath [ui_select $mpaths]
file copy [lindex $argv 1] \
[file join $selmpath \
[file dirname [string map {:: /} \
[lindex $argv 0]]]]
Glossary
The following terms and definitions are used throughout the document
index script
A script used to index a package, or not. Usually contained in a file named "pkgIndex.tcl". Can check preconditions for a package and contains package specific code for setting up the package specific provide script.
provide script
This is a package specific script and tells Tcl exactly how to import it. In the existing package system it is generated and registered by the index script. Tcl Modules on the other hand generates it based on information gleaned from filenames.
Reference Implementation
A reference implementation is available in Patch 942881 http://sf.net/tracker/?func=detail&aid=942881&group_id=10894&atid=310894
Questions
Comments
[ Add comments on the document here ]
A feature asked for during discussion is to allow a directory as a Tcl Module. I am opposed to this, because behind Tcl Modules is the same idea/vision as for starkit and starpacks, namely that of deploying something in the simplest possible manner, without any overhead. Sometimes I call Tcl Modules package kits, short pakits (and then twist that then spoken into 'packet' :). http://groups.google.ca/groups?hl=en&lr=&ie=UTF-8&frame=right&th=78764d499cc4e4a&seekm=c6tshf030c6%40news4.newsguy.com#link19
Copyright
This document has been placed in the public domain.