Author: Arjen Markus <arjen.markus@wldelft.nl>
Author: Donal K. Fellows <fellowsd@cs.man.ac.uk>
State: Withdrawn
Type: Project
Vote: Pending
Created: 04-Jul-2001
Post-History:
Keywords: documentation,automatic generation,HTML,reference
Tcl-Version: 8.0
Abstract
This TIP proposes the adoption of a standard documentation format for Tcl scripts and the implementation of a simple tool that will extract this documentation from the source code so that it may be turned into a programmer's guide. This is in essence akin to documentation tools like the well-known javadoc utility for Java programs and Eiffel's short utility.
Introduction
The style guide by Ray Johnson presents a documentation standard that is easy to use and is in fact adopted in the Tcl/Tk distribution. Other than this, the standard has not been enforced or encouraged. It is also not backed up by some tool (as far as I know) that can generate pretty looking documents from this. As a consequence, styles of documentation may vary widely and at times it is necessary to read the source code (looking for descriptions) to understand how to use the script.
The availability of such a tool may encourage people to use the standard, as the costs of generating the documentation are relatively low. The tool must accommodate for variations and therefore be flexible - for instance by providing customisable procedures to support the user's preferred header format.
The tool also needs to distinguish the types of output: in many cases HTML output is desirable to make it look pretty and provide hypertext facilities, in other cases it should provide plain text, formatted so that it can be read in any ordinary text editor or printed quickly.
Parallel to the development of such a tool, a standard or checklist should be assembled of what information programmer ought to provide, the version of Tcl/Tk, extensions that need to be present, what functionality is offered and so on.
Rationale
Automatic documentation generation has two goals: improving usability and improving maintainability. The first means: pleasant looking documentation, at low cost for the author, is easy to use. One can also avoid reading the source code. Further, it ensures homegeneously looking documentation.
Improving the maintainability is achieved by having more or less technical documentation near the code. There is no need for separate documents, something which enhances the risk of discrepancies. Remember the DRY principle: Don't Repeat Yourself.
What information
A user clearly needs different information than a maintainer. For the user it is important to know what functionality is provided, what other packages or extensions are needed, which (public) procedures are available and how to use them.
For the maintainer: having an overview of the source files helps finding the procedures. Part of this information can be extracted directly from the source (such as via inspection of the proc, package and namespace statements).
Formats
Use the format proposed by the style guide as a guideline (certainly for the reference implementation):
# pkg_compareExtension --
#
# Used internally by pkg_mkIndex to compare the extension of a file to
# a given extension. On Windows, it uses a case-insensitive comparison
# because the file system can be file insensitive.
#
# Arguments:
# fileName name of a file whose extension is compared
# ext (optional) The extension to compare against; you must
# provide the starting dot.
# Defaults to [info sharedlibextension]
#
# Results:
# Returns 1 if the extension matches, 0 otherwise
(This comes from the "package.tcl" script file that came with Tcl 8.3.1, it is consistent with the Tcl style guide by Ray Johnson)
Requirements
The requirements are simple to describe:
Implementation shall be in Tcl, to guarantee availability and portability.
The system shall be easy to extend to new documentation formats (Implementation note: small procedures that register the information piece by piece)
It shall be easy to extend to new output formats (Implementation note: small procedures that format the registered information)
It shall properly deal with organisational aspects: source file, package, namespace.
Summary of reactions
The replies on the first version of this TIP were quite positive: both Donal Porter and Cameron Laird think it is a good idea. Juan Gil gave a very extensive reaction, describing a more general framework that would eventually result in a system for generating all kinds of output from Tcl scripts, TIPs and so on.
To do him more justice, without repeating the entire document, he proposes the use of XML as an intermediate format holding the structure of the information. The advantage is the possibility to reuse all existing tools and (de facto) standards, notably DocBook, in this context.
Even though I share some of the enthousiasm of Juan, I am a bit awed by it: the original idea of this TIP is not so much creating a publication system, but rather an easy-to-use tool for automatically extracting useful information in a nice shape. Eventually it could develop into something of the kind Juan describes, but that should not be the first goal.
The technique for representing the information structure he proposes, is quite useable (and akin to the rendering process of TIPs):
Parse the script and store the pieces in "qualified lists".
Such qualified lists are an intermediate format, either in memory or stored in a suitable format on disk.
These lists and lists of lists are then passed to the output renderers.
The first problem to solve is then finding a suitable structure for the information we need to extract. This is the subject of the next section.
Will Duquette and Andreas Kupries mentioned the frequent use of specialised commands that introduce new commands (rather than a straightforward call to "proc"). This feature will have to be looked into, because if you only look for lines like "proc some-command { ... } {", you might well miss the essentials of such applications.
Information in a Tcl script
Tcl scripts are organised in three essentially unrelated ways:
Individual files contain the code
Programs consist of one or several files of code
Packages are used to identify code that belongs together
In practice these methods will be used in accord with each other, but there is no guarantee for instance, that a source file contains only one package and programs will probably quite often use more than one package.
On a smaller scale, the following items are of importance:
Procedures or commands defined inside some namespace (exported or not)
Variables local to the enclosing namespace or global to the application
For a user it will be important to know what a program or package has to offer and how to get this functionality:
A description of the program (with its command-line arguments, if any, the packages it uses, if they come separately)
A description of the package or packages (assuming it is properly installed, its name and version should be enough to load the scripts and binaries)
A description of each public procedure and a description of the arguments and the result, if any
A list of all other requirements (other packages or the Tcl version?)
For a maintainer of the code, additional information would include:
A list of all variables (local to the namespace and global) that are used in the various procedures
The contents of each source file (so where does each procedure live?)
Detailed maintenance issues that have been written down in comments in the code
A possible structure in which all this information can be stored and retrieved is sketched below:
program:
version
source files (list of)
packages (list of)
procedures (list of)
command-line arguments:
description of each option and the values (if any)
associated with it
description of other arguments (such as file names)
source files:
packages (list of)
procedures (list of)
packages:
package A:
description
version
requirements
Tcl-version
packages that are required
local and global variables:
variable D:
used by which procedures?
variable E:
used by which procedures?
...
procedures:
procedure F:
description
exported or not
arguments:
argument G:
description
argument H:
description
...
result:
description
procedure I:
...
Except for the descriptions, all these items can - at least in principle be extracted automatically. So, even though the programmer has been too lazy to describe his/her procedures in detail, some information can be retrieved about the use of global data for instance and the complexity of the argument lists.
Note that as far as the three methods of organisation is concerned, there is no attempt to define a practical relationship between them.
To refer to the example above:
procedure: pkg_compareExtension
description:
"Used internally by pkg_mkIndex to compare the extension of a file to
a given extension. On Windows, it uses a case-insensitive comparison
because the file system can be file insensitive."
arguments:
argument fileName:
description:
"name of a file whose extension is compared"
argument ext:
description:
"(optional) The extension to compare against; you must
provide the starting dot.
Defaults to [info sharedlibextension]"
result:
description:
"Returns 1 if the extension matches, 0 otherwise"
By adopting a tree structure to represent the information extracted from the source code, one can be as flexible as probably needed. For instance, suppose one would like to extract certain metrics, like the number of lines or the cyclometric complexity. This could then be an additional node in the subtree for the command procedure, besides the list of arguments, the result and the description.
Copyright
This document is placed in the public domain.