TIP 430: Add basic ZIP archive support to Tcl

Author:         Sean Woods <yoda@etoyoc.com>
Author:         Donal Fellows <donal.k.fellows@manchester.ac.uk>
Author:         Poor Yorick <tk.tcl.tip@pooryorick.com>
Author:         Harald Oehlmann <oehhar@users.sourceforge.net>
State:          Draft
Type:           Project
Vote:           Pending
Created:        03-Sep-2014
Post-History:
Keywords:       virtual filesystem,zip,tclkit,boot,bootstrap
Tcl-Version:    8.7

Abstract

This proposal will add basic support for mounting zip archive files as virtual filesystems to the Tcl core.

Target Tcl-Version

This TIP targets Tcl version 8.7.

Rationale

Tcl/Tk relies on the presence of a file system containing Tcl scripts for bootstrapping the interpreter. When dealing with code packed in a self-contained executable, or a dynamic library, a chicken-and-egg problem arises when developers try to provide this bootstrap from their attached VFS with extensions like TclVfs. TclVfs runs in the Tcl interpreter. The interpreter needs init.tcl, which would mean that the filesystem containing init.tcl is not present until after TclVfs mounts it, yet that mount cannot happen until after init.tcl has been loaded. Bootstrap filesystem mounts require built-in support for the filesystem that they use.

With the inclusion of Zlib in the core (starting with 8.6, [234]), all that is required to implement a zip file system based VFS is to add a C-level VFS implementation to decode the zip archive format. Thus: this project.

Note that we are prioritizing the zip archive format also because it is practical to generate the files without a Tcl installation being present; it is a format with widespread OS support. This makes it much easier to bootstrap a build of Tcl that uses it without requiring a native build of tclsh to be present.

Specification

A new ensemble, zipfs, shall be added to Tcl. The ensemble will contain several commands, including:

  • zipfs canonical filename

    Returns a string representing where filename would be located within zipfs.

  • zipfs exists filename

    Returns true if a file exists in zipfs. Unlike file exists, this command is safe to run in a safe interp because it confers no access to the local file system.

  • zipfs mount ?archive? ?mountpoint?

    Mounts the ZIP file archive at the location given by mountpoint, which will default to zipfs:/archive if absent. With no arguments this command describes all current mounts, returning a list of pairs.

  • zipfs root

    Returns the root mount point for zipfs file systems. On Windows this returns zipfs:/. On all other platforms this returns //zipfs:/.

  • zipfs tcl_library

    Search the current executable, the tcl dynamic library, and the local file system for a zipfs file system containing Tcl's init.tcl file. Returns null if none was found.

  • zipfs unmount archive

    Unmounts the ZIP file archive, which must have been previously mounted.

Outside of a safe interpreter, the following additional commands will be available:

  • zipfs lmkimg outfile inlist ?strip? ?password? ?infile?

    Generate a zip archive (outfile) from a list of files (inlist), as a self extracting executable appended to a bare executable (infile).

    If strip is given, that string will be removed from the front of all files before generating their names within the archive.

    If password is given, the file will be encrypted with that passphrase.

  • zipfs lmkzip outfile inlist ?strip? ?password?

    Generate a zip archive (outfile) from a list of files (inlist).

  • zipfs mkzip outfile indir ?strip? ?password?

    Generate a zip archive (outfile) from the contents of a directory (indir).

  • zipfs mkimg outfile indir ?strip? ?password? ?infile?

    Generate a zip archive (outfile) from the contents of a directory (indir), as a self extracting executable appended to a bare executable (infile).

VFS Mount Point

On virtually all platforms Tcl supports (Unix, Windows), ZipFs will mount all archives under //zipfs:/. Some operating systems (past or future) may assign a special meaning to this style of path, so the root may be changed to address the needs of a specific environment. The root in use on the current platform can be obtained by calling zipfs root. For the remainder of this document, references to //zipfs:/ are intended to refer to whatever prefix ZIPFS_ROOT actually designates.

Volumes may be mounted at any point under ZIPFS_ROOT. If a mount point does not start with ZIPFS_ROOT, the path will be considered relative to ZIPFS_ROOT. This convention avoids some confusing interactions between file normalize and glob that differ between Windows and Unix and that make building global paths either hop volumes or interact with the native file system.
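
The mount-point rule above can be illustrated with a short shell sketch. It is not the core's implementation: the helper name resolve_mountpoint is hypothetical, the value assigned to ZIPFS_ROOT is assumed (the real root is whatever zipfs root reports), and stripping one leading slash from a relative path is an assumption made for the illustration.

```shell
#!/bin/sh
# Assumed value for illustration; the real root is reported by "zipfs root".
ZIPFS_ROOT="//zipfs:/"

# resolve_mountpoint is a hypothetical helper, not part of Tcl.
# Paths already under ZIPFS_ROOT are returned unchanged; anything else is
# treated as relative to ZIPFS_ROOT (one leading "/" is dropped, by assumption).
resolve_mountpoint() {
    case "$1" in
        "$ZIPFS_ROOT"*) printf '%s\n' "$1" ;;
        *)              printf '%s\n' "${ZIPFS_ROOT}${1#/}" ;;
    esac
}

resolve_mountpoint "lib/mypkg"
resolve_mountpoint "//zipfs:/app"
```

Both calls print a path beneath the assumed root, which is the property the convention is meant to guarantee.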

Having a fixed mount point breaks from the tradition of mounting volumes under / or info nameofexecutable that other zipfs implementations use. However, if a kit builder wishes to retain that capability, all that is required is to load their own zipfs implementation using the conventional shims provided for kit building. The function names for the core implementation have been modified to not conflict with zipfs implementations that are out in the wild.

Generating Task Executables Tclsh/Wish

If tclsh/wish detects that the executable has a zip archive attached, the executable will be mounted as ZIPFS_ROOT/app. If ZIPFS_ROOT/app/main.tcl exists, that file is set as the shell's startup script. If ZIPFS_ROOT/app/tcl_library/ exists, it will be searched for init.tcl.

The way to produce an executable will be as follows (assuming the source for the application is at ~/myapp/src):

From Tcl:

> zipfs mkimg ~/bin/myapp.exe ~/myapp/src ~/myapp/src "" ~/bin/tclsh87.exe

From Unix:

> cd ~/myapp/src
> zip -r ../myapp.zip .
> cd ..
> cp ~/bin/tclsh87.exe myapp.exe
> cat myapp.zip >> myapp.exe
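
One way to sanity-check a file produced this way is to look for the zip end-of-central-directory signature at its tail. The sketch below is only an approximation of what a detector might do, not the check tclsh performs: has_zip_tail is a hypothetical helper, and it assumes the archive has no trailing comment, so the 22-byte end record sits at the very end of the file. The demo builds an empty archive by hand so that no zip program is needed.

```shell
#!/bin/sh
# has_zip_tail is a hypothetical helper. It succeeds when the last 22 bytes
# of the file start with the end-of-central-directory signature "PK\005\006".
# Assumption: the archive carries no trailing comment.
has_zip_tail() {
    sig=$(tail -c 22 "$1" | head -c 4 | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "504b0506" ]
}

# Demo: a zip archive with no entries is just the 22-byte end record.
tmp=$(mktemp -d)
printf 'stand-in for executable bytes' > "$tmp/app"
printf 'PK\005\006' > "$tmp/empty.zip"
head -c 18 /dev/zero >> "$tmp/empty.zip"
cat "$tmp/empty.zip" >> "$tmp/app"

has_zip_tail "$tmp/app" && echo "archive attached"
rm -rf "$tmp"
```

A real detector would also need to scan backwards past an optional archive comment; that step is omitted here.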

First argument handling for Tclsh/Wish

If the first argument to Tclsh or Wish is detected to be a zip file, that file will be mounted as ZIPFS_ROOT/app. If ZIPFS_ROOT/app/main.tcl exists, that file is set as the shell's startup script. If ZIPFS_ROOT/app/tcl_library exists, it will be searched for init.tcl.

New Tclsh features for TEA

To assist in packaging extensions, tclsh will take on a new command, install. If install is the first argument, the subsequent arguments are passed to a new file in the library, install.tcl.

tclsh install with no arguments is designed to return immediately with a normal return code, thus making it easy to test whether a tclsh is TIP 430 savvy when running in autoconf:

> AS_IF([$TCLSH_PROG install],[
>  ZIP_PROG=${TCLSH_PROG}
>  ZIP_PROG_OPTIONS="install mkzip"
>  ZIP_PROG_VFSSEARCH="."
>  AC_MSG_RESULT([Can use Native Tclsh for Zip encoding])
> ])

This TIP defines the following functions for install:

  • tclsh install mkzip.

    This command is a passthrough to the zipfs mkzip command, and allows tclsh to operate as a zip encoder from make.

  • tclsh install mkimg.

    This command is a passthrough to the zipfs mkimg command, and allows tclsh to operate as a zip encoder from make.

  • tclsh install copyDir source destination

    This command will recursively copy the file structure of source to destination.

  • tclsh install installDir source destination

    This command will recursively copy the file structure of source to destination, deleting destination if it already exists, and marking all copied files as read-only.

  • tclsh install pkgindex_path path ?path...?

    Index all of the paths specified and generate a script that can be sourced to feed all of the package ifneeded statements to an interpreter in one shot. Useful for indexing VFS file systems.

    Example:

    tclsh install copyDir ~/myapp/src myapp.vfs
    tclsh install pkgindex_path myapp.vfs > myapp.vfs/pkgIndex.tcl
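
The behaviour described above for installDir (delete the destination, copy recursively, mark copied files read-only) can be approximated with standard Unix tools. This is a stand-in for illustration under that reading of the description, not the TIP's install.tcl code; the function name install_dir is hypothetical.

```shell
#!/bin/sh
# install_dir is a hypothetical shell stand-in for the described behaviour
# of "tclsh install installDir"; it is not the TIP's implementation.
install_dir() {
    src=$1
    dst=$2
    rm -rf "$dst"                              # delete any existing destination
    cp -R "$src" "$dst"                        # recursive copy
    find "$dst" -type f -exec chmod a-w {} +   # mark copied files read-only
}

# Demo on a throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/src/sub"
echo "puts hi" > "$demo/src/sub/a.tcl"
install_dir "$demo/src" "$demo/dst"
ls -l "$demo/dst/sub"
rm -rf "$demo"
```

Only regular files lose their write bit here; directories stay writable so the tree can still be removed or replaced by a later install.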

Package loading

Calls to tcl_findLibrary will now search through loaded packages to see if the dynamic library for the package in question has an attached zip file system. If that file system exists, it is mounted to ZIPFS_ROOT/lib/PKGNAME, and that mount point is added to the list of directories to search.

Implementation

This work is largely adapted from Richard Hipp's work on Tcl As One Big Executable (TOBE). The concept has been modernized somewhat, and it has been heavily influenced by improvements made through the FreeWrap and Androwish projects. That implementation consists of one C file (tclZipvfs.c). I have also prepared a set of kit-like behaviors for the core to express when tclAppInit.c is not compiled with a TCL_LOCAL_MAIN_HOOK defined. Those behaviors reside in the TclZipfs_AppHook() function.

This work is checked in as the "core_zip_vfs" branch on both Tcl and Tk.

Modifications to auto.tcl

auto.tcl now has rules for scanning DLLs for zip file systems.

Modifications to minizip.c

minizip has been modified to be able to handle recursive directory arguments.

Modifications to tclAppInit.c

tclAppInit.c will now call TclZipfs_AppHook() if no TCL_LOCAL_MAIN_HOOK was defined.

Modifications to tclBasic.c

tclBasic.c will contain a call to TclZipfs_Init(), which will initialize the portions of C needed to implement zipfs as well as inject the zipfs command into the interpreter.

Modifications to tclFileName.c

tclFileName.c has a minor patch to exclude the prefix // from local file searches.

New C File tclZipFS.c

This file is a self-contained implementation in C of a zip based VFS. It includes all functions needed for implementing zipfs.

Modifications to tclIOUtil.c

tclIOUtil.c has a minor patch to exclude UNC style paths that contain a colon (:) in the server field from being resolved by the operating system (which by standard is not allowed anyway). This allows VFS file systems to use the //FSTYPE: namespace with impunity.

Modifications to the Tcl build system

Tcl will now attempt to find a zip encoder in the environment. If a TIP 430 savvy tclsh is discovered, that shell will be used. Failing that, the system will search for an executable named zip. Failing that, Tcl will build its own zip encoder.

When it cannot locate a zip encoder in the environment, Tcl will now build a copy of the minizip program, whose source is currently distributed in /compat/zlib/contrib/minizip. The tcl.m4 macro now detects whether the compiler in use can produce native executables and, where it cannot, searches for a C compiler that can and substitutes that value into the Makefile as HOST_CC. That compiler generates a native minizip executable, which is built in the same directory as Tcl and used for all archive creation.
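
The search order described above can be sketched as a shell function. This is a simplified illustration, not the tcl.m4 logic: find_zip_encoder is a hypothetical name, TCLSH_PROG is assumed to hold a candidate tclsh, and ./minizip is an assumed location for the locally built fallback.

```shell
#!/bin/sh
# find_zip_encoder is a hypothetical helper mirroring the described order:
# 1. a TIP 430 savvy tclsh, 2. a system "zip", 3. a locally built minizip.
# TCLSH_PROG and the ./minizip path are assumptions for this sketch.
find_zip_encoder() {
    if [ -n "${TCLSH_PROG:-}" ] && "$TCLSH_PROG" install >/dev/null 2>&1; then
        echo "$TCLSH_PROG install mkzip"
    elif command -v zip >/dev/null 2>&1; then
        echo "zip -r"
    else
        echo "./minizip"
    fi
}

ZIP_PROG=$(find_zip_encoder)
echo "zip encoder: $ZIP_PROG"
```

The probe relies on tclsh install returning a normal exit code only on shells that implement this TIP, as described in the section on new Tclsh features.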

New build product libtcl.zip

A new build target libtcl_MAJOR_MINOR_PATCHLEVEL.zip is created from the /library directory in the tcl sources. For static library installs, this archive is copied to the tcl standard install location. For shared library builds this archive is appended to the dynamic library.

Modifications to the /library file system

To reduce the complexity of building archives, init.tcl has been modified to look for the presence of an adjacent file pkgIndex.tcl. That file contains all of the package ifneeded calls to direct the core to find the core distributed packages relative to the location of tcl_library. Unlike other pkgIndex.tcl files, this file must be manually maintained and kept up to date as package names and versions are changed, added, or removed.

Modifications to the tclConfig.sh and TEA

A new field TCLZIPFILE will indicate the name of the zip archive generated by the build system. If this field is present and the value is non-blank, extensions (for instance Tk) can use this to infer the core was built with ZipFs support.

TEA extensions which detect a non-blank value for TCLZIPFILE will generate a value TCLZIPFSSUPPORT=1 when compiling as a shared library, and TCLZIPFSSUPPORT=2 when compiling as a static library.
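
The rule above can be expressed as a small shell function of the kind a configure script might contain. It is a sketch under stated assumptions: zipfs_support is a hypothetical name, the second argument (1 for a shared build, anything else for static) is an assumed convention, and the archive name in the demo merely follows the libtcl_MAJOR_MINOR_PATCHLEVEL.zip pattern.

```shell
#!/bin/sh
# zipfs_support is a hypothetical helper.
#   $1: value of TCLZIPFILE as read from tclConfig.sh (may be blank)
#   $2: 1 when building a shared library, otherwise static (assumed convention)
# Prints the TCLZIPFSSUPPORT value, or nothing when TCLZIPFILE is blank.
zipfs_support() {
    tclzipfile=$1
    shared_build=$2
    if [ -z "$tclzipfile" ]; then
        echo ""     # core built without zipfs support
    elif [ "$shared_build" = "1" ]; then
        echo 1      # shared library build
    else
        echo 2      # static library build
    fi
}

# Illustrative archive name only.
TCLZIPFSSUPPORT=$(zipfs_support "libtcl_8_7_0.zip" 1)
echo "TCLZIPFSSUPPORT=$TCLZIPFSSUPPORT"
```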

Modifications to Tk

Tk will scan tclConfig.sh, and if it detects a non-blank value for TCLZIPFILE, it will make a call to TclZipfs_AppHook() if no TK_LOCAL_MAIN_HOOK was defined.

C API

  • int TclZipfs_AppHook(int *argc, char ***argv);

  • If the current executable has an attached zip file system, mount that to ZIPFS_ROOT/app.

  • If the file ZIPFS_ROOT/app/main.tcl exists, register that file as the process startup script.

  • If the file ZIPFS_ROOT/app/tcl_library/init.tcl exists, register ZIPFS_ROOT/app/tcl_library/init.tcl as tcl_library.

  • If the file ZIPFS_ROOT/app/tk_library/init.tcl exists, register ZIPFS_ROOT/app/tk_library/init.tcl as tk_library.

  • If tcl_library was not set, the function will then scan the local environment for a zipfs file system attached to the Tcl dynamic library, or for an archive named libtcl_MAJOR_MINOR_PATCHLEVEL.zip. That archive can be either in the present working directory or in the standard system install location for Tcl.

  • int TclZipfs_Mount(Tcl_Interp *interp, const char *zipname, const char *mntpt, const char *passwd);

    Mounts a zip file zipname to the mount point mntpt. If passwd is non-null, that string is used as the password to decrypt the contents. mntpt will always be relative to zipfs:.

  • int TclZipfs_Unmount(Tcl_Interp *interp, const char *zipname);

    Unmounts the file system created by a prior call to TclZipfs_Mount().

Creating a wrapped executable

With this TIP, producing a wrapped executable is now a matter of:

mkdir myvfs.vfs
cd myvfs.vfs
echo "puts {hello world}" > main.tcl
zip -r ../hello.zip .
cd ..
cp tclsh8.7 hello
cat hello.zip >> hello
./hello
> hello world

Copyright

This document has been placed in the public domain.
