TIP 270: Utility C Routines for String Formatting

Author:         Don Porter <dgp@users.sf.net>
State:          Final
Type:           Project
Vote:           Done
Created:        19-Jun-2006
Post-History:   
Tcl-Version:    8.5

Abstract

This TIP proposes new public C utility routines for the convenience of C-coded extensions and embedded uses of Tcl.

Background

During development of Tcl 8.5, several internal routines were created that provide useful string formatting functions. These routines are most commonly used to construct error messages, but they are generally useful. The Tcl source code itself makes significant use of them.

Making some of these routines public also addresses Feature Request 1184069.

Proposed Changes

Add the following routines to Tcl's public interface:

Tcl_AppendObjToErrorInfo

void Tcl_AppendObjToErrorInfo(Tcl_Interp *interp, Tcl_Obj *objPtr)

This routine is analogous to the existing routine Tcl_AddErrorInfo, but permits appending a Tcl_Obj value rather than requiring a (const char *).

Tcl_AppendLimitedToObj

void Tcl_AppendLimitedToObj(Tcl_Obj *objPtr, const char *bytes, int length, int limit, const char *ellipsis)

This routine appends a string while imposing a limit on how many bytes are appended. This can be handy when the string to be appended might be very large, but the value being constructed should not be allowed to grow without bound. A common usage is when constructing an error message, where the end result should be kept short enough to be read.

Bytes from bytes are appended to objPtr, but no more than limit bytes in total. If the limit prevents all length available bytes from being appended, the appended text ends with the string ellipsis, which leaves an indication of the truncation in the result.

When length is -1, all bytes up to the first zero byte are appended, subject to the limit. When ellipsis is NULL, the default string "..." is used. When ellipsis is non-NULL, it must point to a zero-byte-terminated string in Tcl's internal UTF encoding. The number of bytes appended can be less than the lesser of length and limit when appending fewer bytes is necessary to append only whole multi-byte characters.

The objPtr must be unshared, or the attempt to append to it will panic.
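The truncation rule described above can be modelled without Tcl at all. The following is a minimal sketch that works on a plain NUL-terminated char buffer; it is not Tcl's implementation. The names model_append_limited and is_continuation are hypothetical, and the sketch assumes the ellipsis is no longer than the limit.

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int
is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * model_append_limited (hypothetical): appends to the NUL-terminated string
 * in dst, whose total capacity is cap bytes. Returns the number of bytes
 * appended, or -1 if dst has no room for them.
 */
static int
model_append_limited(char *dst, size_t cap, const char *bytes,
                     int length, int limit, const char *ellipsis)
{
    size_t used = strlen(dst);
    size_t len = (length < 0) ? strlen(bytes) : (size_t) length;
    size_t lim = (limit < 0) ? 0 : (size_t) limit;
    size_t keep = len;   /* bytes copied from the input */
    size_t elen = 0;     /* bytes copied from the ellipsis */

    if (len > lim) {
        /* Truncation needed: reserve room so the ellipsis comes last. */
        if (ellipsis == NULL) {
            ellipsis = "...";
        }
        elen = strlen(ellipsis);
        keep = (elen < lim) ? lim - elen : 0;
        /* Back up so that only whole multi-byte characters are kept. */
        while (keep > 0 && is_continuation((unsigned char) bytes[keep])) {
            keep--;
        }
    }
    if (used + keep + elen + 1 > cap) {
        return -1;
    }
    memcpy(dst + used, bytes, keep);
    if (elen > 0) {
        memcpy(dst + used + keep, ellipsis, elen);
    }
    dst[used + keep + elen] = '\0';
    return (int) (keep + elen);
}
```

Under this model, appending "abcdefghij" with a limit of 8 and a NULL ellipsis gives "abcde...". When the cut would fall inside a two-byte character, fewer bytes than the limit are appended.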

Tcl_Format

Tcl_Obj * Tcl_Format(Tcl_Interp *interp, const char *format, int objc, Tcl_Obj *const objv[])

This routine is the C-level interface to the engine of Tcl's format command. The actual command procedure for format is little more than

 Tcl_Format(interp, Tcl_GetString(objv[1]), objc-2, objv+2);

The objc Tcl_Obj values in objv are formatted into a string according to the conversion specification in the format argument, following the documentation for the format command. The resulting formatted string is converted to a new Tcl_Obj with a refcount of zero and returned. If an error occurs while producing the formatted string, NULL is returned, and an error message is recorded in interp, if interp is non-NULL.

Tcl_AppendFormatToObj

int Tcl_AppendFormatToObj(Tcl_Interp *interp, Tcl_Obj *objPtr, const char *format, int objc, Tcl_Obj *const objv[])

This routine is the appending form of Tcl_Format. Its effect is equivalent to:

 Tcl_Obj *newPtr = Tcl_Format(interp, format, objc, objv);
 if (newPtr == NULL) return TCL_ERROR;
 Tcl_AppendObjToObj(objPtr, newPtr);
 return TCL_OK;

However, it is more convenient and efficient when the appending functionality is needed.

The objPtr must be unshared, or the attempt to append to it will panic.

Tcl_ObjPrintf

Tcl_Obj * Tcl_ObjPrintf(const char *format, ...)

This routine serves as a replacement for the common sequence:

 char buf[SOME_SUITABLE_LENGTH];
 sprintf(buf, format, ...);
 Tcl_NewStringObj(buf, -1);

Use of the proposed routine is shorter and doesn't require the programmer to determine SOME_SUITABLE_LENGTH. The formatting is done with the same core formatting engine used by Tcl_Format. This means that, where the two sets differ, the set of supported conversion specifiers is that of Tcl's format command and not that of sprintf().

When a conversion specifier passed to Tcl_ObjPrintf includes a precision, the value is taken as a number of bytes, as sprintf() does, and not as a number of characters, as format does. This is done on the assumption that C code is more likely to know how many bytes it is passing around than the number of encoded characters those bytes happen to represent.

The variable number of arguments passed in should be of the types that would be suitable for passing to sprintf(). Note that in this example usage, x is of type long.

  long x = 5;
  Tcl_Obj *objPtr = Tcl_ObjPrintf("Value is %ld", x);

If the value of format contains internal inconsistencies or invalid specifier formats, the formatted string result produced by Tcl_ObjPrintf will be an error message instead of any attempt to Do What Is Meant.
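The byte-based precision rule can be seen with standard C alone. The sketch below uses snprintf() rather than any Tcl routine, and the helper name format_prefix is hypothetical. It shows that a precision counts bytes, so a multi-byte UTF-8 character occupies more than one unit of precision.

```c
#include <stdio.h>
#include <string.h>

/*
 * format_prefix (hypothetical helper): writes at most precision bytes of s
 * into buf using a sprintf-style "%.*s" conversion. Returns the number of
 * bytes the conversion produced, as reported by snprintf().
 */
static int
format_prefix(char *buf, size_t cap, int precision, const char *s)
{
    return snprintf(buf, cap, "%.*s", precision, s);
}

/*
 * Example input: "h\xC3\xA9llo" is "hello" with an e-acute, encoded in
 * UTF-8 as 6 bytes but only 5 characters.
 *
 *   format_prefix(buf, sizeof buf, 3, "h\xC3\xA9llo")
 *       stores 3 bytes: 'h' plus the 2-byte e-acute (2 characters).
 *   format_prefix(buf, sizeof buf, 2, "h\xC3\xA9llo")
 *       stores 2 bytes: 'h' plus half of the e-acute, so the caller is
 *       responsible for choosing a byte count on a character boundary.
 */
```

By the rule stated above, Tcl's format command would instead treat the same precision of 3 as three characters.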

Tcl_AppendPrintfToObj

void Tcl_AppendPrintfToObj(Tcl_Obj *objPtr, const char *format, ...)

This routine is the appending form of Tcl_ObjPrintf. Its effect is equivalent to:

 Tcl_AppendObjToObj(objPtr, Tcl_ObjPrintf(format, ...));

However, it is more convenient and efficient when the appending functionality is needed.

The objPtr must be unshared, or the attempt to append to it will panic.

Compatibility

This proposal includes only new features. It is believed that existing scripts and C code that operate without errors will continue to do so.

Reference Implementation

The actual code is already complete as internal routines corresponding to the proposed public routines. Implementation is just an exercise in renaming, placing in stub tables, documentation, etc.

Copyright

This document has been placed in the public domain.
