TIP 476: Scan/Printf format consistency

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author: Jan Nijtmans (jan.nijtmans@gmail.com)
State: Final
Type: Project
Vote: Done
Created: 27-Sep-2017
Post-history: PM
Tcl-Version: 8.7
Keywords: scan printf

Abstract

The Scan/Printf format handlers are originally derived from the C-equivalent scan() and printf() functions. Since ISO C99 there is the inttypes.h header file, which defines useful macros. But since the Tcl implementation was older than that, those macros don't play well together with Tcl. This TIP proposes a solution for that.

In addition, this TIP proposes to adapt the %#o modifier such that it produces the "0o" prefix in stead of "o". This is a small step in the direction of phasing out octal (TIP #114). Finally this TIP modifies the %#d modifier, such that it only produces a "0d" prefix if that is needed for correct interpretation of the number when parsing it.

Rationale

For an example program containing all situations mentioned here, see main.c

First of all, when compiling the example program on 64-bit linux (with -DTCL_WIDE_INT_IS_LONG=1), gcc outputs 5 warnings, 4 of them are unnecessary:

main.c: In function 'main':
main.c:61:9: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'long long unsigned int' [-Wformat=]
  printf("%" TCL_LL_MODIFIER "u\n", ll);
         ^~~
main.c:62:22: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'long long unsigned int' [-Wformat=]
  obj = Tcl_ObjPrintf("%" TCL_LL_MODIFIER "u", ll);
                      ^~~
main.c:64:13: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 2 has type 'Tcl_WideUInt {aka long unsigned int}' [-Wformat=]
  printf("%llu\n", w);
             ^
main.c:65:26: warning: format '%llu' expects argument of type 'long long unsigned int', but argument 2 has type 'Tcl_WideUInt {aka long unsigned int}' [-Wformat=]
  obj = Tcl_ObjPrintf("%llu", w);
                          ^
main.c:99:25: warning: format '%Ld' expects argument of type 'long long int', but argument 2 has type 'mp_int * {aka struct mp_int *}' [-Wformat=]
  obj = Tcl_ObjPrintf("%Ld", &mp);
                         ^~~

The last warning arises because the C printf formatter doesn't handle the mp_int types, this warning can be safely ignored.

The other 4 warnings arise because Tcl_WideInt is defined as being the same as long on this sytem. Making it the same as long long fully eliminates those warnings.

Specification

This TIP proposes serveral things:

  • Whenever possible, typedef Tcl_WideInt to be equal to long long, even on platforms where long has the same size as long long.

  • Document the already existing TCL_LL_MODIFIER macro to be equivalent to "ll" whenever possible. This is used already to format/scan variables of type Tcl_WideInt/Tcl_WideUInt, but now it can be used also for variables of type long long or __int64 (Windows). The obligation to use Tcl_WideInt/Tcl_WideUInt in extensions (enforced by compiler warnings) is now gone: "long long" can always be used instead.

  • Add a new TCL_Z_MODIFIER macro to be equivalent to "z" whenever possible. This can be used to format/scan variables of type size_t or ptrdiff_t. If your compiler is ISO C99-compatible, you can use "z" resp "t" instead.

  • Add the "a" and "A" type fields, for formatting floats in hex. Conforming to ISO C99, except that the "p" in the output is always lowercase. Since hex float format is not a valid Tcl number representation, those type fields will not be added to Tcl's "scan" command.

  • Allow macro's from <inttypes.h>, such as PRId32 and PRId64, to be used in Tcl_ObjPrintf(). All that is needed in Tcl is making the format characters mean exactly the same as they do in C. See: https://en.wikipedia.org/wiki/Printf_format_string Missing were the "t", "z" and "j" modifier (unknown to MSVC as well), and "q" for BSD platforms. For windows, "I", "I32" and "I64" are needed. With those additions, the Tcl handling of the <inttypes.h> macros is fully ISO C99 compatible, even on platforms which don't have an ISO C99 conformant printf.

  • No longer let the "%llu" format generate an error if the formatted number is positive. This affects the Tcl "scan" command as well.

  • Add a new "L" length modifier. When used in combination with "f" or "g", it allows long doubles to be formatted. Since Tcl doesn't handle long doubles internally, the value is converted to a normal double first. For integers, "L" indicates that the value to be formatted is of type mp_int, the function expect a pointer to it as argument. Since standard C printf format handles don't handle mp_int types, this will result in a gcc warning, which can be safely ignored.

  • Adapt the %#x and %#X modifiers, such that the "x" in the output is always lowercase. If the formatted number is 0, no "0x" prefix is produced.

  • Adapt the %#o modifier to use "0o" as octal prefix in stead of "0". This prefix is only produced when the formatted number is not 0, i.e. when there is possible confusion with a decimal number.

  • Adapt the %#d modifier. The prefix "0d" is only produced when there is a width field and the formatted number is not 0, i.e. when there is possible confusion with an octal number. Since in Tcl 9.0 octal numbers with "0" prefix will be gone, "#" will not produce the decimal prefix any more.

  • Add a %p modifier (but not to "scan"), which is shorthand for 0x%zx. So it outputs the integer in hexadecimal form with "0x" prefix (since Tcl doesn't know about pointers). In Tcl_ObjPrintf() this can be used to format pointers without first casting it to ptrdiff_t or size_t, as does the equivalent C printf %p modifier.

Considerations regarding the incompatibility

This change is almost fully upwards compatible. As long as Tcl extensions use the Tcl_WideInt/Tcl_WideUInt data types (in stead of long long or __int64), everything works as before.

When using the "#" modifier, "format" doesn't produce the exact same output as Tcl 8.6 any more, but only rarely used corner cases are affected. As the "main.c" example shows, the result is more similar to ISO C99 than before. Here is a summary of the changes, and the 'compatible' format string if you want the 8.6 behavior (most likely you are using that already):

format_string value 8.6 8.7 compatible
"%#o" 15 017 0o17 "0%o"
"%#0d" 15 15 0d15 "%0d"
"%#X" 15 0XF 0xF "0X%X"

Reference Implementation

https://core.tcl.tk/tcl/timeline?r=z_modifier

Copyright

This document is placed in public domain.

History