TIP 287: Add a Commands for Determining Size of Buffered Data

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Author:         Michael A. Cleverly <michael@cleverly.com>
State:          Final
Type:           Project
Vote:		Done
Created:        26-Oct-2006
Post-History:   
Keywords:       Tcl,channel,chan,pendinginput,pendingoutput
Tcl-Version:    8.5

Abstract

Many network servers programmed in Tcl (including the venerable tclhttpd) are vulnerable to DoS (denial of service) attacks because they lack any way to introspect the amount of buffered data on a non-blocking socket that is in line buffering mode. This TIP proposes a new subcommand to chan to allow the amount of buffered input and buffered output (for symmetry) to be inspected from Tcl.

Rationale

Many network protocols are inherently line-oriented (HTTP, SMTP, etc.) and the natural approach to implementing servers for these protocols in Tcl is to configure the incoming client sockets to use non-blocking I/O and to have line buffering and then define a readable fileevent callback.

    proc accept {sock addr port} {
        fconfigure $sock -buffering line -blocking 0
        fileevent $sock readable [list callback $sock ...]
    }
    socket -server accept $listenPort

Recall that a readable fileevent will be called even when there is an incomplete line buffered. As the fileevent manual page states:

A channel is considered to be readable if there is unread data available on the underlying device. A channel is also considered to be readable if there is unread data in an input buffer, except in the special case where the most recent attempt to read from the channel was a gets call that could not find a complete line in the input buffer.

The fblocked (and in 8.5 chan blocked) command provides the Tcl programmer a means to test whether:

the most recent input operation ... returned less information than requested because all available input was exhausted.

There is currently no way at the Tcl level to see how much data is buffered and could be read safely (via read instead of gets).

There is also no way to specify any kind of upper limit on the length of a line; when in line-buffering mode all input is buffered until an end-of-line sequence is encountered or the EOF on the channel is reached.

The practical result is that all network daemons written in Tcl using line-oriented I/O (gets) can be fed repeated input lacking an end-of-line sequence until all physical memory is exhausted.

This vulnerability has been recognized since at least 2001. See, for example, the discussion between George Peter Staplin and Donald Porter on the gets page on the Tcl'ers Wiki http://wiki.tcl.tk/gets .

Proposed Change

At the C level Tcl already has a function, Tcl_InputBuffered which returns the number of unread bytes buffered for a channel and a corresponding Tcl_OutputBuffered which returns the number of bytes buffered for output that have not yet been flushed out.

This TIP proposes to implement a new chan pending command which will take two arguments: a mode and a channelId. The mode argument can be either input or output.

When the mode is input the command returns the value of Tcl_InputBuffered() (if the channel was open for input or -1 otherwise).

When the mode is output the command returns the value of Tcl_OutputBuffered() (if the channel was output for output or -1 otherwise).

This allows a programmer developing network daemons at the Tcl level to implement their own policy decisions based on the size of the unread line. Potential DoS situations could be avoided (in an application specific manner) long before all memory was exhausted.

 if {[chan blocked $sock] && [chan pending input $sock] > $limit} {
     # Take application specific steps (i.e., [close $sock] or
     # [read $sock] to process a partial line and drain the buffer, etc.)
 }

Rejected Alternatives

  • Adding a flag to fblocked to return the number of unread bytes instead of just 0 or 1 (since fblocked is now considered deprecated as per [208]).

  • Polluting the global namespace with a new favailable, fpending or fqueued command.

  • A chan unread because of potential confusion as to whether it performed ungetch() type functionality (un-reed vs un-red).

  • Any sort of -maxchars or -maxbytes flag to gets in order to not complicate the semantics of gets. Additionally without even further complicating gets semantics one could not distinguish input of exactly $limit characters from the case where only $limit characters were returned (with some input remaining unread).

  • The initial version of this TIP called for a chan available command. This was changed to pendinginput (and pendingoutput added for symmetry's sake) following suggestions on news:comp.lang.tcl from Donald Arseneau and Donal Fellows, and later to chan pending that takes a mode argument (input or output) based on suggestions from Donald Porter and Joe English.

Reference Implementation

[RFE 1586860] at SourceForge now contains a patch implementing chan pendinginput and chan pendingoutput (including updated chan man page and corresponding test cases) http://sourceforge.net/tracker/index.php?func=detail&aid=1586860&group_id=10894&atid=360894 .

Copyright

This document is in the public domain.

History