TIP 143: An Interpreter Resource Limiting Framework

Author:         Donal K. Fellows <donal.k.fellows@man.ac.uk>
State:          Final
Type:           Project
Vote:           Done
Created:        25-Jul-2003
Post-History:   
Tcl-Version:    8.5

Abstract

This TIP introduces a mechanism for creating and manipulating per-interpreter resource limits. This stops several significant classes of denial-of-service attack, and can also be used to do things like guaranteeing an answer within a particular amount of time.

Rationale

Currently it is trivial for scripts running in even safe Tcl interpreters to conduct a denial-of-service attack on the thread in which they are running. All they have to do is write a simple infinite loop with the for or while commands. Of course, it is possible to put a stop to this by hiding those commands from the interpreter, but even then it is possible to simulate the DoS effect with the help of a procedure like this (which even under ideal conditions will take over three days to run):

proc x s {eval $s; eval $s; eval $s; eval $s}
x {x {x {x {x {x {x {x {x {x {x {x {x {x {sleep 1}}}}}}}}}}}}}}

The easiest way around this, of course, is the resource quota as used by heavyweight operating systems, perhaps combined with the alarm(2) system call. Or at least that would be the way to do it if it wasn't for the fact that those are ridiculously heavy sledgehammers to take to this particular nut. Luckily we control the execution environment - it is a Tcl interpreter after all - so we can implement our own checks. It is this that is the aim of this TIP.

Efficiency

Efficiency of any resource monitoring system is naturally a major concern; it is relatively simple to create a resource monitoring system but quite a lot harder to arrange for that monitoring to be cheap enough that its use does not greatly impact on the general speed of the program (in this case, Tcl.) The cost of checking time limits in particular can be somewhat excessive because of the necessity of performing a system call to carry out the check, but any check that is performed often enough can get costly.

My strategy for limiting the impact upon performance is to provide a programmer-tunable per-limit parameter called the granularity. This specifies how often (out of all the locations in the processing of the Tcl interpreter and bytecode engine where a check could be made) the limits should be checked; the granularity is per-limit because there is a distinct difference in the cost of checking different kinds of limits, and it is tunable because the ideal frequency depends very largely on the code being limited. Combined with the fact that the test to see if any limits have been turned on can be made extremely cheap (i.e. just a comparison of a memory location to zero), this allows unlimited interpreters to perform at almost the same speed as before.

A Tcl API for Limit Control

I propose to add a subcommand limit to the interp command and to the object-like interpreter access commands. It will be an error for any safe interpreter to call the limit subcommand; master interpreters may enable it for their safe slaves via the mechanism of interpreter aliases, as is normal for security policies.

The first argument to the interp limit command will be the name of the interpreter whose limit is to be inspected. The second argument will be the name of the limit being modified; this TIP defines two kinds of limits:

time: Specifies when the interpreter will be prohibited from performing further executions. The limit is not guaranteed to be hit at exactly the time given, but will not be hit before then.

command: Specifies a maximum number of commands (see info cmdcount) to be executed.

The third and subsequent arguments specify a series of properties (each of which has a name that begins with a hyphen) which may be set (if there are pairs of arguments, being the property name and the value to set it to) or read (if there is just the property name on its own). If no properties are listed at all, a dictionary (see [111]) of all properties for the particular limit is returned. The set of properties is limit-specific, but always includes the following:

-command: A Tcl script to be executed in the global namespace of the interpreter reading/writing the property when the limit is found to be exceeded in the limited interpreter. If no command callback is defined for this interpreter, the -command option will be the empty string. Note that multiple interpreters may set command callbacks that are distinct from each other; an interpreter may not see what other interpreters have installed (except by running a script in those foreign interpreters, of course.) The order of calling of the callbacks is not defined by this TIP.

Where a callback returns any kind of exceptional condition (i.e. the result isn't TCL_OK) a background error is flagged up.

-granularity: Limits will always be checked in locations where the Tcl_Async*() API is called, as this is called regularly by the Tcl interpreter (with a few exceptions relating to event loop handling.) However, the cost of just checking a limit can be quite appreciable (it might involve system calls, say) so this property controls how frequently, out of the opportunities to check a limit, the limit is actually checked. If the granularity is 1, the limit will be checked every time. It is an error to try to set the granularity to less than 1.
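As a rough illustration of setting and reading properties, the sketch below assumes the interface as it was eventually implemented in Tcl 8.5, where the command-count limit type is spelled commands (this TIP's text calls it command). The variable names are arbitrary, and the limit value is computed from the child's current info cmdcount because the value is compared against the interpreter's running command counter.

```tcl
# Create a child interpreter to be limited (the variable name is arbitrary).
set child [interp create]

# The limit is an absolute count, so start from what the child has
# already executed during its own initialisation.
set used [interp eval $child {info cmdcount}]

# Set two properties in one call: the limit value and the granularity.
interp limit $child commands -value [expr {$used + 1000}] -granularity 1

# Read a single property back, then fetch every property as a dictionary.
set granularity [interp limit $child commands -granularity]
set props [interp limit $child commands]
puts "granularity: $granularity"
puts "all properties: $props"

interp delete $child
```

Listing only the property name reads it; listing name/value pairs sets them; listing nothing returns the whole set.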

When an interpreter hits a limit, it first runs all command callbacks in their respective interpreters. Once that is done, the interpreter rechecks the limit (since the command callbacks might have decided to raise or remove it) and if it is still exceeded it bails out the interpreter in the following way:

  • A flag is set in the interpreter to mark the interpreter as having exceeded its limits.

  • The currently executing command/script in the interpreter is made to return with code TCL_ERROR. (This is superior to using a novel return code, as third-party extensions are usually far better at handling error cases!)

  • The catch command will only catch errors if the interpreter containing it does not have the flag mentioned just above set. Similarly, further trips round the internal loops of the vwait, update and tkwait commands will not proceed with that flag set. (Extensions can find this information out by using Tcl_LimitExceeded(); see below.) No calls to bgerror should be made.

  • Once the execution unwinds out of the interpreter so that no further invocations of the interpreter are left on the call-stack, the flag is reset. The same is true if the limits are adjusted in any way. Note that attempting to execute things within the interpreter without raising the limits will result in the limit being hit immediately.

When resource limits are being used, unlimited master interpreters should take care to use the catch command when calling their limited slaves. Otherwise hitting the limit in the slave might well smash the master as well, just because of general error propagation. But that is good practice anyway.
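A minimal sketch of this behaviour, assuming the Tcl 8.5 implementation (limit type spelled commands) and using pid only as a cheap command that is really invoked on each iteration:

```tcl
set child [interp create]

# Allow roughly 50 more commands than the child has already executed.
set used [interp eval $child {info cmdcount}]
interp limit $child commands -value [expr {$used + 50}]

# The child tries to swallow every error with catch, but an exceeded limit
# cannot be caught inside the limited interpreter, so the loop still ends.
# The catch in the unlimited parent is what stops the error spreading.
set code [catch {
    interp eval $child {
        while 1 {
            catch {pid}
        }
    }
} message]

puts "return code: $code"
puts "message: $message"

interp delete $child
```

The parent sees an ordinary error return from the child and can decide what to do next, for example raising the limit or deleting the interpreter.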

Time Limits

Time limits are specified in terms of the time when the limit will be hit. Setting the limit to the current time ensures that the limit will be immediately activated. Time limits have two options for specifying the limit.

-seconds: This sets the absolute time (in seconds from the epoch, as returned by clock seconds) that the time limit is hit. If set to the empty string, the limit is removed. If no time limit is set, this option is empty when inspected.

-milliseconds: This sets the number of milliseconds after the start of the second specified in the -seconds option that the time limit is hit. May only be set to the empty string if the -seconds option is empty and present, and may only be set to a numeric value if the -seconds option is unspecified or present and non-empty (it is always safe to leave this option unspecified.) If no time limit is set, this option is empty when inspected.

Where a time-limited interpreter creates a slave interpreter, the slave will get the same time-limit as the creating master interpreter.
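The two options can be combined to express a sub-second deadline. The following sketch assumes the Tcl 8.5 implementation and an arbitrary 200 ms budget; the deadline is split into whole seconds for -seconds and the remainder for -milliseconds.

```tcl
set child [interp create]

# Deadline 200 ms from now, expressed as an absolute time.
set start [clock milliseconds]
set deadline [expr {$start + 200}]
interp limit $child time \
    -seconds [expr {$deadline / 1000}] \
    -milliseconds [expr {$deadline % 1000}]

# An endless loop in the child is cut short once the deadline has passed.
set code [catch {interp eval $child {while 1 {pid}}} message]
set elapsed [expr {[clock milliseconds] - $start}]
puts "return code: $code after about $elapsed ms"

interp delete $child
```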

Command Limits

Command limits are specified in terms of the number of commands (see info cmdcount for a definition of the metric) that may be executed before the limit is hit. Command limits have the following extra option for specifying and inspecting the limit.

-value: The number of commands (integer of course) that may actually be executed. If set to the empty string, the limit is removed, and when introspecting, unlimited interpreters return empty strings for this value.

Where a command-limited interpreter creates a slave interpreter, the slave will get the command-limit 0 after initialisation (i.e. after the return from the call to Tcl_CreateSlave()) and will be unable to execute any commands until the limit for the slave is raised. Master interpreters implementing security policies for safe interpreters might want to set such limits semi-automatically to something more useful by deducting command-executions from the creating interpreter to its new slave.
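One way a master might hand out a command budget is sketched below. The helper runWithBudget is hypothetical (it is not part of this TIP), and the sketch assumes the Tcl 8.5 implementation, where the limit type is spelled commands.

```tcl
# Hypothetical helper: run a script in a child interpreter with a budget of
# n further commands, returning a list of the return code and the result.
proc runWithBudget {child n script} {
    set used [interp eval $child {info cmdcount}]
    interp limit $child commands -value [expr {$used + $n}]
    set code [catch {interp eval $child $script} result]
    # Setting -value to the empty string removes the limit again.
    interp limit $child commands -value {}
    return [list $code $result]
}

set child [interp create]

# A short script finishes well within its budget.
lassign [runWithBudget $child 1000 {expr {6 * 7}}] okCode okResult
puts "code=$okCode result=$okResult"

# A runaway script is stopped when the budget runs out.
lassign [runWithBudget $child 1000 {while 1 {pid}}] badCode badResult
puts "code=$badCode result=$badResult"

interp delete $child
```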

A C-level API for Limit Control

It is also desirable for there to be a general C API for controlling resource limits. Not only does this provide control to extension authors that can't be easily smashed by Tcl scripts by accident, but it also makes implementation of the Tcl API to the limit subsystem easier to create as well.

  • int Tcl_LimitReady(Tcl_Interp *interp)

    Test whether a limit is ready to be checked according to its granularity rules. Result is boolean. Note that this is a cheap call.

  • int Tcl_LimitCheck(Tcl_Interp *interp)

    Test whether a limit has been exceeded. Result is boolean. Note that this call is potentially expensive (checking a time limit requires a minimum of one system call.)

  • int Tcl_LimitExceeded(Tcl_Interp *interp)

    Test whether the interpreter is in a state where a limit has been previously exceeded (i.e. whether a previous call to Tcl_LimitCheck had indicated that a limit had been hit) and not yet reset (by extending or removing the relevant limit.) Result is boolean. Note that this is a cheap call.

  • int Tcl_LimitTypeEnabled(Tcl_Interp *interp, int type)

    Test whether the given type of limit is turned on. Result is boolean. Note that this is a cheap call.

  • int Tcl_LimitTypeExceeded(Tcl_Interp *interp, int type)

    Test whether the given type of limit has been hit (in the sense of Tcl_LimitExceeded) and not yet reset. Result is boolean. Note that this is a cheap call.

  • void Tcl_LimitTypeSet(Tcl_Interp *interp, int type)

    Turn on the given type of limit.

  • void Tcl_LimitTypeReset(Tcl_Interp *interp, int type)

    Turn off the given type of limit.

  • void Tcl_LimitAddHandler(Tcl_Interp *interp, int type, Tcl_LimitHandlerProc *handlerProc, ClientData clientData, Tcl_LimitHandlerDeleteProc *deleteProc)

    Add a limit handler for the given type of limit and specify some caller context and a scheme for deleting that context.

  • void Tcl_LimitRemoveHandler(Tcl_Interp *interp, int type, Tcl_LimitHandlerProc *handlerProc, ClientData clientData)

    Remove a limit handler for the given type of limit. It is not an error to remove a handler that was not set; in that case, nothing is changed.

  • void Tcl_LimitSetCommands(Tcl_Interp *interp, int commandLimit)

    Set a limiting value on the number of commands, and reset whether the limit has been triggered. Does not enable the limit.

  • int Tcl_LimitGetCommands(Tcl_Interp *interp)

    Get the current limit on the number of commands.

  • void Tcl_LimitSetTime(Tcl_Interp *interp, Tcl_Time *timeLimitPtr)

    Set a limiting value (copying it from the buffer pointed to by timeLimitPtr) on the latest time that the code may execute, and reset whether the limit has been triggered. Does not enable the limit. The usec field of the buffer should be in the range 0 to 999999.

  • void Tcl_LimitGetTime(Tcl_Interp *interp, Tcl_Time *timeLimitPtr)

    Get the current limit on the execution time, writing it into the buffer pointed to by timeLimitPtr.

  • void Tcl_LimitSetGranularity(Tcl_Interp *interp, int type, int granularity)

    Set the checking granularity for the given type of limit.

  • int Tcl_LimitGetGranularity(Tcl_Interp *interp, int type)

    Get the checking granularity for the given type of limit.

Above, type is either TCL_LIMIT_COMMANDS or TCL_LIMIT_TIME, handlerProc is a pointer to a function that takes two parameters (a ClientData and a Tcl_Interp*) and returns void, and deleteProc is a pointer to a function that takes a single parameter (a ClientData) and returns void. The key is that handlerProcs are called when a limit is hit (they are used to implement the guts of the -command option), and when the callback is deleted for any reason (including a call to Tcl_LimitRemoveHandler and deleting the limited interpreter) the deleteProc is called to release the resources consumed by the clientData context.

Use Cases

WebServices

One use for this sort of code might be in a web-services context where it is important to return a message to some client code within some interval. Using an in-process limiting mechanism allows this to be implemented in a far more light-weight fashion, as the alternative would be to fork off a new small application server for each incoming request and it would be considerably more complex to have a scripted executive that decides (possibly by examining the stack) whether a failure to deliver an answer within bounds is serious, or whether some extra resources should be granted to allow execution to run to completion. Other high-performance server applications would also be likely to gain from this sort of thing.

Profiling

It is possible to use the limiting code (and especially the script callbacks) to write a Tcl profiler. Every time the limit runs out, the callback can examine the Tcl stack in the limited interpreter and then assign some more resources to last up until the next profile trap.
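The trap-and-extend cycle might look like the following sketch, which assumes the Tcl 8.5 implementation (limit type spelled commands). The callback name onTrap and the step of 100 commands are hypothetical. For brevity the callback only counts traps and does not inspect the child's stack.

```tcl
set child [interp create]
set traps 0

# Hypothetical callback, run in this (parent) interpreter each time the
# child's command limit is hit: record the trap, then raise the limit so
# the child can carry on until the next one.
proc onTrap {child step} {
    global traps limit
    incr traps
    incr limit $step
    interp limit $child commands -value $limit
}

# Start with a budget of 100 commands beyond what the child has already used.
set limit [expr {[interp eval $child {info cmdcount}] + 100}]
interp limit $child commands -value $limit -command [list onTrap $child 100]

# Workload to be sampled; it needs far more than 100 commands.
interp eval $child {
    for {set i 0} {$i < 1000} {incr i} {pid}
}
puts "limit was hit and extended $traps times"

interp delete $child
```

Because the callback raises the limit before returning, the recheck that follows finds the limit no longer exceeded and the child continues running.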

Untrusted Code Execution

As indicated earlier, these limits can be used to increase control over untrusted code running in safe interpreters. While it would be necessary to extend this to memory consumption for every aspect that could be impacted by some malicious code to be controllable, having control over the number of commands that may be executed and how long those commands may take gives a much higher degree of control than currently exists, and is thus a monotonic improvement.

Possible Future Directions

There are some obvious other things that could be brought within this framework, but which I've left out for various reasons:

call stack: We already do limiting of this, so bringing that within this framework would be a nice regularisation. On the other hand, such a change would not be backward-compatible, and it might not be safe to perform the callbacks either (especially as overflowing the stack is fatal in a way that overrunning on time is not.)

memory: Limiting the amount of memory that an interpreter may allocate for its own use would be very nice. But conceptually difficult to do (what about memory shared between interpreters?), expensive to keep track of (memory debugging is known to add a lot of overhead, and memory limiting would be more intrusive) and a really big change technically too (i.e. you'd be rewriting a significant fraction of the Tcl C API to make sure that every memory allocation knows which interpreter owns it.)

open channels, file sizes, etc.: Most other things can be limited in Tcl right now without special support.

Implementation

An example implementation of this TIP is available as a patch: http://sf.net/tracker/?func=detail&aid=926771&group_id=10894&atid=310894

Copyright

This document is placed in the public domain.
