TIP 406: "C" is for Cookie

State:		Draft
Type:		Project
Tcl-Version:	8.7
Vote:		Pending
Post-History:	
Author:		Donal K. Fellows <dkf@users.sf.net>
Created:	01-Aug-2012

Abstract

The "http" package needs cookie support, especially to support complex modern web authentication protocols. This TIP defines a pluggable interface and a TclOO class that implements that interface so that Tcl programmers can control just what is shared and stored, and where.

Rationale and Design Constraints

Cookies are a feature of many modern web applications. They're short strings stored in HTTP clients on behalf of servers, which are then sent back to those servers with further requests. Often (but not universally) these short strings are IDs that are used as database keys to look up a "session object" in a server-side database that holds relevant information; such information can include whether the user has logged in, what color scheme to use in the stylesheet, etc. Of particular relevance to Tcl programmers is the fact that cookies are now often associated with login information; this means that it is not practical to leave Tcl code without cookie support.

Currently, Tcl programs that handle cookies have to do so manually, examining the otherwise-unparsed HTTP headers to see if anything relevant has been set. As the cookie protocol (RFC 6265, http://tools.ietf.org/html/rfc6265) is quite complex, it is highly desirable to have a centralized implementation of at least the basic parsing and injection code.

The main downside of adding cookie handling is that it definitely increases the ability of web servers to track clients written in Tcl. In particular, there is a danger of cross-application tracking. It is therefore necessary to ensure that the cookie handling mechanism is off by default, and that the locations used to store the cookies are controllable by the Tcl application.

Proposed Change to 'http' Package

I propose to add a new configuration option to the http package:

-cookiejar commandPrefixList

This option, configurable via http::config, can be either an empty string (the default) or a list of words that is a command prefix. An empty string disables cookie handling: the current http package behavior in relation to cookies (i.e., ignoring them) will prevail. If a non-empty list of words is supplied, it will be used as a command prefix (expanded, with further arguments appended) in the following ways:

For Each Cookie Provided by an HTTP Server

When a cookie is supplied by an HTTP server, it will be reported to the cookie jar command like this:

{*}$commandPrefixList storeCookie cookieName cookieValue optionDictionary

The cookieName and cookieValue are relatively self-explanatory; they represent the name/value pair to store. The optionDictionary contains the parsed cookie options, being at least these:

hostonly: Boolean; whether the cookie should only ever be returned to the originating host (if not, it should be sent according to the domain). This property is always present.

httponly: Boolean; whether the cookie ought to be only used with HTTP connections. (NB: This is unlikely to be something we enforce.) This property is always present.

persistent: Boolean; whether the HTTP server wishes the cookie to persist for longer than the current "session". This property is always present. Non-persistent cookies are expected to never be committed to permanent storage.

secure: Boolean; whether the cookie should only be sent on "secure" connections to the HTTP server. (This typically means "HTTPS is required", but the various cookie specifications are less-than-clear in this area.) This property is always present.

expires: Timestamp; when a persistent cookie will cease to be interesting to the HTTP server that issued it. (NB: If this is in the past, any matching cookie should be deleted.) This property is only present if the "persistent" property is true.

domain: String; what domains should this cookie be sent to. This property is always present, but may sometimes be the same as the "origin" property; see the "hostonly" property for how to treat this.

origin: String; what host did we send the request to that caused this cookie to be generated. This property is always present.

path: String; what resource paths within the relevant HTTP servers should a cookie be sent to. This property is always present.

The result of this command will be ignored (so long as it is non-exceptional).
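
As an illustration of the store side of this call, here is a minimal sketch of a command prefix that keeps cookies in a global dictionary. The names memJar and myCookies are hypothetical and not part of the http package, and the sketch assumes that the "expires" timestamp is expressed in seconds since the epoch, which this TIP does not specify.

```tcl
# Hypothetical in-memory store. The command prefix is two words: the
# procedure name plus the name of the global variable holding the cookies.
proc memJar {storeVar subcommand args} {
    upvar #0 $storeVar cookies
    switch -- $subcommand {
        storeCookie {
            lassign $args name value options
            # One cookie per (domain, path, name) triple.
            set key [list [dict get $options domain] \
                          [dict get $options path] $name]
            # Assumption: "expires" is in seconds since the epoch.
            if {[dict get $options persistent]
                    && [dict get $options expires] <= [clock seconds]} {
                # An expiry in the past deletes any matching cookie.
                dict unset cookies $key
            } else {
                dict set cookies $key [dict replace $options value $value]
            }
            return
        }
        default {
            error "unsupported subcommand \"$subcommand\""
        }
    }
}

set myCookies [dict create]
set commandPrefixList [list memJar ::myCookies]

# Store a session cookie.
{*}$commandPrefixList storeCookie sid abc123 {
    hostonly 1 httponly 1 persistent 0 secure 0
    domain www.example.com origin www.example.com path /
}
puts [dict size $myCookies]   ;# prints 1

# The same cookie arriving with an expiry in the past removes it.
{*}$commandPrefixList storeCookie sid {} {
    hostonly 1 httponly 1 persistent 1 secure 0 expires 0
    domain www.example.com origin www.example.com path /
}
puts [dict size $myCookies]   ;# prints 0
```

A real store would also need to answer getCookies; this sketch covers only the storage half of the protocol.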

When Making an HTTP Request

Each time an HTTP request is made, the cookie store is consulted (prior to the connection being opened) to find out what cookies should be sent as part of the request, like this:

{*}$commandPrefixList getCookies protocol host path

The protocol is the name of the protocol scheme that will be used to contact the HTTP server (typically http or https), the host is the server's hostname, and the path is the resource path on that server. The query and fragment parts of the URL are never supplied to the cookie store as part of this request, nor is the port, nor are any user identification credentials. The port is omitted deliberately: the cookie specification states that it is a known problem that cookies have always ignored the service port number when deciding whether to send a cookie, and we therefore duplicate this failure.
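
To make the reduction concrete, the following sketch shows one way a URL could be cut down to those three values. The procedure name cookieLookupArgs is hypothetical, the regular expression is a simplification that does not handle IPv6 literal hosts, and treating an empty path as "/" is an assumption of the sketch rather than something this TIP states.

```tcl
# Hypothetical helper: reduce a URL to the protocol, host and path that
# would be passed to getCookies.
proc cookieLookupArgs {url} {
    # scheme :// [userinfo@] host [:port] [path] [?query] [#fragment]
    set re {^([a-zA-Z][a-zA-Z0-9+.-]*)://(?:[^/?#@]*@)?([^/?#:]+)(?::\d*)?([^?#]*)}
    if {![regexp $re $url -> protocol host path]} {
        error "unsupported URL: $url"
    }
    # Assumption: a URL with no path refers to the root resource.
    if {$path eq ""} {
        set path /
    }
    return [list [string tolower $protocol] [string tolower $host] $path]
}

# Credentials, port, query and fragment are all dropped.
puts [cookieLookupArgs https://user:pw@www.example.com:8443/a/b?q=1#frag]
# prints: https www.example.com /a/b
```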

The result is treated as a list of keys and values (i.e., it is expected to be a list with an even number of items in it, with the first key at index 0 and the first value at index 1) and describes the collection of cookies to send. The http package will manage the formatting of the cookies as part of the request to send.
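
The sketch below shows how a store might select cookies and build that result list. Everything in it is illustrative: the procedure names, the layout of the stored records, and the reading of "secure" as "https only" are assumptions, and the matching rules are a simplified rendering of the domain and path rules in the cookie specification. The final loop only demonstrates the shape of the data; the http package does the real request formatting itself.

```tcl
# Hypothetical store contents: one dict per cookie, using the option names
# from storeCookie plus "name" and "value" (an assumption of this sketch).
set stored {
    {name sid   value abc123 domain example.com     hostonly 0 secure 1 path /}
    {name theme value dark   domain www.example.com hostonly 1 secure 0 path /app}
}

proc domainMatches {host cookie} {
    set domain [dict get $cookie domain]
    if {$host eq $domain} {
        return 1
    }
    if {[dict get $cookie hostonly]} {
        return 0
    }
    # Compare the tail of the host name against ".domain".
    set tail [string range $host end-[string length $domain] end]
    return [expr {$tail eq ".$domain"}]
}

proc pathMatches {path cookiePath} {
    if {$path eq $cookiePath} {
        return 1
    }
    set n [string length $cookiePath]
    if {[string range $path 0 [expr {$n - 1}]] ne $cookiePath} {
        return 0
    }
    # The prefix must end at a "/" boundary.
    return [expr {[string index $cookiePath end] eq "/"
                  || [string index $path $n] eq "/"}]
}

proc getCookies {protocol host path} {
    global stored
    set result {}
    foreach cookie $stored {
        if {[dict get $cookie secure] && $protocol ne "https"} continue
        if {![domainMatches $host $cookie]} continue
        if {![pathMatches $path [dict get $cookie path]]} continue
        lappend result [dict get $cookie name] [dict get $cookie value]
    }
    return $result
}

# The flat key/value list, rendered here as a Cookie header for illustration.
set pairs {}
foreach {name value} [getCookies https www.example.com /app/page] {
    lappend pairs $name=$value
}
puts "Cookie: [join $pairs {; }]"
# prints: Cookie: sid=abc123; theme=dark
```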

A Cookie Jar Implementation

To go with the above specification, this TIP also describes a TclOO class that will be provided to implement the cookie store side of the protocol. This class will be provided in the cookiejar package.

The name of the class will be ::http::cookiejar, and its instances will be cookie stores that participate in the above protocol. The constructor of the class will take an optional argument that names an SQLite database that will be used to store the cookies; if no name is provided, an in-memory database will be used and all cookies will be treated like pure session cookies.

The result is that it will be possible to enable cookie handling in a Tcl script using this:

 package require http
 package require cookiejar

 http::config -cookiejar [http::cookiejar new ~/mycookies.db]

The ::http::cookiejar class will also allow configuration of its logging level via the loglevel method on the class (which takes a single argument, the new logging level, or which returns the current logging level if called with no arguments). Permitted log levels are error, warn, info and debug. Log messages will be written by a call to ::http::Log (which does nothing by default anyway).

An example of setting the logging level to the (substantially more verbose) debug level:

 http::cookiejar loglevel debug

The instances of the ::http::cookiejar class will additionally support the method forceLoadDomainData and the method lookup.

instance forceLoadDomainData

This instructs the instance to load (or reload) its definitions of what domains may not have cookies set for them. It takes no extra arguments and produces an empty result (unless an error occurs).

instance lookup ?host? ?key?

This looks up cookies in the store. If neither host nor key is specified, this returns the list of hosts for which cookies are defined. If host is specified but key is not, this returns the list of cookie keys for the host (note that these may be session cookies or durable cookies; this interface does not distinguish). If both host and key are specified, the value for the particular cookie is returned (with it being an error if no such cookie is defined). This method provides no mechanism for setting the value of a cookie or creating a new one.

Implementation Notes

It is worth noting that the current cookiejar package will download a list of "bad" domains (i.e., domains that correspond to super-registries, such as com, ac.uk or tk) when a new database is constructed (provided the cookiejar instance is backed by a database file; in-memory databases never have this part populated by default). This extensive list of domains (http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1) is a security feature that prevents the setting of cookies for large numbers of hosts at once, but it is believed to be long enough that supplying it directly as part of the package would be excessive.

The degree to which it is necessary to update cookie stores from that list has not yet been studied.
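
For illustration only, the kind of check that such a list enables can be sketched as follows. The three entries are the examples named above, the names forbiddenDomains and cookieDomainAllowed are hypothetical, and the sketch ignores the wildcard and exception rules that the real list contains.

```tcl
# Illustrative subset only; the real list is far longer and also has
# wildcard and exception rules that this sketch ignores.
set forbiddenDomains {com ac.uk tk}

# Hypothetical helper: may a server set a cookie for $domain?
proc cookieDomainAllowed {domain} {
    global forbiddenDomains
    set domain [string tolower [string trimleft $domain .]]
    return [expr {$domain ni $forbiddenDomains}]
}

puts [cookieDomainAllowed example.com]   ;# prints 1
puts [cookieDomainAllowed .ac.uk]        ;# prints 0
```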

The underlying SQLite database is forwarded to the cookiejar instance's interface as the (default non-exported) method Database; this takes all the normal subcommands of an SQLite dbcmd, as documented in http://www.sqlite.org/tclsqlite.html .

The cookiejar package handles conversion of host names into and out of punycode so that lookups are always performed on canonical names. This is important because there is no guarantee that a host name will be encoded the same way in every context where it appears; for example, the list of forbidden domains above is in UTF-8 and does not use the IDNA scheme.

Privacy

Cookies are often associated in the public mind with problems with privacy. There are two principal mechanisms provided here to mitigate these (beyond the proposed domain restrictions, which follow recommended Best Practice):

  1. No Tcl interpreter will ever have cookie handling enabled by default. There will always need to be an explicit action taken to turn it on.

  2. Applications have to pick the name of their cookie stores when creating and installing them; there is no default.

Copyright

This document has been placed in the public domain.
