TIP 466: Revised Implementation of the Text Widget

Author:         François Vogel <fvogelnew1@free.fr>
Author:         Gregor Cramer <remarcg@gmx.net>
State:          Draft
Type:           Project
Vote:           Pending
Created:        10-Mar-2017
Post-History:   
Keywords:       Tk,text widget
Tcl-Version:    8.7
Tk-Branch:      revised_text

Abstract

This TIP proposes the replacement of the current implementation of the text widget (the "legacy" text widget) by a revised implementation offering a large number of advantages.

Rationale

The Tk text widget has grown increasingly complex as incremental improvements and features were added over time. In that process, some long-standing issues have become very difficult to tackle, for instance the poor performance on long lines.

Gregor Cramer, in the process of using the text widget in one of his applications, has analyzed the issues the current text widget is suffering from, and has come up with a new implementation having the following main advantages:

  • Big performance improvement

  • Better implementation of some existing features

  • A large number of new features

  • Numerous bug fixes

  • Very few incompatibilities with the legacy text widget

Proposal

The proposal is to replace the legacy code with the new implementation.

The author of the revised implementation has written a well-documented website, http://scidb.sourceforge.net/tk/revised-text-widget.html , describing in detail the issues with the legacy code, how he fixed them, and which features he has changed or improved.

It was deemed neither feasible nor necessary to copy and reformat all the information from that website into the present TIP. Only the new features and incompatibilities are highlighted here, without the detailed rationale for each change.

A version of the text man page, consistent with the changes and improvements proposed by the present TIP, can be seen at http://scidb.sourceforge.net/tk/text.html This version of the man page is colorized, with blue meaning "changed", and green meaning "new", so that it is easier to spot what's different from the legacy text widget.

Performance Improvements

A detailed performance comparison between the legacy code and the revised code can be found at http://scidb.sourceforge.net/tk/comparison.html but these are the key points:

  • The long line problem, especially with many tags, is eliminated: the revised version generally runs in O(N log N) time, where the legacy code took higher-order polynomial time.

  • Display is faster: smoother scrolling, faster response time. http://scidb.sourceforge.net/tk/display.html

  • Undo/redo is much faster: a completely new implementation has been worked out, directly working on the text segments. http://scidb.sourceforge.net/tk/undo.html

New Features

Detailed explanations and rationales for each of the items below can be found at http://scidb.sourceforge.net/tk/revised-text-widget.html

  • Undo/redo handles tags (this was requested in Issue #1561991 and Issue #1027741), embedded images, embedded windows, and also marks if option -steadymarks is enabled

  • Additional widget state readonly

  • Hyphenation and full-justification support

    * Support of hyphenation (Issue #1096580 is fixed), with new helper functions tk_textInsert and tk_textReplace, and with new switches to pathName count, pathName get, and pathName search

    * Additional justification mode full

    * Additional wrap mode codepoint, and new widget option -useunibreak. New subcommand pathName brks

    * Additional option -lang used to guide hyphenation engines.

  • Additional subcommands:

    * pathName checksum

    * pathName clear

    * pathName edit altered

    * pathName edit info

    * pathName edit inspect

    * pathName edit irreversible

    * pathName edit recover

    * pathName inspect

    * pathName isclean

    * pathName isdead

    * pathName isempty

    * pathName lineno

    * pathName load

    * pathName mark compare

    * pathName mark exists

    * pathName mark generate

    * pathName tag clear

    * pathName tag findnext

    * pathName tag findprev

    * pathName tag getrange

    * pathName tag priority

    * pathName watch

  • Additional tag attributes:

    * -eolcolor

    * -hyphencolor

    * -hyphenrules

    * -inactivebackground

    * -inactiveforeground

    * -inactiveselectbackground

    * -inactiveselectforeground

    * -indentbackground

    * -undo

  • Additional widget options:

    * -endindex

    * -eolchar

    * -eolcolor

    * -eotchar

    * -eotcolor

    * -hyphencolor

    * -hyphenrules

    * -hyphens

    * -inactiveselectforeground

    * -insertforeground

    * -maxredo

    * -maxundosize

    * -responsiveness

    * -showendofline

    * -showendoftext

    * -showinsertforeground

    * -spacemode

    * -startindex

    * -steadymarks

    * -synctime

    * -tagging

  • Extensions to the syntax for indices:

    * new specifier begin

    * new syntax tag.current.first, tag.current.last

    * new syntax @first,last

  • Additional features of existing subcommands:

    * Additional option -marks for pathName delete command

    * Additional optional parameter direction for pathName mark set sub-command

    * New virtual event <<Altered>> to support new sub-command pathName edit altered

    * Extensions to commands pathName edit reset and pathName edit separator

  • Extended command pathName tag names

  • Additional switch for pathName dump

  • Additional option -extents for pathName bbox and pathName dlineinfo

  • Additional option -discardspecial for pathName mark names, pathName mark next, and pathName mark previous.

  • Additional optional parameter pattern for pathName mark names, pathName mark next, and pathName mark previous.

  • New helper commands:

    * tk_mergeRange

    * tk_textInsert

    * tk_textReplace

    * tk_textRebindMouseWheel

  • Additional option -owner for embedded window

  • Additional option -tags for embedded images and embedded windows

Bug Fixes

  • Bug fixed in TkTextGetIndex

  • Bug fixed in TkTextGetIndexFromObj

  • Bug fixed in DeleteIndexRange (note that this bugfix implies that deletion at the end of the text now handles the last newline differently - a slight incompatibility with the legacy text widget)

  • Bug fixed in TkTextDeleteTag/TagBindEvent

  • Problems fixed with -startline/-endline

  • Problems fixed with tag event handling

  • Several bug fixes with undo

  • Confusing results of edit modified fixed with the new command edit altered

  • Severe problems with command sync fixed

  • Invalid changes in disabled widget are marked as deprecated

  • Inaccurate wrapping algorithm fixed

  • Bugs in display logic fixed

  • Insert cursor is now fully visible in all conditions

  • Trimming spaces: Issues #1082213 and #1754051 are invalid, and the fix put in core-8-6-branch and trunk (8.7) has been reverted. There is now a new option -spacemode that can be set to trim; however, none of the available values for -spacemode in the revised code provides a display exactly identical to 8.6 in all cases

  • Issues with display of selections fixed

  • Updates no longer waste processor time, since superfluous update computations are no longer performed

  • Bugs in context drawing support (OS X) fixed

  • Bugs fixed in tkUnixRFont.c

  • Several bug fixes related to handling/positioning of the insertion cursor

Details on each of these bugs can be found in the "Bugs/Issues in Original Implementation" section at http://scidb.sourceforge.net/tk/revised-text-widget.html

Incompatibilities with Legacy Version

Based on the author's website, the following incompatibilities are currently known:

  • [449] (undo/redo to Return Range of Characters) was not adapted into the revised implementation, because the need behind Issue #1217222 - the basis for [449] - is now covered by:

    1. The new undo implementation, because tag associations are also restored, and

    2. The powerful watch command, which also provides the affected ranges (with constant runtime behavior).

    Moreover, the tk_mergeRange convenience function has been implemented in the revised version.

  • The special selection tag sel can no longer be elided (would be useless anyway).

  • Tag options (introduced in 8.6.6) -overstrikefg and -underlinefg were renamed to -overstrikecolor and -underlinecolor

  • The new index syntax @first,last is incompatible with the legacy version, but it is not expected that any existing application will break, since it is very unlikely that anybody uses such a form for the name of a mark or image

  • The default value of 50 ms for the new -responsiveness option is incompatible with prior releases, but this should not matter in practice, because nobody wants flickering, and nobody is using special tricks with a short mouse hovering while the widget is scrolling. Setting the responsiveness to zero restores the old behavior of the text widget.

  • <<UndoStack>> is generated with any change on the undo stack, not only when the undo stack or the redo stack becomes empty or non-empty

  • -startline/-endline behavior was subtly changed in some corner cases

  • In the revised implementation, "+N chars" and "-N chars" refer to characters, and no longer to indices (which was the case in legacy code for backwards compatibility reasons).

  • No value of the -spacemode option provides display of the text completely identical to legacy code (8.6 and above) in all situations. This includes the default "none".
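Scripts that must run against both widgets can probe for the renamed or new options at run time. The following is only a minimal sketch: the helper names setUnderlineColor and useLegacyResponsiveness are hypothetical, and it assumes that an unsupported option makes configure raise an error, as the legacy widget does for unknown options.

```tcl
# Hypothetical helper: set a tag's underline color using whichever option
# name the running text widget accepts. Returns the option name used.
proc setUnderlineColor {w tag color} {
    # -underlinecolor is the name proposed by this TIP.
    if {[catch {$w tag configure $tag -underlinecolor $color}]} {
        # Legacy widget (Tk 8.6.6 and later): fall back to the old name.
        $w tag configure $tag -underlinefg $color
        return -underlinefg
    }
    return -underlinecolor
}

# Hypothetical helper: request the pre-revision hover behavior.
# Returns 1 if -responsiveness exists and was set to zero, and 0 on a
# widget without that option (which already behaves the old way).
proc useLegacyResponsiveness {w} {
    return [expr {![catch {$w configure -responsiveness 0}]}]
}
```

With a text widget .t, a call such as `setUnderlineColor .t link blue` would then work unchanged on either implementation.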

Deprecated Commands and Options

  • Tag options (introduced in 8.6.6) -overstrikefg and -underlinefg were renamed to -overstrikecolor and -underlinecolor

  • edit undodepth|redodepth|canundo|canredo are replaced by more general edit info

  • Widget options -startline/-endline are replaced by -startindex/-endindex
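As an illustration of moving from the deprecated options to their replacements, the sketch below prefers -startindex/-endindex and falls back to -startline/-endline. The helper name showLineRange is hypothetical, and the sketch assumes that the new options accept ordinary text indices and that an index of the form line.0 is equivalent to the corresponding legacy line number; neither assumption is confirmed by this TIP.

```tcl
# Hypothetical helper: restrict widget w to the lines from 'first' up to,
# but not including, 'end'. Returns which option pair was used.
proc showLineRange {w first end} {
    # Assumption: -startindex/-endindex take normal text indices.
    if {[catch {$w configure -startindex ${first}.0 -endindex ${end}.0}]} {
        # Legacy widget: use the line-based options, where -endline names
        # the line just after the last line contained in the widget.
        $w configure -startline $first -endline $end
        return legacy
    }
    return revised
}
```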

Drawbacks

  • Memory usage increases moderately in general. In many cases, however, especially if many tags are used and/or undo is enabled, the revised version actually decreases memory usage.

Detailed memory comparison between legacy code and revised code can be found at http://scidb.sourceforge.net/tk/comparison.html

Known Issues in the Revised Implementation

Based on the author's website, currently only these issues are known:

  • The code for the implementation has increased by more than 100%, and about 70% of the old code has been changed. The revised implementation needs more testing: the text widget is very complex, and bugs are to be expected. A few additions are not yet well tested.

  • Function tk_textCopy is copying hidden (elided) text. This seems to be unexpected, but it's the behavior of the original implementation. Probably this is a bug and should be corrected.

  • Adding/deleting tags covering a large range of text is still quite time consuming.

  • The display line with the insert cursor is redrawn each time the cursor blinks, which causes a steady stream of graphics traffic. It would be desirable for the cursor update to be performed by a specialized, efficient redraw function.

  • If option -spacemode is set to trim, then get -displaychars should probably return trimmed spaces. Currently this command is not trimming spaces, so the result may not coincide with the visible text.

  • The search -regexp sub-command is still not yet fully implemented, see Tk documentation.

  • The revised widget still ignores modifying commands if state is not normal; this behavior is unreasonable, but conforms to the original version.

  • Currently the special index specifier begin has the lowest precedence, although it should have the same precedence as the special index specifier end (see section INDICES). In a future release this should be corrected. The current behavior is a workaround that keeps existing applications from breaking with the introduction of begin.

  • The implementation still contains some TODOs for minor issues.

Also, the following should be noted:

  • With the revised version there are failing tests on all platforms; they need to be fixed (by fixing the expected result in the test, or by fixing the text widget code).

  • More tests should be written to exercise the new or changed features.

  • The OS X case should be tested more thoroughly on a real Mac, because it's the only platform using context drawing.

Miscellaneous

  • No function signature pertaining to a public interface was changed. Also public data structures haven't been touched.

  • All recent new features brought in trunk in the legacy version have their counterpart in the revised version, have been improved in performance and have no known drawbacks. Minor incompatibilities are however identified here and there.

Target Release

Given the amount of changes, also because of our usual precautions regarding backwards compatibility, and despite the very high quality of the code and the fact it passes (almost all) the previously existing test suite, it is deemed reasonable to target Tcl/Tk 8.7 (or 9.0), but neither the 8.6 nor the 8.5 streams of releases, which will continue to implement the legacy text widget code.

Support of versions back to 8.5 is currently included in the revised code, but will be removed (because it's useless for use in trunk only) when the new code is merged into trunk.

Implementation

Implementation of the revised text widget code has been placed in branch https://core.tcl-lang.org/tk/timeline?r=revised_text of the fossil repository.

This implementation compiles on Linux, Windows, and OS X. It respects the standards of Tk (C99 standard, and also the Tcl source code formatting described in [247]).

The man page for the text widget has been contributed by jima and is included in the revised_text branch.

The expected results of many tests were adjusted to take into account that the revised implementation optimizes better, so some trace results of display line computation are different. Other adjustments were required because of bug fixes.

Open Questions

  • tkTextUndo.c implements a specialized undo/redo, not using the legacy tkUndo.c. Reasons for this are stated at the top of tkTextUndo.c. It is interesting to note that, in the revised_text branch, tkUndo.c is not even compiled anymore, except on Linux (for no apparent reason). This is dead code waiting for a use case in some widget. At the very least, compilation on Linux should be removed, but couldn't we even rename tkTextUndo.c to tkUndo.c and forget about the old implementation? tkTextUndo.c is also a shareable implementation (in the spirit of [104]).

  • Should deprecated features actually be removed, or kept (some are marked as deprecated, but are actually still supported)?

Copyright

This document has been placed in the public domain.

The author of the revised text widget code has explicitly placed his code of the text widget under the same license as Tcl.
