(1) Overview
Introduction
In writing and translation studies, and also in digital humanities, it is not only the analysis of the final product that is crucial; the writing process that leads to that product has also become an important perspective [1]. To observe this writing process, a set of tools has been developed: on the one hand, digital tablets are used to log handwriting processes [2]; on the other hand, keystroke logging programs have been developed to log keyboard writing. The latter programs can be divided into two groups:
keystroke logging tools that record the typing processes in self-developed word processors with limited functionality but integrating other specialised complementary observation for special purposes. For example, Translog [https://sites.google.com/site/centretranslationinnovation/translog-ii] was developed to study translation processes; Scriptlog [3] integrates eye tracking and keystroke logging (and also integrates pictures); GGXlog includes a range of graph-based visualisation tools to analyse the logged data [4, 5, 6].
keystroke logging tools that record the typing process in existing, professional word processors (e.g., Inputlog, [https://www.inputlog.net]) [7].
The keystroke logger presented in this paper adds a new observation instrument to the latter category. It is designed to record writing processes taking place in LibreOffice [https://www.libreoffice.org/] Writer, and allows the user to access all the functionalities of that word processor.
As mentioned above, the loggers in the first category all work from a built-in word processor, which allows for a highly accurate registration of the user’s actions and is well suited for lab-like data collection. However, in order to study writing in a more natural work environment, it is important that writers can produce their texts using the word processor they are most familiar with, like Microsoft Word or LibreOffice Writer [7]. Through its integration with these professional word processors, the logging is optimised for studying writing in the workplace.
The main reason for developing this new logging tool is its complementarity with the standard Inputlog version. Although Microsoft Word is no doubt the most widely used word processor in professional contexts, LibreOffice, which is open source and free to use, is gaining market share, especially in educational contexts [8]. A second reason is the open source code base of LibreOffice, which makes the integration of logging more sustainable and easier than it is for Microsoft Word. Third, the simple interface addition (two small icons, see Figure 2) is more convenient for writers to use on a day-to-day basis, as it requires fewer steps to launch a new writing session. It could therefore be used as a ‘light’ self-archiving tool, in addition to its applications in research. Fourth, contrary to the ‘standard’ Inputlog, which only works on the Windows operating system, this extension can be used on Windows, Mac, and Linux. Finally, although Microsoft Word and LibreOffice Writer differ in their advanced features, the products are very similar for less complex editing tasks.
As the Inputlog-LibreOffice extension [9] has only just been released, no studies using it have been conducted so far. Inputlog-MS Word, on the other hand, has been widely used in the writing studies community. An introductory article by Leijten & Van Waes [7], for instance, reports that by 2020 about 400 studies using Inputlog had already been published. The article also presents an overview of the main domains in writing and translation research in which the program has been used. The book on keystroke logging by Lindgren & Sullivan [10] also gives a good overview of this observation method, as does the chapter by Wengelin et al. [5].
To name just a few studies that used Inputlog, Meulemans et al. [11], for example, used pause and fluency measures to assess the effect of cognitive decline on writing skills. And Luan et al. [12] studied the way university students made use of online sources in writing, in their first, second and third language. Bekius [13] studied the processes of literary writers and interpreted their ‘paths’ through their documents to find semantic and chronological connections between different textual changes.
Implementation and architecture
Setting up a keystroke-logging session with Inputlog-LibreOffice
Figure 1 describes the researchers’ journey when setting up a writing study. The flow shows the main steps to be taken from installing the Inputlog-LibreOffice extension until processing and analysing the logged data. We briefly describe this logging flow.

Figure 1
Workflow to be followed when setting up a keystroke logging study with Inputlog-LibreOffice.
Logging data
Installation on first usage
When using Inputlog-LibreOffice for the first time, you can download the Inputlog-LibreOffice extension from the Inputlog website [www.inputlog.net/downloads], or you can access it through the LibreOffice extensions repository, which includes a detailed installation manual [https://extensions.libreoffice.org/en/extensions/show/70047]. The extension comes with a graphical installation wizard. To run the extension, a Java Runtime Environment (JRE 8u281 or later) is needed. Apart from the extension and possibly the JRE, no further tools need to be installed.
Once it has been successfully installed, the extension adds two small icons at the top of the LibreOffice Writer window (see Figure 2).

Figure 2
The two right icons are the ones that are created by the Inputlog-LibreOffice extension. The red one allows you to activate a logging session; the square one you press to end the logging session.
Activation and setting up logging session
When the logging process is activated by clicking the arrow icon, a wizard is presented that leads you through the main options and settings (see Figure 3). In the session identification, for instance, the user is invited to provide a Participant (code), the language of the text, and the age of the user. All other fields are optional. Next, a privacy statement is provided, which leads to the final window in which the logging action is selected. When a new profile is built, only the option ‘start a new document’ is shown; if it is a follow-up session, the participant can also choose to continue with a previously created document.

Figure 3
The steps of starting a new recording session in chronological order (A, B, C, D).
Start recording and start writing
After pressing ‘OK’ in the previous step, an (empty) Writer document is opened automatically. The writing can start, and all keyboard and mouse activities are logged together with a timestamp (in milliseconds) for both the key down and the key up. A copy of the starting document is stored at this point.
Stop recording and author’s notes
To stop logging, the square Inputlog icon should be pressed. This ends the logging session. A copy of the ending document is now also stored in the working directory defined in the option settings. Optionally, an author’s note can be added when ending each logging session. This allows the user to explain, for instance, that a long pause in the process was due to a non-writing-related event, e.g. a call.
When ending the recording, the extension saves both the document with the text produced so far as Open Document Format, and the XML-logging file (idfx) with all the logging events, chronologically ordered. The latter also includes all the session information (e.g., user information, version used, start clock time) and a quantitative document summary (e.g., number of characters with and without spaces in the final document). We describe the contents of this file in more detail below.
Process and analyse logfiles
The XML-logging files can be read and processed in various statistical and programming environments (e.g., SPSS or R), for further analysis. One can also use the pre-set processing and analysis functions through the GUI of the standard Inputlog version (preferably, version 9 or later), as the logfiles use the same structure and syntax as in the standard Inputlog files. For a detailed overview we refer to the Inputlog manual [https://www.inputlog.net/downloads/].
Some examples:
Pre-processing: One of the pre-processing functions in Inputlog allows researchers to time filter the logfile, for instance, in order to remove the initial ‘pause’ that was related to giving instructions about the writing task while the logging had already started. Another option facilitates combining logging files that are part of a multi-session writing process.
General analysis: The general analysis is the heart of most other analyses. It generates a linear list of all the log events (keyboard/mouse actions and time stamps), adding pause and action times as well as pause locations (e.g., within words, between sentences).
Pause analysis: The general analysis is also used as input for the pause analysis which summarizes pausing behaviour from different perspectives (e.g., number of pauses above a certain threshold, length, location, ratio writing/pausing).
Process graph: The process graph is a time-based visualisation of how the text gradually evolved, including product, process, revision and pause information. Figure 4 shows an example of such a graph. The X-axis represents the timeline (in this case, about 40 minutes). The orange line below the X-axis shows the interaction between the main document and the sources consulted. In this case, the user spent the first five minutes reading sources, and then started writing in the main document. During the next 20 minutes the writer produced her text quite fluently: the green process line steadily increases during that period. After 20 minutes the green line flattens and the two solid lines at the top diverge more and more, indicating that the text produced so far is being revised by deleting and adding text. The dotted green line shows the cursor position at any moment in time. In this process the writer revised her text from top to bottom in four different cycles (see the dotted line going down and up, step by step), making smaller changes in the text produced so far.

Figure 4
A process graph generated by Inputlog.
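The time filter and the pause analysis described above can be illustrated with a minimal sketch. It is not Inputlog's implementation: the function names, the event dictionaries, and the 2000 ms threshold are assumptions made for illustration, and a pause is assumed here to be the interval between the key-down times of two consecutive events.

```python
from statistics import mean


def time_filter(events, start_ms, end_ms):
    """Keep only events whose key-down time lies within [start_ms, end_ms]."""
    return [e for e in events if start_ms <= e["start_ms"] <= end_ms]


def pause_summary(events, threshold_ms=2000):
    """Summarise pauses between consecutive events.

    Assumption: a pause is the interval between the key-down times of two
    consecutive events; the default threshold is illustrative only.
    """
    ordered = sorted(events, key=lambda e: e["start_ms"])
    intervals = [b["start_ms"] - a["start_ms"]
                 for a, b in zip(ordered, ordered[1:])]
    pauses = [i for i in intervals if i >= threshold_ms]
    total = ordered[-1]["end_ms"] - ordered[0]["start_ms"] if ordered else 0
    return {
        "n_pauses": len(pauses),
        "mean_pause_ms": mean(pauses) if pauses else 0,
        "total_pause_ms": sum(pauses),
        # Share of the session spent in pauses above the threshold.
        "pause_ratio": sum(pauses) / total if total else 0.0,
    }


# Hypothetical events (times in ms), not taken from a real log.
events = [
    {"start_ms": 0, "end_ms": 80},
    {"start_ms": 200, "end_ms": 270},
    {"start_ms": 3200, "end_ms": 3290},
    {"start_ms": 3400, "end_ms": 3480},
    {"start_ms": 8400, "end_ms": 8500},
]
summary = pause_summary(events)
late_events = time_filter(events, 3000, 9000)
print(summary)
```

Applying the time filter first, for instance to drop an instruction phase at the start of the session, and computing the summary afterwards keeps the two steps independent of each other.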
In the near future, additional analyses will be developed. In particular, the analyses related to revision, as well as the replay function, will be made available. These analyses were not always fully reliable in the standard Inputlog application, as we did not have full control over the writers’ Microsoft Word interaction. The LibreOffice API allowed us to log character position changes during the composition process more adequately and reliably. The optimised logging data in LibreOffice will function as the basis for multi-focused revision analyses [14, 15] and user-friendly replay functions.
Moreover, at the moment Inputlog-LibreOffice is restricted to logging in LibreOffice. When a writer leaves the program, for instance, starts using a web browser, this interaction with the internet browser is not registered yet. However, as this so-called “WinLog” module is already available in the Inputlog standard version, we plan to integrate it in Inputlog-LibreOffice as well by merging both logging threads, the LibreOffice log and the WinLog that provides data about all the events outside the word processor (i.e., when leaving the main document being produced).
In addition, we are building an online community [https://www.inputlog.net/community/] to further the exchange of open source code and open datasets for keystroke logging research, building on the ones available in the standard Inputlog version.
Logged data
As mentioned above, the writing process is logged as an XML-file. For instance, the logging of a single character – here the letter ‘i’ – looks like this:
<event type="keyboard" id="4">
  <part type="wordlog">
    <position>2</position>
    <documentLength>3</documentLength>
    <replay>True</replay>
  </part>
  <part type="winlog">
    <startTime>3307</startTime>
    <endTime>3376</endTime>
    <key>VK_I</key>
    <value>i</value>
    <keyboardstate />
  </part>
</event>
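To illustrate how such an event could be read outside Inputlog, the following minimal sketch parses the sample event with Python's standard library. The helper name parse_event and the dictionary keys are hypothetical; only the element and attribute names come from the sample above, and the sketch handles keyboard events of this shape only.

```python
import xml.etree.ElementTree as ET

# The sample event from the text, written with straight quotes (valid XML).
SAMPLE_EVENT = """
<event type="keyboard" id="4">
  <part type="wordlog">
    <position>2</position>
    <documentLength>3</documentLength>
    <replay>True</replay>
  </part>
  <part type="winlog">
    <startTime>3307</startTime>
    <endTime>3376</endTime>
    <key>VK_I</key>
    <value>i</value>
    <keyboardstate />
  </part>
</event>
""".strip()


def parse_event(event):
    """Flatten one <event> element into a dictionary (hypothetical helper)."""
    wordlog = event.find("part[@type='wordlog']")
    winlog = event.find("part[@type='winlog']")
    start = int(winlog.findtext("startTime"))
    end = int(winlog.findtext("endTime"))
    return {
        "id": int(event.get("id")),
        "type": event.get("type"),
        "position": int(wordlog.findtext("position")),
        "document_length": int(wordlog.findtext("documentLength")),
        "key": winlog.findtext("key"),
        "value": winlog.findtext("value"),
        "start_ms": start,
        "end_ms": end,
        # Time between key down and key up for this keystroke.
        "hold_ms": end - start,
    }


row = parse_event(ET.fromstring(SAMPLE_EVENT))
print(row)
```

For the sample event, the key is held for 3376 - 3307 = 69 ms. A complete logfile could be processed in the same way by applying the helper to every keyboard event it contains.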
A more detailed description of the XML-structure is available in a white paper [https://www.inputlog.net/wp-content/uploads/Generic_XML_structure_version-1_3.pdf]. This paper proposes a Generic XML standard for writing logs, independent of the application being used (e.g., Scriptlog). In a next step we will further optimize this white paper and make it available in JSON format as well.
This idfx-file is used to process and enrich the general analysis (also XML-format; cf. supra). The formatted output of this analysis is shown in Figure 5 (fourth ID-row).

Figure 5
Snapshot of the first six events of a general logging analysis representing the production of the word ‘This’ as the first word in a new text.
Inputlog-LibreOffice captures keyboard and mouse actions within LibreOffice Writer. It does not capture the locations and actions of the writer outside of the working document. The keyboard and mouse actions captured within the working document offer a fairly complete picture of the writing. As shown in Figure 5, the output, the position in the document, the document length, the character produced, and the start and end time stamps are given for each event.
The logger handles keyboard shortcuts, such as control-C, by reporting them as a single event and indicating the result of the action on the following lines. It also captures text selection, offering both start and end positions of the selected text, and dragging text selections through mouse actions. A limitation is that it does not register formatting adjustments, although it will record shortcuts such as control-B or control-I for bold and italics respectively.
Further development
We are currently developing a recording procedure that enables researchers to initiate logging in LibreOffice Writer through the Inputlog interface (“Record”), analogous to the functionality already available for MS Word and Google Docs. This update is expected to be released in the near future. In addition, the adaptation will support the integration of focus logging (Winlog), allowing not only the capture of events within the Writer environment but also the tracking of interactions with external Windows applications (e.g., digital dictionaries or Google Search).
Technical implementation
The Inputlog-LibreOffice extension is implemented using the Java UNO runtime, which provides an object-oriented API to the LibreOffice framework. UNO can be used to plug new functionality into LibreOffice, as well as to intercept or replace existing functionality.
Whenever a logging session is started, the Inputlog-LibreOffice extension creates a number of objects designed to track and log the user interaction with the document, as shown in Figure 6.

Figure 6
Overview of the events and created objects during a logging session.
First, the extension creates several global listeners, designed to track keyboard events and mouse clicks and movements. As it is not possible to track such events in the necessary detail using the UNO API, we use a separate library called JNativeHook [https://github.com/kwhat/jnativehook]. While all events are passed on to LibreOffice as-is, they are also forwarded to the document tracker, explained in more detail below.
Next, several LibreOffice listeners are created and registered with LibreOffice. These track events generated by LibreOffice itself, such as keyboard events, window events (such as window focus), dispatch events (such as pasting text using a menu item or keyboard shortcut) and events that change the content of the clipboard (such as copying text). Again, these events are passed on to the document tracker, before being forwarded to the actual document which is being edited.
The reason for using two different methods to intercept events is that certain events, such as mouse movements, mouse clicks, and pressing special keys such as control-c or capslock, can only be intercepted by the global listeners. LibreOffice does not forward such events to the document, but instead only forwards their effects (such as cutting text or producing upper case letters). Other events, such as gaining or losing window focus, manipulating text via the menu options, or inserting special characters, can only be intercepted by the LibreOffice listeners.
The document tracker receives the incoming events from all listeners, and merges the two event streams. Special care must be taken in handling keyboard events. These are present in both event streams, but may not be completely identical. Only the global keyboard listeners record special keys or keyboard shortcuts, and only the LibreOffice keyboard listeners record the actual character being produced, which may include special characters such as à or é. Once the event streams are merged, the document tracker writes them to the log file in the XML format described above.
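The merging step can be sketched as follows. This is an illustration in Python rather than the extension's Java code, and the matching rule (pairing keyboard events from both streams whose timestamps differ by at most a small tolerance) is an assumption, as are the field names and the sample events.

```python
def merge_streams(global_events, office_events, tolerance_ms=50):
    """Merge two time-ordered event lists into one (illustrative only).

    A global keyboard event and a LibreOffice keyboard event are treated as
    the same keystroke when their timestamps differ by at most tolerance_ms.
    """
    merged = []
    unmatched = list(office_events)
    for g in global_events:
        match = None
        if g["kind"] == "keyboard":
            for o in unmatched:
                close = abs(o["time_ms"] - g["time_ms"]) <= tolerance_ms
                if o["kind"] == "keyboard" and close:
                    match = o
                    break
        event = dict(g)
        if match is not None:
            unmatched.remove(match)
            # The character actually produced comes from the LibreOffice stream.
            event["value"] = match["value"]
        merged.append(event)
    # Events reported only by LibreOffice (focus, dispatch, clipboard, ...).
    merged.extend(unmatched)
    return sorted(merged, key=lambda e: e["time_ms"])


# Hypothetical input: key codes from the global listeners, characters and
# focus changes from the LibreOffice listeners.
global_events = [
    {"kind": "keyboard", "time_ms": 1000, "key": "VK_CONTROL"},
    {"kind": "keyboard", "time_ms": 1500, "key": "VK_E"},
    {"kind": "mouse", "time_ms": 2000, "key": "LEFT_CLICK"},
]
office_events = [
    {"kind": "keyboard", "time_ms": 1504, "value": "é"},
    {"kind": "focus", "time_ms": 1800, "value": "lost"},
]
merged = merge_streams(global_events, office_events)
for event in merged:
    print(event)
```

In this sketch the special key and the mouse click survive only because the global stream reports them, while the accented character and the focus change come from the LibreOffice stream, which mirrors the division of labour described above.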
If multiple documents are open at the same time, multiple document trackers will be created, each writing to their own log file. While the LibreOffice listeners will be created separately for each document, the global listeners will only be created once. Window focus events are then used to track which document is actively being edited to ensure the events end up in the correct log file.
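The routing of global events to the correct document tracker can be sketched in a few lines. The class and method names below are hypothetical and the sketch is written in Python for brevity; it only illustrates the idea of using window focus events to select the active tracker.

```python
class DocumentTracker:
    """Collects the events that belong to one open document (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.events = []


class Router:
    """Sends global events to the tracker of the document that has focus."""

    def __init__(self):
        self.trackers = {}
        self.active = None

    def open_document(self, name):
        self.trackers[name] = DocumentTracker(name)

    def on_focus_gained(self, name):
        self.active = name

    def on_focus_lost(self, name):
        if self.active == name:
            self.active = None

    def on_global_event(self, event):
        # Events arriving while no tracked document has focus are dropped
        # (an assumption of this sketch).
        if self.active is not None:
            self.trackers[self.active].events.append(event)


router = Router()
router.open_document("a.odt")
router.open_document("b.odt")
router.on_focus_gained("a.odt")
router.on_global_event("x")
router.on_focus_lost("a.odt")
router.on_global_event("y")  # no document has focus, so this is dropped
router.on_focus_gained("b.odt")
router.on_global_event("z")
print(router.trackers["a.odt"].events, router.trackers["b.odt"].events)
```

A single router instance corresponds to the global listeners being created once, while each DocumentTracker instance corresponds to one log file.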
Quality control
Previous versions of the software have been subjected to several user experience tests using different configurations and keyboards. In addition, the accuracy of the event-related time stamps was checked by setting up simultaneous logging sessions (using Inputlog and Scriptlog in parallel).
The testing procedure we conducted focused on the following areas:
Usability: Assessing the transparency and user-friendliness of the extension’s interface and logging process, including installation and data analyses, using think aloud protocols.
Timing accuracy: Evaluating the temporal accuracy of the LibreOffice extension by comparing its timestamps with those from Scriptlog and Inputlog, following the methodology outlined in Frid et al. [16].
Keystrokes: Conducting a systematic analysis of categorized keystroke logs, covering basic keyboard input, special characters, keyboard shortcuts, mouse clicks, and mouse selections.
Correspondence of XML data logfile: Systematically comparing the XML logfile generated by Inputlog-LibreOffice with the corresponding logfile syntax in Inputlog to ensure accuracy and consistency.
The results were fully in line with the other loggers (i.e., +/–7 ms accuracy).
The changes made in the code are automatically checked by the Continuous Integration workflows. Regular code reviews, integration tests and also benchmarking with the original Inputlog were performed in order to improve the code quality and compatibility.
(2) Availability
Operating system
The extension has been optimised and tested for Windows 10, but also functions on Windows 11, Linux and Mac.
Programming language
Java 15, Java 18, possibly newer versions as well
Additional system requirements
N/A
Dependencies
LibreOffice version 7.1.6 or higher
Java Runtime Environment 8u281 or higher
List of contributors
Software was developed by Resoft Labs: Faruk Diblen, Jisk Attema and Jason Maassen
Software location
Working software: https://extensions.libreoffice.org/en/extensions/show/70047
Archive & Code Repository
Name: Inputlog-LibreOffice
URL: https://gitlab.huc.knaw.nl/trackchanges/inputlog-libreoffice
Persistent identifier: DOI 10.5281/zenodo.10788834
Licence: CC-By
Publisher: Floor Buschenhenke
Contact: <floor@buschenhenke.nl>
Version published: 1.0.0
Date published: 06/03/2024
Project homepage: http://www.inputlog.net
Language
English
(3) Reuse potential
The extension can be used in the context of writing and translation research. It can also be used by people who wish to implement a systematic registration of their writing processes. Within the literary heritage sector there has recently been an interest [17, 18] in pro-active collection of writing materials. This interest stems from the realisation that digital self-archiving practices are often inconsistent and prone to accessibility issues. The logging tool could be used to elicit high quality, sustainable archival material.
One possible extension of the software was already mentioned above; it could be combined with a Winlog functionality, so that activities outside of LibreOffice are also captured. Furthermore, the software is currently built for Latin scripts but could be adjusted to work on other scripts like Chinese, Korean, Cyrillic, Arabic, Devanāgarī (as far as these are supported by LibreOffice).
Acknowledgements
The software development was commissioned by the research project Track Changes: textual scholarship and the challenge of digital literary writing, a collaboration between Huygens Institute and the University of Antwerp. The project’s PI was Karina van Dalen-Oskam, and the software development was coordinated by Floor Buschenhenke and Luuk van Waes.
Consultation by Erik van Horenbeeck (technical coordinator of Inputlog-MS Word).
Additional testing by Alessandra Rossetti, Hayco de Jong, Hennie Brugman, Lamyk Bekius.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
This paper was written and revised by Floor Buschenhenke and Luuk van Waes, making use of textual input and feedback from Faruk Diblen, Jason Maassen and Jisk Attema.
