Vertical error correction of eye trackers in nonrestrictive reading condition
© The Author(s) 2016
Received: 24 February 2016
Accepted: 21 April 2016
Published: 14 September 2016
Eye tracking technology has been used for four decades to study reading behavior. The applications are various: estimating reader comprehension, identifying the reader, summarizing a read document, creating a reading-life log, etc. The gaze data used in such applications must be accurate enough to support the analysis. To improve accuracy, most experiments are set up under restrictive conditions, such as using a head fixation and a professional eye tracker. This implies that the results are valid only in restrictive laboratory settings, where the experiment produces an unrealistically small error. However, using affordable eye trackers in realistic reading conditions leads to large errors in the recordings. We propose a new algorithm to correct the vertical error and align the gazes with the text. Unlike all other algorithms in the state of the art, the proposed algorithm is robust to rereading and to skipping parts of the text. We show that up to 69 % of the gazes are aligned with the correct text lines.
Since the first studies conducted by Rayner  around 40 years ago, the analysis of reading behavior with eye tracking systems has become widely popular, especially in the last 10 years, thanks to the development of affordable eye trackers. Rayner has shown that the movements of the eyes while reading can be divided into two main categories: fixations and saccades. Fixations are the short stops on words during reading, which last about 250 ms, and saccades are the quick movements of the eyes between two fixations. By recording the sequence of fixation positions, the reading behavior can be analyzed and different services provided to the user.
For instance, the eye gaze can be used for creating a “reading-life log” (Augereau et al. ). The idea is to record the words read in daily life with an eye tracker and to save them into a log file. Then, the user can search information in his reading history, count how many words he reads per day, analyze what kind of text he reads, etc. The eye gaze position can also be used for creating a summary of a text based on the reading behavior , by measuring the reader’s attention to every word in the document. The attention on a word is measured as the number of gazes on that word. Some other applications are also possible, such as detecting the understanding of a text (Kunze et al. ) or providing a real-time translation of a word (Hyrskykari et al. ).
There are two main solutions to compensate for the error: (1) using specific recording conditions and (2) post-processing the signal. To limit the inaccuracy of eye tracking systems, researchers have controlled the conditions of the experiments by using head fixations, bite bars, etc. However, imposing such strict experimental conditions confines the usage of eye trackers to a laboratory environment. Because our aim is to analyze reading behavior in natural and realistic reading conditions, we choose to focus on a post-processing algorithm to correct this problem. In this paper, we propose an algorithm to automatically estimate and correct the vertical error of the eye tracker.
Unlike the other methods from the state of the art, our method can be used in a “nonrestrictive” reading situation, i.e., even if some parts of the text are skipped or reread. For the experiments we use the Tobii EyeX eye tracker, which is available for $139. This implies that the error we deal with is considerably larger than in experiments using a chin rest or a professional eye tracker. However, working with an inexpensive eye tracker is important in order to make reading analysis algorithms available to a large community.
The rest of the paper is organized as follows. First, we present some related work. In the following section, we explain the different steps of the proposed algorithm. The fixations are extracted from the eye gazes and segmented into a set of “fixation lines”, where a fixation line corresponds to the reading of one text line. Then, the vertical error is coarsely estimated and the fixation lines are matched with the text lines. In the next section, we present the algorithm proposed by Yamaya et al.  for correcting the vertical error and compare it to our method. After this, we show through experiments that we can align 69 % of the eye gaze lines with their corresponding text lines and compare the results with the state-of-the-art method. Finally, we conclude and discuss future work.
2 Related work
Formerly, researchers dealt with the inaccuracy of the eye tracker by using a chin rest , a high-technology eye tracker , a text with a large line spacing , or a large font size . Only a few recent studies have been dedicated to correcting the vertical error of the eye tracker by post-processing the recordings.
Hyrskykari  proposed a method to sequentially map the fixations to the text lines. In this method, the first fixation is mapped to the closest word, and the following fixations are then aligned with the same text line. Martinez-Gomez et al.  proposed to correct the error by using a global text-gaze alignment. In this method, the gazes are represented as a scatter plot and the words are represented as boxes. The aim of the algorithm is to find the global transformation which best aligns the gazes with the boxes. The most recent work is that of Yamaya et al. . Their idea is to perform a global alignment between the fixation lines and the text lines. The constraint of reading each line one by one, respecting the order of the text (from the first to the last line), is used to perform the alignment.
However, these methods rely on one or more restrictive assumptions:
- Rereading, skimming, or skipping parts of the text strongly affects the algorithms.
- The vertical error cannot be greater than one line spacing.
- A chin rest or a professional eye tracker is used.
To assess the performance of the proposed method, we implemented the algorithm of Yamaya et al. , which is the most recent method for correcting the vertical error of an eye tracker. The details of this algorithm and its differences from the proposed method are described in Section 4.
In the next section, we present our algorithm.
3 Vertical error correction
Our problem is formulated as finding the optimal function g_i which associates each fixation line with its corresponding text line. In order to be robust to rereading or skipping parts of the text, the proposed algorithm aligns each fixation line individually with its corresponding text line.
The algorithm consists of three steps:
1. Creating the fixation lines
2. Estimating and coarsely correcting the vertical error
3. Performing a precise (fixation line; text line) matching
In more detail, the algorithm is as follows. First, we extract the fixations and segment them into fixation lines. For each fixation line, we compute a matching score with the n_1 nearest text lines. By using this score, some matchings are selected and used for coarsely estimating the global vertical error. All fixations are then translated accordingly. The final step consists of matching each translated fixation line with the best candidate among the n_2 nearest text lines to obtain the final vertical positions of the fixations. Note that, after coarsely correcting the vertical error, the fixation lines are closer to the text lines, so we can choose n_2 < n_1 in order to improve the final matching step.
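As an illustration of the fixation extraction mentioned above, a dispersion-threshold (I-DT) detector can be used to turn raw gaze samples into fixations. The paper does not specify which detection method is used, so the following sketch, including its duration and dispersion thresholds, is only an assumption:

```python
def detect_fixations(samples, min_duration_ms=100, max_dispersion=30):
    """Dispersion-threshold (I-DT) fixation detection.

    samples: list of (t_ms, x, y) gaze samples in temporal order.
    Returns a list of fixations as (t_start, t_end, cx, cy).
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow a window covering at least min_duration_ms.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break
        window = samples[i:j + 1]
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) <= max_dispersion:
            # Extend the window while the dispersion stays under threshold.
            while j + 1 < n:
                xs.append(samples[j + 1][1])
                ys.append(samples[j + 1][2])
                if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                    xs.pop()
                    ys.pop()
                    break
                j += 1
            fixations.append((samples[i][0], samples[j][0],
                              sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1
        else:
            i += 1  # a saccade sample: slide the window forward
    return fixations
```

Each detected fixation keeps its start and end timestamps and its centroid; the centroids are the input of the fixation line segmentation.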
3.1 Creating the fixation lines
where P is a positive integer, chosen large enough so that a short rereading is not detected as the reading of a new line.
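The segmentation criterion itself is not reproduced above. A sketch consistent with the role of P is the following: a large leftward jump (return sweep) starts a candidate fixation line, and candidate lines shorter than P fixations are merged back into the previous line, so that a short regression is not mistaken for a new line. The jump threshold and the default value of P are assumptions:

```python
def segment_fixation_lines(fixations, P=3, sweep_threshold=200):
    """Split a sequence of fixation centers into fixation lines.

    fixations: list of (x, y) fixation centers in reading order.
    A new candidate line starts at a large leftward jump (return sweep);
    candidate lines with fewer than P fixations are merged back into the
    previous line so that short regressions do not create spurious lines.
    """
    if not fixations:
        return []
    lines = [[fixations[0]]]
    for prev, cur in zip(fixations, fixations[1:]):
        if prev[0] - cur[0] > sweep_threshold:  # large jump to the left
            lines.append([cur])
        else:
            lines[-1].append(cur)
    # Merge too-short candidate lines into their predecessor.
    merged = [lines[0]]
    for line in lines[1:]:
        if len(line) < P:
            merged[-1].extend(line)
        else:
            merged.append(line)
    return merged
```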
3.2 Estimating and correcting coarsely the vertical error
The aim of this step is to match a fixation line with a text line and to assign a score to this matching. Because we do not know which text line corresponds to which fixation line, for a given fixation line we select the n_1 nearest text lines along the vertical axis and compute the matching scores. The score is computed with the dynamic time warping (DTW) algorithm, a sequence alignment algorithm. The input is a pair of sequences (L_t; L_f) we want to match, and the DTW algorithm thus provides n_1 scores, one for each matching (L_f(i); L_t(i)). The matching is based on the positions of the words in a text line. If two text lines have different lengths or if their words are distributed at different positions, the matching can discriminate between them well. Consequently, the larger n_1, the more confusion there can be between the matchings. Unfortunately, because the error of the eye tracker is large, we cannot choose a small value for n_1 before correcting the vertical translation.
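A minimal sketch of such a DTW score is given below, using the horizontal distance between a fixation and a word center as the local cost; the exact cost function of the paper is not specified, so this choice is an assumption:

```python
def dtw_score(fix_xs, word_xs):
    """Dynamic time warping cost between a fixation line and a text line.

    fix_xs:  x-coordinates of the fixations in one fixation line.
    word_xs: x-coordinates of the word centers in one text line.
    Lower scores mean better matches.
    """
    inf = float("inf")
    n, m = len(fix_xs), len(word_xs)
    # dp[i][j]: minimal cumulative cost aligning fix_xs[:i] with word_xs[:j]
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(fix_xs[i - 1] - word_xs[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # extra fixation
                                  dp[i][j - 1],      # skipped word
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]
```

For a given fixation line, this score is computed against each of the n_1 vertically nearest text lines; the line with the lowest cumulative cost is the best candidate.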
To estimate the global vertical translation, we want to keep only the pairs (L_f(i); L_t(i)) which are correctly matched, so we look for the non-ambiguous matchings. For each fixation line, the difference between the best matching score s_1 and the second-best matching score s_2 is computed: D = s_1 − s_2. If this difference is large enough (D > T), we consider the matching to be correct. T is a threshold which defines the strictness of the selection. If T is set to a low value, almost all the matchings of the previous step are kept, including some fixation lines matched with wrong text lines. If T is set to a high value, few matchings are selected, but they are less ambiguous.
After selecting the matchings, we estimate the vertical error. For each pair (L_f(i); L_t(i)), we compute the distance in pixels between the y-coordinate of the fixation line (Y_F) and the y-coordinate of the corresponding text line (Y_T), which gives a translation G = Y_F − Y_T. We then compute G_a as the average of all these vertical translations and apply the translation G_a to all the fixations.
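The selection and averaging described above can be sketched as follows. Scores are treated here as DTW costs (lower is better), so the non-ambiguity gap is computed as the second-best cost minus the best cost; the default threshold T is an assumption:

```python
def estimate_vertical_error(matchings, T=5.0):
    """Estimate the global vertical translation G_a.

    matchings: list of (scores, y_fix, y_cands) tuples, one per fixation
      line, where scores[k] is the matching cost with the k-th candidate
      text line and y_cands[k] its y-coordinate (lower cost = better).
    Keeps only the non-ambiguous matchings, i.e. those whose gap between
    the best and second-best cost exceeds T, and returns the average
    offset G_a = mean(Y_F - Y_T) over the retained pairs.
    """
    offsets = []
    for scores, y_fix, y_cands in matchings:
        order = sorted(range(len(scores)), key=scores.__getitem__)
        best, second = order[0], order[1]
        if scores[second] - scores[best] > T:  # non-ambiguous matching
            offsets.append(y_fix - y_cands[best])
    if not offsets:
        return 0.0
    return sum(offsets) / len(offsets)
```

Subtracting the returned G_a from the y-coordinate of every fixation brings the fixation lines close to their text lines before the final matching step.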
3.3 Performing a precise (fixation line; text line) matching
After correcting the vertical error, the fixation lines are closer to their corresponding text lines. For each fixation line, we vertically align all its fixations with the best text line by using exactly the same method as in the previous step. The only difference is that a smaller number n_2 of nearest text lines is considered for the matching. In this step, the matching score is not used to estimate the vertical error; we simply align the fixations with the matched text lines.
4 Comparison with the state of the art
In this section, we present the method of Yamaya et al.  for correcting the vertical error of an eye tracker during reading, and we point out the differences with our method.
4.1 Similarity measure and global alignment
The similarity measure between a fixation line s_k and a text line t_l is based on two features:
- The difference in length between s_k and t_l
- The distance between s_k and t_l
The input of the sequence alignment algorithm is the whole sequence S and the whole sequence T. The algorithm then tries to find the best alignment between S and T by using the similarity measure.
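A generic order-preserving global alignment of this kind can be sketched with a Needleman-Wunsch-style dynamic program. This is an illustration of the general approach, not Yamaya et al.'s exact formulation, and the gap penalty is an assumption:

```python
def global_align(S, T, sim, gap=-1.0):
    """Order-preserving global alignment of fixation lines with text lines.

    S: sequence of fixation lines; T: sequence of text lines.
    sim(s, t) returns the similarity between a fixation line and a text
    line (higher is better); gaps model skipped or unmatched lines.
    Returns the best total similarity of a monotone alignment of S with T.
    """
    n, m = len(S), len(T)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sim(S[i - 1], T[j - 1]),  # match
                           dp[i - 1][j] + gap,   # unmatched fixation line
                           dp[i][j - 1] + gap)   # skipped text line
    return dp[n][m]
```

Because the recursion only moves forward through both sequences, any rereading or out-of-order reading breaks the monotonicity this alignment assumes, which is exactly the limitation discussed below.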
4.2 Differences with the proposed method
In the proposed algorithm, the input of the sequence alignment algorithm is a pair (text line; fixation line) we want to match. The main feature used is the distance between a fixation and a word. The process is repeated for each fixation line, so the lines are processed independently and the reading order does not matter. At each step, one fixation line is compared with several text lines: the alignment is local. In other words, with the same notation, suppose that for each fixation line s_i we can find the corresponding text line t_j. The problem solved by our algorithm can then be formulated as: ∀ s_i ∈ S, find the local function g_i such that g_i(s_i) = t_j.
The method proposed by Yamaya et al.  takes into account the sequence order of the fixation lines. As a consequence, for specific reading behaviors where the reading order is preserved, the algorithm performs well. However, in the case of normal reading behavior, the reading order of the text lines cannot be predicted. The reader does not necessarily start from the first line and read each line one by one until the last one; he can start, stop, skip, or reread any line of the text. Such natural behavior greatly degrades the performance of Yamaya's algorithm. On the contrary, because the fixation lines are processed independently in the proposed method, its performance is not affected by such behaviors. In the next section, we present our experiment and results to illustrate this point.
5 Experimental results
In the experiments, seven subjects were asked to read eight different texts. The eye tracker employed in these experiments is a stationary Tobii EyeX Controller.
E1: read the text forward without rereading or skipping any part of it.
E2: read only the second paragraph of a text.
E3: read the two paragraphs of the text and reread the first one.
E4: read the second paragraph before the first one.
5.1 Ground truth
In a normal reading situation, some parts of the texts are skipped or reread. However, in order to create the ground truth, we asked the subjects to reread and skip only the parts of the text planned by the experiment.
5.2 Experiment results
The results are presented in two parts. First, we present the results for cases E1 and E2; second, the results for cases E3 and E4. We compare the accuracy of our algorithm with Yamaya et al.'s for all these reading behaviors. The accuracy is computed as the percentage of fixation lines matched with the correct corresponding text lines. The percentages are averaged over all seven readers and two texts per experiment.
5.2.1 Preserved reading order: E1 and E2
Percentages of good matching with both algorithms in the case of reading without rereading or skipping (E1) and with skipping one paragraph (E2)
5.2.2 Rereading and different reading order: E3 and E4
Percentages of good matching with both algorithms in case of rereading (E3) and reading in a different order (E4)
These differences are mainly explained by the fact that the algorithm proposed by Yamaya et al. uses the reading order to match the fixation lines with the text lines. In case E4, the reading order is not preserved because readers were asked to read paragraph 2 first. In the rereading case (E3), the algorithm proposed in this paper also performs better than Yamaya's method. However, Yamaya's method still manages to align around 60 % of the fixation lines with the corresponding text lines: it successfully matches paragraphs 1 and 2 but fails on most of the fixation lines corresponding to the rereading. On the contrary, our algorithm performs well in the case of rereading and/or skipping behaviors because each fixation line is matched with a text line independently. As a consequence, with our algorithm, the reader can read the lines in any order.
5.3 General discussion
Percentages of good matching with both algorithms in case of reading with preserved order (E1, E2) or not (E3, E4) and global performances with all the recordings (all texts)
However, our algorithm is still not capable of matching all the fixation lines with their corresponding text lines. The remaining error can be explained by the fact that the positions and number of the fixations do not correspond exactly to the distribution of words in the text. In particular, if the reader unintentionally skips many words (e.g., short function words such as "the" or "of"), the performance of the algorithms is affected.
In the algorithm, we do not use the reading order: the reader can read the lines in any order. However, during natural reading, we do not read the lines randomly. We could therefore find a compromise between imposing a constraint on the reading order (as in Yamaya's method) and using no constraint at all (as in the proposed method) in order to improve the performance of the algorithm. To do so, we would need a probabilistic model of the reading order in natural reading behavior.
We have presented a method for correcting the vertical error of an eye tracker under natural reading conditions. Our method can be used to enhance the input of reading analysis algorithms based on eye gaze positions. In our experiment, we have shown that our algorithm is capable of matching on average 69 % of the fixation lines with their text lines. Contrary to the state-of-the-art methods, our method is robust to normal reading behaviors such as skipping, rereading, or reading the text lines in a different order. Moreover, in our experiment we used an inexpensive eye tracking system, which produces a larger error.
However, our algorithm is still not capable of matching all the fixation lines with the corresponding text lines. To deal with the remaining error, we plan to use a saccade and fixation model such as SWIFT  or the E-Z Reader  to generate some fixation distributions corresponding to a given text. Then, by using our alignment method, we will compare the fixation distribution of the reader with this model.
This work was supported in part by the JST CREST and the JSPS Kakenhi Grant Numbers 25240028 and 15K12172.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Augereau O, Kise K, Hoshika K (2015) A proposal of a document image reading-life log based on document image retrieval and eye tracking. In: Document Analysis and Recognition (ICDAR), 2015 13th International Conference on, 246–250. IEEE, doi:10.1109/ICDAR.2015.7333761
- Biedert R, Buscher G, Dengel A (2010) The eyebook: using eye tracking to enhance the reading experience. Informatik-Spektrum 33(3): 272–281
- Biedert R, Hees J, Dengel A, Buscher G (2012) A robust realtime reading-skimming classifier. In: Proceedings of the Symposium on Eye Tracking Research and Applications, 123–130. ACM, doi:10.1145/2168556.2168575
- Calvi C, Porta M, Sacchi D (2008) e5Learning, an e-learning environment based on eye tracking. In: Advanced Learning Technologies, 2008. ICALT’08. Eighth IEEE International Conference on, 376–380. IEEE, doi:10.1109/ICALT.2008.35
- Engbert R, Nuthmann A, Richter EM, Kliegl R (2005) SWIFT: a dynamical model of saccade generation during reading. Psychol Rev 112(4): 777
- Hyrskykari A (2006) Utilizing eye movements: overcoming inaccuracy while tracking the focus of attention during reading. Comput Hum Behav 22(4): 657–671
- Hyrskykari A, Majaranta P, Aaltonen A, Räihä K (2000) Design issues of iDict: a gaze-assisted translation aid. In: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications, 9–14. ACM, doi:10.1145/355017.355019
- Kunze K, Kawaichi H, Yoshimura K, Kise K (2013) Towards inferring language expertise using eye tracking. In: CHI’13 Extended Abstracts on Human Factors in Computing Systems, 217–222. ACM, doi:10.1145/2468356.2468396
- Martinez-Gomez P, Chen C, Hara T, Kano Y, Aizawa A (2012) Image registration for text-gaze alignment. In: Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces, 257–260. ACM, doi:10.1145/2168556.2168575
- O’Brien S (2009) Eye tracking in translation process research: methodological challenges and solutions. Methodol Technol Innov Translation Process Res 38: 251–266
- Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124(3): 372
- Reichle ED, Rayner K, Pollatsek A (2003) The E-Z Reader model of eye-movement control in reading: comparisons to other models. Behav Brain Sci 26(04): 445–476
- Stampe DM, Reingold EM (1995) Selection by looking: a novel computer interface and its application to psychological research. Stud Vis Inf Process 6: 467–478
- Xu S, Jiang H, Lau F (2009) User-oriented document summarization through vision-based eye-tracking. In: Proceedings of the 14th International Conference on Intelligent User Interfaces, 7–16. ACM, doi:10.1145/2168556.2168575
- Yamaya A, Topić G, Martínez-Gómez P, Aizawa A (2015) Dynamic-programming-based method for fixation-to-word mapping. In: Intelligent Decision Technologies, 649–659. Springer, doi:10.1007/978-3-319-19857-6_55