Eye-tracker data in information seeking tasks on texts (in French)

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://zenodo.org/record/4655839

Resource description:

Description of the eye-movement data from the reading task (the so-called “Paris experiment”)

File: em-y35-fasttext.csv

The experiment is described in Frey et al. (2013). In summary:

Twenty-one healthy adults, all native French speakers, participated in the experiment. The data of six participants were discarded because they did not follow the rules of the experiment thoroughly or because the eye-tracker recordings were too noisy. The whole experiment was reviewed and approved by the ethics committee of Grenoble CHU (“Centre Hospitalier Universitaire”) (RCB: n° 2011-A00845-36).

180 short texts were extracted from the 1999 edition of the French newspaper Le Monde. Each text was assigned a topic, and the texts were of three types: highly related to the topic (HR, “f” in French), moderately related to the topic (MR, “m” in French), or unrelated to the topic (UR, “a” in French). There were 60 texts of each type, hence 180 in total. The semantic relatedness of each text to its topic was controlled by Latent Semantic Analysis (LSA).

The participants' goal was to decide, as early as possible during reading, whether or not the text was related to a given topic. First, the topic was presented and the participant clicked to start the trial. A fixation cross was then presented to the left of the first character of the first line, to stabilize the gaze location at the beginning of the text. Once the text was displayed, participants read it and had to click the mouse as fast as possible to stop reading; on a separate screen, they then indicated whether or not the text was related to the topic. This procedure was repeated for the 180 texts, with breaks in between.

During trials, the eye tracker recorded the screen position and the duration of each fixation. The minimum fixation duration threshold was set to 80 ms and the maximum to 600 ms. All fixations outside these limits were removed from all analyses.
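The duration thresholds above can be illustrated with a minimal sketch. It assumes fixations are held as dictionaries with a hypothetical `duration_ms` key, and that both limits are inclusive; the source does not say whether fixations of exactly 80 ms or 600 ms were kept.

```python
MIN_FIX_MS = 80    # minimum fixation duration stated in the description
MAX_FIX_MS = 600   # maximum fixation duration stated in the description

def keep_fixation(duration_ms):
    """Return True if the duration lies within the limits (inclusive bounds are an assumption)."""
    return MIN_FIX_MS <= duration_ms <= MAX_FIX_MS

def filter_fixations(fixations):
    """Keep only fixations whose 'duration_ms' (hypothetical key) is within the limits."""
    return [f for f in fixations if keep_fixation(f["duration_ms"])]

# Made-up example values, not taken from the dataset.
example = [{"duration_ms": 45}, {"duration_ms": 80}, {"duration_ms": 230}, {"duration_ms": 750}]
kept = filter_fixations(example)
```

In this made-up example only the 80 ms and 230 ms fixations survive the filter.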
It was then necessary to determine, a posteriori, which word the participant was processing at each fixation. First, the word identification span is defined as the area within which a word can be identified. This span varies with the reading direction, the alphabet, and the language, and can also depend on the micro-context. For simplicity, we used a fixed span considered valid for most Latin languages (Rayner, 1998): an asymmetrical window of 4 characters to the left and 8 characters to the right of the fixation. A word may not be located entirely within the word identification span. We considered a word to be processed if at least 1/3 of its beginning or 2/3 of its end was inside the window. This criterion is language sensitive and only valid for French. Finally, since several words might fall within the window, a further assumption was needed to choose the processed word: we assumed that only one word could be processed during a given fixation, namely the word closest to the fixation centre, excluding stop words. Consequently, one word per fixation was selected.

With this word assignment, features characterizing the reading strategy were defined. Each fixation was associated with its outgoing saccade. The data associated with each fixation were the fixation duration, the fixated word, the saccade amplitude expressed in degrees of visual angle, the number of words crossed between two fixations, and the saccade duration. The saccade was characterized by this number of crossed words, which is negative for a backward progression, zero for a refixation, and positive for a forward progression.

The texts are stored as .png files. The whole set of eye movements is stored in the .csv file, with one line per fixation and its associated outgoing saccade (except for the last fixation). For each trial, a topic is proposed and then a text.
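The word selection rule described earlier in this section (asymmetrical window, partial-coverage criterion, closest non-stop word) can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: the fixation is given as a character index on a single line of text, "1/3 of its beginning" is read as the leading characters of the word covering at least a third of its length, "2/3 of its end" as the trailing characters covering at least two thirds, distance is measured to the nearest character of the word, and the stop-word list is a small made-up sample.

```python
import re

LEFT_SPAN = 4    # characters to the left of the fixation (from the description)
RIGHT_SPAN = 8   # characters to the right of the fixation (from the description)
STOP_WORDS = {"le", "la", "les", "de", "du", "des", "un", "une", "et"}  # illustrative sample only

def words_with_spans(line):
    """Return (word, start, end) triples for one line of text, end exclusive."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", line)]

def is_processed(start, end, fix_pos):
    """Apply the 1/3-of-beginning or 2/3-of-end criterion (as interpreted above)."""
    lo, hi = fix_pos - LEFT_SPAN, fix_pos + RIGHT_SPAN  # inclusive window bounds
    length = end - start
    # Characters inside the window, counted from the first letter of the word.
    lead = min(end, hi + 1) - start if lo <= start <= hi else 0
    # Characters inside the window, counted back from the last letter of the word.
    trail = end - max(start, lo) if lo <= end - 1 <= hi else 0
    return 3 * lead >= length or 3 * trail >= 2 * length

def select_word(line, fix_pos):
    """Return the processed non-stop word closest to the fixation, or None."""
    candidates = [
        (word, start, end)
        for word, start, end in words_with_spans(line)
        if is_processed(start, end, fix_pos) and word.lower() not in STOP_WORDS
    ]
    if not candidates:
        return None

    def distance(candidate):
        _, start, end = candidate
        if start <= fix_pos < end:
            return 0  # the fixation falls on the word itself
        return min(abs(fix_pos - start), abs(fix_pos - (end - 1)))

    return min(candidates, key=distance)[0]

line = "Le gouvernement annonce une mesure"  # made-up sentence, not from the corpus
word_on_stop_word = select_word(line, 25)    # fixation lands on the stop word "une"
```

Because the stop word "une" is excluded, a fixation on it is attributed to a neighbouring content word that satisfies the coverage criterion. Elided forms such as "l'" are split at the apostrophe by this simple tokenizer, which a real pipeline would need to handle.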
Three possibilities arise for the relation between text and topic: the text is semantically closely related to the topic (“f” category), the text is moderately related to the topic (“m” category), or the text has nothing at all to do with the topic (“a” category). The subject had to answer the question “Is the text related to the topic?” as fast as possible. For each topic, six texts were presented (2 “f”, 2 “m”, 2 “a”). The answer was positive with very high frequency (> 95%) in the first case and negative in the third case, while positive and negative answers were equally likely in the intermediate case. In the .csv file, only the trials “xxx_f1” and “xxx-f2”, with close semantic proximity to the topic, are present.

Not every fixation on the scanpath was considered:

- the first fixation is never considered, since it has a particular status;
- the last fixation is never considered, since the characteristics of the associated outgoing saccade are not defined;
- fixations that are too short are never considered (duration less than 80 ms: the information is not recorded).

The first column is the number of the fixation (starting from 0), the second column is the subject identifier, the third column is the text identifier (alphabetic ordering among the different texts), and the fourth column is the text name.
The other characteristics associated with each fixation are:

- 'ANSWER'
- 'FIX_NUM': number of the fixation within the current scanpath
- 'FIX_LATENCY': elapsed time since the beginning of recording in the current scanpath
- 'X': x-axis coordinate of the fixation (gaze) within the text image
- 'Y': y-axis coordinate of the fixation (gaze) within the text image
- 'FDUR': fixation duration in ms
- 'OFF_DUR'
- 'SACAMP': outgoing saccade amplitude in pixels
- 'SACOR': outgoing saccade amplitude in pixels
- 'INEEG': 1 if some electro-encephalogram was recorded during the fixation, 0 otherwise
- 'ISFIRST': indicates whether the current fixation is the first within the trial (Boolean)
- 'ISLAST': indicates whether the current fixation is the last within the trial (Boolean)
- 'READMODE': reading mode categorized into 5 classes (see below)
- 'WINC': increment in the number of words required to reach the next fixation from the current fixation
- 'CINC': increment in the number of characters required to reach the next fixation from the current fixation
- 'FIXED_WORD': word in the text currently considered as being fixated
- 'FIXED_WINDOW': set of words in the text currently potentially being fixated
- 'WORD_FREQUENCY': frequency of the currently fixated word with respect to a large corpus of French words
- 'COSINST': measure of the LSA semantic similarity between the target topic and the currently fixated word. This is an instantaneous similarity for each fixated word, computed as a cosine (between 0 and 1)
- 'COSCUM': measure of the LSA semantic similarity between the target topic and all the words read up to the current fixation. This is a cumulative similarity
- 'SACDIR': direction of the outgoing saccade (forward, backward, upward, downward or last)
- 'NEW_READ_WORDS': number of words read between two successive fixations, excluding words fixated during previous fixations
- 'TEXT_TYPE': type of text among 'a', 'f', 'm'
- 'COS_INST_FASTTEXT_2016': same as COSINST but using the FastText representation of Joulin et al. (2016) instead of LSA
- 'COS_CUM_FASTTEXT_2016': same as COSCUM but using the FastText representation of Joulin et al. (2016) instead of LSA
- 'WFREQ_RANK_FASTTEXT_2016': rank of the currently fixated word when words are ordered by decreasing frequency within the training corpus of Joulin et al. (2016)
- 'COS_INST_FASTTEXT_2018': same as COSINST but using the FastText representation of Mikolov et al. (2018) instead of LSA
- 'COS_CUM_FASTTEXT_2018': same as COSCUM but using the FastText representation of Mikolov et al. (2018) instead of LSA
- 'WFREQ_RANK_FASTTEXT_2018': rank of the currently fixated word when words are ordered by decreasing frequency within the training corpus of Mikolov et al. (2018)
- 'WFREQ_RANK_FASTTEXT_1618': mean of 'WFREQ_RANK_FASTTEXT_2016' and 'WFREQ_RANK_FASTTEXT_2018'
- 'COS_INST_FASTTEXT_1618': mean of 'COS_INST_FASTTEXT_2016' and 'COS_INST_FASTTEXT_2018'
- 'COS_CUM_FASTTEXT_1618': mean of 'COS_CUM_FASTTEXT_2016' and 'COS_CUM_FASTTEXT_2018'
- 'TEXT_TYPE_2': refinement of 'TEXT_TYPE' in which 'f' texts are split into 'f+' (at least one word of the target topic appears in the text) and 'f' (the other texts with 'TEXT_TYPE' 'f')

READMODE is defined from WINC as:

- 0 if WINC >= 2
- 1 if WINC = 1
- 2 if WINC = 0
- 3 if WINC = -1
- 4 if WINC <= -2

References:

Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.

Mikolov, T., Grave, E., Bojanowski, P., Puhrsch, C., and Joulin, A. (2018). Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
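The mapping from WINC to READMODE given in the column description can be written as a small function, which may be useful for checking the READMODE column against WINC. This is a sketch: the helper names are hypothetical, rows are assumed to be dictionaries keyed by column name, and WINC is assumed to be stored as an integer-valued number or numeric string.

```python
from collections import Counter

def read_mode(winc):
    """Map a WINC value (signed word increment) to its READMODE class."""
    if winc >= 2:
        return 0  # forward by two or more words
    if winc == 1:
        return 1  # forward by one word
    if winc == 0:
        return 2  # refixation of the same word
    if winc == -1:
        return 3  # backward by one word
    return 4      # backward by two or more words (WINC <= -2)

def tally_read_modes(rows):
    """Count READMODE classes over rows given as dicts with a 'WINC' entry."""
    return Counter(read_mode(int(float(row["WINC"]))) for row in rows)

# Made-up rows, shaped like what csv.DictReader would yield; not real data.
sample_rows = [{"WINC": "1"}, {"WINC": "3"}, {"WINC": "0"}, {"WINC": "-2"}, {"WINC": "1"}]
mode_counts = tally_read_modes(sample_rows)
```

To apply this to em-y35-fasttext.csv, the rows could be read with `csv.DictReader` from the standard library; the delimiter and exact header spelling of the file are not stated in the description and would need to be checked first.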

Created:

2021-04-13
