The codebook specifies the data types, possible values, and other information for each column in the data files.
- Word features
- Stimuli and comprehension questions
- Items
- Areas of interest (AOI)
- Dependency trees
- Fixations
- Scanpaths
- Reading measures
- Reading measures merged
- Scanpaths merged
- AOI to word mapping
- Participants
- Participant's response accuracy
- Coding online survey
- Participant's response accuracy online survey
- Response data online survey
Contains the word features for each of the stimulus texts.
Please find the files at this link: Word features
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| word | | String | Words as they appear in the stimulus texts. Words are split at whitespace. | 0 | nan | nan |
| word_with_punct | | String | The word as it appears in the text, including punctuation. | 0 | nan | nan |
| word_index_in_text | 1-180 | Integer | The index of the word in the text. Indexing starts at 1. | 0 | nan | nan |
| word_index_in_sent | 1-51 | Integer | The index of the word in the sentence. Indexing starts at 1. | 0 | nan | nan |
| sent_index_in_text | 1-12 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 0 | nan | nan |
| word_limit_char_indices | no stats? | | Specifies the limits of each word in character indices. Format: [word_start],[word_end]. For example: 3,7 means a word starts at character index 3 in the text and ends at character index 7. The properties of the character indices are specified in char_index_in_text. | 0 | nan | nan |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| text_domain | biology: 954, physics: 941 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| text_domain_numeric | 0: 954, 1: 941 | Categorical | Numerical value of text_domain; 0=biology, 1=physics. | 0 | nan | Manually created |
| word_length | 2-33 | Integer | Word length in number of characters, including symbols such as hyphens but excluding sentence punctuation at the end (e.g., z.B. = 4 characters; DNA-Kette = 9 characters; eats. = 4 characters). | 0 | nan | nan |
| STTS_punctuation_before | nan: 1883, $(: 12 | Categorical | If a word is preceded by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. | 1883 | nan | Manually tagged |
| STTS_punctuation_after | nan: 1689, | Categorical | If a word is followed by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. | 1689 | nan | Manually tagged |
| is_in_quote | 0: 1881, 1: 14 | Categorical | Whether or not the word is part of an expression in quotes. | 0 | nan | Manually tagged |
| is_in_parentheses | 0: 1890, 1: 5 | Categorical | Whether or not the word is part of a phrase in parentheses. | 0 | nan | Manually tagged |
| is_clause_beginning | 0: 1796, 1: 99 | Categorical | Whether or not the word is the beginning of a clause. | 0 | nan | Manually tagged |
| is_sent_beginning | 0: 1798, 1: 97 | Categorical | Whether or not the word is the beginning of a new sentence. | 0 | nan | Manually tagged |
| is_clause_end | 0: 1797, 1: 98 | Categorical | Whether or not the word is the end of a clause. | 0 | nan | Manually tagged |
| is_sent_end | 0: 1798, 1: 97 | Categorical | Whether or not the word is the end of a sentence. | 0 | nan | Manually tagged |
| is_abbreviation | 0: 1890, 1: 5 | Categorical | Whether or not the entire word is an abbreviation. | 0 | nan | Manually tagged |
| is_expert_technical_term | 0: 1740, 1: 155 | Categorical | 1 if the word is a technical term that is not generally understandable. E.g.: "Agarose". | 0 | nan | Manually tagged |
| is_general_technical_term | 0: 1646, 1: 249 | Categorical | 1 if the word is a technical term that is generally understandable. E.g.: "elektrisch" | 0 | nan | Manually tagged |
| contains_symbol | 0: 1887, 1: 8 | Categorical | Whether or not the word contains a symbol. E.g.: β-D-Glucose | 0 | nan | Manually tagged |
| contains_hyphen | 0: 1866, 1: 29 | Categorical | Whether or not the word contains a hyphen, e.g. 1 for DNA-Fragment. Words with the tag TRUNC (compositional first element) do not count: in "Sekundär- und Tertiärstrukturen", "Sekundär-" is not counted as having a hyphen. | 0 | nan | Manually tagged |
| contains_abbreviation | 0: 1883, 1: 12 | Categorical | Whether or not the word contains an abbreviation. 0 for words that are only an abbreviation. See is_abbreviation. E.g. 1 for DNA-Fragment, 0 for DNA. | 0 | nan | Manually tagged |
| STTS_PoS_tag | ADJA: 154, ADJD: 53, ADV: 73, APPR: 184, APPRART: 48, APZR: 1, ART: 276, CARD: 9, KOKOM: 17, KON: 66, KOUI: 6, KOUS: 16, NE: 4, NN: 515, PAV: 18, PDAT: 16, PDS: 7, PIAT: 5, PIDAT: 9, PIS: 10, PPER: 25, PPOSAT: 7, PRELAT: 6, PRELS: 29, PRF: 25, PTKA: 1, PTKNEG: 4, PTKVZ: 13, PTKZU: 10, PWAV: 1, TRUNC: 5, VAFIN: 73, VAINF: 8, VMFIN: 25, VMINF: 1, VVFIN: 102, VVINF: 33, VVIZU: 2, VVPP: 38 | Categorical | Part-of-speech tags according to the STTS-tagset. See stimuli/ANNOTATION.MD for more information. | 0 | nan | Manually tagged |
| type | | String | The orthographic representation of a word as found in the corpus; this data is case-sensitive, i.e. there is a distinction between name and Name. | 4 | nan | dlexDB |
| type_length_chars | 2.0-33.0 | Integer | The length of the type of a word in characters. See the description of word_length for a definition of how characters are counted. | 1 | nan | nan |
| PoS_tag | adja: 162, adjd: 54, adv: 91, appr: 182, apprart: 48, art: 280, card: 9, kokom: 17, kon: 63, koui: 5, kous: 16, ne: 7, nn: 508, pdat: 16, pds: 7, piat: 5, pidat: 2, pis: 14, pper: 24, pposat: 7, prelat: 6, prels: 24, prf: 25, ptka: 1, ptkneg: 4, ptkvz: 15, ptkzu: 10, pwav: 1, trunc: 5, vafin: 73, vainf: 8, vmfin: 24, vminf: 1, vvfin: 103, vvinf: 33, vvizu: 2, vvpp: 38, xy: 5 | Categorical | Part-of-speech tag as defined by the dlexDB query. | 0 | nan | dlexDB |
| lemma | | String | nan | 4 | nan | dlexDB |
| lemma_length_chars | 1.0-32.0 | Integer | nan | 3 | nan | dlexDB |
| syllables | | String | nan | 25 | nan | dlexDB |
| type_length_syllables | 1.0-14.0 | Integer | nan | 24 | nan | dlexDB |
| annotated_type_frequency_normalized | min: 0.00817507899599, max: 24738.5901996, mean: 3889.8532, std: 6967.089 | Float | The number of occurrences of an annotated type in the corpus. An annotated type is a unique combination of a type, its part-of-speech tag and its lemma. | 127 | nan | dlexDB |
| type_frequency_normalized | min: 0.00817507899599, max: 26530.3631386, mean: 4409.2283, std: 7712.5287 | Float | nan | 115 | nan | dlexDB |
| lemma_frequency_normalized | min: 0.00817507899599, max: 80100.3069113, mean: 13063.8057, std: 25247.1898 | Float | nan | 115 | nan | dlexDB |
| familiarity_normalized | min: 0.0, max: 26530.3631386, mean: 4074.0362, std: 7634.0602 | Float | nan | 117 | nan | dlexDB |
| regularity_normalized | min: 0.0, max: 2123.30585022, mean: 37.6119, std: 123.3575 | Float | nan | 116 | nan | dlexDB |
| document_frequency_normalized | min: 0.126068429944, max: 9372.80956103, mean: 3073.6225, std: 3377.4549 | Float | nan | 116 | nan | dlexDB |
| sentence_frequency_normalized | min: 0.0155184320176, max: 30912.3596552, mean: 6119.8019, std: 9642.457 | Float | nan | 116 | nan | dlexDB |
| cumulative_syllable_corpus_frequency_normalized | min: 1.40611358731, max: 125126.524676, mean: 16825.508, std: 15793.39 | Float | nan | 116 | nan | dlexDB |
| cumulative_syllable_lexicon_frequency_normalized | min: 0.428085856899, max: 218985.607753, mean: 23221.2613, std: 31879.0143 | Float | nan | 119 | nan | dlexDB |
| cumulative_character_corpus_frequency_normalized | min: 15533.2550482, max: 7810554.20193, mean: 1917789.2641, std: 1253328.3202 | Float | nan | 116 | nan | dlexDB |
| cumulative_character_lexicon_frequency_normalized | min: 47003.8270876, max: 18380479.713, mean: 4265792.357, std: 2812004.0938 | Float | nan | 116 | nan | dlexDB |
| cumulative_character_bigram_corpus_frequency_normalized | min: 5138.64210483, max: 1322150.62097, mean: 363265.3368, std: 217175.5613 | Float | nan | 116 | nan | dlexDB |
| cumulative_character_bigram_lexicon_frequency_normalized | min: 12677.7626521, max: 2788357.77704, mean: 590209.5889, std: 442407.5129 | Float | nan | 116 | nan | dlexDB |
| cumulative_character_trigram_corpus_frequency_normalized | min: 4358.04468689, max: 603427.130456, mean: 227949.9158, std: 122856.9432 | Float | nan | 116 | nan | dlexDB |
| cumulative_character_trigram_lexicon_frequency_normalized | min: 11942.3111499, max: 899592.89035, mean: 237804.6839, std: 171696.6712 | Float | nan | 116 | nan | dlexDB |
| initial_letter_frequency_normalized | min: 199.202149895, max: 110461.430317, mean: 38381.0963, std: 33346.9984 | Float | nan | 116 | nan | dlexDB |
| initial_bigram_frequency_normalized | min: 1.57779024623, max: 53801.2331077, mean: 12768.0203, std: 14670.9631 | Float | nan | 116 | nan | dlexDB |
| initial_trigram_frequency_normalized | min: -0.00817507899599, max: 29048.3692201, mean: 5888.4981, std: 8949.4325 | Float | nan | 116 | nan | dlexDB |
| avg_cond_prob_in_bigrams | min: 1.2e-07, max: 0.5006180465, mean: 0.0451, std: 0.0448 | Float | The conditional probability of the bigram, given the occurrence of its first component. In other words, how likely it is for the second component to follow directly after the first. Here, this measure is computed on the basis of the annotated type information. | 116 | nan | dlexDB |
| avg_cond_prob_in_trigrams | min: 3.153e-06, max: 25.0, mean: 0.2526, std: 0.6009 | Float | The conditional probability of the trigram, given the occurrence of its initial bigram. In other words, how likely it is for the third component to follow directly after the initial pair. Here, this measure is computed on the basis of the annotated type information. | 116 | nan | dlexDB |
| neighbors_coltheart_higher_freq_cum_freq_normalized | min: 0.0, max: 44055.247282, mean: 2248.7136, std: 7540.5582 | Float | nan | 116 | nan | dlexDB |
| neighbors_coltheart_higher_freq_count_normalized | min: 0.0, max: 8.13363128109, mean: 0.2077, std: 0.5007 | Float | nan | 116 | nan | dlexDB |
| neighbors_coltheart_all_cum_freq_normalized | min: 0.0, max: 49782.1108458, mean: 5076.6032, std: 10127.1033 | Float | nan | 116 | nan | dlexDB |
| neighbors_coltheart_all_count_normalized | min: 0.0, max: 47.5175301158, mean: 15.7971, std: 14.4153 | Float | nan | 116 | nan | dlexDB |
| neighbors_levenshtein_higher_freq_cum_freq_normalized | min: 0.0, max: 44055.247282, mean: 2879.4346, std: 7921.0448 | Float | nan | 116 | nan | dlexDB |
| neighbors_levenshtein_higher_freq_count_normalized | min: 0.0, max: 11.9864039932, mean: 0.3277, std: 0.6576 | Float | nan | 116 | nan | dlexDB |
| neighbors_levenshtein_all_cum_freq_normalized | min: 0.0, max: 54875.2749862, mean: 6722.366, std: 11598.2601 | Float | nan | 116 | nan | dlexDB |
| neighbors_levenshtein_all_count_normalized | min: 0.0, max: 75.7711966712, mean: 24.6418, std: 22.5295 | Float | nan | 116 | nan | dlexDB |
| sent_surprisal_gpt2-base | min: 0.0005104430601932, max: 56.804420471191406, mean: 6.9134, std: 6.601 | Float | Surprisal value extracted from a language model (GerPT2-base) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_gpt2-base | min: 0.0002225389762315, max: 53.041446685791016, mean: 5.5822, std: 5.709 | Float | Surprisal value extracted from a language model (GerPT2-base) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_gpt2-large | min: 0.0002048997703241, max: 42.28059005737305, mean: 6.1407, std: 5.8854 | Float | Surprisal value extracted from a language model (GerPT2-large) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_gpt2-large | min: 0.0001027531252475, max: 35.38883209228516, mean: 4.735, std: 4.8645 | Float | Surprisal value extracted from a language model (GerPT2-large) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_llama-7b | min: 0.0001720042055239, max: 42.96158599853516, mean: 6.1564, std: 5.7273 | Float | Surprisal value extracted from a language model (LeoLM-7b) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_llama-7b | min: 1.990775308513548e-05, max: 35.62324142456055, mean: 3.4794, std: 3.8552 | Float | Surprisal value extracted from a language model (LeoLM-7b) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_llama-13b | min: 8.702239938429557e-06, max: 46.25139999389648, mean: 6.0065, std: 5.8588 | Float | Surprisal value extracted from a language model (LeoLM-13b) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_llama-13b | min: 9.298280929215252e-06, max: 36.29869842529297, mean: 3.2454, std: 3.8091 | Float | Surprisal value extracted from a language model (LeoLM-13b) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_bert-base | min: 1.1920928244535389e-07, max: 101.79562616348268, mean: 6.4507, std: 11.6184 | Float | Surprisal value extracted from a language model (BERT-base) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_bert-base | min: -0.0, max: 88.84420316047726, mean: 6.2599, std: 11.5846 | Float | Surprisal value extracted from a language model (BERT-base) with the text as context. | 0 | nan | See script get_surprisal.py |
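Two of the columns above encode structured values: `word_limit_char_indices` uses the `[word_start],[word_end]` format, and `text_id_numeric` follows the mapping listed in its description. A minimal sketch of how they could be decoded is shown below. The helper names `parse_word_limits` and `text_id_to_numeric` are hypothetical and are not part of the corpus scripts.

```python
# Hypothetical helpers for illustration; not taken from the corpus code.

# Order follows the text_id_numeric description: 0=b0, ..., 5=b5, 6=p0, ..., 11=p5.
TEXT_IDS = ["b0", "b1", "b2", "b3", "b4", "b5",
            "p0", "p1", "p2", "p3", "p4", "p5"]


def parse_word_limits(value: str) -> tuple[int, int]:
    """Split a 'word_start,word_end' string into two integers."""
    start, end = value.split(",")
    return int(start), int(end)


def text_id_to_numeric(text_id: str) -> int:
    """Map a text_id such as 'p0' to its text_id_numeric value."""
    return TEXT_IDS.index(text_id)


print(parse_word_limits("3,7"))   # (3, 7)
print(text_id_to_numeric("p0"))   # 6
```

The sketch only parses the two limits. How the character indices themselves are defined is specified in char_index_in_text, so check that definition before slicing a text with them.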
Contains the stimulus information including the questions for each text.
Please find the file at this link: Stimuli including comprehension questions
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| text_domain | biology: 6, physics: 6 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| text_domain_numeric | 0: 6, 1: 6 | Categorical | Numerical value of text_domain; 0=biology, 1=physics. | 0 | nan | Manually created |
| source | no stats? | | The source of the stimulus text. | 0 | nan | nan |
| headline | | String | The headline of the respective stimulus text. | 0 | nan | nan |
| tq_1 | | String | Text question 1. | 0 | nan | Manually created |
| tq_1_option1 | | String | Option 1 for text question 1. | 0 | nan | Manually created |
| tq_1_option2 | | String | Option 2 for text question 1. | 0 | nan | Manually created |
| tq_1_option3 | | String | Option 3 for text question 1. | 0 | nan | Manually created |
| tq_1_option4 | | String | Option 4 for text question 1. | 0 | nan | Manually created |
| tq_2 | | String | Text question 2. | 0 | nan | Manually created |
| tq_2_option1 | | String | Option 1 for text question 2. | 0 | nan | Manually created |
| tq_2_option2 | | String | Option 2 for text question 2. | 0 | nan | Manually created |
| tq_2_option3 | | String | Option 3 for text question 2. | 0 | nan | Manually created |
| tq_2_option4 | | String | Option 4 for text question 2. | 0 | nan | Manually created |
| tq_3 | | String | Text question 3. | 0 | nan | Manually created |
| tq_3_option1 | | String | Option 1 for text question 3. | 0 | nan | Manually created |
| tq_3_option2 | | String | Option 2 for text question 3. | 0 | nan | Manually created |
| tq_3_option3 | | String | Option 3 for text question 3. | 0 | nan | Manually created |
| tq_3_option4 | | String | Option 4 for text question 3. | 0 | nan | Manually created |
| bq_1 | | String | Background question 1. | 0 | nan | Manually created |
| bq_1_option1 | | String | Option 1 for background question 1. | 0 | nan | Manually created |
| bq_1_option2 | | String | Option 2 for background question 1. | 0 | nan | Manually created |
| bq_1_option3 | | String | Option 3 for background question 1. | 0 | nan | Manually created |
| bq_1_option4 | | String | Option 4 for background question 1. | 0 | nan | Manually created |
| bq_2 | | String | Background question 2. | 0 | nan | Manually created |
| bq_2_option1 | | String | Option 1 for background question 2. | 0 | nan | Manually created |
| bq_2_option2 | | String | Option 2 for background question 2. | 0 | nan | Manually created |
| bq_2_option3 | | String | Option 3 for background question 2. | 0 | nan | Manually created |
| bq_2_option4 | | String | Option 4 for background question 2. | 0 | nan | Manually created |
| bq_3 | | String | Background question 3. | 0 | nan | Manually created |
| bq_3_option1 | | String | Option 1 for background question 3. | 0 | nan | Manually created |
| bq_3_option2 | | String | Option 2 for background question 3. | 0 | nan | Manually created |
| bq_3_option3 | | String | Option 3 for background question 3. | 0 | nan | Manually created |
| bq_3_option4 | | String | Option 4 for background question 3. | 0 | nan | Manually created |
| correct_ans_tq_1 | 1-4 | Integer | The index of the correct answer for text question 1, given as the option number of the question in this file. For example: 2 means that the answer in the column "tq_1_option2" is the correct answer to this question. | 0 | nan | nan |
| correct_ans_tq_2 | 1-4 | Integer | The index of the correct answer for text question 2, given as the option number of the question in this file. For example: 2 means that the answer in the column "tq_2_option2" is the correct answer to this question. | 0 | nan | nan |
| correct_ans_tq_3 | 1-4 | Integer | The index of the correct answer for text question 3, given as the option number of the question in this file. For example: 2 means that the answer in the column "tq_3_option2" is the correct answer to this question. | 0 | nan | nan |
| correct_ans_bq_1 | 1-4 | Integer | The index of the correct answer for background question 1, given as the option number of the question in this file. For example: 2 means that the answer in the column "bq_1_option2" is the correct answer to this question. | 0 | nan | nan |
| correct_ans_bq_2 | 1-4 | Integer | The index of the correct answer for background question 2, given as the option number of the question in this file. For example: 2 means that the answer in the column "bq_2_option2" is the correct answer to this question. | 0 | nan | nan |
| correct_ans_bq_3 | 1-4 | Integer | The index of the correct answer for background question 3, given as the option number of the question in this file. For example: 2 means that the answer in the column "bq_3_option2" is the correct answer to this question. | 0 | nan | nan |
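The `correct_ans_*` columns point at one of the four option columns of the same question. As a rough sketch, the correct answer text for a question could be looked up as follows. The function names and the sample row are invented for illustration; only the column naming scheme comes from the table above.

```python
# Hypothetical lookup helpers; the sample row below is made up.

def correct_option_column(question: str, correct_index: int) -> str:
    """Build the option column name, e.g. ('tq_1', 2) -> 'tq_1_option2'."""
    if correct_index not in (1, 2, 3, 4):
        raise ValueError(f"option index must be 1-4, got {correct_index}")
    return f"{question}_option{correct_index}"


def correct_answer_text(row: dict, question: str) -> str:
    """Return the text of the correct answer for one question in a stimulus row."""
    index = int(row[f"correct_ans_{question}"])
    return row[correct_option_column(question, index)]


# Placeholder row with invented answer texts.
sample_row = {
    "tq_1_option1": "A",
    "tq_1_option2": "B",
    "tq_1_option3": "C",
    "tq_1_option4": "D",
    "correct_ans_tq_1": 2,
}

print(correct_answer_text(sample_row, "tq_1"))  # B
```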
The file contains the information on the version number of the question answer randomization for each text.
Please find the file at this link: Items
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| version | 0-119 | Integer | Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. | 0 | nan | nan |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_domain | biology: 720, physics: 720 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| order_bq_1_ans | no stats? | | The order in which the answers for background question 1 were presented. | 0 | nan | nan |
| order_bq_2_ans | no stats? | | See description of order_bq_1_ans. | 0 | nan | nan |
| order_bq_3_ans | no stats? | | See description of order_bq_1_ans. | 0 | nan | nan |
| order_tq_1_ans | no stats? | | See description of order_bq_1_ans. | 0 | nan | nan |
| order_tq_2_ans | no stats? | | See description of order_bq_1_ans. | 0 | nan | nan |
| order_tq_3_ans | no stats? | | See description of order_bq_1_ans. | 0 | nan | nan |
| trial | 1-12 | Integer | Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. | 0 | nan | nan |
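Since `trial` gives the position at which a text is read within a version, the reading order for a version can be recovered by sorting that version's rows by `trial`. The following is a minimal sketch on invented, truncated sample rows (a real version has 12 texts). The function name `reading_order` is hypothetical.

```python
# Hypothetical helper; sample rows are invented and truncated to three texts.

def reading_order(item_rows: list[dict], version: int) -> list[str]:
    """Return the text_ids of one version, sorted by trial number."""
    rows = [row for row in item_rows if row["version"] == version]
    return [row["text_id"] for row in sorted(rows, key=lambda row: row["trial"])]


sample_items = [
    {"version": 0, "text_id": "b0", "trial": 2},
    {"version": 0, "text_id": "p3", "trial": 1},
    {"version": 0, "text_id": "b4", "trial": 3},
    {"version": 1, "text_id": "b0", "trial": 1},
]

print(reading_order(sample_items, 0))  # ['p3', 'b0', 'b4']
```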
Contains the AOI files for each of the stimulus texts.
Please find the files at this link: AOI
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| aoi_type | | | The shape of the area of interest. In this corpus, all AOIs are rectangles around the characters. | 0 | nan | SR Research data viewer |
| aoi | 1-1121 | Integer | The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. | 0 | nan | SR Research experiment builder |
| start_x | 80-1622 | Integer | The x-coordinate in pixels of the top left corner of the aoi rectangle. | 0 | nan | nan |
| start_y | 21-920 | Integer | The y-coordinate in pixels of the top left corner of the aoi rectangle. | 0 | nan | nan |
| end_x | 92-1634 | Integer | The x-coordinate in pixels of the bottom right corner of the aoi rectangle. | 0 | nan | nan |
| end_y | 99-998 | Integer | The y-coordinate in pixels of the bottom right corner of the aoi rectangle. | 0 | nan | nan |
| character | | String | Character as text. | 0 | nan | nan |
| line | 1-12 | Integer | The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. | 0 | nan | nan |
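Each AOI row describes a rectangle through its top left corner (`start_x`, `start_y`) and bottom right corner (`end_x`, `end_y`), so a pixel position can be assigned to a character by a simple containment test. The sketch below assumes half-open intervals (start included, end excluded) so that adjacent rectangles sharing an edge do not both match. That boundary convention, the function name `find_aoi`, and the sample rows are assumptions, not something the AOI files specify.

```python
# Hypothetical containment test; sample AOI rows are invented.
from typing import Optional


def find_aoi(aoi_rows: list[dict], x: float, y: float) -> Optional[int]:
    """Return the aoi index of the first rectangle containing (x, y), else None.

    Assumes start <= value < end on both axes (an assumption of this sketch).
    """
    for row in aoi_rows:
        inside_x = row["start_x"] <= x < row["end_x"]
        inside_y = row["start_y"] <= y < row["end_y"]
        if inside_x and inside_y:
            return row["aoi"]
    return None


sample_aois = [
    {"aoi": 1, "start_x": 80, "start_y": 21, "end_x": 92, "end_y": 99},
    {"aoi": 2, "start_x": 92, "start_y": 21, "end_x": 104, "end_y": 99},
]

print(find_aoi(sample_aois, 85, 50))   # 1
print(find_aoi(sample_aois, 92, 50))   # 2
print(find_aoi(sample_aois, 10, 10))   # None
```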
The constituency trees that have been corrected manually.
Please find the file at this link:
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| sent_index_in_text | 1-12 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 0 | nan | nan |
| sentence | | String | The sentence in the text. | 0 | nan | nan |
| spacy_constituency_tree | no stats? | | The constituency tree of the sentence in the text as constructed by spaCy. | 0 | nan | spaCy |
| str_constituents | no stats? | | The constituency tree in string format, so that it can be parsed and displayed easily. | 0 | nan | spaCy |
| spacy_pos | no stats? | | The part-of-speech tags of the words in the sentence as tagged by spaCy. | 0 | nan | spaCy |
| constituents | no stats? | | The constituents of the sentence tree as constructed by spaCy. | 0 | nan | spaCy |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| manually_corrected | False: 19, True: 79 | Categorical | Whether the sentence tree was manually corrected. | 0 | nan | Manually tagged |
Contains the dependency trees for all stimuli which have been manually corrected.
Please find the file at this link: Dependency trees
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| spacy_word | | | The words in the sentence as tokenized by spaCy. | 0 | nan | spaCy |
| spacy_lemma | | | The lemmas of the words in the sentence as constructed by spaCy. | 0 | nan | spaCy |
| spacy_pos | no stats? | | The part-of-speech tags of the words in the sentence as tagged by spaCy. | 0 | nan | spaCy |
| spacy_tag | | | The detailed part-of-speech tags of the words in the sentence as constructed by spaCy (more fine-grained than spacy_pos). | 0 | nan | spaCy |
| dependency | no stats? | | The dependency relations of the words in the sentence as constructed by spaCy. | 0 | nan | spaCy |
| dependency_head | no stats? | | The head of the dependency relation of the words in the sentence as constructed by spaCy. | 0 | nan | spaCy |
| dependency_head_pos | no stats? | | The part-of-speech tag of the head of the dependency relation of the words in the sentence as constructed by spaCy. | 0 | nan | spaCy |
| dependency_children | no stats? | | The children of the dependency relation of the words in the sentence as constructed by spaCy. | 0 | nan | spaCy |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| sent_index_in_text | 1.0-12.0 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 1 | nan | nan |
| manually_corrected | False: 1768, True: 193, nan: 153, Flse: 2 | Categorical | Whether the sentence tree was manually corrected. | 153 | nan | Manually tagged |
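The `manually_corrected` column of this file contains the values False, True, nan and Flse, so it may be worth normalizing before filtering on it. The sketch below treats "Flse" as a misspelling of False and maps nan to `None`. Both choices are assumptions of this example, and the function name is hypothetical.

```python
# Hypothetical normalizer; treating "Flse" as False is an assumption.
import math
from typing import Optional


def normalize_manually_corrected(value) -> Optional[bool]:
    """Map raw manually_corrected values to True, False, or None (missing)."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text in ("false", "flse"):  # "flse" assumed to be a typo for "false"
        return False
    if text in ("nan", ""):
        return None
    raise ValueError(f"unexpected manually_corrected value: {value!r}")


print(normalize_manually_corrected("Flse"))        # False
print(normalize_manually_corrected("True"))        # True
print(normalize_manually_corrected(float("nan")))  # None
```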
The raw eye-tracking data for each trial; each line contains one sample.
Please find the files at this link: Raw ET data
| Column name | Value type | Description | Source |
|---|---|---|---|
| time | Float | The time stamp of the sample. | edf file created by EyeLink |
| x | Float | The x-coordinate of the sample. | edf file created by EyeLink |
| y | Float | The y-coordinate of the sample. | edf file created by EyeLink |
| pupil_diameter | Float | The pupil diameter of the sample. | edf file created by EyeLink |
Computed gaze events of all trials for each reader.
Please find the files at this link: Fixations
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| fixation_index | 1-1469 | Integer | The index of the fixation in temporal order. | 0 | nan | SR Research data viewer |
| text_domain | bio: 203667, biology: 1032, physics: 199721 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| trial | 1-12 | Integer | Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. | 0 | nan | nan |
| acc_bq_1 | min: 0.0, max: 1.0, mean: 0.3869, std: 0.487 | Float | The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_2 | min: 0.0, max: 1.0, mean: 0.3564, std: 0.4789 | Float | The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_3 | min: 0.0, max: 1.0, mean: 0.4217, std: 0.4938 | Float | The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_1 | min: 0.0, max: 1.0, mean: 0.6625, std: 0.4729 | Float | The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_2 | min: 0.0, max: 1.0, mean: 0.6326, std: 0.4821 | Float | The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_3 | min: 0.0, max: 1.0, mean: 0.6564, std: 0.4749 | Float | The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| fixation_duration | 2-4474 | Integer | The duration of the fixation in milliseconds. | 0 | nan | SR Research data viewer |
| next_saccade_duration | 1.0-9491.0 | Integer | The duration of the saccade that follows a fixation in milliseconds. | 46 | nan | SR Research data viewer |
| previous_saccade_duration | nan-nan | Integer | The duration of the saccade that precedes a fixation in milliseconds. | 515 | nan | SR Research data viewer |
| version | 0-105 | Integer | Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. | 0 | nan | nan |
| line | 1-12 | Integer | The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. | 0 | nan | nan |
| aoi | 1-1121 | Integer | The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. | 0 | nan | SR Research experiment builder |
| char_index_in_line | 1-100 | Integer | Index of a character in the line. Indexing starts at 1. | 0 | nan | nan |
| original_fixation_index | 1-1478 | Integer | The index of the uncorrected fixation. | 0 | nan | SR Research data viewer |
| is_fixation_adjusted | False: 382202, True: 22218 | Categorical | Whether or not the fixation has been adjusted manually. | 0 | nan | Manually tagged |
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
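Two properties of the fixation table matter when aggregating: `text_domain` appears with both the label bio and the label biology, and the accuracy columns contain missing values. The sketch below assumes that bio and biology denote the same domain and skips missing accuracies when averaging. The helper names are hypothetical and the sample values are invented.

```python
# Hypothetical helpers; the bio -> biology alias is an assumption.
import math
from typing import Optional

DOMAIN_ALIASES = {"bio": "biology"}


def normalize_domain(label: str) -> str:
    """Collapse alternative domain labels onto a single spelling."""
    return DOMAIN_ALIASES.get(label, label)


def mean_accuracy(values: list) -> Optional[float]:
    """Average 0/1 accuracies, ignoring None and NaN; None if nothing is left."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not present:
        return None
    return sum(present) / len(present)


print(normalize_domain("bio"))                       # biology
print(mean_accuracy([1.0, 0.0, float("nan"), 1.0]))  # 0.666...
```

Because the accuracy values are stored on every fixation row, averaging over rows weights each trial by its number of fixations. Reducing the data to one row per reader_id and text_id first is probably what most analyses want.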
The scanpaths for each trial (i.e. fixations in fixation order).
Please find the files at this link: Scanpaths
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| fixation_index | 1-1469 | Integer | The index of the fixation in temporal order. | 0 | nan | SR Research data viewer |
| text_domain | bio: 4682, biology: 200017, physics: 199721 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| trial | 1-12 | Integer | Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. | 0 | nan | nan |
| acc_bq_1 | min: 0.0, max: 1.0, mean: 0.3869, std: 0.487 | Float | The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_2 | min: 0.0, max: 1.0, mean: 0.3564, std: 0.4789 | Float | The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_3 | min: 0.0, max: 1.0, mean: 0.4217, std: 0.4938 | Float | The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_1 | min: 0.0, max: 1.0, mean: 0.6625, std: 0.4729 | Float | The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_2 | min: 0.0, max: 1.0, mean: 0.6326, std: 0.4821 | Float | The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_3 | min: 0.0, max: 1.0, mean: 0.6564, std: 0.4749 | Float | The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| fixation_duration | 2-4474 | Integer | The duration of the fixation in milliseconds. | 0 | nan | SR Research data viewer |
| next_saccade_duration | 1.0-9491.0 | Integer | The duration of the saccade that follows a fixation in milliseconds. | 46 | nan | SR Research data viewer |
| previous_saccade_duration | 1.0-9491.0 | Integer | The duration of the saccade that precedes a fixation in milliseconds. | 515 | nan | SR Research data viewer |
| version | 0-105 | Integer | Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. | 0 | nan | nan |
| line | 1-12 | Integer | The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. | 0 | nan | nan |
| aoi | 1-1121 | Integer | The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. | 0 | nan | SR Research experiment builder |
| char_index_in_line | 1-100 | Integer | Index of a character in the line. Indexing starts at 1. | 0 | nan | nan |
| original_fixation_index | 1-1478 | Integer | The index of the uncorrected fixation. | 0 | nan | SR Research data viewer |
| is_fixation_adjusted | False: 382202, True: 22218 | Categorical | Whether or not the fixation has been adjusted manually. | 0 | nan | Manually tagged. |
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| word_index_in_text | 1-180 | Integer | The index of the word in the text. Indexing starts at 1. | 0 | nan | nan |
| sent_index_in_text | 1-12 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 0 | nan | nan |
| char_index_in_text | 1-1121 | Integer | Index of a character in the text. Indexing starts at 1. | 0 | nan | nan |
| word | | string | Words as they appear in the stimuli texts. Words are split at white-space. | 0 | nan | nan |
| character | | string | Character as text. | 0 | nan | nan |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| text_domain_numeric | 0: 204699, 1: 199721 | Categorical | Numerical value of text_domain; 0=biology, 1=physics. | 0 | nan | Manually created |
| reader_discipline_numeric | 0: 223158, 1: 181262 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| level_of_studies_numeric | 0: 154333, 1: 250087 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| expert_reading_label_numeric | 0: 290883, 1: 113537 | Categorical | Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert). 1=expert_reading, 0=non-expert_reading | 0 | nan | Manually tagged |
| expert_reading_label | expert_reading: 113537, non-expert_reading: 290883 | Categorical | Whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert) | 0 | nan | Manually tagged |
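
The numeric encodings described in the table above can be reproduced from the stated definitions. The following is a minimal Python sketch, not the code used to build the data set: the names `TEXT_ID_TO_NUMERIC` and `expert_reading_label_numeric` are hypothetical helpers introduced here for illustration. It assumes the encodings given in the table (0=biology, 1=physics; 0=beginner, 1=expert).

```python
# Mapping stated in the codebook: 0=b0, ..., 5=b5, 6=p0, ..., 11=p5.
TEXT_IDS = ["b0", "b1", "b2", "b3", "b4", "b5",
            "p0", "p1", "p2", "p3", "p4", "p5"]
TEXT_ID_TO_NUMERIC = {text_id: number for number, text_id in enumerate(TEXT_IDS)}


def expert_reading_label_numeric(text_domain_numeric,
                                 reader_discipline_numeric,
                                 level_of_studies_numeric):
    """Return 1 for an expert reading, else 0 (hypothetical helper).

    Follows the codebook definition: the text domain equals the reader's
    discipline and the reader's level of studies is expert (encoded as 1).
    """
    same_domain = text_domain_numeric == reader_discipline_numeric
    is_expert = level_of_studies_numeric == 1
    return int(same_domain and is_expert)


# A physics expert (discipline 1, level 1) reading a physics text (domain 1).
print(TEXT_ID_TO_NUMERIC["p0"], expert_reading_label_numeric(1, 1, 1))
```

The same physics expert reading a biology text would be labelled 0 (non-expert_reading), because the text domain and the reader discipline differ.
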
The word-level reading measures in a short format.
Please find the files at this link: Reading measures
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| word_index_in_sent | 1-51 | Integer | The index of the word in the sentence. Indexing starts at 1. | 0 | nan | nan |
| sent_index_in_text | 1-12 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 0 | nan | nan |
| line | 1-12 | Integer | The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. | 0 | nan | nan |
| FFD | min: 0, max: 2144, mean: 166.4158, std: 132.8433 | Float | First-fixation duration: duration of the first fixation on a word if this word is fixated in first-pass reading, otherwise 0. | 0 | nan | compute_reading_measures.py |
| SFD | min: 0, max: 2144, mean: 118.8309, std: 135.573 | Float | Single-fixation duration: duration of the only first-pass fixation on a word, 0 if the word was skipped or more than one fixation occurred in the first-pass (equals FFD in case of a single first-pass fixation). | 0 | nan | compute_reading_measures.py |
| FD | min: 0, max: 2144, mean: 203.5219, std: 116.9324 | Float | First duration: duration of the first fixation on a word (identical to FFD if not skipped in the first-pass). | 0 | nan | compute_reading_measures.py |
| FPRT | min: 0, max: 9649, mean: 247.1511, std: 298.6889 | Float | First-pass reading time: sum of the durations of all first-pass fixations on a word (0 if the word was skipped in the first-pass). | 0 | nan | compute_reading_measures.py |
| FRT | min: 0, max: 9649, mean: 291.8272, std: 288.631 | Float | First-reading time: sum of the durations of all fixations from first fixating the word (regardless of whether the first fixation occurs in first-pass reading) until leaving the word for the first time (equals FPRT if the word was fixated in the first-pass). | 0 | nan | compute_reading_measures.py |
| TFT | min: 0, max: 25314, mean: 632.8199, std: 720.3975 | Float | Total-fixation time: sum of all fixations on a word (FPRT+RRT). | 0 | nan | compute_reading_measures.py |
| RRT | min: 0, max: 23902, mean: 385.6688, std: 597.5206 | Float | Re-reading time: sum of the durations of all fixations on a word that do not belong to the first-pass (TFT-FPRT). | 0 | nan | compute_reading_measures.py |
| RPD_inc | min: 0, max: 318898, mean: 632.8199, std: 3881.7376 | Float | Inclusive regression-path duration: Sum of all fixation durations starting from the first first-pass fixation on a word until fixation on a word to the right of this word (including all regressive fixations on previous words), 0 if the word was not fixated in the first-pass (RPD_exc+RBRT). | 0 | nan | compute_reading_measures.py |
| RPD_exc | min: 0, max: 315640, mean: 342.295, std: 3815.3786 | Float | Exclusive regression-path duration: Sum of all fixation durations after initiating a first-pass regression from a word until fixating a word to the right of this word, without counting fixations on the word itself (RPD_inc-RBRT). | 0 | nan | compute_reading_measures.py |
| RBRT | min: 0, max: 10675, mean: 290.5249, std: 358.8929 | Float | Right-bounded reading time: Sum of all fixation durations on a word until a word to the right of this word is fixated (RPD_inc-RPD_exc). | 0 | nan | compute_reading_measures.py |
| Fix | 0: 14182, 1: 127943 | Categorical | Fixation: 1 if the word was fixated, otherwise 0 (FPF or RR). | 0 | nan | compute_reading_measures.py |
| FPF | 0: 38408, 1: 103717 | Categorical | First-pass fixation: 1 if the word was fixated in the first-pass, otherwise 0. | 0 | nan | compute_reading_measures.py |
| RR | 0: 48283, 1: 93842 | Categorical | Re-reading: 1 if the word was fixated after the first-pass reading, otherwise 0 (sign(RRT)). | 0 | nan | compute_reading_measures.py |
| FPReg | 0: 119060, 1: 23065 | Categorical | First-pass regression: 1 if a regression was initiated in the first-pass reading of the word, otherwise 0 (sign(RPD_exc)). | 0 | nan | compute_reading_measures.py |
| TRC_out | min: 0, max: 15, mean: 0.4226, std: 0.7828 | Float | Total count of outgoing regressions: total number of regressive saccades initiated from this word. | 0 | nan | compute_reading_measures.py |
| TRC_in | min: 0, max: 12, mean: 0.4219, std: 0.7892 | Float | Total count of incoming regressions: total number of regressive saccades landing on this word. | 0 | nan | compute_reading_measures.py |
| LP | min: 0, max: 28, mean: 2.7791, std: 2.0942 | Float | Landing position: position of the first saccade on the word expressed by ordinal position of the fixated character. | 0 | nan | compute_reading_measures.py |
| SL_in | min: -162, max: 156, mean: 1.077, std: 3.0552 | Float | Incoming saccade length: length of the saccade that leads to first fixation on a word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression. | 0 | nan | compute_reading_measures.py |
| SL_out | min: -179, max: 63, mean: 0.1881, std: 7.0821 | Float | Outgoing saccade length: length of the first saccade that leaves the word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression; 0 if the word is never fixated. | 0 | nan | compute_reading_measures.py |
| TFC | min: 0, max: 87, mean: 2.8392, std: 2.9135 | Float | The total fixation count on the word. | 0 | nan | compute_reading_measures.py |
| text_domain_numeric | 0: 71550, 1: 70575 | Categorical | Numerical value of text_domain; 0=biology, 1=physics. | 0 | nan | Manually created |
| trial | 1-12 | Integer | Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. | 0 | nan | nan |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| gender_numeric | 0.0: 66325, 1.0: 73905, nan: 1895 | Categorical | Numerical value of gender; 0=male, 1=female. | 1895 | nan | nan |
| reader_discipline_numeric | 0: 81485, 1: 60640 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| level_of_studies_numeric | 0: 53060, 1: 89065 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| discipline_level_of_studies_numeric | 0: 30320, 1: 51165, 2: 22740, 3: 37900 | Categorical | Numerical value of discipline_level_of_studies; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. | 0 | nan | demographic questionnaire |
| expert_reading_label_numeric | 0: 97547, 1: 44578 | Categorical | Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert). 1=expert_reading, 0=non-expert_reading | 0 | nan | Manually tagged |
| expert_reading_label | expert_reading: 44578, non-expert_reading: 97547 | Categorical | Whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert) | 0 | nan | Manually tagged |
| age | min: 18.0, max: 41.0, mean: 24.1644, std: 4.1809 | Float | Reader's age. | 3790 | nan | demographic questionnaire |
| mean_acc_bq | min: 0.0, max: 0.999250936329588, mean: 0.6381, std: 0.3139 | Float | The mean accuracy of all text questions for one text read by one reader. | 0 | nan | nan |
| mean_acc_tq | min: 0.0, max: 0.9991603694374476, mean: 0.3875, std: 0.3161 | Float | The mean accuracy of all background questions for one text read by one reader. | 0 | nan | nan |
| acc_bq_1 | min: 0.0, max: 0.9993197278911564, mean: 0.3858, std: 0.4857 | Float | The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_2 | min: 0.0, max: 0.9993197278911564, mean: 0.3559, std: 0.4778 | Float | The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_3 | min: 0.0, max: 0.9992429977289932, mean: 0.4207, std: 0.4925 | Float | The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_1 | min: 0.0, max: 0.9993197278911564, mean: 0.6364, std: 0.4794 | Float | The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_2 | min: 0.0, max: 0.999250936329588, mean: 0.6322, std: 0.4805 | Float | The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_3 | min: 0.0, max: 0.9993197278911564, mean: 0.6456, std: 0.4766 | Float | The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
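
Several of the measures above are defined in terms of the first pass over a word. The sketch below illustrates those definitions on a toy word-level scanpath. It is an illustration under stated assumptions, not the logic of compute_reading_measures.py: the function name `basic_reading_measures` is hypothetical, the input is assumed to be a list of `(word_index_in_text, fixation_duration)` pairs in temporal order, and a fixation is counted as first-pass only if no word at or to the right of the current word was fixated earlier. Only a subset of the measures (FFD, FPRT, TFT, TFC, RRT, Fix, FPF, RR) is covered.

```python
def basic_reading_measures(scanpath, n_words):
    """Compute a few word-level measures from a word-level scanpath.

    scanpath: list of (word_index, duration_ms) in temporal order,
              word indices starting at 1 (assumed input format).
    n_words:  number of words in the text.
    """
    measures = {
        word: {"FFD": 0, "FPRT": 0, "TFT": 0, "TFC": 0}
        for word in range(1, n_words + 1)
    }
    rightmost = 0           # right-most word fixated so far
    first_pass_word = None  # word currently being read in its first pass
    previous_word = None

    for word, duration in scanpath:
        m = measures[word]
        m["TFT"] += duration
        m["TFC"] += 1
        if word != previous_word:
            # Entering a word: it is a first-pass visit only if neither this
            # word nor any word to its right has been fixated before.
            first_pass_word = word if word > rightmost else None
            if first_pass_word is not None:
                m["FFD"] = duration
        if first_pass_word == word:
            m["FPRT"] += duration
        rightmost = max(rightmost, word)
        previous_word = word

    for m in measures.values():
        m["RRT"] = m["TFT"] - m["FPRT"]  # TFT = FPRT + RRT
        m["FPF"] = int(m["FPRT"] > 0)
        m["RR"] = int(m["RRT"] > 0)      # sign(RRT)
        m["Fix"] = int(m["TFT"] > 0)     # FPF or RR
    return measures


toy_scanpath = [(1, 200), (2, 150), (2, 100), (1, 120), (3, 180), (2, 90)]
toy_measures = basic_reading_measures(toy_scanpath, n_words=4)
print(toy_measures[2])
```

In the toy scanpath, word 2 receives two consecutive first-pass fixations (150 ms and 100 ms) and is revisited later (90 ms), so its re-reading time is the difference between its total fixation time and its first-pass reading time. Word 4 is never fixated, so all of its measures are 0.
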
The word-level reading measures merged with trial, session and reader information, as well as more information on the words.
Please find the files at this link: Reading measures merged
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| word | | string | Words as they appear in the stimuli texts. Words are split at white-space. | 0 | nan | nan |
| word_with_punct | | string | The word as it appears in the text, including punctuation. | 0 | nan | nan |
| word_index_in_text | 1-180 | Integer | The index of the word in the text. Indexing starts at 1. | 0 | nan | nan |
| word_index_in_sent | 1-51 | Integer | The index of the word in the sentence. Indexing starts at 1. | 0 | nan | nan |
| sent_index_in_text | 1-12 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 0 | nan | nan |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| text_domain | biology: 71550, physics: 70575 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| word_length | 2-33 | Integer | Word length is defined in number of characters including symbols like hyphens but without sentence punctuation at the end (i.e., z.B. = 4 characters; DNA-Kette = 9 characters; eats. = 4 characters). | 0 | nan | nan |
| STTS_punctuation_before | 0.0: 70800, 0: 70425, $(: 900 | Categorical | If a word is preceded by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. | 0 | nan | Manually tagged |
| STTS_punctuation_after | | Categorical | If a word is followed by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. | 0 | nan | Manually tagged |
| is_in_quote | 0: 141075, 1: 1050 | Categorical | Whether or not the word is part of an expression in quotes. | 0 | nan | Manually tagged |
| is_in_parentheses | 0: 141750, 1: 375 | Categorical | Whether or not the word is part of a phrase in parentheses. | 0 | nan | Manually tagged |
| is_clause_beginning | 0: 134700, 1: 7425 | Categorical | Whether or not the word is the beginning of a clause. | 0 | nan | Manually tagged |
| is_sent_beginning | 0: 134850, 1: 7275 | Categorical | Whether or not the word is the beginning of a new sentence. | 0 | nan | Manually tagged |
| is_clause_end | 0: 134775, 1: 7350 | Categorical | Whether or not the word is the end of a clause. | 0 | nan | Manually tagged |
| is_sent_end | 0: 134850, 1: 7275 | Categorical | Whether or not the word is the end of a sentence. | 0 | nan | Manually tagged |
| is_abbreviation | 0: 141750, 1: 375 | Categorical | Whether or not the entire word is an abbreviation. | 0 | nan | Manually tagged |
| is_expert_technical_term | 0: 130500, 1: 11625 | Categorical | 1 if the word is a technical term that is not generally understandable. E.g.: "Agarose". | 0 | nan | Manually tagged |
| is_general_technical_term | 0: 123450, 1: 18675 | Categorical | 1 if the word is a technical term that is generally understandable. E.g.: "elektrisch" | 0 | nan | Manually tagged |
| contains_symbol | 0: 141525, 1: 600 | Categorical | Whether or not the word contains a symbol. E.g.: β-D-Glucose | 0 | nan | Manually tagged |
| contains_hyphen | 0: 139950, 1: 2175 | Categorical | Whether or not the word contains a hyphen, e.g. 1 for DNA-Fragment. Words with the tag TRUNC (compositional first element) do not count as having a hyphen, e.g. "Sekundär-" in "Sekundär- und Tertiärstrukturen". | 0 | nan | Manually tagged |
| contains_abbreviation | 0: 141225, 1: 900 | Categorical | Whether or not the word contains an abbreviation. 0 for words that are only an abbreviation. See is_abbreviation. E.g. 1 for DNA-Fragment, 0 for DNA. | 0 | nan | Manually tagged |
| STTS_PoS_tag | ADJA: 11550, ADJD: 3975, ADV: 5475, APPR: 13800, APPRART: 3600, APZR: 75, ART: 20700, CARD: 675, KOKOM: 1275, KON: 4950, KOUI: 450, KOUS: 1200, NE: 300, NN: 38625, PAV: 1350, PDAT: 1200, PDS: 525, PIAT: 375, PIDAT: 675, PIS: 750, PPER: 1875, PPOSAT: 525, PRELAT: 450, PRELS: 2175, PRF: 1875, PTKA: 75, PTKNEG: 300, PTKVZ: 975, PTKZU: 750, PWAV: 75, TRUNC: 375, VAFIN: 5475, VAINF: 600, VMFIN: 1875, VMINF: 75, VVFIN: 7650, VVINF: 2475, VVIZU: 150, VVPP: 2850 | Categorical | Part-of-speech tags according to the STTS-tagset. See stimuli/ANNOTATION.MD for more information. | 0 | nan | Manually tagged |
| type | | string | The orthographical representation of a word as found in the corpus; this data is case sensitive, i.e. there is a distinction between name and Name. | 0 | nan | dlexDB |
| type_length_chars | 0.0-33.0 | Integer | The length of the type of a word in characters. See the description of word_length for a definition of how characters are counted. | 0 | nan | nan |
| PoS_tag | adja: 12150, adjd: 4050, adv: 6825, appr: 13650, apprart: 3600, art: 21000, card: 675, kokom: 1275, kon: 4725, koui: 375, kous: 1200, ne: 525, nn: 38100, pdat: 1200, pds: 525, piat: 375, pidat: 150, pis: 1050, pper: 1800, pposat: 525, prelat: 450, prels: 1800, prf: 1875, ptka: 75, ptkneg: 300, ptkvz: 1125, ptkzu: 750, pwav: 75, trunc: 375, vafin: 5475, vainf: 600, vmfin: 1800, vminf: 75, vvfin: 7725, vvinf: 2475, vvizu: 150, vvpp: 2850, xy: 375 | Categorical | Part-of-speech tag as defined by the dlexDB query. | 0 | nan | dlexDB |
| lemma | | string | nan | 0 | nan | dlexDB |
| lemma_length_chars | 0.0-32.0 | Integer | nan | 0 | nan | dlexDB |
| syllables | | string | nan | 0 | nan | dlexDB |
| type_length_syllables | 0.0-14.0 | Integer | nan | 0 | nan | dlexDB |
| annotated_type_frequency_normalized | min: 0.0, max: 24738.5901996, mean: 3629.1612, std: 6797.6492 | Float | The number of occurrences of an annotated type in corpus. An annotated type is a unique combination of a type, its part-of-speech tag and its lemma. | 0 | nan | dlexDB |
| type_frequency_normalized | min: 0.0, max: 26530.3631386, mean: 4141.6498, std: 7546.5578 | Float | nan | 0 | nan | dlexDB |
| lemma_frequency_normalized | min: 0.0, max: 80100.3069113, mean: 12271.0154, std: 24660.3797 | Float | nan | 0 | nan | dlexDB |
| familiarity_normalized | min: 0.0, max: 26530.3631386, mean: 3822.4994, std: 7457.3314 | Float | nan | 0 | nan | dlexDB |
| regularity_normalized | min: 0.0, max: 2123.30585022, mean: 35.3095, std: 119.8288 | Float | nan | 0 | nan | dlexDB |
| document_frequency_normalized | min: 0.0, max: 9372.80956103, mean: 2885.4746, std: 3353.4877 | Float | nan | 0 | nan | dlexDB |
| sentence_frequency_normalized | min: 0.0, max: 30912.3596552, mean: 5745.1861, std: 9454.5921 | Float | nan | 0 | nan | dlexDB |
| cumulative_syllable_corpus_frequency_normalized | min: 0.0, max: 125126.524676, mean: 15795.556, std: 15820.9152 | Float | nan | 0 | nan | dlexDB |
| cumulative_syllable_lexicon_frequency_normalized | min: 0.0, max: 218985.607753, mean: 21763.0396, std: 31363.3366 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_corpus_frequency_normalized | min: 0.0, max: 7810554.20193, mean: 1800394.2485, std: 1298158.5605 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_lexicon_frequency_normalized | min: 0.0, max: 18380479.713, mean: 4004667.3367, std: 2909455.8454 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_bigram_corpus_frequency_normalized | min: 0.0, max: 1322150.62097, mean: 341028.5141, std: 227677.2532 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_bigram_lexicon_frequency_normalized | min: 0.0, max: 2788357.77704, mean: 554080.6642, std: 451286.9101 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_trigram_corpus_frequency_normalized | min: 0.0, max: 603427.130456, mean: 213996.2534, std: 130950.6249 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_trigram_lexicon_frequency_normalized | min: 0.0, max: 899592.89035, mean: 223247.7744, std: 175811.3775 | Float | nan | 0 | nan | dlexDB |
| initial_letter_frequency_normalized | min: 0.0, max: 110461.430317, mean: 36031.6466, std: 33586.1123 | Float | nan | 0 | nan | dlexDB |
| initial_bigram_frequency_normalized | min: 0.0, max: 53801.2331077, mean: 11986.4422, std: 14536.7787 | Float | nan | 0 | nan | dlexDB |
| initial_trigram_frequency_normalized | min: -0.00817507899599, max: 29048.3692201, mean: 5528.0412, std: 8782.9659 | Float | nan | 0 | nan | dlexDB |
| avg_cond_prob_in_bigrams | min: 0.0, max: 0.5006180465, mean: 0.0423, std: 0.0447 | Float | The conditional probability of the bigram, given the occurrence of its first component. In other words, how likely it is for the second component to follow directly after the first. Here, this measure is computed on the basis of the annotated type information. | 0 | nan | dlexDB |
| avg_cond_prob_in_trigrams | min: 0.0, max: 25.0, mean: 0.2371, std: 0.5852 | Float | The conditional probability of the trigram, given the occurrence of its initial bigram. In other words, how likely it is for the third component to follow directly after the initial pair. Here, this measure is computed on the basis of the annotated type information. | 0 | nan | dlexDB |
| neighbors_coltheart_higher_freq_cum_freq_normalized | min: 0.0, max: 44055.247282, mean: 2111.0615, std: 7323.9586 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_higher_freq_count_normalized | min: 0.0, max: 8.13363128109, mean: 0.195, std: 0.4875 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_all_cum_freq_normalized | min: 0.0, max: 49782.1108458, mean: 4765.8454, std: 9884.7277 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_all_count_normalized | min: 0.0, max: 47.5175301158, mean: 14.8301, std: 14.4676 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_higher_freq_cum_freq_normalized | min: 0.0, max: 44055.247282, mean: 2703.1737, std: 7703.635 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_higher_freq_count_normalized | min: 0.0, max: 11.9864039932, mean: 0.3077, std: 0.6418 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_all_cum_freq_normalized | min: 0.0, max: 54875.2749862, mean: 6310.865, std: 11349.5391 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_all_count_normalized | min: 0.0, max: 75.7711966712, mean: 23.1334, std: 22.6083 | Float | nan | 0 | nan | dlexDB |
| sent_surprisal_gpt2-base | min: 0.0005104430601932, max: 56.804420471191406, mean: 6.9134, std: 6.5992 | Float | Surprisal value extracted from a language model (GerPT2-base) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_gpt2-base | min: 0.0002225389762315, max: 53.041446685791016, mean: 5.5822, std: 5.7075 | Float | Surprisal value extracted from a language model (GerPT2-base) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_gpt2-large | min: 0.0002048997703241, max: 42.28059005737305, mean: 6.1407, std: 5.8838 | Float | Surprisal value extracted from a language model (GerPT2-large) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_gpt2-large | min: 0.0001027531252475, max: 35.38883209228516, mean: 4.735, std: 4.8632 | Float | Surprisal value extracted from a language model (GerPT2-large) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_llama-7b | min: 0.0001720042055239, max: 42.96158599853516, mean: 6.1564, std: 5.7258 | Float | Surprisal value extracted from a language model (LeoLM-7b) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_llama-7b | min: 1.990775308513548e-05, max: 35.62324142456055, mean: 3.4794, std: 3.8542 | Float | Surprisal value extracted from a language model (LeoLM-7b) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_llama-13b | min: 8.702239938429557e-06, max: 46.25139999389648, mean: 6.0065, std: 5.8573 | Float | Surprisal value extracted from a language model (LeoLM-13b) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_llama-13b | min: 9.298280929215252e-06, max: 36.29869842529297, mean: 3.2454, std: 3.8081 | Float | Surprisal value extracted from a language model (LeoLM-13b) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_bert-base | min: 1.1920928244535389e-07, max: 101.79562616348268, mean: 6.4507, std: 11.6153 | Float | Surprisal value extracted from a language model (BERT-base) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_bert-base | min: -0.0, max: 88.84420316047726, mean: 6.2599, std: 11.5816 | Float | Surprisal value extracted from a language model (BERT-base) with the text as context. | 0 | nan | See script get_surprisal.py |
| line | 1-12 | Integer | The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. | 0 | nan | nan |
| FFD | min: 0, max: 2144, mean: 166.4158, std: 132.8433 | Float | First-fixation duration: duration of the first fixation on a word if this word is fixated in first-pass reading, otherwise 0. | 0 | nan | compute_reading_measures.py |
| SFD | min: 0, max: 2144, mean: 118.8309, std: 135.573 | Float | Single-fixation duration: duration of the only first-pass fixation on a word, 0 if the word was skipped or more than one fixation occurred in the first-pass (equals FFD in case of a single first-pass fixation). | 0 | nan | compute_reading_measures.py |
| FD | min: 0, max: 2144, mean: 203.5219, std: 116.9324 | Float | First duration: duration of the first fixation on a word (identical to FFD if not skipped in the first-pass). | 0 | nan | compute_reading_measures.py |
| FPRT | min: 0, max: 9649, mean: 247.1511, std: 298.6889 | Float | First-pass reading time: sum of the durations of all first-pass fixations on a word (0 if the word was skipped in the first-pass). | 0 | nan | compute_reading_measures.py |
| FRT | min: 0, max: 9649, mean: 291.8272, std: 288.631 | Float | First-reading time: sum of the durations of all fixations from first fixating the word (regardless of whether the first fixation occurs in first-pass reading) until leaving the word for the first time (equals FPRT if the word was fixated in the first-pass). | 0 | nan | compute_reading_measures.py |
| TFT | min: 0, max: 25314, mean: 632.8199, std: 720.3975 | Float | Total-fixation time: sum of all fixations on a word (FPRT+RRT). | 0 | nan | compute_reading_measures.py |
| TFC | min: 0, max: 87, mean: 2.8392, std: 2.9135 | Float | The total fixation count on the word. | 0 | nan | compute_reading_measures.py |
| RRT | min: 0, max: 23902, mean: 385.6688, std: 597.5206 | Float | Re-reading time: sum of the durations of all fixations on a word that do not belong to the first-pass (TFT-FPRT). | 0 | nan | compute_reading_measures.py |
| RPD_inc | min: 0, max: 318898, mean: 632.8199, std: 3881.7376 | Float | Inclusive regression-path duration: Sum of all fixation durations starting from the first first-pass fixation on a word until fixation on a word to the right of this word (including all regressive fixations on previous words), 0 if the word was not fixated in the first-pass (RPD_exc+RBRT). | 0 | nan | compute_reading_measures.py |
| RPD_exc | min: 0, max: 315640, mean: 342.295, std: 3815.3786 | Float | Exclusive regression-path duration: Sum of all fixation durations after initiating a first-pass regression from a word until fixating a word to the right of this word, without counting fixations on the word itself (RPD_inc-RBRT). | 0 | nan | compute_reading_measures.py |
| RBRT | min: 0, max: 10675, mean: 290.5249, std: 358.8929 | Float | Right-bounded reading time: Sum of all fixation durations on a word until a word to the right of this word is fixated (RPD_inc-RPD_exc). | 0 | nan | compute_reading_measures.py |
| Fix | 0: 14182, 1: 127943 | Categorical | Fixation: 1 if the word was fixated, otherwise 0 (FPF or RR). | 0 | nan | compute_reading_measures.py |
| FPF | 0: 38408, 1: 103717 | Categorical | First-pass fixation: 1 if the word was fixated in the first-pass, otherwise 0. | 0 | nan | compute_reading_measures.py |
| RR | 0: 48283, 1: 93842 | Categorical | Re-reading: 1 if the word was fixated after the first-pass reading, otherwise 0 (sign(RRT)). | 0 | nan | compute_reading_measures.py |
| FPReg | 0: 119060, 1: 23065 | Categorical | First-pass regression: 1 if a regression was initiated in the first-pass reading of the word, otherwise 0 (sign(RPD_exc)). | 0 | nan | compute_reading_measures.py |
| TRC_out | min: 0, max: 15, mean: 0.4226, std: 0.7828 | Float | Total count of outgoing regressions: total number of regressive saccades initiated from this word. | 0 | nan | compute_reading_measures.py |
| TRC_in | min: 0, max: 12, mean: 0.4219, std: 0.7892 | Float | Total count of incoming regressions: total number of regressive saccades landing on this word. | 0 | nan | compute_reading_measures.py |
| LP | min: 0, max: 28, mean: 2.7791, std: 2.0942 | Float | Landing position: position of the first saccade on the word expressed by ordinal position of the fixated character. | 0 | nan | compute_reading_measures.py |
| SL_in | min: -162, max: 156, mean: 1.077, std: 3.0552 | Float | Incoming saccade length: length of the saccade that leads to first fixation on a word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression. | 0 | nan | compute_reading_measures.py |
| SL_out | min: -179, max: 63, mean: 0.1881, std: 7.0821 | Float | Outgoing saccade length: length of the first saccade that leaves the word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression; 0 if the word is never fixated. | 0 | nan | compute_reading_measures.py |
| acc_bq_1 | min: 0.0, max: 0.9993197278911564, mean: 0.3858, std: 0.4857 | Float | The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_2 | min: 0.0, max: 0.9993197278911564, mean: 0.3559, std: 0.4778 | Float | The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_3 | min: 0.0, max: 0.9992429977289932, mean: 0.4207, std: 0.4925 | Float | The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_1 | min: 0.0, max: 0.9993197278911564, mean: 0.6364, std: 0.4794 | Float | The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_2 | min: 0.0, max: 0.999250936329588, mean: 0.6322, std: 0.4805 | Float | The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_3 | min: 0.0, max: 0.9993197278911564, mean: 0.6456, std: 0.4766 | Float | The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. | 0 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| mean_acc_tq | min: 0.0, max: 0.9991603694374476, mean: 0.3875, std: 0.3161 | Float | The mean accuracy of all background questions for one text read by one reader. | 0 | nan | nan |
| mean_acc_bq | min: 0.0, max: 0.999250936329588, mean: 0.6381, std: 0.3139 | Float | The mean accuracy of all text questions for one text read by one reader. | 0 | nan | nan |
| text_domain_numeric | 0: 71550, 1: 70575 | Categorical | Numerical value of text_domain; 0=biology, 1=physics. | 0 | nan | Manually created |
| trial | 1-12 | Integer | Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. | 0 | nan | nan |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| gender_numeric | 0.0: 66325, 1.0: 73905, nan: 1895 | Categorical | Numerical value of gender; 0=male, 1=female. | 1895 | nan | nan |
| reader_discipline_numeric | 0: 81485, 1: 60640 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| age | min: 18.0, max: 41.0, mean: 24.1644, std: 4.1809 | Float | Reader's age. | 3790 | nan | demographic questionnaire |
| level_of_studies_numeric | 0: 53060, 1: 89065 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| discipline_level_of_studies_numeric | 0: 30320, 1: 51165, 2: 22740, 3: 37900 | Categorical | Numerical value of discipline_level_of_studies; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. | 0 | nan | demographic questionnaire |
| expert_reading_label_numeric | 0: 97547, 1: 44578 | Categorical | Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert). 1=expert_reading, 0=non-expert_reading | 0 | nan | Manually tagged |
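
The identities given in parentheses in the reading-measure descriptions above (for example TFT-FPRT, RPD_exc+RBRT, sign(RRT)) can serve as a sanity check on a single row. The following is a minimal sketch, assuming a row is available as a plain dict keyed by the column names. The function name `check_reading_measure_identities` and the sample values are hypothetical and not taken from the data or from compute_reading_measures.py.

```python
def sign(x):
    """Return 1 for positive values and 0 otherwise (the measures are non-negative)."""
    return 1 if x > 0 else 0


def check_reading_measure_identities(row):
    """Return the names of the codebook identities that a row violates."""
    checks = {
        "TFT = FPRT + RRT": row["TFT"] == row["FPRT"] + row["RRT"],
        "RPD_inc = RPD_exc + RBRT": row["RPD_inc"] == row["RPD_exc"] + row["RBRT"],
        "Fix = FPF or RR": row["Fix"] == int(bool(row["FPF"]) or bool(row["RR"])),
        "RR = sign(RRT)": row["RR"] == sign(row["RRT"]),
        "FPReg = sign(RPD_exc)": row["FPReg"] == sign(row["RPD_exc"]),
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical example row (values invented for illustration only).
example_row = {
    "FPRT": 210, "RRT": 180, "TFT": 390,
    "RPD_inc": 450, "RPD_exc": 240, "RBRT": 210,
    "Fix": 1, "FPF": 1, "RR": 1, "FPReg": 1,
}
violations = check_reading_measure_identities(example_row)
print(violations)
```

An empty list means the row is consistent with all five identities; any returned name points to the relation that does not hold.
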
Contains the scanpaths for each trial merged with information on the reader, texts, etc.
Please find the files at this link: Scanpaths merged
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| fixation_index | 1-1469 | Integer | The index of the fixation in temporal order. | 0 | nan | SR Research data viewer |
| text_domain | bio: 4682, biology: 200017, physics: 199721 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| trial | 1-12 | Integer | Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. | 0 | nan | nan |
| acc_bq_1 | min: 0.0, max: 1.0, mean: 0.3869, std: 0.487 | Float | The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_2 | min: 0.0, max: 1.0, mean: 0.3564, std: 0.4789 | Float | The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_3 | min: 0.0, max: 1.0, mean: 0.4217, std: 0.4938 | Float | The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_1 | min: 0.0, max: 1.0, mean: 0.6625, std: 0.4729 | Float | The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_2 | min: 0.0, max: 1.0, mean: 0.6326, std: 0.4821 | Float | The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_3 | min: 0.0, max: 1.0, mean: 0.6564, std: 0.4749 | Float | The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| fixation_duration | 2-4474 | Integer | The duration of the fixation in milliseconds. | 0 | nan | SR Research data viewer |
| next_saccade_duration | 1.0-9491.0 | Integer | The duration of the saccade that follows a fixation in milliseconds. | 46 | nan | SR Research data viewer |
| previous_saccade_duration | 1.0-9491.0 | Integer | The duration of the saccade that precedes a fixation in milliseconds. | 515 | nan | SR Research data viewer |
| version | 0-105 | Integer | Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. | 0 | nan | nan |
| line | 1-12 | Integer | The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. | 0 | nan | nan |
| aoi | 1-1121 | Integer | The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. | 0 | nan | SR Research experiment builder |
| char_index_in_line | 1-100 | Integer | Index of a character in the line. Indexing starts at 1. | 0 | nan | nan |
| original_fixation_index | 1-1478 | Integer | The index of the uncorrected fixation. | 0 | nan | SR Research data viewer |
| is_fixation_adjusted | False: 382202, True: 22218 | Categorical | Whether or not the fixation has been adjusted manually. | 0 | nan | Manually tagged |
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| word_index_in_text | 1-180 | Integer | The index of the word in the text. Indexing starts at 1. | 0 | nan | nan |
| sent_index_in_text | 1-12 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 0 | nan | nan |
| char_index_in_text | 1-1121 | Integer | Index of a character in the text. Indexing starts at 1. | 0 | nan | nan |
| word | | string | Words as they appear in the stimuli texts. Words are split at white-space. | 0 | nan | nan |
| character | | string | Character as text. | 0 | nan | nan |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| text_domain_numeric | 0: 204699, 1: 199721 | Categorical | Numerical value of text_domain; 0=biology, 1=physics. | 0 | nan | Manually created |
| reader_discipline_numeric | 0: 223158, 1: 181262 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| level_of_studies_numeric | 0: 154333, 1: 250087 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| expert_reading_label_numeric | 0: 290883, 1: 113537 | Categorical | Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert). 1=expert_reading, 0=non-expert_reading | 0 | nan | Manually tagged |
| expert_reading_label | expert_reading: 113537, non-expert_reading: 290883 | Categorical | Whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert) | 0 | nan | Manually tagged |
| word_with_punct | | string | The word as it appears in the text, including punctuation. | 96 | nan | nan |
| word_index_in_sent | 1-51 | Integer | The index of the word in the sentence. Indexing starts at 1. | 0 | nan | nan |
| word_length | 2-33 | Integer | Word length is defined in number of characters including symbols like hyphens but without sentence punctuation at the end (e.g., z.B. = 4 characters; DNA-Kette = 9 characters; eats. = 4 characters). | 0 | nan | nan |
| STTS_punctuation_before | 0.0: 211108, 0: 189407, $(: 3905 | Categorical | If a word is preceded by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. | 0 | nan | Manually tagged |
| STTS_punctuation_after | | Categorical | If a word is followed by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. | 0 | nan | Manually tagged |
| is_in_quote | 0: 399715, 1: 4705 | Categorical | Whether or not the word is part of an expression in quotes. | 0 | nan | Manually tagged |
| is_in_parentheses | 0: 403155, 1: 1265 | Categorical | Whether or not the word is part of a phrase in parentheses. | 0 | nan | Manually tagged |
| is_clause_beginning | 0: 388232, 1: 16188 | Categorical | Whether or not the word is the beginning of a clause. | 0 | nan | Manually tagged |
| is_sent_beginning | 0: 386681, 1: 17739 | Categorical | Whether or not the word is the beginning of a new sentence. | 0 | nan | Manually tagged |
| is_clause_end | 0: 381545, 1: 22875 | Categorical | Whether or not the word is the end of a clause. | 0 | nan | Manually tagged |
| is_sent_end | 0: 380027, 1: 24393 | Categorical | Whether or not the word is the end of a sentence. | 0 | nan | Manually tagged |
| is_abbreviation | 0: 403478, 1: 942 | Categorical | Whether or not the entire word is an abbreviation. | 0 | nan | Manually tagged |
| is_expert_technical_term | 0: 332354, 1: 72066 | Categorical | 1 if the word is a technical term that is not generally understandable. E.g.: "Agarose". | 0 | nan | Manually tagged |
| is_general_technical_term | 0: 325333, 1: 79087 | Categorical | 1 if the word is a technical term that is generally understandable. E.g.: "elektrisch" | 0 | nan | Manually tagged |
| contains_symbol | 0: 400458, 1: 3962 | Categorical | Whether or not the word contains a symbol. E.g.: β-D-Glucose | 0 | nan | Manually tagged |
| contains_hyphen | 0: 388149, 1: 16271 | Categorical | Whether or not the word contains a hyphen, e.g. 1 for DNA-Fragment. Words tagged TRUNC (a truncated first element of a compound) do not count as containing a hyphen: in "Sekundär- und Tertiärstrukturen", "Sekundär-" is coded 0. | 0 | nan | Manually tagged |
| contains_abbreviation | 0: 399423, 1: 4997 | Categorical | Whether or not the word contains an abbreviation. 0 for words that are only an abbreviation. See is_abbreviation. E.g. 1 for DNA-Fragment, 0 for DNA. | 0 | nan | Manually tagged |
| STTS_PoS_tag | ADJA: 51041, ADJD: 12714, ADV: 12236, APPR: 22470, APPRART: 5566, APZR: 91, ART: 37340, CARD: 1594, KOKOM: 2428, KON: 5798, KOUI: 654, KOUS: 2521, NE: 955, NN: 162980, PAV: 3444, PDAT: 3292, PDS: 1374, PIAT: 791, PIDAT: 1653, PIS: 1322, PPER: 2511, PPOSAT: 1360, PRELAT: 1302, PRELS: 4193, PRF: 3606, PTKA: 97, PTKNEG: 687, PTKVZ: 1490, PTKZU: 583, PWAV: 76, TRUNC: 1137, VAFIN: 10340, VAINF: 1206, VMFIN: 3953, VMINF: 153, VVFIN: 23854, VVINF: 7713, VVIZU: 578, VVPP: 9317 | Categorical | Part-of-speech tags according to the STTS-tagset. See stimuli/ANNOTATION.MD for more information. | 0 | nan | Manually tagged |
| type | | string | The orthographical representation of a word as found in the corpus; this data is case sensitive, i.e. there is a distinction between name and Name. | 0 | nan | dlexDB |
| type_length_chars | 0.0-33.0 | Integer | The length of the type of a word in characters. See the description of word_length for a definition of how characters are counted. | 0 | nan | nan |
| PoS_tag | adja: 53330, adjd: 12226, adv: 15728, appr: 22193, apprart: 5566, art: 37918, card: 1594, kokom: 2428, kon: 5405, koui: 559, kous: 2521, ne: 1386, nn: 160585, pdat: 3292, pds: 1374, piat: 791, pidat: 352, pis: 2063, pper: 2434, pposat: 1360, prelat: 1302, prels: 4076, prf: 3606, ptka: 97, ptkneg: 687, ptkvz: 1891, ptkzu: 583, pwav: 76, trunc: 1137, vafin: 10340, vainf: 1206, vmfin: 3829, vminf: 153, vvfin: 23978, vvinf: 7713, vvizu: 578, vvpp: 9317, xy: 746 | Categorical | Part-of-speech tag as defined by the dlexDB query. | 0 | nan | dlexDB |
| lemma | | string | nan | 0 | nan | dlexDB |
| lemma_length_chars | 0.0-32.0 | Integer | nan | 0 | nan | dlexDB |
| syllables | | string | nan | 0 | nan | dlexDB |
| type_length_syllables | 0.0-14.0 | Integer | nan | 0 | nan | dlexDB |
| annotated_type_frequency_normalized | min: 0.0, max: 24738.5901996, mean: 1950.9055, std: 5185.3006 | Float | The number of occurrences of an annotated type in corpus. An annotated type is a unique combination of a type, its part-of-speech tag and its lemma. | 0 | nan | dlexDB |
| type_frequency_normalized | min: 0.0, max: 26530.3631386, mean: 2247.4523, std: 5847.2187 | Float | nan | 0 | nan | dlexDB |
| lemma_frequency_normalized | min: 0.0, max: 80100.3069113, mean: 7203.2409, std: 19769.4428 | Float | nan | 0 | nan | dlexDB |
| familiarity_normalized | min: 0.0, max: 26530.3631386, mean: 2191.7786, std: 5759.2592 | Float | nan | 0 | nan | dlexDB |
| regularity_normalized | min: 0.0, max: 2123.30585022, mean: 46.8657, std: 137.5046 | Float | nan | 0 | nan | dlexDB |
| document_frequency_normalized | min: 0.0, max: 9372.80956103, mean: 1684.1043, std: 2829.0626 | Float | nan | 0 | nan | dlexDB |
| sentence_frequency_normalized | min: 0.0, max: 30912.3596552, mean: 3137.4539, std: 7374.8037 | Float | nan | 0 | nan | dlexDB |
| cumulative_syllable_corpus_frequency_normalized | min: 0.0, max: 125126.524676, mean: 15768.7784, std: 17529.5528 | Float | nan | 0 | nan | dlexDB |
| cumulative_syllable_lexicon_frequency_normalized | min: 0.0, max: 218985.607753, mean: 27232.3183, std: 36883.9628 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_corpus_frequency_normalized | min: 0.0, max: 7810554.20193, mean: 2053804.334, std: 1596380.3916 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_lexicon_frequency_normalized | min: 0.0, max: 18380479.713, mean: 4612580.9638, std: 3597155.0404 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_bigram_corpus_frequency_normalized | min: 0.0, max: 1322150.62097, mean: 356831.454, std: 269772.388 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_bigram_lexicon_frequency_normalized | min: 0.0, max: 2788357.77704, mean: 629626.1651, std: 539088.9742 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_trigram_corpus_frequency_normalized | min: 0.0, max: 603427.130456, mean: 200341.8076, std: 144122.7012 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_trigram_lexicon_frequency_normalized | min: 0.0, max: 899592.89035, mean: 236423.2776, std: 199573.1416 | Float | nan | 0 | nan | dlexDB |
| initial_letter_frequency_normalized | min: 0.0, max: 110461.430317, mean: 28045.0077, std: 30618.9167 | Float | nan | 0 | nan | dlexDB |
| initial_bigram_frequency_normalized | min: 0.0, max: 53801.2331077, mean: 8706.0335, std: 12743.2638 | Float | nan | 0 | nan | dlexDB |
| initial_trigram_frequency_normalized | min: -0.00817507899599, max: 29048.3692201, mean: 3754.6304, std: 7393.1224 | Float | nan | 0 | nan | dlexDB |
| avg_cond_prob_in_bigrams | min: 0.0, max: 0.5006180465, mean: 0.0313, std: 0.0466 | Float | The conditional probability of the bigram, given the occurrence of its first component. In other words, how likely it is for the second component to follow directly after the first. Here, this measure is computed on the basis of the annotated type information. | 0 | nan | dlexDB |
| avg_cond_prob_in_trigrams | min: 0.0, max: 25.0, mean: 0.2251, std: 0.8814 | Float | The conditional probability of the trigram, given the occurrence of its initial bigram. In other words, how likely it is for the third component to follow directly after the initial pair. Here, this measure is computed on the basis of the annotated type information. | 0 | nan | dlexDB |
| neighbors_coltheart_higher_freq_cum_freq_normalized | min: 0.0, max: 44055.247282, mean: 1276.643, std: 5775.4034 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_higher_freq_count_normalized | min: 0.0, max: 8.13363128109, mean: 0.1556, std: 0.4321 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_all_cum_freq_normalized | min: 0.0, max: 49782.1108458, mean: 2794.1781, std: 7982.6321 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_all_count_normalized | min: 0.0, max: 47.5175301158, mean: 9.0448, std: 12.679 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_higher_freq_cum_freq_normalized | min: 0.0, max: 44055.247282, mean: 1683.6273, std: 6153.8504 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_higher_freq_count_normalized | min: 0.0, max: 11.9864039932, mean: 0.2681, std: 0.5814 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_all_cum_freq_normalized | min: 0.0, max: 54875.2749862, mean: 3761.4734, std: 9299.5647 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_all_count_normalized | min: 0.0, max: 75.7711966712, mean: 14.1417, std: 19.6383 | Float | nan | 0 | nan | dlexDB |
| sent_surprisal_gpt2-base | min: 0.0005104430601932, max: 56.804420471191406, mean: 10.0061, std: 9.1114 | Float | Surprisal value extracted from a language model (GerPT2-base) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_gpt2-base | min: 0.0002225389762315, max: 53.041446685791016, mean: 8.0061, std: 8.0873 | Float | Surprisal value extracted from a language model (GerPT2-base) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_gpt2-large | min: 0.0002048997703241, max: 42.28059005737305, mean: 8.76, std: 8.0159 | Float | Surprisal value extracted from a language model (GerPT2-large) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_gpt2-large | min: 0.0001027531252475, max: 35.38883209228516, mean: 6.6792, std: 6.6522 | Float | Surprisal value extracted from a language model (GerPT2-large) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_llama-7b | min: 0.0001720042055239, max: 42.96158599853516, mean: 8.0373, std: 7.0611 | Float | Surprisal value extracted from a language model (LeoLM-7b) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_llama-7b | min: 1.990775308513548e-05, max: 35.62324142456055, mean: 4.7991, std: 4.9022 | Float | Surprisal value extracted from a language model (LeoLM-7b) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_llama-13b | min: 8.702239938429557e-06, max: 46.25139999389648, mean: 7.7768, std: 7.1775 | Float | Surprisal value extracted from a language model (LeoLM-13b) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_llama-13b | min: 9.298280929215252e-06, max: 36.29869842529297, mean: 4.5172, std: 4.9048 | Float | Surprisal value extracted from a language model (LeoLM-13b) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_bert-base | min: 1.1920928244535389e-07, max: 101.79562616348268, mean: 8.1926, std: 13.1873 | Float | Surprisal value extracted from a language model (BERT-base) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_bert-base | min: -0.0, max: 88.84420316047726, mean: 7.487, std: 12.7275 | Float | Surprisal value extracted from a language model (BERT-base) with the text as context. | 0 | nan | See script get_surprisal.py |
| FFD | min: 0, max: 2144, mean: 195.9741, std: 124.5597 | Float | First-fixation duration: duration of the first fixation on a word if this word is fixated in first-pass reading, otherwise 0. | 0 | nan | compute_reading_measures.py |
| SFD | min: 0, max: 2144, mean: 107.9483, std: 134.474 | Float | Single-fixation duration: duration of the only first-pass fixation on a word, 0 if the word was skipped or more than one fixation occurred in the first-pass (equals FFD in case of a single first-pass fixation). | 0 | nan | compute_reading_measures.py |
| FD | min: 0, max: 2144, mean: 226.9857, std: 103.7904 | Float | First duration: duration of the first fixation on a word (identical to FFD if not skipped in the first-pass). | 0 | nan | compute_reading_measures.py |
| FPRT | min: 0, max: 9649, mean: 408.9247, std: 526.0428 | Float | First-pass reading time: sum of the durations of all first-pass fixations on a word (0 if the word was skipped in the first-pass). | 0 | nan | compute_reading_measures.py |
| FRT | min: 0, max: 9649, mean: 456.8788, std: 518.1388 | Float | First-reading time: sum of the durations of all fixations from first fixating the word (regardless of whether the first fixation occurs in first-pass reading) until leaving the word for the first time (equals FPRT if the word was fixated in the first-pass). | 0 | nan | compute_reading_measures.py |
| TFT | min: 0, max: 25314, mean: 1333.0163, std: 1428.494 | Float | Total-fixation time: sum of all fixations on a word (FPRT+RRT). | 0 | nan | compute_reading_measures.py |
| TFC | min: 0, max: 87, mean: 5.8238, std: 5.5152 | Float | The total fixation count on the word. | 0 | nan | compute_reading_measures.py |
| RRT | min: 0, max: 23902, mean: 924.0916, std: 1240.0587 | Float | Re-reading time: sum of the durations of all fixations on a word that do not belong to the first-pass (TFT-FPRT). | 0 | nan | compute_reading_measures.py |
| RPD_inc | min: 0, max: 318898, mean: 1076.7946, std: 5339.73 | Float | Inclusive regression-path duration: Sum of all fixation durations starting from the first first-pass fixation on a word until fixation on a word to the right of this word (including all regressive fixations on previous words), 0 if the word was not fixated in the first-pass (RPD_exc+RBRT). | 0 | nan | compute_reading_measures.py |
| RPD_exc | min: 0, max: 315640, mean: 557.5849, std: 5209.143 | Float | Exclusive regression-path duration: Sum of all fixation durations after initiating a first-pass regression from a word until fixating a word to the right of this word, without counting fixations on the word itself (RPD_inc-RBRT). | 0 | nan | compute_reading_measures.py |
| RBRT | min: 0, max: 10675, mean: 519.2098, std: 638.9024 | Float | Right-bounded reading time: Sum of all fixation durations on a word until a word to the right of this word is fixated (RPD_inc-RPD_exc). | 0 | nan | compute_reading_measures.py |
| Fix | 0: 110, 1: 404310 | Categorical | Fixation: 1 if the word was fixated, otherwise 0 (FPF or RR). | 0 | nan | compute_reading_measures.py |
| FPF | 0: 56838, 1: 347582 | Categorical | First-pass fixation: 1 if the word was fixated in the first-pass, otherwise 0. | 0 | nan | compute_reading_measures.py |
| RR | 0: 48241, 1: 356179 | Categorical | Re-reading: 1 if the word was fixated after the first-pass reading, otherwise 0 (sign(RRT)). | 0 | nan | compute_reading_measures.py |
| FPReg | 0: 308156, 1: 96264 | Categorical | First-pass regression: 1 if a regression was initiated in the first-pass reading of the word, otherwise 0 (sign(RPD_exc)). | 0 | nan | compute_reading_measures.py |
| TRC_out | min: 0, max: 15, mean: 0.8249, std: 1.193 | Float | Total count of outgoing regressions: total number of regressive saccades initiated from this word. | 0 | nan | compute_reading_measures.py |
| TRC_in | min: 0, max: 12, mean: 0.7776, std: 1.1734 | Float | Total count of incoming regressions: total number of regressive saccades landing on this word. | 0 | nan | compute_reading_measures.py |
| LP | min: 1, max: 28, mean: 3.3887, std: 2.3225 | Float | Landing position: position of the first saccade on the word expressed by ordinal position of the fixated character. | 0 | nan | compute_reading_measures.py |
| SL_in | min: -162, max: 156, mean: 1.3449, std: 2.928 | Float | Incoming saccade length: length of the saccade that leads to first fixation on a word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression. | 0 | nan | compute_reading_measures.py |
| SL_out | min: -179, max: 63, mean: -0.0835, std: 7.9375 | Float | Outgoing saccade length: length of the first saccade that leaves the word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression; 0 if the word is never fixated. | 0 | nan | compute_reading_measures.py |
| mean_acc_tq | min: 0.0, max: 0.9991603694374476, mean: 0.3819, std: 0.3148 | Float | The mean accuracy of all background questions for one text read by one reader. | 0 | nan | nan |
| mean_acc_bq | min: 0.0, max: 0.999250936329588, mean: 0.6398, std: 0.312 | Float | The mean accuracy of all text questions for one text read by one reader. | 0 | nan | nan |
| gender_numeric | 0.0: 187536, 1.0: 212874, nan: 4010 | Categorical | Numerical value of gender; 0=male, 1=female. | 4010 | nan | nan |
| age | min: 18.0, max: 41.0, mean: 24.0283, std: 4.1436 | Float | Reader's age. | 8459 | nan | demographic questionnaire |
| discipline_level_of_studies_numeric | 0: 89325, 1: 133833, 2: 65008, 3: 116254 | Categorical | Numerical value of discipline_level_of_studies; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. | 0 | nan | demographic questionnaire |
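
To illustrate how the word-level measures in this table relate to the fixation sequence, the sketch below derives a few of them (FD, FFD, SFD, FPRT, FRT, TFT, TFC, RRT) from a scanpath. It is not the logic of compute_reading_measures.py. It assumes the scanpath is a list of `(word_index_in_text, fixation_duration)` pairs in temporal order, and it operationalises "first-pass" as: the word is fixated for the first time before any word to its right has been fixated. The function name and the example scanpath are hypothetical.

```python
from collections import defaultdict


def compute_basic_measures(scanpath):
    """Compute a few word-level measures from (word_index, duration) pairs."""
    empty = {"FD": 0, "FFD": 0, "SFD": 0, "FPRT": 0, "FRT": 0, "TFT": 0, "TFC": 0}
    measures = defaultdict(lambda: dict(empty))
    rightmost_seen = 0  # highest word index fixated so far (indexing starts at 1)
    i = 0
    while i < len(scanpath):
        word = scanpath[i][0]
        # Collect the run of consecutive fixations on the same word.
        j = i
        while j < len(scanpath) and scanpath[j][0] == word:
            j += 1
        run = [duration for _, duration in scanpath[i:j]]
        m = measures[word]
        if m["TFC"] == 0:  # first visit to this word
            m["FD"] = run[0]
            m["FRT"] = sum(run)
            if word > rightmost_seen:  # assumption: this counts as first-pass
                m["FFD"] = run[0]
                m["FPRT"] = sum(run)
                m["SFD"] = run[0] if len(run) == 1 else 0
        m["TFT"] += sum(run)
        m["TFC"] += len(run)
        rightmost_seen = max(rightmost_seen, word)
        i = j
    for m in measures.values():
        m["RRT"] = m["TFT"] - m["FPRT"]  # re-reading time as TFT-FPRT
    return dict(measures)


# Invented scanpath: word 3 is skipped at first and only reached by a regression.
example_scanpath = [(1, 200), (2, 150), (2, 100), (4, 180), (3, 220), (4, 90)]
measures = compute_basic_measures(example_scanpath)
print(measures[3])
```

In this example word 3 has FFD and FPRT of 0 but a non-zero FD and FRT, because its first fixation happens after word 4 has already been fixated.
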
Contains the mapping of each aoi to the respective word in each of the texts.
Please find the file at this link: aoi to word mapping
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| word_index_in_text | 1-180 | Integer | The index of the word in the text. Indexing starts at 1. | 0 | nan | nan |
| char_index_in_text | 1-1121 | Integer | Index of a character in the text. Indexing starts at 1. | 0 | nan | nan |
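
A typical use of this mapping is to look up which word a fixated character (aoi) belongs to. The sketch below builds such a lookup with the standard library. It assumes the file is tab-separated with a header row containing the three column names above; the sample rows are invented for illustration, and `load_aoi_to_word` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical excerpt of the mapping file; real values will differ.
SAMPLE_TSV = (
    "text_id\tword_index_in_text\tchar_index_in_text\n"
    "b0\t1\t1\n"
    "b0\t1\t2\n"
    "b0\t1\t3\n"
    "b0\t2\t5\n"
    "b0\t2\t6\n"
)


def load_aoi_to_word(handle):
    """Build a {(text_id, char_index_in_text): word_index_in_text} lookup."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row["text_id"], int(row["char_index_in_text"])): int(row["word_index_in_text"])
        for row in reader
    }


aoi_to_word = load_aoi_to_word(io.StringIO(SAMPLE_TSV))
word_for_aoi_5 = aoi_to_word.get(("b0", 5))  # character 5 belongs to word 2 in the sample
word_for_aoi_4 = aoi_to_word.get(("b0", 4))  # not listed in the sample, so None
print(word_for_aoi_5, word_for_aoi_4)
```

Using `dict.get` returns `None` for character indices that have no entry instead of raising an error.
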
In the participants' data file, all demographic information is stored.
Please find the file at this link: Participant information
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| reader_discipline | biology: 43, physics: 32 | Categorical | The area of expertise of the reader. All readers are students whose major is either physics or biology. | 0 | nan | demographic questionnaire |
| reader_discipline_numeric | 0: 43, 1: 32 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| level_of_studies | graduate: 47, undergraduate: 28 | Categorical | Reader's level of studies. Readers are considered experts if they are either MSc or PhD students. 1st semester BSc students are considered beginners. | 0 | nan | demographic questionnaire |
| level_of_studies_numeric | 0: 28, 1: 47 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| discipline_level_of_studies | biology-graduate: 27, biology-undergraduate: 16, physics-graduate: 20, physics-undergraduate: 12 | Categorical | The combination of the readers' major (reader_discipline) and their expertise (level_of_studies). | 0 | nan | demographic questionnaire |
| discipline_level_of_studies_numeric | 0: 16, 1: 27, 2: 12, 3: 20 | Categorical | Numerical value of discipline_level_of_studies; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. | 0 | nan | demographic questionnaire |
| glasses | no: 54, yes: 20, nan: 1 | Categorical | Whether or not reader is wearing glasses. | 1 | nan | demographic questionnaire |
| age | min: 18.0, max: 41.0, mean: 24.1644, std: 4.2098 | Float | Reader's age. | 2 | nan | demographic questionnaire |
| handedness | right: 68, left: 6, nan: 1 | Categorical | Reader's handedness. | 1 | nan | demographic questionnaire |
| hours_sleep | min: 0.0, max: 11.0, mean: 7.2095, std: 1.3138 | Float | The hours of sleep of the participant before the experiment. | 1 | nan | demographic questionnaire |
| alcohol | no: 71, yes: 3, nan: 1 | Categorical | Whether or not a participant consumed alcohol within 24 hours before the experiment start. | 1 | nan | demographic questionnaire |
| gender | female: 39, male: 35, nan: 1 | Categorical | Reader's gender. | 1 | nan | demographic questionnaire |
| gender_numeric | 0.0: 35, 1.0: 39, nan: 1 | Categorical | Numerical value of gender; 0=male, 1=female. | 1 | nan | nan |
| semester | | string | The semester the reader is currently enrolled in. | 1 | nan | demographic questionnaire |
| bilingual | n: 73, j: 1, nan: 1 | Categorical | Whether the reader is bilingual. | 1 | nan | demographic questionnaire |
| state | | string | The German state the reader is from. | 1 | nan | demographic questionnaire |
| grade | | string | The grade of the reader in their university entrance diploma. | 4 | nan | demographic questionnaire |
| subject_detailed | | | The detailed subject of the reader's major. | 1 | nan | demographic questionnaire |
The response accuracy for each participant for each question.
Please find the file at this link: Participant response accuracy
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| reader_discipline | biology: 516, physics: 384 | Categorical | The area of expertise of the reader. All readers are students whose major is either physics or biology. | 0 | nan | demographic questionnaire |
| reader_discipline_numeric | 0: 516, 1: 384 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| level_of_studies | graduate: 564, undergraduate: 336 | Categorical | Reader's level of studies. Readers are considered experts if they are either MSc or PhD students. 1st semester BSc students are considered beginners. | 0 | nan | demographic questionnaire |
| level_of_studies_numeric | 0: 336, 1: 564 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_domain | biology: 450, physics: 450 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| expert_reading_label | expert-reading: 282, non-expert-reading: 618 | Categorical | Whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert) | 0 | nan | Manually tagged |
| expert_reading_label_numeric | 0: 618, 1: 282 | Categorical | Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert). 1=expert_reading, 0=non-expert_reading | 0 | nan | Manually tagged |
| acc_tq_1 | min: 0.0, max: 1.0, mean: 0.6475, std: 0.478 | Float | The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_2 | min: 0.0, max: 1.0, mean: 0.6441, std: 0.479 | Float | The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_3 | min: 0.0, max: 1.0, mean: 0.6509, std: 0.477 | Float | The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_1 | min: 0.0, max: 1.0, mean: 0.393, std: 0.4887 | Float | The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_2 | min: 0.0, max: 1.0, mean: 0.366, std: 0.482 | Float | The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_3 | min: 0.0, max: 1.0, mean: 0.4234, std: 0.4944 | Float | The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| mean_acc_tq | min: 0.0, max: 1.0, mean: 0.6475, std: 0.3082 | Float | The mean accuracy of all text questions for one text read by one reader. | 12 | nan | nan |
| mean_acc_bq | min: 0.0, max: 1.0, mean: 0.3941, std: 0.3163 | Float | The mean accuracy of all background questions for one text read by one reader. | 12 | nan | nan |
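As a rough illustration of how the derived columns in this table relate to the per-question accuracies, the sketch below recomputes a mean accuracy and the expert-reading label for a single row. The helper names and the example row are hypothetical, and the handling of missing values (returning `None` as soon as one accuracy is missing) is an assumption, not the corpus's documented procedure.

```python
from statistics import mean


def mean_accuracy(accuracies):
    """Average the 0/1 accuracies of one text; None if any value is missing (assumption)."""
    if any(a is None for a in accuracies):
        return None
    return mean(accuracies)


def expert_reading_label(row):
    """Codebook rule: text_domain == reader_discipline and the reader is an expert."""
    is_expert = row["level_of_studies_numeric"] == 1  # 1=expert, 0=beginner
    same_domain = row["text_domain"] == row["reader_discipline"]
    return "expert-reading" if is_expert and same_domain else "non-expert-reading"


# Hypothetical row, made up for illustration (not taken from the data files).
example_row = {
    "reader_discipline": "physics",
    "level_of_studies_numeric": 1,
    "text_domain": "physics",
    "acc_tq_1": 1.0,
    "acc_tq_2": 0.0,
    "acc_tq_3": 1.0,
}

tq = [example_row[f"acc_tq_{i}"] for i in (1, 2, 3)]
example_mean_acc_tq = mean_accuracy(tq)
example_label = expert_reading_label(example_row)
```

The same `mean_accuracy` helper could be applied to the three `acc_bq_*` columns to obtain a background-question mean.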
This file explains the values used in the online survey answer file (response_data_online_survey.csv). Each variable has four different options, which are expressed as numerical values; each option is mapped to the text option the participant saw.
Please find the file at this link: Answer coding online survey
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| VAR | | string | Variable name of the fields in the participant online survey. These are explanations of the names of the columns in the file: response_data_online_survey.csv | 0 | nan | online survey tool |
| RESPONSE | -9: 46, 0: 2, 1: 95, 2: 92, 3: 93, 4: 54, 5: 13, 6: 13, 7: 12, 8: 12, 9: 12, 10: 12, 11: 12, 12: 12 | Categorical | The response code given by the online survey tool. In the answer file these codes are used. | 0 | nan | online survey tool |
| MEANING | | | The literal meaning of the response, i.e. what the participant could see in the online survey. | 0 | nan | online survey tool |
| CORRECT_ANSWER | nan: 312, False: 126, True: 42 | Categorical | Whether this answer was a correct answer. | 312 | The value is missing if this is not applicable, for example if the response means that the participant did not answer. | online survey tool |
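To make the use of the coding file concrete, here is a minimal sketch of how a response code from response_data_online_survey.csv might be translated back into its meaning. The excerpt below is invented for illustration: the variable name `Q01`, the codes, and the meanings are hypothetical, and a comma delimiter is assumed.

```python
import csv
import io

# Hypothetical excerpt of the coding file; variable names, codes and meanings are made up.
# A comma delimiter is assumed here.
CODING_CSV = """VAR,RESPONSE,MEANING,CORRECT_ANSWER
Q01,1,first option,True
Q01,2,second option,False
Q01,-9,no answer,
"""


def build_lookup(csv_text):
    """Map (VAR, RESPONSE code) to (MEANING, CORRECT_ANSWER or None if missing)."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        correct = {"True": True, "False": False}.get(row["CORRECT_ANSWER"])
        lookup[(row["VAR"], int(row["RESPONSE"]))] = (row["MEANING"], correct)
    return lookup


lookup = build_lookup(CODING_CSV)
meaning, correct = lookup[("Q01", 2)]
```

Keying the lookup on both `VAR` and `RESPONSE` reflects that the same numeric code can mean different things for different variables.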
This file contains the response accuracy for the participants from the online survey.
Please find the file at this link: Response accuracy
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_domain | biology: 210, physics: 210 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| mean_acc_tq | min: 0.0, max: 1.0, mean: 0.2619, std: 0.2495 | Float | The mean accuracy of all background questions for one text read by one reader. | 0 | nan | nan |
| reader_discipline | biology: 108, other: 156, physics: 156 | Categorical | The area of expertise of the reader. All readers are students whose major is either physics or biology. | 0 | nan | demographic questionnaire |
| level_of_studies | graduate: 264, other: 156 | Categorical | Reader's level of studies. Readers are considered experts if they are either MSc or PhD students. 1st semester BSc students are considered beginners. | 0 | nan | demographic questionnaire |
The original response data from the online survey. The coding of the values contained here is found in the answer_coding_online_survey.csv file, which is why the table below is empty. Please note that there are many values in this file which are not relevant for this corpus. E.g., all columns starting with RA specify the randomization and all columns starting with TIME contain response time information.
Please find the file at this link: Response data online survey
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
Contains the scanpaths for each trial merged with information on the reader, texts, etc.
Please find the files at this link: Scanpaths merged
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| fixation_index | 1-1469 | Integer | The index of the fixation in temporal order. | 0 | nan | SR Research data viewer |
| text_domain | bio: 4682, biology: 200017, physics: 199721 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| trial | 1-12 | Integer | Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. | 0 | nan | nan |
| acc_bq_1 | min: 0.0, max: 1.0, mean: 0.3869, std: 0.487 | Float | The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_2 | min: 0.0, max: 1.0, mean: 0.3564, std: 0.4789 | Float | The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_3 | min: 0.0, max: 1.0, mean: 0.4217, std: 0.4938 | Float | The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_1 | min: 0.0, max: 1.0, mean: 0.6625, std: 0.4729 | Float | The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_2 | min: 0.0, max: 1.0, mean: 0.6326, std: 0.4821 | Float | The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_3 | min: 0.0, max: 1.0, mean: 0.6564, std: 0.4749 | Float | The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. | 5785 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| fixation_duration | 2-4474 | Integer | The duration of the fixation in milliseconds. | 0 | nan | SR Research data viewer |
| next_saccade_duration | 1.0-9491.0 | Integer | The duration of the saccade that follows a fixation in milliseconds. | 46 | nan | SR Research data viewer |
| previous_saccade_duration | 1.0-9491.0 | Integer | The duration of a saccade that precedes a fixation in milliseconds. | 515 | nan | SR Research data viewer |
| version | 0-105 | Integer | Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. | 0 | nan | nan |
| line | 1-12 | Integer | The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. | 0 | nan | nan |
| aoi | 1-1121 | Integer | The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. | 0 | nan | SR Research experiment builder |
| char_index_in_line | 1-100 | Integer | Index of a character in the line. Indexing starts at 1. | 0 | nan | nan |
| original_fixation_index | 1-1478 | Integer | The index of the uncorrected fixation. | 0 | nan | SR Research data viewer |
| is_fixation_adjusted | False: 382202, True: 22218 | Categorical | Whether or not the fixation has been adjusted manually. | 0 | nan | Manually tagged. |
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| word_index_in_text | 1-180 | Integer | The index of the word in the text. Indexing starts at 1. | 0 | nan | nan |
| sent_index_in_text | 1-12 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 0 | nan | nan |
| char_index_in_text | 1-1121 | Integer | Index of a character in the text. Indexing starts at 1. | 0 | nan | nan |
| word | | string | Words as they appear in the stimuli texts. Words are split at white-space. | 0 | nan | nan |
| character | | string | Character as text. | 0 | nan | nan |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | Manually created |
| text_domain_numeric | 0: 204699, 1: 199721 | Categorical | Numerical value of text_domain; 0=biology, 1=physics. | 0 | nan | Manually created |
| reader_discipline_numeric | 0: 223158, 1: 181262 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| level_of_studies_numeric | 0: 154333, 1: 250087 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| expert_reading_label_numeric | 0: 290883, 1: 113537 | Categorical | Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert). 1=expert_reading, 0=non-expert_reading | 0 | nan | Manually tagged |
| expert_reading_label | expert_reading: 113537, non-expert_reading: 290883 | Categorical | Whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert) | 0 | nan | Manually tagged |
| word_with_punct | | string | The word as it appears in the text, including punctuation. | 96 | nan | nan |
| word_index_in_sent | 1-51 | Integer | The index of the word in the sentence. Indexing starts at 1. | 0 | nan | nan |
| word_length | 2-33 | Integer | Word length is defined in number of characters including symbols like hyphens but without sentence punctuation at the end (i.e., z.B. = 4 characters; DNA-Kette = 9 characters; eats. = 4 characters). | 0 | nan | nan |
| STTS_punctuation_before | 0.0: 211108, 0: 189407, $(: 3905 | Categorical | If a word is preceded by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. | 0 | nan | Manually tagged |
| STTS_punctuation_after | | Categorical | If a word is followed by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. | 0 | nan | Manually tagged |
| is_in_quote | 0: 399715, 1: 4705 | Categorical | Whether or not the word is part of an expression in quotes. | 0 | nan | Manually tagged |
| is_in_parentheses | 0: 403155, 1: 1265 | Categorical | Whether or not the word is part of a phrase in parentheses. | 0 | nan | Manually tagged |
| is_clause_beginning | 0: 388232, 1: 16188 | Categorical | Whether or not the word is the beginning of a clause. | 0 | nan | Manually tagged |
| is_sent_beginning | 0: 386681, 1: 17739 | Categorical | Whether or not the word is the beginning of a new sentence. | 0 | nan | Manually tagged |
| is_clause_end | 0: 381545, 1: 22875 | Categorical | Whether or not the word is the end of a clause. | 0 | nan | Manually tagged |
| is_sent_end | 0: 380027, 1: 24393 | Categorical | Whether or not the word is the end of a sentence. | 0 | nan | Manually tagged |
| is_abbreviation | 0: 403478, 1: 942 | Categorical | Whether or not the entire word is an abbreviation. | 0 | nan | Manually tagged |
| is_expert_technical_term | 0: 332354, 1: 72066 | Categorical | 1 if the word is a technical term that is not generally understandable. E.g.: "Agarose". | 0 | nan | Manually tagged |
| is_general_technical_term | 0: 325333, 1: 79087 | Categorical | 1 if the word is a technical term that is generally understandable. E.g.: "elektrisch" | 0 | nan | Manually tagged |
| contains_symbol | 0: 400458, 1: 3962 | Categorical | Whether or not the word contains a symbol. E.g.: β-D-Glucose | 0 | nan | Manually tagged |
| contains_hyphen | 0: 388149, 1: 16271 | Categorical | Whether or not the word contains a hyphen, e.g. 1 for DNA-Fragment. Words with the tag TRUNC (compositional first elements) do not count: in "Sekundär- und Tertiärstrukturen", "Sekundär-" does not count as having a hyphen. | 0 | nan | Manually tagged |
| contains_abbreviation | 0: 399423, 1: 4997 | Categorical | Whether or not the word contains an abbreviation. 0 for words that are only an abbreviation. See is_abbreviation. E.g. 1 for DNA-Fragment, 0 for DNA. | 0 | nan | Manually tagged |
| STTS_PoS_tag | ADJA: 51041, ADJD: 12714, ADV: 12236, APPR: 22470, APPRART: 5566, APZR: 91, ART: 37340, CARD: 1594, KOKOM: 2428, KON: 5798, KOUI: 654, KOUS: 2521, NE: 955, NN: 162980, PAV: 3444, PDAT: 3292, PDS: 1374, PIAT: 791, PIDAT: 1653, PIS: 1322, PPER: 2511, PPOSAT: 1360, PRELAT: 1302, PRELS: 4193, PRF: 3606, PTKA: 97, PTKNEG: 687, PTKVZ: 1490, PTKZU: 583, PWAV: 76, TRUNC: 1137, VAFIN: 10340, VAINF: 1206, VMFIN: 3953, VMINF: 153, VVFIN: 23854, VVINF: 7713, VVIZU: 578, VVPP: 9317 | Categorical | Part-of-speech tags according to the STTS-tagset. See stimuli/ANNOTATION.MD for more information. | 0 | nan | Manually tagged |
| type | | string | The orthographical representation of a word as found in the corpus; this data is case sensitive, i.e. there is a distinction between name and Name. | 0 | nan | dlexDB |
| type_length_chars | 0.0-33.0 | Integer | The length of the type of a word in characters. See the description of word_length for a definition of how characters are counted. | 0 | nan | nan |
| PoS_tag | adja: 53330, adjd: 12226, adv: 15728, appr: 22193, apprart: 5566, art: 37918, card: 1594, kokom: 2428, kon: 5405, koui: 559, kous: 2521, ne: 1386, nn: 160585, pdat: 3292, pds: 1374, piat: 791, pidat: 352, pis: 2063, pper: 2434, pposat: 1360, prelat: 1302, prels: 4076, prf: 3606, ptka: 97, ptkneg: 687, ptkvz: 1891, ptkzu: 583, pwav: 76, trunc: 1137, vafin: 10340, vainf: 1206, vmfin: 3829, vminf: 153, vvfin: 23978, vvinf: 7713, vvizu: 578, vvpp: 9317, xy: 746 | Categorical | Part-of-speech tag as defined by the dlexDB query. | 0 | nan | dlexDB |
| lemma | | string | nan | 0 | nan | dlexDB |
| lemma_length_chars | 0.0-32.0 | Integer | nan | 0 | nan | dlexDB |
| syllables | | string | nan | 0 | nan | dlexDB |
| type_length_syllables | 0.0-14.0 | Integer | nan | 0 | nan | dlexDB |
| annotated_type_frequency_normalized | min: 0.0, max: 24738.5901996, mean: 1950.9055, std: 5185.3006 | Float | The number of occurrences of an annotated type in corpus. An annotated type is a unique combination of a type, its part-of-speech tag and its lemma. | 0 | nan | dlexDB |
| type_frequency_normalized | min: 0.0, max: 26530.3631386, mean: 2247.4523, std: 5847.2187 | Float | nan | 0 | nan | dlexDB |
| lemma_frequency_normalized | min: 0.0, max: 80100.3069113, mean: 7203.2409, std: 19769.4428 | Float | nan | 0 | nan | dlexDB |
| familiarity_normalized | min: 0.0, max: 26530.3631386, mean: 2191.7786, std: 5759.2592 | Float | nan | 0 | nan | dlexDB |
| regularity_normalized | min: 0.0, max: 2123.30585022, mean: 46.8657, std: 137.5046 | Float | nan | 0 | nan | dlexDB |
| document_frequency_normalized | min: 0.0, max: 9372.80956103, mean: 1684.1043, std: 2829.0626 | Float | nan | 0 | nan | dlexDB |
| sentence_frequency_normalized | min: 0.0, max: 30912.3596552, mean: 3137.4539, std: 7374.8037 | Float | nan | 0 | nan | dlexDB |
| cumulative_syllable_corpus_frequency_normalized | min: 0.0, max: 125126.524676, mean: 15768.7784, std: 17529.5528 | Float | nan | 0 | nan | dlexDB |
| cumulative_syllable_lexicon_frequency_normalized | min: 0.0, max: 218985.607753, mean: 27232.3183, std: 36883.9628 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_corpus_frequency_normalized | min: 0.0, max: 7810554.20193, mean: 2053804.334, std: 1596380.3916 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_lexicon_frequency_normalized | min: 0.0, max: 18380479.713, mean: 4612580.9638, std: 3597155.0404 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_bigram_corpus_frequency_normalized | min: 0.0, max: 1322150.62097, mean: 356831.454, std: 269772.388 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_bigram_lexicon_frequency_normalized | min: 0.0, max: 2788357.77704, mean: 629626.1651, std: 539088.9742 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_trigram_corpus_frequency_normalized | min: 0.0, max: 603427.130456, mean: 200341.8076, std: 144122.7012 | Float | nan | 0 | nan | dlexDB |
| cumulative_character_trigram_lexicon_frequency_normalized | min: 0.0, max: 899592.89035, mean: 236423.2776, std: 199573.1416 | Float | nan | 0 | nan | dlexDB |
| initial_letter_frequency_normalized | min: 0.0, max: 110461.430317, mean: 28045.0077, std: 30618.9167 | Float | nan | 0 | nan | dlexDB |
| initial_bigram_frequency_normalized | min: 0.0, max: 53801.2331077, mean: 8706.0335, std: 12743.2638 | Float | nan | 0 | nan | dlexDB |
| initial_trigram_frequency_normalized | min: -0.00817507899599, max: 29048.3692201, mean: 3754.6304, std: 7393.1224 | Float | nan | 0 | nan | dlexDB |
| avg_cond_prob_in_bigrams | min: 0.0, max: 0.5006180465, mean: 0.0313, std: 0.0466 | Float | The conditional probability of the bigram, given the occurrence of its first component. In other words, how likely it is for the second component to follow directly after the first. Here, this measure is computed on the basis of the annotated type information. | 0 | nan | dlexDB |
| avg_cond_prob_in_trigrams | min: 0.0, max: 25.0, mean: 0.2251, std: 0.8814 | Float | The conditional probability of the trigram, given the occurrence of its initial bigram. In other words, how likely it is for the third component to follow directly after the initial pair. Here, this measure is computed on the basis of the annotated type information. | 0 | nan | dlexDB |
| neighbors_coltheart_higher_freq_cum_freq_normalized | min: 0.0, max: 44055.247282, mean: 1276.643, std: 5775.4034 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_higher_freq_count_normalized | min: 0.0, max: 8.13363128109, mean: 0.1556, std: 0.4321 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_all_cum_freq_normalized | min: 0.0, max: 49782.1108458, mean: 2794.1781, std: 7982.6321 | Float | nan | 0 | nan | dlexDB |
| neighbors_coltheart_all_count_normalized | min: 0.0, max: 47.5175301158, mean: 9.0448, std: 12.679 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_higher_freq_cum_freq_normalized | min: 0.0, max: 44055.247282, mean: 1683.6273, std: 6153.8504 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_higher_freq_count_normalized | min: 0.0, max: 11.9864039932, mean: 0.2681, std: 0.5814 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_all_cum_freq_normalized | min: 0.0, max: 54875.2749862, mean: 3761.4734, std: 9299.5647 | Float | nan | 0 | nan | dlexDB |
| neighbors_levenshtein_all_count_normalized | min: 0.0, max: 75.7711966712, mean: 14.1417, std: 19.6383 | Float | nan | 0 | nan | dlexDB |
| sent_surprisal_gpt2-base | min: 0.0005104430601932, max: 56.804420471191406, mean: 10.0061, std: 9.1114 | Float | Surprisal value extracted from a language model (GerPT2-base) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_gpt2-base | min: 0.0002225389762315, max: 53.041446685791016, mean: 8.0061, std: 8.0873 | Float | Surprisal value extracted from a language model (GerPT2-base) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_gpt2-large | min: 0.0002048997703241, max: 42.28059005737305, mean: 8.76, std: 8.0159 | Float | Surprisal value extracted from a language model (GerPT2-large) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_gpt2-large | min: 0.0001027531252475, max: 35.38883209228516, mean: 6.6792, std: 6.6522 | Float | Surprisal value extracted from a language model (GerPT2-large) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_llama-7b | min: 0.0001720042055239, max: 42.96158599853516, mean: 8.0373, std: 7.0611 | Float | Surprisal value extracted from a language model (LeoLM-7b) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_llama-7b | min: 1.990775308513548e-05, max: 35.62324142456055, mean: 4.7991, std: 4.9022 | Float | Surprisal value extracted from a language model (LeoLM-7b) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_llama-13b | min: 8.702239938429557e-06, max: 46.25139999389648, mean: 7.7768, std: 7.1775 | Float | Surprisal value extracted from a language model (LeoLM-13b) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_llama-13b | min: 9.298280929215252e-06, max: 36.29869842529297, mean: 4.5172, std: 4.9048 | Float | Surprisal value extracted from a language model (LeoLM-13b) with the text as context. | 0 | nan | See script get_surprisal.py |
| sent_surprisal_bert-base | min: 1.1920928244535389e-07, max: 101.79562616348268, mean: 8.1926, std: 13.1873 | Float | Surprisal value extracted from a language model (BERT-base) with the sentence as context. | 0 | nan | See script get_surprisal.py |
| text_surprisal_bert-base | min: -0.0, max: 88.84420316047726, mean: 7.487, std: 12.7275 | Float | Surprisal value extracted from a language model (BERT-base) with the text as context. | 0 | nan | See script get_surprisal.py |
| FFD | min: 0, max: 2144, mean: 195.9741, std: 124.5597 | Float | First-fixation duration: duration of the first fixation on a word if this word is fixated in first-pass reading, otherwise 0. | 0 | nan | compute_reading_measures.py |
| SFD | min: 0, max: 2144, mean: 107.9483, std: 134.474 | Float | Single-fixation duration: duration of the only first-pass fixation on a word, 0 if the word was skipped or more than one fixation occurred in the first-pass (equals FFD in case of a single first-pass fixation). | 0 | nan | compute_reading_measures.py |
| FD | min: 0, max: 2144, mean: 226.9857, std: 103.7904 | Float | First duration: duration of the first fixation on a word (identical to FFD if not skipped in the first-pass). | 0 | nan | compute_reading_measures.py |
| FPRT | min: 0, max: 9649, mean: 408.9247, std: 526.0428 | Float | First-pass reading time: sum of the durations of all first-pass fixations on a word (0 if the word was skipped in the first-pass). | 0 | nan | compute_reading_measures.py |
| FRT | min: 0, max: 9649, mean: 456.8788, std: 518.1388 | Float | First-reading time: sum of the duration of all fixations from first fixating the word (independent if the first fixation occurs in first-pass reading) until leaving the word for the first time (equals FPRT in case the word was fixated in the first-pass). | 0 | nan | compute_reading_measures.py |
| TFT | min: 0, max: 25314, mean: 1333.0163, std: 1428.494 | Float | Total-fixation time: sum of all fixations on a word (FPRT+RRT). | 0 | nan | compute_reading_measures.py |
| TFC | min: 0, max: 87, mean: 5.8238, std: 5.5152 | Float | The total fixation count on the word. | 0 | nan | compute_reading_measures.py |
| RRT | min: 0, max: 23902, mean: 924.0916, std: 1240.0587 | Float | Re-reading time: sum of the durations of all fixations on a word that do not belong to the first-pass (TFT-FPRT). | 0 | nan | compute_reading_measures.py |
| RPD_inc | min: 0, max: 318898, mean: 1076.7946, std: 5339.73 | Float | Inclusive regression-path duration: Sum of all fixation durations starting from the first first-pass fixation on a word until fixation on a word to the right of this word (including all regressive fixations on previous words), 0 if the word was not fixated in the first-pass (RPD_exc+RBRT). | 0 | nan | compute_reading_measures.py |
| RPD_exc | min: 0, max: 315640, mean: 557.5849, std: 5209.143 | Float | Exclusive regression-path duration: Sum of all fixation durations after initiating a first-pass regression from a word until fixating a word to the right of this word, without counting fixations on the word itself (RPD_inc-RBRT). | 0 | nan | compute_reading_measures.py |
| RBRT | min: 0, max: 10675, mean: 519.2098, std: 638.9024 | Float | Right-bounded reading time: Sum of all fixation durations on a word until a word to the right of this word is fixated (RPD_inc-RPD_exc). | 0 | nan | compute_reading_measures.py |
| Fix | 0: 110, 1: 404310 | Categorical | Fixation: 1 if the word was fixated, otherwise 0 (FPF or RR). | 0 | nan | compute_reading_measures.py |
| FPF | 0: 56838, 1: 347582 | Categorical | First-pass fixation: 1 if the word was fixated in the first-pass, otherwise 0. | 0 | nan | compute_reading_measures.py |
| RR | 0: 48241, 1: 356179 | Categorical | Re-reading: 1 if the word was fixated after the first-pass reading, otherwise 0 (sign(RRT)). | 0 | nan | compute_reading_measures.py |
| FPReg | 0: 308156, 1: 96264 | Categorical | First-pass regression: 1 if a regression was initiated in the first-pass reading of the word, otherwise 0 (sign(RPD_exc)). | 0 | nan | compute_reading_measures.py |
| TRC_out | min: 0, max: 15, mean: 0.8249, std: 1.193 | Float | Total count of outgoing regressions: total number of regressive saccades initiated from this word. | 0 | nan | compute_reading_measures.py |
| TRC_in | min: 0, max: 12, mean: 0.7776, std: 1.1734 | Float | Total count of incoming regressions: total number of regressive saccades landing on this word. | 0 | nan | compute_reading_measures.py |
| LP | min: 1, max: 28, mean: 3.3887, std: 2.3225 | Float | Landing position: position of the first saccade on the word expressed by ordinal position of the fixated character. | 0 | nan | compute_reading_measures.py |
| SL_in | min: -162, max: 156, mean: 1.3449, std: 2.928 | Float | Incoming saccade length: length of the saccade that leads to first fixation on a word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression. | 0 | nan | compute_reading_measures.py |
| SL_out | min: -179, max: 63, mean: -0.0835, std: 7.9375 | Float | Outgoing saccade length: length of the first saccade that leaves the word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression; 0 if the word is never fixated. | 0 | nan | compute_reading_measures.py |
| mean_acc_tq | min: 0.0, max: 0.9991603694374476, mean: 0.3819, std: 0.3148 | Float | The mean accuracy of all text questions for one text read by one reader. | 0 | nan | nan |
| mean_acc_bq | min: 0.0, max: 0.999250936329588, mean: 0.6398, std: 0.312 | Float | The mean accuracy of all background questions for one text read by one reader. | 0 | nan | nan |
| gender_numeric | 0.0: 187536, 1.0: 212874, nan: 4010 | Categorical | Numerical value of gender; 0=male, 1=female. | 4010 | nan | nan |
| age | min: 18.0, max: 41.0, mean: 24.0283, std: 4.1436 | Float | Reader's age. | 8459 | nan | demographic questionnaire |
| discipline_level_of_studies_numeric | 0: 89325, 1: 133833, 2: 65008, 3: 116254 | Categorical | Numerical value of discipline_level_of_studies; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. | 0 | nan | demographic questionnaire |
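The formulas given in parentheses in the descriptions above (for RBRT, Fix, RR and FPReg) can be illustrated with a minimal sketch. It assumes that RPD_inc, RPD_exc, RRT and FPF are available as numeric values for one word, as the descriptions imply. The function name `derive_measures` is hypothetical, and this is not the code of compute_reading_measures.py.

```python
def sign(x: float) -> int:
    """Return 1 for positive values, 0 for zero and -1 for negative values."""
    return (x > 0) - (x < 0)


def derive_measures(rpd_inc: float, rpd_exc: float, rrt: float, fpf: int) -> dict:
    """Derive RBRT, RR, FPReg and Fix for one word from other measures.

    Hypothetical helper: the argument names mirror the column names
    mentioned in the codebook descriptions.
    """
    rr = sign(rrt)  # RR = sign(RRT)
    return {
        "RBRT": rpd_inc - rpd_exc,          # RBRT = RPD_inc - RPD_exc
        "RR": rr,
        "FPReg": sign(rpd_exc),             # FPReg = sign(RPD_exc)
        "Fix": int(bool(fpf) or bool(rr)),  # Fix = FPF or RR
    }
```

For a word with RPD_inc = 500, RPD_exc = 200, no re-reading time and a first-pass fixation, this gives RBRT = 300, RR = 0, FPReg = 1 and Fix = 1.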
Contains the mapping of each AOI to the respective word in each of the texts.
Please find the file at this link: AOI to word mapping
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| word_index_in_text | 1-180 | Integer | The index of the word in the text. Indexing starts at 1. | 0 | nan | nan |
| char_index_in_text | 1-1121 | Integer | Index of a character in the text. Indexing starts at 1. | 0 | nan | nan |
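As a rough illustration of how this mapping might be used, the sketch below builds a lookup from a character index to its word index. It assumes the file is a comma-separated file with a header row and one row per character AOI; neither assumption is confirmed by the codebook. The sample rows and the function name `build_char_to_word` are made up for illustration.

```python
import csv
import io

# Illustrative rows only, not taken from the real mapping file.
SAMPLE = """text_id,word_index_in_text,char_index_in_text
b0,1,1
b0,1,2
b0,1,3
b0,2,5
b0,2,6
"""


def build_char_to_word(lines) -> dict:
    """Map (text_id, char_index_in_text) to word_index_in_text.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    """
    reader = csv.DictReader(lines)
    return {
        (row["text_id"], int(row["char_index_in_text"])): int(row["word_index_in_text"])
        for row in reader
    }


lookup = build_char_to_word(io.StringIO(SAMPLE))
```

With the sample rows, `lookup[("b0", 5)]` returns word index 2, while a character index that is not listed (such as 4 in the sample) is simply absent from the lookup.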
The participants' data file stores all demographic information.
Please find the file at this link: Participant information
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| reader_discipline | biology: 43, physics: 32 | Categorical | The area of expertise of the reader. All readers are students whose major is either physics or biology. | 0 | nan | demographic questionnaire |
| reader_discipline_numeric | 0: 43, 1: 32 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| level_of_studies | graduate: 47, undergraduate: 28 | Categorical | Reader's level of studies. Readers are considered experts if they are either MSc or PhD students. 1st semester BSc students are considered beginners. | 0 | nan | demographic questionnaire |
| level_of_studies_numeric | 0: 28, 1: 47 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| discipline_level_of_studies | biology-graduate: 27, biology-undergraduate: 16, physics-graduate: 20, physics-undergraduate: 12 | Categorical | The combination of the readers' major (reader_discipline) and their expertise (level_of_studies). | 0 | nan | demographic questionnaire |
| discipline_level_of_studies_numeric | 0: 16, 1: 27, 2: 12, 3: 20 | Categorical | Numerical value of discipline_level_of_studies; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. | 0 | nan | demographic questionnaire |
| glasses | no: 54, yes: 20, nan: 1 | Categorical | Whether or not reader is wearing glasses. | 1 | nan | demographic questionnaire |
| age | min: 18.0, max: 41.0, mean: 24.1644, std: 4.2098 | Float | Reader's age. | 2 | nan | demographic questionnaire |
| handedness | right: 68, left: 6, nan: 1 | Categorical | Reader's handedness. | 1 | nan | demographic questionnaire |
| hours_sleep | min: 0.0, max: 11.0, mean: 7.2095, std: 1.3138 | Float | The hours of sleep of the participant before the experiment. | 1 | nan | demographic questionnaire |
| alcohol | no: 71, yes: 3, nan: 1 | Categorical | Whether or not a participant consumed alcohol within 24 hours before the experiment start. | 1 | nan | demographic questionnaire |
| gender | female: 39, male: 35, nan: 1 | Categorical | Reader's gender. | 1 | nan | demographic questionnaire |
| gender_numeric | 0.0: 35, 1.0: 39, nan: 1 | Categorical | Numerical value of gender; 0=male, 1=female. | 1 | nan | nan |
| semester | | string | The semester the reader is currently enrolled in. | 1 | nan | demographic questionnaire |
| bilingual | n: 73, j: 1, nan: 1 | Categorical | Whether the reader is bilingual; presumably n=no and j=yes (German "nein"/"ja"). | 1 | nan | demographic questionnaire |
| state | | string | The German state the reader is from. | 1 | nan | demographic questionnaire |
| grade | | string | The grade of the reader in their university entrance diploma. | 4 | nan | demographic questionnaire |
| subject_detailed | | | The detailed subject of the reader's major. | 1 | nan | demographic questionnaire |
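The numeric codes listed above are consistent with a simple arithmetic relation: discipline_level_of_studies_numeric equals 2 * reader_discipline_numeric + level_of_studies_numeric. The sketch below shows this relation. It is an observation about the documented codes, not necessarily how the column was actually created, and the function name `combined_code` is hypothetical.

```python
def combined_code(reader_discipline_numeric: int, level_of_studies_numeric: int) -> int:
    """Combine discipline (0=biology, 1=physics) and level (0=beginner, 1=expert).

    Returns 0=biology-beginner, 1=biology-expert,
    2=physics-beginner, 3=physics-expert, as listed in the codebook.
    """
    if reader_discipline_numeric not in (0, 1):
        raise ValueError("reader_discipline_numeric must be 0 or 1")
    if level_of_studies_numeric not in (0, 1):
        raise ValueError("level_of_studies_numeric must be 0 or 1")
    return 2 * reader_discipline_numeric + level_of_studies_numeric
```

For example, a physics student (1) at beginner level (0) receives code 2, which matches physics-beginner in the table.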
The response accuracy for each participant for each question.
Please find the file at this link: Participant response accuracy
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | Manually created |
| reader_discipline | biology: 516, physics: 384 | Categorical | The area of expertise of the reader. All readers are students whose major is either physics or biology. | 0 | nan | demographic questionnaire |
| reader_discipline_numeric | 0: 516, 1: 384 | Categorical | Numerical encoding of the reader discipline; 0=biology, 1=physics. | 0 | nan | Manually created |
| level_of_studies | graduate: 564, undergraduate: 336 | Categorical | Reader's level of studies. Readers are considered experts if they are either MSc or PhD students. 1st semester BSc students are considered beginners. | 0 | nan | demographic questionnaire |
| level_of_studies_numeric | 0: 336, 1: 564 | Categorical | Numerical value of level_of_studies; 0=beginner, 1=expert. | 0 | nan | demographic questionnaire |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_domain | biology: 450, physics: 450 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| expert_reading_label | expert-reading: 282, non-expert-reading: 618 | Categorical | Whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert) | 0 | nan | Manually tagged |
| expert_reading_label_numeric | 0: 618, 1: 282 | Categorical | Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_discipline and reader is expert). 1=expert_reading, 0=non-expert_reading | 0 | nan | Manually tagged |
| acc_tq_1 | min: 0.0, max: 1.0, mean: 0.6475, std: 0.478 | Float | The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_2 | min: 0.0, max: 1.0, mean: 0.6441, std: 0.479 | Float | The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_tq_3 | min: 0.0, max: 1.0, mean: 0.6509, std: 0.477 | Float | The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_1 | min: 0.0, max: 1.0, mean: 0.393, std: 0.4887 | Float | The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_2 | min: 0.0, max: 1.0, mean: 0.366, std: 0.482 | Float | The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| acc_bq_3 | min: 0.0, max: 1.0, mean: 0.4234, std: 0.4944 | Float | The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. | 12 | For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). | nan |
| mean_acc_tq | min: 0.0, max: 1.0, mean: 0.6475, std: 0.3082 | Float | The mean accuracy of all text questions for one text read by one reader. | 12 | nan | nan |
| mean_acc_bq | min: 0.0, max: 1.0, mean: 0.3941, std: 0.3163 | Float | The mean accuracy of all background questions for one text read by one reader. | 12 | nan | nan |
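Two of the columns above are derived from other columns, which the following minimal sketch illustrates. It assumes that an expert is a reader with level_of_studies_numeric equal to 1, and that the mean accuracy is the plain arithmetic mean of the three per-question accuracies; the second assumption is not stated explicitly in the codebook. Both function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def expert_reading_label_numeric(
    text_domain: str, reader_discipline: str, level_of_studies_numeric: int
) -> int:
    """Return 1 if the reader is an expert in the text domain, else 0.

    Follows the codebook condition:
    text_domain == reader_discipline and the reader is an expert.
    """
    return int(text_domain == reader_discipline and level_of_studies_numeric == 1)


def mean_accuracy(accuracies: Sequence[Optional[float]]) -> Optional[float]:
    """Arithmetic mean of the per-question accuracies (each 0 or 1).

    Returns None if any accuracy is missing (assumed handling of
    missing trials, not confirmed by the codebook).
    """
    if any(a is None for a in accuracies):
        return None
    return mean(accuracies)
```

A graduate biology student reading a biology text is labelled 1 (expert reading), while the same student reading a physics text is labelled 0.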
This file explains the values used in the online survey answer file (response_data_online_survey.csv). Each variable has four different options, each of which is expressed as a numerical value and mapped to the text option the participant saw.
Please find the file at this link: Answer coding online survey
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| VAR | | string | Variable name of the fields in the participant online survey. These are explanations of the names of the columns in the file: response_data_online_survey.csv | 0 | nan | online survey tool |
| RESPONSE | -9: 46, 0: 2, 1: 95, 2: 92, 3: 93, 4: 54, 5: 13, 6: 13, 7: 12, 8: 12, 9: 12, 10: 12, 11: 12, 12: 12 | Categorical | The response code given by the online survey tool. In the answer file these codes are used. | 0 | nan | online survey tool |
| MEANING | | | The literal meaning of the response. What the participant could see in the online survey. | 0 | nan | online survey tool |
| CORRECT_ANSWER | nan: 312, False: 126, True: 42 | Categorical | Whether this answer was a correct answer. | 312 | The value is missing if this is not applicable, for example if the response means that the participant did not answer. | online survey tool |
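To show how the coding table relates to the answer file, the sketch below builds a lookup from a variable name and a response code to the literal meaning and the correctness flag. It assumes a comma-separated file with a header row and an empty CORRECT_ANSWER field for missing values. The variable names (`Q01`, `Q02`), the meanings and the function name `load_coding` are invented for illustration and do not come from the real survey.

```python
import csv
import io

# Invented sample rows; real variable names and meanings differ.
SAMPLE_CODING = """VAR,RESPONSE,MEANING,CORRECT_ANSWER
Q01,1,first option,False
Q01,2,second option,True
Q02,1,some option without a correct answer,
"""


def load_coding(lines) -> dict:
    """Map (VAR, RESPONSE) to (MEANING, CORRECT_ANSWER).

    CORRECT_ANSWER becomes True, False, or None when it is missing.
    """
    flags = {"True": True, "False": False}
    coding = {}
    for row in csv.DictReader(lines):
        key = (row["VAR"], int(row["RESPONSE"]))
        coding[key] = (row["MEANING"], flags.get(row["CORRECT_ANSWER"]))
    return coding


coding = load_coding(io.StringIO(SAMPLE_CODING))
```

A response code found in a column of response_data_online_survey.csv can then be decoded with `coding[(column_name, code)]`.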
This file contains the response accuracy for the participants from the online survey.
Please find the file at this link: Response accuracy
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_domain | biology: 210, physics: 210 | Categorical | The domain of the stimulus text. | 0 | nan | Manually tagged |
| mean_acc_tq | min: 0.0, max: 1.0, mean: 0.2619, std: 0.2495 | Float | The mean accuracy of all text questions for one text read by one reader. | 0 | nan | nan |
| reader_discipline | biology: 108, other: 156, physics: 156 | Categorical | The area of expertise of the reader. All readers are students whose major is either physics or biology. | 0 | nan | demographic questionnaire |
| level_of_studies | graduate: 264, other: 156 | Categorical | Reader's level of studies. Readers are considered experts if they are either MSc or PhD students. 1st semester BSc students are considered beginners. | 0 | nan | demographic questionnaire |
The original response data from the online survey. The coding of the values contained here is found in the answer_coding_online_survey.csv file, which is why the table below is empty. Please note that there are many values in this file which are not relevant for this corpus. E.g., all columns starting with RA specify the randomization and all columns starting with TIME contain response time information.
Please find the file at this link: Response data online survey
| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
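Since columns starting with RA (randomization) and TIME (response times) are described as not relevant for the corpus, one might drop them before analysis. The sketch below filters a list of column names by these two prefixes. The example column names and the function name `relevant_columns` are hypothetical, and a plain prefix match would also drop any other column whose name happens to start with RA or TIME.

```python
from typing import Iterable, List

# Prefixes named in the description above.
IRRELEVANT_PREFIXES = ("RA", "TIME")


def relevant_columns(columns: Iterable[str]) -> List[str]:
    """Keep only column names that do not start with RA or TIME."""
    return [c for c in columns if not c.startswith(IRRELEVANT_PREFIXES)]


# Hypothetical header of response_data_online_survey.csv.
example_header = ["Q01", "RA01", "TIME001", "Q02"]
kept = relevant_columns(example_header)
```

With the hypothetical header above, `kept` contains only `Q01` and `Q02`.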