
The dative dataset of World Englishes

Source: NIAID Data Ecosystem, indexed 2026-03-13
Download link:
https://zenodo.org/record/2553356
Description:
This dataset is distributed under a Creative Commons Attribution Non Commercial 4.0 International license. Use for research purposes only!

The dataset contains 13,171 variable double-object and prepositional datives extracted from the International Corpus of English series and the Corpus of Global web-based English, sampling from nine national varieties of English: British English, Canadian English, New Zealand English, Irish English, Hong Kong English, Philippine English, Singapore English, Indian English, and Jamaican English.

The dataframe contains the following columns:

1 TokenID: Unique identifier for the individual token
2 Variety: The variety from which the token is taken
3 Nativity: Native or non-native variety of English (L1 vs. L2)
4 Corpus: The corpus from which the token stems
5 Subcorpus: Combination of Corpus and Variety
6 FileID: ID of the corpus file in which the token was found. Format: VARIETY:FILENAME
7 TextID: ID of the corpus text in which the token was found. Individual files in ICE can have multiple texts. Format: VARIETY:FILENAME:TEXTNUMBER
8 LineID: ID of the line in the text in which the token sentence was found. Format: VARIETY:FILENAME:TEXTNUMBER:LINENUMBER
9 SpeakerID: ID of the speaker of the sentence. Speakers in spoken texts are indicated with capital letters. Authors of written texts have ID ‘A’. Format: VARIETY:FILENAME:TEXTNUMBER:SPEAKERID
10 UnitMarker: UnitMarker of the utterance in the text. Format: UTTERANCE NUMBER:TEXTNUMBER:SPEAKERID
11 GenreFine: 14-level distinction: the 12-level ICE sub-register in which the token was found and the two levels in GloWbE (blog vs. general). Levels: See ICE documentation
12 GenreCoarse: 5-level distinction: the 4-level ICE register in which the token was found and GloWbE (online = 1 level). Levels: See ICE documentation
13 Mode: The mode (‘spoken’, ‘written’) of the token.
14 Register: The 4-level Register along two axes (spoken vs. written / informal vs. formal)
15 PriorContextPlain: The plain text version of the 100 words preceding the dative token.
16 PriorContextTag: The POS-tagged version of the 100 words preceding the dative token.
17 SentencePlain: The plain text version of the sentence containing the dative token.
18 SentenceTag: The POS-tagged version of the sentence containing the dative token.
19 WholeConstructionPlain: The plain text version of the VP containing the dative token (i.e. verb + object + object).
20 WholeConstructionTag: The POS-tagged version of the VP containing the dative token.
21 Verb: The lemma of the verbal head (give in gave it some thought)
22 VerbForm: The verb form of the verbal head (gave in gave it some thought)
23 RecipientShort: The short plain text version of the recipient without hesitations or repetitions
24 ThemeShort: The short plain text version of the theme without hesitations or repetitions
25 RecipientLong: The long plain text version of the recipient with hesitations or repetitions
26 ThemeLong: The long plain text version of the theme with hesitations or repetitions
27 RecHeadPlain: The plain text version of the recipient head
28 RecHeadTag: The POS-tagged version of the recipient head
29 RecHeadLemma: The lemma of the recipient head
30 ThemeHeadPlain: The plain text version of the theme head
31 ThemeHeadTag: The POS-tagged version of the theme head
32 ThemeHeadLemma: The lemma of the theme head
33 VerbThemeLemma: Combination of the verb lemma and the theme head. Format: VERB_THEME
34 VerbSense: Semantics of the verb based on the whole construction combined with the verb lemma. Format: VERB.VERBSEMANTICS
35 VerbSemantics: 5-level distinction of verb semantics (‘a’, ‘t’, ‘p’, ‘f’, ‘c’).
36 Resp: The variant order. Levels: ‘do’ (=ditransitive), ‘pd’ (=prepositional)
37 RecAnimacy: 6-level distinction of recipient animacy following previous research: human (a1) > animal (a2) > collective (c) > locative (l) > temporal (t) > inanimate (i)
38 ThemeAnimacy: 6-level distinction of theme animacy following previous research: human (a1) > animal (a2) > collective (c) > locative (l) > temporal (t) > inanimate (i)
39 RecWordLth: Length of recipient NP in words
40 RecLetterLth: Length of recipient NP in orthographic characters
41 ThemeWordLth: Length of theme NP in words
42 ThemeLetterLth: Length of theme NP in orthographic characters
43 RecComplexity: 15-level distinction of recipient complexity indicating type and number of posthead dependents, restricted to the ICE components. (GloWbE components make simplified distinction between ‘simple’ and ‘complex’). Levels: ‘s’ = simple (no postmodifications), ‘co’ = coordinated, ‘ge’ = general extender, ‘gn’ = genitive, ‘postad’ = postmodifying adverbial/adjective, ‘pp’ = modifying prepositional phrase, ‘appnom’ = nominal apposition, ‘rc’ = relative clause, ‘cp’ = complement clause, ‘advc’ = adverbial clause, ‘nonfin’ = nonfinite clause, ‘tpp’ = two nominal posthead dependents, ‘tvp’ = two posthead dependents involving at least one VP, ‘mpp’ = more than two nominal posthead dependents, ‘mvp’ = more than two posthead dependents involving at least one VP
44 ThemeComplexity: 15-level distinction of theme complexity indicating type and number of posthead dependents, restricted to the ICE components. (GloWbE components make simplified distinction between ‘simple’ and ‘complex’). Levels: ‘s’ = simple (no postmodifications), ‘co’ = coordinated, ‘ge’ = general extender, ‘gn’ = genitive, ‘postad’ = postmodifying adverbial/adjective, ‘pp’ = modifying prepositional phrase, ‘appnom’ = nominal apposition, ‘rc’ = relative clause, ‘cp’ = complement clause, ‘advc’ = adverbial clause, ‘nonfin’ = nonfinite clause, ‘tpp’ = two nominal posthead dependents, ‘tvp’ = two posthead dependents involving at least one VP, ‘mpp’ = more than two nominal posthead dependents, ‘mvp’ = more than two posthead dependents involving at least one VP
45 RecNPExprType: Syntactic category of the recipient NP. Levels: ‘dem’ = bare demonstrative; ‘nc’ = common noun; ‘np’ = proper noun; ‘pprn’ = personal pronoun; ‘iprn’ = impersonal pronoun; ‘rprn’ = reflexive pronoun; ‘vp’ = gerund (-ing) NP; ‘wh’ = NP headed by wh- word
46 ThemeNPExprType: Syntactic category of the theme NP. Levels: ‘dem’ = bare demonstrative; ‘nc’ = common noun; ‘np’ = proper noun; ‘pprn’ = personal pronoun; ‘iprn’ = impersonal pronoun; ‘rprn’ = reflexive pronoun; ‘vp’ = gerund (-ing) NP; ‘wh’ = NP headed by wh- word
47 RecGivenness: Givenness of the recipient NP. Levels: ‘given’, ‘new’
48 ThemeGivenness: Givenness of the theme NP. Levels: ‘given’, ‘new’
49 RecDefiniteness: Definiteness of the recipient NP. Levels: ‘def’, ‘indef’
50 ThemeDefiniteness: Definiteness of the theme NP. Levels: ‘def’, ‘indef’
51 RecBinComplexity: Binary predictor of recipient complexity indicating following postmodifications after the head noun. Levels: ‘simple’, ‘complex’
52 ThemeBinComplexity: Binary predictor of theme complexity indicating following postmodifications after the head noun. Levels: ‘simple’, ‘complex’
53 RecPerson: Person of recipient. Levels: ‘local’, ‘non-local’
54 ThemeConcreteness: Concreteness of theme based on verb semantics. Levels: ‘concrete’, ‘non-concrete’
55 TypeTokenRatio: Type-token ratio of the 100 word context surrounding the token
56 RecHeadFreq: Frequency of recipient head lemma in GloWbE
57 ThemeHeadFreq: Frequency of theme head lemma in GloWbE
58 RecThematicity: Normalized frequency of recipient head lemma in its text (per 2000 words)
59 ThemeThematicity: Normalized frequency of theme head lemma in its text (per 2000 words)
60 PrimeType: The response type of the preceding dative token, if any. Levels: ‘do’, ‘pd’, ‘NA’
61 Persistence: Indicates whether preceding dative token, if any, is the same or not. Levels: ‘none’, ‘yes’, ‘no’
62 SameUtterance: Indicates whether the preceding dative token occurred in the same utterance or not. Necessary for manual coding of persistence.
63 DistanceToPrevious: Number of utterances between current and preceding dative token. ‘None’ if no preceding dative token.
64 RecPron: Binary factor of recipient pronominality. Levels: ‘pron’, ‘non-pron’
65 ThemePron: Binary factor of theme pronominality. Levels: ‘pron’, ‘non-pron’
66 RecBinAnimacy: Binary factor of recipient animacy. Levels of RecAnimacy conflated to: ‘animate’, ‘inanimate’
67 ThemeBinAnimacy: Binary factor of theme animacy. Levels of ThemeAnimacy conflated to: ‘animate’, ‘inanimate’
68 logRecLetterLth: Natural logarithm of recipient length in orthographic characters
69 logThemeLetterLth: Natural logarithm of theme length in orthographic characters
70 WeightRatio: Ratio of object lengths: Recipient length in characters divided by theme length in characters
71 logWeightRatio: Natural logarithm of weight ratio
72 PrimeTypePruned: The variant of the preceding dative token within the previous 10 utterances. Levels: ‘none’, ‘do’, ‘pd’
73 NumDistanceToPrevious: Numeric distance to previous token (for calculations in R)
74 PersistencePruned: Indicates whether the preceding token within the previous 10 utterances is the same as the current token. Levels: ‘none’, ‘yes’, ‘no’
75-82 z.__________: Numeric predictor centered around the mean and scaled by two standard deviations.
83 Variety.Sum: Column used for sum coding in modeling process
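Several of the columns above are derived from others by simple rules: the colon-separated ID format (column 8), the log weight ratio (columns 68-71), and the z. predictors (columns 75-82). The following is a minimal Python sketch of those rules, not the authors' own code. The function names and the example ID value are hypothetical, the sketch assumes that file names contain no colons, and it assumes the sample standard deviation, since the description does not say which estimator was used.

```python
import math
from statistics import mean, stdev


def split_line_id(line_id):
    """Split a LineID of the form VARIETY:FILENAME:TEXTNUMBER:LINENUMBER.

    Assumes none of the four parts contains a colon.
    """
    variety, filename, text_number, line_number = line_id.split(":")
    return {
        "variety": variety,
        "filename": filename,
        "text_number": text_number,
        "line_number": line_number,
    }


def log_weight_ratio(rec_letter_lth, theme_letter_lth):
    """Natural log of recipient length divided by theme length (characters)."""
    return math.log(rec_letter_lth / theme_letter_lth)


def z_two_sd(values):
    """Center values on their mean and divide by two standard deviations.

    Uses the sample standard deviation (an assumption, see the lead-in).
    """
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / (2 * sd) for v in values]


# Hypothetical ID value, for illustration only.
parts = split_line_id("GB:S1A-001:1:12")

# A recipient shorter than the theme gives a negative log weight ratio.
ratio = log_weight_ratio(3, 12)

scaled = z_two_sd([1, 2, 3])
```

Because the log of a ratio equals the difference of the logs, logWeightRatio should equal logRecLetterLth minus logThemeLetterLth, which gives a quick consistency check on a loaded copy of the data.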
Created:
2022-09-05