Kiezdeutschkorpus KiDKo

Source: re3data.org, indexed 2024-05-31
Download link:
https://www.re3data.org/repository/r3d100013513
Resource description:
The KiezDeutsch-Korpus (KiDKo) was developed by project B6 (PI: Heike Wiese) of the collaborative research centre Information Structure (SFB 632) at the University of Potsdam from 2008 to 2015. KiDKo is a multi-modal digital corpus of spontaneous discourse data from informal, oral peer-group situations in multi- and monoethnic speech communities. KiDKo contains audio data from self-recordings, with aligned transcriptions (i.e., at every point in a transcript, one can access the corresponding area in the audio file). The corpus provides part-of-speech tags as well as an orthographically normalised layer (Rehbein & Schalowski 2013). Another annotation level provides information on syntactic chunks and topological fields. There are several complementary corpora. KiDKo/E (Einstellungen, "attitudes") captures spontaneous data from the public discussion on Kiezdeutsch: it assembles emails and readers' comments posted in reaction to media reports on Kiezdeutsch. In doing so, KiDKo/E provides data on language attitudes, language perceptions, and language ideologies that became apparent in the context of the debate on Kiezdeutsch, but that frequently relate to broader domains such as multilingualism, standard language, language prestige, and social class. KiDKo/LL ("Linguistic Landscape") assembles photos of written language productions in public space from the context of Kiezdeutsch, for instance love notes on walls, park benches, and playgrounds, graffiti in house entrances, and scribbled messages on toilet walls. The corpus contains materials in the following languages: Spanish, Italian, Greek, Kurdish, Swedish, French, Croatian, Arabic, Turkish.

Provider:
KiDKo