
ATIS - Seven Languages

Source: DataCite Commons | Updated: 2021-01-19 | Indexed: 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2021T04
Description:
Introduction

ATIS - Seven Languages was developed by Amazon Web Services, Inc. It consists of translations and annotations of the ATIS (Air Travel Information Services) corpora, specifically ATIS2 (LDC93S5), ATIS3 Training Data (LDC94S19), and ATIS3 Test Data (LDC95S26), into six languages: Spanish, German, French, Portuguese, Chinese, and Japanese.

The ATIS collection was developed to support the research and development of speech understanding systems. Participants were presented with various hypothetical travel planning scenarios and asked to solve them by interacting with partially or completely automated ATIS systems. The resulting utterances were recorded and transcribed. Data was collected in the early 1990s at five US sites: Raytheon BBN, Carnegie Mellon University, MIT Laboratory of Computer Science, National Institute of Standards and Technology, and SRI International.

Data

Following the original ATIS division, the data is split into 4,978 utterances for training and 893 utterances for testing. The training utterances were selected from the Class A (context-independent) training data in the ATIS2 and ATIS3 corpora. The test utterances come from the November 1993 and December 1994 data sets in ATIS3.

The original English utterances were manually translated into the six languages. The release also identifies the original English utterance for each translation. Each utterance is annotated with named entities via table lookup; markers include city, airline, and airport names and dates.

Data is stored in UTF-8 encoded tab-separated value files.

Samples

Please view the following samples: English Source (TXT), Japanese Translation (TXT), French Translation (TXT).

Updates

None at this time.

Copyright

Portions © 2021 Amazon Web Services, Inc., © 1993-1995, 2019, 2021 Trustees of the University of Pennsylvania
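
Because the description states that the data is stored as UTF-8 encoded tab-separated value files, a minimal Python sketch for loading one such file might look like the following. The presence of a header row and the helper names `read_tsv` and `summarize` are assumptions made for illustration; the description does not specify the file names or column layout, so check the release documentation before relying on them.

```python
import csv
from pathlib import Path


def read_tsv(path, encoding="utf-8"):
    """Read a tab-separated file into a list of dicts.

    Assumption: the first row is a header naming the columns.
    """
    with Path(path).open(encoding=encoding, newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside utterances as literal text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def summarize(rows):
    """Return the number of rows and the column names found in the file."""
    columns = list(rows[0].keys()) if rows else []
    return len(rows), columns
```

Calling `summarize(read_tsv(path))` on a training file should report a row count that can be compared with the 4,978 training utterances stated above, which is a quick sanity check that the file was parsed with the right delimiter and encoding.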
Provider:
Linguistic Data Consortium
Created:
2021-01-07