STRUCTUREDREGEX
arXiv | Updated 2020-05-02 | Indexed 2024-06-21
Download link:
https://www.cs.utexas.edu/~xiye/streg/
Overview:
STRUCTUREDREGEX is a regular expression synthesis dataset created by the Department of Computer Science at The University of Texas at Austin. It contains 3,520 English descriptions, each paired with a complex regular expression and a set of positive and negative example strings for that expression. The regular expressions were generated from a structured probabilistic grammar. The descriptions were collected by showing crowdworkers abstract diagrams of each expression, so that their wording was not shaped by a preset phrasing. STRUCTUREDREGEX aims to address two shortcomings of existing datasets, limited regular expression complexity and limited diversity of language descriptions, and is suitable for developing large neural models, particularly for real-world regular expression tasks.
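To make the pairing of description, regular expression, and examples concrete, here is a minimal Python sketch of one record together with a check that the expression accepts every positive example and rejects every negative one. The `RegexExample` class, its field names, and the toy record are hypothetical illustrations and are not taken from the dataset. The sketch also assumes Python `re` syntax, which may differ from the notation used in the released files.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegexExample:
    """Hypothetical record layout: one description, one regex, labeled strings."""
    description: str
    pattern: str
    positives: List[str] = field(default_factory=list)
    negatives: List[str] = field(default_factory=list)


def is_consistent(example: RegexExample) -> bool:
    """Return True if the pattern fully matches all positives and no negatives."""
    compiled = re.compile(example.pattern)
    accepts_all = all(compiled.fullmatch(s) is not None for s in example.positives)
    rejects_all = all(compiled.fullmatch(s) is None for s in example.negatives)
    return accepts_all and rejects_all


# Toy record written for illustration only (not from STRUCTUREDREGEX).
toy = RegexExample(
    description="two capital letters followed by three or more digits",
    pattern=r"[A-Z]{2}[0-9]{3,}",
    positives=["AB123", "XY98765"],
    negatives=["A123", "AB12", "ab123"],
)

if __name__ == "__main__":
    print(is_consistent(toy))
```

A check of this kind is a common way to filter candidate expressions in example-guided synthesis: a predicted regex that fails on any labeled string can be discarded without comparing it to the reference expression.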
Provided by:
Department of Computer Science, The University of Texas at Austin
Created:
2020-05-02



