corto-ai/nsw-caselaw-chunked
Source: Hugging Face. Updated: 2024-08-31. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/corto-ai/nsw-caselaw-chunked
Summary:
The dataset contains multiple configurations, each corresponding to a different text chunking strategy (for example, a different chunk size and overlap). Each record has the following features: a unique document identifier (_id), a version identifier (version_id), type, jurisdiction, source, MIME type (mime), date, citation, URL (url), scrape time (when_scraped), text content (text), and chunk index (chunk_index). The dataset is primarily intended for text chunking tasks and is suitable for text processing and analysis in natural language processing.
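To illustrate what a chunk size and overlap configuration means, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption for illustration only: the listing does not say how the dataset was actually chunked (for example, whether sizes are measured in characters or tokens), and the function name `chunk_text` is hypothetical. Only the output keys `text` and `chunk_index` mirror field names from the description above.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into character windows of chunk_size that share `overlap` characters.

    Hypothetical helper: character-based windows are an assumption, not the
    dataset's documented method.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for chunk_index, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "chunk_index": chunk_index,
            "text": text[start:start + chunk_size],
        })
        # Stop once a window reaches the end, so no trailing chunk
        # consists only of already-covered overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for record in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(record["chunk_index"], record["text"])
```

A larger overlap repeats more context between neighbouring chunks at the cost of more records per document, which is presumably why the dataset offers several configurations.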
Provider:
corto-ai



