defector/autotrain-data-company
Dataset Overview
Dataset Description
This dataset was automatically processed for the project "company".
Languages
The BCP-47 language code for the dataset is en.
Dataset Structure
Data Instances
A sample from this dataset looks as follows:
```json
[
  {
    "tokens": [
      "sahil", "prasad", "president", "www", "swimcentre", "com", "banik",
      "baalkrishan", "gandhi", "com", "no", "satish", "nagar", "hisar"
    ],
    "tags": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  },
  {
    "tokens": [
      "olivia", "wilson", "real", "estate", "agent", "reallygreatsite", "com",
      "anywhere", "st", "any", "city", "st", "www", "reallygreatsite", "com"
    ],
    "tags": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  }
]
```
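Each record pairs a list of lowercase tokens with a list of integer tags of the same length, one tag per token (14 in the first sample, 15 in the second). The following is a minimal plain-Python sketch, assuming records are available as JSON shaped like the sample, that checks this alignment and pairs each token with its tag. The helper name `check_alignment` is hypothetical and not part of any library.

```python
import json

# First sample record from the dataset card, as a JSON string.
SAMPLE = """
[
  {"tokens": ["sahil", "prasad", "president", "www", "swimcentre", "com",
              "banik", "baalkrishan", "gandhi", "com", "no", "satish",
              "nagar", "hisar"],
   "tags": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}
]
"""


def check_alignment(records):
    """Hypothetical helper: return (token, tag) pairs per record.

    Raises ValueError if a record has a different number of tokens and tags,
    since each tag is expected to label exactly one token.
    """
    paired = []
    for i, rec in enumerate(records):
        tokens, tags = rec["tokens"], rec["tags"]
        if len(tokens) != len(tags):
            raise ValueError(f"record {i}: {len(tokens)} tokens but {len(tags)} tags")
        paired.append(list(zip(tokens, tags)))
    return paired


records = json.loads(SAMPLE)
pairs = check_alignment(records)
print(len(pairs[0]), pairs[0][:3])  # 14 [('sahil', 0), ('prasad', 0), ('president', 0)]
```

The check fails loudly rather than silently truncating, because `zip` alone would drop unmatched tokens or tags without warning.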
Dataset Fields
The dataset has the following fields:
```json
{
  "tokens": "Sequence(feature=Value(dtype=string, id=None), length=-1, id=None)",
  "tags": "Sequence(feature=ClassLabel(num_classes=2, names=[0, 9], id=None), length=-1, id=None)"
}
```
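The `tags` field is a sequence of `ClassLabel` values with `num_classes=2` and `names=[0, 9]`, so each stored integer is a class index and its position in `names` gives the label name: index 0 is name `0`, and index 1 is name `9`. The `datasets.ClassLabel` feature exposes `int2str` and `str2int` methods for this mapping. The sketch below mirrors that mapping in plain Python so it runs without the library; the module-level functions and the assumption that the names are stored as the strings `"0"` and `"9"` are illustrative, not taken from the card.

```python
# Plain-Python stand-in for ClassLabel(num_classes=2, names=[0, 9]).
# Assumption: the label names are stored as strings "0" and "9".
LABEL_NAMES = ["0", "9"]  # list position == integer class index


def int2str(index: int) -> str:
    """Map a stored integer tag (class index) to its label name."""
    if not 0 <= index < len(LABEL_NAMES):
        raise ValueError(f"class index {index} out of range 0..{len(LABEL_NAMES) - 1}")
    return LABEL_NAMES[index]


def str2int(name: str) -> int:
    """Map a label name back to its integer class index."""
    return LABEL_NAMES.index(name)


tags = [0, 0, 0]  # the first three tags of a sample record
print([int2str(t) for t in tags])  # ['0', '0', '0']
print(str2int("9"))  # 1
```

Note that a stored tag of `1` would decode to the name `9`, not `1`; only the index ordering of `names` links the two.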
Dataset Splits
The dataset is split into a training set and a validation set as follows:
| Split name | Number of samples |
|---|---|
| Train | 999651 |
| Validation | 499630 |
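The validation split is roughly half the size of the training split. A short calculation from the table gives the total and each split's share; the dictionary keys `train` and `validation` are illustrative labels, not confirmed split identifiers.

```python
# Split sizes copied from the table; keys are illustrative labels.
SPLITS = {"train": 999_651, "validation": 499_630}

total = sum(SPLITS.values())
shares = {name: n / total for name, n in SPLITS.items()}

print(total)  # 1499281
for name, share in shares.items():
    print(f"{name}: {share:.1%}")  # train: 66.7%, then validation: 33.3%
```

This corresponds to an approximately 2:1 train/validation split.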
This concludes the basic overview of the dataset.



