
open-source-metrics/tokenizers-dependents

Hugging Face | Updated 2024-05-28 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/open-source-metrics/tokenizers-dependents
Description:
---
license: apache-2.0
pretty_name: tokenizers metrics
tags:
- github-stars
dataset_info:
  features:
  - name: name
    dtype: string
  - name: stars
    dtype: int64
  - name: forks
    dtype: int64
  splits:
  - name: package
    num_bytes: 95
    num_examples: 2
  - name: repository
    num_bytes: 1893
    num_examples: 42
  download_size: 5046
  dataset_size: 1988
---

# tokenizers metrics

This dataset contains metrics about the huggingface/tokenizers package.

Number of repositories in the dataset: 11460

Number of packages in the dataset: 124

## Package dependents

This contains the data available in the [used-by](https://github.com/huggingface/tokenizers/network/dependents) tab on GitHub.

### Package & Repository star count

This section shows the package and repository star count, individually.

Package | Repository
:-------------------------:|:-------------------------:
![tokenizers-dependent package star count](./tokenizers-dependents/resolve/main/tokenizers-dependent_package_star_count.png) | ![tokenizers-dependent repository star count](./tokenizers-dependents/resolve/main/tokenizers-dependent_repository_star_count.png)

There are 14 packages that have more than 1000 stars.

There are 41 repositories that have more than 1000 stars.

The top 10 in each category are the following:

*Package*

- [huggingface/transformers](https://github.com/huggingface/transformers): 70475
- [hankcs/HanLP](https://github.com/hankcs/HanLP): 26958
- [facebookresearch/ParlAI](https://github.com/facebookresearch/ParlAI): 9439
- [UKPLab/sentence-transformers](https://github.com/UKPLab/sentence-transformers): 8461
- [lucidrains/DALLE-pytorch](https://github.com/lucidrains/DALLE-pytorch): 4816
- [ThilinaRajapakse/simpletransformers](https://github.com/ThilinaRajapakse/simpletransformers): 3303
- [neuml/txtai](https://github.com/neuml/txtai): 2530
- [QData/TextAttack](https://github.com/QData/TextAttack): 2087
- [lukas-blecher/LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR): 1981
- [utterworks/fast-bert](https://github.com/utterworks/fast-bert): 1760

*Repository*

- [huggingface/transformers](https://github.com/huggingface/transformers): 70480
- [hankcs/HanLP](https://github.com/hankcs/HanLP): 26958
- [RasaHQ/rasa](https://github.com/RasaHQ/rasa): 14842
- [facebookresearch/ParlAI](https://github.com/facebookresearch/ParlAI): 9440
- [gradio-app/gradio](https://github.com/gradio-app/gradio): 9169
- [UKPLab/sentence-transformers](https://github.com/UKPLab/sentence-transformers): 8462
- [microsoft/unilm](https://github.com/microsoft/unilm): 6650
- [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo): 6431
- [moyix/fauxpilot](https://github.com/moyix/fauxpilot): 6300
- [lucidrains/DALLE-pytorch](https://github.com/lucidrains/DALLE-pytorch): 4816

### Package & Repository fork count

This section shows the package and repository fork count, individually.

Package | Repository
:-------------------------:|:-------------------------:
![tokenizers-dependent package forks count](./tokenizers-dependents/resolve/main/tokenizers-dependent_package_forks_count.png) | ![tokenizers-dependent repository forks count](./tokenizers-dependents/resolve/main/tokenizers-dependent_repository_forks_count.png)

There are 11 packages that have more than 200 forks.

There are 39 repositories that have more than 200 forks.

The top 10 in each category are the following:

*Package*

- [huggingface/transformers](https://github.com/huggingface/transformers): 16158
- [hankcs/HanLP](https://github.com/hankcs/HanLP): 7388
- [facebookresearch/ParlAI](https://github.com/facebookresearch/ParlAI): 1920
- [UKPLab/sentence-transformers](https://github.com/UKPLab/sentence-transformers): 1695
- [ThilinaRajapakse/simpletransformers](https://github.com/ThilinaRajapakse/simpletransformers): 658
- [lucidrains/DALLE-pytorch](https://github.com/lucidrains/DALLE-pytorch): 543
- [utterworks/fast-bert](https://github.com/utterworks/fast-bert): 336
- [nyu-mll/jiant](https://github.com/nyu-mll/jiant): 273
- [QData/TextAttack](https://github.com/QData/TextAttack): 269
- [lukas-blecher/LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR): 245

*Repository*

- [huggingface/transformers](https://github.com/huggingface/transformers): 16157
- [hankcs/HanLP](https://github.com/hankcs/HanLP): 7388
- [RasaHQ/rasa](https://github.com/RasaHQ/rasa): 4105
- [plotly/dash-sample-apps](https://github.com/plotly/dash-sample-apps): 2795
- [facebookresearch/ParlAI](https://github.com/facebookresearch/ParlAI): 1920
- [UKPLab/sentence-transformers](https://github.com/UKPLab/sentence-transformers): 1695
- [microsoft/unilm](https://github.com/microsoft/unilm): 1223
- [openvinotoolkit/open_model_zoo](https://github.com/openvinotoolkit/open_model_zoo): 1207
- [bhaveshlohana/HacktoberFest2020-Contributions](https://github.com/bhaveshlohana/HacktoberFest2020-Contributions): 1020
- [data-science-on-aws/data-science-on-aws](https://github.com/data-science-on-aws/data-science-on-aws): 884
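
The threshold counts and top-10 rankings reported in this card are simple aggregates over rows with the fields `name`, `stars`, and `forks`. The following is a minimal sketch of how such figures could be computed, assuming the rows are available as plain Python dictionaries. The helper names `count_above` and `top_n` are hypothetical, and the sample rows are copied by hand from the package star and fork lists in this card (only projects that appear in both lists), not loaded from the dataset itself.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Sample rows transcribed from the package top-10 lists in this card.
# Only projects present in BOTH the star list and the fork list are kept,
# so this is a 9-row illustration, not the full dataset.
PACKAGE_ROWS: List[Row] = [
    {"name": "huggingface/transformers", "stars": 70475, "forks": 16158},
    {"name": "hankcs/HanLP", "stars": 26958, "forks": 7388},
    {"name": "facebookresearch/ParlAI", "stars": 9439, "forks": 1920},
    {"name": "UKPLab/sentence-transformers", "stars": 8461, "forks": 1695},
    {"name": "lucidrains/DALLE-pytorch", "stars": 4816, "forks": 543},
    {"name": "ThilinaRajapakse/simpletransformers", "stars": 3303, "forks": 658},
    {"name": "QData/TextAttack", "stars": 2087, "forks": 269},
    {"name": "lukas-blecher/LaTeX-OCR", "stars": 1981, "forks": 245},
    {"name": "utterworks/fast-bert", "stars": 1760, "forks": 336},
]


def count_above(rows: List[Row], field: str, threshold: int) -> int:
    """Count rows whose `field` value is strictly greater than `threshold`."""
    return sum(1 for row in rows if row[field] > threshold)


def top_n(rows: List[Row], field: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return (name, value) pairs for the `n` largest values of `field`."""
    ranked = sorted(rows, key=lambda row: row[field], reverse=True)
    return [(row["name"], row[field]) for row in ranked[:n]]


if __name__ == "__main__":
    print("packages with > 1000 stars:", count_above(PACKAGE_ROWS, "stars", 1000))
    print("packages with > 200 forks:", count_above(PACKAGE_ROWS, "forks", 200))
    for name, forks in top_n(PACKAGE_ROWS, "forks", 3):
        print(f"{name}: {forks}")
```

Because the sample holds only nine rows, the counts it prints are smaller than the 14 and 11 packages reported for the full data; the sketch illustrates the aggregation, not the published totals.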
Provider:
open-source-metrics
Summary of original information

Dataset overview

Dataset name

tokenizers metrics

License

apache-2.0

Tags

  • github-stars

Dataset information

Features

  • name: string
  • stars: integer (int64)
  • forks: integer (int64)

Splits

  • package:
    • Bytes: 95
    • Examples: 2
  • repository:
    • Bytes: 1893
    • Examples: 42

Download size

5046 bytes

Dataset size

1988 bytes
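
The size metadata above can be cross-checked with a little arithmetic: the byte counts of the two splits should add up to the reported dataset size (95 + 1893 = 1988). A minimal sketch of that check follows, with the metadata transcribed by hand into a hypothetical `DATASET_INFO` dictionary; the dictionary layout and the helper names are assumptions for illustration, not objects returned by any library.

```python
# Metadata transcribed by hand from the listing above. DATASET_INFO,
# split_bytes_total and total_examples are illustrative names only.
DATASET_INFO = {
    "features": {"name": "string", "stars": "int64", "forks": "int64"},
    "splits": {
        "package": {"num_bytes": 95, "num_examples": 2},
        "repository": {"num_bytes": 1893, "num_examples": 42},
    },
    "download_size": 5046,
    "dataset_size": 1988,
}


def split_bytes_total(info: dict) -> int:
    """Sum the num_bytes field over all splits."""
    return sum(split["num_bytes"] for split in info["splits"].values())


def total_examples(info: dict) -> int:
    """Sum the num_examples field over all splits."""
    return sum(split["num_examples"] for split in info["splits"].values())


if __name__ == "__main__":
    consistent = split_bytes_total(DATASET_INFO) == DATASET_INFO["dataset_size"]
    print("split bytes match dataset_size:", consistent)
    print("examples across splits:", total_examples(DATASET_INFO))
```

The split metadata lists 2 and 42 examples, while the description reports 124 packages and 11460 repositories; this sketch only checks the byte counts and does not attempt to explain that difference.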

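Dataset contents

Repository and package statistics

  • Number of repositories: 11460
  • Number of packages: 124

Star statistics

  • Packages with more than 1000 stars: 14
  • Repositories with more than 1000 stars: 41

Fork statistics

  • Packages with more than 200 forks: 11
  • Repositories with more than 200 forks: 39

Top repositories and packages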
