
PhillyMac/Giving_and_Receiving_Feedback_Content_1

Hugging Face | Updated 2026-04-01 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/PhillyMac/Giving_and_Receiving_Feedback_Content_1
Description:
---
license: cc0-1.0
task_categories:
- text-generation
- feature-extraction
language:
- en
tags:
- corpus
- leadership
- historical
- deku-corpus-builder
size_categories:
- 1K<n<10K
---

# Giving and Receiving Feedback Content 1

This corpus was automatically generated by the **Deku Corpus Builder** for use in RAG-based AI applications.

## Dataset Description

- **Subject**: Giving and Receiving Feedback Leadership
- **Subject Type**: topic
- **Total Items**: 1,749
- **Items Requiring Attribution**: 0
- **Has Embeddings**: Yes (all-MiniLM-L6-v2)
- **Created**: 2026-04-01

## Dataset Structure

Each record contains:

- `text`: The content text
- `source_url`: Original source URL
- `source_title`: Title of the source document
- `source_domain`: Domain of the source
- `license_type`: License classification (e.g. `public_domain`, `cc_by`, `cc_by_sa`)
- `attribution_required`: Boolean: True for CC BY / CC BY-SA and other attribution-required licenses
- `attribution_text`: Formatted Creative Commons attribution string (empty if not required)
- `license_url`: URL to the CC license deed (empty if not required)
- `relevance_score`: Relevance to the subject (0-1)
- `quality_score`: Content quality score (0-1)
- `topics`: JSON array of detected topics
- `character_count`: Length of the text
- `subject_name`: The subject this content relates to
- `subject_type`: "personality" or "topic"
- `extraction_date`: When the content was extracted
- `embedding`: Pre-computed 384-dimensional embedding vector

## Attribution

0 of 1,749 chunks in this corpus require attribution under their source license.

When building lessons from these chunks, the `attribution_text` field must be surfaced in the lesson output per the Legend Leadership Attribution Tracking Spec.

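The record fields described above lend themselves to a simple selection step before lesson building. The following is a minimal sketch, not part of the Deku Corpus Builder: the function names, the 0.5 thresholds, and the toy records (including the attribution string) are all hypothetical and only mirror the documented field names.

```python
def select_chunks(records, min_relevance=0.5, min_quality=0.5):
    """Keep records whose relevance and quality scores meet both thresholds.

    The 0.5 defaults are illustrative assumptions, not values from the dataset card.
    """
    return [
        r for r in records
        if r["relevance_score"] >= min_relevance and r["quality_score"] >= min_quality
    ]


def collect_attributions(records):
    """Return non-empty attribution strings, de-duplicated, in first-seen order."""
    seen = []
    for r in records:
        text = r["attribution_text"]
        if r["attribution_required"] and text and text not in seen:
            seen.append(text)
    return seen


# Hypothetical toy records that only mimic the documented schema.
sample_records = [
    {"text": "Ask for feedback early.", "relevance_score": 0.9, "quality_score": 0.8,
     "attribution_required": False, "attribution_text": ""},
    {"text": "Off-topic fragment.", "relevance_score": 0.2, "quality_score": 0.9,
     "attribution_required": False, "attribution_text": ""},
    {"text": "Be specific when praising.", "relevance_score": 0.7, "quality_score": 0.6,
     "attribution_required": True,
     "attribution_text": "Example Author, CC BY 4.0 (hypothetical)"},
]

selected = select_chunks(sample_records)
attributions = collect_attributions(selected)
```

Since the card reports that no chunk in this corpus requires attribution, `collect_attributions` would presumably return an empty list on the real data; the toy record exists only to exercise that code path.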
## Usage

```python
from datasets import load_dataset

dataset = load_dataset("PhillyMac/Giving_and_Receiving_Feedback_Content_1")

# Access attribution-required chunks
for item in dataset["train"]:
    if item["attribution_required"]:
        print(item["attribution_text"])
```

## Integration with RAG

This dataset is designed to be integrated with existing embedded corpuses. The embeddings use the `sentence-transformers/all-MiniLM-L6-v2` model, compatible with FAISS indexing.

## License

Content is sourced from public domain and Creative Commons licensed materials. See individual `license_type` fields for per-chunk licensing details.

## Generated By

[Deku Corpus Builder](https://github.com/PhillyMac/deku-corpus-builder) - An automated corpus building system for AI applications.
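To illustrate how the pre-computed `embedding` field might be used for retrieval, here is a minimal brute-force sketch in plain Python. It is an assumption-laden stand-in, not the dataset author's pipeline: the function names are hypothetical, the toy vectors are 3-dimensional rather than 384-dimensional, and in practice the query vector would need to come from the same `all-MiniLM-L6-v2` model, with a FAISS index likely replacing the linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, records, k=3):
    """Return the k (score, record) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_embedding, r["embedding"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy records; real records carry 384-dimensional embeddings.
toy_records = [
    {"text": "chunk a", "embedding": [1.0, 0.0, 0.0]},
    {"text": "chunk b", "embedding": [0.0, 1.0, 0.0]},
    {"text": "chunk c", "embedding": [0.7, 0.7, 0.0]},
]

results = top_k([0.9, 0.1, 0.0], toy_records, k=2)
for score, record in results:
    print(f"{score:.3f}  {record['text']}")
```

A linear scan like this is reasonable for a corpus of 1,749 items; an approximate or flat FAISS index becomes more attractive when this corpus is merged with larger ones.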

Provider:
PhillyMac