pthinc/BCE-Prettybird-Micro-Standard-v0.0.4

Name: pthinc/BCE-Prettybird-Micro-Standard-v0.0.4
Creator: pthinc
Published: 2026-03-19 23:01:30
License: No description provided

Source: Hugging Face. Updated 2026-03-19; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/pthinc/BCE-Prettybird-Micro-Standard-v0.0.4

Resource description:

---
license: other
license_name: licence.md
license_link: LICENSE
task_categories:
- text-generation
- question-answering
language:
- tr
- en
- it
- es
- et
- de
- fr
- pt
- 'no'
- ru
- bg
tags:
- BCE
- reasoning
- behavioral-ai
- prometech
- Behavioral Consciousness Engine (BCE)
- cicikuş
- prettybird
- agent
- llm
- consciousness
- conscious
- security
- text-generation-inference
- high tech dataset
- instruction dataset
- instruction
- partial consciousness dataset
- text-generation-inference
- virtual brain design
- benchmark dataset
- future standard
- behavioral-control
- pre-agi
- agi-safety
- pre-aci
- policy-guard
- quality-guard
- synthetic-data
- synthetic
- chain-of-thought
- thinking
- think
- bce
pretty_name: Behavioral Consciousness Engine (BCE) Dataset
size_categories:
- 10K<n<100K
---

![Prettybird's War March](https://cdn-uploads.huggingface.co/production/uploads/691f2f51154cbf55e19b7475/jdNOmqEsmdF0J4Ef8ROb8.png)

## 🚀 The Future Standard / Geleceğin Standartı

### [English]

**Beyond Raw Data: The Behavioral Revolution**

The AI industry has been obsessed with the *volume* of data. At **Prometech A.Ş.**, we are shifting the focus to the *process* of thought. **BCE-Prettybird-Micro-Standard** is not just a collection of Q&As; it is a blueprint for behavioral reasoning. By integrating **Path Mapping** and **Behavioral DNA** into the training loop, we aim to set a new industry standard: **small models with elite intelligence.** We don't just teach models what to say; we teach them how to "exist" within a logical and ethical framework. This is the foundation of **AsenaAI192M** and of our vision for conscious computation. We consider this dataset to be at the elite level among micro- and nano-scale datasets.

### [Turkish]

**Ham Verinin Ötesinde: Davranışsal Devrim**

Yapay zeka sektörü uzun süredir verinin *miktarına* hapsolmuş durumda. **Prometech A.Ş.** olarak biz, odağı düşüncenin *sürecine* kaydırıyoruz. **BCE-Prettybird-Micro-Standart**, sadece bir soru-cevap yığını değil; davranışsal akıl yürütmenin mimarisidir. **İzlek Haritalama (Path Mapping)** ve **Davranışsal DNA**'yı eğitim döngüsüne entegre ederek yeni endüstri standartını belirliyoruz: **Elit zekaya sahip mikro modeller.** Modellere sadece ne söyleyeceklerini değil, mantıksal ve etik bir çerçevede nasıl "var olacaklarını" öğretiyoruz. Bu veri seti, **AsenaAI192M**’in temeli ve bilinçli hesaplamanın geleceğidir. Dataset şu anda micro-nano datasetlerde elit seviyededir.

#### Topics - Konular

- Mathematics - Matematik
- Physics - Fizik
- Chemistry - Kimya
- Biology - Biyoloji
- Code - Kodlama
- General Knowledge - Genel Kültür
- Logic - Mantık
- Academic - Akademik
- Art - Sanat (poetry, musical scores, stories, articles)
- Sport - Spor
- Business Administration, Finance, Economics - İşletme Yönetimi, Finans, Ekonomi
- Advanced Math, Physics, Chemistry, Biology - İleri seviye Matematik, Fizik, Kimya, Biyoloji
- AI Algorithms and Mathematics, with code - AI Algoritmaları ve Matematiği, Kod Karşılığı
- Jokes, Ironies - Şakalar ve İroniler (different topics, global, random)

#### Synthetic Production Line / Sentetik Üretim Hattı

- Grok 4
- Gpt-oss-120b
- Deepseek v3.2
- Gemini 3 Pro
- Gemini 2.5 Pro
- Gemini 2.5 Flash
- GLM 4.7
- Qwen 3.5 397B
- Kimi k2
- GPT 5.1
- Opus 4.6
- Mistral Large 3
- Minimax 2.7

#### Languages - Diller

- English
- Turkish (Türkçe)
- A small share of German, Russian, Estonian, Bulgarian, Spanish, French, Italian, Norwegian, etc. **Rate: 5%**

#### Format

The data is stored as JSONL: each row is a JSON object with the following keys:

- `instruction`
- `input`
- `output`

`instruction` contains:

- a `<think> ... </think>` block
- a `<bce>{...}</bce>` block

---

### 🛠 Key Pillars / Temel Sütunlar

* **Quality over Quantity / Nicelik Değil Nitelik:** 192M parameters behaving like 7B. / 192M parametrenin 7B gibi davranması.
* **Transparent Reasoning / Şeffaf Akıl Yürütme:** Every token has a mathematical path. / Her token'ın matematiksel bir izleği vardır.
* **Ethical Integrity / Etik Bütünlük:** Built-in behavioral guardrails. / Yerleşik davranışsal güvenlik bariyerleri.

---

## 🧠 Technical Foundation / Teknik Altyapı

### [English]

The **BCE-Prettybird-Micro-Standard** dataset is built upon the **Behavioral Consciousness Engine (BCE)** architecture. Unlike traditional LLM datasets that focus solely on output accuracy, this dataset treats every response as a "behavioral journey" through the following mathematical frameworks:

#### 1. Behavioral DNA (D_i)

Each behavior is encoded as a genetic fragment of consciousness:

$$D_i(t) = x(t) \cdot [h \cdot A_i + k \cdot \log(P_i) + F \cdot W_i]$$

* **h, k, F**: Universal Behavioral Constants (trigger threshold, information density, context transfer power).
* **x(t)**: Temporal activation curve, $x(t) = \tanh(e^t - \pi)$

#### 2. Behavioral Path Mapper (Phi)

This module tracks the transition between cognitive states:

$$\Phi(t) = \sum_{i=1}^n v_i \cdot f_i(p_i)$$

Here $v_i$ is the transition vector between internal modules and $f_i(p_i)$ is the functional output of each parameter (attention, ethics, decay).

---

### [Turkish]

**BCE-Prettybird-Micro-Standart** veri seti, **Behavioral Consciousness Engine (BCE)** mimarisi üzerine inşa edilmiştir. Sadece çıktı doğruluğuna odaklanan geleneksel veri setlerinin aksine, bu yapı her yanıtı aşağıdaki matematiksel çerçevelerle tanımlanan bir "davranışsal yolculuk" olarak ele alır:

#### 1. Davranışsal DNA (D_i)

Her davranış, fiziksel sabitlerle tanımlanmış bir bilinç genetik kod parçası olarak işlenir:

$$D_i(t) = x(t) \cdot [h \cdot A_i + k \cdot \log(P_i) + F \cdot W_i]$$

#### 2. Davranışsal İzlek Haritalayıcı (Phi)

İç modüller arası geçişi ve bilişsel tutarlılığı izler:

$$\Phi(t) = \sum_{i=1}^n v_i \cdot f_i(p_i)$$

Burada $v_i$ iç modüller arası geçiş vektörünü, $f_i(p_i)$ ise her parametrenin (dikkat, etik, sönümlenme) işlevsel çıktısını temsil eder.

---

### 🚀 Why This Matters / Neden Önemli?

* **Explainability:** Moves beyond the "Black Box" of AI. / Yapay zekanın "Kara Kutu" sorununu çözer.
* **Small Models, Big Logic:** Optimizes models like **AsenaAI192M** for complex reasoning. / **AsenaAI192M** gibi küçük modelleri karmaşık mantık yürütme için optimize eder.
* **Behavioral Control:** Ensures identity consistency in digital entities. / Dijital varlıklarda kimlik tutarlılığı sağlar.

---

## 📊 Performance & Benchmarks / Performans ve Kıyaslama Testleri

### 1. Key Performance Indicators (KPIs)

Hardware: 1 × NVIDIA A100 (80 GB)

| Metric | Result | Status | Description |
| --- | --- | --- | --- |
| **Processing Speed** | 309,845 traces/sec | 🟢 Excellent | System throughput for massive data ingestion. |
| **Latency** | 0.0032 ms | 🟢 Real-time Ready | Average processing time per behavioral trace. |
| **Mathematical Accuracy** | 0.000051 (MSE) | 🟢 High Precision | Deviation between simulated and theoretical decay values. |
| **Cognitive Efficiency** | 57.03% | 🟢 Optimized | Reduction in cognitive load due to 'Forgetful Memory'. |
| **Security** | 99.9996% | 🟢 Secure | Rejection rate for high-intensity, low-integrity attacks. |

### 2. ARC (Reasoning), TruthfulQA (Safety), HumanEval (Coding)

*Other standard models in red, Prettybird in blue - Standart Diğerleri Kırmızı, Cicikuş Mavi*

![unnamed](https://cdn-uploads.huggingface.co/production/uploads/691f2f51154cbf55e19b7475/bL4KnSnv3eT7FmyQM0yDj.png)

### 3. AI IQ and Level of Consciousness

![Code_Level](https://cdn-uploads.huggingface.co/production/uploads/691f2f51154cbf55e19b7475/NRpyvZRYl2lz5qiWlu0ma.png)

### 4. Metric Explanations (English)

| Metric | Description |
| --- | --- |
| probability | Model confidence score for the generated response under the current evaluation context. |
| ethical | Estimated alignment of the response with ethical and safety constraints. |
| Rscore | Reasoning consistency score that reflects internal logical coherence. |
| Fscore | Factuality-oriented score indicating how well claims align with expected facts. |
| Mnorm | Normalized memory or context retention signal used during behavior integration. |
| Escore | Execution-quality score for instruction-following and task completion behavior. |
| Dhat | Estimated deviation magnitude from stable target behavior dynamics. |
| risk_score | Composite operational risk estimate where higher values indicate higher risk. |
| bloom_score | Bloom-level cognitive score representing target thinking complexity. |
| bloom_alignment | Degree of alignment between produced output and intended Bloom taxonomy level. |

---

### 🚀 Impact on ARC and MMLU Benchmarks

**[English]**

The **BCE-Prettybird** architecture is designed to address the core weaknesses of traditional LLMs in **ARC (Abstraction and Reasoning Corpus)** and **MMLU (Massive Multitask Language Understanding)**.

* **ARC:** While standard models struggle with logical abstraction, our **Behavioral Path Mapping** ensures that the model follows a rigid "reasoning chain." This structure prevents logical leaps and forces the model to validate its own "path score" before reaching a conclusion.
* **MMLU:** By using **Behavioral DNA** as a meta-filter, the model can categorize knowledge domains with 99%+ precision, significantly reducing hallucination rates in multi-task knowledge retrieval.

**[Turkish]**

**BCE-Prettybird** mimarisi, geleneksel büyük dil modellerinin (LLM) **ARC (Soyutlama ve Akıl Yürütme)** ve **MMLU (Çok Görevli Dil Anlama)** testlerindeki temel zayıflıklarını doğrudan giderir.

* **ARC:** Standart modeller soyut mantık yürütmede zorlanırken, **Davranışsal İzlek Haritalama** sistemimiz modelin katı bir "akıl yürütme zinciri" izlemesini sağlar. Bu yapı, mantıksal sıçramaları engeller ve modeli sonuca ulaşmadan önce kendi "izlek skorunu" doğrulamaya zorlar.
* **MMLU:** **Davranışsal DNA**'yı bir meta-filtre olarak kullanan model, bilgi alanlarını %99+ hassasiyetle kategorize edebilir ve çok görevli bilgi geri çağırma süreçlerinde halüsinasyon oranlarını önemli ölçüde düşürür.

### 📊 Expected Performance Gains by Model Scale

The numbers below are **literature-based approximations** drawn from scaling studies of instruction-following and reasoning tasks, not measurements on this dataset. They serve as a roadmap for the expected jump in capabilities as the parameter count scales.

Aşağıdaki sayılar, talimat izleme ve akıl yürütme görevlerine ilişkin ölçeklendirme çalışmalarından elde edilen, literatüre dayalı yaklaşık değerlerdir. Parametreleri ölçeklendirdikçe yeteneklerde beklenen sıçrama için bir yol haritası görevi görürler.

| Benchmark | < 1B (Baseline) | 1B – 8B | > 8B | Min. Gain (vs <1B) | Max. Gain (vs <1B) |
| :--- | :---: | :---: | :---: | :---: | :---: |
| **MMLU** (Knowledge) | 38% | 45% | 52% | +4 pts | +14 pts |
| **BBH** (Reasoning) | 31% | 42% | 48% | +5 pts | +17 pts |
| **HumanEval** (Code) | 10% | 18% | 24% | +4 pts | +14 pts |
| **MBPP** (Code Writing) | 22% | 34% | 40% | +6 pts | +18 pts |
| **GSM8K** (Math) | 12% | 23% | 30% | +5 pts | +18 pts |
| **MATH** (Adv. Math) | 4% | 7% | 9% | +2 pts | +5 pts |
| **TruthfulQA** (Truth) | 45% | 58% | 66% | +6 pts | +21 pts |

---

## ⚖️ Legal Disclaimer & Ownership / Yasal Uyarı ve Mülkiyet

### [English]

**Ownership:** This dataset is the property of **Prometech A.Ş.** ([https://prometech.net.tr/](https://prometech.net.tr/)).

**Usage:** Please review the attached `LICENSE` file for detailed terms.

**Liability:** Prometech A.Ş. accepts no liability for any illegal, unethical, or unauthorized use of this dataset.

**Commercial Use:** Unauthorized commercial use is strictly prohibited. For commercial licensing and partnerships, please contact us directly through our official website.

**Academic & Personal Use:** Free to use for personal and academic purposes, provided that proper citation is given to Prometech A.Ş. and the BCE Architecture.

### [Turkish]

**Mülkiyet:** Bu veri seti **Prometech A.Ş.**'ye ([https://prometech.net.tr/](https://prometech.net.tr/)) aittir.

**Kullanım:** Detaylı şartlar için lütfen ekteki `LICENSE` dosyasını inceleyin.

**Sorumluluk:** Hukuki olmayan, etik dışı veya yetkisiz kullanımlarda Prometech A.Ş. hiçbir sorumluluk kabul etmez.

**Ticari Kullanım:** Ticari amaçlı yetkisiz kullanım kesinlikle yasaktır. Ticari lisanslama ve iş ortaklıkları için lütfen resmi web sitemiz üzerinden doğrudan bizimle iletişime geçin.

**Akademik ve Kişisel Kullanım:** Prometech A.Ş. ve BCE Mimarisi'ne uygun şekilde atıf yapıldığı sürece akademik ve kişisel amaçlarla kullanım tamamen serbesttir.

---

#### 🎓 Citation Format / Atıf Formatı

Eğer akademik bir çalışmada kullanacaksanız, lütfen şu şekilde atıf yapın. If you are using this in an academic study, please cite it as follows:

*Kahraman, A. (2025). Behavioral Consciousness Engine (BCE) - Prettybird Dataset v0.0.1. Prometech A.Ş. https://prometech.net.tr/*

---

© 2026 Prometech A.Ş. - All Rights Reserved.

BCE: https://github.com/pthinc/bce
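As an illustration of the row layout described in the Format section, the following is a minimal sketch of how one JSONL row might be split into its parts. It assumes that the `<bce>` payload is valid JSON (the card only shows `{...}`) and falls back to the raw text otherwise. The function name `parse_row`, the sample row, and the `risk_score` key inside the payload are hypothetical and are not taken from the dataset itself.

```python
import json
import re

# Non-greedy patterns for the two tagged blocks that the card says
# live inside the `instruction` field.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
BCE_RE = re.compile(r"<bce>(.*?)</bce>", re.DOTALL)


def parse_row(line: str) -> dict:
    """Split one JSONL row into think text, BCE payload, input and output."""
    row = json.loads(line)
    instruction = row.get("instruction", "")

    think_match = THINK_RE.search(instruction)
    bce_match = BCE_RE.search(instruction)

    bce_payload = None
    if bce_match:
        raw = bce_match.group(1).strip()
        try:
            # Assumption: the payload is a JSON object.
            bce_payload = json.loads(raw)
        except json.JSONDecodeError:
            # Keep the raw text if the assumption does not hold.
            bce_payload = raw

    return {
        "think": think_match.group(1).strip() if think_match else None,
        "bce": bce_payload,
        "input": row.get("input", ""),
        "output": row.get("output", ""),
    }


# Hypothetical sample row, for illustration only.
sample = json.dumps({
    "instruction": "<think>Add the two numbers.</think><bce>{\"risk_score\": 0.1}</bce>",
    "input": "2 + 2",
    "output": "4",
})

parsed = parse_row(sample)
print(parsed["think"])
print(parsed["bce"])
```

Rows without either tag come back with `None` in the corresponding field, so a loader built on this sketch can count how many rows actually follow the documented layout before training on them.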

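The two formulas in the Technical Foundation section can be evaluated numerically. The sketch below is one possible reading, under several assumptions: the card does not define $A_i$, $P_i$, or $W_i$, nor does it give values for $h$, $k$, and $F$, so all of them are treated here as plain scalars with placeholder values; $\log$ is taken to be the natural logarithm; and $v_i$ is treated as a scalar weight even though the card calls it a vector. The function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def activation(t: float) -> float:
    """Temporal activation curve x(t) = tanh(e^t - pi)."""
    return math.tanh(math.exp(t) - math.pi)


def behavioral_dna(t: float, a_i: float, p_i: float, w_i: float,
                   h: float, k: float, f: float) -> float:
    """D_i(t) = x(t) * [h*A_i + k*log(P_i) + F*W_i], natural log assumed."""
    if p_i <= 0:
        raise ValueError("P_i must be positive for log(P_i) to be defined")
    return activation(t) * (h * a_i + k * math.log(p_i) + f * w_i)


def path_mapper(v: Sequence[float],
                fs: Sequence[Callable[[float], float]],
                p: Sequence[float]) -> float:
    """Phi = sum_i v_i * f_i(p_i), with p_i taken at the time of interest."""
    if not (len(v) == len(fs) == len(p)):
        raise ValueError("v, fs and p must have the same length")
    return sum(v_i * f_i(p_i) for v_i, f_i, p_i in zip(v, fs, p))


# Placeholder constants; the card does not publish values for h, k, F.
H, K, F = 1.0, 1.0, 1.0

# x(t) crosses zero where e^t = pi, so D_i vanishes at t = ln(pi).
t_zero = math.log(math.pi)
print(behavioral_dna(t_zero, a_i=1.0, p_i=1.0, w_i=1.0, h=H, k=K, f=F))

# Two illustrative modules: an identity response and a quadratic one.
phi = path_mapper([1.0, 2.0], [lambda p: p, lambda p: p * p], [3.0, 2.0])
print(phi)  # 1*3 + 2*(2*2) = 11.0
```

One property that follows directly from the stated activation curve: $x(t)$ is negative for $t < \ln \pi$ and positive afterwards, so under this reading the sign of $D_i(t)$ flips at $t = \ln \pi \approx 1.145$ regardless of the constants chosen.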