IsmatS/azerbaijan-court-data
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/IsmatS/azerbaijan-court-data
---
language:
- az
- en
license: cc-by-4.0
task_categories:
- text-classification
- text-generation
- question-answering
- token-classification
- feature-extraction
- summarization
tags:
- legal
- law
- court
- azerbaijan
- azerbaijani
- nlp
- court-decisions
- judicial
- case-law
- lawyers
- graph-rag
- rag
- knowledge-graph
- pdf
- tabular
- text
- ocr
- document-ai
- fine-tuning
- embeddings
pretty_name: "Azerbaijan Court System Dataset"
size_categories:
- 1M<n<10M
---
# Azerbaijan Court System Dataset
**The most comprehensive open dataset of Azerbaijan's judicial system** — 1.64 million structured records and 1.54 million court decision PDFs (~160 GB) covering court decisions, active cases, scheduled hearings, court registries, judges, lawyers, and mediator organizations.
Built for AI engineers, legal tech startups, and researchers who need real-world legal data at scale.
---
## Quick Start
### Load with Hugging Face `datasets`
```python
from datasets import load_dataset
# Load any CSV by specifying the data file
ds = load_dataset("IsmatS/azerbaijan-court-data", data_files="data/court_acts.csv")
print(ds["train"][0])
```
### Load with pandas
```python
import pandas as pd
# All CSVs use UTF-8 with BOM encoding
acts = pd.read_csv(
    "hf://datasets/IsmatS/azerbaijan-court-data/data/court_acts.csv",
    encoding="utf-8-sig",
)
courts = pd.read_csv(
    "hf://datasets/IsmatS/azerbaijan-court-data/data/courts.csv",
    encoding="utf-8-sig",
)
print(f"Court acts: {len(acts):,} rows")
print(f"Courts: {len(courts):,} rows")
```
### Download a specific PDF
```python
from huggingface_hub import hf_hub_download
import tarfile

decision_id = 12345678
shard = str(decision_id % 1000).zfill(3)  # "678"

# Download the shard tar file
tar_path = hf_hub_download(
    repo_id="IsmatS/azerbaijan-court-data",
    filename=f"pdfs/{shard}.tar",
    repo_type="dataset",
)

# Extract the specific PDF
with tarfile.open(tar_path, "r") as tar:
    pdf_bytes = tar.extractfile(f"{decision_id}.pdf").read()

print(f"PDF size: {len(pdf_bytes):,} bytes")
```
---
## Purpose
This dataset is released to **democratize access to Azerbaijan's legal data** for:
- **Training and fine-tuning LLMs** on Azerbaijani legal text — court decisions, case outcomes, legal terminology in both structured CSV and raw PDF format
- **Building legal AI startups** — automated legal research, case outcome prediction, lawyer-case matching, document analysis, OCR pipelines
- **Enabling RAG and Graph RAG applications** — the interconnected nature of courts, judges, cases, and decisions makes this ideal for retrieval-augmented generation and knowledge graph construction
- **Academic research** — judicial analytics, legal system efficiency studies, comparative law research
- **Legal tech innovation** — automating routine legal work, building intelligent case management systems, creating legal chatbots for Azerbaijani law
- **Document AI** — 1.54M court decision PDFs for training document understanding, legal OCR, and PDF extraction models
---
## Dataset Contents
### Structured Data (CSVs)
| File | Records | Size | Description |
|------|---------|------|-------------|
| `data/court_acts.csv` | 1,541,289 | ~250 MB | Court decisions with outcomes, case types, judges, dates (2016–2026) |
| `data/court_cases.csv` | 67,877 | ~15 MB | Active/pending court cases — live docket snapshot |
| `data/court_meetings.csv` | 29,921 | ~6 MB | Scheduled court hearings (Apr–Sep 2026) |
| `data/courts.csv` | 116 | ~20 KB | Court registry with types, regions, and hierarchy |
| `data/judges.csv` | 709 | ~160 KB | Judge registry with court assignments, bios, and demographics |
| `data/lawyers.csv` | 2,232 | ~350 KB | Licensed lawyers with practice areas and experience |
| `data/organizations.csv` | 70 | ~15 KB | Mediator organizations by region |
**Total: 1,642,214 structured records across 7 datasets (~494 MB CSV)**
### Court Decision PDFs
| Directory | Files | Total Size | Description |
|-----------|-------|------------|-------------|
| `pdfs/` | 1,541,218 | ~160 GB | Full-text court decision documents as PDF files |
PDFs are stored as **tar archives by shard** (`000.tar` through `999.tar`). Each tar contains ~1,500 PDFs named by `decisionId` (e.g., `12345678.pdf`). Each tar is approximately 160 MB.
### Analysis Charts
30 business analysis charts in the `charts/` directory (PNG, 150+ DPI) covering volume, trends, outcomes, regional analysis, and cross-dataset relationships.
---
## Entity Relationships & Schema
The 7 datasets are interconnected. Understanding these relationships is critical for building AI applications over this data.
```
┌──────────────┐     ┌──────────────┐     ┌───────────────┐
│    courts    │     │  court_acts  │     │     PDFs      │
│ (116 courts) │────▶│ (1.54M acts) │────▶│ (1.54M files) │
│              │     │              │     │               │
│ id           │     │ decisionId ──┼────▶│ {id}.pdf      │
│ title ───────┼──┐  │ caseId       │     │ in shard      │
│ type_title   │  │  │ caseNo       │     │ {id%1000}.tar │
│ region_title │  │  │ caseType     │     └───────────────┘
│ parent_court │  │  │ decisionType │
└──────────────┘  │  │ decisionDate │
       ▲          │  │ court ───────┼── matches courts.title (after normalization)
       │          │  │ judge ───────┼── matches judges.full_name (strip oğlu/qızı)
       │          │  │ caseResult   │
       │          │  └──────────────┘
       │          │
┌──────┴───────┐  │  ┌──────────────┐     ┌────────────────┐
│    judges    │  └─▶│ court_cases  │     │ court_meetings │
│ (709 judges) │     │ (67K cases)  │     │ (30K meetings) │
│              │     │              │     │                │
│ id           │     │ id           │     │ meetingId      │
│ full_name    │     │ caseNo ──────┼────▶│ caseId         │
│ work ────────┼──── │ caseType     │     │ caseType       │
│ birthday     │     │ caseStatus   │     │ meetingType    │
│ description  │     │ court ───────┼──┐  │ meetingDate    │
│ organization │     │ judge        │  │  │ court          │
│ experiences  │     │ enterDate    │  │  │ judge          │
│ educations   │     └──────────────┘  │  │ meetingStatus  │
└──────────────┘                       │  └────────────────┘
                                       │
┌──────────────┐     ┌──────────────┐  │
│   lawyers    │     │organizations │  │
│   (2,232)    │     │  (70 orgs)   │  │
│              │     │              │  │
│ id           │     │ id           │  │
│ full_name    │     │ company      │  │
│ areas        │     │ region_title─┼──┘ (same regions as courts)
│ languages    │     │ mediator_cnt │
│ duration     │     └──────────────┘
│ institution  │
└──────────────┘
```
### Join Keys
| From | To | Join Strategy |
|------|----|---------------|
| `court_acts.decisionId` | PDF file | `{decisionId}.pdf` inside `pdfs/{decisionId % 1000}.tar` |
| `court_acts.court` | `courts.title` | Normalize both: strip diacritics (ə→e, ı→i, ö→o, ü→u, ş→s, ç→c, ğ→g), lowercase, collapse whitespace |
| `court_acts.judge` | `judges.full_name` | Normalize both sides and strip the patronymic suffix (oğlu/qızı) from the registry name: `"Abasov Qürur Bəybala oğlu"` and `"Abasov Qürur Bəybala"` both become `"abasov qurur beybala"` |
| `judges.work` | `courts.title` | Same court name normalization as above |
| `court_cases.court` | `courts.title` | Same normalization as above |
| `court_meetings.court` | `courts.title` | Same normalization as above |
| `court_cases.caseNo` | `court_acts.caseNo` | Direct string match — links active cases to their historical decisions |
| `court_meetings.caseId` | `court_cases.id` | Direct integer match — links scheduled hearings to cases |
| `courts.region_title` | `organizations.region_title` | Direct string match — links courts to mediator orgs in same region |
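
As a minimal sketch of the direct joins in the table, the snippet below links hearings to their cases with `pandas.DataFrame.merge`. The two small frames are hypothetical stand-ins for `court_meetings.csv` and `court_cases.csv`; only the column names come from the schema above, and every value is made up for illustration.

```python
import pandas as pd

# Hypothetical rows; real data comes from data/court_cases.csv
# and data/court_meetings.csv.
cases = pd.DataFrame({
    "id": [101, 102],
    "caseNo": ["2(2)-1234/2024", "2(2)-5678/2025"],
    "caseStatus": ["İcraatda", "Dayandırılıb"],
})
meetings = pd.DataFrame({
    "meetingId": [9001, 9002, 9003],
    "caseId": [101, 101, 999],  # 999 has no matching case in this toy sample
    "meetingDate": [
        "2026-05-19T09:20:00",
        "2026-06-02T10:00:00",
        "2026-06-10T11:30:00",
    ],
})

# court_meetings.caseId -> court_cases.id (direct integer match).
# how="left" keeps hearings without a case; indicator=True adds a
# "_merge" column recording whether each row found a match.
linked = meetings.merge(
    cases, left_on="caseId", right_on="id", how="left", indicator=True
)
unmatched = linked[linked["_merge"] == "left_only"]

print(linked[["meetingId", "caseId", "caseNo", "_merge"]])
print(f"Hearings without a matching case: {len(unmatched)}")
```

Checking the `_merge` column after each join is a cheap way to measure how many rows actually link before building anything on top of the relationship.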
### Court Name Normalization
Court names differ across datasets (ASCII transliteration vs full Unicode Azerbaijani). **You must normalize before joining:**
```python
import unicodedata, re

AZ_MAP = str.maketrans("əıöüşçğƏIÖÜŞÇĞ", "eiouscgEIOUSCG")

def normalize_court_name(name: str) -> str:
    if not isinstance(name, str):
        return ""
    name = name.translate(AZ_MAP)
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode()
    name = re.sub(r"\s+", " ", name).strip().lower()
    return name

# Example:
# "Bakı Şəhəri Binəqədi Rayon Məhkəməsi" → "baki seheri bineqedi rayon mehkemesi"
# "Baki Bineqedi Rayon Mehkemesi" → "baki bineqedi rayon mehkemesi"
# Note: these two results still differ by the word "seheri", so some
# variants may need token-based or fuzzy matching on top of normalization.
```
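
One way to apply the normalization in an actual join is sketched below. The function is repeated so the snippet runs on its own, the two frames are hypothetical stand-ins for `courts.csv` and `court_acts.csv`, and `court_key` is an assumed helper column name that does not exist in the dataset.

```python
import re
import unicodedata

import pandas as pd

AZ_MAP = str.maketrans("əıöüşçğƏIÖÜŞÇĞ", "eiouscgEIOUSCG")

def normalize_court_name(name) -> str:
    """Same normalization as above, repeated so this sketch is self-contained."""
    if not isinstance(name, str):
        return ""
    name = name.translate(AZ_MAP)
    name = unicodedata.normalize("NFKD", name)
    name = name.encode("ascii", "ignore").decode()
    return re.sub(r"\s+", " ", name).strip().lower()

# Hypothetical rows standing in for courts.csv and court_acts.csv.
courts = pd.DataFrame({
    "id": [1, 2],
    "title": [
        "Bakı Şəhəri Xətai Rayon Məhkəməsi",
        "Bakı Şəhəri Binəqədi Rayon Məhkəməsi",
    ],
    "region_title": ["Bakı", "Bakı"],
})
acts = pd.DataFrame({
    "decisionId": [5432109, 5432110, 5432111],
    "court": [
        "Baki Seheri Xetai Rayon Mehkemesi",   # ASCII transliteration
        "Bakı  Şəhəri Xətai Rayon Məhkəməsi",  # extra whitespace
        "Unknown Court",                       # no registry entry
    ],
})

# Build the join key on both sides, then merge on it.
courts["court_key"] = courts["title"].map(normalize_court_name)
acts["court_key"] = acts["court"].map(normalize_court_name)

joined = acts.merge(
    courts[["court_key", "id", "region_title"]],
    on="court_key", how="left", indicator=True,
)
match_rate = (joined["_merge"] == "both").mean()

print(joined[["decisionId", "court", "id", "_merge"]])
print(f"Match rate: {match_rate:.0%}")
```

Rows that stay `left_only` are the ones worth inspecting by hand, since they show which spelling variants the normalization does not cover.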
---
## Linking PDFs to CSV Records
Each row in `court_acts.csv` has a `decisionId` that maps directly to a PDF file. The PDFs are sharded into 1,000 tar archives using the formula:
```
shard = decisionId % 1000 → zero-padded to 3 digits (e.g., 007, 042, 999)
tar file = pdfs/{shard}.tar
PDF filename inside tar = {decisionId}.pdf
```
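
The formula can be wrapped in a small pure function, which makes it easy to group many decision IDs by shard before downloading anything. This is a sketch only: `pdf_location` is a hypothetical helper name, and it assumes each PDF sits at the root of its tar, as the naming scheme above describes.

```python
def pdf_location(decision_id: int) -> tuple[str, str]:
    """Return (tar path inside the repo, PDF member name) for a decisionId."""
    decision_id = int(decision_id)  # tolerate floats or numpy ints from pandas
    shard = f"{decision_id % 1000:03d}"  # zero-padded to 3 digits
    return f"pdfs/{shard}.tar", f"{decision_id}.pdf"

for did in (12345678, 5432007, 42):
    print(did, pdf_location(did))
```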
### Full Example: Load a court act row and its PDF
```python
import pandas as pd
import tarfile
from huggingface_hub import hf_hub_download

# 1. Load structured data
acts = pd.read_csv(
    "hf://datasets/IsmatS/azerbaijan-court-data/data/court_acts.csv",
    encoding="utf-8-sig",
)

# 2. Pick a decision
row = acts.iloc[0]
decision_id = row["decisionId"]
print(f"Decision: {decision_id}")
print(f"Case: {row['caseNo']} | Type: {row['caseType']}")
print(f"Court: {row['court']} | Judge: {row['judge']}")
print(f"Result: {row['caseResult']}")

# 3. Compute shard and download the tar
shard = str(int(decision_id) % 1000).zfill(3)
tar_path = hf_hub_download(
    repo_id="IsmatS/azerbaijan-court-data",
    filename=f"pdfs/{shard}.tar",
    repo_type="dataset",
)

# 4. Extract the PDF
with tarfile.open(tar_path, "r") as tar:
    pdf_bytes = tar.extractfile(f"{int(decision_id)}.pdf").read()

print(f"PDF: {len(pdf_bytes):,} bytes")
```
### Batch PDF Processing
```python
import tarfile
from pathlib import Path
from huggingface_hub import hf_hub_download

def iter_pdfs_from_shard(repo_id: str, shard: int):
    """Yield (decision_id, pdf_bytes) for all PDFs in a shard."""
    shard_str = str(shard).zfill(3)
    tar_path = hf_hub_download(
        repo_id=repo_id, filename=f"pdfs/{shard_str}.tar", repo_type="dataset"
    )
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            if member.name.endswith(".pdf"):
                decision_id = int(Path(member.name).stem)
                pdf_bytes = tar.extractfile(member).read()
                yield decision_id, pdf_bytes

# Process all PDFs in shard 42
for did, pdf_data in iter_pdfs_from_shard("IsmatS/azerbaijan-court-data", 42):
    print(f"  Decision {did}: {len(pdf_data):,} bytes")
```
---
## Key Fields Reference
### court_acts.csv (1,541,289 rows — the core dataset)
| Column | Type | Description | Example |
|--------|------|-------------|---------|
| `decisionId` | int | Unique decision ID — **links to PDF** | `5432109` |
| `caseId` | int | Case ID | `1234567` |
| `caseNo` | str | Human-readable case number | `2(2)-1234/2024` |
| `caseType` | str | Case category | `Mülki işlər` (Civil) |
| `decisionType` | str | Decision category | `Qətnamə` (Judgment) |
| `decisionDate` | str | Date of decision (ISO format) | `2024-03-15` |
| `court` | str | Court name (Azerbaijani) | `Bakı Şəhəri Xətai Rayon Məhkəməsi` |
| `judge` | str | Judge name | `Mehdiyev Nəriman Hüseynqulu` |
| `caseResult` | str | Outcome text (Azerbaijani) | `İddia təmin edildi` (Claim granted) |
| `categoryName` | str | Subcategory (47.7% populated) | |
| `caseCodes` | str | Case codes (99.9% empty — ignore) | |
### court_cases.csv (67,877 rows — live snapshot of open docket)
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Case ID |
| `caseNo` | str | Case number (matches court_acts.caseNo) |
| `caseType` | str | Case category |
| `caseStatus` | str | One of: `İcraatda` (In Proceedings, 84.9%), `Dayandırılıb` (Suspended, 13.4%), `Hakim təyin edilib` (Judge Assigned, 1.7%) |
| `court` | str | Court name |
| `judge` | str | Judge name |
| `enterDate` | str | Filing date |
### court_meetings.csv (29,921 rows — future schedule Apr–Sep 2026)
| Column | Type | Description |
|--------|------|-------------|
| `meetingId` | int | Meeting ID |
| `caseId` | int | Linked case ID |
| `caseType` | str | Case category |
| `meetingType` | str | Hearing type (preparatory, oral, review, etc.) |
| `meetingDate` | str | Scheduled datetime |
| `court` | str | Court name |
| `judge` | str | Judge name |
| `meetingStatus` | str | `Təyin edilib` (Scheduled, 98.4%), `Keçirilməyib` (Not Held, 0.9%), `Ləğv edilib` (Cancelled, 0.7%) |
### courts.csv (116 rows — court registry)
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Court ID |
| `title` | str | Court name (canonical Azerbaijani) |
| `type_title` | str | One of 7 types: Rayon (85), Heavy Crimes (6), Appeal (6), Military (6), Administrative (6), Commercial (6), Supreme (1) |
| `region_title` | str | Geographic region |
| `parent_court_title` | str | Appellate parent court |
### judges.csv (709 rows — judge registry)
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Judge ID |
| `full_name` | str | Full name with patronymic (e.g., `Abasov Qürur Bəybala oğlu`) |
| `work` | str | Assigned court name — **key field for linking to courts** |
| `description` | str | Role description (48.5% populated) |
| `organization` | str | Organization affiliation (34.7% populated) |
| `experiences` | str | Career experience, pipe-separated (43.9% populated) |
| `educations` | str | Education history (14.5% populated) |
| `birthday` | str | Date of birth (49.8% populated) |
| `photo` | str | Photo URL |
| `cover` | str | Cover text / title |
**Note:** Judge names in `court_acts.csv` omit the patronymic suffix (`oğlu`/`qızı`). Strip this suffix when matching: `"Abasov Qürur Bəybala oğlu"` → `"Abasov Qürur Bəybala"`. After normalization, 636 of 709 registered judges (90%) match to court acts.
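
The suffix stripping described in the note could look like the sketch below. `normalize_judge_name` is a hypothetical helper that reuses the same character map as the court-name normalization and removes `oğlu`/`qızı` only when it is the final word.

```python
import re
import unicodedata

AZ_MAP = str.maketrans("əıöüşçğƏIÖÜŞÇĞ", "eiouscgEIOUSCG")
# Patronymic markers from the note above, in their normalized ASCII form,
# matched only as the last word of the name.
PATRONYMIC_SUFFIX = re.compile(r"\s+(oglu|qizi)$")

def normalize_judge_name(name) -> str:
    """Lowercase, strip diacritics, collapse whitespace, drop oğlu/qızı."""
    if not isinstance(name, str):
        return ""
    name = name.translate(AZ_MAP)
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    name = re.sub(r"\s+", " ", name).strip().lower()
    return PATRONYMIC_SUFFIX.sub("", name)

registry_name = "Abasov Qürur Bəybala oğlu"  # judges.full_name style
acts_name = "Abasov Qürur Bəybala"           # court_acts.judge style

print(normalize_judge_name(registry_name))
print(normalize_judge_name(registry_name) == normalize_judge_name(acts_name))
```

Applying the function to both `judges.full_name` and `court_acts.judge` gives a shared key that can be used in a merge, in the same way as the normalized court name.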
### lawyers.csv (2,232 rows)
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Lawyer ID |
| `full_name` | str | Full name |
| `areas` | str | Practice areas, semicolon-separated (43.1% populated) |
| `languages` | str | Languages spoken |
| `duration` | str | Experience as string, e.g., `16 il` (16 years). Extract number with regex `(\d+)` |
| `institution_title` | str | Bar association |
### organizations.csv (70 rows — mediator organizations)
| Column | Type | Description |
|--------|------|-------------|
| `id` | int | Organization ID |
| `company` | str | Organization name |
| `region_title` | str | Region |
| `mediator_count` | int | Number of mediators |
| `voen` | str | Tax ID |
---
## Use Cases for AI Engineers
### 1. Retrieval-Augmented Generation (RAG) over Court Decisions
Build a legal question-answering system that retrieves relevant court decisions and generates answers grounded in actual case law.
```python
# Step 1: Extract text from PDFs
import fitz  # PyMuPDF

def extract_text(pdf_bytes: bytes) -> str:
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    return "\n".join(page.get_text() for page in doc)

# Step 2: Chunk the text
# (recent LangChain releases also ship this class in langchain_text_splitters)
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Step 3: Build an index with metadata from the CSV
# (iter_pdfs_from_shard is defined under "Batch PDF Processing" above)
import pandas as pd

acts = pd.read_csv("data/court_acts.csv", encoding="utf-8-sig")
# Index by decisionId once so each lookup avoids a full-table scan
acts_by_id = acts.drop_duplicates("decisionId").set_index("decisionId")

documents = []
for decision_id, pdf_bytes in iter_pdfs_from_shard("IsmatS/azerbaijan-court-data", 0):
    if decision_id not in acts_by_id.index:
        continue  # skip PDFs without a matching CSV row
    row = acts_by_id.loc[decision_id]
    text = extract_text(pdf_bytes)
    chunks = splitter.split_text(text)
    for chunk in chunks:
        documents.append({
            "text": chunk,
            "metadata": {
                "decisionId": int(decision_id),
                "caseNo": row["caseNo"],
                "caseType": row["caseType"],
                "court": row["court"],
                "judge": row["judge"],
                "decisionDate": row["decisionDate"],
                "caseResult": row["caseResult"],
            }
        })

# Step 4: Embed and store in a vector database
# Use any embedding model — e.g., sentence-transformers, OpenAI, Cohere
# Store in ChromaDB, Pinecone, Weaviate, Qdrant, etc.
```
### 2. Knowledge Graph Construction (Graph RAG)
The dataset has a natural graph structure with rich interconnections:
```
Courts ──has_judge──▶ Judges ──decided──▶ Decisions ──has_pdf──▶ PDFs
  │                     │                    │
  │                     │                    ├── caseType
  ├── type (Rayon,      ├── assigned_to      ├── decisionType
  │   Appeal, etc.)     │   (court_cases)    ├── caseResult
  │                     │                    └── decisionDate
  ├── region            └── scheduled_for
  │                         (court_meetings)
  └── parent_court
      (appellate hierarchy)
```
**Key graph facts:**
- 709 registered judges + 1,040 unique judges in court decisions (636 overlap)
- 192 judges (18.5% of the 1,040 found in court decisions) serve multiple courts; these are key bridge nodes
- 7 court types form a hierarchical structure (Rayon → Appeal → Supreme)
- Each decision links to exactly one court, one judge, one case type, and (for all but 71 decisions) one PDF
```python
# Example: Build a NetworkX graph from court_acts
import pandas as pd
import networkx as nx

acts = pd.read_csv(
    "data/court_acts.csv",
    encoding="utf-8-sig",
    usecols=["decisionId", "court", "judge", "caseType", "caseResult"],
)

G = nx.Graph()
for _, row in acts.iterrows():
    G.add_edge(row["court"], row["judge"], relation="has_judge")
    G.add_edge(row["judge"], row["decisionId"], relation="decided")
    G.add_edge(row["decisionId"], row["caseType"], relation="case_type")

print(f"Nodes: {G.number_of_nodes():,} | Edges: {G.number_of_edges():,}")
```
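
The multi-court "bridge" judges mentioned in the key graph facts can be found without building the graph at all. The sketch below counts distinct courts per judge on a few hypothetical rows; on the real `court_acts.csv`, court and judge names would likely need normalizing first so that spelling variants are not counted as separate courts.

```python
import pandas as pd

# Hypothetical rows standing in for court_acts.csv.
acts = pd.DataFrame({
    "decisionId": [1, 2, 3, 4, 5],
    "judge": ["Judge A", "Judge A", "Judge B", "Judge B", "Judge C"],
    "court": ["Court X", "Court Y", "Court X", "Court X", "Court Z"],
})

# Number of distinct courts each judge appears in.
courts_per_judge = acts.groupby("judge")["court"].nunique()

# Judges with more than one court are the bridge nodes.
bridge_judges = courts_per_judge[courts_per_judge > 1]
share = len(bridge_judges) / len(courts_per_judge)

print(bridge_judges)
print(f"Multi-court judges: {len(bridge_judges)} ({share:.1%})")
```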
### 3. Case Outcome Prediction (Classification)
With 1.54M labeled decisions, train models to predict outcomes:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
acts = pd.read_csv("data/court_acts.csv", encoding="utf-8-sig")
# Filter to rows with known results
labeled = acts[acts["caseResult"].notna()].copy()
# Encode features
le_type = LabelEncoder()
le_court = LabelEncoder()
le_result = LabelEncoder()
labeled["caseType_enc"] = le_type.fit_transform(labeled["caseType"].fillna(""))
labeled["court_enc"] = le_court.fit_transform(labeled["court"].fillna(""))
labeled["result_enc"] = le_result.fit_transform(labeled["caseResult"])
X = labeled[["caseType_enc", "court_enc"]]
y = labeled["result_enc"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(f"Train: {len(X_train):,} | Test: {len(X_test):,}")
print(f"Classes: {len(le_result.classes_):,} unique outcomes")
```
### 4. Legal AI Assistant / Fine-Tuning
Create training data for fine-tuning an LLM on Azerbaijani legal text:
```python
import pandas as pd

acts = pd.read_csv("data/court_acts.csv", encoding="utf-8-sig")

# Create instruction-following pairs from structured data
training_examples = []
for _, row in acts.dropna(subset=["caseResult"]).iterrows():
    training_examples.append({
        "instruction": f"What was the outcome of {row['caseType']} case {row['caseNo']} "
                       f"at {row['court']}?",
        "output": f"The case was decided by Judge {row['judge']} on {row['decisionDate']}. "
                  f"Decision type: {row['decisionType']}. "
                  f"Result: {row['caseResult']}."
    })

# For richer training data, combine with extracted PDF text
# to create longer-form question-answer pairs
```
### 5. Document AI & Legal OCR
Use the 1.54M court decision PDFs to train or evaluate document understanding models:
```python
# Extract structured information from PDFs
# Compare against CSV ground truth for evaluation
import fitz
import pandas as pd
acts = pd.read_csv("data/court_acts.csv", encoding="utf-8-sig")
# For each PDF, you have ground truth labels:
# - caseNo (should appear in document header)
# - court name (should appear in letterhead)
# - judge name (should appear in signature)
# - decisionDate (should appear in document)
# - caseResult (should appear in verdict section)
# This makes the dataset ideal for training information extraction models
# or evaluating OCR accuracy on legal documents
```
### 6. Court Analytics Dashboard
```python
import pandas as pd
acts = pd.read_csv("data/court_acts.csv", encoding="utf-8-sig")
courts = pd.read_csv("data/courts.csv", encoding="utf-8-sig")
# Decisions per court per year
acts["year"] = pd.to_datetime(acts["decisionDate"], errors="coerce").dt.year
volume = acts.groupby(["court", "year"]).size().reset_index(name="decisions")
# Judge workload
judge_load = acts.groupby("judge").agg(
    decisions=("decisionId", "count"),
    courts=("court", "nunique"),
    case_types=("caseType", "nunique"),
).sort_values("decisions", ascending=False)
print(judge_load.head(10))
```
### 7. Lawyer Matching Platform
```python
import pandas as pd
lawyers = pd.read_csv("data/lawyers.csv", encoding="utf-8-sig")
# Parse practice areas (semicolon-separated)
lawyers["area_list"] = lawyers["areas"].fillna("").str.split(";")
lawyers["experience_years"] = lawyers["duration"].str.extract(r"(\d+)").astype(float)
# Find lawyers specializing in criminal law with 10+ years experience
criminal_lawyers = lawyers[
    lawyers["area_list"].apply(lambda x: any("cinayət" in a.lower() for a in x))
    & (lawyers["experience_years"] >= 10)
]
print(f"Experienced criminal lawyers: {len(criminal_lawyers)}")
```
---
## Dataset Statistics
| Metric | Value |
|--------|-------|
| Total structured records | 1,642,214 |
| Total PDFs | 1,541,218 |
| PDF coverage | 99.995% (71 decisions had no PDF) |
| Unique courts | 124 (in court_acts) |
| Registered judges | 709 |
| Unique judges in acts | 1,040 |
| Judge registry overlap | 636 (90% of registry matched to acts) |
| Multi-court judges | 192 (18.5%) |
| Court acts date range | 2016–2026 (reliable from 2019+) |
| Case types | Civil (46.9%), Admin Offenses (24.3%), Criminal (11.7%), Admin Disputes (9.8%), Commercial (3.6%) |
| Top decision type | Qətnamə / Judgment (39.7%) |
| Top outcome | İddia təmin edildi / Claim granted (20.6%) |
| Busiest court | Bakı Şəhəri Xətai Rayon Məhkəməsi (64,933 decisions) |
| Busiest judge | Mehdiyev Nəriman Hüseynqulu (9,543 decisions) |
| Baku region share | 47.5% of all decisions |
| Cumulative growth 2019→2025 | 4,400% (7,725 → 347,971) |
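
The 2019 and 2025 counts in the table allow a quick arithmetic check. The sketch below computes the cumulative growth over the six-year span and, as an added illustration that is not stated in the source, the compound annual rate those two endpoints imply.

```python
decisions_2019 = 7_725
decisions_2025 = 347_971
years = 2025 - 2019  # six yearly steps

# Cumulative growth across the whole span (a fraction; 44.0 means 4,400%).
total_growth = decisions_2025 / decisions_2019 - 1

# Implied compound annual growth rate between the two endpoints.
cagr = (decisions_2025 / decisions_2019) ** (1 / years) - 1

print(f"Total growth 2019 to 2025: {total_growth:.0%}")
print(f"Compound annual growth rate: {cagr:.1%}")
```

The annual figure is only an average between two endpoints; the data quality notes suggest that early-year counts are sparse, so part of the increase may reflect coverage rather than caseload.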
---
## Data Quality Notes
| Issue | Details | Handling |
|-------|---------|----------|
| **Encoding** | UTF-8 with BOM (`utf-8-sig`) | Use `encoding="utf-8-sig"` when reading CSVs |
| **Court name variation** | Names differ across datasets (ASCII vs Unicode Azerbaijani) | Normalize with diacritic stripping before joins (see code above) |
| **Empty columns** | `court_meetings.parties` (100%), `court_meetings.caseCodes` (100%), `court_acts.caseCodes` (99.9%), `lawyers.services/description/achievement` (97–99%) | Ignore these columns |
| **Temporal context** | `court_cases` = live snapshot (no closed cases), `court_meetings` = future schedule, `court_acts` = historical archive | Do not mix temporal semantics |
| **Early years sparse** | 2016: 1 record, 2017: 104, 2018: 355 | Start trend analysis from 2019 |
| **PDF coverage** | 1,541,218 of 1,541,289 decisions have PDFs (99.995%) | 71 decisions had no PDF attachment |
| **Lawyer experience format** | `duration` is string like `"16 il"` (16 years) | Extract number with regex `(\d+)` |
| **Lawyer practice areas** | Only 43.1% of lawyers have `areas` populated | Analyze available subset only |
| **Meeting dates** | Some have `.1` suffix (e.g., `2026-05-19T09:20:00.1`) | Strip with regex before parsing |
| **Scraping completeness** | 1 page of 108,589 failed (HTTP 400) — ~15 records | 99.999% coverage, negligible impact |
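
For the `.1` suffix on meeting dates noted in the table, one possible cleanup is sketched below. It assumes the suffix always appears at the very end of an otherwise ISO-formatted timestamp, as in the example given; `parse_meeting_date` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches a trailing fractional part such as ".1" at the end of the string.
_FRACTION_SUFFIX = re.compile(r"\.\d+$")

def parse_meeting_date(value: str) -> datetime:
    """Strip a trailing '.N' suffix, then parse the ISO timestamp."""
    cleaned = _FRACTION_SUFFIX.sub("", value.strip())
    return datetime.fromisoformat(cleaned)

print(parse_meeting_date("2026-05-19T09:20:00.1"))
print(parse_meeting_date("2026-05-19T09:20:00"))
```

On a full column, the same pattern can be applied with `Series.str.replace(..., regex=True)` followed by `pd.to_datetime(..., errors="coerce")`, so that any unexpected format becomes `NaT` instead of raising.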
---
## Source
All data was scraped from the public API of [courts.gov.az](https://courts.gov.az), the official website of the Azerbaijan Court System. The data is publicly available and released under CC-BY-4.0.
---
## Citation
```bibtex
@dataset{samadov2026azerbaijan_court_data,
title={Azerbaijan Court System Dataset},
author={Samadov, Ismat},
year={2026},
url={https://huggingface.co/datasets/IsmatS/azerbaijan-court-data},
note={1.64M structured records + 1.54M court decision PDFs from Azerbaijan court system}
}
```
---
## License
CC-BY-4.0 — free to use for commercial and non-commercial purposes with attribution.
Provider: IsmatS



