juliensimon/kepler-observations
Source: Hugging Face. Updated 2026-04-18; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/juliensimon/kepler-observations
Resource description:
---
license: cc-by-4.0
pretty_name: "Kepler Observation Catalog"
language:
- en
description: "The Kepler Observation Catalog indexes every target observed by NASA's Kepler space telescope during its prime mission (2009–2013), drawn from the Mikulski Archive for Space Telescopes (MAST). Kepler "
task_categories:
- tabular-classification
tags:
- space
- kepler
- k2
- nasa
- exoplanets
- astronomy
- telescope
- photometry
- open-data
- tabular-data
- parquet
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/kepler_observations.parquet
  default: true
---
# Kepler Observation Catalog
<div align="center">
<img src="banner.jpg" alt="Artist concept of NASA's Kepler space telescope in orbit, surrounded by a starfield" width="400">
<p><em>Credit: NASA/Ames/JPL-Caltech</em></p>
</div>
*Part of a [dataset collection](https://huggingface.co/collections/juliensimon/astronomy-datasets-69c24caf2f17e36128946743) on Hugging Face.*
## Dataset description
The Kepler Observation Catalog indexes every target observed by NASA's Kepler space telescope during its prime mission (2009–2013), drawn from the Mikulski Archive for Space Telescopes (MAST). Kepler is among the most productive exoplanet-hunting missions to date: by continuously monitoring ~200,000 stars in a single field of roughly 100 square degrees in Cygnus–Lyra, it found thousands of confirmed exoplanets through high-precision photometric transit detection, including the first Earth-sized planets in habitable zones.
Each row in this catalog is one observation of a Kepler target at a single cadence. The target is identified by its Kepler Input Catalog (KIC) ID, which is zero-padded to 9 digits inside the `obs_id`. Alongside it, the row records the cadence (long cadence = 29.4-minute integration, capable of catching transits on weeks-to-months orbital periods; short cadence = 58.9-second integration, used for asteroseismology and short-period transits), the target coordinates (RA/Dec), and a string of '1'/'0' flags indicating which of Kepler's quarterly observing periods (Q0 through Q17) contain data for that target. The `quarters_observed` column summarises the mask as an integer count: a target observed in every quarter has the longest light curve in the archive, and therefore the one best suited to exoplanet searches.
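The encoding described above can be unpacked with a regular expression. The following is a minimal sketch, not the pipeline's actual parser: the pattern is inferred from the sample `obs_id` shown in the schema, and the names `OBS_ID_RE`, `ParsedObsId`, and `parse_obs_id` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the sample obs_id in the schema:
# "kplr" + zero-padded 9-digit KIC ID + "_" + cadence + "_Q" + quarter flags.
OBS_ID_RE = re.compile(r"^kplr(\d{9})_(lc|sc)_Q([01]+)$")


class ParsedObsId(NamedTuple):
    kic_id: int
    cadence: str
    quarters_mask: str
    quarters_observed: int


def parse_obs_id(obs_id: str) -> Optional[ParsedObsId]:
    """Return the parsed fields, or None if the id does not fit the assumed pattern."""
    match = OBS_ID_RE.match(obs_id)
    if match is None:
        return None
    kic, cadence, mask = match.groups()
    # quarters_observed is simply the number of '1' flags in the mask.
    return ParsedObsId(int(kic), cadence, mask, mask.count("1"))


parsed = parse_obs_id("kplr000757076_lc_Q111111111111111111")
print(parsed)
```

Identifiers that do not follow the assumed pattern come back as `None` rather than raising, which keeps the function safe to map over the whole `obs_id` column.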
This dataset is designed for cross-matching with other exoplanet catalogs (Kepler confirmed planets, TESS TOI, Gaia DR3), for selecting targets with long baselines for long-period planet searches, and for understanding the Kepler field's completeness. It complements the Kepler eclipsing binary and transit timing variation catalogs already in this collection by providing the full target list. Each target's raw and de-trended light curves can be retrieved from MAST using the `obs_id`.
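For long-period planet searches, the number of observed quarters matters less than how many of them are consecutive, since gaps break up the baseline. A minimal sketch of that selection, assuming the mask is a plain string of '1'/'0' flags as described in the schema; `longest_run`, the toy masks, and the threshold of 8 quarters are illustrative choices, not part of the dataset:

```python
def longest_run(mask: str) -> int:
    """Length of the longest unbroken run of '1' flags in a quarter mask."""
    # Splitting on '0' leaves only the runs of consecutive '1' flags.
    return max((len(run) for run in mask.split("0")), default=0)


# Hypothetical masks for three targets (shortened for readability).
toy_masks = {
    "target_a": "1111111111",
    "target_b": "1110011111",
    "target_c": "0000000000",
}

# Keep targets with at least 8 consecutive observed quarters (arbitrary threshold).
min_consecutive = 8
selected = [name for name, mask in toy_masks.items() if longest_run(mask) >= min_consecutive]
print(selected)
```

On the real data, the same function could be applied with `df["quarters_mask"].dropna().map(longest_run)`.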
The catalog is derived from MAST's CAOM table `dbo.caomobservation` (collection = 'KEPLER'). The K2 extended mission (2014–2018) uses a different observation-id schema and is planned as a separate dataset. Although the Kepler prime-mission archive is essentially static, this dataset is refreshed quarterly to pick up any late-stage reprocessing.
This dataset is suitable for **tabular classification** tasks.
## Schema
| Column | Type | Description | Sample | Null % |
|--------|------|-------------|--------|--------|
| `obs_id` | string | MAST observation identifier (e.g., 'kplr000757076_lc_Q111111111111111111'); encodes the KIC ID (zero-padded to 9 digits), cadence (lc=long, sc=short), and one Q-flag per Kepler quarter (1=observed, 0=not). Primary key. | kplr000757076_lc_Q111111111111111111 | 0.0% |
| `intent` | string | Observation intent: 'science' (target star monitoring) or 'calibration' | science | 0.0% |
| `target_ra` | float64 | Target right ascension in decimal degrees (ICRS). Kepler observed a fixed ~100 sq. deg. field near RA 290°, Dec 45° in Cygnus-Lyra. | 291.03872 | 0.0% |
| `target_dec` | float64 | Target declination in decimal degrees (ICRS) | 36.59813 | 0.0% |
| `kic_id` | Int64 | Kepler Input Catalog identifier for the target star, stored as an integer (zero-padded to 9 digits inside `obs_id`); shared with the NASA Exoplanet Archive | 757076 | 0.8% |
| `cadence` | string | Cadence type: 'lc' (long cadence, 29.4-minute integration) or 'sc' (short cadence, 58.9-second integration) | lc | 0.8% |
| `quarters_mask` | string | 18-character string of '1'/'0' flags marking which Kepler quarters contain data for this target (Q0–Q17, in order) | 111111111111111111 | 0.8% |
| `quarters_observed` | Int64 | Number of Kepler quarters (out of 18, Q0–Q17) in which the target was observed; higher = longer light-curve baseline | 18 | 0.0% |
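Because `target_ra` and `target_dec` are ICRS decimal degrees, positional cross-matching against another catalog comes down to an angular separation test. A minimal sketch using the haversine formula with only the standard library; the function name and the 1-arcsecond offset are illustrative assumptions:

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    # Haversine form, numerically stable for the small angles typical of cross-matching.
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


# Sample target from the schema versus a hypothetical source 1 arcsecond to the north.
sep_arcsec = 3600 * angular_separation_deg(291.03872, 36.59813, 291.03872, 36.59813 + 1 / 3600)
print(f"Separation: {sep_arcsec:.3f} arcsec")
```

For matching whole catalogs, a vectorised tool such as astropy's `SkyCoord.match_to_catalog_sky` is likely a better fit than a Python loop over pairs.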
## Quick stats
- **212,993** Kepler prime-mission observations (2009–2013)
- **207,656** long cadence (29.4 min), **3,724** short cadence (58.9 s)
- **82,915** targets observed in all 17 Kepler quarters (maximum baseline)
- **207,656** distinct Kepler Input Catalog (KIC) targets
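
The long- and short-cadence counts do not add up to the row total. A quick arithmetic check, using only the figures quoted above, suggests the remainder lines up with the 0.8% null rate reported for `cadence` in the schema:

```python
# Figures copied from the quick stats above.
total_rows = 212_993
long_cadence = 207_656
short_cadence = 3_724

# Rows that carry neither cadence label.
unlabelled = total_rows - long_cadence - short_cadence
share = 100 * unlabelled / total_rows
print(f"{unlabelled:,} rows without a cadence label ({share:.1f}%)")
```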
## Usage
```python
import matplotlib.pyplot as plt
import numpy as np
from datasets import load_dataset

ds = load_dataset("juliensimon/kepler-observations", split="train")
df = ds.to_pandas()

# Long-cadence targets with the longest baseline (observed in every quarter).
# The maximum is read from the data rather than hard-coded.
max_quarters = df["quarters_observed"].max()
full_baseline = df[(df["quarters_observed"] == max_quarters) & (df["cadence"] == "lc")]
print(f"Targets observed across the full Kepler prime mission: {len(full_baseline):,}")

# Map of the Kepler field (random sample of up to 50,000 rows)
sample = df.sample(min(50000, len(df)), random_state=0)
plt.figure(figsize=(10, 8))
plt.scatter(sample["target_ra"], sample["target_dec"], s=0.2, alpha=0.3)
plt.xlabel("RA (deg)")
plt.ylabel("Dec (deg)")
plt.title(f"Kepler prime-mission field of view ({len(sample):,} sampled rows)")
plt.gca().invert_xaxis()  # astronomical convention: RA increases to the left
plt.show()

# Target count per quarter; rows without a parsed mask are skipped.
# Assumes every mask has the same length.
masks = df["quarters_mask"].dropna()
mask_chars = np.array([list(m) for m in masks])
per_quarter = (mask_chars == "1").sum(axis=0)
# Assumes the last flag is Q17, so an 18-flag mask starts at Q0.
quarters = np.arange(18 - len(per_quarter), 18)
plt.figure()
plt.bar(quarters, per_quarter)
plt.xlabel("Kepler quarter")
plt.ylabel("Targets observed")
plt.title("Kepler target count per quarter")
plt.show()
```
## Data source
https://archive.stsci.edu/missions-and-data/kepler
## Update schedule
Quarterly (1st of Jan/Apr/Jul/Oct at 14:00 UTC) via [GitHub Actions](https://github.com/juliensimon/space-datasets).
## Related datasets
- [juliensimon/kepler-eclipsing-binaries](https://huggingface.co/datasets/juliensimon/kepler-eclipsing-binaries)
- [juliensimon/kepler-transit-timing](https://huggingface.co/datasets/juliensimon/kepler-transit-timing)
- [juliensimon/nasa-exoplanets](https://huggingface.co/datasets/juliensimon/nasa-exoplanets)
- [juliensimon/tess-toi-candidates](https://huggingface.co/datasets/juliensimon/tess-toi-candidates)
- [juliensimon/hst-observations](https://huggingface.co/datasets/juliensimon/hst-observations)
- [juliensimon/jwst-observations](https://huggingface.co/datasets/juliensimon/jwst-observations)
> If you find this dataset useful, please consider [giving it a like](https://huggingface.co/datasets/juliensimon/kepler-observations) on Hugging Face. It helps others discover it.
## About the author
Created by [Julien Simon](https://julien.org) — AI Operating Partner at Fortino Capital. Part of the [Space Datasets](https://julien.org/datasets) collection.
## Citation
```bibtex
@dataset{kepler_observations,
title = {Kepler Observation Catalog},
author = {juliensimon},
year = {2026},
url = {https://huggingface.co/datasets/juliensimon/kepler-observations},
publisher = {Hugging Face}
}
```
## License
[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)
Provider:
juliensimon



