znhoughton/babylm-150m-ablated
Source: Hugging Face. Updated 2026-04-12; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/znhoughton/babylm-150m-ablated
Dataset summary:
---
language:
- en
license: other
tags:
- babylm
- ablation
- binomials
---
# znhoughton/babylm-150m-v3 — binomial-ablated
This dataset is a copy of [znhoughton/babylm-150m-v3](https://huggingface.co/datasets/znhoughton/babylm-150m-v3)
with sentences containing specific **novel N-and-N binomial pairs** removed.
## Ablation details
- **Source corpus:** `znhoughton/babylm-150m-v3`
- **Exclusion list:** `novel_binomials_curated.csv` (431 pairs)
- **Sentences removed:** 47
- **Date:** 2026-04-12
### Matching rule
A sentence is removed if it contains both words of an excluded pair with the
word *and* somewhere between them. The two words may appear in either order,
and any number of other words may intervene.
Example: *"cold bread and small butter"* is removed for the pair (bread, butter).
Sentences containing only one of the two words, or lacking *and*, are retained.
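The matching rule above can be sketched in a few lines of Python. This is an illustration only, not the script used to build the dataset: the function names `tokenize` and `matches_pair` are hypothetical, and the sketch assumes case-insensitive, exact whole-word matching with no lemmatisation, which the card does not specify.

```python
import re

def tokenize(sentence):
    # Assumption: lowercase word tokens; hyphens and apostrophes stay inside a token.
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", sentence.lower())

def matches_pair(sentence, pair):
    """Return True if both words of `pair` occur with 'and' somewhere between them."""
    a, b = pair
    tokens = tokenize(sentence)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    pos_and = [i for i, t in enumerate(tokens) if t == "and"]
    for i in pos_a:
        for j in pos_b:
            lo, hi = sorted((i, j))  # either order of the two words
            if any(lo < k < hi for k in pos_and):
                return True
    return False

print(matches_pair("cold bread and small butter", ("bread", "butter")))  # True
print(matches_pair("bread with butter", ("bread", "butter")))            # False
```

Note that under this reading a sentence such as *"bread, butter and jam"* is kept, because *and* follows both words rather than separating them.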
### Domains and sentence-splitting
| Domain | Structure | Ablation method |
|---|---|---|
| `bnc_spoken` | sentence-per-line | line-wise |
| `childes` | sentence-per-line | line-wise |
| `open_subtitles` | sentence-per-line | line-wise |
| `switchboard` | sentence-per-line | line-wise |
| `gutenberg` | multi-sentence/line | join → sentence-tokenise → filter |
| `simple_wiki` | multi-sentence/line | join → sentence-tokenise → filter |
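The two ablation methods in the table might look roughly like the sketch below. Everything here is an assumption made for illustration: the helper names are hypothetical, the regex sentence splitter is a naive stand-in for whatever sentence tokeniser was actually used, and the card does not say how the filtered sentences were written back to disk.

```python
import re

def has_pair(sentence, pairs):
    # Assumption: case-insensitive, exact whole-word matching.
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", sentence.lower())
    for a, b in pairs:
        for i, tok in enumerate(tokens):
            if tok != "and":
                continue
            left, right = tokens[:i], tokens[i + 1:]
            if (a in left and b in right) or (b in left and a in right):
                return True
    return False

def ablate_linewise(lines, pairs):
    # Sentence-per-line domains: each line is treated as one sentence.
    return [line for line in lines if not has_pair(line, pairs)]

def split_sentences(text):
    # Naive stand-in tokeniser: split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def ablate_multisentence(lines, pairs):
    # Multi-sentence domains: join lines, sentence-tokenise, then filter.
    text = " ".join(line.strip() for line in lines)
    return [s for s in split_sentences(text) if not has_pair(s, pairs)]

pairs = [("bread", "butter")]
print(ablate_linewise(["cold bread and small butter", "bread and jam"], pairs))
print(ablate_multisentence(
    ["He ate bread. She brought cold bread", "and small butter. They left."], pairs))
```

Joining before tokenising matters for the multi-sentence domains because a sentence can wrap across lines, as in the second call, where the matching sentence spans two input lines.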
### Excluded pairs
- abashed / sorry
- abbots / acolytes
- abysses / puddles
- acorns / boulders
- acorns / goblets
- acorns / lions
- acorns / tigers
- acorns / valleys
- actresses / lumberjacks
- admirals / deckhands
- admirals / flowers
- admirals / knitting
- admirals / paupers
- admirals / ribbons
- admirals / weavers
- allergic / unaccustomed
- amber / gravel
- amethysts / pebbles
- amphitheaters / gazebos
- annoying / teal
- anthills / volcanoes
- ants / giraffes
- ants / stars
- anvils / coins
- aprons / swords
- archbishops / millers
- archdukes / tanners
- archdukes / vagrants
- archipelagos / sandbars
- arenas / sheds
- arrows / flowers
- asteroids / pebbles
- avalanches / feathers
- avalanches / murmurs
- avalanches / snowballs
- axes / flowers
- axes / petals
- axes / ribbons
- bacteria / candy
- ballrooms / cellars
- banners / rags
- banquets / crumbs
- banyans / ferns
- barns / marbles
- barons / fishermen
- barons / paupers
- barons / weavers
- bassoons / kazoos
- battleaxes / penknives
- battlements / cobwebs
- bears / buttons
- bears / kettles
- bears / ribbons
- bears / seeds
- bears / teacups
- beautiful / stinky
- beetles / hills
- beetles / mammoths
- beetles / mastodons
- beetles / moons
- beggars / captains
- beggars / dukes
- bicycles / robots
- birdhouses / cathedrals
- birdhouses / skyscrapers
- birdhouses / turrets
- bishops / blacksmiths
- bishops / potters
- bishops / seamstresses
- bishops / vagabonds
- bison / sparrows
- bitter / purple
- blacksmiths / lawyers
- blankets / kittens
- blizzards / butterflies
- blizzards / flurries
- blizzards / mists
- bogs / tundras
- bombs / pebbles
- bonnets / cannons
- bonnets / shields
- bonnets / swords
- bonnets / wolves
- books / owls
- boulders / butterflies
- boulders / feathers
- boulders / marbles
- bowls / sharks
- breezes / tornadoes
- breezes / typhoons
- bridges / needles
- buffaloes / sparrows
- burlap / cashmere
- burlap / velvet
- butterflies / comets
- butterflies / earthquakes
- butterflies / floods
- butterflies / squalls
- butterflies / tempests
- buttons / castles
- buttons / deer
- buttons / foxes
- buttons / lions
- buttons / mountains
- buttons / seas
- buttons / whales
- cabins / citadels
- campfires / wildfires
- candles / foxes
- candles / galaxies
- candles / gales
- candles / hurricanes
- candles / nebulae
- cannons / darts
- cannons / needles
- cannons / slingshots
- canyons / marbles
- captains / flowers
- cardinals / novices
- carracks / skiffs
- cashmere / flannel
- catapults / slingshots
- cathedrals / outhouses
- cathedrals / sheds
- cauldrons / thimbles
- cedars / mosses
- cellos / kazoos
- cellos / triangles
- chalices / thimbles
- chalk / diamonds
- chalk / tanzanite
- champagne / vinegar
- chandeliers / torches
- chanting / enchanting
- chasms / puddles
- chauffeurs / stewardesses
- cherries / llamas
- chestnuts / ferns
- chickens / fences
- chipmunks / gorillas
- churches / pebbles
- civilizations / homesteads
- clarinets / kazoos
- clay / emeralds
- claymores / penknives
- clearings / plateaus
- clearings / prairies
- cliffs / cobwebs
- cliffs / moths
- clocks / deer
- closets / dungeons
- cobblers / colonels
- cobblers / deacons
- cobblers / judges
- cobblers / marquesses
- cobblers / monks
- cobblers / privateers
- cobras / mice
- cobwebs / storms
- coins / eagles
- coins / forests
- coins / ravens
- comets / sparks
- condors / sparrows
- conscripts / generals
- coroners / senators
- corsairs / drapers
- cottages / dynasties
- counts / shepherds
- counts / tanners
- counts / vagabonds
- crabs / walruses
- crates / warehouses
- crayfish / otters
- crocodiles / tadpoles
- crossbows / slingshots
- crowbars / tweezers
- crowns / nightcaps
- crumbs / feasts
- cudgels / maces
- cutlasses / needles
- cyclones / gusts
- cyclones / zephyrs
- cypresses / mosses
- daggers / flowers
- daggers / thimbles
- deltas / puddles
- deserts / raindrops
- determined / forgettable
- dewdrops / forests
- dewdrops / glaciers
- dewdrops / planets
- dewdrops / tsunamis
- dewdrops / waterfalls
- disheveled / dreary
- ditches / fjords
- doctors / shepherds
- donates / provides
- dreadnoughts / rowboats
- drips / geysers
- dukes / flowers
- dukes / millers
- dukes / shepherds
- dukes / vagrants
- dungeons / spiders
- eagles / plates
- eagles / ribbons
- eagles / thimbles
- earls / gravediggers
- earls / paupers
- earls / tanners
- earthquakes / teardrops
- earthquakes / whimpers
- elephants / gnats
- elephants / pins
- elms / mushrooms
- embers / thunderbolts
- emperors / flowers
- eons / heartbeats
- eruptions / sparks
- falcons / wrens
- farmers / princes
- farmers / shoguns
- feasts / morsels
- feasts / scraps
- feathers / floods
- feathers / lightning
- feathers / mountains
- feathers / rivers
- feathers / spears
- feathers / storms
- feathers / thunderstorms
- feathers / tsunamis
- feathers / volcanoes
- felines / quails
- ferns / larches
- ferns / sequoias
- ferns / spruces
- ferns / walnuts
- finches / vultures
- fireflies / galaxies
- fireflies / moons
- fireflies / supernovas
- first / ninety-eighth
- fishermen / generals
- flails / switches
- flint / opals
- flintlocks / slingshots
- floods / puddles
- flowers / generals
- flowers / knights
- flowers / knives
- flowers / lances
- flowers / lords
- flowers / monarchs
- flowers / pharaohs
- flowers / princes
- flowers / warriors
- flowers / zinnias
- flutes / kazoos
- footmen / marshals
- fortresses / tents
- foxes / thimbles
- friars / tanners
- fuming / mad
- gales / sparks
- galleons / kayaks
- garnets / slate
- gazelles / leopards
- gelatin / lard
- generals / knitters
- generals / lace
- generals / riflemen
- gerbils / pythons
- glaciers / raindrops
- gnats / gorillas
- gnats / hippopotamuses
- gnats / mammoths
- gnats / rhinoceroses
- goblets / puddles
- gorillas / sparrows
- gravediggers / kings
- gravel / moonstone
- gravel / onyx
- gravel / sapphires
- gravel / turquoise
- groundskeeper / superintendent
- hailstones / icebergs
- hairpins / sabres
- hairpins / tiaras
- halberds / knives
- hammers / lace
- happily / rudely
- harpooners / knitters
- harps / kazoos
- harpsichords / kazoos
- hawks / thimbles
- hawks / voles
- helmets / nightcaps
- helmets / ribbons
- herrings / seals
- herrings / sharks
- hesitate / readjust
- hippopotamuses / mice
- horses / mirrors
- horses / thimbles
- hurricanes / petals
- hurricanes / zephyrs
- huts / monasteries
- icebergs / snowflakes
- jacket / phone
- jaguars / rabbits
- kaisers / peasants
- kale / vegetables
- kazoos / lutes
- kazoos / organs
- kazoos / pianos
- kazoos / saxophones
- kazoos / trumpets
- kazoos / violas
- kazoos / violins
- khans / weavers
- lace / twine
- lances / scarves
- lanterns / ravens
- laundresses / queens
- leopards / rabbits
- leviathans / minnows
- lightning / moths
- lions / needles
- lions / teacups
- lizards / tyrannosaurs
- looms / trebuchets
- lords / shepherds
- maelstroms / ripples
- majors / spinners
- mansions / pigsties
- mantles / rags
- marbles / orbs
- marooned / missing
- masculine / undignified
- mastodons / moths
- mausoleums / shacks
- meadows / tundras
- meteors / raindrops
- mice / tigers
- mice / wolves
- millers / priests
- millstones / pebbles
- minnows / orcas
- moles / polecats
- monarchs / serfs
- monks / tanners
- monks / telescopes
- moths / seas
- moths / suns
- mountains / petals
- mud / platinum
- mudslides / puddles
- mushrooms / oaks
- mushrooms / pines
- muskets / slingshots
- nebulae / sparks
- nectar / swill
- needles / ravens
- needles / sledgehammers
- needles / tigers
- nurses / patriarchs
- oboes / tambourines
- organs / whistles
- owls / pennies
- owls / thimbles
- paintbrushes / sailors
- palaces / pigsties
- pearls / silt
- peasants / sultans
- pebbles / volcanoes
- pelicans / sardines
- pennies / wolves
- petals / tornadoes
- pharaohs / potters
- pianos / triangles
- pickaxes / pins
- pikes / slingshots
- pinnacles / vales
- pins / tigers
- pins / walls
- planets / sparks
- priests / tanners
- priors / scullions
- pterodactyls / sparrows
- puddles / straits
- puppies / tigers
- pyramids / sandcastles
- queens / serfs
- rags / tapestries
- rapiers / thimbles
- rats / sharks
- redwoods / toadstools
- rhinoceroses / voles
- ribbons / sharks
- ribbons / shields
- ripples / waterspouts
- rivers / seeds
- rubies / sandstone
- rustles / tempests
- sackcloth / velvet
- sardines / whales
- saxophones / whistles
- scepters / twigs
- scissors / scythes
- seeds / sharks
- serfs / tsars
- sharks / teacups
- showers / typhoons
- slings / trebuchets
- soldiers / spools
- sparks / thunderbolts
- sparrows / wolves
- spears / tablecloths
- spoons / wolves
- tablecloths / tigers
- teacups / wolves
- therapy / vacations
- thimbles / towers
- thimbles / wolves
- thunderclaps / whispers
- trombones / whistles
- tsars / vagrants
- tubas / whistles
- vocabulary / vowels
Provider:
znhoughton



