dstam/matchmaking
Matchmaking Dataset
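Overview
This is a starter dataset intended to help train basic dating recommendation models. It contains few records, but publicly available dating-app datasets are scarce, so it can help bootstrap a basic recommendation model for a dating or matchmaking app.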
Usage

    from datasets import load_dataset

    dataset = load_dataset("dstam/matchmaking")

Data Transformation
The dataset can be transformed into user features and actions, forming a simulated relational database structure, with the following code:

    import pandas as pd


    class RelationalMatchMakingDataset:
        def __init__(self, dataset_dict):
            self.dataset_dict = dataset_dict

        def as_relational_db(self):
            """Split the flat dataset into normalized "actions" and "users" tables."""
            main_df = self.dataset_dict["train"].to_pandas()

            # One row per interaction between a dater and a dated user.
            actions_columns = ["dated_uid", "dater_uid", "interests_correlation",
                               "dater_liked_dated", "probability_dated_wants_to_date",
                               "already_met_before", "dater_wants_to_date",
                               "dated_wants_to_date", "is_match"]
            actions_df = main_df[actions_columns].copy()

            user_columns = ["uid", "bio", "looking_for", "race", "is_male", "age",
                            "same_race_importance", "same_religion_importance"]
            unique_users = pd.concat([main_df["dater_uid"], main_df["dated_uid"]]).unique()

            # Build one profile per user, preferring the first row where the user is the dater.
            user_rows = []
            for uid in unique_users:
                user_data = {"uid": uid}
                if uid in main_df["dater_uid"].values:
                    dater_row = main_df[main_df["dater_uid"] == uid].iloc[0]
                    user_data["bio"] = dater_row["dater_bio"]
                    user_data["looking_for"] = dater_row["dater_looking_for"]
                    user_data["race"] = dater_row["dater_race"]
                    user_data["is_male"] = dater_row["is_dater_male"]
                    user_data["age"] = dater_row["dater_age"]
                    user_data["same_race_importance"] = dater_row["same_race_importance_for_dater"]
                    user_data["same_religion_importance"] = dater_row["same_religion_importance_for_dater"]
                elif uid in main_df["dated_uid"].values:
                    dated_row = main_df[main_df["dated_uid"] == uid].iloc[0]
                    user_data["bio"] = dated_row["dated_bio"]
                    user_data["looking_for"] = dated_row["dated_looking_for"]
                    user_data["race"] = dated_row["dated_race"]
                    user_data["is_male"] = dated_row["is_dated_male"]
                    user_data["age"] = dated_row["dated_age"]
                    user_data["same_race_importance"] = dated_row["same_race_importance_for_dated"]
                    user_data["same_religion_importance"] = dated_row["same_religion_importance_for_dated"]
                user_rows.append(user_data)

            # Create the users table once instead of concatenating inside the loop.
            users_df = pd.DataFrame(user_rows, columns=user_columns)

            relational_db = {
                "actions": actions_df,
                "users": users_df,
            }
            return relational_db


    relational_db = RelationalMatchMakingDataset(dataset).as_relational_db()

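Once the actions and users tables exist, a common next step is to attach each participant's profile to every interaction row. The following is a minimal sketch of that join using `pandas.DataFrame.merge`. It runs on a tiny hand-made stand-in for the two tables: the sample values and the helper name `build_pair_features` are illustrative assumptions, and only the column names `uid`, `dater_uid`, `dated_uid`, `age`, `is_male`, and `is_match` come from the transformation code.

```python
import pandas as pd

# Tiny synthetic stand-ins for relational_db["users"] and relational_db["actions"].
# The values are made up for illustration; only the column names follow the dataset.
users = pd.DataFrame({
    "uid": [1, 2, 3],
    "age": [24, 27, 31],
    "is_male": [True, False, True],
})
actions = pd.DataFrame({
    "dater_uid": [1, 2, 3],
    "dated_uid": [2, 1, 2],
    "is_match": [1, 1, 0],
})


def build_pair_features(actions_df, users_df):
    """Attach dater and dated profile columns to every interaction row."""
    dater = users_df.add_prefix("dater_")  # uid -> dater_uid, age -> dater_age, ...
    dated = users_df.add_prefix("dated_")  # uid -> dated_uid, age -> dated_age, ...
    pairs = actions_df.merge(dater, on="dater_uid", how="left")
    pairs = pairs.merge(dated, on="dated_uid", how="left")
    return pairs


pairs = build_pair_features(actions, users)
print(pairs[["dater_uid", "dated_uid", "dater_age", "dated_age", "is_match"]])
```

With the real data, the same helper could be called as `build_pair_features(relational_db["actions"], relational_db["users"])`. A left join keeps every interaction row even if a profile were missing.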
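Data Structure
- user_feature.csv: Contains various features for 124 real users.
- actions.csv: Contains 1048 interactions between users.
- bio.csv: Contains each user's bio.
- looking_for.csv: Contains what each user is looking for in an "ideal partner".
- races.csv: Maps each race ID to a race name.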
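Because the user and interaction records live in separate tables, it can be worth checking that every user ID referenced by an interaction has a matching user row. The sketch below shows one way to do that. The helper name `find_unknown_uids` and the sample values are hypothetical, and it assumes the ID columns are named `uid`, `dater_uid`, and `dated_uid` as in the transformation code.

```python
import pandas as pd


def find_unknown_uids(actions_df, users_df):
    """Return the sorted uids that appear in actions but have no row in users."""
    seen = set(actions_df["dater_uid"]) | set(actions_df["dated_uid"])
    known = set(users_df["uid"])
    return sorted(seen - known)


# Made-up sample data; uid 9 is deliberately missing from the users table.
users = pd.DataFrame({"uid": [1, 2, 3]})
actions = pd.DataFrame({
    "dater_uid": [1, 2, 9],
    "dated_uid": [2, 3, 1],
})

missing = find_unknown_uids(actions, users)
print(missing)
```

An empty result means the two tables are consistent. Comparing `len(users_df)` and `len(actions_df)` against the counts listed above (124 and 1048) is a similar quick check.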
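Source
This dataset is a transformed version of mstz's speeddating dataset, restructured to better fit the "user/item" framework used to train recommendation models.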
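To illustrate the "user/item" framing, the sketch below treats each dater as a user and each dated person as an item, then pivots the interactions into a matrix. The sample rows are made up. Using `dater_wants_to_date` as the feedback signal and filling unobserved pairs with 0 are assumptions made for illustration, not choices prescribed by the dataset.

```python
import pandas as pd

# Made-up interactions using the dataset's column names.
actions = pd.DataFrame({
    "dater_uid": [1, 1, 2, 3],
    "dated_uid": [2, 3, 1, 1],
    "dater_wants_to_date": [1, 0, 1, 1],
})

# Rows are "users" (daters), columns are "items" (dated users).
# Pairs that never met are filled with 0 here; whether a missing pair should
# count as 0 or stay unknown is a modelling choice.
interaction_matrix = actions.pivot_table(
    index="dater_uid",
    columns="dated_uid",
    values="dater_wants_to_date",
    aggfunc="max",
    fill_value=0,
)
print(interaction_matrix)
```

A matrix like this is the usual input for collaborative-filtering baselines. With only about a thousand interactions it stays small enough to inspect by eye.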
Citation

    @misc{Matchmaking 1.0,
      title = {Matchmaking 1.0: An open-source starter dataset for training dating app and matchmaking recommendation models},
      author = {dstam},
      year = {2024},
      publisher = {HuggingFace},
      url = {https://huggingface.co/datasets/danstam/Matchmaking-1.0}
    }



