
Replication Data for: Can Large Language Models (or Humans) Disentangle Text?

Source: DataONE. Updated: 2024-04-25. Indexed: 2024-10-19.
Download link:
https://search.dataone.org/view/https://doi.org/10.7910/DVN/TEC1ZP
Description:
Can Large Language Models (or Humans) Disentangle Text?

Abstract: We investigate the potential of large language models (LLMs) to disentangle text variables, that is, to remove the textual traces of an undesired forbidden variable. This task is sometimes known as text distillation and is closely related to the fairness in AI and causal inference literature. We employ a range of LLM approaches in an attempt to disentangle text by identifying and removing information about a target variable while preserving other relevant signals. We show that in the strong test of removing sentiment, the statistical association between the processed text and sentiment remains detectable to machine learning classifiers after LLM disentanglement. Furthermore, we find that human annotators also struggle to disentangle sentiment while preserving other semantic content. This suggests there may be limited separability between concept variables in some text contexts. It highlights the limitations of methods that rely on text-level transformations, and it raises questions about the robustness of disentanglement methods that achieve statistical independence in representation space, given that this is difficult for human coders operating on raw text to attain.

Dataverse: This repository contains data from the human-coded and processed reviews.

Paper link: arXiv.org/abs/2403.16584
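The abstract's central test, whether sentiment is still detectable by a classifier in processed text, can be illustrated with a small probing sketch. This is not the authors' pipeline: the bag-of-words Naive Bayes model is a stand-in for "machine learning classifiers", the function names are hypothetical, and the four example texts are invented toy strings, not records from this dataset. Held-out accuracy well above chance would suggest that the forbidden variable still leaks through the text.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes probe with add-one smoothing."""
    word_counts = {}              # label -> Counter of token counts
    doc_counts = Counter(labels)  # label -> number of training texts
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior for `text`."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(texts, labels):
    """Held-out accuracy: train on all texts but one, test on the one left out."""
    correct = 0
    for i in range(len(texts)):
        model = train_nb(texts[:i] + texts[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict_nb(model, texts[i]) == labels[i]
    return correct / len(texts)


# Hypothetical toy "processed" reviews (assumption: invented for illustration).
toy_texts = [
    "the food was wonderful",
    "the service was terrible",
    "wonderful staff and food",
    "terrible wait and service",
]
toy_labels = ["pos", "neg", "pos", "neg"]

if __name__ == "__main__":
    acc = leave_one_out_accuracy(toy_texts, toy_labels)
    print(f"probe accuracy: {acc:.2f} (chance is 0.50 for balanced labels)")
```

With real data, the same comparison would be run on the original and the processed versions of each review, using a much larger sample and a stronger classifier than this sketch.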
Created:
2024-09-24