QSAR Fish Bioconcentration Factor (BCF) Dataset
Indexed by Payititi (帕依提提) on 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-26222.html
Resource description:
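Authors:
Francesca Grisoni, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milano Chemometrics & QSAR Research Group, francesca.grisoni@unimib.it
Viviana Consonni, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milano Chemometrics & QSAR Research Group, viviana.consonni@unimib.it
Marco Vighi, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences
Sara Villa, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences
Roberto Todeschini, University of Milano-Bicocca, Dept. of Earth and Environmental Sciences, Milano Chemometrics & QSAR Research Group, roberto.todeschini@unimib.it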
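Dataset information: This dataset contains manually curated experimental bioconcentration factor (BCF) values for 1058 molecules (continuous values). Each row describes one molecule, identified by a CAS number, a name (if available), and a SMILES string. The octanol-water partition coefficient (KOW), either experimental or predicted, is also reported. The database also provides Extended Connectivity Fingerprints (ECFPs), binary vectors of 1024 bits, to be used as independent variables to predict the BCF. Additional information is available in the referenced papers. For questions, please contact the authors.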
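As a rough illustration of how binary fingerprints can act as independent variables for a continuous response, the sketch below predicts logBCF by averaging the values of the most similar training molecules under Tanimoto similarity. This nearest-neighbour approach is an illustrative choice, not the method used in the referenced papers. The function names, the toy 8-bit vectors (standing in for the 1024-bit ECFPs), and the logBCF values are all placeholders.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two equal-length 0/1 sequences."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors share no set bits; treat their similarity as 0.
    return both / either if either else 0.0


def knn_predict(query_bits, train_bits, train_log_bcf, k=3):
    """Average the logBCF of the k training molecules most similar to the query."""
    if len(train_bits) != len(train_log_bcf):
        raise ValueError("fingerprint rows and response rows must align one-to-one")
    ranked = sorted(
        zip(train_bits, train_log_bcf),
        key=lambda pair: tanimoto(query_bits, pair[0]),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Placeholder data: toy 8-bit fingerprints and made-up logBCF values,
# NOT taken from the dataset.
train_bits = [
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 0],
]
train_log_bcf = [1.0, 1.2, 3.0, 3.4]
query = [1, 1, 0, 0, 1, 0, 0, 1]

print(knn_predict(query, train_bits, train_log_bcf, k=2))
```

With real data, a held-out split or cross-validation would be needed to judge whether such a baseline is useful. The benchmark paper listed in the references reports the performance obtained on this dataset.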
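Attribute information: The provided zip file contains two files.
(I) The file 'QSAR BCF KOW' contains the following attributes:
1. CAS number: molecule identifier
2. Molecule name: marked as 'n.a.' if not available
3. SMILES string: identifies the 2D molecular structure
4. LogKOW: octanol-water partition coefficient, experimental or predicted, as indicated by the column 'KOW Type'
5. KOW Type: indicates whether the logKOW value is experimental or predicted
6. Experimental logBCF (quantitative response): experimental fish bioconcentration factor, in logarithmic form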
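(II) The file 'ECFP_1024_m0-2_b2_c.txt' contains the following molecular descriptors, to be used to predict the BCF:
- Extended Connectivity Fingerprints (ECFPs): binary descriptors useful for predicting the experimental logBCF, computed with Dragon7 using default settings (details are specified in the file)
Each row corresponds to one molecule, as identified by the SMILES field. The molecules are in the same order as in the 'QSAR BCF KOW' file.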
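Because the two files share row order, the response and the fingerprints can be paired by position, with the SMILES field as a sanity check. The sketch below shows one way to do this. The description does not give the delimiter, the header names, the `exp`/`pred` labels, or the one-bit-string-per-row fingerprint layout, so all of these are assumptions made for illustration, and the sample rows are placeholders rather than real records.

```python
import csv
import io

# ASSUMED layout: tab-delimited with these header names. Placeholder rows only.
TABLE = (
    "CAS\tName\tSMILES\tlogKOW\tKOW Type\tlogBCF\n"
    "000-00-1\tn.a.\tCCO\t1.50\texp\t0.80\n"
    "000-00-2\tmolecule B\tc1ccccc1\t2.10\tpred\t1.20\n"
)
# ASSUMED layout: SMILES plus a bit string (8 bits here instead of 1024).
FINGERPRINTS = (
    "SMILES\tbits\n"
    "CCO\t10010010\n"
    "c1ccccc1\t01100001\n"
)


def read_table(handle):
    """Parse the attribute table; 'n.a.' names become None, numbers become floats."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "cas": rec["CAS"],
            "name": None if rec["Name"] == "n.a." else rec["Name"],
            "smiles": rec["SMILES"],
            "log_kow": float(rec["logKOW"]),
            "kow_type": rec["KOW Type"],
            "log_bcf": float(rec["logBCF"]),
        })
    return rows


def read_fingerprints(handle):
    """Parse (SMILES, list-of-bits) pairs from the fingerprint file."""
    return [
        (rec["SMILES"], [int(ch) for ch in rec["bits"]])
        for rec in csv.DictReader(handle, delimiter="\t")
    ]


def align(rows, fps):
    """Pair fingerprints with logBCF by row order, verifying SMILES agree."""
    if len(rows) != len(fps):
        raise ValueError("files have different numbers of molecules")
    paired = []
    for i, (row, (smiles, bits)) in enumerate(zip(rows, fps)):
        if row["smiles"] != smiles:
            raise ValueError(f"row {i}: SMILES mismatch between files")
        paired.append((bits, row["log_bcf"]))
    return paired


rows = read_table(io.StringIO(TABLE))
fps = read_fingerprints(io.StringIO(FINGERPRINTS))
paired = align(rows, fps)
print(len(paired), "molecules aligned")
```

Checking SMILES while pairing by position costs little and catches a reordered or truncated file early. The real files should be inspected before any of these parsing assumptions are relied on.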
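Relevant papers:
1. Grisoni, F., Consonni, V., Villa, S., Vighi, M. and Todeschini, R., 2015. QSAR models for bioconcentration: Is the increase in the complexity justified by more accurate predictions? Chemosphere, 127, pp. 171-179. (Procedure for data curation.)
2. Grisoni, F., Consonni, V., Vighi, M., Villa, S. and Todeschini, R., 2016. Expert QSAR system for predicting the bioconcentration factor under the REACH regulation. Environmental Research, 148, pp. 507-512. (Benchmark on the performance for this dataset.)
3. Grisoni, F., Consonni, V., Vighi, M., Villa, S. and Todeschini, R., 2016. Investigating the mechanisms of bioconcentration through QSAR classification trees. Environment International, 88, pp. 198-205. (Relationship between KOW and BCF.)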
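Citation request: If you publish results based on this dataset or parts of it, please cite the following paper:
Grisoni, F., Consonni, V., Villa, S., Vighi, M. and Todeschini, R., 2015. QSAR models for bioconcentration: Is the increase in the complexity justified by more accurate predictions? Chemosphere, 127, pp. 171-179.
If you use the ECFP values, please additionally cite the following software and paper:
Software: Dragon (Software for Molecular Descriptor Calculation), Version 6.0, 2012
Paper: Rogers, D. and Hahn, M., 2010. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling, 50(5), pp. 742-754. ACS Publications.
Thanks and happy predicting!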
Provider:
Payititi (帕依提提)
Collected summary
Dataset introduction

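Background and challenges
Background overview
The QSAR fish bioconcentration factor (BCF) dataset contains experimental bioconcentration factor data for 1058 molecules, together with related chemical information such as SMILES strings and KOW values, to support QSAR research. The dataset was curated by a research team at the University of Milano-Bicocca and comes with related research papers and a citation request.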



