Dataset_1

Source: NIAID Data Ecosystem; indexed 2026-03-12

Download link:

https://data.mendeley.com/datasets/df8w8dct3b


Resource description:

Dataset_1 provides seven FASTA files, each a protein database. The composite database, “All_Databases_5950827_sequences.fasta”, contains protein sequences retrieved from public databases related to cephalopod salivary glands, together with proteins identified from our original data. It comprises 5,950,827 protein sequences in total and is composed of six smaller databases, labelled A to F: Database_A_19087_sequences.fasta, Database_B_16990_sequences.fasta, Database_C_2427_sequences.fasta, Database_D_84778_sequences.fasta, Database_E_5106635_sequences.fasta and Database_F_720910_sequences.fasta. The number in each file name is the number of sequences in that file. The six databases come from the following sources. Database_A_19087_sequences.fasta: protein database from proteogenomic analyses of the O. vulgaris salivary apparatus, built by Fingerhut et al. (2018). Database_B_16990_sequences.fasta: antimicrobial peptides from a non-redundant database collected by Aguilera-Mendoza et al. (2015). Database_C_2427_sequences.fasta: proteins identified with Proteome Discoverer by searching our 12 LTQ raw files against the UniProt database restricted to the Metazoa taxonomic selection (2018_07 release). Database_D_84778_sequences.fasta and Database_E_5106635_sequences.fasta: proteins identified from de novo transcriptome assemblies of the posterior salivary glands of 16 cephalopods, using TransDecoder and a six-frame translation tool, respectively. Database_F_720910_sequences.fasta: proteins obtained with a six-frame translation tool from the transcripts profiled in the O. vulgaris transcriptome but not included by the authors in Database_A_19087_sequences.fasta.
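The sequence counts embedded in the file names can be used as a quick integrity check after download. The following is a minimal sketch, not part of the dataset itself: it assumes the files are plain-text FASTA in which every record starts with a ">" header line, and the helper names (expected_count, count_fasta_records, check_file) are hypothetical.

```python
import re
from pathlib import Path

# File names exactly as listed in the dataset description.
COMPONENT_FILES = [
    "Database_A_19087_sequences.fasta",
    "Database_B_16990_sequences.fasta",
    "Database_C_2427_sequences.fasta",
    "Database_D_84778_sequences.fasta",
    "Database_E_5106635_sequences.fasta",
    "Database_F_720910_sequences.fasta",
]
COMPOSITE_FILE = "All_Databases_5950827_sequences.fasta"


def expected_count(filename: str) -> int:
    """Read the sequence count embedded in a name like 'X_123_sequences.fasta'."""
    match = re.search(r"_(\d+)_sequences\.fasta$", filename)
    if match is None:
        raise ValueError(f"no sequence count found in file name: {filename}")
    return int(match.group(1))


def count_fasta_records(path) -> int:
    """Count records by counting header lines (assumes each record starts with '>')."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_file(path) -> bool:
    """Return True if the number of records matches the count in the file name."""
    path = Path(path)
    return count_fasta_records(path) == expected_count(path.name)


if __name__ == "__main__":
    # The six component counts should add up to the composite count.
    component_total = sum(expected_count(name) for name in COMPONENT_FILES)
    print("components sum to composite:",
          component_total == expected_count(COMPOSITE_FILE))

    # Check any of the files that happen to be in the current directory.
    for name in COMPONENT_FILES + [COMPOSITE_FILE]:
        if Path(name).exists():
            print(name, "OK" if check_file(name) else "count mismatch")
```

Counting header lines streams the file line by line, so the check stays memory-friendly even for the composite database with several million sequences.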

Created:

2021-03-01
