Fast.genomics comparative genome browser: 2023 release

DataCite Commons | Updated: 2023-08-23 | Indexed: 2024-08-26
Download link:
https://figshare.com/articles/dataset/Fast_genomics_comparative_genome_browser_2023_release/24010353
Description:
This is an archive of the 2023 release of http://fast.genomics.lbl.gov, a fast comparative genome browser for diverse bacteria and archaea.

fast_code_Aug2023.tar.gz contains the source code, which is also available on GitHub. See SETUP for installation instructions and lib/neighbor.sql for the database schema.

Some of the code depends on Perl libraries from the PaperBLAST code base. These are archived in PaperBLAST_lib.tar.gz and should be extracted into ../PaperBLAST/ to create the ../PaperBLAST/lib directory. They are also available from GitHub.

fast_main_May2023.tar.gz contains the main database, with one representative genome for each of 6,377 genera. neighbor.db is the SQLite3 database and neighbor.faa.gz has the protein sequences. Put these files in the data/ directory. You can build the mmseqs database with:

    gunzip data/neighbor.faa.gz
    mmseqs createdb data/neighbor.faa data/mmseqsdb --dbtype 1
    mmseqs createindex data/mmseqsdb /tmp -k 6

The sub-databases, with additional genomes for each taxonomic order, can be downloaded here or here. The tarball contains a directory for each sub-database; these subdirectories should go in the data/ directory. After installing the main database, you can build the clustered BLAST+ database for each sub-database with:

    for sub in `sqlite3 data/neighbor.db 'select prefix FROM SubDb;'`; do
        gunzip data/$sub/cluster.faa.gz
        makeblastdb -in data/$sub/cluster.faa -dbtype prot -out data/$sub/cluster.faa.plusdb
    done

Or, to download the sub-database for a specific order, visit the fast.genomics web site, search for that order, switch to the sub-database, and see the downloads section at the bottom of the main page.
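Before running the build commands, it may help to confirm that the archives were unpacked into the layout described above. The following is a minimal sketch, not part of the archive: the function name check_layout is hypothetical, and it assumes that ../PaperBLAST/ is relative to the directory holding the fast.genomics code. It only checks for the main-database files and the PaperBLAST library directory named in the description.

```shell
# Hypothetical helper, not part of the fast.genomics archive.
# Prints each expected item that is absent under the given code directory
# (default: current directory) and returns non-zero if anything is missing.
check_layout() {
    root="${1:-.}"
    missing=0
    # SQLite3 database from fast_main_May2023.tar.gz
    if [ ! -f "$root/data/neighbor.db" ]; then
        echo "missing: data/neighbor.db"
        missing=1
    fi
    # Protein sequences, either still compressed or already gunzipped
    if [ ! -f "$root/data/neighbor.faa" ] && [ ! -f "$root/data/neighbor.faa.gz" ]; then
        echo "missing: data/neighbor.faa(.gz)"
        missing=1
    fi
    # Perl libraries unpacked from PaperBLAST_lib.tar.gz
    # (assumed to sit next to the code directory)
    if [ ! -d "$root/../PaperBLAST/lib" ]; then
        echo "missing: ../PaperBLAST/lib"
        missing=1
    fi
    return "$missing"
}
```

Calling check_layout from the code directory prints nothing and returns 0 when everything is in place; otherwise it lists what still needs to be unpacked.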

Provider:
figshare
Created:
2023-08-23