2. Create GTax database

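2.1. Download genome data with NCBI Datasets

GTax downloads data for four taxonomic superkingdoms: archaea, bacteria, viruses, and eukaryotes.

Run the following commands to download a dehydrated genome package for each superkingdom. A dehydrated package contains only metadata; the sequences themselves are fetched in step 2.3.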

2.1.1. Archaea

localhost:~> datasets download genome taxon 2157 --assembly-source refseq --exclude-gff3 --exclude-protein --exclude-rna --exclude-genomic-cds --dehydrated
localhost:~> mv ncbi_dataset.zip archaea_meta.zip

2.1.2. Bacteria

localhost:~> datasets download genome taxon 2 --assembly-source refseq --exclude-gff3 --exclude-protein --exclude-rna --exclude-genomic-cds --dehydrated
localhost:~> mv ncbi_dataset.zip bacteria_meta.zip

2.1.3. Viruses

localhost:~> datasets download genome taxon 10239 --assembly-source refseq --exclude-gff3 --exclude-protein --exclude-rna --exclude-genomic-cds --dehydrated
localhost:~> mv ncbi_dataset.zip viruses_meta.zip

2.1.4. Eukaryotes

localhost:~> datasets download genome taxon 2759 --assembly-source refseq --exclude-gff3 --exclude-protein --exclude-rna --exclude-genomic-cds --dehydrated
localhost:~> mv ncbi_dataset.zip eukaryotes_meta.zip
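
The four downloads above differ only in the taxon ID and the output file name, so they can be scripted as a loop. The following is a minimal sketch, not part of GTax: the `run` and `download_all` helpers and the `DRY_RUN` variable are names invented here for illustration, and the `datasets` flags are copied from the commands above. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch: download the dehydrated RefSeq package for each superkingdom.
# DRY_RUN (hypothetical variable) defaults to 1, so commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

download_all() {
    # name:taxid pairs taken from sections 2.1.1 to 2.1.4
    for pair in archaea:2157 bacteria:2 viruses:10239 eukaryotes:2759; do
        name="${pair%%:*}"
        taxid="${pair##*:}"
        run datasets download genome taxon "$taxid" \
            --assembly-source refseq \
            --exclude-gff3 --exclude-protein --exclude-rna \
            --exclude-genomic-cds --dehydrated || return 1
        # Rename the package so the next download does not overwrite it.
        run mv ncbi_dataset.zip "${name}_meta.zip" || return 1
    done
}

download_all
```

Renaming after every download matters because `datasets` writes each package to the same default file name, `ncbi_dataset.zip`.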

2.2. Process metadata and create the directories for hydration

The filter_metadata_zip command reads the zipped metadata file for each superkingdom and creates the folders that the datasets command will hydrate. It keeps the reference genome for each taxon when one is available; otherwise, it keeps the latest assembly.

localhost:~> filter_metadata_zip
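
The selection rule described above (prefer the reference genome, otherwise keep the latest assembly) can be illustrated with a small awk filter. This is only a sketch of the rule, not the actual filter_metadata_zip implementation: the `select_assemblies` function and the tab-separated input layout (taxid, accession, is_reference, release_date) are assumptions made for this example.

```shell
# Illustrative only: choose one assembly per taxon from a hypothetical TSV
# with columns: taxid, accession, is_reference (yes/no), release_date (YYYY-MM-DD).
select_assemblies() {
    awk -F '\t' '
        {
            taxid = $1
            # Build a sortable key: a leading "1" marks a reference genome,
            # so it always beats a non-reference; ties fall back to the date.
            key = ($3 == "yes" ? "1" : "0") $4
            if (!(taxid in best) || key > best[taxid]) {
                best[taxid] = key
                acc[taxid] = $2
            }
        }
        END {
            # Emit the chosen accession for every taxon.
            for (t in acc) print t "\t" acc[t]
        }
    ' | LC_ALL=C sort
}
```

Piping a table with those four columns into `select_assemblies` prints one `taxid<TAB>accession` line per taxon. Because ISO dates sort correctly as strings, no date parsing is needed.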

2.3. Hydrate directories with datasets

2.3.1. Archaea

localhost:~> cd archaea
localhost:~/archaea> datasets rehydrate --directory .
localhost:~/archaea> cd ..

2.3.2. Bacteria

localhost:~> cd bacteria
localhost:~/bacteria> datasets rehydrate --directory .
localhost:~/bacteria> cd ..

2.3.3. Viruses

localhost:~> cd viruses
localhost:~/viruses> datasets rehydrate --directory .
localhost:~/viruses> cd ..

2.3.4. Eukaryotes

localhost:~> cd eukaryotes
localhost:~/eukaryotes> datasets rehydrate --directory .
localhost:~/eukaryotes> cd ..
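
The four hydration steps can also be run from the parent directory in one loop. The sketch below assumes the four directories created in step 2.2 sit in the current working directory, and it passes the directory name to `--directory` instead of changing into it first, which should be equivalent. The `rehydrate_all` function and the `DRY_RUN` variable are hypothetical names used only here; by default the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: hydrate every superkingdom directory from the parent directory.
# DRY_RUN (hypothetical variable) defaults to 1, so commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

rehydrate_all() {
    for name in archaea bacteria viruses eukaryotes; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "datasets rehydrate --directory $name"
        else
            # Stop at the first failure so a partial download is noticed.
            datasets rehydrate --directory "$name" || return 1
        fi
    done
}

rehydrate_all
```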

2.4. Create GTax FASTA files

Once all data has been downloaded, which can take a few hours, create the FASTA files, indexes, and TaxID maps for the databases.

localhost:~> gtax_database