Reference Gut Microbiota - What To Use?
What genome references to use when analyzing human gut microbiota?
Whenever researchers perform DNA sequencing, they need a reference database to map the sequencing reads to. If reference collections are not adequate or complete, the risk of failing to map the sequences is high.
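To see why reference completeness matters, here is a toy sketch in Python. It assigns each read to the reference genome that shares the most exact k-mers with it. This is loosely in the spirit of k-mer-based classifiers such as Kraken2, which in reality use a taxonomy tree and much longer k-mers. The genome names, the sequences and the tiny k-mer length are made up for illustration.

```python
from __future__ import annotations

from typing import Optional

K = 5  # toy k-mer length; real classifiers use much longer k-mers

def kmers(seq: str, k: int = K) -> set[str]:
    """Return all overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(references: dict[str, str]) -> dict[str, str]:
    """Map each k-mer to a reference genome containing it.
    Shared k-mers are credited to the first genome seen (a simplification)."""
    index: dict[str, str] = {}
    for name, seq in references.items():
        for km in kmers(seq):
            index.setdefault(km, name)
    return index

def classify(read: str, index: dict[str, str]) -> Optional[str]:
    """Return the genome with the most k-mer hits, or None if unclassified."""
    hits: dict[str, int] = {}
    for km in kmers(read):
        if km in index:
            hits[index[km]] = hits.get(index[km], 0) + 1
    return max(hits, key=hits.get) if hits else None

# Made-up reference genomes and reads
references = {"genome_A": "ACGTACGTTAGC", "genome_B": "TTGACCGGATCA"}
reads = ["ACGTACGT", "GACCGGAT", "CCCCAAAA"]

complete = build_index(references)
incomplete = build_index({"genome_A": references["genome_A"]})  # genome_B missing

for read in reads:
    print(read, classify(read, complete), classify(read, incomplete))
```

With the complete reference, only the read that matches neither genome stays unclassified. Dropping genome_B from the reference leaves its read unassigned as well.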
HumGut is a high-quality collection of prokaryote genomes, tailor-made for human gut microbiota studies. Unlike broader genome collections, it contains only genomes that have been found in the guts of healthy people worldwide.
UHGG - Unified Human Gastrointestinal Genome Collection
HumGut was published in 2021. Earlier that year, another catalog of highly relevant prokaryotes living in the human gut was published: the UHGG catalog. It contains more than 200,000 genomes belonging to 4,644 species.
UHGG itself is a major milestone in human gut microbiota research. Most of its genomes were reconstructed bioinformatically from metagenomic sequencing data; these are called metagenome-assembled genomes (MAGs).
MAGs have shed light on the many gut species that the scientific community has not yet cultured. In short, MAGs represent microorganisms whose presence we can only infer from their DNA sequences. We know little about them beyond their genetic makeup.
UHGG represents a compilation of genomes found in various human gut microbiota studies, mostly performed in the US and China. However, are all genomes relevant? Are there more genomes found in healthy people worldwide?
The HumGut database gives a high-quality answer. It contains about 30,000 genome representatives, each representing a cluster of genomes that are at least 97.5% similar to one another. This fine-grained resolution is valuable, for example, when developing targeted diagnostic and therapeutic approaches based on the human gut microbiota.
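The idea of "one representative per cluster of genomes at least 97.5% similar" can be illustrated with a simple greedy clustering sketch in Python. It assumes the genomes are already ranked by quality (best first) and that a pairwise similarity in percent is available; the ranking, the greedy scheme and the similarity values are illustrative assumptions, not HumGut's exact pipeline.

```python
from __future__ import annotations

from typing import Callable

THRESHOLD = 97.5  # minimum percent similarity to join an existing cluster

def greedy_cluster(
    genomes: list[str],
    identity: Callable[[str, str], float],
    threshold: float = THRESHOLD,
) -> dict[str, list[str]]:
    """Cluster genomes given in ranked order (best first).

    Each genome joins the first existing representative it is at least
    `threshold` percent similar to; otherwise it founds a new cluster
    and becomes that cluster's representative."""
    clusters: dict[str, list[str]] = {}
    for genome in genomes:
        for rep in clusters:
            if identity(genome, rep) >= threshold:
                clusters[rep].append(genome)
                break
        else:
            clusters[genome] = [genome]
    return clusters

# Made-up pairwise similarities (percent); unlisted pairs count as 0
PAIRWISE = {
    frozenset({"g1", "g2"}): 99.1,
    frozenset({"g1", "g3"}): 90.0,
    frozenset({"g1", "g4"}): 96.0,
    frozenset({"g3", "g4"}): 98.2,
}

def toy_identity(a: str, b: str) -> float:
    return PAIRWISE.get(frozenset({a, b}), 0.0)

print(greedy_cluster(["g1", "g2", "g3", "g4"], toy_identity))
# {'g1': ['g1', 'g2'], 'g3': ['g3', 'g4']}
```

Genome g4 is only 96.0% similar to g1, so it does not join g1's cluster; it ends up with g3 instead. Ranking first means the best-ranked genome of each cluster becomes its representative.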
HumGut was built by screening prokaryote genomes for their presence in about 3,500 human gut samples (metagenomes) collected worldwide. These metagenomes came from healthy people in countries such as Denmark, Sweden, Italy, Spain, the US, Canada, Peru, China, Mongolia, Australia, Tanzania, Cameroon and Ghana.
All the work was performed on publicly available data. If a genome is in HumGut, it has been detected in at least one of the screened metagenomes. How about that!
How does HumGut perform as a reference database? Its authors report that it outperforms other reference databases, and the claim holds up. HumGut can classify metagenomic reads from studies of human gut samples worldwide. Best of all, it leaves fewer reads unclassified than the alternatives it was compared with.
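One way to check the unclassified percentage for your own samples is to count it in Kraken2's per-read output, where each tab-separated line starts with C (classified) or U (unclassified). The Python sketch below assumes that standard output layout; the example lines and the file name are made up.

```python
def unclassified_fraction(lines) -> float:
    """Fraction of reads marked 'U' in Kraken2's per-read output.

    Each non-empty line is tab-separated; its first field is 'C'
    (classified) or 'U' (unclassified)."""
    total = unclassified = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        total += 1
        if line.split("\t", 1)[0] == "U":
            unclassified += 1
    return unclassified / total if total else 0.0

# Made-up lines in Kraken2's standard output layout
example = [
    "C\tread1\t816\t150\t816:116",
    "U\tread2\t0\t150\t0:116",
    "C\tread3\t1280\t150\t1280:116",
    "U\tread4\t0\t150\t0:116",
]
print(f"{unclassified_fraction(example):.1%} unclassified")  # 50.0% unclassified

# For a real run (hypothetical file name):
# with open("sample.kraken") as fh:
#     print(unclassified_fraction(fh))
```

Kraken2's --report file also lists the unclassified percentage, so counting by hand is mainly useful when only the per-read output was kept. Comparing this number between two reference databases on the same sample is a simple way to see the difference described above.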
HumGut vs UHGG
HumGut overlaps with both the UHGG database and the NCBI RefSeq prokaryote genomes. So, what's the difference? Scientists used both UHGG and RefSeq to build HumGut. They screened genomes from these two sources against the metagenomes, and only genomes that hit at least one metagenome qualified for HumGut.
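The selection rule (pool candidate genomes from two sources, keep only those detected in at least one metagenome) can be sketched in Python with a toy k-mer containment test. Real screening relies on dedicated tools, far longer k-mers and millions of reads; the containment approach, the 0.9 threshold, the sample names and all sequences here are assumptions for illustration.

```python
from __future__ import annotations

K = 5                  # toy k-mer length
MIN_CONTAINMENT = 0.9  # hypothetical detection threshold

def kmers(seq: str, k: int = K) -> set[str]:
    """Return all overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def containment(genome: str, sample_kmers: set[str]) -> float:
    """Share of the genome's k-mers that occur in a metagenome."""
    g = kmers(genome)
    return len(g & sample_kmers) / len(g) if g else 0.0

def screen(candidates: dict[str, str],
           metagenomes: dict[str, list[str]],
           min_containment: float = MIN_CONTAINMENT) -> set[str]:
    """Keep candidate genomes detected in at least one metagenome."""
    sample_kmers = [set().union(*(kmers(r) for r in reads))
                    for reads in metagenomes.values()]
    return {name for name, seq in candidates.items()
            if any(containment(seq, s) >= min_containment for s in sample_kmers)}

# Made-up candidates from the two sources, pooled into one dict
uhgg = {"uhgg_1": "ACGTACGTTAGC"}
refseq = {"refseq_1": "TTGACCGGATCA", "refseq_2": "GGGGCCCCTTTT"}
metagenomes = {
    "sample_1": ["ACGTACGTTA", "GTTAGC"],
    "sample_2": ["TTGACCGGAT", "CCGGATCA"],
}

print(sorted(screen({**uhgg, **refseq}, metagenomes)))  # ['refseq_1', 'uhgg_1']
```

Here refseq_2 is never seen in either sample, so it is left out, just as genomes without a metagenome hit were left out of HumGut.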
HumGut offers an opportunity to create customized databases with higher biomedical relevance, which facilitates research across the gut microbiome ecosystem. For human gut studies, this makes HumGut superior to other genome collections.
How can today’s researchers use HumGut?
A high-quality reference database enables high-quality metagenomic analysis.
The HumGut database provides its collection of genomes as FASTA files. From these, one can build a custom database for taxonomic profiling tools such as Kraken2. This works because such tools build their databases from reference sequences in FASTA format, together with a taxonomy.
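Kraken2 can take the taxon of a custom sequence from a "kraken:taxid|<ID>" tag in its sequence ID. The Python sketch below adds such a tag to every header of one genome's FASTA file. It assumes you already know the NCBI taxon ID for each HumGut genome and that the HumGut files do not already carry such tags (check the HumGut documentation first); the taxon ID 816 and the contig names are placeholders.

```python
from __future__ import annotations

from typing import Iterable, Iterator

def tag_fasta(lines: Iterable[str], taxid: int) -> Iterator[str]:
    """Append '|kraken:taxid|<taxid>' to every sequence ID in a FASTA file,
    which lets kraken2-build assign the sequences to that taxon."""
    for line in lines:
        if line.startswith(">"):
            # Split '>ID description' into the ID and the optional description
            seq_id, _, description = line[1:].rstrip("\n").partition(" ")
            header = f">{seq_id}|kraken:taxid|{taxid}"
            if description:
                header += " " + description
            yield header + "\n"
        else:
            yield line  # sequence lines pass through unchanged

# Made-up genome with two contigs and a placeholder taxon ID
genome = [">contig_1 example contig\n", "ACGTACGT\n", ">contig_2\n", "TTGACCGG\n"]
print("".join(tag_fasta(genome, 816)), end="")
```

The tagged files can then go into a custom Kraken2 database: download the taxonomy with kraken2-build --download-taxonomy --db <db_name>, add each file with kraken2-build --add-to-library <file> --db <db_name>, and build with kraken2-build --build --db <db_name>.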
HumGut is publicly available and free for everyone to use. More and more research groups use it as a reference database, although its adoption is still in its infancy.
How do genome and species numbers add up?
When a catalog contains 200,000 genomes representing 4,644 species, what does that mean?
A single species contains many different genomes. Just as you differ from me, one bacterium can differ from its cousins. Biologists call this intraspecies diversity, and microbiologists struggle to collect genomes from all representatives of a species.
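A quick back-of-the-envelope calculation in Python puts the UHGG figures in perspective: 200,000 genomes spread over 4,644 species gives an average of roughly 43 genomes per species. An average can hide large differences between species, so the sketch also counts genomes per species in a small catalog whose genome and species names are invented.

```python
from collections import Counter

# Catalog sizes quoted in the text for UHGG
GENOMES = 200_000
SPECIES = 4_644
print(f"Average genomes per species: {GENOMES / SPECIES:.1f}")  # 43.1

# Made-up mini catalog: genome ID -> species it was assigned to
toy_catalog = {
    "genome_1": "species_A",
    "genome_2": "species_A",
    "genome_3": "species_A",
    "genome_4": "species_B",
    "genome_5": "species_C",
}
per_species = Counter(toy_catalog.values())
print(per_species.most_common())  # [('species_A', 3), ('species_B', 1), ('species_C', 1)]
```

In the toy catalog, one species accounts for most of the genomes while the others are represented by a single genome each, which is exactly the kind of spread an average does not show.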
The members of a single species are so numerous that finding them all is probably impossible. As a result, microbiota studies seldom report taxonomic levels below species. Still, both genome counts and species counts are informative. The species assignment positions a bacterium in the taxonomic landscape, identifying where a particular group of genomes belongs. The genome itself carries valuable information, for example on the variation of genes and the proteins they encode.