A new web server, "PZLAST-MAG," has been released to enable high-speed searches of the large number of protein sequences derived from MAGs
On May 6, 2026, Associate Professor Hiroshi Mori and his colleagues at the National Institute of Genetics, Research Organization of Information and Systems, reported in the scientific journal Bioinformatics Advances the release of a new web server, "PZLAST-MAG", for ultra-fast similarity searches of protein sequences encoded in metagenome-assembled genomes (MAGs (*1)) derived from microbial metagenomes. PZLAST-MAG performs fast, highly accurate searches against approximately 400 million protein sequences (approximately 100 billion amino acids) encoded in over 210,000 MAGs from the "Microbiome Datahub" database, which contains MAGs from a variety of environments.
With the advancement of metagenomic analysis technologies that directly sequence DNA from environmental microbiomes, there has been an explosive increase in the number of MAGs (metagenome-assembled genomes) derived from microorganisms that are difficult to culture. However, there are few tools capable of performing fast and accurate similarity searches on the vast number of protein sequences encoded by these MAGs, and understanding the taxonomic and ecological context of groups of proteins identified as similar has never been an easy task.
PZLAST-MAG was developed as an extension of PZLAST, a tool for rapid similarity searches of protein sequences predicted from short-read metagenomic data, and adds support for large-scale similarity searches of the nearly full-length protein sequences derived from MAGs. It enables very fast and highly accurate sequence similarity searches of MAG-derived protein sequences in the Microbiome Datahub. Furthermore, in addition to displaying the similar sequences it finds as a table with alignments, it uses metadata from the Microbiome Datahub to visualize the phylogenetic distribution of hit proteins across the MAGs, their habitat distribution based on the Metagenome/Microbes Environmental Ontology (MEO (*2)), and the co-occurrence patterns of multiple queries within a genome. These features allow rapid searches for homologs of functionally important genes across diverse microbial lineages, while making it easy to understand their taxonomic and ecological backgrounds.
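The co-occurrence view described above can be illustrated with a minimal sketch. It assumes that search hits have already been reduced to pairs of a query identifier and the MAG in which the hit protein is encoded; the record layout, the identifiers, and the function names `cooccurrence_table` and `mags_with_all` are hypothetical and are not taken from PZLAST-MAG itself.

```python
from collections import defaultdict

# Hypothetical hit records: (query_id, mag_id). These identifiers are
# illustrative only and are not real PZLAST-MAG output.
hits = [
    ("queryA", "MAG_001"),
    ("queryB", "MAG_001"),
    ("queryA", "MAG_002"),
    ("queryC", "MAG_001"),
    ("queryB", "MAG_003"),
    ("queryA", "MAG_001"),  # a second hit of queryA in the same MAG
]


def cooccurrence_table(hits):
    """Map each MAG to the set of queries with at least one hit in it."""
    table = defaultdict(set)
    for query_id, mag_id in hits:
        table[mag_id].add(query_id)
    return dict(table)


def mags_with_all(table, queries):
    """Return the sorted MAG IDs in which every query has at least one hit."""
    wanted = set(queries)
    return sorted(mag for mag, found in table.items() if wanted <= found)


table = cooccurrence_table(hits)
complete = mags_with_all(table, ["queryA", "queryB", "queryC"])
print(complete)
```

In this toy data set only `MAG_001` contains hits for all three queries, which is the kind of pattern a co-occurrence table is meant to surface, for example when checking whether the genes of one pathway are found together in the same genome.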
For more details, please refer to the paper.
< Number of data sets in the Microbiome Datahub (as of June 2026) >
- MAG: 218,248 MAGs
- Metagenomic BioProject: 102,174 projects
- Number of Environments Included: 123 categories
- MAG-derived protein sequences: 454,799,346 proteins
PZLAST-MAG was developed as part of the JST Database Integration Coordination Program (DICP) project "Development of an integrated microbiome data hub for microbiome research" (Principal Investigator: Associate Professor MORI Hiroshi, National Institute of Genetics).
Figure: Overview of the PZLAST-MAG web interface and output visualization
(A) Top page of PZLAST-MAG, where users can submit up to 10,000 sequences per job. (B) Tabular results view, in which each row represents one hit and pairwise alignments can be viewed. (C) Phylogenetic tree distribution of hits, with red circles indicating the lineages from which the matched proteins are derived. (D) Environmental distribution of similar proteins based on MEO classes, shown as bar graphs. (E) Completion table summarizing the co-occurrence of homologous proteins from multiple queries within the same MAG.
Terminology
※1 MAG (Metagenome-Assembled Genome): A hypothetical genome sequence reconstructed without culturing the microorganisms. DNA is extracted from a sample as a mixture from the whole microbial community and sequenced comprehensively to obtain metagenomic sequences; these are assembled into contigs, and the contigs are then clustered (binned) based on information such as the composition of consecutive bases in each sequence and its relative abundance.
※2 MEO (Metagenome/Microbes Environmental Ontology): An ontology for describing and organizing metadata about the habitats of microorganisms.