Using Microbiome Datahub, a tool called "AutoFixMark" has been developed to accurately predict the presence or absence of CO₂ fixation pathways in chemoautotrophic bacteria from their genomes
On Feb 11, 2026, Associate Professor Mori Hiroshi of the National Institute of Genetics and his colleagues published a paper in the international scientific journal Scientific Data describing "AutoFixMark," a tool that predicts with high accuracy, from genome sequences, whether chemolithoautotrophic bacteria possess each CO₂ fixation pathway. The tool was built using Microbiome Datahub, a database the team is developing. On Feb 12, the National Institute of Genetics (Research Organization of Information and Systems), its Data Science Joint Research Facility, the Database Center for Life Sciences (DBCLS), and the National Institute of Technology and Evaluation (NITE) issued a press release announcing the results of this research.
Microbial CO₂ fixation is essential for microbial survival in carbon-limited environments and plays a key role in the global carbon cycle. Chemoautotrophic bacteria use a variety of CO₂ fixation pathways; seven are known, including the Calvin-Benson-Bassham cycle (CBB cycle). However, because the enzyme genes involved in these pathways are found in diverse lineages, and some enzymes participate in more than one pathway, it has been difficult to predict accurately from genome information alone which pathways an organism possesses. Existing tools for predicting bacterial metabolic pathways (e.g., METABOLIC and gapseq) work well for general metabolism, but they have lacked accuracy for the diverse CO₂ fixation pathways, especially some of the more recently discovered ones.
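The difficulty caused by shared enzymes can be illustrated with a toy example. The sketch below is a minimal illustration assuming a made-up pathway-to-enzyme table; the labels `e1` to `e6` and the helper names `specific_markers` and `naive_hits` are hypothetical and are not taken from AutoFixMark. It shows how calling a pathway whenever any of its enzymes is found over-predicts, and how keeping only enzymes unique to a single pathway yields candidate marker genes.

```python
from collections import Counter

# Toy pathway -> enzyme table. Labels are hypothetical placeholders,
# not real KEGG Orthology IDs and not AutoFixMark's marker definitions.
PATHWAY_ENZYMES: dict[str, set[str]] = {
    "pathway_1": {"e1", "e2", "e3"},
    "pathway_2": {"e3", "e4"},
    "pathway_3": {"e2", "e5", "e6"},
}


def specific_markers(pathways: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each pathway, the enzymes that occur in no other pathway."""
    # Count in how many pathways each enzyme appears.
    usage = Counter(e for enzymes in pathways.values() for e in enzymes)
    return {
        name: {e for e in enzymes if usage[e] == 1}
        for name, enzymes in pathways.items()
    }


def naive_hits(genome: set[str], pathways: dict[str, set[str]]) -> list[str]:
    """Pathways sharing at least one enzyme with the genome (over-predicts)."""
    return sorted(name for name, enzymes in pathways.items() if genome & enzymes)


markers = specific_markers(PATHWAY_ENZYMES)
genome = {"e2", "e3"}  # this genome carries only shared enzymes
print(markers)
print(naive_hits(genome, PATHWAY_ENZYMES))
```

Here a genome carrying only `e2` and `e3` is naively assigned all three pathways, although it has none of the pathway-specific enzymes. Real marker definitions need curation beyond this simple uniqueness filter, for example when no enzyme is unique to a pathway.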
To address this, Associate Professor Mori and his colleagues defined pathway-specific marker genes, built prediction rules on them, implemented these in the prediction tool "AutoFixMark," and constructed a high-quality benchmark dataset to evaluate its performance. As a result, AutoFixMark predicts with high accuracy CO₂ fixation pathways that were difficult to predict with existing tools, such as the "dicarboxylate/4-hydroxybutyrate cycle" and the "reductive glycine pathway."
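Evaluating a predictor against a benchmark dataset typically means comparing predicted and reference pathway presence genome by genome. The sketch below assumes per-pathway precision, recall, and F1 as the metrics and uses toy genome IDs (`g1` to `g4`) and toy labels; it does not reproduce the data or metrics reported for AutoFixMark, and the function names are hypothetical.

```python
# Toy benchmark: reference (true) and predicted pathway sets per genome.
# Genome IDs and labels are hypothetical, for illustration only.
reference = {"g1": {"CBB"}, "g2": {"DC/4-HB"}, "g3": {"CBB", "DC/4-HB"}, "g4": set()}
predicted = {"g1": {"CBB"}, "g2": {"CBB", "DC/4-HB"}, "g3": {"DC/4-HB"}, "g4": set()}


def confusion(predicted: dict[str, set[str]], reference: dict[str, set[str]],
              pathway: str) -> tuple[int, int, int, int]:
    """Count TP, FP, FN, TN for one pathway over the benchmark genomes."""
    tp = fp = fn = tn = 0
    # The benchmark (reference) defines which genomes are evaluated.
    for genome, true_paths in reference.items():
        called = pathway in predicted.get(genome, set())
        actual = pathway in true_paths
        if called and actual:
            tp += 1
        elif called:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, recall, and F1, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


for pathway in ("CBB", "DC/4-HB"):
    tp, fp, fn, _ = confusion(predicted, reference, pathway)
    print(pathway, scores(tp, fp, fn))
```

Computing the scores per pathway, rather than pooled over all pathways, makes it visible when a tool does well on a common pathway such as the CBB cycle but poorly on a rarer one.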
To develop AutoFixMark, the team used microbial genome data from Microbiome Datahub, the database they are developing, and generated a list of enzyme genes for each genome with KofamScan, a tool that assigns KEGG Orthology (KO) identifiers to genes.
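Once each genome has a list of KO assignments, a marker-gene rule can be applied as a simple set test. The following sketch assumes tab-separated, mapper-style KofamScan output (a gene ID followed by zero or more KO IDs per line). The `RULES` table and the helpers `read_kos` and `predict` are hypothetical and illustrative only, not AutoFixMark's published rules or code, although K01601 (RuBisCO large subunit), K00855 (phosphoribulokinase), and K15230/K15231 (ATP-citrate lyase subunits) are real KEGG Orthology IDs.

```python
import io
from typing import TextIO

# Illustrative rules only (NOT AutoFixMark's): a pathway is called "present"
# when every KO in its required set is found in the genome.
RULES: dict[str, frozenset[str]] = {
    # CBB cycle: RuBisCO large subunit + phosphoribulokinase
    "CBB": frozenset({"K01601", "K00855"}),
    # rTCA cycle: ATP-citrate lyase subunits (one of several possible markers)
    "rTCA": frozenset({"K15230", "K15231"}),
}


def read_kos(handle: TextIO) -> set[str]:
    """Collect KO IDs from lines of the form gene_id<TAB>KO[<TAB>KO...]."""
    kos: set[str] = set()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Genes without an assignment have only the gene ID; skip empty fields.
        kos.update(f for f in fields[1:] if f)
    return kos


def predict(kos: set[str], rules: dict[str, frozenset[str]]) -> list[str]:
    """Return the pathways whose required KO set is fully present."""
    return sorted(name for name, required in rules.items() if required <= kos)


example = io.StringIO("gene_001\tK01601\ngene_002\ngene_003\tK00855\ngene_004\tK15230\n")
print(predict(read_kos(example), RULES))  # ['CBB']
```

Requiring phosphoribulokinase alongside RuBisCO reflects a common heuristic, since RuBisCO-like genes also occur in organisms without a functional CBB cycle; a practical tool would also need alternative marker sets and score thresholds.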
For more details, please see the paper and the press release.
MAGs (Metagenome-Assembled Genomes) (*1) have attracted attention in recent years because they can reveal the genomes of even difficult-to-culture microorganisms and provide important clues for analyzing microbial diversity. Microbiome Datahub is built from quality-controlled, publicly available data and contains about 220,000 high-quality MAGs, together with their sequences, metadata such as taxonomic lineage and source sample, identified orthologs, and predicted functions and phenotypes. Dr. Mori stated that they plan to use AutoFixMark to predict CO₂ fixation pathways for the MAG and SAG (Single Amplified Genome) (*2) data stored in Microbiome Datahub, estimate the distribution and evolutionary history of these pathways across diverse microbial lineages, and publish the results on Microbiome Datahub.
Microbiome Datahub is being developed by Dr. Mori and his colleagues as part of the JST Database Integration Coordination Program (DICP) project "Development of an integrated microbiome data hub for microbiome research" (Principal Investigator: Associate Professor MORI Hiroshi, National Institute of Genetics).
<Number of data entries in Microbiome Datahub (ver.1.0)>
- MAG: 218,248 MAGs
- Metagenome BioProject: 102,174 projects
- Protein sequences from MAG: 454,799,346 proteins
Terminology
*1 Metagenome-Assembled Genome (MAG): A microbial genome sequence assigned to a single taxonomic group, reconstructed from metagenomes, i.e., sequence data obtained from DNA extracted directly from samples such as soil or feces without isolating individual microorganisms. MAGs are particularly useful for identifying microorganisms that are difficult to culture, help in understanding microbial diversity and the interactions between microbial communities and their environment, and are also attracting attention as a source of novel genes.
*2 Single Amplified Genome (SAG): Microbial genome information determined from DNA isolated from a single microbial cell. SAGs are central to single-cell genomics, which comprehensively analyzes the genomes of individual microbes within microbial communities from diverse environments, and are considered effective for identifying and studying the genes of difficult-to-culture and rare environmental microbes.
Related Links
- Original paper "A curated resource of chemolithoautotrophic genomes and marker genes for CO₂ fixation pathway prediction" | Scientific Data
- Press release "Development of 'AutoFixMark': A Tool for High-Precision Prediction of CO₂ Fixation Pathway Presence in Chemolithotrophic Bacteria Based on Genome Data" | National Institute of Genetics
- Software AutoFixMark
- Database Microbiome Datahub
- Funded projects: "Development of an integrated microbiome data hub for microbiome research" | NBDC
Project summaries and reports are posted.