Autonomous construction of a pathogenic splicing-associated variant database using large language models
Category
- In progress
- Database Integration Coordination Program (DICP)
- Projects funded in FY 2025-Fostering
Name and affiliation of Research Director
SHIRAISHI Yuichi
Chief, Division of Genome Analysis Platform Development, National Cancer Center Research Institute
Outline of R&D
The SSCV DB, which identifies and archives splice-site creating variants (SSCVs) from public transcriptome data, will be extended to predict the pathogenicity of these variants using large language models and to provide disease relevance information. In addition, a pipeline will be developed to detect SSCVs automatically in newly published transcriptome data and update the database accordingly. SSCVs can cause hereditary diseases and cancer, but they have been difficult to identify and are often overlooked in conventional genome analysis. By clarifying possible associations between SSCVs and diseases through this database, the project is expected to contribute to the research and development of diagnostic methods based on SSCV detection and of nucleic acid drugs that target SSCVs.
Main database(s) subject to research and development

SSCV DB (Splice-Site Creating Variant Database)
Period of research and development
Apr 2025 to Mar 2028
Grant Number
JPMJND2501