Autonomous construction of a pathogenic splicing-associated variant database using large language model

In progress
Database Integration Coordination Program (DICP)
Projects funded in FY 2025-Fostering

SHIRAISHI Yuichi

Chief, Division of Genome Analysis Platform Development, National Cancer Center Research Institute

The SSCV DB, which identifies and archives splice-site creating variants (SSCVs) from public transcriptome data, will be extended to predict pathogenicity using large language models and to provide disease relevance information. In addition, a pipeline will be developed that automatically detects SSCVs in newly published transcriptome data and updates the database. SSCVs can cause hereditary diseases and cancer, but they have been difficult to identify and are often overlooked in conventional genome analysis. By clarifying possible associations between SSCVs and diseases through this database, the project expects to contribute to the research and development of diagnostic methods based on SSCV detection and of nucleic acid drugs that target SSCVs.

Main database(s) subject to research and development

SSCV DB (Splice-Site Creating Variant Database)

Period of research and development

Apr 2025 to Mar 2028

Grant Number

JPMJND2501
