Autonomous construction of a pathogenic splicing-associated variant database using large language models

Category

  • In progress
  • Database Integration Coordination Program (DICP)
  • Projects funded in FY 2025-Fostering

Name and affiliation of Research Director

SHIRAISHI Yuichi

Chief, Division of Genome Analysis Platform Development, National Cancer Center Research Institute

Outline of R&D

The SSCV DB, which identifies and archives splice-site creating variants (SSCVs) from public transcriptome data, will be extended to predict the pathogenic potential of each variant using large language models and to provide disease relevance information. In addition, a pipeline will be developed to automatically detect SSCVs in newly published transcriptome data and update the database. SSCVs can cause hereditary diseases and cancer, but they have been difficult to identify and are often overlooked in conventional genome analysis. By clarifying possible associations between SSCVs and diseases, the database is expected to contribute to the research and development of diagnostic methods that detect SSCVs and of nucleic acid drugs that target them.

Main database(s) subject to research and development

SSCV DB (Splice-Site Creating Variant Database)

Period of research and development

Apr 2025 to Mar 2028

Grant Number

JPMJND2501
