The Big Data Center (BDC) of China Medical University Hospital (CMUH) established the Clinical Research Data Repository in 2016. BDC manages the largest phenome-genome-environmental data platform in Asia, encompassing 19 years of EMR and environmental exposure data from 3 million patients and genetic information from 230,000 patients, which forms a solid foundation for generating high-resolution clinical data. To make full use of this valuable medical big data, BDC developed a smart data platform, the iHi Platform, in 2020 to ignite hyper-intelligent data applications. The iHi Platform is the only data platform in Taiwan that combines clinical, genetic, and environmental data (Fig. 1).
The iHi Platform provides clean, integrated, and de-identified data to clinical researchers through a cloud-based system. This data architecture not only makes the data ecosystem interoperable and sustainable, but also solves the problems of cluttered data and low accessibility and creates a venue for artificial intelligence applications. Through the iHi Platform services, we aim to expand the use of multi-omics clinical data in education, research, and clinical or business applications. Ultimately, the insights generated on the iHi Platform feed back into clinical settings and improve medical quality and patient health (Fig. 2).
The iHi Platform was designed as a patient-centered medical data ecosystem that provides clinical researchers with accessible, reliable, and diverse data. Several AI/data tools have received US and Taiwan patents, and one AI tool has been approved by the US FDA or Taiwan FDA, which supports the validity and quality of the iHi Platform. The iHi Platform encompasses an innovative data structure (data LEGO) and systematic data annotation workflows (data chip). We also established the iHi Genomics Analytic Platform to speed up translational research discovery. With its deep-cleaned, comprehensive, and accessible data service, the iHi Platform can open research to a wide range of intelligent applications.
We aim to build a full-spectrum big data ecosystem that not only integrates EMR, health insurance, genomic, and environmental data, but also combines them with real-world data from patient-centered systems and with multi-omics data such as microbiome and exosome data. Most importantly, these diverse and heterogeneous data must be linkable and traceable to remain sustainable and reusable. Therefore, we process all data through a standardized data management pipeline, which provides users with high-quality and protected clinical datasets. Furthermore, we modularize multi-omics and multi-dimensional datasets into data LEGO bricks, which are deep-cleaned and sorted by their characteristics. The iHi data platform, a pool of data LEGO bricks, holds diverse data sources such as EHR, medical images, and examination reports. Researchers can select the data bricks of interest to build their own unique data castles and perform analyses on the iHi Platform (Fig. 3).
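The idea of picking bricks and assembling them into a "castle" can be illustrated with a minimal Python sketch. It is not the iHi Platform's implementation: the brick names, field names, patient keys, and values below are all invented for illustration, and the only assumption taken from the text is that bricks are linkable through a shared de-identified patient key.

```python
from typing import Dict, List

# Hypothetical bricks: each maps a de-identified patient key to one record.
# Brick names, field names, and values are invented for illustration only.
BRICKS: Dict[str, Dict[str, dict]] = {
    "ehr_labs": {
        "P001": {"egfr": 58.0},
        "P002": {"egfr": 91.5},
    },
    "genotype": {
        "P001": {"rs_example": "AG"},
        "P003": {"rs_example": "GG"},
    },
    "environment": {
        "P001": {"pm25_annual": 21.3},
        "P002": {"pm25_annual": 17.8},
    },
}


def build_castle(brick_names: List[str]) -> Dict[str, dict]:
    """Inner-join the selected bricks on the shared patient key."""
    if not brick_names:
        raise ValueError("select at least one brick")
    selected = [BRICKS[name] for name in brick_names]
    # Keep only patients that appear in every selected brick.
    shared_keys = set(selected[0])
    for brick in selected[1:]:
        shared_keys &= set(brick)
    castle: Dict[str, dict] = {}
    for key in sorted(shared_keys):
        row = {}
        for name, brick in zip(brick_names, selected):
            # Prefix each field with its brick name so its source stays traceable.
            for field, value in brick[key].items():
                row[f"{name}.{field}"] = value
        castle[key] = row
    return castle


castle = build_castle(["ehr_labs", "environment"])
for patient_key, row in castle.items():
    print(patient_key, row)
```

Prefixing every field with the name of its brick is one simple way to keep the combined dataset traceable back to its sources; a production system would more likely record this lineage in metadata.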
All data provided on the iHi Platform are processed through the standardized data management pipeline. To perform systematic data cleaning, validation, and integration, we established a smart data chip fabrication process that controls the quality of each processing step.
The process runs from data source acquisition and data architecture design through data polishing, standardization, and refinement to data validation and stacking, yielding a smart data chip of qualified and certified datasets (Fig. 4).
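A staged process of this kind can be sketched as an ordered list of functions with a trace entry recorded for every step. The sketch below is only an illustration under stated assumptions: the three stage functions, their rules (dropping missing values, converting g/L to mg/dL, a plausibility range), and the record fields are hypothetical and stand in for whatever the actual fabrication process does at each step.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Stage = Tuple[str, Callable[[List[Record]], List[Record]]]


def polish(records: List[Record]) -> List[Record]:
    """Drop records with a missing measurement (illustrative rule)."""
    return [r for r in records if r.get("value") is not None]


def standardize(records: List[Record]) -> List[Record]:
    """Convert every measurement to mg/dL (hypothetical target unit)."""
    out = []
    for r in records:
        if r["unit"] == "g/L":
            # 1 g/L equals 100 mg/dL.
            r = {**r, "value": r["value"] * 100.0, "unit": "mg/dL"}
        out.append(r)
    return out


def validate(records: List[Record]) -> List[Record]:
    """Keep values inside a plausible range (made-up bounds)."""
    return [r for r in records if 0.0 <= r["value"] <= 1000.0]


# Order matters: each stage consumes the output of the previous one.
PIPELINE: List[Stage] = [
    ("polish", polish),
    ("standardize", standardize),
    ("validate", validate),
]


def fabricate(records: List[Record]) -> Tuple[List[Record], List[dict]]:
    """Run all stages in order and log record counts for traceability."""
    trace = []
    for name, stage in PIPELINE:
        before = len(records)
        records = stage(records)
        trace.append({"stage": name, "in": before, "out": len(records)})
    return records, trace


raw = [
    {"id": "P001", "value": 1.5, "unit": "g/L"},
    {"id": "P002", "value": None, "unit": "mg/dL"},
    {"id": "P003", "value": 5000.0, "unit": "mg/dL"},
]
chip, trace = fabricate(raw)
for entry in trace:
    print(entry)
```

Logging the record count before and after each stage is the simplest form of step-level tracing; it shows where records were removed without storing the records themselves.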
Through this standard, pre-built smart data chip fabrication process, we can easily manage and trace each processing step on the iHi Platform. In addition, we are the only platform in Taiwan that provides de-identified data certified under both ISO and CNS standards (Fig. 5).
Applying this concept of continuous, at-scale flow production to data processing allows data to be deeply cleaned and supports high-quality AI solutions that fit into real-world clinical workflows. At the same time, high-performance AI can help extract important new data features and insights, which enhances data diversity and further grows the smart data ecosystem (Fig. 6).
In 2021, we launched the iHi Genomics analytic platform, an easy-to-use platform for exploring data and extracting insights from datasets of interest. iHi Genomics provides disease cohort selection and genome-wide association study (GWAS) analysis within a few clicks. It can generate a full GWAS report, including quality control details and a Manhattan plot of the SNPs significantly associated with the disease (Fig. 7).
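For readers unfamiliar with what a GWAS computes per SNP, the following Python sketch shows one basic form of the test: a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts in cases and controls, followed by the -log10(p) value that a Manhattan plot displays. This is an assumption-laden illustration, not the iHi Genomics pipeline: the SNP identifiers and counts are toy values, the allelic test is only one of several association models, and the 5e-8 threshold is a commonly used genome-wide significance convention that the text does not specify.

```python
import math
import sys
from typing import Dict, List, Tuple

# Commonly used genome-wide significance threshold (an assumption here).
GENOME_WIDE_ALPHA = 5e-8


def allelic_chi2(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> float:
    """Pearson chi-square statistic (1 df) for a 2x2 allele-count table."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    denom = row_case * row_ctrl * col_alt * col_ref
    if denom == 0:
        # A zero margin (e.g. a monomorphic SNP) carries no association signal.
        return 0.0
    return n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom


def chi2_p_value(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def scan(snp_counts: Dict[str, Tuple[int, int, int, int]]) -> List[dict]:
    """Test every SNP and compute the value plotted on a Manhattan plot."""
    results = []
    for snp_id, counts in snp_counts.items():
        p = chi2_p_value(allelic_chi2(*counts))
        p = max(p, sys.float_info.min)  # avoid log10(0) on underflow
        results.append({
            "snp": snp_id,
            "p": p,
            "neg_log10_p": -math.log10(p),
            "significant": p < GENOME_WIDE_ALPHA,
        })
    return results


# Toy allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
toy_counts = {
    "snp_a": (300, 100, 100, 300),
    "snp_b": (100, 100, 100, 100),
}
results = scan(toy_counts)
for row in results:
    print(row["snp"], row["p"], row["significant"])
```

A real analysis would also apply the quality control steps mentioned above (for example call-rate and allele-frequency filters) and would usually adjust for covariates with a regression model rather than a plain allelic test.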
Using virtual desktop infrastructure (VDI), users can remotely access ISO- and CNS-certified de-identified data in a highly secure environment (Fig. 8).
The iHi Genomics analytic pipeline provides a full report on each identified gene or SNP, with a detailed description and links to external information, which allows researchers without coding skills to perform basic GWAS analysis painlessly.
With the full support of the CMUH board, the BDC manages the EHRs of more than 3 million patients, linked with genetic and environmental data. With its deep-cleaned, integrated, multi-omics data, the iHi Platform can provide both macro-level and micro-level resolution for discovering clinical insights. Based on these iHi services, more than 80 SCI papers have been published (Fig. 9).
Owing to our extensive experience and the high-quality data of the iHi Platform, we have been collaborating with 19 international institutions, including universities, medical centers, and national institutes of health. In 2018 and 2019, we were invited by the American Society of Nephrology (ASN) to present our big data research and applications in kidney disease (Fig. 10).
The entire workflow and infrastructure of the iHi Platform have also been highly recognized by many leading experts worldwide, such as Dr. Nick Bryan, former president of the Radiological Society of North America (Fig. 11).
Since 2022, our global cooperation on clinical and intelligent medical projects has grown stronger, including collaborations with universities and hospitals in the US and Japan. In the future, we will continue to nurture the iHi data ecosystem and integrate these worldwide collaborations.
We look forward to further collaboration.