Heterogeneous Data Analysis Opportunities and Challenges for Clinical Research

Dr. Umit Topaloglu
Associate Professor of Cancer Biology & Biostatistics
Wake Forest School of Medicine

Monday, October 22, 2018 @ 4:00 pm
Manchester Hall 241
Refreshments will be served at 3:30 in Man.229

As new precision medicine approaches seek to understand the link between genes and phenotypes, computational methods already play a substantial role in how genomic and multimodal biomedical data are analyzed. Consequently, such methods should result in better translation of genomic variants toward understanding phenotypes and generating real-world evidence.

The secondary use of patients' clinical, genomic, and treatment outcome data holds the promise of identifying potential linkages between disease groups and genetic variations in order to predict outcomes or response to treatment. Progress reports, structured data (e.g., labs and genomic tests), and images are housed in distributed data lakes and data warehouses that lack an ontological foundation. Because a well-defined semantic framework is needed for heterogeneous data integration and sharing, indexing, accessing, and querying research data for analysis is an enormous undertaking for organizations seeking to become competitive through technology-to-data efforts. Each of these steps also affects the scope, coverage, and validity of the research process, even though the current focus is mostly on data analytics and insight as the primary goal. Machine-understandable semantics that promote interoperability while efficiently sharing, managing, and coordinating health care data across multiple stakeholders could be an opportunity, since the various data modalities are stored in separate systems.

Deep Learning (DL) techniques can extract structured named entities from unstructured documents, which could enable automated identification and extraction of relevant information. Together, DL and standard terminologies could significantly reduce the data quality issues that reside in Electronic Health Record (EHR) data stores due to the potential decoupling of entity resolution and master data management. Those issues, in turn, affect how accurately DL can extract embedded information.