The achievement of the project's objectives will be validated through three use cases. These have been carefully selected to:
- Address clearly established problems, widely recognised in industry and academia;
- Cover the full range of the project's features and functionalities, represent alternative collaboration and decision-making paradigms (that may be found in various sectors and domains), and deal with various types of large-scale and real-time data residing in heterogeneous sources.
A short description of each use case is given below.
Use Case #1: Clinico-Genomic Research Assimilator
Focusing on scientific collaboration, this use case will demonstrate how scientific research can be supported by integrated, large-scale knowledge discovery. It addresses the clinico-genomic research community and its need to manage translational research processes properly, so that relevant findings and results, together with their clinical relevance and applicability, are delivered appropriately and in a timely manner. In this context, the need to collaboratively explore, evaluate, disseminate and diffuse relevant scientific findings and results is pressing. Within the Dicode project, we envisage the Clinico-Genomic Research Assimilator (CGRA), through which disparate clinical and post-genomic data sources are seamlessly linked and mined. The aim is to devise predictive disease models (diagnostic and prognostic) and to use them in clinical decision making. The use case is founded on an integrated knowledge discovery scenario that combines gene-expression profiles and maps them onto gene-regulatory networks, with the aim of uncovering the molecular regulatory mechanisms that govern target phenotypes (e.g., tumour relapse vs. non-relapse). The resulting research findings are assimilated into an appropriately organised repository in order to support future clinico-genomic decision-making processes. We address breast cancer, but the scenario could be extended to other disease domains as well.
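To make the mapping step more concrete, the following is a minimal, illustrative Python sketch; it is not the CGRA implementation, whose method is not specified here. It assumes a toy representation in which a gene-regulatory network is a dictionary from each regulator to its target genes, and each gene has one expression value per sample together with a binary relapse label. Each gene is scored by the difference in mean expression between relapse and non-relapse samples (a deliberately simple stand-in for proper statistical testing), and only network edges whose regulator and target both exceed a threshold are kept. The function names, the threshold and the gene identifiers G1 to G3 are hypothetical.

```python
from statistics import mean

# Illustrative sketch only: a toy way of mapping expression profiles onto a
# gene-regulatory network. Names, threshold and data format are assumptions.


def differential_expression(expr, labels):
    """Per gene: mean expression in relapse samples minus mean in non-relapse samples."""
    diffs = {}
    for gene, values in expr.items():
        relapse = [v for v, relapsed in zip(values, labels) if relapsed]
        non_relapse = [v for v, relapsed in zip(values, labels) if not relapsed]
        diffs[gene] = mean(relapse) - mean(non_relapse)
    return diffs


def candidate_regulatory_edges(network, expr, labels, threshold=1.0):
    """Return (regulator, target, d_regulator, d_target) for network edges whose
    two endpoints both change by at least `threshold` between phenotypes."""
    diffs = differential_expression(expr, labels)
    hits = []
    for regulator, targets in network.items():
        for target in targets:
            if regulator not in diffs or target not in diffs:
                continue  # gene not measured in this expression dataset
            if abs(diffs[regulator]) >= threshold and abs(diffs[target]) >= threshold:
                hits.append((regulator, target, diffs[regulator], diffs[target]))
    return hits


# Hypothetical toy data: 4 samples, the first two from relapsed patients.
network = {"G1": ["G2", "G3"]}
expr = {
    "G1": [5.0, 5.2, 2.0, 2.1],
    "G2": [3.0, 3.1, 1.0, 0.9],
    "G3": [1.0, 1.1, 1.0, 1.05],
}
labels = [True, True, False, False]

for edge in candidate_regulatory_edges(network, expr, labels):
    print(edge)
```

With this toy data only the G1-to-G2 edge is reported, because G3 barely changes between the two groups. A real analysis would replace the mean difference with an appropriate statistical test and correct for multiple comparisons.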
Use Case #2: Trial of Rheumatoid Arthritis Treatment
Focusing on medical decision making, this use case will make use of Dicode's services to deliver pertinent information to communities of doctors and patients. It concerns the domain of Rheumatoid Arthritis (RA) treatment trials carried out by an academic research establishment on behalf of a pharmaceutical company. During the trials, there is a need to check the provenance of records and datasets obtained during the treatment process and to track progress using dynamic MRI. A wide range of data in different formats (paper, electronic, text or images) is used, which can lead to information overload. Moreover, thorough physical examination and record keeping are needed when the clinician assesses the severity of RA and decides on the course of treatment. The relevant records include blood test results, records of physical examinations, RA scoring, and scans from digital X-ray, static MRI, dynamic MRI and ultrasound. An important source will be data provided by the patients themselves, such as a journal of their physical condition and any pain. These journals may be shared between patients and clinicians, and could be extended through Web 2.0 technologies to create communities of interest and to mine for the progression of conditions during trial studies. Dicode services will support a new way of working that enables more effective collaborative decision making, improving the accuracy and volume of trial data and speeding up the introduction of life-saving treatments. All patient data involved in this use case will be fully anonymised, ensuring that no issues of human data collection or informed consent arise.
Use Case #3: Opinion Mining from Unstructured Web 2.0 Data
This use case concerns capturing tractable, commercially valuable information to support marketing decisions and strategies. Today, companies must use social media and the Internet to reach their audience, and as consumers have become more savvy online, companies are forced to keep pace. This requires them to invest additional resources in finding, aggregating and summarising all information pertaining to them, in order to judge what people are saying about their needs, market trends and brands. It is paramount today that companies know what is being said about their brand, services or products, whether good or bad (sentiment analysis / opinion mining). Unnoticed negative opinion can, in extreme cases, lead to bad press and loss of consumer trust, and can even affect stock prices and/or company revenues. With current tools, finding who is saying what is literally searching for a virtual needle in a virtual haystack of unstructured information. Through this use case, we aim to validate the Dicode suite of services for the automatic analysis of this large volume of unstructured information. Data for this use case will be obtained primarily by crawling the Web (blogs, forums and news). We will also make use of the Application Programming Interfaces (APIs) of various Web 2.0 platforms, such as microblogging platforms (Twitter) and social networking platforms (Facebook). Most of the data will be unstructured, i.e. text with HTML mark-up (hyperlinks, virtual structure such as tables, separation of postings, etc.). Both English and German data sources will be considered.
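Opinion mining can be realised in many ways, and the technique Dicode will use is not specified here. As a purely illustrative baseline, the sketch below strips HTML mark-up with Python's standard `html.parser` module and scores a brand mention against a tiny, hypothetical bilingual (English/German) polarity lexicon. The lexicon, the brand name AcmePhone and the function names are assumptions; a realistic system would need far larger lexicons or trained models, negation handling and language detection.

```python
import re
from html.parser import HTMLParser

# Hypothetical toy polarity lexicon covering English and German words.
LEXICON = {
    "great": 1, "love": 1, "excellent": 1, "gut": 1, "hervorragend": 1,
    "bad": -1, "terrible": -1, "hate": -1, "schlecht": -1, "enttäuschend": -1,
}


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, discarding tags.
    (Script and style contents are not filtered out in this sketch.)"""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def brand_polarity(html, brand):
    """Return 'positive', 'negative' or 'neutral' for a posting that mentions
    `brand`, or None if the brand is not mentioned at all."""
    text = strip_html(html).lower()
    if brand.lower() not in text:
        return None
    tokens = re.findall(r"\w+", text)  # \w is Unicode-aware, so 'enttäuschend' stays whole
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "<p>I <b>love</b> the new AcmePhone, great battery!</p>",
    "<div>Der AcmePhone Akku ist schlecht und enttäuschend.</div>",
    "<p>Nothing about phones here.</p>",
]
for post in posts:
    print(brand_polarity(post, "AcmePhone"))
```

With these three toy postings the sketch prints positive, negative and None. Summing word polarities is the simplest possible design choice; it ignores negation ("not great") and context, which is precisely why large-scale opinion mining calls for more capable services.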