Project type: Small or medium-scale focused research project (STREP)
Funding scheme: Collaborative project
Start date: 01 September 2010
Duration: 36 months
EU funding: € 2,599,991.00
Total: € 3,510,004.00
Scientific Coordinator: Nikos Karacapilidis
Project coordinator: Computer Technology Institute & Press "Diophantus" (CTI), Greece
The goal of the Dicode project is to facilitate and augment collaboration and decision making in data-intensive and cognitively complex settings. To do so, it will exploit and build on the most prominent high-performance computing paradigms and large-scale data processing technologies (such as cloud computing, MapReduce, Hadoop, Mahout, and column databases) to meaningfully search, analyze, and aggregate data existing in diverse, extremely large, and rapidly evolving sources. Building on current advancements, the solution foreseen in the Dicode project will bring together the reasoning capabilities of both machines and humans. It can be viewed as an innovative workbench incorporating and orchestrating a set of interoperable services that reduce data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and to concentrate on creative activities. The services to be developed are: (i) scalable data mining services (including services for text mining and opinion mining), (ii) collaboration support services, and (iii) decision making support services.
The achievement of the Dicode project’s goal will be validated through three use cases addressing clearly established problems. These cases were chosen to test the transferability of the Dicode solution across different collaboration and decision making settings, associated with diverse types of data and data sources, thus covering the full range of the foreseen solution’s features and functionalities. They concern: (i) scientific collaboration supported by integrated large-scale knowledge discovery in clinico-genomic research, (ii) delivering pertinent information from heterogeneous data to communities of doctors and patients in medical treatment decision making, and (iii) capturing tractable, commercially valuable high-level information from unstructured Web 2.0 data for opinion mining.