The Dicode project aims to develop cutting-edge technological solutions that facilitate and augment collaboration and decision making in today’s data-intensive and cognitively complex work environments. The Dicode consortium’s experience in successful, innovative interdisciplinary projects at the European and national levels, as well as best practices identified from relevant projects, led to the conclusion that the project’s objectives will be best accomplished through an evolutionary approach with the following features:
- Throughout the project, stakeholders will be actively engaged in the specification, design and evaluation of the foreseen technological solutions;
- An incremental development approach will be followed, thus ensuring that end users can experiment with the Dicode services from the early stages of the project (operational prototype versions of the foreseen services will be available at the end of the first year of the project; enhanced versions will be delivered in month 24; final versions will be ready in month 33);
- User requirements will be refined through testing (involving users from all three use cases);
- An operational integrated suite of services will be available early for trials and proof-of-concept purposes; this is the best way to gather the feedback needed to test our research hypotheses and appropriately refine the overall approach;
- The early availability of this integrated suite of Dicode services enables early exploitation and dissemination and ensures the project’s sustainability.
Beyond this evolutionary approach, the successful completion of the Dicode project’s plan depends crucially on the availability of key technologies, models and experience brought in from relevant previous research, as well as on the complementary competences and expertise of the consortium’s partners, which allow productive work from the very beginning of the project. Dicode has also secured the direct involvement of the relevant developers’ community.
The Dicode project has already identified and elaborated significant risks and will proceed according to its major milestones. Broadly speaking, the project is divided into two main phases: (i) Phase I (months 1-18), during which requirements and specifications are produced, operational versions of the Dicode services are developed and integrated, innovative work methodologies are sketched, and feedback from the first validation and assessment of the Dicode outcomes is collected; (ii) Phase II (months 19-36), during which the specifications and the overall conceptual framework are revised, the Dicode services and integrated suite offer advanced capabilities, work methodologies mature into best practices and innovative work guidelines, and Dicode outcomes continue to be thoroughly tested, while the final evaluation via the use cases and the overall project evaluation take place. It is noted that exploitation activities take place throughout the project’s duration.
The project provides an end-to-end approach to the problem of intelligent information management, ranging from efficient data acquisition and processing, through building blocks that address important bottlenecks of information management, to prototypical use cases in complex domains of significant scientific and economic impact. Being committed to an open source approach, the project thus makes a significant impact and delivers re-usable results at many different levels of intelligent information management.
To achieve its scientific and technical objectives and assure successful scientific and administrative management, the Dicode project breaks down into seven (7) workpackages, whose architecture, interdependencies, leaders and foreseen use cases are illustrated in the Figure below. The work to be performed in these WPs is explained in the following.
First of all, WP1 (leader: CTI) coordinates the work to be performed in the project, being solely devoted to strategic and daily management activities.
WP2 (leader: UOL) scrutinizes the processes of collaboration and decision making in data-intensive and cognitively-complex settings. Work to be performed in this work package includes the study of the related state-of-the-art (covering both the work methodologies followed and the supporting information and communication technologies), the identification and analysis of stakeholders’ problems and requirements in diverse settings (through the detailed elaboration of the project’s use cases), and the specification of an agile solution to the problem under consideration. The above will guide the software development in WP3-WP5. Also, through the three use cases, WP2 works closely with WP6.
WP3 (leader: FHG) exploits and builds on top of the most prominent large-scale data processing technologies (cloud computing, MapReduce, Hadoop, Mahout, column databases, etc.) to design and implement innovative services for mining both structured and unstructured (i.e. text) data. The associated modules serve functionalities such as data acquisition and pre-processing (document analysis, data cleansing, and data transformation), scalable data storage, directed data filtering and fusion, data clustering and classification, and data aggregation. Intelligent data mining techniques to be elaborated include local pattern mining, similarity learning, and graph mining for structured data, and named entity recognition, entity assignment, relation extraction and opinion mining for text data. The foreseen approach makes use of both semantic metadata and pre-structured data patterns.
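To illustrate the MapReduce paradigm on which the WP3 data processing services build, consider the following minimal single-machine sketch of a word count over a document collection (the function names and toy corpus are illustrative only, not part of the Dicode codebase; a framework such as Hadoop distributes the same three phases across a cluster):

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every token in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle step: group intermediate pairs by key, as the framework
    would do between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the emitted counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs scalable mining", "scalable data services"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts now maps each word to its frequency, e.g. counts["data"] == 2
```

The same map/shuffle/reduce decomposition underlies scalable implementations of the clustering, classification and text mining services foreseen in this work package.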
WP4 (leader: CTI), which runs in parallel to WP3, concerns the design and implementation of collaboration and decision making support services dedicated to the data-intensive settings under consideration. The modules foreseen in this work package enable the synchronous and asynchronous collaboration of stakeholders through adaptive workspaces, and serve alternative data visualization schemas (also accommodating the functionalities of the WP3 services). Most importantly, they facilitate (both individual and group) sense-making and decision-making by supporting stakeholders in locating and retrieving relevant information and by providing them with appropriate recommendations (taking into account parameters such as preferences, competences, expertise, etc.). The services to be developed in WP3 and WP4 aim at reducing the data-intensiveness and overall complexity of the settings under consideration to a manageable level, the former focusing on exploiting the reasoning capabilities of the machine, the latter those of humans.
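A simple way to realize the recommendation functionality described above is content-based ranking: items are scored by the similarity of their feature vectors to a stakeholder's profile. The sketch below (a hypothetical illustration; the profile dimensions, item names and `recommend` helper are assumptions, not the Dicode design) uses cosine similarity over toy preference vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equally-sized preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user_profile, items, top_n=2):
    """Rank candidate items by similarity to the user's profile vector."""
    ranked = sorted(items.items(),
                    key=lambda kv: cosine(user_profile, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_n]]

# Toy profiles over three interest dimensions (e.g. topic weights).
user = [0.9, 0.1, 0.0]
candidates = {
    "report_A": [1.0, 0.0, 0.0],
    "report_B": [0.0, 1.0, 0.0],
    "report_C": [0.5, 0.5, 0.0],
}
picks = recommend(user, candidates)  # report_A ranks first for this user
```

In practice such profile vectors would be derived from the parameters the text mentions (preferences, competences, expertise), and the ranking would feed the adaptive workspaces.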
Integration of the abovementioned services is carried out in WP5 (leader: UPM). Integration is addressed from both a technical and a conceptual point of view. To assure technical interoperability, WP5 includes a continuous task on the definition and elaboration of development standards and guidelines (to be adopted by all teams involved), starting early in the project. Database and document repository issues will also be addressed in this work package. The conceptual integration will be based on the development of an appropriate ontology to drive the data mining services of WP3 and augment the collaboration and decision support services of WP4. The foreseen integration will be enabled by a workflow engine that will meaningfully orchestrate these services. Another important outcome of WP5 concerns the development of innovative work methodologies, which exploit the project’s innovative services to advance current practices in data-intensive collaboration.
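The orchestration role of the foreseen workflow engine can be sketched as function composition: each service consumes the previous service's output. The stand-in services below (`acquire`, `filter_stopwords`, `summarize`) are hypothetical placeholders for Dicode services, shown only to make the pipeline idea concrete:

```python
def make_pipeline(*steps):
    """Compose service functions into a workflow: each step's
    output becomes the next step's input."""
    def run(payload):
        for step in steps:
            payload = step(payload)
        return payload
    return run

# Hypothetical stand-ins for orchestrated services.
def acquire(text):
    """Data acquisition / pre-processing: tokenize the raw input."""
    return text.lower().split()

def filter_stopwords(tokens):
    """Directed data filtering: drop uninformative tokens."""
    stop = {"the", "a", "of"}
    return [t for t in tokens if t not in stop]

def summarize(tokens):
    """Aggregation: package the result for a collaboration workspace."""
    return {"tokens": tokens, "count": len(tokens)}

workflow = make_pipeline(acquire, filter_stopwords, summarize)
result = workflow("The integration of the Dicode services")
```

A production workflow engine would add what this sketch omits: declarative workflow definitions, error handling, and ontology-driven selection of which services to chain.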
Through WP6 (leader: BRF), the project devotes particular attention to the validation and assessment of the services developed and integrated in WP3-WP5 through three real use cases. Dedicated metrics and instruments will be designed and exploited to evaluate the overall solution and assess the performance of the associated trials. WP6 provides valuable feedback for the refinement and improvement of the work performed in WP3-WP5.
Finally, WP7 (leader: NEO) includes various typical and dedicated tasks concerning the dissemination of the project’s achievements and the exploitation of its results in various business domains associated with large and complex data.