The architecture was designed to be as generic as possible while remaining easy to understand, implement and use. Our first key requirement was that no programming language be imposed on the user. The motivation behind this requirement was simply that certain programming languages are better suited to certain tasks. Given the variety of tasks required to achieve high-quality question answering, imposing a single programming language would have limited the functionality and extensibility of the framework. We thus decided that all modules would be implemented as web services.
Our second requirement was to reuse existing standards as much as possible. We thus decided that all services are to generate and consume JSON objects according to the architectural design below.
Our third requirement was that of provenance tracking. We thus chose to add the ID of each service to its JSON output, making the contribution of each module easy to track throughout the QA process.
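As an illustration of the provenance requirement, the following minimal sketch shows one way a service could attach its ID to the JSON object it returns. The field name `provenance`, the helper `with_provenance` and the service IDs are assumptions made for this example; they are not prescribed by the architecture.

```python
import json


def with_provenance(service_id, payload):
    """Return a copy of `payload` with `service_id` appended to its trail.

    The "provenance" key is a hypothetical field name chosen for this sketch.
    """
    tagged = dict(payload)                      # leave the caller's object untouched
    trail = list(tagged.get("provenance", []))  # copy any existing trail
    trail.append(service_id)
    tagged["provenance"] = trail
    return tagged


# Two hypothetical services handle the same message in turn.
message = {"question": "example question"}
message = with_provenance("service-a", message)
message = with_provenance("service-b", message)

# The serialized object now records which services contributed, in order.
serialized = json.dumps(message)
```

Because the trail is an ordered list, the contribution of each module can be read off the final JSON object without consulting the controller.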
Several types of architecture can be envisaged for QA. We assumed the QA process to be a workflow in which a controller decides on the workflow to employ, stores metadata on the current workflow and is free to call components in the order it requires. Each component, on the other hand, assumes a particular type of JSON object as input and returns JSON as output. Depending on their implementation, components are free to access as many other components as required.
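The controller-driven workflow can be sketched as follows. This is a simplified illustration, not the framework's implementation: in-process functions stand in for the web services, and the names `Controller`, `make_component`, `normalizer`, `tokenizer` and the `trace` field are all hypothetical.

```python
import json


def make_component(service_id, transform):
    """Wrap `transform` as a component that consumes and returns JSON text."""
    def component(raw):
        data = json.loads(raw)                        # consume JSON
        data.update(transform(data))                  # add this component's output
        data.setdefault("trace", []).append(service_id)
        return json.dumps(data)                       # return JSON
    return component


class Controller:
    """Chooses the workflow, stores metadata on it and calls the components."""

    def __init__(self, registry):
        self.registry = registry   # maps component name to callable
        self.metadata = {}

    def run(self, question, workflow):
        self.metadata = {"workflow": list(workflow)}
        message = json.dumps({"question": question})
        for name in workflow:      # the controller decides the call order
            message = self.registry[name](message)
        return json.loads(message)


# Two toy components; a real deployment would call web services instead.
registry = {
    "normalizer": make_component(
        "normalizer", lambda d: {"normalized": d["question"].lower()}),
    "tokenizer": make_component(
        "tokenizer", lambda d: {"tokens": d["normalized"].split()}),
}

controller = Controller(registry)
result = controller.run("Hello World", ["normalizer", "tokenizer"])
```

Since each component only sees JSON text, replacing a function in `registry` with an HTTP call to a service written in another language would leave the controller unchanged.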
Overall, 8 modules were specified as integral parts of the QA process.
Other modules that might be considered when building a QA system include:
Evaluation of QA systems typically involves a set of pre-defined questions, example queries, and answers to the questions. Question Answering over Linked Data (QALD) is an evaluation campaign on multilingual question answering over linked data. QALD-4 provides a set of biomedical questions drawn from DrugBank, SIDER and Diseasome. We wanted to extend this evaluation to a broader set of questions of increasing complexity, and to consider the data from DBpedia, Bio2RDF and BioGateway. Towards this goal, we
(2) created questions, queries, and answers over our KB.