OntoDM-core - Ontology of Core Data Mining Entities
Background
In data mining, the data used for analysis are organized in the form of a dataset, and every dataset consists of data examples. The task of data mining is to produce some type of generalization from a given dataset. Generalization is a broad term that denotes the output of a data mining algorithm. A data mining algorithm is an algorithm that is designed to solve a data mining task and is implemented as a computer program. When executed, such a program takes a dataset as input and gives a generalization as output.
In this context, the OntoDM-core sub-ontology formalizes the key data mining entities needed to represent the mining of structured data within a general framework for data mining (Džeroski, 2006).
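The input-output view described above can be illustrated with a minimal Python sketch. The names `DataExample`, `Dataset`, `Generalization` and `majority_class_algorithm` are hypothetical and chosen only for illustration; they are not OntoDM-core class names, and the majority-class learner merely stands in for any data mining algorithm.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence


# Hypothetical illustration types; not taken from OntoDM-core itself.
@dataclass(frozen=True)
class DataExample:
    features: tuple
    label: str


Dataset = Sequence[DataExample]              # a dataset consists of data examples
Generalization = Callable[[tuple], str]      # here: a predictive model


def majority_class_algorithm(dataset: Dataset) -> Generalization:
    """Trivial data mining algorithm: dataset in, generalization out."""
    # Find the label that occurs most often in the dataset.
    most_common_label, _ = Counter(ex.label for ex in dataset).most_common(1)[0]

    def predictive_model(features: tuple) -> str:
        # The generalization ignores the features and always predicts
        # the majority label seen during training.
        return most_common_label

    return predictive_model


dataset = [
    DataExample((1.0, 0.5), "yes"),
    DataExample((0.2, 0.1), "no"),
    DataExample((0.9, 0.7), "yes"),
]
model = majority_class_algorithm(dataset)
prediction = model((0.0, 0.0))
```

The sketch keeps the three notions separate: the dataset is plain data, the algorithm is a function over datasets, and the generalization is the value the algorithm returns.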
Design
OntoDM-core is expressed in OWL-DL, a de facto standard for representing ontologies. The ontology is being developed using the Protege ontology editor. It is freely available on this page and at BioPortal.
In order to ensure the extensibility and interoperability of OntoDM-core with other resources, in particular with biomedical applications, the OntoDM-core ontology follows the Open Bio-Ontologies (OBO) Foundry design principles, such as
- the use of an upper-level ontology,
- the use of formal ontology relations,
- single inheritance, and
- the re-use of already existing ontological resources where possible.
The application of these design principles enables cross-domain reasoning, facilitates wide re-usability of the developed ontology, and avoids duplication of ontology development efforts. Consequently, OntoDM-core imports the upper-level classes from the Basic Formal Ontology (BFO) version 1.1, and formal relations from the OBO Relation Ontology (RO) together with an extended set of RO relations.
Following best practices in ontology development, the OntoDM-core ontology reuses appropriate classes from a set of ontologies that act as mid-level ontologies for OntoDM-core. These include the
For representing the mining of structured data, we import the OntoDT ontology of datatypes. Classes that are referenced and reused in OntoDM-core are imported into the ontology by using the Minimum Information to Reference an External Ontology Term (MIREOT) principle and extracted using the OntoFox web service.
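As a rough illustration of the MIREOT principle, the minimal information needed to reference an external term is usually taken to be the IRI of the source ontology, the IRI of the term itself, and the IRI of the superclass under which the term is placed in the importing ontology. The sketch below models this as a plain Python record. The class name `MireotReference`, its field names, and the `example.org` IRIs are placeholders assumed for illustration; this is not the OntoFox input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MireotReference:
    """Hypothetical record of the minimal information for one imported term."""
    source_ontology_iri: str    # ontology the term comes from
    source_term_iri: str        # the external term being referenced
    target_superclass_iri: str  # where the term is placed in the importing ontology

    def is_complete(self) -> bool:
        # A reference is usable only if all three pieces are present.
        return all((self.source_ontology_iri,
                    self.source_term_iri,
                    self.target_superclass_iri))


# Placeholder IRIs, assumed for illustration only.
ref = MireotReference(
    source_ontology_iri="http://example.org/source-ontology.owl",
    source_term_iri="http://example.org/source-ontology.owl#ExternalTerm",
    target_superclass_iri="http://example.org/target-ontology.owl#ParentClass",
)
incomplete = MireotReference("", ref.source_term_iri, ref.target_superclass_iri)
```

Referencing terms this way avoids importing an entire external ontology when only a few of its classes are needed.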
Ontology Structure
For the domain of DM, we propose a horizontal description structure that includes three layers:
- a specification layer,
- an implementation layer, and
- an application layer.
Having all three layers represented separately in the ontology will facilitate different uses of the ontology. For example, the specification layer can be used to reason about data mining algorithms; the implementation layer can be used for search over implementations of data mining algorithms and to compare various implementations; and the application layer can be used for searching through executions of data mining algorithms.
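The separation into layers can be sketched in Python as three linked record types, one per layer. All class names, field names, and example values below are hypothetical and serve only to show how each layer supports a different kind of query; they do not reproduce the actual OntoDM-core classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlgorithmSpecification:       # specification layer
    name: str
    task: str


@dataclass(frozen=True)
class AlgorithmImplementation:      # implementation layer
    specification: AlgorithmSpecification
    software: str


@dataclass(frozen=True)
class AlgorithmExecution:           # application layer
    implementation: AlgorithmImplementation
    dataset_name: str


def implementations_of(spec: AlgorithmSpecification,
                       impls: List[AlgorithmImplementation]) -> List[AlgorithmImplementation]:
    """Search over implementations of a given algorithm specification."""
    return [i for i in impls if i.specification == spec]


def executions_of(spec: AlgorithmSpecification,
                  runs: List[AlgorithmExecution]) -> List[AlgorithmExecution]:
    """Search through executions of any implementation of the specification."""
    return [r for r in runs if r.implementation.specification == spec]


# Made-up example data.
tree = AlgorithmSpecification("decision tree induction", "classification")
rules = AlgorithmSpecification("rule induction", "classification")
impls = [
    AlgorithmImplementation(tree, "toolkit-A"),
    AlgorithmImplementation(tree, "toolkit-B"),
    AlgorithmImplementation(rules, "toolkit-A"),
]
runs = [
    AlgorithmExecution(impls[0], "dataset-1"),
    AlgorithmExecution(impls[2], "dataset-1"),
    AlgorithmExecution(impls[1], "dataset-2"),
]
tree_impls = implementations_of(tree, impls)
tree_runs = executions_of(tree, runs)
```

Because each execution points to an implementation and each implementation points to a specification, queries at one layer can be answered without duplicating information held at another.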
Key OntoDM-core classes
The ontology includes the representation of the following entities:
- data,
Ontology evaluation
We assess the quality of OntoDM-core from three different evaluation aspects:
Versions and Download
Release version 1
- All files in one zip archive: OntoDM-coreV1.zip
- OntoDM-core main file: OntoDM-core.owl
- File imported directly by OntoDM-core, containing external classes: external.owl
- File imported by the external file, containing OBI classes: external-OBI.owl
- OntoDT ontology of datatypes: OntoDT.owl
Papers
Panov P., Soldatova L., Džeroski S. Ontology of core data mining entities. Data Mining and Knowledge Discovery 28(5-6):1222-1265, 2014. DOI: 10.1007/s10618-014-0363-0