

OntoDM-KDD - Ontology for Data Mining Investigations

The Knowledge Discovery Process

The term knowledge discovery in databases, or KDD for short, refers to the broad process of finding knowledge in data. The term data mining refers to the sub-process of KDD that applies algorithms to extract knowledge from data in the form of patterns (generalizations). The KDD process is interactive and iterative. It involves numerous sub-processes, for example: developing an understanding of the application domain, the relevant prior knowledge, and the end user's goals; collecting data and creating a target dataset; data cleaning and pre-processing; data reduction and projection; choosing the data mining task; choosing the data mining algorithm; modeling (data mining proper); deploying the mined knowledge; and incorporating and using the mined knowledge.
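The distinction between KDD as a whole and data mining as one of its steps can be illustrated with a toy Python sketch. Everything in it is a hypothetical illustration and not part of OntoDM-KDD: the step functions, the choice of frequent item-pair counting as the "mining" step, and the retry rule that lowers the support threshold when no patterns are found (standing in for the iterative nature of KDD).

```python
from collections import Counter
from itertools import combinations


def clean(records):
    # Pre-processing: drop empty records, trim and lower-case item names.
    return [{item.strip().lower() for item in r if item.strip()} for r in records if r]


def mine_frequent_pairs(transactions, min_support):
    # The data mining step proper: extract patterns, here item pairs that
    # co-occur in at least `min_support` transactions.
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def kdd(records, min_support=2, max_rounds=3):
    # The surrounding KDD process: prepare the data, mine, evaluate, and
    # iterate with a revised setting if the result is unsatisfactory.
    transactions = clean(records)
    patterns = {}
    for _ in range(max_rounds):
        patterns = mine_frequent_pairs(transactions, min_support)
        if patterns or min_support <= 1:
            break  # evaluation: accept the result (or stop relaxing)
        min_support -= 1  # revise the mining setting and try again
    return patterns
```

For the records [["Bread", "Milk"], ["bread", "milk", "eggs"], ["eggs"], []], calling kdd(records, min_support=3) finds nothing in the first round, lowers the threshold to 2, and returns {("bread", "milk"): 2}. Only mine_frequent_pairs corresponds to "data mining" in the narrow sense; the rest is the wider KDD process.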

In data mining and knowledge discovery, several proposals have been made for standardizing the knowledge discovery process, both for representing and for performing data mining investigations. One of the most prominent is the CRISP-DM methodology (Chapman et al., 1999). CRISP-DM, short for Cross Industry Standard Process for Data Mining, is a process model that describes data mining investigations performed in practical applications. It is based on the approaches that expert data miners commonly use to tackle practical data mining problems. The OntoDM-KDD ontology is based on the CRISP-DM process model.

Ontology Design

The OntoDM-KDD ontology builds on the CRISP-DM process model. Its main goal is to be general enough to represent knowledge discovery processes and data mining investigations performed in practical applications. The OntoDM-KDD ontology provides:

  • a representation of data mining investigations that directly extends classes from the OBI and IAO ontologies;
  • a model of each phase of a data mining investigation, including its inputs and outputs:
    • application understanding,
    • data understanding,
    • data preparation,
    • modeling,
    • DM process evaluation, and
    • deployment.
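One way to make the per-phase model concrete is to treat each phase as a step with named inputs and outputs and to check that every input is available when the phase runs. The Python sketch below does this for the six phases listed above. The input and output names are hypothetical placeholders, not OntoDM-KDD classes, and real investigations are iterative rather than strictly linear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    inputs: frozenset
    outputs: frozenset


def phase(name, inputs, outputs):
    # Convenience constructor that freezes the input/output sets.
    return Phase(name, frozenset(inputs), frozenset(outputs))


# Hypothetical inputs and outputs, chosen only to illustrate the idea.
INVESTIGATION = [
    phase("application understanding", {"domain knowledge"}, {"DM goals"}),
    phase("data understanding", {"DM goals", "raw data"}, {"data description"}),
    phase("data preparation", {"raw data", "data description"}, {"prepared dataset"}),
    phase("modeling", {"prepared dataset", "DM goals"}, {"model"}),
    phase("DM process evaluation", {"model", "DM goals"}, {"evaluation report"}),
    phase("deployment", {"model", "evaluation report"}, {"deployment plan"}),
]


def check_flow(phases, external_inputs):
    """List (phase name, missing inputs) for every phase whose inputs are
    neither external inputs nor outputs of an earlier phase."""
    available = set(external_inputs)
    problems = []
    for p in phases:
        missing = p.inputs - available
        if missing:
            problems.append((p.name, sorted(missing)))
        available |= p.outputs  # outputs become available to later phases
    return problems
```

With both "domain knowledge" and "raw data" supplied externally, check_flow(INVESTIGATION, ...) returns an empty list; dropping "raw data" reports the data understanding and data preparation phases.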

To ensure the interoperability of OntoDM-KDD with other resources, the ontology follows the OBO Foundry design principles. OntoDM-KDD imports the upper-level classes from the Basic Formal Ontology (BFO 1.1) and the formal relations from the OBO Relation Ontology (RO), and uses an extended set of RO relations. BFO and RO were chosen because they are widely accepted, especially in the biomedical domain.

Following best practices in ontology development, the OntoDM-KDD ontology reuses appropriate classes from a set of ontologies that act as mid-level ontologies. These include the Ontology for Biomedical Investigations (OBI), the Information Artifact Ontology (IAO), and the Software Ontology (SWO). Classes that are referenced and reused in OntoDM-KDD are imported following the Minimum Information to Reference an External Ontology Term (MIREOT) principle (Courtot et al., 2011) and extracted using the OntoFox web service.
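Under MIREOT, the minimal information kept for a reused external class is, roughly, the term's IRI, the IRI of its source ontology, and the superclass under which it is placed in the importing ontology. The Python sketch below records these three values and emits a Turtle fragment. It is not how OntoFox works internally; using rdfs:isDefinedBy to record provenance, and the example of placing OBI's investigation class (OBI_0000066) under planned process (OBI_0000011), are assumptions made for illustration.

```python
from dataclasses import dataclass

OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


@dataclass(frozen=True)
class MireotRef:
    """The minimal information kept when reusing an external ontology term."""
    term_iri: str             # IRI of the external term being reused
    source_ontology_iri: str  # IRI of the ontology the term comes from
    target_parent_iri: str    # superclass of the term in the importing ontology

    def to_turtle(self) -> str:
        # Declare the term as an OWL class, attach it under its parent, and
        # record its source (rdfs:isDefinedBy is an assumed stand-in here).
        return (
            f"<{self.term_iri}> a <{OWL_CLASS}> ;\n"
            f"    <{RDFS}subClassOf> <{self.target_parent_iri}> ;\n"
            f"    <{RDFS}isDefinedBy> <{self.source_ontology_iri}> .\n"
        )


ref = MireotRef(
    term_iri="http://purl.obolibrary.org/obo/OBI_0000066",           # investigation
    source_ontology_iri="http://purl.obolibrary.org/obo/obi.owl",
    target_parent_iri="http://purl.obolibrary.org/obo/OBI_0000011",  # planned process
)
print(ref.to_turtle(), end="")
```

Keeping only these three values, rather than importing the whole source ontology, is what keeps MIREOT-style imports small; labels and definitions can be fetched separately when needed.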

OntoDM-KDD is expressed in OWL-DL, a widely used standard language for representing ontologies. The ontology is being developed using the Protégé ontology editor.

Versions and Download

Release version 1

Publications

OntoDM-KDD@Bioportal

