ontodm-kdd [2013/02/13 16:48]
admin
ontodm-kdd [2016/04/12 11:00] (current)
admin [Release version 1]
====== OntoDM-KDD - Ontology for Data Mining Investigations ======
  
===== The Knowledge Discovery Process =====
  
The term knowledge discovery in databases, or KDD for short, refers to the broad process of finding knowledge in data. The term data mining refers to the sub-process of KDD that applies algorithms to extract knowledge from data in the form of patterns (generalizations). The KDD process is interactive and iterative. It involves numerous sub-processes, such as: developing an understanding of the application domain, the relevant prior knowledge and the end user's goals; collecting data and creating a target dataset; data cleaning and pre-processing; data reduction and projection; choice of the data mining task; choice of the data mining algorithm; modeling or data mining; deployment of the mined knowledge; and incorporation and use of the mined knowledge.
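The iterative character of the KDD process can be illustrated with a small Python sketch. It is a hypothetical model, not part of OntoDM-KDD: the sub-processes listed above become an ordered enumeration (the names in `KDDStep` are our own), and one iteration is modeled as a one-time jump from a later step back to an earlier one, for example from data mining back to pre-processing.

```python
from enum import Enum


class KDDStep(Enum):
    """Hypothetical labels for the KDD sub-processes, in their usual order."""
    UNDERSTAND_DOMAIN = "understand domain, prior knowledge and user goals"
    CREATE_TARGET_DATASET = "collect data and create a target dataset"
    CLEAN_AND_PREPROCESS = "data cleaning and pre-processing"
    REDUCE_AND_PROJECT = "data reduction and projection"
    CHOOSE_TASK = "choose the data mining task"
    CHOOSE_ALGORITHM = "choose the data mining algorithm"
    DATA_MINING = "modeling / data mining"
    DEPLOY = "deploy the mined knowledge"
    USE_KNOWLEDGE = "incorporate and use the mined knowledge"


# Iterating over an Enum preserves definition order.
ORDER = list(KDDStep)


def kdd_trace(backtrack: dict[KDDStep, KDDStep]) -> list[KDDStep]:
    """Walk the steps in order. When a step is a key of `backtrack`, jump to
    the mapped step; each jump is taken at most once, so the walk terminates."""
    pending = dict(backtrack)
    trace: list[KDDStep] = []
    i = 0
    while i < len(ORDER):
        step = ORDER[i]
        trace.append(step)
        if step in pending:
            i = ORDER.index(pending.pop(step))
        else:
            i += 1
    return trace


trace = kdd_trace({KDDStep.DATA_MINING: KDDStep.CLEAN_AND_PREPROCESS})
print([s.name for s in trace])
```

In the resulting trace, every step from pre-processing up to data mining appears twice, which is the kind of revisiting that "iterative" refers to; a real investigation may of course loop back several times and from any phase.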
  
{{ :ontodmkdd-crispdm.png?300 |}}
In the domain of data mining and knowledge discovery, there are several proposals for standardizing the knowledge discovery process, both for representing and for performing data mining investigations. One of the most prominent is the [[http://www.the-modeling-agency.com/crisp-dm.pdf|CRISP-DM methodology]] (Chapman et al., 1999). CRISP-DM stands for Cross Industry Standard Process for Data Mining. It is a process model that describes data mining investigations performed in practical applications, and it is based on commonly used approaches that expert data miners use to tackle and solve practical data mining problems.
===== Ontology Design =====
The OntoDM-KDD ontology is based on the CRISP-DM process model. Its main goal is to be general enough to represent knowledge discovery processes and data mining investigations performed in practical applications. The OntoDM-KDD ontology provides:
  * a representation of data mining investigations by directly extending classes from the OBI and IAO ontologies;
  * a model of each of the phases (including their inputs and outputs) in a data mining investigation
    * deployment.
  
To ensure the interoperability of OntoDM-KDD with other resources, the ontology follows the [[http://obofoundry.org/crit.shtml|OBO Foundry design principles]]. OntoDM-KDD imports the upper-level classes from the [[http://www.ifomis.org/bfo|Basic Formal Ontology (BFO 1.1)]] and formal relations from the [[http://purl.org/obo/owl/OBO_REL|OBO Relational Ontology (RO)]], and uses an extended set of RO relations. BFO and RO were chosen because they are widely accepted, especially in the biomedical domain.

Following best practices in ontology development, OntoDM-KDD reuses appropriate classes from a set of ontologies that act as mid-level ontologies: the [[http://obi-ontology.org/page/Main_Page|Ontology for Biomedical Investigations (OBI)]], the [[http://code.google.com/p/information-artifact-ontology|Information Artifact Ontology (IAO)]] and the [[http://theswo.sourceforge.net|Software Ontology (SWO)]]. Classes that are referenced and reused in OntoDM-KDD are imported following the [[http://obi-ontology.org/page/MIREOT|Minimum Information to Reference an External Ontology Term (MIREOT)]] principle (Courtot et al., 2011) and extracted with the [[http://ontofox.hegroup.org|OntoFox web service]].
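The idea behind MIREOT can be sketched in a few lines of Python. As we understand the principle, the minimum information needed to reuse an external class is the IRI of the source ontology, the IRI of the term itself, and the IRI of its direct superclass in the importing ontology. The sketch turns such a reference into N-Triples statements. Recording the source with `rdfs:isDefinedBy` is our own simplification (tools such as OntoFox may use different annotation properties), and the example term and superclass are illustrative, not taken from the OntoDM-KDD file.

```python
from dataclasses import dataclass

OBO = "http://purl.obolibrary.org/obo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"


@dataclass(frozen=True)
class MireotReference:
    source_ontology: str    # IRI of the ontology the term comes from
    term: str               # IRI of the external term being reused
    target_superclass: str  # IRI of its direct superclass in the importing ontology

    def to_ntriples(self) -> list[str]:
        """Emit the minimal statements needed to reuse the term (one per line)."""
        def triple(s: str, p: str, o: str) -> str:
            return f"<{s}> <{p}> <{o}> ."
        return [
            triple(self.term, RDF_TYPE, OWL_CLASS),
            triple(self.term, RDFS_SUBCLASS_OF, self.target_superclass),
            # Simplification: the source ontology is recorded via rdfs:isDefinedBy.
            triple(self.term, RDFS_IS_DEFINED_BY, self.source_ontology),
        ]


ice = MireotReference(
    source_ontology=OBO + "iao.owl",
    term=OBO + "IAO_0000030",               # information content entity
    target_superclass=OBO + "BFO_0000031",  # generically dependent continuant
)
for line in ice.to_ntriples():
    print(line)
```

The point of the design is that the importing ontology stays small: it carries only these few statements about each external term, rather than the whole source ontology.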

OntoDM-KDD is expressed in [[http://www.w3.org/TR/owl-guide|OWL-DL]], a de facto standard for representing ontologies. The ontology is being developed using the [[http://protege.stanford.edu|Protege]] ontology editor.
{{ :dminvestigation.png?direct&600 |}}

===== OntoDM-KDD Structure =====
At the top level, the process model of the [[http://www.the-modeling-agency.com/crisp-dm.pdf|CRISP-DM methodology]] is organized into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. For each phase, CRISP-DM defines its outputs and, at the second level, its generic tasks. For example, the data understanding phase consists of four generic tasks: collect initial data, describe data, explore data, and verify data quality. The third level, the level of specialized tasks, describes how the generic tasks should be carried out in specific situations, in terms of activities. For example, the describe data task includes activities for volumetric analysis of data, assessment of attribute types and values, etc. The fourth level, the level of process instances, describes the actions, decisions and results of an actual data mining investigation performed in the domain of interest.
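The four levels can be pictured as nested data. The Python sketch below is hypothetical: only the phases, generic tasks and activities named in the paragraph above are filled in (the remaining entries are deliberately left empty rather than guessed), and the `ProcessInstance` fields and the example result are our own illustration of the fourth level.

```python
from dataclasses import dataclass

# Level 1 (phases) -> level 2 (generic tasks) -> level 3 (specialized activities).
CRISP_DM: dict[str, dict[str, list[str]]] = {
    "business understanding": {},
    "data understanding": {
        "collect initial data": [],
        "describe data": [
            "volumetric analysis of data",
            "assessment of attribute types and values",
        ],
        "explore data": [],
        "verify data quality": [],
    },
    "data preparation": {},
    "modeling": {},
    "evaluation": {},
    "deployment": {},
}


@dataclass(frozen=True)
class ProcessInstance:
    """Level 4: what was actually done in one concrete investigation."""
    phase: str
    generic_task: str
    activity: str
    result: str


def locate(activity: str) -> tuple[str, str]:
    """Return the (phase, generic task) that a specialized activity belongs to."""
    for phase, tasks in CRISP_DM.items():
        for task, activities in tasks.items():
            if activity in activities:
                return phase, task
    raise KeyError(f"unknown activity: {activity}")


def record(activity: str, result: str) -> ProcessInstance:
    """Attach an actual result to its place in the CRISP-DM hierarchy."""
    phase, task = locate(activity)
    return ProcessInstance(phase, task, activity, result)


step = record("volumetric analysis of data", "hypothetical: row and attribute counts")
print(step.phase, "/", step.generic_task)
```

Keeping the first three levels as a fixed template and the fourth as records that point into it mirrors the distinction between the generic process model and an actual investigation.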

For the purpose of representing data mining investigations, it is very important to be able to represent entities that deal with information, such as data, documents, reports, models, algorithms and protocols. We therefore incorporate and further extend some classes of the IAO ontology. IAO is a mid-level ontology describing information content entities (e.g., documents), processes that consume or produce information content entities (e.g., documenting), material bearers of information (e.g., journals), and relations in which one of the relata is an information content entity (e.g., is-about).
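The four kinds of IAO entities mentioned above can be sketched as plain Python data classes. This only illustrates how they relate to each other; the class and attribute names are hypothetical, and this is not how IAO or OntoDM-KDD encode them in OWL.

```python
from dataclasses import dataclass, field


@dataclass
class InformationContentEntity:
    """E.g. data, a document or a report."""
    name: str
    # A relation in which one relatum is an information content entity: 'is about'.
    is_about: list[str] = field(default_factory=list)


@dataclass
class MaterialBearer:
    """A physical thing that carries information, e.g. a printed journal."""
    name: str
    bears: list[InformationContentEntity] = field(default_factory=list)


@dataclass
class InformationProcess:
    """A process that consumes and/or produces information content entities."""
    name: str
    inputs: list[InformationContentEntity] = field(default_factory=list)
    outputs: list[InformationContentEntity] = field(default_factory=list)


def produced_by(entity: InformationContentEntity,
                processes: list[InformationProcess]) -> list[str]:
    """Names of the processes that list the entity among their outputs."""
    return [p.name for p in processes if entity in p.outputs]


dataset = InformationContentEntity("customer dataset", is_about=["customers"])
report = InformationContentEntity("data description report", is_about=["customer dataset"])
documenting = InformationProcess("documenting", inputs=[dataset], outputs=[report])
printout = MaterialBearer("printed report", bears=[report])
print(produced_by(report, [documenting]))
```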

Another important representational aspect is the representation of processes. In OntoDM-KDD, we use and further extend classes from the OBI process taxonomy, which includes general processes such as documenting, planning and validation. The OBI ontology aims to provide a standard for the representation of biological and biomedical investigations. It supports consistent annotation of biomedical investigations regardless of the particular field of study, and is fully compliant with the existing formalisms in biomedical domains (Brinkman et al., 2010). In addition, OBI defines an investigation as a process with several parts, including planning an overall study design, executing the designed study, and documenting the results. Finally, OntoDM-KDD includes the SWO class //information processing//, which represents processes in which input information is analysed or transformed in order to produce output information.
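The part-whole structure of an investigation, as OBI describes it, can be sketched as a small tree of processes. The labels and the nesting of //information processing// under study execution are illustrative assumptions, and the `parts` attribute only stands in for a has-part relation; this is not the OntoDM-KDD encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    label: str
    parts: list["Process"] = field(default_factory=list)  # stands in for 'has part'

    def all_parts(self) -> list[str]:
        """Labels of all direct and indirect parts, depth-first."""
        labels: list[str] = []
        for part in self.parts:
            labels.append(part.label)
            labels.extend(part.all_parts())
        return labels


investigation = Process("data mining investigation", parts=[
    Process("planning the overall study design"),
    Process("executing the designed study", parts=[Process("information processing")]),
    Process("documenting the results"),
])
print(investigation.all_parts())
```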
==== Description Layers ====

In OntoDM-KDD, we distinguish two description layers, based on the mid-level ontologies that the ontology extends (see the description layers figure). The first layer is the specification layer, which deals with the information entities needed to describe and represent data mining investigations. The second layer is the application layer, which deals with processual entities in order to represent the processes that occur in a data mining investigation.
=== Specification Layer ===
The specification layer consists of classes that extend the IAO class //information content entity//. At the top level, it includes classes such as //data item//, //directive information entity//, //document//, //document part// and //textual entity//. The //directive information entity// class is further extended with //action specification//, //data format specification//, //objective specification// and //plan specification//. In addition, we reuse the //study design// and //protocol// classes from the OBI ontology.
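The class hierarchy of the specification layer can be written as a child-to-parent map and queried for subsumption. This is a sketch using the class labels from the paragraph above; it omits the reused OBI //study design// and //protocol// classes, whose exact position is not stated here, and the helper functions are hypothetical.

```python
# Child -> parent links in the specification layer (labels from the text).
SUPERCLASS: dict[str, str] = {
    "data item": "information content entity",
    "directive information entity": "information content entity",
    "document": "information content entity",
    "document part": "information content entity",
    "textual entity": "information content entity",
    "action specification": "directive information entity",
    "data format specification": "directive information entity",
    "objective specification": "directive information entity",
    "plan specification": "directive information entity",
}


def ancestors(cls: str) -> list[str]:
    """All superclasses of `cls`, nearest first."""
    chain: list[str] = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        chain.append(cls)
    return chain


def is_a(cls: str, other: str) -> bool:
    """True if `cls` equals `other` or is transitively subsumed by it."""
    return cls == other or other in ancestors(cls)


print(ancestors("plan specification"))
```

In the OWL ontology this reasoning is done by a DL reasoner over `subClassOf` axioms; the dictionary only mimics the single-inheritance case.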
=== Application Layer ===
The application layer consists of classes that are extensions of the OBI class //planned process//. These include general classes of processes such as //validation//, //planning//, //interpreting data//, //information processing//, //selection//, //identification//, //documenting// and //acquisition//. Finally, the application layer includes the OBI //investigation// class, which we further extend to define and represent a //data mining investigation//.
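Because each layer is rooted in a single top class, the layer of a class can be found by walking up to its root. The Python sketch below uses the application-layer classes listed above plus a few specification-layer classes for contrast; the map and the `layer_of` helper are illustrative, and placing //investigation// directly under //planned process// is an assumption made for brevity.

```python
SUPERCLASS: dict[str, str] = {
    # Application layer: extensions of the OBI class 'planned process'.
    "validation": "planned process",
    "planning": "planned process",
    "interpreting data": "planned process",
    "information processing": "planned process",
    "selection": "planned process",
    "identification": "planned process",
    "documenting": "planned process",
    "acquisition": "planned process",
    "investigation": "planned process",  # assumption: direct subclass
    "data mining investigation": "investigation",
    # A few specification-layer classes, for contrast.
    "directive information entity": "information content entity",
    "plan specification": "directive information entity",
    "document": "information content entity",
}

LAYER_OF_ROOT = {
    "planned process": "application",
    "information content entity": "specification",
}


def root(cls: str) -> str:
    """Follow parent links until a class with no recorded parent is reached."""
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
    return cls


def layer_of(cls: str) -> str:
    """Description layer of a class, decided by its root (KeyError if neither)."""
    return LAYER_OF_ROOT[root(cls)]


for c in ("data mining investigation", "plan specification"):
    print(c, "->", layer_of(c))
```

A class that reaches neither root raises `KeyError`, which is one simple way to flag a class that has not been placed in either layer.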

====== Versions and Download ======
==== Release version 1 ====

  * [[http://ontodm.com/ontodm-kdd/OntoDM-KDD.owl|OntoDM-KDD.owl]]
===== Publications =====
  * Panče Panov, Larisa Soldatova, Sašo Džeroski. [[http://link.springer.com/chapter/10.1007/978-3-642-40897-7_9|OntoDM-KDD: Ontology for Representing the Knowledge Discovery Process]]. Discovery Science 2013, Lecture Notes in Computer Science, Vol. 8140, pp. 126-140, 2013.
  * Panče Panov. [[https://www.dropbox.com/s/0w1gwjja76sipgi/PanovPhD2012.pdf|A Modular Ontology of Data Mining]]. Doctoral Thesis, Jožef Stefan International Postgraduate School, 2012 (Chapter 7).
===== OntoDM-KDD@Bioportal =====
[[http://bioportal.bioontology.org/ontologies/47620]]
