ontodm-core [2014/09/15 10:23]
admin [Data mining task]
ontodm-core [2016/04/12 10:56] (current)
admin [Release version 1]
Line 22: Line 22:
 For representing the mining of structured data, we import the [[ontodt|OntoDT ontology of datatypes]]. Classes that are referenced and reused in OntoDM-core are imported into the ontology by using the [[http://obi-ontology.org/page/MIREOT|Minimum Information to Reference an External Ontology Term (MIREOT) principle]] and extracted using the [[http://ontofox.hegroup.org|OntoFox]] web service.
 =====Ontology Structure=====
-For the domain of DM, we propose a horizontal description structure that includes three layers:
+For the domain of DM, we propose a [[layers|horizontal description structure that includes three layers]]
   * a specification layer,
   * an implementation layer, and
Line 28: Line 28:
 Having all three layers represented separately in the ontology will facilitate different uses of the ontology. For example, the specification layer can be used to reason about data mining algorithms; the implementation layer can be used for search over implementations of data mining algorithms and to compare various implementations; and the application layer can be used for searching through executions of data mining algorithms.
  
-This description structure is based on the use of the upper-level ontology [[http://​www.ifomis.org/​bfo/​|BFO]] and the extensive reuse of classes from the mid-level ontologies [[http://​obi-ontology.org/​page/​Main_Page|OBI]] and [[https://​code.google.com/​p/​information-artifact-ontology/​|IAO]]. The proposed three layer description structure is orthogonal to the vertical ontology architecture which comprises ​ an:  
-  * upper-level, ​ 
-  * a mid-level, and  
-  * a domain level. ​ 
-This means that each vertical level contains all three description layers. ​ 
-==== Layers ==== 
-{{ ::​fig1-page1.png?​400|}} 
-The specification layer contains //BFO: generically dependent continuants//​ at the upper-level,​ and //IAO: information content entities// at the mid-level. In the domain of data mining, example classes are //data mining task// and //data mining algorithm//​. ​ 
  
-The implementation layer describes //BFO: specifically dependent continuants//,​ such as //BFO: realizable entities// (entities that are executable in a process). At the domain level, this layer contains classes that describe the implementations of algorithms. ​ 
- 
-The application layer contains classes that aim at representing processes, e.g., extensions of //BFO: processual entity//. Examples of (planned) process entities in the domain of data mining are the execution of a data mining algorithm and the application of a generalization on new data, among others. 
-==== Relations between layers ==== 
- 
-The entities in each layer are connected using general relations, that are layer independent,​ and layer specific relations. Examples of general relations are //is-a// and //​part-of//:​ they can only be used to relate entities from the same description layer. For example, an information entity (member of the specification layer) can not have as parts processual entities (members of the application layer). Layer specific relations can be used only with entities from a specific layer. For example, the relation //​precedes//​ is only used to relate two processual entities. The description layers are connected using cross-layer relations. An entity from the specification layer //​is-concretized-as//​ an entity from the implementation layer. Next, an implementation entity //​is-realized-by//​ an application entity. Finally, an application entity, e.g., a planned process //​achieves-planned-objective//,​ which is a specification entity. 
 =====Key OntoDM-core classes=====
-The ontology includes the representation of the following entities: data specification and dataset, data mining task, generalization, data mining algorithm, constraints and constraint based data mining tasks and algorithms, and data mining scenario.
-{{ :ontodm-coreentities.png?direct&600 |}}
-
-==== Data ====
-The main ingredient in the process of data mining is the data. In OntoDM-core, we model the data with a //data specification// entity that describes the datatype of the underlying data. For this purpose, we import the mechanism for representing arbitrarily complex datatypes from [[ontodt|OntoDT ontology]].
-=== Descriptive and output data specification ===
-
-In OntoDM-core, we distinguish between a //descriptive data specification//, that specifies the data used for descriptive purposes (e.g., in the clustering and pattern discovery), and //output data specification//, that specifies the data used for output purposes (e.g., classes/targets in predictive modeling). A tuple of primitives or a graph with boolean edges and discrete nodes are examples of data specified only by a descriptive specification. Feature-based data with primitive output and feature-based data with structured output are examples of data specified by both descriptive and output specifications.
-
-=== Dataset===
-
-OntoDM-core imports the IAO class dataset (defined as `a data item that is an aggregate of other data items of the same type that have something in common') and extends it by further specifying that a //DM dataset// has part //data examples//.
-
-OntoDM-core also defines the class //dataset specification// to enable reasoning about data and datasets. It specifies the type of the dataset based on the type of data it contains. Using data specifications and the taxonomy of datatypes from the [[ontodt|OntoDT ontology]], in OntoDM-core we build a taxonomy of datasets.
+The ontology includes the representation of the following entities:
+{{ :ontodm-coreentities.png?600|}}
+  * [[data|data]]
+  * [[data mining task|data mining task]]
+  * [[generalization|generalization]]
+  * [[data mining algorithm|data mining algorithm]]
+  * [[constraints|constraints and constraint based data mining tasks and algorithms]], and
+  * [[data mining scenario|data mining scenario]].
  
-==== Data mining task ====
+===== Ontology evaluation =====
+We assess the quality of OntoDM-core from three different evaluation aspects:
+  * [[ontology metrics|we analyze a set of ontology metrics]];
+  * [[design criteria assessment|assess how well the ontology meets a set of predefined design criteria and ontology best practices]]; and
+  * [[competency questions assessment|assess the ontology toward a set of competency questions]].
  
-The task of data mining is to produce a generalization from given data. In OntoDM-core,​ we use the term generalization to denote the outcome of a data mining task. A //data mining task// is defined as sub-class of the IAO class //objective specification//​. It is an objective specification that specifies the objective that a data mining algorithm needs to achieve when executed on a dataset to produce as output a generalization. 
  
-=== Taxonomy of data mining tasks === 
  
-The definition of a data mining task depends directly ​ on the data specification,​ and indirectly on the datatype of the data at hand. This allows us to form a taxonomy of data mining tasks based on the type of data. Dzeroski (2006) proposes four basic classes of data mining tasks based on the generalizations that are produced as output: //​clustering//,​ //pattern discovery//,​ //​probability distribution estimation//,​ and //​predictive modeling//. These classes of tasks are included as the first level of the OntoDM-core data mining task taxonomy. They are fundamental and can be defined on an arbitrary type of data. An exception is the predictive modeling task that is defined on a pair of datatypes (for the descriptive and output data separately). At the next levels, the taxonomy of data mining task depends on the datatype of the descriptive data (in the case of predictive modeling also on the datatype of the output data). ​ 
  
-=== Taxonomy of predictive modeling tasks === 
  
-If we focus only on the predictive modeling task and using the output data specification as a criterion, we distinguish between the //primitive output prediction task// and the //​structured output prediction task//. In the first case, the output datatype is primitive (e.g., discrete, boolean or real); in the second case, it is some structured datatype (such as a tuple, set, sequence or graph). 
  
-{{ ::​fig3-page1.png?​600 |}} 
  
-== Primitive output prediction tasks == 
-//Primitive output prediction tasks// can be feature-based or structure-based,​ depending on the datatype of the descriptive part. The //​feature-based primitive output prediction tasks// have a tuple of primitives (a set of primitive features) on the description side and a primitive datatype on the output side. This is the most exploited data mining task in traditional single-table data mining, described in all major data mining textbooks. If we specify the output datatype in more detail, we have the //binary classification task//, the //​multi-class classification task// and the //​regression task//; where the output datatype is boolean, discrete or real, respectively. //​Structure-based primitive output prediction//​ tasks operate on data that have some structured datatype (other than tuple of primitives) on the description side and a primitive datatype on the output side. 
  
-== Structured output prediction tasks == 
-In a similar way, //​structured output prediction tasks// can be feature-based or structure-based. //​Feature-based structured output prediction tasks// operate on data that have a tuple of primitives on the description side and a structured datatype on the output side. //​Structure-based structured output prediction tasks// operate on data that have structured datatypes both on the description side and the output side. 
  
-If we focus just on feature-based structured output tasks and further specify a structured output datatype, we can represent a variety of structured output prediction tasks. For example, we can represent the following tasks: //​multi-target prediction//​ (which has as output datatype //tuple of primitives//​),​ //​multi-label classification//​ (having as output datatype //set of discrete//​),​ //​time-series prediction//​ (having as output datatype //sequence of real//) and //​hierarchical classification//​ (having as output datatype //labeled graph with boolean edges and discrete nodes//). //​Multi-target prediction//​ can be further divided into: //​multi-target binary classification//,​ //​multi-target multi-class classification//,​ and //​multi-target regression//​. 
  
  
Line 88: Line 61:
 {{ :file_structure.jpg?direct&300|}}
   * All files in one zip archive{{:ontodm_v_1_r.zip|OntoDM-coreV1.zip}}
-  * OntoDM-core main file [[http://kt.ijs.si/panovp/OntoDM/OntoDM.owl|OntoDM-core.owl]]
-  * File that OntoDM-core imports directly and contains external classes [[http://kt.ijs.si/panovp/OntoDM/external.owl|external.owl]]
-  * File that external file imports and contains OBI classes[[http://kt.ijs.si/panovp/OntoDM/external-OBI.owl|external-OBI.owl]]
-  * OntoDT ontology of datatypes [[http://kt.ijs.si/panovp/OntoDM/OntoDT.owl|OntoDT.owl]]
+  * OntoDM-core main file [[http://ontodm.com/ontodm-core/OntoDM.owl|OntoDM-core.owl]]
+  * File that OntoDM-core imports directly and contains external classes [[http://ontodm.com/ontodm-core/external.owl|external.owl]]
+  * File that external file imports and contains OBI classes [[http://ontodm.com/ontodm-core/external-OBI.owl|external-OBI.owl]]
+  * OntoDT ontology of datatypes [[http://ontodm.com/ontodm-core/OntoDT.owl|OntoDT.owl]]
   * {{:clus_instances.owl}}
   * {{:clus_inferred.owl}}
