Data mining algorithm

A data mining algorithm is an algorithm, implemented in a computer program, that is designed to solve a data mining task. It takes as input a dataset of examples of a given datatype and produces as output a generalization (from a given class) on that datatype. A specific data mining algorithm can typically handle examples of only a limited set of datatypes: for example, a rule learning algorithm might handle only tuples of Boolean attributes and a Boolean class.
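To make the input/output contract concrete, here is a minimal sketch of a toy rule learner restricted to tuples of Boolean attributes and a Boolean class. It is not an algorithm described by OntoDM-core: the function name `learn_single_attribute_rule`, the `Example` type alias, and the sample dataset are assumptions made purely for illustration.

```python
from typing import List, Tuple

# One example: (tuple of Boolean attributes, Boolean class). Illustrative type.
Example = Tuple[Tuple[bool, ...], bool]


def learn_single_attribute_rule(dataset: List[Example]) -> int:
    """Return the index of the attribute whose value most often equals the class.

    The returned index stands for the generalization
    "IF attribute[i] THEN class = True, ELSE class = False".
    """
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    n_attributes = len(dataset[0][0])

    def agreement(i: int) -> int:
        # Count the examples on which attribute i agrees with the class.
        return sum(1 for attrs, label in dataset if attrs[i] == label)

    return max(range(n_attributes), key=agreement)


# Hypothetical dataset: attribute 0 always matches the class, attribute 1 does not.
dataset = [
    ((True, False), True),
    ((True, True), True),
    ((False, True), False),
    ((False, False), False),
]
best = learn_single_attribute_rule(dataset)
print(f"IF attribute[{best}] THEN class = True")
```

The learner accepts only this one datatype, which mirrors the point that a specific algorithm handles a limited set of datatypes.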

In the OntoDM-core ontological framework, we consider three aspects of the data mining (DM) algorithm entity: a DM algorithm (as a specification), a DM algorithm implementation, and a DM algorithm execution.

Data mining algorithm as a specification

Data mining algorithm as a specification is a subclass of the IAO class plan specification. It has as parts a data mining task, an action specification (reused from IAO), a generalization specification, and a document (reused from IAO). The data mining task defines the objective that the realized plan should fulfill, namely producing a generalization as output, while the action specification describes the actions of the data mining algorithm that are realized during execution. The generalization specification denotes the type of generalization produced by executing the algorithm. Finally, having a document class as a part allows us to connect the algorithm to the annotations of documents (journal articles, workshop articles, technical reports) that publish knowledge about the algorithm.
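The four parts of the specification can be sketched as a plain record. This is only an illustration of the part-whole structure, not the ontology's own encoding: the class name, the field names, and all field values below are assumptions, and the document entry is a placeholder rather than a real citation.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class DataMiningAlgorithmSpecification:
    """Illustrative stand-in for 'data mining algorithm as a specification'."""
    data_mining_task: str              # objective the realized plan should fulfill
    action_specification: str          # actions realized during execution
    generalization_specification: str  # type of generalization produced
    documents: Tuple[str, ...] = ()    # documents publishing knowledge about it


# Hypothetical values, chosen only to show how the parts fit together.
spec = DataMiningAlgorithmSpecification(
    data_mining_task="predict a Boolean class from Boolean attributes",
    action_specification="search for the rule that agrees most with the class",
    generalization_specification="single-attribute rule",
    documents=("<placeholder for a journal article about the algorithm>",),
)

# List the parts of the specification by name.
part_names = [f.name for f in fields(spec)]
print(part_names)
```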

By analogy with the taxonomies of datasets, data mining tasks, and generalizations, in OntoDM-core we also construct a taxonomy of data mining algorithms. As classification criteria, we use the data mining task and the generalization produced as the output of the execution of the algorithm.

Data mining algorithm implementation

Data mining algorithm implementation is defined as a subclass of the BFO class realizable entity. It is a concretization of a data mining algorithm, in the form of a runnable computer program, and has parameters as qualities. The parameters of the algorithm affect its behavior when the algorithm implementation is used as an operator. A parameter itself is specified by a parameter specification that includes its name and description.
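As a rough sketch, an implementation and its parameter specifications might be modeled as follows. The class names mirror the ontology terms, but the field names and the two example parameters (`max_rules`, `min_coverage`) are hypothetical and are not taken from OntoDM-core.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParameterSpecification:
    """Specifies a parameter by its name and description."""
    name: str
    description: str


@dataclass(frozen=True)
class DataMiningAlgorithmImplementation:
    """A runnable concretization of an algorithm; parameters act as its qualities."""
    concretizes: str  # name of the algorithm being concretized (illustrative)
    parameters: Tuple[ParameterSpecification, ...]

    def parameter_names(self) -> Tuple[str, ...]:
        return tuple(p.name for p in self.parameters)


# Hypothetical implementation with two hypothetical parameters.
implementation = DataMiningAlgorithmImplementation(
    concretizes="rule learning algorithm",
    parameters=(
        ParameterSpecification("max_rules", "upper bound on the number of learned rules"),
        ParameterSpecification("min_coverage", "minimum number of examples a rule must cover"),
    ),
)
print(implementation.parameter_names())
```

Note that the specification carries only the name and description of each parameter; concrete values are not part of it.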

Data mining software

In OntoDM-core, we define data mining software as a subclass of directive information entity (reused from IAO). It represents a specification of a data mining algorithm implementation. It has as parts all the meta-information entities about the software implementation, such as source code, software version specification, programming language, software compiler specification, software manufacturer, and the data mining software toolkit it belongs to. Finally, a data mining software toolkit is a specification entity that contains as parts data mining software entities.

Data mining operator

Data mining operator is defined as a subclass of the BFO class role. In that context, it is a role of a data mining algorithm implementation that is realized (executed) by a data mining algorithm execution process. A data mining operator has information about the specific parameter setting of the algorithm, in the context of the realization of the operator in the process of execution. The parameter setting is a subclass of data item (reused from IAO), which is a quality specification of a parameter.
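One way to picture the operator is as an implementation paired with concrete parameter settings. The sketch below is an assumption-laden illustration: the class and field names, the rule that settings must refer to declared parameters, and the parameter names are all hypothetical rather than prescribed by OntoDM-core.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass
class DataMiningOperator:
    """Role of an algorithm implementation, carrying its parameter settings."""
    implementation_name: str
    declared_parameters: FrozenSet[str]  # names taken from parameter specifications
    parameter_settings: Dict[str, Any]   # concrete values used in one execution

    def __post_init__(self) -> None:
        # Assumed consistency rule: every setting must refer to a declared parameter.
        unknown = set(self.parameter_settings) - self.declared_parameters
        if unknown:
            raise ValueError(f"settings for undeclared parameters: {sorted(unknown)}")


# Hypothetical operator built from a hypothetical implementation.
operator = DataMiningOperator(
    implementation_name="toy rule learner",
    declared_parameters=frozenset({"max_rules", "min_coverage"}),
    parameter_settings={"max_rules": 10},
)
print(operator.parameter_settings)
```

The same implementation can thus play the operator role many times, each time with a different parameter setting.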

Data mining algorithm execution

In OntoDM-core, we define data mining algorithm execution as a subclass of planned process (reused from the OBI ontology). A data mining algorithm execution realizes (executes) a data mining operator, has as input a dataset, has as output a generalization, has as agent a computer, and achieves as a planned objective a data mining task.
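The relations of an execution can be sketched as a record produced by running an operator on a dataset. The field names follow the relations named above, while the helper `execute`, the toy `majority_class` function standing in for an algorithm implementation, and all concrete values are assumptions for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# One example: (tuple of Boolean attributes, Boolean class). Illustrative type.
Example = Tuple[Tuple[bool, ...], bool]


@dataclass(frozen=True)
class DataMiningAlgorithmExecution:
    realizes: str                    # the data mining operator
    has_input: List[Example]         # the dataset
    has_output: Any                  # the generalization
    has_agent: str                   # the computer
    achieves_planned_objective: str  # the data mining task


def majority_class(dataset: List[Example]) -> bool:
    """Toy stand-in for an implementation: return the most frequent class."""
    return Counter(label for _, label in dataset).most_common(1)[0][0]


def execute(operator: str, run: Callable[[List[Example]], Any],
            dataset: List[Example], task: str, agent: str) -> DataMiningAlgorithmExecution:
    """Run the operator on the dataset and record the execution's relations."""
    generalization = run(dataset)
    return DataMiningAlgorithmExecution(operator, dataset, generalization, agent, task)


# Hypothetical dataset, operator name, task label, and agent name.
dataset = [((True,), True), ((False,), True), ((False,), False)]
execution = execute("majority class operator", majority_class, dataset,
                    task="classification", agent="computer-1")
print(execution.has_output)
```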
