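Data mining deals with the kinds of patterns that can be mined. On the basis of the kind of data to be mined, data mining functions fall into two categories: descriptive functions, and classification and prediction functions.
Descriptive functions deal with the general properties of the data in the database. The descriptive functions are class/concept description, mining of frequent patterns, mining of associations, mining of correlations, and mining of clusters.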
Class/concept description refers to the data associated with classes or concepts. For example, in a company, the classes of items for sale include computers and printers, and the concepts of customers include big spenders and budget spenders. Such descriptions of a class or a concept are called class/concept descriptions. They can be derived in the following two ways −
Data Characterization − This refers to summarizing the data of the class under study. The class under study is called the target class.
Data Discrimination − This refers to comparing the general features of the target class with those of one or more predefined contrasting classes.
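As a minimal sketch of the two approaches, the snippet below characterizes a target class by summarizing its attributes and then discriminates it from a contrasting class by putting the same summaries side by side. The customer records and the function names `characterize` and `discriminate` are hypothetical, chosen only to echo the big spender / budget spender example; real systems summarize many more attributes and often use generalization rather than plain means.

```python
from statistics import mean

# Hypothetical customer records: (customer_class, age, annual_spend)
customers = [
    ("big_spender", 45, 5200.0),
    ("big_spender", 38, 4800.0),
    ("big_spender", 52, 6100.0),
    ("budget_spender", 24, 900.0),
    ("budget_spender", 31, 1200.0),
]

def characterize(records, target):
    """Data characterization: summarize the records of the target class."""
    rows = [r for r in records if r[0] == target]
    return {
        "count": len(rows),
        "mean_age": mean(r[1] for r in rows),
        "mean_spend": mean(r[2] for r in rows),
    }

def discriminate(records, target, contrast):
    """Data discrimination: compare the target class with a contrasting class,
    returning (target_value, contrast_value) for each summarized attribute."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: (t[key], c[key]) for key in ("mean_age", "mean_spend")}

print(characterize(customers, "big_spender"))
print(discriminate(customers, "big_spender", "budget_spender"))
```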
Frequent patterns are patterns that occur frequently in transactional data. Here is the list of kinds of frequent patterns −
Frequent Itemset − This refers to a set of items that frequently appear together, for example, milk and bread.
Frequent Subsequence − This is a sequence of patterns that occurs frequently, such as purchasing a camera followed by purchasing a memory card.
Frequent Substructure − A substructure refers to a structural form, such as a graph, tree, or lattice, which may be combined with itemsets or subsequences.
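To make "frequently appear together" concrete, the sketch below counts the support of every itemset (up to a hypothetical `max_size`) in a handful of made-up baskets and keeps those at or above a minimum support. This is deliberately a brute-force count, not the Apriori or FP-growth algorithms used in practice, which avoid enumerating every candidate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "biscuits"},
    {"milk", "bread", "biscuits"},
    {"milk", "eggs"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force frequent itemset mining: count every itemset of size
    1..max_size that occurs in each transaction, then keep the itemsets whose
    support (fraction of transactions containing them) is >= min_support."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

print(frequent_itemsets(transactions, min_support=0.6))
```

With these baskets, milk and bread are each frequent on their own, and the pair (bread, milk) is also frequent because it appears in three of the five transactions.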
Associations are used in retail sales to identify items that are frequently purchased together. This process refers to uncovering the relationships among data and determining association rules.
For example, a retailer generates an association rule showing that 70% of the time milk is sold with bread, and only 30% of the time biscuits are sold with bread.
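The percentages in such a rule are usually the rule's confidence: among the baskets that contain the antecedent, the fraction that also contain the consequent. The sketch below assumes the 70% figure is the confidence of the rule bread => milk, and builds hypothetical baskets that reproduce the numbers in the example.

```python
# Hypothetical baskets chosen to mirror the example: every basket contains
# bread, 7 of 10 also contain milk, and 3 of 10 also contain biscuits.
transactions = [{"bread", "milk"} for _ in range(7)] + [
    {"bread", "biscuits"} for _ in range(3)
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent, computed as
    support(antecedent and consequent) / support(antecedent)."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )

print(confidence({"bread"}, {"milk"}, transactions))      # 0.7
print(confidence({"bread"}, {"biscuits"}, transactions))  # 0.3
```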
Mining of correlations is a kind of additional analysis performed to uncover interesting statistical correlations between associated attribute-value pairs or between two itemsets, in order to determine whether they have a positive, a negative, or no effect on each other.
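One common way to measure such a correlation between two itemsets is lift, shown below as a sketch: lift(A, B) = P(A and B) / (P(A) · P(B)), where a value above 1 suggests a positive correlation, below 1 a negative one, and exactly 1 independence. Lift is only one of several measures (the chi-square test is another); the baskets and the tolerance `tol` are hypothetical.

```python
# Hypothetical baskets
transactions = (
    [{"milk", "bread"} for _ in range(4)]
    + [{"milk"}, {"bread"}]
    + [{"biscuits"} for _ in range(4)]
)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(a, b, transactions):
    """lift(A, B) = P(A and B) / (P(A) * P(B)); 1 means independence."""
    return support(a | b, transactions) / (
        support(a, transactions) * support(b, transactions)
    )

def correlation_kind(value, tol=1e-9):
    """Map a lift value to a positive, negative, or no correlation."""
    if value > 1 + tol:
        return "positive"
    if value < 1 - tol:
        return "negative"
    return "none"

for a, b in [({"milk"}, {"bread"}), ({"milk"}, {"biscuits"})]:
    value = lift(a, b, transactions)
    print(sorted(a | b), round(value, 2), correlation_kind(value))
```

Here milk and bread each occur in half of the baskets but together in 40% of them, giving a lift of 1.6 (positive), while milk and biscuits never co-occur, giving a lift of 0 (negative).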
A cluster is a group of similar objects. Cluster analysis refers to forming groups of objects that are very similar to each other but highly different from the objects in other clusters.
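A widely used cluster analysis technique is k-means, sketched below in plain Python: each point is assigned to its nearest centroid, each centroid moves to the mean of its points, and the loop stops when assignments no longer change. The 2-D points are hypothetical, and initializing the centroids with the first `k` points is a simplifying assumption (libraries typically use smarter seeding such as k-means++). `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means on tuples of coordinates using Euclidean distance.
    Returns (labels, centroids). Initial centroids are the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical points forming two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Objects within each resulting cluster are close to one another and far from the objects in the other cluster, which is exactly the property cluster analysis aims for.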
Classification is the process of finding a model that describes the data classes or concepts. The purpose is to use this model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data, and it can be presented in several forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks.
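As a small illustration of learning a model from training data and then applying it to unlabeled objects, the sketch below learns a one-attribute threshold rule (a decision stump) and prints it as an IF-THEN rule. The training data, the labels, and the functions `train_stump` and `predict` are hypothetical; real classifiers consider many attributes and guard against overfitting.

```python
# Hypothetical labeled training data: (annual_spend, class_label)
training = [
    (900.0, "budget_spender"),
    (1200.0, "budget_spender"),
    (1500.0, "budget_spender"),
    (4800.0, "big_spender"),
    (5200.0, "big_spender"),
    (6100.0, "big_spender"),
]

def train_stump(samples, positive="big_spender"):
    """Learn threshold t for the rule IF spend >= t THEN positive ELSE other,
    picking t among midpoints of the sorted values to minimize training errors."""
    values = sorted(v for v, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        # A sample is misclassified when the rule's output disagrees with its label
        return sum((v >= t) != (label == positive) for v, label in samples)

    return min(candidates, key=errors)

def predict(spend, threshold):
    """Apply the learned rule to an object whose class label is unknown."""
    return "big_spender" if spend >= threshold else "budget_spender"

threshold = train_stump(training)
print(f"IF annual_spend >= {threshold} THEN big_spender ELSE budget_spender")
print(predict(2000.0, threshold), predict(7000.0, threshold))
```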
The functions involved in classification and prediction are as follows −
Classification − It predicts the class of objects whose class label is unknown. Its objective is to find a derived model that describes and distinguishes data classes or concepts. The derived model is based on the analysis of a set of training data, i.e., data objects whose class labels are known.
Prediction − It is used to predict missing or unavailable numerical data values rather than class labels. Regression analysis is generally used for prediction. Prediction can also be used to identify distribution trends based on the available data.
Outlier Analysis − Outliers may be defined as data objects that do not comply with the general behavior or model of the available data.
Evolution Analysis − Evolution analysis refers to describing and modeling regularities or trends for objects whose behavior changes over time.
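The last three functions can be tied together in one small sketch: an ordinary least-squares line fitted to a hypothetical monthly series predicts the next numeric value (prediction), its slope describes the trend over time (evolution analysis), and points whose residual lies more than a chosen number of standard deviations from the fit are flagged (outlier analysis). The data, the `z_threshold` of 2.0, and the residual z-score rule are assumptions for illustration; note that an outlier also pulls the fitted line toward itself, so robust regression is often preferred in practice.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def residual_outliers(xs, ys, intercept, slope, z_threshold=2.0):
    """Return the x values whose residual is more than z_threshold
    (population) standard deviations away from the fitted line."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = pstdev(residuals)
    return [x for x, r in zip(xs, residuals) if sd > 0 and abs(r) / sd > z_threshold]

# Hypothetical monthly sales; month 5 is an unusual spike
months = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [10, 12, 14, 16, 40, 20, 22, 24]

intercept, slope = fit_line(months, sales)
print("trend per month:", round(slope, 3))
print("predicted sales for month 9:", round(intercept + slope * 9, 2))
print("outlier months:", residual_outliers(months, sales, intercept, slope))
```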
Data mining task primitives allow us to communicate interactively with the data mining system. Here is the list of data mining task primitives −
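Set of task-relevant data to be mined − This is the portion of the database in which the user is interested, such as the relevant database attributes or the data warehouse dimensions of interest.
Kind of knowledge to be mined − This refers to the kind of function to be performed, such as characterization, discrimination, association and correlation analysis, classification, prediction, clustering, outlier analysis, or evolution analysis.
Background knowledge − Background knowledge allows data to be mined at multiple levels of abstraction. Concept hierarchies, for example, are one kind of background knowledge.
Interestingness measures and thresholds − These are used to evaluate the patterns discovered by the knowledge discovery process. Different kinds of knowledge call for different interestingness measures.
Representation for visualizing the discovered patterns − This refers to the form in which the discovered patterns are to be displayed. These representations may include rules, tables, charts, graphs, decision trees, and cubes.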
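One way to picture the task primitives is as a single query specification handed to a mining system. The sketch below bundles them into a hypothetical `MiningTask` dataclass (not the API of any real system) and shows background knowledge at work: a concept hierarchy, stored here as a simple child-to-parent mapping, rolls city-level values up to the country level so the data can be mined at a higher level of abstraction. All field names and values are assumptions.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    """Hypothetical container for the data mining task primitives."""
    task_relevant_data: list[str]   # attributes or dimensions of interest
    kind_of_knowledge: str          # e.g. "characterization", "association"
    # Concept hierarchy as child -> parent, e.g. city -> country
    background_knowledge: dict[str, str] = field(default_factory=dict)
    # Interestingness thresholds, e.g. {"min_support": 0.2}
    interestingness: dict[str, float] = field(default_factory=dict)
    presentation: str = "table"     # e.g. "rules", "table", "chart"

def roll_up(values, hierarchy):
    """Count values one level higher in the concept hierarchy;
    values with no parent in the hierarchy are kept as they are."""
    return Counter(hierarchy.get(v, v) for v in values)

task = MiningTask(
    task_relevant_data=["city"],
    kind_of_knowledge="characterization",
    background_knowledge={"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"},
    interestingness={"min_support": 0.2},
    presentation="table",
)

# Hypothetical sales records, one city per sale
sales_by_city = ["Toronto", "Chicago", "Vancouver", "Toronto", "Chicago"]
print(roll_up(sales_by_city, task.background_knowledge))
```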