Data mining is the process of analyzing data from different sources and summarizing it into relevant information that can be used to help increase revenue and decrease costs. Its primary purpose is to find correlations or patterns among dozens of fields in large databases.
Data mining software is one of many analytical tools for analyzing data. It allows users to view data from many different angles, categorize it, and summarize the relationships identified. The ultimate goal of data mining is prediction and discovery. The process searches for consistent patterns and systematic relationships between variables, then validates the findings by applying the detected patterns to new subsets of data.
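The search-then-validate idea can be illustrated with a small sketch. It assumes synthetic data (a made-up ad_spend and revenue relationship), a hand-written Pearson correlation as the "pattern," and an arbitrary tolerance of 0.1 for deciding that the pattern holds on new data. None of these choices come from a particular data mining product.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Synthetic records (an assumption for illustration): revenue rises with
# ad_spend, plus random noise. The fixed seed keeps the run repeatable.
rng = random.Random(0)
records = []
for _ in range(200):
    ad_spend = rng.uniform(0, 100)
    revenue = 3.0 * ad_spend + rng.gauss(0, 5)
    records.append((ad_spend, revenue))

# Search for the pattern on one subset, then apply it to a subset
# that played no part in finding it.
discovery, holdout = records[:150], records[150:]
r_discovery = pearson(*zip(*discovery))
r_holdout = pearson(*zip(*holdout))

# The tolerance is an arbitrary illustrative threshold.
pattern_holds = abs(r_discovery - r_holdout) < 0.1
print(f"discovery r={r_discovery:.3f}, holdout r={r_holdout:.3f}, holds={pattern_holds}")
```

A pattern that appears in the discovery subset but weakens sharply on the held-out subset is more likely an accident of the sample than a systematic relationship.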
Data mining consists of five major elements:
I. Extract, transform, and load transaction data into a data warehouse
II. Store and manage the data in a multidimensional database system
III. Provide data access to business analysts and IT professionals
IV. Analyze the data with application software
V. Present the data in a useful format (graph, table, etc.)
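The five elements above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the transaction rows are invented, an in-memory SQLite database stands in for the data warehouse (it is a relational store, not a true multidimensional system), and the "analysis" is a simple grouped total.

```python
import sqlite3

# Hypothetical raw transactions: (store, product, amount as untidy text).
raw_transactions = [
    ("north", "widget", " 19.99"),
    ("north", "gadget", "5.00 "),
    ("south", "widget", "19.99"),
    ("south", "widget", "bad-value"),  # malformed amount, dropped in transform
]

def transform(rows):
    """Strip whitespace, parse amounts, and skip rows that fail to parse."""
    for store, product, amount in rows:
        try:
            yield store, product, float(amount.strip())
        except ValueError:
            continue

# I and II: extract, transform, load, and store the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transform(raw_transactions))

# III and IV: give analysts query access and analyze the data.
totals = conn.execute(
    "SELECT store, product, ROUND(SUM(amount), 2) "
    "FROM sales GROUP BY store, product ORDER BY store, product"
).fetchall()

# V: present the result as a plain table.
for store, product, total in totals:
    print(f"{store:<8}{product:<8}{total:>8.2f}")
```

In a production setting each element is usually a separate system. Collapsing them into one script only makes the flow of data from raw transactions to presented results easier to follow.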
The data mining process is conceptually simple and consists of three stages. The first stage, initial exploration, usually starts with data preparation, which involves cleaning the data, transforming it, and selecting subsets of records and, for data sets with a large number of variables, subsets of variables. Relevant variables are then identified and the complexity of the models is determined through exploratory analyses that draw on a wide variety of graphical and statistical methods.
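The preparation steps in the exploration stage might look like the following sketch. The records, the field names, the rule of dropping constant fields, the min-max scaling, and the 0.5 cutoff are all illustrative assumptions, not a prescribed method. The order of steps here is also just one reasonable choice.

```python
# Hypothetical input records; one has a missing value and one field is constant.
records = [
    {"age": 34, "income": 52000.0, "region_code": 1},
    {"age": None, "income": 61000.0, "region_code": 1},  # missing age
    {"age": 45, "income": 75000.0, "region_code": 1},
    {"age": 23, "income": 28000.0, "region_code": 1},
]

# Cleaning: drop records that contain a missing value.
clean = [r for r in records if all(v is not None for v in r.values())]

# Selecting variables: a field with the same value in every record
# carries no information, so it is removed.
fields = [f for f in clean[0] if len({r[f] for r in clean}) > 1]

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Transforming: put the remaining fields on a common scale.
scaled_columns = {f: min_max([r[f] for r in clean]) for f in fields}
prepared = [dict(zip(fields, row)) for row in zip(*(scaled_columns[f] for f in fields))]

# Selecting records: keep only the subset of interest (upper half of the age range).
subset = [r for r in prepared if r["age"] >= 0.5]

print(fields)
print(subset)
```

Real data preparation usually needs more careful handling of missing values than outright deletion, but the sketch shows how cleaning, transformation, and selection each narrow the data before any model is built.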