Data Mining and Business Intelligence

What is Data Mining and Business Intelligence?

Data mining is the search for valuable information within large volumes of data. Business intelligence (BI) refers to the set of technologies and tools used to collect, store, and share data; to analyze it and produce reports; and to integrate the data, reports, and analysis.

The Basics of Data Mining

Data mining seeks patterns that describe relationships in the data. More formally, given a set of data (D), a language (L), and some measurement of certainty (C), data mining is a process that finds statements (S) or patterns (P), expressed in (L), that describe relationships among subsets of (D) with certainty (C). “Interesting” patterns that have sufficient certainty can be treated as new pieces of knowledge and incorporated into a knowledge base. In practice, fields or features are identified from a problem domain and measured over many cases in order to perform classification or regression. The size of the problem is given by the number of records, the number of features, and the number of distinct values that the features can take.

Data mining has four core tasks: Cluster Analysis, Predictive Modeling, Association Analysis, and Anomaly Detection.

Cluster analysis is a set of statistical techniques that can be applied to reveal the “natural” similarities or differences within a data set. The process consists of sorting through the raw data and creating clusters of similar items. Metrics can be defined to measure how close or distant data items are within a cluster, or with respect to items in other clusters.
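
As an illustration only, the sketch below groups a handful of two-dimensional points with k-means, one common clustering technique among many, and then reports a simple closeness metric for each cluster. The sample points, the function names, and the choice of Euclidean distance are assumptions made for this example; production tools typically use more careful initialization than taking the first k points.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def k_means(points, k, max_iterations=20):
    """Group points into k clusters; returns (centroids, clusters)."""
    # Naive, deterministic start (an assumption of this sketch):
    # the first k points act as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [centroid(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments have stabilized
        centroids = updated
    return centroids, clusters

def mean_distance_to_centroid(cluster, center):
    """Closeness metric: average distance of a cluster's items to its centroid."""
    return sum(euclidean(p, center) for p in cluster) / len(cluster)

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)

for center, cluster in zip(centroids, clusters):
    spread = mean_distance_to_centroid(cluster, center)
    print("centroid", center, "items", cluster, "mean distance", round(spread, 3))
print("distance between centroids:",
      round(euclidean(centroids[0], centroids[1]), 3))
```

A small mean distance within each cluster, combined with a large distance between centroids, suggests that the clusters are compact and well separated.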

Predictive modeling consists of creating or choosing models that represent a process accurately enough to predict the probability of an outcome.
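
To make the idea concrete, here is a minimal sketch that fits a one-feature logistic regression by gradient descent and then predicts the probability of an outcome for new cases. The data (hours studied versus passing an exam), the function names, the learning rate, and the number of epochs are all hypothetical choices for illustration; logistic regression is only one of many possible predictive models.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training case.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the average log-loss with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict_probability(x, w, b):
    """Predicted probability of the positive outcome for feature value x."""
    return sigmoid(w * x + b)

# Hypothetical training data: hours studied and whether the exam was passed.
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
for x in (1.0, 2.5, 4.0):
    print(f"{x} hours -> probability of passing "
          f"{predict_probability(x, w, b):.2f}")
```

Whether such a model is "accurate enough" would normally be judged on cases held out from training, which this sketch omits for brevity.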

Association analysis is a method used to discover interesting relationships hidden in large data sets. The relationships so discovered can be expressed as association rules containing metrics to assess the rule’s degree of support and confidence.
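
The two metrics mentioned above can be sketched in a few lines. In this example, support is taken to be the fraction of transactions that contain an itemset, and the confidence of a rule A -> B is the fraction of transactions containing A that also contain B. The market-basket transactions and the function names are hypothetical and serve only to illustrate the calculation.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, itemset):
    """Fraction of all transactions that contain itemset."""
    return support_count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """For the rule antecedent -> consequent: among transactions containing
    the antecedent, the fraction that also contain the consequent."""
    covered = support_count(transactions, antecedent)
    if covered == 0:
        return 0.0  # the rule never applies in this data
    return support_count(transactions, antecedent | consequent) / covered

# Hypothetical market-basket data, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

rule_from, rule_to = {"diapers"}, {"beer"}
print("support:", support(transactions, rule_from | rule_to))
print("confidence:", confidence(transactions, rule_from, rule_to))
```

A rule is usually kept only if both metrics exceed thresholds chosen by the analyst; the thresholds themselves depend on the application.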

Anomaly detection compares a profile of permissible or expected behaviors to actual outcomes. It provides measures for quantifying deviations from the profile and for recommending actions when deviations exceed predefined tolerances.
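
One simple way to picture this, assuming the profile can be summarized by a mean and a standard deviation, is sketched below: each new observation is scored by how many standard deviations it lies from the expected value, and an action is recommended when the score exceeds a tolerance. The baseline readings, the tolerance of 3.0, the action labels, and the function names are all hypothetical; real systems often use richer profiles.

```python
import statistics

def build_profile(baseline):
    """Summarize expected behavior from known-normal observations."""
    return {
        "mean": statistics.mean(baseline),
        "stdev": statistics.stdev(baseline),  # sample standard deviation
    }

def deviation(value, profile):
    """Distance from the expected value, in standard deviations."""
    return abs(value - profile["mean"]) / profile["stdev"]

def review(observations, profile, tolerance=3.0):
    """Score each observation and recommend an action when the
    deviation exceeds the predefined tolerance."""
    report = []
    for value in observations:
        score = deviation(value, profile)
        action = "investigate" if score > tolerance else "none"
        report.append((value, score, action))
    return report

# Hypothetical baseline of normal readings and a few new outcomes.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
profile = build_profile(baseline)
report = review([101.0, 99.0, 120.0], profile)

for value, score, action in report:
    print(f"value={value} deviation={score:.1f} action={action}")
```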

Data mining is a multidisciplinary endeavor that emerged from statistics, computer science, machine learning, and AI; as such, the term also refers to the algorithms and data structures involved. As shown below, the most commonly used algorithms include Decision Trees, Clustering, Logistic Regression, Time Series Analysis, Sequence Clustering, Association, Bayesian methods, and Neural Nets.

BI1.jpg

The Basics of Business Intelligence

Business intelligence (BI) is a term that refers to the set of technologies, tools and procedures used to collect, store, and share data, and to perform data analysis, produce reports, and integrate data, reports, and analysis.

The set of tools and facilities depicted in the diagram below represents a complete BI offering, one that provides a rich set of tools and technologies for incorporating and leveraging the data mining algorithms. To be complete, the offering should also include facilities for software developers and system administrators, as the figure depicts.

BI2.jpg

Ideally, BI tools should be designed to facilitate collaboration, promote communication, allow reports to be produced in a wide variety of formats, and ensure that data storage and transmission are secure, as illustrated below.

BI3.jpg

How can I learn more?

We have assembled additional resources at http://www.Microsoft4Me.com/faculty/Resource_BI that will allow you to explore Data Mining and Business Intelligence further.