Marketing Competence :: Contributions
How Database Marketing could marry Data Mining
- What new opportunities result from adding data mining to database marketing.
- How exactly data mining enters the database marketing cycle.
- What steps are involved in solving a sample marketing problem, illustrated with PolyAnalyst, one of the most advanced data mining systems.
Database marketing
Database marketing is often defined as a technology that involves creating a database or a file with information about the company's customers and prospects, and then using this information to market to those customers individually.
This technology is probably the most important achievement in the field of marketing since the introduction of television in the 1950s. How can database marketing improve your marketing efforts? The features usually cited include:
- Cost effectiveness. Your marketing dollars are spent only on the best customers and prospects, those who are most likely to buy.
- Market segmentation. Groups of customers with similar needs can be identified.
- Personalization. The marketing message does not have to be universal, as it is when advertising through the mass media. It can be personalized to meet the needs of an individual customer.
- Increased customer loyalty. It is far less costly to market to existing customers than to prospects. Thus it is important to keep your customers satisfied and informed about new offerings.
The concept was introduced in the 1980s. Those who started using it early have made a lot of money since then, but the general public has yet to discover the real benefits of database marketing. The number of companies using database marketing has been growing at an amazing rate lately. According to Jim Wheaton of Neodata, more than 90% of large US corporations are evaluating or implementing database marketing today. On the other hand, the transition from simple accumulation of data to serious analysis of this data has only recently started. We are going to see more and more emphasis on data analysis in the years to come. The technology that will propel this process is called data mining, and it is the focus of this paper. Before discussing it in detail, let us see where exactly this technology enters the database marketing cycle.
Steps involved in database marketing
Database marketing should be viewed as a sequence of steps that comprise an integral technological cycle. If any of the links in the sequence is missing, the whole process breaks down. The steps of this process are:
1. Identify your customers.
2. Decide what information about your customers is relevant and possible to obtain.
3. Find available sources of such information (you might have to collect it yourself).
4. Store customer names and all relevant information in a computer file or a database.
5. Formulate a question you would like to answer.
6. Analyze the stored data: build a model.
7. Develop the marketing strategy that is based on this model and meets your goals.
8. Communicate with the selected customers directly.
9. Analyze the response from the customers.
10. Repeat steps 1-9 if necessary.
11. Try to go through the same steps with your prospects.
It is assumed here that you have previous experience in database marketing and are familiar with some of the well-established techniques involved in performing the first three steps. So you have identified your customers, selected what information to store, and found a source of this information. The data storage questions involved in step 4 are considered in greater detail below. Let us assume for now that we have found the best way to store the data and have it ready for analysis.
Steps 5 through 7, together with step 9, involve data analysis; they are usually the most intimidating, the most puzzling, and the least developed. Very often these steps become the bottleneck of the whole process. At the same time, they are extremely important. Even the simplest customer data analysis can significantly improve your bottom line. For example, splitting your customers into age segments and mailing each segment a different marketing message is already a step toward cost effectiveness. However, you start to reap the real benefits of the integrated database marketing approach only when you can answer more sophisticated questions, such as:
- How are the available characteristics of a customer related to his or her buying habits?
- Which customers require an additional promotional mailing to be retained?
- Who among your prospects will most probably become your best customers?
- Which customer features are not worth your attention in the future?
- How large will your sales volume be next month?
Knowing the correct answers to these and many other questions translates into huge opportunities to save and to generate money. However, gaining this knowledge requires a far more elaborate data analysis than has traditionally been performed. One needs to build a number of hypotheses about the relations in the data, test them against reality, and keep improving the hypotheses until one that explains the data well is discovered. This endeavor involves a lot of digging through and sifting of raw data. That is why such a model building process is called data mining.
Some history
One of the best definitions of data mining has been provided by G. Piatetsky-Shapiro of GTE, one of the major authorities in the field: "Data mining is the process of identifying valid, novel, potentially useful, and ultimately comprehensible knowledge from databases that is used to make crucial business decisions."
In the old days, analysts had to come up with models explaining the data and perform the statistical testing of those models manually. They knew statistics like their own backyards and could perform a hundred mathematical operations per minute, eight hours a day. It was neither an easy nor a fun task, but analysts were hard workers: they kept searching for new models whenever the ones they had tried worked poorly.
The first relief came when the statistical methods involved in hypothesis testing were computerized. Statistical packages, such as the well-known SPSS, saved a lot of human brain cells. However, building the model to be tested statistically turned out to be a much harder task to automate, and computers had to leave it to people for quite a while. This task remained the bottleneck of the process that leads from data to successful business decisions.
Things changed again with the advent of machine learning and knowledge discovery technologies. A number of approaches, such as neural networks, decision trees, genetic algorithms, and case-based reasoning, were developed. Computer systems based on these algorithms can learn patterns by analyzing historical data, and thus predict the outcomes of future situations. These machine learning systems can develop hypotheses about the relations hidden in the data and test these hypotheses statistically against the data, thus automating the whole data mining process: a major breakthrough. Computers are so fast, accurate, and tireless that data mining backed by machine learning techniques quickly moved from academic circles to the worldwide business community.
We are entering an epoch of knowledge-driven business decisions. Today many will agree with Eric Brethenoux of Gartner Group that data mining is necessary for survival. The best data mining systems can be controlled by users who are not statisticians; the only requirement is a thorough understanding of the application field. Implementing data mining technology greatly enhances, automates, and simplifies the database marketing cycle. It eliminates a lot of manual effort, recovering thousands of man-hours. But its real consequences reach much further.
In the next section we consider in greater detail how the business questions involved can be answered with the help of PolyAnalyst, the data mining system from Megaputer Intelligence, and which features are important to consider when you select a data mining system. Later I return to the reasons why PolyAnalyst has been chosen to illustrate the data mining efforts involved in database marketing.
Sample problem from database marketing
For a database marketer, data mining involves the following stages:
- Formulating a question that you would like answered.
- Building a model that determines how the selected target variable is influenced by independent variables.
- Statistically testing this model against known historical data.
- Repeating the previous two steps until the desired prediction accuracy is achieved.
- Developing a strategy, based on the developed model, to increase the profitability of your marketing activities.
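The middle stages (build a model, test it, repeat until it is accurate enough) can be pictured as a loop. The Python sketch below is only an illustration under simple assumptions: the hypothesis space is deliberately tiny (a straight line on one candidate variable at a time), the data are synthetic, and the names `build_and_test`, `income_k`, and `volume` are hypothetical. It is not how PolyAnalyst or any other particular system searches for models.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def rmse(predicted, actual):
    """Root-mean-square error, i.e. the typical size of a prediction error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def build_and_test(train, valid, candidates, target, max_error):
    """Build a model, test it on held-out records, and repeat with the next
    hypothesis until the error is acceptable. Returns the best model seen."""
    best = None
    for var in candidates:
        # Build: fit the hypothesis "target is a straight line in var".
        a, b = fit_line([r[var] for r in train], [r[target] for r in train])
        # Test: measure the error on records that were not used for fitting.
        err = rmse([a + b * r[var] for r in valid], [r[target] for r in valid])
        if best is None or err < best["error"]:
            best = {"variable": var, "error": err, "a": a, "b": b}
        if err <= max_error:  # desired accuracy reached: stop repeating
            break
    return best


# Synthetic customers: purchase volume depends on income, not on age.
rng = random.Random(1)
records = []
for _ in range(300):
    income_k = rng.uniform(20, 120)
    records.append({
        "age": rng.uniform(20, 70),
        "income_k": income_k,
        "volume": 100 + 3 * income_k + rng.gauss(0, 5),
    })
train, valid = records[:200], records[200:]

best = build_and_test(train, valid, ["age", "income_k"], "volume", max_error=10.0)
print(best["variable"], round(best["error"], 1))
```

The age hypothesis fails the accuracy test, so the loop moves on to income, which passes; real systems search far richer spaces of relations, but the build, test, repeat rhythm is the same.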
Assume that we have stored the following customer attributes in our database:
- telephone number
- ZIP code
- age
- sex
- date of the first contact
- total volume of purchases
- last year volume of purchases
- date of the first purchase made
- date of the most recent purchase
- amount of the most recent purchase
- number of mailings sent to the customer last year
- annual income
- renter or owner of the residence
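As an illustration only, the attributes above might be held in a typed record like the one below, with each field tagged by the kind of variable it is (numerical, logical, categorical, or date). The class and field names are assumptions made for this sketch; storing the ZIP code as text reflects the point, made later in the text, that it is categorical rather than numerical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CustomerRecord:
    """Hypothetical layout of one customer row, grouped by variable type."""
    telephone: str               # stored as text
    zip_code: str                # categorical: text keeps leading zeros ("02134")
    age: int                     # numerical
    is_male: bool                # logical (sex)
    first_contact: date          # date
    total_purchases: float       # numerical
    last_year_purchases: float   # numerical
    first_purchase: date         # date
    last_purchase: date          # date
    last_purchase_amount: float  # numerical
    mailings_last_year: int      # numerical
    annual_income: float         # numerical
    is_renter: bool              # logical: True = renter, False = owner


rec = CustomerRecord(
    telephone="555-0100", zip_code="02134", age=32, is_male=True,
    first_contact=date(2021, 3, 1), total_purchases=2400.0,
    last_year_purchases=640.0, first_purchase=date(2021, 4, 2),
    last_purchase=date(2023, 12, 1), last_purchase_amount=80.0,
    mailings_last_year=2, annual_income=52000.0, is_renter=True,
)
```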
The process starts by formulating a question of interest. For example, we could be interested in determining which features characterize our best and worst customers in terms of last year's volume of purchases. Then we need to study the records describing those customers whose accounts were established more than a year ago and to whom we sent the same number of mailings last year. Can we find an explicit relation between the independent variables and the target variable with the desired accuracy? When such a model is found, we can use it to calculate which customers recruited over the last year are the prime candidates to respond to the next mail offer by making a purchase. The next mailing would be sent primarily to those customers. If we manage to preserve the response rate while avoiding the bulk of redundant offers, we save a lot of marketing money.
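The last part of this plan, mailing only the most promising of last year's recruits, boils down to ranking customers by a model's prediction and keeping the top slice. In the sketch below the scoring formula `predicted_purchases` is invented for illustration (in practice it would be the model discovered in the data), and the 50% cut-off is an arbitrary assumption.

```python
def select_for_mailing(customers, score, fraction):
    """Rank customers by predicted score (highest first) and keep the top fraction."""
    ranked = sorted(customers, key=score, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # always mail at least one customer
    return ranked[:k]


def predicted_purchases(customer):
    """Hypothetical stand-in for a discovered model."""
    return 100 + 3 * customer["income_k"]


new_customers = [
    {"id": "A", "income_k": 30},
    {"id": "B", "income_k": 95},
    {"id": "C", "income_k": 60},
    {"id": "D", "income_k": 45},
]
mailing_list = select_for_mailing(new_customers, predicted_purchases, fraction=0.5)
print([c["id"] for c in mailing_list])  # ['B', 'C']
```

Choosing the fraction is itself a business decision: a smaller slice saves postage, a larger one reaches more potential buyers.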
As a next step we could set out to determine the effectiveness of our marketing efforts. For example, we can try to determine how well our marketing message is accepted by the customers. Again we study only those customers whose accounts are older than a year. However, now we take a different sample of records: customers who fall into the same response rate group according to the previous model but received different numbers of mailings over the last year. Then we try to build a model of their actual response rate. If the response rate increases with the number of mailings the customer received, then we are doing fine. However, if the number of mailings received does not enter the formula predicting the purchase volume, or if the response rate decreases as the number of mailings increases, then something is seriously wrong with our marketing message, and we might need to take urgent measures to reconsider it.
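In its simplest form, whether purchases grow with the number of mailings can be checked by the sign of a least-squares slope within one predicted-response group. The sketch below assumes hypothetical data for eight customers; a real study would also test whether the slope differs significantly from zero before concluding that mailings are not involved in the formula.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def assess_message(b, tolerance=0.0):
    """Interpret the slope as the text does: growth is good, anything else is a warning."""
    if b > tolerance:
        return "response grows with mailings"
    return "reconsider the marketing message"


# Hypothetical customers from one predicted-response group.
mailings = [1, 1, 2, 2, 3, 3, 4, 4]
purchases = [120, 130, 150, 145, 170, 165, 190, 185]
b = slope(mailings, purchases)
print(b, assess_message(b))  # 20.75 response grows with mailings
```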
Now let us go through the steps of the outlined study in greater detail. Along the way we highlight the features of a data mining system that are vital for carrying out the project successfully.
- We start by taking a sample of the database containing those customer records for which the values of the target variable, last year's volume of purchases, are known. In addition, at first we consider only the records of customers whose accounts were established over a year ago and who received exactly two of our mailings over the last year. To perform this task smoothly, the data mining system you are using should support the ODBC standard for communicating with databases.
- Usually you will need only a part of the data, no larger than ten to fifty thousand records. This is sufficient for deriving a significant and reliable model, and the system you are using should be able to process that many records. Flexible data manipulation mechanisms should be available: you might need to split your data into several datasets and set aside a portion of the data for model validation, or create a union or an intersection of datasets. For example, if you have reasons to expect that customers of different age groups will respond to the offer differently, you can separate your customers into age bins and study these groups separately.
- Note that most often you need to process a combination of numerical, logical, and categorical attributes. For example, in our data, sex and renter/owner are logical variables, while ZIP code is categorical. Make sure the data mining system you are considering supports all these types of attributes. In addition, many database marketing applications contain dates, so this data format should be supported by the system explicitly.
- First we need to calculate the number of days that have passed since certain events. For example, instead of the date of the first contact, we need to consider how much time has passed since then. To this end, we calculate new variables by subtracting the corresponding dates from today's date. To calculate a new variable in PolyAnalyst, you type in the corresponding formula and click the mouse to apply it to your dataset.
- Next you create a new dataset containing only those variables that we are going to include in the exploration. For example, the variables expressing how much time has elapsed since a certain event need to be included instead of variables of the date type. In addition, we need to exclude the variables describing the most recent purchase, since these variables cannot influence the total volume of purchases of a customer over the last year. PolyAnalyst offers the Create new dataset function for performing these actions. You also need to change the type of the ZIP code variable from numerical to categorical when creating the training dataset for further exploration.
- You might have your own marketing experience that you would like to incorporate into the data mining process. The way you do this in PolyAnalyst is straightforward: you just type in your rule using standard mathematical notation. For example, there may be reasons to believe that single male customers in the 30-35 age window who are renters with an annual income above $40,000 are in the top 10% of the most likely responders to your offer. Type in a rule that produces yes when all of these conditions hold and no otherwise. Apply this rule to the training dataset to create a new variable that holds your background knowledge about the field, and include it as an independent variable when launching one of PolyAnalyst's exploration engines. This hints to the system which rule to try first. If your expectation is correct, the corresponding variable will be strongly involved in the solution. On the other hand, if the system does not include the created variable in the final result, you might need to reconsider your previous experience.
- Next you perform data cleansing. The world is not perfect, and some portion of your data will unavoidably contain errors. The data mining tool should have some means to detect and separate records containing such errors. The performance of self-learning systems is often heavily influenced by the presence of a few far outlying points. Even records that contain no errors but correspond to exceptional cases are better taken out of the main dataset and studied separately. PolyAnalyst allows you to separate the exceptional records automatically by analyzing the data with the Find Dependencies exploration engine with the Liberal algorithm option selected.
- It is wise practice to include all the available information about your customers in the database and let the data mining system help you decide which variables are indeed important in predicting customer behavior and which are not. On the other hand, the processing time of most data mining systems depends critically on the number of independent variables considered. Thus the next preprocessing task is to use statistical methods to determine which variables influence your target variable the most. In PolyAnalyst this task is carried out by the Find Dependencies exploration engine with the Strict algorithm option selected.
- Next comes the most important element of data mining: automated building of an empirical model that describes how the target variable depends on the independent variables. No matter how good-looking a data mining system might be, if its model building module fails to come up with an accurate, reliable, and easy-to-understand relation that predicts future values of the target variable, the system is virtually useless. What you need is a data mining workhorse incorporating not just one but several robust engines for a thorough data analysis. The model building module of PolyAnalyst includes four different exploration engines:
- Find Laws
- Find Dependencies
- Classify
- Multiparametric Linear Regression
Several considerations are important when searching for a suitable data mining system:
- Database marketing applications might involve both tasks of predicting the values of a continuous variable and classification tasks. A prospective data mining system should be able to solve problems of both kinds.
- The system should automatically perform tests determining the statistical significance of the developed model. Arbitrarily complex, even randomly generated, data can be explained if a sufficiently large number of free parameters is included in the model. However, such a model has no predictive power whatsoever. This problem is called overfitting, and it often plagues systems based on neural networks.
- A model developed by the system should be easy to interpret. If you cannot understand what knowledge the model contains and how exactly the target variable depends on the independent variables, you have little control over the results, and you will not be able to use your expertise to check the model for obvious inconsistencies. Neural networks are especially dangerous in this respect: the built model is a black box, the trained network itself. By contrast, decision trees present the developed rule explicitly in the form of a tree with a classification question in each node. Often, though, the trees built for real-world problems are so bushy that it becomes very difficult to grasp the meaning of the rule. A much more attractive knowledge representation is provided by the Symbolic Knowledge Acquisition and Evolutionary Programming technology implemented in PolyAnalyst, where the discovered relation is presented explicitly as a formula connecting the target variable to the independent variables. Such a formula may involve mathematical relations as well as logical constructions.
- The system should be capable of finding rules of diverse nature. You cannot predict in advance what kind of relation between variables is hidden in your database, so you want to be able to try various kinds of relations and not miss the important one. How can you be sure that a data mining system searches a broad enough space of possible relations? It is a difficult question. In PolyAnalyst the breadth of the search space is guaranteed by the use of a universal internal programming language, which can express an arbitrary type of algorithm.
- You gain more control over the data processing if you use a multistrategy data mining system. Such a system has a whole set of mutually complementary tools that allow the user to analyze data from different perspectives. Combining several data mining methods, as in PolyAnalyst, improves the overall performance of the system.
- An important parameter to consider is an acceptable data processing time. However, this parameter is hard to assess accurately, because the processing time depends heavily on the characteristics of the data studied. PolyAnalyst has a mechanism of Generalizing Transformations, which allows the system to move almost linearly toward the final solution. By avoiding the super-exponential growth of the number of possibilities in the world of evolving hypotheses, it dramatically reduces the data processing time.
- Launch the Find Laws exploration engine, selecting the total volume of purchases over the last year as the target variable and setting the desired exploration error to 10%. PolyAnalyst then determines the explicit form of the relation connecting the target variable to the other variables characterizing the customer.
- As the final output PolyAnalyst generates a report that consists of one text window and two graphics windows. The text window explicitly displays the best model found, the one that explains the data most reliably, together with its accuracy and significance. The accuracy tells you the standard error made when the model is used to predict the target variable. The significance measures the probability that the model explains the data merely by accident. The two graphics windows help you assess the predictive power of the model visually. Try to understand the meaning of the various terms in the relation found by PolyAnalyst. This is easy, because the model is presented explicitly as a mathematical relation that includes some algebraic and logical constructions.
- The data mining system you are using should provide an easy way to apply the discovered model to existing and future data. If this operation requires additional programming, that is definitely a drawback. PolyAnalyst lets you apply the developed model to the data and calculate predicted values of the target variable with a single mouse click.
- Next you need to visualize the results of your exploration. This is the quickest and most intuitive way to assess the accuracy and reliability of the discovered relation. PolyAnalyst offers a variety of data visualization tools: histograms, two-dimensional graphs, and pseudo three-dimensional graphs with the third dimension represented by the color of the points. Explore the results you have obtained: calculate and plot the target variable values for a test dataset that has not been used to train the knowledge discovery system.
- Some systems provide additional tools for visualizing the results of the exploration. For example, PolyAnalyst offers a unique rule plotting capability. Since this system presents the discovered relation in explicit form, one can plot the predicted target variable against any of the independent variables involved. However, the majority of rules encountered in the real world are multidimensional, so the additional variables are represented on the rule graph by sliders that can be moved across the range in which the corresponding variables change. By changing the values of the independent variables, the user can feel and control the discovered rule, thus gaining a better understanding of the model.
- Report generating and printing capabilities are also important considerations when deciding which data mining system to use. You might need to present the results of your data exploration to your colleagues, or you might wish to have the final graphs and conclusions at hand for your own convenience. In either case, the system should provide means to add objects of various kinds to the output, arrange and resize these objects on the output page, and add text explaining the objectives of the investigation as well as the notation. In PolyAnalyst these printing features are implemented with the Print Form mechanism.
- The last step of the data mining cycle is the development of a marketing strategy based on the discovered model. This is not related to data mining per se. You have formulated a question, and the discovered model allows you to answer it. Now you need to use this knowledge to decide what your marketing strategy is going to be. Your final objective and the available resources are additional considerations at this stage, but your acquired knowledge of customer behavior is now the key element of the marketing strategy. That is why such a scheme is called knowledge-driven decision making.
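Several of the preparation steps above (selecting the sample, turning dates into elapsed days, keeping only the chosen variables, recoding the ZIP code as categorical, adding a background-knowledge rule, and setting exceptional records aside) can be sketched in plain Python as follows. Everything here is an assumption made for illustration: the field names, the fixed reference date, the records themselves, and the interquartile-range rule used to flag outliers, which is a generic textbook technique and not the Liberal algorithm of PolyAnalyst's Find Dependencies engine. The `single` field is also assumed, since marital status is not among the attributes listed earlier.

```python
import statistics
from datetime import date

TODAY = date(2024, 1, 15)  # fixed reference date so the example is reproducible


def rec(zip_code, age, male, single, renter, income, first_contact, mailings, volume):
    """Build one hypothetical raw record (ZIP read as a number, dates as dates)."""
    return {"zip": zip_code, "age": age, "male": male, "single": single,
            "renter": renter, "income": income, "first_contact": first_contact,
            "mailings": mailings, "last_purchase": date(2023, 12, 1),
            "last_purchase_amount": 80.0, "last_year_volume": volume}


raw = [
    rec(2134, 32, True, True, True, 52000, date(2021, 3, 1), 2, 640.0),
    rec(10001, 45, False, False, False, 85000, date(2020, 6, 15), 2, 910.0),
    rec(60614, 28, True, False, True, 38000, date(2022, 1, 10), 2, 420.0),
    rec(94110, 51, False, True, False, 120000, date(2019, 9, 30), 2, 38000.0),
    rec(30301, 39, True, True, False, 61000, date(2021, 11, 20), 2, 700.0),
    rec(73301, 34, True, True, True, 45000, date(2022, 5, 5), 2, 580.0),
    rec(2139, 33, True, True, True, 47000, date(2023, 6, 1), 2, 300.0),     # account too new
    rec(98101, 41, False, False, True, 70000, date(2020, 2, 2), 3, 750.0),  # three mailings
]


def in_sample(r):
    """Accounts older than a year that received exactly two mailings."""
    return (TODAY - r["first_contact"]).days > 365 and r["mailings"] == 2


def expert_rule(r):
    """Background-knowledge rule from the text, encoded as a yes/no variable."""
    hit = (r["single"] and r["male"] and 30 <= r["age"] <= 35
           and r["renter"] and r["income"] > 40000)
    return "yes" if hit else "no"


def prepare(r):
    """Keep only the chosen variables; dates become elapsed days and the
    most-recent-purchase fields are left out."""
    return {
        "zip": f"{r['zip']:05d}",  # numerical -> categorical text, leading zeros restored
        "age": r["age"],
        "male": r["male"],
        "renter": r["renter"],
        "income": r["income"],
        "days_since_first_contact": (TODAY - r["first_contact"]).days,
        "expert_flag": expert_rule(r),
        "last_year_volume": r["last_year_volume"],
    }


def split_outliers(rows, key):
    """Set aside rows lying more than 1.5 interquartile ranges beyond the quartiles."""
    q1, _, q3 = statistics.quantiles([r[key] for r in rows], n=4)
    fence = 1.5 * (q3 - q1)
    normal = [r for r in rows if q1 - fence <= r[key] <= q3 + fence]
    outliers = [r for r in rows if not (q1 - fence <= r[key] <= q3 + fence)]
    return normal, outliers


sample = [prepare(r) for r in raw if in_sample(r)]
normal, outliers = split_outliers(sample, "last_year_volume")
print(len(sample), [r["zip"] for r in outliers])  # 6 ['94110']
```

The exceptional customer is not thrown away: it lands in `outliers`, where it can be studied separately, as the text recommends.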
Now we can summarize what has been achieved during the data exploration. The main achievement is your new ability to make intelligent marketing decisions based on a model that PolyAnalyst deduced automatically from the existing customer data. This is newly created value: before the data mining process, it was hidden in an overwhelming amount of raw data. Now you see which customer features influence the customer response probability, and exactly how. You can increase the response rate of your mailings to customers whose accounts were established over the last year, since the same model should predict their response rate. Or you can save money by cutting redundant mailings. One way or another, the return on your marketing dollar grows. And this is just the first step: carrying out the evaluation of marketing message effectiveness mentioned earlier, for example, can increase the return further.
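Two statistical checks behind the walkthrough, ranking variables by their influence on the target and estimating how likely a relation is to arise by accident, can be approximated with generic techniques. The sketch below uses the absolute Pearson correlation for ranking and a permutation test for significance; both are standard substitutes chosen for illustration, not the methods used by Find Dependencies or Find Laws, and the synthetic data and variable names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_variables(rows, candidates, target):
    """Order candidate variables by |correlation| with the target, strongest first."""
    ys = [r[target] for r in rows]
    strength = {v: abs(pearson([r[v] for r in rows], ys)) for v in candidates}
    return sorted(candidates, key=strength.get, reverse=True)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimated probability that a relation at least this strong appears by
    accident: shuffle ys to break any real link and count how often the
    shuffled |correlation| matches or beats the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one correction keeps the estimate above zero


# Synthetic data: volume depends on income only.
rng = random.Random(42)
rows = []
for _ in range(200):
    income_k = rng.uniform(20, 120)
    rows.append({
        "income_k": income_k,
        "age": rng.uniform(20, 70),
        "mailings": rng.randint(0, 6),
        "volume": 100 + 3 * income_k + rng.gauss(0, 20),
    })

order = rank_variables(rows, ["age", "mailings", "income_k"], "volume")
top = order[0]
p = permutation_p_value([r[top] for r in rows], [r["volume"] for r in rows])
print(order, p)
```

A small p-value means shuffled data almost never reproduce the relation, which is the kind of evidence that guards against the overfitting problem described earlier.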
Data storage: Flat file versus database
There is one last issue to discuss before I explain why PolyAnalyst has been chosen to illustrate the steps involved in data mining for marketing applications: data storage and retrieval.
Database marketing is an ongoing, self-correcting process rather than a one-time feat. You will need to update your data, add new variables, extract samples, and organize the data according to different criteria. In principle, these actions can be performed with a flat file containing data stored in Comma Separated Value (CSV) format, or in an application-specific format such as Excel. However, this turns into a serious headache once you have a significant amount of data.
Storing your data in a database requires a larger initial investment and more understanding, but it saves you a lot of time once you start working with the data. In addition, this type of data storage improves the ease and quality of data analysis, giving you the potential to generate higher future returns on your investment. If you need to store and process very large amounts of data, you might want to consider data warehouses or data marts, such as the Visual Warehouse Solution from IBM.
At the same time, many data analysis systems accept only flat files. Does this spell the end of database technology? Not at all. First, any major database has an option for exporting data into a flat file. Second, you can simply be more selective when choosing a data mining tool: choose a system that supports ODBC, the standard for communicating directly with databases.
Systems like PolyAnalyst from Megaputer Intelligence or the SAS Datamining Tool from SAS Institute have this capability. In addition, some systems support the environment of a specific database or data warehouse. For example, a special version of PolyAnalyst can draw data from, and store the results of data analysis directly in, the IBM Visual Warehouse environment. Such an architecture enables automated application of the developed models to new data in the warehouse, thus equipping the user with a complete decision support system.
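Both routes discussed in this section, querying the database directly and exporting a flat CSV file, are easy to try out. The sketch below uses Python's built-in `sqlite3` module as a stand-in for an ODBC connection (reaching a real ODBC data source would require a separate driver package), with a hypothetical `customers` table. It selects the sample described in the walkthrough, accounts more than a year old that received exactly two mailings, and writes the result as CSV text.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (zip TEXT, age INTEGER, first_contact TEXT, "
    "mailings_last_year INTEGER, last_year_volume REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        ("02134", 32, "2021-03-01", 2, 640.0),
        ("10001", 45, "2020-06-15", 2, 910.0),
        ("02139", 33, "2023-06-01", 2, 300.0),  # account less than a year old
        ("98101", 41, "2020-02-02", 3, 750.0),  # three mailings, not two
    ],
)


def export_sample(conn, as_of):
    """Select the study sample in SQL and return it as CSV text with a header row."""
    cur = conn.execute(
        "SELECT zip, age, first_contact, mailings_last_year, last_year_volume "
        "FROM customers "
        "WHERE first_contact < date(?, '-1 year') AND mailings_last_year = 2 "
        "ORDER BY zip",
        (as_of,),
    )
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([d[0] for d in cur.description])  # column names
    writer.writerows(cur)
    return buf.getvalue()


csv_text = export_sample(conn, "2024-01-15")
print(csv_text)
```

Filtering in the database and exporting only the sample keeps the flat file small, which matters for tools that accept nothing but flat files.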
Why PolyAnalyst?
A direct comparison of PolyAnalyst with all the other data mining systems and techniques in the world would require a lot of space. Readers interested in the basics of various machine learning approaches may want to take a look at the web site of Megaputer Intelligence, which also offers a good discussion of how the different approaches compare to each other. Here I will concentrate on the most prominent features that set PolyAnalyst apart from other available software.
The most interesting point about PolyAnalyst is that it uses a new machine learning technology that is free of the most serious problems limiting the power of older techniques. Today many data miners are aware of the limitations of existing approaches and are looking for new developments in machine learning that go beyond them. One of these next-generation data mining technologies is Symbolic Knowledge Acquisition and Evolutionary Programming.
PolyAnalyst:
- Utilizes this new, promising technology. The technology is based on a UNIVERSAL programming language in terms of which PolyAnalyst searches for relations and algorithms connecting various variables. Any algorithm hidden in data can be described by means of this language. This is a major step forward. For example, decision trees have far poorer expressive capabilities. Similarly, there are problems that cannot even be formulated in terms of neural networks.
- Presents discovered relations in explicit SYMBOLIC form. You can readily use the derived relation, evaluate and control how exactly various variables enter the relation. If you have ever tried to understand results contained in a trained neural network, you can appreciate the luxury of having the model in the explicit form.
- Is a multi-strategy system. A balanced combination of hypothesis production techniques with statistical preprocessing of data and rigorous evaluation of the significance of the obtained results allows PolyAnalyst to find a reliable solution fast.
- Provides tools for solving:
  - Tasks of predicting values of continuous variables
  - Tasks of determining the set of variables that influence the target variable most significantly
  - Classification tasks
- Can work with numerical, integer, logical, or categorical variables.
- Avoids overfitting due to the rigorous statistical testing of the significance of the obtained results.
- Is easy to use: the user does not have to be a professional statistician, because the complexity of the hypothesis generation and evaluation methods is hidden deep inside the system. The object-oriented graphical user interface ensures that no programming is involved, just point and click.
- Is available as a free evaluation copy, so you can try the system hands-on. Relying on an outside opinion seems to save time, but it is dangerous: your adviser might be biased or might not have a thorough understanding of the nature of your applications.
- Finally, PolyAnalyst is the system that allowed me to carry out quite a few successful data mining projects for customers in various application fields.
Your own window to the world of knowledge
We have discussed the deep relation between data mining and database marketing, and you have followed the steps involved in solving a sample marketing problem. This presentation should have given you a first understanding of both the underlying concept of the advanced data analysis required in database marketing and the technology involved. Your next steps could be to assess the data mining tools offered by various vendors, to acquire advanced yet easy-to-use software that is well positioned to tackle your specific application, and to try this software hands-on with your own marketing data. For example, a free evaluation copy of PolyAnalyst, the data mining software discussed in this article, is available from http://www.megaputer.ru
In conclusion, I would be very happy to receive feedback from you when you discover the diamonds of knowledge hidden in your data and use this knowledge to make informed business decisions that bring you marketing success. Good luck!







