What are the similarities-differences in the performance


Assignment:

Introduction

Analytics Portfolio Exercise Supervised Data Mining Capstone

The purpose of this assignment is to demonstrate your knowledge and understanding of the analytical techniques and tools learned in the course and to show your understanding of how they relate to a business scenario. This assignment is somewhat different from previous ones: I do not give you detailed instructions on how to build your analytical process in RapidMiner. Instead, you are expected to use what you have learned in the course to do the modeling, validation and performance analysis on the given dataset, answer the questions about the analysis, and make recommendations for the business based on the results.

Perform the necessary tasks using RapidMiner, answer the questions below and prepare the required screenshots.

The attached file contains a dataset collected by a phone company about customer attrition, or churn, which occurs when a customer cancels their service and (possibly) signs up with a competitor. The company is interested in predicting who will churn so that it can take steps to keep these customers. Look at the data and make some recommendations based on the findings of your analysis.

Here is the explanation of the variables in the dataset:

a. Gender_Female: female or not

b. PhoneService_Yes: whether the customer has phone service with the company

c. MultipleLines_No: whether the customer has no multiple line service

d. MultipleLines_Yes: whether the customer has multiple line service

e. InternetService_DSL: whether the customer has DSL internet

f. InternetService_Fiber optic: whether the customer has Fiber optic internet

g. TechSupport_Yes: whether the customer signed up for tech support service

h. TechSupport_No internet service: whether the customer had no internet service

i. StreamingTV_Yes: customer streams TV

j. StreamingTV_ No internet service: customer had no internet service for TV streaming

k. StreamingMovies_Yes: customer streams movies

l. StreamingMovies_ No internet service: customer had no internet service for movie streaming

m. Contract_One year: type of contract for customer: 1 yr

n. Contract_Two year: type of contract for customer: 2 yr

o. PaperlessBilling_Yes: whether the customer signed up for paperless billing

p. PaymentMethod_ Electronic check: payment made by Electronic check

q. PaymentMethod_ Bank transfer: payment made by Bank transfer

r. PaymentMethod_ Credit card: payment made by credit card

s. Retired: 0 for not, 1 for yes

t. Tenure (months): how long the customer has been with the company

u. MonthlyCharges: $ amount of monthly payments for the subscribed services

v. Churn: Whether the customer churned (assume that 'positive' means that the customer churns)

1. As a first step, build at least 3 models using different classification techniques that are capable of classifying customers into 2 categories (churn/no churn). At least one of the 3 models should be either a decision tree or a neural network. Make sure that you build the process to include the cross-validation operator (3 folds in each validation are enough, to save some process runtime). Where possible, experiment with the parameter settings of the model operators to try to improve the model's performance.

Make readable screenshots of the following for all 3 models:

- Processes

- Parameter settings for the type of model (Decision Trees, Neural Networks, etc.)

- Appropriate model results (i.e. Coefficients, Tree, Network, etc.)
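
For intuition only, the 3-fold cross-validation requested in step 1 can be sketched in plain Python. This is not a RapidMiner process: the function name k_fold_indices, the seed, and the simple shuffled (unstratified) split are assumptions made for illustration, and RapidMiner's Cross Validation operator may sample the folds differently.

```python
import random


def k_fold_indices(n_rows, k=3, seed=42):
    """Yield (train, test) row-index lists for k-fold cross-validation.

    Hypothetical helper: a shuffled, unstratified split, used here only
    to illustrate how each row is held out exactly once.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Illustrative row count, not the size of the assignment dataset.
    for fold_no, (train, test) in enumerate(k_fold_indices(10, k=3), start=1):
        print(f"fold {fold_no}: {len(train)} training rows, {len(test)} test rows")
```

Each model would be trained on the training rows and scored on the held-out rows of every fold, and the per-fold performance values would then be averaged.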

2. For measuring the performance of the 3 models look at the following performance measures:

- Accuracy

- Kappa

- Precision

- Recall

- Lift

- AUC (NOT the optimistic or pessimistic)
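
As a rough cross-check of the values RapidMiner reports, most of the measures above can be recomputed from the four confusion-matrix counts. The sketch below assumes 'positive' means churn, takes lift as precision divided by the a priori churn rate (RapidMiner's exact definition should be confirmed in its documentation), and uses a hypothetical function name. AUC is omitted because it needs the model's confidence scores, not just the confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, kappa, precision, recall and lift from counts.

    Hypothetical helper; assumes every denominator is non-zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)      # share of predicted churners who churn
    recall = tp / (tp + fn)         # share of actual churners who are caught
    prior = (tp + fn) / total       # a priori probability of churn
    lift = precision / prior        # assumed definition: precision vs. base rate

    # Cohen's kappa: agreement beyond what chance alone would produce.
    predicted_positive = (tp + fp) / total
    expected = prior * predicted_positive + (1 - prior) * (1 - predicted_positive)
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "lift": lift,
        "prior": prior,
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the assignment dataset.
    for name, value in classification_metrics(tp=60, fp=40, fn=40, tn=260).items():
        print(f"{name}: {value:.3f}")
```

Comparing accuracy with 1 - prior (the accuracy of always predicting 'no churn') is one way to relate a model's performance to the a priori probabilities.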

a. Make a screenshot of the confusion matrix output of each of the 3 methods.

b. Prepare a table to report the above values for the 3 models.

c. Discuss the performance for each of the 3 models based on the above values. Relate the performances to the a priori probabilities of the outcome as well.

d. Suppose that if you can correctly predict that a customer will churn (a true positive), you can make a special promotional offer that will result in the customer staying with you. You estimate that the net profit from such an outcome is $200 per customer. However, if you predict a customer will churn when they actually wouldn't have (a false positive), you incur a net cost of $50 per customer by unnecessarily making the offer. Using this information, compute and show a profit/cost matrix for each model and report the per-customer expected value of each model.

e. Prepare a visual evaluation of the 3 models by including a screenshot of the ROC comparison chart.

f. Using the above information, compare the performance of the 3 models. What are the similarities/differences in their performance?
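
The profit/cost calculation in part d can be sketched as follows. The $200 and $50 figures come from the assignment; treating false negatives and true negatives as having zero value is an assumption, since the assignment gives no figures for them, and the function name is hypothetical.

```python
def expected_value_per_customer(tp, fp, fn, tn,
                                profit_tp=200.0, cost_fp=50.0):
    """Return the per-customer expected value of a model.

    Profit/cost matrix (rows: predicted, columns: actual):
                         actual churn   actual no churn
        predicted churn      +200             -50
        predicted no churn      0 (assumed)     0 (assumed)
    """
    total = tp + fp + fn + tn
    total_value = profit_tp * tp - cost_fp * fp
    return total_value / total


if __name__ == "__main__":
    # Illustrative counts only, not results from the assignment dataset.
    print(expected_value_per_customer(tp=60, fp=40, fn=40, tn=260))  # 25.0
```

Running the same function on each model's confusion matrix gives three per-customer values that can be compared directly.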

3. Answer the following questions:

a. Which attributes seem to matter the most? How do you know it? Discuss their importance and/or effect sizes. How can you interpret the results of the models?

b. Are the 3 models giving you more or less the same suggestions? Do the models agree in most aspects? If there are differences, what are they?

4. What business recommendations can be given based on the results? How could the results of the model(s) be useful for the company?

Attachment: Data.rar
