Create a correlation matrix for all the variables


Assignment: Business Analytics

In the wake of the Enron scandal in 2002, two public accounting firms, Oscar Anderson (OA) and Trice-Milkhouse-Loopers (TML), merged to form OATML, which is now reviewing its methods for detecting management fraud during audits. The two firms had each developed their own set of questions that auditors could use in assessing management fraud.

To avoid a repeat of the problems faced by Enron's auditors, OATML wants to develop an automated decision tool to assist auditors in predicting whether or not their clients are engaged in fraudulent management practices. This tool would ask an auditor all the OA or TML fraud detection questions and then automatically render a decision about whether or not the client company is engaging in fraudulent activities. The decision problem OATML faces is really two-fold: (1) Which of the two sets of fraud detection questions is better at detecting fraud? and (2) What is the best way to translate the answers to these questions into a prediction or classification about management fraud?

Ragsdale Ch10 - 101

To assist in answering these questions, the company has compiled an Excel spreadsheet (the file Fraud.xlsm accompanying this book) that contains both the OA and TML fraud detection questions and answers to both sets of questions based on 382 audits previously conducted by the two companies (see sheets OA and TML, respectively). (Note: for all data 1=yes, 0=no.) For each audit, the last variable in the spreadsheet indicates whether or not the respective companies were engaged in fraudulent activities (i.e., 77 audits uncovered fraudulent activities, 305 did not).
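Because every question is coded 1 = yes, 0 = no and the fraud indicator is the last variable, correlations on this kind of data are ordinary Pearson correlations computed on 0/1 columns. The following is a minimal sketch of that calculation in plain Python. It does not read Fraud.xlsm; the column names (`Q1`, `Q2`, `Q3`, `Fraud`) and the six toy audits are hypothetical stand-ins, and the helper names are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treated as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def rank_by_target(columns, target):
    """Order the non-target columns by absolute correlation with the target."""
    matrix = correlation_matrix(columns)
    others = [name for name in columns if name != target]
    return sorted(others, key=lambda name: abs(matrix[name][target]), reverse=True)


# Hypothetical toy data: three yes/no questions and the fraud flag for six audits.
toy = {
    "Q1": [1, 1, 0, 0, 1, 0],
    "Q2": [1, 0, 1, 0, 1, 0],
    "Q3": [0, 0, 0, 1, 1, 1],
    "Fraud": [1, 1, 0, 0, 1, 0],
}

matrix = correlation_matrix(toy)
ranking = rank_by_target(toy, "Fraud")
print(ranking)
```

Taking the first eight names from such a ranking would give the questions most strongly related to the fraud variable, while large off-diagonal entries between two questions would suggest that they carry nearly the same information.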

You have been asked to perform the following analysis and provide a recommendation as to what combination of fraud questions OATML should adopt.

1. For the OA fraud questions, create a correlation matrix for all the variables. Do any of the correlations pose a concern?

2. Using the 8 questions that correlate most strongly with the dependent fraud variable, partition the OA data with oversampling to create training and validation data sets with a 50% success rate in the training data. (Use the default seed of 12345.)

3. Use each of Analytic Solver Data Mining's classification techniques to create classifiers for the partitioned OA dataset. Summarize the classification accuracy of each technique on the training and validation sets. Interpret these results and indicate which technique you would recommend OATML use.

4. For the TML fraud questions, create a correlation matrix for all the variables. Do any of the correlations pose a concern?

5. Using the 8 questions that correlate most strongly with the dependent fraud variable, partition the TML data with oversampling to create training and validation data sets with a 50% success rate in the training data. (Use the default seed of 12345.)

6. Use each of Analytic Solver Data Mining's classification techniques to create classifiers for the partitioned TML dataset. Summarize the classification accuracy of each technique on the training and validation sets. Interpret these results and indicate which technique you would recommend OATML use.

7. Suppose OATML wants to use both fraud detection instruments and combine their individual results to create a composite prediction. Let LR1 represent the logistic regression probability estimate for a given company using the OA fraud detection instrument and LR2 represent the same company's logistic regression probability estimate using the TML instrument. The composite score for the company might then be defined as C = w1 × LR1 + (1 - w1) × LR2, where 0 ≤ w1 ≤ 1. A decision rule could then be created where we classify the company as non-fraudulent if C is less than or equal to some cut-off value, and as fraudulent otherwise. Use Solver's evolutionary optimizer to find the values of w1 and the cut-off that minimize the number of classification errors for the training data.

What do you obtain for w1 and the cut-off value? Summarize the accuracy of this technique for the training and validation data sets. How do these results compare with the logistic regression results in questions 3 and 6?

8. What other techniques can you think of for combining OA's and TML's fraud detection questionnaires that might be beneficial to OATML?
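
The composite rule in question 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it swaps Solver's evolutionary optimizer for a plain exhaustive grid search over w1 and the cut-off (the error count is a step function of both, so derivative-based methods are of no help), and the probability lists `lr1` and `lr2` and the labels in `actual` are hypothetical values, not output from the Fraud.xlsm data.

```python
def count_errors(lr1, lr2, actual, w1, cutoff):
    """Count misclassifications of the rule: fraud (1) if C > cutoff, else 0."""
    errors = 0
    for p1, p2, label in zip(lr1, lr2, actual):
        composite = w1 * p1 + (1 - w1) * p2  # C = w1*LR1 + (1 - w1)*LR2
        predicted = 1 if composite > cutoff else 0
        if predicted != label:
            errors += 1
    return errors


def grid_search(lr1, lr2, actual, steps=100):
    """Try every (w1, cutoff) pair on an evenly spaced grid over [0, 1] x [0, 1].

    Returns (errors, w1, cutoff) for the first pair that attains the fewest errors.
    """
    best = None
    for i in range(steps + 1):
        w1 = i / steps
        for j in range(steps + 1):
            cutoff = j / steps
            errors = count_errors(lr1, lr2, actual, w1, cutoff)
            if best is None or errors < best[0]:
                best = (errors, w1, cutoff)
    return best


# Hypothetical training-set probabilities from the two logistic regressions.
lr1 = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]   # OA instrument
lr2 = [0.7, 0.4, 0.6, 0.1, 0.9, 0.2]   # TML instrument
actual = [1, 1, 0, 0, 1, 0]            # 1 = fraud, 0 = no fraud

errors, w1, cutoff = grid_search(lr1, lr2, actual, steps=20)
print(errors, w1, cutoff)
```

Many (w1, cut-off) pairs usually tie for the minimum error count, so the pair reported depends on the search order; the chosen pair should then be applied unchanged to the validation data to judge how well it generalizes.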

Format your assignment according to the following formatting requirements:

1. The answer should be typed, double spaced, using Times New Roman font (size 12), with one-inch margins on all sides.

2. Include a cover page containing the title of the assignment, the student's name, the course title, and the date. The cover page is not included in the required page length.

3. Also include a reference page. Citations and references should follow APA format. The reference page is not included in the required page length.
