ICT707 Big Data Assignment - Explore Feature Extraction and Target Variable Transformation


Big Data Assignment -

Regression Models - Regression models are concerned with target variables that can take any real value. The underlying principle is to find a model that maps input features to predicted target variables. Regression is also a form of supervised learning.

Regression models can be used to predict just about any variable of interest. A few examples include the following:

  • Predicting stock returns and other economic variables
  • Predicting loss amounts for loan defaults (this can be combined with a classification model that predicts the probability of default, while the regression model predicts the amount in the case of a default)
  • Recommendations (the Alternating Least Squares factorization model from Chapter 5, Building a Recommendation Engine with Spark, uses linear regression in each iteration)
  • Predicting customer lifetime value (CLTV) in a retail, mobile, or other business, based on user behavior and spending patterns

In the different sections of this chapter, we will do the following:

  • Introduce the various types of regression models available in Spark ML
  • Explore feature extraction and target variable transformation for regression models
  • Train a number of regression models using Spark ML
  • Build a regression model with Spark
  • See how to make predictions using the trained model
  • Investigate the impact on performance of various parameter settings for regression using cross-validation

Types of regression models - The core idea of linear models (or generalized linear models) is that we model the predicted outcome of interest (often called the target or dependent variable) as a function of a simple linear predictor applied to the input variables (also referred to as features or independent variables).

y = f(wᵀx)

Here, y is the target variable, w is the vector of parameters (known as the weight vector), and x is the vector of input features. wᵀx is the linear predictor (the vector dot product) of the weight vector w and feature vector x. To this linear predictor, we apply a function f (called the link function). Linear models can, in fact, be used for both classification and regression simply by changing the link function. Standard linear regression uses an identity link (that is, y = wᵀx directly), while binary classification uses an alternative link function such as the logistic (sigmoid) link.
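To make the role of the link function concrete, here is a minimal pure-Python sketch (the function name `predict` and the example numbers are invented for illustration; this is not the Spark ML API). The same linear predictor wᵀx yields a regression prediction under the identity link and a class probability under the logistic link:

```python
import math

def predict(w, x, link=lambda z: z):
    """Apply the link function f to the linear predictor w.x."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # dot product w^T x
    return link(z)

w = [0.5, -1.0, 2.0]   # weight vector
x = [1.0, 2.0, 0.5]    # feature vector

# Identity link: standard linear regression
y_reg = predict(w, x)  # 0.5*1 + (-1.0)*2 + 2.0*0.5 = -0.5

# Logistic (sigmoid) link: a probability for binary classification
y_clf = predict(w, x, link=lambda z: 1 / (1 + math.exp(-z)))
```

Swapping only the `link` argument turns the same linear model from a regressor into a classifier, which is the point made above.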

Spark's ML library offers different regression models, which are as follows:

  • Linear regression
  • Generalized linear regression
  • Logistic regression
  • Decision trees
  • Random forest regression
  • Gradient boosted trees
  • Survival regression
  • Isotonic regression
  • Ridge regression

Regression models define the relationship between a dependent variable and one or more independent variables, finding the model that best fits the observed values of the independent variables or features.

Unlike classification models such as support vector machines and logistic regression, linear regression is used to predict a continuous value for the dependent variable rather than an exact class label.

Linear regression models are essentially the same as their classification counterparts; the only difference is that linear regression models use a different loss function, related link function, and decision function. Spark ML provides a standard least squares regression model (although other types of generalized linear models for regression are planned).
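To show what the least squares loss actually minimises, here is a pure-Python sketch of the closed-form fit for a single feature (illustrative only; `least_squares_fit` and the toy data are invented, and Spark ML's LinearRegression would be used in practice). The slope and intercept are chosen to minimise the sum of squared errors between predictions and targets:

```python
def least_squares_fit(xs, ys):
    """Fit y = w*x + b by minimising the sum of squared errors,
    using the closed-form normal-equation solution for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x  # intercept passes through the means
    return w, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 2x + 1
w, b = least_squares_fit(xs, ys)  # recovers w = 2, b = 1
```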

Assignment -

1. Utilising Python 3, build the following regression models:

  • Decision Tree
  • Gradient Boosted Tree
  • Linear regression

2. Select a dataset (other than the example dataset given in section 3) and apply the Decision Tree and Linear regression models created above. Choose a dataset from Kaggle.

3. Build the following in relation to the gradient-boosted tree and the dataset chosen in step 2:

  • Gradient-boosted tree iterations
  • Gradient-boosted tree max bins
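To build intuition for why the number of boosting iterations matters before tuning it in Spark, here is a pure-Python sketch of gradient boosting for squared loss (the helper names `fit_stump` and `gbt_fit` and the toy data are invented for illustration; the assignment itself should use Spark's GBTRegressor). Each iteration fits a one-split stump to the current residuals, so training error shrinks as iterations increase:

```python
def fit_stump(xs, residuals):
    """One-split regression stump: pick the threshold minimising SSE.
    Assumes xs contains at least two distinct values."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gbt_fit(xs, ys, n_iter=10, lr=0.5):
    """Gradient boosting for squared loss: each new stump is fitted
    to the residuals left by the ensemble so far."""
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(n_iter):
        residuals = [y - p for y, p in zip(ys, preds)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + lr * s(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)
```

Running `gbt_fit` with more iterations drives the training error down, which is exactly the trade-off (fit versus overfitting) the iterations experiment in this step is meant to explore.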

4. Build the following in relation to the decision tree and the dataset chosen in step 2:

  • Decision Tree Categorical features
  • Decision Tree Log
  • Decision Tree Max Bins
  • Decision Tree Max Depth
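As background for the max depth experiment, here is a toy recursive regression tree on a single feature in pure Python (the name `fit_tree` and the data are invented for illustration; use Spark's DecisionTreeRegressor, whose `maxDepth` parameter plays the role shown here). Depth 0 predicts the mean of the targets; each extra level of depth allows one more split per branch, so a deeper tree fits the training data more closely:

```python
def fit_tree(xs, ys, max_depth):
    """Toy recursive regression tree on one feature.
    max_depth limits how many times each branch may split."""
    mean = sum(ys) / len(ys)
    if max_depth == 0 or len(set(xs)) == 1:
        return lambda x: mean  # leaf: predict the mean target
    # choose the single split that minimises the sum of squared errors
    best = None
    for t in sorted(set(xs))[:-1]:  # exclude max so right side is non-empty
        L = [(x, y) for x, y in zip(xs, ys) if x <= t]
        R = [(x, y) for x, y in zip(xs, ys) if x > t]
        lm = sum(y for _, y in L) / len(L)
        rm = sum(y for _, y in R) / len(R)
        sse = (sum((y - lm) ** 2 for _, y in L)
               + sum((y - rm) ** 2 for _, y in R))
        if best is None or sse < best[0]:
            best = (sse, t, L, R)
    _, t, L, R = best
    left = fit_tree([x for x, _ in L], [y for _, y in L], max_depth - 1)
    right = fit_tree([x for x, _ in R], [y for _, y in R], max_depth - 1)
    return lambda x: left(x) if x <= t else right(x)
```

On four perfectly linear points, depth 0 can only predict the overall mean, while depth 2 reproduces every target exactly; the assignment's max depth runs probe the same capacity/overfitting trade-off on the chosen Kaggle dataset.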

5. Build the following in relation to linear regression and the dataset chosen in step 2:

a) Linear regression Cross Validation

  • Intercept
  • Iterations
  • Step size
  • L1 Regularization
  • L2 Regularization

b) Linear regression Log (see section 5.4)
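The parameters listed in 5(a) all appear directly in a gradient-descent fit of least squares linear regression. The following pure-Python sketch is illustrative only (the function name `sgd_linear` and the toy data are invented); in Spark ML's LinearRegression the analogous knobs are `fitIntercept`, `maxIter`, `regParam`, and `elasticNetParam` (which mixes L1 and L2):

```python
def sgd_linear(xs, ys, iterations=200, step=0.05, fit_intercept=True,
               l1=0.0, l2=0.0):
    """Batch gradient descent for least squares with an elastic-net penalty.
    Loss: (1/n) * sum (w*x + b - y)^2 + l1*|w| + l2*w^2  (penalty on w only)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # subgradient of l1*|w| plus gradient of l2*w^2
        grad_w += l1 * (1 if w > 0 else -1 if w < 0 else 0) + 2 * l2 * w
        w -= step * grad_w
        if fit_intercept:
            b -= step * grad_b
    return w, b

# y = 2x + 1: enough iterations at a stable step size recover w and b
w, b = sgd_linear([0, 1, 2, 3], [1, 3, 5, 7], iterations=2000, step=0.05)
```

Too large a step size makes the iterations diverge, too few iterations under-fit, and larger L1/L2 penalties shrink the weight towards zero; these are the effects the cross-validation runs in 5(a) measure on the chosen dataset.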

6. Follow the provided example of the bike sharing dataset and the guidelines in the sections that follow to develop the requirements given in steps 1, 3, 4 and 5.

Attachment:- Assignment Files.rar
