Develop a model to classify documents as relevant or non-relevant


Problem

Classifying Classified Ads Submitted Online. Consider the case of a website that caters to the needs of a specific farming community, and carries classified ads intended for that community. Anyone, including robots, can post an ad via a web interface, and the site owners have problems with ads that are fraudulent, spam, or simply not relevant to the community. They have provided a file with 4143 ads, each ad in a row, and each ad labeled as either -1 (not relevant) or 1 (relevant).
The goal is to develop a predictive model that can classify ads automatically.

I. Open the file farm-ads.csv, and briefly review some of the relevant and non-relevant ads to get a flavor for their contents.

II. Following the example in the chapter, preprocess the data in R, then create a term-document matrix and a concept matrix. Limit the number of concepts to 20.

i. Partition the data (60% training, 40% validation), and use logistic regression to develop a model that classifies the documents as 'relevant' or 'non-relevant.' Comment on its efficacy.

ii. Why use the concept-document matrix, and not the term-document matrix, to provide the predictor variables?
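
The partition-and-classify step in part i can be sketched as follows. This is only a minimal illustration in plain Python, not the R workflow the exercise asks for: the helper names (partition, fit_logistic, predict, confusion) are hypothetical, the logistic regression is fitted by simple gradient descent rather than by R's glm, and the three "concept" columns are synthetic toy numbers standing in for the 20 concept scores that would come from the real ads.

```python
import math
import random

def partition(n_rows, train_frac=0.6, seed=1):
    """Shuffle row indices and split them into training and validation sets."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n_rows)
    return idx[:cut], idx[cut:]

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def score(w, row):
    # w[0] is the intercept; the remaining weights pair with the columns.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit intercept and weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = score(w, row) - label
            grad[0] += err
            for j, xj in enumerate(row, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, X, cutoff=0.5):
    """Classify a row as 1 (relevant) when its probability reaches the cutoff."""
    return [1 if score(w, row) >= cutoff else 0 for row in X]

def confusion(actual, predicted):
    """Return counts keyed by (actual, predicted)."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

# Synthetic toy data (NOT the farm ads): 200 rows, 3 made-up concept scores.
rng = random.Random(42)
labels = [rng.choice([-1, 1]) for _ in range(200)]
concepts = [[rng.gauss(1.0 * lab, 0.5), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
            for lab in labels]
y = [(lab + 1) // 2 for lab in labels]   # recode -1/1 as 0/1

train_idx, valid_idx = partition(len(y))
weights = fit_logistic([concepts[i] for i in train_idx], [y[i] for i in train_idx])

valid_actual = [y[i] for i in valid_idx]
valid_pred = predict(weights, [concepts[i] for i in valid_idx])
cm = confusion(valid_actual, valid_pred)
accuracy = (cm[(0, 0)] + cm[(1, 1)]) / len(valid_idx)
print("validation confusion counts:", cm)
print("validation accuracy:", accuracy)
```

Applied to the 4143 ads in the exercise, partition(4143) as written gives 2486 training rows and 1657 validation rows. Efficacy is judged on the validation rows only, through the confusion counts and the accuracy. With 20 concept columns the model estimates just 21 coefficients (20 plus an intercept), whereas a term-document matrix would contribute one predictor per distinct term, which bears on part ii.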
