DATS 6103 Introduction to Data Mining - Devise a Book Recommendation System


There are two files uploaded to Blackboard - BX-Books.csv and BX_Book-Ratings.csv. The former contains information about a variety of books, and the latter file contains several hundred thousand book ratings from the Book Crossing Website.

Use R to devise a book recommendation system for the data uploaded to Blackboard. In particular, develop a system that recommends up to three books for an arbitrary user, whose name can be entered into R after sourcing your code. Develop such a system using both of the following approaches:

(a) User-based collaborative filtering approach. Use Euclidean distance, Manhattan distance, correlation, and cosine similarity as your measures. What problems (if any) do you run into?

(b) Item-based collaborative filtering approach. Use an adjusted cosine similarity approach as discussed in class. How does this approach compare to the user-based approach?
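
As a rough illustration of the measures named in (a) and (b), the sketch below computes them on a made-up toy matrix. It is only one possible reading of the assignment, not the method discussed in class: the user names, book names, and the helper names user_measures and adjusted_cosine are all hypothetical, and a rating of 0 is assumed to mean "not rated".

```r
# Toy user-by-book rating matrix; 0 is assumed to mean "not rated".
# All names and values are made up for illustration.
ratings <- matrix(
  c(8, 6, 0, 9,
    7, 5, 4, 0,
    0, 9, 8, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"), c("b1", "b2", "b3", "b4"))
)

# User-based: compare two users only on books that both have rated.
user_measures <- function(x, y) {
  both <- x != 0 & y != 0
  if (!any(both)) return(NULL)          # no co-rated books: nothing to compare
  a <- x[both]
  b <- y[both]
  c(
    euclidean = sqrt(sum((a - b)^2)),
    manhattan = sum(abs(a - b)),
    # Pearson correlation needs at least two co-rated books with some spread
    pearson   = if (length(a) > 1 && sd(a) > 0 && sd(b) > 0) cor(a, b) else NA,
    cosine    = sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  )
}

# Item-based: adjusted cosine similarity between books i and j.
# Each rating is centered on that user's mean rating before comparing.
adjusted_cosine <- function(ratings, i, j) {
  r <- ratings
  r[r == 0] <- NA                               # treat zeros as missing
  centered <- r - rowMeans(r, na.rm = TRUE)     # subtract each user's mean
  both <- !is.na(centered[, i]) & !is.na(centered[, j])
  if (!any(both)) return(NA)
  a <- centered[both, i]
  b <- centered[both, j]
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) NA else sum(a * b) / denom
}

user_measures(ratings["u1", ], ratings["u2", ])
adjusted_cosine(ratings, "b1", "b2")
```

Note that Euclidean and Manhattan are distances (smaller means more similar), while correlation and cosine are similarities (larger means more similar), so the nearest-neighbor ranking has to be reversed between the two groups.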

To load the data into R you will need to use the read.csv function (e.g. read.csv(filename, header=TRUE)). Type ?read.csv into the R console if you would like further information about the function's syntax.
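
A minimal sketch of the loading step follows. So that it runs without the Blackboard files, it writes a tiny stand-in file first; the column names User.ID, ISBN, and Book.Rating are assumptions, so check names() on the real files before relying on them.

```r
# Write a tiny stand-in file so the sketch runs without the Blackboard data.
# The column names are assumptions; inspect the real files with names().
tmp <- tempfile(fileext = ".csv")
writeLines(c("User.ID,ISBN,Book.Rating",
             "u1,111,8",
             "u1,222,0",
             "u2,111,7"), tmp)

ratings_long <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
str(ratings_long)    # inspect column names and types

# Keep only actual ratings; zeros are treated as "unrated" per the hints
rated <- ratings_long[ratings_long$Book.Rating != 0, ]
```

If the real files come back as a single fused column, they may use a delimiter other than a comma; read.csv accepts a sep argument for that case.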

Write your programs as functions, so that the name of a user can be entered at the R prompt.

(c) What are some general problems with both approaches? Conceptually speaking, how can these issues be ameliorated?

Hints:

- There is some flexibility with respect to how you construct the details of your recommendation system beyond your nearest neighbor algorithm. For example, you may use more than one nearest neighbor to make your algorithm better and you can weight the distances appropriately as discussed in class. Please feel free to discuss what your code is doing in a Word document or PDF and submit that along with your assignment. This will make it easier for the grader to understand the logic behind your algorithm.

- Make sure your program ignores zero values for the purposes of computing distances. Otherwise your recommendation system will be influenced by unrated books.

- Use an estimated rating above 5 as the threshold for recommending a book.

- If your model cannot provide any recommendations for a particular individual, then please have it say so. You can discuss this in (c).
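
The hints above can be combined into a function along the following lines. This is a sketch under stated assumptions rather than a reference solution: the toy matrix, the function names cosine_corated and recommend, the choice of cosine similarity, and the default of k = 2 neighbors are all illustrative, and a rating of 0 is assumed to mean "not rated".

```r
# Toy user-by-book matrix (made up); 0 is assumed to mean "not rated".
ratings <- matrix(
  c(8, 6, 0,  9, 0,
    7, 5, 4, 10, 0,
    9, 7, 8,  0, 0,
    0, 0, 0,  0, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ann", "bob", "cat", "dan"), paste0("book", 1:5))
)

# Cosine similarity over co-rated books only (zeros ignored).
cosine_corated <- function(x, y) {
  both <- x != 0 & y != 0
  if (!any(both)) return(NA_real_)
  sum(x[both] * y[both]) / (sqrt(sum(x[both]^2)) * sqrt(sum(y[both]^2)))
}

recommend <- function(user, ratings, k = 2, threshold = 5, n = 3) {
  if (!user %in% rownames(ratings)) return(paste("Unknown user:", user))
  none <- paste("No recommendations available for", user)

  target <- ratings[user, ]
  others <- setdiff(rownames(ratings), user)

  # Similarity to every other user; keep the k most similar neighbors
  sims <- vapply(others, function(u) cosine_corated(target, ratings[u, ]),
                 numeric(1))
  sims <- head(sort(sims[!is.na(sims)], decreasing = TRUE), k)
  if (length(sims) == 0) return(none)

  # Estimated rating: similarity-weighted mean over neighbors who rated the book
  unread <- colnames(ratings)[target == 0]
  est <- vapply(unread, function(b) {
    r <- ratings[names(sims), b]
    rated <- r != 0
    if (!any(rated)) return(NA_real_)
    sum(sims[rated] * r[rated]) / sum(sims[rated])
  }, numeric(1))

  # Keep estimates above the threshold and return at most n books
  est <- sort(est[!is.na(est) & est > threshold], decreasing = TRUE)
  if (length(est) == 0) return(none)
  head(est, n)
}

recommend("ann", ratings)   # named vector of estimated ratings
recommend("dan", ratings)   # dan shares no rated books with anyone
```

One design point worth noting for (c): when two users share only a single rated book, cosine similarity over co-rated books is always 1, so a minimum-overlap rule may be worth considering.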
