Data retrieval skills in the context of a data processing system


Assignment Problem: Information Retrieval Techniques Problem Solving Task

Course Learning Outcome:

A) Demonstrate data retrieval skills in the context of a data processing system.

B) Discipline-specific knowledge and capabilities

Assignment Purpose:

This task evaluates the student's technical skills in the management of unstructured data, with potential usage in real applications.

Problem 1: Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model.

You have collected the following three unstructured documents and plan to apply an indexing technique to convert them into an inverted index.

Doc 1: data science is a field to use scientific method, process, algorithm, system to extract knowledge.

Doc 2: data mining is the process to discover pattern in large data to involve method at the database system.

Doc 3: information system is the study of network of hardware and software that people use to process data.

To answer the problems below, provide detailed step-by-step procedures.

Problem 1.1: As the first step in creating the inverted index, remove all stop words and punctuation. The stop words for this task are: Is, An, That, Use, And, To, From, In, Both, Of, At, The.
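The preprocessing step can be sketched as follows. This is a minimal illustration, assuming case-insensitive matching against the stop word list and whitespace tokenization; the function name `preprocess` is hypothetical and not part of the task.

```python
import string

# Stop words from the task statement, lowercased for case-insensitive matching.
STOP_WORDS = {"is", "an", "that", "use", "and", "to", "from", "in", "both", "of", "at", "the"}

def preprocess(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [token for token in cleaned.split() if token not in STOP_WORDS]

doc1 = ("data science is a field to use scientific method, process, "
        "algorithm, system to extract knowledge.")
print(preprocess(doc1))
```

Note that "a" is not in the given stop word list, so under this sketch it survives as a term.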

Problem 1.2: Create a merged inverted list including the within-document frequencies for each term.
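One possible way to build a merged inverted list with within-document frequencies is sketched here. The token lists are shortened, hypothetical excerpts rather than the full preprocessed documents, and `build_inverted_index` is an illustrative name.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each term to a list of (doc_id, within-document frequency) pairs.

    Terms are sorted alphabetically and postings are ordered by doc_id.
    """
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term, freq in Counter(docs[doc_id]).items():
            index[term].append((doc_id, freq))
    return dict(sorted(index.items()))

# Hypothetical, already-preprocessed token lists (shortened for illustration).
docs = {
    1: ["data", "science", "method", "process", "system"],
    2: ["data", "mining", "process", "large", "data", "method", "system"],
}
index = build_inverted_index(docs)
for term, postings in index.items():
    print(term, postings)
```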

Problem 1.3: Use the index created as above to create a dictionary and the related posting file.
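A dictionary and posting file are often laid out so that each dictionary entry holds the term's document frequency plus a pointer into one flat postings list. The sketch below assumes that layout; the field names `df` and `offset` and the index fragment are hypothetical.

```python
def split_index(index):
    """Split a term -> [(doc_id, tf), ...] index into a dictionary and a postings list.

    Each dictionary entry stores the document frequency (df) and the offset
    of the term's first posting in the flat postings list.
    """
    dictionary = {}
    postings = []
    for term in sorted(index):
        dictionary[term] = {"df": len(index[term]), "offset": len(postings)}
        postings.extend(index[term])
    return dictionary, postings

# Hypothetical index fragment in the term -> [(doc_id, tf)] layout.
index = {"data": [(1, 1), (2, 2)], "mining": [(2, 1)], "system": [(1, 1), (2, 1)]}
dictionary, postings = split_index(index)
print(dictionary)
print(postings)
```

To read a term's postings back, slice `postings[offset : offset + df]`.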

Problem 1.4: Design three Boolean queries (e.g., web AND search) and list the relevant documents for each query. Each query must contain at least two keywords, and no keyword may be one that appears in only one document.
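Boolean queries over an inverted index reduce to set operations on posting lists. The sketch below illustrates this for four terms whose postings are read off the three documents above; the helper names are hypothetical, and the queries are examples rather than a model answer.

```python
# Postings for four terms, read off the three documents: term -> set of doc IDs.
postings = {
    "data": {1, 2, 3},
    "process": {1, 2, 3},
    "system": {1, 2, 3},
    "method": {1, 2},
}

def docs_for(term):
    """Return the set of documents containing the term (empty if unseen)."""
    return postings.get(term.lower(), set())

def boolean_and(a, b):
    return docs_for(a) & docs_for(b)   # intersection

def boolean_or(a, b):
    return docs_for(a) | docs_for(b)   # union

def boolean_and_not(a, b):
    return docs_for(a) - docs_for(b)   # set difference

print(sorted(boolean_and("data", "method")))
print(sorted(boolean_or("method", "process")))
print(sorted(boolean_and_not("system", "method")))
```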

Problem 1.5: Use the Vector model to query the inverted index, and compare the results with those of the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold.)
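The hint about cosine similarity can be illustrated with a short sketch. It assumes raw term frequency as the term weight (tf-idf is a common alternative), a threshold of 0.5 chosen only for illustration, and hypothetical shortened token lists.

```python
import math
from collections import Counter

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * vec_b.get(term, 0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical preprocessed token lists; raw term frequency is the assumed weight.
docs = {
    1: Counter(["data", "science", "method", "system"]),
    2: Counter(["data", "mining", "data", "method"]),
    3: Counter(["information", "system", "network"]),
}
query = Counter(["data", "method"])
THRESHOLD = 0.5  # assumed cut-off, not specified in the task

scores = {doc_id: cosine_similarity(query, vec) for doc_id, vec in docs.items()}
retrieved = sorted(doc_id for doc_id, score in scores.items() if score >= THRESHOLD)
print(scores)
print(retrieved)
```

Unlike a Boolean query, this yields a ranking, so a document matching only some query terms can still be retrieved if its score clears the threshold.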

Problem 2 (IR Evaluation)

In this Problem, you are required to evaluate the performance of different search engines. First, please find two search engines you are familiar with, such as Google, Bing, Yahoo!, etc.

Second, please choose one target from the following list, and design two queries to search in both search engines. So both query 1 and query 2 have to be tested in both search engines.

Target 1: obtain the new features of the new iPad.

Target 2: obtain the user manual for installing Tera Term.

Target 3: obtain a tutorial on how to install Oracle SQL.

Target 4: obtain the features of the new Xbox One.

Third, take the first 20 results from each search engine. If a result returns the target, mark it as a relevant document; otherwise, mark it as irrelevant. Assume there are 12 relevant documents in total (retrieved and not retrieved). If you think there are more relevant documents to be found, you can use a higher expected relevance as the threshold.

The following Problems are based on your search results.

Problem 2.1: List your target, results, and designed search queries (you can use any keywords you think are related to the target). For each result, you can click the link, go to the page, and take a screenshot if you think the result is relevant. In your report, you are required to provide the screenshots and a detailed explanation of why they are relevant to your queries.

Problem 2.2: Get the precision and recall values for 20 documents for query 1 in search engine 1. Interpolate them to 11 standard recall levels. Then plot them into a chart. Get the precision and recall values for 20 documents for query 1 in search engine 2. Interpolate them to 11 standard recall levels. Then plot them into a chart.
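The precision/recall computation and the 11-point interpolation can be sketched as below. This assumes the usual interpolation rule (interpolated precision at a recall level is the maximum precision observed at any recall greater than or equal to that level). The relevance judgments are made-up illustration data for five results, not real search output, and the function names are hypothetical.

```python
def precision_recall(relevance, total_relevant):
    """Return (recall, precision) after each rank for a ranked list of 0/1 judgments."""
    points = []
    hits = 0
    for rank, rel in enumerate(relevance, start=1):
        hits += rel
        points.append((hits / total_relevant, hits / rank))
    return points

def interpolate_11pt(points):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    Levels that no observed recall value reaches get precision 0.0.
    """
    levels = [i / 10 for i in range(11)]
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]

# Made-up judgments for five ranked results (1 = relevant, 0 = irrelevant).
relevance = [1, 0, 1, 0, 0]
points = precision_recall(relevance, total_relevant=2)
interpolated = interpolate_11pt(points)
print(points)
print(interpolated)
```

For the assignment itself, `relevance` would hold 20 judgments and `total_relevant` would be 12; the resulting 11 values can then be plotted with any charting tool.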

Problem 2.3: Get the precision and recall values for 20 documents for query 2 in search engine 1. Interpolate them to 11 standard recall levels. Then plot them into the same chart as above. Get the precision and recall values for 20 documents for query 2 in search engine 2. Interpolate them to 11 standard recall levels. Then plot them into the same chart as above.

Problem 2.4: Now find the average interpolated precision of query 1 and query 2 for search engine 1 and plot it on the same chart, so that you have a total of three interpolated curves in one chart. Then find the average interpolated precision of query 1 and query 2 for search engine 2 and plot it on the same chart as that engine's curves, again giving a total of three interpolated curves in one chart.
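Averaging two interpolated curves is a level-by-level mean over the 11 recall levels. A minimal sketch, using made-up curve values and a hypothetical function name:

```python
def average_interpolated(curves):
    """Average several 11-point interpolated precision curves level by level."""
    if any(len(curve) != 11 for curve in curves):
        raise ValueError("each curve must have 11 recall levels")
    return [sum(values) / len(values) for values in zip(*curves)]

# Made-up interpolated precision values for two queries on one search engine.
query1_curve = [1.0] * 6 + [0.5] * 5
query2_curve = [0.5] * 6 + [0.25] * 5
average_curve = average_interpolated([query1_curve, query2_curve])
print(average_curve)
```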

Problem 2.5: Plot the average interpolated values for Search Engine 1 and Search Engine 2 on one single chart, and compare the algorithms in terms of precision and recall. Which search engine do you think is superior? Why?
