A. Consider the table of term frequencies for three documents, denoted Doc1, Doc2, and Doc3, in Figure 6.9. Compute the tf-idf weights for the terms car, auto, insurance, and best for each document, using the idf values from Figure 6.8.

 

             Doc1   Doc2   Doc3
car            27      4     24
auto            3     33      0
insurance       0     33     29
best           14      0     17

Figure 6.9 Table of tf values.

Term         df_t     idf_t
car         18,165     1.65
auto         6,723     2.08
insurance   19,241     1.62
best        25,235     1.50

Figure 6.8 Table of idf values: the idfs of terms with various frequencies in the Reuters collection of 806,791 documents.
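A minimal sketch of the part A computation, assuming the plain weighting tf-idf(t, d) = tf(t, d) × idf(t), with the values transcribed from Figures 6.9 and 6.8:

```python
# tf-idf weights for part A: w(t, d) = tf(t, d) * idf(t).
# tf counts from Figure 6.9, idf values from Figure 6.8.
tf = {
    "car":       {"Doc1": 27, "Doc2": 4,  "Doc3": 24},
    "auto":      {"Doc1": 3,  "Doc2": 33, "Doc3": 0},
    "insurance": {"Doc1": 0,  "Doc2": 33, "Doc3": 29},
    "best":      {"Doc1": 14, "Doc2": 0,  "Doc3": 17},
}
idf = {"car": 1.65, "auto": 2.08, "insurance": 1.62, "best": 1.5}

for term, counts in tf.items():
    weights = {doc: round(c * idf[term], 2) for doc, c in counts.items()}
    print(term, weights)
# e.g. car -> {'Doc1': 44.55, 'Doc2': 6.6, 'Doc3': 39.6}
```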

B. Compute the vector space similarity between the query "digital cameras" and the document "digital cameras and video cameras" by filling out the empty columns in Table 6.1. Assume N = 10,000,000, logarithmic term weighting (wf columns) for query and document, idf weighting for the query only, and cosine normalization for the document only. Treat "and" as a stop word. Enter term counts in the tf columns. What is the final similarity score? (Please provide the details of the calculation.)

 

Table 6.1

                    Query                                Document
Word       tf   wf   df        idf   q_i = wf-idf   tf   wf   d_i = normalized wf
digital               10,000
video                100,000
cameras               50,000
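A minimal sketch of the part B computation, assuming base-10 logarithms and the weighting the exercise describes (logarithmic wf on both sides, idf on the query side only, cosine normalization on the document side only):

```python
import math

# Part B: query "digital cameras" vs. document
# "digital cameras and video cameras" ("and" dropped as a stop word).
N = 10_000_000
df = {"digital": 10_000, "video": 100_000, "cameras": 50_000}
query_tf = {"digital": 1, "video": 0, "cameras": 1}
doc_tf = {"digital": 1, "video": 1, "cameras": 2}

def wf(tf):
    # logarithmic term weighting: 1 + log10(tf) for tf > 0, else 0
    return 1 + math.log10(tf) if tf > 0 else 0.0

# Query side: q_i = wf * idf, no normalization.
q = {t: wf(c) * math.log10(N / df[t]) for t, c in query_tf.items()}

# Document side: wf only, then cosine (Euclidean) normalization.
d_raw = {t: wf(c) for t, c in doc_tf.items()}
norm = math.sqrt(sum(w * w for w in d_raw.values()))
d = {t: w / norm for t, w in d_raw.items()}

score = sum(q[t] * d[t] for t in df)
print(round(score, 3))  # ~3.12 under these assumptions
```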

C. Why is the idf of a term always finite?
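A hint for part C (not the full answer): every term in the dictionary occurs in at least one document, so df_t ≥ 1, and therefore

\[
\mathrm{idf}_t = \log_{10}\frac{N}{\mathrm{df}_t} \le \log_{10} N < \infty .
\]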

D. Sketch the frequency-ordered postings for the data in Figure 6.9.
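For part D, a sketch of what "frequency-ordered" means here, using the Figure 6.9 counts: each term's postings list is sorted by decreasing tf, and documents with tf = 0 for a term never appear in that term's list.

```python
# Frequency-ordered postings for the Figure 6.9 data: sort each term's
# postings by decreasing tf (documents with tf = 0 are simply absent).
tf = {
    "car":       {"Doc1": 27, "Doc2": 4,  "Doc3": 24},
    "auto":      {"Doc1": 3,  "Doc2": 33, "Doc3": 0},
    "insurance": {"Doc1": 0,  "Doc2": 33, "Doc3": 29},
    "best":      {"Doc1": 14, "Doc2": 0,  "Doc3": 17},
}
for term, counts in tf.items():
    postings = sorted(((d, c) for d, c in counts.items() if c > 0),
                      key=lambda p: -p[1])
    print(term, "->", postings)
# car -> [('Doc1', 27), ('Doc3', 24), ('Doc2', 4)], and so on.
```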

E. Let the static quality scores for Doc1, Doc2 and Doc3 in Figure 6.11 be respectively 0.25, 0.5 and 1. Sketch the postings for impact ordering when each postings list is ordered by the sum of the static quality score and the Euclidean normalized tf values in Figure 6.11.
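Figure 6.11 is not reproduced above; assuming it holds the Euclidean-normalized tf values derived from Figure 6.9 (each document's tf vector divided by its L2 norm), a sketch of the impact ordering for part E:

```python
import math

# Part E: order each postings list by g(d) + normalized tf, largest first.
# ASSUMPTION: Figure 6.11 = Euclidean-normalized tf values of Figure 6.9.
tf = {
    "Doc1": {"car": 27, "auto": 3,  "insurance": 0,  "best": 14},
    "Doc2": {"car": 4,  "auto": 33, "insurance": 33, "best": 0},
    "Doc3": {"car": 24, "auto": 0,  "insurance": 29, "best": 17},
}
g = {"Doc1": 0.25, "Doc2": 0.5, "Doc3": 1.0}  # static quality scores

ntf = {}
for doc, vec in tf.items():
    norm = math.sqrt(sum(v * v for v in vec.values()))
    ntf[doc] = {t: v / norm for t, v in vec.items()}

for term in ("car", "auto", "insurance", "best"):
    postings = [(doc, g[doc] + ntf[doc][term])
                for doc in tf if tf[doc][term] > 0]
    postings.sort(key=lambda p: -p[1])
    print(term, "->", [(d, round(s, 2)) for d, s in postings])
```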

F. Derive the equivalence between the two formulas for the F measure shown in the following equation, given that α = 1/(β² + 1):

\[
F = \frac{1}{\alpha\frac{1}{P} + (1-\alpha)\frac{1}{R}} = \frac{(\beta^2 + 1)PR}{\beta^2 P + R}
\]
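One route through the algebra (a sketch of the expected derivation): multiply the numerator and denominator of the first form by PR, then substitute α.

\[
F = \frac{1}{\alpha\frac{1}{P} + (1-\alpha)\frac{1}{R}}
  = \frac{PR}{\alpha R + (1-\alpha)P}
\]

With \(\alpha = \frac{1}{\beta^2+1}\) and \(1-\alpha = \frac{\beta^2}{\beta^2+1}\):

\[
F = \frac{PR}{\frac{R + \beta^2 P}{\beta^2+1}}
  = \frac{(\beta^2+1)PR}{\beta^2 P + R}
\]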

G. What is the relationship between the value of F1 and the break-even point?
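A hint for part G: at the break-even point, precision equals recall, and the harmonic mean of two equal values is that value:

\[
P = R \;\Rightarrow\; F_1 = \frac{2PR}{P + R} = \frac{2P^2}{2P} = P = R ,
\]

so F1 evaluated at the break-even point equals the break-even value itself.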

H. Below is a table showing how two human judges rated the relevance of a set of 12 documents to a particular information need (0 = nonrelevant, 1 = relevant). Let us assume that you've written an IR system that for this query returns the set of documents {4, 5, 6, 7, 8}.

Document ID   Judge 1   Judge 2
 1               0         0
 2               0         0
 3               1         1
 4               1         1
 5               1         0
 6               1         0
 7               1         0
 8               1         0
 9               0         1
10               0         1
11               0         1
12               0         1

a. Calculate the kappa measure between the two judges.

b. Calculate precision, recall, and F1 of your system if a document is considered relevant only if the two judges agree.

c. Calculate precision, recall, and F1 of your system if a document is considered relevant if either judge thinks it is relevant.
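A minimal sketch of parts a–c, using the table above as transcribed; for kappa it assumes the pooled-marginals form of P(E) (both judges' relevant and nonrelevant proportions pooled together):

```python
# Parts a-c. Judgments transcribed from the table above;
# the system's result set is {4, 5, 6, 7, 8}.
judge1 = {1:0, 2:0, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:0, 10:0, 11:0, 12:0}
judge2 = {1:0, 2:0, 3:1, 4:1, 5:0, 6:0, 7:0, 8:0, 9:1, 10:1, 11:1, 12:1}
retrieved = {4, 5, 6, 7, 8}
docs = list(judge1)
n = len(docs)

# (a) Kappa = (P(A) - P(E)) / (1 - P(E)), with pooled marginals for P(E).
p_agree = sum(judge1[d] == judge2[d] for d in docs) / n
p_rel = (sum(judge1.values()) + sum(judge2.values())) / (2 * n)
p_e = p_rel ** 2 + (1 - p_rel) ** 2
kappa = (p_agree - p_e) / (1 - p_e)
print("kappa =", round(kappa, 3))  # -0.333

def prf(relevant):
    tp = len(retrieved & relevant)
    p = tp / len(retrieved)
    r = tp / len(relevant)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return round(p, 3), round(r, 3), round(f1, 3)

both = {d for d in docs if judge1[d] and judge2[d]}    # (b) both agree
either = {d for d in docs if judge1[d] or judge2[d]}   # (c) either judge
print("b: P, R, F1 =", prf(both))    # (0.2, 0.5, 0.286)
print("c: P, R, F1 =", prf(either))  # (1.0, 0.5, 0.667)
```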
