Designing a visual search algorithm


PROJECT SUMMARY:

The objective of this project is to design a visual search algorithm (VSA) that looks for an object in a video clip. The VSA takes in an image and a video as input, where the image contains an object of interest (OOF) against a plain background. The VSA then has to search for the OOF in the video clip, and output the frames that contain the object as well as its location in those frames.
This is a recognition challenge, not an identification one. If the input image is a face, for example, the algorithm has to find all faces in the video, not one particular person's face. If the input image is a bicycle, the algorithm has to find all bicycles.
The VSA’s score will be based on its performance, speed, and visualization capabilities. The performance of the VSA will be assessed through a text file that the VSA will have to output. The speed of the VSA does not include the time it takes to visualize its output.

TECHNICAL PARAMETERS

•    The input image’s resolution will be , and will be in BMP format
•    The object in the input image will be of reasonable complexity (not a simple shape like a beach ball)
•    The object in the input image will be against a white background
•    The input video’s resolution will be , and will be in AVI format
•    The input video will be 10 seconds long, and may be composed of more than one scene

Occurrences of the OOF in the video:

•    Will be the same size or smaller than that of the input image, but not larger
•    Will have the same view as that of the input image, with a maximum angle of affine transformation of
•    Will not have an occlusion of greater than
•    Will appear at most once in any given frame of the video

VSA OUTPUT:

The VSA should output a text file, output.txt, which contains every occurrence of the OOF in the video as follows:
FN R1 C1 R2 C2

Where:

FN is the frame number
R1 is the starting row of the upright rectangle bounding the OOF
C1 is the starting column of the upright rectangle bounding the OOF
R2 is the ending row of the upright rectangle bounding the OOF
C2 is the ending column of the upright rectangle bounding the OOF

For example, assume the figure below on the left is frame number 23 in the video, and the VSA detects the OOF and highlights it as shown. For the purposes of this competition, the VSA should always return the location of an upright rectangle bounding the object, regardless of the affine transformation of the object, as shown on the right. The VSA would then add the following line to output.txt:

23 24 30 150 200

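[Figure 152_VSA.jpg: frame 23 with the detected OOF highlighted (left) and its upright bounding rectangle (right)]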
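The output rule above can be illustrated with a minimal Python sketch. It assumes the detector yields the four corner points of the (possibly rotated or sheared) object as (row, column) pairs. The function names `upright_bounding_rect` and `format_detection` are hypothetical and are not part of the competition specification.

```python
def upright_bounding_rect(corners):
    """Return (R1, C1, R2, C2) of the upright rectangle enclosing all corners.

    corners: iterable of (row, col) integer pairs, e.g. the four corners of
    an affine-transformed object outline (an assumption of this sketch).
    """
    rows = [r for r, _ in corners]
    cols = [c for _, c in corners]
    return min(rows), min(cols), max(rows), max(cols)


def format_detection(frame_number, rect):
    """Format one line of output.txt as 'FN R1 C1 R2 C2'."""
    r1, c1, r2, c2 = rect
    return f"{frame_number} {r1} {c1} {r2} {c2}"


# A tilted quadrilateral whose extreme points match the example in the text.
corners = [(24, 60), (70, 200), (150, 170), (100, 30)]
line = format_detection(23, upright_bounding_rect(corners))
print(line)
```

Appending each such line to output.txt, one per detected frame, would give the file layout described above.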

TESTING PROCEDURE:

Each team must package its VSA as an executable that runs on a Windows machine without requiring any other installed software. If any .dll files are required, the team must supply them as well. It is advisable to compile the visualization as a separate executable, because by the competition's definition the VSA speed does not include visualization. The input image and input video will be placed in the same directory as the VSA. The input image will be named “InImage.bmp”, and the input video will be named “InVideo.avi”.

The VSA will be tested using a script that runs as follows:
• Start clock
• Run VSA executable
• Stop clock and record time
• Read output.txt and record for grading
• Run visualization executable
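The testing steps above could be approximated by a short Python harness such as the one below. This is only a sketch of the idea, not the organizers' actual script: the helper names are hypothetical, and `NOOP_COMMAND` is a harmless stand-in for the real VSA executable, whose file name the brief does not specify.

```python
import subprocess
import sys
import time

# Stand-in for the team's VSA executable (hypothetical placeholder).
NOOP_COMMAND = [sys.executable, "-c", "pass"]


def time_command(command):
    """Run a command and return (elapsed time in whole ms, exit code)."""
    start = time.perf_counter()                      # start clock
    completed = subprocess.run(command, check=False)  # run executable
    elapsed_ms = round((time.perf_counter() - start) * 1000)  # stop clock
    return elapsed_ms, completed.returncode


def parse_output(text):
    """Parse output.txt content into (FN, R1, C1, R2, C2) integer tuples.

    Lines that do not have exactly five fields are skipped (an assumption
    of this sketch; the brief does not say how malformed lines are handled).
    """
    detections = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        detections.append(tuple(int(f) for f in fields))
    return detections
```

In use, `time_command` would be called with the path to the VSA executable, the contents of output.txt would then be passed to `parse_output` for grading, and the visualization executable would be launched last so that it does not affect the recorded time.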

PERFORMANCE:
Each correctly identified frame earns a true-positive score (TP) that depends on the overlap between the rectangle the VSA specifies and the ground truth rectangle. The TP score for a correctly identified frame is computed as follows:

                  TP = 1 - cos(90° * OL)

                  Where OL is:

                  OL = (overlapping area / area of ground truth rectangle) + (overlapping area / area of VSA-supplied rectangle)

An example is shown in the figure below. Assume the overlapping area is 180 pixels, the ground truth rectangle is 220 pixels, and the VSA-supplied rectangle is 260 pixels. The score for the frame would be:

                  OL = (180/220) + (180/260) = 1.51

                  TP = 1 - cos(90° * 1.51) ≈ 1.72

[Figure 1195_Overlapping pixels.jpg: overlap between the ground truth rectangle and the VSA-supplied rectangle]
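The per-frame scoring can be checked with a small Python sketch. It assumes the angle in the formula is in degrees and that R2 and C2 are inclusive pixel indices; the brief does not state the latter, so treat it as an assumption. The function names are hypothetical.

```python
import math


def overlap_area(a, b):
    """Overlap in pixels of two upright rectangles given as (R1, C1, R2, C2).

    Assumes the ending row and column are inclusive.
    """
    r1, c1 = max(a[0], b[0]), max(a[1], b[1])
    r2, c2 = min(a[2], b[2]), min(a[3], b[3])
    if r2 < r1 or c2 < c1:
        return 0  # the rectangles do not intersect
    return (r2 - r1 + 1) * (c2 - c1 + 1)


def tp_score(overlap, ground_truth_area, vsa_area):
    """TP = 1 - cos(90 degrees * OL), with OL the sum of the two overlap ratios."""
    ol = overlap / ground_truth_area + overlap / vsa_area
    return 1 - math.cos(math.radians(90 * ol))


# Worked example from the text: overlap 180, ground truth 220, supplied 260.
print(round(tp_score(180, 220, 260), 2))
```

Under this formula a perfect match gives OL = 2 and TP = 2, while no overlap gives OL = 0 and TP = 0, so the score grows with the overlap between the two rectangles.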

Therefore, a VSA’s performance score is:

                  P = [50 * (ΣTP - ΣFP - ΣFN / Pmax)]

The timing resolution of the script will be 1 ms.
