Develop a test function to check several cases to make sure


Problem

1. Extract the gene names from column 9 of the GFF3 file using vectorized regular-expression parsing. Save the gene names in a vector whose length equals the total number of annotation lines in the GENCODE file. This step is overhead that needs to run only once per GFF3 file.

2. Sort the gene name vector alphabetically using the sort() function in R. To track the original row number of each sorted gene, name the vector elements by their row numbers before sorting. This is also an overhead step. Save the sorted vector for future use, and regenerate it only when a new GENCODE release is used.

3. Write a logarithmic-time search function (for example, a binary search) that reports the range of sorted names identical to the query gene. The input is a gene name and a sorted gene name vector. The output is a range: a vector of two elements giving the beginning and ending indices of the query gene in the sorted vector. Because the vector is sorted, all elements within that range are equal to the query gene. If the gene is not found, the function returns NULL. The run time must be O(log n), where n is the length of the sorted vector, and must be independent of how many times the query gene appears in the sorted vector.

4. Using the range from step 3, look up the original row numbers stored as the names of the sorted vector, and extract those rows of the GFF3 data frame to form a new data frame that contains all annotations for the query gene.

5. Develop a test function that checks several cases to make sure the search function is correct. The test function should check more than the number of rows containing the given gene name, because the total can be correct even when the exact row numbers are wrong.

6. Report the run time of the above logarithmic search on the entire GENCODE annotation with three genes of your choice.

7. Report the run time for the first three steps. Compare the run time of step 3 with the for-loop, apply, and vectorized implementations of linear search.
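
As a rough illustration of steps 1 to 4, here is a minimal sketch on a toy table. The data frame `gff3`, its column names, the attribute strings, and the function name `find_gene_range` are all assumptions made up for this example and are not part of the assignment. The search uses two binary searches (a lower bound and an upper bound), which is one way, not necessarily the intended one, to keep the cost at O(log n) regardless of how often the gene occurs.

```r
# Toy stand-in for a GFF3 table (hypothetical data); column 9 holds attributes.
gff3 <- data.frame(
  V1 = "chr1",
  V3 = c("gene", "exon", "gene", "exon", "exon", "gene"),
  V9 = c("ID=g1;gene_name=TP53;level=2",
         "ID=e1;gene_name=TP53;level=2",
         "ID=g2;gene_name=BRCA1;level=1",
         "ID=e2;gene_name=BRCA1;level=1",
         "ID=e3;gene_name=TP53;level=2",
         "ID=g3;gene_name=MYC;level=2"),
  stringsAsFactors = FALSE
)

# Step 1: vectorized regex extraction of gene_name from column 9.
# Note: sub() returns a line unchanged if it has no gene_name attribute.
gene_names <- sub(".*gene_name=([^;]+).*", "\\1", gff3$V9)

# Step 2: name each element by its row number, then sort.
# sort() and the `<` comparisons below both follow the session's collation,
# so the search should run under the same locale as the sort.
names(gene_names) <- seq_along(gene_names)
sorted <- sort(gene_names)

# Step 3: range search using two binary searches, each O(log n).
find_gene_range <- function(query, sorted) {
  n <- length(sorted)
  # Lower bound: first index whose name is >= query.
  lo <- 1L
  hi <- n + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted[[mid]] < query) lo <- mid + 1L else hi <- mid
  }
  first <- lo
  if (first > n || sorted[[first]] != query) return(NULL)
  # Upper bound: first index whose name is > query (search restarts at `first`).
  hi <- n + 1L
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted[[mid]] <= query) lo <- mid + 1L else hi <- mid
  }
  c(first, lo - 1L)
}

# Step 4: map the range back to original row numbers and subset the table.
rng <- find_gene_range("TP53", sorted)
rows <- sort(as.integer(names(sorted)[rng[1]:rng[2]]))
tp53_annotation <- gff3[rows, ]
print(tp53_annotation)
```

The row numbers are sorted after lookup so that the extracted data frame keeps the original file order; the sketch does not rely on sort() preserving the order of tied names.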

Turn in your R source code files and a summary of the run time recorded for the algorithms.
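
For the linear-search baselines in step 7 and the exact-row check in step 5, the following is a hedged sketch. The synthetic gene names, the vector size, and the function names `linear_for`, `linear_apply`, and `linear_vec` are assumptions chosen for illustration; real timings should come from the full GENCODE annotation. It uses `vapply()` as the apply-family variant and checks that all three implementations return identical row indices, not just the same count.

```r
set.seed(1)

# Synthetic stand-in for the extracted gene-name vector (hypothetical names).
gene_names <- sample(paste0("GENE", 1:2000), 2e5, replace = TRUE)
query <- "GENE42"

# Linear search with an explicit for loop.
linear_for <- function(query, x) {
  hits <- integer(0)
  for (i in seq_along(x)) {
    if (x[i] == query) hits <- c(hits, i)
  }
  hits
}

# Linear search with an apply-family function.
# USE.NAMES = FALSE keeps the result free of names taken from x.
linear_apply <- function(query, x) {
  which(vapply(x, function(g) g == query, logical(1), USE.NAMES = FALSE))
}

# Linear search with a single vectorized comparison.
linear_vec <- function(query, x) which(x == query)

# Correctness check on exact row indices, not only on the number of hits.
reference <- linear_vec(query, gene_names)
stopifnot(
  identical(linear_for(query, gene_names), reference),
  identical(linear_apply(query, gene_names), reference),
  length(linear_vec("NOT_A_GENE", gene_names)) == 0L
)

# Elapsed wall-clock time in seconds for each implementation.
timings <- c(
  for_loop   = system.time(linear_for(query, gene_names))[["elapsed"]],
  apply      = system.time(linear_apply(query, gene_names))[["elapsed"]],
  vectorized = system.time(linear_vec(query, gene_names))[["elapsed"]]
)
print(timings)
```

The logarithmic search from step 3 can be timed with the same `system.time()` pattern. A single call is often too fast to measure reliably, so repeating it many times inside the timed expression is a common workaround.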
