Python Regular Expressions and Dictionaries

Module Overview

In this module you'll use Python regular expressions to parse /scratch/go-basic.obo and store the fields in a dictionary with the GO id as the key. The GO records in this file span multiple lines, so the newline character cannot serve as the record separator. Unlike Perl, Python doesn't allow you to change the record separator for the readline() method, so you'll read the whole file in with the read() method and then use a regular expression to split the text into records.

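Since you want to find all the records with your regular expression, you'll need to use re.findall(r"your regex here", goFile, re.DOTALL), where goFile holds the text returned by read(). re.findall returns a list of matches. re.DOTALL makes the . metacharacter match newline characters as well, so a pattern such as .* can match across line breaks.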
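As a minimal sketch of this approach, the example below runs re.findall with re.DOTALL on a small hand-written string instead of the real file. The sample text, its placeholder term names, the function name split_records, and the specific pattern are all assumptions for illustration. The pattern assumes each record starts with a [Term] line and ends at a blank line, so check that against the actual file before relying on it.

```python
import re

# Hand-written sample with placeholder content, laid out the way the records
# are assumed to appear in go-basic.obo (not copied from the real file).
SAMPLE = """format-version: 1.2

[Term]
id: GO:0000001
name: example term one
namespace: biological_process
is_a: GO:0000010 ! example parent A
is_a: GO:0000011 ! example parent B

[Term]
id: GO:0000002
name: example term two
namespace: molecular_function
is_a: GO:0000012 ! example parent C

[Typedef]
id: part_of
name: part of
"""


def split_records(go_text):
    """Return the body of every [Term] record in go_text as a list of strings."""
    # re.MULTILINE lets ^ match at the start of each line.
    # re.DOTALL lets . match newlines, so .*? can span a multiline record.
    # The lookahead stops each match at the next blank line or at the end of the text.
    pattern = r"^\[Term\]\n(.*?)(?=\n\n|\Z)"
    return re.findall(pattern, go_text, re.DOTALL | re.MULTILINE)


records = split_records(SAMPLE)
print(len(records), "records found")
print(records[0])
```

The non-greedy .*? matters here: with re.DOTALL a greedy .* would run to the last possible stopping point and merge several records into one match. For the real assignment, the string passed in would come from read() on the opened file rather than from a literal.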

Required Reading

- Python for Biologists Chapter 7
- Python for Biologists Chapter 8

SwissProt Parser

The code shown below parses a SwissProt file. SwissProt records are multi-line, so it's very similar to what you need to do to parse go-basic.obo.

1478_Figure.jpg

Assignment

Complete as many exercises from the book as necessary to understand the concepts. These will not be graded. The graded part of the assignment is to use regular expressions to parse /scratch/go-basic.obo and put the results in a dictionary. Your program should be written for Python 3 and named

~/BIOL6200/Module10/parseGoInfo.py.

- Parse the GO id, name, namespace, and is_a values for each term.
- Create a string with namespace on the first line followed by a line for name, and one line per is_a.
- Put the string as the value in a dictionary where go_id is the key.
- Iterate over the keys in the dictionary, printing go_id followed by a tab, then the string containing the name, namespace, and is_a values.
- Create a function for splitting the file into records, and a function for splitting the records into fields.
- Your output should look something like this:

16_Figure1.jpg
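The field-parsing and dictionary steps listed above could be sketched roughly as follows. This is one possible arrangement, not the expected solution: the names parse_fields and build_go_dict, the placeholder records, and the "tag: value" patterns are assumptions, and the layout of the continuation lines in the printed output is a guess that may differ from the expected output.

```python
import re

# Two hand-written records with placeholder content, as they might look
# after the file text has already been split into records.
RECORDS = [
    "id: GO:0000001\nname: example term one\nnamespace: biological_process\n"
    "is_a: GO:0000010 ! example parent A\nis_a: GO:0000011 ! example parent B",
    "id: GO:0000002\nname: example term two\nnamespace: molecular_function",
]


def parse_fields(record):
    """Return (go_id, value_string) for one record, or None if it has no GO id."""
    # Without re.DOTALL, . stops at a newline, so each (.*) captures one line only.
    id_match = re.search(r"^id: (GO:\d+)", record, re.MULTILINE)
    if id_match is None:
        return None
    name = re.search(r"^name: (.*)$", record, re.MULTILINE)
    namespace = re.search(r"^namespace: (.*)$", record, re.MULTILINE)
    is_a_values = re.findall(r"^is_a: (.*)$", record, re.MULTILINE)

    # Namespace first, then name, then one line per is_a value.
    lines = [
        namespace.group(1) if namespace else "",
        name.group(1) if name else "",
    ]
    lines.extend(is_a_values)
    return id_match.group(1), "\n".join(lines)


def build_go_dict(records):
    """Map each GO id to its namespace/name/is_a string."""
    go_terms = {}
    for record in records:
        parsed = parse_fields(record)
        if parsed is not None:
            go_id, value = parsed
            go_terms[go_id] = value
    return go_terms


go_terms = build_go_dict(RECORDS)
for go_id in go_terms:
    # GO id, a tab, then the multiline string built in parse_fields.
    print(go_id + "\t" + go_terms[go_id])
```

re.findall is used for is_a because a term can have several parents, while re.search is enough for fields that appear once per record. A term with no is_a lines simply contributes no extra lines to its string.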
