Bio380 practical computing for biologists assignment, Computer Engineering

Practical Computing for Biologists Assignment: Integrating Tools: Python + UNIX

The last assignment asked you to build a script that calculated hydropathy scores across a protein sequence using a "sliding window" approach. Regions with highly positive hydropathy scores are proposed to indicate transmembrane domains, which provides information on both the function of the protein and its cellular location. However, a "sliding window" approach is not the only way to predict whether a protein has a transmembrane domain. One of the better-known alternatives is implemented in a program called TMHMM (Transmembrane Hidden Markov Model). Without going into the details of this method (see the iLearn site if you're really interested - I'll post a paper there), this approach performs a global analysis of the amino acid sequence to predict whether any regions of a protein are highly likely to form an α-helix secondary structure with certain characteristics; those regions are then predicted to be transmembrane domains. In this assignment, you will compare the transmembrane predictions made from hydropathy scores with those of a program like TMHMM. To do this, you will examine a subset of the ~20,000 human reference proteins in the SwissProt database and do the following:

1) Modify the provided template script from the previous assignment so that arguments given on the UNIX command line define the input file name (which changes each time the script is run), the threshold value for a "transmembrane" prediction, and the window size used by the Python script. Also, change the code so that, instead of outputting the scores for each bin, it outputs only the name of the gene (the base name of the FASTA file) if any bin has a score greater than the threshold given as an argument.

2) Provide a UNIX script that can run your python script for the following 9 different parameter combinations on the ~1000 sequences you are assigned and generate individual output files for each parameter set with names of genes that had hydropathy scores above the assigned thresholds: window size=9, threshold=30; window size=15, threshold=30; window size=21, threshold=30; window size=9, threshold=40; window size=15, threshold=40; window size=21, threshold=40; window size=9, threshold=50; window size=15, threshold=50; window size=21, threshold=50.

3) Compare your results with the TMHMM predictions for all human reference proteins (Hint: Think "regular expressions"). I only want responses to questions here. Not code.

a. How many of your hydropathy-score predicted transmembrane proteins are also predicted by TMHMM?

b. How many did hydropathy predict, but not TMHMM?

c. How many did TMHMM predict, but not hydropathy-score?

d. Assuming that the TMHMM analysis is the standard of accuracy, calculate the type-1 (false positive) and type-2 (false negative) errors associated with using the hydropathy method. Type-1 errors are cases where hydropathy values predicted transmembrane domains but TMHMM did not, while type-2 errors are cases where TMHMM predicted transmembrane domains but hydropathy scores did not. Compare the proportions of false positives and false negatives across all runs and describe which conditions (window size and threshold value) are best for using hydropathy values to predict transmembrane proteins. Make sure to use evidence for this.
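For step 1, one possible shape for the modified script is sketched below. It is not the course template: the Kyte-Doolittle scale, the summing of values within each window, the option names --window and --threshold, and every function name are assumptions made for illustration, so adapt them to whatever the template from the previous assignment actually does.

```python
import argparse
import os

# Kyte-Doolittle hydropathy values (assumed scale; the template may use another).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def read_fasta_sequence(path):
    """Return the sequence of a single-record FASTA file as one string."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle
                       if not line.startswith(">"))


def window_scores(seq, window):
    """Sum hydropathy over every full window; unknown residues count as 0."""
    values = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [sum(values[i:i + window])
            for i in range(len(values) - window + 1)]


def exceeds_threshold(seq, window, threshold):
    """True if any window scores strictly above the threshold."""
    return any(score > threshold for score in window_scores(seq, window))


def gene_name(path):
    """Base name of the FASTA file without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def main(argv=None):
    # Hypothetical option names; argv=None makes argparse read sys.argv.
    parser = argparse.ArgumentParser(
        description="Print the gene name if any window exceeds the threshold.")
    parser.add_argument("fasta", help="input FASTA file")
    parser.add_argument("--window", type=int, default=15)
    parser.add_argument("--threshold", type=float, default=30.0)
    args = parser.parse_args(argv)

    seq = read_fasta_sequence(args.fasta)
    if exceeds_threshold(seq, args.window, args.threshold):
        print(gene_name(args.fasta))
```

Called as main(["P12345.fasta", "--window", "9", "--threshold", "30"]) (a hypothetical file name), the sketch prints P12345 only if some window scores above 30. Calling main() from the usual if __name__ == "__main__": guard would expose the same options on the UNIX command line, where the script for step 2 can loop over files and parameter sets.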
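The comparisons in step 3 come down to set arithmetic once both prediction lists are loaded. The sketch below assumes that the TMHMM results are in the short one-line-per-protein format with a PredHel=N field, that a protein counts as a TMHMM positive when PredHel is greater than 0, and that the identifiers in that file match the gene names your script prints; check all three against the files you were given. The function names and the choice of denominators for the two proportions are also assumptions, not definitions set by the assignment.

```python
import re

# Assumed layout: identifier first, then fields such as "PredHel=2".
PRED_HEL = re.compile(r"^(\S+).*?PredHel=(\d+)")


def tmhmm_positive_ids(lines):
    """Collect identifiers whose predicted helix count is above zero."""
    positives = set()
    for line in lines:
        match = PRED_HEL.search(line)
        if match and int(match.group(2)) > 0:
            positives.add(match.group(1))
    return positives


def compare(hydropathy_hits, tmhmm_hits, all_proteins):
    """Count agreements and errors, treating TMHMM as the standard."""
    universe = set(all_proteins)
    hydro = set(hydropathy_hits) & universe
    tmhmm = set(tmhmm_hits) & universe
    tmhmm_negative = universe - tmhmm

    both = hydro & tmhmm              # question a
    hydro_only = hydro - tmhmm        # question b: type-1 errors
    tmhmm_only = tmhmm - hydro        # question c: type-2 errors

    # One possible definition of the proportions: errors divided by the
    # number of proteins TMHMM calls negative (or positive, respectively).
    fp_rate = len(hydro_only) / len(tmhmm_negative) if tmhmm_negative else 0.0
    fn_rate = len(tmhmm_only) / len(tmhmm) if tmhmm else 0.0
    return {
        "both": len(both),
        "hydropathy_only": len(hydro_only),
        "tmhmm_only": len(tmhmm_only),
        "false_positive_rate": fp_rate,
        "false_negative_rate": fn_rate,
    }
```

Running compare once per parameter set, with all_proteins restricted to the ~1000 sequences you were assigned, gives nine pairs of proportions that can be tabulated as evidence for question d.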

Attachment: Assignment Files.rar
