Cancer genome identification tool - create structure charts


Programming Assignment: Cancer Genome Identification Tool

I. Learner Objectives:

At the conclusion of this programming assignment, participants should be able to:

Implement pointers and/or arrays

Apply parallel arrays

Compare and contrast pointers and arrays

Pass output parameters to functions

Apply repetition structures within algorithms

Compose C programs consisting of sequential, conditional, and iterative statements

Create structure charts for a given problem

Determine an appropriate functional decomposition or top-down design from a structure chart

II. Prerequisites:

Before starting this programming assignment, participants should be able to:

Analyze a basic set of requirements and apply top-down design principles for a problem

Apply repetition structures within an algorithm

Construct while (), for (), or do-while () loops in C

Compose C programs consisting of sequential, conditional, and iterative statements

Eliminate redundancy within a program by applying loops and functions

Create structure charts for a given problem

Open and close files

Read, write to, and update files

Manipulate file handles

Apply standard library functions: fopen (), fclose (), fscanf (), and fprintf ()

Compose decision statements ("if" conditional statements)

Create and utilize compound conditions

Summarize topics from Hanly & Koffman Chapter 6 including:

o What is a pointer?
o What is an output parameter?

III. Overview & Requirements:

One person dies from cancer every minute in the U.S. (https://cancergenome.nih.gov/). DNA is the chemical responsible for carrying instructions that control cells. When the instructions are not recognized by the cells because of mutations, cells do not function properly. Improper functioning of cells can lead to cancer.

If mutations can be identified, then cancer treatments can be applied. Software may be used to identify mutations in the genome. The genome is the collection of DNA instructions in your cells. Most cells contain two sets of chromosomes, one from your father and one from your mother. Each chromosome has billions of DNA strands that consist of nucleotide bases. The four bases are A, C, G, and T. In the double helix structure of DNA, for a normal cell, the A-T and C-G bases are paired.

For this assignment we simplify our model of the genome. Our goal is to identify mutations in a DNA sequence. We will place our normal DNA sequences and "test" sample sequences in a file called "sequences.txt". The section of the file that represents the normal sequences will be identified by an 'N' in the file, and the section that represents the "test" sample sequences will be identified by an 'S'.
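One possible way to read this file layout is sketched below. It assumes that the markers 'N' and 'S' each appear alone on a line, that every sequence occupies two consecutive lines (one strand per line), and that no strand is itself the single letter N or S. The function name read_pair and the constant MAX_BASES are hypothetical, chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BASES 15  /* assumed upper bound on bases per strand */

/*
 * Reads the next two-line sequence from in.
 * *section is an output parameter: it is updated to 'N' or 'S'
 * whenever a section marker is encountered before the sequence.
 * top and bottom must each hold MAX_BASES + 1 characters.
 * Returns 1 if a pair of strands was read, 0 at end of file
 * or if the second strand is missing.
 */
int read_pair(FILE *in, char *section, char top[], char bottom[])
{
    char token[MAX_BASES + 1];

    /* %15s skips blank lines and reads one whitespace-delimited token */
    while (fscanf(in, "%15s", token) == 1) {
        if (strcmp(token, "N") == 0 || strcmp(token, "S") == 0) {
            *section = token[0];   /* remember which section we are in */
            continue;
        }
        strcpy(top, token);
        return fscanf(in, "%15s", bottom) == 1;
    }
    return 0;
}
```

A caller would open "sequences.txt" with fopen (), call read_pair () in a loop until it returns 0, and close the file with fclose ().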

Mutations will be identified by mismatched base pairs, such as A-C, A-G, T-C, T-G, C-A, G-A, C-T, and G-T. They will also be identified by changes (flips) in any of the bases from the normal sequence to the sample sequence. In our file, a sequence is a series of base pairs written across two lines: each base on the first line pairs with the base at the same position on the second line. For example:

N
ATGGAATTCTCGCTC
TACCTTAAGAGCGAG

CGGTCA
GCCAGT

S
TTGGAATTCTAGCTC
AACCTTAAGAGCGCG

CGATGA
GCCACT

The file may contain an unknown number of sequences. However, you may assume that each line of a sequence will not exceed 15 bases, as shown above.
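The mutation rules above can be sketched as a pair of C functions. This is only one possible reading of the rules, not a required design: it assumes that a mismatched pair takes precedence over a flip (which agrees with the example output for sequence 1, pair 11), and that any correctly paired sample position whose first-line base differs from the normal sequence counts as a flip. The names PairStatus, complement, classify_pair, and find_mutations are hypothetical.

```c
#include <stdio.h>

/* Hypothetical result codes, for illustration only */
typedef enum { PAIR_OK, PAIR_MISMATCHED, PAIR_FLIPPED } PairStatus;

/* Returns the partner of a base (A-T, C-G), or '\0' for any other character */
char complement(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return '\0';
    }
}

/* Classifies one sample pair against the normal base at the same position */
PairStatus classify_pair(char normal_top, char sample_top, char sample_bottom)
{
    char partner = complement(sample_top);

    if (partner == '\0' || sample_bottom != partner)
        return PAIR_MISMATCHED;        /* e.g. A-G or T-C */
    if (sample_top != normal_top)
        return PAIR_FLIPPED;           /* e.g. A-T became T-A */
    return PAIR_OK;
}

/*
 * Scans one sequence and fills two parallel output arrays:
 * positions[i] holds the 1-based pair number of the i-th mutation,
 * kinds[i] holds its classification. Returns the number of mutations.
 */
int find_mutations(const char normal_top[], const char sample_top[],
                   const char sample_bottom[], int length,
                   int positions[], PairStatus kinds[])
{
    int i, count = 0;

    for (i = 0; i < length; i++) {
        PairStatus status =
            classify_pair(normal_top[i], sample_top[i], sample_bottom[i]);
        if (status != PAIR_OK) {
            positions[count] = i + 1;
            kinds[count] = status;
            count++;
        }
    }
    return count;
}
```

The arrays positions and kinds act as output parameters, so the caller receives both the mutation count (the return value) and the details of each mutation from a single call.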

Your program must identify each mutation by indicating in which sequence it is found and in which position in the sequence. The results must be written to a file called "mutations.txt". Using the example above, your program would write the following to the file:

Mutation(s) found in sequence 1
Pair 1 flipped pair
Pair 11 mismatched pair
Pair 14 mismatched pair

Mutation(s) found in sequence 2
Pair 3 mismatched pair
Pair 5 flipped pair
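A minimal sketch of producing this output format with fprintf () follows. It assumes the mutations for one sequence have already been collected into two parallel arrays, and that a sequence with no mutations writes nothing; the assignment does not say what to do in that case. The function name write_report and its parameters are hypothetical.

```c
#include <stdio.h>

/*
 * Writes the report block for one sequence to out.
 * positions[] and flipped[] are parallel arrays of length count:
 * positions[i] is the 1-based pair number, and flipped[i] is nonzero
 * for a flipped pair and zero for a mismatched pair.
 */
void write_report(FILE *out, int sequence_number,
                  const int positions[], const int flipped[], int count)
{
    int i;

    if (count == 0)
        return;  /* assumption: sequences without mutations are not listed */

    fprintf(out, "Mutation(s) found in sequence %d\n", sequence_number);
    for (i = 0; i < count; i++) {
        fprintf(out, "Pair %d %s\n", positions[i],
                flipped[i] ? "flipped pair" : "mismatched pair");
    }
    fprintf(out, "\n");  /* blank line between sequences */
}
```

In a full program, out would be the handle returned by fopen ("mutations.txt", "w"), checked against NULL before use and closed with fclose () once every sequence has been processed.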

BONUS

Generate randomly paired bases to form random DNA sequences that are written to your "sequences.txt" file.
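One way to approach the bonus is sketched below, using rand () from the standard library and two parallel lookup arrays. It assumes that "randomly paired" means a random but correctly paired (A-T, C-G) sequence; producing test samples would additionally require introducing mutations, which this sketch does not do. The function name random_sequence is hypothetical.

```c
#include <stdlib.h>

/*
 * Fills top and bottom with a random, correctly paired sequence of
 * the given length and null-terminates both strings.
 * Each array must hold at least length + 1 characters.
 */
void random_sequence(char top[], char bottom[], int length)
{
    /* Parallel arrays: partners[k] is the base that pairs with bases[k] */
    static const char bases[]    = "ACGT";
    static const char partners[] = "TGCA";
    int i;

    for (i = 0; i < length; i++) {
        int pick = rand() % 4;   /* index 0..3 selects one of the four bases */
        top[i] = bases[pick];
        bottom[i] = partners[pick];
    }
    top[length] = '\0';
    bottom[length] = '\0';
}
```

Calling srand () once at program start, for example with the current time, would give different sequences on each run; the two strings can then be written to "sequences.txt" with fprintf ().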
