Assignment: DNA sequencer

The biology department at UC Davis is looking for an application that can decode sequences of DNA, by locating genes and transcribing the sequence of corresponding proteins.

Genes are substrings of DNA which code for proteins and carry the heritable information from our parents. Genes start with the sequence of three letters ATG, called the start codon, and end with one of the three sequences TGA, TAA, or TAG, called stop codons. The stretch of sequence between the start codon and any of the stop codons is a potential gene.

Each codon codes for an amino acid represented by a letter of the alphabet. There is a total of 20 amino acids. Strung together, amino acids form proteins. A substring of a DNA sequence is a translatable sequence if:

• it has a length that is a multiple of three,
• it starts with a start codon and ends with a stop codon,
• it can be translated into an amino acid sequence.

For example, the DNA sequence AATTAAGATGGGGCTCTAAAAT contains such a translatable sequence, starting at the 8th position and of length 12 (ATGGGGCTCTAA), thus consisting of 4 codons. Using a codon table, this sequence can be translated into the three-letter amino acid sequence MGL.

Note that the start codon codes for amino acid M while the stop codons don't code for any amino acids.

On the other hand, DNA sequence AATGAATCTAGT is not a translatable sequence.
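To make the three conditions concrete, the search for the longest translatable sequence might be sketched as follows. This is only a minimal sketch, not the reference solution: the names codon_is, is_stop and longest_translatable are hypothetical, and the code assumes that a translatable sequence ends at the first stop codon found in the reading frame of its start codon (stop codons do not code for amino acids, so they cannot appear in the middle of a translation).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare the three characters at p (any case) with an upper-case codon. */
static int codon_is(const char *p, const char *codon)
{
    for (int i = 0; i < 3; i++) {
        if (toupper((unsigned char)p[i]) != codon[i])
            return 0;
    }
    return 1;
}

/* True if the three characters at p form one of the stop codons. */
static int is_stop(const char *p)
{
    return codon_is(p, "TGA") || codon_is(p, "TAA") || codon_is(p, "TAG");
}

/*
 * Hypothetical helper: find the longest translatable sub-sequence of dna.
 * Returns its length in characters (start and stop codons included) and
 * stores its 0-based offset in *start; returns 0 if there is none.
 */
static size_t longest_translatable(const char *dna, size_t *start)
{
    assert(dna != NULL && start != NULL);
    size_t n = strlen(dna);
    size_t best_len = 0;

    for (size_t i = 0; i + 3 <= n; i++) {
        if (!codon_is(dna + i, "ATG"))
            continue;
        /* Walk codon by codon in the reading frame opened by this ATG. */
        for (size_t j = i + 3; j + 3 <= n; j += 3) {
            if (is_stop(dna + j)) {
                size_t len = j + 3 - i;
                if (len > best_len) {
                    best_len = len;
                    *start = i;
                }
                break; /* assumption: the first in-frame stop ends the gene */
            }
        }
    }
    return best_len;
}
```

With this sketch, AATTAAGATGGGGCTCTAAAAT yields offset 7 (the 8th position) and length 12, while AATGAATCTAGT yields 0 because no stop codon lies in the frame of its only start codon.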

Write a program, dna_translate.c, that takes two command-line arguments: the name of an input file containing DNA sequences, and the name of an output file in which you will store the translated protein sequences. For each sequence, the program should identify the longest possible translatable sub-sequence, if one exists, and translate it into a protein using the codon table given in the file codeoflife.txt. See the example below.

    $ cat codeoflife.txt
    I ATT
    I ATC
    I ATA
    ...
    R CGT
    x TAA
    x TAG
    x TGA
    $ cat dna_seqs.txt
    aaATttaTggattagcaagcag
    ACGATGATGATGGGGCCCTAATAGTGATAAAAAACT
    AAAATAATTTGGA
    ATGAAATGGTAGATGAAACCCGGGATATGATAG
    $ ./dna_translate dna_seqs.txt prot_seqs.txt
    MD
    MMMGP
    none
    MKPGI
    $
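
Each line of codeoflife.txt in the example pairs one amino-acid letter with one three-letter codon. A line of that shape could be parsed with sscanf() roughly as below. The function name parse_codon_line and its return convention are hypothetical, and the sketch assumes that the letter and the codon are separated by whitespace, as in the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line of the codon table, e.g. "I ATT".
 * Stores the amino-acid letter in *amino and the codon in codon[4].
 * Returns 1 on success and 0 if the line does not have the expected shape.
 */
static int parse_codon_line(const char *line, char *amino, char codon[4])
{
    assert(line != NULL && amino != NULL && codon != NULL);
    /* " %c" skips leading whitespace; "%3s" stores at most 3 characters. */
    if (sscanf(line, " %c %3s", amino, codon) != 2)
        return 0;
    /* Reject codons shorter than three letters. */
    return strlen(codon) == 3;
}
```

In a full program each line would typically come from fgets(). Note that the example table maps the three stop codons to the letter x, so a program could treat x as a stop marker rather than as an amino acid.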

Here is a list of requirements, assumptions and hints:

• This program shall contain no global variables.

• All dynamically allocated memory should be properly freed by the end of the program.

• The translated sequences, in the output file, must be in the same order as the DNA sequences.

• If no translatable sequence is found, the word none should be written to the output file instead.

• We assume that the maximum number of characters a DNA sequence can contain is

• We assume that the DNA sequence file contains only proper sequences (i.e. strings over {A, C, G, T, a, c, g, t}).

• You are expected to use a linked-list to represent the codon table (as read from file codeoflife.txt).

• You are expected to use a linked-list to represent the list of DNA sequences (as read from the input file).

• You will probably need to split the problem into a few principal functions, such as:

  • A function that builds the linked-list of codons, as read from codeoflife.txt.

  • A function that builds the linked-list of DNA sequences, as read from the input file.

    • You will probably need to think about the order of insertion, so that the resulting protein sequences are output in the same order as the DNA sequences.

  • A function that iterates through all the DNA sequences and, for each one, finds the longest translatable sequence and writes the corresponding protein sequence to the output file (or none if no translatable sequence was found).

  • Two functions that iterate through the two linked-lists and free every dynamically allocated item, along with any dynamically allocated objects it might contain.

• List of some important libc functions that are used in the reference program: fopen(), fgets(), fprintf(), fclose(), sscanf(), strncpy(), strncmp(), etc.
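
The hints about insertion order and about freeing the lists can be illustrated with a small codon-table list. This is a sketch under assumptions, not the reference program: the types codon_node and codon_list and the three functions are hypothetical names, and keeping a tail pointer is just one way to preserve the order in which items were read.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for one entry of the codon table. */
struct codon_node {
    char codon[4];            /* three letters plus terminating '\0' */
    char amino;               /* amino-acid letter for this codon */
    struct codon_node *next;
};

/* Hypothetical list handle; the tail pointer keeps appends in file order. */
struct codon_list {
    struct codon_node *head;
    struct codon_node *tail;
};

/* Append a codon at the end of the list; returns 1 on success, 0 on failure. */
static int codon_list_append(struct codon_list *list, const char *codon, char amino)
{
    assert(list != NULL && codon != NULL);
    struct codon_node *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    strncpy(node->codon, codon, 3);
    node->codon[3] = '\0';
    node->amino = amino;
    node->next = NULL;
    if (list->tail == NULL)
        list->head = node;        /* first element of the list */
    else
        list->tail->next = node;  /* link after the current last element */
    list->tail = node;
    return 1;
}

/* Return the letter stored for the 3 characters at codon, or '\0' if absent. */
static char codon_list_lookup(const struct codon_list *list, const char *codon)
{
    for (const struct codon_node *n = list->head; n != NULL; n = n->next) {
        if (strncmp(n->codon, codon, 3) == 0)
            return n->amino;
    }
    return '\0';
}

/* Free every node and leave the list empty. */
static void codon_list_free(struct codon_list *list)
{
    struct codon_node *n = list->head;
    while (n != NULL) {
        struct codon_node *next = n->next; /* save before freeing */
        free(n);
        n = next;
    }
    list->head = list->tail = NULL;
}
```

The lookup is case-sensitive, so under this design the DNA sequence would need to be converted to upper case before translation. A list of DNA sequences could follow the same pattern, with each node owning a dynamically allocated string that is freed together with the node.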
