Write a two-pass assembler for a subset of the MIPS instruction set


Project

This assignment will reinforce your knowledge of the assembly process. You will need to go through all of the steps of converting an assembly source file to object code.

Your goal is to write a two-pass assembler for a subset of the MIPS instruction set. It should read an assembly file whose name is given on the command line and write the object code to standard output. You can make the following assumptions:

- The code segment will precede the data segment
- The source file will contain no more than 32768 distinct instructions
- The source file will define no more than 32768B of data
- The source file will not contain comments
- There will be no whitespace between arguments in each instruction
- Each line may have a symbolic label, terminated with a colon
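The two passes split the work naturally. Pass 1 only needs to find where every label lives. Pass 2 can then encode each instruction, because every label is already known. Below is a minimal C++ sketch of pass 1. It assumes each source line has already been trimmed of leading and trailing whitespace. The `Symbol` struct and `firstPass` function are hypothetical names, not part of the assignment. Text labels are recorded as instruction indices, and data labels as byte offsets from the start of the data segment (that is, from $gp).

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical pass-1 record for one label.
struct Symbol {
    bool inText;      // true for code labels, false for data labels
    unsigned offset;  // instruction index (text) or byte offset from $gp (data)
};

// Pass 1 (sketch): scan the trimmed source lines once and record where each
// label lives. Assumes ".word" stores 4 bytes per value and ".space n"
// reserves n bytes, as Table 1 describes.
std::map<std::string, Symbol> firstPass(const std::vector<std::string>& lines) {
    std::map<std::string, Symbol> symbols;
    bool inText = true;
    unsigned textIndex = 0, dataOffset = 0;
    for (std::string line : lines) {
        std::string::size_type colon = line.find(':');
        if (colon != std::string::npos) {                  // "label:" or "label: rest"
            Symbol s = { inText, inText ? textIndex : dataOffset };
            symbols[line.substr(0, colon)] = s;
            line = line.substr(colon + 1);
            line.erase(0, line.find_first_not_of(" \t"));  // drop space after ':'
        }
        if (line.empty()) continue;                         // label on its own line
        if (line == ".text") { inText = true; continue; }
        if (line == ".data") { inText = false; continue; }
        if (inText) { ++textIndex; continue; }              // one word per instruction
        if (line.compare(0, 5, ".word") == 0)               // n values = commas + 1
            dataOffset += 4u * static_cast<unsigned>(1 + std::count(line.begin(), line.end(), ','));
        else if (line.compare(0, 6, ".space") == 0)
            dataOffset += static_cast<unsigned>(std::stoul(line.substr(6)));
    }
    return symbols;
}
```

Pass 2 then walks the same lines again and skips labels and directives. It uses the returned map to fill in branch offsets, jump targets, and $gp offsets.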

Table 1 provides a list of the assembly directives that your assembler must recognize. Table 2 provides a list of the instructions that your assembler must recognize. Be sure that you note the arguments for each instruction. It may be helpful to refer to Appendix A.10 when writing your parser.

Table 1. List of Assembly Directives

    Directive            Explanation
    .text                Place items following this directive in the user text segment
    .data                Place items following this directive in the data segment
    .word w1,w2,...,wn   Store n 32b integer values in successive words in memory
    .space n             Allocate n bytes of space in memory, initialized to zero
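Handling `.word` comes down to splitting its argument list on commas and converting each item to a 32-bit value. The sketch below assumes decimal integers and relies on the earlier assumption that arguments contain no whitespace. `parseWordList` is a hypothetical helper name. The sketch does not cover `.space`. In Figure 1, `.space 10` corresponds to ten zero words in the object code, so check how the sample binary sizes `.space` before you choose a layout.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn the argument list of ".word w1,w2,...,wn"
// (e.g. "1,9,12") into n 32-bit words. Assumes decimal integers; negative
// values wrap to their two's complement bit pattern.
std::vector<std::uint32_t> parseWordList(const std::string& args) {
    std::vector<std::uint32_t> words;
    std::istringstream in(args);
    std::string item;
    while (std::getline(in, item, ','))  // split on commas
        words.push_back(static_cast<std::uint32_t>(std::stol(item)));
    return words;
}
```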

Table 2. List of MIPS Instructions

    Mnemonic  Format  Args            Description
    addiu     I                       Add immediate with no overflow
    addu      R       3 (rd, rs, rt)  Add with no overflow
    and       R       3 (rd, rs, rt)  Bitwise logical AND
    beq       I                       Branch when equal
    bne       I                       Branch when not equal
    div       R       2 (rs, rt)      Signed integer divide
    j         J                       Jump
    lw        I                       Load 32b word
    mfhi      R       1 (rd)          Move from hi register
    mflo      R       1 (rd)          Move from low register
    mult      R       2 (rs, rt)      Signed integer multiply
    or        R       3 (rd, rs, rt)  Bitwise logical OR
    slt       R       3 (rd, rs, rt)  Set when less than
    subu      R       3 (rd, rs, rt)  Subtract with no overflow
    sw        I                       Store 32b word
    syscall   R       0               System call
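Every instruction in Table 2 packs into one 32-bit word. The sketch below shows the R- and I-format field layouts, along with opcode and funct values for the listed mnemonics. These values come from the standard MIPS encoding; check them against your textbook's reference card. `regNum`, `encodeR`, `encodeI`, `kFunct`, and `kOpcode` are hypothetical names. Only conventional register names such as `$t0` are recognized.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Register number for a conventional name such as "$t0" or "$zero".
// Numeric forms like "$8" are not handled in this sketch.
int regNum(const std::string& name) {
    static const char* names[32] = {
        "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
        "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
        "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
        "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra"};
    for (int i = 0; i < 32; ++i)
        if (name == std::string("$") + names[i]) return i;
    return -1;  // unknown register
}

// R-format: op(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5)=0 | funct(6)
std::uint32_t encodeR(int rs, int rt, int rd, unsigned funct) {
    return (std::uint32_t(rs) << 21) | (std::uint32_t(rt) << 16) |
           (std::uint32_t(rd) << 11) | funct;
}

// I-format: op(6) | rs(5) | rt(5) | imm(16). The immediate is truncated to
// 16 bits, so negative values end up in two's complement.
std::uint32_t encodeI(unsigned op, int rs, int rt, int imm) {
    return (std::uint32_t(op) << 26) | (std::uint32_t(rs) << 21) |
           (std::uint32_t(rt) << 16) | (std::uint32_t(imm) & 0xffffu);
}

// funct field for the R-format instructions in Table 2
const std::map<std::string, unsigned> kFunct = {
    {"addu", 0x21}, {"and", 0x24},  {"or", 0x25},   {"slt", 0x2a},
    {"subu", 0x23}, {"mult", 0x18}, {"div", 0x1a},
    {"mfhi", 0x10}, {"mflo", 0x12}, {"syscall", 0x0c}};

// opcode field for the I- and J-format instructions in Table 2
const std::map<std::string, unsigned> kOpcode = {
    {"addiu", 0x09}, {"beq", 0x04}, {"bne", 0x05},
    {"lw", 0x23},    {"sw", 0x2b},  {"j", 0x02}};
```

For the shorter argument lists, pass 0 for the unused fields. For example, `mult $s0,$s1` becomes `encodeR(16, 17, 0, 0x18)`, and `mfhi $t0` becomes `encodeR(0, 0, 8, 0x10)`. Note that `addiu` is written `addiu rt,rs,imm`, so its first source operand goes in the rt field.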

In addition to the instructions above, your assembler must be able to resolve symbolic labels. Labels may mark the targets of control-flow changes (branch or jump instructions) or name memory elements, and they are handled differently depending on their usage. Branch targets should be encoded relative to the current instruction, as the number of instructions from the instruction after the branch to the target (remember that the PC already points to the next instruction). For example, consider the code below:

00400400 <L4>:
  400400: 1100000c  beqz t0,400434
  400404: 00000000  nop
  400408: 01084021  addu t0,t0,t0
  40040c: 1100fffc  beqz t0,400400
  400410: 00000000  nop
  400414: 01084021  addu t0,t0,t0
  400418: 1100fff9  beqz t0,400400
  40041c: 00000000  nop
  400420: 01084021  addu t0,t0,t0
  400424: 1100fff6  beqz t0,400400
  400428: 00000000  nop
  40042c: 11000001  beqz t0,400434
  400430: 00000000  nop

00400434 <L5>:
  400434: 00000000  nop

You can see that the forward branches to L5 have distances of 12 and 1. If you count the instructions from each of those two branches to L5, the actual counts are 13 and 2. The encoded values are one smaller because the PC will already have advanced to the next instruction. The same is true for the backward branches to L4. Branch offsets are stored in two's complement, so the first backward branch, 0x1100fffc, has an offset field of 0xfffc. Interpreted as a signed 16-bit value, this is -4, which is the distance in instructions from the PC (0x400410) back to the label (0x400400).
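The same arithmetic can be written as a small function. This sketch, with the hypothetical names `branchOffset` and `branchField`, takes byte addresses, subtracts PC + 4, and divides by 4. Masking the result to 16 bits produces the two's complement immediate shown in the listing.

```cpp
#include <cstdint>

// Branch offset in instructions, measured from PC + 4 (the instruction
// after the branch). Both arguments are word-aligned byte addresses.
std::int32_t branchOffset(std::uint32_t branchAddr, std::uint32_t targetAddr) {
    std::int64_t diff = std::int64_t(targetAddr) - (std::int64_t(branchAddr) + 4);
    return static_cast<std::int32_t>(diff / 4);  // exact: diff is a multiple of 4
}

// The 16-bit immediate as it appears in the low half of the instruction word.
std::uint16_t branchField(std::uint32_t branchAddr, std::uint32_t targetAddr) {
    return static_cast<std::uint16_t>(branchOffset(branchAddr, targetAddr) & 0xffff);
}
```

If pass 1 records text labels as instruction indices instead of byte addresses, the formula reduces to `target - (index + 1)`. In Figure 1, `beq $t0,$zero,L2` is instruction 7 and L2 is instruction 13, which gives 13 - 8 = 5, the low half of 0x11000005.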

Targets for jump instructions should use the absolute location of the target. For example, assume that label L1 is located in memory at 0x400370. The instruction j L1 will resolve to j 400370.
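The 26-bit field of a J-format instruction cannot hold a full 32-bit address. Instructions are word-aligned, so the field stores the target byte address shifted right by 2, and the hardware supplies the upper 4 bits from PC + 4. Below is a minimal sketch, with `encodeJ` as a hypothetical name. In Figure 1, `j L1` assembles to 0x08000005, and L1 is instruction index 5 (the sixth instruction). That appears consistent with the sample output numbering the text segment from address 0, but verify this against the sample binary.

```cpp
#include <cstdint>

// J-format for j: op(6) = 2 | target(26), where target = byte address >> 2.
std::uint32_t encodeJ(std::uint32_t targetByteAddr) {
    return (0x02u << 26) | ((targetByteAddr >> 2) & 0x03ffffffu);
}
```

For the example above, a target at 0x400370 gives a 26-bit field of 0x1000dc, so the instruction word is 0x081000dc.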

Data labels should be referenced by their offset from the global pointer, $gp, which is assumed to point to the start of the data segment.
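Because $gp points to the start of the data segment, a data label's immediate is simply its byte offset within that segment. This sketch assumes pass 1 has produced a map from data labels to byte offsets, and that the base register is always `$gp` (register 28). `encodeGpAccess` is a hypothetical name.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Encode "lw rt,label($gp)" or "sw rt,label($gp)". The label's byte offset
// from the start of the data segment comes from pass 1; rt is already resolved.
std::uint32_t encodeGpAccess(bool isStore, int rt, const std::string& label,
                             const std::map<std::string, std::uint32_t>& dataOffsets) {
    const std::uint32_t op = isStore ? 0x2b : 0x23;  // sw : lw
    const std::uint32_t gp = 28;                      // $gp
    std::uint32_t offset = dataOffsets.at(label);     // throws if label is undefined
    return (op << 26) | (gp << 21) | (std::uint32_t(rt) << 16) | (offset & 0xffffu);
}
```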

You should use the linprog servers for all of your compilation and testing. Your output should match mine exactly. You can determine if the results are identical by calculating the md5sum or by using diff. You must use C/C++ as your language and your solution should be a single file (e.g. ch03c.pr01.c or ch03c.pr01.cpp). You should submit this file through Blackboard. Your program should have comments inline and a header at the top. For example:

/**
* @file main.cpp
* @author hughes <>, (C) 2014, 2015, 2016
* @date 05/11/16
* @brief Simple MIPS assembler
*
* @section DESCRIPTION
* This program implements an assembler for a subset
* of the MIPS assembly language. Can compile with debug
* by including -DDEBUG in the compiler options.
************************************************************/

Please test your output against the results from the sample binary before submission. The test script uses md5 and diff to compare your output with the baseline. Your submissions will also be processed for plagiarism. The script will use the following for compilation: g++ -Werror -mtune=generic -O0 -std=c++11

If you write it in C instead of C++, the script will use gcc -Werror -mtune=generic -O0 -std=c11

You can access my binary using the following command:
~chughes/cda3101/assembler

There is an example assembly program below in Figure 1 along with the machine code. You can access the assembly source at ~chughes/cda3101/test01.s and the object code at ~chughes/cda3101/test01.obj. You should note that the machine code is in hexadecimal.

    Source (test01.s)         Object code (test01.obj)
    .text
    addu $s0,$zero,$zero      00008021
    addu $s1,$zero,$zero      00008821
    addiu $v0,$zero,5         24020005
    syscall                   0000000c
    sw $v0,n($gp)             af820000
    L1:
    lw $s2,n($gp)             8f920000
    slt $t0,$s1,$s2           0232402a
    beq $t0,$zero,L2          11000005
    addiu $v0,$zero,5         24020005
    syscall                   0000000c
    addu $s0,$s0,$v0          02028021
    addiu $s1,$s1,1           26310001
    j L1                      08000005
    L2:
    addu $a0,$s0,$zero        02002021
    addiu $v0,$zero,1         24020001
    syscall                   0000000c
    addiu $v0,$zero,10        2402000a
    syscall                   0000000c
    .data
    n: .word 0                00000000
    m: .word 1,9,12           00000001
                              00000009
                              0000000c
    q: .space 10              00000000
                              00000000
                              00000000
                              00000000
                              00000000
                              00000000
                              00000000
                              00000000
                              00000000
                              00000000

Figure 1 - Sample source code (left) and object code (right)
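Because the output must match the reference exactly, the formatting matters as much as the encoding. Figure 1 shows one word per line, written as eight lowercase hexadecimal digits with leading zeros. Below is a sketch using iostream manipulators, with `formatObject` as a hypothetical name. The result can be written with `std::cout << formatObject(words);`.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// One word per line, 8 lowercase hex digits, zero-padded (as in Figure 1).
std::string formatObject(const std::vector<std::uint32_t>& words) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');  // both settings persist on the stream
    for (std::uint32_t w : words)
        out << std::setw(8) << w << '\n';  // setw applies to one insertion only
    return out.str();
}
```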

A second test file, test02.s, is included in the same directory. These are samples and are not the inputs that will be used for grading. Feel free to write your own inputs and share them via the discussion boards. If you find an error in my assembler, please let me know (extra credit)!

While you are free to use any string parsing method you choose, you may find the std::getline function helpful. getline extracts characters from an input stream and stores them in a string until it reaches the delimiter character (a newline by default) or the end of the stream. The delimiter is extracted but not stored.

istream& getline (istream& is, string& str);

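For example, the code below discards leading whitespace at the current stream position, reads a line from the input, and appends the line to a list of strings. The loop tests the stream state instead of calling eof() after the read. Calling eof() would append the last line a second time when the file ends with a newline.

while (std::getline(asmFile >> std::ws, lineIn)) // skip whitespace, then read one line
{
    sourceCode.push_back(lineIn); // add to the list of instructions from source
}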

You may also find the Boost tokenizer class useful. The tokenizer parses the input sequence and breaks it into tokens wherever one of the delimiter characters occurs. The code below takes an input string, input, and separates it based on the characters defined in delimiters. The for-loop then iterates through those tokens. For example, "lw $s2,n($gp)" yields the tokens lw, $s2, n, and $gp.

#include <boost/tokenizer.hpp>

boost::char_separator<char> delimiters(", ()");
boost::tokenizer< boost::char_separator<char> > tokens(input, delimiters);

for(boost::tokenizer< boost::char_separator<char> >::iterator it = tokens.begin(); it != tokens.end(); ++it)
{
    //stuff: *it is the current token as a std::string
}

These are just some of the tools that I used in my solution; you are not required to use them! C/C++ has plenty of functions that you may find useful such as fgets and sscanf. Be creative!
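As an example of the sscanf route, the sketch below splits a load or store such as `lw $s2,n($gp)` into its four pieces using scanset conversions. The `MemOperands` struct, `splitMemOp`, and the buffer sizes are hypothetical choices. Lines that do not have the `op rt,label(base)` shape make the function return false.

```cpp
#include <cstdio>
#include <string>

// Hypothetical container for the pieces of "op rt,label(base)".
struct MemOperands { std::string mnem, rt, label, base; };

// Returns true only if all four fields were matched by sscanf.
bool splitMemOp(const std::string& line, MemOperands& out) {
    char mnem[16], rt[16], label[32], base[16];
    // %15s: mnemonic; %15[^,]: up to the comma; %31[^(]: up to '('; %15[^)]: up to ')'
    if (std::sscanf(line.c_str(), "%15s %15[^,],%31[^(](%15[^)])",
                    mnem, rt, label, base) != 4)
        return false;
    out.mnem = mnem; out.rt = rt; out.label = label; out.base = base;
    return true;
}
```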

