Constructing a float type


In this assignment you will construct a Float8 type, containing one field data, an 8-bit UInt8:

In [6]:

import Base: show, bits, exponent, significand,

+, -, *, /, Float64

type Float8

   data::UInt8

end

For this types, we will interpret the 8-bits in data as: 1 sign bit, 3 exponent bits and 4 signficiand bits. That is, numbers will be represented by

x = ± 2q-s * (1.b1b2b3b4)2

Where S = 3, q is an insigned 3-bit integer in the 2nd through 4th bits and b1b2b3b4 are in the last 4 bits. We will not implement Inf.NaN or subnormal numbers. We will however implement both ±0.

We will use a globally defined constant s to represent S:

In

const S=UInt8(3)  # The shift

Out[8]:

0x03

We will use the bits to get access to the bits. The following function defines bits for a Float8:

In [11]:

function bits(x::Float8)

    bits(x.data)

end

Out[11]:

bits (generic function with 6 methods)

Exercise 1:

(a) Complete the following exponent function, that returns

q−S

q−S

(as an integer).

(b) Check that

exponent(Float8(UInt8(123)))

returns 4.

In [18]:

function exponent(x::Float8)

    ## TODO: interpret the bits and return an

integer

end

Out[18]:

exponent (generic function with 6 methods)

Exercise 2:

(a) Complete the following significand function, that returns the significand of a Float8. Don't forget to incorporate the sign bit.

(b) Add comments explaining the purpose of the first 4 lines.

(c) Check that

significand(Float8(UInt8(123)))

returns 1.6875.

In [28]:

function significand(x::Float8)

    if x.data==0

        0.0

    elseif x.data==128

        -0.0

    else

        bts=bits(x)

        #TODO: convert to 8-bits in bts to a

Floating point

    end

end

Out[28]:

significand (generic function with 6 methods)

Excercise 3:

(a) Use exponent and significand to complete the definition of Float64(x::Float8), which converts a Float8 to a Float64

(b) Check that

Float8(UInt8(123))

now displays as 27.0f8.

In [35]:

function Float64(x::Float8)

    #TODO: return a Float8 that equals the input

end

function show(io::IO,x::Float8)

    print(io,Float64(x))

    print(io,"f8")

end

Out[35]:

show (generic function with 106 methods)

Exercise 4

(a) Complete the following chop_to_8_bits function that returns a string for normal numbers containing the 8-bits for the Float8 representation. For this question, you can simply chop the significand bits of a Float64. (Recall that a Float64 has 1 sign bit, 11 exponent bits and 52 significand bits.)

(b) Add comments explaining the definition of

Float8(::Float64).

(c) Check that

Float8(1.25)

returns 1.25f8.

(d) Explain why

Float8(1.3)

returns the same number.

In [47]:

function chop_to_8_bits(x::Float64)

    #TODO: returns a string containing 8-bits for

the Float8 approximation to x

end

function Float8(x::Float64)

    if x===0.0

        Float8(UInt8(0))

    elseif x===-0.0

        Float8(UInt8(128))

    else

        Float8(parse(UInt8,chop_to_8_bits(x),2))

    end

end

Out[47]:

Float8

Exercise 5

Complete the following function that negates a Float8:

In [51]:

function -(x::Float8)

    #TODO: return the bits corresponding to -x

end

Exercise 6

(a) Complete the following algebra operations, ensuring that each

one returns a Float8. You can use Float64(x::Float8) and

Float8(x::Float64) to use the inbuilt Float64 arithmetic.

(b) Check that

Float8(1.25)+Float8(2.25)

returns 3.5f8

In [51]:

function +(x::Float8,y::Float8)

    #TODO: return x+y as Float8

end

function *(x::Float8,y::Float8)

    #TODO: return x*y as Float8

end

function /(x::Float8,y::Float8)

    #TODO: return x/y as Float8

end

function -(x::Float8,y::Float8)

    #TODO: return x-y as Float8

end

Out[51]:

- (generic function with 205 methods)

Exercise 7

(a) Implement the following routine round_to_8bits that rounds to the nearest Float8, rather than chops.

(b) Check that

Float8(parse(UInt8,round_to_8_bits(1.3),2))

returns 1.3125f8.

In [61]:

function round_to_8_bits(x::Float64)

    # TODO: Round a Float64 to a Float8

end

Out[61]:

round_to_8_bits (generic function with 1 method)

Request for Solution File

Ask an Expert for Answer!!
Programming Languages: Constructing a float type
Reference No:- TGS01238402

Expected delivery within 24 Hours