Writing program to create-search of heap-sorted file using c


1 Introduction

In this assignment, you will carry out a number of exercises to investigate the creation and searching of heap and sorted files.

General Requirements

This section contains information about the general requirements that your assignment must meet. Please read all requirements carefully before you start.

1. You must implement your programs in C. Your programs must be well written, using good coding style and including appropriate use of comments.

2. Your programs may be developed on any machine, but must compile and run on yallara.

3. Any code you submit must be able to be built using a single Makefile with the command:

> make all

Tasks

a) Writing a heap file

Write a program to create a heap file that holds the records currently in the file /scratch/ DatabaseSystems/DATA/data 2012 on yallara. The source records are variable-length. However, the heap file should hold fixed-length records. Create the new records according to the schema given in Table 1. All attributes with Unsigned Int type must be stored in binary, e.g. if the value of ID is equal to 70, it must be stored as 70 (in decimal) or 46 (in hexadecimal; in C: 0x46). It must not be stored as the string “70”, occupying two bytes. Your heap file is therefore a binary file. For simplicity, the heap file does not need a header (containing things like the number of records in the file or a free space list). The file should be packed, i.e. there is no gap between 2 records.

Table : Relation schema.

Attribute name          Data type           Size (bytes)
NAME                        Char                      13
RACE                         Unsigned Int             2
CLASS                       Unsigned Int             2
ID                             Unsigned Int             4
GUILD                          Char                     30
Total size:                                               51

Note that you will need to ensure that the size of each record matches the size shown in Table . To ensure that records are correctly packed in memory, you may need to use the following directive in your code before you define your structure:

#pragma pack(1)

The executable name of this program must be wHeap and should be executed using the command:
> ./wHeap data_2012 heap pagesize

where data 2012 is the input file, heap is an output file to which your converted data is written, and pagesize is an integer specifying how many records fit into a “page” of your file.

Your program should write out one “page” of the file at a time (for example, with a pagesize of 100, you would write out 100 records to disk at a time). Your wHeap program must not output anything to stdout.

b) Search on a heap file

Look at the file /scratch/DatabaseSystems/DATA/search 2012. This is a text file containing search key values; each entry is a particular ID (in the schema given above). You are to simulate searching over a heap file, with different assumptions for the size of file pages.

Write a program to perform equality search operations on the heap file produced by your wHeap program in Section a. The executable name of this program must be sHeap and it must be able to be executed using the command:

> ./sHeap search_2012 heap pagesize

where search 2012 is the name of the file containing the keys to be searched for; heap is the output file of our wHeap program; and pagesize is an integer value that specifies the size of the disk page that you are simulating.

Your program should read in the file, one “page” at a time. For example, if the pagesize parameter is 100, your program should read in the first 100 records from disk. These can then be scanned, in-memory, for a match. If a match is found, print the matching record to stdout.
You should assume that ID is a primary key. If no match is found, read in the next pagesize records of the file. The process should continue until either a matching record is found, or there are no more records in the file to process.

If a match is found, the program must print the matching record to stdout. If no match is found, a suitable message should be printed. In addition, the program must always output the total time taken to do all the search operations in milliseconds to stdout. For example, if the time taken to do the reading is 123.45 ms, the output would be:

Time: 123.45 ms

c) Writing a sorted file

Write a program to create a sorted file that stores the records currently in the file /scratch/ DatabaseSystems/DATA/data 2012 on yallara. You may modify your code from Section

Records should use the same fixed-length schema given previously, and should again be written in binary. When inserting the records into your new file, they should be sorted on an appropriate attribute. You will need to choose a sensible sorting algorithm, appropriate to the data that you are dealing with. You should implement this sorting algorithm yourself.

The executable name of this program must be wSort and it must be able to be executed using the command:

> ./wSort data_2012 sorted pagesize

where data 2012 is the input file; sorted is an output file to which your converted data is written; and pagesize is an integer specifying the size of a page of the file (that is, the number of records that can be stored per page).

Like wHeap, your wSort program must not output anything to stdout.

d) Search on a sorted file

Write a program to simulate searching over a sorted file, with different assumptions for the size of file pages. Write a program to perform equality search operations on the sorted file produced by your wSort program in Section c. The executable name of this program must be sSort and it must be able to be executed using the command:

> ./sSort search_2012 sorted pagesize

where search 2012 is the name of the file containing the ID keys to be searched for; sorted is the output file of our wSort program; and pagesize is an integer value that specifies the size of the disk page that you are simulating.

Your program must take advantage of the assumption that sorted is a file whose structure has been created by sorting on the ID key. Your program should read in required parts of the file, one “page” at a time. For example, if the pagesize parameter is 100, your program should fetch 100 records in a single read from disk. These can then be scanned, in-memory, for a match.

If a match is found, the program must print the matching record to stdout. If no match is found, a suitable message should be printed. In addition, the program must output the total time taken to do all the search operations in milliseconds to stdout. For example, if the time taken to do the reading is 123.45 ms, the output would be:

Time: 123.45 ms

Experiments and Analysis

In this section, you will be asked to carry out a number of experiments and to analyse your results. Create a file called report.txt. Use this file to record your answers to the following questions:

a) Searching with a heap file

• Put a heading “Equality search (heap file)” in your report.

• Run your sHeap program with pagesize settings of: 100; 1,000; 10,000. For each of these pagesize settings, run your program 10 times. Create a table in your report, and record the timing results, including the date and time of each run.

• Calculate the average and standard deviation of the running times for each pagesize, and record them in your report.

b)Searching with a sorted file

• Put a heading “Equality search (sorted file)” in your report.

• Run your sSort program with pagesize settings of: 100; 1,000; 10,000. For each of these pagesize settings, run your program 10 times. Record the timing results in a table in your report, including the date and time of each run.

• Calculate the average and standard deviation of the running times, for each pagesize, and record them in your report.

c) Comparison of approaches

• Put a heading “Comparison of approaches” in your report.

• With reference to your experimental results above, discuss the advantages and disadvantages of the heap and sorted file organisations. Do the trends change for different page sizes? Are the results what you would have expected to see based on your theoretical understanding of these file organisations? Why or why not? Limit your discussion to half a page.

d) Theory

• Answer the following questions.

1. Suppose that instead of equality searches, you were carrying out range searches. Would you expect your results to change? Which of the file organisations would you prefer? As part of your discussion, you should demonstrate your understanding of the properties of the file organisations. Limit your discussion to half a page.

2. Now, suppose that instead of equality searches, you were inserting new records into the file. Which of the file organisations would you prefer? As part of your discussion, you should demonstrate your understanding of the properties of the file organisations.

Request for Solution File

Ask an Expert for Answer!!
Programming Languages: Writing program to create-search of heap-sorted file using c
Reference No:- TGS01943

Expected delivery within 24 Hours