Query Evaluation

MiniSQL Parser

Relax; you don't have to write the parser! On the contrary, a complete parser and type checker for the MiniSQL language is provided with the skeleton code. This small subset of SQL includes the basic commands CREATE, DROP, INSERT, SELECT, UPDATE, DELETE, DESCRIBE, and EXPLAIN. Supported data types include INTEGER, FLOAT, and STRING (notice the slight deviations from the SQL standard). The following (incomplete) list illustrates what is not included in the language:

? Support for NULL values

? Complex expressions / parentheses

o for example WHERE a = b + 1

? Aliasing (i.e. FROM Employees E)

o this means column names should be unique

? DISTINCT and ORDER BY

? Aggregates, GROUP BY, HAVING, etc.

Once a MiniSQL statement is parsed, the abstract syntax tree (AST) is passed to the Optimizer, which in turn dispatches the query to the corresponding class implementing the Plan interface. In this project, you will implement several of these plan classes, using the provided parser and system Catalog.

Part 1: CREATE and DROP INDEX

Your first task is to implement the CreateIndex and DropIndex commands, allowing you to build and destroy hash indexes on tables. Your implementation of these and the remaining Plan classes must meet the following general requirements: 1. You must validate all query input. For example, you should not create an index if the file name already exists. Please review (and call) the appropriate methods in the provided class QueryCheck to fulfill this requirement. 2. Execute the query using the components we have developed throughout the semester. 3. If the query affects the system catalogs (i.e. create/drop statements), then call the

appropriate method(s) in Catalog to maintain them. 4. Each query should print a one-line message at the end, such as "Table Created" or "1 row inserted" -- or anything else you would rather say.

Some useful hints and tips:

? You may want to use CreateTable and DropTable as a reference. (i.e. these are provided to demonstrate how to use the parser)

? Don't forget that CREATE INDEX should actually build the hash index! (i.e. don't just rename the word "table" in CreateTable to "index")

Part 2: INSERT and SELECT

Your next task is to implement the Insert and Select classes, allowing you to create and query actual data. Remember to fulfill the general requirements listed in part one. For Select, you will implement a basic query optimizer. The parser will give you an array of table names to select from, an array of (unique) column names to project, and an array of selection predicates (in conjunctive normal form). The basic plan is to use FileScans and SimpleJoins for all the tables, add a Selection for each conjunct, and have one Projection at the root of the Iterator tree. For example: Given the tables T1(a, b) and T2(c, d) and the following query: EXPLAIN SELECT d, a FROM T1, T2 WHERE a = c and b = d or a = 5; The default (naive) execution plan is as follows: Projection : {3}, {0} Selection : b = d OR a = 5 Selection : a = c OR a = 5 SimpleJoin : (cross) FileScan : T2 FileScan : T1

(Note how the conditions of the WHERE clause change into the predicates)

In order to get the full credit, you need to implement two optimization techniques from the book:

? Pushing selections: if the predicates of a selection involve only the attributes of one table, you should execute the selection before the join

? Using hash indexes: this is a simple version of using hash indexes from section 12.5.2 of the book. If a predicate involve some indexed fields of a table, you should use KeyScan whenever possible.

Some more useful hints and tips:

? Think carefully about any side effects of INSERT statements. (i.e. what if the table you're inserting into has indexes?)

? The main goal of Select's constructor is to create an Iterator query tree. (i.e. all you need to do in execute() is call iter.explain() or iter.execute())

? For testing purpose, your execution plan will be printed before the returned results.

Note: The Selection class has been changed from the previous project. The default logical operator for the set of predicates is now OR

Part 3: Personal Extension (extra credit)

Your final task, should you choose to accept it, is to extend the system you have created by implementing one or more of the following features: ? Incorporate your previous Project 2 code The libraries of the previous projects are given to you. However, you will get extra credit if you use your own code. For example, you can use your Projection of the previous project to avoid duplicates; or use your own BufferManager and HeapFile from the first project. You may need to modify your own codes to match the new skeleton. ? Improved Optimization Although you could spend several years on this sort of work, one reasonable enhancement is to maintain catalog statistics (i.e. record counts) and use this information to determine what order to join the tables. ? Sort Merge Join Another good enhancement to the naive query optimization in part two is to use Sort Merge Joins (from Project 3) whenever possible. You will need to demonstrate when your optimizer will use it and when the optimizer will not use it using test cases of your own. ? Additional Features You may also implement any number of the remaining MiniSQL commands, including: Describe, Update, and/or Delete.

Please note in your project report what you chose to implement for this part of the project (if any), and a basic description of any special features you would like the grader to know about. This is your chance to shine! The more you impress the grader, the more extra credit you might receive.

Getting Started

Skeleton code and documentation is available here. Note that this code is a complete starting point for the project, i.e. the bufmgr, heap, index, and relop packages are provided for you (in jar files). Note that the framework classes may be sightly different than the ones used in previous projects.

Running and testing this project will be quite different from the others. Instead of using an automated test driver, you will run the provided command-line utility, Msql, which resembles the behavior of SQL*Plus. Several test queries are

provided with the skeleton code, but more queries will be tested at the time of grading. Use the STATS command to view performance counters. The Msql program can receive input from the command line, or from a file. In the latter case, you need to provide the file name as an input parameter. For example:

java -classpath bin global.Msql mytest.sql

If you use an IDE, feel free to set up your run configuration to accept that input parameter.

Turnin

You should turn in copies of your code together with the Makefile, a readme file and a project report if you implement the extension part. All files need to be zipped in a file named: your_career_login1_your_career_login2_qe.zip.

In the readme file, put the name of your group members, and the feature you would like me to know. I should be able to compile/run your program using make.

The directory structure of your zip file should be identical to the directory structure of the provided zip file (i.e., having the directory src, the Makefile, ...), except the top-level name (should be your career login above). Your grade may be deduced 5% off if you don’t follow this. In the readme file, make sure to state the roles of each member in the group (i.e., who did what).

   Related Questions in Database Management System

©TutorsGlobe All rights reserved 2022-2023.