Discuss the features of Apache Hive for data warehousing


Part 1: 200 words with reference

Discuss the features of Hadoop

Part 2: 100-125 words with references

Topic Question:

Discuss the features of Apache Hive for data warehousing

Discussion Post:

Apache Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop files. Hadoop is a framework for handling large datasets in a distributed computing environment.

Hive has three main functions: data summarization, query, and analysis. It supports queries expressed in an SQL-like language called HiveQL, which Hive automatically translates into MapReduce jobs executed on Hadoop. In addition, HiveQL allows custom MapReduce scripts to be plugged into queries.
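As a minimal sketch of what submitting HiveQL looks like in practice, the snippet below uses the third-party PyHive client from Python; the host, port, username, and the pageviews table are illustrative assumptions, not part of any standard setup.

    from pyhive import hive  # third-party HiveServer2 client; one of several ways to submit HiveQL

    # Connect to a HiveServer2 instance (host, port, and username are illustrative).
    conn = hive.Connection(host="hive.example.com", port=10000, username="analyst")
    cursor = conn.cursor()

    # An SQL-like HiveQL aggregation; Hive plans and executes it as MapReduce jobs.
    cursor.execute("""
        SELECT page, COUNT(*) AS views
        FROM pageviews
        GROUP BY page
        ORDER BY views DESC
        LIMIT 10
    """)

    for page, views in cursor.fetchall():
        print(page, views)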

Hive also enables data serialization/deserialization and increases flexibility in schema design by including a system catalog called the Hive Metastore. Hive supports text files, SequenceFiles (flat files consisting of binary key/value pairs), and RCFiles (Record Columnar Files, which store the columns of a table in a columnar fashion).
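To illustrate how the storage format enters the schema, a table's file format is declared in its DDL, which the metastore then records. The table and column names below are hypothetical; each statement could be run through cursor.execute(...) as in the sketch above.

    # HiveQL DDL strings for the three file formats mentioned above.
    create_text = r"""
        CREATE TABLE logs_text (ts STRING, msg STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        STORED AS TEXTFILE
    """

    create_seq = "CREATE TABLE logs_seq (ts STRING, msg STRING) STORED AS SEQUENCEFILE"

    create_rc = "CREATE TABLE logs_rc (ts STRING, msg STRING) STORED AS RCFILE"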

Reference

What is Apache Hive? (n.d.). WhatIs.com.

Critical Reply with reference:

Part 3: 100-125 words with references

Topic Question:

Most inputs are validated by some combination of completeness checks, format checks, range checks, check digits, consistency checks, and database checks. Provide and explain in detail at least two validation methods for these inputs.

Discussion Post:

Input validation is a key function of a data entry system. A completeness check is often performed automatically by specifying that a data field must not be empty. A required data element, such as a birth date or address, that must be present for a record to be valid should use this kind of validation.
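As a minimal sketch (the field names are hypothetical), a completeness check simply rejects a record when a required field is missing or blank:

    REQUIRED_FIELDS = ["birth_date", "address"]  # hypothetical required data elements

    def check_completeness(record: dict) -> list:
        """Return the names of required fields that are missing or empty."""
        return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]

    print(check_completeness({"birth_date": "1990-04-01", "address": ""}))  # ['address'] -> reject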

However, it is also important to verify that the information present is of the proper type and format. A numeric field should contain only numeric data, just as a text field should contain text. Free-form text that is not known in advance, such as a street address, is much harder to validate.
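A format check can be sketched with Python's standard re and datetime modules; the particular formats (digits only, YYYY-MM-DD dates) are illustrative assumptions:

    import re
    from datetime import datetime

    def is_numeric(value: str) -> bool:
        """Format check: the field must consist only of digits."""
        return re.fullmatch(r"\d+", value) is not None

    def is_valid_date(value: str) -> bool:
        """Format check: the field must parse as a YYYY-MM-DD date."""
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return True
        except ValueError:
            return False

    print(is_numeric("12345"), is_valid_date("1990-04-01"))  # True True
    print(is_numeric("12a45"), is_valid_date("1990-13-01"))  # False False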

Although ascertaining the validity of such entries is difficult, a developer must also protect the system against malicious code injection by ensuring that certain text strings are not present. SQL injection attacks are carried out by inserting specific text strings into data fields, coercing the database into performing actions not available to ordinary users. It is the responsibility of the developer to ensure such strings are not present in text fields before submitting them to the database.
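Rather than trying to blacklist every dangerous string, the widely recommended defense is to bind user input as a query parameter so the database never interprets it as SQL. A minimal sketch using Python's built-in sqlite3 module (the users table is hypothetical):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")

    user_input = "alice' OR '1'='1"  # a classic injection attempt

    # Unsafe: concatenation would let the input rewrite the query.
    # conn.execute("SELECT * FROM users WHERE name = '" + user_input + "'")

    # Safe: the ? placeholder binds the input as data, not as SQL.
    rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,))
    print(rows.fetchall())  # [] -- the injection attempt matches nothing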

References

Dennis, A., Haley Wixom, B., & Roth, R. (2012). Systems Analysis and Design (5th ed.).

Critical Reply with reference:

Part 4: 100-125 words with references

Topic Question:

Explain the reasons why the study of HCI has become increasingly important for systems analysts and for the SDLC.

Discussion Post:

HCI, or Human-Computer Interaction, has become increasingly important as computers have become more central to everyday human life. The way humans use computers has arguably reshaped the thought processes of the millennial generation relative to Generation X, and the market for computer software and systems is continually growing.

In its most basic sense, HCI is the study of human and computer interaction and activities ("HCI", 2017). The goal of HCI is to understand how interactions with computers affect individuals' thinking and behavior, in order to determine how to make computer systems safe to use, easy to understand, productive, and enjoyable.

To this end, many elements of HCI focus on improving the interfaces through which people interact with computer software and hardware to ensure that users can pick up and intuitively understand how to operate complex systems.

With regard to the System Development Life Cycle (SDLC), HCI has become one of the most important considerations because it helps ensure that users can get value out of a system without having to undergo training or push past a learning curve that could discourage participation.

Unless you are creating a government or corporate system in which use is mandated by the employer, people will always have the option of using a different software platform, one that may be less developed or sophisticated on the back end but that supports better HCI. Without user support, most systems will fail.

Reference

HCI. (2017). Techopedia.

Critical Reply with reference:

Part 5: 100-125 words with references

Topic Question:

Discuss the features of Hadoop

Discussion Post:

Hadoop is a framework, or platform, for storing and processing large amounts of data, used for analytical tasks, searches, data retention, log-file processing, and more.

Hadoop has two main components: HDFS and MapReduce. HDFS, the Hadoop Distributed File System, replicates data across multiple nodes, which increases data reliability. MapReduce performs computations on large data sets by dividing the work into parts and assigning each part to worker nodes; a minimal word-count sketch in this style appears after the list below. Sindol (2014) summarizes the prominent characteristics of Hadoop:

Hadoop provides a reliable shared storage (HDFS) and analysis system (MapReduce).

Hadoop is highly scalable and, unlike relational databases, scales linearly. Because of this linear scaling, a Hadoop cluster can contain tens, hundreds, or even thousands of servers.

Hadoop is very cost effective as it can work with commodity hardware and does not require expensive high-end hardware.

Hadoop is highly flexible and can process both structured as well as unstructured data.

Hadoop has built-in fault tolerance. Data is replicated across multiple nodes (the replication factor is configurable), and if a node goes down, the required data can be read from another node that holds a copy. Hadoop also maintains the replication factor, even when a node goes down, by re-replicating the data to other available nodes.

Hadoop works on the principle of write once and read multiple times.

Hadoop is optimized for large and very large data sets. For instance, a small amount of data, such as 10 MB, generally takes more time to process in Hadoop than in traditional systems.
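To make the MapReduce model above concrete, here is a minimal word-count sketch simulated in plain Python; in a real cluster the framework would run many mapper instances in parallel, then shuffle and sort the intermediate pairs before feeding them to reducers (the tiny in-memory input is purely illustrative):

    from itertools import groupby

    def mapper(lines):
        """Map step: emit a (word, 1) pair for every word in the input."""
        for line in lines:
            for word in line.split():
                yield word.lower(), 1

    def reducer(pairs):
        """Reduce step: sum the counts for each word after sorting by key."""
        for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
            yield word, sum(count for _, count in group)

    # Local simulation of the job on a tiny input.
    text = ["Hadoop stores data", "Hadoop processes data"]
    for word, count in reducer(mapper(text)):
        print(word, count)  # data 2, hadoop 2, processes 1, stores 1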

Sindol, D. (2014, January 30). Big Data Basics - Part 3 - Overview of Hadoop.
