As a popular NoSQL database, Cassandra has many features


Part 1: 170-word critical response with references

As a popular NoSQL database, Cassandra has many features that ensure data availability and fault tolerance. In a distributed setting, Cassandra's replication strategy relies on three components. First, data is distributed across virtual nodes, each of which is assigned to a physical machine. Second, the data must be partitioned across the cluster, which helps determine the replication strategy needed. Third, a snitch defines the cluster topology that the replication strategy uses to decide which machines should hold the replicated data.

Because Cassandra implements consistent hashing, data can be distributed across the cluster with minimal reorganization when a node is added or removed. Because Cassandra stores replicas on multiple nodes, it also provides fault tolerance, since a copy of the original data remains available if a node fails.
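The effect of consistent hashing on data movement can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the HashRing class, the MD5-based token function, and the figure of 8 virtual nodes per machine are all assumptions made for illustration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a ring position (MD5 is an assumption, used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper class)."""

    def __init__(self, vnodes_per_node: int = 8):
        self.vnodes_per_node = vnodes_per_node
        self._tokens = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node name

    def add_node(self, node: str) -> None:
        # Each physical node owns several positions (virtual nodes) on the ring.
        for i in range(self.vnodes_per_node):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def remove_node(self, node: str) -> None:
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        # The owner is the first virtual node at or after the key's token,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"user:{i}" for i in range(200)]
before = {k: ring.lookup(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.lookup(k) for k in keys}

# Only keys that now belong to the new node change owner; all others stay put.
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch every key in moved ends up on node-d, and no key is shuffled between the three original nodes, which is the property that keeps reorganization small when the cluster changes size.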

Cassandra offers two replication strategies. Simple Strategy is straightforward: it places the first replica on the node determined by the partitioner, which could be any node in the cluster. The remaining replicas are then placed on the next nodes clockwise around the ring, as depicted below. This strategy is normally used for single data centers.

Figure: Simple Strategy replication (Boddu, 2018).
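The placement rule for Simple Strategy can be sketched as follows. The function name, the list-based ring, and the node names are hypothetical; the sketch assumes the partitioner has already picked the index of the first replica.

```python
def simple_strategy_replicas(ring, start_index, replication_factor):
    """Return the nodes holding replicas under a Simple Strategy style rule.

    ring: node names in ring (token) order.
    start_index: position of the node chosen by the partitioner.
    """
    if replication_factor > len(ring):
        raise ValueError("replication factor exceeds the number of nodes")
    # Walk clockwise from the first replica, wrapping around the ring.
    return [ring[(start_index + i) % len(ring)] for i in range(replication_factor)]


ring = ["n1", "n2", "n3", "n4", "n5", "n6"]
replicas = simple_strategy_replicas(ring, start_index=4, replication_factor=3)
# replicas is ["n5", "n6", "n1"]: the walk wraps from the end of the ring to its start.
```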

The other strategy is Network Topology Strategy, intended for clusters that span several data centers in various locations. The desired number of replicas is set for each data center. Within a data center, replicas are placed by walking the ring clockwise until the first node on another rack is reached. This approach tries to keep replicas on distinct racks, because nodes that share a rack can fail at the same time, leaving the system vulnerable to data loss.

Figure: Network Topology Strategy replication (Boddu, 2018).
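A simplified version of the rack-aware walk is sketched below. It is an illustration of the idea rather than Cassandra's actual algorithm: the Node type, the function name, and the fallback of reusing a rack when distinct racks run out are assumptions.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    dc: str    # data center
    rack: str


def network_topology_replicas(ring, start_index, replicas_per_dc):
    """Pick replicas per data center, preferring nodes on distinct racks."""
    n = len(ring)
    # Nodes in clockwise order, starting at the position chosen by the partitioner.
    ordered = [ring[(start_index + i) % n] for i in range(n)]
    placement = {}
    for dc, wanted in replicas_per_dc.items():
        chosen, skipped, racks_seen = [], [], set()
        for node in ordered:
            if node.dc != dc:
                continue
            if node.rack in racks_seen:
                skipped.append(node)  # same rack as an earlier replica
            else:
                chosen.append(node)
                racks_seen.add(node.rack)
            if len(chosen) == wanted:
                break
        # Assumed fallback: reuse racks only when distinct racks are exhausted.
        chosen.extend(skipped[: wanted - len(chosen)])
        placement[dc] = [node.name for node in chosen]
    return placement


ring = [
    Node("a1", "dc1", "r1"), Node("b1", "dc2", "r1"),
    Node("a2", "dc1", "r1"), Node("b2", "dc2", "r2"),
    Node("a3", "dc1", "r2"), Node("b3", "dc2", "r1"),
]
placement = network_topology_replicas(ring, 0, {"dc1": 2, "dc2": 2})
# In dc1 the walk skips a2 (same rack as a1) and picks a3 on rack r2.
```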

Part 2: 170-word critical response with references

How is data consistency maintained in Cassandra?

Maintaining consistency in Cassandra concerns the reads and writes that clients issue against the nodes of the cluster. Cassandra uses a peer-to-peer architecture across the nodes and data centers of a cluster, in which any node can serve reads and writes (Understanding Data Consistency in Apache Cassandra, 2011). Clients can connect to any node in any data center to read or write their data. All writes are partitioned and replicated automatically.

Each write is recorded in a commit log and written to an in-memory memtable. This continues until the memtable is full, at which point its contents are flushed to an SSTable (sorted string table) on disk. Writes are atomic at the row level: either all columns in a row are updated or none are.
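The write path described above can be sketched as a toy storage engine. Everything here is illustrative: the ToyStorageEngine class is hypothetical, the memtable limit is counted in rows rather than bytes, and the SSTable is kept as an in-memory tuple instead of a file on disk.

```python
class ToyStorageEngine:
    """Toy model of the commit log -> memtable -> SSTable write path."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # assumed limit, counted in rows
        self.commit_log = []  # append-only record of writes not yet flushed
        self.memtable = {}    # in-memory rows: row key -> columns
        self.sstables = []    # flushed, immutable tables sorted by row key

    def write(self, row_key, columns):
        # 1. Record the write in the commit log for durability.
        self.commit_log.append((row_key, dict(columns)))
        # 2. Apply it to the memtable.
        self.memtable.setdefault(row_key, {}).update(columns)
        # 3. Flush once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is written in sorted key order and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # In this sketch, flushed writes no longer need the commit log.
        self.commit_log.clear()


engine = ToyStorageEngine(memtable_limit=2)
engine.write("user:2", {"name": "Bo"})
engine.write("user:1", {"name": "Al"})   # memtable is now full, so it is flushed
engine.write("user:3", {"name": "Cy"})   # lands in a fresh memtable
```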

CAP Theorem:

Cassandra uses tunable consistency, which means that the client application chooses the level of consistency for each read or write (Tunable Consistency | Learn Cassandra, n.d.). The write consistency level is the number of replicas that must acknowledge a write before a successful acknowledgement is returned to the client application.

• Write consistency:

o Any - write must reach at least one node, or be stored as a hinted handoff if all replica nodes are down

o All - write must be written to the commit log and memtable on all replicas

o Each_Quorum - write must be written to the commit log and memtable on a quorum (a majority, e.g. 3 out of 5) of replicas in each data center

o Local_Quorum - same as Each_Quorum, but only concerned with the data center of the coordinator node

• Read consistency:

o All - returns the record with the most recent timestamp after all replicas have responded

o Each_Quorum - same as All, but returns after a quorum of replicas has responded

o Local_Serial - serial read used with lightweight transactions, confined to the local data center

o Local_Quorum - returns after a quorum of replicas in the current data center has responded

o One - returns the result from the closest replica

o Two - returns the most recent data from the two closest replicas

o Three - returns the most recent data from the three closest replicas

As shown above, Cassandra's consistency level is flexible and is tuned by the client application, and the chosen level affects availability along with other factors.
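The arithmetic behind these levels can be sketched as follows. The function names are hypothetical. The overlap check is a commonly cited rule of thumb that is not stated above: a read is guaranteed to see the latest acknowledged write when the replicas written plus the replicas read exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas, e.g. 3 out of 5."""
    return replication_factor // 2 + 1


def resolve_read(responses):
    """Return the value with the most recent timestamp.

    responses: list of (timestamp, value) pairs, one per replica that answered.
    """
    return max(responses, key=lambda r: r[0])[1]


def read_sees_latest_write(write_acks, read_acks, replication_factor):
    """True when the write set and the read set must share at least one replica."""
    return write_acks + read_acks > replication_factor


rf = 5
q = quorum(rf)  # 3
latest = resolve_read([(100, "old"), (250, "new"), (180, "older")])  # "new"
strong = read_sees_latest_write(q, q, rf)   # quorum writes + quorum reads: True
weak = read_sees_latest_write(1, 1, rf)     # One + One: False
```

Lower levels such as One answer faster and tolerate more failed replicas, while higher levels trade some availability for the guarantee that a read overlaps the most recent write.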

Part 3: 170-word critical response with references

Graph Databases Compared to Relational and NoSQL Databases

Relational databases dominated from 1970 until about 2009, when NoSQL databases gained traction because of shortcomings in the relational model. New requirements led to a variety of problems that NoSQL stores were designed to solve. However, NoSQL stores do not handle some kinds of relationships well, and when the data is large and highly connected, querying it becomes very expensive. Graph databases address this cost by making relationship traversal fast.

How relational database stores manage graphs and connected data

Relational databases lack first-class relationships: relationships exist only at modelling time, as a means of joining tables. The schema design makes a few queries easy to run, but others become more difficult. Recursive joins can answer the harder queries, but they make the query syntactically complex and computationally expensive. Furthermore, the schema proves to be both too rigid and too brittle. In practice, the rigidity is worked around with sparsely populated tables that have many nullable columns and with code to handle exceptional cases, which increases coupling and destroys any semblance of cohesion.
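To illustrate the syntactic cost of recursive joins, the sketch below uses Python's built-in sqlite3 module to find everyone reachable from one person through a chain of friendships. The person and friend tables and their contents are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friend (person_id INTEGER REFERENCES person(id),
                         friend_id INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dan');
    INSERT INTO friend VALUES (1, 2), (2, 3), (3, 4);
""")

# Each extra hop is another join of the friend table against the
# intermediate result; the recursive CTE repeats that join up to max depth.
query = """
WITH RECURSIVE reachable(id, depth) AS (
    SELECT friend_id, 1 FROM friend WHERE person_id = ?
    UNION
    SELECT f.friend_id, r.depth + 1
    FROM friend AS f JOIN reachable AS r ON f.person_id = r.id
    WHERE r.depth < ?
)
SELECT p.name, MIN(r.depth) AS hops
FROM reachable AS r JOIN person AS p ON p.id = r.id
GROUP BY p.id, p.name
ORDER BY hops, p.name;
"""

rows = conn.execute(query, (1, 3)).fetchall()
# rows lists each reachable person with the number of hops from Alice.
```

Even for this small question the query needs a recursive common table expression, a depth limit, and a final aggregation, which is the kind of complexity the paragraph above describes.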

How NoSQL database stores manage graphs and connected data

NoSQL databases also lack relationships. Most NoSQL databases store sets of disconnected values, documents, or columns, which makes them hard to use for connected data and graphs. Instead, they link data by embedding one aggregate's identifier inside another. Following such identifiers requires joining aggregates at the application level, which quickly becomes prohibitively costly.

A further weakness is that aggregate identifiers do not point backward, which makes questions that traverse a relationship in the reverse direction expensive to answer from the data store. Aggregate stores therefore have to rely on inherently latent methods for creating and querying relationships outside the data model. This is because aggregate stores support neither index-free adjacency nor consistency of connected data (Robinson, Webber, & Eifrem, 2015).

Comparison of relational and NoSQL databases with graph databases

Relational databases store data in tables and use foreign keys and JOIN operations to determine relationships, whereas graph databases use direct links for relationships: edges provide index-free adjacency and considerable flexibility.

In NoSQL databases, aggregates neither maintain the consistency of connected data nor support index-free adjacency, in which elements contain direct links to their neighbors. Graph databases, by contrast, use index-free adjacency for direct links in relationships.
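The difference can be sketched in plain Python. The GraphNode class, the connect helper, and the aggregates dictionary are hypothetical stand-ins: the first models nodes that hold direct references to their neighbors, the second models documents that only embed the identifiers of other documents.

```python
class GraphNode:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.outgoing = []  # nodes this node points to
        self.incoming = []  # nodes that point to this node


def connect(src, dst):
    # The relationship is stored on both ends, so it can be followed either way.
    src.outgoing.append(dst)
    dst.incoming.append(src)


alice, bob, carol, dan = (GraphNode(n) for n in ("alice", "bob", "carol", "dan"))
connect(alice, bob)
connect(bob, carol)
connect(dan, bob)

# Graph style: "who points at bob?" follows direct links, with no index or scan.
graph_answer = [n.name for n in bob.incoming]

# Aggregate style: each document embeds only the identifiers it points to.
aggregates = {
    "alice": {"friends": ["bob"]},
    "bob": {"friends": ["carol"]},
    "carol": {"friends": []},
    "dan": {"friends": ["bob"]},
}


def who_lists(store, target):
    """Reverse lookup: identifiers do not point backward, so scan every aggregate."""
    return sorted(key for key, doc in store.items() if target in doc["friends"])


aggregate_answer = who_lists(aggregates, "bob")
```

Both approaches return the same two names, but the aggregate version has to inspect every document in the store, while the graph version touches only bob and its neighbors.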

Graph databases make data modelling natural, which sets them apart from relational and NoSQL databases: you can build the model the way you intuitively think about the data.

Graph databases handle symmetric data very well compared with NoSQL and relational databases.

Conclusion on "Relational Databases Lack Relationships"

The authors explain clearly how the lack of relationships in relational databases leads to computational and space complexity when tables are joined recursively. This is because relationships exist only at modelling time. As a result, a schema may make some queries easy and others difficult. Relational databases should therefore not always be the default choice, since some queries may be difficult and expensive to answer. This carries a cost in production, because such queries run very slowly compared with graph databases.

References

Robinson, I., Webber, J., & Eifrem, E. (2015). Graph databases: New opportunities for connected data. O'Reilly Media, Inc.

Part 4: 170-word critical response with references

Neo4j offers a recommendation engine as part of its graph database solutions. Online stores are the ideal beneficiaries of this technology. The role of the graph database in this environment is to guide customers to products or services based on their interests or history. Under the graph database model, each customer is a node. Each customer node has properties such as name, geographical location, age, preferences entered in a profile, and purchase or search history. Products and services are also nodes, and their respective properties form relationships (or edges) with the customer nodes that share those properties.

The power of graph databases allows these relationships to be formed without an engineer explicitly designing their representation. For example, a product bought by a customer with a similar purchase history could be recommended to another customer, even if that customer did not specifically search for that product. The benefits work for both the customer and the host - customers shopping online now take for granted that the product they are searching for will show prominently in the results even if they don't type the exact brand, model number, or name.
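The recommendation idea in the example above can be sketched with a small in-memory graph. This is plain Python rather than Neo4j's API: the purchases data, the recommend function, and the scoring rule (count shared purchases) are all assumptions made for illustration.

```python
from collections import Counter

# Customer node -> set of product nodes reached by a "purchased" edge.
purchases = {
    "ann": {"tent", "stove"},
    "ben": {"tent", "stove", "lantern"},
    "cat": {"tent", "lantern", "kayak"},
    "dev": {"novel"},
}


def recommend(purchases, customer, limit=3):
    """Suggest products bought by customers with overlapping purchase history."""
    mine = purchases[customer]
    scores = Counter()
    for other, theirs in purchases.items():
        if other == customer:
            continue
        shared = len(mine & theirs)  # similarity: number of products in common
        if shared == 0:
            continue
        for product in theirs - mine:  # only products the customer lacks
            scores[product] += shared
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:limit]]


for_ann = recommend(purchases, "ann")
for_dev = recommend(purchases, "dev")
```

Here ann is offered the lantern first, because both similar customers bought it, even though she never searched for it. dev shares no purchases with anyone, so the sketch has nothing to recommend.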

The dynamic creation of relationship types lends itself well to graph databases, whose strength lies in connecting seemingly unlike data in very large sets that change and grow rapidly. The always-on availability that this product promises ensures that recommendations can be made even during partial outages - if part of the environment becomes unavailable, there is likely still enough information to make meaningful recommendations to customers. Graph databases such as Neo4j therefore tend to give better results the more information they have available to them.

