
Cassandra

Cassandra offers robust support for clusters spanning multiple datacenters, and it is highly scalable and performant. It is a peer-to-peer distributed system: data is distributed among all the nodes in a cluster, and each node is independent yet interconnected with the others. Cassandra is designed for very high availability and a low total cost of ownership, which makes it a good choice when you need scalability and high availability without compromising performance. Cassandra is responsible for distributing data across multiple machines in an application-transparent manner, and it automatically repartitions as machines are added to and removed from the cluster.

Miri Infotech is launching a product that configures and publishes Cassandra as a pre-configured, ready-to-launch AMI on Amazon EC2. The image is based on Ubuntu and contains Cassandra and Hadoop, providing a free, scalable, high-availability distributed setup.

One thing Cassandra users appreciate is that the database has no single point of failure and no network bottlenecks.
Beyond this, Apache Cassandra is a free and open-source distributed NoSQL (Not Only SQL) database management system designed to handle large amounts of data across many commodity servers. Its high performance makes it powerful and flexible enough for a wide range of uses.

Cassandra has no master node. Instead, all nodes play an identical role and communicate with each other as equals. A cluster can handle thousands of concurrent users or operations per second.

Many companies have successfully deployed and benefited from Apache Cassandra, including Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1,500 more companies that have large, active data sets.

Cassandra is a peer-to-peer distributed system, and data is distributed among all the nodes in a cluster. All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected with the other nodes. Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster. When a node goes down, read and write requests can be served by other nodes in the network.
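The peer-to-peer behaviour described above can be illustrated with a small Python sketch. It is a toy model, not Cassandra's implementation: the `ToyRing` class, the `token` helper, the node names, and the use of SHA-256 as the hash are all assumptions made for illustration. It only shows the general idea that each key maps to several replica nodes, so the key stays reachable when one of them is down.

```python
import hashlib
from bisect import bisect_right
from typing import Iterable, List, Set


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 is a stand-in hash)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class ToyRing:
    """Hypothetical token ring: every node is a peer, none is a master."""

    def __init__(self, nodes: Iterable[str], replication_factor: int = 3) -> None:
        self.ring = sorted((token(name), name) for name in nodes)
        self.rf = min(replication_factor, len(self.ring))

    def replicas(self, partition_key: str) -> List[str]:
        """Walk clockwise from the key's token and take the next rf nodes."""
        tokens = [t for t, _ in self.ring]
        start = bisect_right(tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def live_replicas(self, partition_key: str, down: Set[str]) -> List[str]:
        """Replicas that can still serve the key when some nodes are down."""
        return [n for n in self.replicas(partition_key) if n not in down]


ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
survivors = ring.live_replicas("user:42", down={owners[0]})
print(owners, survivors)
```

With a replication factor of 3, losing one replica still leaves two nodes that hold the key, which is why requests can continue to be served.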

Some Features of Cassandra are:

  • FAULT TOLERANT

Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

  •  PERFORMANT

Cassandra consistently outperforms popular NoSQL alternatives in benchmarks and real applications, primarily because of fundamental architectural choices.

  • DECENTRALIZED

There are no single points of failure. There are no network bottlenecks. Every node in the cluster is identical.

  • SCALABLE

Read and write throughput are both designed to increase linearly as new machines are added, with no downtime or interruption to applications.

  • DURABLE

Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.

  • TUNABLE CONSISTENCY

Writes and reads offer a tunable level of consistency, all the way from “writes never fail” to “block for all replicas to be readable”, with the quorum level in the middle.
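The quorum level mentioned above follows simple arithmetic, sketched here in Python. The function names are hypothetical. The rule shown is the usual overlap condition: a read is certain to touch at least one replica that received the latest write when the number of replicas read plus the number written exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return read_replicas + write_replicas > rf


rf = 3
q = quorum(rf)
print(q)                                # 2
print(read_overlaps_write(rf, q, q))    # True: quorum writes plus quorum reads
print(read_overlaps_write(rf, 1, 1))    # False: fastest, but reads may be stale
```

Lower levels trade this overlap guarantee for lower latency and higher availability.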

  • MONITORING & ALERTING

Nodes are monitored, with alerts for events of interest including performance and latency, disk capacity, and node responsiveness. Customized alerting is also possible through our monitoring architecture.

  • UPDATES & PATCH MANAGEMENT

Our high-availability architecture ensures continuous operation through node upgrades, including database version upgrades. We also continuously monitor and test patch and security updates and apply them to your nodes as required, all with zero downtime.

These are not all of Cassandra's features; a deeper look reveals many more.

To understand what makes Cassandra work the way it does, it helps to study its components.

The key components of Cassandra are as follows −

  • Node − The place where data is stored.
  • Data center − A collection of related nodes.
  • Cluster − A component that contains one or more data centers.
  • Commit log − A crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
  • Mem-table − A memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables.
  • SSTable − A disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
  • Bloom filter − A quick, probabilistic algorithm for testing whether an element is a member of a set. It can report false positives but never false negatives, and it acts as a special kind of cache. Bloom filters are consulted on every query.
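How these components fit together can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's storage engine: the `ToyNode` and `BloomFilter` classes, the flush threshold of two keys, and the SHA-256-based hashing are all hypothetical choices made to keep the example small.

```python
import hashlib
from typing import Dict, Iterator, List, Optional, Tuple


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 256, hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyNode:
    """Toy write path: commit log -> mem-table -> SSTable (not real Cassandra)."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.commit_log: List[Tuple[str, str]] = []
        self.memtable: Dict[str, str] = {}
        self.sstables: List[Tuple[BloomFilter, Dict[str, str]]] = []
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # logged first for crash recovery
        self.memtable[key] = value            # then applied in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        # The flushed table is sorted by key and never modified afterwards.
        self.sstables.append((bloom, dict(sorted(self.memtable.items()))))
        self.memtable = {}
        self.commit_log.clear()  # flushed entries no longer need replaying

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for bloom, table in reversed(self.sstables):  # newest SSTable first
            if bloom.might_contain(key) and key in table:
                return table[key]
        return None


node = ToyNode(flush_threshold=2)
node.write("k1", "v1")
node.write("k2", "v2")   # reaches the threshold and triggers a flush
node.write("k1", "v1b")  # the newer value lives in the mem-table
print(node.read("k1"), node.read("k2"), len(node.sstables))
```

The Bloom filter check lets a read skip any SSTable that definitely does not hold the key, which is the role described in the list above.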

 


You can subscribe to Cassandra as an AWS Marketplace product and launch an instance from the product's AMI using the Amazon EC2 launch wizard.

To launch an instance from the AWS Marketplace using the launch wizard:

  • Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  • From the Amazon EC2 dashboard, choose Launch Instance.
  • On the Choose an Amazon Machine Image (AMI) page, choose the AWS Marketplace category on the left. Find a suitable AMI by browsing the categories, or using the search functionality. Choose Select to choose your product.
  • A dialog displays an overview of the product you've selected. You can view the pricing information, as well as any other information that the vendor has provided. When you're ready, choose Continue.
  • On the Choose an Instance Type page, select the hardware configuration and size of the instance to launch. When you're done, choose Next: Configure Instance Details.
  • On the next pages of the wizard, you can configure your instance, add storage, and add tags. For more information about the different options you can configure, see Launching an Instance. Choose Next until you reach the Configure Security Group page.
  • The wizard creates a new security group according to the vendor's specifications for the product. The security group may include rules that allow all IP addresses (0.0.0.0/0) access on SSH (port 22) on Linux or RDP (port 3389) on Windows. We recommend that you adjust these rules to allow only a specific address or range of addresses to access your instance over those ports.
  • When you are ready, choose Review and Launch.
  • On the Review Instance Launch page, check the details of the AMI from which you're about to launch the instance, as well as the other configuration details you set up in the wizard. When you're ready, choose Launch to select or create a key pair, and launch your instance.
  • Depending on the product you've subscribed to, the instance may take a few minutes or more to launch. You are first subscribed to the product before your instance can launch. If there are any problems with your credit card details, you will be asked to update your account details. When the launch confirmation page displays

About

Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. It also offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients.

Guidelines

Cassandra allows distribution of cluster servers across cloud provider failure zones. Cassandra’s rack concept is designed to group servers within a cluster where the likelihood of a failure is correlated and to provide ongoing availability despite the failure of a group of servers.

Two main goals should guide how you lay out your data:

1.  Spread data evenly around the cluster

2.  Minimize the number of partition reads
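The first goal can be illustrated with a short Python sketch that compares two candidate partition keys. Everything here is an assumption for illustration: the sample rows, the four-node cluster, and the simple hash-modulo placement, which stands in for Cassandra's real partitioner. The sketch addresses only how evenly rows spread, not the number of partitions a query reads.

```python
import hashlib
from collections import Counter
from typing import Iterable, List


def node_for(partition_key: str, node_count: int) -> int:
    """Pick a node from a hash of the partition key (simplified placement)."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(partition_keys: Iterable[str], node_count: int) -> List[int]:
    """Count how many rows land on each node for a given choice of key."""
    counts = Counter(node_for(key, node_count) for key in partition_keys)
    return [counts[n] for n in range(node_count)]


# Hypothetical data: 1000 users, 90% of them in one country.
rows = [(f"user{i}", "NZ" if i % 10 == 0 else "US") for i in range(1000)]
by_user = rows_per_node((user for user, _ in rows), node_count=4)
by_country = rows_per_node((country for _, country in rows), node_count=4)
print("by user id:", by_user)
print("by country:", by_country)
```

A high-cardinality key such as the user id spreads rows across all nodes, while a low-cardinality key such as the country concentrates most rows on one or two nodes.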

Limitations:

Cassandra is not row-level consistent: inserts and updates that affect the same row and are processed at approximately the same time may affect the non-key columns in inconsistent ways. One update may win for one column while another wins for a different column, resulting in a set of values within the row that no client ever specified or intended.
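One way this can arise is sketched below in Python, using a simplified model in which conflicts are resolved per column: the write with the later timestamp wins, and on a timestamp tie the greater value wins. The `merge_row` function and the sample updates are hypothetical and only illustrate the effect.

```python
from typing import Dict, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


def merge_row(*writes: Dict[str, Cell]) -> Dict[str, str]:
    """Resolve each column independently instead of the row as a whole."""
    merged: Dict[str, Cell] = {}
    for write in writes:
        for column, cell in write.items():
            # Later timestamp wins; on a tie the greater value wins.
            if column not in merged or cell > merged[column]:
                merged[column] = cell
    return {column: value for column, (_, value) in merged.items()}


# Two clients update the same row with the same timestamp.
update_a = {"city": (100, "Paris"), "country": (100, "France")}
update_b = {"city": (100, "Oslo"), "country": (100, "Norway")}
print(merge_row(update_a, update_b))  # {'city': 'Paris', 'country': 'Norway'}
```

The merged row pairs Paris with Norway, a combination neither client wrote.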

Some further limitations are:

  • No Ad-hoc queries
  • No aggregations
  • Unpredictable performance
  • CQL

Usage and Deployment Instruction:

Step 1: Open PuTTY for SSH

Step 2: In PuTTY, enter <instance public IP> in the “Host Name” field and use "ec2-user" as the user name. Authentication is taken automatically from the PPK file, so no password needs to be typed.

Step 3: Start the Cassandra server

>> service cassandra start

Step 4: Start the Cassandra prompt

>> cqlsh

Step 5: Run any Cassandra command.

 