

Miri Infotech brings you Elasticsearch-Hadoop (ES-Hadoop), a first-class, open source, standalone library that lets Hadoop jobs interact with Elasticsearch. It supports vanilla Map/Reduce, Cascading, Pig, and Hive.
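As a rough illustration of how a Hadoop job is pointed at Elasticsearch, the sketch below collects a few connector settings in a plain dictionary. The keys es.nodes, es.port, es.resource, and es.query are ES-Hadoop configuration properties; the host, the index name "weblogs", and the query string are placeholders chosen for this example, not values shipped with the AMI.

```python
# Sketch only: host, index name, and query are hypothetical placeholders.
def es_hadoop_conf(nodes, resource, port=9200, query=None):
    """Return ES-Hadoop connector settings as a dict of strings."""
    conf = {
        "es.nodes": nodes,        # Elasticsearch host(s) the job connects to
        "es.port": str(port),     # REST port of the cluster
        "es.resource": resource,  # index the job reads from or writes to
    }
    if query is not None:
        conf["es.query"] = query  # optional query limiting what is read
    return conf


conf = es_hadoop_conf("localhost", "weblogs", query="?q=status:500")
for key in sorted(conf):
    print(f"{key}={conf[key]}")
```

In a real job these key/value pairs would be set on the job configuration (or passed as table or storage properties in Hive and Pig) rather than printed.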


Miri Infotech is launching a product that bundles the open source tool Hadoop with Elasticsearch, pre-configured on Ubuntu 16.04 and delivered as a ready-to-launch AMI on Amazon EC2. The AMI contains Hadoop, HBase, Elasticsearch, Flume, and Kibana, and is intended to support a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques.

Elasticsearch-Hadoop provides real-time analysis. Its connector lets you get quick insight from your big data and makes working in the Hadoop ecosystem easier. With ES-Hadoop, you can build dynamic, embedded search applications to serve your Hadoop data, or perform deep, low-latency analytics using full-text queries, geospatial queries, and aggregations.
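To make the three query styles concrete, here is a minimal sketch of a single search request body that combines a full-text match, a geospatial filter, and an aggregation. The structure follows the Elasticsearch Query DSL; the field names "message", "location", and "status" are assumptions made up for this example and would have to exist in your own index mapping.

```python
import json


def build_search_body(text, lat, lon, radius_km):
    """Build a search body: full-text match + geo filter + terms aggregation."""
    return {
        "query": {
            "bool": {
                # full-text query on a hypothetical "message" field
                "must": {"match": {"message": text}},
                # geospatial filter on a hypothetical geo_point field "location"
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
        # aggregation counting hits per value of a hypothetical "status" field
        "aggs": {"by_status": {"terms": {"field": "status"}}},
        "size": 10,
    }


body = build_search_body("timeout error", 37.55, -121.98, 25)
print(json.dumps(body, indent=2))
```

The printed JSON could be sent to the _search endpoint of an index, or used as the query an ES-Hadoop job reads with.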

Before going into more detail, it helps to look at the components involved and what each one does.

The components of the bundle are:


Elasticsearch is a search engine that can index new documents in near real-time and make them available for querying almost immediately. It is based on Apache Lucene and allows for setting up clusters of nodes that store any number of indices in a distributed, fault-tolerant way. If a node disappears, the cluster will rebalance the (shards of) indices over the remaining nodes. You can configure how many shards make up each index and how many replicas of these shards there should be. If a primary shard goes offline, one of its replicas is promoted to primary and used to repopulate another node.
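The shard and replica counts are set per index. The sketch below builds the settings body used when creating an index and works out how many shard copies the cluster then has to place. number_of_shards and number_of_replicas are the actual Elasticsearch setting names; the counts 3 and 1 are only example values.

```python
import json


def index_settings(primary_shards, replicas_per_shard):
    """Settings body for creating an index with the given shard layout."""
    return {
        "settings": {
            "number_of_shards": primary_shards,        # fixed at index creation
            "number_of_replicas": replicas_per_shard,  # can be changed later
        }
    }


def total_shard_copies(primary_shards, replicas_per_shard):
    """Each primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas_per_shard)


print(json.dumps(index_settings(3, 1)))
print(total_shard_copies(3, 1), "shard copies to distribute over the nodes")
```

With one replica per shard, every shard lives on two different nodes, which is what allows a replica to be promoted when the node holding the primary is lost.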


Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into different storage destinations such as the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and it is robust and fault tolerant, with tunable reliability mechanisms for failover and recovery.


Kibana is an open source (Apache Licensed), browser-based analytics and search interface to Logstash and other timestamped data sets stored in Elasticsearch. Kibana strives to be easy to get started with, while also being flexible and powerful.



You can subscribe to an AWS Marketplace product and launch an instance from the product's AMI using the Amazon EC2 launch wizard.

To launch an instance from the AWS Marketplace using the launch wizard:

  • Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  • From the Amazon EC2 dashboard, choose Launch Instance.
  • On the Choose an Amazon Machine Image (AMI) page, choose the AWS Marketplace category on the left. Find a suitable AMI by browsing the categories, or using the search functionality. Choose Select to choose your product.
  • A dialog displays an overview of the product you've selected. You can view the pricing information, as well as any other information that the vendor has provided. When you're ready, choose Continue.
  • On the Choose an Instance Type page, select the hardware configuration and size of the instance to launch. When you're done, choose Next: Configure Instance Details.
  • On the next pages of the wizard, you can configure your instance, add storage, and add tags. For more information about the different options you can configure, see Launching an Instance. Choose Next until you reach the Configure Security Group page.
  • The wizard creates a new security group according to the vendor's specifications for the product. The security group may include rules that allow all IP addresses (0.0.0.0/0) access on SSH (port 22) on Linux or RDP (port 3389) on Windows. We recommend that you adjust these rules to allow only a specific address or range of addresses to access your instance over those ports.
  • When you are ready, choose Review and Launch.
  • On the Review Instance Launch page, check the details of the AMI from which you're about to launch the instance, as well as the other configuration details you set up in the wizard. When you're ready, choose Launch to select or create a key pair, and launch your instance.
  • Depending on the product you've subscribed to, the instance may take a few minutes or more to launch. You are first subscribed to the product before your instance can launch. If there are any problems with your credit card details, you will be asked to update your account details. When the launch confirmation page displays
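The security group recommendation in the steps above can be checked mechanically. The following sketch works on a list of ingress rules shaped like the IpPermissions entries that the EC2 API returns for a security group; it finds SSH or RDP rules open to every address and rewrites them to a single CIDR. It does not call AWS, and the sample rules and the address 203.0.113.10/32 are illustrative assumptions.

```python
OPEN_CIDR = "0.0.0.0/0"
ADMIN_PORTS = (22, 3389)  # SSH and RDP


def rule_covers_port(perm, port):
    """True if the ingress rule applies to the given port."""
    if perm.get("IpProtocol") == "-1":  # "-1" means all protocols and ports
        return True
    low, high = perm.get("FromPort"), perm.get("ToPort")
    return low is not None and high is not None and low <= port <= high


def open_admin_ports(ip_permissions):
    """Return the admin ports reachable from any IP address."""
    exposed = set()
    for perm in ip_permissions:
        ranges = perm.get("IpRanges", [])
        if any(r.get("CidrIp") == OPEN_CIDR for r in ranges):
            exposed.update(p for p in ADMIN_PORTS if rule_covers_port(perm, p))
    return sorted(exposed)


def restrict_admin_rules(ip_permissions, allowed_cidr):
    """Copy the rules, replacing 0.0.0.0/0 on admin-port rules with allowed_cidr."""
    restricted = []
    for perm in ip_permissions:
        new_perm = dict(perm)
        if any(rule_covers_port(perm, p) for p in ADMIN_PORTS):
            new_perm["IpRanges"] = [
                {**r, "CidrIp": allowed_cidr} if r.get("CidrIp") == OPEN_CIDR else dict(r)
                for r in perm.get("IpRanges", [])
            ]
        restricted.append(new_perm)
    return restricted


# Illustrative rules, not the vendor's actual security group.
rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 9200, "ToPort": 9200,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]
print(open_admin_ports(rules))
print(open_admin_ports(restrict_admin_rules(rules, "203.0.113.10/32")))
```

The same change can be made by hand on the Configure Security Group page by replacing the source of the SSH or RDP rule with your own address.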


The best thing about Elasticsearch is its speed. It is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases, and it stores your data centrally. It also scales well: it scales horizontally to handle very large numbers of events per second, while automatically managing how indices and queries are distributed across the cluster for smooth operations. It is designed from the ground up to operate in a distributed environment.


  • In general, prefer medium-to-large boxes.
  • Avoid small machines, because you don’t want to manage a cluster with a thousand nodes, and the overhead of simply running Elasticsearch is more apparent on such small boxes.
  • At the same time, avoid the truly enormous machines. They often lead to imbalanced resource usage (for example, all the memory is being used, but none of the CPU) and can add logistical complexity if you have to run multiple nodes per machine.


When evaluating any software or tool, its features are only part of the picture; its limitations matter as well. Limitations constrain you, but knowing them also helps you choose the right solution to your problem.

Some of the limitations are as follows:


Elasticsearch’s plugin infrastructure is extremely flexible in terms of what can be extended. While it opens up Elasticsearch to a wide variety of (often custom) additional functionality, when it comes to security, this high extensibility level comes at a cost. We have no control over the third-party plugins' code (open source or not) and therefore we cannot guarantee their compliance with Shield. For this reason, third-party plugins are not officially supported on clusters with the Shield security plugin installed.

Filtered Index Aliases

Aliases containing filters are not a secure way to restrict access to individual documents, due to the limitations described in Index and Field Names Can Be Leaked When Using Aliases. Shield provides a secure way to restrict access to documents through the document-level security feature.
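For readers unfamiliar with filtered aliases, the sketch below builds the request body that the _aliases API accepts to create one. The index name, alias name, and the "team" field are hypothetical. As the paragraph above explains, such an alias narrows what a search returns but is not an access control mechanism.

```python
import json


def filtered_alias_action(index, alias, field, value):
    """Body for the _aliases API: add an alias that only shows matching documents."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # only documents whose field equals value are visible via the alias
                    "filter": {"term": {field: value}},
                }
            }
        ]
    }


alias_body = filtered_alias_action("logs-2017", "logs-team-a", "team", "a")
print(json.dumps(alias_body, indent=2))
```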

Index and Field Names Can Be Leaked When Using Aliases

Calling certain Elasticsearch APIs on an alias can potentially leak information about indices that the user isn’t authorized to access. For example, when you get the mappings for an alias with the _mapping API, the response includes the index name and mappings for each index that the alias applies to. Similarly, the response to a _field_stats request includes the name of the underlying index, rather than the alias name.
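The leak is easy to see in the shape of a _mapping response, which is keyed by concrete index name rather than by the alias that was queried. The sketch below extracts those names from a response; the sample response is made up for illustration and heavily trimmed.

```python
def index_names_in_mapping(mapping_response):
    """Return the concrete index names exposed by a _mapping response."""
    return sorted(mapping_response.keys())


# Illustrative response for a request on an alias spanning two indices.
sample_response = {
    "logs-2017-01": {"mappings": {}},
    "logs-2017-02": {"mappings": {}},
}
print(index_names_in_mapping(sample_response))
```

A user who is only meant to know the alias name learns both underlying index names from this one call.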

Usage and Deployment Instructions

Step 1: Open PuTTY for SSH

Step 2: In PuTTY, type <instance public IP> in the “Host Name” field and log in with "ubuntu" as the user name. The password is taken automatically from the PPK file.


Step 3: Use the following Linux commands to start Elasticsearch

Step 3.1: $ sudo su 

Step 3.2: $ sudo vi /etc/hosts

Take the private IP address of your instance (shown in the instance details in the Amazon EC2 console) and replace the address on the second line of the file with that private IP address.
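The edit itself is small. As a sketch of what it amounts to, the function below replaces the address at the start of the second line of a hosts file and keeps any host names that follow it. Whether the AMI's second line has exactly this layout is an assumption, and the addresses and host name in the example are placeholders.

```python
def set_second_line_ip(hosts_text, private_ip):
    """Replace the address on the second line, keeping its host names."""
    lines = hosts_text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected at least two lines in the hosts file")
    fields = lines[1].split()
    # first field is the old address; everything after it is host names
    lines[1] = " ".join([private_ip] + fields[1:])
    return "\n".join(lines) + "\n"


before = "127.0.0.1 localhost\n10.0.0.5 ip-10-0-0-5\n"
print(set_second_line_ip(before, "172.31.5.20"), end="")
```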

Step 3.3: Start Elasticsearch and enable it at boot

>> sudo systemctl enable elasticsearch.service
>> sudo systemctl start elasticsearch.service

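Step 3.4: Configure Elasticsearch (restart the service afterwards so that the changes take effect)

>> sudo vi /etc/elasticsearch/elasticsearch.yml
>> sudo systemctl restart elasticsearch.service

Step 3.5: Test Elasticsearch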
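The page does not say which settings to change in this file. As an assumption for illustration only, the sketch below renders a few commonly edited entries in the flat "key: value" form that elasticsearch.yml accepts. cluster.name, node.name, network.host, and http.port are real Elasticsearch settings; the values are placeholders.

```python
def render_settings(settings):
    """Render settings as flat 'key: value' lines for elasticsearch.yml."""
    return "".join(f"{key}: {value}\n" for key, value in settings.items())


# Placeholder values; use your own cluster name and private IP.
example_settings = {
    "cluster.name": "demo-cluster",
    "node.name": "node-1",
    "network.host": "172.31.5.20",
    "http.port": 9200,
}
print(render_settings(example_settings), end="")
```

Note that if network.host is set to the private IP, Elasticsearch listens on that address rather than on localhost, so the test request in the next step would need to target the same address.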

>> curl -X GET 'http://localhost:9200'
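A healthy node answers this request with a small JSON document describing itself. The sketch below does the same check from Python with the standard library and summarizes the fields of interest. The sample response is illustrative: the node name and version number are assumptions, not values read from this AMI.

```python
import json
from urllib.request import urlopen


def fetch_root_info(url="http://localhost:9200", timeout=5):
    """GET the Elasticsearch root endpoint and return the parsed JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(info):
    """One-line summary of the root endpoint response."""
    return f"{info['cluster_name']} / {info['name']} / Elasticsearch {info['version']['number']}"


# Illustrative response; real values depend on the installed version and config.
sample_info = json.loads(
    '{"name": "node-1", "cluster_name": "elasticsearch",'
    ' "version": {"number": "5.6.0"}, "tagline": "You Know, for Search"}'
)
print(summarize(sample_info))
# On the instance itself: print(summarize(fetch_root_info()))
```

If the request is refused, check that the service is running with sudo systemctl status elasticsearch.service.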
