
Predictive Analytics Framework R Hadoop

Miri Infotech provides a predictive analytics framework environment built on R, Java and Hadoop. It ships a specially optimized R version 3.3.3 (2017-03-06) on Ubuntu 16.04, together with RStudio Version 0.98.1091, RServer 1.0.36, and Hadoop 2.7.3 with HDFS and HBase Version 1.3.0.

Miri Infotech is launching a product that configures and publishes the Predictive Analytics Framework with R, Java and Hadoop. It offers a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques as a pre-configured tool on Ubuntu 16.04, delivered as a ready-to-launch AMI on Amazon EC2 that contains Hadoop, R, RStudio, HDFS, HBase and Shiny Server.

R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. The capabilities of R are extended through user-created packages, which provide specialized statistical techniques and graphical devices.

A core set of packages is included with the installation of R. Currently, the CRAN package repository features 10,331 additional packages.

RStudio includes other open source software components. The following is a list of these components:

  • Qt (LGPL v2.1)
  • QtSingleApplication
  • Ace (LGPL v2.1)
  • Boost
  • RapidXml
  • JSON Spirit
  • Google Web Toolkit
  • Guice
  • GIN
  • AOP Alliance
  • RSA-JS
  • tree.hh
  • Hunspell (MPL)
  • Chromium Hunspell Dictionaries (MPL)
  • pdf.js
  • SyncTeX
  • ZLib
  • Sundown
  • highlight.js
  • MathJax
  • reveal.js
  • node-webkit
  • JSCustomBadge

RStudio Server enables you to provide a browser-based interface (the RStudio IDE) to a version of R running on a remote Linux server. Deploying R and RStudio on a server has a number of benefits, including:

  • The ability to access your R workspace from any computer in any location
  • Easy sharing of code, data, and other files with colleagues
  • Allowing multiple users to share access to the more powerful computer resources (memory, processors, etc.) available on a well-equipped server
  • Centralized installation and configuration of R, R packages, TeX, and other supporting libraries
     

You can subscribe to an AWS Marketplace product and launch an instance from the product's AMI using the Amazon EC2 launch wizard.

To launch an instance from the AWS Marketplace using the launch wizard:
  • Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/
  • From the Amazon EC2 dashboard, choose Launch Instance.
  • On the Choose an Amazon Machine Image (AMI) page, choose the AWS Marketplace category on the left. Find a suitable AMI by browsing the categories, or using the search functionality. Choose Select to choose your product.
  • A dialog displays an overview of the product you've selected. You can view the pricing information, as well as any other information that the vendor has provided. When you're ready, choose Continue.
  • On the Choose an Instance Type page, select the hardware configuration and size of the instance to launch. When you're done, choose Next: Configure Instance Details.
  • On the next pages of the wizard, you can configure your instance, add storage, and add tags. For more information about the different options you can configure, see Launching an Instance. Choose Next until you reach the Configure Security Group page.
  • The wizard creates a new security group according to the vendor's specifications for the product. The security group may include rules that allow all IP addresses (0.0.0.0/0) access on SSH (port 22) on Linux or RDP (port 3389) on Windows. We recommend that you adjust these rules to allow only a specific address or range of addresses to access your instance over those ports.
  • When you are ready, choose Review and Launch.
  • On the Review Instance Launch page, check the details of the AMI from which you're about to launch the instance, as well as the other configuration details you set up in the wizard. When you're ready, choose Launch to select or create a key pair, and launch your instance.
  • Depending on the product you've subscribed to, the instance may take a few minutes or more to launch. You are first subscribed to the product before your instance can launch. If there are any problems with your credit card details, you will be asked to update your account details. When the launch confirmation page displays
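The recommendation above to limit SSH access to a specific address range can also be applied from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured for your account. The function name, the security group ID and the address range are placeholders chosen for illustration. The function only prints the commands so that you can review them before running anything.

```shell
# Print the AWS CLI commands that replace an open SSH rule (0.0.0.0/0)
# with a rule limited to one address range. Nothing is executed here;
# review the output and run the lines yourself.
#   $1  security group ID (placeholder in the example below)
#   $2  CIDR range allowed to reach port 22 (placeholder)
restrict_ssh_commands() {
  group_id="$1"
  allowed_cidr="$2"
  echo "aws ec2 revoke-security-group-ingress --group-id $group_id --protocol tcp --port 22 --cidr 0.0.0.0/0"
  echo "aws ec2 authorize-security-group-ingress --group-id $group_id --protocol tcp --port 22 --cidr $allowed_cidr"
}

# Example:
#   restrict_ssh_commands sg-0123456789abcdef0 203.0.113.10/32
```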

Guidelines

R is designed to make it easy to express statistical ideas clearly in code, but when it comes to writing code that runs as fast as possible, there are a few tips, tricks and caveats to be aware of. Handy guidelines for R include:

  • Common performance pitfalls, and solutions
  • How to measure performance and memory use
  • How to work with large data files
  • How to use parallel computing to speed up "embarrassingly parallel" jobs

Limitations

R was written long before cheap parallel processing became available. Serialness is baked into the design of R, and while there are several packages out there that try to work around that, they’re definitely bolted on.

The type system in R mostly serves to get in the way. I’m sure there’s some underlying rationale for it, but I’ve yet to find a book or tutorial that explains what that rationale is.

If you don’t speak statistics, finding packages can be difficult. R has an amazingly rich ecosystem, but it’s mostly written by statisticians for statisticians. Knowing a bit of that vocabulary makes life a lot easier.

R’s visualization capabilities are so rich that the occasional corner cases are that much more frustrating. I don’t know whether this is still true, but not that long ago getting LaTeX-ish formulas on graph axes required PostScript hacking.

Usage / Deployment Instruction

Step 1 : Open PuTTY to connect over SSH

Step 2 : In PuTTY, type <instance public IP> in the “Host Name” field

 

Step 3 : Open the Connection->SSH->Auth tab in the left-hand pane

Step 4 : Click the Browse button, select the PPK file for the instance, and then click Open

 

Step 5 : Type "ubuntu" as the user name. No password is needed; authentication uses the key in the PPK file.
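On Linux or macOS the same connection can be made with OpenSSH instead of PuTTY. The sketch below assumes you have the key pair in .pem format rather than .ppk. The function name, key file name and IP address are placeholders, not values from this product.

```shell
# Build the OpenSSH command equivalent to the PuTTY steps above.
#   $1  path to the private key file in .pem format (placeholder)
#   $2  public IP of the instance (placeholder)
ssh_command() {
  key_file="$1"
  public_ip="$2"
  echo "ssh -i $key_file ubuntu@$public_ip"
}

# Example (ssh rejects private keys that other users can read,
# hence the chmod):
#   chmod 400 my-key.pem
#   $(ssh_command my-key.pem 203.0.113.10)
```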

 

Step 5.1 : If Ubuntu reports that updates are available when you log in, run the following commands:

$ sudo apt-get update

$ sudo apt-get upgrade

 

Step 6 : Use the following Linux commands to start Hadoop

Step 6.1 : $ sudo vi /etc/hosts

Take the private IP address of your instance and replace the address on the second line of the file with it.
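Before editing the file it can help to preview the result. The following is a minimal sketch that assumes the second line of /etc/hosts has the form "<address> <hostname>". The function name is hypothetical. It reads the file and prints the modified copy without changing anything on disk.

```shell
# Print a copy of a hosts file with the first field of line 2 replaced
# by the given private IP. The input file itself is not modified.
#   $1  hosts file to read
#   $2  private IP address of the instance
hosts_with_private_ip() {
  awk -v ip="$2" 'NR == 2 { $1 = ip } { print }' "$1"
}

# Example: preview the change, then make it with "sudo vi /etc/hosts".
#   hosts_with_private_ip /etc/hosts "$(hostname -I | awk '{print $1}')"
```

On Ubuntu, `hostname -I` lists the addresses assigned to the host; taking the first one as the private IP is an assumption that holds for an instance with a single network interface.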

 

Step 6.2 :  $ ssh-keygen -t rsa -P ""

This command generates an SSH key pair with an empty passphrase.

 

Step 6.3 :  $ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

This command appends the generated public key to the authorized_keys file so that SSH to localhost works without a password.
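Running the command in Step 6.3 more than once adds the same key again each time. The sketch below is one way to make that step repeatable. The function name is hypothetical, and the paths in the example are the ones used in Step 6.3.

```shell
# Append a public key to an authorized_keys file only if that exact
# line is not already present, then restrict the file's permissions.
#   $1  public key file (for example $HOME/.ssh/id_rsa.pub)
#   $2  authorized_keys file
append_key_once() {
  pub_file="$1"
  auth_file="$2"
  touch "$auth_file"
  # -x: whole-line match, -F: fixed string, -f: read pattern from file
  if ! grep -qxF -f "$pub_file" "$auth_file"; then
    cat "$pub_file" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}

# Example:
#   append_key_once "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```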

 

Step 6.4 : $ ssh localhost

 

Step 6.5 : $ hdfs namenode -format

Type “yes” when you are prompted with “Are you sure you want to continue?”

 

Step 6.6 : $ start-all.sh

 

Step 6.7 : After the above command completes successfully, open the following URLs in a browser to confirm that the services are running:

http://<instance-public-ip>:8088

 

http://<instance-public-ip>:50070

 

http://<instance-public-ip>:50090
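In Hadoop 2.x these default ports belong to the YARN ResourceManager (8088), the NameNode (50070) and the SecondaryNameNode (50090). The same check can be scripted. The sketch below assumes curl is installed on the machine you run it from and that the instance's security group allows your address to reach those ports. Both function names are hypothetical.

```shell
# Print the three Hadoop web UI URLs from Step 6.7 for a given host.
#   $1  public IP or host name of the instance
hadoop_ui_urls() {
  for port in 8088 50070 50090; do
    echo "http://$1:$port"
  done
}

# Report which of those URLs answer. -f makes curl treat HTTP errors
# as failures; --max-time keeps a blocked port from hanging the check.
check_hadoop_uis() {
  hadoop_ui_urls "$1" | while read -r url; do
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "up   $url"
    else
      echo "down $url"
    fi
  done
}

# Example:
#   check_hadoop_uis 203.0.113.10
```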

 

Step 7: Start Hbase

$ cd /usr/local/hbase/bin

$ ./start-hbase.sh

 

Step 8: Start R console

$ R

 

Step 9: Start RStudio Server

$ cd ~

$ sudo gdebi rstudio-server-0.98.1028-amd64.deb

 

$ sudo rstudio-server start

 

Step 10: Update the password for the rstudio user

$ sudo passwd rstudio

 

Step 11: Configure r-hadoop

Open RStudio in a browser:

http://<instance-public-ip>:8787/

Example:

http://54.237.233.225:8787/

Log in as the rstudio user with the password you just set.

To install the r-hadoop packages:

Select Tools → Install Packages → Install from: Package Archive File (.tar.gz)

 

Click the Browse button.

In the file explorer window that opens, select the package archives for the r-hadoop service. The available packages are:

rhdfs_1.0.8.tar.gz

rhbase_1.2.1.tar.gz

plyrmr_0.6.0.tar.gz

ravro_1.0.4.tar.gz

rmr2_3.3.1.tar.gz
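The same archives can also be installed from the SSH session with R CMD INSTALL. The following is a sketch under stated assumptions: the function name and the directory argument are hypothetical, the order (rmr2 before plyrmr) is an assumption about dependencies, and each package may need other R packages to be installed first. The function only prints the commands so you can review them.

```shell
# Print one "R CMD INSTALL" command per r-hadoop archive listed above.
# Nothing is installed here; review the output, then run the lines.
#   $1  directory that holds the .tar.gz files (assumed location)
# The order below (rmr2 before plyrmr) is an assumption about
# dependencies, not something stated by the vendor.
rhadoop_install_commands() {
  pkg_dir="$1"
  for pkg in rhdfs_1.0.8 rhbase_1.2.1 ravro_1.0.4 rmr2_3.3.1 plyrmr_0.6.0; do
    echo "R CMD INSTALL $pkg_dir/$pkg.tar.gz"
  done
}

# Example:
#   rhadoop_install_commands /home/ubuntu
```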

 

The Predictive Analytics Framework R Hadoop environment is now ready for use with your own commands.

 