
Mahout

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on collaborative filtering, clustering, and classification. Many of the implementations use the Apache Hadoop platform. Mahout also provides Java libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; the number of implemented algorithms has grown quickly, but various algorithms are still missing.

Miri Infotech is launching a product that configures and publishes Mahout, a library of distributed and otherwise scalable machine learning algorithms, as a pre-configured, ready-to-launch AMI on Amazon EC2. The image is built on Ubuntu and contains Mahout, Hadoop, Scala, and Spark.

Apache Spark is a powerful open source processing engine for Hadoop data, built around speed, ease of use, and sophisticated analytics. It was originally developed in UC Berkeley’s AMPLab and later moved to the Apache Software Foundation. Spark is essentially a parallel data processing framework that can work with Apache Hadoop, making it easy to develop fast Big Data applications that combine batch, streaming, and interactive analytics on all your data.
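As a sketch of that batch-processing model, the short Scala program below counts words in a text file with Spark's RDD API. The application name, the local[*] master setting, and the input.txt path are illustrative assumptions rather than part of this AMI's configuration:

  import org.apache.spark.sql.SparkSession

  object WordCountSketch {
    def main(args: Array[String]): Unit = {
      // The local master and input path are placeholders for this sketch.
      val spark = SparkSession.builder()
        .appName("WordCountSketch")
        .master("local[*]")
        .getOrCreate()

      val lines  = spark.sparkContext.textFile("input.txt")  // read the file into an RDD of lines
      val counts = lines
        .flatMap(_.split("\\s+"))                            // split each line into words
        .map(word => (word, 1))                              // pair each word with a count of 1
        .reduceByKey(_ + _)                                  // sum the counts per word in parallel

      counts.take(10).foreach(println)                       // print a small sample of the results
      spark.stop()
    }
  }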

Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It was created by Martin Odersky, who released the first version in 2003. Scala smoothly integrates the features of object-oriented and functional languages.
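A minimal sketch of that object-oriented and functional blend, using only the Scala standard library; the Point class and the sample values are made up for illustration:

  // A case class gives an immutable value object with structural equality for free.
  case class Point(x: Double, y: Double) {
    def distanceTo(other: Point): Double =
      math.hypot(x - other.x, y - other.y)
  }

  object ScalaSketch {
    def main(args: Array[String]): Unit = {
      val points = List(Point(0.0, 0.0), Point(3.0, 4.0), Point(6.0, 8.0))
      // Higher-order functions plus type inference keep the code concise and type-safe.
      val distances = points.map(_.distanceTo(Point(0.0, 0.0)))
      println(distances.sum)   // 0.0 + 5.0 + 10.0 = 15.0
    }
  }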

Apache Mahout is an open source project that is primarily used for creating scalable machine learning algorithms. It implements popular machine learning techniques such as:

  • Recommendation
  • Classification
  • Clustering

Apache Mahout started in 2008 as a sub-project of Apache Lucene. In 2010, Mahout became a top-level Apache project.
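To make the first of those techniques concrete, here is a minimal sketch of a user-based recommender written in Scala against Mahout's classic Taste API (org.apache.mahout.cf.taste). The ratings.csv file, the neighborhood size of 10, and the user ID 1 are illustrative assumptions; this shows the general approach rather than a configuration shipped with the AMI:

  import java.io.File
  import scala.collection.JavaConverters._
  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel
  import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood
  import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender
  import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity

  object RecommenderSketch {
    def main(args: Array[String]): Unit = {
      // ratings.csv is assumed to hold lines of the form userID,itemID,preference.
      val model        = new FileDataModel(new File("ratings.csv"))
      val similarity   = new PearsonCorrelationSimilarity(model)
      val neighborhood = new NearestNUserNeighborhood(10, similarity, model)
      val recommender  = new GenericUserBasedRecommender(model, neighborhood, similarity)

      // Ask for the top 3 items for user 1 (the user ID is illustrative).
      for (item <- recommender.recommend(1L, 3).asScala)
        println(s"item ${item.getItemID} scored ${item.getValue}")
    }
  }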

Scala Features:
  • Scala is object-oriented
  • Scala is functional
  • Scala is statically typed
  • Scala runs on the JVM
  • Scala can Execute Java Code
  • Scala supports concurrent and synchronized processing
Spark Features:
  • Lightning Fast Processing
  • Support for Sophisticated Analytics
  • Real Time Stream Processing
  • Active and Expanding Community
  • Ease of Use, with support for multiple languages
User interaction features include:
  • The algorithms of Mahout are written on top of Hadoop, so they work well in distributed environments. Mahout uses the Apache Hadoop library to scale effectively in the cloud.
  • Mahout offers the coder a ready-to-use framework for doing data mining tasks on large volumes of data.
  • Mahout lets applications analyze large sets of data effectively and quickly.
  • Includes several MapReduce enabled clustering implementations such as k-means, fuzzy k-means, Canopy, Dirichlet, and Mean-Shift.
  • Supports Distributed Naive Bayes and Complementary Naive Bayes classification implementations.
  • Comes with distributed fitness function capabilities for evolutionary programming.
  • Includes matrix and vector libraries (see the sketch after this list).
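As a small illustration of those matrix and vector libraries, the sketch below uses Mahout's math module (org.apache.mahout.math) from Scala; the values are arbitrary and only show the shape of the API:

  import org.apache.mahout.math.{DenseMatrix, DenseVector}

  object MathSketch {
    def main(args: Array[String]): Unit = {
      val v = new DenseVector(Array(1.0, 2.0, 3.0))
      val w = new DenseVector(Array(4.0, 5.0, 6.0))
      println(v.dot(w))       // dot product: 32.0

      // A 2x3 matrix times a length-3 vector yields a length-2 vector.
      val m = new DenseMatrix(Array(Array(1.0, 0.0, 0.0), Array(0.0, 1.0, 1.0)))
      println(m.times(v))     // prints the resulting Mahout vector
    }
  }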

You can subscribe to Mahout, an AWS Marketplace product, and launch an instance from the product's AMI using the Amazon EC2 launch wizard.

To launch an instance from the AWS Marketplace using the launch wizard
  • Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/
  • From the Amazon EC2 dashboard, choose Launch Instance.
  • On the Choose an Amazon Machine Image (AMI) page, choose the AWS Marketplace category on the left. Find a suitable AMI by browsing the categories or by using the search functionality. Choose Select to choose your product.
  • A dialog displays an overview of the product you've selected. You can view the pricing information, as well as any other information that the vendor has provided. When you're ready, choose Continue.
  • On the Choose an Instance Type page, select the hardware configuration and size of the instance to launch. When you're done, choose Next: Configure Instance Details.
  • On the next pages of the wizard, you can configure your instance, add storage, and add tags. For more information about the different options you can configure, see Launching an Instance. Choose Next until you reach the Configure Security Group page.
  • The wizard creates a new security group according to the vendor's specifications for the product. The security group may include rules that allow all IP addresses (0.0.0.0/0) access on SSH (port 22) on Linux or RDP (port 3389) on Windows. We recommend that you adjust these rules to allow only a specific address or range of addresses to access your instance over those ports.
  • When you are ready, choose Review and Launch.
  • On the Review Instance Launch page, check the details of the AMI from which you're about to launch the instance, as well as the other configuration details you set up in the wizard. When you're ready, choose Launch to select or create a key pair, and launch your instance.
  • Depending on the product you've subscribed to, the instance may take a few minutes or more to launch. You are subscribed to the product before your instance can launch; if there are any problems with your credit card details, you will be asked to update your account details. When the launch confirmation page displays, choose View Instances to go to the Instances page and check your instance's status.


Guidelines

What do you like to work on? There are a ton of things in Mahout that we would love to have contributions for: documentation, performance improvements, better tests, etc. The best place to start is to look in our issue tracker, see what bugs have been reported, and check whether any look like something you could take on. Small, well-written, well-tested patches are a great way to get your feet wet. It could be something as simple as fixing a typo. The more important piece is that you are showing you understand the necessary steps for making changes to the code. Mahout is a pretty big beast at this point, so changes, especially from non-committers, need to be evolutionary rather than revolutionary, since it is often very difficult to evaluate the merits of a very large patch. Think small, at least to start!

Usage / Deployment Instruction

Step 1 : Open PuTTY for SSH

Step 2 : Open PuTTY and enter <instanceid> in the “Host Name” field

Step 3 : Open the Connection -> SSH -> Auth tab from the left-side panel

Step 4 : Click the Browse button, select the .ppk file for the instance, and then click Open

Step 5 : Type "ubuntu" as the user name; no password is needed because the key from the .ppk file is used for authentication

Step 6 : Use the following Linux commands to start Hadoop

Step 6.1 : sudo vi /etc/hosts

Take the private IP address of your machine and replace the second line of the hosts file with an entry that maps your hostname to that private IP address.
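For illustration, assuming Ubuntu's stock /etc/hosts layout, the edited file might look like the following; the IP address and hostname below are placeholders, not values taken from this AMI:

  127.0.0.1       localhost
  172.31.0.10     ip-172-31-0-10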

Step 6.2 : ssh-keygen -t rsa -P ""

This command generates an RSA SSH key pair with an empty passphrase.

Step 6.3 : cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

This command appends the generated public key to the authorized_keys file so that password-less SSH to localhost works.

Step 6.4 : ssh localhost

Step 6.5 : hdfs namenode -format

You have to type "yes" when it prompts you: "Are you sure you want to continue?"

Step 6.6 : start-all.sh

Step 6.7 : After the above command executes successfully, check the following URLs in a browser, replacing <instance-public-ip> with your instance's public IP address or DNS name:

http://<instance-public-ip>:8088 (YARN ResourceManager)

http://<instance-public-ip>:50070 (HDFS NameNode)

http://<instance-public-ip>:50090 (HDFS Secondary NameNode)

Step 7 : Use the following Linux commands to start Scala and Spark

Step 7.1 : cd spark-2.1.0/

Step 7.2 : ./bin/spark-shell

Step 7.3 : You can check Spark by opening the following URL in your browser, again replacing <instance-public-ip>:

http://<instance-public-ip>:4040 (Spark application UI)

Step 7.4 : Now you can execute your Scala programs in the spark-shell, as in the example below.
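A minimal sketch you could paste at the spark-shell prompt; the values are arbitrary, and sc is the SparkContext that the shell creates for you:

  // Distribute the numbers 1..100 across the cluster and sum their squares.
  val nums = sc.parallelize(1 to 100)
  val sumOfSquares = nums.map(n => n.toLong * n).sum()
  println(s"Sum of squares 1..100 = $sumOfSquares")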

Step 8.0 : Install Mahout

cd /home/ubuntu/mahout/trunk

mvn install

mvn compile

Step 9.0 : Run Mahout
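For example, once the Maven build finishes, running bin/mahout from the installation directory with no arguments lists the available driver programs, and bin/mahout <program> --help prints the options for a specific algorithm. These invocations illustrate the standard Mahout launcher script and are not specific to this AMI.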

 