Mahout integrates many common machine learning algorithms, which makes it convenient for anyone who wants to do research in data mining. It is written in Java, and quite a lot of setup has to be done before it works. At a minimum you will need the JDK, Eclipse, Hadoop and Mahout, but I strongly recommend doing all of the steps below.

I JDK
II mysql
III Tomcat
IV Eclipse and MyEclipse
V Maven
VI Hadoop and Mahout
VII Test 
VIII k-means Algorithm Test

I JDK

sudo gedit /etc/profile

#set java environment
export JAVA_HOME=/home/lethic/Documents/Softwares/jdk1.7.0_21
export JRE_HOME=/home/lethic/Documents/Softwares/jdk1.7.0_21/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
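The order of the PATH entries matters. The shell searches PATH from left to right and runs the first matching executable, so putting $JAVA_HOME/bin first makes this JDK's java win over any /usr/bin/java. The Java sketch below imitates that lookup. The class and method names are made up for illustration, and the sketch ignores shell details such as command hashing and builtins. It is written to compile on the JDK 7 installed here.

```java
import java.io.File;

/**
 * Hypothetical helper imitating how a shell resolves a command name against PATH:
 * entries are searched left to right and the first executable match wins.
 */
public class PathLookup {

    /** Decides whether a candidate file counts as an executable (pluggable for testing). */
    public interface ExecutableCheck {
        boolean isExecutable(File candidate);
    }

    /** Real filesystem check used by main(). */
    public static final ExecutableCheck FILESYSTEM = new ExecutableCheck() {
        @Override
        public boolean isExecutable(File candidate) {
            return candidate.isFile() && candidate.canExecute();
        }
    };

    /** Returns the first PATH entry holding an executable named cmd, or null if none does. */
    public static File resolve(String path, String cmd, ExecutableCheck check) {
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // shells treat an empty entry as ".", skipped here for simplicity
            }
            File candidate = new File(dir, cmd);
            if (check.isExecutable(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        File java = resolve(path == null ? "" : path, "java", FILESYSTEM);
        System.out.println(java == null ? "java not found on PATH" : "java resolves to " + java);
    }
}
```

After loading the new profile, compile it with javac PathLookup.java and run java PathLookup. It should print the java binary under your JDK directory.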

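Log out and back in (or reboot) so that the new variables take effect everywhere. To apply them to the current terminal only, run: source /etc/profile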

Test: create Hello.java (for example with vim Hello.java):
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

Then compile and run it; it should print Hello World!
javac Hello.java
java Hello
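Another useful check is whether the java command really runs the JDK configured above and not some other installed JVM. A small program can compare the JAVA_HOME variable with the JVM's own java.home property. JdkCheck is a hypothetical helper, not part of the JDK. On JDK 7, java.home normally points to the jre directory inside the JDK, so the check accepts any path under JAVA_HOME. It compares absolute paths without resolving symlinks.

```java
import java.io.File;

/** Hypothetical helper: does the running JVM live under JAVA_HOME? */
public class JdkCheck {

    /** True if runtimeHome equals javaHome or lies inside it (e.g. .../jdk1.7.0_21/jre). */
    public static boolean matches(String javaHome, String runtimeHome) {
        if (javaHome == null || runtimeHome == null) {
            return false;
        }
        // File normalizes away trailing separators; symlinks are NOT resolved here.
        String base = new File(javaHome).getAbsolutePath();
        String runtime = new File(runtimeHome).getAbsolutePath();
        return runtime.equals(base) || runtime.startsWith(base + File.separator);
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        String runtimeHome = System.getProperty("java.home");
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + runtimeHome);
        System.out.println("JAVA_HOME    = " + javaHome);
        System.out.println(matches(javaHome, runtimeHome)
                ? "OK: this JVM is the one under JAVA_HOME"
                : "WARNING: this JVM is not the one under JAVA_HOME");
    }
}
```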

II mysql
sudo apt-get install mysql-server mysql-client

Then check that the server is running:
sudo netstat -tap | grep mysql
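netstat only shows that a mysql process holds a socket. A quick alternative is to try opening a TCP connection from Java, assuming MySQL listens on its default port 3306 on localhost. PortCheck is a hypothetical name. A successful connection only means that something accepts connections on that port; it does not prove that a MySQL login works.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: is anything accepting TCP connections on host:port? */
public class PortCheck {

    public static boolean isListening(String host, int port, int timeoutMs) {
        // try-with-resources (Java 7) closes the socket whether or not connect succeeds
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out or host unknown
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("localhost", 3306, 1000); // assumed default MySQL port
        System.out.println(up ? "something is listening on port 3306" : "nothing is listening on port 3306");
    }
}
```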

A graphical administration tool is also recommended: search for mysql-admin in Synaptic and install it.

III Tomcat

Download apache-tomcat-7.0.40.tar.gz from:
http://mirror.bjtu.edu.cn/apache/tomcat/tomcat-7/v7.0.40/bin/

In Tomcat's bin/catalina.sh, add these lines:
JAVA_HOME=/home/lethic/Documents/Softwares/jdk1.7.0_21
JAVA_OPTS="-server -Xms512m -Xmx1024m -XX:PermSize=600M -XX:MaxPermSize=600m -Dcom.sun.management.jmxremote"

in front of:
cygwin=false
os400=false
darwin=false
case "`uname`" in
CYGWIN*) cygwin=true;;
OS400*) os400=true;;
Darwin*) darwin=true;;
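In JAVA_OPTS, -Xms and -Xmx set the initial and maximum heap, and -XX:PermSize/-XX:MaxPermSize size the permanent generation. The suffixes k, m and g (either case) mean multiples of 1024, so -Xmx1024m is 1 GiB. The sketch below is a hypothetical helper, not something Tomcat ships. It converts these values to bytes and prints the running JVM's Runtime.maxMemory(), which is typically a little below -Xmx. The PermSize options only matter up to Java 7. Java 8 removed the permanent generation, so the options have no effect there, and newer JVMs may refuse them.

```java
import java.util.Locale;

/** Hypothetical helper for reading heap-size options such as those in JAVA_OPTS. */
public class HeapOptions {

    /** Converts a HotSpot size such as "512m", "600M", "1g" or "4096" to bytes. */
    public static long parseSize(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        char unit = v.charAt(v.length() - 1);
        if (unit == 'k') {
            multiplier = 1024L;
        } else if (unit == 'm') {
            multiplier = 1024L * 1024L;
        } else if (unit == 'g') {
            multiplier = 1024L * 1024L * 1024L;
        }
        String digits = multiplier == 1L ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    /** Value of the last option starting with prefix (HotSpot typically honors the last one), or null. */
    public static String lastOption(String javaOpts, String prefix) {
        String found = null;
        for (String token : javaOpts.trim().split("\\s+")) {
            if (token.startsWith(prefix)) {
                found = token.substring(prefix.length());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String opts = "-server -Xms512m -Xmx1024m -XX:PermSize=600M -XX:MaxPermSize=600m";
        System.out.println("-Xms      = " + parseSize(lastOption(opts, "-Xms")) + " bytes");
        System.out.println("-Xmx      = " + parseSize(lastOption(opts, "-Xmx")) + " bytes");
        System.out.println("PermSize  = " + parseSize(lastOption(opts, "-XX:PermSize=")) + " bytes");
        // Heap ceiling of the JVM running this program, usually slightly below its -Xmx.
        System.out.println("maxMemory = " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```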


Then add the following at the end:
export JAVA_HOME=/home/lethic/Documents/Softwares/jdk1.7.0_21
export JRE_HOME=/home/lethic/Documents/Softwares/jdk1.7.0_21/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Then start Tomcat with bin/startup.sh and open http://localhost:8080 in your browser. The Tomcat welcome page should appear.
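The Tomcat check can also be scripted instead of done in a browser. Request http://localhost:8080/ and look at the HTTP status code; 200 is expected for the default welcome page. HttpCheck is a hypothetical helper built only on the JDK's HttpURLConnection. It returns -1 when nothing answers.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: HTTP status of a GET request, or -1 if the server is unreachable. */
public class HttpCheck {

    public static int status(String url, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } catch (IOException e) {
            return -1; // malformed URL, connection refused, timeout, ...
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        int code = status("http://localhost:8080/", 2000);
        System.out.println(code == 200 ? "Tomcat answered with 200 OK" : "unexpected result: " + code);
    }
}
```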

IV Eclipse and MyEclipse

http://www.eclipse.org/downloads/

I chose the first package on the list.

MyEclipse:

First, make your JDK the system default:

sudo update-alternatives --install "/usr/bin/java" "java" "/home/lethic/Documents/Softwares/jdk1.7.0_21/bin/java" 300
sudo update-alternatives --install "/usr/bin/javac" "javac" "/home/lethic/Documents/Softwares/jdk1.7.0_21/bin/javac" 300
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/home/lethic/Documents/Softwares/jdk1.7.0_21/bin/javaws" 300
sudo update-alternatives --config java
sudo update-alternatives --config javac
sudo update-alternatives --config javaws

Download:
http://www.myeclipseide.com/module-htmlpages-display-pid-4.html



Create a launcher for MyEclipse:
lethic@lethic:~/Documents/Softwares$ sudo chown -R root:root MyEclispse
lethic@lethic:~/Documents/Softwares$ sudo chmod -R +r MyEclispse
lethic@lethic:~/Documents/Softwares$ cd 'MyEclispse/MyEclipse 10/'
lethic@lethic:~/Documents/Softwares/MyEclispse/MyEclipse 10$ sudo chown -R root:root myeclipse
lethic@lethic:~/Documents/Softwares/MyEclispse/MyEclipse 10$ sudo chmod -R +r myeclipse

sudo gedit /usr/bin/MyEclipse

#!/bin/sh
export MYECLIPSE_HOME="/home/lethic/Documents/Softwares/MyEclispse/MyEclipse 10/myeclipse"
# Quote the path (it contains a space) and pass all arguments through unchanged
"$MYECLIPSE_HOME/myeclipse" "$@"

sudo chmod 755 /usr/bin/MyEclipse
sudo chmod -R 777 /home/lethic/Documents/Softwares/MyEclispse

sudo gedit /usr/share/applications/MyEclipse.desktop

[Desktop Entry]
Encoding=UTF-8
Name=MyEclipse 10
Comment=IDE for JavaEE
Exec="/home/lethic/Documents/Softwares/MyEclispse/MyEclipse 10/myeclipse"
Icon=/home/lethic/Documents/Softwares/MyEclispse/MyEclipse 10/icon.xpm
Terminal=false
Type=Application
Categories=GNOME;Application;Development;
StartupNotify=true

Then run it once with -clean, which clears its cached plugin data:
'/home/lethic/Documents/Softwares/MyEclispse/MyEclipse 10/myeclipse' -clean

V Maven

Apache Maven 3.0.5

http://maven.apache.org/docs/3.0.5/release-notes.html

tar -xvzf apache-maven-3.0.5-bin.tar.gz

#create a link for it to make it easy to upgrade
ln -s apache-maven-3.0.5 apache-maven

#add $MAVEN_HOME/bin to PATH (see the /etc/profile in section VII), log in again, then test with: mvn -v


VI Hadoop and Mahout

Hadoop:
http://mirror.bit.edu.cn/apache/hadoop/common/stable/
hadoop-1.1.2.tar.gz

tar zxvf hadoop-1.1.2.tar.gz

Mahout:
http://mirror.bit.edu.cn/apache/mahout/0.6/
tar zxvf mahout-distribution-0.6.tar.gz

Add this to /etc/profile:

export HADOOP_HOME=/home/lethic/Documents/Softwares/hadoop-1.1.2
export HADOOP_CONF_DIR=/home/lethic/Documents/Softwares/hadoop-1.1.2/conf
export MAHOUT_HOME=/home/lethic/Documents/Softwares/mahout-distribution-0.6
export PATH=$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH

Then reload the profile:
source /etc/profile

VII Test
I revised my /etc/profile once more. In the end, the part I added looks like this:
umask 022

#set java environment
#JAVA_HOME=/home/lethic/Documents/Softwares/jdk1.7.0_21
export JAVA_HOME=/home/lethic/Documents/Softwares/jdk1.7.0_21
export JRE_HOME=/home/lethic/Documents/Softwares/jdk1.7.0_21/jre

#export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

export GTK_IM_MODULE=ibus
export XMODIFIERS="@im=ibus"
export QT_IM_MODULE=ibus

export MAVEN_HOME=/home/lethic/Documents/Softwares/apache-maven-3.0.5
export HADOOP_HOME=/home/lethic/Documents/Softwares/hadoop-1.1.2
export HADOOP_CONF_DIR=/home/lethic/Documents/Softwares/hadoop-1.1.2/conf
export MAHOUT_HOME=/home/lethic/Documents/Softwares/mahout-distribution-0.6

export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export HADOOP_HOME_WARN_SUPPRESS=1

NOTICE: replace every /home/lethic/Documents/Softwares/ above with your own installation path.
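Since every path above has to be adapted, a typo in one of them is easy to miss. A small check like the following lists which of the exported variables are unset or do not point to an existing directory. EnvCheck is a hypothetical helper. Run it from a fresh login shell so that it sees the updated /etc/profile.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: sanity-checks the variables exported in /etc/profile. */
public class EnvCheck {

    public static final String[] REQUIRED = {
        "JAVA_HOME", "MAVEN_HOME", "HADOOP_HOME", "HADOOP_CONF_DIR", "MAHOUT_HOME"
    };

    /** One message per variable that is unset/empty or does not name an existing directory. */
    public static List<String> problems(Map<String, String> env) {
        List<String> out = new ArrayList<String>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                out.add(name + " is not set");
            } else if (!new File(value).isDirectory()) {
                out.add(name + " = " + value + " is not a directory");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> found = problems(System.getenv());
        if (found.isEmpty()) {
            System.out.println("All variables point to existing directories.");
        }
        for (String p : found) {
            System.out.println("Problem: " + p);
        }
    }
}
```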

TEST:
Java:

javac

Remember to add this to /etc/profile, or Hadoop will print the warning "$HADOOP_HOME is deprecated":
export HADOOP_HOME_WARN_SUPPRESS=1

Hadoop:

Mahout:

It prints: MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
I don't think this is an error, because the bin/mahout script contains:

if [ "$MAHOUT_LOCAL" != "" ]; then
echo "MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath."
else
echo "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath."
CLASSPATH=${CLASSPATH}:$HADOOP_CONF_DIR
fi

This means that whenever MAHOUT_LOCAL is empty (or unset), it echoes “MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.”.
Also notice this comment in the same script:

# MAHOUT_LOCAL set to anything other than an empty string to force
# mahout to run locally even if
# HADOOP_CONF_DIR and HADOOP_HOME are set

This means that if you want to run Mahout on Hadoop rather than locally, MAHOUT_LOCAL must be left unset or set to an empty string.
So whenever Mahout runs on Hadoop, it will always print “MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.”, and this is not an error.
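Restating the branch in bin/mahout in Java makes the two cases explicit. This only illustrates the shell logic quoted above, and the class name is hypothetical. It also highlights a consequence of the [ "$MAHOUT_LOCAL" != "" ] test: any non-empty value, even false, forces local mode.

```java
/** Hypothetical restatement of the MAHOUT_LOCAL branch in bin/mahout. */
public class MahoutLocalCheck {

    /** Mirrors [ "$MAHOUT_LOCAL" != "" ]: unset and empty behave the same; anything else means local. */
    public static boolean runsLocally(String mahoutLocal) {
        return mahoutLocal != null && !mahoutLocal.isEmpty();
    }

    /** CLASSPATH after the branch: HADOOP_CONF_DIR is appended only when running on Hadoop. */
    public static String classpath(String current, String mahoutLocal, String hadoopConfDir) {
        if (runsLocally(mahoutLocal)) {
            return current;
        }
        // Same as CLASSPATH=${CLASSPATH}:$HADOOP_CONF_DIR (an unset variable expands to "")
        return current + ":" + (hadoopConfDir == null ? "" : hadoopConfDir);
    }

    public static void main(String[] args) {
        String local = System.getenv("MAHOUT_LOCAL");
        System.out.println(runsLocally(local)
                ? "MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath."
                : "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.");
    }
}
```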
All of the above is my own reading and may be wrong, because I'm still a beginner. But everything has worked well so far, and I have not run into any problems since then.


VIII k-means Algorithm Test

Test k-means:

Download the data:
http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
And copy it to $MAHOUT_HOME
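synthetic_control.data is a plain text file. According to the UCI description it holds 600 synthetic control chart time series, one per line, each with 60 whitespace-separated values. Before uploading it, a quick local parse can confirm the download is intact; for the complete file the sketch below should report 600 rows and 60 columns. SyntheticControlReader is a hypothetical helper, not Mahout's input converter.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses whitespace-separated numeric rows, skipping blank lines. */
public class SyntheticControlReader {

    public static List<double[]> parse(Reader in) throws IOException {
        List<double[]> rows = new ArrayList<double[]>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Double.parseDouble(fields[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "synthetic_control.data";
        try (Reader in = new FileReader(path)) {
            List<double[]> rows = parse(in);
            int columns = rows.isEmpty() ? 0 : rows.get(0).length;
            System.out.println(rows.size() + " rows, " + columns + " columns");
        }
    }
}
```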

Start Hadoop:
$HADOOP_HOME/bin/start-all.sh

Then import the data into the HDFS directory 'testdata'. (NOTICE: keep the name 'testdata'. When the k-means example job below is run without arguments, it reads its input from 'testdata', so data under another name is not picked up.)
$HADOOP_HOME/bin/hadoop fs -mkdir testdata
$HADOOP_HOME/bin/hadoop fs -put $MAHOUT_HOME/synthetic_control.data testdata

Run the k-means example job:
$HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/mahout-examples-0.6-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

It will take a few minutes.
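While the job runs, it helps to know what k-means computes. Each iteration assigns every series to its nearest cluster center and then moves each center to the mean of the points assigned to it. This repeats until the centers stop moving or an iteration limit is reached. Mahout spreads these steps over MapReduce passes, and each clusters-N directory in the output appears to correspond to one pass. The sketch below is a plain in-memory version (Lloyd's algorithm with Euclidean distance), not Mahout's implementation. Seeding with the first k points is a simplification chosen here, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical in-memory k-means (Lloyd's algorithm, Euclidean distance). */
public class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center nearest to p. */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(p, centers[c]) < squaredDistance(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Clusters points into k groups; stops early once no center moves more than delta. */
    public static int[] cluster(double[][] points, int k, int maxIter, double delta) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("need 1 <= k <= number of points");
        }
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone(); // simplistic seeding: first k points
        }
        int[] assign = new int[points.length];
        for (int iter = 0; iter < maxIter; iter++) {
            for (int i = 0; i < points.length; i++) {
                assign[i] = nearest(points[i], centers); // assignment step
            }
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assign[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assign[i]][d] += points[i][d];
                }
            }
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its old center
                }
                for (int d = 0; d < dim; d++) {
                    sums[c][d] /= counts[c]; // update step: mean of assigned points
                }
                if (Math.sqrt(squaredDistance(sums[c], centers[c])) > delta) {
                    converged = false;
                }
                centers[c] = sums[c];
            }
            if (converged) {
                break;
            }
        }
        for (int i = 0; i < points.length; i++) {
            assign[i] = nearest(points[i], centers); // make assignments match the final centers
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] points = { {1}, {2}, {10}, {11} };
        System.out.println(Arrays.toString(cluster(points, 2, 10, 0.001))); // prints [0, 0, 1, 1]
    }
}
```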

To see the results:
$HADOOP_HOME/bin/hadoop fs -lsr output

$HADOOP_HOME/bin/hadoop fs -get output $MAHOUT_HOME/examples

cd $MAHOUT_HOME/examples/output

ls
If you see:
clusteredPoints clusters-0 clusters-1 clusters-10 clusters-2 clusters-3 clusters-4
clusters-5 clusters-6 clusters-7 clusters-8 clusters-9 data

Your Mahout is properly installed. 🙂
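The same check can be automated. ls sorts names as text, so clusters-10 is listed before clusters-2. The sketch below is a hypothetical helper that parses the numbers to find the highest clusters-N directory and checks that clusteredPoints and data are present. By default it looks in $MAHOUT_HOME/examples/output, where the hadoop fs -get command above placed the results.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical helper: checks the downloaded k-means output directory listing. */
public class OutputCheck {

    /** Highest N among names of the form "clusters-N", or -1 if there are none. */
    public static int lastIteration(Collection<String> names) {
        int last = -1;
        for (String name : names) {
            if (name.startsWith("clusters-")) {
                String suffix = name.substring("clusters-".length());
                if (suffix.matches("\\d+")) {
                    last = Math.max(last, Integer.parseInt(suffix)); // numeric, not text, order
                }
            }
        }
        return last;
    }

    /** True if the listing contains clusteredPoints, data and at least one clusters-N. */
    public static boolean looksComplete(Collection<String> names) {
        return names.contains("clusteredPoints") && names.contains("data") && lastIteration(names) >= 0;
    }

    public static void main(String[] args) {
        String home = System.getenv("MAHOUT_HOME");
        String dir = args.length > 0 ? args[0] : (home == null ? "." : home + "/examples/output");
        String[] listing = new File(dir).list();
        if (listing == null) {
            System.out.println(dir + " is not a readable directory");
            return;
        }
        List<String> names = Arrays.asList(listing);
        System.out.println("complete: " + looksComplete(names) + ", last clusters-N: " + lastIteration(names));
    }
}
```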