Designing a Classification Model Using the RapidMiner Studio Tool
Design a Classification Model
Before starting, I would like to thank Dr. Ashok Sharma (University of Jammu) for teaching this tool.
For classification, the most important requirement is gathering data. Beginners should first learn from the sample data built into the RapidMiner tool. Once you know how to preprocess the data, move on to your own dataset.
The main disadvantage of the RapidMiner tool is that it doesn’t work on images. Secondly, it is not open-source software, but for students it is free for one year, after which the license can be renewed. For commercial purposes, it is free only for up to 10,000 tuples; to work with more data, we have to pay a fee to buy the tool.
You can quickly and easily create predictive models from your data with the aid of RapidMiner Go. All you need to build a predictive model is data. It’s that simple!
In this document, we first use labeled (supervised) data that comes with the RapidMiner tool itself: the Titanic dataset.
Step-by-step guide to predicting the data
Step 1: First, open RapidMiner Studio. The Repository panel is visible on the left side. Under Repository, click Samples, as shown in the figure below.
Step 2: Click Data, and under Data select the Titanic data (a predefined dataset). You can opt for another dataset too.
Step 3: To view the Titanic data, that is, which rows and how many columns it contains, double-click the Titanic dataset.
Step 4: To inspect the tuples in this data more clearly, click Statistics. This view shows a summary of each attribute.
Step 5: To build a model, click the Design button at the top.
Step 6: Drag the Titanic dataset from the left side and place it in the Process window.
Step 7: In the Operators panel on the left sidebar, type Replace Missing Values in the search bar. This operator replaces missing values with 0 or with some other value you specify.
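To make the idea behind this operator concrete, here is a minimal Python sketch of replacing missing values with a chosen constant. It is only an illustration of the concept, not how RapidMiner implements it; the column names and sample rows are hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"Age": 29.0, "Passenger Fare": 211.3},
    {"Age": None, "Passenger Fare": 151.6},
    {"Age": 2.0, "Passenger Fare": None},
]


def replace_missing(rows, fill_value=0):
    """Return copies of the rows with every None replaced by fill_value."""
    return [
        {key: (fill_value if value is None else value) for key, value in row.items()}
        for row in rows
    ]


cleaned = replace_missing(rows, fill_value=0)
print(cleaned)
```

The function returns new rows instead of modifying the input, so the original data stays available for comparison.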
Step 9: Run this small chunk of the process with F11 or with the arrow (Run) button in the top menu bar. After running, the results look like this:
Step 10: From the operators on the left side, select the Set Role operator and make a connection. In the parameters on the right side, set the attribute name, which must be an attribute from the dataset; here we select Survived. For target role, select label, since we are using supervised learning.
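Conceptually, giving an attribute the label role tells the learner which column to predict and which columns to treat as inputs. The sketch below shows that separation in plain Python; the rows and the helper name `set_label` are hypothetical and only illustrate the idea.

```python
# Hypothetical rows; "Survived" plays the label role, the rest are regular attributes.
rows = [
    {"Age": 29.0, "Sex": "Female", "Survived": "Yes"},
    {"Age": 30.0, "Sex": "Male", "Survived": "No"},
]


def set_label(rows, label_name):
    """Split each row into input features and the target label."""
    features = [{k: v for k, v in row.items() if k != label_name} for row in rows]
    labels = [row[label_name] for row in rows]
    return features, labels


features, labels = set_label(rows, "Survived")
print(features)
print(labels)
```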
Step 11: From the operators, select Split Data and make a connection. Split Data splits the data into training and testing sets. Double-click Split Data and a window opens like this:
Select the ratio and see the results:
70:30
80:20, etc.
In the parameters, we select automatic as the sampling type, but we also try the other options, such as shuffled sampling, and look at those results too.
Here, we tested the two ratios above.
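The difference between linear and shuffled sampling can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name, the `seed` parameter, and the rounding of the cut point are choices made for this sketch, and the automatic option is not modeled here.

```python
import random


def split_data(rows, train_ratio=0.7, sampling="linear", seed=1992):
    """Split rows into (train, test).

    sampling="linear" keeps the original row order;
    sampling="shuffled" permutes the rows before cutting.
    """
    if sampling not in ("linear", "shuffled"):
        raise ValueError("sampling must be 'linear' or 'shuffled'")
    rows = list(rows)  # copy so the caller's data is left untouched
    if sampling == "shuffled":
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


data = list(range(10))  # stand-in for ten example rows

train, test = split_data(data, train_ratio=0.7, sampling="linear")
print(train, test)

train_s, test_s = split_data(data, train_ratio=0.8, sampling="shuffled")
print(train_s, test_s)
```

With linear sampling the first 70% of the rows always go to training, so any ordering in the file carries over into the split; shuffling removes that dependence.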
Step 12: From the operators, select a supervised learning algorithm. Here we try Decision Tree and Random Forest and compare the results.
When we select the Decision Tree operator, we have to choose a value in the criterion dropdown among the parameters on the right side.
Here we choose gain ratio as the criterion.
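For readers curious about what gain ratio measures, the sketch below computes it for a single nominal attribute using the usual textbook definition: information gain divided by split information. The toy data is hypothetical, and RapidMiner's own implementation may differ in details such as handling of numerical attributes.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the attribute separates the two classes perfectly.
sex = ["Female", "Female", "Male", "Male"]
survived = ["Yes", "Yes", "No", "No"]
print(gain_ratio(sex, survived))
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.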
Step 13: After adding the Decision Tree, drag and drop the Apply Model and Performance (Classification) operators from the left side and make the connections as shown in the image above.
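What these two operators do together can be sketched as: apply a trained model to the test rows, then compare the predictions with the true labels. In the Python sketch below, `toy_model` is a hypothetical stand-in for the trained decision tree, and the test rows are made up for illustration.

```python
def apply_model(model, rows):
    """Return the model's prediction for each row."""
    return [model(row) for row in rows]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical stand-in for a trained classifier: a single hand-written rule.
def toy_model(row):
    return "Yes" if row["Sex"] == "Female" else "No"


test_rows = [{"Sex": "Female"}, {"Sex": "Male"}, {"Sex": "Male"}, {"Sex": "Female"}]
test_labels = ["Yes", "No", "Yes", "Yes"]

predictions = apply_model(toy_model, test_rows)
print(f"Accuracy: {accuracy(predictions, test_labels):.2%}")
```

Accuracy is computed only on the test portion, which is why the split ratio and sampling type chosen in Step 11 affect the reported number.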
Step 14: Once everything is set up, run the model with F11. The results are displayed, and you can view them in both table view and plot view.
Table view results (ratio 70:30)
Sampling type: Automatic, Accuracy: 92.1%
Sampling type: Shuffled, Accuracy: 90.84%
Sampling type: Linear, Accuracy: 89.31%
Conclusion
Parameters selected
Ratio: 70:30
Split Data sampling type: automatic, shuffled and linear
Accuracy: 92.1%, 90.84% and 89.31%, respectively
Result: We can conclude that applying a decision tree with gain ratio in automatic mode gives better results on this dataset than the other two sampling types.
Experiment 2:
Next, we change the ratio of training to testing data by following the steps above: double-click Split Data and change the ratio values.
Ratio 80:20
Step 1: Change the ratio value to 80:20.
Step 2: Set the sampling type parameter to automatic, run the process, and check the accuracy.
Table View
Accuracy: 94%
Step 3: Change the sampling type to Shuffled and run.
Table View
Accuracy: 91%
Step 4: Change the sampling type to Linear and run.
Table View
Accuracy: 92%
Result
After comparing the ratios of training and testing data, we conclude that 80:20 with automatic sampling gives the highest accuracy, 94%, compared with the other sampling types.
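The comparison across both experiments can also be tabulated in a short Python sketch. The accuracies are the table-view readings from the two experiments in this post; the dictionary layout and variable names are just one convenient way to organize them.

```python
# Accuracies (in percent) from the table-view readings of both experiments.
results = {
    ("70:30", "automatic"): 92.1,
    ("70:30", "shuffled"): 90.84,
    ("70:30", "linear"): 89.31,
    ("80:20", "automatic"): 94.0,
    ("80:20", "shuffled"): 91.0,
    ("80:20", "linear"): 92.0,
}

# Print the configurations from highest to lowest accuracy.
for (ratio, sampling), acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{ratio}  {sampling:<9}  {acc:.2f}%")

best_config, best_accuracy = max(results.items(), key=lambda item: item[1])
print("Best:", best_config, best_accuracy)
```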
You can try further experiments and check the accuracy, for example by comparing the accuracy of Decision Tree and Random Forest to find which gives the better accuracy.