Designing a Classification Model Using RapidMiner Studio

 

Design a Classification Model

Before starting, I would like to thank Dr. Ashok Sharma (University of Jammu) for teaching this tool.

The most important requirement for classifying data is gathering the data. Beginners should first learn from the sample data built into RapidMiner. Once you know how to preprocess that data, move on to your own dataset.

The main disadvantage of RapidMiner is that it doesn’t work on images. Secondly, it is not open-source software. For students it is free for one year, and after that year the license can be renewed. For commercial use, it is free only up to 10,000 tuples; to work with more data, we have to pay a fee to buy the tool.

With the aid of RapidMiner Go, you can quickly and easily create predictive models from your data. All you need to build a model is data. It’s that simple!

In this document, we first use a supervised (labeled) dataset that ships with RapidMiner itself: the Titanic data.

Step-by-step guide to predicting the data

Step 1: Open RapidMiner Studio.

On the left side, the Repository panel is visible. Under Repository, click Samples, as shown in the figure below.

Step 2: Click data, and under it select the Titanic data (a predefined dataset). You can choose another dataset too.

Step 3: To view the Titanic data, that is, which rows and how many columns it contains, double-click the Titanic dataset.


Step 4: To see the tuples in this data more clearly, click Statistics. This view gives a clear summary of each column.

Step 5: To build a model, click the Design button at the top.

Step 6: Drag the Titanic dataset from the left side and drop it in the Process window.

Step 7: In the Operators panel in the left sidebar, type "replace missing" in the search bar and add the Replace Missing Values operator. This operator replaces missing values with 0 or with a value you specify.
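To make the idea behind this step concrete, here is a minimal plain-Python sketch of replacing missing values with 0 or another chosen value. It is not RapidMiner code; the rows, the column names and the `replace_missing` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical rows shaped loosely like a passenger table; None marks a missing value.
rows = [
    {"Age": 29.0, "Passenger Fare": 211.34, "Sex": "Female"},
    {"Age": None, "Passenger Fare": 151.55, "Sex": "Male"},
    {"Age": 30.0, "Passenger Fare": None, "Sex": "Female"},
]


def replace_missing(rows, fill_value=0):
    """Return new rows in which every missing (None) value is replaced by fill_value."""
    return [
        {name: (fill_value if value is None else value) for name, value in row.items()}
        for row in rows
    ]


cleaned = replace_missing(rows, fill_value=0)
for row in cleaned:
    print(row)
```

The original rows are left untouched, so you can try different fill values and compare the effect.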

Step 8: Connect the two operators, either by clicking the plug sign (the port) or by drawing a connecting line between them.

 
Step 9: Run this small chunk of the process with F11 or with the arrow button in the top menu bar. After running, the results look like this:

Step 10: From the Operators panel on the left, select the Set Role operator and connect it. In the Parameters panel on the right, set attribute name to the attribute from the dataset that should act as the target.

Here we select Survived, and under target role we select label, because we are using supervised learning.
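What giving an attribute the label role means can be sketched in a few lines of plain Python: the label column is separated from the regular attributes, so that the learner predicts it instead of using it as an input. The rows and the `set_label` helper are hypothetical, not part of RapidMiner.

```python
# Hypothetical rows; "Survived" plays the label role, every other column stays a regular attribute.
rows = [
    {"Age": 29.0, "Sex": "Female", "Survived": "Yes"},
    {"Age": 40.0, "Sex": "Male", "Survived": "No"},
]


def set_label(rows, label_name):
    """Split each row into its regular attributes and its label value."""
    features = [{k: v for k, v in row.items() if k != label_name} for row in rows]
    labels = [row[label_name] for row in rows]
    return features, labels


features, labels = set_label(rows, "Survived")
print(features)
print(labels)
```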



Step 11: From the Operators panel, select Split Data and connect it. Split Data divides the data into training and testing sets. Double-click Split Data and a window opens like this:


Select the ratio and see the results:

70:30

80:20, etc.

For the sampling type parameter we select automatic or shuffled sampling, but we also try the other options (such as linear) and look at those results too.

Here we tested the two ratios above.
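The difference between the sampling types is easier to see in code. The sketch below is plain Python, not RapidMiner's implementation. Linear sampling keeps the original order, shuffled sampling shuffles first, and stratified sampling splits each class separately. We assume here that automatic mode behaves like stratified sampling when the label is nominal; please check the Split Data documentation for your version. The function names, the seed and the toy data are hypothetical.

```python
import random


def linear_split(rows, train_ratio):
    """Linear sampling: keep the original order and cut at the ratio."""
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def shuffled_split(rows, train_ratio, seed=1992):
    """Shuffled sampling: shuffle a copy with a fixed (arbitrary) seed, then cut."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return linear_split(shuffled, train_ratio)


def stratified_split(rows, labels, train_ratio, seed=1992):
    """Stratified sampling: split each class separately so both parts keep similar class proportions."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    train, test = [], []
    for label in sorted(by_class):
        part_train, part_test = shuffled_split(by_class[label], train_ratio, seed)
        train += part_train
        test += part_test
    return train, test


# Hypothetical data: 10 example ids, 6 of class "No" and 4 of class "Yes".
ids = list(range(10))
labels = ["No"] * 6 + ["Yes"] * 4

print("linear:    ", linear_split(ids, 0.7))
print("shuffled:  ", shuffled_split(ids, 0.7))
print("stratified:", stratified_split(ids, labels, 0.8))
```

If the rows happen to be sorted by class, a linear split can leave the test set with mostly one class, which is one possible reason why the sampling type changes the accuracy.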

Step 12: From the Operators panel, select a supervised learning algorithm. Here we try Decision Tree and Random Forest and compare the results.

When we select the Decision Tree operator, we have to choose a value from the criterion dropdown in the Parameters panel on the right.

Here we choose gain ratio as the criterion.
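For readers who want to know what gain ratio measures, here is a small sketch of the textbook definition: the information gain of a split divided by the split information (the entropy of the attribute itself). This is an illustration under that assumption; RapidMiner's internal computation may differ in details, and the toy data below is hypothetical rather than taken from the Titanic sample.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(attribute_values, labels):
    """Information gain of splitting on the attribute, divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the label inside each branch of the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one nominal attribute and the label.
sex = ["Female", "Female", "Female", "Male", "Male", "Male", "Male", "Male"]
survived = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]

print(round(gain_ratio(sex, survived), 3))
```

The tree prefers the attribute with the highest gain ratio at each node. Dividing by the split information penalizes attributes with many distinct values.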

Step 13: After adding the Decision Tree, drag and drop the Apply Model and Performance (Classification) operators from the Operators panel on the left and connect them as shown in the image above.

Step 14: Once everything is in place, run the model with F11. The results are displayed, and you can view them both as a table and as a plot.
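The accuracy shown in the results is simply the share of test examples whose predicted label matches the actual label. A minimal sketch of that calculation, together with the counts that make up the table view, could look like this. The labels below are hypothetical and are not the Titanic predictions.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical labels for 10 test examples.
actual = ["Yes", "Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "Yes", "No"]

print(f"accuracy: {accuracy(actual, predicted):.2%}")
for (a, p), count in sorted(confusion_matrix(actual, predicted).items()):
    print(f"actual={a} predicted={p}: {count}")
```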

Ratio 70:30

Sampling Type: Automatic

Table View

Accuracy: 92.1%

Sampling Type: Shuffled

Table View

Accuracy: 90.84%

Sampling Type: Linear

Table View

Accuracy: 89.31%

Conclusion

Parameters selected

Ratio: 70:30

Sampling type: automatic, shuffled and linear

Accuracy: 92.1%, 90.84% and 89.31% respectively

Result: We can conclude that when we apply a decision tree with gain ratio, automatic sampling gives better results on this dataset than the other two sampling types.


Experiment 2:

Next we change the ratio of training to testing data. Follow the steps above, double-click Split Data and change the ratio values.

Ratio 80:20

Step 1: Change the ratio to 80:20.


Step 2: Set the sampling type parameter to automatic, run the process and check the accuracy.

Table View


Accuracy: 94%

Step 3: Change the sampling type to shuffled and run.

Table View

Accuracy: 91%

Step 4: Change the sampling type to linear and run.

Table View

Accuracy: 92%

Result

After comparing the ratios of training and testing data, we conclude that 80:20 with automatic sampling gives the best accuracy, 94%, compared with the other sampling types.
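To compare all six runs side by side, the accuracies reported for each run above can be put into a small table and sorted. This is only a convenience sketch in plain Python; the numbers are the ones reported in the individual runs of this post.

```python
# Accuracies (in percent) as reported for each run above.
results = {
    ("70:30", "automatic"): 92.1,
    ("70:30", "shuffled"): 90.84,
    ("70:30", "linear"): 89.31,
    ("80:20", "automatic"): 94.0,
    ("80:20", "shuffled"): 91.0,
    ("80:20", "linear"): 92.0,
}

# Pick the setting with the highest accuracy.
best_setting, best_accuracy = max(results.items(), key=lambda item: item[1])

# Print all runs from best to worst.
for (ratio, sampling), acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{ratio}  {sampling:<10} {acc:.2f}%")
print("best:", best_setting, best_accuracy)
```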


You can run further experiments yourself and check the accuracy:

Compare the accuracy of Decision Tree and Random Forest and find which one gives better accuracy.


