This R script reads a data set from a CSV file called "Data.csv" and loads the randomForest, ROSE, caret, pROC, ROCR, and party packages. It then performs k-fold cross-validation with k = 10: every row is randomly assigned to one of the 10 folds, and in each iteration the current fold becomes the test set while the remaining folds form the training set. The training set is balanced with caret's downSample function, which randomly reduces the majority class to the size of the minority class. The script then calls upSample, and an alternative line that oversamples with the ROSE package is left commented out. Next, the cforest function from the party package fits a random forest to the training data and predicts the classes of the test data. The script then calculates performance metrics: precision and recall for each class (with caret's confusionMatrix) and the ROC curve with its area under the curve (with ROSE's roc.curve).

It also calculates variable importance with the varImp function. After each iteration, the script stores that fold's variable importance and ROC area; after the loop it averages precision and recall over the k folds and averages each variable's importance across the folds. Finally, it writes the per-fold variable importance to "CondsaveTemptable.csv", the mean importance to "CondsaveTemptableMeans.csv", and the per-fold ROC areas to "CondsaveROCTemptable.csv". In short, this script builds a random forest classifier with 10-fold cross-validation in R, deals with the unbalanced data problem by resampling the training data (down-sampling and up-sampling), and reports precision/recall, variable importance, and ROC curve area. To run it on your own data, you need to make a few modifications: the working directory, the column indices of the predictors and the class, and the class labels ('1' and '2') used as the positive class.
Code:

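setwd("D:/newFolder/")   # change to the folder that contains your data


data <- read.csv("Data.csv", header = TRUE)
# The outcome column must be called "class" (testset$class is used below).
# Converting it to a factor lets cforest() fit a classifier and lets
# confusionMatrix() compare predictions and labels with the same levels.
data$class <- as.factor(data$class)

require(randomForest)
require(ROSE)            # roc.curve()
if (!require(caret)) {   # downSample(), upSample(), confusionMatrix(), varImp()
  install.packages("caret")
  library(caret)
}
if (!require(pROC)) {
  install.packages("pROC")
  library(pROC)
}
library(ROCR)
library(party)           # cforest(); loaded once here instead of inside the loop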





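k = 10 # Folds

# Randomly assign every row to one of the k folds
# (call set.seed() first if you need reproducible folds)
id <- sample(1:k, nrow(data), replace = TRUE)
list <- 1:k

prediction <- data.frame()
trainingset <- data.frame()
testsetCopy <- data.frame()
#Creating a progress bar to know the status of CV
#progress.bar <- create_progress_bar("text")
#progress.bar$init(k)

# Running sums of the per-fold metrics (averaged after the loop)
PrecisionClassOne = 0
RecallClassOne = 0
PrecisionClassTwo = 0
RecallClassTwo = 0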


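for (i in 1:k) {

  # Rows in the other k-1 folds form the training set,
  # rows in fold i form the test set
  trainingset <- subset(data, id %in% list[-i])
  testset <- subset(data, id %in% c(i))

  # Alternative: upsampling of minorities using the ROSE package
  #trainingset <- ROSE(class~., data=trainingset, N=length(trainingset$class))$data

  # Balance the training set only; the test fold keeps its original class ratio.
  # Note that the column indices here are based on your data (22 predictors in
  # columns 1-22, class in column 23), so you may need to change them!!
  # downSample() shrinks every class to the size of the smallest one; because the
  # classes are then equal in size, the upSample() call that follows adds no rows.
  trainingset <- downSample(trainingset[, 1:22], as.factor(trainingset[, 23]), list = FALSE, yname = "class")
  trainingset <- upSample(trainingset[, 1:22], as.factor(trainingset[, 23]), list = FALSE, yname = "class")

  # Fit a random forest of conditional inference trees (party package)
  cf1 <- cforest(class ~ ., data = trainingset,
                 control = cforest_unbiased(mtry = 2, ntree = 100))

  print("perform predictions on test data...")
  predictions <- predict(cf1, newdata = testset)

  # Per-class metrics; positive = '1' and '2' assume your class labels are 1 and 2.
  # In byClass, "Pos Pred Value" is the precision and "Sensitivity" the recall.
  ClassOne <- confusionMatrix(predictions, testset$class, positive = '1')$byClass
  ClassTwo <- confusionMatrix(predictions, testset$class, positive = '2')$byClass

  PrecisionClassOne <- PrecisionClassOne + ClassOne["Pos Pred Value"]
  RecallClassOne <- RecallClassOne + ClassOne["Sensitivity"]
  PrecisionClassTwo <- PrecisionClassTwo + ClassTwo["Pos Pred Value"]
  RecallClassTwo <- RecallClassTwo + ClassTwo["Sensitivity"]

  # ROC curve and its area for this fold (roc.curve() from ROSE)
  rocValue <- roc.curve(testset$class, predictions,
                        main = "ROC curve \n (Half circle depleted data)")

  # Variable importance of the forest via caret's varImp()
  importToSave <- varImp(cf1)
  #varImp(cf1, conditional = TRUE)
  #plot(varImp(cf1), top = 20)

  # Collect importance (one column per fold) and AUC (one row per fold)
  if (i > 1) {
    saveTemp <- cbind(saveTemp, importToSave)
    saveROCtemp <- rbind(saveROCtemp, rocValue$auc)
  } else {
    saveTemp <- importToSave
    saveROCtemp <- rocValue$auc
  }
}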
# Average the summed metrics over the k folds
PrecisionClassOne = PrecisionClassOne / k
RecallClassOne = RecallClassOne / k
PrecisionClassTwo = PrecisionClassTwo / k
RecallClassTwo = RecallClassTwo / k
print("Class One Precision/Recall")
print(PrecisionClassOne)
print(RecallClassOne)
print("Class Two (Re-open) Precision/Recall")
print(PrecisionClassTwo)
print(RecallClassTwo)


### Saving the variable importance (one column per fold)

write.table(saveTemp,
            file = "CondsaveTemptable.csv",
            append = FALSE,
            quote = TRUE,
            sep = ",",
            col.names = NA,   # NA adds an empty header cell above the row names
            row.names = TRUE)

# Mean importance of every variable over the k folds
meansOfCOlS = rowMeans(saveTemp)
print(max(saveTemp))
print(min(meansOfCOlS))
write.table(meansOfCOlS,
            file = "CondsaveTemptableMeans.csv",
            append = FALSE,
            quote = TRUE,
            sep = ",",
            col.names = NA,
            row.names = TRUE)



### Saving the ROC (area under the curve) values, one row per fold

write.table(saveROCtemp,
            file = "CondsaveROCTemptable.csv",
            append = FALSE,
            quote = TRUE,
            sep = ",",
            col.names = NA,
            row.names = TRUE)



### Box plot of each variable's importance across the folds
library(reshape2)   # melt()
library(ggplot2)
newSaveTemp <- as.matrix(saveTemp)
colnames(newSaveTemp) <- paste0("Fold", seq_len(ncol(newSaveTemp)))
impLong <- melt(newSaveTemp, varnames = c("variable", "fold"),
                value.name = "importance")
b <- ggplot(impLong, aes(x = variable, y = importance))
print(b + geom_boxplot() + coord_flip())
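In the script above, roc.curve() is given the predicted class labels rather than class probabilities. With hard labels, a fold's ROC curve has a single operating point; assuming that point is joined to (0, 0) and (1, 1) by straight lines, the area under the curve equals the balanced accuracy, (sensitivity + specificity) / 2. The base-R sketch below checks this on a made-up fold; truth, pred, and the other variable names are illustrative only.

```r
# Hypothetical single fold: true classes and hard predicted labels (made up)
truth <- factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2), levels = c(1, 2))
pred  <- factor(c(1, 1, 1, 2, 2, 2, 2, 2, 1, 2), levels = c(1, 2))

# Treat class "2" as the positive class
tpr <- mean(pred[truth == "2"] == "2")  # sensitivity (true positive rate)
fpr <- mean(pred[truth == "1"] == "2")  # 1 - specificity (false positive rate)

# ROC curve through (0, 0), (fpr, tpr), (1, 1); area by the trapezoidal rule
x <- c(0, fpr, 1)
y <- c(0, tpr, 1)
auc <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

balanced_accuracy <- (tpr + (1 - fpr)) / 2
print(c(auc = auc, balanced_accuracy = balanced_accuracy))  # both equal 19/24
```

If you want a full ROC curve per fold, the forest would have to return class probabilities instead of labels.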





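This R script uses the cforest function from the party package (a random forest of conditional inference trees) to train a classifier on a given data set, and performs 10-fold cross-validation to evaluate the model's performance, using the caret package for resampling and performance metrics.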

The script first reads a data set from a CSV file called "Data.csv" and loads the necessary packages (randomForest, ROSE, caret, pROC, and party). It then sets the number of folds for the cross-validation (k = 10).

It then randomly assigns each row of the data to a fold, and for each iteration of the cross-validation:

    1. It creates a training set from the rows assigned to the folds other than the current one, and a test set from the rows assigned to the current fold.
    2. It balances the class distribution of the training set by downsampling the majority class with caret's downSample function, followed by upSample (ROSE-based oversampling is included as a commented-out alternative).
    3. It uses the cforest function from the party package to fit a random forest model to the training data and makes predictions on the test data.
    4. It calculates precision and recall for each class, and the ROC curve with its area, for the current fold.
    5. It also calculates variable importance using the varImp function.
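Step 1 can be sketched in base R as follows, on a small made-up data frame standing in for Data.csv. The script draws each row's fold label independently with sample(1:k, nrow(data), replace = TRUE), so fold sizes vary; the sketch also shows an alternative (an assumption, not part of the original script) that shuffles a repeated 1..k sequence so every fold gets the same number of rows.

```r
set.seed(42)
# Hypothetical toy data standing in for Data.csv (not the author's data)
data <- data.frame(x1 = rnorm(50), x2 = rnorm(50),
                   class = factor(rep(c(1, 2), times = c(40, 10))))
k <- 10

# Fold labels as in the script: drawn independently, so fold sizes vary
id <- sample(1:k, nrow(data), replace = TRUE)

# Alternative: shuffle a repeated 1..k sequence so every fold has the same size
id_balanced <- sample(rep(1:k, length.out = nrow(data)))

times_tested <- integer(nrow(data))
for (i in 1:k) {
  is_test     <- id_balanced == i
  trainingset <- data[!is_test, ]   # the other k - 1 folds
  testset     <- data[is_test, ]    # the current fold
  stopifnot(nrow(trainingset) + nrow(testset) == nrow(data))
  times_tested[is_test] <- times_tested[is_test] + 1
}

print(table(id))               # sizes usually differ between folds
print(table(id_balanced))      # 5 rows in every fold
print(all(times_tested == 1))  # every row is tested exactly once
```

Logical indexing is used for the split because negative indexing with an empty index vector would silently select zero training rows. For folds that also preserve the class ratio, caret's createFolds() is another option.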

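After each iteration, the script saves the variable importance and the ROC area of that fold. After the loop, it averages precision and recall over all folds and computes the mean importance of each variable. Finally, it saves the per-fold variable importance in a CSV file called "CondsaveTemptable.csv", along with the mean importance ("CondsaveTemptableMeans.csv") and the per-fold ROC areas ("CondsaveROCTemptable.csv").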
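The averaging step can be illustrated with a base-R sketch on two made-up folds. prec_rec() is a hypothetical helper that computes what caret reports as "Pos Pred Value" (precision) and "Sensitivity" (recall) for one positive class; like the script, the sketch sums the per-fold values and divides by the number of folds, then averages a small made-up importance table with rowMeans().

```r
# Hypothetical true and predicted labels for two folds (made-up numbers)
folds <- list(
  list(truth = c(1, 1, 1, 2, 2, 2), pred = c(1, 1, 2, 2, 2, 1)),
  list(truth = c(1, 1, 2, 2, 2, 2), pred = c(1, 1, 2, 2, 1, 2))
)

# Precision and recall of one class, treating it as the positive class
prec_rec <- function(truth, pred, positive) {
  tp <- sum(pred == positive & truth == positive)
  c(precision = tp / sum(pred == positive),   # caret's "Pos Pred Value"
    recall    = tp / sum(truth == positive))  # caret's "Sensitivity"
}

# Add up the per-fold values, then divide by the number of folds
totals <- c(precision = 0, recall = 0)
for (f in folds) {
  totals <- totals + prec_rec(f$truth, f$pred, positive = 2)
}
class_two <- totals / length(folds)
print(class_two)      # precision = 5/6, recall = 17/24

# Per-fold importance (rows = variables, columns = folds), averaged per variable
imp <- data.frame(Fold1 = c(0.10, 0.02), Fold2 = c(0.14, 0.00),
                  row.names = c("x1", "x2"))
print(rowMeans(imp))  # x1 = 0.12, x2 = 0.01
```

If a fold never predicts the positive class, its precision is undefined (NaN in this sketch), and a plain running sum then makes the average undefined as well.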

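Cross-validation means every row is tested exactly once, by a model that was not trained on it. Because the upsampling and downsampling are applied only to the training folds, each test fold keeps the original class distribution, so the reported precision, recall, and ROC area reflect the imbalanced data the classifier will actually face rather than an artificially balanced sample.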
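The point about resampling only the training data can be checked with a base-R sketch. down_sample() and up_sample() are hypothetical, simplified stand-ins for caret's downSample() and upSample() (not library functions): the first randomly reduces every class to the size of the smallest one, the second adds rows sampled with replacement until every class matches the largest one. The data are made up.

```r
set.seed(1)
# Hypothetical imbalanced data: 40 rows of class 1 and 10 rows of class 2
data <- data.frame(x = rnorm(50), class = factor(rep(c(1, 2), times = c(40, 10))))

# Simplified stand-ins for caret's downSample() and upSample()
down_sample <- function(df) {
  n_min <- min(table(df$class))
  groups <- split(seq_len(nrow(df)), df$class)
  idx <- unlist(lapply(groups, function(rows) rows[sample.int(length(rows), n_min)]))
  df[idx, ]
}
up_sample <- function(df) {
  n_max <- max(table(df$class))
  groups <- split(seq_len(nrow(df)), df$class)
  idx <- unlist(lapply(groups, function(rows) {
    extra <- n_max - length(rows)
    c(rows, rows[sample.int(length(rows), extra, replace = TRUE)])
  }))
  df[idx, ]
}

# One cross-validation split: 8 + 2 test rows, the remaining 40 rows for training
is_test <- seq_len(nrow(data)) %in% c(1:8, 41:42)
train <- data[!is_test, ]
test  <- data[is_test, ]

# Resample the training fold only, in the same order as the script
balanced <- up_sample(down_sample(train))

print(table(train$class))     # 32 and 8
print(table(balanced$class))  # 8 and 8: up-sampling after down-sampling adds no rows
print(table(test$class))      # 8 and 2: the test fold keeps its original imbalance
```

The sketch also makes visible that, in the order used by the script, the up-sampling step has nothing left to do once down-sampling has equalized the classes.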



_________________
Sami
PhD student - SAIL - School of Computing
Queen's University
Canada


Author:
Site Admin
Posts: 33
Have thanks: 1 time

updated





Related posts to: R script for RandomForest with Cross-validation and Sampling
Weka java code for Random Forest Cross Validation
KFold Cross-validation Random Forest Binary Classification
Cross platform c++ programming
Can anyone suggest some script?
need help with java script in a pdf
Script ingoring lines
A PHP Number Guessing Script
Send Email from a PHP Script Example
Questions: Perl Script. PHP.
script for including files



Topic Tags

R Classifiers






All copyrights reserved to codemiles.com 2007-2011