Building a quantile regression model in R. The following example fits quantile regressions at several quantiles (10%, 25%, 50%, 75%, and 90%, plus a grid from 5% to 95%). Quantile regression is useful for understanding how the effect of your model variables changes across the different percentiles of your data.
- Code:
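# Load packages; only quantreg (loaded further down) is used by the models below
library(foreign)
library(ggplot2)
library(MASS)
library(boot)
# Read the data set; the first row holds the column names
data <- read.csv("Model14.csv", header = TRUE)
# Attach the data so that columns can be referenced by name
attach(data)
# Response, full predictor matrix, and a single-predictor matrix
Y <- cbind(dupdifftriage3)
X <- cbind(descriptiontextcount, titletextcount, priority_val, comment_count, cc_count, repoter_rep, bug_severity_num)
z <- cbind(titletextcount)
library(quantreg)
# Fit and summarize one model per quantile: 10%, 25%, 50%, 75%, 90%
quantreg10 <- rq(Y ~ X, data = data, tau = 0.1)
summary(quantreg10)
quantreg25 <- rq(Y ~ X, data = data, tau = 0.25)
summary(quantreg25)
quantreg50 <- rq(Y ~ X, data = data, tau = 0.5)
summary(quantreg50)
quantreg75 <- rq(Y ~ X, data = data, tau = 0.75)
summary(quantreg75)
quantreg90 <- rq(Y ~ X, data = data, tau = 0.90)
summary(quantreg90)
# Fit the 10% and 95% quantiles in a single call
quantreg1095 <- rq(Y ~ X, data = data, tau = c(0.1, 0.95))
summary(quantreg1095)
# Test whether the slope coefficients differ between the 25% and 75% quantiles
anova(quantreg25, quantreg75)
# Plot each coefficient across quantiles from 5% to 95% (log-transformed response)
quantreg.all <- rq(log(Y + 1) ~ X, tau = seq(0.05, 0.95, by = 0.05), data = data)
quantreg.plot <- summary(quantreg.all)
plot(quantreg.plot)
# Same plot with titletextcount as the only predictor
quantreg.all <- rq(log(Y + 1) ~ z, tau = seq(0.05, 0.95, by = 0.05), data = data)
quantreg.plot <- summary(quantreg.all)
plot(quantreg.plot)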
The script first loads the required packages and reads Model14.csv into a data frame. It then "attaches" the data, which allows you to refer to the columns of the data by name without having to use the $ operator.
The script then defines several variables:
Y: the response, a one-column matrix created by applying cbind() to the 'dupdifftriage3' column of the data.
X: the predictor matrix, created by binding several columns of the data together: 'descriptiontextcount', 'titletextcount', 'priority_val', 'comment_count', 'cc_count', 'repoter_rep', 'bug_severity_num'.
z: a one-column matrix created by applying cbind() to the 'titletextcount' column of the data.
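To make the variable definitions concrete, here is a minimal sketch of what cbind() returns. The toy data frame df and its columns a, b, and y are made up for illustration and are not part of Model14.csv. The sketch uses with() rather than attach(), which is one common way to reach columns by name without changing the search path.

```r
# Hypothetical toy data; df, a, b and y are illustration-only names
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), y = c(2, 4, 7))

# Bind two columns into a predictor matrix, as the script does for X
X_toy <- with(df, cbind(a, b))

# Bind a single column; the result is still a matrix, with one column
Y_toy <- with(df, cbind(y))

print(dim(X_toy))       # number of rows and columns of the predictor matrix
print(colnames(X_toy))  # column names are taken from the variable names
print(dim(Y_toy))
```

Because X is a matrix, a formula such as Y ~ X estimates one coefficient per column of X.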
The script then loads the 'quantreg' package, which provides functions for quantile regression. Quantile regression is a statistical technique for estimating the conditional quantiles of a response variable given a set of predictors. The script uses the rq() function from 'quantreg' to fit a separate quantile regression at each of several values of tau (0.1, 0.25, 0.5, 0.75, and 0.9). Each call takes the form "rq(Y ~ X, data=data, tau=value)", where Y is the response variable, X is the matrix of predictor variables, data is the data set, and tau is the quantile of interest, given as a number between 0 and 1. After each fit, the script calls summary() to report the coefficient estimates for that quantile.
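The per-quantile fits described above can also be written as a loop. The following is only a sketch: it assumes the engel example data set that ships with quantreg (columns foodexp and income) in place of Model14.csv, and the names taus, fits, and coef_table are illustrative. Naming the predictors directly in the formula keeps the coefficient labels readable.

```r
library(quantreg)

# Example data shipped with quantreg (an assumption; not the article's Model14.csv)
data(engel)

# The same quantiles the script fits one by one
taus <- c(0.10, 0.25, 0.50, 0.75, 0.90)

# Fit one model per quantile, naming the predictor directly in the formula
fits <- lapply(taus, function(tau) rq(foodexp ~ income, data = engel, tau = tau))
names(fits) <- paste0("tau_", taus)

# Collect the coefficients: one row per term, one column per quantile
coef_table <- sapply(fits, coef)
print(round(coef_table, 3))
```

Reading across a row of coef_table shows how the estimated effect of a predictor changes from the lower to the upper quantiles of the response.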
The script then fits two quantiles in a single rq() call by passing tau=c(0.1, 0.95), and summarizes the result. Next, it uses anova() to test whether the coefficients differ between the quantile regression models fitted at tau=0.25 and tau=0.75.
Finally, the script plots the results. It first uses rq() to fit log(Y+1) as the response variable on 'X' as the predictors over a grid of tau values from 0.05 to 0.95 in steps of 0.05. It then summarizes the fit with summary() and passes the summary to plot(), which shows how each coefficient estimate varies across the quantiles. The script repeats the same steps with 'z' as the only predictor and log(Y+1) as the response variable.
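The three steps in this last part (several quantiles in one call, the anova() comparison, and the coefficient plot) can be sketched end to end. As before, this assumes the engel data set bundled with quantreg rather than Model14.csv, and the object names are illustrative.

```r
library(quantreg)

# Example data shipped with quantreg (an assumption; not the article's Model14.csv)
data(engel)

# Passing a vector of taus fits several quantiles in a single call
fit_pair <- rq(foodexp ~ income, data = engel, tau = c(0.10, 0.95))
print(coef(fit_pair))  # one column of coefficients per tau

# Two fits with the same formula but different taus: anova() tests
# whether the slope coefficients are equal across the two quantiles
fit25 <- rq(foodexp ~ income, data = engel, tau = 0.25)
fit75 <- rq(foodexp ~ income, data = engel, tau = 0.75)
slope_test <- anova(fit25, fit75)
print(slope_test)

# Fit a grid of quantiles on a log-transformed response, then plot
# each coefficient estimate against tau
fit_grid <- rq(log(foodexp + 1) ~ income, data = engel,
               tau = seq(0.05, 0.95, by = 0.05))
plot(summary(fit_grid))
```

A small p-value from the anova() comparison suggests the predictor's effect is not the same at the 25% and 75% quantiles, which is the kind of difference the coefficient plot makes visible.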