# R and R programming

Data files are better tab-delimited than comma-delimited, because commas often appear inside text values.
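A minimal sketch of a tab-delimited round trip, assuming a small made-up data frame whose text column contains commas. `read.delim()` is `read.table()` with `sep = "\t"` and `header = TRUE` preset:

```r
# Made-up example data: the city names contain commas
df <- data.frame(city = c("Vancouver, BC", "Toronto, ON"),
                 score = c(10, 20),
                 stringsAsFactors = FALSE)

# Write a tab-delimited file to a temporary location (no quoting needed)
tmp <- tempfile(fileext = ".tsv")
write.table(df, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim() reads tab-separated files with a header row
back <- read.delim(tmp, stringsAsFactors = FALSE)
str(back)
```

Written comma-separated without quoting, each city name above would be split across two fields.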

# R Data Types

http://www.statmethods.net/input/datatypes.html
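A quick way to see the common types and structures described on that page is to ask R for their class. The values below are made up for illustration:

```r
x_num <- c(1.5, 2, 3)                      # numeric (double)
x_int <- 1:3                               # integer
x_chr <- c("a", "b")                       # character
x_lgl <- c(TRUE, FALSE)                    # logical
x_fct <- factor(c("lo", "hi", "lo"))       # factor: integer codes plus levels
m     <- matrix(1:6, nrow = 2)             # matrix: a vector with a dim attribute
df    <- data.frame(id = 1:2, name = c("a", "b"))  # data frame: list of equal-length columns
lst   <- list(1, "a", TRUE)                # list: can hold mixed types

# class(z)[1] because matrices report c("matrix", "array") in R >= 4.0
classes <- sapply(list(x_num, x_int, x_chr, x_lgl, x_fct, m, df, lst),
                  function(z) class(z)[1])
classes
typeof(x_fct)    # factors are stored as integers
levels(x_fct)    # levels are sorted alphabetically by default
```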

# R Courses

http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf

http://statweb.stanford.edu/~tibs/ElemStatLearn/

Carl James Schwarz

http://people.stat.sfu.ca/~cschwarz/CourseNotes/

swirl teaches you R programming

http://swirlstats.com/

http://personality-project.org/r/

http://xavier-fim.net/R/eng.html

# Rattle Videos

http://rattle.togaware.com/rattle-videos.html

http://rattle.togaware.com/

Data Mining with R: Day 2, Part 1:5   https://www.youtube.com/watch?v=WeSYTVQKm0Q&list=UUweVY4iwcto5qMrmo9VIQ5A

# R Videos

Roger Peng

R data sets are installed here (Windows, R 3.1.1):

`C:\Program Files\R-3.1.1\library\datasets\data`

R comes with the datasets package, which is installed and loaded automatically and contains:

http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html
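Rather than hard-coding a version-specific path, R can report where the datasets package lives and list its contents. A short sketch using only base functions:

```r
# Folder where the datasets package is installed (varies by OS and R version)
system.file(package = "datasets")

# Names of all data sets shipped in the datasets package
ds <- data(package = "datasets")$results[, "Item"]
head(ds)
"airquality" %in% ds

# Built-in data sets are available without calling library() or data()
head(airquality)
```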

http://www.wekaleamstudios.co.uk/topics/r-environment/base-graphics/

Introduction to R Programming Online Course: (Day 1 of 16; Part 2 of 8)  https://www.youtube.com/watch?v=s0B1x08Qpzw

Engineering Data Analysis (with R and ggplot2) https://www.youtube.com/watch?v=TaxJwC_MP9Q&list=PL95AEF8D060866263

Book: Discovering Statistics Using R (data files): http://www.sagepub.com/dsur/study/articles.htm

http://www.ats.ucla.edu/stat/r/

Use the `bartlett.test` function to formally test the homogeneity of variances across groups.
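A minimal example on the built-in `InsectSprays` data (insect counts for six spray types); the formula interface takes `response ~ group`:

```r
# Do the six spray groups have equal variances?
res <- bartlett.test(count ~ spray, data = InsectSprays)
res

# A small p-value suggests the group variances differ
res$p.value < 0.05
```

The test statistic has `k - 1` degrees of freedom for `k` groups, so 5 here.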

# R Commander (Rcmdr)

http://www.mzandee.nl/statistiek/R/Rmanual_paper.pdf

R Commander – linear regression http://www.wekaleamstudios.co.uk/posts/r-commander-linear-regression/#more-1191

http://pages.wustl.edu/montgomery/teaching/quantitative-political-methods/course-book

Prediction interval using the `UsingR` package:

Book: Linear Regression with R and R-commander Linear …
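These notes do not record how `UsingR` computes the interval. As an alternative, base R's `predict()` on an `lm` fit returns prediction intervals directly. A sketch using the built-in `airquality` data; the model and the `Solar.R` values are illustrative only:

```r
model <- lm(Ozone ~ Solar.R, data = airquality)
new <- data.frame(Solar.R = c(100, 200))   # illustrative new values

# Prediction interval: where a single new observation is likely to fall
pi95 <- predict(model, newdata = new, interval = "prediction", level = 0.95)
# Confidence interval: where the mean response is likely to be (narrower)
ci95 <- predict(model, newdata = new, interval = "confidence", level = 0.95)
pi95
ci95
```

Both calls return a matrix with columns `fit`, `lwr` and `upr`; the fitted values are the same, only the interval widths differ.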

==============================================

http://www.statmethods.net/management/subset.html
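A short sketch of row and column selection on `airquality`, once with `subset()` and once with bracket indexing; the `Temp > 90` cutoff is arbitrary:

```r
# Rows with Temp above 90, keeping only three columns
hot <- subset(airquality, Temp > 90, select = c(Month, Day, Temp))
hot

# The same selection with bracket indexing
hot2 <- airquality[airquality$Temp > 90, c("Month", "Day", "Temp")]
all.equal(hot, hot2)
```

`subset()` also silently drops rows where the condition is `NA`, whereas bracket indexing returns rows of `NA` for them, so the two differ on columns with missing values such as `Ozone`.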

# Sample R code:

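```r
data(airquality)

# list variables
names(airquality)

# show data
airquality

# plot data
plot(Ozone ~ Solar.R, data = airquality)
# calculate mean ozone concentration (NAs removed)
mean.Ozone <- mean(airquality$Ozone, na.rm = TRUE)
abline(h = mean.Ozone)
# use lm to fit a regression line through these data
model1 <- lm(Ozone ~ Solar.R, data = airquality)
# weighted fit to counteract heteroskedasticity
model2 <- lm(Ozone ~ Solar.R, data = airquality, weights = 1 / Ozone)
model1
abline(model1, col = "red")
plot(model1)      # diagnostic plots
termplot(model1)  # partial effect of each term

summary(model1)

# model with a Solar.R x Wind interaction (overwrites model1)
model1 <- lm(Ozone ~ Solar.R * Wind, data = airquality)

# predictions at Solar.R = 100 for Wind = 1..200;
# observed Wind only spans about 2 to 21 mph, so most of these are extrapolations
p1 <- predict(model1, data.frame(Solar.R = 100, Wind = 1:200))

p1
```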

=============================================

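```r
# install PerformanceAnalytics first (Packages menu, or install.packages("PerformanceAnalytics"))
library(PerformanceAnalytics)

# mycols is a data frame read in beforehand (e.g. with read.csv)
chart.Correlation(mycols[c(2, 5:6)])   # selects columns 2, 5 and 6
```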
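Correlation charts like this mark each pair with significance stars derived from a correlation test. A base-R sketch of the same numbers, assuming the built-in `airquality` columns as stand-in data; it uses `cor.test()` for the p-values and `symnum()` with R's usual star cutpoints:

```r
# Pairwise Pearson correlations and p-values for a few airquality columns
vars <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]
r <- cor(vars, use = "pairwise.complete.obs")

p <- matrix(NA_real_, ncol(vars), ncol(vars), dimnames = dimnames(r))
for (a in seq_len(ncol(vars))) {
  for (b in seq_len(ncol(vars))) {
    # cor.test() drops incomplete pairs, matching "pairwise.complete.obs"
    if (a != b) p[a, b] <- cor.test(vars[[a]], vars[[b]])$p.value
  }
}

# Convert p-values to stars (NA on the diagonal is shown as blank)
stars <- symnum(p, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))
round(r, 2)
stars
```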

=======================

## Vector and list references in R

```r
j <- 5:10
```

The `[` index operator is actually a function: `` `[`(j, 3) `` is the same as `j[3]`. The `[[` operator, which extracts a single element, is also a function: `` `[[`(j, 3) `` is the same as `j[[3]]`.

```r
`[`(j, 3)
`[[`(j, 3)

j[3]
j[[3]]

# i is a vector
i <- c(5, 4, 3, 2, j)
# c() concatenates the single values and j into one flat vector; i[1] is its first element
i[1]
# i[1] is itself a length-1 vector, so [1] on it returns the same value
class(i[1][1])
# [[1]] on it extracts that single element
class(i[1][[1]])

# i[1] has length 1, so position 2 does not exist: [ returns NA
i[1][2] # NA
# [[ with a nonexistent position is an error ("subscript out of bounds")
try(i[1][[2]])
# the 6th element of i is the first element of j, because the two vectors were concatenated
i[6]
s <- c("s1", "s2", "s3")
l <- c(TRUE, FALSE, TRUE)
```
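The vector indexing rules above can be checked with assertions. A minimal sketch on a vector with the same values as `i`:

```r
x <- c(5, 4, 3, 2, 5:10)   # same values as i above

# Operators are ordinary functions: each pair is the same call
identical(`[`(x, 3), x[3])
identical(`[[`(x, 3), x[[3]])

# [ can select several elements at once; [[ selects exactly one
x[c(1, 6)]
x[x > 8]

# Out of range: [ returns NA, [[ signals an error
x[20]
res <- tryCatch(x[[20]], error = function(e) conditionMessage(e))
res
```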

```r
# A list is a container of other objects
amirlist <- list(i, s, l, 3, ll = list(i, j)) # copies of i, s, l, the number 3, and a nested list named ll
# [[1]] extracts the first element itself: the numeric vector i
iv <- amirlist[[1]]
class(iv)
# [1] returns a sub-list holding the first element, so iL is a list
iL <- amirlist[1]
class(iL)
# amirlist[5] is a list of length 1 that holds ll
amirlist[5]
class(amirlist[5])
# repeated [1] keeps returning the same one-element list
b <- amirlist[1][1][1][1][1]
class(b)
# amirlist[1] has length 1, so position 2 does not exist: the result is a list holding NULL
amirlist[1][2]
# the same holds for amirlist[5]
amirlist[5][2]
# [[1]] on amirlist[5] extracts ll itself: a list of two vectors (i and j)
amirlist[5][[1]]
```
```r
# $ extracts an element by name: same result as amirlist[5][[1]] and amirlist[[5]]
amirlist$ll
# repeated [1] on the one-element list amirlist[5] still returns a list holding ll
amirlist[5][1][1][1][1]
# amirlist[5] has length 1, so [[2]] is out of bounds (error)
try(amirlist[5][[2]])
# To reach the values inside a list, use [[ ]]
# amirlist[[1]] is the vector i; [2] returns its 2nd element as a length-1 vector
a <- amirlist[[1]][2]
class(a)
# [[2]] on that vector extracts the same number
b <- amirlist[[1]][[2]]
class(b)
# amirlist[[5]] is ll; its 2nd element is the vector j
amirlist[[5]][[2]]
# its 1st element is the vector i
amirlist[[5]][[1]]
# [2] on ll returns a one-element list holding j
amirlist[[5]][2]
# the 2nd element of i: the number 4
amirlist[[1]][[2]]
# the 2nd value of j (inside ll)
amirlist[[5]][[2]][2]
# assign 3 to the 2nd element of the first list element (i[2] inside amirlist)
amirlist[[1]][[2]] <- 3
```

==========================================================

A good correlation chart, though Deducer is better:

```r
# Read the data (local file path from the original notes)
mycols <- read.csv("C:/Users/aghasemi/Desktop/Part1of428questions14Mar2015aft3.csv")
library(PerformanceAnalytics)
library(corrplot)
length(mycols)
ncol(mycols)
nrow(mycols)
colnames(mycols)
i <- 1          # column included in every window
amirstep <- 5   # number of columns per window
seq(2, 262, by = amirstep)  # window start columns for the full data set
for (j in seq(2, 20, by = amirstep)) {
  k <- j + amirstep - 1     # last column of this window, so windows do not overlap
  corrcheck <- mycols[c(i, j:k)]
  M <- cor(corrcheck, use = "complete.obs")  # drop rows with any NA
  print(M)                  # inside a loop, values are only shown with print()
  N <- M[!is.na(M)]
  print(N)
  # point colours: bg = c("blue", "red", "yellow") or
  # bg = c("black", "blue", "red", "green", "yellow", "white"), with pch = 21
  # list of pch symbols: http://www.endmemo.com/program/R/pchsymbols.php
  chart.Correlation(corrcheck, histogram = TRUE, method = "pearson", col = "black")
  # other options tried: lower = "ellipse", upper = "circle"
}

# Meaning of the stars (double-checked with Deducer):
# 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1
#   .     P ≤ 0.1
#   *     P ≤ 0.05
#   **    P ≤ 0.01
#   ***   P ≤ 0.001
#   ****  P ≤ 0.0001  (R's default codes stop at ***)
```

============================================

This method shows no p-values: `cor()` returns only the correlation coefficients. See http://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html

```r
mycols <- read.csv("C:/Users/aghasemi/Desktop/Part1of428questions14Mar2015aft3.csv")
library(corrplot)
length(mycols)
ncol(mycols)
nrow(mycols)
colnames(mycols)
i <- 1
amirstep <- 5
j <- 2
seq(2, 262, by = amirstep)
k <- j + amirstep - 1
k
corrcheck <- mycols[c(i, j:k)]
M <- cor(corrcheck, use = "complete.obs")  # ignore NAs (drop incomplete rows)
M
par(cex = 0.3)  # controls the font size
corrplot.mixed(M, upper = "ellipse")
# to save to a file, open a device before plotting and close it afterwards:
# png(height = 1200, width = 1500, pointsize = 15, file = "C:/Users/aghasemi/Desktop/overlap.png")
# ... plotting calls ...
# dev.off()
corrplot(M, method = "color", addCoef.col = "black")  # a simple, nice chart
corrplot(M, method = "ellipse", addCoef.col = "black", tl.pos = "s", tl.srt = 45)  # better: labels at the side, rotated
```

============================================

## For, While, If, and Print in R

```r
# for (variable in vector)
# {
#   statements
# }
# while (condition)   # while is a keyword; the condition must be a single TRUE/FALSE
# {
#   statements
# }

# definition and initialization of a vector
amir <- seq(1, 100, by = 2)
amir
# vectorized operation: amir^2 is fast
# azi.squared is a vector I created
azi.squared <- amir^2
azi.squared
# serial implementation
# an empty vector that grows as values are assigned
nilou.squared <- NULL
# preallocate a fixed-length vector filled with NA for faster operation
nilou.squared <- rep(NA, 200)
nilou.squared
summary(nilou.squared)
for (i in 1:50) {  # or: for (i in seq_along(amir))
  nilou.squared[i] <- amir[i]^2
}
nilou.squared[2]
nilou.squared
if (nilou.squared[2] == 9) {
  sprintf("Nice")
  # stop() raises an error with a message
  # stop("hello")
} else {
  sprintf("oops")
}
sprintf("%1.0f", nilou.squared[3])
print(nilou.squared[3])
print.table(nilou.squared[2:5])
# get rid of excess NAs
# ! means not: keep each value of nilou.squared only if it is not NA
nilou.squared <- nilou.squared[!is.na(nilou.squared)]
nilou.squared
summary(nilou.squared)
silas.print <- NULL
i <- 1
# the length check stops the loop at the end of the vector if every value is < 1000
while (i <= length(nilou.squared) && nilou.squared[i] < 1000) {
  silas.print[i] <- nilou.squared[i]
  i <- i + 1
}
silas.print
```

The result will be:

```
> amir <- seq(1, 100, by = 2)
> amir
[1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79
[41] 81 83 85 87 89 91 93 95 97 99
> azi.squared <- amir^2
> azi.squared
[1] 1 9 25 49 81 121 169 225 289 361 441 529 625 729 841 961 1089 1225 1369 1521 1681 1849 2025 2209
[25] 2401 2601 2809 3025 3249 3481 3721 3969 4225 4489 4761 5041 5329 5625 5929 6241 6561 6889 7225 7569 7921 8281 8649 9025
[49] 9409 9801
> nilou.squared <- NULL
> nilou.squared <- rep(NA, 200)
> nilou.squared
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[40] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[79] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[118] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[157] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[196] NA NA NA NA NA
> summary(nilou.squared)
   Mode    NA's 
logical     200 
> for (i in 1:50) {
+   nilou.squared[i] <- amir[i]^2
+ }
> nilou.squared[2]
[1] 9
> nilou.squared
[1] 1 9 25 49 81 121 169 225 289 361 441 529 625 729 841 961 1089 1225 1369 1521 1681 1849 2025
[24] 2209 2401 2601 2809 3025 3249 3481 3721 3969 4225 4489 4761 5041 5329 5625 5929 6241 6561 6889 7225 7569 7921 8281
[47] 8649 9025 9409 9801 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[70] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[93] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[116] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[139] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[162] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[185] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> if (nilou.squared[2] == 9) {
+   sprintf("Nice")
+ } else {
+   sprintf("oops")
+ }
[1] "Nice"
> sprintf("%1.0f", nilou.squared[3])
[1] "25"
> print(nilou.squared[3])
[1] 25
> print.table(nilou.squared[2:5])
[1] 9 25 49 81
> nilou.squared <- nilou.squared[!is.na(nilou.squared)]
> nilou.squared
[1] 1 9 25 49 81 121 169 225 289 361 441 529 625 729 841 961 1089 1225 1369 1521 1681 1849 2025 2209
[25] 2401 2601 2809 3025 3249 3481 3721 3969 4225 4489 4761 5041 5329 5625 5929 6241 6561 6889 7225 7569 7921 8281 8649 9025
[49] 9409 9801
> summary(nilou.squared)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1     651    2501    3333    5551    9801 
> silas.print <- NULL
> i <- 1
> while (i <= length(nilou.squared) && nilou.squared[i] < 1000) {
+   silas.print[i] <- nilou.squared[i]
+   i <- i + 1
+ }
> silas.print
[1] 1 9 25 49 81 121 169 225 289 361 441 529 625 729 841 961
```

## The dot "." in function names

Create an object and give it a class:

```r
a <- list(b = 1)
class(a) <- "myclass"
```

Now declare `myfunction` as an S3 generic in the following way:

```r
myfunction <- function(x, ...) UseMethod("myfunction")
```

Now declare the method for `myclass`:

```r
myfunction.myclass <- function(x, ...) x$b + 1
```

Then the dot has a special meaning. For all objects of class `myclass`, calling `myfunction(a)` will actually call `myfunction.myclass`:

```
> myfunction(a)
[1] 2
```

This is used widely in R; the best-known example is `summary`. Each class has its own summary method, so when you fit a model (which usually returns an object with a specific class) and call `summary()` on it, R calls the summary method for that specific model.
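A short sketch of the `summary` dispatch described above, using the built-in `cars` data, followed by a generic with a fallback `default` method. The generic name `describe` is hypothetical, chosen for illustration:

```r
# summary() is an S3 generic: the method is chosen from class(x)
fit <- lm(dist ~ speed, data = cars)   # built-in cars data set
class(fit)                             # "lm"
s <- summary(fit)                      # dispatches to summary.lm()
class(s)                               # "summary.lm"

# A hypothetical generic with a fallback used when no class-specific method exists
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "no specific method"
describe.myclass <- function(x, ...) paste("myclass with b =", x$b)

a <- structure(list(b = 1), class = "myclass")
describe(a)       # uses describe.myclass
describe(1:3)     # no describe.integer, so describe.default is used
```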

===============================================

R_Scilab_DCOM3.0-1B5.exe

R (D)COM Server and RExcel
This package contains a DCOM server used to connect a client application (e.g. Microsoft Excel) with R.

R (D)COM Server provides a COM interface to R as well as various COM objects and ActiveX controls for your applications. Additionally, an add-in for Microsoft Excel is provided to easily use R in Excel and create statistical applications with Excel as the main GUI. The main features of this package are:

- COM server for local and remote use of R
- Transfer of data into/from R, including NA, NaN, …
- ActiveX controls for text and graphics output
- Installation/uninstallation
- Repository for R instances for shared and exclusive access
- Many samples