Topics:
1. WRITING YOUR OWN FUNCTIONS
2. INPUT/OUTPUT FILES
3. Avoiding loops
4. Factors

1. WRITING YOUR OWN FUNCTIONS

Comparison Operators
--------------------
<    less than
>    greater than
==   equal to
!    not
<=   less than or equal to
>=   greater than or equal to
!=   not equal to

## We create a matrix called size
size.2 <- matrix(c(130,26,110,24,118,25,112,25), ncol=2, byrow=T)
heights <- c(140, 155, 142, 175)
size <- cbind(size.2, heights)
size <- rbind(size, c(128,26,170))

> size[,3] > 160
[1] FALSE FALSE FALSE  TRUE  TRUE

* comparison operators compare data values and return logical values of TRUE (T) when the comparison is true and FALSE (F) when the comparison is false

> size[,1] == 110
[1] FALSE  TRUE FALSE FALSE FALSE

Logical objects are coerced to numbers when used with functions that require numerical values. When this occurs, TRUE is equivalent to 1 and FALSE is equivalent to 0.

> x <- c(1, 2, 3, NA)
> i <- !is.na(x)
> i
[1]  TRUE  TRUE  TRUE FALSE

* the expression !is.na(x) creates a vector of logical values: TRUE when x is not a missing value, FALSE when x is missing

> x <- c(1, 2, 3, NA)
> sum(is.na(x)) > 0
[1] TRUE
> x <- x[!is.na(x)]
> sum(is.na(x)) > 0
[1] FALSE

Functions

Although there are many functions available in R, there are times when a function tailored to a specific situation is needed. Functions are written in the form:

function.name <- function(arguments) {expressions}

where arguments is a list of arguments separated by commas which may be used by the function, and expressions are any legal R expressions. The following is an example of a function in R:

std.dev <- function(x) {
    std.x <- sqrt(var(x))
    std.x
}

The last line in the function gives the name of the value to be returned by the function.

> std.dev(c(1,2,3,4,5))
[1] 1.581139

Conditional Computations

if, else

if( condition ) expr1
or
if( condition ) expr1 else expr2

If the logical expression condition is evaluated to TRUE then the expression expr1 is evaluated.
If condition is evaluated to FALSE, then either the value of the whole expression is NULL, or when an else part is specified, expr2 is evaluated:

> x <- 2
> if (x > 0) log(x)
[1] 0.6931472
> if (x > 100) (x - 100)
> if (x < 0) x else log(x)
[1] 0.6931472

Logical Operators

&&   sequential and
||   sequential or

The operators & and | always evaluate both sides of a logical expression, which is not always necessary. The sequential operators && and || take advantage of this to avoid errors which could crash the function. If the left operand is FALSE, && returns FALSE and the rest of the expression is not evaluated. If the left operand is TRUE, || returns TRUE and the rest of the expression is not evaluated. Both operators can be used only with scalars.

> y <- "Jo"
> if (is.numeric(y) && (y > 0)) log(y)

Because the first logical expression is FALSE, the second expression is never evaluated and log(y), which would fail for a character value, is never called.

stop, warning

stop() issues an error message from a function and causes the function to be exited:

if (y > 0) log(y) else stop("y is not greater than 0")

#try
y <- 10
if (y > 0) log(y) else stop("y is not greater than 0")
y <- -1
if (y > 0) log(y) else stop("y is not greater than 0")
#Error: y is not greater than 0

warning() prints a warning message but does not cause the function to be exited:

> x <- c(1,2,3,4,5,NA)
> if (sum(is.na(x)) > 0) {
    x <- x[!is.na(x)]
    warning("Missing values have been deleted")
  }
Warning message:
Missing values have been deleted
> x
[1] 1 2 3 4 5

###Iteration

for

for() loops are written in the form:

for(index in values) { expressions }

where the curly brackets are optional when only one expression is specified.

> x <- 0
> for(i in 1:10) x <- x+i
> x
[1] 55

for() loops may also be used to construct data objects by using index as a subscript for the variable being created. In this case, the variable must be initialized outside the for() loop.
> rm(x)
> for(i in 1:10) x[i] <- i
Error: Object "x" not found
> x <- NULL
> for(i in 1:10) x[i] <- i
> x
 [1]  1  2  3  4  5  6  7  8  9 10

To create a matrix using two for() loops, a matrix of the correct dimensions must be created outside the for() loops.

> x <- matrix(0, nrow=3, ncol=4)
> for(i in 1:3) {
    for(j in 1:4) {
      x[i,j] <- i+j
    }
  }
> x
     [,1] [,2] [,3] [,4]
[1,]    2    3    4    5
[2,]    3    4    5    6
[3,]    4    5    6    7

##while

The while() expression keeps testing a condition and, so long as the condition is TRUE, evaluates the expressions provided. while() iterations are written in the form:

while(condition) { expressions }

x <- 2
i <- 0
while(x*2 < 1000000) {
    x <- x*2
    i <- i+1
}
x
[1] 524288
i
[1] 18

Notice that in order for the last value of x to be less than 1000000, the condition in the while() expression must be x*2 < 1000000 and NOT x < 1000000.

##repeat

A repeat expression could also have been used in the previous example:

x <- 2
i <- 0
repeat {
    i <- i+1
    x <- x*2
    if(x*2 > 1000000) break
}
> x
[1] 524288
> i
[1] 18

repeat will keep repeating the expressions forever unless some condition is set within the iteration to break out of the loop. Here, a break expression terminates the loop. When R evaluates a break, it exits the innermost enclosing for(), while(), or repeat loop.

###Default arguments, return

As mentioned in previous sections, a function may have a default setting for some of its arguments. An example of this was the mean() function, where the default for the argument trim= is trim=0. The first line of the function mean() looks like this (in current versions of R these defaults are defined in the method mean.default()):

mean <- function(x, trim = 0, na.rm = F)

trim: fraction (between 0 and .5, inclusive) of values to be trimmed from each end of the ordered data. If trim=.5, the result is the median.

Specifying default values for some of the arguments in a function allows the user to omit those arguments when making a call to the function. Typing

> mean(x)

returns a regular mean, and specifying the trim= argument, for example

> mean(x, trim=0.2)

will return a trimmed mean.
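The effect of a default argument can be checked on a small vector. The following is a minimal sketch, not the actual definition of mean(): the function trimmed.mean and the vector v are hypothetical names introduced only for illustration.

```r
# Hypothetical wrapper with a default value for trim=, mirroring
# the way mean() lets the caller omit that argument.
trimmed.mean <- function(x, trim = 0) {
  mean(x, trim = trim)
}

v <- c(1, 2, 3, 4, 100)      # illustrative data with one outlier

trimmed.mean(v)              # trim omitted, default 0: ordinary mean
trimmed.mean(v, trim = 0.2)  # drops the smallest and the largest value
trimmed.mean(v, trim = 0.5)  # trim = .5 gives the median
```

With these values the three calls should give 22, 3 and 3: the outlier dominates the ordinary mean but has no effect once it is trimmed.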
The same reasoning applies to the na.rm= argument.

> x <- c(NA,2,1)
> mean(x)
[1] NA
> mean(x, na.rm = T)
[1] 1.5

Objects which are created or modified inside a function are temporary, have no effect outside the function, and disappear when the evaluation of the function is complete. There are several ways of returning the value of a function, one of which is to name the value to be returned on the last line of the function. This value can be stored permanently by assigning a name to the function evaluation. For example:

> x.mean <- mean(x)

creates a data object, x.mean, with the mean of the variable x.

It is possible to return more than one data object by collecting the objects in a list. In R, return() accepts a single value, so the objects are combined with list() and the list is returned.

example <- function(x) {
    x.mean <- mean(x)
    x.var <- var(x)
    x.sum <- sum(x)
    return(list(x.mean = x.mean, x.var = x.var, x.sum = x.sum))
}

> x <- c(1,2,3,4,5)
> example(x)
$x.mean
[1] 3

$x.var
[1] 2.5

$x.sum
[1] 15

----------------------------------------------------------------------
2. INPUT/OUTPUT

## read table

read.table("climate") -> climate
# Reads a file in table format and creates a data frame from it,
# with cases corresponding to lines and variables to fields in the
# file.

##cat

The cat() function coerces its arguments to mode character and prints the result. This function is very useful for keeping track of the progress of loops.

> y <- 0
> for(i in 1:3) {
    y <- y+i
    cat("i is", i, "\n")
  }
i is 1
i is 2
i is 3

###scan

The scan() function reads in numeric data interactively or from a file. The argument n= specifies how many data values are to be read in.

example <- function() {
    cat("Enter 10 numbers", "\n")
    x <- scan(n=10)
    total <- sum(x)
    cat("The total is", total, "\n")
}

#type
> example()

The scan() function can be used to read numeric data into a matrix. Suppose the file lab2.dat contains rainfall for two months from 1900-1905, and that the data is stored in 2 columns.
The data would be read in as follows:

> rain <- matrix(scan("lab2.dat"), ncol=2, byrow=T)

To read in character data, the argument what=character() must be specified. More information on the options for the scan() and cat() functions is available in the help documentation.

## paste: Concatenate vectors after converting to character

'paste' converts its arguments to character strings, and concatenates them (separating them by the string given by 'sep'). If the arguments are vectors, they are concatenated term-by-term to give a character vector result.

paste("Test", 1:10, sep = "")
 [1] "Test1"  "Test2"  "Test3"  "Test4"  "Test5"  "Test6"  "Test7"  "Test8"
 [9] "Test9"  "Test10"

paste("Today is", date())
[1] "Today is Wed Sep 22 14:16:07 2004"
# date() returns a character vector with the current date and time.

### source

Simple functions may be typed directly into R, but this is not advisable in the case of longer functions. These should be typed in a UNIX file and read into R using the source() function. Suppose the function example was stored in a file called ex1. The following command would read the function into R:

> source("ex1")

The source() function may be used to read in any R expression. Rather than typing the expressions directly into R, they can be stored in a file and evaluated using the source() function.
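The source() workflow described above can be tried without creating a file by hand. This is a minimal sketch under stated assumptions: a temporary file stands in for a file such as ex1, and the function name double.it is hypothetical.

```r
# Write a small function definition to a temporary file
# (the temporary file stands in for a file such as ex1).
tmp <- tempfile(fileext = ".R")
writeLines(c(
  "double.it <- function(x) {",
  "  2 * x",
  "}"
), tmp)

# source() evaluates the expressions stored in the file, so the
# hypothetical function double.it now exists in the workspace.
source(tmp)
double.it(1:3)

# Remove the temporary file; the function stays in the workspace.
unlink(tmp)
```

The function remains available after the file is deleted, because source() evaluates the file contents once and keeps the resulting objects.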
Suppose the file ex2 contains the following R expressions:

friends <- c("Jack","Jill")
phones <- c(5554321,5551234)
cat(friends[1], phones[1], "\n")
cat(friends[2], phones[2], "\n")

The expressions are then evaluated using the source() function:

> source("ex2")
Jack 5554321
Jill 5551234

Unlike data objects created within a function, those created from a file outside R and evaluated using source() remain in the workspace after the expressions have been evaluated:

> friends
[1] "Jack" "Jill"

# function that standardizes a data set
#
sdz <- function(x) {
    (x - mean(x))/std.dev(x)
}

ls()   # you can see that this is stored as an object in your workspace
sdz    # list out this function
#
# notice the basic parts: the calling arguments (only one in this case)
# and the body enclosed in curly brackets.
#
# The last line of a function is the object that is returned
# (in this case there is only one line so of course it is the last)
#
sdz(climate$jan) -> temp

# here is another example of a function that returns a list
pm <- function(a, b) {
    temp1 <- a+b
    temp2 <- a-b
    list(plus=temp1, minus=temp2)
}

pm(1:10, 10:1)
# note that like everything else in R it works on vectors

iqs <- function(x) {
    temp <- quantile(x, c(.25,.75))
    con <- qnorm(.75) - qnorm(.25)   # theoretical interquartile range for N(0,1)
    (temp[2]-temp[1])/con            # standardized to be approx 1 for N(0,1)
}

# Now try it out!
iqs(rnorm(500))
rcauchy(50) -> zork
iqs(zork)

----------------------
3. AVOID LOOPS (for efficient programming)

system.time(expr)

Returns the CPU times that "expr" used: a numeric vector of length 5 containing the user cpu, system cpu, elapsed, subproc1, subproc2 times. The subproc times are the user and system cpu time used by child processes (and so are usually zero). Current versions of R print only the user, system and elapsed times.

For example (example provided by Li Chen):

system.time(for(i in 1:50) x <- mean(rnorm(1000)))
[1] 0.28 0.00 0.28 0.00 0.00

# The other way to get the time used is the proc.time() function.
# `proc.time' determines how much time (in seconds) the currently
# running R process has already consumed. For example:
now <- proc.time()
for(i in 1:50) x <- mean(rnorm(1000))
proc.time() - now
[1] 0.29 0.01 0.31 0.00 0.00

For example, to calculate the mean of 1, 2, ..., 10000 with a loop:

x <- c(1:10000)
total <- 0
now <- proc.time()
for(i in 1:10000) total <- total + x[i]
meanx <- total/10000
proc.time() - now
#[1] 0.41 0.00 0.42 0.00 0.00

Use vectorized arithmetic instead of loops:

system.time(sum(x)/length(x))
#[1] 0.01 0.00 0.01 0.00 0.00

Or use the built-in R function:

system.time(mean(x))
#[1] 0 0 0 0 0

##########################
4. Factors

A factor is an object that represents values from some specified set of possible levels. For example, a factor Sex might represent one of two values, "Male" or "Female".

Creating factors

Factors may be created using the factor() function or by converting a character or numeric object using the factor() or as.factor() functions. Factors may also be created by splitting a data object into groups. These methods will be illustrated to create the following factor:

[1] 20-35yrs 20-35yrs 35-55yrs 35-55yrs 20-35yrs 55+yrs   20-35yrs 35-55yrs

1) Using the factor() function:

> age <- factor(c(1,1,2,2,1,3,1,2), labels=c("20-35yrs","35-55yrs","55+yrs"))
> age
[1] 20-35yrs 20-35yrs 35-55yrs 35-55yrs 20-35yrs 55+yrs   20-35yrs 35-55yrs
Levels: 20-35yrs 35-55yrs 55+yrs

2) Converting a numeric object:

> age <- c(1,1,2,2,1,3,1,2)
> age <- factor(age, labels=c("20-35yrs","35-55yrs","55+yrs"))

3) Converting a character object:

> age <- c("20-35yrs","20-35yrs","35-55yrs","35-55yrs","20-35yrs","55+yrs",
           "20-35yrs","35-55yrs")
> age <- as.factor(age)

The function as.factor() may also be used to convert a numeric object to a factor object; however, it is not possible to assign labels to the factor levels using the function as.factor().
4) Splitting a data object into groups:

> age <- c(22,31,37,52,27,60,34,53)
> age.groups <- cut(age, breaks=c(20,35,55,80),
                    labels=c("20-35yrs","35-55yrs","55+yrs"))
> age.groups
[1] 20-35yrs 20-35yrs 35-55yrs 35-55yrs 20-35yrs 55+yrs   20-35yrs 35-55yrs
Levels: 20-35yrs 35-55yrs 55+yrs

In R, the function cut() returns a factor object directly, so no conversion is needed. Calling factor() on the result is still allowed and only drops any unused levels:

> age <- factor(age.groups)

The function cut() may also be used to split a data object into groups of equal width:

> age <- c(22,31,37,52,27,60,34,53)
> cut(age, breaks=3)
[1] (22,34.7]   (22,34.7]   (34.7,47.3] (47.3,60]   (22,34.7]   (47.3,60]
[7] (22,34.7]   (47.3,60]
Levels: (22,34.7] (34.7,47.3] (47.3,60]

The function pretty() creates "pretty" break points which can then be used by the function cut() to split the data:

> age <- c(22,31,37,52,27,60,34,53)
> cut(age, pretty(age))
[1] (20,30] (30,40] (30,40] (50,60] (20,30] (50,60] (30,40] (50,60]
Levels: (20,30] (30,40] (40,50] (50,60]

##Table

The table() function creates a contingency table.

> age <- c("20-35yrs","20-35yrs","35-55yrs","35-55yrs","20-35yrs","55+yrs",
           "20-35yrs","35-55yrs")
> table(age)
20-35yrs 35-55yrs   55+yrs
       4        3        1

Any number of arguments may be given to the table() function:

> sex <- factor(c(1,2,2,1,2,1,2,2), labels=c("Female","Male"))
> table(sex, age)
         20-35yrs 35-55yrs 55+yrs
Female          1        1      1
Male            3        2      0

----------------
Assignment:

a) Write a function in R that calculates the median of each column of a given matrix.

b) Write a function in R that calculates the median of each column of a given matrix, and accepts missing values (NA) in some of the matrix rows.

c) Write a function in your directory that calculates the mean and standard deviation of a variable. Use the source command to get access to that function and apply it to the variable rain in the climate dataset (climate$rain).

d) Calculate the time it takes to run your function.