INTRO LECTURE TO R

File with computer lab notes:

Click on your R icon, go to FILE and click on "Change directory", and enter
the directory you want to work in, i.e. the one that holds your data files.

Go to FILE and click on "Load workspace" to retrieve the objects saved in a
previous R session.

Go to PACKAGES and click on "Load package" to get access to a library of
R functions of interest.

To learn precisely what each function does, and to get on-line documentation,
type help at the R prompt ">"

> help(help)

For example, to get help about the function plot, type

> help(plot)

or just

> ?plot

To bring up the HTML help pages in your web browser, type

> help.start()

Please type help(INSTALL) or help(install.packages) in R for information on
how to install packages from this directory.

> help(install.packages)
__________________________________________________________________________
1. To find out which additional packages are available on your system, type
       library()
   at the R prompt. (It seems that all the packages have already been installed.)

2. You can load an installed package, say geoR, with
       library(geoR)

3. To find out which functions are provided by the package geoR, use
       library(help=geoR)
   or
       help(package=geoR)

4. Unload the loaded package with
       detach("package:geoR")
======================================================================
# R is case sensitive. All objects created during an R session can be stored
# permanently in a file for use in future R sessions.
# At the end of each R session you are given the opportunity to save
# them. If you choose to do so, the objects are written to
# a file called .RData in the current directory.
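The menu steps and the note about .RData can also be carried out from the prompt. The following is only a minimal sketch and not part of the original lab: tempdir() stands in for your own data directory, the object height is created just to have something to save, and the package stats is used because it comes with every R installation (geoR may not be installed on your machine).

```r
# Script equivalents of the menu steps (a sketch under the assumptions above).
old.dir <- getwd()              # remember the current working directory
setwd(tempdir())                # same effect as FILE > "Change directory"

height <- 175                   # an object worth saving
save.image(file = ".RData")     # write all objects in the workspace to .RData
rm(height)                      # the object is now gone from the session ...
load(".RData")                  # ... and comes back, like FILE > "Load workspace"

exists("height")                # TRUE
exists("Height")                # FALSE, because R is case sensitive

library(stats)                  # same effect as PACKAGES > "Load package"
"package:stats" %in% search()   # TRUE while the package is attached

setwd(old.dir)                  # return to the original directory
```

Typing the commands rather than using the menus has the advantage that they can be kept in a file and run again in a later session.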
Entering Data in R:

Scalar

> height <- 175
      * read as "height gets 175"
      * assigns the value 175 to the scalar name height

> person <- "Jo"
      * character values are inserted in quotes
      * if the quotes are omitted, R will look for a data object Jo
        to assign to person

Vector

> heights <- c(160,140,155)
      * the function c() "collects" the values 160, 140, and 155 and
        stores them into the vector heights

> people <- c("Ned","Jill","Pat")
      * creates a vector of names

> names(heights) <- people
      * the names() function assigns names to the elements of a vector
      * the word people is not inserted in quotes: it refers to the
        vector people, and not the word itself

> heights
 Ned Jill  Pat
 160  140  155
      * typing the name of an object by itself causes its value to be
        printed on the terminal

> heights["Ned"]
Ned
160
      * when an object has a names attribute, its elements can be
        referred to by name

> names(heights) <- NULL
      * deletes the names attribute of the vector heights

Extracting data from an object using subscripts

> object[subscript]
      * syntax for subscripting, where object is the name of the data
        object and subscript defines which elements to extract
      * the expression heights["Ned"] above is an example of extracting
        data using a subscript

> heights[2]
[1] 140
      * extracts the second element from heights
      * [1] refers to the position of the first element on the given
        line - this is very useful when vectors are several lines long

> heights[c(2,1,2)]
[1] 140 160 140
      * extracts the second, first, and second elements from heights
      * the c() function is used in the subscript when more than one
        element is listed

> heights[heights < 160]
[1] 140 155
      * returns all the values of heights which are less than 160
      * this is NOT equivalent to
          > heights[ < 160]
        which is not valid R and gives a syntax error: the condition
        must name the vector whose values are being compared

> heights[-2]
[1] 160 155
      * returns all except the second value in heights

> heights[1] <- 162
> heights
[1] 162 140 155
      * assigns the value 162 to the first element of heights
      * the old object heights has now been replaced by the new
        object heights

> heights[4] <- 135
> heights
[1] 162 140 155 135
      * appends the value 135 to the vector heights

> heights <- append(heights,height)
> heights
[1] 162 140 155 135 175
      * the function append() creates a new vector with the first values
        the same as heights and the last value as height (recall that
        height was previously assigned the value 175)
      * the function append() binds two objects into a vector
      * the arguments may be vectors, scalars, or both
      * this is equivalent to
          > heights <- c(heights,height)

> heights.1 <- append(heights,180,after=2)
> heights.1
[1] 162 140 180 155 135 175
      * the argument after specifies the index of heights after which
        the new values are to be inserted

> heights <- replace(heights,2,142)
> heights
[1] 162 142 155 135 175
      * replaces the second value in heights with the value 142 and
        stores the new vector in heights
      * the first argument specifies the name of the data object, the
        second specifies the indices of the elements to be replaced,
        and the third specifies the values the elements are to be
        replaced with
      * this expression is equivalent to
          > heights[2] <- 142

> heights.2 <- replace(heights,c(2,4),c(140,142))
> heights.2
[1] 162 140 155 142 175
      * replaces the second and fourth values of heights by the values
        140 and 142 and stores the result into heights.2

> numbers <- 1:5
> numbers
[1] 1 2 3 4 5
      * the operator ":" creates a sequence from 1 to 5
      * the syntax for the sequence operator is from:to

> heights <- heights.2[2:5]
> heights
[1] 140 155 142 175
      * assigns the last four elements of the vector heights.2 to the
        vector heights

> length(heights)
[1] 4
      * returns the length (number of elements) of the object heights

The data objects you have just created are held in your workspace (and are
written to the file .RData if you choose to save them at the end of the
session). To see a list of the data objects (and later on, functions) you
have created, type

ls()

To remove an object or function from your workspace, use the rm() function.
For example, to remove the scalar height, type

rm(height)

You can also remove more than one data object at a time. To remove the
scalar person and the vector numbers, type

rm(person,numbers)

MATRICES

> size.1 <- matrix(c(130,26,110,24,118,25,112,25),ncol=2)
> size.1
     [,1] [,2]
[1,]  130  118
[2,]   26   25
[3,]  110  112
[4,]   24   25
      * the function matrix() reads data into a matrix
      * the number of columns is specified using the argument ncol= #
      * alternatively, the number of rows can be specified using the
        argument nrow= # or both nrow and ncol can be specified
      * when neither nrow nor ncol is specified, the data is read in as
        a one-column matrix

> size.2 <- matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T)
> size.2
     [,1] [,2]
[1,]  130   26
[2,]  110   24
[3,]  118   25
[4,]  112   25
      * specifying byrow=T forces R to read the data in row by row
      * when the argument is not specified, or specified as byrow=F,
        R assumes the data is written in column by column

LISTS

Names can be assigned to the rows and to the columns of matrices using the
dimnames() and list() functions. The list() function may be used to combine
data objects of different modes (e.g. numeric, character, ...) or different
types (vector, matrix) into one object of mode list.

Here, the list() function is used to combine two vectors of different
lengths. The list is therefore made up of two components: the first
component corresponds to the row names, and the second component
corresponds to the column names.
> size.names <- list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist"))
> size.names
[[1]]
[1] "Abe"   "Bob"   "Carol" "Deb"

[[2]]
[1] "Weight" "Waist"

Notice the double square brackets: whereas single square brackets are used
to extract data from a vector, double square brackets are used to extract
components from a list:

> size.names[[2]]
[1] "Weight" "Waist"

The individual components in the list retain their properties as vectors
and as such, individual elements can be extracted from each component in
the same way as in any other vector:

> size.names[[2]][2]
[1] "Waist"

Names can also be assigned to the components of a list:

> names(size.names) <- c("Rows","Columns")
> size.names
$Rows
[1] "Abe"   "Bob"   "Carol" "Deb"

$Columns
[1] "Weight" "Waist"

The components of the list can then be extracted using their names
attribute:

> size.names$Rows
[1] "Abe"   "Bob"   "Carol" "Deb"

> size.names$Rows[2]
[1] "Bob"

> dimnames(size.2) <- size.names
> size.2
       Columns
Rows    Weight Waist
  Abe      130    26
  Bob      110    24
  Carol    118    25
  Deb      112    25
      * the dimnames() function assigns names to the dimensions of a
        data object (in this case, the rows and columns of size.2)
      * because the components of size.names are themselves named, R
        also prints those names (Rows and Columns) as headings for the
        two dimensions

> size.2 <- matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T,
+   dimnames=list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist")))
      * it is possible to assign dimnames directly from within the
        matrix function
      * expressions can be spread over several lines: simply hit return
        at the end of the line and R prompts for a continuation line by
        means of the "+" character (this may also happen if you forget
        to close all open brackets or strings)

> dimnames(size.2) <- list(NULL,c("Weight","Waist"))
      * the NULL object is used when no dimnames are to be assigned to
        a dimension

> abc <- size.2
> dimnames(abc) <- list(c("Abe","Bob","Carol","Deb"),dimnames(size.2)[[2]])
      * this command assigns dimnames to the rows of abc and assigns the
        column dimnames of size.2 to the columns of abc

> size <- cbind(size.2,heights)
> size
     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
      * cbind() (column bind) "binds" together vectors and matrices
        columnwise into a new matrix
      * cbind() "binds" the vector heights columnwise to the matrix
        size.2 and stores the resulting matrix in size
      * the name heights is automatically assigned to the third column
        of the matrix size

> size <- rbind(size,c(128,26,170))
> size
     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170
      * rbind() (row bind) "binds" together vectors and/or matrices
        rowwise into a new matrix

> x <- c(1,2,3)
> y <- diag(x)
> y
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
      * the function diag() creates a matrix with the vector x on the
        main diagonal
      * the main diagonal of a matrix consists of those elements whose
        row number and column number are the same
      * the number of rows or columns can be specified using the
        arguments nrow or ncol

> diag(y)
[1] 1 2 3
      * alternatively, when the argument is a matrix, diag() returns
        the diagonal of the matrix

> col(y)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
      * the function col() returns a matrix of column numbers
      * similarly, the function row() returns a matrix of row numbers

Extracting data from a matrix

> size[2,3]
heights
    155
      * to extract one value from a matrix, it is necessary to use two
        elements in the subscript: the first element applies to the
        rows, the second element applies to the columns
      * the full subscript expression applies to the elements of the
        matrix that satisfy both the row and the column condition
      * in this case, the element in the second row, third column of
        the matrix size is printed

> size[2,]
 Weight   Waist heights
    110      24     155
      * if one dimension is not specified in the subscript, all
        elements in that dimension are extracted
      * in this case, the columns are not specified so all the columns
        are included

> size[,3]
[1] 140 155 142 175 170
      * prints the third column of the matrix size
      * in both examples, the comma must be kept in as a marker to
        indicate which dimension is specified
      * in both of these examples, R drops the extra dimension so that
        the result is a vector

> size[2, ,drop=F]
     Weight Waist heights
[1,]    110    24     155
      * to retain the matrix properties for the result (which might be
        necessary in some computations), add drop=F to the subscripts
      * notice that two commas were used in the subscript, one to
        separate the row from the column (not specified) dimensions,
        the other to separate the indices from the argument drop

> is.matrix(size[2,])
[1] FALSE
      * is.matrix() is a function that returns a logical value
        indicating whether an object is a matrix

> is.matrix(size[2, ,drop=F])
[1] TRUE
      * as seen above, when a single row or column is extracted from a
        matrix, the matrix properties are dropped unless otherwise
        specified in the argument drop

> size[,c(1,3)]
     Weight heights
[1,]    130     140
[2,]    110     155
[3,]    118     142
[4,]    112     175
[5,]    128     170
      * the c() function is used in matrix subscripts in the same way
        as it is used in vector subscripts
      * here, the first and third columns of the matrix size are
        printed out

> size[,c("Weight","Waist")]
     Weight Waist
[1,]    130    26
[2,]    110    24
[3,]    118    25
[4,]    112    25
[5,]    128    26
      * character subscripts are used in the same way as numeric
        subscripts: the first element in the subscript specifies the
        rows, and the second element in the subscript specifies the
        columns

> size[-2,-3]
     Weight Waist
[1,]    130    26
[2,]    118    25
[3,]    112    25
[4,]    128    26
      * negative subscripts have the same meaning for the rows and
        columns of matrices that they have for elements of a vector

Suppose you wished to print the weights of those people taller than 160cm:
the expression size[,1] will print all the weights in the matrix size. It
is necessary to limit the rows to be printed to those rows where the value
for heights (column 3) is greater than 160cm, i.e. those rows which satisfy
the condition 'size[,3] > 160'.
Combining these two expressions gives

> size[size[,3] > 160,1]
[1] 112 128
      * this command pulls out the weights (column 1) of those people
        (rows) with height (size[,3]) greater than 160

> size[size[,3] > 160,2]
[1] 25 26
      * this command pulls out the waist measurements (column 2) of
        those people (rows) with height (size[,3]) greater than 160

Matrix Attributes

> dim(size)
[1] 5 3
      * the dim() function returns the dimensions of an object
      * in the case of matrices, the first element is the number of
        rows in the matrix and the second element is the number of
        columns

> nrow(size)
[1] 5
> ncol(size)
[1] 3
      * the functions nrow() and ncol() are based on the function dim()
        and return the number of rows or the number of columns in the
        matrix

==============================================================
Data types in R

array       Multi-Way Arrays
category    Category
character   Character objects
complex     Complex Valued objects
double      Double Precision objects
factor      Factor objects
function    Function objects
integer     Integer objects
list        List objects
logical     Logical objects
matrix      Matrix objects
na          Missing values
name        Name objects
null        Null objects
numeric     Numeric objects
single      Single Precision objects
ts          Time Series objects
vector      Vectors

is.<data type>() is used to test the data type of a data object, for
example is.numeric().
as.<data type>() changes the data type of a data object, for example
as.numeric().

> age <- c("10","18","76","65","32")
> age
[1] "10" "18" "76" "65" "32"
> age + 10
Error in age + 10 : non-numeric argument to binary operator
> is.numeric(age)
[1] FALSE
> age <- as.numeric(age)
> age
[1] 10 18 76 65 32
> age + 10
[1] 20 28 86 75 42

> x <- c(1,2,3,NA)
> is.na(x)
[1] FALSE FALSE FALSE  TRUE

To keep only the values of x that are not missing:

> x <- x[!is.na(x)]
> x
[1] 1 2 3

==================================================================
EXERCISES

Lab 1:

a) Create the following matrix called grades, and put in the appropriate
   label names.
     Test1 Test2 Test3 Final
[1,]    20    23    18    48
[2,]    16    15    18    40
[3,]    25    20    22    40
[4,]    14    19    18    42

b) Add the following row to the bottom of the matrix:
     10 15 14 30

c) Change the fifth grade for test #2 from a 15 to a 17.

d) Print all the grades for test #3.

e) Print the final grades for those people with grades greater than 16 on
   test #1.

f) Print the grades matrix without the column for test #3.

g) Print the number of rows in the matrix.
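
Before starting, it may help to rehearse the operations the exercises rely on with a small throwaway matrix. The sketch below is not a solution to the lab: the matrix m, its values, and the column labels "A" and "B" are invented purely for illustration.

```r
# A throwaway 3 x 2 matrix; values and labels are made up for this sketch.
m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("A", "B")))

m <- rbind(m, c(7, 8))        # add a row at the bottom
m[4, "B"] <- 9                # change a single element in place
m[, "A"]                      # print one whole column
m[m[, "A"] > 3, "B"]          # column B for the rows where A exceeds 3
m[, -1, drop = FALSE]         # everything except column 1, kept as a matrix
nrow(m)                       # number of rows
```

Each line corresponds to a technique from the notes above (rbind(), assignment to a subscript, logical row subscripts, negative subscripts, nrow()), so the same patterns can be adapted to the grades matrix.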