Notes for Lab 2: Computations and Data Manipulations
___________________________________

Data Transformations

1. The following functions compute elementary numerical results on a
   vector x. They work equally well on a scalar, matrix, or any other
   numerical object.

      ceiling   floor   trunc   round   signif   print

> x<-c(-1.90691,0.76018,-0.26556,-1.89828,0.08571,NA)
      * NA means "not available", or missing
      * normally, the result of operating on an NA is another NA

> ceiling(x)
[1] -1  1  0 -1  1 NA
      * in this case, where x is a vector, the function is applied to
        each element in x
      * the ceiling() function rounds UP to the next integer

> floor(x)
[1] -2  0 -1 -2  0 NA
      * floor() rounds DOWN to the next integer

> trunc(x)
[1] -1  0  0 -1  0 NA
      * trunc() returns only the integer part of the elements of x

> round(x)
[1] -2  1  0 -2  0 NA
      * round() rounds values to the nearest integer value
      * values of n.5 are rounded to the nearest even integer

> round(x,1)
[1] -1.9  0.8 -0.3 -1.9  0.1   NA
      * optionally, round() can take a second argument which specifies
        the number of decimal places to round to

> signif(x,2)
[1] -1.900  0.760 -0.270 -1.900  0.086     NA
      * signif() rounds data to the specified number of significant
        digits
      * all the numbers are printed to the same format
      * 0's are added where necessary

> print(x,digits=1)
[1] -1.91  0.76 -0.27 -1.90  0.09    NA
      * with the optional argument digits, print() shows the numeric
        object x with at least that many significant digits in each
        element
      * all the elements of x are printed to the same format, so two
        decimal places are used here because 0.09 needs them to show
        one significant digit

2. The following functions are used on vectors. In some cases, they may
   also be used on matrices, but you may not always get the result you
   expect.

      sum   prod   cumsum   cumprod   diff

> x<-(1:5)

> sum(x)
[1] 15
      * sum() calculates the sum of all the values in x

> prod(x)
[1] 120
      * prod() calculates the product of all the elements in x

> cumsum(x)
[1]  1  3  6 10 15
      * cumsum() returns an object in which each element is the sum of
        all the elements in x up to that point
      * if x is a matrix, cumsum() will find the cumulative sums
        columnwise

> cumprod(x)
[1]   1   2   6  24 120
      * cumprod() returns an object in which each element is the
        product of all the elements in x up to that point

> x<-c(1,4,8,2,1)
> diff(x)
[1]  3  4 -6 -1
      * returns an object where the ith element is equal to
        x[i+1]-x[i]
      * when x is a matrix, the function diff() calculates the
        differences separately for each column

> diff(x, lag=2)
[1]  7 -2 -7
      * optionally, the argument lag can be specified such that the
        ith element is equal to x[i+lag]-x[i]
        e.g., in this case 8-1, 2-4, 1-8
      * the diff() function returns a vector with length(x)-lag
        elements

Arithmetic Operators

   +     Addition
   -     Subtraction
   *     Multiplication (performs elementwise multiplication on a matrix)
   /     Division
   ^     Exponentiation     x^2 == x*x
                            x^(1/3) == the cube root of x (for x >= 0)
   %/%   Integer divide     e1%/%e2 == floor(e1/e2)
   %%    Modulo function    e1%%e2 == e1 - (e1%/%e2)*e2

The usual arithmetic operators work as one would expect: if x is a
numeric object, then x*2 multiplies each element of x by 2.
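
As a small illustration of elementwise arithmetic and of the two
identities in the operator table, the sketch below can be pasted into an
R session. The values in e1 and e2 are made up for this example and are
not part of the lab data.

```r
# Hypothetical example values, chosen only to illustrate the operators.
e1 <- c(7, -7, 20)
e2 <- 3

doubled   <- e1 * 2      # each element of e1 is multiplied by 2
quotient  <- e1 %/% e2   # integer divide, the same as floor(e1 / e2)
remainder <- e1 %% e2    # modulo, the same as e1 - (e1 %/% e2) * e2

# Putting the two pieces back together should recover e1.
rebuilt <- quotient * e2 + remainder

print(doubled)
print(quotient)
print(remainder)
print(all(rebuilt == e1))
```

Note that for the negative element the integer divide rounds down
(towards minus infinity), so the remainder is never negative when e2 is
positive.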

> x<-c(-24,-99,82,15)
> y<-c(2,3)
> x/y
[1] -12 -33  41   5
      * when one argument is longer than the other, the shorter
        argument is used cyclically, if necessary

> x%/%10
[1]  -3 -10   8   1
      * this is equivalent to floor(x/10)
      * returns 0 when dividing by 0
      * when x is a positive number, %/% returns the integer part of /

> x%%10
[1] 6 1 2 5
      * this is equivalent to x-10*(x%/%10) (i.e., the remainder of
        %/%)
      * returns x when dividing by 0

Numerical Transformations

   Name                  Operation
   sqrt                  square root
   abs                   absolute value
   sin cos tan           trigonometric functions (radians)
   asin acos atan        inverse trigonometric functions
   sinh cosh tanh        hyperbolic functions
   asinh acosh atanh     inverse hyperbolic functions
   exp log               exponential and natural logarithm
   log10                 common logarithm
   gamma lgamma          gamma function and its natural log

      - gamma(x) = (x-1)! when x is a positive integer
      - use the argument base= to change the base of the log function,
        i.e., log(x,base=10) is the same as log10(x)

Matrix Operations

The following functions apply specifically to matrices (on first
reading, ignore all but t(), solve(), and %*%):

   Name        Usage            Operation
   t           t(A)             transpose
   %*%         A%*%B            matrix multiply
   crossprod   crossprod(A,B)   cross product
   outer       outer(A,B)       outer product
   svd         svd(A)           singular value decomposition
   qr          qr(A)            QR decomposition
   solve       solve(A,B)       solve equations or invert matrices
   eigen       eigen(A)         eigenvalues and eigenvectors
   chol        chol(A)          Choleski decomposition

      - crossprod(A,B) is equivalent to t(A) %*% B
      - crossprod(A) is equivalent to crossprod(A,A)
      - the functions outer(), svd(), qr(), eigen(), and chol() can all
        take optional arguments; these are described in the help
        documentation
      - solve(A,B) finds the solution to the system of equations
        A %*% X = B
      - solve(A) finds the inverse of A

> square<-matrix(c(1,2,3,4,5,6,7,8,9),nrow=3)
> square
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

> decomp<-eigen(square)
> decomp$values
[1]  1.611684e+01 -1.116844e+00 -1.652234e-16
> decomp$vectors
           [,1]       [,2]       [,3]
[1,] -0.5598757  0.8251730 -0.3767961
[2,] -0.6879268  0.2238583  0.7535922
[3,] -0.8159780 -0.3774565 -0.3767961
      * eigen() creates an object of mode list with two components:
        $values and $vectors
      * alternatively, eigenvalues could have been obtained using
        eigen(square)$values, and eigenvectors using
        eigen(square)$vectors

Recall that the matrix size used in part 1 is:

     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170

The function apply() can be used to find the mean for each column in
size:

> colmean<-apply(size,2,mean)
> colmean
 Weight   Waist heights
  119.6    25.2   156.4
      * the first argument gives the name of the matrix to which the
        function will be applied
      * the second argument gives the dimensions over which the
        function is to be applied
          - in the case of a matrix, 1 indicates rows, 2 indicates
            columns
      * the third argument gives the name of the function to be
        applied; functions other than mean can be specified here

> sweep(size,2,colmean)
     Weight Waist heights
[1,]   10.4   0.8   -16.4
[2,]   -9.6  -1.2    -1.4
[3,]   -1.6  -0.2   -14.4
[4,]   -7.6  -0.2    18.6
[5,]    8.4   0.8    13.6
      * sweep "sweeps out" the column means from the matrix size
      * the first two arguments in sweep are the same as in apply()
      * the third argument is a vector containing the values to be
        "swept out" of the matrix

> sweep(size,1,c(1,2,3,4,5),"+")
     Weight Waist heights
[1,]    131    27     141
[2,]    112    26     157
[3,]    121    28     145
[4,]    116    29     179
[5,]    133    31     175
      * by default, sweep subtracts the values in the third argument
        from the rows or columns of the matrix
      * this can be changed by specifying the function in the fourth
        argument
      * in this example, 1 is added to the first row, 2 to the second
        row, etc.
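
The apply() and sweep() steps above can be combined into one short,
self-contained sketch. It assumes a current R session and rebuilds size
from the values printed in these notes; the second half, which divides
each column by its standard deviation using sd(), is an illustrative
extension that the notes themselves do not use.

```r
# The matrix 'size', typed in from the printout in these notes.
size <- matrix(c(130, 26, 140,
                 110, 24, 155,
                 118, 25, 142,
                 112, 25, 175,
                 128, 26, 170),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("Weight", "Waist", "heights")))

colmean  <- apply(size, 2, mean)      # one mean per column
centered <- sweep(size, 2, colmean)   # subtract each column's own mean

# After centering, every column mean should be zero (up to rounding).
print(round(apply(centered, 2, mean), 10))

# Same pattern, different function and operator (illustrative choice):
# divide each centered column by its standard deviation.
colsd  <- apply(centered, 2, sd)
scaled <- sweep(centered, 2, colsd, "/")
print(round(apply(scaled, 2, sd), 10))
```

The point of the sketch is that the second argument (2, for columns)
must match in both calls, so that the vector produced by apply() lines
up with the dimension that sweep() works along.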

Data Manipulations

      rep   seq   rev

> rep(c(4,2),times=2)
[1] 4 2 4 2
      * the rep() function replicates input either a certain number of
        times or to a certain length

> rep(c(4,2),times=c(2,1))
[1] 4 4 2
      * when times is a single value, then the first argument is
        repeated that many times
      * when times is a vector, then each element in the first
        argument is matched with a number of times in the second
        argument

> rep(c(4,2),length=3)
[1] 4 2 4
      * when the length argument is specified, the first argument is
        replicated to produce a vector of the length specified

> seq(1,7,by=2)
[1] 1 3 5 7
      * seq() creates a sequence from a to b in steps specified in by
        (the default is by=1)

> seq(1,-1,by=-0.5)
[1]  1.0  0.5  0.0 -0.5 -1.0

> seq(1,7,length=3)
[1] 1 4 7
      * as with rep(), the length of the outcome can be specified in
        seq(), in which case the value for by is inferred

> rev(seq(1,5))
[1] 5 4 3 2 1
      * rev() reverses the order of a vector or list
      * rev() will also work on matrices, but the result will be a
        vector

      unique   sort   rank   order   rle

> x<-c(rep(1,3),seq(1,5,by=2),rev(seq(1,5,length=3)),rep(2,3))
> x
 [1] 1 1 1 1 3 5 5 3 1 2 2 2

> unique(x)
[1] 1 3 5 2
      * unique() returns the values of the input without any
        replications

> sort(x)
 [1] 1 1 1 1 1 2 2 2 3 3 5 5
      * sort() sorts data in ascending order
      * to sort by descending order, use
           > rev(sort(x))

> rank(x)
 [1]  3.0  3.0  3.0  3.0  9.5 11.5 11.5  9.5  3.0  7.0  7.0  7.0
      * rank() returns the ranks of the input
      * in case of ties, the average of the ranks is returned

> order(x)
 [1]  1  2  3  4  9 10 11 12  5  8  6  7
      * order() returns the indices of the data in ascending order
      * the first element in order(x) tells you where the lowest value
        in x is, the second element tells you where the second lowest
        value in x is, etc.
      * sort() is equivalent to
           > x[order(x)]

Recall that the matrix size used in part 1 is:

     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170

> i<-order(size[,1])
> i
[1] 2 4 3 5 1
      * returns the row indices that put Weight in ascending order

> size[i,]
     Weight Waist heights
[1,]    110    24     155
[2,]    112    25     175
[3,]    118    25     142
[4,]    128    26     170
[5,]    130    26     140
      * returns the matrix size sorted by Weight
      * i contains the order of the rows which would sort the first
        column in increasing order
      * by putting i in the row subscript, all the rows are printed
        out such that the first column is in increasing order

> rle(x)
$lengths:
[1] 4 1 2 1 1 3

$values:
[1] 1 3 5 3 1 2
      * computes the length and the value of runs of the same value in
        a vector
      * here, x is made up of four 1's, one 3, two 5's, one 3, one 1,
        and three 2's

Exercises
---------

a) Find the geometric mean of the vector x:

      x = (2,6,9,17,39)

   Note: the geometric mean of n values is the n-th root of the
   product of the n values.
   Write the expression so that it will find the geometric mean of any
   vector x.

b) The following are marks for a student on 12 weekly quizzes marked
   out of 25.

      quiz = 24 22 17 10 12 13 16 19 15 18 22 21

   Calculate the change in the student's grades from one quiz to the
   next.

c) Find the solution to the system of equations:

      3(X1) + 2(X2) + 6(X3) = 44
      5(X1) - 3(X2) + 4(X3) = 18
      6(X1) + 3(X2) - 2(X3) = 14

d) Create a vector containing the sum of each column of the matrix
   size.

## This is the matrix size
# first two columns (Weight, Waist) for the first four rows
size.2 <- matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=TRUE)
heights <- c(140, 155, 142, 175)
# add heights as a third column, then append the fifth row
size <- cbind(size.2,heights)
size <- rbind(size,c(128,26,170))
# label the columns as in the printouts above
dimnames(size) <- list(NULL, c("Weight","Waist","heights"))
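
Before starting the exercises it may help to confirm that the setup code
really produces the 5-by-3 matrix printed earlier. The sketch below,
which assumes a current R session, repeats the step-by-step construction
and compares it with a single matrix() call; the name size.direct is
introduced here only for the comparison and does not appear in the lab.

```r
# Step-by-step construction, following the setup code in the notes.
size.2  <- matrix(c(130, 26, 110, 24, 118, 25, 112, 25),
                  ncol = 2, byrow = TRUE)
heights <- c(140, 155, 142, 175)
size    <- rbind(cbind(size.2, heights), c(128, 26, 170))

# Hypothetical alternative: the same values entered row by row in one
# call, with the column labels supplied through dimnames.
size.direct <- matrix(c(130, 26, 140,
                        110, 24, 155,
                        118, 25, 142,
                        112, 25, 175,
                        128, 26, 170),
                      ncol = 3, byrow = TRUE,
                      dimnames = list(NULL, c("Weight", "Waist", "heights")))

print(dim(size))                  # number of rows, number of columns
print(all(size == size.direct))   # TRUE when every entry agrees
```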