R programming language

From Christoph's Personal Wiki
Revision as of 22:03, 16 June 2012 by Christoph (Talk | contribs) (Tutorials)


The R programming language (or just "R"), sometimes described as "GNU S", is a language and environment for statistical computing and graphics.

R is highly extensible through packages, which are user-submitted libraries for specific functions or specific areas of study. A core set of packages is included with the installation of R, and many more are available from the Comprehensive R Archive Network (CRAN). The bioinformatics community has made a successful effort to use R for the analysis of data from molecular biology laboratories: the Bioconductor project, started in the fall of 2001, provides R packages for the analysis of genomic data, e.g. object-oriented data handling and analysis tools for Affymetrix and cDNA microarrays.

see scripts

Installation

Installing R on SuSE 10.1 with the default settings, from either the rpm or the source distribution, seems to cause problems. Below are the methods I have used to resolve them.

First make sure you have the following installed (check http://www.rpmfind.net for the packages):

compat-g77
compat-gcc
gcc-g77

It also sometimes helps to create a soft link to gfortran like so (changing the directory to suit your needs):

ln -s /usr/bin/g77 /usr/bin/gfortran

Then, and this is important, add the following to your config.site (found in your R source directory):

FPICFLAGS=-g

Now you are ready to install R on SuSE:

./configure

Or, with an explicit X11 include path, followed by the build steps (which are the same in both cases):

./configure --x-includes=/usr/include/X11  # sometimes necessary
make
make check
make pdf     # optional
make info    # optional
make install # as superuser ('root')

That's it. You are now ready to use R.

Comparison with other programs

Although R is mostly used by statisticians and other people in need of statistics, it can also be used as a general matrix calculation toolbox, much like GNU Octave or its proprietary counterpart, MATLAB.

It should not be confused with the R package [1], a collection of programs for multidimensional and spatial analysis available on Macintosh and VAX/VMS systems.

Basics

How to get help:

help.start()
Opens browser
help()
For more on using help
help(..)
For help on ..
help.search("..")
To search for ..

How to leave again:

q()
On exit, the workspace image can be saved to .RData

Basic R commands

Most arithmetic operators work like you would expect in R:

4+2 #Prints '6'
3*4 #Prints '12'

Operators have precedence as known from basic algebra:

1+2*4   #Prints '9', while
(1+2)*4 #Prints '12'
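
A few further precedence cases that tend to surprise, as a minimal sketch (the variable names are only illustrative):

```r
# Exponentiation binds tighter than unary minus and multiplication
a <- -2^2        # parsed as -(2^2)
b <- (-2)^2      # parentheses force the negation first
c1 <- 2*3^2      # parsed as 2*(3^2)

# The sequence operator ':' binds tighter than + and -
n <- 3
s1 <- 1:n-1      # parsed as (1:n)-1
s2 <- 1:(n-1)    # parentheses change the sequence

print(a); print(b); print(c1)
print(s1); print(s2)
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.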

Functions

A function call in R looks like this:

  • function_name(arguments)
  • Examples:
cos(pi/3) #Prints '0.5'
exp(1)    #Prints '2.718282'

A function call is identified in R by the parentheses.

  • That's why it's help(), and not help: without the parentheses, R prints the function object itself instead of calling it.
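
As a small sketch of how arguments are passed, the calls below use only functions from base R; the variable names are made up for illustration:

```r
# Arguments can be matched by position or by name
r1 <- round(3.14159, 2)           # positional: the second argument is 'digits'
r2 <- round(3.14159, digits = 2)  # the same call with a named argument
l2 <- log(8, base = 2)            # logarithm of 8 to base 2

# Function calls can be nested; the inner call is evaluated first
s <- sqrt(abs(-16))

print(r1); print(r2); print(l2); print(s)

# A name without parentheses refers to the function object itself
is.function(help)
```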

Variables (objects) in R

To assign a value to a variable (object):

x<-4   #Assigns 4 to x
x=4    #Assigns 4 to x (new)
x      #Prints '4'
y<-x+2 #Assigns 6 to y

Functions for managing variables:

ls() or objects()
lists all existing objects
str(x)
tells the structure (type) of object 'x'
rm(x)
removes (deletes) the object 'x'
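
Put together, these functions might be used as in the following sketch (had_x and has_x are illustrative names, not part of R):

```r
x <- 4
y <- x + 2

had_x <- "x" %in% ls()   # ls() returns the object names as a character vector
str(y)                   # shows the type and value of y

rm(x)                    # delete x; y keeps its value
has_x <- "x" %in% ls()

print(had_x); print(has_x); print(y)
```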

Vectors

A vector in R is a sequence of elements of the same mode (for example, all numbers or all character strings).

x<-1:10           #Creates a vector
y<-c("a","b","c") #So does this

Handy functions for vectors:

c()
Concatenates arguments into a vector
min()
Returns the smallest value in vector
max()
Returns the largest value in vector
mean()
Returns the mean of the vector

Elements in a vector can be accessed individually:

x[1]      #Prints first element
x[1:10]   #Prints first 10 elements
x[c(1,3)] #Prints elements 1 and 3

Most functions expect one vector as argument, rather than individual numbers:

mean(1,2,3)    #Replies '1'
mean(c(1,2,3)) #Replies '2'
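
The sketch below shows a few more vector functions and two further indexing forms (negative and logical indices) that follow the same pattern; the values are arbitrary:

```r
x <- c(5, 3, 8, 1)

length(x)        # number of elements
sum(x)           # sum of all elements
range(x)         # smallest and largest value together

# Negative indices drop elements; logical conditions select them
without_first <- x[-1]      # everything except the first element
large <- x[x > 4]           # only the elements greater than 4

print(without_first)
print(large)
```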

The Recycling Rule

The recycling rule is a key concept for vector algebra in R.

When a vector is too short for a given operation, the elements are recycled and used again.

An example of a vector that is too short:

x<-c(1,2,3,4)
y<-c(1,2) #y is too short
x+y       #Returns '2,4,4,6'
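
Two more cases of the recycling rule, sketched with arbitrary values: a single number recycled over a whole vector, and lengths that do not divide evenly, where R still recycles but issues a warning.

```r
x <- c(1, 2, 3, 4)

# A single number is a vector of length one, so it is recycled for every element
doubled <- x * 2            # 2 4 6 8

# Recycling a length-2 vector over a length-4 vector
shifted <- x + c(10, 20)    # 11 22 13 24

# If the longer length is not a multiple of the shorter one, R warns;
# here the warning is captured and its message stored as text
msg <- tryCatch(c(1, 2, 3) + c(1, 2), warning = function(w) conditionMessage(w))

print(doubled); print(shifted); print(msg)
```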


Data

All simple numerical objects in R behave like a sequence of numbers. In fact, even the simple x<-1 can be thought of as a vector with one element.

The functions dim(x) and str(x) return information on the dimensionality and structure of x.

Important Objects

vector
A series of numbers
matrix
Tables of numbers
data.frame
More 'powerful' matrix (list of vectors)
list
Collections of other objects
class
Intelligent(?) lists
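
A minimal sketch that creates one object of each of the first four kinds; all names and values are made up for illustration:

```r
v <- c(1.5, 2.5, 3.5)                        # vector
m <- matrix(1:6, nrow = 2)                   # matrix with 2 rows and 3 columns
df <- data.frame(id = 1:3, value = v)        # data.frame: each column is a vector
l <- list(numbers = v, table = m, name = "example")  # list of mixed objects

class(v); class(m); class(df); class(l)

# Columns of a data.frame and elements of a list are accessed with $
df$value
l$name
```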

Data Matrices

Matrices are created with the matrix() function.

m<-matrix(1:12,nrow=3)
#This produces something like this:
#     [,1] [,2] [,3] [,4]
#[1,]    1    4    7   10
#[2,]    2    5    8   11
#[3,]    3    6    9   12

The recycling rule still applies:

m<-matrix(c(2,5),nrow=3,ncol=3)
#Gives the following matrix:
#     [,1] [,2] [,3]
#[1,]    2    5    2
#[2,]    5    2    5
#[3,]    2    5    2
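
Both matrices above are filled column by column, which is the default. As a sketch, the byrow argument of matrix() switches to row-wise filling:

```r
# Default: values fill the matrix column by column
by_col <- matrix(1:6, nrow = 2)

# byrow = TRUE fills row by row instead
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(by_col)
print(by_row)

# dim() reports rows and columns; t() transposes
dim(by_row)
t(by_row)
```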

Indexing Matrices

For vectors we could specify one index vector like this:

x<-c(2,0,1,5)
x[c(1,3)] #Returns '2' and '1'

For matrices we have to specify two vectors:

m<-matrix(1:3,nrow=3,ncol=3)
m[c(1,3),c(1,3)] #Returns a 2*2 matrix
m[1,] #First row as vector
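
Some further indexing forms, sketched on a 3*4 matrix so that rows and columns are easy to tell apart:

```r
m <- matrix(1:12, nrow = 3)

m[2, 3]                      # single element: row 2, column 3
col2 <- m[, 2]               # second column as a vector
row1 <- m[1, , drop = FALSE] # first row, kept as a 1-row matrix
big <- m[m > 10]             # a logical condition selects matching elements

# Assignment through an index changes the stored value
m[2, 3] <- 0

print(col2); print(dim(row1)); print(big); print(m[2, ])
```

The drop = FALSE argument is useful when later code expects a matrix rather than a plain vector.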

Beyond two dimensions

You can actually assign to dim():

x<-1:12
dim(x)           #Returns 'NULL'
dim(x)<-c(3,4)   #3*4 Matrix
dim(x)           #Returns '3 4'
dim(x)<-c(2,3,2) #x is now in 3d
dim(x)           #Returns '2 3 2'

But functions like mean() still work:

mean(x) #Returns '6.5'
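
Indexing a three-dimensional object takes one index per dimension. A short sketch, assuming the same 2*3*2 layout as above:

```r
x <- 1:12
dim(x) <- c(2, 3, 2)        # 2 rows, 3 columns, 2 layers

# One index per dimension; values fill the first dimension fastest
first <- x[1, 1, 1]
last <- x[2, 3, 2]
layer2 <- x[, , 2]          # the second layer is an ordinary 2*3 matrix

# apply() runs a function over one dimension, here the mean of each layer
layer_means <- apply(x, 3, mean)

print(first); print(last); print(layer2); print(layer_means)
```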

Graphics and visualisation

Visualization is one of R's strong points.

R has many functions for drawing graphs, including:

hist(x)   #Draws a histogram of values in x
plot(x,y) #Draws a basic xy plot of x against y

Adding stuff to plots:

points(x,y) #Add point (x,y) to existing graph.
lines(x,y)  #Connect points with line.

Graphical devices

A graphical device is what 'displays' the graph. It can be a window, a file, or a printer.

Functions for plotting "Devices":

X11() #Opens a new plotting window; arguments such as width and height set its size.
par(mfrow=c(x,y)) #Splits a plotting device into x rows and y columns.
dev.print(postscript, file="???.ps") #Copies the current plot to a PostScript file.
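
As an alternative to copying a plot from the screen, a file device can be opened directly. The sketch below uses pdf() instead of postscript and writes to a temporary file; out_file and the plotted data are assumptions made for the example:

```r
x <- 1:10
y <- x^2

out_file <- tempfile(fileext = ".pdf")  # hypothetical output location

pdf(out_file, width = 8, height = 4)    # open a PDF file as the graphical device
par(mfrow = c(1, 2))                    # 1 row, 2 columns of plots
hist(y)                                 # left panel
plot(x, y)                              # right panel
lines(x, y)                             # connect the points in the right panel
dev.off()                               # close the device so the file is written

file.exists(out_file)
```

Nothing appears on screen in this case; the plot only exists in the file once dev.off() has been called.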

Packages (add-ons)

To install packages from the CLI, execute the following:

R CMD INSTALL /path/to/pkg_version.tar.gz
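
The same installation can probably be done from within an R session. A sketch, assuming the placeholder path above is replaced by a real source tarball (the install step is skipped if the file does not exist):

```r
pkg_file <- "/path/to/pkg_version.tar.gz"   # placeholder path from the text

# repos = NULL tells install.packages() to install from a local file
if (file.exists(pkg_file)) {
  install.packages(pkg_file, repos = NULL, type = "source")
}

# library() loads an installed package; 'stats' ships with R
library(stats)
has_stats <- "package:stats" %in% search()
print(has_stats)
```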

See also

Functions
