Difference between revisions of "R programming language"

From Christoph's Personal Wiki

Revision as of 11:30, 31 December 2006

The R programming language (or just "R"), sometimes described as "GNU S", is a mathematical language and environment used for statistical analysis and display.

R is highly extensible through packages, which are user-submitted libraries for specific functions or specific areas of study. A core set of packages is included with the installation of R, and many more are available from the Comprehensive R Archive Network (CRAN). The bioinformatics community has seeded a successful effort to use R for the analysis of data from molecular biology laboratories. The Bioconductor project, started in the fall of 2001, provides R packages for the analysis of genomic data, e.g. object-oriented data handling and analysis tools for Affymetrix and cDNA microarrays.

Installation

Installing R on SuSE 10.1 using the default settings for the rpm or source distribution seems to cause problems. Below are the methods I have used to resolve them.

First make sure you have the following installed (check http://www.rpmfind.net for the packages):

compat-g77
compat-gcc
gcc-g77

It also sometimes helps to create a soft link to gfortran like so (changing the directory to suit your needs):

ln -s /usr/bin/g77 /usr/bin/gfortran

Then, and this is important, add the following to your config.site (found in your R source directory):

FPICFLAGS=-g

Now you are ready to install R on SuSE. First run configure:

./configure

Or, with the X11 include directory given explicitly:

./configure --x-includes=/usr/include/X11  # sometimes necessary

Then build and install:

make
make check
make pdf     # optional
make info    # optional
make install # as superuser ('root')

That's it. You are now ready to use R.

Comparison with other programs

Although R is mostly used by statisticians and other people in need of statistics, it can also be used as a general matrix calculation toolbox, much like GNU Octave or its proprietary counterpart, MATLAB.

It should not be confused with the R package [1], a collection of programs for multidimensional and spatial analysis available on Macintosh and VAX/VMS systems.

Basics

How to get help:

  • help.start() #Opens browser
  • help() #For more on using help
  • help(..) #For help on ..
  • help.search("..") #To search for ..

How to leave again:

  • q() #Image can be saved to .RData

Basic R commands

Most arithmetic operators work like you would expect in R:

> 4 + 2 #Prints '6'
> 3 * 4 #Prints '12'

Operators have precedence as known from basic algebra:

> 1 + 2 * 4 #Prints '9', while
> (1 + 2) * 4 #Prints '12'
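
Beyond the four basic operators, base R also has integer division, remainder and exponentiation. The short sketch below uses illustrative variable names (not from the original page) to show them, along with one precedence case that often surprises people:

```r
# Further arithmetic operators in base R
int_div   <- 7 %/% 2   # integer division: 3
remainder <- 7 %% 2    # remainder (modulo): 1
power     <- 2 ^ 3     # exponentiation: 8

# ^ binds more tightly than unary minus, so this is -(2^2)
neg_sq <- -2 ^ 2       # -4
pos_sq <- (-2) ^ 2     # 4
```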

Functions

A function call in R looks like this:

  • function_name(arguments)
  • Examples:
> cos(pi/3) #Prints '0.5'
> exp(1) #Prints '2.718282'

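A function call is identified in R by its parentheses.

  • That's why it's help(), and not help: typing a function's name without parentheses prints the function's definition instead of calling it.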
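
To illustrate the role of the parentheses, here is a minimal sketch using ls(), which needs no arguments. The variable names are only for illustration:

```r
# With parentheses the function is called, even when it takes no arguments
objs <- ls()

# Without parentheses the name refers to the function object itself,
# which can be inspected or assigned like any other object
f <- ls
is.function(f)    # TRUE
class(f)          # "function"
```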

Variables (objects) in R

To assign a value to a variable (object):

> x <- 4 #Assigns 4 to x
> x = 4 #Assigns 4 to x (new)
> x #Prints '4'
> y <- x + 2 #Assigns 6 to y

Functions for managing variables:

  • ls() or objects() lists all existing objects
  • str(x) tells the structure (type) of object 'x'
  • rm(x) removes (deletes) the object 'x'
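
The management functions above can be tried out as follows. This is a small sketch; tmp_value is a made-up name, and exists() is a base R function that was not mentioned in the list:

```r
tmp_value <- c("a", "b")

# exists() reports whether an object of that name is defined;
# inherits = FALSE restricts the lookup to the current environment
had_it <- exists("tmp_value", inherits = FALSE)   # TRUE

str(tmp_value)      # shows the type and contents of the object

rm(tmp_value)       # delete the object
has_it <- exists("tmp_value", inherits = FALSE)   # FALSE
```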

Vectors

A vector in R is a sequence of elements of the same mode (e.g. all numeric or all character).

> x <- 1:10 #Creates a vector
> y <- c("a","b","c") #So does this
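
Because all elements must share one mode, c() silently converts mixed arguments to a common mode. A brief sketch (variable names are illustrative):

```r
# Mixing numbers, strings and logicals: everything becomes character
mixed <- c(1, "a", TRUE)
mode(mixed)     # "character"

# Mixing numbers and logicals: TRUE/FALSE become 1/0
nums <- c(1, TRUE, FALSE)
mode(nums)      # "numeric"
```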

Handy functions for vectors:

  • c() – Concatenates arguments into a vector
  • min() – Returns the smallest value in vector
  • max() – Returns the largest value in vector
  • mean() – Returns the mean of the vector

Elements in a vector can be accessed individually:

> x[1] #Prints first element
> x[1:10] #Prints first 10 elements
> x[c(1,3)] #Prints element 1 and 3

Most functions expect one vector as argument, rather than individual numbers. In the first call below, only the first argument is treated as the data:

> mean(1,2,3) #Replies '1'
> mean(c(1,2,3)) #Replies '2'

The Recycling Rule

The recycling rule is a key concept for vector algebra in R.

When a vector is too short for a given operation, the elements are recycled and used again.

Examples of vectors that are too short:

> x <- c(1,2,3,4)
> y <- c(1,2) #y is too short
> x + y #Returns '2,4,4,6'
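
Two further cases of the recycling rule are worth seeing: a single number is simply a vector of length one, and lengths that do not divide evenly are still recycled, but with a warning. A minimal sketch, with illustrative variable names:

```r
x <- c(1, 2, 3, 4)

# The length-one vector 2 is recycled four times
scaled <- x * 2                      # 2 4 6 8

# Lengths 4 and 3 do not divide evenly: R still recycles, but warns
warned <- tryCatch({
  x + c(1, 2, 3)
  FALSE
}, warning = function(w) TRUE)       # TRUE

# The result itself, with the warning silenced
uneven <- suppressWarnings(x + c(1, 2, 3))   # 2 4 6 5
```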

Data

All simple numerical objects in R behave like a long sequence of numbers. In fact, even the simple x <- 1 can be thought of as a vector with one element.

The functions dim(x) and str(x) return information on the dimensions and structure of x.

Important Objects

  • vector – "A series of numbers"
  • matrix – "Tables of numbers"
  • data.frame – "More 'powerful' matrix (list of vectors)"
  • list – "Collections of other objects"
  • class – "Intelligent(?) lists"
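
As a rough illustration of the first four object types listed above, the sketch below creates one of each. The names and values are made up for the example:

```r
v  <- c(1.5, 2.5, 3.5)                   # vector
m  <- matrix(1:6, nrow = 2)              # matrix (2 rows, 3 columns)
df <- data.frame(id = 1:3, score = v)    # data.frame: columns may differ in type
l  <- list(numbers = v, table = m)       # list: holds arbitrary objects

is.matrix(m)         # TRUE
is.data.frame(df)    # TRUE
is.list(df)          # TRUE: a data frame is a list of column vectors
names(l)             # "numbers" "table"
```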

Data Matrices

Matrices are created with the matrix() function.

> m <- matrix(1:12,nrow=3)

This produces something like this:

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

The recycling rule still applies:

> m <- matrix(c(2,5),nrow=3,ncol=3)

Gives the following matrix:

     [,1] [,2] [,3]
[1,]    2    5    2
[2,]    5    2    5
[3,]    2    5    2
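
Both matrices above are filled column by column, which is the default for matrix(). The sketch below contrasts this with the byrow argument of matrix(); the variable names are illustrative:

```r
by_col <- matrix(1:12, nrow = 3)                 # default: fill down the columns
by_row <- matrix(1:12, nrow = 3, byrow = TRUE)   # fill along the rows instead

by_col[, 1]     # first column: 1 2 3
by_row[1, ]     # first row:    1 2 3 4
dim(by_row)     # 3 4

transposed <- t(by_col)   # t() swaps rows and columns, giving a 4 x 3 matrix
```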

Indexing Matrices

For vectors we could specify one index vector like this:

> x <- c(2,0,1,5)
> x[c(1,3)] #Returns ‘2’ and ‘1’

For matrices we have to specify two vectors:

> m <- matrix(1:3,nrow=3,ncol=3)
> m[c(1,3),c(1,3)] #Ret. 2*2 matrix
> m[1,] #First row as vector
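
A few more indexing forms, sketched on a 3 x 4 matrix so that the selected values are easy to tell apart. The drop argument is standard base R but was not covered above:

```r
m <- matrix(1:12, nrow = 3)

elem    <- m[2, 3]                 # single element (row 2, column 3): 8
col2    <- m[, 2]                  # second column as a vector: 4 5 6
row1    <- m[1, , drop = FALSE]    # first row, kept as a 1 x 4 matrix
large   <- m[m > 10]               # logical indexing: 11 12
```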

Beyond two dimensions

You can actually assign to dim():

> x <- 1:12
> dim(x) #Returns ‘NULL’
> dim(x) <- c(3,4) #3*4 Matrix
> dim(x) #Returns ‘3 4’
> dim(x) <- c(2,3,2) #x is now in 3d
> dim(x) #Returns ‘2 3 2’

But functions like mean() still work:

> mean(x) #Returns ‘6.5’
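
Indexing carries over to three dimensions with one index per dimension. A small sketch, assuming the same 2 x 3 x 2 layout as above (the names slice1 and slice_sums are illustrative):

```r
x <- 1:12
dim(x) <- c(2, 3, 2)

x[1, 2, 2]             # row 1, column 2, slice 2: 9
slice1 <- x[, , 1]     # the first 2 x 3 slice, holding the values 1 to 6

# apply() over the third dimension sums each slice separately
slice_sums <- apply(x, 3, sum)   # 21 57
```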

Graphics and visualisation

Visualization is one of R’s strong points.

R has many functions for drawing graphs, including:

  • hist(x) – Draws a histogram of values in x
  • plot(x,y) – Draws a basic xy plot of x against y

Adding stuff to plots

  • points(x,y) – Add point (x,y) to existing graph.
  • lines(x,y) – Connect points with line.

Graphical devices

A graphical device is what ‘displays’ the graph. It can be a window, it can be the printer.

Functions for plotting “Devices”:

  • X11() – This function allows you to change the size and composition of the plotting window.
  • par(mfrow=c(x,y)) – Splits a plotting device into x rows and y columns.
  • dev.print(postscript, file="???.ps") – Copies the current plot to a PostScript file. Use this to save the plot to a file.
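
An alternative way to save plots, which also works without a display, is to open a file device first, draw into it, and close it. The sketch below uses the pdf() device and a temporary file path; both are choices made for this example rather than something the page prescribes:

```r
out_file <- tempfile(fileext = ".pdf")   # hypothetical output location

pdf(out_file)                 # open a PDF file as the current device
par(mfrow = c(1, 2))          # 1 row, 2 columns of plots on the page
x <- rnorm(100)
hist(x)                       # left panel
plot(x, x^2)                  # right panel
invisible(dev.off())          # close the device, which writes the file

file.exists(out_file)         # TRUE once the device has been closed
```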

DNA Microarray Analysis - Example

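## Objects

x <- rnorm(30)            # 30 random draws from a standard normal

y <- x[x > 0]             # keep only the positive values

z <- x
z[z < 0] <- 0             # replace the negative values with 0

m <- matrix(x, nrow = 5)  # 5 x 6 numeric matrix
str(m)

d.f <- as.data.frame(m)   # the same data as a data frame with 6 columns
str(d.f)

m[2,2] <- "a"             # coerces the whole matrix to character
d.f[2,2] <- "a"           # coerces only the second column to character
str(m)
str(d.f)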


## Functions

cube <- function(x) {
  z <- x*x*x
  return(z)
}

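fact <- function(x) {
  z <- 1
  # seq_len(x) is empty for x = 0, so fact(0) and fact(1) both return 1
  # (2:x would count backwards for x < 2 and give wrong results)
  for (i in seq_len(x)) {
    z <- z * i
  }
  return(z)
}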

func <- function(x, y) {
  z <- cube(x) - fact(y)
  return(z)
}
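
For comparison, the same two helpers can be written more compactly with vectorised base R functions. This is only a sketch of one possible style, not a replacement for the loop version above:

```r
cube <- function(x) x^3                    # ^ works element-wise on vectors
fact <- function(n) prod(seq_len(n))       # product of 1..n

cube(3)         # 27
cube(1:3)       # 1 8 27
fact(5)         # 120
fact(0)         # 1, because the product of an empty vector is 1
factorial(5)    # base R's built-in factorial: 120
```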


## Graphics

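a <- rnorm(100)
hist(a)

X11()                     # opens a new plotting window (needs an X11 display)
a <- rnorm(100)
b <- rnorm(100)
plot(a, b)
# Colour the points in each quadrant
points(a[a<0 & b>0], b[a<0 & b>0],col="green")
points(a[a>0 & b>0], b[a>0 & b>0],col="red")
points(a[a>0 & b<0], b[a>0 & b<0],col="blue")
points(a[a<0 & b<0], b[a<0 & b<0],col="yellow")
lines(c(-10,10),c(0,0))   # horizontal axis through y = 0
lines(c(0,0),c(-10,10))   # vertical axis through x = 0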

Packages (add-ons)

To install packages from the CLI, execute the following:

R CMD INSTALL /path/to/pkg_version.tar.gz
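
Packages can also be installed and loaded from within an R session. The sketch below uses a made-up package name for the install step, which is left commented out because it needs network access; the loading step uses the stats package that ships with R:

```r
# "somePackage" is a hypothetical name, shown for illustration only
# install.packages("somePackage")

# Load an installed package so that its functions become available
library(stats)

# requireNamespace() reports whether a package is installed, without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)   # TRUE
```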

See also

  • Bioconductor
  • Bio3D (http://bio3d.pbwiki.com/)

Functions

  • heatmap

Resources

External links