Difference between revisions of "R programming language"
Revision as of 10:07, 12 February 2007
The R programming language (or just "R"), sometimes described as "GNU S", is a mathematical language and environment used for statistical analysis and display.
R is highly extensible through the use of packages, which are user-submitted libraries for specific functions or specific areas of study. A core set of packages is included with the installation of R, with many more available at the Comprehensive R Archive Network (CRAN). The bioinformatics community has seeded a successful effort to use R for the analysis of data from molecular biology laboratories. The Bioconductor project, started in the fall of 2001, provides R packages for the analysis of genomic data, e.g. object-oriented data handling and analysis tools for Affymetrix and cDNA microarrays.
Installation
Installing R on SuSE 10.1 with the default settings, from either the RPM or the source distribution, can be problematic. Below are the methods I have used to resolve these problems.
First make sure you have the following installed (check http://www.rpmfind.net for the packages):
compat-g77 compat-gcc gcc-g77
It also sometimes helps to create a soft link named gfortran that points to g77, like so (changing the directories to suit your needs):
ln -s /usr/bin/g77 /usr/bin/gfortran
Then, and this is important, add the following to your config.site (found in your R source directory):
FPICFLAGS=-g
Now you are ready to install R on SuSE:
./configure
Or,
./configure --x-includes=/usr/include/X11 # sometimes necessary
make
make check
make pdf # optional
make info # optional
make install # as superuser ('root')
That's it. You are now ready to use R.
Comparison with other programs
Although R is mostly used by statisticians and other people in need of statistics, it can also be used as a general matrix calculation toolbox, much like GNU Octave or its proprietary counterpart, MATLAB.
It should not be confused with the R package [1], a collection of programs for multidimensional and spatial analysis available on Macintosh and VAX/VMS systems.
Basics
How to get help:
- help.start() #Opens browser
- help() #For more on using help
- help(..) #For help on ..
- help.search("..") #To search for ..
How to leave again:
- q() #Image can be saved to .RData
Basic R commands
Most arithmetic operators work like you would expect in R:
> 4 + 2 #Prints '6'
> 3 * 4 #Prints '12'
Operators have precedence as known from basic algebra:
> 1 + 2 * 4 #Prints '9', while
> (1 + 2) * 4 #Prints '12'
Functions
A function call in R looks like this:
- function_name(arguments)
- Examples:
> cos(pi/3) #Prints '0.5'
> exp(1) #Prints '2.718282'
A function call is identified in R by the parentheses.
- That's why it's help(), and not help. Typing a function's name without parentheses prints its definition instead of calling it.
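As a small sketch of the points above, the lines below call two built-in functions, pass an argument by position and by name, and check that a bare name refers to the function object itself. The variable names (half, e1, r_pos, r_named, is_fun) are made up for illustration.

```r
# Calling a function: its name followed by parentheses holding the arguments.
half <- cos(pi / 3)
e1 <- exp(1)

# Arguments can be given by position or by name; both calls do the same thing.
r_pos <- round(e1, 3)
r_named <- round(e1, digits = 3)

# Without parentheses, the name refers to the function object, not a call.
is_fun <- is.function(exp)

print(half)
print(r_named)
print(is_fun)
```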
Variables (objects) in R
To assign a value to a variable (object):
> x <- 4 #Assigns 4 to x
> x = 4 #Assigns 4 to x (new)
> x #Prints '4'
> y <- x + 2 #Assigns 6 to y
Functions for managing variables:
- ls() or objects() lists all existing objects
- str(x) tells the structure (type) of object 'x'
- rm(x) removes (deletes) the object 'x'
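The management functions listed above can be tried together in a short session. This is only a sketch: it creates two objects in your workspace and removes one of them again.

```r
# Create two objects.
x <- 4
y <- x + 2

# ls() returns the names of the objects in the workspace as a character vector.
names_before <- ls()
print("x" %in% names_before)

# str() prints a compact description of an object's structure.
str(y)

# rm() deletes an object; afterwards its name is no longer listed.
rm(x)
names_after <- ls()
print("x" %in% names_after)
```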
Vectors
A vector in R is a sequence of elements of the same mode (for example, all numeric or all character).
> x <- 1:10 #Creates a vector
> y <- c("a","b","c") #So does this
Handy functions for vectors:
- c() – Concatenates arguments into a vector
- min() – Returns the smallest value in vector
- max() – Returns the largest value in vector
- mean() – Returns the mean of the vector
Elements in a vector can be accessed individually:
> x[1] #Prints first element
> x[1:10] #Prints first 10 elements
> x[c(1,3)] #Prints elements 1 and 3
Most functions expect one vector as argument, rather than individual numbers:
> mean(1,2,3) #Replies '1' (2 and 3 are taken as other arguments, not as data)
> mean(c(1,2,3)) #Replies '2'
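The following sketch pulls the vector functions and indexing forms above into one place. The data values and variable names are invented for the example.

```r
# Build a numeric vector with c().
x <- c(3, 1, 4, 1, 5)

smallest <- min(x)
largest <- max(x)
average <- mean(x)

# Index with a single position, a range, or a vector of positions.
first <- x[1]
first_two <- x[1:2]
picked <- x[c(1, 3)]

# All elements share one mode: mixing numbers and text coerces to character.
mixed <- c(1, "a")

print(average)
print(picked)
print(mixed)
```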
The Recycling Rule
The recycling rule is a key concept for vector algebra in R.
When a vector is too short for a given operation, the elements are recycled and used again.
Example of a vector that is too short:
> x <- c(1,2,3,4)
> y <- c(1,2) #y is too short
> x + y #Returns '2,4,4,6'
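A brief sketch of recycling in practice, including the case where the longer length is not a multiple of the shorter one. R still recycles there but raises a warning, which the example captures with tryCatch() so that it can be inspected. Variable names are illustrative.

```r
x <- c(1, 2, 3, 4)
y <- c(1, 2)

# y is reused as 1 2 1 2 to match the length of x.
s <- x + y

# A single number is a vector of length one, so it is recycled four times.
scaled <- x * 10

# Lengths 3 and 2 do not divide evenly: R warns. Capture the warning text.
msg <- tryCatch(
  c(1, 2, 3) + c(1, 2),
  warning = function(cond) conditionMessage(cond)
)

print(s)
print(scaled)
print(msg)
```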
Data
All simple numerical objects in R behave like a long string of numbers. In fact, even the simple x <- 1 can be thought of as a vector with one element.
The functions dim(x) and str(x) return information on the dimensionality and structure of x.
Important Objects
- vector – "A series of numbers"
- matrix – "Tables of numbers"
- data.frame – "More 'powerful' matrix (list of vectors)"
- list – "Collections of other objects"
- class – "Intelligent(?) lists"
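To make the list above concrete, here is a minimal sketch that creates one object of each of the first four kinds and checks what it is. All names and values are invented for the example.

```r
# vector: a series of values of one mode
v <- c(1.5, 2.5, 3.5)

# matrix: a table of values of one mode
m <- matrix(1:6, nrow = 2)

# data.frame: columns are vectors of equal length that may differ in type
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# list: a collection of arbitrary objects
l <- list(numbers = v, table = m, note = "anything")

print(is.vector(v))
print(is.matrix(m))
print(is.data.frame(df))
print(is.list(df))   # a data frame is also a list of its columns
print(names(l))
```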
Data Matrices
Matrices are created with the matrix() function.
> m <- matrix(1:12,nrow=3)
This produces something like this:
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
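The recycling rule still applies when the data are shorter than the matrix (here R also gives a warning, because the 2 values do not divide evenly into the 3 rows):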
> m <- matrix(c(2,5),nrow=3,ncol=3)
Gives the following matrix:
     [,1] [,2] [,3]
[1,]    2    5    2
[2,]    5    2    5
[3,]    2    5    2
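Both matrices above are filled column by column, which is the default for matrix(). As a sketch, the byrow argument switches to row-wise filling, and a single value is recycled over the whole matrix. Variable names are illustrative.

```r
# Default: values run down the columns.
by_col <- matrix(1:12, nrow = 3)

# byrow = TRUE: values run along the rows instead.
by_row <- matrix(1:12, nrow = 3, byrow = TRUE)

# A single value is recycled to fill every cell.
zeros <- matrix(0, nrow = 2, ncol = 3)

print(by_col[1, ])
print(by_row[1, ])
print(dim(zeros))
```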
Indexing Matrices
For vectors we could specify one index vector like this:
> x <- c(2,0,1,5)
> x[c(1,3)] #Returns ‘2’ and ‘1’
For matrices we have to specify two vectors:
> m <- matrix(1:3,nrow=3,ncol=3)
> m[c(1,3),c(1,3)] #Returns a 2*2 matrix
> m[1,] #First row as vector
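The indexing forms above can be sketched on a matrix with distinct values, which makes it easier to see which cells are selected. The drop = FALSE form is an addition not mentioned in the text: it keeps a single row as a matrix rather than simplifying it to a vector.

```r
m <- matrix(1:12, nrow = 3)

cell <- m[2, 3]                # one element: row 2, column 3
sub <- m[c(1, 3), c(1, 3)]     # rows 1 and 3, columns 1 and 3
first_row <- m[1, ]            # empty column index selects every column
second_col <- m[, 2]           # empty row index selects every row
row_kept <- m[1, , drop = FALSE]  # stays a 1 x 4 matrix

print(cell)
print(sub)
print(first_row)
print(dim(row_kept))
```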
Beyond two dimensions
You can actually assign to dim():
> x <- 1:12
> dim(x) #Returns ‘NULL’
> dim(x) <- c(3,4) #3*4 Matrix
> dim(x) #Returns ‘3 4’
> dim(x) <- c(2,3,2) #x is now in 3d
> dim(x) #Returns ‘2 3 2’
But functions like mean() still work:
> mean(x) #Returns ‘6.5’
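As a sketch of working with more than two dimensions, the example below reshapes a vector by assigning to dim(), indexes it with three subscripts, and builds the same object with array(). The sizes and names are chosen only for illustration.

```r
x <- 1:24
dim(x) <- c(2, 3, 4)     # 2 rows, 3 columns, 4 layers

last <- x[2, 3, 4]       # one subscript per dimension
layer1 <- x[, , 1]       # first layer, a 2 x 3 matrix

# array() builds the same structure in a single call.
a <- array(1:24, dim = c(2, 3, 4))
same <- all(x == a)

print(last)
print(layer1)
print(same)
```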
Graphics and visualisation
Visualization is one of R’s strong points.
R has many functions for drawing graphs, including:
- hist(x) – Draws a histogram of values in x
- plot(x,y) – Draws a basic xy plot of x against y
Adding stuff to plots
- points(x,y) – Add point (x,y) to existing graph.
- lines(x,y) – Connect points with line.
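A minimal sketch combining the plotting functions above. It draws into a temporary PDF file rather than a window so that it also runs without a display. The file name, the random data and the choice of a PDF device are assumptions made for the example.

```r
# Draw into a temporary PDF file (one page per plot).
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

set.seed(1)              # make the random data reproducible
x <- rnorm(50)
y <- rnorm(50)

hist(x)                  # page 1: histogram of x
plot(x, y)               # page 2: scatter plot of x against y
points(mean(x), mean(y), col = "red", pch = 19)  # mark the centre of the cloud
lines(range(x), c(0, 0)) # horizontal line at y = 0

invisible(dev.off())     # close the device so the file is written
print(file.exists(out_file))
```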
Graphical devices
A graphical device is what ‘displays’ the graph. It can be a window on the screen, a file, or a printer.
Functions for plotting “Devices”:
- X11() – Opens a new plotting window; its arguments let you change the size and other properties of the window.
- par(mfrow=c(x,y)) – Splits a plotting device into x rows and y columns.
- dev.print(postscript, file="???.ps") – Copies the current plot to a PostScript file. Use this to save the plot to a file.
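The sketch below shows par(mfrow=...) splitting one device into two panels. It opens a postscript() file device directly instead of copying from a window with dev.print(), which is a substitution made so that the example runs without a display. The file name and data are invented.

```r
# Open a PostScript file device.
ps_file <- tempfile(fileext = ".ps")
postscript(ps_file)

# Split the device into 1 row and 2 columns; par() returns the old settings.
old_par <- par(mfrow = c(1, 2))
panels <- par("mfrow")

hist(rnorm(100))               # left panel
plot(rnorm(100), rnorm(100))   # right panel

par(old_par)                   # restore the previous layout
invisible(dev.off())           # close the device and write the file
print(file.exists(ps_file))
```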
DNA Microarray Analysis - Example
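## Objects
x <- rnorm(30)            # 30 random draws from a standard normal
y <- x[x>0]               # keep only the positive values
z <- x
z[z<0] <- 0               # replace the negative values with 0
m <- matrix(x, nrow = 5)
str(m)
d.f <- as.data.frame(m)
str(d.f)
m[2,2] = "a"              # turns the whole matrix into character
d.f[2,2] = "a"            # turns only column 2 of the data frame into character
str(m)
str(d.f)

## Functions
cube <- function(x) {
  z <- x*x*x
  return(z)
}
fact <- function(x) {
  z <- 1
  for (i in seq_len(x)) { # seq_len() also gives the right result for x = 0 and x = 1
    z <- z * i
  }
  return(z)
}
func <- function(x, y) {
  z <- cube(x) - fact(y)
  return(z)
}

## Graphics
hist(a <- rnorm(100))
X11()
plot(a <- rnorm(100), b <- rnorm(100))
# Colour the points by quadrant
points(a[a<0 & b>0], b[a<0 & b>0],col="green")
points(a[a>0 & b>0], b[a>0 & b>0],col="red")
points(a[a>0 & b<0], b[a>0 & b<0],col="blue")
points(a[a<0 & b<0], b[a<0 & b<0],col="yellow")
# Draw the two axes through the origin
lines(c(-10,10),c(0,0))
lines(c(0,0),c(-10,10))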
Packages (add-ons)
To install packages from the CLI, execute the following:
R CMD INSTALL /path/to/pkg_version.tar.gz
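Packages can also be handled from inside an R session. The sketch below checks whether a package is installed, installs it from CRAN only if it is missing, and then loads it. The name "stats" is a stand-in that ships with R, so nothing is downloaded here; substitute the package you actually need.

```r
# Placeholder package name; "stats" is part of every R installation.
pkg <- "stats"

# installed.packages() returns a matrix with one row per installed package.
is_installed <- pkg %in% rownames(installed.packages())

# Install from a CRAN mirror only when the package is missing.
if (!is_installed) {
  install.packages(pkg)
}

# Load the package; character.only = TRUE lets the name come from a variable.
library(pkg, character.only = TRUE)
print(is_installed)
```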
See also
Functions
Resources
- Journal of Statistical Software — peer-reviewed journal publishing many R related papers
- CRAN — Comprehensive R Archive Network for the R programming language.
- R graphics — a long list of techniques with examples.
External links
- The R Project for Statistical Computing
- The CRAN (Comprehensive R Archive Network) Project
- Web-based interface to R
- The R Reference Manual - Base Package by the R Development Core Team. ISBN 0-9546120-0-0 (vol. 1), ISBN 0-9546120-1-9 (vol. 2)
- The R Wiki: user-contributed R documentation and how-to information.
- The R Graph Gallery shows several examples of graphics generated by R
- Robert Gentleman's site
- Ross Ihaka's site
- Rcmdr, an open source GUI for R
- List of IDEs and script editors for R
- Tinn-R, an advanced open source script editor for R under Windows