Difference between revisions of "R programming language"
Revision as of 16:08, 25 May 2006
The R programming language (or just "R"), sometimes described as "GNU S", is a mathematical language and environment used for statistical analysis and display. It was originally created by Ross Ihaka and Robert Gentleman (hence the name R) at the University of Auckland, New Zealand, and is now steadily developed further by a large community around the world.
It is based upon S, which was developed by John Chambers of Bell Laboratories and described in the paper "Evolution of the S Language" [1]. R is considered by its developers to be an implementation of S, with semantics derived from Scheme. The commercial implementation of S is S-PLUS [2].
R's source code is freely available under the GNU GPL. There are several GUIs for R, including RKWard, SciViews-R [3], and Rcmdr [4]. Many editors have specialised modes for R, including Emacs (Emacs Speaks Statistics), jEdit [5], Kate (text editor) [6], and Tinn [7], and there is an R plug-in for the Eclipse IDE framework.
R is highly extensible through the use of packages, which are user-submitted libraries for specific functions or specific areas of study. A core set of packages is included with the installation of R, with many more available at the Comprehensive R Archive Network, CRAN. The bioinformatics community has seeded a successful effort to use R for the analysis of data from molecular biology laboratories. The Bioconductor project, started in the fall of 2001, provides R packages for the analysis of genomic data, e.g. object-oriented data handling and analysis tools for Affymetrix and cDNA microarrays.
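As a minimal sketch of how packages are used in practice, the snippet below checks that a package is installed and then loads it. It assumes the stats package, which ships with R; the commented-out install.packages() call only hints at how a CRAN package would be fetched, and the package name in it is a placeholder.

```r
# List the packages visible in the current library paths.
pkgs <- rownames(installed.packages())

# 'stats' is part of the core set distributed with R.
has_stats <- "stats" %in% pkgs

# Attach the package so that its functions can be called directly.
library(stats)
med <- median(c(3, 1, 2))

# Installing from CRAN needs network access, so it is left commented out.
# "somePackage" is a placeholder, not a real package name.
# install.packages("somePackage")
```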
Installation
Installing R on SuSE 10.1 using the default settings for the rpm or source distribution seems to be problematic. Below are the methods I have used to resolve these problems.
First make sure you have the following installed (check http://www.rpmfind.net for the packages):
compat-g77
compat-gcc
gcc-g77
It also sometimes helps to create a soft link to gfortran like so (changing the directory to suit your needs):
ln -s /usr/bin/g77 /usr/bin/gfortran
Then, and this is important, add the following to your config.site (found in your R source directory):
FPICFLAGS=-g
Now you are ready to install R on SuSE:
./configure
make
make check
make pdf # optional
make info # optional
make install # as superuser ('root')
That's it. You are now ready to use R.
Comparison with other programs
Although R is mostly used by statisticians and other people in need of statistics, it can also be used as a general matrix calculation toolbox, much like GNU Octave or its proprietary counterpart, MATLAB.
It should not be confused with the R package [8], a collection of programs for multidimensional and spatial analysis available on Macintosh and VAX/VMS systems.
Basics
How to get help:
- help.start() #Opens browser
- help() #For more on using help
- help(..) #For help on ..
- help.search("..") #To search for ..
How to leave again:
- q() #Image can be saved to .RData
Basic R commands
Most arithmetic operators work like you would expect in R:
> 4 + 2 #Prints ‘6’
> 3 * 4 #Prints ‘12’
Operators have precedence as known from basic algebra:
> 1 + 2 * 4 #Prints ‘9’, while
> (1 + 2) * 4 #Prints ‘12’
Functions
A function call in R looks like this:
- function_name(arguments)
- Examples:
> cos(pi/3) #Prints ‘0.5’
> exp(1) #Prints ‘2.718282’
A function is identified in R by the parentheses
- That’s why it’s: help(), and not: help
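To illustrate the role of the parentheses, here is a short sketch: the bare name refers to the function object itself, whereas adding parentheses calls it. The variable names are made up for the illustration.

```r
# Without parentheses, the name refers to the function object itself.
f <- cos
is_fun <- is.function(f)

# With parentheses, the function is called with the given arguments.
value <- f(0)

# Arguments can also be passed by name; round() takes 'x' and 'digits'.
rounded <- round(x = exp(1), digits = 3)
```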
Variables (objects) in R
To assign a value to a variable (object):
> x <- 4 #Assigns 4 to x
> x = 4 #Assigns 4 to x (new)
> x #Prints ‘4’
> y <- x + 2 #Assigns 6 to y
Functions for managing variables:
- ls() or objects() lists all existing objects
- str(x) tells the structure (type) of object ‘x’
- rm(x) removes (deletes) the object ‘x’
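The functions listed above can be tried together in a small session such as the following sketch. The exists() function is not mentioned in the list; it is used here only to show the effect of rm().

```r
x <- 4
y <- x + 2

# exists() reports whether an object with that name is defined here.
before <- exists("x", inherits = FALSE)

# ls() returns the names of the existing objects as a character vector.
has_y <- "y" %in% ls()

# str() prints a compact description of the object.
str(y)

# rm() deletes the object; afterwards the name is no longer defined.
rm(x)
after <- exists("x", inherits = FALSE)
```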
Vectors
A vector in R is like a sequence of elements of the same mode.
> x <- 1:10 #Creates a vector
> y <- c("a","b","c") #So does this
Handy functions for vectors:
- c() – Concatenates arguments into a vector
- min() – Returns the smallest value in vector
- max() – Returns the largest value in vector
- mean() – Returns the mean of the vector
Elements in a vector can be accessed individually:
> x[1] #Prints first element
> x[1:10] #Prints first 10 elements
> x[c(1,3)] #Prints element 1 and 3
Most functions expect one vector as argument, rather than individual numbers:
> mean(1,2,3) #Replies ‘1’ (only the first argument is averaged)
> mean(c(1,2,3)) #Replies ‘2’
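A slightly longer sketch of the vector functions and indexing forms described above; the numbers are arbitrary example data. Negative and logical indices are not covered in the text, but they follow the same bracket syntax.

```r
x <- c(4, 8, 15, 16, 23, 42)

smallest <- min(x)      # 4
largest  <- max(x)      # 42
average  <- mean(x)     # 18

# Negative indices drop elements; logical conditions select matching ones.
without_first <- x[-1]
big <- x[x > 15]

# c() also joins existing vectors together.
joined <- c(x[1:2], 100)
```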
The Recycling Rule
The recycling rule is a key concept for vector algebra in R.
When a vector is too short for a given operation, the elements are recycled and used again.
Examples of vectors that are too short:
> x <- c(1,2,3,4)
> y <- c(1,2) #y is too short
> x + y #Returns ‘2,4,4,6’
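The recycling rule can be explored a little further with the sketch below. It also shows the case where the longer length is not a multiple of the shorter one; R still recycles there but raises a warning, which is captured with tryCatch() instead of being printed.

```r
x <- c(1, 2, 3, 4)

# A single number is a vector of length one, so it is recycled four times.
doubled <- x * 2            # 2 4 6 8

# A length-two vector is recycled twice.
shifted <- x + c(10, 20)    # 11 22 13 24

# Lengths 3 and 2 do not divide evenly: R warns about it.
# The warning text is captured here rather than shown.
msg <- tryCatch(c(1, 2, 3) + c(1, 2),
                warning = function(w) conditionMessage(w))
```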
Data
All simple numerical objects in R behave like a long string of numbers. In fact, even the simple x <- 1 can be thought of as a vector with one element.
The functions dim(x) and str(x) return information on the dimensionality of x.
Important Objects
- vector – “A series of numbers”
- matrix – “Tables of numbers”
- data.frame – “More ‘powerful’ matrix (list of vectors)”
- list – “Collections of other objects”
- class – “Intelligent(?) lists”
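As a rough sketch, the object types listed above can be created as follows. All names and values are illustrative, and stringsAsFactors = FALSE is passed explicitly so that the result does not depend on the R version.

```r
v <- c(1.5, 2.5, 3.5)                              # vector
m <- matrix(1:6, nrow = 2)                         # matrix
d <- data.frame(id = 1:3, name = c("a", "b", "c"),
                stringsAsFactors = FALSE)          # data.frame
l <- list(numbers = v, table = m, label = "demo")  # list

# class() reports what kind of object each one is.
classes <- c(class(v), class(d), class(l))

# A data.frame may mix types across its columns, unlike a matrix.
column_types <- sapply(d, class)
```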
Data Matrices
Matrices are created with the matrix() function.
> m <- matrix(1:12,nrow=3)
This produces something like this:
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
The recycling rule still applies:
> m <- matrix(c(2,5),nrow=3,ncol=3)
Gives the following matrix:
     [,1] [,2] [,3]
[1,]    2    5    2
[2,]    5    2    5
[3,]    2    5    2
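Both matrices above are filled column by column, which is the default. The sketch below contrasts this with the byrow argument of matrix(); the variable names are only for illustration.

```r
# By default the values fill the matrix column by column.
by_col <- matrix(1:6, nrow = 2)

# byrow = TRUE fills the matrix row by row instead.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

# t() transposes a matrix, swapping rows and columns.
same <- identical(t(by_row), matrix(1:6, nrow = 3))
```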
Indexing Matrices
For vectors we could specify one index vector like this:
> x <- c(2,0,1,5)
> x[c(1,3)] #Returns ‘2’ and ‘1’
For matrices we have to specify two vectors:
> m <- matrix(1:3,nrow=3,ncol=3)
> m[c(1,3),c(1,3)] #Returns 2*2 matrix
> m[1,] #First row as vector
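A few more indexing forms, sketched on a 3*4 matrix. The drop = FALSE argument is not mentioned in the text above; it is included because selecting a single row otherwise returns a plain vector.

```r
m <- matrix(1:12, nrow = 3)

element <- m[2, 3]              # single element: row 2, column 3
column  <- m[, 2]               # second column as a vector
block   <- m[1:2, c(1, 4)]      # rows 1 and 2, columns 1 and 4

# Selecting one row drops the result to a plain vector;
# drop = FALSE keeps it as a 1-row matrix.
row_vec <- m[1, ]
row_mat <- m[1, , drop = FALSE]
```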
Beyond two dimensions
You can actually assign to dim():
> x <- 1:12
> dim(x) #Returns ‘NULL’
> dim(x) <- c(3,4) #3*4 Matrix
> dim(x) #Returns ‘3 4’
> dim(x) <- c(2,3,2) #x is now in 3d
> dim(x) #Returns ‘2 3 2’
But functions like mean() still work:
> mean(x) #Returns ‘6.5’
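Objects with more than two dimensions can also be built directly with array(), as in the following sketch. The use of apply() at the end is just one possible way to summarise along a dimension.

```r
# array() builds the same kind of object in one step.
a <- array(1:12, dim = c(2, 3, 2))

# Three indices are needed to address a single element.
first <- a[1, 1, 1]     # 1
last  <- a[2, 3, 2]     # 12

# Leaving an index empty selects everything along that dimension.
slice <- a[, , 2]       # a 2*3 matrix holding 7 to 12

# apply() summarises over a chosen dimension; 3 means "per slice".
slice_sums <- apply(a, 3, sum)
```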
Graphics and visualisation
Visualization is one of R’s strong points.
R has many functions for drawing graphs, including:
- hist(x) – Draws a histogram of values in x
- plot(x,y) – Draws a basic xy plot of x against y
Adding stuff to plots
- points(x,y) – Add point (x,y) to existing graph.
- lines(x,y) – Connect points with line.
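The plotting functions above might be combined as in this sketch. It draws into a temporary PDF file (an assumption made so that the example also runs without a display); on an interactive system the pdf() and dev.off() calls can simply be left out.

```r
# Draw into a temporary PDF file so the example runs without a display.
out <- tempfile(fileext = ".pdf")
pdf(out)

x <- seq(0, 2 * pi, length.out = 50)
plot(x, sin(x))                        # basic xy plot
lines(x, cos(x), col = "blue")         # add a connected line
points(pi, 0, col = "red", pch = 19)   # add a single point

dev.off()                              # close the device and write the file
```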
Graphical devices
A graphical device is what ‘displays’ the graph. It can be a window or a printer.
Functions for plotting “Devices”:
- X11() – Opens a new plotting window; its arguments allow you to change the size of the window.
- par(mfrow=c(x,y)) – Splits a plotting device into x rows and y columns.
- dev.print(postscript, file="???.ps") – Copies the current plot to a PostScript file. Use this to save the plot to a file.
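As a hedged sketch of working with devices, the example below opens a PDF device instead of a window, splits it into two panels with par(mfrow=...), and closes it again. The file name comes from tempfile() and the data are random, so both are placeholders.

```r
out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)   # device size in inches

old <- par(mfrow = c(1, 2))       # 1 row, 2 columns
values <- rnorm(100)
hist(values)                      # left panel
plot(values)                      # right panel
par(old)                          # restore the previous layout

dev.off()                         # close the device and write the file
```

Closing the device with dev.off() is what actually finishes the file, so it should not be forgotten when plotting to a file device.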
DNA Microarray Analysis - Example
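## Objects
x <- rnorm(30)            # 30 random values from a standard normal
y <- x[x>0]               # keep only the positive values
z <- x
z[z<0] <- 0               # replace negative values with 0
m <- matrix(x, nrow = 5)  # 5*6 numeric matrix
str(m)
d.f <- as.data.frame(m)
str(d.f)
m[2,2] = "a"              # turns the whole matrix into character
d.f[2,2] = "a"            # only column 2 of the data frame becomes character
str(m)
str(d.f)

## Functions
cube <- function(x) {
  z <- x*x*x
  return(z)
}
fact <- function(x) {
  z <- 1
  for (i in seq_len(x)) { # seq_len() also handles x = 0 and x = 1
    z <- z * i
  }
  return(z)
}
func <- function(x, y) {
  z <- cube(x) - fact(y)
  return(z)
}

## Graphics
hist(a <- rnorm(100))
X11()                     # open a second plotting window
plot(a <- rnorm(100), b <- rnorm(100))
# colour the points by quadrant
points(a[a<0 & b>0], b[a<0 & b>0],col="green")
points(a[a>0 & b>0], b[a>0 & b>0],col="red")
points(a[a>0 & b<0], b[a>0 & b<0],col="blue")
points(a[a<0 & b<0], b[a<0 & b<0],col="yellow")
lines(c(-10,10),c(0,0))   # horizontal axis
lines(c(0,0),c(-10,10))   # vertical axis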
See also
- Journal of Statistical Software — peer-reviewed journal publishing many R related papers
- CRAN — Comprehensive R Archive Network for the R programming language.
External links
- The R Project for Statistical Computing
- The CRAN (Comprehensive R Archive Network) Project
- Web-based interface to R
- The R Reference Manual - Base Package by the R Development Core Team. ISBN 0-9546120-0-0 (vol. 1), ISBN 0-9546120-1-9 (vol. 2)
- The R Wiki User contributed R documentation and how to information.
- The R Graph Gallery shows several examples of graphics generated by R
- Robert Gentleman's site
- Ross Ihaka's site
- Rcmdr, an open source GUI for R
- List of IDEs and script editors for R
- Tinn-R, an advanced open source script editor for R under Windows