Difference between revisions of "R programming language"
Latest revision as of 22:03, 16 June 2012
The R programming language (or just "R"), sometimes described as "GNU S", is a mathematical language and environment used for statistical analysis and display.
R is highly extensible through packages, which are user-submitted libraries for specific functions or areas of study. A core set of packages is included with the installation of R, and many more are available from the Comprehensive R Archive Network, CRAN. The bioinformatics community has seeded a successful effort to use R for the analysis of data from molecular biology laboratories. The Bioconductor project, started in the fall of 2001, provides R packages for the analysis of genomic data, e.g. object-oriented data handling and analysis tools for Affymetrix and cDNA microarrays.
see scripts
Installation
Installing R on SuSE 10.1 using the default settings for the rpm or source distribution seems to be a problem. Below are the methods I have used to resolve these problems.
First make sure you have the following installed (check http://www.rpmfind.net for the packages):
    compat-g77
    compat-gcc
    gcc-g77
It also sometimes helps to create a soft link to gfortran like so (changing the directory to suit your needs):
ln -s /usr/bin/g77 /usr/bin/gfortran
Then, and this is important, add the following to your config.site (found in your R source directory):
FPICFLAGS=-g
Now you are ready to install R on SuSE:
./configure
Or,
./configure --x-includes=/usr/include/X11 # sometimes necessary
    make
    make check
    make pdf      # optional
    make info     # optional
    make install  # as superuser ('root')
That's it. You are now ready to use R.
Comparison with other programs
Although R is mostly used by statisticians and others who need statistics, it can also serve as a general matrix calculation toolbox, much like GNU Octave or its proprietary counterpart, MATLAB.
It should not be confused with the R package, a collection of programs for multidimensional and spatial analysis available on Macintosh and VAX/VMS systems.
Basics
How to get help:
help.start()
- Opens browser
help()
- For more on using help
help(..)
- For help on ..
help.search("..")
- To search for ..
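As a small sketch of the help functions listed above, the lines below look up the built-in function mean. The choice of mean as the topic is only an illustration; apropos() and args() are standard R functions that were not mentioned in the list.

```r
# Find visible objects whose name starts with "mean"
hits <- apropos("^mean")
print(hits)

# Show the argument list of a function without opening the help page
args(mean)

# ?mean            # shorthand for help(mean); opens the help page
# example(mean)    # runs the examples from the help page
```

The last two lines are commented out because they open a pager or run demonstration code, which is usually only wanted in an interactive session.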
How to leave again:
q()
- Image can be saved to .RData
Basic R commands
Most arithmetic operators work like you would expect in R:
    4+2      #Prints '6'
    3*4      #Prints '12'
Operators have precedence as known from basic algebra:
    1+2*4    #Prints '9', while
    (1+2)*4  #Prints '12'
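A few more operators follow the same precedence idea. The sketch below is an illustration only; the variable names are made up for this example.

```r
# Further arithmetic operators (variable names are hypothetical)
power     <- 2^10      # exponentiation
remainder <- 17 %% 5   # modulo (remainder after division)
quotient  <- 17 %/% 5  # integer division
combined  <- -2^2      # ^ binds tighter than unary minus, so this is -(2^2)

print(c(power, remainder, quotient, combined))
```

This prints 1024, 2, 3 and -4. The last result is a common surprise: write (-2)^2 if the square of minus two is wanted.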
Functions
A function call in R looks like this:
- function_name(arguments)
- Examples:
    cos(pi/3)  #Prints '0.5'
    exp(1)     #Prints '2.718282'
A function is identified in R by the parentheses
- That's why it's: help(), and not: help
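To illustrate the role of the parentheses, the sketch below treats a function as an ordinary object and then calls it. The helper square is hypothetical and defined only for this example.

```r
# Without parentheses, a name refers to the function object itself
is_fun <- is.function(exp)

# With parentheses, the function is called
e <- exp(1)

# 'square' is a hypothetical helper written for this illustration
square <- function(v) v^2
nine <- square(3)

# Arguments can be matched by name as well as by position
rounded <- round(x = e, digits = 2)

print(c(nine, rounded))
```

Typing exp alone at the prompt prints the function's definition instead of computing anything, which is the practical reason for writing help() rather than help.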
Variables (objects) in R
To assign a value to a variable (object):
    x<-4     #Assigns 4 to x
    x=4      #Also assigns 4 to x (newer syntax)
    x        #Prints '4'
    y<-x+2   #Assigns 6 to y
Functions for managing variables:
ls() or objects()
- lists all existing objects
str(x)
- tells the structure (type) of object 'x'
rm(x)
- removes (deletes) the object 'x'
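The management functions above can be tried on a throwaway object. This is a minimal sketch; the name tmp_value is hypothetical, and exists() is a standard R function not listed above.

```r
tmp_value <- 4                          # hypothetical object for this demo

was_listed <- "tmp_value" %in% ls()     # ls() returns the object names as strings
str(tmp_value)                          # prints the structure of the object

rm(tmp_value)                           # delete the object again
still_there <- exists("tmp_value")      # check whether the name is still bound

print(c(was_listed, still_there))
```

After rm(), the object is gone for the rest of the session unless it is assigned again.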
Vectors
A vector in R is like a sequence of elements of the same mode.
    x<-1:10             #Creates a vector
    y<-c("a","b","c")   #So does this
Handy functions for vectors:
c()
- Concatenates arguments into a vector
min()
- Returns the smallest value in vector
max()
- Returns the largest value in vector
mean()
- Returns the mean of the vector
Elements in a vector can be accessed individually:
    x[1]       #Prints first element
    x[1:10]    #Prints first 10 elements
    x[c(1,3)]  #Prints elements 1 and 3
Most functions expect one vector as argument, rather than individual numbers:
    mean(1,2,3)     #Replies '1': only the first argument is averaged
    mean(c(1,2,3))  #Replies '2'
The Recycling Rule
The recycling rule is a key concept for vector algebra in R.
When a vector is too short for a given operation, the elements are recycled and used again.
Examples of vectors that are too short:
    x<-c(1,2,3,4)
    y<-c(1,2)   #y is too short
    x+y         #Returns '2,4,4,6'
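The sketch below shows two further cases of the recycling rule: a single number combined with a vector, and lengths that do not divide evenly. The values are arbitrary.

```r
x <- c(1, 2, 3, 4)

doubled     <- x * 2          # the single value 2 is recycled four times
alternating <- x * c(1, -1)   # c(1, -1) is recycled twice

# When the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning (suppressed here for a clean run).
uneven <- suppressWarnings(c(1, 2, 3) + c(10, 20))

print(doubled)
print(alternating)
print(uneven)
```

The warning in the last case is worth heeding in real code, since uneven lengths are more often a mistake than an intention.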
Data
All simple numerical objects in R behave like a long string of numbers. In fact, even the simple x<-1 can be thought of as a vector with one element.
The functions dim(x) and str(x) return information on the dimensionality of x.
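As a quick check of the statements above, the following sketch inspects a single number and a small matrix. The example values are arbitrary.

```r
x <- 1
single_is_vector <- is.vector(x)   # a single number is a vector of length one
len_x <- length(x)
dim_x <- dim(x)                    # a plain vector has no dimensions

m <- matrix(1:6, nrow = 2)
dim_m <- dim(m)                    # number of rows, then number of columns
str(m)                             # compact description of type and shape

print(single_is_vector)
print(dim_m)
```

For a plain vector dim() gives NULL, so length() is the function to use when only the number of elements is needed.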
Important Objects
vector
- A series of numbers
matrix
- Tables of numbers
data.frame
- More 'powerful' matrix (list of vectors)
list
- Collections of other objects
class
- Objects carrying a class attribute, which lets functions such as print() behave differently for each kind of object
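A minimal sketch that builds one object of each kind listed above may help to tell them apart. All names and values are hypothetical, chosen only for this illustration.

```r
v  <- c(1.5, 2.5, 3.5)                    # vector
m  <- matrix(1:6, nrow = 2)               # matrix
df <- data.frame(id = 1:3, score = v)     # data frame with two columns
l  <- list(numbers = v, table = m, label = "demo")   # list of mixed objects

class_v  <- class(v)     # class() reports what kind of object something is
class_df <- class(df)
class_l  <- class(l)

df_is_list <- is.list(df)   # a data frame is stored as a list of columns

print(c(class_v, class_df, class_l))
print(df$score)             # columns of a data frame are accessed with $
```

The last check is the reason the data frame is described as a list of vectors: each column is one vector, and all columns have the same length.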
Data Matrices
Matrices are created with the matrix() function.
    m<-matrix(1:12,nrow=3)
    #This produces something like this:
    #     [,1] [,2] [,3] [,4]
    #[1,]    1    4    7   10
    #[2,]    2    5    8   11
    #[3,]    3    6    9   12
The recycling rule still applies:
    m<-matrix(c(2,5),nrow=3,ncol=3)
    #Gives the following matrix (R also warns that 2 values do not fit evenly into 3 rows):
    #     [,1] [,2] [,3]
    #[1,]    2    5    2
    #[2,]    5    2    5
    #[3,]    2    5    2
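The examples above fill the matrix column by column, which is the default. The sketch below contrasts this with the byrow argument of matrix() and with the transpose function t(); both are standard R, though not mentioned in the text.

```r
by_col <- matrix(1:6, nrow = 2)                 # filled column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

first_row_col <- by_col[1, ]
first_row_row <- by_row[1, ]

transposed <- t(by_row)                         # rows become columns

print(by_col)
print(by_row)
print(dim(transposed))
```

The first row is 1 3 5 in the column-wise matrix and 1 2 3 in the row-wise one; transposing the 2 by 3 matrix gives a 3 by 2 matrix.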
Indexing Matrices
For vectors we could specify one index vector like this:
    x<-c(2,0,1,5)
    x[c(1,3)]   #Returns '2' and '1'
For matrices we have to specify two vectors:
    m<-matrix(1:3,nrow=3,ncol=3)
    m[c(1,3),c(1,3)]   #Returns a 2*2 matrix
    m[1,]              #First row as vector
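A few more indexing forms, sketched on a matrix with distinct entries so that the results are easy to follow. The drop argument and logical indexing are standard R features not covered in the text above.

```r
m <- matrix(1:12, nrow = 3)

one_cell   <- m[2, 3]                  # row 2, column 3
second_col <- m[, 2]                   # a whole column, returned as a vector
kept       <- m[, 2, drop = FALSE]     # same column, kept as a 3x1 matrix
large      <- m[m > 10]                # logical indexing returns a vector
no_first   <- m[-1, ]                  # negative index drops the first row

print(one_cell)
print(second_col)
print(dim(kept))
print(large)
```

Leaving an index empty, as in m[, 2], selects everything along that dimension.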
Beyond two dimensions
You can actually assign to dim():
    x<-1:12
    dim(x)             #Returns 'NULL'
    dim(x)<-c(3,4)     #3*4 Matrix
    dim(x)             #Returns '3 4'
    dim(x)<-c(2,3,2)   #x is now in 3d
    dim(x)             #Returns '2 3 2'
But functions like mean() still work:
mean(x) #Returns '6.5'
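Once an object has three dimensions, it is indexed with three positions. The sketch below assumes the same 2 by 3 by 2 layout as above and also uses array(), a standard R function that creates such an object in one step.

```r
x <- 1:12
dim(x) <- c(2, 3, 2)

first <- x[1, 1, 1]     # first element
last  <- x[2, 3, 2]     # last element
inner <- x[1, 2, 2]     # position 1 + (2-1)*2 + (2-1)*6 = 9 in the original vector
slice <- x[, , 2]       # second 2x3 layer, holding the values 7 to 12

a <- array(1:12, dim = c(2, 3, 2))   # same object built directly
same_values <- all(a == x)

print(c(first, last, inner))
print(slice)
```

The values are stored in the original order, with the first index varying fastest, which is why functions like mean() are unaffected by the change of shape.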
Graphics and visualisation
Visualization is one of R's strong points.
R has many functions for drawing graphs, including:
    hist(x)     #Draws a histogram of values in x
    plot(x,y)   #Draws a basic xy plot of x against y
Adding stuff to plots:
    points(x,y)  #Add point (x,y) to existing graph.
    lines(x,y)   #Connect points with line.
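The plotting functions above can be combined as in the following sketch. The data are arbitrary, and the plot is written to a temporary PDF file (an assumption made here so that the example also runs without a display); in an interactive session the pdf() and dev.off() lines can simply be left out.

```r
x <- 1:10
y <- x^2                                  # arbitrary example data

out_file <- tempfile(fileext = ".pdf")    # hypothetical output location
pdf(out_file)                             # send the plots to a file

plot(x, y)                                # basic xy plot
lines(x, y)                               # connect the points
points(5, 25, pch = 19)                   # mark one point with a filled circle
hist(y)                                   # a histogram goes on a new page

dev.off()                                 # close the file
print(file.exists(out_file))
```

points() and lines() only add to the most recent plot, so they must come after plot() and before the next high-level call such as hist().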
Graphical devices
A graphical device is what 'displays' the graph. It can be a window or a printer.
Functions for plotting "Devices":
    X11()                                 #Opens a new plotting window; its arguments set the size of the window.
    par(mfrow=c(x,y))                     #Splits a plotting device into x rows and y columns.
    dev.print(postscript, file="???.ps")  #Copies the current plot to a PostScript file.
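A minimal sketch of working with a device explicitly: it opens a PDF file device, splits it into two panels with par(mfrow=...), and closes it again. The file location, sizes and data are assumptions for illustration; pdf() is used instead of X11() so that the code does not need a display.

```r
out_file <- tempfile(fileext = ".pdf")     # hypothetical output location
pdf(out_file, width = 8, height = 4)       # open a file device, size in inches

old_par <- par(mfrow = c(1, 2))            # one row, two columns of plots
hist(c(2, 3, 3, 5, 7, 7, 7))               # left panel
plot(1:5, c(2, 4, 3, 5, 1))                # right panel
par(old_par)                               # restore the previous layout

dev.off()                                  # closing the device writes the file
print(file.size(out_file) > 0)
```

par() returns the previous settings, so saving them in old_par and restoring them afterwards keeps later plots from inheriting the split layout.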
Packages (add-ons)
To install packages from the CLI, execute the following:
R CMD INSTALL /path/to/pkg_version.tar.gz
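Packages can also be handled from within an R session. The sketch below uses standard functions (install.packages(), library(), requireNamespace(), installed.packages()); the package name somePackage is a placeholder, and the install line is commented out because it needs network access.

```r
# install.packages("somePackage")   # placeholder name; downloads and installs from CRAN

library(stats)                       # attach an installed package (stats ships with R)

# Check for a package without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)

# Count the packages installed in the current libraries
n_installed <- nrow(installed.packages())

print(has_stats)
print(n_installed)
```

Installing copies a package onto the system once; library() then has to be called in every session that uses it.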
See also
- Bioconductor
- Bio3D
Functions
- Heatmap
- Boxplot
Resources
- Journal of Statistical Software: peer-reviewed journal publishing many R-related papers
- CRAN: Comprehensive R Archive Network for the R programming language.
- R graphics: a long list of techniques with examples.
Books
- Statistics: An Introduction using R
External links
- The R Project for Statistical Computing
- The CRAN (Comprehensive R Archive Network) Project
- The R Reference Manual - Base Package by the R Development Core Team. ISBN 0-9546120-0-0 (vol. 1), ISBN 0-9546120-1-9 (vol. 2)
Wikis
- R Wiki
- The R Wiki: user-contributed R documentation and how-to information.
- Collection of examples — from Wikimedia Commons.
Packages / Resources
- RPy
- Rcmdr, an open source GUI for R
- List of IDEs and script editors for R
- Tinn-R, an advanced open source script editor for R under Windows
- Web-based interface to R
Tutorials
- Learning R - Part I
- The R Graph Gallery shows several examples of graphics generated by R
- Robert Gentleman's site
- Ross Ihaka's site
- How to Visualize and Compare Distributions — by FlowingData