Difference between revisions of "R programming language"
Revision as of 16:09, 2 January 2006
The R programming language (or just "R"), sometimes described as "GNU S", is a mathematical language and environment used for statistical analysis and display. It was originally created by Ross Ihaka and Robert Gentleman (hence the name R) at the University of Auckland, New Zealand, and is now steadily developed further by a large community around the world.
It is based upon S, which was developed by John Chambers of Bell Laboratories and described in the paper "Evolution of the S Language" [1]. R is considered by its developers to be an implementation of S, with semantics derived from Scheme. The commercial implementation of S is S-PLUS [2].
R's source code is freely available under the GNU GPL and pre-compiled binary versions are provided for Windows, Macintosh, and many Unix operating systems. There are several GUIs for R, including RKWard, SciViews-R [3], and Rcmdr [4]. Many editors have specialised modes for R, including Emacs (Emacs Speaks Statistics), jEdit [5], Kate (text editor) [6], and Tinn [7], and there is an R plug-in for the Eclipse IDE framework.
R is highly extensible through packages, which are user-submitted libraries for specific functions or areas of study. A core set of packages is included with the installation of R, and many more are available from the Comprehensive R Archive Network (CRAN). The bioinformatics community has seeded a successful effort to use R for the analysis of data from molecular biology laboratories: the Bioconductor project, started in the fall of 2001, provides R packages for the analysis of genomic data, e.g. object-oriented data handling and analysis tools for Affymetrix and cDNA microarrays.
Comparison with other programs
Although R is mostly used by statisticians and other people who need statistics, it can also be used as a general matrix calculation toolbox, much like GNU Octave or its proprietary counterpart, MATLAB.
It should not be confused with the R package [8], a collection of programs for multidimensional and spatial analysis available on Macintosh and VAX/VMS systems.
Basics
How to get help:
- help.start() #Opens browser
- help() #For more on using help
- help(..) #For help on ..
- help.search("..") #To search for ..
How to leave again:
- q() #Image can be saved to .RData
Basic R commands
Most arithmetic operators work as you would expect in R:
> 4 + 2 #Prints ‘6’
> 3 * 4 #Prints ‘12’
Operators have precedence as known from basic algebra:
> 1 + 2 * 4 #Prints ‘9’, while
> (1 + 2) * 4 #Prints ‘12’
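A few more operators, as a small sketch. The exponent, remainder and integer-division operators are not covered in the text above, but they are part of base R and follow the same precedence idea.

```r
# Arithmetic operators beyond + and *
a <- 2^3        # exponentiation
b <- 7 %% 3     # remainder of 7 divided by 3
d <- 7 %/% 3    # integer division
e <- -2^2       # ^ binds tighter than unary minus, so this is -(2^2)
f <- (-2)^2     # parentheses change the grouping
print(c(a, b, d, e, f))   # 8 1 2 -4 4
```

When in doubt about precedence, adding parentheses costs nothing and makes the intent explicit.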
Functions
A function call in R looks like this:
- function_name(arguments)
- Examples:
> cos(pi/3) #Prints ‘0.5’
> exp(1) #Prints ‘2.718282’
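A function call is identified in R by the parentheses:
- That’s why it’s: help(), and not: help. Typing a function’s name without parentheses prints the function itself instead of calling it.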
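The following sketch shows function calls with positional and named arguments, and a user-defined function. The name square is a hypothetical helper made up for this illustration; round() and log() are base R functions.

```r
# Calling functions with positional and named arguments
r1 <- round(3.14159, digits = 2)   # 3.14
r2 <- log(8, base = 2)             # 3

# 'square' is a hypothetical helper defined only for this illustration
square <- function(v) v^2
r3 <- square(4)                    # 16

# With parentheses the function is called;
# without them the name refers to the function object itself
is.function(square)                # TRUE
```

Named arguments such as digits = 2 make a call easier to read and let you skip arguments that have defaults.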
Variables (objects) in R
To assign a value to a variable (object):
> x <- 4 #Assigns 4 to x
> x = 4 #Assigns 4 to x (newer alternative to <-)
> x #Prints ‘4’
> y <- x + 2 #Assigns 6 to y
Functions for managing variables:
- ls() or objects() lists all existing objects
- str(x) tells the structure (type) of object ‘x’
- rm(x) removes (deletes) the object ‘x’
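A short session using the management functions above might look like the sketch below. The variable x_still_there is a hypothetical name used only to capture the result, and exists() is a base R function not mentioned in the list.

```r
x <- 4
y <- x + 2
ls()                  # lists the names of the objects in the workspace
str(y)                # num 6
rm(x)                 # delete x

# exists() checks whether a name is defined; inherits = FALSE limits
# the check to the current environment
x_still_there <- exists("x", inherits = FALSE)
print(x_still_there)  # FALSE
```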
Vectors
A vector in R is a sequence of elements of the same mode.
> x <- 1:10 #Creates a vector
> y <- c("a","b","c") #So does this
Handy functions for vectors:
- c() – Concatenates arguments into a vector
- min() – Returns the smallest value in vector
- max() – Returns the largest value in vector
- mean() – Returns the mean of the vector
Elements in a vector can be accessed individually:
> x[1] #Prints first element
> x[1:10] #Prints first 10 elements
> x[c(1,3)] #Prints elements 1 and 3
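Most functions expect one vector as argument, rather than individual numbers. In mean(1,2,3) only the first argument is treated as data; the other two are matched to other parameters of mean():
> mean(1,2,3) #Replies ‘1’
> mean(c(1,2,3)) #Replies ‘2’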
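The vector functions and indexing forms above can be tried together, as in this sketch. The data values are arbitrary; length() and negative indices are base R features that the text does not list.

```r
x <- c(3, 1, 4, 1, 5)
min(x)            # 1
max(x)            # 5
mean(x)           # 2.8
length(x)         # 5
x[c(1, 3)]        # 3 4
x[-1]             # everything except the first element: 1 4 1 5

# All elements share one mode: mixing numbers and text converts everything to text
z <- c(1, "a")
mode(z)           # "character"
```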
The Recycling Rule
The recycling rule is a key concept for vector algebra in R.
When a vector is too short for a given operation, the elements are recycled and used again.
Example of a vector that is too short:
> x <- c(1,2,3,4)
> y <- c(1,2) #y is too short
> x + y #Returns ‘2,4,4,6’
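The recycling rule can be traced step by step in the sketch below. The last case, where the lengths do not divide evenly, is an addition to the text: R still recycles but issues a warning, which is suppressed here only to keep the output clean.

```r
x <- c(1, 2, 3, 4)
y <- c(1, 2)
s1 <- x + y                 # y is used twice: 1+1, 2+2, 3+1, 4+2
print(s1)                   # 2 4 4 6

s2 <- x * 10                # a single number is a vector of length one, recycled four times
print(s2)                   # 10 20 30 40

# When the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning
s3 <- suppressWarnings(1:5 + 1:2)
print(s3)                   # 2 4 4 6 6
```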
Data
All simple numerical objects in R behave like a long string of numbers. In fact, even the simple x <- 1 creates a vector with one element.
The functions dim(x) and str(x) return information on the dimensions and structure of x.
Important Objects
- vector – “A series of numbers”
- matrix – “Tables of numbers”
- data.frame – “More ‘powerful’ matrix (list of equal-length vectors, one per column, which may differ in mode)”
- list – “Collections of other objects”
- class – “Objects (often lists) carrying a class attribute that functions such as print() use to decide how to handle them”
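As a rough illustration of the object types listed above, the sketch below builds one of each and inspects it. The column names id and name and the list element names are made up for the example.

```r
v  <- c(1.5, 2.5, 3.5)                          # vector
m  <- matrix(1:6, nrow = 2)                     # matrix with 2 rows and 3 columns
df <- data.frame(id = 1:3,                      # data.frame: columns may differ in mode
                 name = c("a", "b", "c"))
l  <- list(numbers = v, table = m, frame = df)  # list: holds any objects

class(v)    # "numeric"
class(df)   # "data.frame"
class(l)    # "list"
dim(m)      # 2 3
str(df)     # compact summary of the structure
```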
Data Matrices
Matrices are created with the matrix() function.
> m <- matrix(1:12,nrow=3)
This produces the following matrix:
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
The recycling rule still applies:
> m <- matrix(c(2,5),nrow=3,ncol=3)
This gives the following matrix (R also warns that the data length is not a sub-multiple of the number of rows):
     [,1] [,2] [,3]
[1,]    2    5    2
[2,]    5    2    5
[3,]    2    5    2
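The fill order and recycling behaviour of matrices can be checked with the sketch below. The byrow argument and t() are base R features not mentioned in the text; the variable names are arbitrary.

```r
m1 <- matrix(1:6, nrow = 3)                 # filled column by column
m2 <- matrix(1:6, nrow = 3, byrow = TRUE)   # filled row by row
print(m1[1, ])    # 1 4
print(m2[1, ])    # 1 2

# Arithmetic recycles the shorter vector down the columns of the matrix
m3 <- m1 + c(10, 20, 30)
print(m3[, 2])    # 14 25 36
print(dim(t(m1))) # 2 3 (t() transposes)
```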
Indexing Matrices
For vectors we could specify one index vector like this:
> x <- c(2,0,1,5)
> x[c(1,3)] #Returns ‘2’ and ‘1’
For matrices we have to specify two index vectors, one for the rows and one for the columns:
> m <- matrix(1:3,nrow=3,ncol=3)
> m[c(1,3),c(1,3)] #Returns a 2*2 matrix
> m[1,] #First row as vector
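A slightly larger matrix makes the indexing forms easier to follow. This is a sketch; the drop = FALSE argument is an addition to the text and keeps a single row as a matrix instead of a plain vector.

```r
m <- matrix(1:12, nrow = 3)
m[2, 3]                       # single element: 8
m[1, ]                        # first row: 1 4 7 10
m[, 2]                        # second column: 4 5 6
sub <- m[c(1, 3), c(1, 3)]    # rows 1 and 3, columns 1 and 3
dim(sub)                      # 2 2
row1 <- m[1, , drop = FALSE]  # keep the result as a 1 x 4 matrix
dim(row1)                     # 1 4
```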
Beyond two dimensions
You can actually assign to dim():
> x <- 1:12
> dim(x) #Returns ‘NULL’
> dim(x) <- c(3,4) #3*4 Matrix
> dim(x) #Returns ‘3 4’
> dim(x) <- c(2,3,2) #x is now in 3d
> dim(x) #Returns ‘2 3 2’
But functions like mean() still work:
> mean(x) #Returns ‘6.5’
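Indexing a three-dimensional object works like matrix indexing with one more index, as this sketch suggests. The array() function is a base R shortcut not mentioned in the text; layer1 and a are arbitrary names.

```r
x <- 1:12
dim(x) <- c(2, 3, 2)     # 2 rows, 3 columns, 2 layers
x[1, 1, 2]               # first element of the second layer: 7
x[2, 3, 2]               # last element: 12
layer1 <- x[, , 1]       # the first layer is an ordinary 2 x 3 matrix
dim(layer1)              # 2 3
sum(x)                   # 78; the values themselves are unchanged

a <- array(1:12, dim = c(2, 3, 2))   # array() builds the same shape in one step
all(a == x)                          # TRUE
```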
Graphics and visualisation
Visualization is one of R’s strong points.
R has many functions for drawing graphs, including:
- hist(x) – Draws a histogram of values in x
- plot(x,y) – Draws a basic xy plot of x against y
Adding to an existing plot:
- points(x,y) – Adds points at (x,y) to the existing graph.
- lines(x,y) – Connects the points (x,y) with line segments.
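The plotting functions above can be combined as in the following sketch. The data are hypothetical random values generated only for the illustration; pch = 19 is a standard graphical parameter that selects a filled circle.

```r
set.seed(1)                  # make the random numbers reproducible
x <- 1:20
y <- x + rnorm(20)           # hypothetical noisy data for the illustration

hist(y)                      # histogram of the y values
plot(x, y)                   # scatter plot of y against x
points(10, 15, pch = 19)     # add one filled point at (10, 15)
lines(x, x)                  # add the line y = x through the cloud
```

When run non-interactively, R sends these plots to a file device instead of a window.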
Graphical devices
A graphical device is what ‘displays’ the graph. It can be a window on the screen, a file, or a printer.
Functions for working with devices:
- X11() – Opens a new plotting window; its arguments let you set the size of the window.
- par(mfrow=c(x,y)) – Splits a plotting device into x rows and y columns.
- dev.print(postscript, file="???.ps") – Copies the current plot to a PostScript file.
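The sketch below writes a two-panel figure to a file. It swaps in the pdf() file device with dev.off() rather than dev.print(), because dev.print() copies from a screen device and so needs an open plotting window; the file name out is a hypothetical temporary file.

```r
out <- tempfile(fileext = ".pdf")   # hypothetical output file for this illustration
pdf(out, width = 7, height = 4)     # open a file device instead of a window
par(mfrow = c(1, 2))                # 1 row, 2 columns of plots
hist(rnorm(100))                    # left panel
plot(1:10, (1:10)^2)                # right panel
dev.off()                           # close the device so the file is written
file.exists(out)                    # TRUE
```

Closing the device with dev.off() matters: until then the file may be incomplete.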
See also
- Journal of Statistical Software — peer-reviewed journal publishing many R related papers
- CRAN — Comprehensive R Archive Network for the R programming language.
External links
- The R Project for Statistical Computing
- The CRAN (Comprehensive R Archive Network) Project
- Web-based interface to R
- The R Reference Manual - Base Package by the R Development Core Team. ISBN 0-9546120-0-0 (vol. 1), ISBN 0-9546120-1-9 (vol. 2)
- The R Wiki User contributed R documentation and how to information.
- The R Graph Gallery shows several examples of graphics generated by R
- Robert Gentleman's site
- Ross Ihaka's site
- Rcmdr, an open source GUI for R
- List of IDEs and script editors for R
- Tinn-R, an advanced open source script editor for R under Windows