Rewrite engine

From Christoph's Personal Wiki
Revision as of 21:43, 15 April 2015 by Christoph (Talk | contribs)


A rewrite engine is a piece of web server software used to modify URLs, for a variety of purposes. Some benefits derived from a rewrite engine are:

  • Making website URLs more user friendly
  • Making website URLs more search-engine friendly
  • Preventing undesired "inline linking"
  • Not exposing the (web address related) inner workings of a website to users

Many of these only apply to HTTP servers whose default behaviour is to map URLs to filesystem entities (i.e. files and directories); certain environments, such as many HTTP application server platforms, make this irrelevant.

The Apache HTTP server has a rewrite engine called mod_rewrite (see below), which has been described as "the Swiss Army knife of URL manipulation".

mod_rewrite

RewriteRule FLAGS
Flag          Description
R[=code]      Redirect to new URL, with optional code (see "Redirection Header Codes" below)
F             Forbidden (sends 403 header)
G             Gone (sends 410 header)
P             Proxy (hands the request to mod_proxy)
L             Last rule (stop processing further rules)
N             Next (i.e. restart the rules from the top)
C             Chain (if this rule does not match, the chained rules that follow are skipped)
T=mime-type   Set MIME type
NS            Skip if internal sub-request
NC            Case insensitive
QSA           Append the original query string
NE            Do not escape output
PT            Pass through
S=x           Skip next x rules
E=var:value   Set environment variable "var" to "value"
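
To illustrate how a few of these flags behave in practice, here is a minimal sketch. It assumes a per-directory (.htaccess) context; the script name product.php, the variable name "section" and all paths are hypothetical.

```apache
RewriteEngine On

# QSA: /products/42?color=red is rewritten to product.php?id=42&color=red.
# Without QSA the original query string (color=red) would be discarded.
RewriteRule ^products/([0-9]+)/?$ product.php?id=$1 [QSA,L]

# E: set the (hypothetical) environment variable "section" to "blog".
# "-" means "do not change the URL"; processing continues because L is absent.
RewriteRule ^blog/ - [E=section:blog]

# F and G answer with 403 and 410 respectively; no substitution is needed.
RewriteRule ^private/ - [F]
RewriteRule ^discontinued/ - [G]
```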


RewriteCond FLAGS
Flag   Description
NC     Case insensitive
OR     Allows a rule to apply if any one of a series of conditions is true (without OR, all conditions must be true)
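
The difference between the default (AND) behaviour and the OR flag can be sketched as follows. The host name and the user-agent strings are made up for the example.

```apache
RewriteEngine On

# No OR flag: BOTH conditions must be true for the rule to apply.
RewriteCond %{HTTP_HOST} ^old\.example\.com$ [NC]
RewriteCond %{REQUEST_METHOD} =GET
RewriteRule ^ - [G]

# With OR: the rule applies if EITHER condition is true.
# NC makes the user-agent match case insensitive.
RewriteCond %{HTTP_USER_AGENT} badbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} evilscraper [NC]
RewriteRule ^ - [F]
```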


Regular Expression Syntax
Pattern      Description
^            Start of string
$            End of string
.            Any single character
(a|b)        a or b
(...)        Group section
[abc]        Any one character in the set (a or b or c)
[^abc]       Any one character not in the set (not a or b or c)
a?           Zero or one of a
a*           Zero or more of a
a+           One or more of a
a{3}         Exactly 3 of a
a{3,}        3 or more of a
a{3,6}       Between 3 and 6 of a
!(pattern)   "Not" prefix. Apply rule when URL does not match pattern.
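
A short sketch of how these pieces of syntax show up in actual rules. It assumes a per-directory (.htaccess) context, and the script names (archive.php, feed.php, app.php) are hypothetical.

```apache
RewriteEngine On

# {4} and {2} fix the number of digits; each (...) group is captured
# and can be used in the substitution as $1, $2.
# /archive/2015/04/ is rewritten to archive.php?year=2015&month=04
RewriteRule ^archive/([0-9]{4})/([0-9]{2})/?$ archive.php?year=$1&month=$2 [L]

# (a|b) alternation; the dot is escaped so that it matches a literal "."
RewriteRule ^feed\.(rss|atom)$ feed.php?format=$1 [L]

# "!" negates the whole pattern: anything that is neither under static/
# nor app.php itself is sent to app.php.
# A negated pattern provides no $1 back-references.
RewriteRule !^(static/|app\.php$) app.php [L]
```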


Redirection Header Codes
Code   Description
301    Moved permanently
302    Moved temporarily (the default when R is given without a code)
403    Forbidden
404    Not found
410    Gone
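
The following sketch shows one way these codes can be produced from rewrite rules. The file names are hypothetical, and a per-directory (.htaccess) context at the document root is assumed.

```apache
RewriteEngine On

# R without a code sends 302; R=301 marks the move as permanent.
RewriteRule ^sale\.html$ /offers.html [R,L]
RewriteRule ^about-us\.html$ /about.html [R=301,L]

# F sends 403 and G sends 410; "-" means there is no substitution.
RewriteRule ^internal/ - [F]
RewriteRule ^retired\.html$ - [G]

# A code outside the 3xx range given to R drops the substitution
# and simply returns that status.
RewriteRule ^removed\.html$ - [R=404,L]
```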


Server Variables

Format

  • %{NAME_OF_VAR}

HTTP Headers

  • HTTP_USER_AGENT
  • HTTP_REFERER
  • HTTP_COOKIE
  • HTTP_FORWARDED
  • HTTP_HOST
  • HTTP_PROXY_CONNECTION
  • HTTP_ACCEPT

Request

  • REMOTE_ADDR
  • REMOTE_HOST
  • REMOTE_USER
  • REMOTE_IDENT
  • REQUEST_METHOD
  • SCRIPT_FILENAME
  • PATH_INFO
  • QUERY_STRING
  • AUTH_TYPE

Server

  • DOCUMENT_ROOT
  • SERVER_ADMIN
  • SERVER_NAME
  • SERVER_ADDR
  • SERVER_PORT
  • SERVER_PROTOCOL
  • SERVER_SOFTWARE

Time

  • TIME_YEAR
  • TIME_MON
  • TIME_DAY
  • TIME_HOUR
  • TIME_MIN
  • TIME_SEC
  • TIME_WDAY
  • TIME

Special

  • API_VERSION
  • THE_REQUEST
  • REQUEST_URI
  • REQUEST_FILENAME
  • IS_SUBREQ
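
As a rough illustration of the %{NAME_OF_VAR} format, the sketch below uses a handful of the variables listed above. It assumes the server also answers on HTTPS; the page names foo.day.html and foo.night.html are hypothetical.

```apache
RewriteEngine On

# SERVER_PORT, HTTP_HOST, REQUEST_URI:
# redirect requests arriving on port 80 to the same URL over HTTPS.
RewriteCond %{SERVER_PORT} =80
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# TIME_HOUR, TIME_MIN: between 07:00 and 19:00 serve the "day" page,
# otherwise fall through to the "night" page.
# "<" and ">" compare the strings lexicographically.
RewriteCond %{TIME_HOUR}%{TIME_MIN} >0700
RewriteCond %{TIME_HOUR}%{TIME_MIN} <1900
RewriteRule ^foo\.html$ foo.day.html [L]
RewriteRule ^foo\.html$ foo.night.html [L]
```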

Directives

  • RewriteEngine
  • RewriteOptions
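  • RewriteLog (Apache 2.2 and earlier; in 2.4 use LogLevel with rewrite:traceN)
  • RewriteLogLevel (Apache 2.2 and earlier; in 2.4 use LogLevel with rewrite:traceN)
  • RewriteLock (Apache 2.2 and earlier; in 2.4 use the Mutex directive)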
  • RewriteMap
  • RewriteBase
  • RewriteCond
  • RewriteRule
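
A minimal sketch of how some of these directives fit together. The map name "lc", the URL prefix /shop/ and the script item.php are assumptions made for the example. The first part belongs in the server or virtual host configuration, the second part in a .htaccess file.

```apache
# --- Server or virtual host configuration ---
# (RewriteMap cannot be used in .htaccess files)
RewriteEngine On

# Define a map named "lc" backed by the built-in tolower function
RewriteMap lc int:tolower

# Redirect any URL path containing uppercase letters to its lowercase form
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^/?(.*)$ /${lc:$1} [R=301,L]

# Apache 2.4 only: log rewrite decisions to the error log
LogLevel warn rewrite:trace3

# --- .htaccess in a directory served under the URL prefix /shop/ ---
RewriteEngine On
# RewriteBase tells mod_rewrite which URL prefix to put back
# in front of relative substitutions
RewriteBase /shop/
RewriteRule ^item/([0-9]+)$ item.php?id=$1 [L]
```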

Example rules

# These rules assume a .htaccess file in the document root
# (patterns therefore have no leading slash)

# Site has permanently moved to new domain
# www.domain.com to www.domain2.com
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain2.com/$1 [R=301,L]

# Page has moved temporarily
# domain.com/page.htm to domain.com/new_page.htm
RewriteRule ^page\.htm$ /new_page.htm [R=302,NC,L]

# Nice looking URLs (no querystring)
# domain.com/category-name-1/ to domain.com/categories.php?name=category-name-1
RewriteRule ^([A-Za-z0-9-]+)/?$ categories.php?name=$1 [L]

# Nice looking URLs (no querystring) with pagination
# domain.com/articles/title/5/ to domain.com/article.php?name=title&page=5
RewriteRule ^articles/([A-Za-z0-9-]+)/([0-9]+)/?$ article.php?name=$1&page=$2 [L]

# Block referrer spam
RewriteCond %{HTTP_REFERER} (weight) [NC,OR]
RewriteCond %{HTTP_REFERER} (drugs) [NC]
RewriteRule .* - [F]
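
Preventing undesired "inline linking" of images, mentioned among the benefits at the top of this page, could look roughly like this. It is a sketch only: the list of file extensions is an assumption, and domain.com stands in for the site's own host name.

```apache
RewriteEngine On

# Condition 1: the Referer header is not empty
# (direct visits and some proxies send none, so those are let through)
RewriteCond %{HTTP_REFERER} !^$
# Condition 2: the Referer is not a page on the site itself
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/ [NC]
# If both conditions hold, refuse requests for image files
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```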

User friendly / Search engine friendly URLs

People use website URLs in all kinds of ways. We send them to other people by email, put them on online discussion boards, or even write them on scraps of paper. This often applies not just to website home pages, but to specific content within a website. Typically website developers want to encourage this, as it means increased traffic to their sites. As such, a well designed website should allow users to enter at any URL (not just the homepage), and the URLs throughout the site should be easy to use.

A URL is easier to use if it is short but descriptive. The URL should have some text describing the content (not just numbers), but should not be too long.

Search engines will also find it easier to index pages which follow these rules. Content which is easier to index is more likely to be included in search results.

Website URLs are often long and meaningless to humans. This is because many websites have dynamic content, meaning that the HTML returned to the browser is generated on the fly rather than stored as a static HTML file. The URL is used not only to reference an HTML document at a fixed address, but also to pass pieces of data to software running on the web server, which then generates the HTML page dynamically. Typically this software takes the form of scripts written in a web scripting language such as Perl or PHP.

Using a URL rewrite engine, the website software can be presented with URLs in one form, while the actual requests (and the URLs seen by the user) are in another form. Rewrite engines thus allow URLs to be tidied up and made more user friendly by configuring rewrite rules, rather than by modifying the website software.
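
One common way to put this into practice is to route every "friendly" URL to a single script, which then decides what to display. The sketch below assumes a .htaccess file in the document root and a hypothetical index.php that reads the requested path from a "path" parameter.

```apache
RewriteEngine On

# Leave requests for files and directories that really exist untouched
# (images, stylesheets, index.php itself, ...)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Everything else goes to index.php:
# /articles/my-title is handled as index.php?path=articles/my-title
# QSA keeps any query string the visitor supplied.
RewriteRule ^(.*)$ index.php?path=$1 [QSA,L]
```

The user only ever sees /articles/my-title, while the script receives the query-string form it expects.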

Example VirtualHost domain redirect

<Directory />
  Options FollowSymLinks
  AllowOverride All
</Directory>

<VirtualHost *:80>

  ServerAdmin admin@example.com
  ServerName  example.com
  ServerAlias www.example.com

  # Index file and Document Root (where the public files are located)
  DirectoryIndex index.html index.php
  DocumentRoot /var/www/html/example.com

  # Rewrite rules
  # Example: http://xtof.ch/skills redirects to http://wiki.christophchamp.com/index.php/Technical_and_Specialized_Skills
  RewriteEngine On
  RewriteRule ^/skills$ http://wiki.christophchamp.com/index.php/Technical_and_Specialized_Skills [R=301,L]

  # Custom log file locations
  LogLevel warn
  ErrorLog  /var/log/httpd/example.com-error.log
  CustomLog /var/log/httpd/example.com-access.log combined

</VirtualHost>
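
A related use of rewrite rules inside a VirtualHost is to force a single canonical host name. The following is a sketch under the same assumptions as the example above (example.com as ServerName, www.example.com as alias), not a drop-in configuration.

```apache
<VirtualHost *:80>

  ServerName  example.com
  ServerAlias www.example.com
  DocumentRoot /var/www/html/example.com

  RewriteEngine On
  # In virtual host context the pattern is matched against the full URL
  # path, including the leading slash (hence "^/skills$" above), whereas
  # in .htaccess files the leading slash is absent. "^/?" accepts both.
  RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
  RewriteCond %{HTTP_HOST} !^$
  RewriteRule ^/?(.*)$ http://example.com/$1 [R=301,L]

</VirtualHost>
```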
