Nginx - Rewrite Module

Initially, the purpose of this module (as the name suggests) is to perform URL rewriting. This mechanism allows you to get rid of ugly URLs containing multiple parameters, for instance, http://example.com/article.php?id=1234&comment=32; such URLs are particularly uninformative and meaningless for a regular visitor. Instead, links to your website will contain useful information that indicates the nature of the page you are about to visit. The URL given in the example becomes http://website.com/article-1234-32-US-economy-strengthens.html. This solution is not only more interesting for your visitors, but also for search engines: URL rewriting is a key element of Search Engine Optimization (SEO).

The principle behind this mechanism is simple — it consists of rewriting the URI of the client request after it is received, before serving the file. Once rewritten, the URI is matched against location blocks in order to find the configuration that should be applied to the request. The technique is further detailed in the coming sections.
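As a rough illustration of the principle, the following sketch maps the friendly article URL from the introduction back to the script that actually serves it. The pattern and the layout of the friendly URL (identifier, comment number, then keywords) are assumptions made for this example, not a rule taken from a real site.

```nginx
server {
    server_name website.com;

    # Hypothetical rule: /article-<id>-<comment>-<keywords>.html is
    # rewritten internally to /article.php?id=<id>&comment=<comment>.
    # The trailing ? prevents the original arguments from being appended.
    rewrite ^/article-([0-9]+)-([0-9]+)-.*\.html$ /article.php?id=$1&comment=$2? last;
}
```

The visitor keeps seeing the friendly URL; only the URI handled inside Nginx changes.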

Reminder on Regular Expressions

First and foremost, this module requires a certain understanding of regular expressions, also known as regexes or regexps. Indeed, URL rewriting is performed by the rewrite directive, which accepts a pattern followed by the replacement URI.

Purpose

The first question we must answer is: What's the purpose of regular expressions? To put it simply, the main purpose is to verify that a string matches a pattern. The said pattern is written in a particular language that allows defining extremely complex and accurate rules.

String    Pattern    Matches?    Explanation
hello     ^hello$    Yes         The string begins with the character h (^h), followed by e, l, l, and ends with o (o$).
hell      ^hello$    No          The string begins with the character h (^h), followed by e, l, l, but does not end with o.
Hello     ^hello$    Depends     If the engine performing the match is case-sensitive, the string doesn't match the pattern.

This concept becomes a lot more interesting when complex patterns are employed, such as one that validates an e-mail address: ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$ (to be used with case-insensitive matching, since only uppercase letters are listed). Validating the format of an e-mail address programmatically would require a great deal of code, while all of the work can be done with a single regular expression pattern.

PCRE Syntax

The syntax that Nginx employs originates from the Perl Compatible Regular Expression (PCRE) library. It's the most commonly used form of regular expression, and nearly everything you learn here remains valid for other language variations.

In its simplest form, a pattern is composed of one character, for example, x. We can match strings against this pattern. Does example match the pattern x? Yes, example contains the character x. It can be more than one specific character — the pattern [a-z] matches any character between a and z, or even a combination of letters and digits: [a-z0-9]. In consequence, the pattern hell[a-z0-9] validates the following strings: hello and hell4, but not hell or hell!.

You probably noticed that we employed the characters [ and ]. These are called metacharacters and have a special effect on the pattern. There are a total of 11 metacharacters, and all play a different role. If you want to actually create a pattern containing one of these characters, you need to escape them with the \ character.

Metacharacter    Description

^

Beginning

The entity after this character must be found at the beginning.

Example pattern: ^h

Matching strings: hello, h, hh

Non-matching strings: character, ssh

$

End

The entity before this character must be found at the end.

Example pattern: e$

Matching strings: sample, e, file

Non-matching strings: extra, shell

.

Any

Matches any character.

Example pattern: hell.

Matching strings: hello, hellx, hell5, hell!

Non-matching strings: hell, helo

[ ]

Set

Matches any character within the specified set.

Syntax: [a-z] for a range, [abcd] for a set, and [a-z0-9] for two ranges. Note that if you want to include the - character in a set, you need to insert it right after the [ or just before the ].

Example pattern: hell[a-y123-]

Matching strings: hello, hell1, hell2, hell3, hell-

Non-matching strings: hellz, hell4, heloo, he-llo

[^ ]

Negate set

Matches any character that is not within the specified set.

Example pattern: hell[^a-np-z0-9]

Matching strings: hello, hell;

Non-matching strings: hella, hell5

|

Alternation

Matches the entity placed either before or after the |.

Example pattern: hello|welcome

Matching strings: hello, welcome, helloes, awelcome

Non-matching strings: hell, ellow, owelcom

( )

Grouping

Groups a set of entities, often to be used in conjunction with |.

Example pattern: ^(hello|hi) there$

Matching strings: hello there, hi there

Non-matching strings: hey there, ahoy there

\

Escape

Allows you to escape special characters.

Example pattern: Hello\.

Matching strings: Hello., Hello. How are you?, Hi! Hello...

Non-matching strings: Hello, Hello, how are you?

Quantifiers

So far, you are able to express simple patterns with a limited number of characters. Quantifiers allow you to extend the amount of accepted entities:

Quantifier    Description

*

0 or more times

The entity preceding * must be found 0 or more times.

Example pattern: he*llo

Matching strings: hllo, hello, heeeello

Non-matching strings: hallo, ello

+

1 or more times

The entity preceding + must be found 1 or more times.

Example pattern: he+llo

Matching strings: hello, heeeello

Non-matching strings: hllo, helo

?

0 or 1 time

The entity preceding ? must be found 0 or 1 time.

Example pattern: he?llo

Matching strings: hello, hllo

Non-matching strings: heello, heeeello

{x}

x times

The entity preceding {x} must be found x times.

Example pattern: he{3}llo

Matching strings: heeello, oh heeello there!

Non-matching strings: hello, heello, heeeello

{x,}

At least x times

The entity preceding {x,} must be found at least x times.

Example pattern: he{3,}llo

Matching strings: heeello, heeeeeeello

Non-matching strings: hllo, hello, heello

{x,y}

x to y times

The entity preceding {x,y} must be found between x and y times.

Example pattern: he{2,4}llo

Matching strings: heello, heeello, heeeello

Non-matching strings: hello, heeeeello
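To see a few of these metacharacters and quantifiers working together in an Nginx configuration, here is a small sketch of a location block that matches common image extensions. The header name and value are made up for demonstration purposes.

```nginx
server {
    server_name website.com;

    # ~* makes the match case-insensitive.
    # \.         a literal dot (escaped metacharacter)
    # (a|b|c)    alternation inside a group
    # jpe?g      the e is optional: matches both jpg and jpeg
    # $          the extension must be at the end of the URI
    location ~* \.(jpe?g|png|gif)$ {
        add_header X-Matched-Pattern "image-extension";  # demonstration only
    }
}
```

With this pattern, /photos/cat.JPG and /logo.png would be handled by the block, while /logo.png.txt would not.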

As you probably noticed, the { and } characters in regular expressions conflict with the block delimiters of the Nginx configuration file syntax. If you want to write a regular expression pattern that includes curly brackets, you need to place the pattern between quotes (single or double quotes):

rewrite hel{2,}o /hello.php; # invalid

rewrite "hel{2,}o" /hello.php; # valid

rewrite 'hel{2,}o' /hello.php; # valid
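The same quoting rule applies wherever a pattern appears, not only with rewrite. As a sketch, assuming a hypothetical /archive/ section organized by year and month, a location block could be written as follows:

```nginx
server {
    server_name website.com;

    # The pattern is quoted because it contains { and }.
    # It matches URIs such as /archive/2013/05/
    location ~ "^/archive/[0-9]{4}/[0-9]{2}/$" {
        add_header X-Section "archive";  # demonstration only
    }
}
```

Without the quotes, Nginx would read the first { as the start of the block and reject the configuration.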

Captures

One last feature of the regular expression mechanism is the ability to capture sub-expressions. Whatever text is placed between parentheses ( ) is captured and can be used after the matching process.

Here are a couple of examples to illustrate the principle:

Pattern: ^(hello|hi) (sir|mister)$
String: hello sir
Captured: $1 = hello, $2 = sir

Pattern: ^(hello (sir))$
String: hello sir
Captured: $1 = hello sir, $2 = sir

Pattern: ^(.*)$
String: nginx rocks
Captured: $1 = nginx rocks

Pattern: ^(.{1,3})([0-9]{1,4})([?!]{1,2})$
String: abc1234!?
Captured: $1 = abc, $2 = 1234, $3 = !?

Named captures are also supported:

Pattern: ^/(?<folder>[^/]*)/(?<file>.*)$
String: /admin/doc
Captured: $folder = admin, $file = doc

When you use a regular expression in Nginx, for example, in the context of a location block, the buffers that you capture can be employed in later directives:

server {

  server_name website.com;

  location ~* ^/(downloads|files)/(.*)$ {

    add_header Capture1 $1;

    add_header Capture2 $2;

  }

}

In the preceding example, the location block will match the request URI against a regular expression. A couple of URIs that would apply here: /downloads/file.txt, /files/archive.zip, or even /files/docs/report.doc. Two parts are captured: $1 will contain either downloads or files and $2 will contain whatever comes after /downloads/ or /files/. Note that the add_header directive is employed here to append arbitrary headers to the client response for the sole purpose of demonstration.
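Named captures can be used in the same way. The following sketch reuses the named-capture pattern shown earlier; the header names are arbitrary and only serve to display the captured values.

```nginx
server {
    server_name website.com;

    # For a request to /admin/doc:
    #   $folder = admin
    #   $file   = doc
    location ~ ^/(?<folder>[^/]*)/(?<file>.*)$ {
        add_header X-Folder $folder;
        add_header X-File   $file;
    }
}
```

Named captures tend to be easier to read than $1 and $2 once a configuration contains several patterns.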

Internal requests

Nginx differentiates external and internal requests. External requests directly originate from the client; the URI is then matched against possible location blocks:

server {

  server_name website.com;

  location = /document.html {

    deny all; # example directive

  }

}

A client request to http://website.com/document.html would directly fall into the above location block.

Conversely, internal requests are triggered by Nginx via specific directives. In default Nginx modules, there are several directives capable of producing internal requests: error_page, index, rewrite, try_files, add_before_body, add_after_body, the include SSI command, and more.

There are two different kinds of internal requests:

  • Internal redirects: Nginx redirects the client requests internally. The URI is changed, and the request may therefore match another location block and become eligible for different settings. The most common case of internal redirects is when using the rewrite directive, which allows you to rewrite the request URI.
  • Sub-requests: Additional requests that are triggered internally to generate content that is complementary to the main request. A simple example would be with the Addition module. The add_after_body directive allows you to specify a URI that will be processed after the original one, the resulting content being appended to the body of the original request. The SSI module also makes use of sub-requests to insert content with the include command.
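A sub-request can be sketched with the Addition module mentioned above. This assumes Nginx was compiled with that module (--with-http_addition_module) and that a /footer.html file exists; both are assumptions made for the example.

```nginx
server {
    server_name website.com;

    location / {
        # Sub-request: once the main response has been produced,
        # Nginx requests /footer.html internally and appends the
        # result to the response body.
        add_after_body /footer.html;
    }
}
```

Unlike an internal redirect, the original request is not replaced here; the sub-request only contributes additional content.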

error_page

Detailed in the module directives of the Nginx HTTP Core module, error_page allows you to define the server behavior when a specific error code occurs. The simplest form is to assign a URI to an error code:

server {

  server_name website.com;

  error_page 403 /errors/forbidden.html;

  error_page 404 /errors/not_found.html;

}

When a client attempts to access a URI that triggers one of these errors, Nginx is supposed to serve the page corresponding to the error code. In fact, it does not just send the client the error page — it actually initiates a completely new request based on the new URI.

Consequently, you can end up falling back on a different configuration, like in the following example:

server {

  server_name website.com;

  root /var/www/vhosts/website.com/httpdocs/;

  error_page 404 /errors/404.html;

  location /errors/ {

    alias /var/www/common/errors/;

    internal;

  }

}

When a client attempts to load a document that does not exist, the request initially results in a 404 error. We employed the error_page directive to specify that 404 errors should create an internal redirect to /errors/404.html. As a result, a new request is generated by Nginx with the URI /errors/404.html. This URI falls under the location /errors/ block, so its configuration applies.

Logs can prove to be particularly useful when working with redirects and URL rewrites. Be aware that information on internal redirects will show up in the logs only if you set the error_log directive to debug. You can also get it to show up at the notice level, under the condition that you specify rewrite_log on; wherever you need it.

A raw, but trimmed, excerpt from the debug log summarizes the mechanism:

->http request line: "GET /page.html HTTP/1.1"

->http uri: "/page.html"

->test location: "/errors/"

->using configuration ""

->http filename: "/var/www/vhosts/website.com/httpdocs/page.html"

-> open() "/var/www/vhosts/website.com/httpdocs/page.html" failed (2: No such file or directory), client: 127.0.0.1, server: website.com, request: "GET /page.html HTTP/1.1", host:"website.com"

->http finalize request: 404, "/page.html?" 1

->http special response: 404, "/page.html?"

->internal redirect: "/errors/404.html?"

->test location: "/errors/"

->using configuration "/errors/"

->http filename: "/var/www/common/errors/404.html"

->http finalize request: 0, "/errors/404.html?" 1

Note that the use of the internal directive in the location block forbids clients from accessing the /errors/ directory. This location can only be accessed from an internal redirect.

The mechanism is the same for the index directive — if no file path is provided in the client request, Nginx will attempt to serve the specified index page by triggering an internal redirect.
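The effect of the index directive can be sketched as follows, assuming an index.php file exists in the document root. The header is only there to make the matched block visible.

```nginx
server {
    server_name website.com;
    root /var/www/vhosts/website.com/httpdocs/;

    location / {
        # A request for / triggers an internal redirect to /index.php
        # if that file exists, otherwise to /index.html.
        index index.php index.html;
    }

    location ~ \.php$ {
        # The redirected URI /index.php is matched again and ends up here.
        # Directives for handling PHP scripts would go in this block.
        add_header X-Handled-By "php-location";  # demonstration only
    }
}
```

The client requested /, yet the configuration that finally applies is the one for URIs ending in .php.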

Rewrite

While the previous directive error_page is not actually part of the Rewrite module, detailing its functionality provides a solid introduction to the way Nginx handles requests.

Similar to how the error_page directive redirects to another location, rewriting the URI with the rewrite directive generates an internal redirect:

server {

  server_name website.com;

  root /var/www/vhosts/website.com/httpdocs/;

  location /storage/ {

    internal;

    alias /var/www/storage/;

  }

  location /documents/ {

    rewrite ^/documents/(.*)$ /storage/$1;

  }

}

A client query to http://website.com/documents/file.txt initially matches the second location block (location /documents/). However, the block contains a rewrite instruction that transforms the URI from /documents/file.txt to /storage/file.txt. The URI transformation reinitializes the process — the new URI is matched against the location blocks. This time, the first location block (location /storage/) matches the URI (/storage/file.txt).

Again, a quick peek at the debug log confirms the mechanism:

->http request line: "GET /documents/file.txt HTTP/1.1"

->http uri: "/documents/file.txt"

->test location: "/storage/"

->test location: "/documents/"

->using configuration "/documents/"

->http script regex: "^/documents/(.*)$"

->"^/documents/(.*)$" matches "/documents/file.txt", client: 127.0.0.1, server: website.com, request: "GET /documents/file.txt HTTP/1.1", host: "website.com"

->rewritten data: "/storage/file.txt", args: "", client: 127.0.0.1, server: website.com, request: "GET /documents/file.txt HTTP/1.1", host: "website.com"

->test location: "/storage/"

->using configuration "/storage/"

->http filename: "/var/www/storage/file.txt"

->HTTP/1.1 200 OK

->http output filter "/storage/file.txt?"

Infinite Loops

With all of the different syntaxes and directives, you may easily get confused. Worse — you might get Nginx confused. This happens, for instance, when your rewrite rules are redundant and cause internal redirects to loop infinitely:

server {

  server_name website.com;

  location /documents/ {

    rewrite ^(.*)$ /documents/$1;

  }

}

You thought you were doing well, but this configuration actually triggers internal redirects from /documents/anything to /documents//documents/anything. Moreover, since the location patterns are re-evaluated after an internal redirect, /documents//documents/anything becomes /documents//documents//documents/anything.

Here is the corresponding excerpt from the debug log:

->test location: "/documents/"

->using configuration "/documents/"

->rewritten data: "/documents//documents/file.txt", [...]

->test location: "/documents/"

->using configuration "/documents/"

->rewritten data: "/documents//documents//documents/file.txt" [...]

->test location: "/documents/"

->using configuration "/documents/"

->rewritten data: "/documents//documents//documents//documents/file.txt" [...]

->[...]

You probably wonder if this goes on indefinitely. The answer is no: the number of cycles is restricted to 10. You are only allowed 10 internal redirects. Anything past this limit and Nginx will produce a 500 Internal Server Error.
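One way to avoid such a loop is to make sure the rewritten URI no longer matches the location that contains the rule. The sketch below assumes a hypothetical /docs/ prefix that should be served from /documents/; it is one possible arrangement, not the only fix.

```nginx
server {
    server_name website.com;

    location /docs/ {
        # /docs/file.txt becomes /documents/file.txt. The new URI does
        # not start with /docs/, so this rule is not applied again.
        rewrite ^/docs/(.*)$ /documents/$1 last;
    }

    location /documents/ {
        # No rewrite here: the chain of internal redirects ends.
    }
}
```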

Server Side Includes (SSI)

A potential source of sub-requests is the Server Side Include (SSI) module. The purpose of SSI is for the server to parse documents before sending the response to the client in a somewhat similar fashion to PHP or other preprocessors.

Within a regular HTML file (for example), you have the possibility to insert tags corresponding to commands interpreted by Nginx:

<html>

<head>

  <!--# include file="header.html" -->

</head>

<body>

  <!--# include file="body.html" -->

</body>

</html>

Nginx processes these two commands; in this case, it reads the contents of header.html and body.html and inserts them into the document source, which is then sent to the client.

Several commands are at your disposal; they are detailed in the SSI Module section. The one we are interested in for now is the include command — including a file into another file:

<!--# include virtual="/footer.php" -->

The specified file is not just opened and read from a static location. Instead, a whole subrequest is processed by Nginx, and the body of the response is inserted instead of the include tag.
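For these commands to be interpreted at all, SSI processing has to be switched on. A minimal sketch, enabling it for the whole site:

```nginx
server {
    server_name website.com;

    location / {
        # Parse responses for SSI commands such as
        # <!--# include virtual="..." -->
        ssi on;
    }
}
```

Each include virtual command found in a page served from this location then produces one sub-request.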

Conditional Structure

The Rewrite module introduces a new set of directives and blocks, among which is the if conditional structure:

server {

  if ($request_method = POST) {

    […]

  }

}

This gives you the possibility to apply a configuration according to the specified condition. If the condition is true, the configuration is applied; otherwise, it isn't.

The following table describes the different syntaxes accepted when forming a condition:

Operator    Description

None

The condition is true if the specified variable or data is neither an empty string nor the string 0 (before version 1.0.1, any string starting with the character 0 was also treated as false):

if ($string) {

  […]

}

=, !=

The condition is true if the argument preceding the = symbol is equal to the argument following it. The following example can be read as "if the request_method is equal to POST, then apply the configuration":

if ($request_method = POST) {

  […]

}

The != operator does the opposite: "if the request method is different from GET, then apply the configuration":

if ($request_method != GET) {

  […]

}

~, ~*, !~, !~*

The condition is true if the argument preceding the ~ symbol matches the regular expression pattern placed after it:

if ($request_filename ~ "\.txt$") {

  […]

}

~ is case-sensitive, ~* is case-insensitive. Use the ! symbol to negate the matching:

if ($request_filename !~* "\.php$") {

  […]

}

Note that you can insert capture buffers in the regular expression:

if ($uri ~ "^/search/(.*)$") {

  set $query $1;

  rewrite ^ http://google.com/search?q=$query;

}

-f, !-f

Tests the existence of the specified file:

if (-f $request_filename) {

  […] # if the file exists

}

Use !-f to test the non-existence of the file:

if (!-f $request_filename) {

  […] # if the file does not exist

}

-d, !-d

Similar to the -f operator, for testing the existence of a directory.

-e, !-e

Similar to the -f operator, for testing the existence of a file, directory, or symbolic link.

-x, !-x

Similar to the -f operator, for testing if a file exists and is executable.
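The file test operators are often combined with rewrite. As a sketch, assuming a hypothetical /index.php script that handles every URL not backed by a real file:

```nginx
location / {
    # If the request does not map to an existing file, directory,
    # or symbolic link, hand it over to /index.php.
    if (!-e $request_filename) {
        rewrite ^ /index.php last;
    }
}
```

Requests for existing files are served as usual, since the condition is false for them.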

As of version 1.2.9, there is no else- or else if-like instruction. However, other directives allowing you to control the flow sequencing are available.
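An else branch can be approximated by ending the request inside the if block, so that the directives placed after it only run when the condition was false. The sketch below assumes a hypothetical /upload.php script that should only be reached through POST requests.

```nginx
location /upload/ {
    if ($request_method != POST) {
        # "if" branch: request processing stops here.
        return 405;
    }

    # "else" branch: only reached when the condition above was false.
    rewrite ^/upload/(.*)$ /upload.php?file=$1? last;
}
```

This works because Rewrite module directives within a block are evaluated in the order in which they appear.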

You might wonder: what are the advantages of using a location block over an if block? Indeed, in the following example, both seem to have the same effect:

if ($uri ~ /search/) {

  […]

}

location ~ /search/ {

  […]

}

As a matter of fact, the main difference lies within the directives that can be employed within either block — some can be inserted in an if block and some can't; on the contrary, almost all directives are authorized within a location block, as you probably noticed in the directive listings. In general, it's best to only insert directives from the Rewrite module within an if block, as other directives were not originally intended for such usage.

Directives

The Rewrite module provides you with a set of directives that do more than just rewriting a URI. The following table describes these directives along with the context in which they can be employed:


rewrite

Context: server, location, if

As discussed previously, the rewrite directive allows you to rewrite the URI of the current request, thus resetting the treatment of the said request.

Syntax: rewrite regexp replacement [flag];

Where regexp is the regular expression the URI should match in order for the replacement to apply.

Flag may take one of the following values:

  • last: The current rewrite rule should be the last to be applied. After its application, the new URI is processed by Nginx and a location block is searched for. Further rewrite instructions in the current block are disregarded.
  • break: The current rewrite rule is applied, but Nginx does not initiate a new request for the modified URI (does not restart the search for matching location blocks). All further rewrite directives are ignored.
  • redirect: Returns a 302 Moved temporarily HTTP response, with the replacement URI set as the value of the Location header.
  • permanent: Returns a 301 Moved permanently HTTP response, with the replacement URI set as the value of the Location header.

If you specify a URI beginning with http:// as the replacement URI, Nginx will automatically use the redirect flag.

Note that the request URI processed by the directive:

  • Is a relative URI: It does not contain the hostname and protocol. For a request such as http://website.com/documents/page.html, the request URI is /documents/page.html.
  • Is decoded: The URI corresponding to a request such as http://website.com/my%20page.html would be /my page.html.
  • Does not contain arguments: For a request such as http://website.com/page.php? followed by a query string, the URI being matched is /page.php. You don't need to consider including the arguments in the replacement URI, as Nginx does it for you. If you wish for Nginx to not include the arguments in the rewritten URI, then insert a ? at the end of the replacement URI: rewrite ^/search/(.*)$ /search.php?q=$1?.

Examples:

rewrite ^/search/(.*)$ /search.php?q=$1;

rewrite ^/search/(.*)$ /search.php?q=$1?;

rewrite ^ http://website.com;

rewrite ^ http://website.com permanent;
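The difference between the last and break flags is easier to see side by side. In the following sketch, the /old/, /legacy/, and /new/ prefixes and the header are hypothetical and chosen only for the comparison.

```nginx
server {
    server_name website.com;
    root /var/www/vhosts/website.com/httpdocs/;

    location /old/ {
        # last: the rewritten URI is matched against the location blocks
        # again, so the request ends up in "location /new/".
        rewrite ^/old/(.*)$ /new/$1 last;
    }

    location /legacy/ {
        # break: the URI is rewritten, but the request stays in this block.
        # The file is looked up under the root using the new URI /new/...
        rewrite ^/legacy/(.*)$ /new/$1 break;
    }

    location /new/ {
        add_header X-Location "new";  # only set for requests handled here
    }
}
```

A request to /old/page.html would carry the X-Location header in its response, while a request to /legacy/page.html would serve the same file without it.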


break

Context: server, location, if

The break directive is used to prevent further rewrite directives. Past this point, the URI is fixed and cannot be altered.

Example:

if (-f $request_filename) {

  break; # break if the file exists

}

if ($uri ~ ^/search/(.*)$) {

  set $query $1;

  rewrite ^ /search.php?q=$query?;

}

This example rewrites /search/anything-like queries to /search.php?q=anything. However, if the requested file exists (such as /search/index.html), the break instruction prevents Nginx from rewriting the URI.


return

Context: server, location, if

Interrupts the request treatment process and returns the specified HTTP status code or specified text.

Syntax: return code [text]; or return code URL;

Where code is an HTTP status code. Early versions of Nginx (before 0.8.42) only accepted the following codes: 204, 400, 402 to 406, 408, 410, 411, 413, 416, and 500 to 504. In addition, you may use the Nginx-specific code 444 in order to close the connection without sending any header or body data. The code may be followed by the raw text that will be returned to the user as the response body or, for redirect codes such as 301 and 302, by the URL to redirect to.

Example:

if ($uri ~ ^/admin/) {

  return 403;

  # the instruction below is NOT executed

  # as Nginx already completed the request

  rewrite ^ http://website.com;

}
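The text and URL forms of return can be sketched as follows, assuming a version of Nginx that accepts them (0.8.42 or later). The /ping and /old-blog/ locations are hypothetical.

```nginx
server {
    server_name website.com;

    location = /ping {
        # Status code followed by the response body.
        return 200 "pong\n";
    }

    location /old-blog/ {
        # Status code followed by a URL: the URL is sent
        # to the client in the Location header.
        return 301 http://website.com/blog/;
    }
}
```

For a plain redirect, return tends to be simpler than a rewrite with a catch-all pattern, as no regular expression needs to be evaluated.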


set

Context: server, location, if

Initializes or redefines a variable. Note that some variables cannot be redefined; for example, you are not allowed to alter $uri.

Syntax: set $variable value;

Examples:

set $var1 "some text";

if ($var1 ~ "^(.*) (.*)$") {

  set $var2 $1$2; #concatenation

  rewrite ^ http://website.com/$var2;

}


uninitialized_variable_warn

Context: http, server, location, if

If set to on, Nginx will issue log messages when the configuration employs a variable that has not yet been initialized.

Syntax: on or off

Default value: on

Examples:

uninitialized_variable_warn on;


rewrite_log

Context: http, server, location, if

If set to on, Nginx will issue log messages for every operation performed by the rewrite engine at the notice error level (see error_log directive).

Syntax: on or off

Default value: off

Examples:

rewrite_log off;
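When debugging rewrite rules, the directive is typically switched on together with an error log at the notice level. A sketch, assuming a hypothetical log file path:

```nginx
server {
    server_name website.com;

    # Hypothetical path; notice is the level at which
    # the rewrite engine writes its messages.
    error_log /var/log/nginx/website.com.error.log notice;
    rewrite_log on;

    location /documents/ {
        rewrite ^/documents/(.*)$ /storage/$1;
    }
}
```

Once the rules behave as expected, rewrite_log can be set back to off to keep the log small.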


Common Rewrite Rules

Here is a set of rewrite rules that satisfy basic needs for dynamic websites that wish to beautify their page links thanks to the URL rewriting mechanism. You will obviously need to adjust these rules according to your particular situation as every website is different.

Performing a Search

This rewrite rule is intended for search queries. Search keywords are included in the URL.


Input URI     http://website.com/search/some-search-keywords

Rewritten URI  http://website.com/search.php?q=some-search-keywords

Rewrite rule    rewrite ^/search/(.*)$ /search.php?q=$1?;


User Profile Page

Most dynamic websites that allow visitors to register offer a profile view page. URLs of this form can be employed, containing both the user ID and the username.


Input URI       http://website.com/user/31/James

Rewritten URI    http://website.com/user.php?id=31&name=James

Rewrite rule     rewrite ^/user/([0-9]+)/(.+)$ /user.php?id=$1&name=$2?;


Multiple Parameters

Some websites may use different syntaxes for the argument string, for example, by separating non-named arguments with slashes.


Input URI      http://website.com/index.php/param1/param2/param3

Rewritten URI    http://website.com/index.php?p1=param1&p2=param2&p3=param3

Rewrite rule     rewrite ^/index\.php/(.*)/(.*)/(.*)$ /index.php?p1=$1&p2=$2&p3=$3?;


News Website Article

This URL structure is often employed by news websites as URLs contain indications of the articles' contents. It is formed of an article identifier, followed by a slash, then a list of keywords. The keywords can usually be ignored and not included in the rewritten URI.


Input URI      http://website.com/33526/us-economy-strengthens

Rewritten URI    http://website.com/article.php?id=33526

Rewrite rule    rewrite ^/([0-9]+)/.*$ /article.php?id=$1?;


Discussion Board

Modern bulletin boards now use pretty URLs for the most part. This example shows how to create a topic view URL with two parameters — the topic identifier and the starting post. Once again, keywords are ignored:


Input URI      http://website.com/topic-1234-50-some-keywords.html

Rewritten URI    http://website.com/viewtopic.php?topic=1234&start=50

Rewrite rule    rewrite ^/topic-([0-9]+)-([0-9]+)-(.*)\.html$ /viewtopic.php?topic=$1&start=$2?;
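Put together, the rules from this section could be placed in a server block as sketched below. The root path, the last flags, and the empty PHP location are assumptions; how the PHP scripts are actually executed depends on your setup.

```nginx
server {
    server_name website.com;
    root /var/www/vhosts/website.com/httpdocs/;

    # Each rule ends with ? so that the original arguments are dropped,
    # and with last so that the remaining rules are skipped once one applies.
    rewrite ^/search/(.*)$ /search.php?q=$1? last;
    rewrite ^/user/([0-9]+)/(.+)$ /user.php?id=$1&name=$2? last;
    rewrite ^/index\.php/(.*)/(.*)/(.*)$ /index.php?p1=$1&p2=$2&p3=$3? last;
    rewrite ^/([0-9]+)/.*$ /article.php?id=$1? last;
    rewrite ^/topic-([0-9]+)-([0-9]+)-(.*)\.html$ /viewtopic.php?topic=$1&start=$2? last;

    location ~ \.php$ {
        # Directives for handling PHP scripts would go here.
    }
}
```

Because the rules sit at the server level, they are evaluated before a location block is selected, and the rewritten URI is then matched against the location blocks.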