|1String Matching|

|^ Matching of strings is a pervasive and important function within the server.
Two types are supported: wildcard and regular expression.  Wildcard matching is
generally much less expensive (in CPU cycles and time) than regular expression
matching and so should always be used unless the match explicitly requires
otherwise.  WASD attempts to improve the efficiency of both by performing a
preliminary pass, using a very low-cost comparison, to make simple matches and
eliminate obvious mismatches.  This pass either matches, fails to match, or
encounters a pattern matching meta-character, which causes it to undertake full
pattern matching.
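
|^ The idea of such a preliminary pass can be sketched in Python.  This is
only an illustration of the behaviour described above, not WASD's
implementation.  The function name "quick_pass", the three-way return value
and the restriction to the wildcard meta-characters are all assumptions made
for the example.

```python
from typing import Optional

# Hypothetical set of meta-characters; only the wildcard ones are considered.
META_CHARACTERS = set("*%")


def quick_pass(pattern: str, target: str) -> Optional[bool]:
    """Cheap literal comparison performed before any full pattern matching.

    Returns True or False when the outcome is already decided, or None when a
    meta-character is reached and full pattern matching is required.
    """
    for index, ch in enumerate(pattern):
        if ch in META_CHARACTERS:
            return None  # hand over to the full (more expensive) matcher
        if index >= len(target) or target[index] != ch:
            return False  # obvious mismatch, no further work needed
    # Pattern exhausted without meta-characters: match only if target is too.
    return len(target) == len(pattern)
```

A caller would fall back to full wildcard or regular expression matching only
when the function returns None.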

|^ To assist with the refinement of string matching patterns the Server
Administration facility has a report item named "Match".  This report allows
the input of target and match strings and allows direct access to the server's
wildcard and regular expression matching routines.  Successful matches show the
matching elements and a substitution field (|link|Expression Substitution||)
allows resultant strings to be assessed.

|^ To determine what string match processing is occurring during request
processing in the running server use the |/match| item available from the
Server Administration WATCH Report.

|2Wildcard Patterns|

|^ Wildcard patterns are simple, low-cost mechanisms for matching a string to a
template.  They are designed to be used in path and authorization mapping to
compare a request path to the root (left-hand side) or a template expression.

|0Wildcard Operators|

|table|
|~_ |: Expression|: Purpose
|~
|~ |. * |. Match zero or more characters (non-greedy)
|~ |. ** |. Match zero or more characters (greedy)
|~ |. % |. Match any one character
|!table|

|^ Wildcard matching uses the '*' and '%' symbols to match any zero or more
characters, or any one character, respectively.  The '*' wildcard can be either
greedy or non-greedy depending on the context (and for historical reasons).  It
can also be forced to be greedy by using two consecutive asterisks ('**').  By
default it is not greedy when matching request paths for mapping or
authentication, and is greedy at other times (matching strings within
conditional testing, etc.).

|0Greedy and Non-Greedy||

|^ Non-greedy matching lets an asterisk wildcard consume characters only up to
the first character that is the same as the character immediately following the
wildcard.  It matches a minimum number of characters before failing.
Greedy matching continues through the target string until it finds text that
matches what follows the asterisk.

|^ To illustrate; using the following string
|
|code|
non-greedy character matching compared to greedy character matching
|!code|

the following non-greedy pattern

|code|
*non-greedy character*matching
|!code|

does not match but the following greedy pattern

|code|
*non-greedy character**matching
|!code|

does match.  The non-greedy pattern failed as soon as it encountered the space
following the first "matching" string, while the greedy pattern continued,
eventually encountering a string that matched the text following the greedy
wildcard.
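
|^ The difference can be reproduced with a small Python sketch.  It is written
purely from the description and example above and is not WASD's matching
routine; the name "wildcard_match" is hypothetical.  It is assumed that a
non-greedy asterisk commits to the first candidate position without retrying,
and that a greedy one tries the longest candidate first and backs off.

```python
def wildcard_match(pattern: str, target: str) -> bool:
    """Match target against a pattern containing '*', '**' and '%'."""

    def match(pi: int, ti: int) -> bool:
        while pi < len(pattern):
            ch = pattern[pi]
            if ch == "*":
                greedy = pattern.startswith("**", pi)
                pi += 2 if greedy else 1
                if pi == len(pattern):
                    return True  # a trailing wildcard takes whatever is left
                if greedy:
                    # Try the longest candidate first, backing off one
                    # character at a time until the rest of the pattern fits.
                    return any(match(pi, k)
                               for k in range(len(target), ti - 1, -1))
                # Non-greedy: skip only as far as the first character equal to
                # the one following the wildcard, then commit to that position.
                nxt = pattern[pi]
                while nxt != "%" and ti < len(target) and target[ti] != nxt:
                    ti += 1
                continue
            if ti >= len(target) or (ch != "%" and ch != target[ti]):
                return False
            pi += 1
            ti += 1
        return ti == len(target)

    return match(0, 0)


TARGET = "non-greedy character matching compared to greedy character matching"
non_greedy_result = wildcard_match("*non-greedy character*matching", TARGET)
greedy_result = wildcard_match("*non-greedy character**matching", TARGET)
```

With the strings used above, "non_greedy_result" is False and "greedy_result"
is True.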

|2Regular Expressions|

|^ Regular expression matching is case insensitive (in line with other WASD
behaviour) and uses the POSIX EGREP pattern syntax and capabilities.  Regular
expression matching offers significant but relatively expensive functionality.
One of those expenses is expression compilation.  WASD attempts to eliminate
this by pre-compiling expressions during server startup whenever feasible.
Regular expression matching must be enabled using the [RegEx]
WASD_CONFIG_GLOBAL directive.  Regular expressions are then differentiated from
wildcard patterns by a leading "^" character.

|^ A detailed tutorial on regular expression capabilities and usage is well
beyond the scope of this document.  Many such hard-copy and on-line documents
are available.

|^+ |link%|http://en.wikipedia.org/wiki/Regular_expression|

|^ This summary is only to serve as a quick mnemonic.  WASD regular
expressions support the following set of operators.

|0Operator Overview|

|table>>|
|~_ |: Description|: Usage
|~
|~ |. Match-self Operator |. Ordinary characters.
|~ |. Match-any-character Operator |. .
|~ |. Concatenation Operator |. Juxtaposition.
|~ |. Repetition Operators |. *  +  ? {}
|~ |. Alternation Operator |. \|
|~ |. List Operators |. [...]  [^...]
|~ |. Grouping Operators |. (...)
|~ |. Back-reference Operator |. \^digit
|~ |. Anchoring Operators |. ^  $
|~ |. Backslash Operator |. Escape meta-character; i.e.
\^ ^ . $ \| [ (
|!table|

|^ The following operators are used to match one character or, in conjunction
with the repetition operators, more than one character of the target string.
These single and leading characters are reserved meta-characters and must be
escaped using a leading backslash ("\^") if required as a literal character in
the matching pattern. |*Note| that this does not apply to the
|/range| hyphen; to include a hyphen in a range ensure the character
is the first or last in the range.

|0Matching Operators|

|table>>|
|~_ |: Expression|: Purpose
|~
|~ |. ^ |. Match the beginning of the line
|~ |. . |. Match any character
|~ |. $ |. Match the end of the line
|~ |. \| |. Alternation (or)
|~ |. [abc] |. Match only a, b or c
|~ |. [^abc] |. Match anything except a, b and c
|~ |. [a-z0-9] |. Match any character in the range a to z or 0 to 9
|!table|

|^ Repetition operators control the extent, or number, of whatever the
matching operators match. These are also reserved meta-characters and must be
escaped using a leading backslash if required as a literal character.

|0Repetition Operators|

|table>>|
|~_ |: Expression|: Function
|~
|~ |. * |. Match 0 or more times
|~ |. + |. Match 1 or more times
|~ |. ? |. Match 0 or 1 times
|~ |. {n} |. Match exactly n times
|~ |. {n,} |. Match at least n times
|~ |. {n,m} |. Match at least n but not more than m times
|!table|
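
|^ The operators in the tables above can be tried out with Python's "re"
module.  This is only an approximation for experimenting; "re" is not the
POSIX engine used by the server, although it shares the operators listed here.
The helper name "matches" is hypothetical, and the unanchored search is an
assumption.

```python
import re


def matches(expression: str, target: str) -> bool:
    """Case-insensitive search for expression anywhere in target."""
    return re.search(expression, target, re.IGNORECASE) is not None


# Repetition operators applied to the single character "b".
star = [matches(r"^ab*c$", t) for t in ("ac", "abc", "abbbc")]      # all True
plus = [matches(r"^ab+c$", t) for t in ("ac", "abc", "abbbc")]      # first False
optional = [matches(r"^ab?c$", t) for t in ("ac", "abc", "abbc")]   # last False

# A list operator with a bounded repetition count.
bounded = matches(r"^[a-z0-9]{2,4}$", "A1b")      # True, case is ignored
too_long = matches(r"^[a-z0-9]{2,4}$", "a1b2c")   # False, five characters

# Alternation inside grouping parentheses, and a back-reference to group 1.
alternation = matches(r"^(gif|png)$", "PNG")      # True
back_reference = matches(r"^(ab)\1$", "abab")     # True
```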

|2Examples|

|^ The following provides a series of examples as they might occur in use
for server configuration.

|number|

|item| Equivalent functionality using wildcard and regular expression patterns.
Note that "Mozilla" must be at the start of the string, so the regular
expression uses the start-of-string anchor.  This results in two consecutive
"^"s: the first indicates to WASD that a regular expression follows, the second
is part of the expression itself.

|code|
if (user-agent:Mozilla*Gecko*)
if (user-agent:^^Mozilla.*Gecko)
|!code|

|item| This shows path matching using equivalent wildcard and regular expression
matching.  Note the requirement to use the regular expression
|/grouping| parentheses to provide the substitution elements,
something provided implicitly with wildcard matching.

|code|
map /*/-/* /wasd_root/runtime/*/*
map ^/(.+)/-/(.+) /wasd_root/runtime/*/*
|!code|

|item| This rather contrived regular expression example has no equivalent
capability available with wildcard matching.  It forbids the use of any path
that contains any character other than alpha-numerics, the hyphen, underscore,
period and forward-slash.

|code|
pass ^[^-_./a-z0-9]+ "403 Forbidden character in path!"
|!code|

|!number|
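
|^ The regular expressions in these examples can be checked outside the server
with a short Python sketch.  It assumes that removing WASD's leading "^" marker
leaves an expression that Python's "re" module treats the same way for these
simple cases, and that matching is an unanchored, case-insensitive search.  The
helper name "wasd_regex" and the sample user-agent and path strings are
hypothetical.

```python
import re


def wasd_regex(pattern: str) -> re.Pattern:
    """Strip the leading '^' regular expression marker and compile the rest."""
    if not pattern.startswith("^"):
        raise ValueError("pattern is not marked as a regular expression")
    return re.compile(pattern[1:], re.IGNORECASE)


# Example 1: the second '^' anchors "Mozilla" to the start of the string.
user_agent = wasd_regex("^^Mozilla.*Gecko")
anchored_hit = user_agent.search("Mozilla/5.0 (X11) Gecko/20100101")
anchored_miss = user_agent.search("Other (like Mozilla Gecko)")  # None

# Example 2: the grouping parentheses supply the substitution elements.
path = wasd_regex("^/(.+)/-/(.+)")
elements = path.search("/dir/-/file.txt").groups()  # ("dir", "file.txt")

# Example 3: any character outside the permitted set makes the path match.
forbidden = wasd_regex("^[^-_./a-z0-9]+")
clean = forbidden.search("/docs/read-me_1.html")  # None, nothing forbidden
dirty = forbidden.search("/docs/read me.html")    # matches the space
```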

|2Expression Substitution|

|^ Expression substitution is available during path mapping (|link|Request
Processing Configuration||).  Both wildcard patterns (implicitly) and regular
expressions (using |/grouping| operators) note the offsets of matched portions
of the strings.  These are then used for wildcard and |/specified| wildcard
substitution where result strings provide for this (e.g. mapping 'pass' and
'redirect' rules).  A maximum of nine such wildcard substitutions are supported
(one other, the zeroth, is the full match).

|0Wildcard Substitution||

|^ With wildcard matching each asterisk wildcard contained in the pattern
(|/template| string) has matching characters in the |/target| string noted and
stored.  Note that for the percentage (single character) wildcard no such
storage is provided.  These characters are available for substitution using
corresponding wildcards present in the |/result| string.  For instance, the
target string

|code|
this is an example target string
|!code|

would be matched by the pattern string

|code|
* is an example target *
|!code|

as containing two matching wildcard strings

|code|
this
string
|!code|

which could be substituted using the result string

|code|
* is an example result * 
|!code|

producing the resultant string

|code|
this is an example result string
|!code|
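
|^ The same steps can be sketched in Python.  Here each asterisk is translated
to a lazy capture group of Python's "re" module as a stand-in for the server's
own wildcard routine, so only the storing and substituting of matched strings
is illustrated, not the matching algorithm.  The function names are
hypothetical and the '**' form is not handled.

```python
import re
from typing import List, Optional


def wildcard_groups(pattern: str, target: str) -> Optional[List[str]]:
    """Return the strings matched by each '*' in pattern, or None."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("(.*?)")  # stored for later substitution
        elif ch == "%":
            parts.append(".")      # single character, not stored
        else:
            parts.append(re.escape(ch))
    found = re.fullmatch("".join(parts), target, re.DOTALL)
    return list(found.groups()) if found else None


def substitute(result: str, groups: List[str]) -> str:
    """Replace each '*' in result with the next stored string, in order."""
    remaining = iter(groups)
    return "".join(next(remaining, "") if ch == "*" else ch for ch in result)


stored = wildcard_groups("* is an example target *",
                         "this is an example target string")
resultant = substitute("* is an example result *", stored)
```

Here "stored" holds the two strings "this" and "string", and "resultant" is
"this is an example result string".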

|0Regular Expression Substitution|

|^ With regular expression matching the groups of matching characters must be
explicitly specified using the |/grouping| parenthesis operator.  Hence with
regular expression matching it is possible to match many characters from the
target string without retaining them for later substitution.  Only if that
match is designated as a substitution source do the matching characters become
available for substitution via any result string.  As an example, the two
target strings
|
|code|
this is an example target string
this is a contrived target string
|!code|
|
would both be matched by the regular expression
|
|code|
^^([a-z]*) is [a-z ]* target ([a-z]*)$
|!code|
|
which contains three sub-expressions, though only two have the grouping
parentheses and so make their matching strings available for substitution
|
|code|
this
string
|!code|
|
which could be substituted using the result string
|
|code|
* is the final result * 
|!code|
|
producing the resultant string
|
|code|
this is the final result string
|!code|
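
|^ This example can be reproduced with Python's "re" module, again only as an
approximation of the server's behaviour.  The expression is shown with WASD's
leading "^" marker removed, and the function name "map_target" is
hypothetical.

```python
import re
from typing import Optional

# The expression from the example, without the leading WASD '^' marker.
EXPRESSION = re.compile(r"^([a-z]*) is [a-z ]* target ([a-z]*)$", re.IGNORECASE)


def map_target(target: str, result: str) -> Optional[str]:
    """Substitute the grouped matches into result, or return None."""
    found = EXPRESSION.search(target)
    if found is None:
        return None
    # Only the two parenthesised groups are retained; the ungrouped middle
    # section matches but is not available for substitution.
    remaining = iter(found.groups())
    return "".join(next(remaining, "") if ch == "*" else ch for ch in result)


first = map_target("this is an example target string", "* is the final result *")
second = map_target("this is a contrived target string", "* is the final result *")
```

Both "first" and "second" are "this is the final result string".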

|0Specified Substitution||

|^ By default the strings matched by wildcard or grouping operators are
substituted in the same order in which they are matched.  This order may be
changed by specifying which wildcard string should be substituted where.  Not
all matched (and stored) strings need to be substituted.  Some may be omitted
and the contents effectively ignored.

|^ The specified substitution syntax is a result wildcard followed by a
single-apostrophe (') and a single digit from zero to nine (0|...|9).  The
zeroth element is the full matching string.  Element one is the first matching
part of the expression, on through to the last.  Specifying an element that had
no matching string substitutes an empty string (i.e. nothing is added).  Using
the same target string as in the wildcard substitution example
|
|code|
this is an example target string
|!code|
|
and matched by the wildcard pattern string
|
|code|
* is an example target *
|!code|
|
when substituted by the result string
|
|code|
*'2 is an example result
|!code|
|
would produce the resultant string
|
|code|
string is an example result
|!code|
|
with the string represented by the first wildcard effectively being discarded.
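
|^ A Python sketch of the specified substitution syntax follows.  It
illustrates the rules stated above and is not the server's code.  The function
name and the layout of the element list are hypothetical, and it is assumed
that an unnumbered result wildcard takes the next element in order regardless
of any numbered ones.

```python
import re
from typing import List

# A result wildcard, optionally followed by an apostrophe and a single digit.
RESULT_WILDCARD = re.compile(r"\*(?:'(\d))?")


def substitute_specified(result: str, elements: List[str]) -> str:
    """elements[0] is the full match; elements[1:] are the stored strings."""
    pieces = []
    position = 0
    next_in_order = 1
    for found in RESULT_WILDCARD.finditer(result):
        pieces.append(result[position:found.start()])
        if found.group(1) is None:
            index = next_in_order  # plain '*': next stored string in order
            next_in_order += 1
        else:
            index = int(found.group(1))  # "*'N": the specified element
        # An element with no matching string substitutes an empty string.
        pieces.append(elements[index] if index < len(elements) else "")
        position = found.end()
    pieces.append(result[position:])
    return "".join(pieces)


ELEMENTS = ["this is an example target string", "this", "string"]
resultant = substitute_specified("*'2 is an example result", ELEMENTS)
```

Here "resultant" is "string is an example result", and a result string of
"*'0" would yield the full matching string.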