|1Document Searching|

|^ The |/query| and |/extract| scripts provide real-time searching of
plain-text and HTML documents, and document retrieval.  The search is a
simple-string search, not a GREP-style search.  It is designed to provide a
useful mechanism for locating documents containing a keyword, not for document
analysis.  For plain-text documents it also allows the selective extraction of
only the portion near the |/hit||. 

|^ Only files with a plain-text or HTML MIME data type (see |link|Document
Access and Specification||) will be searched.  Other files may be specified, or
be selected by a wildcard file specification, but their contents will not
actually be searched. 

|^ Directory specifications may include a wildcard ellipsis (allowing a
directory tree to be traversed) and/or file name wildcards.  In other words,
anything acceptable as VMS file system syntax (though expressed in URL format,
of course).  See examples in |link|Standard Search Form||.

|2Plain-Text Search|

|^ A search of a plain-text file is straightforward.  Each line in the file 
is searched for the required string.  The first occurrence on a line is 
considered a |/hit||.  The line is not searched for any further 
occurrences. 
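
|^ The line-by-line behaviour described above can be sketched in a few lines of
Python.  This is only an illustration, not the query script's actual code; the
function name |/find_hits| and its return format are assumptions made for the
example.

```python
def find_hits(lines, needle, case_sensitive=False):
    """Return (line_number, column) for the first occurrence on each matching line.

    Hypothetical helper: a plain substring search, one hit per line at most.
    """
    if not needle:
        return []  # an empty search string matches nothing in this sketch
    if not case_sensitive:
        needle = needle.lower()
    hits = []
    for number, line in enumerate(lines, start=1):
        haystack = line if case_sensitive else line.lower()
        column = haystack.find(needle)  # first occurrence only
        if column != -1:
            hits.append((number, column))  # later occurrences are not examined
    return hits
```

|^ A line containing the string twice still contributes a single hit, located
at its first occurrence.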

|^ Searches of plain-text files allow the subsequent selection of partial 
documents (i.e. the retrieval of only a number of lines around any actual 
hit).  This allows the user to selectively extract a portion of a document, 
avoiding the need to explicitly scan through to the section of interest. 
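
|^ Partial retrieval amounts to slicing a window of lines around the hit.  The
sketch below assumes a hypothetical |/context| parameter giving the number of
lines wanted on each side; how the extract script actually chooses the range is
not described here.

```python
def extract_around(lines, hit_line, context=5):
    """Return (first_line_number, excerpt) for the lines near a 1-based hit line.

    `context` is an assumed parameter: lines kept before and after the hit.
    """
    start = max(hit_line - context, 1)         # clamp at the top of the file
    end = min(hit_line + context, len(lines))  # clamp at the end of the file
    return start, lines[start - 1:end]
```

|^ Returning the first line number along with the excerpt lets a caller label
the extract with its position in the original document.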

|2HTML Search|

|^ A search of an HTML file is a little more complex.  As might be expected, 
only text presented in the document is searched; markup text is ignored.  
That is, all text not part of an HTML |/tag| construct is extracted and
searched.  For example, out of the following HTML fragment 

|code|
<!-- an example HTML document -->
<p>
The document entitled <a target="_blank" href="example.html">"Example Document"</a>
provides only an <i>overview</i> of the full capabilities of HTML.
|!code|
only the following text would actually be searched

|code|
The document entitled "Example Document" provides only an overview
of the full capabilities of HTML.
|!code|
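
|^ The same separation of text from markup can be approximated with Python's
standard |/html.parser| module.  This is a minimal sketch, not the method the
query script uses; the class and function names are invented for the example,
and it makes no attempt to skip the contents of script or style elements.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data, ignoring tags and comments."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called only for text outside tag constructs; comments go elsewhere.
        self.chunks.append(data)


def searchable_text(html):
    """Return the document text with markup removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())
```

|^ Applied to the fragment above, |/searchable_text| yields the single line of
text shown in the second block, with the line break replaced by a space.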

|^ The mechanism for partial document retrieval available with plain-text 
files is |*not| present with HTML documents.  HTML files generally must be
treated as a whole, with the formatting of current sections often very
dependent on the formatting of previous sections.  This makes extracting a
subsection perilous without extensive syntactical analysis.  On the positive
side, HTML documents tend to be already divided into meaningful subdocuments
(files), making retrieval of a hit naturally more-or-less within context.

|^ Instead of partial document retrieval, the document is processed to place
anchors for each hit, making it possible to jump directly to a particular
section of interest.  Generally this works well but may occasionally distort
the presentation of a document.

|2Search Syntax|

|^ A search may be initiated in three basic ways:

|number|

|item| Appending a question-mark and search string to a file specification (the 
simple syntax of "ISINDEX"-style searching).  This is standard HTTP, and of
course must conform to HTTP syntax.

|item| Providing the name of the query script followed by the directory path to
be searched.  The script then returns a standard search form.

|item| |/Forms||-based search, which allows the format and mechanism of 
the search to be controlled.

|!number|

|note|
|0.<isindex> tag obsolete (as of HTML4)|

|^ Placing the HTML tag "<isindex>" within a  document's text is sufficient to
inform the browser that searching is  available for that document.  The browser
will inform the user of this and  allow a search of that document to be
initiated at any time.  Note that it is  limited to the one document. 

|^ Using the keyword search syntax explicitly is another method of initiating 
a search, and additionally can use a wildcard in the document specification.  
For example:

|code|
/wasd_root/doc/env/*.*?formatted
|!code|

|^ The following link provides an online demonstration search using the above
syntax.  Note the difference in the way plain-text file hits are presented
compared with those of HTML files. 

|^+ |link%=|/wasd_root/wasdoc/env/*.*?formatted|
|!note|

|3Standard Search Form|

|^ Using the "QUERY" script name followed by a URL-format path
specifying the directory to be searched returns a standard, script-generated
search form.

|^ The following link provides an online demonstration of the standard search
form.

|^+ |link%=|/cgi-bin/query/wasd_root/wasdoc/env/|

|^ As with all search specifications, the directory specification may include
a wildcard ellipsis (allowing a directory tree to be traversed) and/or file name
wildcards.  In other words, anything acceptable as VMS file system syntax
(though expressed in URL format, of course).  See the following examples.

|table|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/env/*.html|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/.../|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/.../*.html|
|!table|

|3Forms-Based Search|

|^ A "forms-based" search is initiated by the server receiving a file 
specification, which of course may contain wildcards, followed by a |/search|
parameter.  This is a typical HTML |/forms| format  URL.  For example: 

|code|
*.txt?search=SIMPLE
/web/.../*.*?search=THIS
sub_directory/*.*?search=THAT
../sibling_directory/*.HTML?search=OTHER
|!code|

|^ The following link provides an online demonstration search using the
forms-based syntax. 

|^+ |link%=|/wasd_root/wasdoc/env/*.*?search=formatted|
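
|^ A URL of this shape can be assembled programmatically.  The sketch below
uses Python's standard |/urllib.parse| module; the function name is
hypothetical, and it assumes the server accepts ordinary form encoding (a space
in the search string becomes "+"), which is what a browser form submission
would send.

```python
from urllib.parse import quote, urlencode


def search_url(file_spec, search):
    """Build '<file_spec>?search=<string>' for a forms-based search.

    Hypothetical helper: the "/" separators and "*" wildcards in the file
    specification are kept as-is, and the search string is form-encoded.
    """
    path = quote(file_spec, safe="/*")
    return path + "?" + urlencode({"search": search})
```

|^ For example, |/search_url("/wasd_root/wasdoc/env/*.*", "formatted")| produces
the path used by the demonstration link above.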

|3Search Options|

|^ Additional URI components may be appended after the initial "search="
parameter.  These are appended with intervening "&" characters. 

|bullet|

|item| |*Case-Sensitivity |-|| An optional URI component of 
"case=yes" or "case=no" makes the search case-sensitive or 
case-insensitive (the default).  The following example illustrates the use of 
this syntax: 

|table|
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=Protocol&case=yes|
   |. case-sensitive search for "Protocol"
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=PrOtOcOl&case=no|
   |. case-|*in||sensitive search for "PrOtOcOl"
|!table|

|item| |*Hits |-|| An optional URI component of "hits=document" or "hits=line"
makes the search results be presented by document (file) or line by line
(the default).  The following example illustrates the use of this syntax: 

/web/html/.../*.html?search=protocol&hits=document
/web/html/.../*.html?search=protocol&hits=line

|table|
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=protocol&hits=document|
   |. search result granularity by document
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=protocol&hits=line|
   |. search result granularity by line (the default)
|!table|

|!bullet|
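
|^ The defaults described in this list can be made explicit when interpreting a
query string.  The following Python sketch is an illustration only; the function
name and the keys of the returned dictionary are assumptions, and the real
script may treat unrecognised values differently.

```python
from urllib.parse import parse_qs


def parse_search_options(query):
    """Interpret 'search=...&case=...&hits=...' using the documented defaults.

    Hypothetical helper: case defaults to "no", hits defaults to "line".
    """
    fields = parse_qs(query)
    search = fields.get("search", [""])[0]
    case = fields.get("case", ["no"])[0].lower()
    hits = fields.get("hits", ["line"])[0].lower()
    return {
        "search": search,
        "case_sensitive": case == "yes",
        "by_document": hits == "document",
    }
```

|^ Omitting both optional components therefore gives a case-insensitive search
with results presented line by line.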

|3Example Search Form|

|^ To allow the client to enter a search string and submit a search to the 
server, an HTML level 2 |/form| construct can be used.  Here is an example: 

|code|
<form action="/web/html/.../*.html">
Search HTML documents for:&nbsp; 
<input type="text" name="search">
<input type="submit" value="execute">
</form>
|!code|

|^ The following provides an online demonstration of the form used above:

|asis+|
<p><form action="/wasd_root/wasdoc/env/*.html">
Search HTML documents for:&nbsp; 
<input type="text" name="search">
<input type="submit" value="execute">
</form>
||||

|0Bells and Whistles|

|^ A form providing all the options referred to in  |link|Search Options| is
shown below (some additional white-space  introduced for clarity): 

|code|
<form action="/web/html/.../*.html">

Search HTML documents for: 
<input type="text" name="search">
<input type="submit" value="execute">

<br><a target="_blank" href="/query/-/aboutquery.html">About</a> this search.

<br>Output By:
line <input type="radio" name="hits" value="line" checked>
document <input type="radio" name="hits" value="document">

<br>Case sensitive:
no <input type="radio" name="case" value="no" checked>
yes <input type="radio" name="case" value="yes">

</form>
|!code|

|^ The following provides an online demonstration of the form used above:

|asis+|
<p> <form action="/web/html/.../*.html">
Search HTML documents for: 
<input type="text" name="search">
<input type="submit" value="execute">
<br><a class="link" target="_blank" href="/query/-/aboutquery.html">About</a> this
search.
<br>Output By:
line <input type="radio" name="hits" value="line" checked>
document <input type="radio" name="hits" value="document">
<br>Case sensitive:
no <input type="radio" name="case" value="no" checked>
yes <input type="radio" name="case" value="yes">
</form>
||||