|1Document Searching| |^ The |/query| and |/extract| scripts provide real-time searching of plain-text and HTML documents, and document retrieval. The search is a simple-string search, not a GREP-style search. It is designed to provide a useful mechanism for locating documents containing a keyword, not for document analysis. For plain-text documents it has the useful feature of allowing selective extraction of only the portion of a document near a |/hit||. |^ Only files with a plain-text or HTML MIME data type (see |link|Document Access and Specification||) will be searched. Other files may be specified explicitly, or be selected by a wildcard file specification, but their contents will not actually be searched. |^ Directory specifications may include a wildcard ellipsis (allowing a directory tree to be traversed) and/or file name wildcards. In other words, anything acceptable as VMS file system syntax may be used (expressed in URL format, of course). See examples in |link|Standard Search Form||. |2Plain-Text Search| |^ A search of a plain-text file is straightforward. Each line in the file is searched for the required string. The first occurrence of the string in a line is considered a |/hit||, and the remainder of that line is not searched for any further occurrences. |^ Searches of plain-text files allow the subsequent selection of partial documents (i.e. the retrieval of only a number of lines around any actual hit). This allows the user to selectively extract a portion of a document, avoiding the need to explicitly scan through to the section of interest. |2HTML Search| |^ A search of an HTML file is a little more complex. As might be expected, only the text presented by the document is searched; markup is ignored. That is, all text not part of an HTML |/tag| construct is extracted and searched. For example, out of the following HTML fragment |code|

The document entitled "Example Document" provides only an overview of the full capabilities of HTML. |!code| only the following text would actually be searched |code| The document entitled "Example Document" provides only an overview of the full capabilities of HTML. |!code| |^ The mechanism for partial document retrieval available with plain-text files is |*not| present with HTML documents. HTML files generally must be treated as a whole, as the formatting of later sections often depends heavily on markup in earlier sections. This makes extracting a subsection perilous without extensive syntactic analysis. On the positive side, HTML documents tend to be already divided into meaningful subdocuments (files), so a retrieved hit is naturally more-or-less within context. |^ Instead of partial document retrieval, the document is processed to place an anchor at each hit, making it possible to jump directly to a particular section of interest. Generally this works well, but it may occasionally distort the presentation of a document. |2Search Syntax| |^ A search may be initiated in three basic ways: |number| |item| Appending a question-mark and search string to a file specification (the simple syntax of "ISINDEX"-style searching). This is standard HTTP, and of course must conform to HTTP syntax. |item| Providing the name of the query script followed by the directory path to be searched. The script then returns a standard search form. |item| |/Forms||-based search, which allows the format and mechanism of the search to be controlled. |!number| |note| |0ISINDEX tag obsolete (as of HTML4)| |^ Placing the HTML ISINDEX tag within a document is sufficient to inform the browser that searching is available for that document. The browser will inform the user of this and allow a search of that document to be initiated at any time. Note that such a search is limited to that one document. 
|^ Explicitly using the keyword search syntax is another method of initiating a search, and additionally allows a wildcard in the document specification. For example: |code|
/wasd_root/doc/env/*.*?formatted
|!code| |^ The following link provides an online demonstration search using the above syntax. Note the difference in the way hits in plain-text files are presented compared with those in HTML files. |^+ |link%=|/wasd_root/wasdoc/env/*.*?formatted| |!note| |3Standard Search Form| |^ Using the "QUERY" script name followed by a URL-format path specifying the directory to be searched returns a standard, script-generated search form. |^ The following link provides an online demonstration of the standard search form. |^+ |link%=|/cgi-bin/query/wasd_root/wasdoc/env/| |^ As with all search specifications, the directory specification may include a wildcard ellipsis (allowing a directory tree to be traversed) and/or file name wildcards. In other words, anything acceptable as VMS file system syntax may be used (expressed in URL format, of course). See the following examples. |table| |~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/env/*.html| |~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/.../| |~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/.../*.html| |!table| |3Forms-Based Search| |^ A "forms-based" search is initiated by the server receiving a file specification, which may contain wildcards, followed by a |/search| parameter. This is a typical HTML |/forms| format URL. For example: |code|
*.txt?search=SIMPLE
/web/.../*.*?search=THIS
sub_directory/*.*?search=THAT
../sibling_directory/*.HTML?search=OTHER
|!code| |^ The following link provides an online demonstration search using the forms-based syntax. |^+ |link%=|/wasd_root/wasdoc/env/*.*?search=formatted| |3Search Options| |^ Additional URI components may be appended after the initial "search=" parameter, separated by intervening "&" characters.
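|^ The effect of the two options described below on a plain-text search can be sketched in Python. This is a minimal illustration of the rules stated in this document (simple-string matching, at most one hit per line, case-insensitive by default), not the query script's own code; the function search_file and its output format, in particular the per-document hit count, are hypothetical.

```python
def search_file(name, lines, search, case="no", hits="line"):
    """Hypothetical sketch of a simple-string (non-GREP) search of one
    plain-text file, honouring "case" and "hits" style options.

    Each line is tested for the string; the first occurrence in a line
    makes it a hit and the rest of that line is not examined further.
    """
    sensitive = case == "yes"            # case-insensitive unless case=yes
    needle = search if sensitive else search.lower()
    matched = []
    for number, line in enumerate(lines, start=1):
        text = line if sensitive else line.lower()
        if needle in text:               # plain substring test, at most one hit per line
            matched.append((number, line))
    if hits == "document":
        # assumed presentation: one entry per file, with its hit count
        return [f"{name}: {len(matched)} hit(s)"] if matched else []
    # default: one entry per matching line
    return [f"{name}:{number}: {line}" for number, line in matched]


sample = ["The Protocol layer", "protocol, protocol", "nothing here"]
print(search_file("notes.txt", sample, "protocol"))
# → ['notes.txt:1: The Protocol layer', 'notes.txt:2: protocol, protocol']
print(search_file("notes.txt", sample, "Protocol", case="yes"))
# → ['notes.txt:1: The Protocol layer']
print(search_file("notes.txt", sample, "protocol", hits="document"))
# → ['notes.txt: 2 hit(s)']
```

Note that the second line counts as a single hit even though the string occurs twice in it, matching the plain-text rule that a line is not searched further once a hit is found.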
|bullet| |item| |*Case-Sensitivity |-|| An optional URI component of "case=yes" or "case=no" makes the search case-sensitive or case-insensitive (the default). The following example illustrates the use of this syntax: |table| |~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=Protocol&case=yes| |. case-sensitive search for "Protocol" |~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=PrOtOcOl&case=no| |. case-|*in||sensitive search for "PrOtOcOl" |!table| |item| |*Hits |-|| An optional URI component of "hits=document" or "hits=line" presents the search results by document (file) or line-by-line (the default). The following examples illustrate the use of this syntax: |code|
/web/html/.../*.html?search=protocol&hits=document
/web/html/.../*.html?search=protocol&hits=line
|!code| |table| |~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=protocol&hits=document| |. search result granularity by document |~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=protocol&hits=line| |. search result granularity by line (the default) |!table| |!bullet| |3Example Search Form| |^ An HTML level 2 |/form| construct can be used to allow the client to enter a search string and submit a search to the server. Here is an example: |code|

Search HTML documents for: 
|!code| |^ The following provides an online demonstration of the form used above: |asis+|

Search HTML documents for: 
|||| |0Bells and Whistles| |^ A form providing all the options referred to in |link|Search Options| is shown below (some additional whitespace introduced for clarity): |code|
Search HTML documents for:
About this search.
Output By: line document
Case sensitive: no yes
|!code| |^ The following provides an online demonstration of the form used above: |asis+|

Search HTML documents for:
About this search.
Output By: line document
Case sensitive: no yes
||||
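|^ When a form such as the one above is submitted with the default GET method, the browser encodes its named fields (presumably corresponding to the search, hits and case components described in |link|Search Options||) into a query string appended to the form's action URL. A client program could build the same request directly; the following Python sketch does so with the standard urllib.parse.urlencode function. The helper name search_url and the example path are illustrative assumptions, not part of WASD.

```python
from urllib.parse import urlencode


def search_url(path, search, hits="line", case="no"):
    """Hypothetical helper: build the URL that a GET submission of the
    search form would request, using the search/hits/case components."""
    if hits not in ("line", "document"):
        raise ValueError("hits must be 'line' or 'document'")
    if case not in ("yes", "no"):
        raise ValueError("case must be 'yes' or 'no'")
    # "search" comes first; urlencode() joins the components with "&"
    # and percent-encodes values (spaces become "+", as a browser does)
    query = urlencode({"search": search, "hits": hits, "case": case})
    return f"{path}?{query}"


print(search_url("/wasd_root/wasdoc/env/*.html", "protocol", hits="document"))
# → /wasd_root/wasdoc/env/*.html?search=protocol&hits=document&case=no
```

Validating hits and case before encoding keeps the sketch to the values documented for the query script; anything else is rejected rather than passed through.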