|1Document Searching|

|^ The |/query| and |/extract| scripts provide real-time searching of plain-text and HTML documents, and document retrieval. The search is a simple string search, not a GREP-style search. It is designed to provide a useful mechanism for locating documents containing a keyword, not for document analysis. For plain-text documents it has the useful feature of allowing the selective extraction of only the portion near the |/hit||.

|^ Only files with a plain-text or HTML MIME data type (see |link|Document Access and Specification||) will be searched. Others may be specified, or be selected by a wildcard file specification, but their contents will not actually be searched.

|^ Directory specifications may include a wildcard ellipsis (allowing a directory tree to be traversed) and/or file name wildcards. In other words, anything acceptable as VMS file system syntax (except in URL-format of course). See examples in |link|Standard Search Form||.

|2Plain-Text Search|

|^ A search of a plain-text file is straightforward. Each line in the file is searched for the required string. The first occurrence in a line is considered a |/hit||. The line is not searched for any further occurrences.

|^ Searches of plain-text files allow the subsequent selection of partial documents (i.e. the retrieval of only a number of lines around any actual hit). This allows the user to selectively extract a portion of a document, avoiding the need to explicitly scan through to the section of interest.

|2HTML Search|

|^ A search of an HTML file is a little more complex. As might be expected, only the text presented in the document is searched; markup is ignored. That is, all text not part of an HTML |/tag| construct is extracted and searched.
For example, out of the following HTML fragment

|code|
<!-- an example HTML document -->
<p> The document entitled <a target="_blank" href="example.html">"Example Document"</a>
provides only an <i>overview</i> of the full capabilities of HTML.
|!code|

only the following text would actually be searched

|code|
The document entitled "Example Document"
provides only an overview of the full capabilities of HTML.
|!code|

|^ The mechanism for partial document retrieval available with plain-text files is |*not| present with HTML documents. HTML files generally must be treated as a whole, with the formatting of a given section often very dependent on the formatting of previous sections. This makes extracting a subsection perilous without extensive syntactical analysis. On the positive side, HTML documents tend to be already divided into meaningful subdocuments (files), making retrieval of a hit naturally more-or-less within context.

|^ Instead of partial document retrieval, the document is processed to place anchors for each hit, making it possible to jump directly to a particular section of interest. Generally this works well but may occasionally distort the presentation of a document.

|2Search Syntax|

|^ A search may be initiated in three basic ways:

|number|
|item| Appending a question-mark and search string to a file specification (the simple syntax of "ISINDEX"-style searching). This is standard HTTP, and of course must conform to HTTP syntax.
|item| Providing the name of the query script followed by the directory path to be searched. The script then returns a standard search form.
|item| |/Forms||-based search, which allows the format and mechanism of the search to be controlled.
|!number|

|note|
|0.<isindex> tag obsolete (as of HTML4)|

|^ Placing the HTML tag "<isindex>" within a document's text is sufficient to inform the browser that searching is available for that document.
The browser will inform the user of this and allow a search of that document to be initiated at any time. Note that it is limited to the one document.

|^ Using the keyword search syntax explicitly is another method of initiating a search, and additionally can use a wildcard in the document specification. For example:

|code|
/wasd_root/doc/env/*.*?formatted
|!code|

|^ The following link provides an online demonstration search using the above syntax. Note the difference in the way plain-text file hits are presented compared with those of HTML files.

|^+ |link%=|/wasd_root/wasdoc/env/*.*?formatted|
|!note|

|3Standard Search Form|

|^ Using the "QUERY" script name followed by a URL-format path specifying the directory to be searched returns a standard, script-generated search form.

|^ The following link provides an online demonstration of the standard search form.

|^+ |link%=|/cgi-bin/query/wasd_root/wasdoc/env/|

|^ As with all search specifications, the directory specification may include a wildcard ellipsis (allowing a directory tree to be traversed) and/or file name wildcards. In other words, anything acceptable as VMS file system syntax (except in URL-format of course). See the following examples.

|table|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/env/*.html|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/.../|
|~ |. |link%=|/cgi-bin/query/wasd_root/wasdoc/.../*.html|
|!table|

|3Forms-Based Search|

|^ A "forms-based" search is initiated by the server receiving a file specification, which of course may contain wildcards, followed by a |/search| parameter. This is a typical HTML |/forms| format URL. For example:

|code|
*.txt?search=SIMPLE
/web/.../*.*?search=THIS
sub_directory/*.*?search=THAT
../sibling_directory/*.HTML?search=OTHER
|!code|

|^ The following link provides an online demonstration search using the forms-based syntax.
|^+ |link%=|/wasd_root/wasdoc/env/*.*?search=formatted|

|3Search Options|

|^ Additional URI components may be appended after the initial "search=" parameter. Each is separated from the preceding component by an "&" character.

|bullet|
|item| |*Case-Sensitivity |-|| An optional URI component of "case=yes" or "case=no" makes the search case-sensitive or case-insensitive (the default). The following examples illustrate the use of this syntax:

|table|
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=Protocol&case=yes| |. case-sensitive search for "Protocol"
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=PrOtOcOl&case=no| |. case-|*in||sensitive search for "PrOtOcOl"
|!table|

|item| |*Hits |-|| An optional URI component of "hits=document" or "hits=line" presents the search results by document (file) or line by line (the default). The following examples illustrate the use of this syntax:
/web/html/.../*.html?search=protocol&hits=document
/web/html/.../*.html?search=protocol&hits=line

|table|
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=protocol&hits=document| |. search result granularity by document
|~ |. |link%=|/wasd_root/wasdoc/env/*.html?search=protocol&hits=line| |. search result granularity by line (the default)
|!table|
|!bullet|

|3Example Search Form|

|^ To allow the client to enter a search string and submit a search to the server an HTML level 2 |/form| construct can be used.
Here is an example:

|code|
<form action="/web/html/.../*.html">
Search HTML documents for: <input type="text" name="search">
<input type="submit" value="execute">
</form>
|!code|

|^ The following provides an online demonstration of the form used above:

|asis+|
<p><form action="/wasd_root/wasdoc/env/*.html">
Search HTML documents for: <input type="text" name="search">
<input type="submit" value="execute">
</form>
||||

|0Bells and Whistles|

|^ A form providing all the options referred to in |link|Search Options| is shown below (some additional white-space introduced for clarity):

|code|
<form action="/web/html/.../*.html">
Search HTML documents for: <input type="text" name="search">
<input type="submit" value="execute">
<br><a target="_blank" href="/query/-/aboutquery.html">About</a> this search.
<br>Output By:
line <input type="radio" name="hits" value="line" checked>
document <input type="radio" name="hits" value="document">
<br>Case sensitive:
no <input type="radio" name="case" value="no" checked>
yes <input type="radio" name="case" value="yes">
</form>
|!code|

|^ The following provides an online demonstration of the form used above:

|asis+|
<p>
<form action="/web/html/.../*.html">
Search HTML documents for: <input type="text" name="search">
<input type="submit" value="execute">
<br><a class="link" target="_blank" href="/query/-/aboutquery.html">About</a> this search.
<br>Output By:
line <input type="radio" name="hits" value="line" checked>
document <input type="radio" name="hits" value="document">
<br>Case sensitive:
no <input type="radio" name="case" value="no" checked>
yes <input type="radio" name="case" value="yes">
</form>
||||
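
|^ The query string such a form submits can also be assembled by a client program. The following is only a minimal Python sketch of that idea, not part of the server or its scripts: the function name build_search_url is hypothetical, and it assumes the server accepts standard form encoding (as a browser would submit it) for the "search=", "case=" and "hits=" components described above.

```python
from urllib.parse import quote, urlencode


def build_search_url(path, search, case=None, hits=None):
    """Return a forms-based search URL for `path` (hypothetical helper).

    `path` is a URL-format file specification, which may contain
    wildcards, for example "/web/html/.../*.html".
    """
    if case not in (None, "yes", "no"):
        raise ValueError("case must be 'yes' or 'no'")
    if hits not in (None, "document", "line"):
        raise ValueError("hits must be 'document' or 'line'")

    # "search=" comes first; optional components follow, joined by "&".
    params = [("search", search)]
    if case is not None:
        params.append(("case", case))
    if hits is not None:
        params.append(("hits", hits))

    # Leave "/" and the "*" wildcard unescaped in the path; the search
    # string itself is form-encoded by urlencode (an assumption about
    # what the server expects).
    return quote(path, safe="/*") + "?" + urlencode(params)
```

|^ Called with the path "/wasd_root/wasdoc/env/*.html", the search string "Protocol" and case="yes", this sketch yields the same URL as the case-sensitive example in the Search Options table. Omitting an option simply leaves the server default in effect.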