The information in this chapter merely outlines the WASD implementation details, which are in general very much vanilla CGI and NCSA CGI (Common Gateway Interface) compliant. They were originally based on the INTERNET-DRAFT authored by D.Robinson (email@example.com), 8 January 1996, and confirmed against the final RFC 3875, authored by David Robinson (firstname.lastname@example.org) and Ken A.L.Coar (email@example.com), October 2004.
The standard CGI environment variables are provided to the script via DCL global symbols. Each CGI variable symbol name is prefixed with "WWW_" by default. Although this can be changed using the "/CGI_PREFIX" qualifier and the SET CGIPREFIX mapping rule (see "Features and Facilities"), doing so is not recommended if the WASD VMS scripts are to be used, as they expect CGI variable symbols to be prefixed in this manner.
There are a number of non-"standard" CGI variables to assist in tailoring scripts for the WASD environment. Do not make your scripts dependent on any of these if portability is a goal.
NEVER, EVER SUBSTITUTE the contents of CGI variables directly into the code stream using interpreters that allow this (e.g. DCL, Perl). You run a very real risk of unintended content maliciously changing the intended function of the code. For example, never use apostrophe substitution of a CGI variable at the DCL command line, as in
$ COPY 'WWW_FORM_SRC' 'WWW_FORM_DST'
Always pre-process the content of the variable first, ensuring nothing has been inserted that could subvert the intended purpose (repeated here to emphasize the significance of this rule).
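One way to honour this rule is to reject, rather than attempt to repair, any value containing characters outside a small whitelist. The following is a minimal sketch in C, assuming the variable is meant to hold a simple file name; the function name, the length limit and the whitelist are illustrative only and are not part of WASD or CGILIB.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 only if VALUE is non-empty, no longer
   than MAX_LEN, and made up entirely of letters, digits, "_", "-" and "."
   (isalnum() is ASCII-only in the default "C" locale).  Anything else
   (quotes, spaces, semicolons, brackets ...) is rejected, never "cleaned",
   so it cannot reach DCL or Perl as live syntax. */
static int cgi_value_is_safe(const char *value, size_t max_len)
{
    size_t len;

    if (value == NULL)
        return 0;
    len = strlen(value);
    if (len == 0 || len > max_len)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)value[i];
        if (!isalnum(c) && c != '_' && c != '-' && c != '.')
            return 0;
    }
    return 1;
}
```

Excluding ":", "[", "]", "<", ">" and ";" also prevents the value from naming a different device, directory or file version. A value failing such a check is better answered with an error response than passed on.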
CGI variable capacity now varies significantly with VMS version.
The total size of all CGI variable names and values is determined by the value of the [BufferSizeDclCommand] configuration directive, which sets the total buffer space of the mailbox providing the script's SYS$COMMAND. The default value of 4096 bytes is ample for the typical CGI script request; however, if a request contains very large individual variables or a large number of form fields, etc., it may be possible to exhaust this quantity.
On VMS V7.3-2 and later, CGI variables may contain values in excess of 8000 characters (the full 8192-character symbol capacity cannot be realized due to the way the symbols are created via the CLI). This is a significant increase on earlier capacities. The mailbox buffer [BufferSizeDclCommand] may need to be increased if this capacity is to be fully utilized.
On VMS V7.3-1 and earlier, values may contain approximately 1000 characters minus the size of the variable name. This should still be sufficient for most circumstances (if not, consider using CGIplus or ISAPI, extensions to CGI programming which remove this limitation). Why such an odd number, and why a little rubbery? A DCL command line with these versions is limited to 255 characters, so the symbols for larger variables are built up over successive DCL commands, with the limit determined by CLI behaviour.
On VMS V7.3-2 and later, symbol capacity should never be an issue (well, perhaps only with the most extraordinarily poorly designed script). With VMS V7.3-1 and earlier, when a symbol value is too large the server by default aborts the request, generating and returning a 500 HTTP status. Experience has shown that this occurs very rarely. If it does occur, the server can be instructed to truncate the CGI variable value instead and continue processing. Any CGI variable truncated in this manner has its name placed in the CGI variable SERVER_TRUNCATE, so that a script can check for, and take appropriate action on, any such truncation. To have the server truncate such variables instead of aborting processing, SET the path using the script=symbol=truncate mapping rule. For example
set /cgi-bin/script-name* script=symbol=truncate
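A script running with this rule might check SERVER_TRUNCATE before relying on a particular variable. This is a sketch only: the separator between names in SERVER_TRUNCATE is an assumption here (commas and/or spaces are accepted), as is whether the names carry the "WWW_" prefix, so check the actual value (e.g. with WATCH) before depending on it. The helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Return non-zero if NAME appears as a complete entry in LIST.
   ASSUMPTION: entries are separated by commas and/or spaces. */
static int name_in_list(const char *list, const char *name)
{
    size_t nlen = strlen(name);
    const char *p = list;

    if (list == NULL || nlen == 0)
        return 0;
    while (*p) {
        /* skip any separators before the next entry */
        while (*p == ',' || *p == ' ')
            p++;
        const char *start = p;
        while (*p && *p != ',' && *p != ' ')
            p++;
        /* whole-entry match only, so "X_TEXT" never matches "X_TEXTAREA" */
        if ((size_t)(p - start) == nlen && strncmp(start, name, nlen) == 0)
            return 1;
    }
    return 0;
}

/* Convenience wrapper reading the symbol with the default "WWW_" prefix. */
static int cgi_was_truncated(const char *name)
{
    return name_in_list(getenv("WWW_SERVER_TRUNCATE"), name);
}
```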
Remember, by default all variables are prefixed by "WWW_" (though this may be modified using the set CGIprefix= mapping rule), and not all variables will be present for all requests. These CGI environment variables reflect a combination of HTTP/1.1 and HTTP/1.0 request parameters.
If the request path is set to provide them, there are also variables providing information about the SSL environment of a Secure Sockets Layer transported request.
In line with other CGI implementations, additional, non-compliant variables are provided to ease CGI interfacing. These provide the various components of any query string. A keyword query string and a form query string are parsed into
WWW_KEY_number
WWW_KEY_COUNT
WWW_FORM_form-element-name
Variables named WWW_KEY_number will be generated if the query string contains one or more plus signs ("+") and no equate symbols ("=").
Variables named WWW_FORM_form-element-name will be generated if the query string contains one or more equate symbols. Generally such a query string encodes a form-URL-encoded (MIME type application/x-www-form-urlencoded) request. By default the server reports an incorrect encoding with a 400 error response. However, some scripts use malformed encodings, and so this behaviour may be suppressed using the set script=query=relaxed mapping rule.
set /cgi-bin/script-name* script=query=relaxed
To suppress this decoding completely (and save a few CPU cycles) use the following rule.
set /cgi-bin/script-name* script=query=none
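To illustrate the two parsing rules described above, and what a malformed encoding looks like, the following C sketch classifies a query string and strictly decodes a single form-URL-encoded name or value. It is not WASD's own parser; the function names are hypothetical, and treating a "%" not followed by two hexadecimal digits as an error is an assumption about what strict decoding should refuse.

```c
#include <stddef.h>
#include <string.h>

enum query_kind { QUERY_EMPTY, QUERY_KEYWORDS, QUERY_FORM, QUERY_OTHER };

/* Any "=" makes it a form query; otherwise one or more "+" make it a
   keyword (ISINDEX-style) query. */
static enum query_kind classify_query(const char *qs)
{
    if (qs == NULL || *qs == '\0')
        return QUERY_EMPTY;
    if (strchr(qs, '=') != NULL)
        return QUERY_FORM;
    if (strchr(qs, '+') != NULL)
        return QUERY_KEYWORDS;
    return QUERY_OTHER;   /* neither rule, as stated, applies */
}

static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Strictly decode IN_LEN bytes of one form-URL-encoded name or value:
   "+" becomes a space and "%XX" becomes the byte 0xXX.  Returns the
   decoded length (OUT is NUL-terminated), or -1 for a "%" not followed
   by two hex digits, or if OUT is too small. */
static long form_decode(const char *in, size_t in_len,
                        char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0)
        return -1;
    for (size_t i = 0; i < in_len; i++) {
        int c = (unsigned char)in[i];

        if (c == '+') {
            c = ' ';
        } else if (c == '%') {
            int hi, lo;
            if (i + 2 >= in_len)
                return -1;             /* truncated escape */
            hi = hex_value((unsigned char)in[i + 1]);
            lo = hex_value((unsigned char)in[i + 2]);
            if (hi < 0 || lo < 0)
                return -1;             /* not hexadecimal */
            c = hi * 16 + lo;
            i += 2;
        }
        if (o + 1 >= out_size)
            return -1;                 /* leave room for the NUL */
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return (long)o;
}
```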
The UNIQUE_ID variable is a mostly Apache-compliant implementation (the "_" has been substituted for the "@" to allow its use in file names). For each request it generates a globally and temporally unique 19-character string that can be used wherever such an identifier might be needed. This string contains only "A"-"Z", "a"-"z", "0"-"9", "_" and "-" characters, is generated using a combination of time-stamp, host IP address, server system process identifier and counter, and is "guaranteed" to be unique in (Internet) space and time.
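Before embedding UNIQUE_ID in a file name, a cautious script might confirm the value has the documented shape. A minimal C sketch (the function name is hypothetical):

```c
#include <string.h>

/* Return 1 if ID has the shape documented for UNIQUE_ID: exactly 19
   characters, each one of A-Z, a-z, 0-9, "_" or "-". */
static int unique_id_is_well_formed(const char *id)
{
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789_-";

    return id != NULL && strlen(id) == 19 && strspn(id, allowed) == 19;
}
```

A value failing this check suggests the variable is absent or not supplied by WASD, and is better treated as an error than used.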
WASD v7.0 had its CGI environment tailored slightly to ease portability between VMS Apache (Compaq Secure Web Server) and WASD. This included the provision of an APACHE$INPUT: stream and several Apache-specific CGI variables (see the table below). The CGILIB C function library (1.12 - Scripting Function Library) has also been made CSWS V1.0-1 and later (Apache 1.3.12 and higher) compliant.
The basic CGI symbol names are demonstrated here with a call to a script that simply executes the following DCL code:
$ SHOW SYMBOL WWW_*
$ SHOW SYMBOL *
Note how the request components are represented for "ISINDEX"-style searching (third item) and a forms-based query (fourth item).
This information applies to all non-DECnet based scripting, CGI, CGIplus, RTE, ISAPI. WASD uses mailboxes for script inter-process communication (IPC). These are efficient, versatile and allow direct output from all VMS environments and utilities. Like many VMS record-oriented devices however there are some things to consider when using them (also see IPC Tickler).
The mailboxes are created record, not stream oriented. This means records output by standard VMS means (e.g. DCL, utilities, programming languages) are discretely identified and may be processed appropriately by the server as text or binary depending on the content-type.
Being record oriented there is a maximum record size (MRS) that can be output. Records larger than this result in SYSTEM-F-MBTOOSML errors. The WASD default is 4096 bytes. This may be changed using the [BufferSizeDclOutput] configuration directive. This allocation consumes process BYTLM with each mailbox created so the account must be dimensioned sufficiently to supply demands for this quota. The maximum possible size for this is a VMS-limit of 60,000 bytes.
When created, the mailbox has its buffer space set. With WASD IPC mailboxes this is the same as the MRS. The total data buffered may not exceed this without the script entering a wait state (for the mailbox contents to be cleared by the server). As mailboxes use a little of the buffer space to delimit the records stored in it, the amount of data is actually somewhat less than the total buffer space.
To allow a script to determine the maximum record size and total capacity of the mailbox buffer between server and script, WASD provides the CGI environment variable GATEWAY_MRS, containing an integer with this value.
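A script might obtain this value as follows, for example to size its output records. This C sketch assumes the default "WWW_" prefix and falls back to the documented 4096-byte [BufferSizeDclOutput] default when the variable is missing or unparseable; the helper names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_MRS 4096L  /* documented [BufferSizeDclOutput] default */

/* Parse a GATEWAY_MRS value, falling back to the default when it is
   absent or not a positive decimal integer. */
static long mrs_from_value(const char *value)
{
    char *end;
    long mrs;

    if (value == NULL || *value == '\0')
        return DEFAULT_MRS;
    errno = 0;
    mrs = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || mrs <= 0)
        return DEFAULT_MRS;
    return mrs;
}

/* Read the variable itself, assuming the default "WWW_" prefix. */
static long gateway_mrs(void)
{
    return mrs_from_value(getenv("WWW_GATEWAY_MRS"));
}
```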
Script response may be CGI or NPH compliant (2.2.2 - Non-Parsed-Header Output). CGI compliance means the script's response must begin with a line containing one of the following fields.
Content-Type:
Location:
Status:
Other HTTP-compliant response fields may follow, with the response header terminated and the response body begun by a single empty line. The following are examples of CGI-compliant responses.
Content-Type: text/html
Content-Length: 35

<HTML>
<B>Hello world!</B>
</HTML>
And using the Status field:
Status: 404 Not Found
Content-Type: text/plain

Huh?
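The same kind of response header can be produced from C. This sketch formats a CGI response header into a buffer, optionally including a Status field and a Script-Control directive (the latter is described later in this section). The function name is hypothetical, and the header lines end in a bare newline, as would any C program's output.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a CGI (not NPH) response header into BUF, ending with the
   empty line that separates header from body.  STATUS may be NULL for
   the implied success; SCRIPT_CONTROL may be NULL or a directive such
   as "X-stream-mode".  Returns the header length, or -1 if it does
   not fit. */
static int cgi_header(char *buf, size_t size, const char *status,
                      const char *content_type, const char *script_control)
{
    int n = snprintf(buf, size, "%s%s%sContent-Type: %s\n%s%s%s\n",
                     status ? "Status: " : "",
                     status ? status : "",
                     status ? "\n" : "",
                     content_type,
                     script_control ? "Script-Control: " : "",
                     script_control ? script_control : "",
                     script_control ? "\n" : "");

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Usage: `cgi_header(buf, sizeof buf, "404 Not Found", "text/plain", NULL)` yields the second example above, ready to be written to stdout before the body.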
Strict CGI output compliance can be enabled and disabled using the [CgiStrictOutput] configuration directive. With it disabled, the server will accept any output from the script; if the output is not CGI or NPH compliant, the server automatically generates a plain-text header. When enabled, if the output does not begin with a CGI or NPH header the server returns a "502 Bad Gateway" error. For debugging scripts that generate this error, introduce a plain-text debug mode and header, or use the WATCH facility's CGI item (see Features and Facilities).
With HTTP/1.1 it is generally better to use CGI than NPH responses. A CGI response allows the server to parse the response header and from that make decisions about connection persistence and content-encoding. These can contribute significantly to reducing response latency and improving content transfer efficiency. It also allows any policy established by server configuration for such characteristics to be employed.
This section describes how WASD deals with some particular output issues (also see IPC Tickler).
If the script response content-type is "text/..." (a text document), WASD assumes that output will be line-oriented and require HTTP carriage-control (each record/line terminated by a line-feed), and ensures each record it receives is correctly terminated before passing it to the client. In this way DCL procedure output (and the VMS environment in general) is supported transparently. Any other content-type is assumed to be binary and no carriage-control is enforced. This default behaviour may be modified as described below.
Carriage-control behaviour for any content-type may be explicitly set using either of two additional response header fields. The term stream describes the server just transferring records, without additional processing, as they were received from the script. This is obviously necessary for binary/raw content such as images, octet-streams, etc. The term record describes the server ensuring each record it receives has correct carriage-control, a trailing newline; if not present one is added. This mode is useful for VMS textual streams (e.g. output from DCL and VMS utilities).
This uses the Apache Group's proposed CGI/1.2 "Script-Control:" field. The WASD extension directives X-record-mode and X-stream-mode set the script output into the respective modes (see Script-Control:).
Examples of using this field:
Script-Control: X-stream-mode
Script-Control: X-record-mode
By default WASD writes each record received from the script to the client as it is received. This can range from a single byte to a complete mailbox buffer full. WASD leaves it up to the script to determine the rate at which output flows back to the client.
While this allows a certain flexibility, it can be inefficient. There will be many instances where a script will be providing just a body of data to the client, and wish to do so as quickly and efficiently as possible. Using the proposed CGI/1.2 "Script-Control:" field with the WASD extension directive X-buffer-records, a script can direct the server to buffer as many script output records as possible before transferring them to the client. The following should be added to the CGI response header.
Script-Control: X-buffer-records
While the above offers some significant improvements to efficiency and perceived throughput, the best approach is for the script to provide records the same size as the mailbox (see 2.2 - Script Output for details on determining this size if required). This can be done explicitly in the script's own code or, if using the C language, simply by changing stdout to a binary stream. In this environment the C-RTL controls output, automatically buffering as much as possible before writing it to the server.
if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL) exit (vaxc$errno);
Also see the section describing NPH C Script.
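For a script that does this explicitly rather than relying on the C-RTL, the following C sketch accumulates small pieces of output and passes them on in chunks as close as possible to a given size, such as the GATEWAY_MRS value. The structure and function names are hypothetical, and whether each fwrite() arrives as exactly one mailbox record depends on the C-RTL stream mode in effect.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output accumulator. */
struct recbuf {
    FILE          *out;     /* e.g. stdout reopened with "ctx=bin" */
    char          *data;    /* caller-supplied buffer of SIZE bytes */
    size_t         size;    /* target record size, e.g. GATEWAY_MRS */
    size_t         used;    /* bytes currently buffered */
    unsigned long  records; /* chunks written so far (for illustration) */
};

/* Write whatever is buffered as one chunk. */
static int recbuf_flush(struct recbuf *rb)
{
    if (rb->used == 0)
        return 0;
    if (fwrite(rb->data, 1, rb->used, rb->out) != rb->used)
        return -1;
    if (fflush(rb->out) != 0)
        return -1;
    rb->records++;
    rb->used = 0;
    return 0;
}

/* Append LEN bytes, emitting a chunk each time the buffer fills. */
static int recbuf_write(struct recbuf *rb, const void *src, size_t len)
{
    const char *p = src;

    while (len > 0) {
        size_t room = rb->size - rb->used;
        size_t n = len < room ? len : room;

        memcpy(rb->data + rb->used, p, n);
        rb->used += n;
        p += n;
        len -= n;
        if (rb->used == rb->size && recbuf_flush(rb) != 0)
            return -1;
    }
    return 0;
}
```

Remember to call recbuf_flush() once more at the end of the response so that any partial final chunk is sent.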
Run-time libraries other than the C Run-Time Library (C-RTL) do not contend with records delimited by embedded characters (the newlines and nulls, etc., of the C environment). They use VMS' and RMS' default record-oriented I/O. The C-RTL, however, needs to reconcile the C environment's bag-o'-bytes paradigm for file content, with its embedded terminators and stream-oriented I/O, against RMS record structures and their unterminated, record-oriented I/O. This often results in a number of issues, particularly with code ported from *x environments.
The C-RTL behaviour can be modified in all sorts of ways, including some file and other I/O characteristics. The features available for such modification increase with each release of the C-RTL and/or C compiler. It is advisable to consult the latest release (or the one appropriate for the local environment) of the Run-Time Library Reference Manual for OpenVMS Systems for the current set.
Behaviours are modified by setting various flags, either from within the program itself using the decc$feature_set() and allied group of functions, or by defining an equivalent logical name, usually externally to and before executing the image. See the C-RTL Reference Manual section Enabling C RTL Features Using Feature Logical Names. The logical name approach is particularly useful if the source is unavailable, or simply as an easier alternative to modifying code.
An example of a useful feature and associated logical name is DECC$STDIO_CTX_EOL which, when enabled, means "writing to stdout and stderr for stream access is deferred until a terminator is seen or the buffer is full", in contrast to the default behaviour of "each fwrite generates a separate write, which for mailbox and record files generates a separate record". For an application performing write()s or fwrite()s with a record-oriented <stdio>, and generating inappropriate record boundaries, the application could be wrapped as follows (a real-world example).
$ set noon
$ define/user/nolog sys$input http$input
$ define/user DECC$STDIO_CTX_EOL ENABLE
$ calcserver
$ exit(1)
The interactions between VMS' record-oriented I/O, various run-time libraries (in particular the C-RTL), the streaming character-oriented Web, and of course WASD, can be quite complex and result in unintended output or formatting. The CGI script Inter-Process Communication (IPC) tickler
WASD_ROOT:[SRC.MISC]IPCTICKLER.C
is designed to allow a script programmer to gain an appreciation of how these elements interact, how WASD attempts to accommodate them, what mechanisms a script can use to explicitly convey exact requirements to WASD ... and finally, how these affect output (in particular the carriage-control) delivered to the client. If installed, use
/cgi-bin/IPCtickler
to obtain an HTML form allowing control of several parameters into the script.
The Apache Group has proposed a CGI/1.2 that includes a Script-Control: CGI response header field. WASD implements the one proposed directive, along with a number of WASD extensions (those beginning with the "X-"). Note that by convention extensions unknown by an agent should be ignored, meaning that they can be freely included, only being meaningful to WASD and not significant to other implementations.
The following is a simple example response where the server is instructed not to delete the script process under any circumstances, and that the body does not require any carriage-control changes.
Content-Type: text/plain
Script-Control: no-abort; X-stream-mode

long, slowww script-output ...
A simple script to provide the system time might be:
$ say = "write sys$output"
$! the next two lines make it CGI-compliant
$ say "Content-Type: text/plain"
$ say ""
$! start of plain-text body
$ show time
A script to provide the system time more elaborately (using HTML):
$ say = "write sys$output"
$! the next two lines make it CGI-compliant
$ say "Content-Type: text/html"
$ say ""
$! start of HTML script output
$ say "<HTML>"
$ say "Hello ''WWW_REMOTE_HOST'" !(CGI variable)
$ say "<P>"
$ say "System time on node ''f$getsyi("nodename")' is:"
$ say "<H1>''f$cvtime()'</H1>"
$ say "</HTML>"
A script does not have to output a CGI-compliant data stream. If it begins with an HTTP header status line, WASD assumes it will supply a raw HTTP data stream containing all the HTTP requirements. This is the same as, or equivalent to, the non-parsed-header or "nph..." scripts of many environments. This is an example of such a script response.
HTTP/1.0 200 Success
Content-Type: text/html
Content-Length: 35

<HTML>
<B>Hello world!</B>
</HTML>
Any such script must observe the HyperText Transfer Protocol, supplying a full response header and body, including correct carriage-control. Once the server detects the HTTP status header line it pays no more attention to any response header fields or body records, just transferring everything directly to the client. This can be very efficient, with the server acting as just a conduit between script and client, but it transfers the responsibility for a correct HTTP response onto the script.
The following example shows a DCL script. Note the full HTTP header and each line explicitly terminated with a carriage-return and line-feed pair.
$ lf[0,8] = %x0a
$ crlf[0,16] = %x0a0d
$ say = "write sys$output"
$! the next line determines that it is raw HTTP stream
$ say "HTTP/1.0 200 Success" + crlf
$ say "Content-Type: text/html" + crlf
$! response header separating blank line
$ say crlf
$! start of HTML script output
$ say "<HTML>" + lf
$ say "Hello ''WWW_REMOTE_HOST'" + lf
$ say "<P>" + lf
$ say "Local time is ''WWW_REQUEST_TIME_LOCAL'" + lf
$ say "</HTML>" + lf
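A C equivalent is sketched below. It builds the complete NPH response in a buffer, terminating the status line and header fields with carriage-return/line-feed pairs, and computes Content-Length from the body so the two cannot disagree. The function name is hypothetical, and "HTTP/1.0 200 Success" simply mirrors the examples above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a complete non-parsed-header response into BUF: status line
   and header fields terminated by CR-LF, a CR-LF blank line, then BODY
   copied verbatim.  Returns the total length, or -1 if BUF is too small. */
static int nph_response(char *buf, size_t size, const char *content_type,
                        const char *body)
{
    size_t body_len = strlen(body);
    int n = snprintf(buf, size,
                     "HTTP/1.0 200 Success\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n"
                     "%s",
                     content_type, (unsigned long)body_len, body);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With the body "<HTML>\n<B>Hello world!</B>\n</HTML>\n" the computed Content-Length is 35, matching the example response above.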
When scripting using the C programming language, there can be considerable efficiencies gained by providing a binary output stream from the script. This results in the C Run-Time Library (C-RTL) buffering output up to the maximum supported by the IPC mailbox. This may be enabled using a code construct similar to the following to reopen stdout in binary mode.
if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL) exit (vaxc$errno);
This is used consistently in WASD scripts. Carriage-control must be supplied as part of the C standard output (no differently from any other C program). Output can be explicitly sent to the client at any stage using the fflush() standard library function. Note that if the fwrite() function is used, the current contents of the C-RTL buffer are automatically flushed along with the content of the fwrite().
fprintf (stdout, "HTTP/1.0 200 Success\r\n\
Content-Type: text/html\r\n\
\r\n\
<HTML>\n\
Hello %s\n\
<P>\n\
System time is %s\n\
</HTML>\n",
getenv("WWW_REMOTE_HOST"),
getenv("WWW_REQUEST_TIME_LOCAL"));
As described above, 2.2 - Script Output, the default script<->server IPC uses a mailbox. While versatile and sufficiently efficient for general use, when megabytes, tens of megabytes, and hundreds of megabytes need to be transferred, using a memory buffer shared between script and server can yield transfer improvements of up to five times.
Of course, your mileage may vary with platform, O/S version and TCP/IP stack (i.e. as the relative bottlenecks shuffle about).
The script requests a memory-buffer using a CGI callout (6 - CGI Callouts). Buffer size is constrained by the usual VMS 32bit memory considerations, along with available process and system resources. The server creates and maps a non-permanent global section. If this is successful the script is advised of the global section name using the callout response. The script uses this to map the section name and can then populate the buffer. When the buffer is full or otherwise ready, the script issues a callout with the number of bytes to write, and then stalls. The complete memory buffer may be written at once or any subsection of that buffer. The write is accomplished asynchronously and may comprise multiple network $QIOs or TLS/SSL blocks. When complete, a callout response to the script is issued and the script can continue processing. Standard script mailbox I/O (SYS$OUTPUT, <stdout>) and memory-buffer I/O may be interleaved as required.
The callouts are as follows:
Create a temporary global section to act as a memory buffer shared between a script process and the server. The default is Megabytes.
Dispose of the shared memory buffer created by callout BUFFER-BEGIN.
Instruct the server to write <integer> bytes from the shared memory buffer to the client.
See working examples in WASD_ROOT:[SRC.MISC].
Actual data comparing standard mailbox IPC with memory-buffer IPC, generated using [SRC.MISC]MEMBUFDEMO.C on an HP rx2660 (1.40GHz/6.0MB) with 4 CPUs and 16383MB running VSI VMS V8.4-2L1 with Multinet UCX$IPC_SHR V55A-B147, OpenSSL 1.0.2k and WASD v11.2.0, with [BufferSizeDclOutput] 16384. In each case 250MB ("?250") is transferred via either a 16.4kB mailbox (the default) or a 16.4kB memory buffer ("+b"). A significantly larger memory buffer may well improve throughput further.
$ wget "-O" nl: http://127.0.0.1/cgi-bin/membufdemo?250
--2017-10-14 03:19:05-- http://127.0.0.1/cgi-bin/membufdemo?250
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 262144000 (250M) [application/octet-stream]
Saving to: 'nl:'
nl: 100%[=====================>] 250.00M 25.6MB/s in 12s
2017-10-14 03:19:17 (20.5 MB/s) - 'nl:' saved [262144000/262144000]

$ wget "-O" nl: http://127.0.0.1/cgi-bin/membufdemo?250+b
--2017-10-14 03:19:23-- http://127.0.0.1/cgi-bin/membufdemo?250+b
Connecting to 127.0.0.1:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 262144000 (250M) [application/octet-stream]
Saving to: 'nl:'
nl: 100%[=====================>] 250.00M 105MB/s in 2.4s
2017-10-14 03:19:26 (105 MB/s) - 'nl:' saved [262144000/262144000]

$ wget "-O" nl: https://127.0.0.1/cgi-bin/membufdemo?250
--2017-10-14 03:19:50-- https://127.0.0.1/cgi-bin/membufdemo?250
Connecting to 127.0.0.1:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 262144000 (250M) [application/octet-stream]
Saving to: 'nl:'
nl: 100%[=====================>] 250.00M 14.5MB/s in 17s
2017-10-14 03:20:07 (14.5 MB/s) - 'nl:' saved [262144000/262144000]

$ wget "-O" nl: https://127.0.0.1/cgi-bin/membufdemo?250+b
--2017-10-14 03:20:12-- https://127.0.0.1/cgi-bin/membufdemo?250+b
HTTP request sent, awaiting response... 200 OK
Length: 262144000 (250M) [application/octet-stream]
Saving to: 'nl:'
nl: 100%[=====================>] 250.00M 16.6MB/s in 15s
2017-10-14 03:20:27 (16.6 MB/s) - 'nl:' saved [262144000/262144000]
It is obvious that the memory buffer provides significantly greater throughput than the mailbox (from the http:// test), and that with TLS/SSL network transport the encryption becomes a significant overhead and choke-point. Nevertheless, there is still an approximate 15% dividend, plus the more efficient script->memory-buffer->server interface. The VMS TLS/SSL implementation may improve with time, especially if TLS/SSL hardware engines become available with the port to x86_64.
The comparison also illustrates that the WASD environment can deliver significant bandwidth through its script->server->network pathways. On the demonstration class of system: ~200Mbps unencrypted and ~120Mbps encrypted using the standard mailbox IPC, and ~850Mbps unencrypted and ~130Mbps encrypted using the memory-buffer IPC.
For POST and PUT HTTP methods (e.g. a POSTed HTML form) the body of the request may be read from the HTTP$INPUT stream. For executable image scripts requiring the body to be present on SYS$INPUT (the C language stdin stream), a user-mode logical may be defined immediately before invoking the image, as in the following example.
$ EGSCRIPT = "$WASD_EXE:EGSCRIPT.EXE"
$ DEFINE /USER SYS$INPUT HTTP$INPUT
$ EGSCRIPT
The HTTP$INPUT stream may be explicitly opened and read. Note that this is a raw stream, and HTTP lines (carriage-return/line-feed terminated sequences of characters) may have been blocked together for network transport. These would need to be explicitly parsed by the program.
if ((HttpInput = fopen ("HTTP$INPUT", "r", "ctx=bin")) == NULL) exit (vaxc$errno);
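The following C sketch shows one way to do this parsing: it extracts the next CR-LF terminated line from data already read, and reports when no complete line is yet available because network blocking has split it across reads. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy the next complete CR-LF terminated line in DATA (without its
   CR-LF) into LINE, truncating it if LINE_SIZE is too small.  Returns
   the number of bytes consumed from DATA, or 0 if no complete line is
   available yet (read more input, append it, and retry). */
static size_t next_crlf_line(const char *data, size_t len,
                             char *line, size_t line_size)
{
    if (line_size == 0)
        return 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (data[i] == '\r' && data[i + 1] == '\n') {
            size_t n = i < line_size - 1 ? i : line_size - 1;

            memcpy(line, data, n);
            line[n] = '\0';
            return i + 2;           /* line plus its CR-LF */
        }
    }
    return 0;                       /* no complete line yet */
}
```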
When scripting using the C programming language, there is a tendency for the C-RTL to check for and/or add newline (0x0a, <LF>) carriage-control on receipt of a record (single write). While this can be useful in converting from VMS to C conventions, it can also be counter-productive if the stream being received is already using C carriage-control. To prevent the C-RTL reinterpreting data passed to it, it is often, perhaps invariably, necessary to reopen the input stream as binary using a construct similar to the following.
This, and its <stdin> equivalent (below), are used consistently in WASD scripts.
if ((stdin = freopen ("HTTP$INPUT", "r", stdin, "ctx=bin")) == NULL) exit (vaxc$errno);
if ((stdin = freopen ("SYS$INPUT", "r", stdin, "ctx=bin")) == NULL) exit (vaxc$errno);
The input stream should be read before generating any output. If an error occurs during the body processing it should be reported via a CGI response header indicating an error (i.e. non-200). With HTTP/1.1 request processing there is also a requirement (that CGILIB fulfills) to return a "100 Continue" interim response after receiving the client request header and before the client sends the request body. Output of anything before this "100 Continue" is delivered will cause it to be interleaved with the script response body.
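A minimal C sketch of reading the complete body before any output is generated follows. It assumes the length has already been obtained from CONTENT_LENGTH (WWW_CONTENT_LENGTH with the default prefix) and that the stream is the binary-mode input opened as shown above. The function name is hypothetical, and it does nothing about the "100 Continue" requirement described above (which CGILIB fulfills).

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read exactly LENGTH bytes of request body from IN.  Returns a
   malloc()ed, NUL-terminated buffer, or NULL if memory is short or the
   stream ends early; the caller should then respond with an error
   status (e.g. "Status: 400 Bad Request") rather than a 200. */
static char *read_body(FILE *in, size_t length)
{
    char *body = malloc(length + 1);
    size_t got = 0;

    if (body == NULL)
        return NULL;
    while (got < length) {
        size_t n = fread(body + got, 1, length - got, in);

        if (n == 0) {           /* EOF or error before LENGTH bytes */
            free(body);
            return NULL;
        }
        got += n;
    }
    body[length] = '\0';
    return body;
}
```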
A source code collection of C language functions useful for processing the more vexing aspects of CGI/CGIplus programming (1.12 - Scripting Function Library).
This assists with the generation of HTTP responses, including the transfer of binary content from files (copying a file back to the client as part of the request), and the processing of the contents of POSTed requests from DCL (1.11 - DCL Processing of Requests).