
WASD Features and Facilities

8. Instances and Environments

8.1 Server Instances
8.1.1 VMS Clustering Comparison
8.1.2 Considerations
8.1.3 Configuration
8.1.4 Status
8.2 Server Environments

WASD instances and environments are two distinct mechanisms for supporting multiple WASD server processes on a single system.

Server instances are multiple, cooperating server processes providing the same set of configured resources.

Server environments are multiple, independent server processes providing differently configured resources.

8.1 Server Instances

The term instance is used by WASD to describe an autonomous server process. WASD will support multiple server processes running on a single system, alone or in combination with multiple server processes running across a cluster. This is not the same as supporting multiple virtual servers (see Virtual Services of WASD Configuration). When multiple instances are configured on a single system they cooperate to distribute the request load between themselves and share certain essential resources such as accounting and authorization information.

WARNING

Compaq TCP/IP Services versions earlier than v5.3, and some TCPware v5.n versions (at least), have a problem with socket listen queuing that can cause services to "hang" (should this happen, just disable instances and restart the server). Ensure you have the requisite version/ECO/patch installed before activating multiple instances on production systems!

8.1.1 VMS Clustering Comparison

The approach WASD has used in providing multiple instance serving may be compared in many ways to VMS clustering.

A cluster is often described as a loosely-coupled, distributed operating environment where autonomous processors can join, process and leave (even fail) independently, participating in a single management domain and communicating with one another for the purposes of resource sharing and high availability.

Similarly WASD instances run in autonomous, detached processes (across one or more systems in a cluster) using a common configuration and management interface, aware of the presence and activity of other instances (via the Distributed Lock Manager and shared memory), sharing processing load and providing rolling restart and automatic "fail-through" as required.

Load Sharing

On a multi-CPU system there are performance advantages to having processing available for scheduling on each CPU. WASD employs AST (I/O) based processing and was not originally designed to support VMS kernel threading. Benchmarking has shown this to be quite fast and efficient even when compared to a kernel-threaded server (OSU) across 2 CPUs. The advantage of multiple CPUs for a single multi-threaded server also diminishes where a site frequently activates scripts for processing, since these (potentially) require a CPU each. Where a system has many CPUs (and to a lesser extent with only two and few script activations) WASD's single-process, AST-driven design scales more poorly. Running multiple WASD instances addresses this.

Of course load sharing is not the only advantage to multiple instances …

Restart

When multiple WASD instances are executing on a node and a restart is initiated only one process shuts down at a time. Others remain available for requests until the one restarting is again fully ready to process them itself, at which point the next commences restart. This has been termed a rolling restart. Such behaviour allows server reconfiguration on a busy site without even a small loss of availability.
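As an illustrative sketch only (the control verb shown is assumed from standard WASD command-line server control, described in 9. Server Administration; verify the exact qualifiers for your installation), a rolling restart of the instances on a node might be initiated with:

  $ httpd/do=restart

Each instance then shuts down and restarts in turn, as described above, so the service as a whole remains available throughout.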

Fail-Through

When multiple instances are executing on a node and one of these exits for some reason (resource exhaustion, bugcheck, etc.) the other(s) will continue to process requests. Of course requests in-progress by the particular instance at the time of instance failure are disconnected (this contrasts with the rolling restart behaviour described above). If the former process has actually exited (in contrast to just the image) a new server process will automatically be created after a few seconds.

The term fail-through is used rather than failover because one server does not commence processing as another ceases. All servers are constantly active, with those remaining immediately and automatically taking all requests in the absence of any one (or more) of them.

8.1.2 Considerations

Of course "there is no such thing as a free lunch" and supporting multiple instances is no exception to this rule. To coordinate activity between instances and access to shared resources, multiple instances use low-level mutexes and the VMS Distributed Lock Manager (DLM). This does add some system overhead and a little latency to request processing; however, as the benchmarks indicate, increases in overall request throughput on a multi-CPU system easily offset these costs. On single CPU systems the advantages of rolling restart and fail-through need to be assessed against the small cost on a per-site basis. It is to be expected that many low activity sites will not require multiple instances to be active at all.

When managing multiple instances on a single node it is important to remember that requests are distributed between the processes round-robin, and that this needs to be taken into account when debugging scripts, using the Server Administration page, the likes of WATCH, etc. (see 8.1 Server Instances).

8.1.3 Configuration

If not explicitly configured only one instance is created. The configuration directive [InstanceMax] allows multiple instances to be specified (see Global Configuration of WASD Configuration). When this is set to an integer that many instances are created and maintained. If set to "CPU" then one instance per system CPU is created. If set to "CPU-integer" then that many fewer instances than CPUs are created ("CPU-1" gives one instance for all but one CPU, etc.). The current limit on instances is eight, although this is somewhat arbitrary. As with all requests, Server Administration page access is automatically shared between instances. There are occasions when consistent access to a single instance is desirable. This is provided via an admin service (see Service Configuration of WASD Configuration).
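For illustration only, the directive might appear in the server's global configuration file as one of the following (the file name HTTPD$CONFIG.CONF and the "#" comment style are assumptions based on a typical WASD installation; only one such setting would actually be used):

  # HTTPD$CONFIG.CONF
  # exactly two instances on this node
  [InstanceMax] 2
  # or, one instance per system CPU
  [InstanceMax] CPU
  # or, one instance for all but one CPU
  [InstanceMax] CPU-1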

When executing, the instance number is appended to the "WASD" prefix of the server process name. Associated scripting processes are named accordingly. This example shows such a system:

  Pid       Process Name    State  Pri      I/O        CPU        Page flts  Pages
 21600801   SWAPPER         HIB     16        0   0 00:06:53.65           0      0
 21600807   CLUSTER_SERVER  HIB     12     1879   0 00:01:14.51          91    112
 21600808   CONFIGURE       HIB     10       30   0 00:00:01.46          47     23
   …
 21600816   ACME_SERVER     HIB     10    71525   0 00:01:28.08         508    713 M
 21600818   SMISERVER       HIB      9    11197   0 00:00:02.29         158    231
 21600819   TP_SERVER       HIB      9  1337711   0 00:05:55.78          80    105
   …
 216421F1   WASD1:80        HIB      5  5365731   0 00:23:12.86       37182   7912
 2164523F   WASD2:80        HIB      5  5347938   0 00:23:31.41       38983   7831
 2162BA5D   WASD_WOTSUP     HIB      3     2111   0 00:00:00.47         735    518
 2164ABCF   WASD1:80-651    LEF      6    57884   0 00:00:16.71        3562   3417
 2164CBDB   WASD2:80-612    LEF      4    19249   0 00:00:04.16        3153   3116
 21631BDC   WASD2:80-613    LEF      5    18663   0 00:00:07.19        3745   3636
 2164BBE6   WASD1:80-658    LEF      5     3009   0 00:00:00.94        2359   2263
   …

8.1.4 Status

The instance management infrastructure distributes basic status data to all instances on the node and/or cluster. The intent is to provide an easily comprehended snapshot of multi-instance/multi-node WASD processing status. The data comprise the items shown in the reports below: the instance (node and service) name, the time its status was last updated, startup time and count, last exit time and status, server version, and requests processed during the most recent minute and hour.

The data are constrained to these items by the need to accommodate them within a 64 byte lock value block for cluster purposes. Single node environments do not utilise the DLM, each instance updating its table entry directly.

Each node has a table with an entry for every other instance in that WASD environment. Instance data are updated once every minute, so any instance with data older than one minute is no longer behaving correctly. This could be due to some internal error, or because the instance no longer exists (e.g. it has been stopped, has exited, or is otherwise no longer executing). An entry for an instance that no longer exists is retained indefinitely, or until a /DO=STATUS=PURGE is performed removing all such expired entries, or a /DO=STATUS=RESET removing all entries (and allowing those currently executing to repopulate the instance data over the next minute).
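For example, from the command line (a sketch using the /DO=STATUS qualifiers named above, in the same style as the reports below):

  $ httpd/do=status=purge   ! remove entries for instances that no longer exist
  $ httpd/do=status=reset   ! remove all entries; executing instances repopulate within a minute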

These status data are accessible via command-line and in-browser reports, intended for larger WASD installations, primarily those operating across multiple nodes in a cluster. With the data held in common across the cluster, any of the other nodes can provide a per-cluster history even if one or more nodes become completely non-operational.

This is an example report on a 132 column terminal display. Due to screen width constraints the date/time omits the year field of the date.

$ httpd/do=status
   Instance          Ago   Up               Ago   Count  Exit             Ago   Status      Version  /Min  /Hour
   ~~~~~~~~~~~~~~~~  ~~~~  ~~~~~~~~~~~~~~~  ~~~~  ~~~~~  ~~~~~~~~~~~~~~~  ~~~~  ~~~~~~~~~~  ~~~~~~~  ~~~~  ~~~~~
 1 KLAATU::WASD:80    41s  18-DEC 23:27:57   54m     21  18-DEC 23:27:57   54m  %X00000001  11.2.0      2     17
   KLAATU::WASD1:80---1d-17-DEC-02:49:21---1d-----5-17-DEC-02:50:03---1d-%X00000001-11.2.0----3-----15
   KLAATU::WASD2:80---1d-17-DEC-02:49:25---1d-----5-17-DEC-02:50:07---1d-%X00000001-11.2.0----0-----10
   KLAATU::WASD3:80---1d-17-DEC-02:49:29---1d-----6-17-DEC-02:50:11---1d-%X00000001-11.2.0----0------3
   as at 19-DEC-2017 00:22:41

This example CLI report shows a single node where a single instance was started, the configuration was changed to three instances, and the server restarted so that the three instances began processing. The configuration was then returned to a single instance and the existing three instances restarted the previous day, resulting in the original single instance returning to processing. That instance was last (re)started some 54 minutes ago (a normal exit status showing) and its status was last updated some 41 seconds ago. Note that the three instances shown with their white-space struck through with hyphens are stale, having last been updated 1 day ago. Entries older than three minutes are displayed in this format to differentiate them from current entries.

The same report on an 80 column terminal. Note that the full date/time has been omitted, leaving only how long ago each event happened.

$ httpd/do=status
   Instance          Ago   Up    Count  Exit  Status      Version  /Min  /Hour
   ~~~~~~~~~~~~~~~~  ~~~~  ~~~~  ~~~~~  ~~~~  ~~~~~~~~~~  ~~~~~~~  ~~~~  ~~~~~
 1 KLAATU::WASD:80    5s    58m     21   58m  %X00000001  11.2.0      1     18
   KLAATU::WASD1:80---1d---1d-----5---1d-%X00000001-11.2.0----3-----15
   KLAATU::WASD2:80---1d---1d-----5---1d-%X00000001-11.2.0----0-----10
   KLAATU::WASD3:80---1d---1d-----6---1d-%X00000001-11.2.0----0------3
   as at 19-DEC-2017 00:25:05

Where multiple instances exist, or have existed, and the terminal page size is greater than 24 lines, HTTPMON displays an equivalent of the 80 column report at the bottom of the display.

Similarly, the Server Admin report (9. Server Administration) shows an HTML equivalent of the 80 column report immediately below the control and time panels.

Using Instance Status

8.2 Server Environments

WASD server environments allow multiple, distinctly configured environments to execute on a single system. Generally, WASD's unlimited virtual servers and multiple account scripting eliminate the need to kludge these requirements with multiple execution environments. However there may be circumstances that make this desirable; regression and forward-compatibility testing come to mind.

See Server Environments in WASD Installation for detailed information on maintaining multiple installations of WASD.