(--------------------------------------------------------------------) (Instances and Environments\cr_instances_and_environments)

WASD (instances) and (environments) are two distinct mechanisms for supporting multiple WASD server processes on a single system.

Server instances are multiple, cooperating server processes providing the same set of configured resources.

Server environments are multiple, independent server processes providing differently configured resources. (....................................................................) (Server Instances\hd_server_instances)

The term (instance) is used by WASD to describe an autonomous server process. WASD supports multiple server processes running on a single system, alone or in combination with multiple server processes running across a cluster. This is (not) the same as supporting multiple virtual servers (see (HREF=[-.config]config.sdml!../config/#hd_virtual_services)([-.CONFIG]DOC_config.sdml)). When multiple instances are configured on a single system they cooperate to distribute the request load between themselves and share certain essential resources such as accounting and authorization information. (WARNING) Versions earlier than Compaq TCP/IP Services v5.3 and some TCPware v5.(n) releases (at least) have a problem with socket listen queuing that can cause services to (hang) (should this happen, just disable instances and restart the server). Ensure you have the requisite version/ECO/patch installed before activating multiple instances on production systems! (VMS Clustering Comparison\hd_instances_compare)

The approach WASD has used in providing multiple instance serving may be compared in many ways to VMS clustering.

A cluster is often described as a loosely-coupled, distributed operating environment where autonomous processors can join, process and leave (even fail) independently, participating in a single management domain and communicating with one another for the purposes of resource sharing and high availability.

Similarly WASD instances run in autonomous, detached processes (across one or more systems in a cluster) using a common configuration and management interface, aware of the presence and activity of other instances (via the Distributed Lock Manager and shared memory), sharing processing load and providing rolling restart and automatic (fail-through) as required. (Load Sharing\hd_instances_load)

On a multi-CPU system there are performance advantages to having processing available for scheduling on each CPU. WASD employs AST (I/O) based processing and was not originally designed to support VMS kernel threading. Benchmarking has shown this to be quite fast and efficient, even when compared to a kernel-threaded server (OSU) across 2 CPUs. The advantage of multiple CPUs for a single multi-threaded server also diminishes where a site frequently activates scripts, since these (potentially) require a CPU each. Where a system has many CPUs (and to a lesser extent with only two and few script activations) WASD's single-process, AST-driven design scales more poorly. Running multiple WASD instances addresses this.

(Of course load sharing is not the only advantage of multiple instances \BOLD) (Restart\hd_instances_restart)

When multiple WASD instances are executing on a node and a restart is initiated, only one process shuts down at a time. Others remain available for requests until the one restarting is again fully ready to process them itself, at which point the next commences restart. This has been termed a (rolling restart). Such behaviour allows server reconfiguration on a busy site without even a small loss of availability.
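For illustration, a rolling restart is typically initiated from the command line. This is a minimal sketch only, assuming the HTTPD foreign verb has been defined by the site's WASD startup procedures.

$! a sketch only; assumes the HTTPD verb from the WASD startup
$ HTTPD /DO=RESTART    ! each instance restarts in turn (rolling restart)
(Fail-Through\hd_instances_fail)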

When multiple instances are executing on a node and one of these exits for some reason (resource exhaustion, bugcheck, etc.) the other(s) will continue to process requests. Of course requests in-progress by the particular instance at the time of instance failure are disconnected (this contrasts with the rolling restart behaviour described above). If the former process has actually exited (in contrast to just the image) a new server process will automatically be created after a few seconds.

The term (fail-through) is used rather than (failover) because one server does not commence processing as another ceases. All servers are constantly active, with those remaining immediately and automatically taking all requests in the absence of any one (or more) of them. (Considerations\hd_instances_cost)

Of course (there is no such thing as a free lunch) and supporting multiple instances is no exception to this rule. To coordinate activity between themselves, and access to shared resources, multiple instances use low-level mutexes and the VMS Distributed Lock Manager (DLM). This does add some system overhead and a little latency to request processing; however, as the benchmarks indicate, increases in overall request throughput on a multi-CPU system easily offset these costs. On single CPU systems the advantages of rolling restart and fail-through need to be assessed against the small cost on a per-site basis. It is to be expected that many low-activity sites will not require multiple instances to be active at all.

When managing multiple instances on a single node it is important to remember that requests are distributed between the processes in round-robin fashion, and that this needs to be taken into account when debugging scripts, using the Server Administration page, the likes of WATCH, etc. (see (hd_watch_instances)). (Configuration\hd_instances_config)

If not explicitly configured only one instance is created. The configuration directive [InstanceMax] allows multiple instances to be specified (see (HREF=[-.config]config.sdml!../config/#cr_config_directives)([-.CONFIG]DOC_config.sdml)). When this is set to an integer that many instances are created and maintained. If set to (CPU) then one instance per system CPU is created. If set to (CPU-(integer)) then that many fewer instances than CPUs are created (for example, (CPU-1) creates one instance for all but one CPU), and so on. The current limit on instances is eight, although this is somewhat arbitrary. As with all requests, Server Administration page access is automatically shared between instances. There are occasions when consistent access to a single instance is desirable. This is provided via an (admin service) (see (HREF=[-.config]config.sdml!../config/#cr_service_directives)([-.CONFIG]DOC_config.sdml)).
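As an illustration, the following is a minimal sketch of how the directive described above might appear in the site's global configuration file (the comment conventions shown are those commonly used in WASD configuration files; check the configuration documentation referenced above for the file applicable to the site).

# create and maintain two instances
[InstanceMax] 2
# alternatively, one instance per system CPU
#[InstanceMax] CPU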

When executing, the instance number is appended to the (WASD) portion of the server process name. Associated scripting processes are named accordingly. This example shows such a system: ()
  Pid    Process Name    State Pri     I/O        CPU        Page flts  Pages
21600801 SWAPPER         HIB    16        0   0 00:06:53.65          0      0
21600807 CLUSTER_SERVER  HIB    12     1879   0 00:01:14.51         91    112
21600808 CONFIGURE       HIB    10       30   0 00:00:01.46         47     23
21600816 ACME_SERVER     HIB    10    71525   0 00:01:28.08        508    713 M
21600818 SMISERVER       HIB     9    11197   0 00:00:02.29        158    231
21600819 TP_SERVER       HIB     9  1337711   0 00:05:55.78         80    105
216421F1 WASD1:80        HIB     5  5365731   0 00:23:12.86      37182   7912
2164523F WASD2:80        HIB     5  5347938   0 00:23:31.41      38983   7831
2162BA5D WASD_WOTSUP     HIB     3     2111   0 00:00:00.47        735    518
2164ABCF WASD1:80-651    LEF     6    57884   0 00:00:16.71       3562   3417
2164CBDB WASD2:80-612    LEF     4    19249   0 00:00:04.16       3153   3116
21631BDC WASD2:80-613    LEF     5    18663   0 00:00:07.19       3745   3636
2164BBE6 WASD1:80-658    LEF     5     3009   0 00:00:00.94       2359   2263
(Status\hd_instance_status)

The instance management infrastructure distributes basic status data to all instances on the node and/or cluster. The intent is to provide an easily comprehended snapshot of multi-instance/multi-node WASD processing status. The data comprises: (UNNUMBERED)
instance name (e.g. "KLAATU::WASD:443")
date/time the instance status was last updated + how long (ago) this was (seconds, minutes, hours, or days)
date/time the instance last started + how long (ago) this was (seconds, minutes, hours, or days)
number of times the instance has started up
date/time the instance last exited + how long (ago) this was (seconds, minutes, hours, or days)
the VMS status at the last exit
instance WASD version (e.g. "11.2.0")
number of requests processed during the preceding minute
number of requests processed during the preceding sixty minutes

The data are constrained to these items due to the need to accommodate them within a 64 byte lock value block for cluster purposes. Single node environments do not utilise the DLM, each instance updating its table entry directly.

Each node has a table with an entry for every other instance in that WASD environment. Instance data are updated once every minute, so any instance with data older than one minute is no longer behaving correctly. This could be due to some internal error, or because the instance no longer exists (e.g. it has been stopped, exited or is otherwise no longer executing). An entry for an instance that no longer exists is retained indefinitely, or until a /DO=STATUS=PURGE is performed, removing all such (expired) entries, or a /DO=STATUS=RESET, removing all entries (and allowing those currently executing to repopulate the instance data over the next minute).
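For illustration, the following is a minimal command-line sketch of the housekeeping just described, again assuming the HTTPD verb defined by the WASD startup.

$ HTTPD /DO=STATUS          ! display the instance status report
$ HTTPD /DO=STATUS=PURGE    ! remove expired (stale) entries
$ HTTPD /DO=STATUS=RESET    ! remove all entries; executing instances repopulate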

These status data are accessible via command-line and in-browser reports, intended for larger WASD installations, primarily those operating across multiple nodes in a cluster. With the data held in common, any of the other nodes can provide a per-cluster history even if one or more nodes become completely non-operational.

This is an example report on a 132 column terminal display. Due to screen width constraints the date/time omits the year field of the date.
$ httpd/do=status
  Instance          Ago   Up               Ago   Count  Exit             Ago   Status      Version  /Min  /Hour
  ~~~~~~~~~~~~~~~~  ~~~~  ~~~~~~~~~~~~~~~  ~~~~  ~~~~~  ~~~~~~~~~~~~~~~  ~~~~  ~~~~~~~~~~  ~~~~~~~  ~~~~  ~~~~~
1 KLAATU::WASD:80   41s   18-DEC 23:27:57  54m      21  18-DEC 23:27:57  54m   %X00000001  11.2.0      2     17
  KLAATU::WASD1:80---1d-17-DEC-02:49:21---1d-----5-17-DEC-02:50:03---1d-%X00000001-11.2.0----3-----15
  KLAATU::WASD2:80---1d-17-DEC-02:49:25---1d-----5-17-DEC-02:50:07---1d-%X00000001-11.2.0----0-----10
  KLAATU::WASD3:80---1d-17-DEC-02:49:29---1d-----6-17-DEC-02:50:11---1d-%X00000001-11.2.0----0------3
as at 19-DEC-2017 00:22:41

This example CLI report shows a single node where a single instance was started, the configuration was changed to three instances, and the server restarted so that the three instances began processing. The configuration was then returned to a single instance and the existing three instances restarted the previous day, resulting in the original single instance returning to processing. That instance was last (re)started some 54 minutes ago (a normal exit status showing) and its status was last updated some 41 seconds ago. Note that the three instances showing white-space struck-through with hyphens are stale, having last been updated 1 day ago. Entries older than three minutes are displayed in this format to differentiate them from current entries.

The same report on an 80 column terminal. Note that the explicit date/time columns have been omitted, leaving only the period (ago) since each event happened.
$ httpd/do=status
  Instance          Ago   Up    Count  Exit  Status      Version  /Min  /Hour
  ~~~~~~~~~~~~~~~~  ~~~~  ~~~~  ~~~~~  ~~~~  ~~~~~~~~~~  ~~~~~~~  ~~~~  ~~~~~
1 KLAATU::WASD:80   5s    58m      21  58m   %X00000001  11.2.0      1     18
  KLAATU::WASD1:80---1d---1d-----5---1d-%X00000001-11.2.0----3-----15
  KLAATU::WASD2:80---1d---1d-----5---1d-%X00000001-11.2.0----0-----10
  KLAATU::WASD3:80---1d---1d-----6---1d-%X00000001-11.2.0----0------3
as at 19-DEC-2017 00:25:05

Where multiple instances exist, or have existed, and the terminal page size is greater than 24 lines, HTTPMON displays an equivalent of the 80 column report at the bottom of the display.

Similarly, the Server Admin report ((cr_server_admin)) shows an HTML equivalent of the 80 column report immediately below the control and time panels. (Using Instance Status\hd_instance_status_using) (UNNUMBERED)
The strike-through (hyphens) of an instance line immediately indicates the instance is no longer updating (after 3 minutes).
Clear stale entries using $ HTTPD/DO=STATUS=PURGE.
The instance name (Ago) shows how long ago it was last updated.
If the exit (Ago) is more recent than the startup (Ago) the instance has exited but not restarted.
The exit (Status) can show a non-normal status (i.e. not %X00000001).
An excessive startup (Count) suggests something amiss.
Per-minute and/or per-hour request counts that seem atypically low, while instance status seems otherwise normal, suggest a networking issue, perhaps up-stream. (....................................................................) (Server Environments\hd_server_environments)

WASD server environments allow multiple, distinctly configured environments to execute on a single system. Generally, WASD's unlimited virtual servers and multiple account scripting eliminate the need to kludge such requirements with multiple execution environments. However there may be circumstances that make this desirable; regression and forward-compatibility testing come to mind.

First some general comments on the design of WASD. (UNNUMBERED)

WASD creates and populates its own logical name table (see (HREF=[-.config]config.sdml!../config/#hd_logical_names)([-.CONFIG]DOC_config.sdml)). It also adds the WASD_FILE_DEV[(n)] and WASD_ROOT[(n)] logical names to the SYSTEM logical name table.

WASD creates and uses rights identifiers. Installation creates and associates specific rights identifiers with separate accounts for server and script execution. Some specifically named identifiers have functional meaning to the server. Server startup can create and associate rights identifiers used to manage the server run-time environment.

WASD makes extensive use of the DLM to coordinate WASD activities system- and cluster-wide. All executing server images are aware of all other executing server images on the same system and within the same cluster. This supports all manner of coordination (e.g. instance recovery, instantiated services) and data exchange (e.g. $HTTPD/DO=MAP/ALL) activities.

WASD uses global sections to accumulate data and for communication between WASD instances. Some of these are by default permanent and remain on a system unless explicitly removed.

WASD uses detached scripting processes. As it is possible to $STOP a server process (thereby preventing its run-down handlers from cleaning up those detached processes), the server needs to be able to recognise such processes as its 'own' and clean up any 'orphaned' ones the next time it starts. It does this by having a rights identifier associated with the server process name (e.g. WASD:80 grants its scripting processes WASD_PRC_WASD_80, a second instance WASD2:80, WASD_PRC_WASD2_80, etc.). A command sketch for inspecting these names and identifiers follows.
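The following is an illustrative sketch only of examining the system-level logical names and per-environment rights identifiers described above. The wildcarded names are assumptions based on the naming conventions given in this section and may differ between sites and versions.

$! system-level logical names added by WASD (one set per environment number)
$ SHOW LOGICAL /SYSTEM WASD_FILE_DEV*, WASD_ROOT*
$! rights identifiers associating scripting processes with a server process
$ MCR AUTHORIZE
UAF> SHOW /IDENTIFIER WASD_PRC_*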

All of these mechanisms support multiple, independent environments on a single system. Due to design and implementation considerations there are fifteen such environments available per system. The primary (default) environment is number one. Environments two to fifteen are available for site usage. (Demonstration mode, /DEMO, uses environment zero.) Server (instances) ((hd_server_instances)) share a single environment.

There are two approaches to provisioning such multiple, independent environments. (Ad Hoc Server Wrapper\hd_instances_adhoc)

This is a DCL procedure that allows virtually any WASD release HTTP server to be executed in a detached process, either by itself or concurrently with a full release or other ad hoc detached server. The server image and associated configuration files used by this process can be specified within the procedure allowing completely independent versions and environments to be fully supported.

Full usage instructions may be found in the example procedure(s) in (HTML=WASD_ROOT:[EXAMPLE]*ADHOC*.COM) (HTML/OFF) WASD_ROOT:[EXAMPLE]*ADHOC*.COM (HTML/ON)

Two versions are provided, one for pre-v10 and one for post-v10 (due to changes in logical naming schema). (Formal Environments\hd_instances_formal)

Although the basic infrastructure for supporting multiple environments (i.e. the 0..15 environment number) has been in place since version 8, formal support in server CLI qualifiers and DCL procedures has only been available since version 10. To support version 9 or earlier environments the (hd_instances_adhoc) must be used.

WASD version 10 startup and other run-time procedures have been modified to support running multiple WASD environments simply from independent WASD file-system trees. The standard (HTML= STARTUP.COM ) (HTML/OFF) STARTUP.COM (HTML/ON) procedure accepts the WASD_ENV parameter to specify which environment (1..15) the server should execute within (primary/default is 1). The procedure then derives the WASD_ROOT logical name from the location of the startup procedure.

For example:
$! start current release
$ WASD_STARTUP = "/SYSUAF=(ID,SSL)/PERSONA"
$ @DKA0:[WASD_ROOT.STARTUP]STARTUP.COM
$! start previous release in environment 2
$ WASD_ENV = 2
$ @DKA0:[WASD_ROOT_MINUS1.STARTUP]STARTUP.COM
(Considerations\hd_instances_formal_consider)

WASD environments each fully support all WASD features and facilities (including multiple server instances), with the exception of DECnet scripting where, because of DECnet objects' global (per-system) definition, the one object definition must be shared between environments.

Per-environment configuration must be done in its own WASD_ROOT part of the file-system and logical names must be defined in the environment's associated logical name table. The site administrator must keep track of which environment is to be accessed from the command-line and set the process logical name search list using the appropriate $ @WASD_FILE_DEV([n]), where (n) can be a non-primary environment number (see (HREF=[-.config]config.sdml!../config/#hd_logical_names)([-.CONFIG]DOC_config.sdml)).
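As an illustration only (assuming environment 2, as in the startup example above), setting the process logical name search list might look like:

$! primary (default) environment
$ @WASD_FILE_DEV
$! non-primary environment 2
$ @WASD_FILE_DEV2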

It is not possible to have multiple environments bind their services to the same IP address and port (for fundamental networking reasons). Unless the network interface is specifically multi-homed for the purpose, services provided by separate environments must be configured to use unique IP ports.

Non-primary environments (2..15) prefix the environment number as a (hex) digit before the (WASD) in the process name. The above example, when executing, each environment with a single scripting process, would appear in the system as follows (the second environment providing a service on port 2280): ()
  Pid    Process Name    State Pri     I/O        CPU        Page flts  Pages
00000101 SWAPPER         HIB    16        0   0 00:00:11.98          0      0
00000111 ACME_SERVER     HIB    10     6247   0 00:00:12.63        540    611 M
00000112 QUEUE_MANAGER   HIB    10      328   0 00:00:00.18        136    175
00000122 TCPIP$INETACP   HIB    10  1249419   0 00:07:33.95        401    326
00000123 TCPIP$ROUTED    LEF     6  3495839   0 00:01:15.49        166    165 S
00000468 WASD:80         HIB     6   132924   0 00:01:29.26      17868   2856
0000046D 2WASD:2280      HIB     6   129344   0 00:01:29.26      17712   2840
0000049D WASD:80-8       LEF     4     4449   0 00:00:00.67        934    194
00000503 2WASD:2280-2    LEF     4      565   0 00:00:00.28        732    102
(Cleaning Up\hd_instances_formal_cleanup)

As described earlier, each environment creates and maintains logical name table(s) and system-level name(s), detached scripting processes, lock resources and permanent global sections. Lock resources disappear with the server processes. Logical names, global sections, rights identifiers and occasionally detached scripting processes may require some cleaning up when a non-primary environment's use is concluded.
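As a rough illustration only, the following sketch checks for and removes the system-level logical names of a concluded non-primary environment. Environment 2 is assumed here and the names follow the conventions described in this section; global sections, rights identifiers and any orphaned processes are examined and removed with the usual VMS management tools and privileges.

$! check what remains for environment 2
$ SHOW LOGICAL /SYSTEM WASD_FILE_DEV2, WASD_ROOT2
$! remove the system-level logical names (requires suitable privileges)
$ DEASSIGN /SYSTEM WASD_FILE_DEV2
$ DEASSIGN /SYSTEM WASD_ROOT2
(--------------------------------------------------------------------)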