|1Server Performance|

|note><|
|0These Are v11.5 Results|
It is planned to evaluate x86-64 v12 performance once OpenVMS V9.2-1 and native
compilers become available some time later in CY2022.
|!note|

|^ The server has a single-process, multi-threaded, asynchronous I/O design. On
a single-processor system this is the most efficient approach.  On a
multi-processor system it is limited by the single process context (with
scripts executing within their own context).  For I/O-constrained processing
(the most common in general Web environments) the AST-driven approach is quite
efficient.

|^ The test-bench system was an |*DEC PWS 500 with 1 CPU and 1.5GB memory||,
running |*VSI OpenVMS V8.4-2L1 and VSI TCP/IP TCPIP V5.7-13ECO5F||.

|note|
|0Sure, an old clunker|
WASD has largely been developed on this system for 15+ years.

|^ While by today's standards it is a very resource-constrained system,
particularly limited by the EV56 (21164A) CPU, it has pretty much done
everything asked of it for all that time.  Importantly, it has recent releases
of system software, courtesy of VSI's ISV support programme.  For performance
purposes, this allows comparison with recent releases of CSWS (VMS Apache).

|^ The requirements for a test-bench system effectively exclude production
systems, especially external ones, hence working with what is at hand.

|!note|

|^ This performance data (WASD v11.5) has been collected very differently from
the next most recent, over a decade ago (WASD v10.0).  Apart from the move
from an HP rx2600 to the vintage PWS 500, the previous benchmarking tools were
WASD in-house, ApacheBench (AB) and WASDbench (WB), executing on the same
system as the server, eliminating network traffic |/on-the-wire||.  The current
absolute benchmarks therefore cannot meaningfully be compared to previous data,
although the relativities seem to be comparable.

|0Benchmark Setup|

|^ These data have been collected using the |/h2load|| utility
(|link%|https://nghttp2.org/documentation/h2load.1.html||) from the HTTP/2
C Library (|link%|https://nghttp2.org||).  This utility can apply a
configurable load to |*HTTP, HTTPS and HTTP/2|| servers.  Note that the number
of client threads (|=.-t||) is explicitly set to the connection concurrency
(|=.-c||) to maximise |/h2load|| processing.
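
|^ As an illustration only (the host name and resource path below are
hypothetical, and the actual invocations are recorded in the output files
noted below), a clear-text HTTP/1.1 run and an HTTP/2 over TLS run at
concurrency 10 might resemble:

|code|
$ h2load -n 10000 -c 10 -t 10 --h1 http://test.host.name/test/64k.txt
$ h2load -n 10000 -c 10 -t 10 https://test.host.name/test/64k.txt
|!code|

|^ The |=.--h1|| option forces HTTP/1.1; without it an https URI negotiates
HTTP/2 via ALPN where the server supports it.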

|^ The |/h2load|| utility runs on an 8 CPU, 32GB Mac Pro, across a 500 Mbps
LAN to the 100 Mbps interface of the PWS.  The obvious resource constraints
are the single PWS CPU and network interface.  Every effort has been made to
ensure these do not unreasonably constrain the comparison.

|^ Clear-text HTTP (port 80) data is collected to measure internal server
processing without the CPU-intensive overhead of encryption.  Encrypted HTTP
(port 443) data provides a more real-world scenario (especially now that
clear text is largely deprecated).  Both WASD and Apache were using OpenSSL
1.1.1 and negotiated TLS v1.2.

|^ Output from the |/h2load|| benchmarking runs is included in the
|link%|/wasd_root/exercise/*v115*.txt|WASD_ROOT:[EXERCISE]*V115*.TXT| directory
and is summarised below.

|0These results are indicative only!|

|^ Every endeavour has been made to ensure the comparison is as equitable as
possible.  Both servers execute at the same process priority, with access
logging and host name lookup disabled, and run on the same machine in the same
relatively quiescent environment.  Each test run was interleaved between the
two servers to distribute any environmental variation.  Very high throughput
runs use a larger number of requests to improve sample-period validity.  Both
servers were configured pretty much "out-of-the-box", with minimal changes
(generally just enough to get the test environment going).  Multiple data
collections have yielded essentially equivalent relative results.

|^ For the test-bench, WASD v11.5 is present on ports 80 and 443.

|0Apache Comparison||

|^ The Apache comparison used the latest VSI AXPVMS CSWS V2.4-38C (based on
Apache v2.4.38) kit.  Apache is present on ports 7780 and 7443.

|0OSU Comparison||

|^ Previous benchmarking included OSU data.  These are no longer collected.

|2Simple File Request Turn-Around|

|^ A series of tests was made using batches of accesses.  The first test
returned an empty file, measuring response and file-access time without any
actual data transfer.  The second requested a file of 64k characters, testing
performance with a more realistic load.  All were done using one and ten
concurrent requests.

|block><|

|0_HTTP/1.1 clear|
|0Concurrency 1|
|tabular|
|~ |: |:2 Requests/Second|:2 Data Rate MBytes/Second
|~ |: Response|: WASD|: Apache|: WASD|: Apache
|~ |. 0k |. 352 |. 71 |. 0.104 |. 0.018
|~ |. 64k |. 61 |. 36 |. 3.740 |. 2.230
|!tabular|

|0Concurrency 10|
|tabular|
|~ |: |:2 Requests/Second|:2 Data Rate MBytes/Second
|~ |: Response|: WASD|: Apache|: WASD|: Apache
|~ |. 0k |. 1146 |. 67 |. 0.338 |. 0.017
|~ |. 64k |. 124 |. 48 |. 7.590 |. 2.940
|!tabular|

|0_HTTP/1.1 encrypted|
|0Concurrency 1|
|tabular|
|~ |: |:2 Requests/Second|:2 Data Rate MBytes/Second
|~ |: Response|: WASD|: Apache|: WASD|: Apache
|~ |. 0k |. 276 |. 51 |. 0.092 |. 0.013
|~ |. 64k |. 21 |. 25 |. 1.300 |. 1.550
|!tabular|

|0Concurrency 10|
|tabular|
|~ |: |:2 Requests/Second|:2 Data Rate MBytes/Second
|~ |: Response|: WASD|: Apache|: WASD|: Apache
|~ |. 0k |. 175 |. 46 |. 0.580 |. 0.112
|~ |. 64k |. 39 |. 24 |. 2.360 |. 1.440
|!tabular|

|0_HTTP/2 (encrypted)|
|^ (VMS Apache currently does not support HTTP/2)
|0Concurrency 1|
|tabular|
|~ |: |:2 Requests/Second|:2 Data Rate MBytes/Second
|~ |: Response|: WASD|: Apache|: WASD|: Apache
|~ |. 0k |. 191 |. - |. 0.286 |. -
|~ |. 64k |. 20 |. - |. 1.210 |. -
|!tabular|

|0Concurrency 10|
|tabular|
|~ |: |:2 Requests/Second|:2 Data Rate MBytes/Second
|~ |: Response|: WASD|: Apache|: WASD|: Apache
|~ |. 0k |. 156 |. - |. 0.240 |. -
|~ |. 64k |. 37 |. - |. 2.250 |. -
|!tabular|

|!block|

|^ Data file (non-relevant output snipped):

|simple#|
|item| |link%|/wasd_root/exercise/perf_files_v115.txt|\
WASD_ROOT:[EXERCISE]PERF_FILES_V115.TXT|
|!simple|

|0File Transfer Rate|

|^ Requests for a large |/binary| file (3.92MB - 8039 blocks) indicate a
|*potential transfer rate of multiple Mbytes per second||.

|block><|
|0Data Rate - MBytes/Second|
|^ (VMS Apache currently does not support HTTP/2)

|tabular|
|~ |.          |: Concurrent|: WASD|: Apache 
|~ |:12 HTTP/1.1\<br\>(clear) |. 1  |. 6.07 |. 4.40
|~          |. 10 |. 8.85 |. 8.70
|~ |:12 HTTP/1.1\<br\>(encrypted) |. 1  |. 2.91 |. 3.23
|~          |. 10 |. 2.77 |. 2.92
|~ |:12 HTTP/2\<br\>(encrypted) |. 1  |. 2.77 |. -
|~          |. 10 |. 2.80 |. -
|!tabular|
|!block|

|^ Data file (non-relevant output snipped):

|simple|
|& |link%|/wasd_root/exercise/perf_xfer_v115.txt|\
WASD_ROOT:[EXERCISE]PERF_XFER_V115.TXT|
|!simple|

|0File Record Format||

|^ The WASD server can handle STREAM, STREAM_LF, STREAM_CR, FIXED and UNDEFINED
record formats very much more efficiently than VARIABLE or VFC files.  With
STREAM, FIXED and UNDEFINED files the assumption is that HTTP carriage-control
is within the file itself (i.e. at least the newline (LF), all that is required
by browsers), and so needs no additional processing.  With VARIABLE record
files the carriage-control is implied, and each record requires additional
processing by the server to supply it.  Although the HTTPd buffers multiple
variable records and writes them collectively to the network to improve
efficiency, stream and binary file reads are by Virtual Block and are written
to the network immediately, making the transfer of these formats very
efficient indeed!
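
|^ As a sketch only (the file specifications and FDL file name are
hypothetical), a file's record format can be checked from DCL, and a VARIABLE
record file converted to the more efficient STREAM_LF using the CONVERT
utility with a simple FDL:

|code|
$ ! show the current record format (e.g. VAR, VFC, STMLF, FIX)
$ WRITE SYS$OUTPUT F$FILE_ATTRIBUTES("WEB:[DOC]EXAMPLE.TXT","RFM")
$ ! STMLF.FDL contains:  RECORD
$ !                          FORMAT STREAM_LF
$ CONVERT /FDL=STMLF.FDL WEB:[DOC]EXAMPLE.TXT WEB:[DOC]EXAMPLE.TXT
|!code|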

|2Scripting|

|^ A simple performance evaluation shows the relative merits of WASD and
Apache scripting in CGI and persistent environments, using
|link%|/wasd_root/src/cgiplus/cgiplustest.c|\
WASD_ROOT:[SRC.CGIPLUS]CGIPLUSTEST.C|
which executes in standard CGI, CGIplus and Apache loadable module
environments.  CGIplus and Apache modules are somewhat analogous.  A series of
accesses was made.  The first test returned only the HTTP header, evaluating
raw request turn-around time.  The second test requested a body of 64k
characters, again testing performance with a more realistic load.

|block><|

|0Concurrency 1 - Requests/Second|
|tabular|
|~ |: Response|: WASD CGI|: WASD CGIplus|: Apache CGI|: Apache module
|~ |. 0kB |. 27 |. 193 |. 5 |. 52
|~ |. 64kB |. 14 |. 25 |. 5 |. 31
|!tabular|

|0Concurrency 10 - Requests/Second|
|tabular|
|~ |: Response|: WASD CGI|: WASD CGIplus|: Apache CGI|: Apache module
|~ |. 0kB |. 28 |. 337 |. 4 |. 51 
|~ |. 64kB |. 16 |. 65 |. 4 |. 37
|!tabular|

|!block|

|^ Data file (non-relevant output snipped):

|simple#|
|& |link%|/wasd_root/exercise/perf_scripts_v115.txt|\
WASD_ROOT:[EXERCISE]PERF_SCRIPTS_V115.TXT|
|!simple|

|0Persistent Scripting|

|^ CGI scripting is notoriously slow (as above), hence the effort
expended by designers in creating persistent scripting environments - those
where the scripting engine (and perhaps other state) is maintained between
requests.  Both WASD and Apache implement these as integrated features,
the former as |*CGIplus/RTE||, the latter as |*loadable modules||.

|^ The |/CGIplus|| and |/Apache module|| data from the above CGIPLUSTEST.EXE
table show the benefits of having scripts persist: reduced activation latency,
and therefore increased throughput, and the potential to retain state
(including the scripts themselves) in local caches.  Both WASD and VMS Apache
use their respective |*persistence technologies|| to provide common scripting
environments, including |*Perl||, |*PHP|| and |*Python||.
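
|^ As a conceptual sketch only - this is |/not|| the actual CGIplus protocol
or the cgilib API, merely an illustration of why persistence pays off - a
persistent script is essentially a loop that retains its state while serving
one request after another (see CGIPLUSTEST.C for a working example):

|code|
/* Conceptual sketch of a persistent scripting loop.  The request framing
   (variables terminated by a blank line on standard input) is invented for
   illustration; a real CGIplus script uses the WASD cgilib routines, with
   CGIPLUSTEST.C as a model. */
#include <stdio.h>

/* stand-in: consume one request's variables; returns 0 when the stream closes */
static int read_request (void)
{
   char line [1024];
   while (fgets (line, sizeof(line), stdin))
      if (line[0] == '\n' || line[0] == '\r') return 1;
   return 0;
}

int main (void)
{
   long count = 0;   /* state (and any caches) survive between requests */
   while (read_request ())
   {
      count++;
      printf ("Content-Type: text/plain\r\n\r\n");
      printf ("Request %ld from one persistent process\n", count);
      /* a real CGIplus script would now signal end-of-request to the
         server before looping to wait for the next request */
      fflush (stdout);
   }
   return 0;
}
|!code|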

|^ The WASD CGIplus/RTE technology used to implement its persistent scripting
environments is available for general use and, being based on CGI principles,
offers a ready adaptation of well-known techniques.  Most site-specific
scripts can also be built using the libraries, code fragments and example
scripts provided with the WASD package, and obtain similar efficiencies and
low latencies.
See the |link%|../../scripting/scripting.html|WASD Scripting Environment| document.