/*
** COPYRIGHT (c) 1998 BY COMPAQ COMPUTER CORPORATION ALL RIGHTS RESERVED.
**
** THIS SOFTWARE IS FURNISHED UNDER A LICENSE AND MAY BE USED AND COPIED
** ONLY IN ACCORDANCE OF THE TERMS OF SUCH LICENSE AND WITH THE
** INCLUSION OF THE ABOVE COPYRIGHT NOTICE. THIS SOFTWARE OR ANY OTHER
** COPIES THEREOF MAY NOT BE PROVIDED OR OTHERWISE MADE AVAILABLE TO ANY
** OTHER PERSON. NO TITLE TO AND OWNERSHIP OF THE SOFTWARE IS HEREBY
** TRANSFERRED.
**
** THE INFORMATION IN THIS SOFTWARE IS SUBJECT TO CHANGE WITHOUT NOTICE
** AND SHOULD NOT BE CONSTRUED AS A COMMITMENT BY COMPAQ COMPUTER
** CORPORATION.
**
** COMPAQ ASSUMES NO RESPONSIBILITY FOR THE USE OR RELIABILITY OF ITS
** SOFTWARE ON EQUIPMENT WHICH IS NOT SUPPLIED BY COMPAQ OR DIGITAL.
**
**=====================================================================
** WARNING - This example is provided for instructional and demo
**           purposes only. The resulting program should not be
**           run on systems which make use of soft-affinity
**           features of OpenVMS, or while running applications
**           which are tuned for precise processor configurations.
**           We are continuing to explore enhancements such as this
**           program which will be refined and integrated into
**           future releases of OpenVMS.
**=====================================================================
**
** GCU$BALANCER.C - OpenVMS Galaxy CPU Load Balancer.
**
** This is an example of a privileged application which dynamically
** reassigns CPU resources among instances in an OpenVMS Galaxy. The
** program must be run on each participating instance. Each image
** will create, or map to, a small shared memory section and periodically
** post information regarding the depth of that instance's COM queues.
** Based upon running averages of this data, each instance will
** determine the most and least busy instances.
** If these factors exist for a specified duration, the least busy
** instance that has available secondary processors will reassign one
** of its processors to the most busy instance, thereby effectively
** balancing processor usage across the OpenVMS Galaxy. The program
** provides command line arguments to allow tuning of the load balancing
** algorithm. The program is admittedly shy on error handling.
**
** This program uses the following OpenVMS Galaxy system services:
**
**      SYS$CPU_TRANSITION   - CPU reassignment
**      SYS$CRMPSC_GDZRO_64  - Shared memory creation
**      SYS$SET_SYSTEM_EVENT - OpenVMS Galaxy event notification
**      SYS$*_GALAXY_LOCK_*  - OpenVMS Galaxy locking
**
** Since OpenVMS Galaxy resources are always reassigned via a "push"
** model, where only the owner instance can release its resources,
** one copy of this process must run on each instance in the OpenVMS
** Galaxy.
**
** ENVIRONMENT: OpenVMS V7.2 Multiple-instance Galaxy.
**
** REQUIRED PRIVILEGES:  CMKRNL required to count CPU queues
**                       SHMEM required to map shared memory
**
** BUILD/COPY INSTRUCTIONS:
**
** Compile and link the example program as described below, or copy the
** precompiled image found in SYS$EXAMPLES:GCU$BALANCER.EXE to
** SYS$COMMON:[SYSEXE]GCU$BALANCER.EXE
**
** If your OpenVMS Galaxy instances utilize individual system disks, you
** will need to do the above for each instance.
**
** If you change the example program, compile and link it as follows:
**
**    $ CC GCU$BALANCER.C+SYS$LIBRARY:SYS$LIB_C/LIBRARY
**    $ LINK/SYSEXE GCU$BALANCER
**
** STARTUP OPTIONS:
**
** You must establish a DCL command for this program. We have provided a
** sample command table file for this purpose. To install the new command,
** do the following:
**
**    $ SET COMMAND/TABLE=SYS$LIBRARY:DCLTABLES -
**         /OUT=SYS$COMMON:[SYSLIB]DCLTABLES GCU$BALANCER.CLD
**
** This command inserts the new command definition into DCLTABLES.EXE
** in your common system directory.
** The new command tables will take effect when the system is rebooted.
** If you would like to avoid a reboot, do the following:
**
**    $ INSTALL REPLACE SYS$COMMON:[SYSLIB]DCLTABLES.EXE
**
** After this command, you will need to log out, then log back in to
** use the command from any active processes. Alternatively, if you
** would like to avoid logging out, do the following from each process
** you would like to run the balancer from:
**
**    $ SET COMMAND GCU$BALANCER.CLD
**
** Once your command has been established, you may use the various
** command line parameters to control the balancer algorithm.
**
**    $ CONFIGURE BALANCER{/STATISTICS} x y time
**
** Where: "x" is the number of load samples to take.
**        "y" is the number of queued processes required to trigger
**            resource reassignment.
**        "time" is the delta time between load sampling.
**
** The /STATISTICS qualifier causes the program to display a
** continuous status line. This is useful for tuning the parameters.
** This output is not visible if the balancer is run detached, as is
** the case if it is invoked via the GCU. It is intended to be used
** only when the balancer is invoked directly from DCL in a DECterm
** window.
**
** For example: $ CONFIG BAL 3 1 00:00:05.00
**
**        Starts the balancer which samples the system load every
**        5 seconds. After 3 samples, if the instance has one or
**        more processes in the COM queue, a resource (CPU)
**        reassignment will occur, giving this instance another CPU.
**
** GCU STARTUP:
**
** The GCU provides a menu item for launching SYS$SYSTEM:GCU$BALANCER.EXE
** and a dialog for altering the balancer algorithm. These features will
** only work if the balancer image is properly installed as described
** in the following paragraphs.
**
** To use the GCU-resident balancer startup option, you must:
**
**  1) Compile, link, or copy the balancer image as described previously.
**  2) Invoke the GCU via: $ CONFIGURE GALAXY  You may need to set your
**     DECwindows display to a suitably configured workstation or PC.
**  3) Select the "CPU Balancer" entry from the "Galaxy" menu.
**  4) Select appropriate values for your system. This may take some
**     testing. By default, the values are set aggressively so that
**     the balancer action can be readily observed. If your system is
**     very heavily loaded, you will need to increase the values
**     accordingly to avoid excessive resource reassignment. The GCU
**     does not currently save these values, so you may want to write
**     them down once you are satisfied.
**  5) Select the instance/s you wish to have participate, then select
**     the "Start" function, then press OK. The GCU should launch the
**     process GCU$BALANCER on all selected instances. You may want to
**     verify these processes have been started.
**
** ALTERNATIVE STARTUP METHOD FOR HEAVILY LOADED SYSTEMS
**
** On systems with very large numbers of processes, the GCU Balancer's
** process priority should be raised to 15. Since the menu-driven
** startup method described above does not allow for altering priority,
** the following startup method is recommended.
**
**  1) Create a small command procedure such as "GCU_BALANCER_INP.COM"
**     containing the following commands:
**
**        $ WAIT 00:00:02
**        $ CONFIG BAL 3 1 00:00:05.00
**
**     Note: Adjust the balancer parameters to suit your needs.
**
**  2) Run the balancer using the following command:
**
**        $ RUN/DETACH SYS$SYSTEM:LOGINOUT.EXE/PROCESS_NAME=GCU$BALANCER-
**        _$/OUT=NL:/PRIORITY=15/INPUT=CLUSTER$COMMON:GCU_BALANCER_INP.COM
**
**     Note: If not running in a cluster, place the command procedure
**           file in an appropriate directory and alter the run command
**           accordingly.
**
** SHUTDOWN WARNING:
**
** In an OpenVMS Galaxy, no process may have shared memory mapped on an
** instance when it leaves the Galaxy, as during a shutdown.
** Because of this, SYS$MANAGER:SYSHUTDWN.COM must be modified to stop
** the process if the GCU$BALANCER program is run from a SYSTEM UIC.
** Processes in the SYSTEM UIC group are not terminated by SHUTDOWN.COM
** when shutting down or rebooting OpenVMS. If a process still has
** shared memory mapped when an instance leaves the Galaxy, the instance
** will crash with a GLXSHUTSHMEM bugcheck.
**
** To make this work, SYS$MANAGER:SYSHUTDWN.COM must stop the process as
** shown in the example below. Alternatively, the process can be run
** under a suitably privileged, non-SYSTEM UIC.
**
** SYSHUTDWN.COM EXAMPLE - Paste into SYS$MANAGER:SYSHUTDWN.COM
**
**  $!
**  $! If the GCU$BALANCER image is running, stop it to release shmem.
**  $!
**  $ procctx = f$context("process",ctx,"prcnam","GCU$BALANCER","eql")
**  $ procid  = f$pid(ctx)
**  $ if procid .NES. "" then $ stop/id='procid'
**
** Note, you could also just do a "$ STOP GCU$BALANCER" statement.
**
** OUTPUTS:
**
**     If the logical name GCU$BALANCER_VERIFY is defined, notify the
**     SYSTEM account when CPUs are reassigned. If the /STATISTICS
**     qualifier is specified, a status line is continually displayed,
**     but only when run directly from the command line.
**
** REVISION HISTORY:
**
** 23-Sep-2003 Add startup example showing raised execution priority
** 30-Jul-2001 CPU transition code change ANY_CPU -> ANY_OWNED_CPU
** 30-Apr-1999 Added LAN monitoring function.
** 02-Dec-1998 Greatly improved instructions.
** 03-Nov-1998 Improved instructions.
** 24-Sep-1998 Initial code example and integration with GCU.
*/
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
/* For CLI */
#include
#include
#include

#define HEARTBEAT_RESTART    0  /* Flags for synchronization */
#define HEARTBEAT_ALIVE      1
#define HEARTBEAT_TRANSPLANT 2

#define GLOCK_TIMEOUT 100000    /* Sanity check, max time holding gLock */
#define _failed(x) (!((x) & 1))

$DESCRIPTOR(system_dsc, "SYSTEM");        /* Brkthru account name */
$DESCRIPTOR(gblsec_dsc, "GCU$BALANCER");  /* Global section name  */

struct SYI_ITEM_LIST {                    /* $GETSYI item list format */
  short buflen,item;
  void *buffer,*length;
};

/* System information and an item list to use with $GETSYI */
static unsigned long total_cpus;
static uint64 partition_id;
static long max_instances = 32;
iosb g_iosb;

struct SYI_ITEM_LIST syi_itemlist[3] = {
  {sizeof (long), SYI$_ACTIVECPU_CNT,&total_cpus, 0},
  {sizeof (long), SYI$_PARTITION_ID, &partition_id,0},
  {0,0,0,0}};

extern uint32 *SCH$AQ_COMH;     /* Scheduler COM queue address */
unsigned long PAGESIZE;         /* Alpha page size */
uint64 glock_table_handle;      /* Galaxy lock table handle */

/*
** Shared Memory layout (64-bit words):
** ====================================
** 0  to  n-1:  Busy count, where 100 = 1 process in a CPU queue
** n  to 2n-1:  Heartbeat (status) for each instance
** 2n to 3n-1:  Current CPU count on each instance
** 3n to 4n-1:  Galaxy lock handles for modifying heartbeats
**
** where n = max_instances (one 64-bit word per instance in each array).
**
** We assume the entire table (easily) fits in two Alpha pages.
*/

/* Shared memory pointers must be declared volatile */
volatile uint64 gs_va = 0;              /* Shmem section address */
volatile uint64 gs_length = 0;          /* Shmem section length */
volatile uint64 *gLocks;                /* Pointers to gLock handles */
volatile uint64 *busycnt,*heartbeat,*cpucount;

/************************************************************************/
/* FUNCTION init_lock_tables - Map to the Galaxy locking table and      */
/* create locks if needed. Place the lock handles in a shared memory    */
/* region, so all processes can access the locks.                       */
/*                                                                      */
/* ENVIRONMENT: Requires SHMEM and CMKRNL to create tables.             */
/* INPUTS: None.                                                        */
/* OUTPUTS: Any errors from lock table creation.                        */
/************************************************************************/
int init_lock_tables (void)
{
  int status,i;
  unsigned long sanity;
  uint64 handle;
  unsigned int min_size, max_size;

  /* Lock table names are 15-byte padded values, unique across a Galaxy. */
  char table_name[] = "GCU_BAL_GLOCK  ";

  /* Lock names are 15-byte padded values, but need not be unique. */
  char lock_name[] = "GCU_BAL_LOCK   ";

  /* Get the size of a Galaxy lock */
  status = sys$get_galaxy_lock_size(&min_size,&max_size);
  if (_failed(status)) return (status);

  /*
  ** Create or map to a process space Galaxy lock table. We assume
  ** one page is enough to hold the locks. This will work for up
  ** to 128 instances.
  */
  status = sys$create_galaxy_lock_table(table_name,PSL$C_USER,
    PAGESIZE,GLCKTBL$C_PROCESS,0,min_size,&glock_table_handle);
  if (_failed(status)) return (status);

  /*
  ** Success case 1: SS$_CREATED
  ** We created the table, so populate it with locks and
  ** write the handles to shared memory so the other partitions
  ** can access them. Only one instance can receive SS$_CREATED
  ** for a given lock table; all other mappers will get SS$_NORMAL.
  */
  if (status == SS$_CREATED)
  {
    printf ("%%GCU$BALANCER-I-CRELOCK, Creating G-locks\n");
    for (i=0; i 1000000) return (SS$_TIMEOUT);
    }
  }
  return (SS$_NORMAL);
}

/************************************************************************/
/* FUNCTION update_cpucount - Update the number of CPUs in this instance*/
/*                                                                      */
/* ENVIRONMENT: Called directly or via a system event AST.              */
/* INPUTS: None.                                                        */
/* OUTPUTS: Updates this instance's CPU count in shared memory.         */
/************************************************************************/
void update_cpucount(int unused)
{
  sys$getsyiw(EFN$C_ENF,0,0,&syi_itemlist,&g_iosb,0,0);
  cpucount[partition_id] = total_cpus;
}

/************************************************************************/
/* FUNCTION cpu_q - Count the number of processes in CPU COM queues     */
/*                                                                      */
/* ENVIRONMENT: OpenVMS Kernel Mode.                                    */
/* INPUTS: None.                                                        */
/* OUTPUTS: Returns the number of processes on the COM queues.          */
/************************************************************************/
long cpu_q(void)
{
  uint32 *head, *tmp;
  long procs = 0;
  int p;

  head = SCH$AQ_COMH;           /* Head of 1st COM queue */
  sys_lock(SCHED,1,0);          /* Obtain SCHED spinlock */
  for (p=64; p>0; p--)          /* Queues to scan (32 COM + 32 COMO) */
  {
    tmp = (uint32 *) *head;     /* Look at first flink */
    while (tmp != head)         /* Compare vs. head of queue */
    {
      procs++;                  /* Different, count a job waiting */
      tmp = (uint32 *) *tmp;    /* Go to next queue entry */
    }
    head = head + 2;            /* Go to next queue (increment by 2*32) */
  }                             /* And scan it (loop to "for p...") */
  sys_unlock(SCHED,0,0);        /* Release SCHED spinlock */
  return procs;
}

/************************************************************************/
/* FUNCTION lockdown - Lock the cpu_q routine into the working set      */
/* so that it can't pagefault while at elevated IPL                     */
/*                                                                      */
/* ENVIRONMENT: Requires CMKRNL privilege.                              */
/* INPUTS: None.                                                        */
/* OUTPUTS: None.
*/
/************************************************************************/
void lockdown(void)
{
  struct pdscdef *proc_desc = (void *)cpu_q;
  unsigned long sub_addr[2], locked_head[2], locked_code[2];
  unsigned long status;

  sub_addr[0] = (unsigned long) cpu_q;
  sub_addr[1] = sub_addr[0] + PAGESIZE;
  if (__PAL_PROBER((void *)sub_addr[0],sizeof(int),PSL$C_USER) != 0)
    sub_addr[1] = sub_addr[0];
  status = sys$lkwset(sub_addr,locked_head,PSL$C_USER);
  if (_failed(status)) exit(status);

  sub_addr[0] = proc_desc->pdsc$q_entry[0];
  sub_addr[1] = sub_addr[0] + PAGESIZE;
  if (__PAL_PROBER((void *)sub_addr[0],sizeof(int),PSL$C_USER) != 0)
    sub_addr[1] = sub_addr[0];
  status = sys$lkwset(sub_addr,locked_code,PSL$C_USER);
  if (_failed(status)) exit(status);
}

/************************************************************************/
/* FUNCTION reassign_a_cpu - Reassign a single CPU to another instance. */
/*                                                                      */
/* ENVIRONMENT: Requires CMKRNL privilege.                              */
/* INPUTS: most_busy_id: partition ID of destination.                   */
/* OUTPUTS: None.                                                       */
/*                                                                      */
/* Donate one CPU at a time - then wait for the remote instance to      */
/* reset its heartbeat and recalculate its load.
*/
/************************************************************************/
void reassign_a_cpu(int most_busy_id)
{
  int status,i;
  static char op_msg[255];
  static char iname_msg[1];
  $DESCRIPTOR(op_dsc,op_msg);
  $DESCRIPTOR(iname_dsc,"");
  iname_dsc.dsc$w_length = 0;

  /* Update CPU info */
  status = sys$getsyiw(EFN$C_ENF,0,0,&syi_itemlist,&g_iosb,0,0);
  if (_failed(status)) exit(status);

  /* Don't attempt reassignment if we are down to one CPU */
  if (total_cpus > 1)
  {
    status = sys$acquire_galaxy_lock(gLocks[most_busy_id],GLOCK_TIMEOUT,0);
    if (_failed(status)) exit(status);
    heartbeat[most_busy_id] = HEARTBEAT_TRANSPLANT;
    status = sys$release_galaxy_lock(gLocks[most_busy_id]);
    if (_failed(status)) exit(status);

    status = sys$cpu_transitionw(CST$K_CPU_MIGRATE,CST$K_ANY_OWNED_CPU,0,
      most_busy_id,0,0,0,0,0,0);
    if (status & 1)
    {
      if (getenv ("GCU$BALANCER_VERIFY"))
      {
        /* most_busy_id is an int, so print it with %d */
        sprintf(op_msg,
          "\n\n*****GCU$BALANCER: Reassigned a CPU to instance %d\n",
          most_busy_id);
        op_dsc.dsc$w_length = strlen(op_msg);
        sys$brkthru(0,&op_dsc,&system_dsc,BRK$C_USERNAME,0,0,0,0,0,0,0);
      }
      update_cpucount(0);  /* Update the CPU count after donating one */
    }
  }
}

/************************************************************************/
/* IMAGE ENTRY - MAIN                                                   */
/*                                                                      */
/* ENVIRONMENT: OpenVMS Galaxy                                          */
/* INPUTS: None.                                                        */
/* OUTPUTS: None.                                                       */
/************************************************************************/
int main(int argc, char **argv)
{
  int show_stats = 0;
  long busy,most_busy,nprocs;
  int64 delta;
  unsigned long status,i,j,k,system_cpus,instances;
  unsigned long arglst = 0;
  uint64 version_id[2] = {0,1};
  uint64 region_id = VA$C_P0;
  uint64 most_busy_id,cpu_hndl = 0;

  /* Static descriptors for storing parameters.
     Must match CLD defs */
  $DESCRIPTOR(p1_desc,"P1");
  $DESCRIPTOR(p2_desc,"P2");
  $DESCRIPTOR(p3_desc,"P3");
  $DESCRIPTOR(p4_desc,"P4");
  $DESCRIPTOR(stat_desc,"STATISTICS");

  /* Dynamic descriptors for retrieving parameter values */
  struct dsc$descriptor_d samp_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0};
  struct dsc$descriptor_d proc_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0};
  struct dsc$descriptor_d time_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0};

  struct SYI_ITEM_LIST syi_pagesize_list[3] = {
    {sizeof (long), SYI$_PAGE_SIZE      ,&PAGESIZE     ,0},
    {sizeof (long), SYI$_GLX_MAX_MEMBERS,&max_instances,0},
    {0,0,0,0}};

  /*
  ** num_samples and time_desc determine how often the balancer should check
  ** to see if any other instance needs more CPUs. num_samples determines the
  ** number of samples used to calculate the running average, and time_desc
  ** determines the amount of time between samples.
  **
  ** For example, a time_desc of 30 seconds and a num_samples of 20 means that
  ** a running average over the last 10 minutes (20 samples * 30 secs) is used
  ** to balance CPUs.
  **
  ** load_tolerance is the minimum load difference which triggers a CPU
  ** migration. 100 is equal to 1 process in the computable CPU queue.
  */
  int num_samples;     /* Number of samples in running average */
  int load_tolerance;  /* Minimum load diff to trigger reassignment */

  /* Parse the CLI */
                                                /* CONFIGURE VERB */
  status = CLI$PRESENT(&p1_desc);               /* BALANCER */
  if (status != CLI$_PRESENT) exit(status);
  status = CLI$PRESENT(&p2_desc);               /* SAMPLES */
  if (status != CLI$_PRESENT) exit(status);
  status = CLI$PRESENT(&p3_desc);               /* PROCESSES */
  if (status != CLI$_PRESENT) exit(status);
  status = CLI$PRESENT(&p4_desc);               /* TIME */
  if (status != CLI$_PRESENT) exit(status);

  status = CLI$GET_VALUE(&p2_desc,&samp_desc);
  if (_failed(status)) exit(status);
  status = CLI$GET_VALUE(&p3_desc,&proc_desc);
  if (_failed(status)) exit(status);
  status = CLI$GET_VALUE(&p4_desc,&time_desc);
  if (_failed(status)) exit(status);

  status = CLI$PRESENT(&stat_desc);
  show_stats = (status == CLI$_PRESENT) ? 1 : 0;

  num_samples = atoi(samp_desc.dsc$a_pointer);
  if (num_samples <= 0) num_samples = 3;

  load_tolerance = (100 * (atoi(proc_desc.dsc$a_pointer)));
  if (load_tolerance <= 0) load_tolerance = 100;

  if (show_stats)
    printf("Args: Samples: %d, Processes: %d, Time: %s\n",
      num_samples,load_tolerance/100,time_desc.dsc$a_pointer);

  lockdown();  /* Lock down the cpu_q subroutine */

  /* Get the page size and max members for this system */
  status = sys$getsyiw(EFN$C_ENF,0,0,&syi_pagesize_list,&g_iosb,0,0);
  if (_failed(status)) return (status);
  if (max_instances == 0) max_instances = 1;

  /* Get our partition ID and initial CPU info */
  status = sys$getsyiw(EFN$C_ENF,0,0,&syi_itemlist,&g_iosb,0,0);
  if (_failed(status)) return (status);

  /* Map two pages of shared memory */
  status = sys$crmpsc_gdzro_64(&gblsec_dsc,version_id,0,PAGESIZE+PAGESIZE,
    &region_id,0,PSL$C_USER,(SEC$M_EXPREG|SEC$M_SYSGBL|SEC$M_SHMGS),
    &gs_va,&gs_length);
  if (_failed(status)) exit(status);

  /* Initialize the pointers into shared memory */
  busycnt   = (uint64 *) gs_va;
  heartbeat = (uint64 *) gs_va + max_instances;
  cpucount  = (uint64 *) heartbeat + max_instances;
  gLocks    = (uint64 *) cpucount +
    max_instances;
  cpucount[partition_id] = total_cpus;

  /* Create or map the Galaxy lock table */
  status = init_lock_tables();
  if (_failed(status)) exit(status);

  /* Initialize delta time for sleeping */
  status = sys$bintim(&time_desc,&delta);
  if (_failed(status)) exit(status);

  /*
  ** Register for CPU migration events. Whenever a CPU is added to
  ** our active set, the routine "update_cpucount" will fire.
  */
  status = sys$set_system_event(SYSEVT$C_ADD_ACTIVE_CPU,
    update_cpucount,0,0,SYSEVT$M_REPEAT_NOTIFY,&cpu_hndl);
  if (_failed(status)) exit(status);

  /* Force everyone to resync before we do anything */
  for (j=0; j most_busy)
      {
        most_busy_id = (uint64) i;
        most_busy    = busycnt[i];
      }
    }

    if (show_stats)
      printf("Current Load: %3Ld, Busiest Instance: %Ld, Queue Depth: %4d\r",
        busycnt[partition_id],most_busy_id,(nprocs/100));

    /* If someone needs a CPU and we have an extra, donate it. */
    if ((most_busy > busy + load_tolerance) &&
        (cpucount[partition_id] > 1) &&
        (heartbeat[most_busy_id] != HEARTBEAT_TRANSPLANT) &&
        (most_busy_id != partition_id))
    {
      reassign_a_cpu(most_busy_id);
    }

    /* Hibernate for a while and do it all again. */
    status = sys$schdwk(0,0,&delta,0);
    if (_failed(status)) exit(status);
    status = sys$hiber();
    if (_failed(status)) exit(status);
  } while (1);
  return (1);
}
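
/*
** ILLUSTRATIVE SKETCH - not part of the original example.
**
** The balancing decision described in the header comment (a running
** average of the COM queue depth, selection of the busiest instance,
** and the donation test used in main) can be sketched in portable C
** as shown here. The sketch_* names are hypothetical helpers that
** exist only for illustration. The running-average formula is an
** assumption: it is one simple way to average over num_samples
** samples without keeping a history buffer, and it may differ from
** what the balancer actually computes. The heartbeat check performed
** in main is left out to keep the sketch short.
*/
#include <stddef.h>

/* Blend one new sample into an average weighted over num_samples. */
static long sketch_running_average(long previous, long sample,
                                   int num_samples)
{
  if (num_samples <= 1) return sample;
  return (previous * (num_samples - 1) + sample) / num_samples;
}

/* Return the index of the largest load; store that load in *most_busy. */
static size_t sketch_busiest(const long *load, size_t count,
                             long *most_busy)
{
  size_t i, id = 0;
  long best = 0;

  for (i = 0; i < count; i++)
  {
    if (load[i] > best)
    {
      best = load[i];
      id   = i;
    }
  }
  if (most_busy != NULL) *most_busy = best;
  return id;
}

/*
** Donate only if the busiest instance exceeds our load by more than
** load_tolerance, we own more than one CPU, and the busiest instance
** is not ourselves. Loads use the same scale as the balancer, where
** 100 equals one process in a COM queue.
*/
static int sketch_should_donate(long most_busy, long own_load,
                                long load_tolerance,
                                unsigned long own_cpus,
                                size_t most_busy_id, size_t own_id)
{
  return (most_busy > own_load + load_tolerance) &&
         (own_cpus > 1) &&
         (most_busy_id != own_id);
}

/*
** Usage note: with num_samples = 3 and a tolerance of 100, an instance
** whose averaged load is 100 would donate to an instance at 400 only
** while it still owns at least two CPUs.
*/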