SPR - Softdoc T0403V02^AAF

SOFTWARE RELEASE DOCUMENT

Product Name:  ASAP SRVR EXTENSION
SPR ID:        T0403V02^AAF
DATE:          25MAY2005

Copyright Notice:

   HP CONFIDENTIAL: NEED TO KNOW ONLY
   Copyright 2005 Hewlett-Packard Development Company, L.P.

   Protected as an unpublished work.

   The computer program listings, specifications and documentation
   herein are the property of Compaq Computer Corporation and successor
   entities such as Hewlett-Packard Development Company, L.P., or a
   third party supplier and shall not be reproduced, copied, disclosed,
   or used in whole or in part for any reason without the prior express
   written permission of Hewlett-Packard Development Company, L.P.

Summary:

This SPR corrects two minor problems with reporting of the operational
state for a domain during its first ASAP interval.

[AAE] This PVU corrects two obscure problems.

[AAD] This PVU introduces support for super-scalar ZLE systems with
tens of thousands of monitored domains. It also corrects problems with
duplicate domains in the domain table, excessive opens against the
domain table, ENFORM's inability to read attributes defined as TYPEDATA
REAL64 and constant restarting of failed ASAPXMON processes. This PVU
also provides improved startup time and better aggregation of certain
ASAPX built-in attributes. Finally, the ASAP_REGISTER_ procedure now
returns error -6 when statistics collection has been disabled.

[AAC] This SPR corrects problems with certain compilers returning a
syntax error on asapx.h, transaction error 97 when using very short
timeout values on ASAP_REGISTER_, and reporting of file errors when
domains are removed from ASAPX.
[AAB] This SPR corrects problems with certain compilers returning a
syntax error on asapx.h, corruption of ASCII text when supplied as a
replace value for constant DATAITEMs, incorrect display of Status "Up"
when the state is down, incorrect display of Status text and state when
hundreds of aggregate domains are created in a CPU, up to 3-minute
application hangs on ASAP_REGISTER_ procedure calls, the ASAPXSGP
process failing to open the ASAP Collector on nodes numbered 0, and
constant restarts of ASAPXMON processes in very large environments with
downstream queuing.

[AAA] This IPM corrects problems with: EMS event generation for Status
attribute changes; an unexpected limit on the number of user-defined
entities; calculation of Busy percentages for the 1st interval;
computing elapsed time correctly when SET TEST is on and samples are
missed; rogue domain removal; and computation of the DownTime, Avail
and UnAvail built-in attributes.

[V02] The Compaq ASAP Extension (ASAPX) enables customer applications
for monitoring by the Availability, Stats and Performance (ASAP)
product. ASAPX provides a direct view into an application so a customer
can always know if an application is meeting its service level
objectives. ASAPX executes on D42.00/G03.00 onward.

Superseded SPRs:

   T0403AAA
   T0403AAB
   T0403AAC
   T0403AAD
   T0403AAE

Requisite SPRs:

   T0402V02^AAF   ASAP SERVER

Required Hardware:

   NonStop K-series servers
   NonStop S-series servers
   Communications controller allowing file transfer from PC to NonStop
   server.
   PC with 486 or higher processor and all of the following:
      CD-ROM drive
      VGA or higher-resolution video adapter and monitor
      Microsoft Mouse or similar pointing device

Required Firmware:

   NONE

Required Software:

   T0402V02   ASAP SERVER
   T9050D42   NONSTOP KERNEL if using D4x.xx
   T9050G06   NONSTOP KERNEL if using G0x.xx
   T9550D40   SOCKET LIBRARY
   T9552D40   TCP/IP FTP
   See Note 1

   Note 1: Microsoft Windows 95, or Windows NT 4.0
           Microsoft TCP/IP stack and FTP client

Manuals:

   425265-001   ASAP Extension Manual
   425265-002   ASAP Extension Manual
   425265-003   ASAP Extension Manual

Installation Considerations:

   Subsystem interruption required

Installation Instructions:

- Perform these pre-installation steps:

  1. If you are migrating from release 1 of ASAPX, read the comments
     and examples on release 1 to release 2 migration in the ASAP
     Extension Manual and perform the tasks described there.

  2. Ensure that requirements for using this SPR are met (see
     Requisite SPRs, Required Hardware, and Required Software).

  3. Review the IPSetup User Guide (file USRGUIDE.PDF in the NSK_SW
     subdirectory on the ASAPX CD). IPSetup is provided on the CD to
     facilitate installation of Independent Products. Use Adobe Acrobat
     Reader (available on the CD) to read or print the IPSetup User
     Guide.

  4. Decide whether you will use DSM/SCM to move the SPR's files to the
     Installation Subvolumes (ISVs) after they are placed on the
     NonStop server. Using DSM/SCM is optional, but recommended when
     DSM/SCM is available.

- Perform these installation-related steps:

  1. Run IPSetup to place (move files from the CD to the NonStop
     server) and optionally install (move files to correct ISVs) this
     SPR. If TCP/IP and FTP are unavailable, or if you have problems
     with automatic file placement, follow the instructions in the
     IPSetup User's Guide section "Manual Software Placement Using the
     IPSetup TACL Program" to manually place the files.

  2. If you are using DSM/SCM to install this SPR:
     - COPY the SPR to a new revision of the software configuration you
       want to update.
     - BUILD and APPLY the configuration revision.
     - Run ZPHIRNM to perform the RENAME step.

- Perform these post-installation steps:

  1. Run ASAP Extension INSTALL to update the ASAPX subsystem:
     a. TACL 1 > VOLUME $.ZASAPX
     b. TACL 2 > RUN INSTALL
     c. Fill in the source and destination subvolumes. To install to a
        remote node, fully qualify the destination subvolume name.
        For example, \remote.$system.system

New Features:

1. [AAD] ASAPX ZLE. ASAPX now provides support for super-scalar ZLE
   systems with tens of thousands of monitored domains.

   The domain per CPU limit is now user-defined with a new upper limit
   of 10,000 domains per CPU. Use the new SET MAXDOMAINS configuration
   command to increase the 1,024 domain per CPU default.

   The aggregate domain per CPU limit is now user-defined with a new
   upper limit of 10,000 aggregate domains per CPU. Use the new SET
   MAXAGGREGATES configuration command to increase the 1,024 aggregate
   domain per CPU default.

   The SET AGGREGATE ONLY option is made into a viable method to reduce
   disk I/O to the historical database. A new MEMORY option on the APP
   command allows ASAP to retrieve detail domain data directly from
   ASAP memory. This lets a user write only aggregate records to the
   database, but still view detail records when domains alert. Set the
   following as the EDL DETAIL command in all user-defined EDL files:

      DETAIL "APP ^, DE, ST, MEM, MIN, COUNT 20"

   When the command defined by the EDL DETAIL property string is set as
   shown above, a user can right-mouse on an aggregate domain that is
   alerting and then choose "Show Object Details" to see alerting
   detail domains.

   Next, the user is given complete control over the content of
   aggregate domain records with the addition of aggregation syntax to
   the EDL MetricRule formula.
   The new MetricRule syntax is:

      is: # [SUM|AVG|MIN|MAX|CNT]   (default SUM)
          C
          S [SUM|AVG]               (default varies)
      is: +, -, / or * - arithmetic operators

   Examples:

      METRICRULE "#0 SUM/S SUM"
      METRICRULE "#0 MIN/S AVG"
      METRICRULE "#0 MIN"
      METRICRULE "C100*((#0 MIN + #1 MAX)/#2 AVG)"

   Finally, because the APP command's AGGREGATEONLY option is more
   frequently used, it has been abbreviated to "AGGONLY" and "AO" in
   addition to the complete spelling.

   In summary, ASAPX ZLE lets a user monitor tens of thousands of
   detail domains while reducing disk I/O to the historical database by
   writing only aggregate records to disk. Please see the new version
   of the ASAP Extension Manual that ships with this release for more
   information on the ASAPX ZLE enhancement.
   (SOLN 10-030822-9092)

2. [V02] ASAPX supports 1024 domains per processor and a maximum of
   approximately 90 user-defined application entities.
   (SOLN NO SOLN)

3. [V02] The NAK and UNAVAIL metrics from Release 1 have been added to
   a set of 11 optional metrics that ASAPX will compute on behalf of
   the application. Each optional metric, except Registration Time,
   also has an associated OEM state. The built-in metrics are:

      Nak        Negative acknowledgement counter.
      UnAvail    The unavailability of the registering process during
                 the interval.
      Cpu        The primary processor of the registering process.
      Pri        The priority of the registering process.
      Busy       Percent CPU busy for the registering process.
      PState     The process state of the registering process.
      WState     The wait state of the registering process.
      RegTime    The registration date and time the domain was
                 registered.
      DownTime   Total downtime (in seconds) for the registering
                 process.
      Avail      The overall availability of the registering process.
      Version    The application version supplied to ASAP_REGISTER_,
                 or a default version supplied by ASAP.
   (SOLN NO SOLN)

4. [V02] ASAPX now computes and displays aggregate data for any of the
   4 possible levels within a domain name.
   For example, if Atm\West\SanFran\Loc23 is a typical domain then
   ASAPX can compute aggregate totals for Atm, Atm\West and
   Atm\West\SanFran.
   (SOLN NO SOLN)

5. [V02] Each ASAPXMON process can now be configured to write detail
   records directly to the ASAP Collector, bypassing the ASAPXSGP
   process. Note the ASAP ID must not exceed 2 characters if the ASAP
   Collector resides on a remote node. For example, change $ZOO to $ZO.
   (SOLN NO SOLN)

6. [V02] Operational Status and State. Each ASAPX Entity now includes
   the operational status and state of the process that registered the
   domain. Both the 15-byte operational status and the resulting OEM
   state can also be set directly by an application using a new ASAPX
   API procedure called ASAP_OPSTATE_.
   (SOLN NO SOLN)

7. [V02] The ASAPX host-based objectives concept has been extended to
   all components of ASAP Server as Discrete Object Thresholds (DOTs).
   The ACTIVATE, ADD, ALTER, ASSUME, COMMIT, DEACTIVATE, DELETE, INFO,
   LIST, OPEN and REMOVE commands have been replaced by the ASAP
   MONITOR and RANK commands. Many ASAPX SET commands have also been
   deleted or replaced with ASAP SET commands.

   Due to the DOTs conversion the objectives database in the Release 1
   version of ASAPX is obsolete. All objectives from Release 1
   databases should be saved to an edit file using the ASAPX INFO,
   OBEYFORM command. They can then be edited into ASAP RANK commands.
   (SOLN NO SOLN)

8. [V02] ASAPX now supports multiple application definitions using the
   ASAP Entity Definition Language. Each EDL entity has its own set of
   DataItem and Metric (EDL Attribute) definitions.
   (SOLN NO SOLN)

9.
[V02] Use of D-Series Enhancements:

   Can be created by a high PIN process             Y   D42.00
   Can communicate with a high PIN requester        Y   D42.00
   Can communicate with a high PIN server           Y   D42.00
   Can create a high PIN process                    Y   D42.00
   Can run at a high PIN                            Y   D42.00
   Defaults to run at a high PIN                    N   (HIGHPIN on)
   Has a new (since C30 FCS) maximum of OPENS       Y   D42.00
     on a device, process name, or subdevice
   Issues new (since C30 FCS) operator messages     Y   D42.00
   Issues new (since C30 FCS) tokens                Y   D42.00
   Recognizes high PIN process IDs                  Y   D42.00
   Recognizes new (since C30 FCS) tokens            Y   D42.00
   Recognizes remote 8-character                    Y   D42.00
     (including $) device/volume/file names
   Recognizes remote 6-character                    Y   D42.00
     (including $) process names
   Recognizes variable-length-string                Y   D42.00
     file names
   (No SOLN)

Problems Corrected:

1. [V02^AAF]
   SYMPTOMS: ASAP Extension would override the OpState value to the
   unranked value of 1 for the first statistics gathering interval.
   CONDITIONS: If the application called ASAP_OPSTATE_ to set the
   operational status and state during the first interval after it
   registered with ASAP.
   LIKELIHOOD: Likely
   (SOLN 10-050111-3469)

2. [V02^AAF]
   SYMPTOMS: ASAPX reported a domain as down for the first interval,
   instead of deactivated.
   CONDITIONS: If the application registered a domain, called
   ASAP_CONTROL_ to deactivate it and then the application stopped
   before ASAP had processed the first sample interval.
   LIKELIHOOD: Likely
   FIX-INFO: ASAPX now recognizes the ASAP_CONTROL_ deactivation for
   the 1st interval.
   (SOLN 10-050404-6302)

3. [AAE]
   SYMPTOMS: Error 10 messages due to ASAPXMON writing a duplicate
   record to the ASAP collector.
   CONDITIONS: When the exact number of detail domains happened to be
   the exact number that would fit into a reply buffer, and when
   aggregate domains were also being computed.
   LIKELIHOOD: Unlikely
   FIX-INFO: ASAPX no longer repeats the last detail domain in the
   aggregate domain buffer.
   (SOLN 10-040412-5020)

4.
[AAE]
   SYMPTOMS: Duplicate domains in domain table.
   CONDITIONS: When a process registered multiple domains then its CPU
   failed and it restarted in another CPU but did not re-register all
   its domains before the original CPU was reloaded, then ASAP was
   restarted before all the original domains were re-registered in the
   second CPU.
   LIKELIHOOD: Unlikely
   FIX-INFO: ASAPXMON no longer does an automatic cleanup of all
   domains when it finds a process is running in another processor at
   restart time.
   (SOLN 10-040412-5021)

5. [AAD]
   SYMPTOMS: ASAP_REGISTER_ allows registrations even though statistics
   collection has been disabled.
   CONDITIONS: When the ASAPX DISABLE STATS command is entered.
   LIKELIHOOD: Certain
   FIX-INFO: ASAP_REGISTER_ now returns error -6 when statistics
   collection has been disabled.
   (SOLN 10-030912-9607)

6. [AAD]
   SYMPTOMS: Slower performance.
   CONDITIONS: When large domain populations are registered.
   LIKELIHOOD: Certain
   FIX-INFO: ASAPX modules are now accelerated.
   (SOLN 10-030822-9091)

7. [AAD]
   SYMPTOMS: Cannot access ASAPX from PIC applications.
   CONDITIONS: On G06.20 and beyond when running PIC applications.
   LIKELIHOOD: Certain
   FIX-INFO: ASAPX now provides a DLL for the API library (ASAPXDLL)
   for use with G06.20 and beyond. HP recommends use of the ASAPXDLL
   instead of the older libraries.
   (SOLN 10-030822-9093)

8. [AAD]
   SYMPTOMS: ENFORM cannot read attributes defined as floating point
   (REAL64).
   CONDITIONS: When attempting to read the ASAP database using ENFORM.
   LIKELIHOOD: Certain
   FIX-INFO: ASAPX now supports implied decimal points so attributes
   can be defined as TYPEDATA INT64 and still output decimal values
   using the fixed format option, for example FORMAT "Fx.y". ASAPX
   stores the value as FIXED(y) in the APP record.
   (SOLN 10-030822-9096)

9. [AAD]
   SYMPTOMS: Meaningless aggregation for CPU, Pri, PState, WState and
   Version built-in attributes.
   CONDITIONS: When SET AGGREGATE is defined for an entity and the
   entity uses one of the aforementioned attributes.
   LIKELIHOOD: Certain
   FIX-INFO: ASAPX now propagates the value associated with the worst
   state for each of these attributes. For example, if there are 10
   domains that make up the aggregate and one of the domains has a
   critical alert on the CPU attribute, then that CPU attribute value
   and state will be propagated to the aggregate record.
   (SOLN 10-030822-9090)

10. [AAD]
   SYMPTOMS: Constant restart of failed ASAPXMON processes.
   CONDITIONS: When ASAPXMON fails due to configuration or resource
   problems.
   LIKELIHOOD: Likely
   FIX-INFO: ASAPXMON restarts are throttled to once per interval until
   3 failures have occurred, after which the throttle becomes the
   maximum of once per interval or once per hour.
   (SOLN 10-030822-9089)

11. [AAD]
   SYMPTOMS: ASAPXMON startup time is noticeable.
   CONDITIONS: When many domains are defined.
   LIKELIHOOD: Likely
   FIX-INFO: ASAPXMON restart time is dramatically improved by
   improvements to its list insertion algorithm.
   (SOLN 10-030822-9088)

12. [AAD]
   SYMPTOMS: Excessive opens against domain table.
   CONDITIONS: When many domains are in the down state.
   LIKELIHOOD: Likely
   FIX-INFO: ASAPX now keeps the domain table open until the end of the
   statistics interval if it opens the table during the interval to
   check the ownership of a down domain.
   (SOLN 10-030822-9087)

13. [AAD]
   SYMPTOMS: Duplicate domains in domain table.
   CONDITIONS: When access errors prevent ASAPX from updating the
   domain table for a domain remove operation.
   LIKELIHOOD: Likely
   FIX-INFO: ASAPX no longer removes a domain from memory if it cannot
   first remove it from the domain table. It will re-try failed
   removals at each interval until successful.
   (SOLN 10-030822-6067)

14. [AAC]
   SYMPTOMS: Duplicate domains in the domain table.
   CONDITIONS: NONE
   LIKELIHOOD: Unlikely
   FIX-INFO: ASAPX now reports an EMS event when it encounters a file
   error while attempting to remove a domain from the domain table.
   (SOLN NO SOLN)

15. [AAC]
   SYMPTOMS: File error 97 returned for ENDTRANSACTION.
   CONDITIONS: When ASAP_REGISTER_ is called from within a user
   transaction with a very short timeout value.
   LIKELIHOOD: Likely
   FIX-INFO: ASAPX no longer participates in the application
   transaction so canceled I/O operations to ASAPXMON will not result
   in error 97.
   (SOLN 10-021205-6906)

16. [AAC]
   SYMPTOMS: Compiler errors when referencing asapx.h from C++.
   CONDITIONS: Dependent upon the C++ compiler.
   LIKELIHOOD: Likely
   WORKAROUNDS: Edit $SYSTEM.SYSTEM.ASAPXH, change the parentheses to
   braces for C++, and add the missing #endif statement.
   (SOLN 10-021002-2120)

17. [AAB]
   SYMPTOMS: Unexplained restarts of ASAPXMON processes and/or missed
   aggregate sample data.
   CONDITIONS: Can occur on very busy systems with thousands of
   application domains.
   LIKELIHOOD: Likely
   WORKAROUNDS: Run multiple ASAP Collectors and partition the ASAP
   database to reduce downstream queueing on ASAP Collector and/or
   database disk devices.
   FIX-INFO: ASAPXSGP now waits longer for data from ASAPXMON
   processes. Its wait time is now a function of the ASAP rate instead
   of a fixed value.
   (SOLN 10-020410-8084)

18. [AAB]
   SYMPTOMS: File error 18 when opening the ASAP Collector.
   CONDITIONS: Occurs when the ASAP Collector resided on a node
   numbered 0.
   LIKELIHOOD: Certain
   (SOLN 10-020404-7917)

19. [AAB]
   CAVEAT: This change just provides a method to minimize the impact of
   this problem.
   SYMPTOMS: Up to 3-minute application hangs when calling
   ASAP_REGISTER_.
   CONDITIONS: Can occur when ASAPX is running in "write-to-collector"
   mode on very busy systems with downstream queueing on ASAP
   Collector(s) and/or disk processes.
   LIKELIHOOD: Somewhat likely
   FIX-INFO: The TIMEOUT parameter used by ASAP_REGISTER_ to time out
   its registration request to ASAPXMON has been externalized so
   callers can now set a shorter timeout value to reduce the impact of
   this problem on applications. The default timeout value is 3
   minutes. See Known Problems Remaining #2.
   (SOLN 10-020222-6886)

20.
[AAB]
   SYMPTOMS: Hanging opens after calling ASAP_REGISTER_ with an error
   return code.
   CONDITIONS: Occurs on I/O errors between ASAPXMON and
   ASAP_REGISTER_.
   LIKELIHOOD: Certain
   (SOLN 10-020212-6625)

21. [AAB]
   SYMPTOMS: Application domains shown as down when they are actually
   up.
   CONDITIONS: Occurs when hundreds of aggregate domains are created in
   a single CPU.
   LIKELIHOOD: Likely
   (SOLN 10-020212-6619)

22. [AAB]
   SYMPTOMS: Status text of "Up" shown on ASAP Client for a domain that
   is actually down.
   CONDITIONS: Occurs when processes set their own operational status
   and state using ASAP_OPSTATE_, and the registering process fails
   after setting the status text to "Up".
   LIKELIHOOD: Certain
   FIX-INFO: ASAPX now overrides both the Status text and state for a
   domain after the registering process fails if the Status text is
   "Up". If the text is anything else, only the state is overridden.
   (SOLN 10-020207-6537)

23. [AAB]
   CAVEAT: This change just provides a method to avoid the problem.
   SYMPTOMS: Corrupted ASCII text when supplied as the replace value
   (math 1) for a constant dataitem.
   CONDITIONS: NONE
   LIKELIHOOD: Certain
   FIX-INFO: Math 2 (replace text) was added as a workaround to this
   problem. Use math 2 to replace a constant dataitem with 8 bytes of
   printable ASCII text. See Known Problems Remaining #1.
   (SOLN 10-020207-6531)

24. [AAB]
   SYMPTOMS: Compiler errors when referencing asapx.h.
   CONDITIONS: Dependent upon the C compiler.
   LIKELIHOOD: Likely
   WORKAROUNDS: Edit $SYSTEM.SYSTEM.ASAPXH and remove the extraneous
   comma from the ASAP_OPSTATE_ procedure definition.
   (SOLN 10-011213-5384)

25. [AAA] ASAPXMON processes mysteriously stopped and restarted without
   notification. The 3rd I/O error encountered over any span of time
   caused the ASAPXSGP process to mistakenly stop and restart an
   ASAPXMON. The ASAPXSGP process now stops and restarts an ASAPXMON
   only after 3 consecutive I/O errors.
   (SOLN 10-011002-3508)

26.
[AAA] ASAPX would write duplicate records to the ASAP collector under
   certain conditions. If several hundred aggregate domains were
   processed in a single CPU, ASAPX occasionally wrote duplicate
   records for one or more of the aggregate domains.
   (SOLN 10-011002-3507)

27. [AAA] ASAPX would incorrectly return a domain name error to callers
   of the ASAP_REGISTER_ procedure when the domain name being
   registered contained the name of an ASAP entity at a lower level of
   domain name. For example, the domain name "Atm\Disk" would produce
   an error even though it is a valid domain name.
   (SOLN 10-010919-3206)

28. [AAA] ASAPX computed DownTime incorrectly when a domain
   re-registered in a different CPU after at least one sample interval
   had passed since it stopped processing in the original CPU.
   (SOLN 10-010830-2759)

29. [AAA] ASAPX computed DownTime, Avail and UnAvail incorrectly for
   the interval where a domain failed. It also failed to preserve the
   contents of those 3 built-in attributes when a domain stopped and
   restarted.
   (SOLN 10-010823-2487)

30. [AAA] The command "MONITOR APP [], REMOVE" failed to remove the
   specified APP domain, and failed to remove all APP domains when no
   domain was specified.
   (SOLN 10-010823-2481)

31. [AAA] Invalid elapsed times were computed when SET TEST ON was
   specified in the ASAPXCNF configuration file and ASAPX missed at
   least one sample. The ASAP RATE value was multiplied by 60 seconds,
   not taking into account the extra time caused by the missed sample.
   ASAPX now rounds to the nearest minute and takes the difference
   between samples instead of using the ASAP RATE value.
   (SOLN 10-010822-2427)

32. [AAA] ASAPX reported invalid Busy percentages for the first
   interval after a process registered a domain in a CPU. ASAPX now
   reports 0.00 for the first interval and reports the actual Busy
   percent for the 2nd sample after establishing its baseline
   measurement.
   (SOLN 10-010820-2310)

33.
[AAA] ASAPX would not allow definition of more than 11 (approximately)
   user-defined entities even though the ASAP maximum entity count is
   much greater.
   (SOLN 10-010809-2101)

34. [AAA] ASAPX failed to write EMS alerts for Status/operational state
   changes when configured to do so using the RANK command. For
   example, the command "RANK ATM, STATUS CRITICAL" did not cause ASAPX
   to generate a critical event when the Status attribute produced an
   alert condition.
   (SOLN 10-010803-1955)

35. [V02] In ASAPX 1.0 the ASAPXMON processes attempted to use shared
   memory and the domain table to reconstruct the environment after a
   restart. In some cases the contents of shared memory are not valid
   after a restart, which causes ASAPXMON to mistakenly register
   multiple domains with no name into slot 0. This is known as the
   0-domain problem. ASAP 2.0 removes the reliance on shared memory at
   restart time. ASAPXMON now uses only the contents of the domain
   table to reconstruct its environment after a restart.
   (SOLN 10-001229-8182)

36. [V02] The EDL compile process now enforces the following for
   user-defined EDL files:
      1. The Dataitems Entity property must be defined.
      2. The MetricRule Attribute property must be defined for all
         user-defined attributes except ASAPX header and builtin
         attributes.
   (SOLN NO SOLN)

37. [V02] The APP command no longer returns invalid records when Entity
   names exceed 8 characters.
   (SOLN NO SOLN)

38. [V02] The list insertion algorithm in ASAPXSGP is modified for
   better performance with large domain populations.
   (SOLN NO SOLN)

39. [V02] ASAPXMON now outputs the exponent sign correctly in large
   floating point numbers.
   (SOLN NO SOLN)

40. [V02] ASAPXMON no longer abends on problems with MONITOR, LIST,
   DETAIL commands when the length of a dataitem exceeds 20 bytes.
   (SOLN NO SOLN)

41. [V02] For large domain populations the AGGREGATEONLY option now
   processes fewer records, resulting in improved response time.
   (SOLN NO SOLN)

42.
[V02] To limit data flow to ASAP Client and still show pertinent
   information, the APP command now supports the MINSTATE and COUNT
   options. MINSTATE and COUNT are intended for use with EDL DETAIL
   commands in user-defined entity definitions.
   (SOLN NO SOLN)

Known Problems Remaining:

1. SYMPTOMS: Corrupted ASCII text when supplied as the replace value
   (math 1) for a constant DataItem.
   CONDITIONS: NONE
   LIKELIHOOD: Certain
   WORKAROUNDS: As of T0403AAB, use math 2 to replace a constant
   DataItem with 8 bytes of printable ASCII text. As of T0403AAD define
   text DataItems as type "T".
   (SOLN 10-020207-6531)

2. SYMPTOMS: Delayed response from ASAP_REGISTER_ call.
   CONDITIONS: Can only occur when ASAPX is running in
   "write-to-collector" mode on a very busy system with queueing on the
   disk process.
   LIKELIHOOD: Unlikely
   WORKAROUNDS: As of T0403AAB, use the ASAP_REGISTER_ Timeout
   parameter to set a lower timeout value for calls to ASAP_REGISTER_.
   (SOLN 10-020222-6886)
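
Illustration (not part of the product):

New Features item 4 states that for a domain such as
Atm\West\SanFran\Loc23, ASAPX can compute aggregate totals for Atm,
Atm\West and Atm\West\SanFran. The Python sketch below shows one way to
derive those aggregate names from a detail domain name. The function
name aggregate_levels and the validation it performs are assumptions
made for illustration only; this document does not describe how ASAPX
implements aggregation internally.

```python
def aggregate_levels(domain: str, separator: str = "\\") -> list[str]:
    """Return the higher-level prefixes of a domain name, shortest first.

    Hypothetical helper: it only mirrors the naming example in this
    document and is not an ASAPX API.
    """
    parts = domain.split(separator)
    # The document describes at most 4 levels in a domain name.
    if not 1 <= len(parts) <= 4 or any(not part for part in parts):
        raise ValueError(f"expected 1 to 4 non-empty levels: {domain!r}")
    # Every proper prefix of the name is a candidate aggregate domain.
    return [separator.join(parts[:i]) for i in range(1, len(parts))]


if __name__ == "__main__":
    for name in aggregate_levels(r"Atm\West\SanFran\Loc23"):
        print(name)
```

A single-level domain such as Atm has no higher level, so the sketch
returns an empty list for it.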