To understand why this error occurs, you have to start by taking a look at how the Kapow Katalyst software is started. An instance of the Java Service Wrapper is invoked which as the name implies provides a wrapper around the Java Virtual Machine inside of which the Kapow Katalyst software (RoboServer) will run.
This error message is caused by the fact that once the Java Service Wrapper has started up the Java Virtual Machine, the Service Wrapper will periodically send a PING request to the JVM:
Java Wrapper Service <-----> PING REQUEST <-----> JVM (RoboServer and robots it is executing run inside of the JVM)
If the JVM does NOT respond within a certain amount of time to the PING, the Java Wrapper Service will think the JVM is "hung" and initiate a RESTART of the JVM. Naturally, that is a bad thing because at a minimum there is one instance of RoboServer running inside of that JVM which will in effect be "blown away".
If the JVM that is restarted ALSO has the Management Console running inside of it, then the Management Console itself is also shut down. If the instance of RoboServer that is restarted was started using the command: RoboServer -service socket:50000 -MC then this means that if the RoboServer running Port 50000 is shown as restarting, then that means the Management Console was restarted, too. Neither of these things is good.
BUT what is interesting is that the fact that this happened does NOT necessarily mean that the RoboServer was "hung" or having any problem at all. The JVM may not be able to respond to the incoming PING REQUEST if the RoboServer running inside of it is running at a sustained, high level of CPU utilization. If there is a combination of robots running at a given time that result in a sustained level of CPU utilization, this can prevent the JVM from being able to respond quickly enough to the PING REQUEST from the Wrapper Service, but the robots themselves may be running and processing data as expected. They are simply using a higher, sustained level of CPU resources.
The real reason for the observed behavior is the JVM is under a heavy load which normally manifests itself in high CPU utilization. This can often be observed using the Management Console's DASHBOARD view while robots are running.
If your application server will support it, it may be worth considering (if you are not already doing this) to have the Management Console start up in its own JVM. By doing that, even if an instance of RoboServer were to experience a problem, the Management Console could continue to process schedules and distribute robots to the remaining RoboServer(s) that are defined as part of your Management Console configuration.
In most cases when a client experiences this type of error on a regular basis, it may be an indication that it is time to think about things such as:
- Adding another RoboServer instance to the Management Console's Cluster configuration.
- Reducing the number of concurrent robots on each RoboServer instance. Running fewer robots will allow more CPU time and memory to be available for those robots that are running as well as for the JVM to respond to incoming Java Wrapper Service PING requests.
Starting in version 9.2, the maximum number of concurrent robots is configured in the Management Console's cluster settings. In earlier versions, it was controlled by the -maxConcurrentRobots command-line option.
There is more information on configuring RoboServer for versions 9.5 and later in the help documents can be found in the "documentation" folder located in the Kapow installation folder.
Please Note that it is possible to turn OFF the Java Wrapper Service PING timeout test. Doing that will keep the system from restarting if the JVM cannot respond to the Wrapper Service, but it really does nothing to address WHY the system is restarting.
You can add a configuration to the RoboServer.conf file that is called "wrapper.ping.timeout" and if you set that value to zero (the default is 30 seconds if no value is explicitly defined), then there will no timeout test performed. We do not recommend doing this. Instead, we would recommend increasing the value to something like 60 or 90 seconds to see if that prevents the Java Wrapper Service from restarting the underlying Java Virtual Machine simply because the JVM is too busy to respond to the Wrapper Service Ping Request.
Please see this URL for more information: http://wrapper.tanukisoftware.com/doc/english/prop- ping-timeout.html
Again, we want to stress while this will keep the server from restarting, if it is the case that the RoboServer is getting close to it's capacity in that the robots running on it need more CPU resources than are available to that instance of RoboServer, then disabling that timeout test is only supressing the symptom. If additional new robot traffic is introduced, it will likely result in other observed performance related issues.
Normally, these types of issues are resolved by reviewing the current configuration of the RoboServer instance(s) across which the robots that are scheduled are executed. Consider reducing the load on each server by lowering the number of concurrent robots on the RoboServer(s) as noted above.
If you turn on the Java Wrapper Services' DEBUG logging, you can see the actual PING requests going back and forth up to the point where the JVM fails to respond.
To enable logging please change RoboServer.conf to contain this:
Make sure the folder C:\logs is created and the user executing RoboServer has write permission in it. Linux users will need to use a path appropriate to their system.
Afterwards, the Java Wrapper Service will log the wrapper error output to WrapperDebug_RoboServer.log file. When the log file gets up to 50 megabytes in size, it will be renamed as WrapperDebug_RoboServer_1.log and logging will continue in a new, empty WrapperDebug_RoboServer.log file. Additional log files will be "rolled" according to this scheme until WrapperDebug_RoboServer_9.log is the oldest. After that, the next time the wrapper log file reaches 50 megabytes, the oldest log file will be deleted. If the RoboServer process is restarted, the log file will also be rolled. Different values can be specified for the maxsize and maxfiles settings to change this log rolling behavior.
Although it shouldn't be necessary for the issue discussed here, it's possible to log RQL messages by applying these wrapper.logfile settings and then starting RoboServer with the -verbose flag.
Often, if a client correlates the CPU usage graphs on the Management Console's Dashboard View to those times when the JVM is logged as possibly being "hung", the CPU usage is logged at a high usage rate for a sustained period of time.
In some cases, another factor slowing down the computer can cause this issue. Antivirus software or other background processes can sometimes slow down the system enough to the cause this issue. In rare cases, issues with the computer hardware (or virtualized infrastructure) can also cause it. Checking for other causes of system slowness is a good idea.
It's also recommended to set up the Java Wrapper to take a heap dump in case of hang or out of memory.
To configure this, add the following 5 lines to the RoboServer.conf :
The first two lines extend the time the wrapper waits before restarting when an error is detected; the default is 15 seconds and this timeout might cause the generated heap dump to be incomplete.
The three last lines add arguments to the jvm startup instructing it to dump the heap to specified directory on out of memory.
Make sure the folder C:\WrapperHeapDump is created and the user executing RoboServer has write permission in it (you can use a different location). Linux users will need to use a path appropriate to their system.
In common.conf one of the last lines reads "wrapper.filter.action.1000=RESTART" - extend this such that it reads
When out of memory occurs, it will not only restart the RoboServer but also make a thread dump.
Keywords: Kapow, Robotic Process Automation, Kapow 8.0, 8.1, 8.2, Kapow 8.3.x, Kapow 9.0.x, Kapow 9.1.x, Kapow 9.2.x, Kapow 9.3.x, Kapow 9.4.x, Article 12451, Java, JVM hung, RoboServer, wrapper, Timed out