Config: LSF9, AIX 5.3, Oracle 10 (on separate server), 8GB RAM, 16GB paging space
We were primed for a smooth cutover last month from 8.0.3 to LSF9. All testing was solid and clean. However, once we got a lot of people on the system, we hit major performance problems: all activity slows down, including simple form transfers in LID and RSS usage.
We're using pflow heavily for requisition approval. Finance and Payroll are on LID; Materials and HR are on Portal. We have many RSS users, and ESS is enabled.
While watching topas I see that the processor is not taxed, memory usage is always high (not a concern in itself), and paging space usage varies, often reaching 75%. The system slows down exactly when actual paging activity increases. Portal and LID app users report 10-20 second delays for form transfers. I suspect WebSphere. We're running a cluster, but I have one of the two nodes turned off. I just applied fix pack 19.
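For reference, these are the standard AIX commands I've been using to confirm the paging correlation (nothing Lawson-specific here):

# paging space allocation per paging device
lsps -a

# the pi/po columns show actual page-ins/page-outs per second;
# sustained nonzero values mean real paging, not just allocated space
vmstat 5 10

# system-wide memory and paging snapshot
svmon -G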
I'm working with our installation consultant to try to figure it out. Any experience and advice would be much appreciated.
Did you upgrade apps as well? One common mistake I've seen with an apps upgrade is that the ARRAYBUFSIZE and INSERTBUFSIZE settings get changed in the target product line during the upgrade; that product line then becomes production, the settings never get changed back, and performance is horrible.
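A quick way to check is to search the production product line for those settings (the $LAWDIR/<productline> path here is a placeholder; adjust for your environment):

# list files under the product line that mention the buffer settings
find $LAWDIR/<productline> -type f | xargs egrep -l 'ARRAYBUFSIZE|INSERTBUFSIZE'

Then compare whatever turns up against the values your 8.0.3 product line had.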
Thanks for your help. Here is ladb.cfg:
DICTS    15    /* maximum number of open dictionaries */
FILES    3000  /* maximum number of open files */
FOREIGN  500   /* maximum number of foreign servers */
IFILES   450   /* number of open files per foreign server */
LFILES   9     /* number of open files per lafile */
UFILES   450   /* maximum number of open files per user */
USERS    500   /* maximum number of user processes */
latm.cfg
APPLICATIONS    200   /* # of unique programs that can be up at one time */
RUNAPPS         200   /* # of running program processes at one time */
REMOTEMACHINES  1     /* # of remote machines in the network */
REMOTEPROGRAMS  10    /* # of remote programs for each remote machine */
TCPUSERS        5     /* # of waiting users per tcptm process */
PMUSERS         50    /* # of lapm users */
MINUPTIME       5     /* # of minutes a program stays up after being closed */
MAXUPTIME       10    /* # of minutes a program can sit idle */
OPENTIMEOUT     240   /* # of secs programs have to start or read msg before assumed dead */
LOOPTIMEOUT     13    /* # of mins programs have to execute before assumed looping */
QUEATTEMPTS     5     /* # of times a full queue is checked before full status is accepted */
WAKEUPINTERVAL  1     /* interval in minutes that lapm does housekeeping */
TIMESTAMP       LONG  /* short or long latm log time stamp format */
DEBUG           OFF   /* whether to turn on debugging at startup */
USELATM         ON    /* whether to set USELATM file at startup */
The WebSphere JVM is set to min 512 MB / max 1024 MB heap, with 2 cluster members configured like this, but only one is running right now.
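If the JVM is the culprit, verbose GC logging would show it. This is just the standard JVM flag (in the admin console it's set per cluster member under the server's Process Definition > Java Virtual Machine, if I remember the path right):

# command-line equivalent of the current heap settings, plus GC logging
-Xms512m -Xmx1024m -verbose:gc

Frequent full GC cycles right at the 1024 MB ceiling would point at the heap; clean GC output with continued slowness would point back at AIX paging.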
We have probably 40 LID users (15-25 at any one time) and hundreds of Portal users, including about 20 app users (Materials and HR) and 300 RSS users (maybe 30-40 on at any one time). There is also ESS, but that runs not in the typical Portal but in a home-made frameset that calls the htm and js files.
I do not believe WebSphere is set to trace or any extra logging, but I'll look into it.
You mentioned that you have upgraded to fix pack 19 on WebSphere. When you deployed the IOS WebSphere app, did you uncheck the "Deploy enterprise beans" checkbox?
When we experienced slowness, it was often because our LDAP threads were being used up. By monitoring the threads we were able to track the problem back to its root cause, so you may want to monitor your LDAP threads with the ldapsearch command.
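For example (this assumes IBM Tivoli Directory Server or another directory that exposes the cn=monitor subtree; adjust host, port, and credentials for your setup):

# read the server's monitor entry; watch currentconnections and the
# worker/waiter counters to see whether the pool is being exhausted
ldapsearch -h ldaphost -p 389 -s base -b cn=monitor "objectclass=*"

Run it periodically while users report slowness and see whether those counters climb toward your configured maximums.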