LID and Portal performance problems on LSF9 on AIX

 34 Replies
Mike Schlenk
Posts: 71
Veteran Member

Config:  LSF9, AIX 5.3, Oracle 10 (on separate server), 8GB RAM, 16GB paging space

We were primed for a smooth cutover last month from 8.0.3 to LSF9. All testing was solid and clean. However, once we got a lot of people on the system, we started having major performance problems. All activity slows down, including simple form transfers in LID and RSS usage.

We're using a lot of pflow for requisition approval. Finance and Payroll are on LID; Materials and HR are on Portal. We have many RSS users, and ESS is enabled.

While watching topas, I see that the processor is not taxed, memory usage is always high (not a concern), and paging space usage varies, often reaching 75%. The system slows down when actual paging activity increases. Portal and LID app users report 10-20 second delays on form transfers. I suspect WebSphere. We're running a cluster, but I have one of the two nodes turned off. I just applied fix pack 19.

I'm working with our installation consultant to try to figure it out.  Any experience and advice would be much appreciated.

John Henley
Posts: 3351

Did you upgrade apps as well?  One common mistake I've seen with apps upgrade is that the ARRAYBUFSIZE and INSERTBUFSIZE settings get changed in the target product line during the upgrade, and then that becomes the production product line, and the settings never get changed back, making performance horrible.

Thanks for using the LawsonGuru.com forums!
John
Mike Schlenk
Yes, we upgraded apps. All of this is on new servers. I'm not familiar with this setting. Where is it? (I'll do some digging as well).
Mike Schlenk
We don't have this set at the program level, but I see on Lawson support that it can be set in the $LAWDIR//ORACLE file. Ours does not have it set.
John Henley
Going back to your topic: you say that LID and Portal are both slow, which I think rules out WebSphere and points to something basic in the environment. Is the slowness just in form transfers? How about batch jobs? Do they run slowly as well? Are you using LAUA security or LS 9.0?
Mike Schlenk
Batch jobs are mega fast, and so is ProcessFlow. What I'm seeing is that the WebSphere processes are doing a ton of paging, and I believe this is slowing everything down.

We're using LAUA security.
John Henley
I would look at ladb.cfg and latm.cfg to see if they are sized correctly. Also look at the WebSphere JVM settings. How many users do you have?
John Henley
I would also look at logging/trace settings. I have seen these severely affect performance, for instance if WebSphere is set to trace.
Mike Schlenk

Thanks for your help.  Here is ladb.cfg:

DICTS  15  /* maximum number of open dictionaries */
FILES  3000 /* maximum number of open files                       */
FOREIGN  500  /* maximum number of foreign servers                    */
IFILES  450  /* number of open files per foreign server       */
LFILES  9  /* number of open files per lafile                     */
UFILES  450  /* maximum number of open files per user         */
USERS  500  /* maximum number of user processes                   */

latm.cfg:

APPLICATIONS 200 /* # of unique programs that can be up at one time */
RUNAPPS 200 /* # of running programs processes at one time */
REMOTEMACHINES 1 /* # of remote machines in the network */
REMOTEPROGRAMS 10 /* # of remote programs for each remote machine */
TCPUSERS 5 /* # of waiting users per tcptm process */
PMUSERS 50 /* # of lapm users */
MINUPTIME 5 /* # of minutes a program stays up after being closed */
MAXUPTIME 10 /* # of minutes a program can sit idle */
OPENTIMEOUT 240 /* # of secs programs have to start or read msg before assumed dead */
LOOPTIMEOUT 13 /* # of mins programs have to execute before assumed looping */
QUEATTEMPTS 5 /* # of times a full queue is checked before full status is accepted */
WAKEUPINTERVAL 1 /* interval in minutes that lapm does housekeeping */
TIMESTAMP LONG /* short or long latm log time stamp format */
DEBUG OFF /* whether to turn on debugging at startup */
USELATM ON /* whether to set USELATM file at startup */

The WebSphere JVM heap is set to min 512 MB, max 1024 MB. There are two cluster members configured like this, but only one is running right now.

We have probably 40 LID users (15-25 at any one time) and hundreds of Portal users, including about 20 app users (Materials and HR) and 300 RSS users (maybe 30-40 on at any one time). There is also ESS, but that does not run in the typical Portal; it uses a home-made frameset that calls the htm and js files.

I do not believe that WebSphere is set to trace or any extra logging, but I'll look into it.

John Henley
That looks fairly typical. The only setting I would change is PMUSERS: make it 500, not 50. It should match USERS in ladb.cfg, and that would be more consistent with your user load.
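The PMUSERS/USERS rule of thumb above is easy to check by script whenever the files change. A minimal Python sketch, assuming both files use the `KEY VALUE /* comment */` layout shown in the posts; `parse_cfg` and `check_pmusers` are hypothetical helper names, not Lawson tools.

```python
import re

# ASSUMPTION: ladb.cfg and latm.cfg use the "KEY VALUE /* comment */" layout shown above.
_COMMENT = re.compile(r"/\*.*?\*/")
_SETTING = re.compile(r"^\s*([A-Z_]+)\s+(\S+)")


def parse_cfg(text):
    """Return a {KEY: VALUE} dict from ladb.cfg/latm.cfg-style text."""
    settings = {}
    for line in text.splitlines():
        # Drop the /* ... */ comment, then read the leading KEY and VALUE tokens.
        match = _SETTING.match(_COMMENT.sub("", line))
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def check_pmusers(ladb_text, latm_text):
    """Compare PMUSERS in latm.cfg with USERS in ladb.cfg; return (ok, message)."""
    users = int(parse_cfg(ladb_text)["USERS"])
    pmusers = int(parse_cfg(latm_text)["PMUSERS"])
    if pmusers == users:
        return True, f"PMUSERS matches USERS ({users})"
    return False, f"PMUSERS is {pmusers} but USERS is {users}; consider setting PMUSERS to {users}"


# Example using the values posted in this thread
ladb = "USERS  500  /* maximum number of user processes */"
latm = "PMUSERS 50 /* # of lapm users */"
print(check_pmusers(ladb, latm))
```

In practice you would pass in the full text of each file; the parser ignores lines that do not start with an uppercase setting name.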

In WebSphere, do you have the 'Disable JIT' JVM setting selected or not?
Mike Schlenk
"Disable JIT" is not checked, per a Lawson document on WebSphere tuning that I received. Verbose garbage collection is on; initial heap is 512 MB, max heap is 1024 MB.
Mike Schlenk
WebSphere webcontainer thread pool settings:
min: 10
max: 50
inactivity timeout: 3500ms

webcontainer session management:
Maximum in-memory session count: 1000
timeout: 30 minutes
Mike Schlenk
Also, we just went to fix pack 19 after fix pack 15 had a core-dump issue with AIX.
John Henley
You might want to try making min thread pool equal to max thread pool, and setting inactivity timeout much higher. That will reduce/eliminate thread creation/destruction...
Mike Schlenk
We'll look at that. I'm still working with our installation consultant. They're putting some experts together to try to help. I think it's going to be something small. We'll see. Your feedback is much appreciated.
Jimmy Chiu
Posts: 641
Veteran Member

You mentioned that you have upgraded WebSphere to fix pack 19. When you deployed the IOS WebSphere app, did you uncheck the "deploy enterprise bean" checkbox?

Mike Schlenk
It was done by the consultant, I'm not sure.
Mike Schlenk
Here's a new discovery:

I know that the pflow processes, specifically RMI, have been known to crash. The resolution for the RMI crashes was to place heap parameters in the pfserv file to manage memory. I did this not only for RMI but also for scheduler, pflow, and bpm. Yesterday we decided to remove all of these except RMI (whose setting was reduced). We rebooted last night and we're watching it today.

What do you think memory parameters on pflow would do to the processing if they were set too high?
Mike Schlenk
Well, the ProcessFlow settings were significant: the paging activity is much, much better. However, I still do not believe we're optimized.

The biggest offender now when it comes to paging is the oradb10 processes. The "ps avg" command shows over 100 oradb processes, many of which occupy more paging space than actual memory. Our ORACLE file doesn't have any of the optional parameters set. Perhaps there is tuning to be done there?
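Rather than eyeballing 100+ rows of `ps avg`, the oradb10 rows can be totaled by script. A minimal Python sketch, assuming the AIX `ps avg` layout with SIZE (virtual size of the data section) and RSS (resident real memory) columns, both in KB, and COMMAND as the last column; `summarize_ps_v` and the sample rows are hypothetical, with made-up numbers.

```python
def summarize_ps_v(ps_output, name="oradb10"):
    """Count processes whose COMMAND contains `name` and total their SIZE and RSS.

    ASSUMPTION: `ps_output` is AIX `ps avg` text whose header contains SIZE
    and RSS columns (in KB) and ends with COMMAND.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    size_col, rss_col = header.index("SIZE"), header.index("RSS")
    count = total_size = total_rss = 0
    for row in lines[1:]:
        # Split at most len(header) - 1 times so COMMAND keeps any embedded spaces.
        fields = row.split(None, len(header) - 1)
        if len(fields) < len(header) or name not in fields[-1]:
            continue
        count += 1
        total_size += int(fields[size_col])
        total_rss += int(fields[rss_col])
    return count, total_size, total_rss


# Hypothetical sample rows in the ps avg layout (numbers are illustrative only).
SAMPLE = """\
    PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
 123456      - A     0:05   12 98000 105000    xx  4000  7000  0.0  1.0 oradb10
 123457      - A     0:01    0 90000 104000    xx  4000  7000  0.0  1.0 oradb10
 200001      - A     5:10   40 500000 600000   xx   100   200  1.0  6.0 java -Xmx1024m
"""

count, size_kb, rss_kb = summarize_ps_v(SAMPLE)
print(f"{count} oradb10 processes, SIZE {size_kb} KB, RSS {rss_kb} KB")
```

On the server you would pass in `subprocess.run(["ps", "avg"], capture_output=True, text=True).stdout` instead of the sample. Pages shared between processes are counted once per process, so treat the totals as an upper bound.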
Deleted User
Posts: 0
New Member

When we experienced slowness, it was often a result of our LDAP threads being used up. By monitoring the threads, we were able to trace the problem back to its root cause. You may want to monitor your LDAP threads using the ldapsearch command.

Mike Schlenk
What method do you use to monitor threads?
Deleted User
I created a shell script to run the following command every couple of seconds:

/opt/IBM/ldap/V6.0/bin/ldapsearch -h xxxx.xxxx.us -p 389 -b cn=monitor -s base objectclass=* |grep available >> /tmp/workerthreadsavail.txt
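The same polling loop can be written so that every sample carries a timestamp, which makes it easier to line up thread exhaustion with the times users report slowdowns. A minimal Python sketch, reusing the ldapsearch path and arguments from the command above; the host is left as a parameter, and `available_lines` and `poll_worker_threads` are hypothetical names.

```python
import datetime
import subprocess
import time

# Path and arguments taken from the command above; the host is supplied by the caller.
LDAPSEARCH = "/opt/IBM/ldap/V6.0/bin/ldapsearch"


def available_lines(output):
    """Keep only the lines that mention 'available', like `| grep available`."""
    return [line.strip() for line in output.splitlines() if "available" in line]


def poll_worker_threads(host, port=389, interval=2.0,
                        logfile="/tmp/workerthreadsavail.txt", samples=None):
    """Query cn=monitor every `interval` seconds and append timestamped matches.

    Runs forever when `samples` is None, otherwise stops after that many queries.
    """
    # Passing a list (no shell) means objectclass=* is not subject to glob expansion.
    cmd = [LDAPSEARCH, "-h", host, "-p", str(port),
           "-b", "cn=monitor", "-s", "base", "objectclass=*"]
    taken = 0
    while samples is None or taken < samples:
        result = subprocess.run(cmd, capture_output=True, text=True)
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        with open(logfile, "a") as log:
            for line in available_lines(result.stdout):
                log.write(f"{stamp} {line}\n")
        taken += 1
        time.sleep(interval)
```

For example, `poll_worker_threads("xxxx.xxxx.us", samples=1800)` would take about an hour of samples at the two-second interval.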

Mike Schlenk
I'm not that familiar with the ldap commands. What's the xxxx.xxxx.us?
Mike Schlenk
I'm leaning toward the ORACLE file. When I run "ps avg", I see that the oradb10 processes are typically using 105000 of memory. There are 90 of them right now. The oldest ones hardly page at all; the more recent ones page up to 100000. Is 105000 too much?
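A back-of-the-envelope check puts that question in context. Assuming the 105000 figure is in KB (the unit AIX `ps avg` uses for SIZE and RSS), 90 such processes would add up to more than the server's 8GB of physical RAM on their own, which would fit the paging described above. Shared pages are counted once per process, so this is an upper bound, not an exact figure.

```python
# Figures from the post above, plus the 8GB of RAM from the first post.
# ASSUMPTION: 105000 is in KB, the unit AIX ps avg uses for SIZE and RSS.
PROCESSES = 90
KB_EACH = 105_000
RAM_KB = 8 * 1024 * 1024  # 8 GB expressed in KB

total_kb = PROCESSES * KB_EACH
print(f"oradb10 total: {total_kb} KB = {total_kb / 1024**2:.2f} GB")  # 9450000 KB = 9.01 GB
print(f"share of physical RAM: {total_kb / RAM_KB:.0%}")              # 113%
```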
Deleted User
The x's are the host name of your LDAP server. You can also check the active threads via your LDAP web console; however, you won't be able to script that to track the threads over the course of a day or whatever timeframe you're looking at.