LID and Portal performance problems on LSF9 on AIX

Mike Schlenk

Veteran Member

Posts: 71

1/6/2009 1:27 PM

Config: LSF9, AIX 5.3, Oracle 10 (on separate server), 8GB RAM, 16GB paging space

We were primed for a smooth cutover last month from 8.0.3 to LSF9. All testing was solid and clean. However, once we got a lot of people on the system, we started having major performance problems. All activity slows down, including simple form transfers in LID and RSS usage.

We're using a lot of pflow for requisition approval. Finance and Payroll are LID, Materials and HR are Portal. Many RSS users, ESS is enabled.

While watching topas I see that the processor is not taxed, memory usage is always high (not a concern), and paging space usage varies, often getting to 75%. The system slows when I see the actual paging activity increase. Portal and LID app users report 10-20 second delays for form transfers. I suspect WebSphere. We're running a cluster but I have one of the two nodes turned off. I just applied fix pack 19.
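As an alternative to watching topas by eye, paging activity can be logged from vmstat. The sketch below is illustrative only. It assumes the AIX vmstat layout in which pi and po are the sixth and seventh fields, and both the function name flag_paging and the threshold of 50 pages per second are hypothetical.

```shell
#!/bin/sh
# Flag vmstat samples that show heavy paging-space activity.
# Assumption: column order is
#   r b avm fre re pi po fr sr cy in sy cs us sy id wa
# so pi is field 6, po is field 7, and fre is field 4.
flag_paging() {
    threshold=${1:-50}   # hypothetical cutoff for pi + po
    awk -v t="$threshold" '
        # Only rows whose first field is numeric are samples;
        # header and separator lines are skipped.
        $1 ~ /^[0-9]+$/ && NF >= 7 {
            if ($6 + $7 > t)
                printf "HIGH pi=%s po=%s fre=%s\n", $6, $7, $4
        }'
}

# Usage on a live system (5-second samples):
#   vmstat 5 | flag_paging 50
```

Redirecting the output to a file while users report slow form transfers would show whether the delays line up with the paging spikes.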

I'm working with our installation consultant to try to figure it out. Any experience and advice would be much appreciated.

John Henley

Posts: 3351

1/6/2009 2:12 PM

Did you upgrade apps as well? One common mistake I've seen with apps upgrade is that the ARRAYBUFSIZE and INSERTBUFSIZE settings get changed in the target product line during the upgrade, and then that becomes the production product line, and the settings never get changed back, making performance horrible.

Mike Schlenk

Veteran Member

Posts: 71

1/6/2009 2:18 PM

Yes, we upgraded apps. All of this is on new servers. I'm not familiar with this setting. Where is it? (I'll do some digging as well).

Mike Schlenk

Veteran Member

Posts: 71

1/6/2009 2:24 PM

We don't have this set at the program level, but I see on Lawson support that it can be set in the $LAWDIR//ORACLE file. Ours does not have that set.

John Henley

Posts: 3351

1/6/2009 7:39 PM

Going back to your topic, you say that LID and Portal are both slow, which I think would rule out WebSphere and point to something basic in the environment. Is the slowness just form transfers? How about batch jobs--do they run slow as well? Are you using LAUA security or LS 9.0?

Mike Schlenk

Veteran Member

Posts: 71

1/6/2009 8:08 PM

Batch jobs are mega fast, and so is ProcessFlow. What I'm seeing is that the WebSphere processes are doing a ton of paging. I believe this is slowing everything down.

We're using LAUA security.

John Henley

Posts: 3351

1/7/2009 3:01 PM

I would look at ladb.cfg and latm.cfg to see if they are sized correctly. Also look at the WebSphere JVM settings. How many users do you have?

John Henley

Posts: 3351

1/7/2009 4:05 PM

I would also look at logging/trace settings...I have seen this severely affect performance; for instance, if WebSphere is set to trace...

Mike Schlenk

Veteran Member

Posts: 71

1/7/2009 4:51 PM

Thanks for your help. Here is ladb.cfg:

DICTS  15  /* maximum number of open dictionaries */
FILES  3000 /* maximum number of open files                       */
FOREIGN  500  /* maximum number of foreign servers                    */
IFILES  450  /* number of open files per foreign server       */
LFILES  9  /* number of open files per lafile                     */
UFILES  450  /* maximum number of open files per user         */
USERS  500  /* maximum number of user processes                   */

latm.cfg

APPLICATIONS 200 /* # of unique programs that can be up at one time */
RUNAPPS 200 /* # of running programs processes at one time */
REMOTEMACHINES 1 /* # of remote machines in the network */
REMOTEPROGRAMS 10 /* # of remote programs for each remote machine */
TCPUSERS 5 /* # of waiting users per tcptm process */
PMUSERS 50 /* # of lapm users */
MINUPTIME 5 /* # of minutes an program stays up after being closed */
MAXUPTIME 10 /* # of minutes an program can sit idle */
OPENTIMEOUT 240 /* # of secs programs have to start or read msg before assumed dead */
LOOPTIMEOUT 13 /* # of mins programs have to execute before assumed looping */
QUEATTEMPTS 5 /* # of times a full que is checked before full status is accepted */
WAKEUPINTERVAL 1 /* interval in minutes that lapm does housekeeping */
TIMESTAMP LONG /* short or long latm log time stamp format */
DEBUG OFF /* whether to turn on debugging at startup */
USELATM ON /* whether to set USELATM file at startup */

WebSphere JVM heap is set to min 512, max 1024. There are 2 cluster members configured like this, but only one is running right now.

We have probably 40 LID users (15-25 at any one time) and hundreds of Portal users, including about 20 app (Materials and HR) users and 300 RSS users (maybe 30-40 on at any one time). Also ESS, but that is not in the typical Portal; it's a home-made frameset that calls the htm and js files.

I do not believe that WebSphere is set to trace or any extra logging. I'll look into it.

John Henley

Posts: 3351

1/7/2009 6:41 PM

That looks fairly typical; the only setting I would change is PMUSERS, to 500 instead of 50 (it should match USERS in ladb.cfg, and would be more consistent with your user load).

In WebSphere, do you have the 'Disable JIT' JVM setting selected or not selected?
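The USERS/PMUSERS comparison can be scripted so it is easy to re-check after an upgrade. This is a minimal sketch, assuming the "KEYWORD VALUE /* comment */" layout shown in the files above. The function names cfg_value and check_pmusers are hypothetical, and the file paths are passed in because their location is site-specific.

```shell
#!/bin/sh
# Print the value for a keyword from a cfg file laid out as
#   KEYWORD VALUE /* comment */
cfg_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Compare USERS in ladb.cfg ($1) with PMUSERS in latm.cfg ($2).
check_pmusers() {
    users=$(cfg_value "$1" USERS)
    pmusers=$(cfg_value "$2" PMUSERS)
    if [ "$users" = "$pmusers" ]; then
        echo "OK: USERS=$users PMUSERS=$pmusers"
    else
        echo "MISMATCH: USERS=$users PMUSERS=$pmusers"
    fi
}

# Usage (paths are placeholders for wherever the files live):
#   check_pmusers /path/to/ladb.cfg /path/to/latm.cfg
```

With the values posted above, this would report a mismatch of USERS=500 against PMUSERS=50.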

Mike Schlenk

Veteran Member

Posts: 71

1/7/2009 6:48 PM

"Disable JIT" is not checked, per a Lawson document I received on WebSphere tuning. Verbose Garbage Collection is on, initial heap 512, max heap 1024.

Mike Schlenk

Veteran Member

Posts: 71

1/7/2009 6:51 PM

WebSphere webcontainer thread pool settings:
min: 10
max: 50
inactivity timeout: 3500ms

webcontainer session management:
Maximum in-memory session count: 1000
timeout: 30 minutes

Mike Schlenk

Veteran Member

Posts: 71

1/7/2009 6:59 PM

Also, we just went to fix pack 19 after fix pack 15 had a core-dump issue with AIX.

John Henley

Posts: 3351

1/7/2009 8:12 PM

You might want to try making min thread pool equal to max thread pool, and setting inactivity timeout much higher. That will reduce/eliminate thread creation/destruction...

Mike Schlenk

Veteran Member

Posts: 71

1/8/2009 1:24 PM

We'll look at that. I'm still working with our installation consultant. They're putting some experts together to try to help. I think it's going to be something small. We'll see. Your feedback is much appreciated.

Jimmy Chiu

Veteran Member

Posts: 641

1/8/2009 9:42 PM

You mentioned that you have upgraded to fix pack 19 on WebSphere. When you deployed the IOS WebSphere app, did you uncheck the "deploy enterprise bean" checkbox?

Mike Schlenk

Veteran Member

Posts: 71

1/9/2009 2:01 PM

It was done by the consultant, so I'm not sure.

Mike Schlenk

Veteran Member

Posts: 71

1/9/2009 2:05 PM

Here's a new discovery:

I know that the pflow processes, specifically RMI, have been known to crash. The resolution for the RMI crashes was to place heap parameters in the pfserv file to manage memory. I did this not only for RMI, but also for scheduler, pflow and bpm. We decided yesterday to remove all of these except the one for RMI (which was reduced). We rebooted last night and we're watching it today.

What do you think memory parameters on pflow would do to the processing if they were set too high?

Mike Schlenk

Veteran Member

Posts: 71

1/13/2009 7:55 PM

Well, the ProcessFlow settings were significant. The paging activity is much, much better. However, I still do not believe we're optimized.

The biggest offender now when it comes to paging is the oradb10 processes. The "ps avg" command shows over 100 oradb processes, many of which occupy more paging space than actual memory. Our ORACLE file doesn't have any of the optional parameters. Perhaps there is tuning here?

Deleted User

New Member

Posts: 0

1/13/2009 8:20 PM

When we experienced slowness, it was often a result of our LDAP threads being used up. By monitoring the threads we were able to trace it back to the root cause. You may want to monitor your LDAP threads by using the ldapsearch command.

Mike Schlenk

Veteran Member

Posts: 71

1/13/2009 8:28 PM

What method do you use to monitor threads?

Deleted User

New Member

Posts: 0

1/13/2009 8:55 PM

I created a shell script to run the following command every couple of seconds:

/opt/IBM/ldap/V6.0/bin/ldapsearch -h xxxx.xxxx.us -p 389 -b cn=monitor -s base objectclass=* |grep available >> /tmp/workerthreadsavail.txt
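The wrapper script itself was not posted, so the following is only a guess at its shape: a loop that runs a command at a fixed interval and appends the matching lines with a timestamp. The function name poll_available, the interval, and the sample count are all hypothetical. The timestamp is an addition that makes it easier to line the log up with reports of slowness.

```shell
#!/bin/sh
# Run a command COUNT times, INTERVAL seconds apart, and append every
# output line containing "available" to OUTFILE with a timestamp.
poll_available() {
    interval=$1; count=$2; outfile=$3; shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        stamp=$(date '+%Y-%m-%d %H:%M:%S')
        "$@" | grep available | while IFS= read -r line; do
            printf '%s %s\n' "$stamp" "$line"
        done >> "$outfile"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Usage with the command above (host name is a placeholder):
#   poll_available 5 720 /tmp/workerthreadsavail.txt \
#       /opt/IBM/ldap/V6.0/bin/ldapsearch -h xxxx.xxxx.us -p 389 \
#       -b cn=monitor -s base 'objectclass=*'
```

The filter is quoted in the usage line so the shell does not try to expand the asterisk.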

Mike Schlenk

Veteran Member

Posts: 71

1/13/2009 9:01 PM

I'm not that familiar with the ldap commands. What's the xxxx.xxxx.us?

Mike Schlenk

Veteran Member

Posts: 71

1/13/2009 9:22 PM

I'm leaning toward the ORACLE file. When I do a "ps avg" I see that the oradb10 processes are typically using 105000 of memory. There are 90 of them right now. The oldest ones hardly page at all; the more recent ones page up to 100000. Is the 105000 too much?
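To put totals on those per-process numbers, the ps output can be summed with awk. This is a rough sketch. It assumes the output has a header row containing the labels SIZE, RSS and COMMAND, and it looks the columns up by name instead of hard-coding positions. The function name summarize_procs is hypothetical.

```shell
#!/bin/sh
# Count processes whose command name contains PATTERN and total their
# SIZE and RSS columns. Column positions are taken from the header row.
summarize_procs() {
    awk -v pat="$1" '
        NR == 1 {
            # Map each header label to its field number.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        index($(col["COMMAND"]), pat) {
            n++
            size += $(col["SIZE"])
            rss  += $(col["RSS"])
        }
        END {
            printf "count=%d size_total=%d rss_total=%d\n", n, size, rss
        }'
}

# Usage on a live system:
#   ps avg | summarize_procs oradb10
```

A size total far above the RSS total would be consistent with much of the oradb10 footprint sitting in paging space.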

Deleted User

New Member

Posts: 0

1/13/2009 10:14 PM

The x's are the host name of your server. You can also check the active threads via your LDAP web console; however, you won't be able to script it to track the threads over the course of a day or whatever timeframe you're looking at.