LID and Portal performance problems on LSF9 on AIX

Mike Schlenk
Veteran Member
Posts: 71

    Config:  LSF9, AIX 5.3, Oracle 10 (on separate server), 8GB RAM, 16GB paging space

    We were primed for a smooth cutover last month from 8.0.3 to LSF9.  All testing was solid and clean.  However, once we got a lot of people on the system, we ran into major performance problems.  All activity slows down, including simple form transfers in LID and RSS usage.

    We're using a lot of pflow for requisition approval.  Finance and Payroll are LID, Materials and HR are Portal.  Many RSS users, ESS is enabled.

    While watching topas I see that the processor is not taxed, memory usage is always high (not a concern), and paging space usage varies, often reaching 75%.  The system slows when I see the actual paging activity increase.  Portal and LID app users report 10-20 second delays for form transfers.  I suspect WebSphere.  We're running a cluster, but I have one of the two nodes turned off.  I just applied fix pack 19.

    I'm working with our installation consultant to try to figure it out.  Any experience and advice would be much appreciated.

    John Henley
    Posts: 3353

      Did you upgrade apps as well?  One common mistake I've seen with apps upgrade is that the ARRAYBUFSIZE and INSERTBUFSIZE settings get changed in the target product line during the upgrade, and then that becomes the production product line, and the settings never get changed back, making performance horrible.

      Thanks for using the LawsonGuru.com forums!
      John
      Mike Schlenk
      Veteran Member
      Posts: 71
        Yes, we upgraded apps. All of this is on new servers. I'm not familiar with this setting. Where is it? (I'll do some digging as well).
        Mike Schlenk
        Veteran Member
        Posts: 71
          We don't have this set at the program level, but I see on Lawson support that it can be set in the $LAWDIR//ORACLE file. Ours does not have it set.
          John Henley
          Posts: 3353
            Going back to your topic, you say that LID and Portal are both slow, which I think would rule out WebSphere and point to something basic in the environment. Is the slowness just form transfers? How about batch jobs? Do they run slow as well? Are you using LAUA security or LS 9.0?
            Mike Schlenk
            Veteran Member
            Posts: 71
              Batch jobs are mega fast, and so is ProcessFlow. What I'm seeing is that the WebSphere processes are doing a ton of paging. I believe this is slowing everything down.

              We're using LAUA security.
              John Henley
              Posts: 3353
                I would look at ladb.cfg and latm.cfg to see if they are sized correctly. Also look at the WebSphere JVM settings. How many users do you have?
                John Henley
                Posts: 3353
                  I would also look at logging/trace settings. I have seen these severely affect performance, for instance if WebSphere is set to trace.
                  Mike Schlenk
                  Veteran Member
                  Posts: 71

                    Thanks for your help.  Here is ladb.cfg:

                    DICTS    15    /* maximum number of open dictionaries */
                    FILES    3000  /* maximum number of open files */
                    FOREIGN  500   /* maximum number of foreign servers */
                    IFILES   450   /* number of open files per foreign server */
                    LFILES   9     /* number of open files per lafile */
                    UFILES   450   /* maximum number of open files per user */
                    USERS    500   /* maximum number of user processes */

                    latm.cfg:

                    APPLICATIONS    200   /* # of unique programs that can be up at one time */
                    RUNAPPS         200   /* # of running programs processes at one time */
                    REMOTEMACHINES  1     /* # of remote machines in the network */
                    REMOTEPROGRAMS  10    /* # of remote programs for each remote machine */
                    TCPUSERS        5     /* # of waiting users per tcptm process */
                    PMUSERS         50    /* # of lapm users */
                    MINUPTIME       5     /* # of minutes an program stays up after being closed */
                    MAXUPTIME       10    /* # of minutes an program can sit idle */
                    OPENTIMEOUT     240   /* # of secs programs have to start or read msg before assumed dead */
                    LOOPTIMEOUT     13    /* # of mins programs have to execute before assumed looping */
                    QUEATTEMPTS     5     /* # of times a full que is checked before full status is accepted */
                    WAKEUPINTERVAL  1     /* interval in minutes that lapm does housekeeping */
                    TIMESTAMP       LONG  /* short or long latm log time stamp format */
                    DEBUG           OFF   /* whether to turn on debugging at startup */
                    USELATM         ON    /* whether to set USELATM file at startup */

                    The WebSphere JVM heap is set to min 512, max 1024. There are two cluster members configured this way, but only one is running right now.

                    We have probably 40 LID users (15-25 at any one time) and hundreds of Portal users, including about 20 app users (Materials and HR) and 300 RSS users (maybe 30-40 on at any one time).  We also have ESS, but that is not in the typical Portal; it runs in a home-made frameset that calls the htm and js files.

                    I do not believe that WebSphere is set to trace or to do any extra logging.  I'll look into it.

                    John Henley
                    Posts: 3353
                      That looks fairly typical; the only setting I would change is PMUSERS, to 500 rather than 50 (it should match USERS in ladb.cfg, and that would be more consistent with your user load).
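
As a rough illustration of the check suggested here, the following sketch reads USERS from ladb.cfg and PMUSERS from latm.cfg and reports whether they match. It assumes the `NAME VALUE /* comment */` layout shown in the listings above. The function names `cfg_value` and `check_pmusers` are hypothetical, and the file paths are left to the caller.

```shell
#!/bin/sh
# Sketch only: compare USERS in ladb.cfg with PMUSERS in latm.cfg.
# Assumes each setting sits on its own line as "NAME VALUE /* comment */".

# Print the value of setting $1 from config file $2 (prints nothing if absent).
cfg_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Report whether PMUSERS matches USERS.
# $1 = path to ladb.cfg, $2 = path to latm.cfg (both supplied by the caller).
check_pmusers() {
    users=$(cfg_value USERS "$1")
    pmusers=$(cfg_value PMUSERS "$2")
    if [ -z "$users" ] || [ -z "$pmusers" ]; then
        echo "could not read USERS or PMUSERS"
        return 2
    fi
    if [ "$users" = "$pmusers" ]; then
        echo "OK: USERS=$users PMUSERS=$pmusers"
    else
        echo "MISMATCH: USERS=$users PMUSERS=$pmusers"
        return 1
    fi
}
```

Called as `check_pmusers /path/to/ladb.cfg /path/to/latm.cfg`, it would print `MISMATCH: USERS=500 PMUSERS=50` for the values posted above.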

                      In WebSphere, do you have the 'Disable JIT' JVM setting selected or not selected?
                      Mike Schlenk
                      Veteran Member
                      Posts: 71
                        "Disable JIT" is not checked, per a Lawson document I received on WebSphere tuning. Verbose Garbage Collection is on, initial heap 512, max heap 1024.
                        Mike Schlenk
                        Veteran Member
                        Posts: 71
                          WebSphere webcontainer thread pool settings:
                          min: 10
                          max: 50
                          inactivity timeout: 3500ms

                          webcontainer session management:
                          Maximum in-memory session count: 1000
                          timeout: 30 minutes
                          Mike Schlenk
                          Veteran Member
                          Posts: 71
                            Also, we just went to fix pack 19 after fix pack 15 had a core-dump issue with AIX.
                            John Henley
                            Posts: 3353
                              You might want to try making min thread pool equal to max thread pool, and setting inactivity timeout much higher. That will reduce/eliminate thread creation/destruction...
                              Mike Schlenk
                              Veteran Member
                              Posts: 71
                                We'll look at that. I'm still working with our installation consultant. They're putting some experts together to try to help. I think it's going to be something small. We'll see. Your feedback is much appreciated.
                                Jimmy Chiu
                                Veteran Member
                                Posts: 641

                                  You mentioned that you have upgraded to fix pack 19 on WebSphere. When you deployed the IOS WebSphere app, did you uncheck the "deploy enterprise bean" checkbox?

                                  Mike Schlenk
                                  Veteran Member
                                  Posts: 71
                                    It was done by the consultant, so I'm not sure.
                                    Mike Schlenk
                                    Veteran Member
                                    Posts: 71
                                      Here's a new discovery.

                                      I know that the pflow processes, specifically RMI, have been known to crash. The resolution for the RMI crashes was to place heap parameters in the pfserv file to manage memory. I did this not only for RMI, but also for scheduler, pflow and bpm. We decided yesterday to remove all of these except RMI (which was reduced). We rebooted last night and we're watching it today.

                                      What do you think memory parameters on pflow would do to the processing if they were set too high?
                                      Mike Schlenk
                                      Veteran Member
                                      Posts: 71
                                        Well, the ProcessFlow settings were significant. The paging activity is much, much better. However, I still do not believe we're optimized.

                                        The biggest offender now when it comes to paging is the oradb10 processes. The "ps avg" command shows over 100 oradb processes, many of which occupy more paging space than actual memory. Our ORACLE file doesn't have any of the optional parameters. Perhaps there is tuning to do here?
                                        Chad Dirst
                                        Advanced Member
                                        Posts: 25
                                        Advanced Member

                                          When we experienced slowness, it was often a result of our LDAP threads being used up.  By monitoring the threads we were able to trace it back to the root cause.  You may want to monitor your LDAP threads by using the ldapsearch command.

                                          Mike Schlenk
                                          Veteran Member
                                          Posts: 71
                                            What method do you use to monitor threads?
                                            chad208
                                            Advanced Member
                                            Posts: 25
                                              I created a shell script to run the following command every couple of seconds:

                                              /opt/IBM/ldap/V6.0/bin/ldapsearch -h xxxx.xxxx.us -p 389 -b cn=monitor -s base objectclass=* |grep available >> /tmp/workerthreadsavail.txt
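
The script itself was not posted, so the following is only a guess at what such a polling loop might look like, wrapped around the command above. The function names `ldap_query` and `monitor_threads`, the `LDAP_HOST` and `OUT` variables, and the timestamp prefix are all additions for illustration. The host stays a placeholder, and the `objectclass=*` filter is quoted so the shell cannot expand the `*`.

```shell
#!/bin/sh
# Sketch only: poll the LDAP monitor entry and log the "available" lines.
# LDAP_HOST and OUT are placeholders; set them for your environment.
LDAP_HOST="${LDAP_HOST:-xxxx.xxxx.us}"
OUT="${OUT:-/tmp/workerthreadsavail.txt}"

# Run the monitor query once, using the command line quoted in the post.
ldap_query() {
    /opt/IBM/ldap/V6.0/bin/ldapsearch -h "$LDAP_HOST" -p 389 \
        -b cn=monitor -s base 'objectclass=*'
}

# Take $1 samples, $2 seconds apart, and append each matching line to $OUT
# with a timestamp in front (the timestamp is an addition, not in the post).
monitor_threads() {
    count=$1
    interval=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        ts=$(date '+%Y-%m-%d %H:%M:%S')
        ldap_query | grep available | while IFS= read -r line; do
            echo "$ts $line"
        done >> "$OUT"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `monitor_threads 1000 5` would take 1000 samples five seconds apart. Stamping each sample makes it easier to line up a drop in available threads with the times users report slowness.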

                                              Mike Schlenk
                                              Veteran Member
                                              Posts: 71
                                                I'm not that familiar with the ldap commands. What's the xxxx.xxxx.us?
                                                Mike Schlenk
                                                Veteran Member
                                                Posts: 71
                                                  I'm leaning toward the ORACLE file. When I do a "ps avg" I see that the oradb10 processes are typically using 105000 of memory each. There are 90 of them right now. The oldest ones hardly page at all, while the more recent ones page up to 100000. Is 105000 too much?
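
One way to put those figures in context is to total them. Taken at face value, 90 processes at 105000 each come to 9,450,000. If the unit is KB (an assumption, since the post does not state it), that is roughly 9 GB against 8 GB of RAM, although memory shared between processes may be counted more than once in such a sum. The sketch below totals the SIZE and RSS columns of `ps avg`-style output for matching processes. The function name `sum_ps_mem` is hypothetical, and the SIZE and RSS header names are an assumption about the ps output.

```shell
#!/bin/sh
# Sketch only: total SIZE and RSS for processes whose ps line contains $1.
# Columns are located by header name, so their order is not assumed, but the
# header names SIZE and RSS are an assumption about the ps output format.

# Usage: ps avg | sum_ps_mem oradb10
sum_ps_mem() {
    awk -v pat="$1" '
        NR == 1 {
            # Work out which fields hold SIZE and RSS from the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == "SIZE") size_col = i
                if ($i == "RSS")  rss_col = i
            }
            next
        }
        index($0, pat) > 0 && size_col && rss_col {
            n++
            size += $size_col
            rss  += $rss_col
        }
        END {
            printf "procs=%d size_total=%d rss_total=%d\n", n, size, rss
        }
    '
}
```

A SIZE total well above the RSS total would be consistent with the observation that many of these processes sit more in paging space than in real memory.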
                                                  chad208
                                                  Advanced Member
                                                  Posts: 25
                                                    The x's are the host name of your server. You can also check the active threads via your LDAP web console; however, you won't be able to script it to track the threads over the course of a day or whatever timeframe you're looking at.