A Lesson Learned in System i (AS/400) System Administration
Posted on May 12th, 2008
Over the weekend I ran into a problem with two users' interactive sessions taking up 100% of the system resources. Usually the 400 is very good at managing runaway processes, but these particular jobs caused the system to stop responding to all user requests, even new sign-on sessions. The console was still operational, and I was able to pull up WRKACTJOB and see the two jobs that were killing the system. I wanted to view the job logs before I killed the processes, but as soon as I entered option 5 to view one of the jobs, my console froze as well. I was dead in the water. After 20 minutes of waiting for the console to do something, I caved in and performed a hard shutdown by holding down the power button. That left me with 30 damaged data queues, which I had to recreate manually once the system came back up.
Lesson learned: put the jobs killing the system on hold before trying to diagnose them.
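For what it's worth, here is a rough sketch of the order I would try next time from the console command line. The qualified job name 123456/JSMITH/QPADEV0001 is made up for illustration; substitute the number, user, and job name shown in WRKACTJOB. Option 3 on the WRKACTJOB screen does the same hold without typing the command.

```
/* Hypothetical job: number 123456, user JSMITH, job name QPADEV0001 */

/* 1. Hold the runaway job so it stops consuming resources */
HLDJOB JOB(123456/JSMITH/QPADEV0001)

/* 2. With the job held, look at its job log */
DSPJOBLOG JOB(123456/JSMITH/QPADEV0001)

/* 3. End the job once you have what you need, or release it with RLSJOB */
ENDJOB JOB(123456/JSMITH/QPADEV0001) OPTION(*IMMED)
```

Repeat the hold for each offending job before looking at any of them. I can't promise the hold would have gone through on a system that starved, but it is a far cheaper thing to try first than the power button.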
