With formal mainframe education a rarity, learning the platform often means learning on the job
By Andrew Wig
“There's as much art as science to this performance and capacity, and the reason for that is there are so many different data points,” Brad Snyder, who helps customers with tasks such as processor sizing as an advanced technical sales support specialist at IBM, tells TechChannel.
Those data points start with the business environment and proliferate from there, ultimately abstracted as CPU utilization rate—the key metric in analyzing system performance.
It’s the elements that exist outside of the computing environment, however, that make the job of projecting computing needs such a challenge. “Unfortunately, we never have as much information as we'd really like to have,” Scott Chapman, CIO of Enterprise Performance Strategies, tells TechChannel. “In the old days, life was a little easier. I think in today's modern world, things change so fast.”
Given all the considerations that go into performance and capacity planning—from regulatory conditions, to business moves, to your system's transaction process history—it’s easy to see how things can get complicated quickly. “There are a lot of inputs into it. There are a lot of knowns and a lot of unknowns that you really have to kind of dig through,” Snyder says.
And given the element of unpredictability that infuses business drivers, the information that planners do have isn’t always perfect. These days, Chapman sees fewer linear trend lines. “It's just a much more chaotic environment,” he says. “So in my mind, I always like to tell people: Go back to what's driving the business.”
As you relate that to your technical drivers, avoid knowledge silos, he advises. For instance, the technical people in an organization might have eyes on CPU utilization, but do they know how busy the call center is? “If the calls to the call center are driving your utilization, that'd be a really good metric to know,” Chapman says.


The multitude of metrics at play in sizing your upgrade all point to CPU capacity, the main factor driving decision-making. With the z17, part of that process is deciding whether to get one engine or 208. The key data point in that question, CPU utilization, is deep, multifaceted and dependent on dynamic cost considerations.
Tools like Workload Manager (WLM) have made it easier to set those priorities and run a machine closer to 90-100% utilization. “However, that does kind of ignore the fact that when you run busier like that, you lose some efficiency, because there is more contention for the CPU caches and so forth,” Chapman says.
Running more processing units at 60% utilization, for instance, can be more efficient than running fewer at 90%, he explains. With that principle in mind, Chapman says that choosing the right strategy requires planners to judge whether the performance difference is enough to make the added processor units worth it.
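To put rough numbers on that trade-off, here is a minimal sketch. The efficiency factors are invented for illustration and are not figures from Chapman or IBM, but they show why the same batch of work can consume noticeably more CPU time on a machine running hot.

```python
# Illustrative only: the same workload burns more CPU time when the machine runs
# hot, because cache contention erodes per-engine efficiency. The 0.90 and 0.97
# efficiency factors below are assumptions chosen purely to show the shape of
# the trade-off described in the article.

def cpu_consumed(work_units: float, efficiency: float) -> float:
    """CPU time needed to finish a fixed batch of work at a given efficiency."""
    return work_units / efficiency

WORK = 10_000  # arbitrary units of business work

hot = cpu_consumed(WORK, efficiency=0.90)   # fewer engines, running ~90% busy
cool = cpu_consumed(WORK, efficiency=0.97)  # more engines, running ~60% busy

print(f"CPU consumed running ~90% busy: {hot:,.0f} units")
print(f"CPU consumed running ~60% busy: {cool:,.0f} units")
print(f"Extra CPU burned by running hot: {hot / cool - 1:.1%}")
```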
With the rise of tailored-fit pricing, meaning shops pay for all of their CPU usage, enterprises now have more incentive to maximize efficiency and find the sweet spot in engine count, Chapman notes.
“That efficiency becomes more important, because when you're running that machine at 100% busy or even potentially close to 100% busy, you are probably consuming more CPU time to get the same amount of work done. More CPU time consumed means higher software costs under tailored-fit pricing,” he says.


The efficiency gains found in running lower CPU utilization with more engines make it more forgiving to err on the side of overprovisioning. “Even though you may have over-purchased the amount of capacity, if you just look at it from a utilization perspective, from a software efficiency perspective, it may make a lot of sense,” Chapman says.
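As a hedged illustration of why that can pencil out, the sketch below applies a hypothetical consumption figure, rate and efficiency penalty; none of these numbers come from IBM or the sources quoted here. They simply show how extra CPU time turns into a larger software bill under consumption-based pricing.

```python
# Hypothetical cost arithmetic: under a consumption-based model such as Tailored
# Fit Pricing, software charges track CPU consumed, so CPU time wasted by running
# hot shows up on the bill. Every figure below is an assumption for illustration.

ANNUAL_MSU_WITH_HEADROOM = 60_000  # assumed annual consumption with spare capacity
HOT_PENALTY = 0.07                 # assumed ~7% extra CPU time running near 100% busy
RATE_PER_MSU = 50.0                # assumed software cost per MSU consumed

bill_with_headroom = ANNUAL_MSU_WITH_HEADROOM * RATE_PER_MSU
bill_running_hot = ANNUAL_MSU_WITH_HEADROOM * (1 + HOT_PENALTY) * RATE_PER_MSU

print(f"Software bill with headroom:     ${bill_with_headroom:,.0f}")
print(f"Software bill running near 100%: ${bill_running_hot:,.0f}")
print(f"Annual premium for running hot:  ${bill_running_hot - bill_with_headroom:,.0f}")
```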
Although shops weigh a number of variables to determine the ideal usage rate for their environment, queuing theory holds that the system is busy enough at 75-85% for online processing delays to occur, Snyder notes. “And that's unacceptable.”
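A simple queuing model makes the 75-85% threshold concrete. The sketch below uses a textbook M/M/1 approximation, which is a simplification of real z/OS dispatching rather than anything Snyder prescribes, but it shows how quickly waiting time grows once utilization passes that range.

```python
# Back-of-the-envelope queuing math (M/M/1 approximation, a simplifying
# assumption): average wait grows as utilization / (1 - utilization) service
# times, so delay climbs steeply in the 75-85% range and explodes beyond it.

def relative_queue_delay(utilization: float) -> float:
    """Average wait, in multiples of one service time, for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)

for u in (0.50, 0.75, 0.85, 0.95):
    print(f"{u:.0%} busy -> average wait ~ {relative_queue_delay(u):.1f}x service time")
```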
Shops also have to consider the resources they devote to performance management when machines are running that hot. The closer a shop is to 100%, the more time it has to spend ensuring workloads are running smoothly, Snyder explains.
IT departments' default practices may complicate matters. Some cap their utilization, which can make it more difficult to tell the CPU usage story. For example, if a system is capped at 80% and is hitting that mark, then that number isn’t reflecting the entire picture of processor demand, Chapman says. That’s why he likes to look at CPU delay samples and see how much queuing is occurring.
“If I'm seeing those lower-importance workloads suffer more, and more CPU delay, that tells me there's additional demand that I'm not seeing reflected in my utilization because of the artificial cap,” he explains.
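As a rough illustration of that technique, the sketch below compares hypothetical "using" and "delay" sample counts for two made-up service classes. The class names and counts are invented, not real RMF or SMF output, but the pattern of low-importance work piling up delay samples is the signal Chapman describes.

```python
# Simplified sketch with invented numbers: the share of samples spent waiting for
# a CPU, computed per service class. Heavy delay in low-importance work points to
# demand that a capped utilization figure hides.

def cpu_delay_pct(using_samples: int, delay_samples: int) -> float:
    """Percentage of observed samples in which work was waiting for a CPU."""
    total = using_samples + delay_samples
    return 100.0 * delay_samples / total if total else 0.0

# Hypothetical sample counts for two service classes on a capped system.
workloads = {
    "ONLINE_HI (importance 1)": (9_500, 500),
    "BATCH_LOW (importance 5)": (4_000, 6_000),
}

for name, (using, delay) in workloads.items():
    print(f"{name}: CPU delay {cpu_delay_pct(using, delay):.1f}%")
```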
Another complication has to do with the way MIPS are calculated. Snyder notes that IBM measures MIPS capacity with workloads running at a certain CPU utilization rate, around 90%. This skews projections for processor capabilities in real-world environments. For instance, if a shop is running at 50% utilization and has a capacity of 20,000 MIPS, they might think they are drawing 10,000 MIPS. That would be wrong, since their CPU utilization differs from the rate at which the processor units were measured, creating potential complications when sizing an upgrade.
Snyder explains: "Let's say you want to move that workload from the processor onto another processor that's going to be busier. It might end up being undersized because you thought you were using 10,000 MIPS, and in reality you were using 11,000 or 12,000 MIPS, because you were getting a performance benefit, if you will, by running less busy on a larger processor."
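Putting Snyder's example into numbers, and treating the size of the low-utilization efficiency bonus as an assumption chosen only to land in the 11,000-12,000 MIPS range he cites, the naive estimate and the adjusted one diverge like this:

```python
# Rough sketch of the sizing trap: a box rated at 20,000 MIPS was benchmarked
# running close to ~90% busy, so at 50% busy the workload gets an efficiency
# bonus. The 12% bonus below is an assumption for illustration, not an IBM figure.

RATED_MIPS = 20_000               # capacity as rated near ~90% utilization
UTILIZATION = 0.50                # what the shop actually runs at
LOW_UTIL_EFFICIENCY_BONUS = 0.12  # assumed benefit from running less busy

naive_usage = RATED_MIPS * UTILIZATION
adjusted_usage = naive_usage * (1 + LOW_UTIL_EFFICIENCY_BONUS)

print(f"Naive estimate:    {naive_usage:,.0f} MIPS")
print(f"Adjusted estimate: {adjusted_usage:,.0f} MIPS  (what to size the target for)")
```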
Thankfully, WLM isn’t the only tool that can help mainframers analyze performance as they size their z17. One vital utility is the IBM Z Processor Capacity Reference (zPCR), which can show users the delta between current and proposed configurations, and what kind of upgrade an enterprise needs to reach its performance target.
“Most of the time, if we're following the proper processor sizing guidelines using zPCR and things like that, we're going to be okay,” Snyder says. But he would still prefer the peace of mind that comes with extra capacity, should the choice come down to weighing conservative estimates versus a more aggressive calculus. When there is more than enough power, “I don't have to worry about it. I can now worry about other parts of my business at that point,” he explains.



Properly configuring a z17 upgrade also requires anticipating brand-new uses for the mainframe, and these days that discussion includes on-premises AI processing. Those who plan to take advantage of the z17’s full capabilities in that department will want to make sure they have room to add the Spyre AI accelerator, which shipped Oct. 28. The PCIe-attached card can be added to the I/O drawer with little trouble—if the machine is properly configured.
“If we don't have the drawer space, all of the sudden you're talking about adding a frame or doing abnormal things to move things around to make space, and that can be highly disruptive,” Snyder says.
Timing is another facet of capacity planning. Most shops upgrade every three to five years, skipping a generation of IBM Z, though some large organizations upgrade every cycle, Chapman notes.
IBM’s upgrade incentives also influence the question of timing: software costs are reduced for shops running on a new machine, adding another twist to the financial calculation. “So some customers find it advantageous to always go to the new machine so they can reduce their software costs, get the latest feature function, all that sort of stuff,” Chapman says.
Clearly, those tasked with capacity planning have a lot to think about as they tackle a process that gets more complicated with each additional data point and nuance. “It’s that magic crystal ball,” Chapman says.