
Z17 Capacity Planning: Sorting Through the Data

A multitude of data points determine how you plan your z17 upgrade, but it’s not just a math problem

By Andrew Wig

The arrival of IBM’s new mainframe this year has enterprises doing the math as they work out just how much z17 they’ll need. But number crunching alone won’t get the job done.

“There's as much art as science to this performance and capacity, and the reason for that is there are so many different data points,” Brad Snyder, who helps customers with tasks such as processor sizing as an advanced technical sales support specialist at IBM, tells TechChannel.

 

Those data points start with the business environment and proliferate from there, ultimately abstracted as CPU utilization rate—the key metric in analyzing system performance.

 

It’s the elements that exist outside of the computing environment, however, that make the job of projecting computing needs such a challenge. “Unfortunately, we never have as much information as we'd really like to have,” Scott Chapman, CIO of Enterprise Performance Strategies, tells TechChannel. “In the old days, life was a little easier. I think in today's modern world, things change so fast.”

Asking the Right Questions

Given all the considerations that go into performance and capacity planning—from regulatory conditions, to business moves, to your system's transaction process history—it’s easy to see how things can get complicated quickly. “There are a lot of inputs into it. There are a lot of knowns and a lot of unknowns that you really have to kind of dig through,” Snyder says. 

 

And given the element of unpredictability that infuses business drivers, the information that planners do have isn’t always perfect. These days, Chapman sees fewer linear trend lines. “It's just a much more chaotic environment,” he says. “So in my mind, I always like to tell people: Go back to what's driving the business.”

 

As you relate that to your technical drivers, avoid knowledge silos, he advises. For instance, the technical people in an organization might have eyes on CPU utilization, but do they know how busy the call center is? “If the calls to the call center are driving your utilization, that'd be a really good metric to know,” Chapman says.

Assessing Your System and Your Business
To help enterprises make the right decisions in a kinetic IT landscape, Chapman and Snyder note a few critical questions planners should be asking:
What regulatory changes may be in store?
What’s our business plan going forward?
Will we make an acquisition in the coming years?
Are we doing more hybrid cloud?
Are we implementing AI on Z?
What are our online transaction processes?
What are the batch processes and their history?
Will new applications be coming onto the system?

CPU Utilization: Finding the Sweet Spot

The multitude of metrics at play in sizing your upgrade all point to CPU capacity, the main factor driving decision-making. With the z17, part of that process is deciding whether to get one engine or 208. The key data point in that question, CPU utilization, is deep, multifaceted and dependent on dynamic cost considerations.

 

Tools like Workload Manager (WLM) have made it easier to set workload priorities and run a machine closer to 90-100%. “However, that does kind of ignore the fact that when you run busier like that, you lose some efficiency, because there is more contention for the CPU caches and so forth,” Chapman says.

 

Running more processing units at 60% utilization, for instance, can be more efficient than running fewer at 90%, he explains. With that principle in mind, Chapman says that choosing the right strategy requires planners to judge whether the performance difference is enough to make the added processor units worth it.

[Figure: The CPU utilization sweet spot. Near 100% busy: less efficient. Around 60%: more efficient. 20%: too light (overprovisioned).]
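A back-of-the-envelope sketch of that trade-off follows. The per-engine capacity figure and the efficiency penalties are illustrative assumptions, not measurements from any IBM Z model; only the shape of the comparison matters.

```python
# Rough sketch of the trade-off Chapman describes. The per-engine capacity and
# the efficiency penalties are illustrative assumptions, not measured values.
import math

IDEAL_CAPACITY_PER_ENGINE = 1000  # hypothetical work units/hour at 100% busy, no contention

# Assumed fraction of ideal throughput retained at each target utilization,
# reflecting growing contention for CPU caches as the machine gets busier.
ASSUMED_EFFICIENCY = {0.60: 0.98, 0.90: 0.90}

def size_configuration(workload: float, target_util: float) -> tuple[int, float]:
    """Return (engines required, CPU capacity consumed) for a workload needing
    `workload` ideal work units per hour at the given target utilization."""
    eff = ASSUMED_EFFICIENCY[target_util]
    delivered_per_engine = IDEAL_CAPACITY_PER_ENGINE * eff * target_util
    cpu_consumed = workload / eff  # the same work burns more CPU on a hot machine
    return math.ceil(workload / delivered_per_engine), cpu_consumed

WORKLOAD = 5000  # hypothetical ideal work units per hour

for util in (0.90, 0.60):
    engines, consumed = size_configuration(WORKLOAD, util)
    print(f"target {util:.0%}: {engines} engines, ~{consumed:.0f} units of CPU consumed")

# With these made-up numbers: 7 engines at 90% but ~5,556 units of CPU burned,
# versus 9 engines at 60% and only ~5,102 units burned. The planner's judgment
# call is whether that efficiency gap justifies paying for the extra engines.
```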

With the rise of tailored-fit pricing, meaning shops pay for all of their CPU usage, enterprises now have more incentive to maximize efficiency and find the sweet spot in engine count, Chapman notes. 

 

“That efficiency becomes more important, because when you're running that machine at 100% busy or even potentially close to 100% busy, you are probably consuming more CPU time to get the same amount of work done. More CPU time consumed means higher software costs under tailored-fit pricing,” he says.
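As a rough illustration of that cost effect, consider the arithmetic below; the overhead percentage and the per-MSU-hour rate are invented numbers, not IBM pricing.

```python
# Back-of-the-envelope illustration of the tailored-fit pricing effect Chapman
# describes. The 8% overhead and the per-MSU-hour rate are invented numbers.
baseline_msu_hours = 10_000        # hypothetical monthly consumption running cooler
assumed_hot_overhead = 0.08        # assumed extra CPU burned per unit of work near 100% busy
assumed_rate_per_msu_hour = 5.00   # hypothetical consumption-based software rate

hot_msu_hours = baseline_msu_hours * (1 + assumed_hot_overhead)
extra_cost = (hot_msu_hours - baseline_msu_hours) * assumed_rate_per_msu_hour

print(f"Running hot: ~{hot_msu_hours - baseline_msu_hours:,.0f} extra MSU-hours,"
      f" roughly ${extra_cost:,.0f} more per month for the same work.")
```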


When It Pays to Be Aggressive

The efficiency gains found in running at lower CPU utilization with more engines make it more forgiving to err on the side of overprovisioning. “Even though you may have over-purchased the amount of capacity, if you just look at it from a utilization perspective, from a software efficiency perspective, it may make a lot of sense,” Chapman says.

 

Although shops weigh a number of variables to determine the ideal usage rate for their environment, queuing theory holds that the system is busy enough at 75-85% for online processing delays to occur, Snyder notes. “And that's unacceptable.” 
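A deliberately simplified single-server queuing model shows why that range matters. Real mainframe sizing relies on multi-engine models and tools, so treat this only as a shape-of-the-curve sketch.

```python
# A deliberately simplified single-server (M/M/1) queuing curve, just to show
# why online delays appear well before 100% busy. Real IBM Z sizing uses
# multi-engine models and tools such as zPCR; this is only a sketch.

def response_time_inflation(utilization: float) -> float:
    """Average response time relative to raw service time for an M/M/1 queue."""
    return 1.0 / (1.0 - utilization)

for util in (0.50, 0.75, 0.85, 0.95):
    print(f"{util:.0%} busy: response time ~{response_time_inflation(util):.1f}x service time")

# 50% busy:  ~2.0x      75% busy:  ~4.0x
# 85% busy:  ~6.7x      95% busy: ~20.0x
# Queuing delay grows non-linearly with utilization, which is why the 75-85%
# range is already too hot for latency-sensitive online transactions.
```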

 

Shops also have to consider the resources they devote to performance management when machines are running that hot. The closer a shop is to 100%, the more time it has to spend ensuring workloads are running smoothly, Snyder explains.

When the Numbers Lie

IT departments' default practices may complicate matters. Some cap their utilization, which can make it more difficult to tell the CPU usage story. For example, if a system is capped at 80% and is hitting that mark, then that number isn’t reflecting the entire picture of processor demand, Chapman says. That’s why he likes to look at CPU delay samples and see how much queuing is occurring.

 

“If I'm seeing those lower-importance workloads suffer more, and more CPU delay, that tells me there's additional demand that I'm not seeing reflected in my utilization because of the artificial cap,” he explains. 
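One rough way to picture that reasoning uses the kind of using/delay sample counts that RMF-style reports provide. The figures and the scaling heuristic below are invented for illustration; they are not a method prescribed by Chapman.

```python
# A rough way to reason about demand hiding behind a cap, using the kind of
# using/delay sample counts that RMF-style reports provide. The figures and
# the scaling heuristic are invented for illustration only.
observed_utilization = 0.80   # the system sits pinned at its 80% cap
cpu_using_samples = 4_000     # samples where lower-importance work was using CPU
cpu_delay_samples = 1_500     # samples where it was ready but waiting for CPU

# If work was waiting for CPU in 1,500 samples for every 4,000 in which it ran,
# it arguably "wanted" roughly (4,000 + 1,500) / 4,000 times the CPU it got.
demand_multiplier = (cpu_using_samples + cpu_delay_samples) / cpu_using_samples
estimated_uncapped_demand = observed_utilization * demand_multiplier

print(f"Observed (capped) utilization: {observed_utilization:.0%}")
print(f"Rough uncapped demand estimate: {estimated_uncapped_demand:.0%}")
# About 110% here: demand exceeds installed capacity even though the
# utilization chart never shows more than 80%.
```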

 

Another complication has to do with the way MIPS are calculated. Snyder notes that IBM measures MIPS capacity with the processor running at a certain CPU utilization rate, around 90%. This skews projections for processor capabilities in real-world environments. For instance, if a shop is running at 50% utilization and has a capacity of 20,000 MIPS, they might think they are drawing 10,000 MIPS. That would be wrong, since their CPU utilization differs from the level at which the processor units were rated, creating potential complications when sizing an upgrade.

 

Snyder explains: "Let's say you want to move that workload from the processor onto another processor that's going to be busier. It might end up being undersized because you thought you were using 10,000 MIPS, and in reality you were using 11,000 or 12,000 MIPS, because you were getting a performance benefit, if you will, by running less busy on a larger processor."
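A small sketch of that correction follows; the 12% low-utilization benefit is an assumed figure chosen only to land in the range Snyder cites, not an IBM-published number.

```python
# Sketch of the correction Snyder describes. The 12% low-utilization benefit is
# an assumed figure for illustration, not an IBM-published number.
rated_capacity_mips = 20_000
observed_utilization = 0.50
assumed_low_util_benefit = 0.12   # assume work runs ~12% "cheaper" at 50% busy than at the ~90% rating point

naive_estimate = rated_capacity_mips * observed_utilization
adjusted_estimate = naive_estimate * (1 + assumed_low_util_benefit)

print(f"Naive estimate:    {naive_estimate:,.0f} MIPS")
print(f"Adjusted estimate: {adjusted_estimate:,.0f} MIPS")
# 10,000 MIPS naive versus roughly 11,200 MIPS adjusted. Sizing the busier
# target machine against the naive number risks coming up short.
```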

 

 

Thankfully, WLM isn’t the only tool that can help mainframers analyze performance as they size their z17. One vital utility is the IBM Z Processor Capacity Reference (zPCR), which can show users the delta between current and proposed configurations, and what kind of upgrade an enterprise needs to reach its performance target.

 

“Most of the time, if we're following the proper processor sizing guidelines using zPCR and things like that, we're going to be okay,” Snyder says. But he would still prefer the peace of mind that comes with extra capacity, should the choice come down to weighing conservative estimates versus a more aggressive calculus. When there is more than enough power, “I don't have to worry about it. I can now worry about other parts of my business at that point,” he explains.


Spyre and Your AI Plans

Properly configuring a z17 upgrade also requires anticipating brand-new uses for the mainframe, and these days, that discussion includes consideration for on-premises AI processing. Those who plan to take advantage of the z17’s full capabilities in that department will want to make sure they have room to add the Spyre AI accelerator, which shipped Oct. 28. The PCIe-attached card can be added to the I/O drawer with little trouble—if the machine is properly configured.

 

“If we don't have the drawer space, all of the sudden you're talking about adding a frame or doing abnormal things to move things around to make space, and that can be highly disruptive,” Snyder says.

The Upgrade Cycle: Peering Into the Future

Timing is another facet of capacity planning. Most shops upgrade every three to five years, skipping a generation of IBM Z, though some large organizations upgrade every cycle, Chapman notes. 

 

IBM’s incentivization of upgrades also influences the question of timing. That framework reduces software costs for those on a new machine, adding another twist to the financial calculation. “So some customers find it advantageous to always go to the new machine so they can reduce their software costs, get the latest feature function, all that sort of stuff,” Chapman says.

 

Clearly, those tasked with capacity planning have a lot to think about as they tackle a process that gets more complicated with each additional data point and nuance. “It’s that magic crystal ball,” Chapman says. 

 

It just takes a skilled fortune teller to read it.