Capacity Planning in the Land of Plenty

Why capacity planning and performance management still matter, even when you’re swimming in MIPS

By Dava Stewart

The cloud has brought about essentially unlimited capacity, and that seems entirely positive … at first glance.

At one time, being in IT and running a mainframe meant being a jack of all trades, and that included understanding capacity planning and performance management, says Harry Batten, executive solution architect, IBM Expert Labs. In the modern environment, though, specialization has made such a breadth of knowledge less common. He describes the change like this: 

 

“If I go back in the past, MIPS were incredibly important—when you were running a box that had 50 MIPS on it, as opposed to the z17s nowadays that basically have hundreds of thousands of MIPS on it.”

 

Having unlimited capacity naturally leads to paying less attention to usage and performance. And, as capacity has increased, the tools for managing performance have improved, so there’s less training and monitoring in general. Those two factors have all but eliminated the need for a dedicated capacity planning and performance management role in most companies, but that lack can be problematic. To understand why this matters, consider a familiar scenario. 

The Electric Bill

A fair comparison can be made between managing a household’s electric bill and an organization’s capacity planning and performance management. Your electricity usage is essentially unlimited, but it comes at a cost, just as computing capacity does.

 

Your electric meter measures how much power your household consumes, and your bill is based (in part) on that usage. Similarly, mainframe users pay software costs based on a four-hour rolling average (4HRA). In both cases, you pay for what you consume rather than buying a set amount. 

 

Another commonality is that both software usage and electricity usage have peak times. In some places, electricity is billed at a higher rate during overall peak periods, and in business, peak times affect the 4HRA dramatically, even if the peak lasts only a few minutes.
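To make the 4HRA idea concrete, here is a minimal Python sketch of how a rolling average like this can be computed from interval utilization samples. The MSU figures and the five-minute sampling interval are illustrative assumptions, not billing data; the point is simply that a brief peak lifts the average for hours afterward.

# Sketch: computing a four-hour rolling average (4HRA) from interval MSU samples.
# The sample values and the 5-minute interval are illustrative assumptions.
from collections import deque

INTERVAL_MINUTES = 5
WINDOW_SIZE = (4 * 60) // INTERVAL_MINUTES  # 48 samples cover four hours

def rolling_4hra(msu_samples):
    """Yield the four-hour rolling average after each new interval sample."""
    window = deque(maxlen=WINDOW_SIZE)
    for msu in msu_samples:
        window.append(msu)
        yield sum(window) / len(window)

# Steady 120 MSU load with a single 15-minute burst to 400 MSU.
samples = [120] * 48 + [400] * 3 + [120] * 45
print(f"Peak 4HRA: {max(rolling_4hra(samples)):.1f} MSU")  # about 137.5, well above the 120 baseline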

Looking to the Cloud

As organizations seek to take advantage of the cloud’s flexibility and virtually unlimited capacity, hybrid cloud environments have become the norm in mainframe shops. Meanwhile, the cloud’s popularity continues to grow overall.
99% of enterprises operate in hybrid cloud environments

Global cloud computing market expected to grow from $738 billion in 2025 to $1.6 trillion in 2030

From Planning to Problem-Solving

A family that decides to work on lowering their electricity bill would first need to understand how much electricity they were currently using. Then, they could begin to narrow down when and why it’s being consumed. In a place where prices vary depending on peak usage, the family might be able to spread out electricity use by doing things like running the dishwasher overnight.

 

This kind of analysis is analogous to capacity planning and results in a baseline and a goal. At that point, performance management can begin. Organizations need this same process of establishing baselines, understanding usage patterns and planning for future capacity needs. The benefits of capacity planning include predictable costs, improved resiliency and better planning for future upgrades.
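As a hypothetical illustration of what a baseline plus a forward plan can look like, the sketch below fits a simple linear trend to made-up monthly peak MSU figures and estimates when installed capacity would be reached. Real capacity planning draws on far richer data; the numbers and the linear-growth assumption here are purely illustrative.

# Sketch: establish a utilization baseline and project when installed capacity
# would be reached, assuming linear growth. All figures are hypothetical.
from statistics import linear_regression  # Python 3.10+

monthly_peak_msu = [610, 625, 648, 660, 671, 690]  # last six months of peak usage
installed_capacity_msu = 800

slope, intercept = linear_regression(range(len(monthly_peak_msu)), monthly_peak_msu)
baseline = monthly_peak_msu[-1]
months_until_full = (installed_capacity_msu - baseline) / slope

print(f"Current baseline peak: {baseline} MSU")
print(f"Growth trend: {slope:.1f} MSU per month")
print(f"Estimated months until installed capacity is reached: {months_until_full:.0f}")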

 

However, without someone dedicated to monitoring capacity and usage in an organization, Batten says that performance management often becomes a “problem determination type of exercise.” In other words, instead of paying attention to how the box is performing over time and planning for upgrades, teams are discovering problems only after they occur.

Instead of ‘How’s our box performing? When do we need to upgrade?’ it’s ‘We put a new application on the system and it’s running hot.’ I’m not seeing a lot of capacity planning, but performance management being reactive to a large degree.
—Harry Batten, executive solution architect, IBM

Modern Languages and the CPU Cycle Trade-Off


In addition to the nearly unlimited capacity of modern mainframes and the reduced day-to-day attention that better tools encourage, Batten notes that modern programming languages come with a trade-off: capacity usage.

 

“Everyone raves about these modern languages,” says Batten, “but they use a lot more CPU cycles. The simpler a language becomes to write, the more CPU cycles you need to write it.” He says that the further away you get from machine code, the more instructions it takes to do something.

 

Understanding the workload is the key to managing performance. Although it’s possible to offload work from general-purpose processors and lower software costs, without understanding the workload it’s impossible to predict the impact of changes. If you don’t have a baseline measurement of usage, trying to understand how new programs will affect performance is an exercise in frustration.

 

Once you have that baseline, Batten says, “you can go back and do a deeper dive. All the tools are there to do really deep dives into capacity and come up with a solution.”

The Future of Workload Management

A bedrock feature of mainframe systems that can help is the z/OS Workload Manager (WLM), which, according to Batten, has long functioned like AI without a large language model beneath it. “Workload Manager is the glue that holds mainframes together. It distributes the work based on priority, availability of capacity and so on, and it’s had machine learning in it from day one,” he says. The system learns as it goes along.

 

“It effectively says, ‘Hey, I know that when this happens at this particular time, I get a spike in my CPU, and I handle it accordingly and I learn how to do that,’” says Batten, noting that it’s very similar to the way generative AI works. He expects that the future will bring more real-time performance monitoring.
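As a loose illustration of that learn-the-pattern behavior, and emphatically not of how Workload Manager is actually implemented, the Python sketch below keeps a running average of CPU utilization by hour of day and flags hours where a spike is routinely observed.

# Sketch: "learn" recurring load patterns by hour of day and anticipate spikes.
# This is an illustration of the idea only, not how z/OS Workload Manager works.
from collections import defaultdict

class HourlyProfile:
    """Running average of CPU utilization for each hour of the day."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def observe(self, hour, cpu_pct):
        self.totals[hour] += cpu_pct
        self.counts[hour] += 1

    def spike_expected(self, hour, threshold_pct=85.0):
        if self.counts[hour] == 0:
            return False
        return self.totals[hour] / self.counts[hour] >= threshold_pct

# After a few days of (hypothetical) history, the 9 a.m. spike is anticipated.
profile = HourlyProfile()
for _ in range(5):
    profile.observe(hour=9, cpu_pct=92)
    profile.observe(hour=14, cpu_pct=60)
print(profile.spike_expected(9))   # True: capacity can be prepared in advance
print(profile.spike_expected(14))  # False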

 

“Right now you have products that have dashboards, and you can set up parameters so that when performance goes above a certain level, the dashboard turns yellow, and if it goes higher, it turns red, then you can draw down [capacity usage],” says Batten. He thinks AI will be able to handle the problem automatically, and it already does in some organizations.
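A minimal sketch of that two-level threshold logic might look like the following; the 80% and 90% thresholds and the cap_low_priority_work() response are illustrative assumptions, not a real product interface.

# Sketch: two-level performance thresholds, loosely mirroring the yellow/red
# dashboards Batten describes. Thresholds and the capping hook are assumptions.
YELLOW_THRESHOLD_PCT = 80.0
RED_THRESHOLD_PCT = 90.0

def cap_low_priority_work():
    """Placeholder for an automated response, e.g. deferring discretionary work."""
    print("Drawing down low-priority capacity usage")

def evaluate(cpu_pct):
    if cpu_pct >= RED_THRESHOLD_PCT:
        cap_low_priority_work()  # the step that AI could take without waiting on a human
        return "red"
    if cpu_pct >= YELLOW_THRESHOLD_PCT:
        return "yellow"
    return "green"

print(evaluate(72.0))  # green
print(evaluate(84.0))  # yellow
print(evaluate(95.0))  # triggers the capping hook, then prints red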

 

The more systems can handle on their own, with less human interaction, the smoother things will go, particularly during times of crisis. “I always look at capacity as a component of resilience,” says Batten. “You’ve got to have enough capacity for recovery if something goes wrong.”

 

As AI handles more of the work around capacity planning and performance management, mistakes become less likely and disaster recovery plans get better, notes Batten. “In a crisis, the first thing you lose are your people,” he says. People are more interested in looking after their families than in keeping computing systems up and running. Plus, people with spreadsheets are more prone to mistakes than systems are.

 

However, even with the improvements modern technology is bringing to capacity and performance management, Batten says shops could still do a better job of projecting CPU usage. When a programmer has a stellar new program that’s been tested and works well, they often can’t say how much capacity it will use. “We’ll just stick it into production and see how it goes,” says Batten, “and then usage jumps 20% when the program is running. All that work could have been done proactively.”
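The proactive work Batten describes can start with something as simple as scaling the CPU cost measured per transaction in test to the expected production volume, as in the sketch below. Every figure in it is hypothetical, and a real projection would account for much more than a single scaling factor.

# Sketch: project a new application's capacity impact before production by
# scaling per-transaction CPU cost measured in test. All figures are hypothetical.
test_cpu_seconds = 30.0          # CPU consumed during the test run
test_transactions = 10_000       # transactions driven during the test

prod_tx_per_hour = 250_000       # expected production arrival rate
engine_seconds_per_hour = 3_600  # rough capacity of one general-purpose engine
current_peak_pct = 65.0          # today's peak utilization (single-engine view)

cpu_per_tx = test_cpu_seconds / test_transactions
added_seconds_per_hour = cpu_per_tx * prod_tx_per_hour
added_pct = 100 * added_seconds_per_hour / engine_seconds_per_hour

print(f"CPU per transaction: {cpu_per_tx * 1000:.1f} ms")
print(f"Projected added load: {added_pct:.0f}% of one engine")
print(f"Projected peak after rollout: {current_peak_pct + added_pct:.0f}%")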

Maintaining Mainframe Reliability

The mainframe is reliable because of the policies in place, and those policies are being eroded. Taking a proactive stance on capacity planning and performance management, adopting new tools to automate bringing on capacity or soft capping, and using Workload Manager and automation to analyze and predict usage can help maintain the reliability of critical systems.

“Mainframes are incredibly complex but incredibly simple if you understand them,” says Batten. “You have to understand the technology you’re working with.” The time to solve an issue is before it happens, not when the problem is increasing costs exponentially.