O&M

New “Lean M&D” Solutions Promise the Right Fix at the Right Time

Volume 121, Issue 5.

By Randy Bickford

Many large power generation fleet owners have deployed centralized monitoring and diagnostic (M&D) centers to improve generating performance and reliability. These M&D centers use advanced pattern recognition (APR) software to provide early warning of changes in equipment condition that might indicate an impending failure. It is well documented that these centralized M&D centers have saved millions of dollars in avoided outage, maintenance and lost opportunity costs. However, achieving these benefits requires a significant investment of human and capital resources. False alerts are common and can distract M&D personnel from actual problems. Furthermore, interpreting what is causing the changes and determining the time horizon for action requires manual analysis and intervention by a time-constrained staff of experts and a dedicated monitoring team. This increases the cost of monitoring and can result in a delayed, inconsistent or inaccurate diagnosis that wastes time and money or puts equipment and safety at greater risk. As a result, many smaller fleet operators are delaying implementation of a centralized M&D capability or are looking to outsource their M&D needs to third parties.

APR Signals
APR creates expected data signals corresponding with observed plant data signals.

Next-generation “Lean M&D” solutions are now available that can detect and characterize an anomaly more accurately, with fewer false alerts, and then apply automated online diagnostics and prognostics to determine the likely cause of the anomaly and predict the remaining useful life of the asset. This more advanced approach captures and automates existing expert knowledge to provide more valuable and timely information to the plant staff or remote monitoring team. This, in turn, allows the operations team to better manage the risk of the potential failure, expedite the repair before a serious failure occurs, and reduce the overall (actual and avoided) cost to the plant.

Key Elements of a Lean M&D Solution

  1. Scalable software minimizes the cost of implementation
  2. Accurate anomaly detection reduces the number of false alarms
  3. Automated diagnostics determine the cause of an anomaly
  4. Automated remaining-life estimates guide the urgency of corrective action
  5. Well-defined alarm management and workflow processes maximize business value

By reducing the number of false alerts and automating the key expertise of subject matter experts (SMEs), fleet operators can benefit from online monitoring with a much smaller capital and human resource investment. This Lean M&D approach can be an enabler for smaller fleet operators who cannot afford the cost of a dedicated team of experts and monitoring staff. As the Lean M&D approach becomes more widely deployed, further efficiency gains will follow from the ability to capture and share valuable diagnostic and prognostic expertise across the industry. Taken together, these new analytics become a core element of the Industrial Internet in the power generation industry.

One of the key factors enabling Lean M&D is the availability of solutions designed to scale easily across the Industrial Internet. These solutions are designed to operate in the same way and perform the same services whether they run on a network edge device, on an engineering laptop or within a corporate or public cloud. This creates multiple points of entry for introducing powerful analytics that are interoperable across a deployment. Many new users benefit from solutions that can run in full-function mode on a single desktop or laptop. Few of the APR solutions deployed today offer this option, and most require a large upfront information technology (IT) investment that has priced many smaller power generating companies out of the APR market. The ability to develop monitoring solutions locally and then scale up to the cloud or down to the device or control platform, when needed, is a new paradigm that enables Lean M&D.

The primary value of an APR-based solution is that it can characterize plant operating anomalies in detail. Similar functionality can also be provided by first-principles models, such as a heat balance, when the variables of interest can be modeled from physical, thermodynamic or electrical principles. The rise of APR solutions is mostly attributable to how easy it is to create an APR model of these same physical, thermodynamic or electrical relationships using machine learning methods. Figure 1 illustrates how APR is used to transform an observed data signal into a residual signal that has very useful properties for online monitoring. The APR model uses patterns in a set of signals to estimate the expected value of each of its input signals. The deviation between the observed and expected signals, often called the residual signal, has a near-zero mean and predictable statistics when the monitored system is operating normally and the APR model matches the data. When the monitored system moves away from normal operation, the change in the properties of the residual signal can be characterized to define a set of symptoms that describe the change in behavior.

Not surprisingly, APR solutions are not all equally capable of creating accurate expected data signals for plant operating data. Yet the accuracy of the APR model’s expected data signal is very important when implementing Lean M&D. More accurate APR predictions translate directly to earlier problem detection and a more accurate initial diagnosis. They also mean fewer false alarms and lower staffing costs for alarm management. Managers of most large-fleet remote monitoring centers cite false alarm management as the single greatest cost and inefficiency within their operations. Reducing the false alarm rate and improving the accuracy of problem detection and diagnosis is essential for moving to a Lean M&D implementation.

What, then, are the attributes of a highly accurate APR solution? First, the APR algorithm itself controls the quality of the predicted values based on the observed values of the plant data. Most algorithms in use today are proprietary, but in general, those that use regression-based methods will interpolate the expected data values more accurately than those that use cluster-distance-based or principal-component-based methods. Several key features of a highly accurate solution are summarized in Figure 2.

Accurate APR solutions will also provide excellent support to help a user avoid the “garbage in, garbage out” problem. A poorly designed APR model will naturally be less accurate than a well-designed one. One key attribute of a well-designed model is good correlation within the set of modeled plant data signals. In other words, there are actual patterns in the data for the model to learn and work with. Another key attribute is that the historical data used for calibration (a.k.a. training) contains the full range of normal operation for the monitored equipment and does not contain any data from conditions that the APR solution should recognize as abnormal. Well-designed models also tend to require less ongoing maintenance, further reducing the resources needed to maintain a Lean M&D deployment.

Some next-generation APR solutions go even further to improve accuracy by including tools for adaptive online calibration and operating mode optimization of the APR models and alerting thresholds. Adaptive calibration ensures that false alarm rates remain low despite normal aging-related changes in the monitored equipment. This can avoid the time and effort needed for periodic manual recalibration of the APR models and alerting threshold settings.
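One common way to adapt an alert threshold online is to track an exponentially weighted mean and variance of the residual. The sketch below assumes that technique; the article does not say which method commercial tools use, and the class name, `k` and `alpha` values are hypothetical. Samples that trigger an alert are excluded from the update so that a genuine fault is not learned as normal.

```python
class AdaptiveThreshold:
    """Hypothetical adaptive alert limits for one residual signal.

    Limits are mean +/- k standard deviations, where the mean and variance
    are exponentially weighted and updated only with non-alerting samples.
    """

    def __init__(self, mean, var, k=4.0, alpha=0.01):
        self.mean, self.var, self.k, self.alpha = mean, var, k, alpha

    def limits(self):
        sd = self.var ** 0.5
        return self.mean - self.k * sd, self.mean + self.k * sd

    def update(self, residual):
        """Return True if the residual alerts; otherwise adapt the limits."""
        lo, hi = self.limits()
        alert = not (lo <= residual <= hi)
        if not alert:
            d = residual - self.mean
            self.mean += self.alpha * d
            self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)
        return alert

# Slow aging-related drift (0 -> ~1.0) with small noise is absorbed...
th = AdaptiveThreshold(mean=0.0, var=0.25)
drift_alerts = sum(th.update(i * 0.0005 + (0.3 if i % 2 else -0.3))
                   for i in range(2000))
print(drift_alerts, round(th.mean, 2))
# ...while a sudden step change still alerts.
print(th.update(th.mean + 10.0))
```

The weighting factor trades adaptation speed against the risk of slowly learning a real fault; in practice that choice would be validated against known fault histories.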

Operating mode model partitioning offers an even greater accuracy benefit by automatically optimizing the APR models and alarm thresholds for individual modes of equipment operation (e.g., high load versus low load) or for variable equipment line-ups. Modes are selected automatically by monitoring plant control variables or operating data values. Mode partitioning allows finely tuned models to be engaged transparently as the monitored system moves through start-up, changing power levels, maintenance and shut-down periods of operation. Alarm suppression and enablement are also mode-specific, so users need not intervene to manage mode-related false alarms.
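Mode selection and mode-specific alarm settings can be sketched as a lookup on a control variable. This is a hypothetical illustration: the mode names, load bands and limits are invented, and only the alarm threshold and enablement are shown. Selecting a per-mode APR model would follow the same lookup.

```python
# Hypothetical operating modes selected from load (% of rating).
# Each mode carries its own alarm limit and enablement flag.
MODES = [
    # (name, min_load_pct, max_load_pct, alarm_limit, alarms_enabled)
    ("shutdown",    0.0,   5.0, None, False),
    ("low_load",    5.0,  60.0, 3.0,  True),
    ("high_load",  60.0, 110.0, 1.5,  True),
]

def select_mode(load_pct):
    """Pick the operating mode whose load band contains load_pct."""
    for name, lo, hi, limit, enabled in MODES:
        if lo <= load_pct < hi:
            return name, limit, enabled
    return "unknown", None, False

def check(load_pct, residual):
    """Return (mode, alarm) using the alarm settings of the active mode."""
    mode, limit, enabled = select_mode(load_pct)
    alarm = enabled and limit is not None and abs(residual) > limit
    return mode, alarm

print(check(2.0, 50.0))   # large residual while shut down is suppressed
print(check(40.0, 2.0))   # within the wider low-load limit
print(check(90.0, 2.0))   # exceeds the tighter high-load limit
```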

Accurately characterizing changes in the plant data when a problem occurs is an essential prerequisite for automatically analyzing the cause of anomalies and providing the user with a diagnosis. Lean M&D solutions continuously update the diagnosis and advise the user based on the evolving state of the alarm events produced during monitoring. Alarm types useful for online diagnostics should include detectors for abnormal changes in plant data, including positive and negative mean value changes, increases and decreases in variance, excessive positive and negative rates of change, and values outside of reasonable ranges. These detectors can be applied to observed data, predicted data and residual data and are configurable for each operating mode. Alarm settings used to characterize anomalies are learned during initial model calibration and are updated over time using adaptive calibration.
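The detector types listed above can be sketched as simple tests on a window of samples. This is a minimal sketch, assuming the baseline mean and standard deviation come from calibration; the function name, the 3-standard-error mean test and the 4x/0.25x variance ratios are hypothetical choices, not settings from any product. Commercial detectors typically use sequential statistical tests rather than fixed windows.

```python
import statistics

def detect(window, baseline_mean, baseline_sd, max_rate, valid_range, k=3.0):
    """Hypothetical fault detectors for a window of observed, expected or
    residual samples. Returns the set of active alarm names."""
    alarms = set()
    # Mean change: window mean beyond k standard errors of the baseline mean.
    m = statistics.mean(window)
    se = baseline_sd / len(window) ** 0.5
    if m > baseline_mean + k * se:
        alarms.add("mean_up")
    elif m < baseline_mean - k * se:
        alarms.add("mean_down")
    # Variance change: ratio of window variance to baseline variance.
    ratio = statistics.variance(window) / baseline_sd ** 2
    if ratio > 4.0:
        alarms.add("variance_up")
    elif ratio < 0.25:
        alarms.add("variance_down")
    # Excessive rate of change between consecutive samples.
    rates = [b - a for a, b in zip(window, window[1:])]
    if max(rates) > max_rate:
        alarms.add("rate_up")
    if min(rates) < -max_rate:
        alarms.add("rate_down")
    # Values outside a reasonable range.
    lo, hi = valid_range
    if any(v < lo or v > hi for v in window):
        alarms.add("out_of_range")
    return alarms

normal = [0.1, -0.2, 0.0, 0.15, -0.05, 0.1, -0.1, 0.0]
settings = dict(baseline_mean=0.0, baseline_sd=0.12, max_rate=1.0,
                valid_range=(-5.0, 5.0))
print(detect(normal, **settings))                      # no alarms
print(detect([v + 1.0 for v in normal], **settings))   # mean shift only
```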

The diagnosis process itself can take one of three primary forms, all of which can be driven by the online monitoring alarm events. One common approach uses a rule-based expert system to process the alarm events. Most engineers are familiar with the IF-THEN rules used in an expert system, and these can be effective in simple diagnostic scenarios. Alternatively, model-based reasoners and case-based reasoners are better suited to more complex scenarios involving multiple concurrent causes and overlapping symptoms. Automated diagnostics for critical plant equipment often fall into this complex category.
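A rule-based expert system can be sketched as a list of IF-THEN rules over the set of active alarms. The alarm names, rules and diagnoses below are hypothetical and are not engineering guidance. Note that when symptoms overlap, several rules fire with no ranking between them, which is the limitation the text attributes to this approach in complex scenarios.

```python
# Hypothetical IF-THEN rules: a rule fires when all of its required alarms
# are active. Names are illustrative only.
RULES = [
    ({"bearing_temp:mean_up", "lube_oil_temp:mean_up"}, "lube oil cooling problem"),
    ({"bearing_temp:mean_up", "vibration:variance_up"}, "bearing degradation"),
    ({"flow_sensor:out_of_range"}, "flow instrument failure"),
]

def fire_rules(active_alarms):
    """Return every diagnosis whose IF-part is satisfied by the active alarms."""
    return [dx for conditions, dx in RULES if conditions <= active_alarms]

print(fire_rules({"bearing_temp:mean_up", "lube_oil_temp:mean_up"}))
# Overlapping symptoms: two rules fire and the rules alone cannot rank them.
print(fire_rules({"bearing_temp:mean_up", "lube_oil_temp:mean_up",
                  "vibration:variance_up"}))
```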

As an example of a model-based approach, consider the diagnosis options for an observed increase in heat rate during operation of a combustion turbine. Figure 3 shows that an increase in heat rate will often be accompanied by one or more other symptoms, seven of which are identified on the bottom two rows of the diagram. Each of these symptoms can be activated or deactivated by the online anomaly detection system, for example by APR models and fault detectors. The top row of the diagram shows three possible causes for the increase in heat rate: compressor fouling, turbine inlet temperature control error and compressor discharge temperature measurement error. If the alarm states of the fault detectors are as illustrated by the yellow highlight, compressor fouling is the best-supported cause of the observed anomaly.
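The cause-and-symptom reasoning in this example can be sketched with a simple set-overlap score. The three cause names come from the text. The symptom labels `s1` to `s7` and their assignment to causes are placeholders, not the actual contents of Figure 3. The scoring rule (credit for explained symptoms, penalty for unexplained ones) is one hypothetical choice; commercial model-based reasoners are considerably more elaborate.

```python
# Hypothetical cause -> expected-symptom map in the spirit of Figure 3.
# Symptom labels s1..s7 are placeholders, not the figure's symptoms.
CAUSES = {
    "compressor fouling": {"heat_rate_up", "s1", "s2", "s3"},
    "turbine inlet temperature control error": {"heat_rate_up", "s3", "s4", "s5"},
    "compressor discharge temperature measurement error": {"heat_rate_up", "s5", "s6", "s7"},
}

def rank_causes(active):
    """Score each cause by expected symptoms that are active, minus active
    symptoms it does not explain, normalized by its symptom count."""
    scores = []
    for cause, expected in CAUSES.items():
        matched = len(expected & active)
        unexplained = len(active - expected)
        scores.append((cause, (matched - unexplained) / len(expected)))
    return sorted(scores, key=lambda cs: cs[1], reverse=True)

for cause, score in rank_causes({"heat_rate_up", "s1", "s2", "s3"}):
    print(f"{score:+.2f}  {cause}")
```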

Figure 3: Diagnosis Model
Diagnosis model for several combustion turbine problems having common symptoms.

Determining the remaining time available to take corrective action in response to an anomaly depends strongly on making the correct diagnosis. In a simple human example, a high core temperature might indicate the flu or, alternatively, appendicitis. When additional symptoms confirm appendicitis, a very different time horizon for corrective action applies. Lean M&D solutions therefore link the online diagnostic system with the appropriate online prognostic models.

Accurate APR methods provide a valuable resource for implementing a wide range of effective prognostic models useful for moving to condition-based and predictive maintenance strategies. By characterizing how a monitored system is moving away from a normal condition, these solutions provide a unique prognostic opportunity for many practical maintenance planning problems. In the simplest cases, data-driven prognostics can predict how long until a filter should be backwashed or changed, or how long until an operating limit will cause an alarm in the control room. Once again, the accuracy of the estimated values influences the result: an accurate prediction of the expected values provides an accurate measure of the rate and magnitude of the monitored system’s deviation from normal conditions.

Figure 4: Prognostic Methods
Predicting the time horizon for condition-based maintenance of a gas turbine compressor.

As an example of a prognostic model, consider the problem of determining when the compressor section of a gas turbine should be scheduled for a water wash to correct compressor fouling and the resulting loss in cycle efficiency. In the plots shown in Figure 4, an aircraft gas turbine progresses from an acceptable condition at the beginning of the data set to a maintenance-needed condition at the end. APR models are used to monitor the airflow and temperature parameters from the compressor inlet to the exhaust section. Remaining time until required maintenance is predicted using a combination of the abnormal change in static pressure aft of the compressor section and the abnormal change in exhaust temperature aft of the low-pressure turbine. When both conditions occur simultaneously, the diagnostic system alerts for a decrease in compressor section efficiency. The diagnostic alert activates the prognostic model for compressor efficiency loss, which begins tracking the degradation in efficiency and estimating the remaining number of flights before the monitored engine must be scheduled into a depot for inspection and maintenance.
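The hand-off from diagnosis to prognosis described above can be sketched as a small per-flight state machine. This is a hypothetical illustration: the class and parameter names are invented, the two alarm inputs stand for the static-pressure and exhaust-temperature detectors, and the diagnosis is assumed to stay latched once raised, which the text does not specify.

```python
class CompressorMonitor:
    """Hypothetical per-flight logic: diagnose compressor efficiency loss
    when both abnormal changes are present on the same flight, then feed
    the degradation signal to a prognostic tracker from that flight on."""

    def __init__(self):
        self.diagnosed_at = None   # flight number of the diagnostic alert
        self.tracked = []          # (flight, degradation) seen by prognostics

    def on_flight(self, flight, static_pressure_alarm, exhaust_temp_alarm,
                  degradation):
        # Diagnostic alert only when both conditions occur together.
        if (self.diagnosed_at is None and static_pressure_alarm
                and exhaust_temp_alarm):
            self.diagnosed_at = flight
        # Once diagnosed (latched), the prognostic model tracks every flight.
        if self.diagnosed_at is not None:
            self.tracked.append((flight, degradation))
        return self.diagnosed_at is not None

m = CompressorMonitor()
alarms = {1: (True, False), 2: (False, True), 3: (True, True),
          4: (True, True), 5: (False, True)}
for flight, (ps, egt) in alarms.items():
    m.on_flight(flight, ps, egt, degradation=0.1 * flight)
print(m.diagnosed_at, [f for f, _ in m.tracked])
```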

The power of these prognostic methods is evident in the four plots presented in Figure 4. In the upper left, the blue trace shows the turbine exhaust temperature measured at cruise for a series of flights. The temperature values have a wide dynamic range because each data point can be taken at a different altitude, ambient temperature and power lever angle setting. The corresponding APR expected values, shown in the lower left plot, track the observed values well. Computing a residual degradation signal from the observed and expected values, as shown in the upper right plot, reveals that the exhaust gas temperature is slowly creeping up from flight to flight.

At flight 177 in this data set, the diagnostic model alerts with a diagnosis of compressor efficiency loss. This activates the prognostic model to automatically evaluate and track the number of flights remaining before maintenance is required, as shown in the lower right plot. In this case, a perfect prediction would be a straight line from fifty-six (56) flights remaining at flight 177, the time of the initial diagnosis, to zero (0) flights remaining at flight 233, the known point at which this engine requires maintenance. As is often observed, the remaining-life predictions in this example are more uncertain at first and improve as the path of the degradation signal becomes better defined.
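The "perfect prediction" reference line is simple arithmetic: 233 − 177 = 56 flights at diagnosis, falling by one per flight to zero. The sketch below encodes that line and measures a predictor's error against it. The flight numbers come from the text; the predicted values in the example are hypothetical, not read from Figure 4.

```python
DIAGNOSIS_FLIGHT = 177    # initial diagnosis (from the text)
MAINTENANCE_FLIGHT = 233  # maintenance known to be required (from the text)

def ideal_remaining(flight):
    """Perfect prediction: 56 flights remaining at flight 177, 0 at 233."""
    return MAINTENANCE_FLIGHT - flight

def prediction_errors(predictions):
    """predictions: dict flight -> predicted flights remaining.
    Returns dict flight -> (predicted - ideal), in flight order."""
    return {f: p - ideal_remaining(f) for f, p in sorted(predictions.items())}

print(ideal_remaining(DIAGNOSIS_FLIGHT))      # 56
# Hypothetical predictions that start uncertain and converge.
print(prediction_errors({180: 60, 200: 34}))  # {180: 7, 200: 1}
```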

Lean M&D combines anomaly detection with advanced diagnostics and prognostics in an integrated, automated system that significantly reduces the resources needed for a central monitoring team of experts. This allows smaller power generators to enjoy the same remote monitoring benefits as larger generators that today can afford to retain expert staff for diagnostics and prognostics. All of this comes with a much lower initial investment and a lower overall cost of operations.


Author:
Randy Bickford is president of Expert Microsystems Inc.