How extreme failure‐data filtering leads to poor process safety

This paper describes both appropriate and inappropriate practices of filtering field failure data (FFD) prior to using the resulting failure count from the failure data to estimate failure rates. The underlying causes of inappropriate (extreme) filtering of FFD are examined, and the consequences (poor process safety) are discussed. An example is presented illustrating the benefits of using all real failures discovered in the field in the estimation of failure rates based on FFD.


1 | INTRODUCTION
Safety standards 1-3 mandate that end users collect and analyze field failure data (FFD) from their plants on an ongoing basis to estimate the failure rates that various devices actually achieve in the field.
These estimated failure rates are used to calculate achieved process safety metrics, which should then be compared to the theoretical design safety metrics to ascertain if the end-user's safety targets are being met in the field. FFD is used by manufacturers and some certification bodies to estimate failure rates for individual devices, for example, sensors, logic solvers, solenoids, actuators, valves, etc., which are published in safety manuals or functional safety certificates. Additionally, industrial consortia use FFD to produce failure rate ranges for device assemblies, for example, sensor assemblies or valve assemblies. 4 Some certification bodies and other consulting companies publish failure rate ranges 5,6 based on their own data analyses.
FFD comes in a variety of forms, levels of detail, and quality. To estimate failure rates in a valid way, it is necessary to identify, within the FFD, a subset of devices with similar technology operating in similar environments and maintained by similar maintenance cultures. The FFD for this subset of devices forms a homogeneous or nearly homogeneous dataset. These concepts are further defined and explained in Reference 7, another paper accompanying the panel discussion session. Given a homogeneous or nearly homogeneous dataset, the total failure count, n, and the total operating time attributable to the devices, T_op, are used to estimate the failure rate.
The smaller n, the smaller the estimated failure rate.
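To make the estimation step concrete, the sketch below computes the point estimate λ̂ = n / T_op together with a classical one-sided upper confidence bound. It is a minimal illustration only; the device counts and observation period are invented for the example, and the chi-squared bound is one common choice, not a procedure prescribed by this paper.

```python
from scipy.stats import chi2

def failure_rate_estimates(n, t_op_hours, alpha=0.05):
    """Point estimate and one-sided upper (1 - alpha) confidence
    bound for a constant failure rate, given n failures observed
    over t_op_hours of total device operating time."""
    lam_hat = n / t_op_hours  # maximum-likelihood point estimate
    # Classical chi-squared upper bound with 2(n + 1) degrees of freedom.
    lam_upper = chi2.ppf(1 - alpha, 2 * (n + 1)) / (2 * t_op_hours)
    return lam_hat, lam_upper

# Illustrative numbers only: 10 failures across 500 devices observed for 4 years.
n = 10
t_op = 500 * 4 * 8760.0  # total operating hours
lam_hat, lam_upper = failure_rate_estimates(n, t_op)
print(f"point estimate : {lam_hat:.2e} failures/hour")
print(f"95% upper bound: {lam_upper:.2e} failures/hour")
```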
Some filtering of the FFD is required. Not all field records should be used, and not all recorded failures should be counted in the number n. This filtering, or failure exclusion from data analysis, reduces the value of n and thus the estimated failure rate, and it may be entirely appropriate. However, filtering may be taken to such extremes that the resulting failure rate estimates for individual devices are unrealistically low. Unrealistically low published failure rates drive, in turn, unrealistic theoretical designs that cannot possibly achieve the required safety performance in the field, and the result is poor process safety that goes unrecognized for what it is.

2 | THE NATURE OF FIELD FAILURE DATA

FFD may take a variety of forms. Typically, an end-user's FFD will consist of proof test records, repair records, and maintenance records.
A manufacturer's field return database will include records for all devices returned to the manufacturer by end users. Failure data will come from tests and inspections performed by the manufacturer on the returned devices.
Computerized records lend themselves more readily to sorting, categorizing, and general exploration of the data. However, the authors have received FFD as a set of paper proof test reports, many filled in by hand, scanned into a PDF file that had to be examined one page at a time to identify failures and estimate operating times.
The level of detail of failure records is important. Is the device manufacturer and model recorded? Is the installation date known? Is the application specified, for example, does the mechanical device close-on-trip or open-on-trip? Is the mechanical device used in a safety application and, therefore, stationary for long periods, or is it used in an application where it is subject to relatively frequent motion? Is it deployed in a continuous process or a batch process?
When a failure is discovered, is the device simply replaced, or repaired and returned to service, or is the device failure subject to root cause analysis (RCA) to identify the underlying cause of the failure? All this information and more informs the appropriate failure data filtering process. However, in the authors' experience, it is more common for data records to contain rather sparse information.

3 | APPROPRIATE FAILURE DATA FILTERING
One author has worked with a highly detailed proof-test dataset from Savannah River Site (SRSite) for spring-operated pressure relief valves (SOPRVs). It contained information about the manufacturer/model, the last test date, the current test date, inspection notes, technicians' names, and valve details such as inlet and outlet sizes.
Every time a SOPRV was tested or retested, a separate test record was created, and every discovered failure was subject to an RCA detailed in a separate report, including, where possible, recommendations for changes that would reduce the occurrence of that failure in the future.
Such extensive record-keeping invariably contains records not associated with failures in the field. Now, clearly, if a SOPRV is discovered to have failed upon proof testing, is repaired, and then fails a proof (re)test immediately after repair, this last failure should not be counted as it did not occur in the field. This is an example of an appropriate exclusion of a failure from the failure count.
Similarly, if a manufacturer's field return database contains a record of a device returned because the customer ordered the wrong size or configured the device incorrectly for their application, these are not device failures, and these records should be excluded from data analysis.
As a rule, one can exclude from the failure count failure records that either represent no actual failure or represent a failure that did not occur in the field (where it would have potentially caused harm to personnel, plants, or the environment). This last point is important as it implies that any failure that occurred in the field should be retained in the failure count for the purposes of estimating device failure rates based on FFD.
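The exclusion rule lends itself to a mechanical statement. The following sketch uses hypothetical record fields (real FFD is rarely this clean) to keep every real field failure in the count while excluding records that represent no failure or a failure that did not occur in the field:

```python
# Hypothetical record structure for illustration; the field names
# are assumptions, not a schema from this paper.
records = [
    {"id": 1, "failure": True,  "context": "field"},        # failed in service: count
    {"id": 2, "failure": True,  "context": "post_repair"},  # failed retest right after repair: exclude
    {"id": 3, "failure": False, "context": "field"},        # wrong size ordered, no failure: exclude
]

def count_field_failures(records):
    """Count only records that represent a real failure that
    occurred in the field."""
    return sum(1 for r in records
               if r["failure"] and r["context"] == "field")

n = count_field_failures(records)  # -> 1
```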

4 | INAPPROPRIATE FAILURE DATA FILTERING
One author has visited various European plants and interviewed employees tasked with FFD collection and analysis and, on more than one occasion, has heard stories like this: "This last year we had ten field failures. But we only counted three because we deemed the other seven to be systematic failures, and the standards say we don't have to count those." This practice of excluding from data analysis failures that are known to have occurred in the field, on the basis of a binary classification scheme (one that classifies every failure as either random or systematic), rests on an interpretation of the definitions given in the standards.
This binary classification is the justification commonly offered by those who exclude such failures, citing passages in the standards such as:

• "… system failure rates arising from random hardware failures can be quantified with reasonable accuracy but those arising from systematic failures cannot be accurately statistically quantified …" (IEC 61508, 3.6.5, Note 2)

• "In determining safety integrity, all causes of random hardware and systematic failures which lead to an unsafe state can be included …" (IEC 61511, 3.2.68, Note 3)

ISA TR84 9 addresses similar issues, though not always in the same way. It states, "There is nothing in this guideline that precludes, replaces, …"

4.1 | What the standards intend: authors' opinion
It is the opinion of the authors that the standards intend to differentiate between failures that cannot be eliminated (called, in the standards, random failures) and failures that can and would be eliminated (called, in the standards, systematic failures). This would explain why ISA states that systematic failures should not be included in SIL calculations. Unfortunately, this interpretation is not made clear in the standards. After all, for the purposes of failure rate estimation, it makes no sense to exclude from the failure count failures that actually occurred and represented the potential for harm, compute a safety metric based on the resulting failure rate estimate, and then believe that one is actually achieving that level of process safety in the field.
The authors have had conversations with several IEC 61508, IEC 61511, and ISA84 committee members who agree with this viewpoint and believe that the standards should clarify that failures can be both random and systematic. Furthermore, one committee member categorically stated that failures due to causes such as corrosion and wear-out were never intended to be classified as systematic failures. This makes sense because these causes, while they may be improved upon, are unlikely to be permanently eliminated. Further, these failures can be accurately statistically quantified from the FFD, something the standards say is not possible for systematic failures.
Unfortunately, the lack of clarity about the classification of failures has had the unintended consequence of creating a loophole that allows some, albeit inappropriate, justification for extreme filtering of FFD to undercount failures that occurred in the field before estimating failure rates. Some analysts have exploited this loophole, whether intentionally or unintentionally. Consequently, it is not uncommon to find FFD analyses where the vast majority of field failures are excluded from the failure count on the grounds that the failures were only systematic and need not be included in the failure rate estimation. In these cases, this leads to estimated failure rates that are unrealistically low and process safety performance in the field that is much poorer than that indicated by the erroneous calculations that used the unrealistic failure rates.
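The quantitative impact of such undercounting is easy to demonstrate. Using the common 1oo1 approximation PFDavg ≈ λ_DU · TI / 2 and illustrative numbers (the device population, observation period, and test interval below are assumptions, not data from this paper), excluding seven of ten real dangerous undetected failures, as in the anecdote above, makes a device appear roughly three times safer than it actually is:

```python
T_OP = 100 * 5 * 8760.0  # illustrative: 100 devices observed for 5 years, in hours
TI = 8760.0              # illustrative proof test interval: 1 year

def pfd_avg(n_failures, t_op=T_OP, ti=TI):
    """PFDavg ~= lambda_DU * TI / 2 for a simple 1oo1 device."""
    lam_du = n_failures / t_op
    return lam_du * ti / 2

honest  = pfd_avg(10)  # all real field failures counted
extreme = pfd_avg(3)   # seven failures excluded as "systematic"

print(f"honest  PFDavg: {honest:.1e}")   # ~1.0e-2 -> SIL 1 performance
print(f"extreme PFDavg: {extreme:.1e}")  # ~3.0e-3 -> appears to be SIL 2
```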

5 | AN APPROACH TO FAILURE RATE ESTIMATION BASED ON A REALISTIC FAILURE DATA CLASSIFICATION SCHEME
While SRSite originally classified failures as random or systematic (from SRSite's viewpoint), they always counted all real field failures in their failure rate estimations. However, they later performed data analysis 10 based on a different failure classification scheme. In this new scheme, they classified failures as those that they could not practically influence and those that they could practically influence. They had been using RCA to improve their end-user practices and, hopefully, improve process safety performance. The RCAs allowed them to propose specific changes to their maintenance practices to accomplish this improvement. The new classification scheme allowed them to quantify the safety improvement that the changes to practice had wrought.
Specifically, two different, nearly homogeneous datasets were formed, one for each of the two time periods corresponding to the two maintenance cultures, and a failure rate was estimated from each (see Table A1).
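A minimal sketch of that comparison, with hypothetical failure counts and operating times standing in for the two SRSite datasets (the actual values appear in Table A1):

```python
# Hypothetical inputs for the two periods; not SRSite's actual data.
periods = {
    "before_practice_changes": {"n": 24, "t_op_hours": 2.0e6},
    "after_practice_changes":  {"n": 9,  "t_op_hours": 2.5e6},
}

rates = {name: p["n"] / p["t_op_hours"] for name, p in periods.items()}
improvement = rates["before_practice_changes"] / rates["after_practice_changes"]

for name, lam in rates.items():
    print(f"{name}: {lam:.2e} failures/hour")
print(f"improvement factor: {improvement:.1f}x")  # quantified effect of the RCA-driven changes
```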

6 | CONCLUSIONS
The practice of extreme filtering of FFD results largely from classifying many failures discovered in the field as systematic failures and excluding them from the failure count used for estimating failure rates. This practice violates the intent of the safety standards, leads to unrealistically low failure rate estimates, and deludes the end user into believing that process safety in the field is much higher than is actually being achieved.

A.1. | The valve that failed three proof tests

The SRSite facility has its own on-site ASME Board Certified valve shop. It is current practice at SRSite to perform an RCA on any SOPRV that fails its proof test. Among the SRSite RCA reports, there is an interesting example in which a hard-seat, stainless-steel trim SOPRV failed its proof test three times.
When first tested in its new ("as arrived") state, the SOPRV failed to open below 150% of set pressure on its "first pop" but popped at set pressure the next three times, which is typical of a random failure caused by stainless steel adhesions. According to ASME National Board regulations, it could be tagged and installed in the field. After a year in the field, it was proof tested with the same results as when it was tested as new. It was retagged and returned to the field. The first two failures occurred early in the data collection process, when the records were less detailed; from those early records alone it cannot be determined if the data represents one valve that failed three times, three valves that failed once each, or two valves failing in some combination of once and twice. So, it will not always be the case that analysts have the information required to correctly classify a failure as random or systematic. Therefore, all real failures should be counted in the failure rate analysis until it is proven that systematic issues have been eliminated, in which case they will no longer appear in the FFD.

A.2. | The problem with glue
SRSite received a number of SOPRVs of the same manufacturer, model, and lot number. SRSite proof tests all SOPRVs in their "as arrived" state before first installation. Of this group of SOPRVs, two valves were proof tested as fail-to-open, a dangerous undetected failure in a SOPRV. Disassembly showed that the seat and disc were sealed together by a foreign substance. The problem was reported to the manufacturer, which claims in its literature to test every valve before it leaves the factory. One author consulted a mechanical engineering colleague with extensive valve experience, who said, "I know that manufacturer. It tests every valve to correctly fix the set pressure, which is then locked with a locking nut. Then glue is applied to the locking nut so that the end user can't adjust the set pressure." Clearly, the manufacturer does not test the SOPRV after the glue is applied and has cured. There are several (random) reasons why too much glue might be applied: the glue gun trigger may be pulled for too long, or the glue reservoir may have been overfilled, so that even if the trigger is pulled for an appropriate time, too much glue will nevertheless be applied until the glue again sits at or below the maximum fill line of the reservoir. If this excess glue drips down into the valve, it may come to rest on the disc and seat, which may or may not actually become glued together.

Is this failure random or systematic? It happens because of random events. But it is also a failure that could be eliminated if the manufacturer would allow the glue to cure and then retest all valves before shipping, thereby doubling the manufacturer's testing costs. Or the end user could test every SOPRV upon arrival (assuming the capability to do so) to eliminate the glued-shut valves. Elimination is possible, but not always practical. This story highlights the fact that failures can have both random and systematic characteristics.
Root Cause Analysis (RCA) was needed for SRSite to discover the underlying failure cause and eliminate it, making this a systematic failure for SRSite as an end user. But many end users would have to treat this as a random failure.

A.3. | The wrong screw
SRSite purchased several identical SOPRVs, which were all of stainless steel construction except for a single carbon steel screw. After several of these valves failed due to corrosion of the carbon steel screws, SRSite approached the manufacturer and requested stainless steel screws for the valves. The manufacturer would not comply.
It would seem that these failures are systematic and could be eliminated by replacing the installed valves with different valves that were all stainless steel. However, SRSite policy requires that a given location can only be serviced by an approved valve (manufacturer and model) and that any replacement must be with an identical valve.
Replacement with a different valve involves a lengthy approval process. Thus, using a different valve was not an immediate option.
During the approval process, should the valve failures be counted as random or systematic? They could not be eliminated by the end user (until a new valve was approved) and were therefore counted as random. SRSite did take the precaution of shortening the proof test interval, a significant consideration because SRSite policy does not permit the use of block valves, so more frequent proof testing required more frequent process shutdowns.
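Shortening the proof test interval is a quantifiable mitigation. Under the same 1oo1 approximation PFDavg ≈ λ_DU · TI / 2 used earlier, the interval that holds PFDavg at a fixed target scales inversely with the failure rate; the numbers below are illustrative assumptions, not SRSite data:

```python
def required_test_interval(lam_du, pfd_target):
    """Invert PFDavg ~= lam_du * TI / 2 to find the proof test
    interval (hours) that holds PFDavg at the target."""
    return 2 * pfd_target / lam_du

# Illustrative: the carbon steel screw corrosion doubles the
# estimated DU failure rate.
lam_original = 1.0e-6        # failures/hour, corrosion failures excluded
lam_with_corrosion = 2.0e-6  # failures/hour, corrosion failures counted

pfd_target = 5.0e-3  # assumed SIL 2 design target
ti_old = required_test_interval(lam_original, pfd_target)        # 10,000 h
ti_new = required_test_interval(lam_with_corrosion, pfd_target)  #  5,000 h
print(f"test interval must shrink from {ti_old:.0f} h to {ti_new:.0f} h")
```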

A.4. | Summary
Classifying a field failure as random or systematic is not a straightforward undertaking. The mere notation of a potential systematic cause in a failure report does not mean that it was the underlying cause of failure. Accurately classifying failures almost always requires RCA, and often significant record-keeping is required to "connect the dots." The point of view must also be considered: it is not sufficient to ask whether a failure can be eliminated; one must also ask by whom it can be eliminated and at what cost. Failures that could in principle be eliminated but in practice cannot be, or simply are not (for whatever reason), should be counted for the purposes of failure rate estimation.
TABLE A1 Summary of SRSite SOPRV proof test data used to estimate both theoretical and achieved dangerous undetected (DU) failure rates in two different time periods representing two different maintenance cultures.