Improving reliability and safety of airborne wind energy systems

Airborne wind energy systems use tethered flying devices to harvest wind energy beyond the height range accessible to tower-based wind turbines. Current commercial prototypes have reached power ratings of up to several hundred kilowatts, and companies are aiming at long-term operation in relevant environments. As a consequence, system reliability, operational robustness, and safety have become crucially important aspects of system development. In this study, we analyze the reliability and safety of a 100-kW technology development platform with the objective of achieving continuous automatic operation. We first outline the different components of the kite power system and its operational modes. In the next step, we identify failure modes, their causes, and effects by means of failure mode and effects analysis (FMEA) and fault tree analysis (FTA). Potentially hazardous situations and mechanisms which can render the system nonoperational are identified, and mitigation measures are proposed. We find that the majority of these measures can be performed by a failure detection, isolation, and recovery (FDIR) system for which we present a hierarchical architecture adapted from

consensus is that safe and robust operation with a sufficient degree of autonomy is a prerequisite for successful market introduction and public acceptance. 9 The importance of demonstrating reliable long-term operation of airborne wind energy systems in a relevant wind environment has also been confirmed by a recently commissioned study for the European Commission. 10 To our knowledge, none of the commercial prototypes has been operated continuously for more than a few days.
2][13] Failure mode and effects analysis (FMEA) is a "bottom-up" analytical method that is used in the design phase to map and examine failures of individual components and to trace forward the potential effects on the performance of the entire system. FMEA is widely used in automotive and aerospace engineering as well as in many other domains. The method has also been adopted for wind turbines. 14 Fault tree analysis (FTA) is the reverse of FMEA, a "top-down" deductive analysis, aiming at the identification and analysis of conditions that lead to a particular system failure, commonly the catastrophic event. Failure detection, isolation, and recovery (FDIR) is a technique to monitor the system during operation, identify faults that occur, and pinpoint the type of fault and its location to isolate it and to take appropriate recovery actions.
Only a few studies have addressed the reliability and safety of AWE systems. Kruiff and Ruiterkamp 15 outline the civil aviation standards and design processes that are applied by Ampyx Power B.V. for rigid-wing AWE system development. Salma et al 16 describe the aviation-related risks introduced by AWE systems and give an overview of existing and expected regulations for AWE systems. Stoeckle 17 proposes an FDIR approach for autonomous parafoils that resemble kites with a suspended control unit. Friedl 18 and Friedl et al 19 investigate means to augment the flight control system using an algorithm that detects potentially hazardous situations and reconfigures the system to ensure safe operation. Glass 20 reviews the relevant wind turbine and aviation standards and suggests an initial framework for a set of standard wind conditions for the certification of airborne wind turbines. 21 No study to date has offered a complete methodology on improving the safety and reliability level of AWE systems.
The present study proposes a systematic approach for AWE system development in order to reach the required reliability and safety levels. For this purpose, a set of requirements for an FDIR system is defined. These requirements are gathered by a reliability analysis using FMEA and FTA methods. The obtained requirements in fact mitigate the failure cases of the system. Although we present the methodology for a specific AWE system, the flexible-wing kite power system of Delft University of Technology and Kitepower B.V., it is generic and can be applied to different types of AWE systems. The paper is organized as follows. Section 2 outlines the functional components and modes of operation of the system. Section 3 describes the systematic safety assessment and improvement using FMEA and FTA, complementing this by an FDIR system as an integral part of the fault management strategy. In Section 4, the achieved results are presented, and Section 5 finalizes the study with conclusions.

SYSTEM DESCRIPTION
In this section, we describe the technology demonstrator developed by Delft University of Technology and operated on a regular basis from 2010 to 2015. 22,23 This platform has been designed for pumping cycle operation of a lightweight flexible-membrane wing with an average traction power of 18 kW during reel-out of the tether. Depending on the kite used, this platform achieved a mechanical net power of up to 7 kW. 23 From 2016 onwards, the technology base has also been used as a starting point for the commercial development of a scaled-up version by the spin-off company Kitepower B.V. 24 The description mainly captures the development status at the time the technology was transferred to the commercial team. Since then, the development has progressed to second and even third component generations to accommodate the stepwise scaling of the system to an electrical net power of 100 kW. Important adaptations of the commercial development are included in the description.

Functional components
The functional system components are illustrated in Figure 1. The traction force is generated by a flexible-membrane wing that is steered by a kite control unit (KCU). This remote-controlled cable robot is suspended in the rear bridle line system and also modulates the force level by adjusting the pitch angle of the wing. The airborne subsystem of wing, bridle line system, and KCU is denoted as kite and has been described in detail by Oehler and Schmehl. 25 The tether is deployed from the drum/generator module of the ground station. A continuous positive net power output of the system is achieved by operating the kite in pumping cycles, alternating between reel-out and reel-in of the tether. While reeling out, the kite is flown in crosswind maneuvers to maximize the traction force and the generated energy. 26 These figure-of-eight flight patterns are indicated in Figure 1. To reel in, the maneuvers are discontinued and the kite is depowered by pitching the wing to lower its angle of attack, which substantially reduces the traction force and the required energy to retract the kite. The part of the generated electrical energy that is used to retract the kite is buffered with a rechargeable battery. The described working principle crucially relies on active control of tether reeling and kite flight path. The individual components of the system are detailed in the following.

Wing
As illustrated in Figures 1 and 2, the wing consists of a fabric canopy and an inflated tubular frame, which combines a bow-shaped leading edge tube with several connected strut tubes. The distributed aerodynamic load acting on the flying wing is transferred to the tether by a bridle line system. This particular design is derived from leading edge inflatable (LEI) kites that, in smaller sizes, are popular for kite boarding. Rigid chordwise reinforcements have been added to increase the maximum wing loading of the flexible membrane structure. The leading edge tube has both an aerodynamic and a structural function. On the one hand, the pressurized tube defines the radius of the leading edge which has a substantial influence on the aerodynamic characteristics of the wing, 27 and on the other hand, the tubular frame defines the shape of the unloaded wing.
During flight, the wing deforms substantially and its shape is mainly controlled by the geometry of the bridle line system. Next to its main function of generating the traction force, the wing also acts as a morphing aerodynamic control surface. An asymmetric actuation of the rear bridle lines leads to a twist deformation of the wing which induces both a side force and a yaw moment that enable the kite to fly a turning maneuver. 28,29 Symmetric actuation, on the other hand, modulates the traction force by adjusting the pitch angle of the wing and by that its angle of attack.
The degree of symmetric actuation is quantified by the depower setting. Because this also shifts the aerodynamic load in chordwise direction, the entire kite pitches around the bridle point. 25 The depicted LEI V3 kite with a 25-m² wing surface area can structurally support an aerodynamic load of up to 8 kN. The commercial development includes scaled wing prototypes of 25, 40, 60, and 100 m² surface area, with the aim to converge on a size below 80 m² for the 100-kW system. These kites use fabric materials with higher durability and ultraviolet (UV) coating for extended lifetime. Alternative designs without inflatable leading edge tube are also investigated.

Bridle line system
The leading edge tube and the front sections of the strut tubes are supported by the front bridle lines. The left and right line branches transfer the major part of the aerodynamic load and connect to the left and right power lines, respectively. These bypass the KCU and attach directly to the tether at the bridle point. The trailing edge of the wing and its tips are supported by the rear bridle lines. The two line branches connect via pulleys to the two steering lines. Together with the steering and depower tapes that are deployed from the KCU, the two steering lines form two connected line loops that are used for asymmetric or symmetric actuation of the wing. The KCU is connected to the bridle point by a short line segment. Depending on the kite, the bridle line system may include additional pulleys at bridle split points to allow the line geometry to passively adjust to a varying load distribution and shape of the wing. Just below the bridle point, the tether incorporates a weak link and a separate cable cutter. While the weak link breaks at a predefined tether force to avoid overload and possible damage of the system, the cable cutter severs the tether in an emergency situation on command. In case of such a passive or active separation of the kite from the tether, the safety line is used to land the kite in tethered parachute or paraglide mode. This line is not tensioned during normal operation, connecting the center of the leading edge tube directly with the tether below the weak link. With the bridle point separated from the tether, the kite is instantly depowered. The relatively heavy KCU swings below the wing, which can be retracted to the ground station in a stable payload flight configuration at fairly low flight speed. 17

Kite control unit
Central components of the KCU are the actuation drive trains comprising steering and depower motors, gearboxes, tape drums, and depower brake. Tapes are used instead of lines because of the better reeling behavior and lower layer build-up on the drums. The maximum unloaded reeling speed for both motors is 0.4 m/s. For redundant communication with the ground station, the KCU relies on three separate wireless links. The main link uses a 5-GHz dipolar directional antenna and is backed up by a slower 2.4-GHz serial link. The ground control can use both links interchangeably, retaining full automatic control functionality. Additionally, a direct manual remote control of the KCU can be established via a 2.4-GHz link. The on-board voltage of 11.6 V is provided by a rechargeable battery module. The KCU uses two onboard computers.
A Micromint Electrum motherboard is used for tasks that are not too time-critical, like communications, while motor control is performed by a faster motherboard, developed at Delft University of Technology. All components are mounted in an aluminum chassis, enclosed by two watertight 5-mm high-density polyethylene (HDPE) covers, an additional foam padding, and a fabric outer hull. The commercial development includes second and third generation control units to meet the increased force levels of the 100-kW system. 30,31 These units are equipped with an airborne wind turbine to power all onboard systems.

Tether
The function of the tether is to transfer the traction force of the kite to the ground station. The 4-mm rope is made of Dyneema SK75, has a total length of 1 km, a weight of 0.8 kg per 100 m, a mean breaking strength of 13 kN, and a special coating to enhance its lifetime under the cyclic bending load caused by the reeling on and off the drum. 32 The tether is a major safety-critical system component. Because it is not redundant, it is designed according to a safe-life philosophy and has to be replaced when reaching a certain number of load cycles or a certain age. The tether of the commercial 100-kW system has a diameter of 14 mm and transfers a nominal traction force of 50 kN.

Ground station
The ground station uses a drum/generator module to convert the traction power of the outbound, powered kite into electrical energy and to retract the depowered kite, consuming some of the generated energy. The electrical machine of this regenerative winch has a nominal power of 18 kW and connects to the drum via a gearbox with fixed transmission ratio. As shown in Figure 1, the tether enters the ground station through a fixed swivel head and pulley guiding system. For systematic, layer-by-layer reeling on and off the drum, the entire winch is mounted on a sled that is moved transverse to the incoming tether. The alternating linear motion of the sled is coupled directly to the rotational motion of the drum. Except for the separate measurement mast and optional launch mast, the ground station houses all other ground components such as the control center, the rechargeable battery module, and the power electronics. The commercial system uses an electrical machine with a nominal power of 180 kW.

Distributed sensor network
A network of distributed sensors is used to measure environmental conditions and operational parameters of the system. 23 However, only some of this information is required for automatic operation; the rest serves research and development purposes. We concisely describe here the sensor data that is useful for fault detection. The wind speed and direction 6 m above ground is measured by a sensor mounted at the tip of a mast, which transfers its data to the control center wirelessly. The elevation and azimuth angles of the tether and the traction force are measured at the swivel head where the tether leaves the ground station. The KCU is equipped with potentiometers and temperature sensors for both the steering and depower motors; also, the battery voltage is measured and recorded. As illustrated in Figure 2, the wing is equipped with a sensor unit comprising a global positioning system (GPS) receiver and inertial measurement unit (IMU). Because the wing deforms under load, these sensors may produce misleading data even though the sensors themselves work correctly.

Winch controller
The winch controller modulates the reeling speed of the tether to maximize the energy output and at the same time ensure reliable and safe operation of the system. A baseline strategy for AWE systems in pumping cycle operation using crosswind maneuvers is to reel out at roughly one third of the wind speed 26 or slightly faster and reel in as fast as the depower capability of the specific kite design allows. For cost-competitive and resource-efficient system designs, the nominal tether force during reel-out at the nominal wind speed is close to the maximum allowed value.
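The one-third rule can be motivated by an idealized argument. The following is a sketch under simplifying assumptions that are not part of the system description: the tether is aligned with the wind, the aerodynamic coefficients are constant, and the reel-in phase is ignored. The symbols $v_w$ (wind speed), $v_r$ (reel-out speed), $f = v_r / v_w$ (reeling factor), $F_t$ (traction force), and $P$ (traction power) are introduced here for illustration only.

```latex
\begin{align}
F_t &\propto (v_w - v_r)^2 = v_w^2 \, (1 - f)^2, \\
P &= F_t \, v_r \propto v_w^3 \, f \, (1 - f)^2, \\
\frac{\mathrm{d}P}{\mathrm{d}f} &\propto (1 - f)(1 - 3f) = 0
\quad \Rightarrow \quad f = \tfrac{1}{3}.
\end{align}
```

The root $f = 1$ corresponds to zero traction force, so the power maximum lies at $f = 1/3$. A nonzero elevation angle and the energy spent during reel-in modify this optimum, which is why the value serves as a baseline only.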
To avoid overloading the system due to natural fluctuations of the wind speed, we use set values for both reeling speed and maximum tether force. During reel-out, the set value for the speed is tracked unless the maximum tether force is exceeded. In this case, the reeling speed is increased to track the set value of the force. During reel-in, a different combination of set values is used. Of particular importance is to transition between the set values gradually when switching the reeling direction.
"PointToZenith" the termination of these crosswind maneuvers and redirection of the kite to point towards the zenith, "Depower" the retraction of the kite with reduced angle of attack, "Power" the increase of the angle of attack, and "Intermediate" a diving maneuver to adjust the elevation angle of the tether to its value during the traction phase. The system can only be in one state at a time, and switching conditions are clearly defined. When reaching the switch criteria for a certain flight phase, the path planner updates the desired system state and by that initiates the next flight phase. When switching states, the flight path planner sets one or more new target points on the unit sphere around the ground station, adapts the desired depower setting, and issues a certain set force to the winch controller.
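The state switching of the flight path planner can be illustrated with a small finite-state machine. This is a minimal sketch and not the authors' implementation: the state name `REEL_OUT` for the crosswind traction phase, the cyclic order of the states, the `Telemetry` fields, and all numeric switch criteria are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict


class FlightState(Enum):
    REEL_OUT = auto()          # hypothetical name for the crosswind traction phase
    POINT_TO_ZENITH = auto()
    DEPOWER = auto()
    POWER = auto()
    INTERMEDIATE = auto()


@dataclass
class Telemetry:
    tether_length_m: float
    elevation_deg: float
    depower_setting: float


# Assumed cyclic order of the flight phases within one pumping cycle.
NEXT_STATE: Dict[FlightState, FlightState] = {
    FlightState.REEL_OUT: FlightState.POINT_TO_ZENITH,
    FlightState.POINT_TO_ZENITH: FlightState.DEPOWER,
    FlightState.DEPOWER: FlightState.POWER,
    FlightState.POWER: FlightState.INTERMEDIATE,
    FlightState.INTERMEDIATE: FlightState.REEL_OUT,
}

# Hypothetical switch criteria: the condition under which each state is left.
SWITCH_CRITERIA: Dict[FlightState, Callable[[Telemetry], bool]] = {
    FlightState.REEL_OUT: lambda t: t.tether_length_m >= 600.0,
    FlightState.POINT_TO_ZENITH: lambda t: t.elevation_deg >= 70.0,
    FlightState.DEPOWER: lambda t: t.tether_length_m <= 300.0,
    FlightState.POWER: lambda t: t.depower_setting <= 0.2,
    FlightState.INTERMEDIATE: lambda t: t.elevation_deg <= 30.0,
}


class FlightPathPlanner:
    """Holds exactly one active state and advances at most one state per update."""

    def __init__(self, initial: FlightState = FlightState.REEL_OUT) -> None:
        self.state = initial

    def update(self, telemetry: Telemetry) -> FlightState:
        if SWITCH_CRITERIA[self.state](telemetry):
            self.state = NEXT_STATE[self.state]
        return self.state
```

Keeping the transition table and the switch criteria in two separate dictionaries makes it explicit that the system is in one state at a time and that every transition has a single, inspectable condition.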
The flight path controller is only active during system states with more than one target point, for example, during the figure-of-eight maneuvers of the traction phase. Its task is to issue only one of those points at a time and to switch to the next one when certain conditions are met. In order to achieve the optimal pulling force in varying wind conditions, a measurement of the prevailing wind speed is used to calculate the desired elevation angle. The flight path controller has the authority to add a certain offset elevation to the fixed target points for optimizing the pulling force. With the flight path planner not only issuing settings for depower and winch control but also, assisted by the flight path controller, setting a target point on the unit sphere, it is the task of the course controller to steer the kite towards this target point. For this purpose, the course controller calculates the desired course using great circle navigation on the unit sphere 18 and the heading required to fly this course. The actual heading, estimated or measured, is then compared with the desired heading, and an anti-windup PID controller is used to minimize the error.
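The two ingredients of the course controller, great circle navigation and an anti-windup PID loop, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the controller used on the system: the angle convention (course measured from the direction of increasing elevation towards increasing azimuth), the conditional-integration anti-windup scheme, and all names and gains are chosen here for illustration.

```python
import math


def great_circle_course(az: float, el: float, az_t: float, el_t: float) -> float:
    """Initial great-circle course (rad) from (az, el) to target (az_t, el_t).

    Azimuth plays the role of longitude and elevation that of latitude on the
    unit sphere around the ground station (assumed convention).
    """
    d_az = az_t - az
    y = math.sin(d_az) * math.cos(el_t)
    x = (math.cos(el) * math.sin(el_t)
         - math.sin(el) * math.cos(el_t) * math.cos(d_az))
    return math.atan2(y, x)


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


class AntiWindupPID:
    """PID controller that freezes the integrator while the output saturates."""

    def __init__(self, kp: float, ki: float, kd: float, u_max: float) -> None:
        self.kp, self.ki, self.kd, self.u_max = kp, ki, kd, u_max
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float, dt: float) -> float:
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        u = self.kp * error + self.ki * candidate + self.kd * derivative
        if abs(u) <= self.u_max:
            self.integral = candidate  # accept integration only when unsaturated
        else:
            u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.u_max, min(self.u_max, u))
```

A control step would then compute `error = wrap_angle(desired_heading - actual_heading)` and pass it to `AntiWindupPID.step`. Wrapping the error keeps the controller from commanding a turn the long way around.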

Distributed software architecture
The modular software architecture accounts for the fact that the hardware components of the control system are distributed over the different parts of the kite power system. For example, the two computers in the KCU are connected with the three computers in the ground station via wireless links. For this reason, an accurate timing of the communication between the distributed hardware components is of crucial importance.
During early flight tests, we observed unstable control behavior when the latency between a measurement and the corresponding reaction of an actuator exceeded 100 ms. 35 To address this, we chose a Linux tuned for low latency as the main operating system. To stay within the maximum tolerable latency, the time budget of each component is precisely calculated based on its technical specifications. A typical example is an IMU signal from the sensor unit mounted on the wing. Such a measurement can take up to 20 ms, generating a signal which is transferred to the Micromint Electrum motherboard of the KCU on a wire (5 ms), wirelessly sent to the ground station (15 ms), processed by the Kite State Estimator (5 ms) and the Flight Path Controller (15 ms), wirelessly sent back to the motor control motherboard of the KCU, and transmitted to the steering motor controller (20 ms). Except for the winch control, which is subject to firm real-time requirements, communication between the distributed software components is realized via the transport layer ZeroMQ. 36 This message library is easy to apply, supports the use of various programming languages, and its publish-subscribe pattern is well-suited for distributed designs. In combination with the flexible and straightforward serialization library Google Protocol Buffers, 37 the required time budget is met.
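The latency budget of the IMU example can be checked with a few lines of arithmetic. The sketch below uses only the stage durations quoted above. The text gives no separate figure for the wireless return path, so the final 20 ms is read here as covering both the return transmission and the hand-over to the steering motor controller, which is an interpretation rather than a stated fact. The stage labels are likewise illustrative.

```python
MAX_LATENCY_MS = 100.0  # latency above which unstable control was observed

# Stage durations in milliseconds, as quoted for the IMU signal path.
STAGES_MS = {
    "IMU measurement": 20.0,
    "wire to KCU motherboard": 5.0,
    "wireless link to ground station": 15.0,
    "Kite State Estimator": 5.0,
    "Flight Path Controller": 15.0,
    # Assumed to include the wireless return path (interpretation).
    "return link and steering motor controller": 20.0,
}


def total_latency_ms(stages: dict) -> float:
    """Sum of all stage durations along the signal path."""
    return sum(stages.values())


def remaining_budget_ms(stages: dict, limit_ms: float = MAX_LATENCY_MS) -> float:
    """Headroom left before the latency limit is reached."""
    return limit_ms - total_latency_ms(stages)


if __name__ == "__main__":
    print(f"total: {total_latency_ms(STAGES_MS):.0f} ms, "
          f"headroom: {remaining_budget_ms(STAGES_MS):.0f} ms")
```

Under this reading, the listed stages add up to 80 ms, which leaves 20 ms of headroom below the 100-ms limit.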

Modes of operation
The fundamental operational phases of an AWE system are launching, energy harvesting, and landing. These phases are adjusted to the prevailing wind conditions. For example, the kite is launched only when a certain minimum wind speed, the cut-in speed, is exceeded. To maximize the net energy output and to ensure a safe operation of the system, the pumping operation is adjusted for each cycle to the wind speed profile. When exceeding the maximum wind speed, the cut-out speed, the crosswind maneuvers are discontinued and the kite is steered towards a static flight position. This parking mode can also be initiated in reaction to potentially harmful weather conditions or other external influences. Parking is also one of the possible reactions to operational anomalies which the system may detect while continuously monitoring its health state. To be consistent with literature, we will adhere to the following terminology 38: a fault is a defect of a component or a system, a failure is a state of not meeting the defined objective, a hazard is any source of potential damage, a malfunction is the state of functioning differently than intended, and a mitigation measure is the action for reducing the severity or probability of an undesired event. The wind window is the quarter spherical region downwind of an observer at the ground station in which the kite can be flown in a controlled way. In the following, we first propose a zoning concept for pumping kite power systems and then detail the different modes of operation.

Zoning concept
How the specific operation of two or more pumping kite power systems affects the use of airspace and land has been analyzed theoretically by Faggiani and Schmehl 39 for flexible wing systems and by Licitra 40 for rigid wing systems. The zoning concept for the commercial 100-kW system of Kitepower B.V. is depicted in Figure 4. The operational zone covers the volume swept by the kite and the tether during normal pumping cycle operation. The flight zone, on the other hand, covers the larger volume in which the kite and tether may fly during launching, landing, and parking.
The zone also includes an additional safety margin to cover deviations from the normal flight path. On the ground, the danger zone is accessible only to experienced personnel. In the surrounding safety zone, people, animals, or light transportation are allowed but there has to be an awareness of the flight operations above. Accordingly, the safety zone excludes busy roads, railways, or open water. It is important to consider that this zoning concept is only a first proposal that is based on a decade of operational experience with a single system. Especially the joint operation of multiple systems in a park configuration is subject to continued research. The zoning concept will be strongly affected by the certification and regulation processes required for the commercial deployment of the kite power system. 16

Launching
The standard procedure to start up the kite power system is a winch launch of the kite. For this purpose, the wing is placed with its trailing edge on the ground at some distance downwind of the ground station. For the technology demonstrator of the university research group with a maximum wing size of 25 m², this was done by a ground crew, which also held the wing in position until take-off. For the commercial system, the wing is retained by a ground anchoring system. In a short prelaunch procedure, the tether and bridle line system is first tensioned by the winch, then the wing is released automatically and pulled against the wind direction to take off. To integrate launching and landing compactly with the ground station, experimental mast- and drone-based techniques have been investigated. 22,41

Normal operation
As long as the system does not detect any faults, failures, or malfunctions, the health state is set to normal operation and the flight path planner commands pumping operation by cycling through the system states illustrated in Figure 3A. As described in Section 2.1.8, the planned path is adjusted for each cycle depending on the expected wind resource.

Restricted operation
Restricted operation is the only health state that allows pumping operation even after detecting a fault. The issued restriction can relate to different system components, depending on the fault. If, for example, the standard deviation of the wind speed exceeds a limiting value, the set force for the winch controller is reduced for safety reasons. In case of unusually high temperatures of the steering motors, the course controller gains can be adjusted in such a way that the load on the motors is reduced. This can be done, for example, by flying larger figure-of-eight maneuvers with larger turning radius.
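The wind-gustiness restriction can be sketched as a simple rule on a window of recent wind speed samples. This is an illustrative sketch only: the function name, the window contents, the limiting standard deviation, and the reduction factor are hypothetical and are not values used on the system.

```python
from statistics import stdev
from typing import Sequence


def restricted_set_force(
    wind_speeds_ms: Sequence[float],
    nominal_force_n: float,
    std_limit_ms: float,
    reduction_factor: float,
) -> float:
    """Return the set force for the winch controller.

    If the sample standard deviation of the recent wind speed measurements
    exceeds the limit, the nominal set force is scaled down.
    """
    if len(wind_speeds_ms) < 2:
        return nominal_force_n  # not enough data to estimate gustiness
    if stdev(wind_speeds_ms) > std_limit_ms:
        return nominal_force_n * reduction_factor
    return nominal_force_n
```

For example, `restricted_set_force([4.0, 12.0, 5.0, 11.0], 3000.0, 1.5, 0.5)` returns the reduced force because the samples scatter strongly around their mean, whereas a steady window leaves the nominal value untouched.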

Parking
The kite is maneuvered into a parking position by terminating the crosswind maneuvers and steering the wing to point towards zenith. With the flight speed dropping to zero, the traction force of the wing also reduces substantially. This force and the elevation angle of the tether can be controlled with the depower setting of the kite. The parking maneuver is very similar to the maneuver executed during system state PointToZenith that follows the reel-out phase of a pumping cycle, as described in Section 2.1.8. Temporarily parking the kite can be useful, for example, to avoid landing and relaunching in case of a passing thunderstorm. Parking can also be triggered by the FDIR system as a reaction to an anomaly. This makes sense only for faults that can potentially cause a malfunction of the system, but not a failure. Unlike a failure, a malfunction disappears with time, such that pumping operation can be resumed. An example would be a failing dump load module of the ground station which can cause the battery voltage to exceed a threshold. Another example would be a drop of the power level of the KCU below a threshold. Several options have been investigated to authorize the KCU to park the kite autonomously if no steering inputs are received from the ground station. This autonomous fallback into a "fail-safe" state would increase the overall safety level by covering a worst case of losing the wireless connection to the control center.
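One way to realize such an autonomous fallback is a watchdog timer on the KCU that requests parking when steering inputs stop arriving. The sketch below is a generic watchdog pattern and not one of the options investigated by the authors; the class name, the timeout value, and the explicit time arguments are assumptions made for illustration and testability.

```python
class SteeringWatchdog:
    """Requests parking if no steering input arrives within the timeout."""

    def __init__(self, timeout_s: float, start_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_input_s = start_s

    def feed(self, now_s: float) -> None:
        """Call whenever a steering input from the ground station is received."""
        self.last_input_s = now_s

    def should_park(self, now_s: float) -> bool:
        """True once the time since the last steering input exceeds the timeout."""
        return (now_s - self.last_input_s) > self.timeout_s
```

In an onboard loop, the timestamps would typically come from a monotonic clock such as `time.monotonic()`, so that adjustments of the wall clock cannot trigger or suppress the fallback.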

Immediate landing
Extensive loss of operational safety or other situations requiring repair or maintenance on the ground cause the FDIR system to request an immediate landing. The following automatic landing procedure consists of three phases. First, the kite is parked and reeled in to an altitude of around 100 m. To make sure that tether forces stay well within acceptable limits and that the kite does not overfly the zenith, the depower setting during this flight phase is adjusted according to the wind speed. Once the set value of the tether length is reached and certain other requirements are met, the kite adapts its depower setting and dives into the wind window, passing three waypoints. Figure 5 shows the simulated flight path of this diving maneuver for three different wind speed ranges. The flattened spherical coordinate plane represents the wind window and the three dots within this window are the waypoints that vary with the wind speed range. When arriving at a defined minimum height, the kite is powered up and navigates towards a final fourth waypoint on the edge of the wind window. In this last phase of the descent, the kite decelerates and eventually drops to the ground.

Emergency landing
An emergency landing is initiated when the FDIR system diagnoses that the flight control system has lost steering authority. This can occur, for example, when a steering line ruptures and the fault detection algorithm detects a significant difference between the actual yaw rate of the wing, as estimated from GPS data, and the reference yaw rate, as derived from an empirical yaw rate correlation. 42 To start the emergency landing, the cable cutter separates the KCU from the main tether. As a consequence, the relatively heavy KCU, which is still attached to all bridle lines, swings below the wing, such that the kite can now be retracted to the ground station in a paraglide/parachute mode, using the additional safety line. This procedure has been described in detail in Section 2.1.2. An emergency landing can also be initiated passively by a breaking weak link, as illustrated in Figure 6, showing an early test of a mast-based launching system. At this stage of development, the response time of the winch controller was still insufficient and operation in a gusty wind environment could lead to temporary overload of the weak link. 35 The debris visible in the large photo is the remainder of a shock damper construction that is integrated with the KCU to avoid an overload of the safety line itself. In this particular event, the bridle line system of the kite was damaged to such a degree that the wing curled up into a drag parachute-like structure.
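The yaw-rate comparison that triggers an emergency landing can be sketched as a residual check with a persistence requirement. This is a minimal sketch under assumptions: the empirical yaw rate correlation is not reproduced here, so the reference yaw rate is simply passed in as data, and the threshold and the required number of consecutive samples are hypothetical tuning parameters.

```python
from typing import Sequence


def steering_authority_lost(
    yaw_rate_estimated: Sequence[float],
    yaw_rate_reference: Sequence[float],
    threshold: float,
    min_consecutive: int,
) -> bool:
    """Detect a persistent mismatch between estimated and reference yaw rate.

    Returns True if the absolute residual exceeds the threshold for at least
    min_consecutive samples in a row. Requiring persistence avoids reacting
    to isolated outliers in the GPS-based estimate.
    """
    run_length = 0
    for estimated, reference in zip(yaw_rate_estimated, yaw_rate_reference):
        if abs(estimated - reference) > threshold:
            run_length += 1
            if run_length >= min_consecutive:
                return True
        else:
            run_length = 0
    return False
```

The persistence window trades detection delay against false alarms. Because a false positive here would cut the tether, a conservative setting appears preferable, although the appropriate values depend on the system.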

SYSTEM SAFETY ASSESSMENT AND IMPROVEMENT
One of the first steps to improve the reliability and safety of a complex technical product like an AWE system is a systematic and comprehensive assessment of its architecture, design, installation, and maintenance to ensure that the relevant safety requirements are met. In the following, we use FMEA together with FTA to assess and systematically improve the reliability and safety of the technology development platform described in Section 2. Because FMEA and FTA depend strongly on the mix of people who contribute to them, we have involved team members with different professional backgrounds, such as system design, operations, safety, legal, and finances, to ensure a high quality. As an integral part of the fault management strategy, we propose an FDIR system. The operation target for this reliability analysis is 1 week of flight without human intervention, except for launching and landing.

Failure mode and effect analysis
The analysis method was developed by NASA in the 1960s, first used within the Apollo program and later adapted for aerospace, nuclear, and other applications with high severity in case of failure. Nowadays, FMEA is used in various fields, such as automotive engineering, for quality management to identify and overcome weak points already during the early design phases of a product. The highly structured approach assesses, one by one, all possible failure modes and their consequences for all system components. For each failure mode, the worst consequence is taken into account. For those failures with high severity or high probability, mitigation measures are proposed. However, the process considers only one failure at a time and not a combined occurrence of failures and their effects. The quality of the analysis essentially depends on the available practical experience with the system and its different components. 43 For the FMEA, the system is first divided into subsystems which are then broken down into components, as shown in Table 1. In the present study, we distinguish mechanical components, electronic hardware (HW) components, and software (SW) components. Software malfunctions (ie, wrong calculations, data corruption, and processing delays) and failures (ie, crashing or not functioning at all) are investigated separately, while for other types of components, only failures are considered. Depending on the operation mode, a failure mode can have different effects. Whenever this is the case, the failure mode is duplicated to investigate its effect for different operation modes such as energy harvesting, launching, or landing. The FMEA is conducted with a spreadsheet, listing one failure mode per row and grouping these rows into subsystems. Columns are the investigated properties such as (a) the potential fault mode (software malfunction, software fault, hardware fault, wrong configuration, data corruption, or data delay), (b) causes and mechanisms, (c) the foreseeable sequence of post-failure events, (d) the hazardous situation, (e) the worst case harm (physical injury or damage to the health of people, or damage to property or the environment), (f) the corresponding severity and (g) probability, (h) the proposed mitigation measure, (i) the residual, post-mitigation worst case harm, (j) severity, and (k) probability. The probability definitions used in the analysis are listed in Table 2, while Table 3 lists the severity definitions and the associated global harm.
b Affects very little of the system, noticed by average customers.
c Most customers are annoyed, mostly financial damage.
d Causes a loss of primary function; loss of all safety margins, severe damage, severe injuries, maximum one possible death.
e Product becomes inoperative; failure may result in complete unsafe operation and possible multiple deaths.
f Only possible if kite leaves the operation zone, which is also the top event for the FTA discussed in Section 3.2.
For each failure mode, a risk number R is calculated as the product of severity S and probability P,

$$R = S \cdot P. \tag{1}$$

A proper assignment of these values is crucial for the risk evaluation of the specific failure mode. For this reason, the values for S and R are evaluated in close collaboration with the engineering team of Kitepower B.V. working on the 100-kW system.
In the next step, the investigated failure modes are prioritized based on the calculated risk numbers, and corresponding mitigation measures are proposed. Most of the failure modes can be mitigated by the FDIR system presented in Section 3.3. However, for some modes, the risk can be effectively lowered only by decreasing the failure probability of the component, which requires a stricter development or verification process or purchasing a higher quality component. In Section 4, we present the result of the FMEA and detail two specific failure modes as examples.
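The ranking step described above can be sketched in a few lines of Python. This is only an illustration of the risk number as the product of severity and probability: the class name, field names, and the three example rows are hypothetical and are not taken from the FMEA spreadsheet of the study. Summing the risk numbers is one plausible reading of a system-wide total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One FMEA row; only the fields needed for ranking are modeled."""
    name: str
    severity: int              # S before mitigation
    probability: int           # P before mitigation
    residual_severity: int     # S after the proposed mitigation
    residual_probability: int  # P after the proposed mitigation

    @property
    def risk(self) -> int:
        """Risk number R = S * P."""
        return self.severity * self.probability

    @property
    def residual_risk(self) -> int:
        """Residual risk number after mitigation."""
        return self.residual_severity * self.residual_probability


def prioritize(modes):
    """Return the failure modes sorted by descending risk number."""
    return sorted(modes, key=lambda m: m.risk, reverse=True)


# Hypothetical rows, for illustration only.
modes = [
    FailureMode("sensor data delayed", 3, 2, 3, 1),
    FailureMode("controller software malfunction", 5, 3, 5, 1),
    FailureMode("watchdog misconfigured", 2, 2, 2, 1),
]

ranked = prioritize(modes)
total_risk = sum(m.risk for m in modes)
total_residual_risk = sum(m.residual_risk for m in modes)
```

In this sketch, mitigation lowers only the probability of each mode, which mirrors the case where a detection mechanism makes a hazardous outcome less likely without changing its worst-case severity.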

Fault tree analysis
As mentioned in the previous section, an FMEA does not consider combined occurrences of failures and their effects. However, even faults with low individual risk factors can cause hazardous situations when occurring simultaneously. To take this into account, we complement the FMEA by an FTA. 43 The method was developed in the 1960s for the analysis of a ballistic missile system and has subsequently been applied in a broader context to analyze the risks related to safety and economically critical assets. 11,44 A fault tree is a logic diagram describing the relationships between a particular system failure and the individual faults, failures, and malfunctions on component and subcomponent level that contribute to this particular failure. The fault tree follows a top-down structure using logic gates and events to model how the component states relate to the state of the entire system. The top event corresponds to the particular system failure that is investigated. Commonly used gates are AND, OR, and conditional logic gates, while events are classified as top, intermediate, and basic events as well as undeveloped and conditional events. 45 For quantitative failure analysis, the logic diagram is extended by quantitative information about component reliability, such as failure probabilities.
Most AWE systems crucially rely on active control of several distributed subsystems that are mechanically and electronically coupled, each consisting of several components. Each component can have several failure modes depending on the operation phase or physical characteristics of the failure. Thus, the number of possible combinations of these failure modes is very large when considering the entire AWE system. A common practice for FTA is to address only those combinations with catastrophic consequences. Once the fault tree is defined and failure models are assigned to all involved system components, FTA software tools can be used to calculate the probability of the catastrophic consequences and to prioritize the different contributors to the top event. Contributors with high priority are then improved to decrease their impact on the failure. This process of FTA and subsequent design modifications is repeated iteratively until the computed probability of the catastrophic event has been decreased below a certain threshold.
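As a minimal sketch of how gate logic turns basic-event probabilities into a top-event probability, the following Python fragment evaluates one branch of the fault tree (tether length control, tethers, and steering). It assumes statistically independent events that each appear only once in the tree, and it treats the steering condition as an ordinary AND input. All numerical values are hypothetical placeholders and not the probability models of Table 4.

```python
from math import prod


def p_or(*probs):
    """OR gate: at least one of the independent input events occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def p_and(*probs):
    """AND gate: all independent input events occur."""
    return prod(probs)


# Hypothetical one-week event probabilities, for illustration only.
p_ground_control_hw = 0.01
p_system_state_controller_sw = 0.02
p_winch_control_sw = 0.02
p_winch_system = 0.03          # undeveloped event
p_all_tethers_off = 0.001
p_steering_wrong = 0.05        # condition, modeled here as an AND input

# Intermediate event "malfunction in tether length control".
p_tether_length_control = p_or(
    p_ground_control_hw,
    p_system_state_controller_sw,
    p_winch_control_sw,
    p_winch_system,
)

# Intermediate event "tether does not keep the kite in operation zone".
p_tether_fails = p_or(p_tether_length_control, p_all_tethers_off)

# Top event "kite outside operation zone".
p_top = p_and(p_tether_fails, p_steering_wrong)
```

The gate formulas are exact only under the stated independence assumption and without repeated events. Trees with shared basic events need a method that accounts for the dependencies, such as the binary decision diagram approach.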
For the kite power system investigated in this study, we define the catastrophic event as completely unsafe operation with possibly multiple deaths. This would be the case if the kite leaves the operation zone, which could entail the following catastrophic consequences:
• entering forbidden airspace and collision with other users of the airspace, 16
• crashing into a critical infrastructure on the ground,
• crashing on a highway and causing accidents, and
• crashing directly on many people in a crowded area.
Because of the severity of these consequences, we define the case of the kite leaving the operation zone as the top event (see also Table 3).
We analyze only those events that bring the system to this specific top event. Crashes or other undesired events within the operation zone are not included in the FTA because their consequences are not considered catastrophic. We create the fault tree and model component failures for the same operational target as for the FMEA, namely 1 week of flight without any pilot intervention. The complete fault tree is depicted in Figure 7, with 31 different basic events and two undeveloped events populating the leaf nodes. The undeveloped events ''winch system problem'' and ''kite damaged, not steerable'' could have been broken down further to the component level; however, within the frame of this study, we decided not to do this and instead assign integral probability models to both events. The parts of the fault tree highlighted in different colors are further detailed in Figures 8 to 11.
The investigated failure events and the corresponding probability density functions are listed in Table 4. For all software and firmware components in the system, a constant failure rate of $10^{-3}$/h is used. This value corresponds to software developed according to DO-178C. 46 Even though no formal standard was followed during the development of the current software, we consider using a DAL-D level failure rate reasonable on the basis of the generated artifacts, test intensity, and use history of the software components. Failure databases provide generic failure data collected from a variety of sources. For some of the hardware components, Weibull failure coefficients from Barringer & Associates, Inc 47 are used as a starting point. For some of the basic events in the fault tree, data were not available in the failure databases. For such events, expert opinion and engineering judgment were used to estimate the failure probability. The expected mean time to failure for a nonrepairable system is abbreviated as MTTF. The Weibull probability density function is parametrized by the characteristic life (hours), the slope, and the failure-free life (hours).
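To illustrate the two failure models mentioned above, the following sketch evaluates the probability of failure within a given operating time for a three-parameter Weibull model and for a constant failure rate. The parameter names `eta` (characteristic life), `beta` (slope), and `gamma` (failure-free life) are assumptions chosen for this example, and the helper functions are hypothetical.

```python
import math


def weibull_unreliability(t, eta, beta, gamma=0.0):
    """Probability of failure by time t (hours) for a three-parameter Weibull model.

    eta: characteristic life (hours), beta: slope, gamma: failure-free life (hours).
    """
    if t <= gamma:
        return 0.0  # no failures are possible during the failure-free life
    return 1.0 - math.exp(-(((t - gamma) / eta) ** beta))


def exponential_unreliability(t, rate):
    """Probability of failure by time t (hours) for a constant failure rate (1/h)."""
    return 1.0 - math.exp(-rate * t)


HOURS_PER_WEEK = 7 * 24

# Constant software failure rate of 1e-3 per hour over one week of operation.
p_sw_week = exponential_unreliability(HOURS_PER_WEEK, 1e-3)
```

A constant failure rate is the special case of the Weibull model with slope 1, characteristic life equal to the inverse of the rate, and zero failure-free life, so both model types can be handled by the same evaluation routine.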

Fault detection, isolation, and recovery
FDIR is an integral part of the fault management strategy because it implements the mitigation measures proposed by the reliability analyses.
A generic high-level FDIR functionality for AWE systems is outlined in Table 5. We aim at a minimal implementation complexity for several reasons. First, this generally reduces the effort to validate the implemented FDIR system. Physical models that can be part of such a system are in most cases validated only for nominal operation, often with insufficient prediction quality for off-design scenarios resulting from anomalies. An example is a damaged bridle line system, which can substantially alter the flight dynamic behavior of the wing. An extreme case is shown in Figure 6D,E. Second, the overall complexity of the system has to be manageable. Therefore, detection mechanisms are designed only for those failure modes that have been identified by the FMEA and FTA. For the same reason, the mitigation measures are grouped as much as possible, using a common FDIR implementation per group.
We adapt a hierarchical FDIR architecture that is used in the space industry 48 because it fits the investigated AWE system well. Satellites, for example, also have high reliability requirements, incorporate a safe mode, and use holistic anomaly detection. The layered structure supports a clear organization of tasks and makes it difficult to overlook aspects. The modular and distributed architecture at low levels is straightforward to maintain in case of design changes. We implement this architecture with five hierarchical levels, each representing a different way of monitoring and detection, as illustrated in Figure 12. Items represent the smallest functional units associated with the lowest FDIR level, followed by equipment, subsystems, and the entire system at successively higher levels. The criticality of the inspected faults increases with the FDIR level. Fault monitoring and detection starts at the lowest level. Successively higher levels are triggered only after activating the lower level several times without catching the fault. The five levels are introduced in the following.
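• Level 0 performs built-in monitoring at item level. Some functional units have to be capable of recovering autonomously from faults without affecting the performance of the system. This is necessary especially if the recovery time for a specific fault is critical. Software and hardware watchdogs in microcontroller boards are typical examples for Level 0 FDIR.
• Level 1 monitors software at equipment level for units that cannot detect and recover autonomously from faults. At this level, detection, recovery, and isolation (if required) are performed by the subsystem. Switching from the faulty sensor to a redundant one or deriving the data from different sensor(s) are examples of Level 1 FDIR. For the subsystems, inputs and outputs are checked by the Level 1 FDIR with regard to data consistency (continuity and frequency checks), measurement consistency (range check, rate check, comparison with a redundant sensor), and command consistency (command timing and feedbacks).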
• Level 2 monitors software performance at subsystem level. At this level, faults are caught that could not be localized at lower levels. Monitoring the total current flow from the KCU onboard power subsystem and reconfiguring the subsystem in case of a fault is an example of Level 2 FDIR.
• Level 3 monitors software performance at system level. One or more faults that could not be recovered at the lower levels are caught by the Level 3 FDIR if they affect the system performance. Heuristic methods 49 or model-based methods 50 are in use for detecting such system-wide anomalies. In general, the flight dynamic response of the kite to steering commands is a good indicator for the overall health status of the airborne subsystem. In our study, an empirical correlation 42 between the turn rate of the kite and the steering actuation is used for the Level 3 FDIR implementation. This correlation has to be determined by system identification for each kite, taking also into account the dependency on the depower setting.
• Level 4 performs hardware-only monitoring at system level to protect the system from catastrophic events. There are no recovery or isolation actions at this level. An example for such a hardwired system-level alarm is the cutting of the tether for a controlled landing in the event of a loss of control.
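The measurement-consistency checks named for Level 1 (range check, rate check, and comparison with a redundant sensor) can be sketched as follows. This is a simplified illustration and not the implementation used on the development platform. The `Limits` fields, the function names, and all threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical per-sensor limits; real values depend on the sensor and its use."""
    low: float           # smallest plausible reading
    high: float          # largest plausible reading
    max_rate: float      # largest plausible change per second
    max_mismatch: float  # largest tolerated difference to the redundant sensor


def check_sample(value, previous, dt, redundant, limits):
    """Return the names of the measurement-consistency checks that the sample fails."""
    failed = []
    if not (limits.low <= value <= limits.high):
        failed.append("range")
    if dt > 0 and abs(value - previous) / dt > limits.max_rate:
        failed.append("rate")
    if abs(value - redundant) > limits.max_mismatch:
        failed.append("redundancy")
    return failed


def select_value(value, redundant, failed):
    """Recovery action: fall back to the redundant sensor if any check failed."""
    return redundant if failed else value


limits = Limits(low=0.0, high=100.0, max_rate=20.0, max_mismatch=2.0)
ok = check_sample(50.0, 49.0, 0.1, 50.5, limits)
bad = check_sample(150.0, 49.0, 0.1, 50.5, limits)
```

Falling back to the redundant sensor whenever any check fails is the simplest possible recovery policy. A real system would also have to decide which of two disagreeing sensors is the faulty one.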
Levels 3 and 4 are centralized and monitor the entire system, applying a holistic perspective. For example, the failure event ''kite leaves operation zone but causing fault not identified'' is handled at Level 3. In contrast to this, Levels 0, 1, and 2 are decentralized. Each subsystem has its own FDIR component, which may consist of Levels 1 and 2. Similarly, items may have their own Level 0. The FDIR of a subsystem does not necessarily include all three levels. The composition of the levels, including the fault detection sensitivity, shall be determined according to the risk coefficient of the fault, which is determined by the FMEA. Thus, for noncritical faults, detection mechanisms only at higher levels, eg, Levels 3 and 4, may be sufficient.
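The escalation rule of the hierarchical architecture, in which a higher level is triggered only after the lower level has been activated several times without catching the fault, can be illustrated with a small state machine. The sketch makes simplifying assumptions: every fault starts at Level 0, the number of attempts per level is a single hypothetical threshold, and the hardware-only Level 4 is not modeled in software.

```python
class FdirEscalator:
    """Counts unsuccessful recovery attempts and escalates to the next FDIR level."""

    def __init__(self, max_attempts=3, top_level=3):
        self.max_attempts = max_attempts  # hypothetical threshold for "several times"
        self.top_level = top_level        # highest software level; Level 4 is hardwired
        self.level = 0
        self.attempts = 0

    def report(self, fault_cleared):
        """Record one recovery attempt and return the level that handles the next one."""
        if fault_cleared:
            # Recovery succeeded: return to the lowest level.
            self.level = 0
            self.attempts = 0
            return self.level
        self.attempts += 1
        if self.attempts >= self.max_attempts and self.level < self.top_level:
            # The current level failed repeatedly: hand over to the next level.
            self.level += 1
            self.attempts = 0
        return self.level


escalator = FdirEscalator(max_attempts=2)
history = [escalator.report(False) for _ in range(4)]
after_recovery = escalator.report(True)
```

With a threshold of two attempts, four consecutive unsuccessful recoveries move the handling from Level 0 to Level 2, and a successful recovery resets it to Level 0.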

RESULTS
Within the scope of the FMEA, we investigated 80 different failure modes of electronic hardware and software components. Based on the risk number calculated from Equation (1), we compiled a prioritized list of failure modes that could lead to potentially hazardous situations. To increase the system-wide reliability and safety, we distinguish modes that can be mitigated by an FDIR system and modes that require a redesign of the involved components to meet stricter reliability standards. As an example, Table 6 details a malfunction of the flight path controller software that can be mitigated by FDIR. The rows listing the residual severity, probability, and risk number indicate the improvement that can be achieved by the proposed mitigation measure. The specific operational target has an impact on the foreseeable sequence of events and the suitable mitigation measures. For example, for short flights, certain sensor failures may be tolerable when the pilot is in the loop. However, for 1 week of flight, intervention of a pilot cannot be proposed as a mitigation measure.

Residual severity 5
Residual probability 1
Residual risk number 5
Abbreviations: FMEA, failure mode and effects analysis.

Table 7 shows an example for reducing the risk of a failing motor driver microcontroller software by defining a stricter software standard, for example, by changing the DAL level from D to C. The total risk number of the technology development platform calculated by the FMEA is 569. Applying the proposed mitigation measures, this number can be decreased to 370. The maximum risk number of a single failure mode is calculated as 15, which can be reduced by mitigation to 10. The residual severity of all individual failure modes is below 6, which means that the proposed mitigation measures effectively prevent any single failure from causing the catastrophic event, which we define as the kite leaving the operation zone and potentially colliding with other users of the airspace or crashing on the ground with the possible consequence of multiple deaths.
With the FTA, we focused on combinations of failures causing the catastrophic event. Based on the fault tree illustrated in Figure 7, we determined 33 different component failure events which in combination may trigger the catastrophic event. A probability model is assigned to each of these failure events, and then the exact probability of the catastrophic event is determined using the binary decision diagram (BDD) method 51 and considering Boolean logical relationships of the failure events. During the probability calculation, minimal cut sets (MCS) are also extracted. In a fault tree, an MCS is the smallest combination of basic events causing the system failure. Unlike the classical MCS method, the BDD method provides exact values for the cut set unavailability and relative importance. We define unavailability as the probability that a specific cut set is in a failed state at time t, and we use the Vesely-Fussell importance factor, defined as the fraction of system unavailability contributed by a specific cut set. 44
Figure 13 shows the computed unavailability of the investigated system as a function of the days in operation. The computed unavailability after 1 week of operation is 2.75%, which is equivalent to a system failure rate of $0.163 \times 10^{-3}$/h. The minimal cut sets listed in Table 8 amount to a joint unavailability after 1 week of 2.70%, which means that these first 10 cut sets practically describe the catastrophic system failure behavior resulting from simultaneous component failures. At the time of writing, the commercial technology development platform was under continuous development, applying the insight gained from the reliability analysis step by step. As a result, the design of some of the critical components was improved, and some modules of the FDIR system were implemented. Even with the partial implementation of these measures, a reliability improvement can be recognized from the flight logs. Figure 14 illustrates the increasing number of flight days per quarter as well as the total accumulated number of flight days. The dashed line indicates the limit of continuous operation at around 90 days per quarter.
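The relation between the reported unavailability after 1 week and an equivalent constant failure rate can be checked with a short calculation. The sketch below shows two common conversions, a linear average and an exponential model. Which convention was used for the value quoted in the text is not stated, so both are given as assumptions, and the function names are hypothetical. The last line computes the share of the system unavailability covered by the first 10 minimal cut sets.

```python
import math

HOURS_PER_WEEK = 7 * 24


def rate_linear(unavailability, hours):
    """Average failure rate, assuming the unavailability grows linearly in time."""
    return unavailability / hours


def rate_exponential(unavailability, hours):
    """Constant failure rate of an exponential model that reaches the same value."""
    return -math.log(1.0 - unavailability) / hours


def unavailability_share(part, total):
    """Fraction of the system unavailability contributed by a group of cut sets."""
    return part / total


q_system = 0.0275  # system unavailability after 1 week, from the text
q_top10 = 0.0270   # joint unavailability of the first 10 minimal cut sets

lam_linear = rate_linear(q_system, HOURS_PER_WEEK)
lam_exponential = rate_exponential(q_system, HOURS_PER_WEEK)
share_top10 = unavailability_share(q_top10, q_system)
```

Both conversions give a rate of roughly 1.6 to 1.7 times 10⁻⁴ per hour, which is of the same size as the reported value, and the first 10 cut sets account for about 98% of the computed system unavailability.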

CONCLUSIONS
System reliability, operational robustness, and safety are of crucial importance for the successful commercialization of AWE. Given the inherent complexity of the technology, these aspects have to be taken into account early in the system design process. In this study, we combine an FMEA and an FTA to assess and systematically improve the reliability and safety of a 100-kW technology development platform, which has been derived from a well-documented technology demonstrator with an 18-kW electrical machine. Potentially hazardous situations are mitigated by FDIR using a hierarchical architecture from the space industry that fits the design of AWE systems well. To our knowledge, this is the first published study defining a safety and reliability engineering process for AWE systems.
In the FMEA, we consider only single failure modes, one by one, proposing mitigation measures for each mode to increase the system-wide reliability and safety. With the FTA, we focus on simultaneous failures and determine the resulting probability of the worst-case event, defined as the kite leaving the operation zone in an uncontrolled way. We reveal the underlying fault mechanisms that can cause this event and provide this information to the engineering team for iterative improvement of the system design. The computed probability will later be used to prove to a certification body that the probability of harming people is below a certain limit.
At the time of writing, the commercial technology development platform was under continuous development, and the proposed mitigation measures were only partially implemented. For example, the holistic model-based technique at FDIR Level 3 was not yet validated for different flight conditions, and the redesign of critical components was not completed. Nevertheless, the initiated measures already show a clear impact, and the uptime of the development platform is continuously increasing.

FIGURE 1 Components of the kite power system, equipped with an 18-kW ground station and 25 m² LEI V3 tube kite [Colour figure can be viewed at wileyonlinelibrary.com]

FIGURE 2 A, Front view of the LEI V3 kite 25 ; B, Photo of the V5.40 with 40 m² wing surface area (courtesy of Kitepower B.V.)

FIGURE 3 A, Side view of the measured flight path of a representative pumping cycle with indicated flight phases between switch points 18 ; B, Photographic visualization of a pumping cycle during night operation by tracing a marker light on the kite from the ground station (right) using long-term exposure (courtesy of Kitepower B.V.)

FIGURE 4 General spatial layout of pumping kite power systems (courtesy of Kitepower B.V.)

FIGURE 5 Landing paths of the kite on the flattened spherical coordinate plane for different wind speed ranges 18

The top event with one abstracted branch represented by a triangle symbol is shown in Figure 8. The intermediate event ''malfunction in tether length control'' is caused by any of the four events at lower level, ie, ''ground control HW problem'', ''system state controller SW problem'', ''winch control SW problem'', or ''winch system problem.'' The undeveloped event includes all other ground components that cause a malfunction of the tether length control. A ''malfunction in tether length control'' or the basic event ''all tethers off'' causes the intermediate event ''tether does not keep the kite in operation zone.'' If this occurs together with the condition ''kite steering wrong or not existing,'' the top event ''kite outside operation zone'' is caused. Figures 9 to 11 show branches that are fully detailed, ending at basic events, which are the leaf nodes of the fault tree.

FIGURE 7 Complete fault tree (see Figure 8 for symbol legend)

FIGURE 9 Fault tree for the intermediate event ''Communication system failure''

FIGURE 14 Number of flight days for Kitepower B.V. from 2016 until today 33,34

Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/we.2433 by EVIDENCE AID - BELGIUM, Wiley Online Library on [13/04/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

TABLE 1 Breakdown of the kite power system for the FMEA into subsystems and components
Kite state estimator software, flight path controller software, flight path controller steering software, flight path controller destination software, kite control software, message forwarder software, central processing unit hardware, and microcontroller unit hardware
Onboard power system: Airborne wind turbine (AWT) hardware, maximum power point tracking hardware, batteries hardware, and power board hardware
Ground control system: System state controller software, ground state estimator software, winch control software, clock software, message forwarder software, and ground control computer
Ground sensing system: Sensor software, GPS hardware, wind sensor hardware, and force sensor hardware
Ground power system: Generator, gear box, sled and secondary electrical drive, tether guidance mechanism, low-level winch control, batteries hardware, dump load module, inverter, and grid connection (optional)

TABLE 2
Probability definitions for the failure modes

TABLE 3
Severity definitions and harm of the failure modes
a Only results in a maintenance action, noticed by alert customers.

TABLE 5
Generic high-level FDIR functionality required for AWE systems

• Level 1 monitors software at equipment level for units that cannot detect and recover autonomously from faults. At this level, detection, recovery, and isolation (if required) are performed by the subsystem. Switching from the faulty sensor to a redundant one or deriving the data from different sensor(s) are examples of Level 1 FDIR. For the subsystems, inputs and outputs are checked by the Level 1 FDIR with regard to data consistency (continuity and frequency checks), measurement consistency (range check, rate check, comparison with a redundant sensor),

TABLE 6
Example for the FMEA of a failure mode forming a requirement for the FDIR system Abbreviations: FDIR, failure detection, isolation, and recovery; FMEA, failure mode and effects analysis.

TABLE 7
Example for the FMEA of a failure mode pointing out a necessity for a stricter standard

TABLE 8 First 10 minimal cut sets with unavailability and Vesely-Fussell (VF) importance factor calculated for one week of operation

Table 8 lists the MCSs, their unavailability, and relative importance for the technology development platform.

FIGURE 13 Unavailability of the kite power system