Cyber-physical system testbed for power system monitoring and wide-area control verification

Abstract: The electric power system is intrinsically a cyber-physical system (CPS), with power flowing in the physical system and information flowing in the cyber network. Testbeds are crucial for understanding cyber-physical interactions and provide environments for prototyping novel applications. This study proposes a four-layer architecture for CPS testbeds with emphases on communication network emulation and networked physical components. A configurable software-defined network is employed to bridge physical components with wide-area applications for closed-loop control. In order to distribute physically coupled devices into multiple software simulations, this study proposes a data broker setup based on a distributed messaging environment to achieve low-latency data streaming. The decoupled design with data streaming allows for building testbed components as modules and running them in a distributed manner. Case studies verify the data broker setup for low-latency sensing and actuation, as well as the communication emulation setup for the desired network latency. Also illustrated is a replay attack scenario using synchrophasors in the Western Electricity Coordinating Council (WECC) 181-bus system, demonstrating the closed-loop cyber-physical simulation capability of the testbed.


Introduction
Modern electric power systems are facing new challenges and opportunities due to the increasing level of renewable energy. Traditional vertically integrated utilities are being deregulated, and information sharing between system operators is becoming common. Power system control paradigms are shifting from the traditional, local-control-dominated scheme to modern, wide-area synchrophasor-data-assisted controls [1]. Due to the intertwined nature of the physical and cyber components, the testing of emerging applications needs to be done in environments that can characterise both the physical system and the cyber network.
Testbeds serve as platforms for conducting rigorous yet replicable tests and verification of new controls and applications. Traditional testing approaches involve computer simulation and physical hardware emulation with an emphasis on physical systems, where most of the closed-loop controllers are local. For example, analogue simulators, electromagnetic transient programs, transient stability programs [2][3][4], and digital real-time simulators belong to this category. These tools are still broadly utilised for modelling the physical system in testbeds.
More recently, the growing interest in modelling power system wide-area monitoring, control, and cybersecurity requires the integration of communication networks ('network' hereafter). A global event-driven co-simulation framework is described in [5] for wide-area measurement and control schemes. A cyber-physical system (CPS) testbed is proposed in [6] for intrusion detection in power systems. A toolkit for security research on CPS networks is proposed in [7] to connect CPS software and hardware for studying cyber attacks and defences. The architecture and studies of the PowerCyber testbed for substation cybersecurity are described in [8]. An integrated testbed, SURE, for security and resilience for general CPS is described in [9].
Based on the tightness of the coupling of the simulation, measurement, control, and communication, the above work can be further categorised into three types: full integration, co-simulation, and hybrid. The full-integration approach, such as iTesla in [30], aims to develop and simulate all the pieces in a unified tool, which is accurate but requires extensive expertise for implementation. The co-simulation approach aims to glue pieces together using a co-simulation middleware, such as the Hierarchical Engine for Large-scale Infrastructure Co-Simulation (HELICS) in [31], which is agnostic of the linked pieces. This approach is easy to extend and multi-domain applicable, but it leaves the consistency check and validation to the user. Our work applies a hybrid approach by allowing for the modules to be developed independently but coupled for closed-loop simulation and control through defined data interfaces.
Moreover, the modelling accuracy of the interactions between the physical system and communication networks is crucial for the fidelity of testing results. For example, in a testbed where the power grid simulation receives control signals over the network, the network topology, bandwidth, and latency become crucial and cannot be neglected. However, existing research has neither evaluated this accuracy nor proposed solutions for improvement. In addition, testbeds need to be structured to provide convenient data interfaces for integrating the control under test, which may be developed separately from other software pieces. This paper focuses on building a CPS testbed, built on top of an earlier version of the CURENT large-scale testbed (LTB) [32], for accurately modelling the low-latency data acquisition and latency-embedded communication networks needed for monitoring and closed-loop controls in power systems. Distributed software pieces representing the physical system, monitoring, the energy management system (EMS), and the measurement-based control system (MBCS) are organised into a four-layer software architecture. This paper also proposes a data broker setup to stream data between applications and networked physical devices with low latency. In the proposed testbed, networked physical devices modelled by distributed software programs can exchange data through low-latency distributed messaging channels, while wide-area applications communicate over a highly configurable emulated Internet Protocol (IP) network. Compared with recent work on power system testbeds, for example, [20, 21, 26], the proposed testbed focuses on integrating large-scale power system simulation with cyber networks for wide-area control and is unique in providing both low-latency data streaming and network emulation.
The rest of this paper is organised as follows. Section 2 introduces the four-layer software architecture and presents detailed descriptions of each layer. Section 3 proposes the distributed messaging environment and the data broker setup for low-latency data streaming between the physical system layer and the networked physical devices layer. Section 4 presents case studies for data streaming latency analysis and closed-loop CPS control. Section 5 draws the conclusions.

Overview
Generally speaking, a cyber-physical testbed for power systems should consist of at least a physical system, a communication network, and network applications. Our testbed follows the concept but adds a layer for the so-called 'networked physical components'. The overall architecture of our testbed is shown in Fig. 1. From the bottom up, the physical system layer runs simulations to characterise power system dynamics. The networked physical component layer models measurement devices and actuators that are equipped with communication capability and, in the meantime, interact with the physical system. The communication network emulation layer creates software-defined networks (SDN) for data transmission. The application layer consists of both power system applications and cyber-network applications.
Instead of developing a monolithic architecture, the testbed leverages existing tools by integrating them into a modular architecture. A module is a self-contained set of routines designed to complete a specified task. For example, the physical system layer may be composed of one conventional power system simulator, and a state estimation routine may become a module in the application layer. Modules are decoupled, meaning each module is developed independently and executed asynchronously from the rest of the testbed.
The layered and decoupled testbed has the following advantages: (a) existing code and tools can be reused; (b) modules of similar functionality can be interchangeable; (c) data interfaces can be clearly designed; and (d) modules can be distributed to multiple processors or machines and executed in parallel.

Physical power system layer
The physical power system layer runs computational routines to simulate the characteristics of the physical power grid. The main components in power grids include generators, transformers, transmission lines, capacitors, and loads. The time scale of power system simulations may vary from microseconds to hours, depending on the type of problem. The testbed focuses on extended-term transient stability analysis (10⁻³–10² s) using positive-sequence phasor-domain models. By assuming a balanced network, ignoring transmission line transients, and considering only the positive-sequence component, stability-type simulation tools are sufficient for large-scale electro-mechanical transient studies and short-term economic studies with less computation.
The physical layer simulators can be adapted from existing tools, either open- or closed-source, as long as data interfaces and execution control interfaces are provided. The physical layer provides application program interfaces for loading power system data, sending raw simulation data to the networked components, and receiving control commands from the networked components. In addition, the simulator module has to provide the simulation time for synchronisation. The testbed interfaces to two open-source simulators (ANDES [33] and GridDyn [34]) and a commercial real-time simulator (ePHASORsim) for different testing purposes.
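The sketch below illustrates, under stated assumptions, the shape of such a simulator-side interface in Python. The class and method names (load_case, step, send_to, poll) are hypothetical placeholders and do not correspond to the actual ANDES, GridDyn, or ePHASORsim APIs; the point is only the three data hooks named above plus time synchronisation.

```python
# Hypothetical simulator wrapper illustrating the required interfaces.
# It is NOT the actual ANDES/GridDyn/ePHASORsim API.
class SimulatorInterface:
    def __init__(self, streamer):
        self.streamer = streamer      # messaging client (e.g. a DIME client)
        self.t = 0.0                  # simulation time for synchronisation

    def load_case(self, path):
        """Load power system data (network, dynamic models, events)."""
        raise NotImplementedError

    def step(self, dt):
        """Advance one integration step and return bus-level states."""
        raise NotImplementedError

    def apply_command(self, cmd):
        """Apply a control command received from a networked component."""
        raise NotImplementedError

    def run(self, t_end, dt=0.01):
        while self.t < t_end:
            states = self.step(dt)
            # send raw simulation data to the networked components
            self.streamer.send_to('pmu_*', {'t': self.t, 'states': states})
            # check (non-blocking) for control commands from networked components
            cmd = self.streamer.poll('ctrl')
            if cmd is not None:
                self.apply_command(cmd)
            self.t += dt
```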

Networked components layer
The networked components layer proposed in this paper aims at modelling monitoring and actuation devices that are attached to the physical power system and communication-capable. Examples of networked components include synchrophasors, remote telemetry units, and actuators that can be controlled remotely. Such devices are not part of typical power system simulators due to the complexity of networking features. However, their critical roles in wide-area monitoring and control require explicit and accurate modelling.
Networked devices in the testbed can be hardware or software. A familiar example is a hardware phasor measurement unit (PMU) attached to the analogue signal outputs of a real-time simulator in order to measure the states of the simulated system. The same example can be implemented purely in software so that the PMU acquires data from a running simulation software tool, embeds noise, errors, and latency, and then packages the data in the IEEE C37.118-2011 format for streaming.
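A minimal sketch of one step of such a software PMU is shown below, assuming Gaussian measurement noise and a simplified dictionary payload; assembling real IEEE C37.118-2011 frames would require a dedicated synchrophasor library and is omitted. All names are illustrative.

```python
import random
import time

def make_frame(bus, true_states, noise_std=1e-3):
    """Simplified synchrophasor frame; not a full IEEE C37.118-2011 encoder."""
    v, theta_v, i, theta_i, f = true_states
    # superimpose measurement noise on the true states from the simulation stream
    noisy = [x + random.gauss(0.0, noise_std) for x in (v, theta_v, i, theta_i, f)]
    return {
        'idcode': bus,                                  # PMU/station identifier
        'soc': int(time.time()),                        # second-of-century timestamp
        'phasors': [(noisy[0], noisy[1]),               # voltage magnitude/angle
                    (noisy[2], noisy[3])],              # current magnitude/angle
        'freq': noisy[4],
    }
```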
Using software for simulating networked components is appealing due to low cost and simplicity. Interactions between networked components and the physical system, however, require a messaging service for sending and receiving data rapidly amongst distributed software pieces. Details of the proposed messaging service are discussed in Section 3.

Network emulation layer
The network emulation layer utilises SDN-based network emulation software for creating internet protocol-based communication networks. The IP-based network is composed of hosts, links, switches, routers, and network controllers. SDN provides flexibility for configuring arbitrary networks using these components to emulate large-scale physical networks. Parameters for the components such as link bandwidth and routing algorithms are configurable, and physical network interfaces can be mapped to the emulated network.
In the testbed, communication networks linking the networked components layer with the application layer can be defined based on actual or proposed topology. One can create a PMU data streaming network by connecting PMUs and phasor data concentrators (PDCs) to switches using links with different latency. Software-based networked components and applications can run on virtual environments or containers that are provided by or connected to the network emulator. Hardware-based components can be plugged into the physical network interfaces for access.

Application layer
The application layer is a collection of power system applications, such as the EMS and MBCS, and network applications, including traffic monitoring and cyber-attack defence. Power system applications and some network applications are modules that run specific routines while communicating over the network. Other applications may directly interact with the SDN controller.
Specifically, among power system applications, EMS and MBCS are two different sets of modules differentiated by their functionality and execution time horizon. EMS modules generally perform steady-state calculations [35], while MBCS modules are mostly developed for enhancing stability and resiliency. Regardless of these functional differences, the EMS and MBCS in the testbed software implementation can follow the same program structure, sketched below.
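A hedged sketch of that common structure is given here; the class, the data_in/data_out handles, and the method names are assumptions for illustration only, standing in for whichever PDC, DIME, or actuator interfaces a given module actually uses.

```python
# Common skeleton shared by EMS and MBCS modules (illustrative only).
class Application:
    def __init__(self, data_in, data_out, period):
        self.data_in = data_in      # e.g. a PDC stream or DIME client
        self.data_out = data_out    # e.g. an actuator command channel
        self.period = period        # seconds between executions

    def initialise(self):
        """Acquire topology/parameters and warm-start internal states."""

    def compute(self, measurements):
        """EMS: steady-state calculation; MBCS: stability/resiliency control."""
        return None

    def run(self):
        self.initialise()
        while True:
            meas = self.data_in.read(timeout=self.period)
            command = self.compute(meas)
            if command is not None:
                self.data_out.send(command)
```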
Several EMS and MBCS methodologies have been implemented in the proposed testbed. For example, a two-stage, robust dynamic state estimator [36] has been integrated for estimating static and dynamic states from SCADA and PMU data. Remedial action schemes [37], such as a controlled system separation scheme [38] and a guaranteed frequency response scheme [39, 40], have been implemented for the large-scale Western Electricity Coordinating Council (WECC) test system. In addition, a wide-area damping control allocation algorithm [41] and frequency control frameworks [42, 43] have been implemented. These examples demonstrate that the LTB is designed for fast prototyping and validation of research methodologies.

Distributed messaging environment
The four-layer architecture proposed in Section 2 involves distributed software and hardware modules, between which rapid data exchange is required. This issue exists in most distributed simulation or co-simulation environments. If a distributed simulation only involves two programs, ad-hoc data exchange may be programmed for the scenario at hand. In a more complex testbed, data exchange is needed between multiple modules. For example, the physical system simulator may need to send the calculated states to multiple PMU simulators and, in the meantime, check for a control signal from substation simulators. To address the difficulty of managing multi-party data streaming in a rapid manner, the distributed messaging environment (DIME) is proposed and implemented.
The DIME has a server/client architecture and implements several routines for distributed messaging. The DIME server can listen on a Transmission Control Protocol (TCP) socket, a User Datagram Protocol (UDP) socket, or a Unix inter-process communication (IPC) socket. A DIME client can assign itself a name and connect to the server. Clients are able to query the names of connected clients and perform one-to-one or one-to-many messaging. When a message is received by the server, it enters the queue for the recipient. A client retrieves the first item in the queue by synchronising with the DIME server, which then de-queues the sent data. In addition, the DIME server is transparent to its clients.
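Because ZeroMQ is the backend for the DIME server and client (see the open-source implementation below), the named-client, per-recipient-queue behaviour can be illustrated with plain pyzmq sockets. This is a conceptual sketch only, not the actual DIME API; the port, client names, and message framing are assumptions.

```python
import json
import zmq

ctx = zmq.Context.instance()

# Server side (simplified): a ROUTER socket receives [sender, recipient, data]
# frames and keeps a per-recipient queue until that client synchronises.
server = ctx.socket(zmq.ROUTER)
server.bind('tcp://*:5000')

# Client side: a named client connects to the server and sends a message
# addressed to another client ('one-to-one' messaging).
client = ctx.socket(zmq.DEALER)
client.setsockopt(zmq.IDENTITY, b'pmu_1')             # self-assigned client name
client.connect('tcp://127.0.0.1:5000')                 # TCP mode; ipc:// also works
client.send_multipart([b'sim', json.dumps({'var': 'freq', 'val': 60.0}).encode()])

queues = {}
sender, recipient, payload = server.recv_multipart()   # ROUTER prepends the sender id
queues.setdefault(recipient, []).append((sender, payload))
```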
The differences between DIME and SDN are listed as follows: (i) DIME relies on an existing IP-based network for data streaming if working in the TCP or UDP mode, while SDN itself creates an IP-based network inside a Linux operating system. (ii) DIME has no control over the configuration of the network that carries the data, while SDN can control the network topology and parameters that affect how the data is sent.
(iii) DIME is used to send information between two modules in cases where, in reality, the information would be available immediately (e.g. acquired locally in hardware). For example, a PMU obtains its measurement locally and almost instantaneously; to implement this in the testbed, however, two modules must be connected and the 'measurement' data streamed between them. In contrast, the network represented by the SDN is always present in reality.
The proposed DIME allows for decoupling the simulation, data acquisition, and actuation processes onto multiple computers. In the testbed, data acquisition and actuation programs may run in virtual hosts (VHs) created in emulated environments on the same machine as the network emulator. Physical system simulation is a compute-intensive, number-crunching process that occupies the central processor for long stretches. The DIME opens up the possibility of running the simulation on a dedicated machine, which is connected to the network emulation machine hosting data acquisition and actuation.

Data broker for the applications layer
Care needs to be taken when using the DIME over a network with link latency. Ideally, in the above example, the simulation computer could be plugged into the emulated network and use TCP/IP-based DIME for messaging. If link latency is emulated, however, the latency between DIME clients will double, which is undesired and introduces errors into the emulated timing. This issue is illustrated in Fig. 2. In this setup, delivering power system states from the simulator to the PMU at Virtual Host 1 bears a latency of 5 ms, whereas this latency should apply only to wide-area data transmission. Therefore, the delay from the simulator to the PMU, or in general, the delay between the physical layer and the networked components layer, must be eliminated. The proposed approach for avoiding the double-latency issue and achieving low-latency distributed messaging introduces a data broker program and an IPC-based DIME. As illustrated in Fig. 3, the simulator is connected to a separate switch, to which a host running the data broker is also connected. The data broker instantiates two DIME clients, one connected to the simulator over TCP/IP and the other connected to the IPC-based DIME, and relays data from one client to the other. In addition, networked components running on other virtual hosts are also connected to the IPC-based DIME. Since (a) the IPC protocol is implemented through the file system, which can be treated as instantaneous, and (b) there is no latency on the link between the simulator and the switch, the distributed messaging between the simulator and the networked components achieves low latency.
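The relay performed by the data broker reduces to forwarding whatever arrives on one DIME connection to the other. A minimal sketch is shown below using raw pyzmq sockets rather than the actual DIME client API; the TCP and IPC endpoints follow the case study in Section 4, and everything else is an assumption.

```python
import zmq

ctx = zmq.Context.instance()

# Client attached to the simulator-side DIME server (TCP, over the
# latency-free link to the dedicated switch).
tcp_side = ctx.socket(zmq.DEALER)
tcp_side.setsockopt(zmq.IDENTITY, b'broker')
tcp_side.connect('tcp://192.168.1.200:5000')

# Client attached to the IPC-based DIME server shared by the virtual hosts.
ipc_side = ctx.socket(zmq.DEALER)
ipc_side.setsockopt(zmq.IDENTITY, b'broker')
ipc_side.connect('ipc:///tmp/dime')

# Relay loop: whatever arrives on one side is forwarded to the other.
poller = zmq.Poller()
poller.register(tcp_side, zmq.POLLIN)
poller.register(ipc_side, zmq.POLLIN)
while True:
    for sock, _ in poller.poll():
        msg = sock.recv_multipart()
        (ipc_side if sock is tcp_side else tcp_side).send_multipart(msg)
```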
The proposed approach can be extended to systems where networked components run on dedicated hardware rather than on virtual hosts. For example, there are cases when prototyping a PMU simulator on a Raspberry Pi micro-computer is needed. An additional physical network switch can be added between the simulator and the emulator. The Raspberry Pi connects its onboard network interface to the switch and runs a DIME client over TCP/IP, and then uses a USB-based Ethernet adapter to connect to the network emulator. This setup likewise achieves low-latency distributed messaging between the simulator and the PMU simulator on hardware.

Closing the loops
The last feature to emphasise is the closed-loop simulation capability, which differentiates the testbed from a conventional open-loop environment. Control loops modelled in the testbed include local control loops and wide-area measurement-based loops. Local controllers are mostly modelled in the physical system simulator, whereas wide-area controllers are implemented as applications connected to the network, since communication effects must be modelled for wide-area control to be practical. Data-driven applications in wide-area control loops usually have incoming paths from PMUs and outgoing paths to actuators. The incoming paths are established between PMUs and a PDC program, behind which the data-driven application resides. The outgoing paths are established between the application and the actuators to which the control signals are sent, using substation communication protocols such as the International Electrotechnical Commission (IEC) Standard 61850.
The execution of the closed-loop simulation and testing is described in Table 1. First, processes for the four layers are started in the given sequence, which naturally handles the dependencies between layers. Then, the initialisation loops are executed for each module and between layers. Finally, after the initialisation, the main loops of the processes run to generate, exchange, and process data until the end of the simulation.

Open-source implementation
The following open-source tools and libraries are utilised:
• ZeroMQ: the backend for the DIME server and client.

Verification of rapid data acquisition
This section presents case studies for verifying the proposed rapid data acquisition using the DIME and data broker.
The setup of the case study involves the physical system layer and the networked components layer and is distributed onto two computers. The setup for the scenario is similar to Fig. 3. Computer 1 represents the physical system layer using the power system simulation tool ANDES and establishes a DIME server at tcp://192.168.1.200. Computer 2 represents the networked components layer and part of the network emulation layer by executing a network emulation program, inside which one network switch and a virtual host are created. A PMU simulator runs on the virtual host and connects to a local DIME server at ipc:///tmp/dime. Computer 2 also runs the data broker program for relaying data between the two DIME servers. Lastly, Computer 1 is connected to Computer 2 using an RJ45 Ethernet cable.
In terms of the data volume, we consider measurements on a bus, including V, θ_V, I, θ_I, and f, namely voltage magnitude, voltage phase, injection current magnitude, injection current phase, and frequency, respectively. Thus, at each time step, the simulator sends a group of five variables to the PMU through the switch and the data broker. The round-trip time between the sender and the receiver is measured for 5000 random samples. The measured one-way latency, which is equal to half of the loop-back time, is scatter-plotted in Fig. 4, along with the linear regression results.
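A sender-side measurement of this kind can be sketched as a simple echo-timing loop. The snippet below uses plain pyzmq request/reply sockets and assumes an echo service running on the PMU side that returns each received payload; it illustrates the timing procedure only and is not the testbed's actual instrumentation code.

```python
import statistics
import time
import zmq

ctx = zmq.Context.instance()
req = ctx.socket(zmq.REQ)
req.connect('tcp://192.168.1.200:5000')    # assumed echo service on the PMU side

payload = [1.0, 0.0, 0.5, -0.1, 60.0]      # V, theta_V, I, theta_I, f
one_way = []
for _ in range(5000):
    t0 = time.perf_counter()
    req.send_json(payload)
    req.recv_json()                        # the PMU side echoes the data back
    one_way.append((time.perf_counter() - t0) / 2.0)   # half of the round trip

print('mean one-way latency: %.3f ms' % (1000 * statistics.mean(one_way)))
```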
It can be observed that the latency of the distributed messaging between the simulator and the PMU stays stable at around 1 ms. This latency includes the networking time and the serialisation/deserialisation time in the DIME at both ends. It is worth mentioning that measuring the round-trip time at the sender is simpler than measuring the one-way difference at the receiver, because the latter requires synchronising the clocks of the two computers. Compared with the doubled communication network latency that would occur without the data broker, the proposed method significantly reduces the time for data acquisition and can be considered low latency.

Fig. 3 Proposed low-latency distributed messaging setup using two DIME servers and a data broker

Table 1 Execution workflow for the cyber-physical testbed
(a) Start processes for: (i) communication network emulation; (ii) DIME servers and the data broker; (iii) applications running on virtual hosts; (iv) networked components; and (v) the physical power system simulator.
(b) Initialisation: (i) self-initialisation of modules; (ii) between networked components and the simulator; and (iii) between applications and networked components.
(c) Execute processes: (i) the simulator, at each integration step, (1) sends variable values to networked components and (2) receives control signals from networked components.

The scalability of the proposed approach is also illustrated in Fig. 5. Consider an increase in the data volume from five measurements to 500 with a step of five. Results show that the proposed method scales well in this range.

Verification of the communication network latency
Next, the SDN-based communication network emulation is verified for the accuracy of communication latency. Two scenarios are examined, including a small test system with three hosts, two network switches, and four links, and a proposed communication network for WECC.
The topology of Scenario 1 is illustrated in Fig. 6. In the network emulator, two switches, two VHs, and three links with the noted latency are created. A physical computer is connected to a physical network interface, which is attached to Switch 1 using a latency-free link. The theoretical latency is 7 ms between H1 and H2, 12 ms between H1 and the PC, and 15 ms between H2 and the PC. The latency is measured by a server/client socket connection over TCP, and the results are plotted in Fig. 7, which shows that the observed latency matches the theoretical values well. Note that the first connection may take longer due to the address resolution protocol (ARP) query for the media access control (MAC) address. Once the MAC address is cached, the measured time reflects the actual latency.
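Assuming a Mininet-style SDN emulator, Scenario 1 can be reproduced in a few lines. The per-link delays used below (2, 5, and 10 ms) are not taken from Fig. 6; they are one assignment consistent with the stated end-to-end latencies (2 + 5 = 7 ms, 2 + 10 = 12 ms, 5 + 10 = 15 ms), and the hardware-interface attachment is only indicated in a comment.

```python
from mininet.net import Mininet
from mininet.link import TCLink
from mininet.cli import CLI

net = Mininet(link=TCLink)                 # TCLink enables per-link delay emulation
s1, s2 = net.addSwitch('s1'), net.addSwitch('s2')
h1, h2 = net.addHost('h1'), net.addHost('h2')
net.addController('c0')

net.addLink(h1, s2, delay='2ms')           # H1 -- S2
net.addLink(h2, s2, delay='5ms')           # H2 -- S2
net.addLink(s1, s2, delay='10ms')          # S1 -- S2 backbone link
# The physical computer attaches to S1 via a latency-free hardware interface,
# e.g. Intf('eth1', node=s1) from mininet.link (omitted here).

net.start()
net.pingAll()                              # observed RTTs are roughly twice the one-way values
CLI(net)
net.stop()
```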
Then, a network topology for the WECC system is used to examine the emulated latency. The communication network topology shown in Fig. 8 contains 15 balancing regions. Multiple hosts are created in each region to resemble PMU substations (green circles) and a regional PDC (purple circles). PMUs and the PDC in each region are connected to a regional network switch (solid red circles), and the switches are interconnected to form the network for WECC wide-area data streaming. The latency in the network is configured as follows: (i) 10 ms between a PMU and a switch.
(ii) 5 ms between a PDC and a switch. (iii) 15 ms between two regional switches.
The latency of the following scenarios is measured: (i) Between a PMU and a PDC in the LADWP region. (ii) Between the two PDCs in AESO and SRP.
For Scenario 2, the path computed by the shortest path algorithm has been highlighted in Fig. 8. The theoretical latency for the scenarios is 15 and 100 ms, respectively. The measured latency for the two scenarios is shown in Fig. 9, which shows a perfect match with the theoretical values.
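The theoretical values follow from a weighted shortest-path computation over the emulated topology. The sketch below uses networkx on a deliberately reduced fragment of the WECC network; the node names and the six-hop backbone between the AESO and SRP switches are assumptions chosen only so that the numbers above (15 and 100 ms) are reproduced.

```python
import networkx as nx

G = nx.Graph()

# Regional backbone: six 15 ms inter-switch hops between the AESO and SRP switches.
backbone = ['sw_aeso', 'sw_1', 'sw_2', 'sw_3', 'sw_4', 'sw_5', 'sw_srp']
for a, b in zip(backbone, backbone[1:]):
    G.add_edge(a, b, delay=15)

# Access links: 5 ms PDC-to-switch and 10 ms PMU-to-switch.
G.add_edge('pdc_aeso', 'sw_aeso', delay=5)
G.add_edge('pdc_srp', 'sw_srp', delay=5)
G.add_edge('pmu_ladwp', 'sw_ladwp', delay=10)
G.add_edge('pdc_ladwp', 'sw_ladwp', delay=5)

print(nx.shortest_path_length(G, 'pmu_ladwp', 'pdc_ladwp', weight='delay'))  # 15 ms
print(nx.shortest_path_length(G, 'pdc_aeso', 'pdc_srp', weight='delay'))     # 100 ms
```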

Wide-area sophisticated replay attack using PMUs
This subsection aims at providing a comprehensive demonstration of the cyber-physical simulation capability of the testbed. In future cyber-physical power systems, PMUs will become crucial for situational awareness and wide-area control decision making. It may be possible that hackers exploit the vulnerabilities in the PMU firmware for recording and replaying data. With some knowledge of the system, a sophisticated attacker may record data from severe events and replay the data when the system is normal in order to cause disturbances.
Details of this scenario are described as follows. In the physical layer, the WECC 181-bus system is simulated in a quasi-real-time mode. The topology of the WECC system can be found in [38]. In the networked components layer, ten arbitrarily chosen PMUs are simulated, and ten actuators for breakers, each controlling a transmission line, are included. In the network emulation layer, the aforementioned WECC communication network configuration is used for wide-area data transfer. One OpenPDC application program for visualisation on a separate PC is plugged into the network emulator and linked to the PMUs. Another application, running in a VH, is a PDC program with a system separation controller. The controller relies on the frequency measurements and opens the tie line connecting the north and the south after a 7-second delay if the frequency difference exceeds 0.4 Hz.
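For reference, the behaviour of that controller can be captured in a short sketch. It reads one plausible interpretation of the rule above, namely that the 0.4 Hz difference must persist through the 7-second delay before the tie-line breaker is opened; the stream, frame keys, and breaker interface are hypothetical.

```python
THRESHOLD_HZ = 0.4     # frequency difference that arms the separation scheme
DELAY_S = 7.0          # delay before the separation command is sent

def separation_controller(pdc_stream, breaker, dt=0.033):
    """Open the north-south tie line if |f_north - f_south| > 0.4 Hz for 7 s."""
    armed_for = 0.0
    for frame in pdc_stream:               # time-aligned PMU frames from the PDC
        df = abs(frame['f_north'] - frame['f_south'])
        armed_for = armed_for + dt if df > THRESHOLD_HZ else 0.0
        if armed_for >= DELAY_S:
            breaker.send({'device': 'tie_line', 'command': 'open'})
            return
```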
The simulation and testing process comprises two steps: record and replay. In the recording step, the hacker controls the compromised PMUs to record frequency data when a simulated generator trip actually happens in the north. In the replaying step, the hacker controls the PMUs to stream the recorded event data instead of the actual measurements, even though the system is perfectly secure. The replay attack is expected to trigger the controller and cause the system to split by opening the north-south tie-line.

The frequency data produced by the physical system layer in the recording step is plotted in Fig. 10. At 2 s, a hydro generator on Bus 29 is tripped, and the frequency measurement at Bus 34 in the north starts to diverge from the other buses in the south. The system separation scheme is activated around 4.5 s, and the system separation signal is sent after 4 s. After the system separation, a partial under-frequency load shedding application takes control of the load and attempts to recover the frequency to nominal.

In the replay step, the hacker controls six out of the ten compromised PMUs to send the recorded data. The recipients of the PMU data include the OpenPDC application and the same system separation controller. The replay attack is composed of three phases, as shown in Fig. 11. After the replay attack starts, the data received from the replaying PMUs begin to diverge, while the compromised PMUs that are not replaying still report 60 Hz. As time progresses, the frequency measurements keep diverging, following the same pattern as the data shown in Fig. 10. In the last phase, after the system separation command is sent, the disturbances caused by the system separation can be observed by the normal PMUs.
The visualisation of the system separation caused by the attack is given in Fig. 12. Frequency deviation from the normal value is animated using Delaunay triangulation based on the given colour map. The frequency difference between the northern and southern parts can be clearly seen.
Finally, it is arguable that in real systems the attack may not be as straightforward as the simple setup in the testbed. In other words, real systems have routines, such as state estimation, that can mitigate such attacks [36]. Although the complexity of real systems is acknowledged, the purpose of this case study is mainly to demonstrate the capability of performing closed-loop control, with realistic models of the communication network simulated alongside the power system, as well as possible attacks, in the proposed testbed.

Conclusion
This paper introduces a cyber-physical power system testbed for wide-area monitoring and control verification. A four-layer architecture is proposed with representations of the physical system, networked physical components, communication network emulation, and applications. To avoid the double-latency issue, a low-latency setup is proposed using the data broker and the distributed messaging environment. All modules in the testbed eventually form a closed loop for cyber-physical simulation and testing.
Three case studies are shown for verification and demonstration. The proposed data broker setup is verified to provide low-latency data streaming. The SDN-based network emulation is verified to achieve the designated network latency for both a simple network and a large multi-node network. Finally, a replay attack using PMUs for system separation is demonstrated using simulation data, OpenPDC data, and visualisation.
Future work for LTB involves generalising the frameworks for network-based applications, building applications for the communication network, and prototyping cyber-security scenarios.

Acknowledgment
This work was supported in part by the Engineering Research Center Program of the National Science Foundation and the Department of Energy under NSF Award Number EEC-1041877 and the CURENT Industry Partnership Program.