Industrial Control via Application Containers: Maintaining determinism in IAAS

Industry 4.0 is fundamentally changing data collection, storage and analysis in industrial processes, enabling novel applications such as flexible manufacturing of highly customized products. Real-time control of these processes, however, has not yet realized its full potential in using the collected data to drive further development. Indeed, typical industrial control systems are tailored to the plant they need to control, making reuse and adaptation a challenge. In the past, the need to solve plant-specific problems overshadowed the benefits of physically isolating a control system from its plant. We believe that modern virtualization techniques, specifically application containers, present a unique opportunity to decouple control from plants. This separation permits us to fully realize the potential for highly distributed and transferable industrial processes, even with real-time constraints arising from time-critical sub-processes. In this paper, we explore the challenges and opportunities of shifting industrial control software from dedicated hardware to bare-metal servers or (edge) cloud computing platforms using off-the-shelf technology. We present a migration architecture and show, using a specifically developed orchestration tool, that containerized applications can run on shared resources without compromising scheduled execution within given time constraints. Through latency and computational performance experiments we explore the limits of three system setups and summarize lessons learned.


KEYWORDS
Industrial Control Systems, Real-Time, IAAS, Container orchestration

| INTRODUCTION
Emerging technologies such as the Internet of Things and Cloud Computing are radically re-shaping structure and control of industrial processes. These innovations allow the creation of highly flexible production systems, an essential component of the fourth industrial revolution. Key enabling technologies such as distributed sensing, big-data analysis time environment, resource virtualization constrains the determination of a proper environment on cloud computing platforms. A previously monolithic application is now part of an operating system (OS) managed environment, adding further difficulties such as inter-process or inter-service communication (IPC/ISC). Yet, we believe that the principles of Industry 4.0 present a unique opportunity to explore complementing traditional automation components with a novel control architecture [3].
We believe that modern virtualization techniques such as application containerization [4,3,5] are essential for appropriate utilization of cloud computing resources in industrial control systems. Such techniques would yield the same advantages that traditional containerized micro-services present: the creation of light and easily distributed control applications able to run on any system and that are, at the same time, easy to maintain and update [6].
With control containerization we create a strong enabler for Industry 4.0 attributes. Beyond the migration capabilities and flexibility, containers simplify the parallel execution of control software on devices such as PLCs and, to a lesser extent, on sensing and actuating field devices. This results in increased reliability and robustness, while enabling further exploitation of self-* properties (i.e., Self-aware, Self-predict, Self-compare, Self-configure, Self-maintain, Self-organize [7]). Time-machines (snapshots of control software and/or machine state), control redundancy (parallel operation of containers and/or virtual server instances [7]) and online system reconfiguration (reprogramming of control algorithms and product specifications with little or no downtime [8]) are only a few of the Industry 4.0 tools made accessible. Containers allow applications such as performance and distributed health monitoring [9,10] to run on a shared end node. They can host a Digital-Twin [11] to predict malfunction, maintenance intervals and tool lifespan.
Lastly, these modern virtualization techniques enable mixed-criticality contexts, promoting increased efficiency, reduced operational cost and decreased production downtime [12]. In this paper, we explore the feasibility of relocating real-time control applications, using off-the-shelf technology, from dedicated infrastructure and hardware onto a shared resource environment, both on a bare-metal host and in the cloud. The contributions of this paper are:
• An architecture proposal to ease migration and enable extension with, and integration of, Industry 4.0 features.
• A proof of concept of an orchestration solution, apt to statically allocate and monitor containers and their resources.
• Evaluation and resource efficiency tests of hard real-time task scheduling with application containers.
• Demonstration of how, under specific conditions, the same tasks can be run in the cloud.
The rest of this paper is structured as follows. Section 2 analyzes related work and background motivating our analysis. Section 3 proposes an architectural solution, while in Section 4, we discuss the methodology and design of experiments. We next detail the determined run-time contexts for containers, their frameworks, candidate host OSs and system latency in Section 5.2. Section 5.3 describes orchestration of containers, including software tool and tests.
Finally, we discuss lessons learned and conclude in the last two Sections.

| LITERATURE AND MOTIVATION
The proposed migration deals with two different areas: high performance computing (HPC) and control software containers. Both focus on different aspects of control program execution. The former focuses on lowering its latency and gives less importance to its execution determinism. The latter tries to reshape its run-time environment and thus to create a level of independence from its underlying hardware. During this redesign determinism stays in focus, leaving system virtualization in the background. The resulting combination of containers executed on cloud resources and strictly time-dependent control application containerization constitutes a new challenge that can be addressed by applying insights from both fields. Such a combination requires an operating system kernel that supports and exceeds the soft real-time guarantees secured by the low-latency kernel flavors in use on HPC installations, while keeping only limited environmental control. In this paper we assess the feasibility of this approach using off-the-shelf technology.
The following subsections detail motivation and related work. We discuss briefly useful insights for our investigation and outline the motivation to examine our problem. The section closes with the research questions for this study.

| Control containerization
Containerizing control applications has been discussed in recent literature. Moga et al. [4], for instance, presented the concept of containerization of full control applications to decouple the hardware and software life-cycles of an industrial automation system. Due to the performance overhead in hardware virtualization, the authors state that OS-level virtualization is a suitable technique to cope with automation system timing demands. They propose two approaches to migrate a control application into containers on top of a patched real-time Linux-based operating system: a) a given system is decomposed into subsystems, where a set of sub-units performs a localized computation, which is then actuated through a global decision maker, or b) devices are defined as a set of processes, where each process is an isolated standalone solution with a shared communication stack, and based on this, systems are divided into specialized modules, allowing a granular development and update strategy. The authors demonstrate the feasibility of real-time applications with containerization, even though they express concern about the maturity of the technical solution presented.
Goldschmidt and Hauck-Stattelmann [5] perform benchmark tests on modularized industrial Programmable Logic Controller (PLC) applications. This analysis examines the impact of container-based virtualization on real-time constraints. As there is no solution for legacy code migration of PLCs, the migration to application containers could extend a system's lifetime beyond the physical device's limits. Even though tests showed a worst-case latency in the order of 15ms on Intel-based hosts, the authors argue that the container engines may be stripped down and optimized for real-time execution. In a follow-up work, Goldschmidt et al. [13] described a possible multi-purpose architecture and tested it in a real-world use case. The results show a worst-case latency of about 1ms for a Raspberry Pi single-board computer, making the solution viable for cycle times of about 100ms to 1s. The authors state that topics such as memory overhead, containers' restricted access and problems due to technology immaturity are still to be investigated.
Tasci et al. [3] address architectural details not discussed in [5] and [13]. These additions include the definite runtime environment and how deterministic communication of containers and field devices may be achieved in a novel container-based architecture. They proposed a Linux-based solution as host operating system, including both single kernel preemption-focused PREEMPT-RT patch and co-kernel oriented Xenomai. With this patch, the approach exhibits better predictability, although it suffers from security concerns introduced by exposed system files required by Xenomai.
For this reason, they suggested limiting its application for safety-critical code execution. They analyzed and discussed inter-process messaging in detail, focusing on the specific properties needed in real-time applications. Finally, they implemented an orchestration run-time managing intra-container communication and showed that task times as low as 500µs are possible.
The three solutions discussed above share one common aspect: they are based on a bare-metal configuration. These solutions illustrate a first step for the re-allocation of embedded control software onto a dedicated infrastructure.
They all consider real-time constraints but remain limited to execution on physical hardware. However, the takeaway remains that containerization of hard real-time applications is viable.

| Cloud and High Performance Computing
In 2014, Garcia-Valls et al. [14] analyzed challenges for predictable and deterministic cloud computing. Even though they focus on soft real-time applications, certain aspects and limits apply to any real-time system. Merging cloud computing with real-time requirements is a challenging task; the authors state that the guest OS has only limited access to physical hardware and thus suffers from the unpredictability of non-hierarchical scheduling and thick-stack communications.
While there exist real-time enabled hypervisors that manage virtual instances such as the paravirtualized RT-Xen with direct access to hardware, the shared resources still suffer from latency that may make real-time execution impossible.
Hallmans et al. [15] make similar observations, but they reach different conclusions. Not only do they conclude that it is possible to move a complete soft real-time system into the cloud, they also foresee a development that will further allow for hard real-time systems. Many latency performance evaluations confirm this possibility. Nonetheless, to our knowledge no one has verified the proper execution of real-time tasks within deadlines.

| Architecture and Scheduling
Felter et al. [16] focused on identifying the performance of instances based on hardware virtualization via Kernel-based Virtual Machines (KVMs) and container OS-virtualization using the cross-platform capable Docker. The benchmarks confirm that Docker results in equal or better performance than KVMs in almost all cases. Arango et al. [17] analyzed three containerization techniques for use in cloud computing. The paper compares Canonical's Linux Containers (LXC), Docker and Singularity, an engine developed by Lawrence Berkeley National Laboratory, to a bare-metal application. In many aspects, the Singularity containers performed better, sometimes even better than the bare-metal implementation, but this is largely due to the blended approach of the engine; Singularity is an incomplete virtualization solution since it grants access to I/O operations without context changes.
A recent work by Telschig et al. [1] explores a platform-independent container architecture for real-time systems. The authors identify mixed criticality, cross-platform operation and third-party software use as the main reasons for developing new architectures. Their proposed architecture manages the communication between such interdependent distributed software components and focuses on portability and on the isolation of critical from non-critical tasks. The presentation concludes with the introduction of a prototype agent.
Abeni et al. [18] tried to extend the standard Linux scheduler to get better response times. In their recent work they detail how to extend the Completely Fair Scheduler (CFS) hierarchically with a deadline-based algorithm, optimizing latency results for containerized software. The modified scheduler successfully manages a larger number of time-critical tasks, performing better than the default deadline-based scheduler.
Ultimately, containerization has proven powerful enough to be a resource-efficient replacement for traditional virtualization techniques. However, a performance investigation with real-time applications is still due; scheduling techniques like those presented in [18] further show that there is still room for improvement.

| Research question and motivation
While flexibility and efficiency are big advantages of the new paradigm, such Smart systems display increased running costs. New architectures suggested for the fourth industrial revolution mostly provide for distributed and, above all, decentralized control. This control layer, however, has to carry out increasingly complex tasks. The resulting high number of small distributed supervision loops increases the need for dedicated hardware that performs a single task. In turn, this would increase maintenance and operation costs.
Virtualization of control units, i.e., abstracting the function from hardware, allows up-scaling the installed computation appliances. Such units can run on shared hardware, exploiting the cost reduction advantages typical of cloud computing environments. To keep a low maintenance profile, this up-scaling has to use standard hardware and software. The goal of our experiments is therefore to explore whether, and to what extent, off-the-shelf technology can help to migrate hard real-time applications to a virtualized computing resource. After successful migration, practitioners can run applications on a smaller, centralized number of computing entities, consequently saving resources and substantially reducing operational cost.
A requirement for success remains that the software keeps its timing within bounds after migration. Control software is usually characterized by one or more real-time tasks with periodic execution and a computation deadline.
In the literature, three categories of periodic real-time applications have been analyzed: Soft, where the computation value decreases with a deadline overshoot; Firm, where exceeding the maximum delivery time nulls the computation value; and Hard, where a missed deadline may have catastrophic consequences [19]. A task that exceeds its timing limits may further impede the execution of dependent and independent tasks. The delayed scheduler yield takes additional resources that may cause a bottleneck, and following tasks may not maintain their deadlines. Consequently, if running multiple real-time capable instances, we have to verify that all tasks follow their run-time parameters.
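The three categories can be read as value functions over the deadline overshoot. The following is a minimal sketch under stated assumptions: the names are hypothetical, and the linear decay used for the soft case (with its `decay_window_ms` parameter) is an illustrative choice, since no decay shape is prescribed here.

```python
from enum import Enum


class Criticality(Enum):
    SOFT = "soft"
    FIRM = "firm"
    HARD = "hard"


def result_value(criticality, finish_ms, deadline_ms, decay_window_ms=50.0):
    """Return the remaining value (0.0 to 1.0) of a result finished at finish_ms.

    Assumption: a soft result loses its value linearly over decay_window_ms;
    the window length is an illustrative placeholder.
    """
    overshoot = finish_ms - deadline_ms
    if overshoot <= 0:
        return 1.0  # on time: full value in every category
    if criticality is Criticality.SOFT:
        return max(0.0, 1.0 - overshoot / decay_window_ms)
    if criticality is Criticality.FIRM:
        return 0.0  # a late result is worthless, but nothing worse happens
    # Hard: a miss counts as a system failure, not merely a useless result.
    raise RuntimeError(f"hard deadline missed by {overshoot} ms")
```

Raising an exception in the hard case is only one way to make the failure explicit; a real controller would trigger its fail-safe reaction instead.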
A single relationship identifies all these parameters. The total required computation time $c_i$ of a periodic real-time application $i$ relates to its relative deadline $d_i$ and period $p_i$ in the following manner:

\[ c_i = f_i + r_i = f_i + t_i + \sum_k n_i^k + \sum_j n_i^j \le d_i \le p_i \qquad (1) \]

where $f_i$ is the wake-up or firing time (latency) and $r_i$ is the total run-time. The former captures the time spent between the period start and the execution start of a task. Its measurement includes task switching times and delays due to higher priority tasks and served interrupts. The latter expresses the actual used computation time $t_i$ and the task-induced ($n_i^k$) or environment-induced ($n_i^j$) noise. This task noise includes interruptions by higher priority tasks, task IPC and I/O waits and latency due to missed pages, while environment noise includes hypervisor delays and hardware or (kernel) software interrupts. If the sum of all factors ($c_i$) exceeds the relative deadline $d_i$, the resulting misbehavior of a controlled system might have catastrophic consequences. Hence, monitoring and containing these parameters can make a migration sustainable.
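To make the relationship concrete, the sketch below adds up the individual terms for one task and checks the result against deadline and period. It is an illustration only: the class and field names are hypothetical, all values are assumed to be in microseconds, and the noise terms are assumed to be simply additive.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTiming:
    """Timing parameters of one periodic task, all in microseconds."""
    period: float    # p_i
    deadline: float  # d_i, relative to the period start
    firing: float    # f_i, wake-up latency
    compute: float   # t_i, net computation time
    task_noise: list = field(default_factory=list)  # n_i^k contributions
    env_noise: list = field(default_factory=list)   # n_i^j contributions

    @property
    def run_time(self):
        """r_i: computation time plus task and environment noise."""
        return self.compute + sum(self.task_noise) + sum(self.env_noise)

    @property
    def total(self):
        """c_i: firing time plus total run-time."""
        return self.firing + self.run_time

    def meets_deadline(self):
        """True if c_i <= d_i <= p_i holds for this sample."""
        return self.total <= self.deadline <= self.period
```

For instance, `TaskTiming(period=100_000, deadline=100_000, firing=50, compute=10_000, task_noise=[200, 30], env_noise=[120])` yields a total of 10 400 µs and therefore stays within its deadline.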
Motivating Example: Figure 1 depicts a distributed facility that controls the auxiliary and cooling systems of a group of thermo-electric gas-turbines. It illustrates an example of a migration to a software system with shared resources and application containerization. Like in Hallmans et al. [15], the control components are separate from the on-site remote terminal unit and run in a common, two-instance virtualized environment. The real-time capable virtualization instance, right in the private cloud, acts as an intermediary between "Monitoring and Management" and the on-premises end-terminals. Due to migration, this architecture abstracts the control logic from the production site, requiring a reorganization of its software.
Figure 1: Motivating example: A cooling and auxiliary regulation system configuration for a gas turbine migrated to a real-time enabled cloud. RT-Node1 monitors and handles the on-premises installation.
When migrating to a shared-resource system we want to assign each application binary to its dedicated environment. As suggested by Moga et al. [4], the software has been divided into multiple independent binaries, isolated and adapted to run on a standard Linux system. The application refresh rate, or periodicity p, should not exceed the expected maximum round-trip time between remote terminal unit and cloud of 100ms. If we now assume that the system software exhibits a worst-case execution time (WCET) of 10ms and that each instance uses its assigned CPU and memory exclusively, the remaining CPU time of 90ms per period is spent in idle mode, resulting in high resource waste. Sharing the spare CPU time achieves better resource utilization. Placing multiple containers on the same resources can additionally reduce the required system size and its running costs. Separate running environments thus enable flexible resource management and may reduce infrastructure cost.
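The arithmetic of this example can be written down in a few lines. This is a back-of-the-envelope sketch with hypothetical function names; the bound on co-located tasks is deliberately optimistic, as it ignores firing latency and any task or environment noise.

```python
import math


def utilization(wcet_ms, period_ms):
    """Fraction of one CPU that a task reserves per period."""
    return wcet_ms / period_ms


def idle_time_ms(wcet_ms, period_ms):
    """CPU time per period left unused by an exclusively assigned task."""
    return period_ms - wcet_ms


def max_identical_tasks(wcet_ms, period_ms, usable_share=1.0):
    """Optimistic upper bound on identical tasks sharing one CPU.

    usable_share is an assumed safety factor (1.0 means the whole CPU).
    """
    return math.floor(usable_share * period_ms / wcet_ms)
```

With a WCET of 10ms and a period of 100ms, `utilization` gives 0.1, `idle_time_ms` gives 90, and at most ten such tasks fit on one CPU before any safety margin is applied.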
To verify migration behavior, we examine the following research questions:

RQ1: What are possible off-the-shelf system configurations that make resource sharing through containers viable?
We have seen in Equation 1 that the achievable amount of resource sharing depends on the system and on the tasks running concurrently. This research question intends to investigate the responsiveness of possible candidate off-the-shelf systems. Systems that exhibit a low and stable firing time, $f_i$, fulfill a vital prerequisite to achieve determinism in a shared context.
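The firing time can be observed with a periodic loop that compares the planned wake-up instant with the actual one. The sketch below only illustrates the measurement principle: an interpreted user-space loop adds its own jitter, so dedicated tools such as cyclictest are the usual choice for real measurements, and the function names and default values here are assumptions.

```python
import time


def measure_firing_times(period_us=1000, cycles=200):
    """Return the wake-up latencies (in microseconds) of a periodic loop."""
    period_ns = period_us * 1000
    next_wake = time.monotonic_ns() + period_ns
    latencies = []
    for _ in range(cycles):
        delay_ns = next_wake - time.monotonic_ns()
        if delay_ns > 0:
            time.sleep(delay_ns / 1e9)
        # Latency: how long after the planned period start the loop woke up.
        latencies.append((time.monotonic_ns() - next_wake) / 1000)
        # Advance by a fixed step so that latency does not accumulate as drift.
        next_wake += period_ns
    return latencies


def summarize(latencies):
    """Minimum, average and maximum of the measured latencies."""
    ordered = sorted(latencies)
    return {
        "min": ordered[0],
        "avg": sum(ordered) / len(ordered),
        "max": ordered[-1],
    }
```

The maximum is the value of interest for a hard real-time task, while the spread between minimum and maximum indicates how stable the firing time is under the current load.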

RQ2: What is the achievable level of CPU sharing with a standard real-time enabled kernel?
While a constant $f_i$ describes the responsiveness of a system, the actual CPU load will show the variability of task run-times, $r_i$. Monitoring software run-time on shared CPUs thus displays the impact of task interaction and of operating system, I/O and virtualization delays on the programmed run-time $t_i$ and, finally, on determinism. By isolating system noise ($n_i^j$) from task noise ($n_i^k$), we can determine the upper bound of a system's availability to resource sharing.
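One simple way to separate the two noise sources is to compare run-times of the same task measured alone on a CPU with run-times measured while sharing it. The sketch below is an assumption-laden illustration, not the evaluation procedure of this paper: it uses hypothetical names, compares worst-case samples only, and treats both noise terms as additive.

```python
def noise_breakdown(programmed_us, isolated_runs_us, shared_runs_us):
    """Estimate worst-case environment and task noise from run-time samples.

    programmed_us:    programmed run-time t_i of the task
    isolated_runs_us: run-times measured with the task alone on its CPU
    shared_runs_us:   run-times measured with other tasks on the same CPU
    """
    worst_isolated = max(isolated_runs_us)
    worst_shared = max(shared_runs_us)
    return {
        # Without co-located tasks, any excess over t_i is attributed to the system.
        "env_noise_us": worst_isolated - programmed_us,
        # The additional excess under sharing is attributed to task interaction.
        "task_noise_us": worst_shared - worst_isolated,
    }
```

For a task programmed to run 1000 µs, isolated samples of 1010, 1025 and 1015 µs and shared samples of 1030, 1090 and 1040 µs would give 25 µs of environment noise and 65 µs of task noise.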
In this paper, we approach these two points and focus on the feasibility of a migration. Firstly, we illustrate latency behavior with shared resources on multiple hardware and system setups. Exploring operating systems and container engines, we select configurations for latency stress tests. A monitored test will show how task reaction times change with varying configurations and system load (RQ1). Secondly, we extend the experiments on the best performing candidates to analyze performance and determinism with different loads. Through static resource allocation we can further explore computation time stability, delays and occurring deadline misses if more than one task runs on the same resource. Isolated tests of CPU performance allow us to remove confounding factors and obtain a hint of the upper sharing boundary (RQ2). Section 5 addresses these research questions.
The motivating example of this section further shows that the migration of control requires adaptation and reorganization of software and system structure. To ease such migration, we extend our investigation with an architecture proposal for the Industry 4.0 context. Its design acts as template easing transition to a containerized setup and shared instances, and it enables advanced features for novel industrial control systems. The next section illustrates and details this architecture proposal.

| REAL-TIME SMART SYSTEM ARCHITECTURE
Migrating from dedicated hardware to shared resources requires more than a simple relocation strategy. The control algorithm needs adaptation and reorganization into units that take care of the different responsibilities of a control system within their constraints and run-time parameters [15]. A "leveled" design [15] can split the responsibilities of such an algorithm depending on their criticality and timeliness, enabling the integration of cloud systems into a real-time industrial process. Via a layered approach to system architecture we perform this allocation of responsibilities for a smart system.
The challenges of a migration do not end on a system level. Once a practitioner adapts and configures a binary to run on the new system, he or she has to monitor the correct execution within its timing parameters. The interaction among control applications and unmanaged inter-process communication may cause irregular and unpredictable run-time behavior [3]. Also, the number of real-time applications potentially running concurrently on a single node requires adequate monitoring and management tools to avoid overloads or mutual influence. Thus, the system architecture proposed in this section supports the migration of control applications onto shared resources (i.e., control virtualization), aiming at addressing the above issues.

| Overview and Layering
Our architecture extends the concept of the "leveled" model of Hallmans et al. [15] to off-the-shelf technology and distinguishes three layers:
• The Management and Monitoring Cloud, for cloud-based monitoring and management infrastructure and services;
• The Control Cluster or "Real-time Cloud", for process control and control-related services;
• The On-premises Installation or "Process", connecting a multitude of heterogeneous devices that interact with the physical world.
The three layers may overlap such that, for example, the control cluster can be part of the cloud or on-premises installation as an internal IAAS infrastructure.
We then use the layer classification and analysis technique of recent work [20] to compare and map our layers and competences to the architecture proposals of the two used references, Han et al. [21] and Lee et al. [7].
As we illustrate in the rest of this paper, this architecture allows for a better implementation of virtualized control.
Through a layered approach we ease problem identification and handling. It enables detailed assessments such as gradual detection of security issues and adaptation of architectures as carried out within similar heterogeneous environments [22,20]. In the following sections, we detail the three layers and their mapping to the two reference architecture styles while pointing out the connection to functions and attributes of Industry 4.0, referred to as I4.0.

| Management and Monitoring Cloud
The first component of the architecture consists of a cloud-based monitoring and management infrastructure and services, top left in Figure 2. Many of the architectural approaches introduced after publication of the Industry 4.0 vision include this component as a hub. In this layer, data is globally collected and analyzed, and data-dependent supervision decisions are taken. It performs data acquisition and aggregation from the on-premises devices and analyzes the data, for instance, through artificial intelligence tools. Integrating distributed diagnosis and prognosis frameworks into this layer, as proposed by Wu et al. [9], allows hosting machine learning processes based on collected and aggregated plant data.
Techniques such as Preventive Health Management (PHM) as a Service, which reduces the maintenance effort for the plant operator by relying on Platform as a Service (PaaS) and Software as a Service (SaaS), can also be implemented [10].
Such frameworks and services ultimately enable self-adjustment and self-optimization techniques to reduce production waste and adapt to variations such as mechanical wear (I4.0 "Configuration" level).
This cloud layer also hosts a service for container management, providing instruments for appropriate planning, positioning, and execution of real-time containers. Real-time tasks as well as containers are arranged according to their function and interdependence and deployed on available real-time capable nodes, called K-nodes in Figure 2.
Through the help of client side agents, the service is able to seamlessly update and replace distributed applications during runtime. This component enables self-configuration (I4.0 "Configuration" level) by taking care of the software replacements based on a reconfiguration plan [8]. Paired with the container management tool, a system monitoring tool can verify the container execution. Differently from the above-cited PHM and diagnosis and prognosis frameworks, the aim of this monitoring tool is not production surveillance, but monitoring the health status of K-nodes and virtual servers.
Extension of the monitoring service with a time series database further allows tracking changes in time, performing data analysis, and applying data-based techniques such as deep learning.
A further service placed at this level of architecture is an interface to the human operator as plant operators in an Industry 4.0 context interact with the system in a more decision-making than an operative role. As such, information is displayed to enable informed operators to make decisions and interventions in production processes. To this aim, features such as simulation and synthesis may optionally be available [23] (I4.0 "Cognition" level). Finally, replicas in form of a "digital twin" test reliability and the overall monitoring and supervision of the cloud environment (I4.0 "Cyber" level).

| Control Cluster
The central element of the architecture is an IAAS infrastructure whose main purpose is to host services and processes that have to interact with on-premises devices. Figure 2 shows an example of a hardware configuration for container-based virtualized industrial control systems. Depending on the system's needs, the represented components are either virtual servers (or instances) running in a cloud or physical servers operated in a private (edge) cloud. In the latter case, each server can run more than one virtual instance, obtaining again the same resource sharing advantages of a computing cloud infrastructure. In both cases, the hosted virtual instances can be real-time capable, running control software, or non-real-time capable, for further services. The real-time instances, called K-nodes in the figure, run multiple containers managed by the cloud service. A dedicated tool orchestrates system resources at run-time (see Section 3.5).
In this environment, each binary of an application can be managed within one container to which we can add constraints and boundaries to ensure operations.
As noted by Telschig et al. [1], the continuously growing demand for extending control loops with cloud-based analytics (see Section 3.2) requires mixed-criticality software components to run on the same system. Thus, in the Control Cluster, a time-critical component runs on fixed, assigned resources to guarantee timeliness. It is isolated from the components that run on a best-effort CPU scheduling policy, while still sharing resources of the system. In this setting, non-real-time instances or separately allocated resources can handle such best-effort tasks. The co-located best-effort resources may then be reclaimed to buffer a resource shortage of real-time tasks. Non-real-time instances can then handle other, less critical, services. For instance, they can run a time-machine or carry the edge computing portion of the health monitoring framework detailed in Wu et al. [9] (I4.0 "Conversion" level). The former collects snapshots of real-time applications to enable peer comparison and similarity analysis, thus promoting self-awareness [7]. The latter operates with redundancy on multiple copies of containers (I4.0 "Cyber" level), or the virtual instances themselves can have replicas to increase the system's robustness. Server 2 in Figure 2 could be a replica of Server 1, ready to take control when the latter fails. We can have replicas in the form of "digital twins", requiring the real-time application to be extended by a model representing the device and its environment. The model, fed with sensory input coming from on-premises and interfaced with the running process and/or human operators, finally allows further self-comparison and diagnosis [11].

| On-premises Installation
The control software connects with the sensing and actuation devices placed on or near the equipment of the factory (I4.0 "Connection" level). Depending on the timing and determinism requirements, this connection might need to follow more restrictive protocols. An example of such protocols can be found in the Time Sensitive Networking (TSN) standards family [24]. However, application-specific needs and physical location determine whether such protocols are needed. Depending on control requirements, including device distance and cycle times, popular COTS Ethernet-enabled protocols may suffice.
Traditional choices such as isochronous ProfiNet and EtherCAT manage hundreds of devices in a time-critical manner for local networks [25]. The proposal thus leaves the choice of the connection type to each application case.
Although the on-premises computation has been moved to the Control Cluster component, the proposed style foresees further control software installed on on-premises devices as well. For redundancy purposes, the cluster may indeed operate a redundant copy of the on-premises controller (Section 3.3), or some units may operate as Remote Terminal Units (RTU), serving as an interface to the containerized software or even executing some minor local control function. Such local control loops [21] would have the advantage of reducing latency while exploiting the computing power of a (private) cloud.
Morabito [26] shows that control applications can run inside a container on typical ARM single board computers with minimal performance impact. Replication and snapshots further enable Industry 4.0 features also for such on-site devices. As part of data evaluation and sharing, they can now independently calculate health, estimated remaining useful life etc., bringing self-awareness to machines. Via snapshots, a machine can compare its performance with itself and others of a fleet enabling self-comparison [7]. Thus, through containerization we ease maintenance and reduce cost while increasing resilience and robustness.

| The Orchestrator
The heart of this proposed architecture style is the orchestration software running on each real-time capable node of the Control Cluster. An orchestrator, in this context, is a tool developed to increase resource utilization without significantly impacting determinism. It monitors containers and resources, and assigns the latter according to algorithms, rules or predetermined configurations.
There are two ways to manage resources: static and dynamic. If statically configured, the level of latency and determinism that is achievable can be defined up-front. A static resource schedule is created off-line and passed to the orchestrator for execution. Although such a configuration would be the safest, the amount of resource sharing gained is limited. For such a static schedule, the configuration must be pessimistic, treating worst-case execution times as the norm and reserving the corresponding CPU slice for every application. For higher resource savings, a dynamic strategy is attractive: it reallocates containers during run-time to guarantee timeliness when unforeseen delays occur. It allows higher resource sharing as it can adapt to current needs. However, complete dynamic rescheduling of containers would be non-deterministic, as it depends on the feasibility/admission test [19]. With given constraints, determinism can nevertheless be managed within a certain probability of success.
Dynamic resource management, instead of allocating resources based on worst-case parameters, uses probabilities to assess the situation [27]. The orchestrator considers typical run-times, contemporaneity factors and the probability of occurrence of the WCET. It samples run-times and performs curve-fitting to predict distribution models and probabilities. The combined probabilities then give the success rate of a schedule and trigger resource organization as needed.
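The sampling and curve-fitting step can be illustrated with a minimal sketch. It assumes, purely for illustration, that run-times are approximately normally distributed and that tasks overrun independently of each other; the text does not state which distribution model the orchestrator fits. The function names, budgets and sample values are hypothetical.

```python
from math import prod
from statistics import NormalDist


def overrun_probability(samples_us, budget_us):
    """Fit a normal model to sampled run-times and return P(run-time > budget)."""
    model = NormalDist.from_samples(samples_us)  # assumption: normal model
    return 1.0 - model.cdf(budget_us)


def schedule_success_rate(overrun_probs):
    """Probability that no task overruns, assuming independent tasks."""
    return prod(1.0 - p for p in overrun_probs)


# Hypothetical run-time samples (in µs) for two containers
task_a = [410, 395, 402, 420, 398, 405, 415, 400]
task_b = [880, 905, 890, 870, 910, 895, 885, 900]

p_a = overrun_probability(task_a, budget_us=450)
p_b = overrun_probability(task_b, budget_us=950)
success = schedule_success_rate([p_a, p_b])
print(f"P(overrun A)={p_a:.2e}, P(overrun B)={p_b:.2e}, success={success:.6f}")
```

In such a scheme, a success rate falling below a configured level would be the trigger for reallocating resources.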
This approach resembles the "vertical scaling techniques" used in cloud-hosted applications [28]. Similar approaches in cloud computing environments increase resource efficiency through over-subscription, where the reserved resources may exceed the actual requirements, acting as buffers for worst-case situations [29]. In our case, we can assess the probability with which a system-wide malfunction may occur, allowing a system administrator to set a maximum acceptable boundary of risk. This boundary then defines the probability of success of dynamic scheduling in relation to the achieved resource savings: the higher the risk, the more savings may be achieved.
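The trade-off between accepted risk and resource savings can be sketched as follows. This is an illustrative example only: it assumes a normal run-time model, and the sample values, the WCET figure and the function name `budget_for_risk` are hypothetical rather than taken from the orchestrator.

```python
from statistics import NormalDist


def budget_for_risk(samples_us, max_overrun_risk):
    """Smallest CPU budget whose modelled overrun probability stays within the risk bound."""
    model = NormalDist.from_samples(samples_us)  # assumption: normal model
    return model.inv_cdf(1.0 - max_overrun_risk)


samples = [410, 395, 402, 420, 398, 405, 415, 400]  # hypothetical run-times in µs
wcet_us = 600  # hypothetical WCET that a static schedule would reserve

for risk in (1e-6, 1e-4, 1e-2):
    budget = budget_for_risk(samples, risk)
    saving = 1.0 - budget / wcet_us
    print(f"risk={risk:.0e}: budget={budget:.0f}µs, saving vs. WCET={saving:.1%}")
```

A larger accepted risk yields a smaller reserved budget per task, which is the relation between risk boundary and savings described above.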

| Summary
Migrating control applications from hardware to bare-metal and finally to an IAAS infrastructure has two major advantages. First, application containerization allows managing and monitoring execution easily. It eases parallel operation, redundancy, quick updates and upgrades for container-confined code. We have seen that replacing a set of On the other side, a recent systematic mapping study [2] highlights the limits of available architectures, for instance based on the 5C attributes. The architecture style illustrated in this section has been designed not only to ease application migration, but also to support self-* properties of the migrated application with management and monitoring layer. In summary, the proposed architecture gives support to a more complete control solution for both research and industry.

| METHODOLOGY AND DESIGN OF EXPERIMENTS
To proceed with a migration onto IAAS, we require experiments that confirm the viability of a migration (Figure 4). These experiments have to validate the adequacy of system latency and the sufficiency of run-time determinism. In Hofer et al. [30] we explore the running context and execute qualifying latency tests to assess system and hypervisor influence. Specifically, we compare bare-metal with virtualization approaches that use hypervisors of Type 1 (native). The former is expected to perform better in latency but worse in resource economy, whereas the latter displays better resource economy but is limited in hardware control. Each system will run the same real-time enabled OS and test software that will log measurement data during run-time. We compare resource sharing capabilities on the following three architectures:
• A bare-metal server, which we use as migration baseline for a typical industrial control system;
• A Type 1 hypervisor controlled virtual generic instance;
• A Type 1 hypervisor controlled virtual compute-optimized instance.
The latency tests verify the suitability of specific hardware or virtualization solutions. By applying computational and I/O stress to a task's shared resources, we can examine latency effects on its real-time parameters. We start by measuring firing time variations on a system with a Type 2 hypervisor to identify the best performing virtualization setup in both cases, idle and stressed (settings test). During the test runs we gradually isolate the measurement tasks, the guest OS and the host OS using tools like Linux control groups (CGroups) and system configurations such as task and interrupt affinity. We pick the best configuration based on low standard deviation (stability) and reduced firing time (reactivity). Then, we perform the same test with the best configuration on the three mentioned architectures. In all tests we track how the latency parameters change as we alter the environment, and again pick the best performing configurations based on stability and reactivity. The analysis of the data will then allow us to judge the suitability of each architecture.
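The selection criterion can be illustrated with a short sketch. How stability and reactivity are weighted against each other is not specified in the text; the example assumes a simple ordering by standard deviation first and mean firing latency second. The configuration names and latency values are hypothetical.

```python
from statistics import mean, stdev


def summarize(latencies_us):
    """Return (standard deviation, mean, maximum) of firing latencies in µs."""
    return stdev(latencies_us), mean(latencies_us), max(latencies_us)


def pick_best(results):
    """Pick the configuration with the lowest spread, breaking ties by lowest mean."""
    return min(results, key=lambda name: summarize(results[name])[:2])


# Hypothetical latency samples (in µs) per isolation setting
results = {
    "no_isolation": [12, 35, 18, 90, 22, 15],
    "task_affinity": [10, 14, 12, 30, 11, 13],
    "task_and_irq_affinity": [9, 11, 10, 12, 10, 11],
}

best = pick_best(results)
print(best, summarize(results[best]))
```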
In the second part we focus on the interaction of virtualized control tasks with the shared environment. The performance tests execute in container batches with varying system load and timing constraints. Through Earliest Deadline First (EDF) scheduling we can reach high theoretical utilization rates of up to 100% [19]. We partition resources via CGroups so as to virtually address every resource slice as if it were a separate computation unit. The orchestration software of Section 3.5 helps us in this matter by managing interrupts, creating CGroups and assigning their slices, and managing system resources to isolate them from our test containers during run-time. First, we observe performance variation by changing kernel boot parameters of the off-the-shelf OS chosen in the latency tests. Over multiple reboots, we apply boot-time kernel settings such as scheduler tick timing, scheduler isolation and RCU back-off CPUs. The goal is to find settings that promise steady execution on the three hardware instances. A stable median, a low average and a low standard deviation indicate ideal kernel configurations for each machine type. Next, we compare the performance on the three test architectures with the most stable setup. We increase and mix task configurations, and verify run-time determinism in long-term execution. Performance drops and the number of missed deadlines then ascertain the absolute upper sharing bound.
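The idea of treating each resource slice as a separate computation unit under EDF can be sketched with the classic utilization-based admission test, which holds for periodic tasks whose deadline equals their period. The first-fit placement below is an illustrative heuristic chosen for brevity, not the algorithm of the orchestration tool; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    wcet_us: int    # worst-case execution time in µs
    period_us: int  # period in µs, assumed equal to the relative deadline


def utilization(tasks):
    """Utilization factor U: sum of WCET over period."""
    return sum(t.wcet_us / t.period_us for t in tasks)


def edf_admissible(tasks, capacity=1.0):
    """EDF utilization test for one slice: schedulable if U <= capacity."""
    return utilization(tasks) <= capacity


def first_fit(tasks, n_slices, capacity=1.0):
    """Place each task on the first slice that still passes the EDF test."""
    slices = [[] for _ in range(n_slices)]
    for task in tasks:
        for current in slices:
            if edf_admissible(current + [task], capacity):
                current.append(task)
                break
        else:
            return None  # the task fits on no slice: reject the whole set
    return slices


batch = [Task(900, 10_000)] * 10
print(utilization(batch), edf_admissible(batch))

placement = first_fit([Task(900, 10_000)] * 12, n_slices=2)
print([len(s) for s in placement])
```

Lowering `capacity` below 1.0 would leave headroom for scheduler overhead and system noise on each slice.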
During both experiments we work to identify parameters of Equation

| Context evaluation
We first explore the running context to assess possible system software candidates for the migration. We review state-of-the-art operating systems that can provide both (hard) real-time and container framework support. Besides OSs targeted at server infrastructures, we also evaluate some lightweight OSs. The selected OS must exploit the given resources properly, allowing the hardware to perform at its best while not increasing the burden of operation. Container daemons are selected based on features, ease of use, maintenance and system footprint.

| Experiment Setup
For our tests we have chosen the following systems. The bare-metal server features two Intel Xeon X5560 (Q1'09) processors with 8 cores and 16 threads, limited to two cores for our experiments. For hypervisor-based tests, we selected Amazon Web Services (AWS) to host the cloud-based environments. Their recent virtual instances use a new hypervisor based on KVM, called HVM, which allows direct assignment and control of hardware and resources, reducing the virtualization overhead. The new instances offer comparable HPC performance, but greater flexibility and scalability [32].
We selected a T3.xlarge generic instance and a C5.xlarge compute-optimized instance, both based on the AWS HVM Type 1 hypervisor.

| System latency tests
We use cyclictest [33] Version 1.0 to measure the latency of the cyclic firing behavior of a real-time application and stress [34] to simulate load in the system. The offline preliminary tests run on a dual-core, 4-thread i7 Skylake (U) system, while the main hardware comparison tests run on the three systems detailed in Section 4. During the progressive isolation of CPU resources we measure the idle firing time and the change in firing time when every CPU runs stressing threads. Once we have found the best setup, we perform one idle and one stressed test for each configuration. All variants, i.e., standard Ubuntu, Xenomai and the PREEMPT_RT patch, run the tests for at least one million firing loops. The logged results are then used for the long-term test evaluation.
Further details, the scripts executing the tests, the installation scripts, the experiment data and the results are available in [30] and online [31].

| Execution and Results
In summary, the latency tests give the following main results. The first preliminary latency tests determined that, for our purposes, guest-host CPU isolation with load balancer is the best setting. Table I in [30] displays the results of the preliminary test. Figure 5 then shows the comparison test results with the best setting we found. Ideally, the maximum firing delay of the threads should stay below 1/10th of the cycle time, which we assumed to be 100ms for the sake of comparison in this study. Therefore, Figure 5 features two reference lines visualizing the boundaries for typical thresholds, one at 10ms (for a 100ms cycle) and one at 100µs (for a 1ms cycle).
A total of ten million loops over multiple hours were executed for each configuration. All results were gathered under stress and should be considered worst-case scenarios. Among all standard kernel configurations, the reference bare-metal solution equipped with any of the three patches (BM, left box-plots in Figure 5) performs best in mean. If we consider the PREEMPT_RT configurations (Prt) across all machine types, the bare-metal set-up performs best in mean but not in spread, as its box-plot whisker spans higher, almost reaching the 100µs threshold. With only 96 occurrences out of 10 million (0.00096%) exceeding the upper limit, a general T3 instance with PREEMPT_RT can be an economic solution for a bare-metal replacement where strict determinism is not needed or cycle times are higher than the peak value measured, 49ms. It shows the lowest spread and peak (114µs) among the measured instances, outperformed only by a PREEMPT_RT T3-Unlimited enabled unit.

RQ1: What are possible off-the-shelf system configurations that make resource sharing through containers viable?
Among the examined low-maintenance options we identified Ubuntu 16.04 LTS with the PREEMPT_RT real-time patch and Docker containers as the best fit. By observing systems under stress and analyzing task latency across different configurations, we arrived at four different solutions suitable for migration to application containerization. These solutions maintain wake-up determinism at different levels as follows:

1. The bare-metal solution (BM) ensures the most deterministic behavior for hard real-time requirements. Even though it is the weakest among all configurations in terms of CPU resources, the strict bond between hardware and software boosts its responsiveness.

2. The virtualized instance C5 with PREEMPT_RT patch is the best non-hardware solution for hard real-time requirements, balancing good average latency and deterministic behavior. While it still suffers from some hypervisor latency, the exclusiveness of CPU access and the ability to control C-states allow reducing non-I/O-induced noise and yield better value consistency.

3. The T3 unlimited instance with PREEMPT_RT is a cheap solution with good average latency. As there is no guarantee on the availability or responsiveness of extra CPU power, this configuration can be chosen as an intermediate solution between T3 and C5 instances.

4. The T3 instance with PREEMPT_RT is a viable solution with good average latency that might not qualify for hard real-time requirements. Also, this T3 instance may not ensure physical CPU exclusiveness. For this reason, the C5 PREEMPT_RT instance may be a better choice for stricter timing requirements.
In conclusion, the results are promising and confirm the feasibility of migration to IAAS solutions.

| Performance tests for resource optimization with container orchestration
We perform the following resource efficiency tests by placing a set of real-time applications on shared resources. For this purpose, we use the real-time test software rt-app [35] to create configurable dummy applications. We place them into separate containers and configure them with running periods and computation times. For simplicity, we match
Each test batch consists of the following four configurations:
Test Case 1 - lower bound: homogeneous period and run-time among all containers executing on the same resources, with a WCET $w_i$ smaller than the scheduler's best-case wake-up granularity (1000µs). With this test, we force high-resolution granularity scheduling, causing more scheduler calls than planned for the highest scheduler tick rate.
The test setup consists of ten containers with a WCET $w_i$ of 900µs. With a period and deadline of 10ms each, this results in a resource utilization factor U of 0.9.
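For reference, the stated utilization factor follows from the usual definition of utilization as the sum of WCET over period across all containers. The symbols $n$ (number of containers) and $T_i$ (period of container $i$) are notation introduced here for illustration:

```latex
U = \sum_{i=1}^{n} \frac{w_i}{T_i}
  = 10 \cdot \frac{900\,\mu\mathrm{s}}{10\,000\,\mu\mathrm{s}}
  = 0.9
```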

| Execution and results
The first test batch for kernel configuration shows the full dynamic tick configuration with mandatory back-off as performing best. The setup scored the best results in most runs for average and median stability. Test case 1's values gave standard deviations of 31µs on bare-metal, 7µs on type C5 and 15µs on type T3 instances when running ten containers in parallel. Second by performance and stability is the fixed-tick kernel configuration. This second configuration turns out useful if more than one task is available to run or next in line at the same time, mostly when mixed with non-deadline oriented schedules. For the long-run tests of test batch two, we thus choose a PREEMPT_RT kernel with full dynamic ticks and RCU back-off, with a run-time of 15 minutes each.
In Test Batch Two, we repeated the same tests on all three systems. To test the repeatability, we re-calibrated and repeated the tests multiple times. Additional results and diagrams can be found in the project archive [31]. Tables 2 to 5 report the results of the four test cases. We display numbers for test case 1 and 4 with loads from close to 50 up-to 100% of CPU time only. During preliminary testing, we noted that the restart of a virtual instance on the AWS cluster causes it to move to a different system rack. Given that hardware across system racks may not be equal, e.g., Xeon 8100 vs 8200 series CPU, this change after shutdown is a variable to be considered. While this influenced the calibration for AWS based tests, it does not influence the comparison among the results of the same virtual instance.
The resulting run-time data from both test batches shows that resource sharing for real-time containers is feasible.
Properly configured, a system can reach a utilization limit of 0.9, or 90%. Through our tests we have shown that, even under stress, both latency and determinism reach the desired values. Among all systems, the AWS C5 shows the most stable run-time values. It is the most resourceful of all systems and thus likely suffers the least from background noise. Being virtual, it does not respond directly to hardware interrupts like the bare-metal system, softening the amount and duration of interrupts. However, this does not mean it is not influenced by system noise. As seen in test configuration two of Table 5, the C5 system can still be subject to major variations. The bare-metal instance, in contrast, shows higher fluctuation in skew and standard deviation, but still stays steady within a certain range. In all tests, the results for skew and deviation remained within 20 or 30µs. While this jitter may seem a problem, the fact that it can be confined to this range makes it predictable and thus suitable for hard real-time use. Lastly, the generic AWS T3 shows the worst but still rather stable run-time behavior. The highest fluctuations appear in test case 4, where idle times between cycle repetitions are the longest. Indeed, this confirms that during idling, the hypervisor may change the physical CPU reservation. If we consider these constraints, an economic generic AWS T3 instance may also suffice for our computational needs.
In the end, all systems show adequate stability for the sample loads we created. The worst variation of system run-time stays within 126µs, a value that has to be considered when dealing with hard deadlines in the order of a few milliseconds or less. However, this confirms that all setups allow shared computational loads up to and exceeding 90%.
Only close to full load do the systems start to suffer from deadline overshoots. Starting from these results, we can now investigate whether off-the-shelf technology keeps the process viable once task I/O (n i k ) and network latency are taken into account.

| LESSONS LEARNED
From the test series and the subsequent discussion of results and consequences, we have drawn a few lessons that could benefit practitioners aiming to use application containers for industrial control.
Real-time requirements, and consequently architecture, are application specific. While a generic architecture as presented in Section 3 covers most situations, the actual layout changes for each case. As in the motivating example, Figure 1, levels may merge where the environment requires it, and give the final architecture a different, reduced shape. However, responsibilities, function and goals remain unchanged.
Picking a real-time capable OS does not guarantee determinism. Different OSs have distinct trade-offs. While the Linux Xenomai patch outperforms the PREEMPT_RT patch, its induced kernel overhead limits system scalability.
When choosing an OS, we have to closely match hardware and constraints for best results.
Modern virtualization techniques perform well enough to accommodate hard real-time environments. Both the latency and the performance tests showed satisfying results, confirming the viability of an application migration. Depending on task configuration, we can reach subscription rates exceeding 90% of CPU resources. The next and last constraint to tackle will be network and I/O latency. This, however, depends on the applications' timing requirements and thus needs further investigation.
Direct hardware access decreases latency and improves responsiveness. Despite the less powerful hardware, the Bare-Metal server still outperforms newer Hypervisor based instances for task responsiveness. Similarly, limited access to CPU resources improves virtualized performances, i.e. AWS C5 vs T3-Unlimited. Thus, although possible, virtualized instances require newer and better hardware to reach similar performance. A practitioner might thus need to consider resource sharing beyond control containers to reduce hardware installation costs. The architecture of Section 3 helps to address this job.
Economic virtual instances may suffice for less strict determinism requirements. Generic AWS T3 shared instances show comparable results for task firing latency, but add variability when under stress. While this variability discourages their use in environments with strict timing requirements, i.e., task periods of a few milliseconds or less, it still allows their use for less critical operations, e.g., periods of hundreds of milliseconds as in the motivating example, Section 2.4.

| CONCLUSIONS AND FUTURE WORK
In this paper, we explored limits and feasibility of migrating real-time applications from bare-metal servers to virtualized IAAS configurations. We showed that containerization offers a novel paradigm for control applications. Previously isolated computation tasks, however, may operate concurrently and interact with each other, potentially influencing timing performance. We concur with Goldschmidt et al. [13] that this new paradigm requires investigation on topics such as container security, restricted access and intra-container data exchange. We suggest an architecture to help migration and placement of these new applications in an Industry 4.0 focus. Through the alignment of technologies and the interconnection of attributes, the proposal enables features previously not available. We next introduced an orchestration tool that can schedule real-time containers based on pre-configured capacities. We showed configurations that maximize resource utilization without significantly impacting overall execution determinism. Through targeted tests, we verified migration viability and influence on a computation only task considering system I/O and latency.
In future work, I/O and system latency will be investigated and dynamic allocation strategies will be exploited to further improve system performance. A dynamic orchestration algorithm will help to tackle issues that arise when tasks do not respect their designed parameters. This new configuration will also help to increase the robustness of a system and to detect deviations in task behavior due to cyber-attacks or externally induced overloads. New latency and performance tests on industrial use cases will help to further analyze limits and possibilities for shared-resource real-time systems, including robustness and behavior when under attack.