The Digital Twin – Your Ingenious Companion for Process Engineering and Smart Production

This essay of the temporary Working Group ‘‘100% Digital’’ of the Association ‘‘Process, Apparatus and Plant Engineering’’ PAAT in ProcessNet addresses characteristics of a digital twin from the user’s point of view and is intended to help solution providers give their product development a customer-oriented direction. For this purpose, the different data models are described which play a role in the life cycles of chemical processes, plants, and products. In particular, for existing (brownfield) plants, essential aspects of the digital twin are subsequently formulated. Further characteristics and consequences, e.g., for education and training, are derived from this.


Motivation
''Data is the new gold'' [1] can often be heard or read in the context of many Industrial Internet of Things (IIoT), Industry 4.0, Smart Manufacturing, or Big Data initiatives. Keen users from the process industry might be tempted to extend the comparison: as with prospecting for the precious metal itself, one rarely strikes it rich, but when one does, the reward is considerable. Perhaps. The number of publications on all aspects of artificial intelligence (AI) is growing rapidly, and the enthusiasm of management consultants seems unrestrained (see, e.g., [2]). Much is written about what is or will be possible, and there are undoubtedly publications from the process industry environment that are worth reading. Nevertheless, ''in the process industry [...] traditionally the measurement, control, and automation technology is strong, but the introduction of the ''Internet of Things'' is rather reluctant'' [3]. The organizers of the 57th Tutzing Symposion 2018, ''100% digital! Survival Strategies for the Process Industry'', recognized this. The participants of the symposion and the members of the Temporary Working Group (TAK) ''100% Digital'', which was founded afterwards, explored ''what special requirements the process industry has, what has already been implemented, and where there is still need for action''. The results and theses were published in [3][4][5][6].
The authors of [7] attribute the skepticism of many colleagues from the process industry about the assertiveness of a ''digital twin'' to the fact that ''it is often unclear to operators and contractors what added value can be generated from the digital twin and how business models can be designed in the digital world'' and point out that this has already been presented and published, e.g., in [8][9][10]. Nevertheless, many colleagues from the process industry may indeed find it difficult in individual cases to deduce the economic benefit of a digitalization initiative, in particular as the majority of the publications are not the work of the process and operations engineers themselves, but of (service or academic) colleagues from the automation field, and in fact describe what is possible, especially in the area of predictive maintenance, rather than what is needed. In this respect, this essay is intended to make the voice of the user in the process industry heard and to help suppliers give their product development a customer-oriented direction.

Characteristics of the Digital Twin
Competitors from the USA benefit not only from comparatively low primary energy prices, and competitors from China not only from still comparatively low labor costs and high availability of industrial land. In addition, these two countries generate by far the highest AI-related research activities and investments [2]. German and European companies, in contrast, must achieve competitive advantages especially with existing (brownfield) facilities, whose optimal operation is made difficult by the use of multiple, unlinked systems and repositories. A digital twin tailored to these framework conditions must therefore not only master the necessary linking of current and future information technologies and their applications in the process industry, but must in particular make it possible to develop brownfield plants with all their analog legacies step by step in such a way that they can make a positive long-term contribution to Germany and Europe as a business location.
Systems which have not been linked up so far could be converted into monolithic solutions, which would offer a high degree of security with regard to data integrity. These solutions are in competition with the idea of platform architectures, in which the user in a competitive market makes use of scalable individual and special applications which are compatible with each other, which are suitable for open innovation approaches, and which can be enriched in many places by open-source tools already available today.
The digital twin in the process industry has already gained some attention [11, 12], but can be purposefully structured as a product-process-resource model (Fig. 1), since these sub-models are created and maintained by different disciplines, departments, and organizations and are characterized by very different speeds of change:
- Product model: These models contain information about the products in the process engineering value-added network. This comprises on the one hand the physical properties of the pure components and their mixtures, product specifications, safety data sheets (MSDS), etc., but also data on production orders, returned goods, and product-specific logistics data such as containers, storability, or special transport requirements. Sources for these models are product development, application development, or sales and marketing.
- Process model: Process models are empirical or mechanistic models of the thermodynamic, chemical, physical, and control engineering relationships and regularities in time and space. The invariant structural models (balances, equations of state) are supplemented with approaches and correlations for adaptation to the plants (e.g., efficiencies or transport coefficients). On the one hand, these models reflect the general state of knowledge and are implemented in libraries of simulation software; on the other hand, they are elaborated for specific objectives in the development departments.
- Resource model: These models describe the mechanical, functional, and operational attributes of the plant structure and all equipment elements. In many classical applications of the digital twin, the resource model can be equivalent to the proven plant model, but especially in AI and IIoT applications it also contains computing, storage, and communication resources [13]. Sources for these models are the engineering systems of operators and suppliers as well as empirical values from plant operation.
In addition to the plant data, all characteristics of mobile equipment, containers, auxiliary equipment, stock levels, and transport capacities are also stored here. Thus, resource models extend the plant models by the elements of production logistics.
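The linkage of the three partial models can be pictured, in a deliberately minimal way, as data structures. The following Python sketch is purely illustrative: all class and attribute names are our own assumptions, not part of any standardized schema for the product-process-resource model.

```python
from dataclasses import dataclass, field

@dataclass
class ProductModel:
    name: str
    pure_component_properties: dict  # e.g., physical property data
    specifications: dict             # product specs, MSDS references, ...

@dataclass
class ProcessModel:
    name: str
    balance_equations: list          # invariant structural models
    correlations: dict               # plant-specific efficiencies, transport coefficients

@dataclass
class ResourceModel:
    plant_structure: dict            # equipment elements and their attributes
    it_resources: dict = field(default_factory=dict)  # computing/storage/communication (IIoT)

@dataclass
class DigitalTwin:
    """Links the three partial models into one information space."""
    product: ProductModel
    process: ProcessModel
    resource: ResourceModel

# Hypothetical instance for a single separation step:
twin = DigitalTwin(
    ProductModel("ethanol", {"molar_mass_g_mol": 46.07}, {"purity_min": 0.995}),
    ProcessModel("distillation", ["component balance", "energy balance"],
                 {"tray_efficiency": 0.7}),
    ResourceModel({"column_C101": {"diameter_m": 1.2, "trays": 30}}),
)
print(twin.product.name)
```

The point of the sketch is only that each partial model keeps its own sources and update cycles, while the twin object provides the single point of linkage.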
By combining the product and process model into a transformation model, requirements for process control can be derived, as can theoretical limits of individual transformation, separation, and transport steps and of sequences with economic and safety relevance, e.g., residence times, throughputs, or risk potentials. The setup of this transformation model must be done iteratively and will vary from case to case. This plant-independent description level is a prerequisite for the use of the digital twin for the efficient and effective conception of new production methods and processes. This level may include virtual as well as physical components [14]; it has to be adapted to the individual application, but with standardized interfaces.
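As a miniature example of such a plant-independent limit, a required residence time (from the product/process side) together with a vessel volume yields a theoretical throughput limit via tau = V/Q. The numbers and function names below are invented for illustration only.

```python
def residence_time_s(volume_m3: float, flow_m3_h: float) -> float:
    """Mean residence time tau = V / Q, converted to seconds."""
    return volume_m3 / flow_m3_h * 3600.0

def max_throughput_m3_h(volume_m3: float, required_tau_s: float) -> float:
    """Theoretical throughput limit such that tau >= required residence time."""
    return volume_m3 * 3600.0 / required_tau_s

# A 2 m3 vessel and a reaction that needs at least 600 s residence time:
tau = residence_time_s(2.0, 10.0)        # residence time at 10 m3/h
limit = max_throughput_m3_h(2.0, 600.0)  # throughput limit in m3/h
print(tau, limit)
```

Such relations are trivial individually; the value of the transformation model lies in collecting them in one consistent, plant-independent place.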
The combination of resource and process models to a capability model allows statements on the capability to perform process engineering operations with certain stationary and dynamic parameters and assured properties in existing or planned facilities. This ideally largely product-independent level is particularly relevant for the rapid market launch of product innovations. In the case of modular plant concepts according to VDI/VDE/NAMUR 2658 and VDI 2776, the capabilities of a modular process unit are derived from the boundary conditions of the plant model and the process model and made accessible as service interfaces. Thus, simulation data from the plant models are available during operation.
The combination of the transformation and capability models into the operation model finally makes it possible to derive optimized operating parameters and recipes for the product portfolio, to accept process orders, and to create optimized production scenarios and allocation schedules.
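The scheduling aspect of the operation model can be hinted at with a deliberately simplistic greedy heuristic: orders are accepted in the sequence of their contribution margin until the available plant hours are exhausted. All field names and numbers are hypothetical; a real operation model would of course use proper optimization, not this sketch.

```python
# Hypothetical process orders with margin and required plant time:
orders = [
    {"id": "A", "margin_eur_h": 120.0, "hours": 8.0},
    {"id": "B", "margin_eur_h": 200.0, "hours": 10.0},
    {"id": "C", "margin_eur_h": 80.0, "hours": 4.0},
]
available_h = 14.0  # capacity from the capability model

schedule, used = [], 0.0
# Greedy: take the most profitable orders first while capacity remains.
for order in sorted(orders, key=lambda o: o["margin_eur_h"], reverse=True):
    if used + order["hours"] <= available_h:
        schedule.append(order["id"])
        used += order["hours"]

print(schedule)
```

Here the greedy pass accepts orders B and C and fills the 14 available hours exactly.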
A key task of the digital twin is to organize and coordinate the networking of the partial models into an integrated information space [15] and to permanently store (raw) data and algorithms and make them easily available to the people and technical systems authorized to access them. Metadata, i.e., information that provides the necessary context to make it easier to find and evaluate relevant content and to act in open innovation processes [16], is of particular importance in this context.
All documentation and data generated during the life cycle of the production unit represented in the digital twin are recorded and made globally available, readable, searchable, and interpretable in the data memory. From the completely available data, all document templates are filled with the contents required for compliant and safe operation in a regulated environment, e.g., operating and maintenance instructions, test protocols, batch records, or official reports and approval documents.
In the manufacturing industry, the digital twin often refers to the discrete product or workpiece (e.g., in [17] and [18]). The digital twin of the workpiece knows at all times where its real counterpart is located in the factory and in the production chain, which production steps must follow next, and which certificates have already been obtained. The approach described above differs fundamentally from this. However, it can make sense to include the logistics and life cycle of mobile equipment in the digital image of a production unit in the process industry. It then becomes possible to link supply logistics with the production planning of a multiproduct operation in real time. Fatal confusions of raw and auxiliary materials, of flows of valuable products and waste, would be avoided, and staging areas and transport capacities would be optimally used.
The modular design is established beyond monoproduct and world-scale plants, and a skid design, which also gives modular units local flexibility, is likely to become more and more widespread in the future. At the latest then, aspects such as audit trails, spare parts management, protection goals, and modular models for plant safety, as well as the availability of plant components (modules, skids), will have to be elements of the resource model. The key point, the elemental strength of the digital twin, however, is that it is a representation not only of one or all plants and their shells, the buildings, of a production unit, but also of all processes, all procedures, and all products which are implemented in this unit.
This makes it clear that in designing the digital twin, we are not relying on a monolithic system, but on an expandable platform whose individual components complement each other and are interchangeable. The digital twin needs a backbone that allows the exchange of information across all components. This is probably the greatest challenge, but the first steps have been taken: engineering data from piping and instrumentation (P&I) diagrams are already proven to be exchangeable via the DEXPI format, and an extension to other applications of the plant life cycle is in preparation [19]; the data models of the building information and management systems are well defined [20]; the modular type package (MTP) with the process orchestration layer (POL) [21] has the potential to open up and innovate the architectures of process control systems. This integration is exemplarily indicated by the arrows in Fig. 2 and the number rows in the lower level.
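The backbone idea, a vendor-neutral representation that every tool can read and write, can be shown in miniature. The JSON fragment below is emphatically NOT the DEXPI schema or any real exchange format; it only illustrates that a shared, tool-independent representation survives a round trip between two hypothetical tools without loss.

```python
import json

# Invented, minimal P&I-like fragment; tag names and fields are assumptions.
pid_fragment = {
    "equipment": [
        {"tag": "P-101", "type": "CentrifugalPump"},
        {"tag": "T-201", "type": "StorageTank"},
    ],
    "piping": [{"from": "T-201", "to": "P-101", "dn": 50}],
}

serialized = json.dumps(pid_fragment)  # exported by tool A
roundtrip = json.loads(serialized)     # imported by tool B
print(roundtrip["equipment"][0]["tag"])
```

Real standards such as DEXPI define far richer semantics; the sketch only makes the platform argument concrete: as long as both tools agree on the representation, neither needs to know the other's internals.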
The minimum requirements for digital manufacturer data for the process industry are available [22]. Now it is up to the user, to the customer, to no longer allow proprietary solutions, because they are contrary to the idea of the digital twin as an ingenious companion of the engineers and operators in the process industry. The providers of software solutions should no longer be measured by the customer only by how well the offered solution solves the specified, individual application problem, but also by how skillfully this solution plays on open standards. The further development of these standards for data exchange must be driven much more strongly than in the past by the providers of computer-aided design/computer-aided engineering (CAD/CAE) and simulation tools. This can be done in bilateral pilot projects, but also in publicly funded consortia, in which large users already often meet and in which small and medium-sized suppliers of software components often participate, but to which the large suppliers of these markets contribute far too rarely.
A project that starts without a concrete use case and without a short- or medium-term exploitation plan for all contributors to the consortium cannot drive innovation at the pace required today. One may be tempted not to strive for compatible solutions as long as the open standards are not yet elaborated. This excuse should no longer be accepted. Rather, project partners must have the courage to create de facto standards by creating executable, immediately useful, prototypical or production-ready solutions. This business model works. It is particularly popular in exactly those countries that are investing so much more in AI technologies than anyone else. To the same extent that vendors should disclose or abandon their proprietary solutions, users must deal with their in-house standards. ''In the globalized, dynamic world, company-specific standards are no longer an option'' [7].
The ingenious companion does not replace the ingenuity of engineers or natural scientists, but is constantly fed with knowledge by them, supports them, and will certainly relieve them of all repetitive tasks and thus everything an algorithm can do. It is a cognitive amplifier.
A digital twin of this type does not need to be fed with recipes or procedural instructions; it derives them from the stored knowledge, just as the process engineer or operator does today from the information available to them. The real value-adding, cognitive achievement is not to write these dossiers, but to generate the information summarized there. If documents need to be released by humans, they go through corresponding procedures (workflows). The necessary models are known and exist [7].

The Digital Twin in the Plant Life Cycle
The process industry is characterized by two value chains (Fig. 2). These are the horizontal supply chain and the vertical asset life cycle, both of which come together in production. The efficiency of the overall system is largely determined by the fast and comprehensive availability of reliable information from all parts of the two value chains. For example, a change in raw material quality may require changes in production, which may have already been investigated and are available in process development. Digitalization including machine learning and AI will bring significant progress here and lead to an increase in process reliability, efficiency, and finally profitability.
The digital twin consolidates and continuously updates data from the asset life cycle and the supply chain. It also includes models that simulate processes and conditions in the real world [23, 24]. The models can access all data from the supply chain and the asset life cycle and reveal correlations or patterns that are not otherwise visible. Physically based models, modern data-driven models, and hybrids thereof are used. However, statistical learning provides correlations, not causations; causation must be inferred by the expert. For example, a learning method can point to a certain pattern in the data that always occurs before a pump breaks down; judging to what extent this pattern can be used to predict pump failure is the task of the engineer.
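The pump example can be sketched in a few lines: a simple statistical method flags the first significant deviation of a monitored signal from its baseline. Signal, thresholds, and the drift shape are all invented; the sketch deliberately stops where the essay says the engineer's judgment begins, namely at deciding whether the flagged correlation is predictive.

```python
import numpy as np

# Synthetic vibration signal: small periodic variation in normal operation,
# followed by a drift shortly before a (hypothetical) pump breakdown.
n = np.arange(200)
vibration = 1.0 + 0.01 * np.cos(n / 3.0)      # normal operation
vibration[180:] += np.linspace(0.0, 0.6, 20)  # drift before breakdown

window = 50                                   # baseline from early operation
baseline_mean = vibration[:window].mean()
baseline_std = vibration[:window].std()
z = (vibration - baseline_mean) / baseline_std
alarm_at = int(np.argmax(z > 4.0))            # first 4-sigma exceedance

print(alarm_at)
```

The method only reports *when* the pattern appears; it cannot say *why*, which is exactly the correlation-versus-causation caveat made above.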
Of course, a digital twin does not have to map the supply chain and asset life cycle entirely (cf. Fig. 2) but can be limited to sub-areas. The decisive factor, however, is automated, continuous data alignment with the real world, which will be all the more complex to implement the less standardized the data models of the system landscape under consideration are. An example of the efficient use of aligned data is the exchange of a piece of equipment in the real plant. The exchange is usually documented with all data in the enterprise SAP system. Linking the process simulations to these systems makes it possible to keep the simulations up to date, which ensures that a model of the current process behavior is always available. This leads to various model and documentation elements of static and dynamic nature, to documentation of development history and construction, but also to simulation models for the current operation and much more.

Particular Aspects of the Process Model
Models are indispensable for the development of an optimal design and for the efficient operation of processes. They are not free of simplifications and uncertainties in their approach, their structure, or their parameters, and the combination of models increases this complexity further. Models must be both valid and dependable (validity and reliability). They become part of the digital twin as soon as they are implemented in algorithms and software. The interfaces are important to ensure consistent input and output. The current status of modeling in process engineering, including aspects of future development, is described in a recent white paper of the working group Process Simulation, Process Synthesis and Knowledge Processing (MPO) in ''Process, Apparatus and Plant Engineering'' (PAAT) of ProcessNet [25].
Conventional mechanistic process models are based on known physical-chemical relationships as well as empirical relationships for the individual process steps. Together with the material data from the product model, these models enable the calculation of process behavior. Since the physical process knowledge is incomplete in practice, such calculations can only be approximate. Superordinate goals such as resource or energy efficiency can therefore also only be calculated accurately to a limited extent.
For this reason, methods must be developed which, by means of machine learning from process data, complement existing process models (an example can be found in [26]) and provide information on the necessity of additional sensors. The combination of the mechanistic process models with the newly developed methods for incorporating process data then enables an accurate description of the overall process behavior and, thus, a comprehensive, process-technological optimization.
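The combination of a mechanistic model with a data-driven complement can be illustrated by a hybrid "residual" approach: an imperfect physical model is kept, and only its mismatch to plant data is learned. The "true" process, the model forms, and all numbers below are invented for illustration; this is one common hybrid pattern, not the specific method of [26].

```python
import numpy as np

def mechanistic_model(T):
    """Simplified (and deliberately incomplete) physical model."""
    return 0.002 * T

# Synthetic plant data: the real behavior has an extra quadratic effect
# the mechanistic model does not capture, plus measurement noise.
rng = np.random.default_rng(1)
T = np.linspace(300.0, 400.0, 50)
measured = 0.002 * T + 1e-5 * (T - 300.0) ** 2 + rng.normal(0, 0.002, T.size)

# Learn the plant/model mismatch from process data (quadratic fit):
residual_coeffs = np.polyfit(T, measured - mechanistic_model(T), 2)

def hybrid_model(T):
    """Mechanistic backbone plus data-driven correction."""
    return mechanistic_model(T) + np.polyval(residual_coeffs, T)

rmse_mech = np.sqrt(np.mean((measured - mechanistic_model(T)) ** 2))
rmse_hyb = np.sqrt(np.mean((measured - hybrid_model(T)) ** 2))
print(rmse_hyb < rmse_mech)
```

Keeping the mechanistic backbone preserves extrapolation behavior and interpretability, while the learned residual absorbs what the physical knowledge misses, which is precisely the division of labor argued for above.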
To achieve optimized process control with the aid of the elements of the digital twin, the mechanistic process models must also describe dynamic behavior and calculate the process in real time, i.e., answers must be available within an assured time limit. This allows a foresighted simulation of the operation steps with the aim of developing new control concepts even for complex processes where the classical approaches of control engineering fail. One limitation of mechanistic process models, however, is that often not all material and system properties are reproduced by the developed approaches, or the analysis of material functions requires too much effort. Combined with a lateral, discipline-overarching architecture (including chemists, chemical and process engineers, electrical and mechanical engineers, as well as automation and data engineers) and the signal pathway from sensor to process control system and beyond, a complete companion becomes available that assists engineers and operators.
Here, the equipment of chemical engineering processes with online or in-situ process analytics has the advantage that a large amount of process data is available over a long period of time. The use of machine learning methods makes it possible to evaluate these usually very large quantities of data in a targeted manner and to put them in relation to one another. Coupling machine learning with the physically based process models allows a calibration of those models to new states [26]. Another example is the tuning of loss functions to incorporate expert knowledge of the process [27], thus developing a profound understanding of the process.

Realization of a Digital Twin for a Brownfield Plant
Following the idea of scaling projects in such a way that a benefit is quickly created for all parties involved, the project plan for DT2030, the digital twin in its final configuration stage, cannot be derived here. Rather, it is intended to outline which contents should be reflected in an initial project for the realization of a digital twin of a brownfield plant [28]. One lesson from the Tutzing Symposion 2018 is that inefficient and overloaded processes or workflows should not be digitalized. Optimized lean processes, e.g., according to the Lean & Six Sigma approach [29], are therefore a basic prerequisite and starting point for digitalization. Right from the start, the design of the digital twin must follow a multicriteria objective, with three global goals at the forefront. The first is to design ongoing processes and new product launches with a view to reducing the use of resources; here, modular equipment setup [30, 31] and AI-assisted experimentation will drastically reduce time and laboratory effort [32, 33]. Secondly, reaction times for new product launches (time-to-market) and product transfers, but also during ongoing operations, can be reduced [34]; here too, the economic benefits will be evident, especially in regulated markets and in processes with high working capital. Thirdly, the digital twin must help to control increasingly complex production processes while ensuring safety for people and the environment as well as product quality and safety [35, 36]. Within the framework of the multicriteria approach, decision-makers should be enabled to find the best solutions in the respective context in a transparent and flexible manner.
As illustrated in Fig. 2, the digital twin connects systems with their models and data stores in the various value chains of the process world. To implement this networking efficiently and comprehensively and to generate benefits from the availability of data, the questions in Tab. 1 must be addressed. The solutions found can differ considerably depending on the equipment and scope of the process units under consideration. Hence, it is not claimed that the digital twin offers a perfect solution for all these issues; rather, it must be developed in an iterative workflow along the process and plant life cycle.
In a typical project, after reflecting on the above-mentioned questions, a picture of the digital twin is obtained in which the models are created at different points in time of the vertical asset life cycle and in different systems and are further developed from then on. Fig. 3 shows a representative list of some systems and the data managed or generated by them along the timeline. Here, it soon becomes clear that an integrated and networked system landscape is required to process all ''assets'' and data. The integration of the laboratory information and management systems (LIMS) will be obligatory, as well as that of the enterprise resource planning (ERP) and manufacturing execution systems (MES), the engineering systems (CAD/CAE), and the simulation tools for the process models. The operator training systems (OTS) can usually be fed with the process models. The entire digital twin framework can be used as a basis for training simulations and virtual reality tools. Even the dismantling of production plants must be documented for pharmaceutical processes according to good manufacturing practice (GMP).

Table 1. Questions and issues for the implementation of a process industry data network.

Products, processes, and product/process models
- What types of process models exist?
- Which methodological discontinuities exist between process models, media discontinuities between IT tools, ''consistency breaks'' in substance data?
- Which processes are suitable for process simulation and process optimization, which are not, and why not?
- Are further processes to be modeled, or can existing process models be usefully supplemented?
- How can models also be supplemented or enriched by methods of artificial intelligence (machine learning methods)?
- How are scalable models created during process development, and how do they grow with process planning up to product launch?

Resources and resource models
- Which methodological gaps exist between plant models, media discontinuities between IT and planning tools, ''consistency breaks'' in plant data?
- How can existing analog plant documentation of existing plants be efficiently transformed into suitable, intelligent asset management, CAE, and documentation management systems?
- How can an automatic, contextual linkage of plant data and documentation be achieved across the different systems used?
- Should production logistics and planning also be supported by the digital twin?
- Can existing plant sensors be interconnected to form a higher-level unit (soft sensor) and thus, together with modeling, provide detailed information on the process and product?
- How can the plants of the production unit under consideration be usefully supplemented with process analysis technology or cognitive sensors, such as microphones or cameras, in order to observe, using methods of data analysis, process variables that cannot be measured directly? The additional information can be used to diagnose the plant status but also for optimized process control.

Interfaces and procedures
- What do human-machine interfaces look like which, on the one hand, reliably support humans in their decisions and, on the other hand, make this support so transparent that trust in it is strengthened and thus a real benefit is created?
- Which documents should be created or changed by procedures/workflows (change management), which not, and why not?
- How should the work processes of software engineering and process engineering be changed to derive maximum benefit from the use of the digital twin, in particular from the use and maintenance of process models?
- How can system and data maintenance be made easier, and how can operational activities and decision-making processes be simplified, supported, and accelerated by the digital twin?
The plant models as part of the resource models are maintained in the CAE and building information and management systems (BIM) and represent all physical elements of the instrumentation and control technology as well as of the equipment and piping across all plant sections. Here, a distinction is typically made between plant structure and equipment element PEA (process equipment assembly). The sensor data from the running operation are processed in the programmable logic controller (PLC) and distributed control system (DCS) and can be used to compare the existing process models with reality and also to enrich them with data-driven (partial) models.
In this form, the digital twin enables fast and comprehensive access to data from all areas. In addition, a wide range of possibilities arise from the model-based analysis of the data, the results of which then flow back into the real world and influence it. For example, the prediction of the wear condition of a component can be made on the basis of design and operating data. In addition to the aspect of need-based, intelligently supported maintenance (''predictive/prescriptive maintenance''), resource-saving process management is achieved in particular by using the models in algorithms for (dynamic) online optimization.
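A minimal version of such a wear prediction is the extrapolation of a monitored indicator toward a design limit taken from the engineering data. The linear trend assumption, all numbers, and all variable names below are illustrative only; real predictive maintenance uses far richer models.

```python
import numpy as np

# Operating data: wear indicator measured at intervals (hypothetical values)
hours = np.array([0.0, 500.0, 1000.0, 1500.0, 2000.0])
wear_mm = np.array([0.00, 0.11, 0.21, 0.30, 0.41])
design_limit_mm = 1.0  # design limit from the resource model

# Fit a linear wear trend and extrapolate to the design limit:
slope, intercept = np.polyfit(hours, wear_mm, 1)
predicted_failure_h = (design_limit_mm - intercept) / slope
remaining_h = predicted_failure_h - hours[-1]

print(round(remaining_h))
```

The combination is the point: the operating data alone give the trend, the design data alone give the limit, and only the linked digital twin yields the remaining useful life.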
The suitable mixture of fully automated, recipe-controlled operation and support of the personnel by decision-supporting, interpretable hints, messages, or remarks depends on the complexity and maturity of the process and the plant structure. The level of autonomy is still under discussion [37]. The functionalities of the digital twin support a flexible selection, interconnection, and orchestration of modular production units to an ensemble on which the optimized recipe can be executed.
In addition to the technical characteristics, the digital twin also gives rise to requirements for training and further education [5]. Working with the digital twin requires interdisciplinary and holistic thinking. This must take place in Bachelor's and Master's degree programs as well as in the first advanced training units after new engineers join the respective companies and enterprises [38]. For the existing workforce, appropriate continuing training courses must be established in the company organization. Furthermore, the concepts of other related activities, e.g., from biotechnology [39], may provide new aspects for one's own activities. Only in this way can the new approaches with the digital twin be successfully implemented in diverse practical applications.

Conclusions and Outlook
The Temporary Working Group ''100% Digital'' of the professional association ''Process, Apparatus and Plant Engineering'' (PAAT) in ProcessNet was founded as a result of the Tutzing Symposion 2018 to record digitalization activities in the field of ProcessNet and biotechnology. The description of the digital twin was soon identified as an important result. This contribution addresses characteristics of a digital twin from the user's point of view and is intended to help solution providers give their product development a customer-oriented direction. For this purpose, the different data models that play a role in the life cycles of chemical processes, plants, and products are described. In particular, for existing (brownfield) plants, the essential aspects of the digital twin are formulated. Further characteristics and effects, e.g., on education and training, are derived from this. Fostering the beneficial use of AI methods in process engineering is the purpose of KEEN, a cooperation project funded by the German Federal Ministry for Economic Affairs involving more than twenty partners from industry and academia. The partners in KEEN, which is short for ''AI incubator labs in the process industry'', have identified representative use cases where the combination of model-based expert knowledge with data-driven machine learning methods is expected to show substantial progress.
These use cases range from predicting material properties, over surrogate models for virtual what-if scenarios, up to an autonomous, self-optimizing chemical production plant. Preliminary work and first results are most promising, especially when it comes to dealing with sparse data, which in process engineering is much more frequent than big-data issues: a matrix-completion method has been applied to predict mixture properties for vapor-liquid equilibria with little data [40]. One goal of the KEEN partners is to demonstrate the benefit of this method for estimating properties of industrially relevant mixtures. Another use case consists in the application of surrogates to closely connect the different model hierarchies of chemical production processes, ranging from substance properties over unit operations up to entire flowsheets. Here, preliminary work indicates that machine learning methods are able to learn valuable model properties such as the feasible range and Pareto boundaries [41].

Acknowledgment
The authors thank all members of the Temporary Working Group 100% Digital (TAK DIG), especially Marco Oldiges and Manfred Dammann, for their valuable contributions and fruitful discussions. This work was partly funded by the BMWi in the ENPRO ORCA project (FKZ 03ET1517). Open access funding enabled and organized by Projekt DEAL.
The authors have declared no conflict of interest.