Integrating FMI and ML/AI models on the open-source digital twin framework OpenTwins

The realm of digital twins is experiencing rapid growth and presents a wealth of opportunities for Industry 4.0. In conjunction with traditional simulation methods, digital twins offer a diverse range of possibilities. However, many existing open-source digital twin tools concentrate on specific use cases and do not provide a versatile framework. In contrast, the open-source digital twin framework OpenTwins aims to provide a versatile framework that can be applied to a wide range of digital twin applications. In this article, we introduce a redefinition of the original OpenTwins platform that enables the management of custom simulation services as well as of services based on FMI, one of the most widely used simulation standards in industry, and their coexistence with machine learning models, enabling the definition of next-generation digital twins. Thanks to this integration, digital twins that better reflect reality can be developed through hybrid models, where, for example, simulation data can compensate for the scarcity of machine learning data. As part of this project, a simulation model developed with the hydraulic software Epanet was validated in OpenTwins, in addition to an FMI simulation service. The hydraulic model was implemented and tested in an agricultural use case in collaboration with the University of Córdoba, Spain, and a machine learning model was developed to assess the behavior of an FMI simulation through machine learning.

technologies such as the Internet of Things (IoT), artificial intelligence (AI), or the analysis of large amounts of data. The use of those technologies, combined with the need to analyze the collected data to improve production processes, has led to the birth of the digital twin concept. 1,2 Although digital twins are still an emerging paradigm, there is no unanimously accepted definition. A digital twin is a digital and dynamic representation of a real object or system that may or may not be tangible. 3 Digital twins are usually created using a set of data obtained from different IoT devices and different models based on AI or simulation. These are capable of simulating and analyzing the behavior and performance of the objects they represent.
Simulation is a very important component of digital twins, since it allows testing and analyzing different behaviors of the physical system in a simulated environment. This not only provides benefits such as cost and efficiency savings, but also allows testing different scenarios virtually, saving costs and time and enabling improvements before testing on the physical system. 5,6 For those reasons, simulation is important in the world of digital twins, as it allows many aspects of digital twins to be improved. Currently, different software is used by the industry to simulate products and behaviors, as is the case of the FMI (Functional Mock-up Interface) 7,8 standard, * which allows incorporating different simulation environments into many applications. Other examples include Simulink, which is used in industries such as automotive, aerospace, or telecommunications, and Epanet, a simulation library to study the behavior of water lines and other kinds of fluid systems.
A problem related to simulation and digital twins revolves around the prevalent use of proprietary frameworks, such as Microsoft Azure Digital Twins, † AWS Digital Twins, ‡ and the solutions offered by Siemens § and Bosch, ¶ within the industry. In contrast, open-source initiatives like the Eclipse Ditto project # or iTwin.js || focus primarily on facilitating the creation and management of twins and their associated data, but do not provide adequate support for simulation.
Similarly, other open-source projects concentrate solely on simulation aspects. 9,10 Consequently, with the exception of scenario-specific or proprietary solutions, there is a notable absence of versatile frameworks that provide digital twin, simulation, and machine learning (ML) capabilities. Considering that various industries have effectively demonstrated the value of simulation as a practical and regularly employed resource, it becomes crucial to endow an open-source framework with standardized, portable, and easily implementable simulation capabilities.
In this work, we introduce a redefinition of the OpenTwins framework that merges the fields of digital twins and simulation. By combining digital twins with robust simulation capabilities, the framework OpenTwins, presented in Reference 11, opens up new possibilities and significantly expands the open-source community, adding support for custom simulation services and FMI-standard simulations that are fully compatible with the ML functionalities presented in the original platform. This tight integration opens up a new range of opportunities, as discussed in Section 3.3. OpenTwins also has applications in other domains, such as the GEDERA ** project, where simulations are used in the context of digital twins for energy management, or the 5G+Tactile †† project, in the context of autonomous driving.
To validate the performance of the simulation service, a digital twin of an irrigation pivot situated in the Rabanales field in Córdoba has been developed. This digital twin incorporates a simulation model that leverages the Epanet 2 framework to study the behavior of this machine and support its use, in collaboration with the University of Córdoba. To show the viability of this platform beyond the integration with custom simulation models such as Epanet, an FMI-standard simulation service has also been developed and validated.
Therefore, the main contributions of this article are:
1. A redefinition of the OpenTwins 11 architecture to support FMI and custom simulations.
2. In addition to FMI and custom simulations, different types of simulation behaviors are proposed to adapt to different environments.
3. Validation of the simulation digital twin platform in an agriculture use case and through an open FMI-based service.
4. The possibility to use the simulation service alongside ML models through compatibility with Kafka-ML. 12

The rest of the article is structured as follows: Section 2 presents related work and the differences with this framework. Section 3 presents the implementation details and the integration of the simulation service and machine learning. Descriptions of the use cases studied are presented in Sections 4 and 5. Section 6 provides an evaluation of the framework. Lastly, in Section 7, the conclusions of the project and future work are presented.

RELATED WORK
The development of digital twins has had a major impact in the last few years. Although there are several applications that provide a wide variety of functions to develop digital twins, most of them focus on a specific task, combining digital twins with simulation, ML, or just data analysis. Hürkamp 13 addresses the challenge of combining digital twins, simulation, and ML through a framework adapted to the manufacture of plastic molds; in contrast, OpenTwins is a general-purpose framework whose ML compatibility can be used for several objectives. Another example is Reference 4, where M. Dietz presents a framework that combines digital twins with simulation for the assurance improvement of a SOC. Therefore, developers may have difficulty finding general-purpose open-source frameworks for developing digital twins that combine visual representation, IoT, artificial intelligence, and simulation. Moreover, simulation stands as a pivotal technology within the realm of Industry 4.0, facilitating the creation of planning and exploratory models aimed at optimizing decision-making processes as well as streamlining the design and operation of intricate systems. The objective of Kamath et al. 14 is to bridge the gap between academia and industry by providing an architecture based on some of the open-source components used by OpenTwins. However, OpenTwins allows the integration not only of digital twins with real-time data monitoring but also of ML techniques (Kafka-ML 12), 3D rendering and, most recently, simulation. Finally, the whole OpenTwins platform, including the simulation components, is open source and available on GitHub. ‡‡
Karan et al. 15 also consider the inclusion of open tools such as Eclipse Ditto and OpenPLC to develop a digital twin framework. They orchestrate a simulation model for computational fluid dynamics (CFD). This framework lacks flexibility and is only adapted to CFD systems. In contrast, OpenTwins is extremely flexible, as it can be adapted to different environments and systems.
One of the open platforms under development is WLDT, 16 a general-purpose library for digital twin development. In the absence of general-purpose frameworks, it proposes an architecture designed to maximize modularity, reusability, and flexibility. Although it is a very powerful tool, its functionalities are closer to the developer's scope. On the other hand, OpenTwins offers tools that allow anyone to develop digital twins in a simple and intuitive way, offering more simplicity while providing as much flexibility and modularity as the one proposed.
Vats 17 presents a framework for developing human digital twins (HDTs). This framework offers ML functionalities with the objective of detecting cardiovascular diseases. This type of digital twin would benefit greatly from the simulation functionalities incorporated in OpenTwins: in conjunction with the ML models, it would be able not only to detect diseases, but also to experiment with simulating possible reactions to certain types of treatment, making the digital twin much more effective.
Furthermore, simulation holds the potential to contribute significantly to the assessment and implementation of Industry 4.0 within organizations by enabling the evaluation of diverse scenarios. 18 However, Industry 4.0 is moving towards different ways of using simulation services combined with artificial intelligence methods, as in the case of Reference 19, and this is the direction in which the OpenTwins platform is moving.
Other works, such as Reference 20, emphasize the importance of simulation as a tool for understanding and optimizing the performance of complex production systems in the era of Industry 4.0. It highlights the potential of simulation-based analysis to support decision-making processes and drive improvements in production efficiency, flexibility, and quality. OpenTwins has the same vision but goes further. Thanks to the implementation of simulation support, OpenTwins is not only a platform that allows the development of digital twins of different natures, but also allows the incorporation of different simulation services to improve the performance of digital twins.
‡‡ https://github.com/ertis-research/OpenTwins.
In the evolving field of simulation frameworks, it has become apparent that many of them are bound to specific sectors and lack the functionality to develop digital twins, as is the case of Logisim, §§ a simulation framework designed to create logical circuits, or OpenSim, ¶¶ a framework for biomechanical modeling.
Indeed, when it comes to global simulation frameworks like OpenModelica, ## which primarily focus on the creation of multiple models using linear equations, there may be limitations in terms of functionalities specifically tailored to support digital twins. Consequently, the task of developing versatile models for digital twins becomes more challenging within such frameworks, a task that the platform featured in this work aims to facilitate.
Merging these two fields, there are several frameworks, such as the one presented by Kombaya, 21 where a framework for reconfigurable manufacturing systems is presented. This framework allows making digital twins of manufacturing systems, allowing the incorporation of simulation elements. Similarly, Singh 22 proposes a toolbox for the development of digital twins focused on processes, manufacturing systems, and so forth. Angjeliu et al. 23 bridge the gap between construction and digital twins through simulation, achieving high precision in assessing structural conditions.
On the one hand, these frameworks offer more specific functionalities for this type of twin than OpenTwins, being able to achieve a more faithful twin in those fields. On the other hand, the functionalities of OpenTwins, such as multipurpose simulation and its compatibility with ML, offer a much wider range of possibilities, making it possible to achieve a much more versatile digital twin.
Bogomolov et al. 24 also consider a simulation service to analyze the behavior of the digital twin through the FMI standard. By using a standard, a much wider range of possibilities can be achieved than with previous proposals. However, similar to the work discussed above, it does not have ML functionalities. The lack of these functionalities, and of the possibility of running different simulation services, makes OpenTwins a more attractive proposition for the development of different digital twins.
OpenTwins does not focus on a single simulation aspect of digital twins, but rather gives the opportunity to use different kinds of simulation services, such as custom simulations or the FMI standard, through an API that can fit several twins. In addition, OpenTwins provides ML techniques by connecting with Kafka-ML. This feature, combined with simulation, can bring much better results to the performance of the digital twins.

OPENTWINS ARCHITECTURE
The original OpenTwins framework 11 has Eclipse Ditto as its fundamental pillar. This open-source framework offers a high variety of functionalities to develop and integrate digital twins, such as persistent storage for historical data and integration with ML and data streams through Kafka-ML. 12 In addition, it uses Grafana as front-end and supports 3D visualization of twins using the Unity engine. |||| In this project, the OpenTwins architecture has been redefined to allow integration with different types of simulations. Figure 1 shows the components that currently comprise the platform:
• Developed in this project: colored in gray, the support for the different simulation services.
• Original platform: in yellow, the components that support data prediction and ML; in red, the virtual representation; in blue, the essential functionality; and in green, the intermediate functionality that communicates the different components.

Simulation service
Simulation is a powerful tool that not only allows the platform to compare the behavior of the physical model with the expected behavior, but also allows trying new configurations on the model before applying them to the real asset, such as in the case of an assembly-line digital twin, where simulating a new part or configuration instead of testing it in the real world saves a lot of costs in the process. Supporting the FMI standard is a practical choice, as it is indeed one of the most widely used simulation standards in various industries. By incorporating FMI support, OpenTwins enables compatibility with a broad range of simulation tools and models that adhere to this standard, as is the case of Modelica *** or Matlab Simulink. ††† This flexibility can be beneficial for users who work with FMI-compliant simulations.
Additionally, the decision to support simulations created in any programming language further enhances the versatility of the platform. This approach allows developers to leverage their existing simulation models, even if they are not initially based on the FMI standard.
By incorporating both FMI-based simulations and customized model simulations, the digital twin platform caters to diverse requirements and facilitates the integration of a wide array of simulation scenarios, enhancing the platform's versatility and applicability.
*** https://modelica.org/.
††† https://es.mathworks.com/solutions/system-design-simulation.html.

To integrate a simulation service into the platform, the platform is required to send the simulated information to the message broker, where different components of the platform can read the data. Furthermore, in order to be able to run it from the Grafana interface itself, it is necessary to allow its activation through calls to a REST API. The connection between Eclipse Ditto and the messaging broker where the simulated information is written is what actually allows the digital twin to be updated. The development of simulation compatibility has been designed to adhere to the schema below, using containers to align with the core architecture of the platform. The simulation service consists of two distinct components:
• REST API. It serves as an interface to facilitate the exchange of information between OpenTwins components through HTTP requests.
• Simulation service. It executes the simulation itself and sends the data to the designated message broker.
These separate components were packed into two Docker images and use Python. This provides great portability, because the services are ready to work without installing any additional software beyond Docker. In our case, these images were deployed with Kubernetes, which enables the management of containers in a multi-node deployment, along with other features such as high availability and fault tolerance. Packing the simulations into two different images and using Kubernetes allows the system to manage several instances of a simulation. Therefore, in the case of a Kubernetes cluster made of different devices, the execution of the simulation will run on different devices depending on the load of each node at the moment.
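As an illustration of this deployment scheme, a minimal Kubernetes manifest for the REST API container might look like the following. This is a hedged sketch: the image name, labels, port, and replica count are hypothetical, not the actual OpenTwins manifests.

```yaml
# Hypothetical sketch: deploying the REST API image on Kubernetes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simulation-api
spec:
  replicas: 2                      # several instances for availability
  selector:
    matchLabels:
      app: simulation-api
  template:
    metadata:
      labels:
        app: simulation-api
    spec:
      containers:
        - name: simulation-api
          image: example/simulation-api:latest   # Flask REST API image
          ports:
            - containerPort: 5000
```

With such a manifest, the Kubernetes scheduler decides on which node each replica runs, which is what enables the load-dependent placement described above.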
The REST API, developed with the Flask Python package, is always running, waiting, and listening for a simulation request. When the main platform makes a request to run a simulation, the Flask API receives that request and launches the simulation in the corresponding container, allowing the system to run several simulations at a time.
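The dispatch step just described can be sketched schematically as follows. This is an illustrative toy, not the actual OpenTwins API: the function name, request fields, and image names are all hypothetical.

```python
# Schematic sketch of the request-dispatch logic: validate an incoming
# request and build a container launch specification from it.
# All names here are hypothetical placeholders.

def handle_simulation_request(payload: dict) -> dict:
    """Validate a launch request and select the simulation container."""
    required = {"simulation_id", "type"}
    missing = required - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # choose the image depending on the kind of simulation requested
    image = ("fmi-simulation" if payload["type"] == "fmi"
             else "custom-simulation")
    return {
        "container_image": image,
        "simulation_id": payload["simulation_id"],
        "parameters": payload.get("parameters", {}),
    }

spec = handle_simulation_request(
    {"simulation_id": "pivot-01", "type": "fmi",
     "parameters": {"stop_time": 10.0}})
print(spec["container_image"])  # → fmi-simulation
```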
In this article, two categories of simulations are introduced:
• Types of simulations by implementation. As discussed, the digital twin platform encompasses two distinct types of simulations that have been implemented: FMI-based simulations and custom simulations.
• Types of simulations by behavior. This second category is characterized by the behavior and characteristics of the results generated by these simulations.

Types of simulations by implementation
Since simulations are supported by deploying the corresponding containers (a REST API plus a simulation container), two different types of simulations have been implemented:
• FMI simulation: This simulation service adheres to the FMI standard, used in many different software packages and frameworks. FMI provides a standardized interface for exchanging simulation models and data between different simulation tools and environments. The FMI standard runs FMUs, which are the packaged simulation models previously developed, and allows running different simulation models that were designed to be used in multiple frameworks. By supporting FMI-based simulations, the digital twin platform ensures compatibility with a wide range of simulation models developed using FMI-compliant tools. In order to accommodate the varying configurations and variables present in each FMU, the service has been designed to offer flexibility in configuring multiple parameters. By allowing the configuration of these parameters, the service offers adaptability and customization to accommodate different FMUs with their respective configurations and variable sets. This flexibility ensures that the simulation service can handle a wide range of scenarios and adapt to the specific requirements of each FMU being executed.
• Custom simulation: This gives the chance to implement specific simulations, as in the case of the irrigation machine. It has been designed to allow multiple kinds of algorithms, models, and so forth, for instance the Epanet model presented in the use case or a custom car simulation model. This functionality is extremely useful in cases where simulation services have not been developed under any standard. On many occasions, developers or companies have old services, or services developed in some specific language, that work correctly and that they want to reuse. The operation of this type of simulation is based, as explained above, on a REST API to which requests are made and a service that is executed after the call to the REST API. This API is responsible for sending the necessary information to the custom simulation service. One difference from the FMI simulation service is that it does not require specific configuration parameters, as these will depend on the simulation to be run.
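To make the FMI case concrete, a launch request to such a service would typically carry the FMU location and its configurable parameters. The following payload is a hypothetical illustration; all field names and the URL are assumptions, not the actual OpenTwins schema.

```json
{
  "simulation_id": "bouncing-ball-01",
  "fmu_url": "http://models.example.com/BouncingBall.fmu",
  "start_time": 0.0,
  "stop_time": 10.0,
  "step_size": 0.01,
  "output_variables": ["h", "v"],
  "start_values": { "h": 1.0, "e": 0.7 }
}
```

A custom simulation, by contrast, would accept whatever body its own service defines, which is why no fixed parameter set is imposed.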
This service, although completely customized, needs to include some elements that allow the information to reach the data flow of the platform. These elements are:
1. Sending data to the messaging broker.
2. Formatting the data in a specific format called the Ditto Protocol.
In cases where the simulation data does not conform directly to the Ditto Protocol, a JavaScript mapping can be employed. This mapping involves utilizing JavaScript code within Eclipse Ditto to transform the format or structure of the data to match the expected Ditto Protocol. By applying this JavaScript mapping, the simulation data can be effectively converted and integrated into Eclipse Ditto, ensuring compatibility and seamless data processing within the platform.
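For illustration, a Ditto Protocol envelope consists of a topic, headers, a path, and a value. A simulation service could assemble such a message before publishing it to the broker, as sketched below; the namespace, thing name, and feature names are hypothetical placeholders.

```python
import json

# Sketch of a Ditto Protocol "modify" message that a simulation service
# could publish. Namespace, thing name, and feature are illustrative.
message = {
    # topic format: <namespace>/<thing-name>/things/twin/commands/modify
    "topic": "opentwins/irrigation-pivot/things/twin/commands/modify",
    "headers": {"content-type": "application/json"},
    # path selects which part of the twin the value updates
    "path": "/features/pressure/properties/value",
    # the simulated value written into the twin
    "value": 3.2,
}
payload = json.dumps(message)
print(payload)
```

Data already shaped like this passes straight through Eclipse Ditto; anything else is rewritten into this envelope by the JavaScript mapping mentioned above.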
As mentioned above, the execution of the different types of simulations works through a REST API, that is, through HTTP requests. To facilitate this process, a feature that enables the creation of the necessary forms for running simulations has been introduced. When dealing with FMI, uniformity is maintained across all FMI forms due to its standardized nature, facilitating the configuration of the aforementioned variables; this form can be observed in Figure 2A. However, for customized simulations, our tool provides the flexibility to generate entirely tailored forms. This feature empowers users to incorporate diverse variables along with their respective types, ensuring the proper functionality of the simulations. Figure 2B shows this form, comprising a series of fields designed to capture the following crucial parameters:
1. Simulation ID: a unique identifier for the simulation, enabling easy tracking and management of individual simulations.
2. Service URL: the URL of the REST service, defining the endpoint where the simulation service can be accessed.
3. HTTP request type: the specific HTTP request type that Grafana utilizes to interact with the simulation service (POST, GET, etc.).
Additionally, a range of versatile functionalities have been incorporated to facilitate the provision of any required information to the simulation service.These functionalities empower users to customize and tailor the simulation according to their specific needs.On the right side of the form, a JSON document is constructed, which encapsulates all the relevant information necessary for creating the simulation.
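As an illustration, the JSON document assembled by such a custom-simulation form might look like the following; field names, the URL, and the body contents are hypothetical, not the actual format produced by the plugin.

```json
{
  "simulationId": "epanet-pivot-01",
  "url": "http://simulation-api.example.com/simulate",
  "method": "POST",
  "body": {
    "duration_hours": 24,
    "inlet_pressure": 3.5
  }
}
```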

Types of simulations by behavior
When creating digital twin simulations, the platform allows the creation of two types of simulation, regardless of whether it is a custom simulation or an FMI simulation:
• As simulation instances that serve as snapshots or representations of the twin's state and behavior at particular time intervals.

F I G U R E 2 Types of simulations by behavior.
• As new twins, allowing the creation of "what-if" twins of the real device/process. As they are new twins, they have data independence.
As mentioned, the simulated data will affect the system in one way or another depending on the specific case. This will be reflected in how the Eclipse Ditto connection is established. The two proposals are shown in Figure 3. A primary twin is defined as one that has a direct relationship with the real object, receiving its data in real time and being able to interact with it.
The first option, shown in Figure 3A, is to include the simulated information as features or attributes of one or more primary twins. With this option, a single copy of the main twin is created in the first run, and each of the following runs replaces the data of these twins with the simulated data. In this case, a run implies the update of at least one twin. Whether a simulated datum corresponds to an attribute or a feature depends on whether it is constant or variable, respectively, in relation to the digital twin data. This option can be useful when the simulation corresponds to only a small part of the twin, not even one of the twins of which it is composed, and there is no need to treat the data as a specific twin. This option is fast, as it involves only one or several updates of parts of twins. The flow of data from Grafana to the database can be observed in Figure 4. The figure depicts the following sequence of steps:
1. HTTP request from Grafana: Grafana initiates an HTTP request to communicate with the REST API. This request serves as the trigger for deploying the necessary simulations.

2. REST API: The REST API receives the HTTP request from Grafana and handles it accordingly. It orchestrates the execution of the required simulations based on the provided parameters and configurations.

F I G U R E 3 (a) Simulation as new twin data; executions over the original twin and copies of the original twin.
F I G U R E 4 Data flow of the data using simple simulation instances.
3. Simulation: Every simulation has its own container, regardless of whether it is an FMI or a custom simulation.
4. Kafka broker: Once the simulations are completed, the resulting data is sent through a messaging broker. The messaging broker acts as an intermediary, facilitating the transfer of data between different components of the system.
5. Telegraf: After passing through Eclipse Ditto, the data is processed by Telegraf. Telegraf is a plugin-driven server agent that collects, processes, and aggregates data from various sources. It prepares the data for further processing and storage.
6. InfluxDB: The processed data is then stored in InfluxDB, a database designed to handle high volumes of time-series data. InfluxDB efficiently stores and organizes the data for retrieval and analysis.
7. Grafana visualization: Finally, Grafana retrieves the stored data from InfluxDB and utilizes its visualization capabilities to represent the data in the desired format.
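To make steps 5 and 6 concrete: Telegraf ultimately writes each simulated sample to InfluxDB as a point in InfluxDB's line protocol. A point can be assembled as sketched below; the measurement, tag, and field names are hypothetical.

```python
# Illustrative sketch of an InfluxDB line-protocol point, the format in
# which Telegraf writes simulated samples. Names are hypothetical.

def to_line_protocol(measurement: str, tags: dict, fields: dict,
                     timestamp_ns: int) -> str:
    """Build one line-protocol point: measurement,tags fields timestamp."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    "simulation", {"twin": "pivot-01", "run": "exec-3"},
    {"pressure": 3.2, "flow": 14.7}, 1700000000000000000)
print(line)
# → simulation,run=exec-3,twin=pivot-01 flow=14.7,pressure=3.2 1700000000000000000
```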
The second option, shown in Figure 3B, is to create a copy of the main twins involved and replace their data with the simulated data, that is, one run implies the creation of at least one twin. This is recommended for those who want to separate the simulated data from the real data and/or want to operate with the simulation results as if they were the main twin.
As the data of all twins, including these copies, are stored in a database, the data of several runs can be queried even if they have been sent to the same twin. To facilitate this query, an identifier is stored next to the data of each simulation run. The data that a copy stores will always correspond to the last simulation run. The dataflow of this type of simulation can be observed in Figure 5. It is similar to the previous case, except that after step 4, instead of reaching Telegraf directly, the data arrives at Eclipse Ditto, then at an MQTT broker, and finally at Telegraf:
• Eclipse Ditto: In the case of creating a "what-if" simulation of the original digital twin, the data passes through Eclipse Ditto, which creates a new copy of the twin.
• MQTT Broker: After receiving the data and updating states, Eclipse Ditto sends the data through MQTT.
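The per-run identifier stored alongside the data makes it possible to retrieve individual simulation runs from the database. For instance, a Flux query along the following lines could select a single run; the bucket, measurement, and tag names are illustrative, not the actual OpenTwins schema.

```flux
from(bucket: "opentwins")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "simulation")
  |> filter(fn: (r) => r.runId == "exec-3")
```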
These two options differ mainly in execution speed and data independence. The first option requires less storage, but its functionality is limited; it is useful for those cases in which you want to query or work with data from many runs of a simulation and, at most, apply some process to the last of them. The second option, as data must flow through Eclipse Ditto, is normally slower and requires more storage resources due to the twin-cloning process. The advantage is the total data independence of each simulation run. This option is recommended for simulations where prediction, or some similar process, is to be applied to different simulation runs, or when "what-if" twins of the original need to be created.
When creating a simulation, the chosen option will determine how the connection should be established with the messaging protocol used. Eclipse Ditto provides the flexibility to configure the connection using various messaging protocols, such as MQTT, AMQP, or HTTP, depending on the requirements and capabilities of the simulation data. The chosen protocol should align with the capabilities of the simulation data source or the messaging infrastructure being utilized.
As explained before, in cases where the simulation data does not conform directly to the Ditto Protocol, a JavaScript mapping can be employed.

Visualization of the simulation results
As mentioned before, OpenTwins uses Grafana, an open-source platform designed for data visualization, as its front-end, so multiple plugins can be used to display all the collected data correctly.
From the Grafana plugin developed for digital twins, 11 a simulation can be started by filling in all the data necessary for the call to the REST service that runs the simulation, such as an instance of Epanet 2 simulations (the Grafana dashboard developed for this purpose can be seen in Figure 6). After a short period of time, the simulation is available in the database and can be studied using the designed dashboard, which incorporates the Unity plugin showing the 3D model of the twin.
In Figure 7, an example of a Grafana dashboard showing both the 3D model and the data received from the twins and the simulation is showcased, specifically a 3D representation of an FMI simulation of a ball bounce.The dashboard shows two key elements that play a vital role in facilitating accurate data visualization.
The Unity plugin assumes a prominent position within the dashboard's framework. Its primary function is to present a three-dimensional representation of the actual machine, dynamically adapting its movements based on the simulation data received or stored. This plugin offers an interactive interface that enables users to manipulate the three-dimensional model, thereby facilitating exploration from various angles and perspectives.

F I G U R E 7
Bouncing ball Grafana dashboard.
The lower panel of the dashboard serves as a significant component that displays all the received simulation data in the form of a time series.This panel effectively portrays the data, allowing users to discern temporal patterns and trends.By presenting the information in a comprehensive manner, the lower panel enhances the understanding and analysis of the simulation results.

Integration of simulation and machine learning
Combining simulation with ML models is a powerful approach that can offer several benefits in various fields. Therefore, designing the simulation services to be fully compatible with the ML functionalities already offered by OpenTwins provides the platform with a wide range of possibilities and functionalities. Some ways in which simulation and ML can be effectively combined are:
1. Training data generation: One of the prominent challenges encountered in the development of ML models is the insufficiency of available data. Simulations offer a viable solution by facilitating the generation of substantial volumes of training data for ML models. 25 This approach becomes particularly valuable when the acquisition of real-world data is constrained by different limitations. To illustrate, within the field of robotics, simulations can effectively produce a broad spectrum of scenarios, serving as a valuable resource for training robotic control algorithms in instances where obtaining diverse real-world data is impractical or cost-prohibitive.
2. Transfer learning: Models trained in simulations can be fine-tuned on real-world data using transfer learning. This helps the model adapt to the specific nuances of the real environment while benefiting from the initial training in a simulated environment. 26
3. Safety and robustness testing: Simulations provide a controlled environment for testing the safety and robustness of ML models. This is particularly important in applications like autonomous vehicles and drones, where real-world testing can be risky and costly. 27
4. Anomaly detection: Simulations can be employed to create normal-behavior models. ML models can then be trained to detect anomalies by identifying deviations from the expected behavior. 28

Adversarial training:
Simulations can be used to create a variety of adversarial scenarios to train ML models to be robust against unexpected or malicious inputs.6. Cost reduction: Conducting experiments in the real world can be expensive and time-consuming.Simulations offer a cost-effective means to pre-train models and iterate on algorithms before deploying them in the real environment.7. Reinforcement learning in simulated environments: Reinforcement learning agents can be trained in simulated environments, allowing them to learn complex tasks without the need for extensive real-world experience. 29anks to the integration and operation of simulation and ML functionalities, digital twins can be developed that reflect the behavior of the physical counterpart more closely to reality.As mentioned, through hybrid models, where simulation data can feed the scarcity of ML data, complement the performance of certain specific elements within the digital twin, as is the case of the irrigation pivot use case explained in Section 4.
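The first of these points can be sketched in a few lines. In the sketch below, `simulate` is a toy stand-in for a real simulation service call (a free-fall trajectory rather than an FMU or Epanet run), and the parameter ranges are purely illustrative:

```python
import random

def simulate(start_height, gravity, steps=50, dt=0.1):
    """Toy stand-in for a simulation run: free fall from start_height.

    Returns a height trajectory; a real service would execute an FMU
    or a hydraulic model instead.
    """
    return [max(start_height - 0.5 * gravity * (i * dt) ** 2, 0.0)
            for i in range(steps)]

def build_training_set(n_runs=1000, seed=42):
    """Generate (parameters, trajectory) pairs usable as ML training data."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_runs):
        h0 = rng.uniform(1.0, 10.0)  # starting height in metres (illustrative range)
        g = rng.uniform(9.0, 10.0)   # gravity in m/s^2 (illustrative range)
        dataset.append(((h0, g), simulate(h0, g)))
    return dataset

dataset = build_training_set()
```

Varying the input parameters across runs is what gives the ML model a diverse training distribution that would be costly to collect from the physical system.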
The structure of the platform, where all the data flows through Kafka, made the simulation service straightforwardly compatible with the ML module, as can be observed in Figure 8. As explained above, OpenTwins offers ML functionalities thanks to Kafka-ML. This pipeline allows the design, training and inference of ML models. Training and inference datasets for ML models can be fed through Apache Kafka, so they can be directly connected to data streams such as those provided by OpenTwins.
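From the point of view of this data flow, what travels between the simulation service and Kafka-ML is a serialized twin update. The envelope below is a hypothetical shape for such a message, not the platform's actual schema; a real producer would hand the resulting string to a Kafka client:

```python
import json
import time

def twin_update_message(twin_id, feature, values):
    """Serialize one simulation result into a JSON envelope, in the
    spirit of the messages OpenTwins streams through Kafka.

    Field names here are illustrative, not the platform's schema.
    """
    return json.dumps({
        "thingId": twin_id,
        "feature": feature,
        "timestamp": time.time(),
        "values": values,
    })

# Example: one height sample produced by the pivot simulation
msg = twin_update_message("gedier:pivot", "simulation", {"height": 4.2})
decoded = json.loads(msg)
```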
In this context, a hybrid ML model of the FMI use case presented in Section 5 has been developed to detect anomalies in the ball behavior. This example illustrates one of the benefits discussed of integrating simulation and AI/ML in the OpenTwins framework. The model is adept at identifying irregularities in the operation of simulations, facilitated by the seamless integration of the simulation service with the ML capabilities of OpenTwins.

USE CASE: SIMULATION OF AN IRRIGATION PIVOT FROM RABANALES FIELD, CÓRDOBA
The custom simulation service added to this platform was validated through an agricultural use case in a crop field. In collaboration with the University of Córdoba, an experimental farm with an irrigation pivot irrigating a surface of 9 hectares has been considered. This use case, within the Spanish Gedier project, is based on creating a functional digital twin of crop fields by collecting different data such as soil moisture, levels of different elements, vegetation index maps and so forth. In addition, data-based and classical models, as well as the irrigation pivot simulation, are integrated with the aim of predicting the future behavior of the farm crops.
This digital twin of the Rabanales farm is operated through a Grafana application. The farm is divided into different areas, where different crops and methods are tested. In addition, the farm has different machines that assist its irrigation, such as irrigation pivots.
As part of this digital twin, an application has been developed to visualize and execute the functionalities of this twin. This application consists of different visualization dashboards where different data related to the area can be observed.
These panels vary depending on the data to be visualized. For example, in the irrigation dashboard, shown in Figure 9, two main panels can be observed: the left panel is a map of the working area of one of the irrigation pivots, showing different data about this machine within a set time range, while the panel on the right side shows the 3D representation of the irrigation pivot. The data produced by this simulation serve as input for different ML models that predict the behavior of the crop fields, one of the simulation-ML integration cases discussed in Section 3.3. This allows simulations to be created from the current data and tests to be performed to see the evolution of the twin without wasting water.
In other dashboards, other types of information can be observed, such as NDVI maps, an indicator of the greenness, density and health of the vegetation in each pixel of a satellite image, or different data related to the campaign and so forth.
The irrigation pivot simulation is extremely useful in these circumstances, as the lack of rainfall in Spain makes it impossible to use this machine. Using the simulation to check the results and perform intelligent irrigation is therefore practically mandatory in order to save water and reduce greenhouse gas emissions.
The objective is to create a simulation service that is able to predict the configuration needed to use the irrigation pivot correctly, given several configurable parameters. For this purpose, Epanet 2 has been used.
Epanet 2 is a software package used in the analysis of water distribution systems.30 It is open-source software developed by the U.S. Environmental Protection Agency. It allows performing hydraulic analyses based on the characteristics of the hydraulic network, through which liquids with characteristics similar to water flow.
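Epanet's hydraulic solver is built on relations such as the Hazen-Williams head-loss formula. As a pure-Python illustration of the kind of per-pipe calculation the software performs network-wide (not Epanet's own code; the coefficients are the standard SI form of the formula):

```python
def hazen_williams_headloss(length_m, flow_m3s, roughness_c, diameter_m):
    """Head loss (m) along a single pipe via the Hazen-Williams formula,
    SI form: h_f = 10.67 * L * Q^1.852 / (C^1.852 * d^4.8704).

    Epanet 2 solves this kind of relation simultaneously for every pipe
    in the network; this sketch only evaluates one pipe in isolation.
    """
    return (10.67 * length_m * flow_m3s ** 1.852
            / (roughness_c ** 1.852 * diameter_m ** 4.8704))

# Illustrative pipe: 100 m long, 10 L/s flow, C = 130, 100 mm diameter
loss = hazen_williams_headloss(length_m=100.0, flow_m3s=0.01,
                               roughness_c=130.0, diameter_m=0.1)
```

Doubling the flow in the same pipe raises the head loss by roughly 2^1.852, which is why pump and nozzle configuration matters so much for the pivot.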
The irrigation is carried out by the University of Córdoba in the field of Rabanales, Córdoba. The University of Córdoba has provided real-time data from the sensors installed on the machine, which are considered single twins that will constitute the main twin of the irrigation machine. A photo of the irrigation pivot can be seen in Figure 10B.
Using those single sensor twins, an Eclipse Ditto Thing scheme has been defined to group them and build the complete twin of the irrigation pivot. In this instance, every sensor provides a single measurement, accompanied by the timestamp of when it was recorded. To create the twins, the plugin developed for Grafana was used. As an example, the Eclipse Ditto Thing schemes for the main twin and one of the sensors are shown in Listings 1 and 2, respectively.
FIGURE 9 Screenshot of the Gedier digital twin.

Figure 10A shows the 3D representation of the pivot irrigation machine of this use case. Consequently, a digital twin of the irrigation procedure has been achieved, featuring a representation of its behavior. Various graphical elements, including diverse types of visual representations and a 3D model, are employed. This model, when fed with sensor data, offers the capability to showcase relevant information on the machine, such as its motion, and permits various forms of user interaction.

USE CASE: FUNCTIONAL MOCK-UP INTERFACE SIMULATION
Similar to the validation of the custom simulation service through the irrigation machine use case, the FMI simulation service was also validated by executing Functional Mock-up Units (FMUs). In this case, due to the lack of freely available FMUs, a model of the bouncing of a ball has been chosen. This is an FMU example available online.‡‡‡ In this validation process, the FMI simulation service is put to the test by executing FMUs, which are self-contained simulation models packaged in a standardized format. These FMUs are obtained from various sources and represent different system components or models.
By executing the FMUs within the FMI simulation service, the platform validates its compatibility and functionality with respect to FMI-compliant simulation models. The use case provides an opportunity to assess the performance, accuracy, and reliability of the FMI simulation service, ensuring that it effectively executes the FMUs and produces the expected simulation results.
Through such tests, the digital twin platform establishes its capability to successfully handle FMI-based simulations, enabling users to leverage the power of FMUs and integrate them seamlessly into their digital twin environments. Each FMU encapsulates different variables relevant to the digital twin and the simulation process. Specifically, the variables included in this simulation are velocity, time, acceleration, and position.
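For orientation, the dynamics behind such a bouncing-ball FMU can be reproduced with a few lines of explicit-Euler integration. The standalone sketch below mirrors the four variables listed above; the coefficient of restitution and step size are chosen arbitrarily and do not come from the actual FMU:

```python
def bounce_sim(h0=1.0, g=9.81, e=0.7, dt=0.001, t_end=3.0):
    """Explicit-Euler bouncing ball, mirroring the variables the FMU
    exposes: time, position (height), velocity and acceleration.

    e is the coefficient of restitution applied at each impact.
    """
    t, h, v = 0.0, h0, 0.0
    trace = []
    while t < t_end:
        a = -g               # constant gravitational acceleration
        v += a * dt
        h += v * dt
        if h < 0.0:          # impact: clamp to ground, reflect damped velocity
            h = 0.0
            v = -v * e
        trace.append((t, h, v, a))
        t += dt
    return trace

trace = bounce_sim()
```

Each `(time, position, velocity, acceleration)` tuple corresponds to one output sample of the simulated FMU; an FMI runtime advances the model in the same step-by-step fashion through the standard's `doStep` interface.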

Test 1: Latency measurement of the custom simulation service and robustness
This test, performed over the use case of the irrigation pivot in the Rabanales field, evaluates the latency from the moment the request is made to the simulation until the data is stored in the InfluxDB database. The purpose of conducting latency tests for a simulation service is to ensure that the service can handle quick interactions and provide a smooth user experience. Simulations often require quick response times to maintain the illusion of realism and to enable timely decision-making. This test not only evaluates that, but an incremental execution is also carried out to check the robustness of the service according to the number of simultaneous simulations and the capacity of the platform to receive a large amount of data at the same time.
Due to the design of the platform, each twin is composed of smaller twins, so they can be combined in a simpler way. That is why the twin of the irrigation machine is composed of a total of 593 smaller twins corresponding to the components of the machine.
This test has been split into several simulation batches, from batch 0 to batch 15. After just one HTTP request, every batch executes a total of (batchNumber * 10) + 1 simulation instances simultaneously, stores the timestamp when the simulation is called, and compares it later to obtain latency and throughput. Once the simulation service has done its job, the data is sent through Kafka and read by Eclipse Ditto. Eclipse Ditto updates the state of the simulated twin and then creates an event that sends the data to an MQTT broker. This data is then read by Telegraf, which stores it in InfluxDB.
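The batch sizing implies a steep growth in message volume; the arithmetic can be checked directly (the 593 messages per simulation correspond to the component twins of the pivot):

```python
MESSAGES_PER_SIMULATION = 593  # one message per component twin of the pivot

def simulations_in_batch(batch_number):
    """Number of simultaneous simulation instances in a given batch."""
    return batch_number * 10 + 1

batch_sizes = [simulations_in_batch(b) for b in range(16)]          # batches 0..15
message_counts = [n * MESSAGES_PER_SIMULATION for n in batch_sizes]
# batch 0  -> 1 simulation,   593 messages
# batch 1  -> 11 simulations, 6523 messages
# batch 15 -> 151 simulations, 89,543 messages
```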
When all the data produced by the simulation is stored, all stored timestamps are taken to calculate the elapsed time of the execution. All this data flow can be seen in Figure 4.
The result of the test is an average of 10 executions per batch and can be seen in Figure 11B. This figure shows the mean elapsed time per batch.
Executing this high number of simulations is not a usual scenario in this use case, but it has been done to study the behavior of the platform when dealing with a large number of clients and messages.
FIGURE 11 Data plot of the proposed evaluation.

As detailed in Section 4, each simulation is responsible for transmitting a total of 593 messages to a Kafka broker. Consequently, as the number of simulations increases, the overall number of messages sent undergoes a substantial surge. For instance, in the first set of simulations, where only one simulation is executed, a total of 593 messages are transmitted. In the second, which involves the execution of 11 simulations, the cumulative number of messages sent amounts to 6523. Finally, in the last set, an extensive volume of 89,543 messages is transmitted as a result of the high number of simulations involved in the process.
As illustrated in Figure 11A, when the number of simultaneous simulations is increased, the time interval between the simulation call and the storage of the first transmitted data in the database remains within a 2-s threshold. This indicates that the system is capable of efficiently processing and storing the data despite the increased workload, highlighting that not only do the developed services exhibit a robust capacity for parallelization, but also that all the data reach the database in a short period of time. This capability enables the system to handle high workloads efficiently while maintaining resilient performance, even when confronted with large amounts of data.

Test 2: Latency measurement of the FMI simulation service and robustness
This second test, performed over the bouncing ball use case, aims to assess the latency involved in making a request to the simulation service and storing the corresponding data in the InfluxDB database. Unlike the previous test, which involved multiple simulations executed through a single HTTP request, this test evaluates the FMI simulation service using one HTTP request per simulation. It follows an incremental execution approach, similar to the previous test, in order to examine the robustness of the service with respect to the number of simultaneous simulations. However, in this case, each simulation is initiated through an independent HTTP request, thereby challenging the platform's capacity to handle a significant influx of data concurrently. As a result, the system experiences a higher level of overhead. This test, similar to the previous one, is divided into multiple simulation batches, ranging from batch 0 to batch 15. Each batch runs a total of (batchNumber * 10) + 1 simulation instances simultaneously, triggered this time by one HTTP request per simulation. These simulations are executed using threads, and the timestamps are recorded when each simulation is called. The recorded timestamps are later compared to measure latency and throughput. This batch size selection allows for a comparison between the two simulation services developed for this platform.
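The threaded measurement loop can be sketched as follows. This is a simplified sketch, not the actual test harness: the `time.sleep` call stands in for the real HTTP request and the wait for the data to land in InfluxDB, both of which depend on the deployed platform:

```python
import threading
import time

def run_simulation_request(sim_id, latencies):
    """Stub for one HTTP request to the FMI simulation service.

    The real test issues the HTTP call and takes the second timestamp
    only once the corresponding data is stored in InfluxDB.
    """
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for request round-trip + storage wait
    latencies[sim_id] = time.perf_counter() - start

def run_batch(batch_number):
    """Launch (batchNumber * 10) + 1 concurrent requests and collect latencies."""
    n = batch_number * 10 + 1
    latencies = [0.0] * n
    threads = [threading.Thread(target=run_simulation_request, args=(i, latencies))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies

latencies = run_batch(1)  # batch 1: 11 concurrent simulated requests
```

Averaging `latencies` over repeated runs of each batch yields the per-batch mean elapsed times reported in the results.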
Once the simulation service completes its task, the data is transmitted through Kafka, Eclipse Ditto, and MQTT, and ultimately stored in InfluxDB using Telegraf. After all the simulation data is stored, the stored timestamps are collected to calculate the elapsed time of the entire execution, as in the previous test. This data flow is depicted in Figure 4.
The test results consist of an average of 10 executions per batch. These results are presented in Figure 11B. This figure represents the mean elapsed time per batch.
It is important to note that executing such a high number of simulations is not typical but has been done to analyze the platform's behavior when handling a large volume of clients and messages.
As illustrated in Figure 11B, when the number of simultaneous simulations is increased, the time interval remains within a 4-s threshold. This is slower than the other simulation service, but this time more HTTP requests have been made. Even with this high volume of requests, the evaluation indicates that the system is still capable of processing and storing the initial data despite the increased workload. In this case, each simulation is responsible for transmitting a total of 501 messages to a Kafka broker, a load similar to that of the previous test.
Similarly to the previous test, Figure 11B shows the overall average times and reinforces the findings presented above, reaffirming that the service provides significant parallelization capabilities and efficiency.

Test 3: Machine learning model training with simulation data
To demonstrate the integration of OpenTwins simulation services and ML functionalities, an ML model has been developed using data collected from an FMI simulation. The model is a multi-layered recurrent model, designed to extract information from a time-series dataset and predict the desired characteristic. It is composed of 2 GRU layers and 3 dense layers, with an input of 512 values and an output of 2 values. The model architecture can be observed in Table 1. Given the simulation inputs, which are the starting height and gravity, and the outputs, which are the time series of heights and velocities at each time instant, the model can detect whether the simulation is working properly or not.
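The trained GRU model itself is not reproduced here. As a rule-based stand-in that makes the decision concrete, the sketch below flags runs whose reported heights deviate from the analytic free-fall expectation before the first bounce; this is the same kind of judgment the model learns from data, and the function names and tolerance are illustrative only:

```python
def expected_heights(h0, g, ts):
    """Analytic free-fall heights before the first bounce."""
    return [max(h0 - 0.5 * g * t * t, 0.0) for t in ts]

def is_anomalous(h0, g, ts, reported, tol=0.05):
    """Flag a simulation run whose output deviates from the physics.

    A rule-based stand-in for the trained GRU classifier described
    above; tol is an illustrative threshold in metres.
    """
    ref = expected_heights(h0, g, ts)
    err = max(abs(r - e) for r, e in zip(reported, ref))
    return err > tol

# A healthy run matches the expectation; a faulty run carries an offset.
ts = [i * 0.01 for i in range(50)]
good = expected_heights(2.0, 9.81, ts)
bad = [h + 0.2 for h in good]  # injected 20 cm offset fault
```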
In Figure 12, the mean absolute error of the training can be observed. As shown in the figure, thanks to the virtually unlimited data generation possibilities of the simulations, the error value drops rapidly to values below 1 cm.
These results show not only the correct functioning of the coexistence of the simulation service and the ML functionalities of OpenTwins, but also demonstrate the importance of this coexistence. In this case, an ML model with these characteristics has been achieved thanks to the large amount of data available; moreover, the incorporation of both simulation and ML models allows the development of digital twins with better features and functionality.

CONCLUSIONS AND FUTURE WORK
This article proposes the re-definition of the OpenTwins framework to provide simulation support for the development of open-source digital twins. The digital twin concept has gained significant attention in recent years, offering a promising approach for representing and analyzing real-world systems in a virtual environment. However, the lack of open-source frameworks that effectively integrate simulation functionality within digital twin architectures poses a significant limitation. The availability of an open-source framework with simulation capabilities can democratize the use of digital twins, making them accessible to a wider audience. By extending the OpenTwins framework with simulation capabilities and integration with ML features, this research addresses the need for a comprehensive and accessible solution that allows developers and researchers to harness the power of digital twins. The integration of simulation and ML within OpenTwins enables the exploration of "what-if" scenarios, the prediction of system behavior, the testing of alternative strategies, and the application of advanced modeling techniques, algorithms, and data-driven approaches to gain insights, optimize system performance and so forth, all in a cost-effective and risk-free virtual environment.
While this article has focused on the presentation of an open-source solution for running ML-compatible simulations in a digital twin application, further research and development are necessary to refine and extend the proposed framework. In future work, we have identified the following challenges and improvements to be carried out: • Expanding the spectrum of potential simulation types, facilitating their implementation and execution.

FIGURE 1 OpenTwins architecture and components overview.
FIGURE 3 Types of simulations by behavior: (b) simulation as a copy of the twins.
FIGURE 5 Data flow of the data using "what-if" simulations.
FIGURE 6 Grafana pivot simulation form.
FIGURE 8 Dataflow from simulation to Kafka-ML.
FIGURE 10 3D representation and real pivot.