A formal approach to automatically analyse extra‐functional properties in mobile applications

This paper presents an integrated approach for testing mobile applications (apps) against a set of extra‐functional properties to be used by app developers. The approach starts with the (manual or automatic) extraction of the interaction model, that is, a formal model of the potential user interactions with the app. The model is constructed to allow a model checking tool to exhaustively extract the so‐called app user flows, that is, the sequences of user actions, that constitute the test cases. In the final step, the app user flows are executed on the app running on real devices. The resulting execution traces are enriched with different measures and verified against a set of extra‐functional properties of interest. The approach has been adapted to analyse several applications running at the same time with several devices supporting the applications. This paper presents the definition and formalization of both the modelling language for the interaction model and the specification language to represent the extra‐functional properties. It also describes a methodology for automatically extracting the model. Finally, it presents an implementation focused on Android apps, which is integrated in the TRIANGLE testing framework, and the evaluation of the approach. © 2019 The Authors. Software Testing, Verification & Reliability Published by John Wiley & Sons Ltd.


INTRODUCTION
The automated analysis of applications that run on smartphones is a hot topic due to the increasing role of these platforms as the main way for users to connect to the Internet. Execution errors and underperformance in mobile apps (the usual name for applications) have a great impact on user experience, on the overall behaviour of the smartphone and on the mobile communication network. This potential negative impact is not negligible considering that more than 2 billion devices run connected apps every day. In addition to the software's functional properties, the analysis of extra-functional properties (EFPs), such as energy consumption, response time, traffic generation and memory use, is a central issue in developing new techniques to verify mobile apps.
The current application of formal methods, such as variants of static analysis or model checking, does not effectively predict the behaviour of mobile apps regarding EFPs due to the difficulties in constructing a realistic model of the whole environment where the application is running, including user interaction, other apps running on the same device, the operating system and interaction with the mobile networks. Approaches that work with models such as App Explorer [1] or PerfChecker [2] still need more accurate information on delays or energy consumption, which can be obtained only from real executions. On the other hand, approaches that rely only on runtime monitoring to characterize app behaviour lack the mechanics to generate all realistic scenarios and/or to formally ensure the correctness or coverage of the analysis. For instance, tools such as AntMonitor [3], NetworkProfiler [4] and ProfileDroid [5] support methods to explore several executions to produce statistics, identify reference patterns in the traffic or locate potentially suspicious behaviours, but they lack a suitable formal framework to control the coverage of the executions and to describe the EFPs to be analysed. This paper presents a new approach to assist mobile application developers in testing apps in different network scenarios. The approach combines model-based testing [6] and runtime verification [7] techniques to verify whether a mobile app satisfies a given set of extra-functional properties. It is worth noting that model checking [8] is the underlying method that makes it possible to automate both the generation of test cases and the analysis of traces. Thus, if the EFPs are violated, it is possible to locate the execution traces of the app that cause the violation. This proposal has been integrated into the TRIANGLE testing framework [9], which provides a mobile network in a laboratory.
In addition, the TRIANGLE framework provides application automation and monitoring functionality to allow the execution of tests in different controlled network scenarios. Figure 1 shows a high-level overview of the testing framework, clearly separating the proposed methodology to describe both the interaction model and the expected properties (at the upper part of the figure) and the particular implementation of these ideas in TRIANGLE (the lower part of the figure). The app developer provides to the TRIANGLE Portal the binaries of the app (APK file in Android) and the extra-functional property (EFP), the interaction model and a description of the sequences of interactions suitable for the test cases (app user flow requirements). To ease the construction of the interaction model, the framework provides a support tool that extracts the interaction model from the application binaries. The interaction model and the app user flow requirements are then transformed and analysed with the SPIN model checker in order to produce a set of app user flows, that is, a set of realistic user actions suitable to evaluate the EFP. The Experiment Control & Mobile Network Testbed is in charge of executing the sequence of actions in a real device under different emulated network scenarios and providing the corresponding execution traces.

... user behaviours. This section presents the concepts needed to model the interaction of the user and the mobile application, the modelling language and its formalization.

Elements of mobile applications
Users interact with a mobile application mainly through graphical elements called controls, for example, buttons, text fields and lists. In a touch-based interface, these controls can be used in several ways, from simple gestures such as tapping to more complex gestures such as pinch to zoom. Not all controls respond to these gestures, for example, a button may react to taps but not to swipes. In addition, some user actions may also depend on events that they do not directly control. For instance, a 'play' button may be disabled while a song is being downloaded.
Due to the constraints of mobile devices and their small displays, most applications show only part of their graphical interface at a time. The set of controls that fits into the display at the same time comprises what is usually called a screen. While interacting with the application, the contents of the current screen may be replaced with others. For instance, an email application may have a screen with the list of emails in the inbox. When one of the emails is tapped, the application shows a different screen with the contents of the message. Figure 2 shows three screens of the Universal Music Player app, which will be used as a case study throughout the paper. This music player is a sample app that is included in the Android Studio IDE and can also be obtained from the Google Play Store. Each screen shows different controls. For instance, initially, the app shows a list of songs (left screen). If a song name is clicked, then the app starts to play the song and shows some playback controls and the album cover image (central screen). In this state, if the album cover image is clicked, a completely new screen is shown with the complete playback controls (right screen). Observe that some interactions happen within the screen, such as tapping on a song to start playing, while others change the current screen, such as tapping on an album photo to open the full-screen player.
Navigation between screens presents interesting challenges. Many applications organize their screens hierarchically: new screens are higher in the hierarchy, and the user can also navigate back to the previous screen at any time. This 'go back' action is usually supported directly by the mobile device, such as Android's system-level 'back' button, or the application framework, such as the iOS navigation bar.

This navigation model can be pictured as a stack of screens, where the screen at the top of the stack is the screen currently being displayed. When the user navigates to a new screen, that screen is placed on top of the stack. When the user goes back, the top screen is popped off the stack. The 'home screen' of the mobile device is always at the bottom of this stack.
In addition, the same screen may be reached through more than one path. When users press back, they expect to see the previous screen, not one of the other possible 'previous screens' from other paths. However, in certain cases, past screens may be removed from the stack at a certain point. For instance, after completing a multi-screen 'wizard', users may not be able to return to the wizard by pressing back.
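The stack-based navigation described above can be illustrated with a minimal Python sketch. The class name `BackStack`, its method names and the screen names are hypothetical and serve only as an illustration; they are not part of the modelling language or of any Android API.

```python
class BackStack:
    """Stack of screens; the device home screen always stays at the bottom."""

    def __init__(self, home="Home"):
        self._screens = [home]

    @property
    def current(self):
        # The screen on top of the stack is the one being displayed.
        return self._screens[-1]

    def navigate(self, screen):
        """Going to a new screen pushes it on top of the stack."""
        self._screens.append(screen)

    def back(self):
        """Going back pops the top screen; the home screen is never popped."""
        if len(self._screens) > 1:
            self._screens.pop()
        return self.current

    def forget(self, screen):
        """Drop past occurrences of `screen` below the top (e.g. a finished wizard)."""
        below = [s for i, s in enumerate(self._screens[:-1]) if i == 0 or s != screen]
        self._screens = below + [self._screens[-1]]


stack = BackStack()
for screen in ["Inbox", "WizardStep1", "WizardStep2", "Done"]:
    stack.navigate(screen)
stack.forget("WizardStep1")
stack.forget("WizardStep2")
previous = stack.back()  # the wizard screens are skipped
```

In this sketch, pressing back after the wizard has been removed returns to the screen that preceded the wizard, which mirrors the behaviour users expect in the multi-screen wizard case.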
New applications can also be started from others; for example, an email application could start the web browser application to load a website when a link is tapped. The new application might start at a screen other than its 'main' screen depending on the request made by the first application.
Not all behaviours in mobile applications can be described solely through user actions. Some depend at a certain point on events that are not directly controlled by the user, such as an email being received or an alarm going off. Such events are called system events and must be taken into account in the proposed modelling language.

Modelling language
Given the elements of mobile applications identified in the previous section, a modelling language based on state machines is now proposed to describe them. This modelling language adopts many elements from UML state machines [14] and Harel statecharts [15], including their graphical notation, but is not a strict subset due to the introduction of additional elements to represent aspects of mobile applications.
State machines consist of states connected through labelled transitions. States are represented by rounded rectangles and transitions by directed arrows, both with an optional label. The unlabelled circles with two outgoing transitions are connection states, which will be explained later. State machines are also represented by rounded rectangles with a label, and they contain states and transitions. There are two special types of states: initial and final. The former are represented as unlabelled filled circles, while the latter are represented by circles with a cross inside.
Transitions represent user actions over the screen controls. Each transition is labelled with the action that the user would perform to progress to a new state. These actions include pressing a button, entering text into a field or scrolling up or down a list. Transitions may also be labelled with system events, which represent an event or condition not controlled by the user. This distinction is important for test case generation, as system events are not translated into actions performed on the screen. System events are distinguished by a '-' suffix in their names. Finally, the time elapsed between the current transition and the next can be specified by including the attribute '{time=x}' in the label.
Describing almost any application with a single state machine is impractical. Here, therefore, state machines are organized in two ways. First, a state machine may 'call' another state machine, for example, with each one representing the user behaviour on a different application screen. Second, state machines are contained in a hierarchy that mimics the elements identified earlier: devices, applications and screens. At the top level of the hierarchy are the device state machines. Each device contains one or more application state machines, which in turn may consist of one or more activity state machines. Activity state machines are useful for grouping different user behaviours in the same screen. At the bottom of the hierarchy are the state machines that define the different sets of user behaviours. They are called user-view state machines and contain states and transitions labelled with user/system events. This type of hierarchical composition is purely for convenience, as devices and user-view state machines would be logically sufficient. Figure 3 shows part of the Universal Music Player model, which includes different nested state machines. 'UniversalMusicPlayer' is the application state machine. It contains three nested state machines that represent the app activities: 'MusicPlayerActivity', 'PlaceHolderActivity' and 'FullScreenActivity'. Similarly, these state machines include user-view state machines that specify the user's interaction with the app screen by means of states and transitions labelled with user or  system events. For instance, in 'MusicPlayerActivity_0', the transition from state S4 to S5 is fired when the user clicks on the play button, and the next transition takes place 300 s later.
A special type of state, called connection state, can be used to call another user-view state machine on the same device. Connection states are represented as unlabelled empty circles and have two outgoing unlabelled transitions: one to a target state machine and another to a returning state within the source state machine. When a connection state is reached, the execution continues with the user-view state machine referenced from the connection state. When the target state machine finishes, that is, when it reaches a final state, the execution resumes in the returning state.
Connection states can also reference an activity state machine. In this case, any externally accessible state machine within the activity can be executed next. External accessibility is a feature of applications, activities and user-view state machines and is represented with a pair of initial and final states reflecting the opportunity for access. The execution of the whole model starts with a user-view state machine that can be reached through a direct path of externally accessible elements, that is, applications and activities. This organization allows any state machine to be called explicitly from another.
In Figure 3, only the activity 'MusicPlayerActivity' is externally accessible, and thus when the application starts, only the state machine 'MusicPlayerActivity_0' is accessible. The other state machines are accessed through connection states. In this example, there is a connection state from state S5 to the state machine 'FullScreenPlayerActivity_0', which is in a different activity. If the current state is S5 and the user clicks on the control bar, the execution continues from the initial state of 'FullScreenPlayerActivity_0'. When this state machine reaches its final state, the connection state indicates that the returning state is S5. From the S5 state, the user can click the control bar again or click the pause button.

By following the available transitions from the initial state in the interaction model, an app user flow, that is, a sequence of user actions, is generated. If there is more than one available transition from a state, each one will lead to a different app user flow.
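As a rough illustration of how branching transitions yield several app user flows, the following Python sketch enumerates the event sequences of a small state machine. It uses a plain depth-first search rather than a model checker, the depth bound `max_depth` is an assumption added to keep the enumeration finite, and the states and event names are hypothetical rather than taken from Figure 3.

```python
def user_flows(transitions, initial, finals, max_depth):
    """Return every event sequence from `initial` to a final state
    that uses at most `max_depth` events."""
    flows = []

    def explore(state, path):
        if state in finals:
            flows.append(tuple(path))
            return
        if len(path) == max_depth:
            return  # bound reached: abandon this branch
        for event, target in transitions.get(state, []):
            explore(target, path + [event])

    explore(initial, [])
    return flows


# Hypothetical model: each state maps to its outgoing (event, target) pairs.
transitions = {
    "S0": [("clickSong", "S1")],
    "S1": [("clickPause", "S2"), ("clickCover", "S3")],
    "S2": [("clickPlay", "S1")],
}
flows = user_flows(transitions, "S0", finals={"S3"}, max_depth=4)
```

State S1 has two outgoing transitions, so the search produces more than one flow; the cycle between S1 and S2 is cut off by the depth bound.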

Formal semantics of the modelling language
In this section, the modelling language presented earlier and the method by which meaningful app user flows are constructed are formally defined. The use of mobile applications is formalized through the composition of state machines at three different abstraction levels. User-view state machines are at the lowest level and represent the user behaviour on a mobile screen. Users can interact with the active screen, firing user events. Sometimes one of these events activates a different screen. This control transfer between views is modelled through the binary composition relation between user-view state machines from which device state machines are constructed. Device state machines use connection states to switch from the current user-view state machine to a different one. This formalization does not take into account whether both user views belong to the same or different applications. Finally, the third level corresponds to the concurrent execution of device state machines. At this level, different mobile devices are assumed to be executing and interacting through some applications. Note that activity state machines, as described in the previous section, are only structural components of models that do not essentially affect the app description. Consequently, to simplify the formalization in the description hereafter, they have been omitted.

Definition 1
A user-view state machine is a labelled transition system $M = \langle \Sigma, I, \rightarrow, E, C, F \rangle$, where $\Sigma$ is a finite set of states, $I \subseteq \Sigma$ are the initial states, $C \subseteq \Sigma$ are the so-called connection states, $F \subseteq \Sigma$ is the set of final states, $E$ is the set of user/internal events and $\rightarrow \subseteq \Sigma \times E \times \Sigma$ is the labelled transition relation. Sets $I$, $C$ and $F$ are mutually disjoint.
Final states are states from which it is not possible to evolve. Connection states are states from which it is possible to transit to a different state machine. These states are essential to model the switching between typical views of smartphone devices.
The set of events $E$ is divided into two disjoint sets: the set of user events, denoted by $E^+$, with events such as pressing a button, and the set of system events, denoted by $E^-$, which includes, for instance, system responses to user requests. In the following, $e^+$ and $e^-$ represent user events and system events, respectively, and $e$ alone refers to events of either type.
According to Definition 2, test cases are finite sequences of user and system events. For instance, sequence $e^+_1 e^+_2 e^-_3 e^+_4$ represents a test case where the user first fires events $e^+_1$ and $e^+_2$, the system then fires $e^-_3$ and finally the user fires $e^+_4$. Thus, user and system events are handled similarly during the generation of test cases. The difference between them is important when test cases are transformed into executable code, as described in Section 5. User events will be transformed into non-synchronized calls to methods that simulate the real occurrence of the event, while system events will correspond to calls to synchronized methods which wait for the arrival of the system event.
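The different treatment of the two kinds of events at execution time can be sketched as follows. This is only an illustration under stated assumptions: the function `run_test_case`, the use of `threading.Event` to signal system events, the timeout and the event names are all hypothetical and do not describe the code generated by the framework.

```python
import threading


def run_test_case(events, fire_user_event, system_signals, timeout=1.0):
    """Execute an event sequence; names ending in '-' are system events."""
    log = []
    for name in events:
        if name.endswith("-"):
            # System event: block until the app reports it (or give up).
            arrived = system_signals[name].wait(timeout)
            log.append((name, "arrived" if arrived else "timed out"))
            if not arrived:
                break
        else:
            # User event: simulate the action right away, without waiting.
            fire_user_event(name)
            log.append((name, "fired"))
    return log


fired = []
signals = {"songLoaded-": threading.Event()}
signals["songLoaded-"].set()  # pretend the app has already loaded the song
log = run_test_case(["clickSong+", "songLoaded-", "clickPlay+"], fired.append, signals)
```

If the system event never arrives, the sketch stops the test case instead of firing the remaining user events, which is one possible design choice among several.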

A. R. ESPADA ET AL.

Composition of user-view state machines.
This section will describe how user-view state machines are composed to construct flows that navigate through different user views. A binary relation $X$, between connection and initial states, models this navigation. Assume that the flow in execution belongs to a user-view state machine $M_i$ and that a connection state $cs$ of $M_i$ has been reached. If relation $X$ defines a transition from $cs$ to some initial state of another machine $M_j$, the flow could jump from $M_i$ to $M_j$ and then proceed following the transition relations of $M_j$. This jump implies a change in the active view from $M_i$ to $M_j$. Subsequently, the user-view state machine that is visible in the device is called active, and the other user-view state machines, which have been created but are not currently visible in the device, are called created.
Assume a finite family of $n$ state machines $M_i = \langle \Sigma_i, I_i, \rightarrow_i, E_i, C_i, F_i \rangle$ such that no state is shared between machines; that is, $\forall 1 \le i, j \le n: i \ne j$ implies $\Sigma_i \cap \Sigma_j = \emptyset$. In what follows, $\Sigma$, $I$, $C$, $F$ and $E$ without subscripts denote the unions of the corresponding sets over all machines of the family. In addition, $E_c \subseteq E$ denotes the set of call events that provoke the switch between active user-view state machines.

Definition 3
The connection of a finite family of $n$ user-view state machines $M_1, \ldots, M_n$ is given by a binary relation $X \subseteq C \times E_c \times I$ over connection states, call events and initial states, which enables the transition from connection states to initial states.
In the following, 3-tuples $(s_i, e, s_j)$ of $X$ are denoted as $s_i \xrightarrow{e}_c s_j$. Note that the source and target machines $i$ and $j$ could coincide.
When one view is left to continue in another view, the caller view usually stays active, which means that the execution could return to that view in the future. To account for this behaviour, each connection state $s \in C_i$ is assumed to have a related state $return(s) \in \Sigma_i$, which represents the state to be returned to when the caller view continues its execution.
When a new view is created, the call event could specify some parameters that determine how it must be started or finished. For example, if the view has already been created, the caller can choose whether to reuse the previously created view or create a new view. Additionally, when the newly created view has finished its execution, the caller view could automatically become active or not. The Boolean functions $reuse, auto\_return : E_c \rightarrow \{false, true\}$ establish these parameters for the call events. Although there are other parameters that can be defined in the call events, these two are sufficient to describe the mobile behaviour.
The device state machine is now defined as the composition of the behaviour displayed by the user-view state machines along with the connection relation. The states of device state machines are termed configurations. A configuration is a 3-tuple $\langle sh, rh, eh \rangle$ in which sequence $sh$ is the list of states $s_0 s_1 \cdots s_n$ that have been visited thus far in a flow, where $s_n$ is the current state in the active user-view state machine. Sequence $rh$ is the stack of states $r_1 r_2 \cdots r_m$ that constitutes the history of the view machines that have been created (and not yet destroyed) in the device but are not currently visible. Each state $r_i$ of $r_1 r_2 \cdots r_m$ is the return state of a connection state of a user-view state machine that was previously active but became inactive when a transition from this view to another user-view machine took place. Notably, $s_0 \cdots s_n$ and $r_1 \cdots r_m$ both represent sequences of states, that is, elements of $\Sigma^*$, and the naming is purely to distinguish between these two components of a configuration. Finally, $eh = e_1 \cdots e_m$ is the history of events that provoked a user-view switch in the current execution. Observe that stacks $rh$ and $eh$ have the same length. If $e_i$ in $eh$ is an event that fired a view switch, then $r_i$ is the state to be returned to when this new state machine finishes. In the succeeding text, $\epsilon$ represents the empty state/event history and $eh \cdot e_m$ (similarly, $rh \cdot r_m$) denotes a non-empty stack with event $e_m$ (state $r_m$) at the top.
The following rules define the relation $\rightarrow_d$, which is constructed from the transition relations of user-view state machines $\rightarrow_i$ and the binary connection relation $\rightarrow_c$. In these rules, given a history of states $r_1 \cdots r_m$ and an index $j$ of a user-view state machine $M_j$, the function $top : \Sigma^* \times \mathbb{N} \rightarrow \Sigma \cup \{\bot\}$ returns the last state of the user-view state machine $M_j$ in the sequence $r_1 \cdots r_m$. That is, $top(r_1 \cdots r_m, j)$ returns $r_k$, if $1 \le k \le m$ is the largest index such that $r_k \in \Sigma_j$, or $\bot$, if such a state does not exist. Rule R1 states that a transition inside a user-view state machine $M_i$ corresponds to a transition in the device state machine. The new state $s'$ is added to the list of visited states $sh \cdot s$. Rules R2 and R3 model a transition from a machine to a new machine ($M_j$, for some index $j$) when both the new state $s'$ and the event $e$ are added to the view and event histories of the current system configuration. Rule R2 is applied when event $e$ does not involve reusing a previously created view ($reuse(e)$ is false), while R3 applies when a view of $M_j$ should have been reused ($reuse(e)$ is true) but the current view history does not contain one ($top(r_1 \cdots r_m, j) = \bot$). Rule R4 defines a transition from a machine to $M_j$ by reusing a previously created view of $M_j$ ($reuse(e)$ is true) stored in $rh$ ($top(r_1 \cdots r_m, j) = r_k$). Finally, R5 defines the case when the flow of the currently active view has finished, and the execution must continue with the state $r_m$ stored at the top of the view history $rh \cdot r_m$. Otherwise, that is, if $auto\_return(e)$ returns false, the current configuration $\langle sh, rh, eh \rangle$ cannot evolve. Note that, for simplicity, this model omits the case when the developer modifies the usual flow when finishing an activity: instead of returning to the previous activity in the stack ($auto\_return$ is true), the developer can insert a different activity to jump from the closing activity ($auto\_return$ is false).
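The effect of these rules on configurations can be approximated with the Python sketch below. It is one possible reading of the prose description only, not a transcription of the formal rules: it assumes that a call pushes the return state of the connection state onto the view history, it leaves out R4 because the prose does not spell out how reuse changes the histories, and the dictionary keys (`local`, `calls`, `ret`, `finals`, `machine_of`), the connection state `C1`, the states `F0` and `F1` and the event names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    sh: tuple        # states visited so far; sh[-1] is the current state
    rh: tuple = ()   # stack of return states of created, non-visible views
    eh: tuple = ()   # stack of call events; always as long as rh


def top(rh, machine, machine_of):
    """Last state of `machine` stored in rh, or None (standing in for bottom)."""
    for state in reversed(rh):
        if machine_of[state] == machine:
            return state
    return None


def successors(cfg, m):
    """One-step successors of a configuration as (rule, event, new_config) triples."""
    s = cfg.sh[-1]
    result = []
    # R1: a transition inside the active user-view state machine.
    for event, target in m["local"].get(s, []):
        result.append(("R1", event, Config(cfg.sh + (target,), cfg.rh, cfg.eh)))
    # R2/R3: call from connection state s to an initial state of another view.
    for event, target in m["calls"].get(s, []):
        if not m["reuse"](event):
            rule = "R2"
        elif top(cfg.rh, m["machine_of"][target], m["machine_of"]) is None:
            rule = "R3"
        else:
            continue  # R4 (reusing a created view) is not covered by this sketch
        new = Config(cfg.sh + (target,), cfg.rh + (m["ret"][s],), cfg.eh + (event,))
        result.append((rule, event, new))
    # R5: the active view finished; resume at the return state on top of rh.
    if s in m["finals"] and cfg.rh and m["auto_return"](cfg.eh[-1]):
        result.append(("R5", None, Config(cfg.sh + (cfg.rh[-1],), cfg.rh[:-1], cfg.eh[:-1])))
    return result


model = {
    "local": {"S5": [("clickControlBar+", "C1")], "F0": [("clickBack+", "F1")]},
    "calls": {"C1": [("openFullScreen", "F0")]},
    "ret": {"C1": "S5"},
    "finals": {"F1"},
    "machine_of": {"S5": "MusicPlayerActivity_0", "C1": "MusicPlayerActivity_0",
                   "F0": "FullScreenPlayerActivity_0", "F1": "FullScreenPlayerActivity_0"},
    "reuse": lambda event: False,
    "auto_return": lambda event: True,
}

cfg = Config(sh=("S5",))
applied = []
for _ in range(4):
    rule, event, cfg = successors(cfg, model)[0]
    applied.append(rule)
```

Starting from S5, the loop follows a local step, a call into the full-screen view, a local step inside that view and finally the return to S5, after which both histories are empty again.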
Observe that the state space of device state machines is not finite, given that the configurations include the state, view and event histories, which may have arbitrary lengths. In addition, the state space generated when an explicit model checker is constructing all the flows allowed by a device state machine is non-finite. This circumstance is due not only to the state and event histories, but also to the necessity for the matching algorithm, applied during the state space search, to account for both the current state of the flow and the history of the previous states of the flow, as contained in the first component of each configuration. For example, this approach allows two distinct flows, one of which ends in the configuration $\langle s_0 s_4 s_1 s_2 s_3, \epsilon, \epsilon \rangle$, to be generated by the model checker. Even though both visit the states $s_1$, $s_2$ and $s_3$, the path taken in each flow is different.
As a consequence, the models of device state machines are not, in general, state finite, which means that the model checking process does not, in general, terminate. The current implementation solves this problem by bounding the depth of the execution flows analysed by generating $O^n(D)$ for some fixed $n$.

The communication between the two devices is modelled by a user event in the sender device (the device that starts the communication) and a system event in the receiver device (the device that receives the message). Thus, for instance, using the previous example, if $e_1 = e^+_1$ is an event that implies a communication from $D$ to $D'$ and $e'_1 = e^-_1$ is the corresponding event to be read by $D'$ from $D$, the test cases $e^+_1 e^-_1$ and $e^-_1 e^+_1$ are generated. Note that in the second test case, the method that implements the transition for the receiver event will suspend the execution of $D'$ until event $e^+_1$ is fired by $D$.
In addition, when addressing the use of more than one device, model checking optimization techniques such as partial order reduction [8] are used to avoid the generation of multiple test cases that correspond to a single feasible interaction between the devices.

SPECIFICATION LANGUAGE FOR EXTRA-FUNCTIONAL PROPERTIES
This section introduces the language to describe EFPs on execution traces. Usually, verification techniques such as model checking evaluate properties over traces, abstracting the real time when each state occurs. This abstraction is adequate, for example, to analyse functional (safety and liveness) properties. However, the analysis of some non-functional properties, such as energy consumption, requires taking into account (tracking and measuring) the values of some non-discrete variables that evolve over time. For instance, to analyse the energy consumed by a device when downloading a file, it is necessary to determine when the download starts and finishes and measure the energy consumed by the device during this period.
Three interval formulae constitute the proposed simple specification language. In these formulae, there exists an implicit synchronization between the discrete evolution of traces and the continuous evolution of the magnitudes to be checked on the traces. The interval formulae are first presented on an intuitive level and then formalized. Finally, this section addresses their translation into linear temporal logic (LTL) to be analysed by on-the-fly automata-based model checkers such as SPIN.

Interval formulae
Assume that $O(P)$ is the set of execution traces determined by a transition system $P = \langle \Sigma, \mapsto, L, s_0 \rangle$. Traces are sequences of states of the form $\pi = s_0 \mapsto s_1 \mapsto \cdots$.‡ This section uses the term execution traces, or simply traces, to denote the sequences of states produced by a transition system since this generality suffices to describe the syntax and semantics of the interval formulae. In any case, in the following sections, the execution traces will correspond to the real sequences of states generated when the test cases are executed on the mobile devices. Let $F$ be a set of state formulae to be evaluated on the states of $\Sigma$. The relation $\models \subseteq \Sigma \times F$ associates each state with the state formulae that it satisfies; in other words, given $s \in \Sigma$ and $p \in F$, $s \models p$ iff state $s$ satisfies formula $p$. As usual, the state formulae are assumed to be constructed from a set of atomic propositions and Boolean operators.
Assume a set of variables $C$ that represent continuous magnitudes to be analysed on the executions of traces. Each $c \in C$ is a real-valued function $c : \mathbb{R}_{\ge 0} \rightarrow \mathbb{R}$ that defines the evolution of $c$ with time. Thus, $c(t) \in \mathbb{R}$ gives the value of $c$ at time instant $t$. State formulae are used to determine the state intervals on which it is necessary to measure the continuous variables. For example, the state formulae wifi_on and wifi_off may serve to detect the states in the traces during which the wifi is activated.
Following this idea, a simple language is proposed for the specification of non-functional properties, which will use the intervals of states to describe the periods during which the continuous variables of interest are evolving and must be monitored.
Given $p, q \in F$, $c \in C$ and $K \in \mathbb{R}_{\ge 0}$, three valid formulae are defined in the specification language: $[[diff\_c \le K]]_{[p,q]}$, $\forall[[diff\_c \le K]]_{[p,q]}$ and $\exists[[diff\_c \le K]]_{[p,q]}$. The intuitive semantics of $\forall[[diff\_c \le K]]_{[p,q]}$ is as follows. An execution of trace $\pi = s_0 \mapsto \cdots$ satisfies $\forall[[diff\_c \le K]]_{[p,q]}$ iff for each pair of states $s_i, s_j$ of $\pi$, with $i \le j$ and $s_i \models p$, $s_j \models q$, the following condition (Cond) holds: 'the difference between the values of variable $c$ at the time instants when $s_i$ and $s_j$ took place is less than or equal to $K$'. That is, $p$ and $q$ determine the trace interval $s_i \mapsto \cdots \mapsto s_j$ on which the evolution of variable $c$ must be observed. The other two formulae are similar, except that $\exists[[diff\_c \le K]]_{[p,q]}$ / $[[diff\_c \le K]]_{[p,q]}$ holds iff condition Cond holds for some/the first pair of states $s_i, s_j$ with $i \le j$, and $s_i \models p$, $s_j \models q$.
For example, if variable $c$ gives the energy consumed by a device, formula $\forall[[diff\_c \le K]]_{[wifi\_on, wifi\_off]}$ holds for an execution of a trace iff each time the wifi is activated, the energy consumed is less than or equal to $K$.
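As a worked instance of condition Cond, consider the following made-up readings; the time stamps, the energy values and the bound are hypothetical and are not measurements from the evaluation. Suppose the wifi is activated in state $s_2$ at time $t_2 = 1.0$ s and deactivated in state $s_5$ at time $t_5 = 3.5$ s, with accumulated energy $c(t_2) = 12.0$ J and $c(t_5) = 15.2$ J, and let $K = 4$ J.

```latex
\[
  c(t_5) - c(t_2) = 15.2 - 12.0 = 3.2 \le 4 = K
\]
```

This interval therefore satisfies Cond. The formula with the universal quantifier holds for the execution only if every other interval delimited by wifi_on and wifi_off also stays within the bound, whereas the existential formula is already satisfied by this single interval.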

Interval formulae semantics
The traces $\pi = s_0 \mapsto \cdots$ can be described as maps $\pi : \mathbb{N} \rightarrow \Sigma$ that associate each natural number with the corresponding state in the trace, that is, $\pi(i) = s_i$. Because the traces provided by the test cases are finite, each trace can be supposed to have an ending state $o$ that repeats infinitely often. Hence, assume that for each trace $\pi$, there exists a natural number $n > 0$ (the length of the trace, denoted as $length(\pi)$) such that (i) $\pi(n-1) \ne o$ § and (ii) $\forall k \ge n: \pi(k) = o$.
Although the execution time is abstracted in operational semantics, it is clear that the execution of each trace takes time, during which many other things that influence or are affected by the trace execution may occur. The following definition makes the time taken by the execution of the traces explicit.

‡ Because transition labels are not necessary, they are omitted from the transition relation.
§ To simplify the presentation, it is assumed that traces have at least one non-ending state $s_0$.

The intervals of states (within the traces) are used to determine the periods of time during which continuous variables should be observed. To accomplish this task, the interval calculus introduced by Chaochen and Hansen [16] is used to give formal semantics to the language for EFPs presented in Section 3.1. The interval logic domain is the set of time intervals $Intv$, whose elements have the form $[t_1, t_2]$. Interval expressions that describe properties on intervals can be constructed by using a set of interval variables, relational and boolean operators and real constants. For instance, if $K$ is a constant, then $diff\_c \le K : Intv \rightarrow \{true, false\}$ defines the property on time intervals $[t_1, t_2]$ to be true iff $c(t_2) - c(t_1) \le K$.
Given a trace $\pi$ and an interval of natural numbers $[i, j]$, $\pi \downarrow [i, j]$ denotes the state interval/subtrace of $\pi$ from state $\pi(i)$ to $\pi(j)$. Similarly, given an execution $e$ of $\pi$, $e[\pi \downarrow [i, j]]$ represents the time interval $[e[\pi](i), e[\pi](j)]$ from the creation of state $\pi(i)$ to the creation of state $\pi(j)$ in execution $e[\pi]$. Thus, state intervals and executions of traces provide time intervals on which to evaluate interval expressions such as $diff\_c \le K$.
State formulae are used to construct state intervals as follows. Given the set of state formulae $F$ defined in Section 3.1, the term formula intervals is applied to expressions such as $[p, q]$ with $p, q \in F$. The satisfaction relation $\models$ on state intervals is extended as follows.

Definition 7
Given a trace $\pi$ and an interval of natural numbers $I = [i, j]$ with $i \leq j$, the state interval $\pi \downarrow I$ satisfies $[p, q]$, written as $\pi \downarrow I \models [p, q]$, iff the following conditions hold:
1. $\pi(i) \models p$
2. $\pi(j) \models q$
3. $\forall i < k < j: \pi(k) \not\models q$
that is, $[i, j]$ is a state interval of $\pi$ such that $\pi(i)$ satisfies $p$, and $\pi(j)$ is the first state after $\pi(i)$ that satisfies $q$.
¶ It is assumed that $e[\pi](i)$ represents the time instant when state $s_i$ is created.
|| It is assumed that if $\pi$ is a trace of length $n$, $e[\pi]$ associates the final ending states of $\pi$ with the time instant when the last non-ending state took place, that is, $\forall k \geq n: e[\pi](k) = e[\pi](n-1)$.

The following assumes that the ending state $o$ satisfies no formula of $F$, that is, $\forall p \in F: o \not\models p$. Now, given a trace $\pi$ and a proposition interval $[p, q]$, $\pi \Downarrow [p, q]$ denotes the finite sequence of state intervals of $\pi$, written as $I_0 \, I_1 \cdots I_{m-1}$, that satisfy $[p, q]$ in the sense described earlier, that is, $\forall\, 0 \leq i < m: \pi \downarrow I_i \models [p, q]$.
The following two definitions show how the sequence of intervals + OEp; q may be constructed.

Definition 8
Given a state formula $p$, a finite trace $\pi$ of length $n$, and $k \geq 0$, $\pi \downarrow_k p$ is the first state of $\pi$ that occurs after (including) $\pi(k)$ and that satisfies $p$, if it exists, or is $\infty$, otherwise. The operator $\pi \downarrow_k p$ is defined inductively on $k$.

Definition 9
Given a finite trace $\pi$ and two state formulae $p, q$, the sequence of state intervals determined by $p, q$, $\pi \Downarrow [p, q]$, is inductively defined from operator $\Downarrow_k$ with $k \geq 0$.
Thus, the two state formulae $p, q \in F$ determine a sequence of state intervals $\pi \Downarrow [p, q] = I_0 \cdots I_{m-1}$ in $\pi$ that satisfy $[p, q]$. This definition can be extended to executions $e$ of $\pi$ as $e[\pi] \Downarrow [p, q] = e[\pi] \downarrow I_0 \cdots e[\pi] \downarrow I_{m-1}$. This approach provides semantics for the three formulae presented in Section 3.1.
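The construction of the interval sequence can be illustrated with a short sketch. It is not the authors' definition: it assumes that a trace is given as a list of its non-ending states, that each state is the set of propositions true in it, that the search for $q$ starts strictly after the state satisfying $p$, and that the search for the next interval resumes after the previous one closes. The names `first_index` and `state_intervals` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

State = Set[str]                    # assumption: a state is the set of true propositions
Formula = Callable[[State], bool]   # a state formula is any predicate over states


def first_index(trace: Sequence[State], k: int, p: Formula) -> float:
    """Index of the first state at or after position k that satisfies p, else infinity."""
    for idx in range(k, len(trace)):
        if p(trace[idx]):
            return idx
    return math.inf


def state_intervals(trace: Sequence[State], p: Formula, q: Formula) -> List[Tuple[int, int]]:
    """Sequence of index intervals [i, j] where p holds at i and j is the first later q-state."""
    intervals: List[Tuple[int, int]] = []
    k = 0
    while True:
        i = first_index(trace, k, p)
        if i == math.inf:
            return intervals            # no further state satisfies p
        j = first_index(trace, i + 1, q)
        if j == math.inf:
            return intervals            # p was found, but the interval never closes
        intervals.append((i, j))
        k = j + 1                       # assumption: resume after the closed interval


trace = [{"p"}, set(), {"q"}, set(), {"p"}, {"q"}, {"p"}]
print(state_intervals(trace, lambda s: "p" in s, lambda s: "q" in s))
```

In this example the last state satisfies $p$ but is never followed by a state satisfying $q$, so it does not contribute an interval.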
The following definition states when an execution $e$ of a trace $\pi$ satisfies an interval expression such as $\mathit{diff\_c} \leq K$. That is, an execution $e$ of trace $\pi$ satisfies formula
(1) $[\![\Phi]\!]_{[p,q]}$ iff the first time interval determined by $\pi \Downarrow [p, q]$ and $e$ satisfies $\Phi$;
(2) $\exists [\![\Phi]\!]_{[p,q]}$ iff a time interval exists in the sequence $e[\pi] \Downarrow [p, q]$ that satisfies $\Phi$;
(3) $\forall [\![\Phi]\!]_{[p,q]}$ iff all the time intervals determined by $\pi \Downarrow [p, q]$ and $e$ satisfy $\Phi$.
For instance, if $\Phi = \mathit{diff\_c} \leq K$, then $[\![\Phi]\!]_{[\mathit{swifi}, \mathit{ewifi}]}$ establishes that the time interval determined by the first state interval on which $[\mathit{swifi}, \mathit{ewifi}]$ holds must satisfy $\Phi$.
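The three kinds of formulae can be read as three ways of quantifying over the same list of time intervals. The sketch below assumes that an execution is given as a list of timestamps, one per state, and that the state intervals have already been computed as index pairs. The interval expression used in the example (a bound on the interval duration) is only a stand-in for expressions such as $\mathit{diff\_c} \leq K$, and treating the universal formula as true when there are no intervals is an assumption of this sketch.

```python
from typing import Callable, List, Sequence, Tuple

TimeInterval = Tuple[float, float]
IntervalExpr = Callable[[TimeInterval], bool]


def time_intervals(timestamps: Sequence[float],
                   state_intervals: Sequence[Tuple[int, int]]) -> List[TimeInterval]:
    """Map each state interval [i, j] to the time interval between the two state creations."""
    return [(timestamps[i], timestamps[j]) for i, j in state_intervals]


def holds_first(phi: IntervalExpr, intervals: Sequence[TimeInterval]) -> bool:
    """Formula (1): the first time interval exists and satisfies phi."""
    return bool(intervals) and phi(intervals[0])


def holds_some(phi: IntervalExpr, intervals: Sequence[TimeInterval]) -> bool:
    """Formula (2): at least one time interval satisfies phi."""
    return any(phi(t) for t in intervals)


def holds_all(phi: IntervalExpr, intervals: Sequence[TimeInterval]) -> bool:
    """Formula (3): every time interval satisfies phi (vacuously true if there is none)."""
    return all(phi(t) for t in intervals)


# Hypothetical execution: timestamps in seconds for a trace with six states.
stamps = [0.0, 1.0, 2.5, 3.0, 4.0, 9.0]
windows = time_intervals(stamps, [(0, 2), (4, 5)])
short_enough = lambda t: t[1] - t[0] <= 3.0   # stand-in interval expression
print(holds_first(short_enough, windows), holds_some(short_enough, windows),
      holds_all(short_enough, windows))
```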

From interval properties to LTL
This section discusses how the interval properties can be practically evaluated on execution traces. Each type of interval property can be described by an LTL formula to be given to the model checker.
To simplify the presentation of the formulae, an auxiliary formula $\Phi(p, q)$ is defined for two state formulae $p$ and $q$. Intuitively, $\Phi(p, q)$ is the LTL representation of the property '$p$ holds on the current state, $q$ will be true in a future state and, at that moment, the time interval determined by $p$ and $q$ will satisfy $\Phi$'.
For $[\![\Phi]\!]_{[p,q]}$ properties, an LTL specification with the following intended meaning is used. First, search for the first state ($s_i$) on which $p$ holds; then, search for the first state following $s_i$ on which $q$ holds ($s_j$). These two states $s_i$ and $s_j$ determine a time interval. If $\Phi$ is true on this interval, then formula $[\![\Phi]\!]_{[p,q]}$ holds. Otherwise, if it is not possible to find either $s_i$ or $s_j$, or if the time interval does not satisfy $\Phi$, then the formula is false. The following sequence shows a trace that satisfies $[\![\Phi]\!]_{[p,q]}$. Note that solid arrows are used to identify intervals and dashed arrows otherwise.
For the property $\exists [\![\Phi]\!]_{[p,q]}$, the LTL specification requires that $\Phi(p, q)$ be true either in the first state or in some future instant. Note that $\bigcirc$ represents the 'next' operator, which is safe to use in this case because the analysis addresses linear execution traces rather than concurrent programs. The use of $\bigcirc$ is needed to ensure that the formula $\Phi(p, q)$ is evaluated on a maximal time interval determined by $p$ and $q$; that is, the state on which $p$ is true must be preceded, if it is not the initial state, by a state that does not satisfy $p$. The following sequence shows an example of a trace for which $\Phi(p, q)$ holds on the second time interval determined by $p$ and $q$.
Finally, the LTL formula for $\forall [\![\Phi]\!]_{[p,q]}$ requires that all maximal intervals determined by $[p, q]$ properties in their ending points satisfy $\Phi$ at the instant when the right ending point occurs. The following sequence shows a trace with two time intervals, determined by $p$ and $q$, for which $\Phi$ is true. Note that the last state is labelled with the symbol $o$ to indicate that the trace does not contain any state interval after $[s_{i_2}, s_{j_2}]$ satisfying $[p, q]$.

TRIANGLE TESTING FRAMEWORK
The goal of the TRIANGLE project†† is to provide app developers and device makers with benchmarking services to ensure that their products are ready to meet the challenges presented by 5G mobile networks. The improvements resulting from the upcoming standard will push existing use cases and enable new ones, for example, in connection with eHealth, smart cities and media streaming, and will increase the expectations of the end users. In this context, the TRIANGLE project focuses less on functional testing, which is covered by many existing tools and services, than on quality of experience (QoE). The benchmarking process is built on a set of key performance indicators (KPIs), which are organized around domains such as power consumption, network resource usage and user experience. Each KPI will provide insight into a different aspect of the application or device that influences user perception. The aggregation of the KPIs from each domain will be compared against a set of reference apps or devices.
†† www.triangle-project.eu.
Figure 5 presents a high-level overview of the TRIANGLE platform. The project involves building a testing framework using state-of-the-art testing equipment, which enables the emulation of many different network scenarios in a realistic manner, using real mobile devices and apps. This equipment is expensive for most developers, and its configuration and use is a difficult, error-prone task that requires a high degree of expertise. In addition, the framework consists of many pieces that must be coordinated to conduct a test. The project aims to hide all this complexity behind a user-friendly web interface allowing users to focus on their products and the results of the benchmarking. Users will see only high-level network scenarios, such as urban pedestrian or high-speed train environments, instead of the actual configuration of the emulation equipment. The user interacts with a web portal, where the app under test must be provided along with additional relevant information, such as the features supported by the app. The framework uses this information to decide which test cases are applicable to the app and to prepare and execute them.
The execution of a test case involves coordinating the configuration of the network equipment, the automation of the user equipment (UE), that is, the mobile device, the measurement equipment and tools and the automation of the app itself. The processed results are shown to the user in the web portal.
This paper extends the original architecture of the TRIANGLE testing framework. First, the Transform block is extended by a formal definition of the interaction model that describes the user interactions with the app. A support tool has been implemented that automatically extracts the model from the compiled applications and allows the refinement of the model. Given the interaction model, model checking (exhaustive exploration) is used to obtain a set of app user flows that can be included among the test cases to measure KPIs. Second, the Analysis and Reporting block is extended by means of a formal language to define EFPs. It is now possible to verify whether the traces produced during test case execution satisfy some of the properties.

App automation
One vital piece of the framework for benchmarking apps is guiding their execution. For instance, to execute a test in which the performance of a video playback will be evaluated, the app must be automated to play a video. The TRIANGLE framework uses Quamotion WebDriver [17], an automation solution for mobile apps based on the WebDriver standard for web browser automation. This tool can send commands to the mobile device that simulate actual user interactions with an app, such as tapping a button or entering text in a field. The sequence of actions used to carry out a test case in the TRIANGLE project is called the app user flow. The framework can take a JSON [18] script with a sequence of app user flows and perform them on the device using the Quamotion WebDriver.
Model-based testing [12] can help to automate the generation of app user flows as JSON scripts. Furthermore, if the model is correctly annotated, only the app user flows that are useful to compute any given KPI are generated (preliminary work [13]). If the requirements to measure a KPI change, then new app user flows can be generated from the same model. Therefore, the web portal can accept a single interaction model instead of a set of automation scripts. Section 5.1 presents an automatic way of creating this interaction model from the app itself so that most of the work is performed for the user. This process is based on the compiled version of the app (apk in Android). The generated model should still be fine-tuned by the app developer, for example, to add meaningful event names, but the bulk of the work will be performed automatically. The following sections present the formal languages for modelling the app and describing the EFPs.

IMPLEMENTATION FOR ANDROID DEVICES
This section describes the implementation of this proposal for analysing mobile applications in the context of the TRIANGLE platform. Although this architecture can be adapted for different mobile operating systems, this paper focuses on its implementation for Android devices.
As discussed earlier in Sections 2 and 3, model checking, in particular the SPIN model checker, is the underlying formal technique (Figure 1), not only to generate app user flows that will be executed on real devices as part of the test cases but also to carry out the effective verification of EFPs, such as energy consumption, on these test cases. Figure 6 shows an overview of the approach to automatically extract the interaction model. It is based on three main elements: the app controller, the exploration algorithm and the model parser. Given the application binaries, the app controller installs and manages the execution of the target app. The app controller can perform user events and capture the hierarchy of visual elements. The exploration algorithm decides the order in which the user events are performed and determines the different interaction model states based on the app UI. Finally, the model parser transforms the states and transitions obtained from the exploration into the interaction model. The following sections explain each element in more detail.

App controller.
The app controller is in charge of interacting with the device where the application is running and is responsible for the following tasks:
1. Install, launch and close the application
2. Obtain the hierarchy of visual elements of the active view
3. Determine the list of visual elements that accept user events
4. Perform user events on specific visual elements (e.g. click, long click or scroll)
5. Fire system events (e.g. open/close keyboard or play/pause media)
Some of these tasks can be carried out with existing UI automation testing tools [19][20][21][22]. An app controller for Android devices has been implemented based on two testing frameworks. First, Quamotion WebDriver [17] is a test automation framework that automates iOS and Android apps on real devices. WebDriver [23] is an open protocol for test automation originally designed for web applications based on the exchange of JSON messages. Second, Android UIAutomator [22] is a UI testing framework included in the Android SDK. All previous tasks can be implemented with the SDK testing framework, but such an implementation is less reusable. For this reason, UIAutomator is used only for tasks that WebDriver cannot carry out or cannot perform efficiently. In particular, UIAutomator's main task is to extract the app's Document Object Model (DOM), that is, the hierarchy of visual elements. For each visible element, the DOM includes a list of attributes: the resource identifier, the class and the user events accepted. This information is especially useful for determining whether the UI has changed after performing a user event and obtaining the list of controls and events that must be explored.
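Task 3 can be illustrated with a minimal sketch that reads a hierarchy dump and lists the elements that accept user events. It assumes that the DOM is an XML document in the style of a UIAutomator dump, with `node` elements carrying boolean attributes such as `clickable`, `long-clickable`, `scrollable` and `enabled`. The sample document, the resource identifiers and the function name are invented for illustration and do not come from the implementation described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened hierarchy dump.
SAMPLE_DUMP = """<hierarchy>
  <node class="android.widget.FrameLayout" resource-id="" clickable="false"
        long-clickable="false" scrollable="false" enabled="true">
    <node class="android.widget.ListView" resource-id="com.example:id/list"
          clickable="false" long-clickable="false" scrollable="true" enabled="true">
      <node class="android.widget.TextView" resource-id="com.example:id/song"
            clickable="true" long-clickable="true" scrollable="false" enabled="true"/>
    </node>
    <node class="android.widget.Button" resource-id="com.example:id/play"
          clickable="true" long-clickable="false" scrollable="false" enabled="false"/>
  </node>
</hierarchy>"""

# Attribute in the dump -> user event it enables.
EVENT_ATTRIBUTES = {"clickable": "click", "long-clickable": "long click", "scrollable": "scroll"}


def interactive_elements(dump_xml: str):
    """Return (resource-id, events) for every enabled node that accepts a user event."""
    root = ET.fromstring(dump_xml)
    found = []
    for node in root.iter("node"):          # document order, all nesting levels
        if node.get("enabled") != "true":
            continue                        # disabled controls cannot be exercised
        events = [event for attr, event in EVENT_ATTRIBUTES.items()
                  if node.get(attr) == "true"]
        if events:
            found.append((node.get("resource-id"), events))
    return found


for resource_id, events in interactive_elements(SAMPLE_DUMP):
    print(resource_id, events)
```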
Mobile applications have complex UIs with multiple activities, fragments, overlays, controls, etc. that accept different types of user events. The exhaustive exploration of all visual elements that react to user events can lead, in the worst case, to large models with poor performance in the generation phase of the app user flow. To manage the size of the application model, the app controller includes the following configurable options:
• Types of events considered in the exploration. For instance, if scrolling a layout produces the same effect as clicking on a tab menu, the scrollable views can be ignored and only click events analysed.
• Number of list items explored. In some apps, the effect of performing an event on any list item is the same. For instance, in a music player with a list of playable songs, clicking on each song will start playback. Thus, exploring only some of the items is sufficient to extract the model.
• Ignored elements. Some events in specific elements could have non-desired effects during the model extraction or the test execution, such as elements that restore the account password or open the configuration settings. For this reason, such elements are systematically excluded from the exploration.
• Predefined text for specific EditText fields. This concern is relevant in apps whose behaviour depends on the information provided in a form, for instance apps that require a login.
Observe that most of the configurable options are connected to obtaining the list of visual elements that will be explored. Thus, the exploration algorithm examines a reduced number of states. In contrast, the configuration of EditText fields increases the number of explored states to include all desirable app behaviours.
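The effect of the first three options on the list of explored elements can be sketched as a filter. This is only an illustration under assumed data shapes: each element is a dictionary with a hypothetical `id`, the `events` it accepts and, for list items, the identifier of the enclosing `list`. None of these names come from the tool itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ExplorationConfig:
    event_types: Set[str] = field(default_factory=lambda: {"click"})  # events to explore
    max_list_items: int = 1                                           # items explored per list
    ignored_ids: Set[str] = field(default_factory=set)                # elements never touched


def select_targets(elements: List[dict], config: ExplorationConfig) -> List[Tuple[str, str]]:
    """Return the (element id, event) pairs that the exploration should try."""
    targets: List[Tuple[str, str]] = []
    explored_per_list: Dict[str, int] = {}
    for element in elements:
        if element["id"] in config.ignored_ids:
            continue                                  # ignored elements
        parent = element.get("list")
        if parent is not None:                        # element is an item of a list
            seen = explored_per_list.get(parent, 0)
            if seen >= config.max_list_items:
                continue                              # enough items of this list explored
            explored_per_list[parent] = seen + 1
        for event in element["events"]:
            if event in config.event_types:           # only the configured event types
                targets.append((element["id"], event))
    return targets


elements = [
    {"id": "song1", "events": ["click"], "list": "songs"},
    {"id": "song2", "events": ["click"], "list": "songs"},
    {"id": "tabs", "events": ["scroll"]},
    {"id": "reset_password", "events": ["click"]},
]
config = ExplorationConfig(ignored_ids={"reset_password"})
print(select_targets(elements, config))
```

With this configuration only the first song is kept: the second list item, the scrollable view and the ignored control are all dropped before the exploration starts.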

Exploration algorithm.
The exploration algorithm defines a strategy to execute user events in order to traverse the different app states. Amalfitano et al. [24] compared testing techniques and tools for mobile apps and determined that the most widely used strategies are depth-first and breadth-first search. In the present work, both strategies were implemented, but only the results of the depth-first search are used in the evaluation section because the number of times that the app has to be initialized (or relaunched) is lower, which in turn decreases the time required to extract the model.
An app state represents the app UI after the performance of a user event on a specific element. Because a state can be reached by performing user events in different UI elements, the exploration algorithm stores a 4-tuple (xPath, event, dom, exp), where xPath [25] identifies the element in the source DOM, event is the event performed on the element, dom is the DOM after the event, in other words, the app state, and exp is a flag that, if true, indicates that the current configuration and its successors have been explored. A configuration has successors if the dom has controls that can handle user events. In the following, the term configurations is used for the 4-tuples handled by the algorithm to distinguish them from the interaction model states.

Listing 1 shows a simplified version of the algorithm in Java, which implements a depth-first search strategy. The main data structures are the set of visited configurations, the path (list) of configurations that leads to the current configuration and the stack of unvisited configurations. This stack stores incomplete configurations in the sense that the dom is empty because the event has not yet been performed. The input of the algorithm is the initial configuration s0, which is stored in unvisited with xPath '/', event 'launch' and an empty dom (line 9). While there are unexplored configurations, the algorithm extracts one, requests that the app controller perform the user event s.event on the visual element s.xpath (line 14) and stores the resulting dom. Then, the algorithm checks whether a configuration in visited has an equivalent dom (line 15). If so, the configuration is considered visited, and the interaction model is updated with a new transition. Otherwise, the configuration s is new and is included in visited. In addition, the interaction model is updated with the new state and the transition. If the configuration has successors, that is, if the dom has controls that can handle user events, they are included in the stack of unvisited configurations (line 32). Finally, the algorithm backtracks if the configuration s is in visited (line 22) or if s has no successors (line 36).
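The overall shape of the depth-first exploration can be sketched against a simulated app. This is not Listing 1: to stay short, the sketch relaunches the app and replays the whole action sequence for every unvisited configuration instead of backtracking along a path, it uses plain equality of doms as the matching criterion, and the class `FakeApp` with its three methods is a hypothetical stand-in for the app controller.

```python
LAUNCH = ("/", "launch")


class FakeApp:
    """Simulated app: maps each dom to the doms reached by its (xPath, event) pairs."""

    def __init__(self, ui):
        self.ui = ui
        self.current = None

    def launch(self):
        self.current = "home"
        return self.current

    def perform(self, xpath, event):
        self.current = self.ui[self.current][(xpath, event)]
        return self.current

    def actions(self, dom):
        return sorted(self.ui[dom])


def explore(app):
    """Depth-first extraction of the states and transitions of the interaction model."""
    states, transitions = [], []
    unvisited = [[LAUNCH]]                 # stack of action sequences still to be tried
    while unvisited:
        actions = unvisited.pop()
        source, dom = None, None
        for xpath, event in actions:       # relaunch and replay up to the new event
            source = dom
            dom = app.launch() if (xpath, event) == LAUNCH else app.perform(xpath, event)
        if source is not None:
            transitions.append((source, actions[-1], dom))
        if dom in states:
            continue                       # equivalent dom already visited: transition only
        states.append(dom)
        for successor in reversed(app.actions(dom)):
            unvisited.append(actions + [successor])
    return states, transitions


ui = {
    "home": {("/list/item", "click"): "playing", ("/menu", "click"): "home"},
    "playing": {("/pause", "click"): "home"},
}
states, transitions = explore(FakeApp(ui))
print(states)
for transition in transitions:
    print(transition)
```

Replaying from launch for every configuration keeps the sketch easy to follow, at the cost of relaunching the app far more often than the path-based backtracking described in the text.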

Matching criterion.
The matching criterion defines when two doms can be considered equivalent, which reduces the number of configurations explored. In Listing 1, the matching criterion is applied to determine if there is a configuration equivalent to s in visited (line 15). Listing 2 shows part of a dom provided by the app controller. Observe that it includes the hierarchy of visual elements (node) and information about them (user events enabled, class or text content).
The matching criterion determines that two doms are equivalent if they are in the same activity, they have the same hierarchy (hierarchy relation and number of nodes), and their nodes are equivalent. Two nodes are equivalent if the following attributes are equal: resource-id, class, package, checkable, clickable, enabled, scrollable and long-clickable. If the nodes are editable, their text attributes must also be equal. A node is editable if its class is android.widget.EditText or inherits from this class. This rule is required to generate models of forms. Figure 2 shows three different states of the Universal Music Player app. Applying the matching criterion, (a) (list of songs) and (b) (list of songs with first song playing) are different states; although they are in the same activity, they have a different number of nodes (observe that (b) includes new playback controls).
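A possible reading of the matching criterion is sketched below. It assumes that each dom is an XML dump whose elements carry the attributes named above and that the activity name is available separately. Because a dump does not reveal class inheritance, the sketch treats only the exact class android.widget.EditText as editable, which is narrower than the rule in the text. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

COMPARED = ("resource-id", "class", "package", "checkable",
            "clickable", "enabled", "scrollable", "long-clickable")


def nodes_match(a: ET.Element, b: ET.Element) -> bool:
    """Compare two nodes and, recursively, their children in order."""
    if any(a.get(attr) != b.get(attr) for attr in COMPARED):
        return False
    # Assumption: only the exact EditText class is recognised as editable.
    if a.get("class") == "android.widget.EditText" and a.get("text") != b.get("text"):
        return False
    children_a, children_b = list(a), list(b)
    if len(children_a) != len(children_b):
        return False                      # different hierarchy
    return all(nodes_match(x, y) for x, y in zip(children_a, children_b))


def doms_match(activity_a: str, dom_a: str, activity_b: str, dom_b: str) -> bool:
    """Two doms are equivalent if the activity, the hierarchy and all nodes match."""
    return activity_a == activity_b and nodes_match(ET.fromstring(dom_a), ET.fromstring(dom_b))


label_a = '<hierarchy><node class="android.widget.TextView" text="Song A"/></hierarchy>'
label_b = '<hierarchy><node class="android.widget.TextView" text="Song B"/></hierarchy>'
field_a = '<hierarchy><node class="android.widget.EditText" text="alice"/></hierarchy>'
field_b = '<hierarchy><node class="android.widget.EditText" text="bob"/></hierarchy>'
print(doms_match("Main", label_a, "Main", label_b))   # text of non-editable nodes is ignored
print(doms_match("Main", field_a, "Main", field_b))   # text of editable nodes is compared
```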

Backtracking.
The exploration algorithm backtracks to a previously visited configuration when the explored configuration has been visited previously or when it has no successors (Listing 1, lines 22 and 36). Listing 3 shows simplified Java code for the backtracking algorithm. The backtracking process consists of finding a configuration in path (from last to first) that has at least one unexplored successor (lines 4-13).
If path is not empty after this process, then the last configuration in path has a successor that has not yet been explored and corresponds to the next configuration that will be extracted from unvisited (Listing 1). In this case, the app controller closes the app and performs the list of user events (on their corresponding elements) included in path. When the app controller ends, the app will be running on the device and ready to accept the user event of the next unexplored configuration. Otherwise, if the path is empty, the app controller closes the app and throws an exception to indicate the end of the exploration.
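The backtracking step can be summarized in a few lines. This is a sketch rather than Listing 3: configurations are represented as dictionaries with hypothetical keys `xpath`, `event` and `pending` (the number of successors not yet explored), and the exception name is invented.

```python
class ExplorationFinished(Exception):
    """Raised when no configuration in the path has unexplored successors."""


def backtrack(path):
    """Trim path to the last configuration with pending successors.

    Returns the (xPath, event) pairs that the app controller has to replay
    after closing and relaunching the app.
    """
    while path and path[-1]["pending"] == 0:
        path.pop()                        # fully explored: drop it and look further back
    if not path:
        raise ExplorationFinished()
    return [(c["xpath"], c["event"]) for c in path]


path = [
    {"xpath": "/", "event": "launch", "pending": 1},
    {"xpath": "/list/item", "event": "click", "pending": 0},
    {"xpath": "/pause", "event": "click", "pending": 0},
]
print(backtrack(path))
```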

Model parser.
The model parser produces the interaction model that will be used to generate the app user flows. The model parser interacts with the exploration algorithm when new states and/or transitions are added (Listing 1, lines 19, 25 and 26). When the exploration algorithm ends, the model parser dumps the interaction model into a file, using an XML-based internal representation of the modelling language presented in Section 2.2. The model parser is based on Velocity‡‡, a Java template engine that generates the XML representation of the interaction model from a template.
It is worth mentioning some limitations of the current implementation. Although the modelling language can capture user and system events and delays between events, the automatically generated models do not include temporal delays in transitions or interactions with other apps. However, app user flow requirements can constrain the time elapsed after firing a particular event, which in practice is equivalent to including the delay in the model. Figure 3 shows the graphical representation of the Universal Music Player model.
‡‡ http://velocity.apache.org/.

Evaluation of the model extraction.
This section presents an evaluation of the model extraction approach using the depth-first search exploration strategy. The evaluation has been performed using a Samsung Galaxy S4 smartphone (non-instrumented) with Android 5.0.1 and a Windows 8 machine with an Intel Core i7 at 3.6 GHz. The approach has been applied to five different apps, all of which were obtained from the Google Play Store or the Android SDK. The configurable parameters are the maximum depth explored (number of consecutive user events), the number of list items explored and the GUI elements excluded from the exploration. In addition, specific text can be associated with text input elements (e.g. a login form). Table I shows the configuration of these parameters for each app.
Universal Music Player (UAMP)§§ is a music player that presents to the user a list of songs that can be streamed. From the point of view of the interaction model, playing any of the songs has the same effect. Thus, the model extraction is set to explore only the first item of each list. Note that UIAutomator has the limitation that it cannot obtain the DOM when the UI is changing dynamically, for instance, if a video is playing. Thus, the application controller (Section 5.1.1) has to pause the media playing before obtaining the DOM.
iDo Calculator¶¶ is a calculator with simple and scientific modes. Text input is performed by clicking independent buttons for each digit and arithmetic operation. The maximum depth has been set to 5 to obtain a model that considers arithmetic operations with one or two digits.
Kolab Notes |||| is an app for taking notes that can be local to a device, or shared by multiple devices using an account. The model considers only the local mode, and the controls to share the notes were excluded from the exploration. In addition, a colour picker component was excluded because it is currently difficult to handle this component in the model. With respect to the text input, sample text was provided for the note title and content.
Topeka *** is an app for performing quizzes. Generated quiz questions are random; some are answered by choosing from four possible answers and others by writing text. To increase the probability of answering both types of questions during the exploration, the maximum depth was set to a high value.
WordPress††† is an app for visualizing and managing WordPress sites. This app has an initial login form, and thus, the form associates specific text with a valid user name and password. The app provides links to recover the user name and password, create a new account or read the Terms of Service, and these controls were excluded from the exploration. In addition, the app suggests a list of sites to visit, which changes dynamically and includes different clickable elements. This situation is similar to the dynamic questions in Topeka. However, to produce a first version of the model, the number of sites explored was limited by the list items explored and the maximum depth.
Table II shows some results. Because the max depth of the algorithm was limited, the resulting models may be a partial or incomplete representation of the app. On the one hand, for the UAMP app, increasing the depth of the algorithm produces the same interaction model, which implies a good representation of the app. On the other hand, the models of Topeka and WordPress, whose behaviour has a random component, are partial. In addition, different models can be extracted in different executions of the model extraction algorithm. However, all these models present common interaction patterns. Thus, it is recommended to manually tune the models to allow the common patterns. Finally, the iDo Calculator model considers a click on each button as a different user event. Although all these events leave the app in the same state, the number of transitions is too high to produce a set of app user flows. In this situation, it is recommended to modify the model to abstract the buttons, for example, by defining two types of buttons, numeric and arithmetic, and thereby reducing the number of transitions.
Table II
iDo Calculator    2   2   9  93  49  23
UAMP              3   4  13  41  14  22
Kolab Notes       2   2  16  69  32  45
Topeka            3   3  16  51  33  38
WordPress        14  16  31  77  54  39

App user flow generation
This section describes how to automatically produce app user flows, that is, sequences of user actions, using a model-based approach. The process starts with a model of the user interaction with the app, provided by the developer or automatically extracted from the compiled app as explained in Section 5.1. In both cases, the model is specified using the language described in Section 2.2. The interaction model is explored exhaustively to generate all possible sequences of user actions. For this step, the power of the SPIN model checker [8] is used. SPIN can be used to analyse the correctness of concurrent software modelled using the PROMELA specification language. The focus of the tool is on the design and validation of computer protocols, although it has also been applied in other areas. SPIN checks the occurrence of a property over all possible executions of a system specification and provides counterexamples when violations are found.
In this case, SPIN performs an exhaustive exploration of the interaction model by translating the XML representation into a PROMELA specification. Each device is represented by a PROMELA process that models all of the state machines contained in that device. While the interaction model is composed of nested state machines, the PROMELA code for a device consists of a single loop, where each branch corresponds to a transition in the model. A global variable per device is used to track its current state and decide which transitions can be performed next.
To implement connection states, each device keeps a backstack, that is, a record of the transitions made in the past that lead to connection states. When a state machine finishes, the top of the backstack is checked. If it contains a connection state, then that information can be used to restore the next state in a previous state machine. If it is empty, then the execution of the model terminates.
This specification is explored depth-first by SPIN and each transition taken is recorded. When a valid end-state is reached, the sequence of transitions during the exploration contains the user actions, that is, an app user flow. If more than one transition can be chosen at one point, SPIN will first explore one of them and later return to explore the rest. In this way, the set of all possible app user flows will be generated.
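The enumeration performed by the model checker can be mimicked on a small scale with an explicit depth-first search over a state machine. The sketch below is a stand-in for the SPIN exploration, not a description of it: the model is a plain dictionary, loops are bounded by limiting how often each state may be visited within one flow, and the state and event names are invented.

```python
from typing import Dict, List, Set, Tuple

Model = Dict[str, List[Tuple[str, str]]]   # state -> [(event, target state)]


def user_flows(model: Model, initial: str, final_states: Set[str],
               max_visits: int = 1) -> List[List[str]]:
    """Enumerate every event sequence that leads from initial to a final state."""
    flows: List[List[str]] = []

    def dfs(state: str, flow: List[str], visits: Dict[str, int]) -> None:
        if state in final_states:
            flows.append(list(flow))              # record the flow, then keep exploring
        for event, target in model.get(state, []):
            if visits.get(target, 0) >= max_visits:
                continue                          # loop bound reached for this state
            visits[target] = visits.get(target, 0) + 1
            flow.append(event)
            dfs(target, flow, visits)
            flow.pop()                            # return to explore the other branches
            visits[target] -= 1

    dfs(initial, [], {initial: 1})
    return flows


model = {
    "S0": [("clickSong", "S1")],
    "S1": [("clickPause", "S2"), ("back", "S0")],
    "S2": [("back", "S0")],
}
for flow in user_flows(model, "S0", {"S2"}, max_visits=2):
    print(flow)
```

Raising `max_visits` lets flows go around loops more often, which is the same trade-off between coverage and the number of generated flows that the loop restrictions in the app user flow requirements control.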
Listing 4 presents a simplified excerpt of the PROMELA specification generated for the Universal Music Player model. App user flow requirements state the conditions that the app user flows must satisfy to evaluate a KPI. In the example, the app user flow requirements are to click on the play button and later on the stop button.
In previous work [13], the authors present an approach to optimize the generation of app user flows. The idea is to verify the interaction model against a set of properties, so that counterexamples represent the app user flows that satisfy the requirements. For this purpose, the requirements are described as Büchi automata using a never claim, which is a special PROMELA process that monitors the execution of the interaction model. Two different ways of defining these never claims, called pruning and non-pruning, were evaluated. Since the pruning never claims obtained a better performance, in this paper, all app user flow requirements are described in this way.
Currently, the TRIANGLE testing framework supports app user flow requirements described in XML notation. Listing 5 shows an example of the XML notation. The requirements are classified as invariants or constraints. Invariants describe conditions over states, events, views (activities) and loops that must always be true. In the example, the invariant restricts the number of loops (line 7) and forces the app user flow to avoid state S1 (line 8) and the activities FullScreenPlayerActivity and PlaceholderActivity. In addition, only two back events (line 9) can occur in the app user flow, and after them, a 10-s delay is included until a new event is fired. Constraints are restrictions over states, events or activities that must eventually hold in a given order. In Listing 5, the app user flow must pass at some point through state S6, then through state S7 and finally through state S8. Constraints can be simple or complex. In the first case, the constraint affects only a state, event or activity. In the second case, the constraint is defined as a set of restrictions that must hold simultaneously.
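The difference between invariants and constraints can be made concrete by checking a finished flow against both. The sketch below covers only three kinds of requirement (forbidden states, a maximum number of occurrences per event, and states that must be visited in a given order). The function and parameter names are hypothetical and unrelated to the XML notation, and the actual framework checks requirements during model checking rather than on finished flows.

```python
from typing import Dict, List, Sequence, Set


def satisfies(flow_states: Sequence[str], flow_events: Sequence[str],
              forbidden_states: Set[str], max_events: Dict[str, int],
              ordered_states: List[str]) -> bool:
    """Check invariants (must always hold) and constraints (must eventually hold, in order)."""
    # Invariant: the flow never enters a forbidden state.
    if any(state in forbidden_states for state in flow_states):
        return False
    # Invariant: no event occurs more often than its limit.
    if any(list(flow_events).count(event) > limit for event, limit in max_events.items()):
        return False
    # Constraints: ordered_states must appear as a subsequence of the visited states.
    remaining = iter(flow_states)
    return all(any(state == target for state in remaining) for target in ordered_states)


states = ["S0", "S6", "S2", "S7", "S8"]
events = ["click", "back", "click", "back"]
print(satisfies(states, events, {"S1"}, {"back": 2}, ["S6", "S7", "S8"]))
```

The shared iterator in the last line is what enforces the order: each target is searched for only in the part of the flow that follows the previous match.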
The interaction model and the app user flow requirements are given as XML files and have to be compliant with their respective XML schemas. If they are not compliant, they cannot be translated to PROMELA and the automatic generation of app user flows cannot continue. Listing 6 shows the pruning never claim equivalent to the requirements of Listing 5. The body of the never claim (lines 20-33) is a sequence of do blocks, each one with two guarded branches. The first branch is guarded by the invariant and the constraint that must hold at some point, and if it is satisfied, the execution jumps to the next block. The second branch is guarded by the invariant and the pruning condition, which is essentially the negation of the constraint. If the guard is satisfied, the never claim remains in this do block. The final do block is slightly different: the first branch executes an assert(false) instead of a goto statement. Listing 6 also presents a simplified definition of the invariants and the constraints (lines 4-18). Note that these definitions include references to variables and constants from the PROMELA model (Figure 5) and calls to auxiliary functions written in C. For example, numEvents returns how many times a specific event has been fired in the current app user flow.
The verification is carried out by the SPIN model checker. If the requirement contains references to non-existing elements in the interaction model, such as misspelled event or state names, SPIN reports the syntactical errors. The SPIN model checker is configured to detect assertion violations and does not stop when an error is found. In this way, it produces a counterexample each time the assert statement is executed, and each counterexample is an app user flow that satisfies the requirements.

Execution of app user flows and test cases
This section describes how the app user flows are automatically executed. To execute the app user flows, Quamotion WebDriver was used again [17]. Because it implements the standard WebDriver protocol, a number of clients can interface with it. In TRIANGLE, the Keysight Testing Automation Platform (TAP) [26] automates the whole process. TAP is a flexible platform that orchestrates the interaction of the testing framework with a number of instruments that interact around the device under test (DUT). A new TAP plugin has been developed to interact with the Quamotion WebDriver, which executes an app user flow written as a JSON script. The TAP plugin translates each of the user actions into calls to the WebDriver API. Before executing an action, the plugin checks and waits for the action to be executable, for example, that the referenced xPath is indeed available in the current contents of the screen. The TRIANGLE testing framework monitors the execution of the app user flows, measures different parameters (average current, transmitted and received data, user events, etc.) and stores this information in a database.
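The check-and-wait step before each action can be sketched as a polling loop with a timeout. The sketch does not use the WebDriver or TAP APIs: `is_executable` and `perform` are hypothetical callbacks standing in for the calls the plugin would make, and the default timeout and polling interval are arbitrary.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def run_flow(actions, is_executable, perform, **wait_options):
    """Execute the user actions in order, waiting for each one to become executable."""
    for action in actions:
        if not wait_until(lambda: is_executable(action), **wait_options):
            raise TimeoutError(f"action not executable: {action}")
        perform(action)


# Simulated device: the referenced element appears only on the third check.
checks = {"count": 0}

def element_present(action):
    checks["count"] += 1
    return checks["count"] >= 3

executed = []
run_flow(["tap play"], element_present, executed.append, sleep=lambda _: None)
print(executed, checks["count"])
```

Passing `clock` and `sleep` as parameters keeps the loop testable without real delays.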

Verification of extra-functional properties
The final step of this approach is the analysis of the test case execution. This step involves the analysis of execution traces to verify the given EFPs.
The TRIANGLE testing framework provides information about the device on which the app runs (such as the average power, transmitted/received data and transmission/reception rate) and the network conditions (cells, eNodeBs, etc.). In addition, the developer can instrument the app under test so that the traces record when certain events or pieces of code are executed, which is needed during the runtime verification of EFPs. We call these new traces enriched traces.
Each EFP is transformed into a monitor whose implementation is described in the succeeding text. The EFP monitor analyses the trace on the fly while the test case is running, and it returns a verdict as soon as one is available. Currently, the TRIANGLE testing framework provides the enriched trace after the test case ends and, as a consequence, the monitor analyses it offline.
The EFP monitor carries out two interrelated tasks. On the one hand, it has to read the states of the enriched trace and, on the other, it has to simultaneously evaluate these states against the EFP. These two tasks are implemented using a PROMELA specification with embedded C code such as the one shown in Listing 7. Thus, the PROMELA monitor contains (i) code to translate on the fly the enriched trace into a sequence of states [27] that constitutes the model analysed by SPIN (for example, the code of Listing 7) and (ii) code that implements the evaluation of the EFP, which is implemented as a never claim PROMELA process from the LTL formula, as described in Section 3.3. The never claim process corresponds to the Büchi automaton of the LTL formula in SPIN notation. SPIN automatically transforms (the negation of) the LTL specifications into never claim processes and executes them with the system model in a synchronized manner. This means that SPIN always fires a transition in the model followed by a transition in the never claim, so that the never claim naturally acts as an observer of the system execution, searching for erroneous behaviours.
In the following, Listing 7, which shows a simplified fragment of the PROMELA specification for the next formula, is used to detail how the SPIN model checker evaluates the formula on an enriched trace. variable c_t1 is needed for the latter. These variables are updated automatically by the so-called update functions, that is, C functions that compute the values of variables that are obtained from the enriched trace.
Line 13 shows the update function for the energy_t1 variable (line 4). This function updates the values of energy_t1 only at the start of a new interval. Another variable _interval, which is automatically updated by another function (line 7), is introduced to detect the intervals. This function encodes the start and end conditions of an interval formula. Both update functions are executed inside the loop (line 29) after the variables for the next state have been retrieved, in the same atomic step.
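The interplay of the two update functions can be illustrated with a small Python sketch. It is not the embedded C code of Listing 7: the record fields (`event`, `energy`), the start and end events and the threshold check at the end of each interval are assumptions made for the example. Only the variable names `interval` and `energy_t1` mirror the _interval and energy_t1 variables described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]   # hypothetical enriched-trace state


@dataclass
class MonitorState:
    interval: bool = False    # analogue of _interval: inside an interval or not
    energy_t1: float = 0.0    # analogue of energy_t1: energy at the interval start


def update_interval(in_interval: bool, record: Record, start_event: str,
                    end_event: str) -> bool:
    """Encode the start and end conditions of the interval."""
    if not in_interval and record["event"] == start_event:
        return True
    if in_interval and record["event"] == end_event:
        return False
    return in_interval


def update_energy_t1(state: MonitorState, was_in_interval: bool, record: Record) -> None:
    """Refresh energy_t1 only when a new interval has just started."""
    if state.interval and not was_in_interval:
        state.energy_t1 = float(record["energy"])


def check_trace(trace: List[Record], start_event: str, end_event: str,
                threshold: float) -> bool:
    """Return False as soon as one interval consumes more than the threshold."""
    state = MonitorState()
    for record in trace:
        # Both updates run in the same step, after the next state is read.
        was_in_interval = state.interval
        state.interval = update_interval(was_in_interval, record, start_event, end_event)
        update_energy_t1(state, was_in_interval, record)
        if was_in_interval and not state.interval:
            if float(record["energy"]) - state.energy_t1 > threshold:
                return False
    return True


trace: List[Record] = [
    {"event": "idle", "energy": 0.0},
    {"event": "play", "energy": 1.0},
    {"event": "tick", "energy": 2.5},
    {"event": "pause", "energy": 4.0},
]
```

For this trace, the only interval runs from `play` to `pause` and consumes 3.0 units, so the verdict depends on whether the chosen threshold is above or below that value.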

EVALUATION
In previous work [12], the authors applied the proposed model-based methodology to analyse the network traffic produced by the Spotify app for Android devices. The implementation of this approach ‡ ‡ ‡ was not integrated into the TRIANGLE testing framework, and some of its parts, such as automatic model extraction or the optimization of app user flows, had not yet been designed or implemented. This section presents the evaluation of the approach using a real app as a case study.

Description of the case study
The Universal Music Player is the application under test that will be used as a case study. It is a music streaming app included in Android Studio that has a fixed playlist. The main GUI components are presented in Section 2.1 and Figure 2. When the user selects a song to play, the app sends a request to the music servers and starts the download of the song and the playback. Clearly, the network (network load, coverage, user mobility, etc.) can influence the quality of service, which is reflected, for example, in the traffic transferred between the music servers and the app. Thus, the objective of the case study is to evaluate whether the UAMP satisfies the following EFP: 'during the playback of the first song of the playlist, the data received by the app are under a threshold'. If the EFP is satisfied, then there is a low number of packet retransmissions during the playback.

Experiment set-up
This section summarizes the configuration used in the evaluation of each of the three phases of the approach. The first phase consists of extracting the interaction model from the app binaries, which is independent of the TRIANGLE testbed and can be carried out manually or automatically. The automatic model extraction has been performed using a Samsung Galaxy S4 (non-instrumented) with Android 5.0.1 and a Windows 8 machine with an Intel Core i7 at 3.600 GHz. In addition, the exploration algorithm was configured with a maximum depth of 10 click events and, in the case of lists, only the first item is explored.
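The effect of the two exploration settings (maximum depth and first list item only) can be sketched in Python. The traversal order used by the actual extraction tool is not stated here, so the breadth-first strategy below is an assumption, as are the toy `GUI` description and the function name `extract_model`.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str]                 # (event, target screen)
Widget = Tuple[str, List[Entry]]        # ("button" | "list", entries)

# Hypothetical GUI description: a "list" widget holds one entry per list item.
GUI: Dict[str, List[Widget]] = {
    "Main": [
        ("list", [("selectSong1", "Player"), ("selectSong2", "Player")]),
        ("button", [("openMenu", "Menu")]),
    ],
    "Player": [
        ("button", [("play/pause", "Player")]),
        ("button", [("back", "Main")]),
    ],
    "Menu": [("button", [("back", "Main")])],
}


def extract_model(gui: Dict[str, List[Widget]], start: str,
                  max_depth: int) -> Set[Tuple[str, str, str]]:
    """Collect (screen, event, target) transitions within the click budget."""
    transitions: Set[Tuple[str, str, str]] = set()
    depth_of = {start: 0}               # fewest clicks needed to reach each screen
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        if depth_of[screen] == max_depth:
            continue                    # click budget exhausted on this path
        for kind, entries in gui[screen]:
            # For lists, only the first item is explored.
            candidates = entries[:1] if kind == "list" else entries
            for event, target in candidates:
                transitions.add((screen, event, target))
                if target not in depth_of:
                    depth_of[target] = depth_of[screen] + 1
                    queue.append(target)
    return transitions
```

Restricting lists to their first item keeps the extracted model small when all items lead to equivalent screens, at the cost of missing behaviour that is specific to later items.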
The second phase is the automatic generation of app user flows that activate the features necessary to analyse the EFP. The specification of the app user flow requirement has to be consistent with the interaction model obtained in the first phase. To facilitate this task, the (automatically extracted) interaction model was modified to include more intuitive and unified event names, for example, for clicking the play/pause button or the back button, which occur in multiple transitions. Figure 3 shows the UAMP interaction model with the modified event names.
The app user flow requirement must at least start playing a song during a time interval and then pause, which can be expressed in different ways. Three different app user flow requirements have been defined to ensure the playback of a song during the test case execution, together with extra requirements that reduce the number of app user flows generated. In addition, the maximum app user flow length is set to 10, which means that the app user flows will have, at most, 10 events.
Req. 1: The app user flow must eventually reach state S6 with the event selectSong and then eventually reach S6 with the event play/pause. In addition, each previous event can happen only once.
Req. 2: The app user flow must eventually reach state S6 with the event selectSong and then eventually reach S6 with the event play/pause. In addition, loops are not allowed, and FullScreenPlayerActivity and PlaceHolderActivity are never entered.
Req. 3: The app user flow must reach state S6 with the event selectSong, then eventually fire the event skipNext and finally the event play/pause. In addition, loops are not allowed, and states S1 and S9 are not visited.
‡ ‡ ‡ http://www.morse.uma.es/tools/mve.
The last phase is the execution of the test case and the analysis of the EFP in different network scenarios. The app user flow (obtained in the previous phase) satisfies Requirement 3: it starts playing a song, then skips to the next song and finally pauses the playback. Figure 7 shows the TRIANGLE platform, which includes a mobile network testbed that emulates complex radio and network conditions. The EFP included in the case study is especially relevant in network scenarios with a high density of users and a heavily loaded network. For this reason, the following network scenarios have been selected for the case study:
1. The urban pedestrian scenario emulates a user walking down an urban street at 1 to 3 km/h. This scenario represents normal conditions.
2. The suburban festival scenario reproduces an outdoor event with many attendees, in which some small cells are deployed to increase capacity.
3. The Internet café busy hours scenario emulates an Internet café during busy hours in a dense urban area, with a high density of users in a reduced area.
The TRIANGLE testing framework executes automatically generated app user flows to start and pause the playback in three different network scenarios, and the resulting execution traces are used to verify the EFP. Due to time and space limitations, each test case is executed twice.
The last two phases, the automatic generation of app user flows followed by the test case execution and the analysis of EFPs, are carried out in the TRIANGLE testbed, which emulates realistic network scenarios that are wholly reproducible. In addition, the TRIANGLE testbed supports the traceability and repeatability of the test. In this case, a rooted Samsung Galaxy S7 (instrumented) with Android 6.0.1 has been connected to the testbed to execute the tests.
some fine tuning of the model is required. However, the authors have observed that it is possible to infer some common interaction patterns from the different models that can help the app developer to manually tune or extract the model.
The second threat is the test coverage and the accuracy of the statistical results. Currently, the TRIANGLE testbed can automatically run longer test campaigns with more network scenarios and execute different app user flows selected from the pool of app user flows obtained in the second phase. However, due to space and time limitations, the evaluation only considers the execution of one app user flow twice per network scenario. More test cases must be executed to improve the coverage of the results. It is worth mentioning that further results of ongoing work to apply the proposed technique to commercial apps will be reported on the TRIANGLE website.

COMPARISON WITH RELATED WORK
There are some existing proposals that apply model-based testing to Android applications. Some of them assume that the testing process starts without a precise model of the expected behaviour of the applications and focus on techniques to obtain such a model. The MobiGUITAR framework [28] automatically constructs a state machine of one application by executing events in the running application and recording a tree with fireable events for each new state. The authors use a 'breadth-first' traversal of the app GUI for open source applications. They do not consider any knowledge about how to use the application but instead conduct an exhaustive execution. Therefore, they need criteria to make some states equivalent to prevent state explosion.
The Swift-Hand technique proposed by Choi et al. [29] employs machine learning to construct an approximate model of the application during the testing process. Their aim is to cover as much behaviour as possible, forcing the execution to enter unexplored parts of the state space. Other approaches also use a formal specification of the application to start the test generation. Jing et al. [30] describe how to follow a property-driven method to build models in Alloy, a formal language based on first-order logic. In their proposal, the role that the model checker plays in this approach is performed by the Alloy analyser, which generates positive (expected) and negative (undesired) test cases.
In contrast to MobiGUITAR, this approach separates test generation from testing, and the states in the high-level state machines are limited and differentiated by design. Thus, these models are more compact. For example, compared with MobiGUITAR, it is unnecessary to conduct extra work to remove unrealistic test cases. In addition, this approach allows the generation of test cases for several applications that interact using Android intents, while the complexity of the runtime-based modelling process for MobiGUITAR and Swift-Hand makes them more suitable for single applications. Similar to this approach, Jing et al. [30] use XML-based transformations to translate the test cases to an executable form to activate the applications being tested. Apart from the inner technologies (model checking vs constraint solving), the main difference between the proposals is how the refined executable model is obtained. The Alloy specification presented by Jing et al. [30] is constructed manually, while the PROMELA specification in this work is generated automatically from the high-level design of the user-view state machines. Both should be applied to the same case study to obtain a quantitative comparison of the human and computational effort required in these two approaches.
Many commercial and academic tools have been developed to monitor and analyse certain types of EFP in mobile phones. One group of tools focuses on traffic analysis. AntMonitor [3] is a powerful tool that monitors all connected apps in the Android device to produce statistics, such as how each app contributes to the total network traffic, but does not monitor specific events or user interactions systematically. NetworkProfiler [4] is a tool designed to help cellular operators identify the traffic in their networks when transported over HTTP/HTTPS. NetworkProfiler uses device emulators and machine learning techniques to create an app fingerprint by collecting information on the hosts to which the app connects. Because the manual exploration of apps being tested will not cover all behaviours, NetworkProfiler randomly generates user actions to interact with the apps. This random strategy is also supported by other tools such as Monkey [19]. Other tools, such as Monkeyrunner [20], Robotium [21] and Troyd [31], employ scripts to make the apps work with a predefined sequence of user interactions. However, they do not provide an automated methodology to generate such scripts. ProfileDroid [5] is designed to systematically profile apps to discover inconsistencies or surprising behaviours. It is based on a multilayer analysis of the apps, considering the static analysis of byte code, user interactions, calls to the operating system and network traffic. This tool requires a real user to interact with the phone, but the sequence of interactions can be recorded and replayed later in a different scenario. The approach of Automatic Android App Explorer (A3E) [1] involves the novel use of static dataflow analysis on the app bytecode to construct a high-level control flow graph that captures legal transitions among activities (i.e. app screens). Depth-first exploration is then used to reach 64 per cent activity coverage and 36 per cent method coverage in some typical Android apps. 
This flow graph plays the same role as the models in this study that represent user interactions, and the depth-first search is similar to the approach used in this study for exhaustive test generation. ProfileDroid and A3E are the closest proposals to this work and could in fact be adopted in this methodology.
Compared with ProfileDroid and A3E, this methodology has the following main novel points: (i) the controlled coverage of user interactions for one or several apps due to the model-based approach for test generation; (ii) the ability to automatically find execution traces that violate the expected behaviour of the app in terms of their effect over the Internet; (iii) the inclusion of time in the models, making it possible to test realistic situations in which the time between user interactions is relevant; (iv) the ability to combine models of several apps running in parallel; and (v) a technique to automatically compare the actual behaviour of each execution of the apps with the expected behaviour. ProfileDroid could be used to generate the reference patterns employed to analyse the actual behaviour of the apps. A3E could be used to facilitate the construction of the interaction models.
There are many references to existing research on one of the EFPs highlighted in this paper, namely estimating power consumption. Pathak et al. [32] provided one of the first classifications of energy bugs for hardware and software and proposed a roadmap towards developing a systematic diagnostic framework for treating these energy bugs. Later, Pathak and other authors presented the eprof tool [33], a fine-grained energy profiler used to gain insight into the energy usage of smartphones. Simultaneously, Yepang Liu [2] studied energy bugs (e.g. their types and manifestation) and identified common patterns. They implemented a static code analyser, PerfChecker, to detect and identify bug patterns. The E-loupe project [34] explores an alternative that mitigates the ill effects of an energy-hungry application. The framework consists of monitoring data in the mobile phone, which are then processed in the cloud to detect the risk of energy drain and to produce information to isolate the dangerous applications. Memory leaks can also be a cause of energy consumption. Xia et al. [35] design a lightweight memory leak detector that focuses on activity leaks and a priority adjustment module to prioritize the killing of leaking apps. In a different approach, a framework is built by Zhang et al. [36] to detect energy leaks using dynamic taint analysis (a form of information flow analysis). Finally, the energy consumed in different mobile platforms is studied in different works [37][38][39]. The first compares energy bugs, the second compares energy efficiency and the last develops a power estimation method based on battery traces.
The model-based framework in this study also requires energy consumption monitoring, similar to most of the aforementioned proposals. However, this output is not used directly, but instead as an input for a more sophisticated analysis. Interval logic is used to represent the energy properties to drive the identification of bad behaviours by the applications running on the smartphone. As a result, this verification technique is able to detect leaking apps in a very precise way: it can provide the exact execution sequence of one or several apps that are causing the system to lose more energy than expected.
Thompson et al. present a methodology and tool [40], called SPOT, to model the architecture of an application and emulate its energy consumption during the design phase. The application is modelled using configurable abstractions of elements that typically contribute to energy consumption, such as GPS and network connections, attached to activities or background services. The SPOT tool running on the Android device uses this model to emulate each of the components. The energy consumption of an emulation is logged using an Android API and provided as feedback to the programmer.