Modeling performances of concurrent big data applications



Big Data applications are characterized by a non-negligible number of complex parallel transactions on a huge volume of data that varies continuously and generally grows over time. Because of the resources they require, the ideal runtime environment for these applications is a complex cloud computing and storage infrastructure that provides a scalable degree of parallelism together with isolation between applications and resource abstraction. However, this additional degree of abstraction also introduces significant complexity in performance modeling and decision making. The potential concurrency of many applications on the same cloud infrastructure has to be evaluated and, at the same time, the scalability of applications over time has to be studied through proper modeling practices, in order to predict system behavior as usage patterns evolve and the load increases. For this purpose, in this paper we propose an analytic modeling technique based on Markovian Agents and Mean Field Analysis that allows the effective description of different concurrent Big Data applications on the same multi-site cloud infrastructure, accounting for their mutual interactions. The technique supports the careful evaluation of several elements in terms of real costs, risks, and benefits, in order to correctly dimension and allocate resources and to verify existing service level agreements. Copyright © 2014 John Wiley & Sons, Ltd.