Design and implementation of task scheduling strategies for massive remote sensing data processing across multiple data centers



Data-intensive remote sensing applications are becoming increasingly widespread as computer and network technologies evolve. In particular, bag-of-tasks (BoT) applications with large numbers of shared input files and directed acyclic graph (DAG) applications with data dependencies pose new challenges in widely distributed computing environments. In this article, we introduce a strategy called partitioning group based on hypergraph (PGH), which uses a hypergraph to model file sharing among tasks. The PGH algorithm partitions a BoT application into several groups so as to minimize data transfer time. We also adopt a second scheduling policy, the optimized task tree (OTT) strategy, to handle DAG workflows of massive remote sensing data processing with data dependencies; under this policy, the scheduling queue of DAG tasks is updated as task priorities change. Using the GridSim simulation environment, we designed Gridlets within the scheduler to test the performance of PGH and OTT. Copyright © 2013 John Wiley & Sons, Ltd.