In distributed computing, the overall task of data processing is split into a number of smaller subtasks. Ideally, most or even all of these subtasks can be run in parallel, allowing multiple computers (so-called worker nodes) to work together and achieve results much faster than an individual computer could. Such parallel processing, however, requires sophisticated data storage approaches that maximize the simultaneous availability of the data to the different worker nodes, coupled with careful coordination of the nodes' computational effort. This coordination is typically handled by a dedicated controller node that manages incoming requests for processing power and storage allocation, and distributes these across the network. Over the years, frameworks for such distributed computing platforms have taken on various forms, their evolution essentially keeping in lockstep with the increasingly connected nature of computers. The advent of early, local networks fostered the development of cluster computing approaches, which subsequently matured into more spread-out and generic grid computing architectures once these networks grew large and fast enough to connect entire buildings. With the emergence of the internet and the adoption of high-speed links between data centres around the world, these grids have now opened up into cloud-based systems in which both storage and computing are transparently handled by on-demand worker nodes that may be located anywhere in globally spread-out networks.
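The controller/worker pattern described above can be illustrated with a minimal Python sketch: a controller splits the data into chunks, a pool of workers processes the chunks in parallel, and the controller combines the partial results. The chunk size, worker count, and the summing subtask are illustrative assumptions only; a real distributed framework would additionally handle data placement, scheduling, and fault tolerance across machines.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Subtask executed by one worker: here, simply sum a slice of the data.
    return sum(chunk)

def distribute(data, n_workers=4, chunk_size=10):
    # Controller: split the overall task into smaller subtasks (chunks),
    # hand them out to the worker pool in parallel, then combine the
    # partial results into the final answer.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)

print(distribute(list(range(100))))  # 4950
```

Here the worker pool runs on one machine; in a cluster, grid, or cloud setting the same split–process–combine structure applies, with the workers living on separate nodes.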
For further details, see the article in this issue by Kenneth Verheggen, Harald Barsnes and Lennart Martens, Proteomics 2014, 14, 367–377 (DOI: 10.1002/pmic.201300288).
Illustration created by Kenneth Verheggen, Harald Barsnes and Lennart Martens; cover design by SCHULZ Grafik-Design.