Big data and cloud computing are two disruptive trends that offer numerous opportunities to the information technology industry and research communities while also posing significant challenges to them. Cloud computing provides powerful and economical infrastructure resources with which cloud users can handle the ever-increasing data sets in big data applications. However, processing or sharing privacy-sensitive data sets on the cloud can raise severe privacy concerns because of multi-tenancy. Data encryption and anonymization are two widely adopted ways to combat privacy breaches. However, encryption is not suitable for data that are processed and shared frequently, and anonymizing big data and managing numerous anonymized data sets remain challenging for traditional anonymization approaches. We therefore propose a scalable and cost-effective framework for privacy preservation over big data on the cloud. The key idea of the framework is to leverage cloud-based MapReduce to anonymize data and manage the anonymized data sets before the data are released to others. The framework provides a holistic conceptual foundation for privacy preservation over big data. Further, a corresponding proof-of-concept prototype system is implemented. Empirical evaluations demonstrate that the proposed framework can anonymize large-scale data sets and manage anonymized data sets in a highly flexible, scalable, efficient, and cost-effective fashion. Copyright © 2013 John Wiley & Sons, Ltd.