In the context of immersive communications, which aim to enable natural experiences and interactions among people, objects, and the environment, we propose a method for natural video interaction between users and a video meeting system based on hand gesture recognition. We performed an end-to-end study, starting with the development of specific gesture recognition algorithms and concluding with a user evaluation to validate our results. Gestures and their associated functionalities were identified through a user survey that focused on distinguishing two concepts that are often confused: hand posture (static) and hand gesture (dynamic). Our recognition process comprised two main tasks: hand posture recognition (skin segmentation, background subtraction, region combination, feature extraction, and classification) and hand gesture recognition (tracking and recognition). For dynamic gesture recognition, our approach combined a signal similarity study with a data-mining tool. We focused on experimentation and user evaluation to improve our approach, taking user feedback into account and analyzing performance in different environments and for different users. © 2013 Alcatel-Lucent.