Group randomized trials (GRTs) randomize groups, or clusters, of people to intervention or control arms. To test for intervention effectiveness when subject-level outcomes are binary, we develop a pseudo-Wald statistic that improves inference when fitting a marginal model that adjusts for cluster-level covariates and uses a logistic link. Alternative Wald statistics could employ bias-corrected empirical sandwich standard error estimates, which have received limited attention in the GRT literature despite their broad utility and applicability in our settings of interest. The test could also be carried out using popular approaches based upon cluster-level summary outcomes. A simulation study covering a variety of realistic GRT settings compares the accuracy of these methods in terms of producing nominal test sizes. Tests based upon the pseudo-Wald statistic and upon a cluster-level summary approach utilizing the natural log of observed cluster-level odds worked best. Due to weighting, some popular cluster-level summary approaches were found to lead to invalid inference in many settings. Finally, although bias-corrected empirical sandwich standard error estimates did not consistently produce nominal sizes, they worked well overall, thus supporting the applicability of marginal models in GRT settings.