Analyses of large projects involving human leukocyte antigen (HLA) data often face the difficulty that data sets have been gathered with different techniques and at different resolution levels. Furthermore, missing and ambiguous data frequently arise at one or more loci. This article describes a set of computer programs for working efficiently with such data. The tasks covered include format conversion, data recoding and replacement, and Expectation–Maximization (gene-counting) frequency estimation for data sets containing ambiguous cases, either under the assumption of Hardy–Weinberg equilibrium or allowing for a departure from it, measured by a one-degree-of-freedom inbreeding coefficient. These utilities are built on top of a formally defined data format. The formal definition makes it possible to express every kind of observable ambiguity, to define a simpler form for writing and manipulating data set files, and to substantially change the actual symbols chosen to describe the data without altering the programs.
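The gene-counting EM estimation mentioned above can be illustrated with a minimal single-locus sketch in Python. It is not the article's programs: the function names are hypothetical, each individual is assumed to be given as a list of candidate unordered genotypes (an ambiguous or untyped individual simply lists several, or all, candidates), and the departure from Hardy–Weinberg equilibrium is assumed to follow the common one-parameter inbreeding model P(aa) = F·p_a + (1 − F)·p_a², P(ab) = 2(1 − F)·p_a·p_b. With F = 0 the procedure reduces to plain Hardy–Weinberg gene counting.

```python
from collections import defaultdict
from itertools import chain


def genotype_weights(genotype, p, f):
    """Joint weights (IBD, not IBD) of one unordered genotype.

    Assumed one-parameter inbreeding model (hypothetical, for illustration):
      P(aa) = f * p[a] + (1 - f) * p[a]**2
      P(ab) = (1 - f) * 2 * p[a] * p[b]   for a != b
    """
    a, b = genotype
    if a == b:
        return f * p[a], (1.0 - f) * p[a] * p[a]
    return 0.0, (1.0 - f) * 2.0 * p[a] * p[b]


def em_allele_freqs(samples, f=0.0, estimate_f=False, max_iter=1000, tol=1e-10):
    """Gene-counting EM for one locus (hypothetical helper, not the article's code).

    samples: one list per individual of candidate (allele, allele) genotypes.
    f:       inbreeding coefficient in [0, 1); used as the start value if estimate_f.
    Returns (dict of allele frequencies, f).
    """
    alleles = sorted(set(chain.from_iterable(chain.from_iterable(samples))))
    p = {a: 1.0 / len(alleles) for a in alleles}
    if estimate_f and f == 0.0:
        f = 0.1  # EM can never move away from F = 0, so start from a positive value
    for _ in range(max_iter):
        counts = defaultdict(float)  # expected number of independent allele copies
        ibd = 0.0                    # expected number of IBD (autozygous) individuals
        for candidates in samples:
            # E-step: posterior weight of each (genotype, IBD status) pair
            ws = [(g, *genotype_weights(g, p, f)) for g in candidates]
            z = sum(w_ibd + w_non for _, w_ibd, w_non in ws)
            for (a, b), w_ibd, w_non in ws:
                w_ibd, w_non = w_ibd / z, w_non / z
                ibd += w_ibd
                if a == b:
                    # an IBD homozygote carries only one independent copy of a
                    counts[a] += w_ibd + 2.0 * w_non
                else:
                    counts[a] += w_non
                    counts[b] += w_non
        # M-step: frequencies from expected allele counts, F from expected IBD share
        total = sum(counts.values())
        new_p = {a: counts[a] / total for a in alleles}
        new_f = ibd / len(samples) if estimate_f else f
        delta = max(abs(new_p[a] - p[a]) for a in alleles) + abs(new_f - f)
        p, f = new_p, new_f
        if delta < tol:
            break
    return p, f
```

As a usage example, a sample that is 40% AA, 40% BB and 20% AB, run with `estimate_f=True`, converges to p_A = p_B = 0.5 and F = 0.6, which agrees with 1 − H_obs/H_exp = 1 − 0.2/0.5 for this two-allele case. Augmenting the data with the latent IBD indicator is one design choice that keeps both M-step updates in closed form; the article's own programs may parameterize the deviation differently.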