With the move toward patient electronic medical records (EMRs), accessing information for insurance coding and research depends on standardized taxonomies to organize and index the content. Controlled vocabularies are necessary to interpret content consistently. Established quasi-taxonomies provide codes for medical conditions and treatments, but applying these codes as metadata to index the records is laborious: the natural language in the EMR must be translated to a code's verbal equivalent and then to the code itself. Indexing systems that use Bayesian engines or a rule-based approach can streamline the categorization process, improving efficiency and accuracy. Analyzing discrepancies between human indexing and the software's results shows where editorial intervention is needed for continual improvement, with a goal of 85% or higher accuracy. Using a categorization system with a hierarchical taxonomy enables either deep, precise indexing or quick, automatic filtering to more general concepts. The accuracy of medical indexing systems varies widely, depending on the degree of automation and the capacity for semantic analysis.
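As a rough illustration of the rule-based approach, hierarchical roll-up, and human-versus-software comparison described above, the following Python sketch may help. Everything in it is an assumption made for illustration: the codes (`D1`, `D1.1`, and so on), the rules, the sample records, and the function names `index_record`, `roll_up`, and `agreement` are hypothetical and are not drawn from any real coding system or indexing product.

```python
import re

# Hypothetical mini-taxonomy: code -> (label, parent code or None).
# These codes are invented for illustration, not taken from a real code set.
TAXONOMY = {
    "D1": ("Respiratory condition", None),
    "D1.1": ("Asthma", "D1"),
    "D1.2": ("Pneumonia", "D1"),
    "D2": ("Metabolic condition", None),
    "D2.1": ("Type 2 diabetes", "D2"),
}

# Hypothetical rules: a pattern over the free text -> a taxonomy code.
RULES = [
    (re.compile(r"\basthma(tic)?\b", re.I), "D1.1"),
    (re.compile(r"\bpneumonia\b", re.I), "D1.2"),
    (re.compile(r"\b(type 2|type ii) diabetes\b", re.I), "D2.1"),
]


def index_record(text):
    """Return the set of specific codes whose rule matches the text."""
    return {code for pattern, code in RULES if pattern.search(text)}


def roll_up(codes):
    """Map each code to its top-level ancestor for coarse filtering."""
    top = set()
    for code in codes:
        while TAXONOMY[code][1] is not None:
            code = TAXONOMY[code][1]
        top.add(code)
    return top


def agreement(human, machine):
    """Fraction of records where the machine codes exactly match the human codes."""
    matches = sum(1 for h, m in zip(human, machine) if h == m)
    return matches / len(human)


# Invented sample records and invented human indexing for comparison.
records = [
    "Patient presents with asthma exacerbation.",
    "History of Type 2 diabetes; chest X-ray shows pneumonia.",
    "Follow-up for wheezing, likely asthmatic.",
    "Patient reports shortness of breath.",
]
human = [{"D1.1"}, {"D2.1", "D1.2"}, {"D1.1"}, {"D1"}]
machine = [index_record(r) for r in records]

# Records where the two disagree are the candidates for editorial review.
disagreements = [i for i, (h, m) in enumerate(zip(human, machine)) if h != m]

print(agreement(human, machine))   # 0.75
print(disagreements)               # [3]
print(sorted(roll_up(machine[1]))) # ['D1', 'D2']
```

In this toy run the agreement is 0.75, below the 85% goal, and the one disagreement (the last record, which no rule matches) points to where an editor might add or refine a rule. Exact set equality is only one possible measure of agreement; per-code precision and recall would be a reasonable alternative.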