• line breaking;
• page breaking;
• widows;
• orphans;
• keeps;
• pagination;
• XML publishing


The problem of line breaking consists of finding the best way to split paragraphs into lines. It has been cleverly addressed by the total-fit algorithm presented by Knuth and Plass in a well-known paper. Similarly, page-breaking algorithms break the content flow of a document into page units. Formatting languages, such as the World Wide Web Consortium standard Extensible Stylesheet Language Formatting Objects (XSL-FO), allow users to specify which content should be kept on the same page and how many isolated lines are acceptable at the beginning or end of each page. The strategies most formatters adopt to meet these requirements, however, are unsatisfactory in many publishing contexts because they very often generate unpleasant empty areas. In such cases, typographers must adjust the results manually in order to fill pages completely. This paper presents a page-breaking algorithm that extends the original Knuth and Plass line-breaking approach and produces high-quality documents without unwanted empty areas. The basic idea is to delay the definitive choice of breaks in the line-breaking process in order to provide a larger set of alternatives to the actual pagination step. The algorithm also allows users to decide which properties may be adjusted for pagination and their variation ranges. An application of the algorithm to XSL-FO is also presented, with an extension of the language that allows users to drive the pagination process. The tool, named FOP+, is a customized version of the open-source Apache Formatting Objects Processor formatter. Copyright © 2011 John Wiley & Sons, Ltd.
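
The idea of keeping several line-breaking alternatives open until pagination can be illustrated with a small sketch. It is not the algorithm of the paper or of FOP+; it is a simplified model under stated assumptions: every line has the same height, each paragraph comes with a few hypothetical (line count, cost) alternatives, and pages are always filled to `page_height` lines except the last one. The function name `paginate` and its parameters are illustrative only.

```python
from functools import lru_cache

def paginate(options, page_height, orphans=2, widows=2):
    """Pick one line count per paragraph so that full pages never split a
    paragraph in a way that violates the orphan/widow limits.

    options[i] is a list of (line_count, cost) alternatives for paragraph i
    (assumed to come from looser or tighter line breaking).
    Returns (total_cost, chosen_line_counts), or None if no choice works.
    """

    def breaks_ok(start, n):
        # 'start' lines are already on the current page (0 <= start < page_height).
        # A page break falls inside the paragraph after 'offset' of its lines.
        offset = page_height - start
        while offset < n:
            # Lines left at the page bottom must reach 'orphans';
            # lines carried to the next page must reach 'widows'.
            if offset < orphans or n - offset < widows:
                return False
            offset += page_height
        return True

    @lru_cache(maxsize=None)
    def best(i, start):
        if i == len(options):
            return (0, ())
        result = None
        for n, cost in options[i]:
            if not breaks_ok(start, n):
                continue
            tail = best(i + 1, (start + n) % page_height)
            if tail is None:
                continue
            candidate = (cost + tail[0], (n,) + tail[1])
            if result is None or candidate[0] < result[0]:
                result = candidate
        return result

    return best(0, 0)

# Natural line counts 6, 5, 4 on 10-line pages would leave a single widow line
# from the second paragraph at the top of page 2.
fixed = [[(6, 0)], [(5, 0)], [(4, 0)]]
flexible = [[(6, 0), (5, 2), (7, 2)],
            [(5, 0), (4, 1), (6, 2)],
            [(4, 0)]]
print(paginate(fixed, 10))     # None
print(paginate(flexible, 10))  # (1, (6, 4, 4))
```

With only the natural line counts there is no way to fill the first page without creating a widow, so a conventional formatter would have to leave empty space. Once the second paragraph may also be set in four lines, the page break falls exactly between paragraphs and the first page is completely filled.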