Software: Practice and Experience

Volume 48, Issue 3
SHORT COMMUNICATION

Toward characterizing HTML defects on the Web

Joaquim Mendes

Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal

Nuno Laranjeiro

Corresponding Author

E-mail address: cnl@dei.uc.pt

Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal

Correspondence

Nuno Laranjeiro, Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, 3030‐790 Coimbra, Portugal.

Email: cnl@dei.uc.pt

Marco Vieira

Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal

First published: 13 September 2017

Summary

HTML is widely used as an interface for delivering services to users. Web developers produce and change sites at a high pace while trying to support the latest HTML standards. In this context, it is common to find websites that do not comply with the standards and that browsers fail to process correctly. Given this dynamic environment and the growing diversity of frequently updated browsers, defects in web pages are common, sometimes severe, and hard to track. In this short communication, we describe the initial design of an approach for obtaining information about the characteristics of HTML documents on the Web and for extracting indicators of the representative errors made by their developers. Preliminary results show that nearly 90% of the pages analyzed have at least one type of error and that a small number of error types are prevalent.
