
A Guide for Archiving Web Pages

Best practices for web archiving

Strictly speaking, web archiving practices should be able to cope with any valid web page code. In practice, however, it is highly desirable for pages to observe standard practices and guidelines. Start by reviewing this highly regarded list prepared in 2004 by consultant Russ Weakley: Web standards checklist. The major elements in this list are accessibility, separation of content and presentation, and the use of effective meta tags to describe contents consistently. Formats and contents that hinder access by people with disabilities should be avoided. The XHTML format is preferred to HTML. Formatting via linked style sheets is preferable to formatting by tables or other outmoded means. A full set of preservation meta tags should be present on each page. In addition, although the checklist does not mention it, there should be written policies for assuring content integrity.

Accessibility

Key elements of accessible pages include providing text equivalents for images and other non-text elements, avoiding the use of colored text or images as the only way to convey information, identifying each language change in the page using the appropriate XHTML codes, avoiding the use of frames, and making sure the pages can be read even if JavaScript is disabled.
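Two of these points, missing text equivalents for images and the use of frames, can be checked mechanically. The following is a minimal sketch, not a full accessibility audit. It uses only Python's standard `html.parser` module. The names `AccessibilityChecker` and `check_accessibility` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Collects a few simple accessibility warnings while parsing a page.

    Hypothetical helper for illustration; it covers only two of the
    points listed above (image text equivalents and frames).
    """

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attr_names = {name for name, _ in attrs}
        if tag == "img" and "alt" not in attr_names:
            self.warnings.append("img without alt text")
        if tag in ("frame", "frameset"):
            self.warnings.append(f"frame element used: <{tag}>")


def check_accessibility(html_text):
    """Return a list of warning strings for the given page source."""
    checker = AccessibilityChecker()
    checker.feed(html_text)
    checker.close()
    return checker.warnings
```

Calling `check_accessibility(page_source)` on a page returns an empty list when neither problem is found. The other points, such as color used as the only carrier of information, still need a human reviewer.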
-- more on accessibility --

Separation of content and presentation

Separation of content and presentation makes sense both theoretically and practically. In theory, mixing the two is conceptually chaotic. In practice, it leads to structural elements, such as the code for tables, being used to position page elements, in other words, for design. You can design handsome pages using tables, but their code is difficult to understand and maintain, and it is bloated: the pages contain much more code, and are thus slower to load, than they should be. The use of style sheets also makes the web page author's life easier by enabling global changes to design elements.
-- more on separation of content and presentation --

Meta tags for description of page contents

Metadata are data about data; that is to say, they are descriptive elements, usually given in the non-displaying header of a web page, which describe the contents of the page and give useful information about it. The minimum set of preservation metadata includes the title of the page, its creator and other source data, the topics covered in the page, the date of the page, a unique identifier, the dominant language of the page, information on intellectual property rights, and a description of the contents. The most widely recommended standard for preservation metadata is the Dublin Core metadata schema.
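As an illustration, the minimum set above can be written into the page header as Dublin Core meta tags. The sketch below assumes the common convention of `DC.`-prefixed names in `meta` elements plus a `link` element declaring the schema. The function name `dublin_core_meta`, the record keys, and the mapping of "topics" to the `subject` element are assumptions made for this example.

```python
from html import escape

# Dublin Core elements matching the minimum set described above.
# Mapping "topics" to "subject" is an assumption made for this sketch.
DC_ELEMENTS = (
    "title", "creator", "subject", "date",
    "identifier", "language", "rights", "description",
)


def dublin_core_meta(record):
    """Build XHTML header lines from a dict of Dublin Core values.

    Hypothetical helper: elements missing from `record` are skipped,
    and values are escaped so they are safe inside attribute quotes.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
    for element in DC_ELEMENTS:
        value = record.get(element)
        if value:
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}" />')
    return "\n".join(lines)
```

For example, `dublin_core_meta({"title": "Best practices for web archiving", "language": "en"})` yields the schema `link` line followed by one `meta` line each for the title and the language.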
-- more on meta tags --

Policies for assuring content integrity

Hardware and software problems, human error, and malicious attacks can all cause degradation of pages you have archived. To assure content integrity, you should create and follow a policy for regular maintenance. This entails checking new files as they are added and regularly rechecking old ones.
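One common way to carry out such checks, offered here as a sketch rather than a prescribed method, is to record a checksum for every file when it is added and to compare the stored checksums against the files at regular intervals. The example uses SHA-256 from Python's standard `hashlib` module. The function names and the in-memory manifest format are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir):
    """Map each file's path (relative to archive_dir) to its digest."""
    root = Path(archive_dir)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(archive_dir, manifest):
    """Compare the archive with a stored manifest.

    Returns a dict of problems: "missing" and "changed" for files in
    the manifest, "new" for files that have not been recorded yet.
    """
    current = build_manifest(archive_dir)
    problems = {}
    for name, digest in manifest.items():
        if name not in current:
            problems[name] = "missing"
        elif current[name] != digest:
            problems[name] = "changed"
    for name in current:
        if name not in manifest:
            problems[name] = "new"
    return problems
```

In practice the manifest would be saved somewhere separate from the archive itself (for example as a JSON file), so that damage to the archive does not also damage the record used to detect it.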
-- more on policies for assuring content integrity --

Some resources on best practices for web pages
