The Centre for Information Quality Management

 

A service of IAL, run on behalf of the UK eInformation Group (UKeiG) and CILIP: the Chartered Institute of Library and Information Professionals since 1993.


THE CIQM WEBSITE QUALITY SURVEYS

The main series of reports reviews the overall quality of web resources. It provides an in-depth analysis of the site features as experienced by users and deals with authority, navigation and content. Analysis is made of both the visible page and the underlying code. The results offer a unique and detailed view of web quality and clearly demonstrate the need to evaluate resources before relying on them for day-to-day use.

The reports are:

  • CIQM Report on the Top of the Web Survey 2003
    • Download report (Word file)
  • The CIQM 2004 Website Quality Survey: .gov.uk (below)
  • The CIQM 2003 Website Quality Survey (below)
  • UK Adoption Websites (2 reports)

THE CIQM 2003 & 2004 WEBSITE QUALITY SURVEYS
Summary reports
Roger Fenton and Chris Armstrong

Download full 2004 report (Word file)
Download full 2003 report (Word file)

2004: .gov.uk sites

The aim of the 2004 CIQM Website Quality Survey was to provide data on the quality of websites in a specific domain, and to compare these with the results of the 2003 survey. In order to achieve this a random set of Websites from the .gov.uk domain was analysed in terms of authority, and the care with which pages were constructed and maintained. This is expected to be the first in a series of such single-domain surveys. We evaluated 60 Websites in terms of a limited set of criteria, including dating, links to responsible persons or bodies, accessibility, navigational context, corrupt or suspect links and the provision of metadata.

To make comparison with the 2003 survey results easier, the report format is the same, including table numbers. The tables from the 2003 report are repeated in this report, with .gov.uk data added. A few features were surveyed for the first time in 2004, and their results are interleaved with the existing numbering system. For example, Table 3 provides data on the provision of any date on the visible homepage, and Table 4 provides data on homepages displaying any accessibility validation certificate. A new Table 3.5 between these two provides more detailed data, collected for the first time in 2004, on the kinds of date and time information displayed on the homepage.

Methodology

We evaluated existing home pages, without regard to any statements about future e-government development which might have been made by the organisation. Other studies, such as Beynon-Davies and Williams (2003), cover this aspect of the progress to e-government. Nor did we concern ourselves with evaluating interactive aspects, such as online forms, registration, transfer of funds and databases. These are typically features of the deeper levels of Websites and have also been the subject of other studies, such as the Top of the Web Survey on Quality and Usage of Public E-services (2003).

AltaVista.com <http://uk.altavista.com/> was used to search for all sites in the .gov.uk domain, in all languages (allowing inclusion of sites in Gaelic, Irish and Welsh), with 50 hits per page displayed, and the “site-collapse” function enabled (to reduce the number of pages returned from the same Website to a maximum of two). AltaVista returns a maximum of 1,000 hits: 20 pages of 50 hits. The search was carried out on 25 March 2004 and results saved for later use. Total AltaVista hits were 1,917,161.

The defining features of selected websites were:

• the URLs ended in .gov.uk, and
• there was no more than one domain name preceding .gov.uk in the URL, excepting domains indicating that the site belonged to one of the agencies or regional governments of England, Scotland, Wales and Northern Ireland.

Deep-site pages from Websites were excluded, as were Websites for libraries (which will be studied separately). URL extensions indicating homepages, such as “/index” and “/default” were accepted. In a few cases where a selected URL was for a cover-page or jump page to two or more independent sites (for example, a Welsh local authority with parallel sites in Welsh and English), the first (top/left) of these was taken for analysis.
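The URL criteria above can be approximated in code. The following is a minimal sketch only: the function name, the set of exempt sub-domain labels and the way homepage paths are recognised are all assumptions made for illustration, since the report does not list the agency or regional-government domains it accepted, and library sites and jump pages would still need manual review.

```python
from urllib.parse import urlparse

# Hypothetical labels standing in for the agency / regional-government
# domains the report allowed; the report does not enumerate them.
EXEMPT_LABELS = {"scotland", "wales"}

# Path stems accepted as homepage indicators ("/index", "/default").
HOMEPAGE_STEMS = ("index", "default")


def is_candidate_homepage(url: str) -> bool:
    """Return True if the URL looks like a qualifying .gov.uk front page."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if not host.endswith(".gov.uk"):
        return False

    # Names in front of ".gov.uk", ignoring a leading "www".
    labels = host[: -len(".gov.uk")].split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    if not labels:
        return False

    # No more than one name before .gov.uk, unless the name directly
    # before .gov.uk is one of the (assumed) exempt labels.
    if len(labels) > 1 and not (len(labels) == 2 and labels[-1] in EXEMPT_LABELS):
        return False

    # Accept the bare site root, or a single "/index..." or "/default..." file.
    path = parsed.path.strip("/")
    if path == "":
        return True
    if "/" in path:
        return False  # deep-site page
    return path.split(".")[0].lower() in HOMEPAGE_STEMS
```

For instance, a site root or an "/index.htm" page on a single-name .gov.uk host passes, while a page several directories deep does not.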

These criteria ensured that selected Web pages were front pages of the Websites of

(1) UK-wide institutions: Westminster-level departments, executive agencies, quangos, and initiatives with their own Websites, or
(2) similar institutions at the level of the UK’s constituent nations (England, Scotland, Northern Ireland and Wales), or
(3) local authorities (counties, regions, cities, etc.)

From each page of hits, the first three qualifying Websites were selected until quotas of 30 UK/national (categories [1] and [2]) and 30 local Websites (category [3]) were filled.
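The quota procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' actual tooling: the function name, the category labels "national" and "local", and the `classify` callback are hypothetical, and the sketch assumes that a site whose quota is already full does not count towards the three picks from a page.

```python
def fill_quotas(result_pages, classify, per_page=3, quota=30):
    """Take up to `per_page` qualifying sites from each page of hits
    until both the national and the local quota are full.

    `result_pages` is a list of pages, each a list of URLs in rank order.
    `classify` returns "national", "local" or None (not qualifying).
    """
    selected = {"national": [], "local": []}
    for page in result_pages:
        taken = 0
        for url in page:
            if taken == per_page:
                break
            category = classify(url)
            if category is None:
                continue  # fails the selection criteria
            if len(selected[category]) >= quota or url in selected[category]:
                continue  # quota already filled, or site already chosen
            selected[category].append(url)
            taken += 1
        if all(len(urls) >= quota for urls in selected.values()):
            break
    return selected
```

With 20 result pages and three picks per page, at most 60 sites can be chosen, which matches the two quotas of 30.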

If a sample page redirected the user to another page, the new page was used for analysis, as long as it fitted the other criteria.

The analysis was carried out in April 2004.

Conclusions

The 2004 CIQM survey of Web pages substantiates the 2003 findings that there are clear and significant differences between domains in their inclusion of selected quality-enhancing features, but these differences do not necessarily apply across all the features studied. Each feature produced its own pattern of domains which were better or worse in the features selected for evaluation. A penultimate table shows which domains appeared as the most and least compliant for each feature.

Awarding one point for compliance, 0 for non-compliance and 0.5 for partial compliance of the surveyed features, a maximum score of 33 was possible. The lowest score was from a Northern Ireland local authority Website, which scored just 4.5 points and also had a relatively high rate (5.1%) of corrupt links. The highest scorer was an English county, with 18.5 points and 2% corrupt links. The mean score of all 60 sites was 10.6 points, or less than one third of the possible maximum. The feature most often complied with was the total lack of unwanted pop-up windows (100% compliance); the feature least often complied with was providing a Crystal Mark for clear English (5% compliance).
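The scoring rule is simple enough to express directly. The sketch below assumes each site has been given one rating per surveyed feature; the rating labels "yes", "partial" and "no" and the function name are illustrative choices, not terms from the report.

```python
# Points per rating, as described in the text: 1, 0.5 or 0.
POINTS = {"yes": 1.0, "partial": 0.5, "no": 0.0}

MAX_SCORE = 33  # maximum possible score stated in the report


def site_score(ratings):
    """Sum the points for one site's list of per-feature ratings."""
    return sum(POINTS[rating] for rating in ratings)


def share_of_maximum(score, maximum=MAX_SCORE):
    """Express a score as a fraction of the maximum possible."""
    return score / maximum
```

On this scale the reported mean of 10.6 points is about 32% of the maximum, consistent with "less than one third".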

Local authority Websites performed rather better than “others”, but not significantly so. Of the 31 sites with lower than average scores (i.e., under 10.6), 12 were local authorities and 19 were “other”; conversely, of those performing better than average, 18 were local authorities and 11 were “others”. Looking at the scores a different way, the mean score for local authorities was 11.6, while the mean for “others” was 10.1, again not statistically significant.


2003

The aim of the 2003 CIQM Website Quality Survey was to provide a snapshot of Web quality. In order to achieve this a random set of Websites from different domains was analysed in terms of authority, and the care with which pages were constructed and maintained. This is the first in a series of such surveys. We evaluated 600 Websites in terms of a limited set of criteria, including dating, links to responsible persons or bodies, accessibility, navigational context, corrupt or suspect links and the provision of metadata.

Methodology

Websites for analysis were selected from a range of URL domains: .ac.uk, .edu, .co.uk, .com, .net, .org, and .org.uk. One hundred pages each from the .co.uk, .com and .net domains were chosen, and 50 each from the others. In addition, 50 e-journals and 50 e-books were analysed.

Candidate sites for the first seven domains proper were found by searching on AltaVista.com for the term “url:.xxx”, where xxx = the domain name, specifying “worldwide” search. The maximum number of hits returned by AltaVista is 1000, and search engine results are not random. Pages are ranked by combining a number of criteria, such as URL content, links to that page from other pages, various word frequency, proximity and position counts and metadata content.

From the results of the searches, the following categories of pages were omitted:

  • paid placements
  • pages other than homepages
  • pages flagged as “related pages” in the search results
  • Internet service providers and search engines
  • narrowly-focused commercial providers of services to the Internet community: site submission services, domain name sellers, and Web page hosting services (but professional Website designers were not excluded)
  • Weblogs
  • wiki pages
  • individual mailing lists, although mailing list services were included
  • W3C sites
  • pages in languages which the authors were unable to analyse
  • near-duplicate pages, such as several different city guides published by the same commercial organisation
  • multiple sites of the same organisation, such as different departments within a university or the national sites of a multi-national corporation
  • empty pages
  • .edu sites for institutions outside the USA

From the remaining pages a sample of every nth hit was taken until the specified number of pages was reached.
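A systematic "every nth hit" sample can be sketched as below. The report does not state the value of n or the starting position, so both are assumptions here: the sketch starts at the nth hit and derives a default step from the list length, and the function names are hypothetical.

```python
def step_for(total_hits, wanted):
    """Choose a step so that `wanted` picks are spread over the whole list."""
    return max(1, total_hits // wanted)


def every_nth(hits, n, wanted):
    """Take every nth hit (the nth, 2nth, ...) until `wanted` are collected."""
    return hits[n - 1 :: n][:wanted]
```

If fewer than `wanted` hits remain after filtering, the function simply returns what is available, and the shortfall would have to be made up some other way.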

If a page redirected the user to another page, the new page was used for analysis, as long as it was within an unfilled domain population.

In the case of electronic journals and monographs, domains are not available as a selection criterion. Examples found during the analysis of the other domains were used. The remaining journals were taken from NewJour’s consolidated alphabetical list <http://gort.ucsd.edu/newjour/toc.html> dated 30 May 2003, using a list of 100 pseudo-random numbers. Both electronic journals with print counterparts (e+p journals) and electronic-only journals (e-only journals) were analysed, but only currently-published full-text titles were taken. The sample included academic journals, magazines, e-zines and newspapers. Because publishers typically impose a standard format on their titles, only one title was allowed from any one publisher. The publisher’s page was analysed in preference to that of an aggregator.

The remaining e-books were taken by visiting a number of e-book aggregators and publishers, taking one at random from each site. Therefore, the sample is not random, but represents a range of possible formats.

The survey was undertaken in late May and early June 2003.

Results

Pages were evaluated in terms of elements visible on the displayed page and also on features of the source code. The evaluation criteria described in the following section do not represent a comprehensive or thorough evaluation of Websites but rather are a manageable subset that members of the public can use easily as a quick test of page or site quality.

The Visible Page

There are any number of possible quality markers by which Websites can be judged, and papers on evaluation criteria can be found on the CIQM Web pages. Most agree that authority can be assessed by indicators which include ownership and currency, and we have used these as the first two tests. Consideration for visually disadvantaged users and those with browsers unable to handle frames or graphics, and help with navigation around a Website all suggest attention to detail and care over quality, and this survey tested these aspects. Finally, the display of some outward sign that the page has been quality checked (for example, by Bobby or W3C) would seem clear evidence of responsible Web page development. These tests together form the first part of our quality toolkit.

Did the page include a link to the author, owner or Webmaster?

To qualify, a link had to be to the person or organisation administratively responsible for the content of the page or site, not its design. Also, it had to be either a blank e-mail form available within a single click (this could be a feedback form on the Web page itself or a mailto link or e-mail address which could be cut and pasted into an e-mail application); or other contact details, sufficient to make contact possible without recourse to another part of the Website or any other source. A link to a customer helpline or information hotline did not qualify, nor did a link to a commercial Website developer. In the case of e-books, the link had to be to the publisher, or to the author, if self-published.

While virtually all Websites do include contact details, a significant number do not give this information high visibility on their front pages, where it would be easy to use, or else they give it on a “contact us” page which does not include a feedback form. The figures for e-books were particularly low, reflecting the fact that the page examined for this survey was the equivalent of a title page or contents page, and contact details for publishers would normally be on the publisher’s homepage, too many clicks away to qualify. Educational and related institutions were particularly conscientious about including contact details on their front pages.

Was the page dated?

The date had to be a copyright date, creation date or a “last updated” date. Dated content elements on the page, such as press releases or news items, did not qualify.

Results showed .com pages to be particularly compliant, which may indicate a need to demonstrate currency to visitors (although .co.uk pages did not show that they felt anything like the same need), while e-journals commonly reflect new issues by changing the “last updated” date. Apparently other types of organisations do not feel the same need. Dates given across all domains were commonly several years old, with no date of later revision given.

UK pages provided dating significantly less often than US pages.

Did the page include an accessibility validation certificate from a recognised authority and offer a low-graphics, text-only or other alternative for the visually impaired or older browsers?

Any certificate from W3C, or Bobby (from Watchfire Corporation) or AskAlice (from Adobe and SSB Technologies) qualified.

Pages which included no graphics or a maximum of two text-based graphics were assessed as “none needed”. The alternative page had to include all the content of the main page, and a “print this page” facility did not qualify.

Overall, only 10 pages displayed any kind of certificate indicating they had been validated by W3C or Bobby for accessibility. Of those, six provided a text-only or low-graphics alternative site, and several more were among those assessed as not needing an alternative because the basic site itself used virtually no graphics. In a very few cases the page analysed was a splash cover page with virtually no content beyond a logo or video and an “enter this site” icon; the real homepage was located at the next level down. The results are too insignificant to allow comparisons between domains.

If one counts “print this page” features, the figures would rise slightly, but not significantly. This feature is more often found at lower, more content-rich levels of a Website.

A significant number of the sites were already text-only, including help sites for computer professionals, e-versions of out-of-copyright texts and e-journals. Educational institutions were the most likely to provide text-only versions, reflecting a need to demonstrate that they are disabled-friendly, especially to prospective students. Commercial organisations in all domains seem particularly oblivious to the needs of handicapped users. Of concern is the fact that very few sites of charitable and medical organisations provided alternatives.

Comparing UK and US domains (.ac.uk + .co.uk + .org.uk and .edu + .com + .org), there is little difference in the provision of validation certificates (2.5% UK against 2% US), but the provision of text alternatives is clearer: 12% of UK pages included such an alternative, while only 6.5% of the US pages did so.

If the page incorporated frames, was a no-frames alternative available?

Frames are still not widely used, and just under a third of sites using frames provided a no-frames alternative; educational and cultural bodies were especially likely to do so. In most other cases users were instructed to upgrade their browsers, with a link to a Netscape or Internet Explorer upgrade site. Only a few sites completely ignored the possibility of users having pre-frames browser versions or, worse yet, gave them a terse “You need frames to use this site” message, with no further help.

There are clear UK-US national differences with more UK pages than US pages incorporating frames. While the number of pages using frames is too small to provide a reliable sample (29 UK pages and 11 US pages), 14 of the 29 UK pages offered a no-frames alternative, but only two of the 11 US pages failed to do so.

Did the Website include a breadcrumb trail (BCT), allowing a user at a lower level to know where he or she was in relation to hierarchically higher pages?

This proved difficult to evaluate, because rather than a straightforward

Animals > Mammals > Cats > Siamese

type of BCT, many sites used combinations of sidebars, tabs, outline formats, typography, etc. In the end a personal test was applied: could the evaluator discover without undue effort, where the page was within the Website?

E-journals and e-books often avoided the need for BCTs by having their entire content on a single page, or having only one layer of pages below the homepage, for individual articles or chapters. Most surprising was the relative lack of such guides for academic institutions, especially in light of their commonly very large and complex Websites. To some extent this was compensated for by the use of sidebars or sitemaps. Other types of sites were more likely to include BCTs, but these could be such complex mixes of tabs, outline sidebars, typographical conventions and the like as to be more confusing than helpful. In a small number of cases the site included what was evidently intended to be a BCT, but it was disqualified due to missing intermediate levels. Other sites included BCTs selectively, for only part of the site.

UK and US sites again differ, with UK sites being significantly less likely to include a BCT.

Overall page-visible feature results

Of the 600 pages under consideration, only 12 provided all six page-visible data elements: author link, date, validation certificate, low-graphics content (or alternative), a no-frames version and a BCT, and seven of those were e-books. Only three provided none of these elements: two .co.uk pages and one .com page (disregarding cases where an element, particularly a BCT, was judged not to be necessary). Pages included an average of just under 3 of the elements, with .co.uk and .ac.uk having significantly fewer elements, and e-journals and e-books significantly more. There was no significant difference in the performance of commercial versus non-commercial pages, and the difference between UK and US pages was also slight.

Source Code Data

The source code for Web pages can also be analysed to determine quality. A valid HTML document declares the version of HTML used in its composition with a DOCTYPE tag – so its inclusion (or otherwise) is a clear indication of care on the part of the Webmaster. Similarly use of metadata indicates attention to detail – precisely because it is not required or necessary and Web pages function perfectly well without it. Appropriate metadata also improve the ability of search engines to rate a site as relevant to a search request. Testing source code also allows further assessment of the degree of concern for the visually disadvantaged: do images have text alternatives? Finally, corrupt links demonstrate an obvious lack of maintenance and quality control. These further six tests complete the toolkit used for this short survey.

In the case of a page that used frames, all the frame pages were analysed and the results were combined into a single overall rating. For example, if any one frame page included a DOCTYPE command, the entire page was rated as having one.
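The frame-combining rule can be illustrated with Python's standard html.parser module. This is a sketch under assumptions rather than the method actually used in the survey: the class and function names are hypothetical, and the <meta> names "author", "date" and "keywords" are assumed stand-ins for the authorship, date and keyword metadata the report refers to.

```python
from html.parser import HTMLParser

FEATURES = ("doctype", "author", "date", "keywords")


class SourceFeatures(HTMLParser):
    """Record whether one source file has a DOCTYPE and selected <meta> tags."""

    def __init__(self):
        super().__init__()
        self.found = dict.fromkeys(FEATURES, False)

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE HTML PUBLIC ...>.
        if decl.lower().startswith("doctype"):
            self.found["doctype"] = True

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        # Assumed meta names; only tags with non-empty content count.
        if name in ("author", "date", "keywords") and content:
            self.found[name] = True


def page_features(frame_sources):
    """Combine results across all frame pages: a feature counts for the
    whole page if any one frame page includes it."""
    combined = dict.fromkeys(FEATURES, False)
    for source in frame_sources:
        parser = SourceFeatures()
        parser.feed(source)
        parser.close()
        for feature, present in parser.found.items():
            combined[feature] = combined[feature] or present
    return combined
```

A page without frames is handled the same way, by passing a list containing its single source file.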

Results

Source code elements seem to be included when the page author feels they will be useful to him/her. Date metadata are seldom included, while author, DOCTYPE and keyword elements appear progressively more often.

There were differences between non-commercial pages and commercial pages. More commercial than non-commercial domain pages included a DOCTYPE statement. Date and keyword metadata were given almost equally by commercial and non-commercial pages. Non-commercial pages were significantly more likely to include authorship or ownership metadata than commercial pages.

Comparing UK and US sites in their use of metadata shows that DOCTYPE statements appear in the source codes of more UK pages than of US pages. The same is true for authorship and date <meta> tags. But keyword <meta> tags are more common in US pages’ source codes. Also notable is the reversal of the trend in .ac.uk vs. .edu pages, where US pages are considerably more likely to include keyword <meta> tags than their UK counterparts.

UK pages also scored higher than US pages in the number of elements provided. The individual domains all provided similar amounts of these elements, except for e-journals and e-books, which scored very much lower. Only 16 pages out of 600 (2.7%) included both the complete DOCTYPE tag and supplied all the metadata.

If the page included <img> tags, were these accompanied by alt= statements with content?

Perhaps the best-publicised means of supporting the visually-disadvantaged is by the provision of text alternatives for the images on a Web page. All <img> tags were checked, including those for “trivial” elements such as horizontal rules and borders. A statement such as “alt=‘ ’” did not qualify. Three levels of provision were recognised, in addition to a complete absence of <img> tags: no <img> tag included a qualifying alt= statement; at least one, but not all, tags included a qualifying alt= statement; and all tags included a qualifying alt= statement.
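The levels of provision described above can be sketched with Python's standard html.parser module. The class name, function name and level labels are illustrative assumptions; the rule that an empty or whitespace-only alt value does not qualify follows the text.

```python
from html.parser import HTMLParser


class ImgAltCollector(HTMLParser):
    """Collect, for every <img> tag, whether its alt= statement has content."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            # A missing, empty or whitespace-only alt does not qualify.
            self.alts.append(bool(alt and alt.strip()))


def alt_level(source):
    """Classify a page as 'no images', 'none', 'some' or 'all'."""
    collector = ImgAltCollector()
    collector.feed(source)
    collector.close()
    if not collector.alts:
        return "no images"
    if all(collector.alts):
        return "all"
    if any(collector.alts):
        return "some"
    return "none"
```

Because every <img> tag is counted, including rules and borders, a page with a described logo and an undescribed spacer image falls into the "some" level.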

Very few pages included no <img> tags at all, and these were almost exclusively “plain vanilla” e-books of the Project Gutenberg type, where original illustrations, if any, are omitted. Pages very commonly provided full alt= statements for substantive graphics, such as illustrations, logos, and radio buttons, while omitting them for items such as rules and border elements. These account for the great majority of the pages with a “some but not all” score, while the “none” and “all” categories consisted of pages which either had no trivial <img> tags or which were rigorous in providing valid alt= statements for everything.

It was equally common for <img> tags to omit alt= statements entirely and to provide empty statements of the format “alt=’’”.

Non-commercial pages are significantly more conscientious than commercial pages in the provision of valid alt= statements.

Links

LinkScan™ <http://www.elsop.com/linkscan/> was used to analyse the pages’ links. LinkScan™ breaks down links from a page into six categories:

  • unknown (not tested)
  • error (definite error or corrupt link)
  • possible error
  • warning (something unusual about the link; almost always caused by a missing final /)
  • advisory (manual inspection recommended) and
  • no error.

We classified Warning and No error links as acceptable and the other four categories as corrupt, for calculating percentages of corrupt links. In fact, the Possible error and Advisory categories were rare, and the Unknown category occurred in only a very few sites, each of which however had a great many of them.
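The percentage calculation can be sketched as follows. The input is assumed to be a per-page count of links in each of the six categories, keyed by lower-case category name; this data layout and the function name are assumptions for illustration and do not reflect LinkScan's actual output format.

```python
# Categories treated as acceptable in the survey; all others count as corrupt.
ACCEPTABLE = {"warning", "no error"}


def corrupt_rate(counts):
    """Return the percentage of a page's links classified as corrupt."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # a page with no links has no corrupt links
    corrupt = sum(n for category, n in counts.items() if category not in ACCEPTABLE)
    return 100.0 * corrupt / total
```

For a hypothetical page with 100 links, of which 2 are errors and 1 is a possible error, the function returns a corrupt rate of 3%.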

Excluding the occasional rogue pages and outliers and anomalies caused by the link-checking software, the quality of links was seen to be very good: almost half of all pages had no corrupt links, and over two thirds of all pages had none or only one such link.

The most striking features of the links data are the considerable variation in the numbers of links from page to page, and the extent to which overall results are skewed by a few extreme pages with large numbers of links and large numbers of apparently corrupt links: the median number of corrupt links is well below the average number.
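The effect of a few extreme pages on the average can be seen with a small invented example. The numbers below are illustrative only and are not survey data.

```python
from statistics import mean, median

# Invented corrupt-link counts for seven pages: most pages have none or
# one corrupt link, while a single extreme page has forty.
corrupt_links = [0, 0, 0, 1, 1, 2, 40]

typical = median(corrupt_links)  # middle value, unaffected by the outlier
average = mean(corrupt_links)    # pulled upwards by the one extreme page

# Removing the highest outlier brings the average back towards the median.
trimmed_average = mean(sorted(corrupt_links)[:-1])
```

This is why the median is the more representative figure when a handful of pages carry very large numbers of links.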

Commercial pages had a somewhat higher rate of corrupt links than non-commercial pages, while the percentage of pages with no corrupt links was very similar.

Looking at the differences between UK and US pages (.ac.uk + .co.uk + .org.uk against .edu + .com + .org), it is immediately apparent that while the median number of corrupt links is virtually identical, there are significant differences in both the total numbers of links and numbers of corrupt links per page, with US pages having roughly twice the number of both in every case, except for the total links in the .co.uk and .com domains, which are virtually identical. But removing the highest outliers from each domain’s total and corrupt links brings the two countries much closer together.

Conclusions

The 2003 CIQM survey of Web pages indicates clear differences between domains in their inclusion of selected quality-enhancing features. Each feature produced its own pattern of domains which were better or worse in the features selected for evaluation.

Overall it appears that .ac.uk pages were the most compliant, or provided more of the data and the user-friendly features assessed in this exercise. They also were never ranked worst on a feature. E-books were ranked best on four features, but they were also ranked worst on four features. E-journals were the most non-compliant, with four worst scores and no best scores.

Did any page pass all the tests: providing all six page-visible features, a DOCTYPE statement, all three <meta> tags, full alt= statements for all its <img> tags, and with no corrupt links? No. The two best-performing pages included eight of the 10 page-visible features and <meta>-type features, and had alt= statements for all their <img> tags. They also had no corrupt links. They were a UK university’s homepage and a plain-vanilla French-language e-book text. The worst performer of all was an official UK regulatory body’s homepage, which included none of the positive quality features, had a frames site with no alternative, included no alt= statements for its <img> tags, and had 2.7% corrupt links. There were another six pages which were almost as poor, except that they did provide one of the 10 page-visible or <meta> features.


© 2003, 2004, 2005, 2006, 2007, 2008 Information Automation Limited  
Last updated: Aug 2007