THE CIQM WEBSITE QUALITY SURVEYS
The main series of reports reviews the overall
quality of web resources. It provides an in-depth analysis of the
site features as experienced by users and deals with authority,
navigation and content. Analysis is made of both the visible page
and the underlying code. The results offer a unique and detailed
view of web quality and clearly demonstrate the need to evaluate
resources before relying on them for day-to-day use.
The reports are:
- CIQM report on the Top of the Web Survey 2003
- The CIQM 2004 Website Quality Survey: .gov.uk (below)
- The CIQM 2003 Website Quality Survey (below)
- UK Adoption Websites (2 reports)
THE CIQM 2003 & 2004 WEBSITE QUALITY SURVEYS
Summary reports
Roger Fenton and Chris Armstrong
2004: .gov.uk sites
The aim of the 2004 CIQM Website Quality Survey
was to provide data on the quality of Websites in a specific domain,
and to compare these with the results of the 2003 survey. To achieve
this, a random set of Websites from the .gov.uk domain
was analysed in terms of authority and the care with which pages
were constructed and maintained. This is expected to be the first
in a series of single-domain surveys. We evaluated
60 Websites in terms of a limited set of criteria, including dating,
links to responsible persons or bodies, accessibility, navigational
context, corrupt or suspect links and the provision of metadata.
To make comparison with the 2003 survey
results easier, the report format is the same, including table numbers.
The tables from the 2003 report are repeated in this report, with
.gov.uk data added. A few features were surveyed for the first time
in 2004; their results are slotted into the existing numbering
system. For example, Table 3 provides data on the provision of any
date on the visible homepage, and Table 4 provides data on homepages
displaying any accessibility validation certificate. A new Table
3.5 between these two provides more detailed data, collected
for the first time in 2004, on the kinds of date and time information
displayed on the homepage.
Methodology
We evaluated existing home pages, without
regard to any statements about future e-government development which
might have been made by the organisation. Other studies, such as
Beynon-Davies and Williams (2003), cover this aspect of the progress
to e-government. Nor did we concern ourselves with evaluating interactive
aspects, such as online forms, registration, transfer of funds and
databases. These are typically features of the deeper levels of
Websites and have also been the subject of other studies, such as
the Top of the Web Survey on Quality and Usage of Public E-services
(2003).
AltaVista.com <http://uk.altavista.com/>
was used to search for all sites in the .gov.uk domain, in all languages
(allowing inclusion of sites in Gaelic, Irish and Welsh), with 50
hits per page displayed, and the “site-collapse” function
enabled (to reduce the number of pages returned from the same Website
to a maximum of two). AltaVista returns a maximum of 1,000 hits:
20 pages of 50 hits. The search was carried out on 25 March 2004
and results saved for later use. Total AltaVista hits were 1,917,161.
The defining features of selected websites were:
• the URLs ended in .gov.uk, and
• there was no more than one domain name preceding .gov.uk
in the URL, except for domains indicating that the site belonged
to one of the agencies or regional governments of England, Scotland,
Wales and Northern Ireland.
Deep-site pages from Websites were excluded, as
were Websites for libraries (which will be studied separately).
URL extensions indicating homepages, such as “/index”
and “/default” were accepted. In a few cases where a
selected URL was for a cover page or jump page to two or more independent
sites (for example, a Welsh local authority with parallel sites
in Welsh and English), the first (top/left) of these was taken for
analysis.
These criteria ensured that selected Web pages
were front pages of the Websites of
(1) UK-wide institutions: Westminster-level departments,
executive agencies, quangos, and initiatives with their own Websites,
or
(2) similar institutions at the level of the UK’s constituent
nations (England, Scotland, Northern Ireland and Wales), or
(3) local authorities (counties, regions, cities, etc.)
From each page of hits, the first three qualifying
Websites were selected until quotas of 30 UK/national (categories
[1] and [2]) and 30 local Websites (category [3]) were filled.
If a sample page redirected the user to another
page, the new page was used for analysis, as long as it fitted the
other criteria.
The analysis was carried out in April 2004.
Conclusions
The 2004 CIQM survey of Web pages substantiates
the 2003 findings that there are clear and significant differences
between domains in their inclusion of selected quality-enhancing
features, but these differences do not necessarily apply across
all the features studied. Each feature produced its own pattern
of better- and worse-performing domains. The penultimate table
shows which domains appeared as the most and least compliant for
each feature.
Awarding one point for full compliance with each surveyed feature,
0.5 for partial compliance and 0 for non-compliance gave a maximum
possible score of 33. The lowest score was from a Northern Ireland
local authority Website, with just 4.5 points and a relatively
high rate (5.1%) of corrupt links. The highest scorer was an English
county, with 18.5 points and 2% corrupt links. The mean score of
all 60 sites was 10.6 points, less than one third of the maximum.
The feature most often complied with was the total absence of unwanted
pop-up windows (100% compliance); the feature least often complied
with was the provision of a Crystal Mark for clear English (5% compliance).
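The scoring rule can be expressed as a short Python sketch. It assumes, as the maximum of 33 suggests, that each scored feature is worth at most one point; the rating labels "yes", "partial" and "no" and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical rating labels mapped to the survey's points.
POINTS = {"yes": 1.0, "partial": 0.5, "no": 0.0}
MAX_SCORE = 33  # assumption: one point per scored feature


def site_score(ratings: list[str]) -> float:
    """Sum the points for one site's per-feature ratings."""
    if len(ratings) != MAX_SCORE:
        raise ValueError(f"expected {MAX_SCORE} ratings, got {len(ratings)}")
    return sum(POINTS[r] for r in ratings)


def share_of_maximum(scores: list[float]) -> float:
    """Mean score across sites as a fraction of the maximum possible score."""
    return mean(scores) / MAX_SCORE
```

A mean of 10.6 gives 10.6 / 33, or about 0.32, which is why the mean is described as less than one third of the maximum.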
Local authority Websites performed rather better
than “others”, but not significantly so. Of the 31 sites
with lower than average scores (i.e., under 10.6), 12 were local
authorities and 19 were “others”; conversely, of those
performing better than average, 18 were local authorities and 11
were “others”. Looking at the scores a different way,
the mean score for local authorities was 11.6, while the mean for
“others” was 10.1, again not statistically significant.
2003
The aim of the 2003 CIQM Website Quality Survey
was to provide a snapshot of Web quality. In order to achieve this
a random set of Websites from different domains was analysed in
terms of authority, and the care with which pages were constructed
and maintained. This is the first in a series of such surveys. We
evaluated 600 Websites in terms of a limited set of criteria, including
dating, links to responsible persons or bodies, accessibility, navigational
context, corrupt or suspect links and the provision of metadata.
Methodology
Websites for analysis were selected from a range
of URL domains: .ac.uk, .edu, .co.uk, .com, .net, .org, and .org.uk.
One hundred pages each from the .co.uk, .com and .net domains were
chosen, and 50 each from the others. In addition, 50 e-journals
and 50 e-books were analysed.
Candidate sites for the seven domains proper
were found by searching on AltaVista.com
for the term “url:.xxx”, where xxx = the domain name,
specifying “worldwide” search. The maximum number of
hits returned by AltaVista is 1000, and search engine results are
not random. Pages are ranked by combining a number of criteria,
such as URL content, links to that page from other pages, various
word-frequency, proximity and position counts, and metadata content.
From the results of the searches, the following
categories of pages were omitted:
- paid placements
- pages other than homepages
- pages flagged as “related pages” in the search
results
- Internet service providers and search engines
- narrowly-focused commercial providers of services to the Internet
community: site submission services, domain name sellers, and
Web page hosting services (but professional Website designers
were not excluded)
- Weblogs
- wiki pages
- individual mailing lists, although mailing list services were
included
- W3C sites
- pages in languages which the authors were unable to analyse
- near-duplicate pages, such as several different city guides
published by the same commercial organisation
- multiple sites of the same organisation, such as different
departments within a university or the national sites of a multi-national
corporation
- empty pages
- .edu sites for institutions outside the USA
From the remaining pages a sample of every nth
hit was taken until the specified number of pages was reached.
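The every-nth-hit rule is a form of systematic sampling. The survey does not state the value of n or where sampling started, so the Python sketch below assumes n = (number of remaining hits) // (pages needed) and starts at the first hit; `systematic_sample` is a hypothetical name.

```python
def systematic_sample(hits: list[str], needed: int) -> list[str]:
    """Take every nth hit until `needed` pages are collected.

    Assumption: n is chosen as len(hits) // needed so the sample spans the
    whole result list; the survey does not give the actual value of n.
    """
    if needed <= 0 or not hits:
        return []
    n = max(1, len(hits) // needed)
    return hits[::n][:needed]
```

For example, 1,000 remaining hits and a quota of 100 pages gives n = 10, so hits 1, 11, 21 and so on would be taken.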
If a page redirected the user to another page,
the new page was used for analysis, as long as it was within an
unfilled domain population.
In the case of electronic journals and monographs,
domains are not available as a selection criterion. Examples found
during the analysis of the other domains were used. The remaining
journals were taken from NewJour’s consolidated alphabetical
list <http://gort.ucsd.edu/newjour/toc.html>
dated 30 May 2003, using a list of 100 pseudo-random numbers. Both
electronic journals with print counterparts (e+p journals) and electronic-only
journals (e-only journals) were analysed, but only currently-published
full-text titles were taken. The sample included academic journals,
magazines, e-zines and newspapers. Because publishers typically impose
a standard format on their titles, only one title was allowed from
any one publisher. The publisher’s page was analysed in preference
to that of an aggregator.
The remaining e-books were taken by visiting a
number of e-book aggregators and publishers, taking one at random
from each site. Therefore, the sample is not random, but represents
a range of possible formats.
The survey was undertaken in late May and early
June 2003.
Results
Pages were evaluated in terms of elements visible
on the displayed page and of features in the source code. The
evaluation criteria described in the following section do not represent
a comprehensive or thorough evaluation of Websites but rather are
a manageable subset that members of the public can use easily as
a quick test of page or site quality.
The Visible Page
There are any number of possible quality markers
by which Websites can be judged, and papers on evaluation criteria
can be found on the CIQM Web pages.
Most of these agree that authority can be assessed by indicators which include
ownership and currency, and we have used these as the first two
tests. Consideration for visually disadvantaged users and those
with browsers unable to handle frames or graphics, and help with
navigation around a Website, all suggest attention to detail and
care over quality, and this survey tested these aspects. Finally,
the display of some outward sign that the page has been quality
checked (for example, by Bobby or W3C) would seem clear evidence
of responsible Web page development. These tests together form the
first part of our quality toolkit.
Did the page include a link to the author,
owner or Webmaster?
To qualify, a link had to be to the person or
organisation administratively responsible for the content of the
page or site, not its design. Also, it had to be either a blank
e-mail form available within a single click (this could be a feedback
form on the Web page itself or a mailto link or e-mail address which
could be cut and pasted into an e-mail application); or other contact
details, sufficient to make contact possible without recourse to
another part of the Website or any other source. A link to a customer
helpline or information hotline did not qualify, nor did a link
to a commercial Website developer. In the case of e-books, the link
had to be to the publisher, or to the author, if self-published.
While virtually all Websites do include contact
details, a significant number do not give this information high
visibility on their front pages, where it would be easy to use,
or else they give it on a “contact us” page which does
not include a feedback form. The figures for e-books were particularly
low, reflecting the fact that the page examined for this survey
was the equivalent of a title page or contents page, and contact
details for publishers would normally be on the publisher’s
homepage, too many clicks away to qualify. Educational and
related institutions were particularly conscientious about including
contact details on their front pages.
Was the page dated?
The date had to be a copyright date, creation
date or a “last updated” date. Dated content elements
on the page, such as press releases or news items, did not qualify.
Results showed .com pages to be particularly compliant,
which may indicate a need to demonstrate currency to visitors (although .co.uk
pages did not show that they felt anything like the same need),
while e-journals commonly reflect new issues by changing the “last
updated” date. Apparently other types of organisations do
not feel the same need. Dates given across all domains were commonly
several years old, with no date of later revision given.
UK pages provided dating significantly less often
than US pages.
Did the page include an accessibility validation
certificate from a recognised authority and offer a low-graphics,
text-only or other alternative for the visually impaired or older
browsers?
Any certificate from W3C, Bobby (from Watchfire
Corporation) or AskAlice (from Adobe and SSB Technologies) qualified.
Pages which included no graphics or a maximum
of two text-based graphics were assessed as “none needed”.
The alternative page had to include all the content of the main
page, and a “print this page” facility did not qualify.
Overall, only 10 pages displayed any kind of certificate
indicating they had been validated by W3C or Bobby for accessibility.
Of those, six provided a text-only or low-graphics alternative site,
and several more were among those assessed as not needing an alternative
because the basic site itself used virtually no graphics. In a very
few cases the page analysed was a splash cover page with virtually
no content beyond a logo or video and an “enter this site”
icon; the real homepage was located at the next level down. The
numbers are too small to allow meaningful comparisons between domains.
If one counts “print this page” features,
the figures would rise slightly, but not significantly. This feature
is more often found at lower, more content-rich levels of a Website.
A significant number of the sites were already
text-only, including help sites for computer professionals, e-versions
of out-of-copyright texts and e-journals. Educational institutions
were the most likely to provide text-only versions, reflecting a
need to demonstrate that they are disabled-friendly, especially
to prospective students. Commercial organisations in all domains
seem particularly oblivious to the needs of handicapped users. Of
concern is the fact that very few sites of charitable and medical
organisations provided alternatives.
Comparing UK and US domains (.ac.uk + .co.uk +
.org.uk and .edu + .com + .org), there is little difference in the
provision of validation certificates (2.5% UK against 2% US), but
the difference in provision of text alternatives is clearer: 12% of UK pages included
such an alternative, while only 6.5% of the US pages did so.
If the page incorporated frames, was a no-frames
alternative available?
Frames are still not widely used. Just under
a third of the sites using frames provided a no-frames alternative,
a practice especially common among educational and cultural bodies. In most other
cases users were instructed to upgrade their browsers, with a link
to a Netscape or Internet Explorer upgrade site. Only a few sites
completely ignored the possibility of users having pre-frames browser
versions or, worse yet, gave them a terse “You need frames
to use this site” message, with no further help.
There are clear UK-US differences, with
more UK pages than US pages incorporating frames. While the number
of pages using frames is too small to provide a reliable sample
(29 UK pages and 11 US pages), 14 of the 29 UK pages offered a no-frames
alternative, but only two of the 11 US pages failed to do so.
Did the Website include a breadcrumb trail
(BCT), allowing a user at a lower level to know where he or she
was in relation to hierarchically higher pages?
This proved difficult to evaluate, because rather
than a straightforward
Animals > Mammals > Cats > Siamese
type of BCT, many sites used combinations of sidebars,
tabs, outline formats, typography, etc. In the end a personal test
was applied: could the evaluator discover, without undue effort,
where the page was within the Website?
E-journals and e-books often avoided the need
for BCTs by having their entire content on a single page, or having
only one layer of pages below the homepage, for individual articles
or chapters. Most surprising was the relative lack of such guides
for academic institutions, especially in light of their commonly
very large and complex Websites. To some extent this was compensated
for by the use of sidebars or sitemaps. Other types of sites were
more likely to include BCTs, but these could be such complex mixes
of tabs, outline sidebars, typographical conventions and the like
as to be more confusing than helpful. In a small number of cases
a site included what was evidently intended to be a BCT, but it was
disqualified because intermediate levels were missing. Other sites included
BCTs selectively, for only part of the site.
UK and US sites again differ, with UK sites being
significantly less likely to include a BCT.
Overall page-visible feature results
Of the 600 pages under consideration, only 12
provided all six page-visible data elements: author link, date,
validation certificate, low-graphics content (or alternative), a
no-frames version and a BCT; seven of those were e-books. Only
three provided none of these elements: two .co.uk pages and one
.com page (disregarding cases where an element, particularly a BCT,
was judged not to be necessary). Pages included an average of just
under 3 of the elements, with .co.uk and .ac.uk having significantly
fewer elements, and e-journals and e-books significantly more. There
was no significant difference in the performance of commercial versus
non-commercial pages, and the difference between UK and US pages
was also slight.
Source Code Data
The source code for Web pages can also be analysed
to determine quality. A valid HTML document declares the version
of HTML used in its composition with a DOCTYPE tag, so its
inclusion (or otherwise) is a clear indication of care on the part
of the Webmaster. Similarly, use of metadata indicates attention
to detail, precisely because it is not required or necessary
and Web pages function perfectly well without it. Appropriate metadata
also improve the ability of search engines to rate a site as relevant
to a search request. Testing source code also allows further assessment
of the degree of concern for the visually disadvantaged: do images
have text alternatives? Finally, corrupt links demonstrate an obvious
lack of maintenance and quality control. These further six tests
complete the toolkit used for this short survey.
In the case of a page that used frames, all the
frame pages were analysed and the results were combined into a single
overall rating. For example, if any one frame page included a DOCTYPE
declaration, the entire page was rated as having one.
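The DOCTYPE and metadata checks, together with the rule for frame pages, can be sketched with Python's standard `html.parser` module. This is not the survey's tooling, which is not described: the class and function names are hypothetical, and the survey does not say which `<meta name>` values counted as author, date or keyword metadata, so the names in `META_NAMES` are illustrative assumptions.

```python
from html.parser import HTMLParser
from typing import Optional

ELEMENTS = ("doctype", "author", "date", "keywords")
# Assumption: accepted <meta name> values are illustrative, not the survey's list.
META_NAMES = {
    "author": {"author"},
    "date": {"date", "dc.date"},
    "keywords": {"keywords"},
}


class SourceCodeChecker(HTMLParser):
    """Records whether a page declares a DOCTYPE and carries given <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.found = {element: False for element in ELEMENTS}

    def handle_decl(self, decl: str) -> None:
        # Called for <!DOCTYPE ...>; decl is the text between "<!" and ">".
        if decl.lower().startswith("doctype"):
            self.found["doctype"] = True

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = (attributes.get("content") or "").strip()
        if not content:
            return  # assumption: an empty content attribute does not count
        for element, names in META_NAMES.items():
            if name in names:
                self.found[element] = True


def check_page(html: str) -> dict[str, bool]:
    checker = SourceCodeChecker()
    checker.feed(html)
    checker.close()
    return checker.found


def check_frameset(frame_sources: list[str]) -> dict[str, bool]:
    """Survey rule: an element found in any frame page counts for the whole page."""
    results = [check_page(source) for source in frame_sources]
    return {element: any(r[element] for r in results) for element in ELEMENTS}
```

A non-frames page is simply the one-element case, `check_frameset([html])`.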
Results
Source code elements seem to be included when
the page author feels they will be useful to him/her. Date metadata
are seldom included; author, DOCTYPE and keyword elements are included
progressively more often.
There were differences between non-commercial
pages and commercial pages. More commercial than non-commercial
domain pages included a DOCTYPE statement. Date and keyword metadata
were given almost equally by commercial and non-commercial pages.
Non-commercial pages were significantly more likely to include authorship
or ownership metadata than commercial pages.
Comparing UK and US sites in their use of metadata
shows that DOCTYPE statements appear in the source code of more
UK pages than of US pages. The same is true for authorship and date
<meta> tags, but keyword <meta> tags are more common
in the source code of US pages. Also notable is the reversal of
the trend in .ac.uk vs. .edu pages, where US pages are considerably
more likely to include keyword <meta> tags than their UK counterparts.
UK pages also scored higher than US pages in the
number of elements provided. The individual domains all provided
similar amounts of these elements, except for e-journals and e-books,
which scored very much lower. Only 16 pages out of 600 (2.7%) both included
the complete DOCTYPE tag and supplied all the metadata.
If the page included <img> tags, were
these accompanied by alt= statements with content?
Perhaps the best-publicised means of supporting
the visually-disadvantaged is by the provision of text alternatives
for the images on a Web page. All <img> tags were checked,
including those for “trivial” elements such as horizontal
rules and borders. A statement such as “alt=‘ ’”
did not qualify. Three levels of provision were recognised, in addition
to a complete absence of <img> tags: no <img> tag included
a qualifying alt= statement; at least one, but not all, tags included
a qualifying alt= statement; and all tags included a qualifying
alt= statement.
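The four-way classification of alt= provision can be sketched with Python's standard `html.parser` module. The category labels follow the text; the class and function names are hypothetical. As in the survey, an alt= statement containing only whitespace, such as “alt=‘ ’”, does not qualify.

```python
from html.parser import HTMLParser


class ImgAltCounter(HTMLParser):
    """Counts <img> tags and those whose alt= attribute has visible content."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0
        self.qualifying = 0

    def handle_starttag(self, tag, attrs) -> None:
        if tag != "img":
            return
        self.total += 1
        alt = dict(attrs).get("alt")
        # alt="" and alt=" " do not qualify; a bare alt with no value gives None.
        if alt is not None and alt.strip():
            self.qualifying += 1


def classify_alt(html: str) -> str:
    counter = ImgAltCounter()
    counter.feed(html)
    counter.close()
    if counter.total == 0:
        return "no <img> tags"
    if counter.qualifying == 0:
        return "none"
    if counter.qualifying < counter.total:
        return "some but not all"
    return "all"
```

A page with an alt= statement on its logo but an empty one on a horizontal rule would fall into the “some but not all” category.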
Very few pages included no <img> tags at
all, and these were almost exclusively “plain vanilla”
e-books of the Project Gutenberg type, where original illustrations,
if any, are omitted. Pages very commonly provided full alt= statements
for substantive graphics, such as illustrations, logos, and radio
buttons, while omitting them for items such as rules and border
elements. These account for the great majority of the pages with
a “some but not all” score, while the “none”
and “all” categories consisted of pages which either
had no trivial <img> tags or which were rigorous in providing
valid alt= statements for everything.
It was as common for <img> tags to
omit alt= statements entirely as to provide empty statements of
the form “alt=‘’”.
Non-commercial pages are significantly more conscientious
than commercial pages in the provision of valid alt= statements.
Links
LinkScan™ <http://www.elsop.com/linkscan/>
was used to analyse the pages’ links. LinkScan™ breaks
down links from a page into six categories:
- unknown (not tested)
- error (definite error or corrupt link)
- possible error
- warning (something unusual about the link; almost always caused
by a missing final /)
- advisory (manual inspection recommended) and
- no error.
For calculating percentages of corrupt links, we classified Warning
and No error links as acceptable and the other four categories as
corrupt. In fact, the Possible error and Advisory categories
were rare, and the Unknown category occurred in only a very few
sites, each of which, however, had a great many of them.
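The corrupt-link percentage follows directly from this classification. The Python sketch below assumes the per-link category labels have already been exported from LinkScan in some form; the input format and the function name are hypothetical and are not part of LinkScan itself.

```python
# The six LinkScan categories listed above, in lower case.
CATEGORIES = {"unknown", "error", "possible error", "warning", "advisory", "no error"}
ACCEPTABLE = {"warning", "no error"}  # the other four were counted as corrupt


def corrupt_link_percentage(link_categories: list[str]) -> float:
    """Percentage of a page's links falling into a 'corrupt' category."""
    labels = [c.strip().lower() for c in link_categories]
    unexpected = set(labels) - CATEGORIES
    if unexpected:
        raise ValueError(f"unrecognised categories: {sorted(unexpected)}")
    if not labels:
        return 0.0  # a page with no links has no corrupt links
    corrupt = sum(1 for label in labels if label not in ACCEPTABLE)
    return 100.0 * corrupt / len(labels)
```

For instance, a page with 40 links of which one is an Error would score 2.5%.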
Excluding occasional rogue pages, outliers and anomalies caused
by the link-checking software, the quality
of links was seen to be very good: almost half of all pages had
no corrupt links, and over two thirds of all pages had none or only
one such link.
The most striking features of the links data are
the considerable variation in the numbers of links from page to
page, and the extent to which overall results are skewed by a few
extreme pages with large numbers of links and large numbers of apparently
corrupt links: the median number of corrupt links is well below
the average number.
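Why a few extreme pages pull the average so far above the median can be seen with a small Python illustration. The counts below are made-up numbers for ten hypothetical pages, not survey data.

```python
from statistics import mean, median

# Made-up corrupt-link counts for ten pages (NOT survey data): most pages
# have few or no corrupt links, and one extreme page has very many.
corrupt_links = [0, 0, 0, 0, 1, 1, 2, 3, 5, 88]

print("median:", median(corrupt_links))  # middle of the sorted values
print("mean:", mean(corrupt_links))      # pulled upwards by the single outlier

# Dropping the single highest outlier brings the mean much closer to the median.
trimmed = sorted(corrupt_links)[:-1]
print("mean without highest outlier:", mean(trimmed))
```

The median is unaffected by how large the extreme value is, which is why it is the more representative figure for skewed link counts like these.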
Commercial pages had a somewhat higher rate of
corrupt links than non-commercial pages, while the percentage of
pages with no corrupt links was very similar.
Looking at the differences between UK and US pages
(.ac.uk + .co.uk + .org.uk against .edu + .com + .org), it is immediately
apparent that while the median number of corrupt links is virtually
identical, there are significant differences in both the total numbers
of links and numbers of corrupt links per page, with US pages having
roughly twice the number of both in every case, except for the total
links in the .co.uk and .com domains, which are virtually identical.
But removing the highest outliers from each domain’s total
and corrupt links brings the two countries much closer together.
Conclusions
The 2003 CIQM survey of Web pages indicates clear
differences between domains in their inclusion of selected quality-enhancing
features. Each feature produced its own pattern of better- and
worse-performing domains.
Overall it appears that .ac.uk pages were the
most compliant; that is, they provided more of the data and user-friendly
features assessed in this exercise. They were also never ranked
worst on a feature. E-books were ranked best on four features, but
they were also ranked worst on four features. E-journals were the
most non-compliant, with four worst scores and no best scores.
Did any page pass all the tests: providing
all six page-visible features, a DOCTYPE statement, all three <meta>
tags, full alt= statements for all its <img> tags, and with
no corrupt links? No. The two best-performing pages included eight
of the 10 page-visible and <meta>-type features,
and had alt= statements for all their <img> tags. They also had
no corrupt links. They were a UK university’s homepage and
a plain-vanilla French-language e-book text. The worst performer
of all was an official UK regulatory body’s homepage, which
included none of the positive quality features, had a frames site
with no alternative, included no alt= statements for its <img>
tags, and had 2.7% corrupt links. Another six pages
were almost as poor, except that each provided one of the 10
page-visible or <meta> features.