The Impact of Web sites: a comparison between Australasia and Latin America.

Paper presented at INFO'99, Congreso Internacional de Información, Havana, 4-8 October 1999.

 

Alastair G. Smith
Senior Lecturer
School of Communications and Information Management
Victoria University of Wellington
PO Box 600
Wellington
New Zealand/ Aotearoa
Phone 64 4 463 5785
Fax 64 4 495 5235
URL: <http://www.vuw.ac.nz/~agsmith/>
Email: Alastair.Smith@vuw.ac.nz

 

Abstract

 

What is the impact of World Wide Web sites on the overall information resource of the Internet? What differences are made by the cultural, linguistic and geographic areas that the sites originate from? To what extent are the Internet information resources of the Spanish-language world recognised in the largely English-language world of the Internet? This paper reports on a study of the impact and influence of the web sites of educational and research organisations in Australasia and Latin America. Both regions border the Pacific Ocean and have economic and cultural similarities, but they are also divided by language and culture.

 

The study is based in the emerging discipline of webometrics, which applies the techniques of bibliometrics to the study of the World Wide Web. In particular, the study uses the concept of Web Impact Factors, which use the number of links made to a site as a measure of the site's overall influence on the Web. Web Impact Factors are related to Journal Impact Factors, which are used to compare the influence of journals in a discipline using citation counts generated by citation databases. Web Impact Factor studies use hypertext links, as counted by search engines such as AltaVista, rather than bibliographic citations. Web Impact Factors have been shown to be useful measures of the influence of sites belonging to organisations such as universities and research institutes. The study compares Web Impact Factors for a sample of educational and research institutions in Australia, New Zealand, Central America, the Caribbean, and South America.

Overall, the web sites of Australasian institutions have higher external WIFs than those of Latin American institutions. While specific features of a site can affect an institution's Web Impact Factor, there is a small positive correlation between the proportion of English-language pages at an institution's site and the institution's external WIF. This suggests that, for linguistic reasons, Latin American sites may not receive the attention they deserve from the wider Internet community. It raises the possibility that information may be ignored because of cultural, linguistic and geographic barriers, and this should be taken into account in the development of the global Internet.

Introduction

What differences are made by the cultural, linguistic and geographic areas that World Wide Web sites originate from? To what extent are the Internet information resources of the Spanish-language world recognised in the largely English-language world of the Internet? This paper reports on a study of the impact and influence of the web sites of educational and research organisations in Australasia and Latin America. Both regions border the Pacific Ocean and have economic and cultural similarities, but they are also divided by language and culture.

In the past, bibliometric research methodologies have been used to study the communication of information. Cronin and McKim [CRO96] have observed that the World Wide Web is becoming a significant communication medium for science and scholarship, and bibliometric studies of research publishing are being extended to the Web. A growing literature applies bibliometric measures to cyberspace. Terms that have been applied to this new area of study include "webometrics" (Almind and Ingwersen [ALM97]) and "cybermetrics" (the title of a journal [CYB]).

Webometric studies have used large-scale Web search engines such as AltaVista, which allow measurement of the total number of pages in a web space and of the links to that web space. The term "web space" is used in this paper to mean a domain (either a top-level domain such as .com or .nz, or a lower-level domain such as vuw.ac.nz) or a set of directories, such as the pages in the directories and subdirectories of http://www.vuw.ac.nz/scim/. For investigating links between documents, Web search engines offer possibilities similar to those provided by the citation databases created by the Institute for Scientific Information.

Web links can be seen as an indicator of the overall significance of a site, and the number of links to a site is being used by the Google search engine [WHY99] to rank search results. A number of webometric studies have examined the way in which links are made between web sites. Larson [LAR96] carried out a clustering study of geophysical sites and identified key sites that attracted many links. Almind and Ingwersen [ALM97] used the search capabilities of AltaVista to calculate the size and other characteristics of Scandinavian web spaces. Rousseau [ROU97] coined the term "sitations" for the links between web sites, and carried out a frequency distribution study of sitations related to the subject of bibliometrics. Hit rates are also used as a measure of web site success, but they can be measured only at the server, and they indicate only that users visited the site, not that they acquired useful information.

A useful measure of the overall influence of a web space, based on the links made to it, has been proposed independently by two bibliometric researchers. This measure, the Web Impact Factor (WIF), is analogous to the Impact Factor used to assess the influence of journals. It is an interesting illustration of the dominance of English-language research publishing that the concept was first published in a Spanish-language publication by Rodríguez i Gairín [ROD97], but was not widely noticed until Ingwersen published it in an English-language journal [ING98]. A journal Impact Factor [ING98, p.237] is the number of citations made in a period T1 to articles in a journal that were published in a period T2, divided by the total number of articles published in the journal in T2. A Web Impact Factor is the number of pages linking to a web space, divided by the number of pages in the web space. The WIF differs from the journal Impact Factor with respect to time period. The journal Impact Factor, constrained by the methods used to compile citation indexes, measures citations made in journals published during one period to articles published in another. The WIF, in contrast, is a "snapshot" of all links to a web space held in the search engine database at the time of measurement.
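The two definitions can be set side by side in code. This Python sketch is illustrative only: the function names are hypothetical, and the counts in the usage lines are invented rather than taken from this study.

```python
def journal_impact_factor(citations_in_t1_to_t2_articles: int,
                          articles_published_in_t2: int) -> float:
    """Citations made in period T1 to articles published in period T2,
    divided by the number of articles the journal published in T2."""
    return citations_in_t1_to_t2_articles / articles_published_in_t2


def web_impact_factor(pages_linking: int, pages_in_space: int) -> float:
    """Pages linking to a web space (one search-engine snapshot),
    divided by the number of pages in that web space."""
    return pages_linking / pages_in_space


# Hypothetical counts, for illustration only.
print(journal_impact_factor(300, 120))  # 2.5
print(web_impact_factor(1500, 1000))    # 1.5
```

The structural difference is visible in the arguments. The journal version needs two time windows, while the web version takes a single count of linking pages as it stands at the time of the search.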

Links to a web space can be made from within the web space, or from outside, giving rise to three distinct WIFs: the external WIF, reflecting the number of pages linking from outside the web space being measured; the self-link WIF, reflecting links made from inside the web space; and an overall WIF, combining external and self-links.

For the current study, a methodology for calculating WIFs was used that had originally been developed for a comparison of Australian and New Zealand web spaces [SMI99]. This was extended to compare education and research sites in Australasia (Australia and New Zealand) with similar sites in Latin America (Central and South America).

Methodology

In undertaking a webometric study, it is necessary to select a suitable Web search engine that will count the number of pages in the web space studied and the number of pages linking to it. For webometric studies, particularly for calculating WIFs, a search engine should have a large database covering as much of the Web as possible. This rules out search engines restricted to a particular locality, such as SearchNZ (http://www.searchnz.co.nz/) or MexMaster (http://www.mexmaster.com/). Another way of measuring the impact of individual pages on the Web is PageRank, a measure used by the search engine Google (http://www.google.com/) [WHY99]. However, Google gives PageRank values only for individual pages, not for sites.

Of the search engines currently available, AltaVista satisfies these requirements most closely. It has one of the largest databases [NOT99] and provides commands to search both for links and for the number of pages at a web site. However, there are problems:

AltaVista does not always provide consistent Boolean results. For example, A ∩ B does not always return the same result as B ∩ A.

The sets retrieved by commands in AltaVista sometimes include unexpected members.

The Boolean inconsistency problem is illustrated by the following AltaVista search results:

L link:auckland.ac.nz/ 14796
S1 link:auckland.ac.nz/ AND host:auckland.ac.nz 4629
S2 host:auckland.ac.nz AND link:auckland.ac.nz/ 4353
E link:auckland.ac.nz/ AND NOT host:auckland.ac.nz 10616

In this case, the two logically equivalent statements S1 and S2 differ by 6%, and the sum of S1 and E (15245) is 3% more than L, although by strict Boolean logic these should be equal.
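The percentages quoted can be checked directly from the four counts. In this minimal Python sketch, the variable names follow the labels in the search results. Using S2 as the base for the first percentage is an assumption; using S1 as the base gives 6.0% instead.

```python
# AltaVista counts for auckland.ac.nz, as reported in the search results above.
L = 14796    # link:auckland.ac.nz/
S1 = 4629    # link:auckland.ac.nz/ AND host:auckland.ac.nz
S2 = 4353    # host:auckland.ac.nz AND link:auckland.ac.nz/
E = 10616    # link:auckland.ac.nz/ AND NOT host:auckland.ac.nz

# S1 and S2 are the same intersection with the operands swapped.
order_gap = (S1 - S2) / S2          # relative gap, using S2 as the base
# Self-links plus external links should partition the link set exactly.
partition_excess = (S1 + E - L) / L

print(f"S1 vs S2: {order_gap:.1%}")               # S1 vs S2: 6.3%
print(f"(S1 + E) vs L: {partition_excess:.1%}")   # (S1 + E) vs L: 3.0%
```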

Boolean inconsistencies arise because, at busy times, the AltaVista search engine times out at certain points in the search and does not create complete sets. As a result, the total counts may be inconsistent; indeed, the AltaVista search screen specifically states that the result is "about nnnn Web pages". From a relevance-searching point of view this may be of little consequence. The relevance-ranking algorithms are intended to place the most relevant items at the beginning of the results, so timing out affects only the less relevant items.

In this study, searches were carried out to determine the following quantities. The vuw.ac.nz web space is used as an example:

The number of pages in the web space, D, using the command

host:vuw.ac.nz/

The total number of pages linking to the web space, L, for example

link:vuw.ac.nz/

The number of self-links (links from pages in the same web space). This was measured in two ways to overcome the effects of Boolean inconsistency, for example for the vuw.ac.nz domain:

S1 link:vuw.ac.nz/ AND host:vuw.ac.nz/

S2 host:vuw.ac.nz/ AND link:vuw.ac.nz/

The number of external links (links from pages outside the web space). This was measured in three ways, for example for the vuw.ac.nz web space:

E link:vuw.ac.nz/ AND NOT host:vuw.ac.nz/

E1 link:vuw.ac.nz/ AND NOT (host:vuw.ac.nz/ AND link:vuw.ac.nz/)

E2 link:vuw.ac.nz/ AND NOT (link:vuw.ac.nz/ AND host:vuw.ac.nz/)
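The same seven queries are repeated for every institution, so generating them from the domain name avoids typing errors such as stray spaces inside a query. The Python sketch below is hypothetical. The helper `altavista_queries` only formats strings in the query syntax quoted above. It does not contact any search engine.

```python
def altavista_queries(space: str) -> dict[str, str]:
    """Build the query strings listed above for one web space.

    Hypothetical helper: formats strings only, performs no searches.
    """
    link = f"link:{space}/"
    host = f"host:{space}/"
    return {
        "D": host,                                   # pages in the web space
        "L": link,                                   # all pages linking to it
        "S1": f"{link} AND {host}",                  # self-links, one order
        "S2": f"{host} AND {link}",                  # self-links, other order
        "E": f"{link} AND NOT {host}",               # external links
        "E1": f"{link} AND NOT ({host} AND {link})",
        "E2": f"{link} AND NOT ({link} AND {host})",
    }


for label, query in altavista_queries("vuw.ac.nz").items():
    print(label, query)
```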

Because of Boolean inconsistency, AltaVista frequently gave different results for S1 and S2, and for E, E1 and E2. Measurements were therefore repeated over several days until an observation was obtained in which the Boolean inconsistency was zero or very low.
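The text does not say how "zero or very low" inconsistency was judged. The Python sketch below assumes one possible rule: the relative spread, (max - min) / max, of counts that ought to be identical. It keeps the observation whose worse spread is smallest. The `Observation` type, this rule and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One set of AltaVista counts for a web space, taken on a given day."""
    day: str
    S1: int
    S2: int
    E: int
    E1: int
    E2: int


def relative_spread(values: list[int]) -> float:
    """(max - min) / max: 0.0 when all counts agree."""
    top = max(values)
    return (top - min(values)) / top if top else 0.0


def inconsistency(obs: Observation) -> float:
    # Assumed rule: the worse of the two disagreements that should be zero.
    return max(relative_spread([obs.S1, obs.S2]),
               relative_spread([obs.E, obs.E1, obs.E2]))


def least_inconsistent(observations: list[Observation]) -> Observation:
    return min(observations, key=inconsistency)


# Hypothetical repeated measurements for one web space.
runs = [
    Observation("day 1", 4600, 4350, 10600, 10400, 10550),
    Observation("day 2", 4500, 4500, 10300, 10300, 10290),
]
print(least_inconsistent(runs).day)  # day 2
```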

The selected observations were used to calculate

The overall WIF: L/D

The external WIF: ((E + E1 + E2)/3)/D

The self-link WIF: ((S1 + S2)/2)/D
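The three ratios can be computed from one selected observation as follows. This is a minimal Python sketch. The function name `wifs` and the counts in the usage line are hypothetical.

```python
def wifs(D: int, L: int, S1: int, S2: int,
         E: int, E1: int, E2: int) -> dict[str, float]:
    """Overall, external and self-link WIFs from one selected observation."""
    external = (E + E1 + E2) / 3   # average of the three external-link counts
    self_links = (S1 + S2) / 2     # average of the two self-link counts
    return {
        "overall": L / D,
        "external": external / D,
        "self-link": self_links / D,
    }


# Hypothetical counts for a web space of 1,000 pages.
print(wifs(D=1000, L=1500, S1=400, S2=400, E=1100, E1=1100, E2=1100))
# {'overall': 1.5, 'external': 1.1, 'self-link': 0.4}
```

With perfectly consistent counts, the overall WIF equals the sum of the other two, because every linking page lies either inside or outside the web space. The hypothetical counts here satisfy this (1.5 = 1.1 + 0.4).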

It might be thought possible to use WIFs to compare the Web impact of individual countries, for instance by searching AltaVista for links to Mexican web sites with

link:.mx/

However, the link: command used in AltaVista does not reliably discriminate between links to a domain, such as .mx, and links to URLs where the character string comprising the domain name appears elsewhere in the URL, for instance …/mx.htm. As a result, WIFs for top-level domains cannot be calculated reliably with the AltaVista searches currently available. For lower-level domains such as uanl.mx, however, the string is unlikely to appear in a URL other than as a reference to that web space. The link: command therefore provides a reasonable estimate of the number of pages linking to the web space. It is thus possible to calculate WIFs reliably for individual institutions in Australasia and Latin America, and to regard these as one indicator of the institutions' overall influence on the World Wide Web.
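The difference between matching a string anywhere in a URL and matching the host name can be shown with Python's standard `urllib.parse`. This sketch illustrates the distinction only. It is not how AltaVista implemented link:. The helper names are hypothetical, and example.com is a placeholder host.

```python
from urllib.parse import urlparse


def naive_match(url: str, pattern: str) -> bool:
    # Plain substring test: reproduces the false match described above.
    return pattern in url


def host_in_domain(url: str, domain: str) -> bool:
    """True only if the URL's host is the domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


urls = [
    "http://www.uanl.mx/",                 # a Mexican host
    "http://www.example.com/docs/mx.htm",  # "mx" appears only in the path
]
for u in urls:
    print(u, naive_match(u, "mx"), host_in_domain(u, "mx"))
```

Both URLs pass the substring test, but only the first passes the host-name test. For a lower-level domain such as uanl.mx, the two tests rarely disagree, which is the argument made above.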

Results

Table 1 displays Web Impact Factors measured in July 1999 for a selection of university and research web spaces in Australasia and Latin America. In both regions, the larger institutions, as measured by the number of teaching/research staff, were chosen. As it happens, most of the Latin American institutions chosen are from countries whose levels of technological and economic development are roughly equivalent to those of Australia and New Zealand: Argentina, Brazil, Chile and Mexico.

The table is divided into Australasian web spaces and Latin American web spaces. Within these, the figures are ordered by the external WIF. The external WIF is probably the most significant measure, for the following reasons.

External links are probably the best indicator of the overall significance of a web space to the external Web community.

Links within a domain are often not detected by the AltaVista link: command. For instance, the command link:vuw.ac.nz/ will not retrieve links that use a relative directory path (eg. <a href=".../scim/">) rather than a full URL (eg. <a href="http://www.vuw.ac.nz/scim/">), yet relative links are common in organisational web sites. For this reason, the self-link WIFs are probably underestimates, although they are still of interest for comparative purposes.
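Relative links escape a domain-based link search because the domain name never appears in the HTML. Resolving the links the way a browser does makes this visible. In this Python sketch, the page and link targets are hypothetical, and the sketch assumes, as described above, that relative links are not counted by link:. Both links point into the same web space, but only the full URL contains the string vuw.ac.nz as written.

```python
from urllib.parse import urljoin

# Hypothetical page inside the vuw.ac.nz web space and two links on it.
page = "http://www.vuw.ac.nz/scim/staff.html"
hrefs = ["../library/", "http://www.vuw.ac.nz/scim/"]

for href in hrefs:
    written_with_domain = "vuw.ac.nz" in href   # is the domain in the HTML source?
    resolved = urljoin(page, href)              # the URL a browser would follow
    print(href, written_with_domain, resolved)
```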

Other data in Table 1 are:

the number of pages (D) at the web site found by the AltaVista search engine.

the Internal WIF. The Internal (or self-link) WIF measures the amount of internal linking. A web space with a high Internal WIF may be functioning largely as an intranet.

the number of teaching staff, taken from the World of Learning [WOR99]. This is an indicator of the overall size of the institution.

the percentage of pages in the English language, measured by setting AltaVista to search for English language pages when using the host: command, and comparing this with the total number of pages found, D.

Among the Latin American sites, the web space with the highest external WIF is the Universidad de Chile (16.1). A high proportion of the links to this site are to the SunSite mounted there at http://sunsite.dcc.uchile.cl/. This shows how a single well-used resource can increase the overall attractiveness and significance of a site for Internet users.

In general, however, the Australasian web spaces sampled had higher external WIFs than those in Latin America. The mean external WIF for Latin American sites (1.63) is higher than that for Australasian sites (1.37), but this is distorted by the very high external WIF of the Universidad de Chile. If the Universidad de Chile is excluded, the mean external WIF for Latin American sites is only 0.82. Comparing medians, Australasian sites have a median external WIF of 1.06, while Latin American sites have a median of 0.77.
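The summary statistics in this paragraph can be recomputed from the external WIF column of Table 1 with Python's standard `statistics` module. The values below are copied from the table, and the sketch reproduces the reported means and medians.

```python
from statistics import mean, median

# External WIFs copied from Table 1.
australasia = [3.58, 2.00, 1.77, 1.56, 1.36, 1.20, 1.06, 1.03, 1.01,
               0.92, 0.90, 0.75, 0.71]
latin_america = [16.10, 1.83, 1.39, 1.31, 1.28, 1.27, 0.95, 0.83, 0.80,
                 0.77, 0.72, 0.61, 0.59, 0.57, 0.49, 0.47, 0.44, 0.36, 0.14]
without_uchile = latin_america[1:]   # drop Universidad de Chile (16.10)

print(round(mean(australasia), 2), median(australasia))       # 1.37 1.06
print(round(mean(latin_america), 2), median(latin_america))   # 1.63 0.77
print(round(mean(without_uchile), 2))                         # 0.82
```

A single outlier shifts the mean by a factor of two while leaving the median almost unchanged, which is why the median is the fairer comparison here.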

Why are the external WIFs of Latin American web spaces lower? One important difference is language. Australasian sites are almost entirely in English, while Latin American sites are, naturally, largely in Spanish or Portuguese; yet English is the de facto language of the Web. Could the higher external WIFs of Australasian web sites be due to their being in English, and therefore more accessible to the global Web community? If so, Latin American sites with a higher proportion of English-language pages should also have higher external WIFs. Graph 1 shows the relationship between the external WIF and the proportion of English-language pages for Latin American sites. The Universidad de Chile is excluded from this comparison: it has a very high WIF because of the SunSite, but only 10% of its pages are in English. There is only a small positive correlation (Pearson correlation coefficient = 0.14) between the external WIF and the percentage of English-language pages, so clearly factors other than language are important in determining the WIF. Interestingly, for sites with external WIFs below 1 the correlation is much higher (Pearson correlation coefficient = 0.7). It is therefore possible that language matters for a site that does not yet have much recognition on the Web, but becomes less important once a site has gained significant recognition.
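The Pearson coefficient used above can be computed directly. The Python sketch below implements the standard product-moment formula and applies it to the 18 Latin American sites in Table 1 other than the Universidad de Chile. On these rounded table values it gives a coefficient close to the reported 0.14. The paper does not state exactly which figures were used, so small differences are possible.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Latin American sites from Table 1, excluding the Universidad de Chile,
# in table order (uanl.mx first, uaca.ac.cr last).
external_wif = [1.83, 1.39, 1.31, 1.28, 1.27, 0.95, 0.83, 0.80, 0.77,
                0.72, 0.61, 0.59, 0.57, 0.49, 0.47, 0.44, 0.36, 0.14]
english_pct = [4.6, 10.2, 6.5, 16.8, 17.2, 40.9, 9.0, 10.7, 15.0,
               19.4, 11.7, 19.1, 19.2, 8.7, 2.3, 7.7, 5.1, 0.5]

print(round(pearson(external_wif, english_pct), 2))
```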

Another factor is that Internet search engines tend to cover the most popular sites [LAW99]. The English-language dominance of the Web may therefore lead search engines such as AltaVista to concentrate their coverage on English-language sites, biasing the link measurements in favour of those sites. A further significant factor may be that web site development in Latin American institutions has not advanced as far as in Australasia. The average number of web pages per staff member is 16.2 for the Australasian sites, compared with 0.2 for the Latin American institutions.

 

Conclusion

This study has been exploratory, and there is scope for further webometric research in this area. It would be useful to carry out a more comprehensive study, comparing more institutions and relating Web presence to conventional publication output and to indicators of economic and technological development.

The growing importance of the World Wide Web as a means of making knowledge available has changed the way we view the study of information. Bibliometric techniques can be extended to the Web, and Web Impact Factors provide one way to compare the influence of web sites. A comparison of research and education web sites in Australasia and Latin America raises interesting questions about the place of different cultures and languages on the Web. Australasia and Latin America both lie outside the main Web area, which is dominated by the USA and Europe. It appears that Australasian web sites may achieve higher visibility on the Web because of closer cultural and linguistic links with the current mainstream of the Internet, which is dominated by Anglophone North America and Europe. It also appears that Latin American sites with more English-language content may achieve greater recognition than those in the local languages. This is a warning to the citizens of cyberspace. If the Internet is dominated by English-language sites, important knowledge created in non-English-speaking areas may be missed, or its recognition delayed, as happened with the initial concept of the Web Impact Factor used in this research.

References:

[ALM97] Almind, T C; Ingwersen, P (1997). Informetric analyses on the World Wide Web: methodological approaches to "Webometrics". Journal of Documentation, 53(4):404-426.

[CRO96] Cronin, B; McKim, G (1996). Science and scholarship on the World Wide Web: a North American perspective. Journal of Documentation, 52(2):163-171.

[CYB] Cybermetrics (1997-) http://www.cindoc.csic.es/cybermetrics/ (visited 23 June 1999)

[ING98] Ingwersen, P (1998). The calculation of web impact factors. Journal of Documentation, 54(2):236-243.

[LAR96] Larson, R R (1996). Bibliometrics of the world wide web: an exploratory analysis of the intellectual structure of cyberspace. http://sherlock.berkeley.edu/asis96/asis96.html (visited 23 June 1999)

[LAW99] Lawrence, S; Giles, C L (8 July 1999). Accessibility of information on the web. Nature, 400:107-109.

[NOT99] Notess, G (5 May 1999). Search Engine Statistics: Database Size. http://www.notess.com/search/stats/size.html (visited 23 June 1999)

[ROU97] Rousseau, R (1997). Sitations: an exploratory study. Cybermetrics, 1(1). http://www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html (visited 23 June 1999)

[ROD97] Rodríguez i Gairín, J M (1997). Valorando el impacto de la información en Internet: Altavista, el "Citation Index" de la Red. [Impact assessment of information on the Internet: AltaVista, the citation index of the Web] Revista Española de Documentación Científica, 20(2):175-181. Also available at http://www.kronosdoc.com/publicacions/altavis.htm (visited 23 June 1999)

[SMI99] Smith, Alastair (1999). ANZAC webometrics: exploring Australasian Web structures. In Proceedings of Information Online and On Disc 99: Strategies for the next millennium. Sydney, Australia, 19-21 January 1999. [Sydney]: ALIA: 159-181. Also available at http://www.csu.edu.au/special/online99/proceedings99/203b.htm (visited 15 July 1999)

[WHY99] Why use Google? http://www.google.com/why_use.html (visited 23 June 1999)

[WOR99] The World of Learning (1999). London: Europa Publications.

 

Table 1. Web Impact Factors of university and research web spaces in Australasia and Latin America

Country | Institution | URL | Web pages (D) | External WIF | Internal WIF | Teaching staff | English pages (%)

Australasian web spaces
Au | University of Queensland | uq.oz.au | 4533 | 3.58 | 0.11 | 1560 | 95.6
Au | Australian National University | anu.edu.au | 44938 | 2.00 | 0.56 | 725 | 96.5
Nz | Victoria University of Wellington | vuw.ac.nz | 9056 | 1.77 | 0.51 | 474 | 93.8
Au | University of Melbourne | unimelb.edu.au | 42944 | 1.56 | 0.68 | 1893 | 97.2
Nz | University of Auckland | auckland.ac.nz | 8657 | 1.36 | 0.44 | 1175 | 87.2
Au | La Trobe University | latrobe.edu.au | 11473 | 1.20 | 0.49 | 2442 | 94.9
Au | Monash University | monash.edu.au | 45981 | 1.06 | 0.52 | 1657 | 95.4
Au | University of Technology, Sydney | uts.edu.au | 12365 | 1.03 | 0.54 | 1100 | 95.1
Au | University of South Australia | unisa.edu.au | 9076 | 1.01 | 0.60 | 1025 | 70.4
Au | University of Sydney | usyd.edu.au | 35185 | 0.92 | 0.43 | 1816 | 95.0
Au | Victoria University of Technology | vut.edu.au | 4188 | 0.90 | 0.45 | 1200 | 97.7
Au | Curtin University | curtin.edu.au | 18314 | 0.75 | 0.54 | 1130 | 96.1
Au | Royal Melbourne Institute of Technology (RMIT) | rmit.edu.au | 25988 | 0.71 | 0.49 | 1554 | 95.0

Latin American web spaces
Cl | Universidad de Chile | uchile.cl | 769 | 16.10 | 0.43 | 4369 | 10.0
Mx | Universidad Autónoma de Nuevo León (Mexico) | uanl.mx | 926 | 1.83 | 0.33 | 29979 | 4.6
Cl | Universidad de Santiago de Chile | usach.cl | 766 | 1.39 | 0.45 | 2025 | 10.2
Co | Universidad Nacional de Colombia | unal.edu.co | 739 | 1.31 | 0.43 | 9137 | 6.5
Ar | Universidad Nacional de La Plata (Argentina) | unlp.edu.ar | 2650 | 1.28 | 0.37 | 6300 | 16.8
Mx | Universidad de Guadalajara (Mexico) | udg.mx | 9040 | 1.27 | 0.34 | 10269 | 17.2
Ar | Universidad Nacional de Córdoba (Argentina) | uncor.edu | 1571 | 0.95 | 0.44 | 6918 | 40.9
Ar | Universidad de Buenos Aires | uba.ar | 7699 | 0.83 | 0.36 | 21864 | 9.0
Br | Universidade Federal da Paraíba (Brazil) | dsc.ufpb.br | 796 | 0.80 | 0.46 | 2930 | 10.7
Cl | Pontificia Universidad Católica de Chile | puc.cl | 5032 | 0.77 | 0.39 | 2413 | 15.0
Br | Universidade de São Paulo | usp.br | 29808 | 0.72 | 0.46 | 4953 | 19.4
Mx | Instituto Tecnológico y de Estudios Superiores de Monterrey (Mexico) | itesm.mx | 21970 | 0.61 | 0.47 | 6201 | 11.7
Br | Universidade Estadual Paulista | unesp.br | 4526 | 0.59 | 0.36 | 3400 | 19.1
Br | Universidade Federal do Rio de Janeiro | ufrj.br | 14983 | 0.57 | 0.38 | 3580 | 19.2
Mx | Universidad Nacional Autónoma de México | unam.mx | 40243 | 0.49 | 0.29 | 29979 | 8.7
Pe | Universidad Nacional Mayor de San Marcos | unmsm.edu.pe | 907 | 0.47 | 0.48 | 3150 | 2.3
Mx | Universidad Veracruzana (Mexico) | uv.mx | 2611 | 0.44 | 0.33 | 4173 | 7.7
Ar | Universidad Tecnológica Nacional (Argentina) | utn.edu.ar | 1459 | 0.36 | 0.33 | 16185 | 5.1
Cr | Universidad Autónoma de Centro América (Costa Rica) | uaca.ac.cr | 748 | 0.14 | 0.98 | 2500 | 0.5

 

Graph 1. External Web Impact Factor vs. percentage of English-language pages for Latin American web sites.