


5 stars ANSWERED on Fri 2 Nov 2007 - 3:24 pm UTC by davidsarokin

Question: What is the average number of links on a web page?


Priced at $25.00
The customer tipped the researcher $5.00


Asked by kevin2kelly on Thu 1 Nov 2007 - 7:31 pm UTC:

I have some older references for this question but I am looking for more
recent and more reliable figures. Older sources include:

Prefetching Hyperlinks, 1999, by Dan Duchamp -
http://www.sagecertification.org/publications/library/proceedings/usits99/full_papers/duchamp/duchamp_html/doc004.html
Finds an average of 22.6 links per page.

A 2000 article referring to the now-defunct company Linkguard:
http://news.bbc.co.uk/1/hi/sci/tech/790685.stm
Finds 52 links per page.

Calculated from a secondary figure in a 2006 article on estimating PageRank:
http://portal.acm.org/citation.cfm?id=1150419
Finds 4.2 links per page on 4 million .edu pages and 3.9 links per page on
political pages.

That is such a wide range that I'd like source(s) I can have more confidence in.

Uclue Researcher Request for clarification by Researcher bobbie7 on Fri 2 Nov 2007 - 5:18 am UTC:

Hello again Kevin2kelly,

I located a web standards audit of 105 Australian Government web sites
performed during December 2006
http://gdispain.site.net.au/standards/ag-website-audit-dec06/


The spreadsheet at the link below contains the data for the 105
websites.
http://gdispain.site.net.au/standards/ag-website-audit-dec06/pubs/ag-website-audit-dec06.xls


If the above link doesn't work, you may download the web site audit data
(XLS - 180 KB) from page 47.   
http://gdispain.site.net.au/standards/ag-website-audit-dec06/


Column DQ lists the average number of links per webpage for each of the 105
websites.


For example, here are the figures for the first nine websites.


URL                      Average number of links per web page
www.aad.gov.au                 50
www.abs.gov.au                 57
www.accc.gov.au                38
www.accesscard.gov.au          41
www.afma.gov.au               102
www.afp.gov.au                 51
www.ag.gov.au                  33
www.agimo.gov.au               47
www.ags.gov.au                 38


The audit puts the average number of links per web page across all 105
websites at 43.5.
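A quick check using only the nine rows listed above (not the full 105-site spreadsheet, so it cannot reproduce the 43.5 figure): this Python sketch takes the unweighted mean and the median of those nine per-site averages. The single high value for www.afma.gov.au (102) pulls the mean to about 50.8, above the median of 47. An unweighted mean of site averages also counts every site once, however many pages it has; whether the audit's 43.5 was computed that way or over all pages pooled together is not stated here, and the sketch does not settle it.

```python
import statistics

# Average links per page for the first nine sites listed above
# (a subset of the 105 audited sites; figures copied from the table).
site_averages = {
    "www.aad.gov.au": 50,
    "www.abs.gov.au": 57,
    "www.accc.gov.au": 38,
    "www.accesscard.gov.au": 41,
    "www.afma.gov.au": 102,
    "www.afp.gov.au": 51,
    "www.ag.gov.au": 33,
    "www.agimo.gov.au": 47,
    "www.ags.gov.au": 38,
}

values = list(site_averages.values())
mean = statistics.mean(values)      # unweighted: every site counts once
median = statistics.median(values)  # middle value of the sorted list

print(f"mean of site averages:   {mean:.1f}")
print(f"median of site averages: {median}")
```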


Would these figures work for you?


Thanks, 
Bobbie

Question clarification by kevin2kelly on Fri 2 Nov 2007 - 6:41 am UTC:

No, the sample size of 100 websites is so small as to be meaningless,
especially since they were all .gov sites.

Uclue Researcher Request for clarification by Researcher bobbie7 on Fri 2 Nov 2007 - 7:18 am UTC:

Kevin2kelly,

I'll try again and if I find more relevant figures I'll let you know. In
the meantime, I am unlocking this question so that other researchers can
take a crack at it as well.

Bobbie

Uclue Researcher 5 stars Answer by Researcher davidsarokin on Fri 2 Nov 2007 - 3:24 pm UTC:

kevin2kelly,

Actually, the range you cited in your question probably isn't as large as
it first appears.  From what I can see of your third source, "Estimating
the Global PageRank of Web Communities", the count of links was restricted
to .edu links, and is not representative of the web as a whole.

And thereby hangs a tale.  Counting links is not straightforward.  There's
a huge difference between what the eye sees in viewing a site, and what the
spider is instructed to see when crawling the same site.  Studies of link
statistics may include or exclude, as they see fit, all sorts of
hyperlinks, such as advertising links, image hyperlinks, duplicated links
on the same page, and so on.  
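To make the effect of these choices concrete, here is a minimal Python sketch using the standard library's html.parser. The two counting rules (anchor-only `<a href>` versus anchors plus `<img src>`) and the sample page are assumptions for illustration, not the method of any study cited here. The ad link is simply counted, since filtering ads would need yet another rule.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collects link URLs under two hypothetical rules:
    anchor-only (<a href>) and 'all links' (<a href> plus <img src>)."""

    def __init__(self):
        super().__init__()
        self.anchor_links = []  # URLs from <a href="...">
        self.all_links = []     # anchor URLs plus image URLs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.anchor_links.append(attrs["href"])
            self.all_links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.all_links.append(attrs["src"])


def count_links(html):
    """Return link counts under each rule, with and without duplicates."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return {
        "anchors": len(parser.anchor_links),
        "anchors_unique": len(set(parser.anchor_links)),
        "all": len(parser.all_links),
        "all_unique": len(set(parser.all_links)),
    }


# Made-up page: a navigation link repeated in header and footer,
# one ad link, and one image.
page = """
<a href="/home">Home</a>
<a href="/about">About</a>
<a href="http://ads.example.com/click">Ad</a>
<img src="/logo.png">
<a href="/home">Home</a>
"""

print(count_links(page))
# prints {'anchors': 4, 'anchors_unique': 3, 'all': 5, 'all_unique': 4}
```

The same tiny page yields anywhere from 3 to 5 "links" depending on the rule, which is the kind of spread that makes figures from different studies hard to compare.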

In addition, the web appears to be strongly skewed: a few large sites house
tens or even hundreds of thousands of links, so any single measure of
central tendency is hard to interpret, and simple means (averages) tell
quite a different tale than medians.
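A toy illustration of that skew, with made-up numbers: if 99 pages carry 20 links each and one directory-style page carries 5,000, the mean jumps to 69.8 while the median stays at 20. The sketch uses Python's statistics module; the page counts are hypothetical.

```python
import statistics

# Hypothetical crawl: 99 ordinary pages with 20 links each,
# plus one huge directory-style page with 5,000 links.
links_per_page = [20] * 99 + [5000]

print("mean:  ", statistics.mean(links_per_page))    # dragged up by one page
print("median:", statistics.median(links_per_page))  # unaffected by the outlier
```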

On top of these complexities, most of the data on link statistics comes
from cyber-academics studying the composition of the web, and their
insistent refusal to speak anything resembling English often makes
interpreting and comparing their results difficult, and sometimes
impossible.

With that whiny caveat to kick things off, the best overview I came across
of link stats is -- far and away -- this 2003 study performed by Microsoft,
IBM and HP researchers:

http://research.microsoft.com/research/sv/sv-pubs/p96-broder/p96-broder.pdf
Efficient URL Caching for World Wide Web Crawling


In it, the researchers conducted a large-scale crawl of several hundred
million pages over the course of several weeks.  Their findings:

"...These pages contained about 26.83 billion links, equivalent to an
average of 62.55 links per page; however, the median number of links per
page was only 23, suggesting that the average is inflated by some pages
with a very high number of links..." 
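As a back-of-envelope check on the quoted figures (both rounded), dividing the total link count by the mean gives the implied number of pages, about 429 million, which fits the "several hundred million pages" described above:

```latex
N_{\text{pages}} \approx \frac{26.83 \times 10^{9}\ \text{links}}{62.55\ \text{links/page}} \approx 4.29 \times 10^{8}\ \text{pages}
```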


So there you have it: an average of about 62.5 links per page, or a median
of 23 links... take your pick.  Either way, the numbers are not wildly
different from the first two citations in your question, and look to be
quite consistent with the Australian data that bobbie7 cited.


Note, however, that the researchers counted *all* links on a page,
including things like image links, in contrast to some other counting
methods that count only links in anchor tags.  Second, the authors note
that, unlike some other studies, they counted each and every link, even if
it was a duplicate on the same page:

 
"...most studies report the number of unique links per page. The numbers
above include duplicate copies of a link on a page. If we only consider
unique links per page, then the average number of links is 42.74 and the
median is 17..."
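Because the two averages (62.55 with duplicates, 42.74 without) are presumably taken over the same set of pages, their ratio equals the ratio of the underlying link totals. On that assumption, roughly a third of all link occurrences in the crawl were repeats of a link already present on the same page:

```latex
\frac{\bar{L}_{\text{unique}}}{\bar{L}_{\text{all}}} = \frac{42.74}{62.55} \approx 0.683
\quad\Longrightarrow\quad
1 - 0.683 \approx 0.32
```

The same shortcut does not work for the medians (23 and 17), which are not tied to the link totals in this way.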


Last point:  the authors note that their numbers are somewhat larger than
other data in the literature:


"...Earlier studies reported only an average of 8 links or 17 links per
page..."


and speculate that, in addition to counting all links and duplicate links,
their numbers are larger because their crawl had the capacity to include
very large webpages, which other studies were not able to include due to
memory limitations.  Thus, they were able to include in their counts some
mega-large pages with many thousands of links.  I would point out, however,
that their median of 17 unique links per page is not substantially
different from some of the other studies available.


Not all studies are as clear as this one about what was actually counted,
and how.  I've included a few of these studies below; only some are
directly linkable, and the rest were accessed through subscription
databases.




[analysis by Brazilian researchers of several hundred thousand web pages,
though what exactly was counted is not clear]
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY,
57(2):208–221, 2006
Link-Based Similarity Measures for the Classification of Web Documents

"...TodoBR provides 40,871,504 links between Web pages (an average of 6.9
links per page)..."
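Taken at face value, the two numbers in this quote imply the size of the page collection behind the link statistics (6.9 is rounded, so this is approximate):

```latex
N_{\text{pages}} \approx \frac{40{,}871{,}504\ \text{links}}{6.9\ \text{links/page}} \approx 5.9 \times 10^{6}\ \text{pages}
```

That is roughly six million pages, more than the several hundred thousand mentioned in the note above, which may mean the link figures describe a larger collection than the pages that were actually classified.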



Stochastic models for the web graph
R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E.
Upfal
FOCS ’00: Proceedings of the 41st Annual Symposium
on Foundations of Computer Science
IEEE Computer Society,
2000, p. 57.

"The web may be viewed as a directed graph each of whose vertices is a
static HTML web page, and each of whose edges corresponds to a hyperlink
from one web page to another...[with] an average degree of about 7..."

[The above is widely cited (and interpreted) as meaning there is an average
of 7 links per page, though it's not clear to me what the authors actually
counted, or how they derived this figure]






http://www.iit.cnr.it/staff/marco.pellegrini/papiri/www015-pellegrini.pdf
Extraction and Classification of Dense Communities in the Web
May 8–12, 2007

"...Andrei Broder et al. [6] in the year 2000 estimated the size of the
indexable web graph at 200M pages and 1.5G edges (thus an average degree
about 7.5 links per page, which is consistent with the average degree 8.4
of the WebBase data of 2001)..."
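For a directed graph, every edge adds one to the out-degree of the page it leaves and one to the in-degree of the page it points to, so the average out-degree and the average in-degree are both |E|/|V|. Assuming each edge is one hyperlink counted once (duplicates on a page collapsed), that average is the same thing as links per page, which is how the 7.5 figure above is derived:

```latex
\bar{d}_{\text{out}} = \bar{d}_{\text{in}} = \frac{|E|}{|V|} = \frac{1.5 \times 10^{9}}{2 \times 10^{8}} = 7.5
```

If "degree" instead meant in-degree plus out-degree, the same graph would give 2|E|/|V| = 15, so it matters which convention a paper uses.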

[Again, it is not entirely clear what was counted, or how.  WebBase refers
to an existing set of researcher-accessible data about the web, described
in this reference from the above article:  J. Cho and H. Garcia-Molina.
WebBase and the Stanford InterLib project. In 2000 Kyoto International
Conference on Digital Libraries: Research and Practice, 2000.]





http://www.dcc.ufla.br/infocomp/artigos/v5.2/art07.pdf
Assessment of WWW-Based Ranking Systems for Smaller Web Sites
2006

"...The database contains 7312 pages, of which 2728 are HTML pages with
outgoing links. There are a total of 22970 hyperlinks, yielding an average
of approximately 8.42 outgoing hyperlinks per HTML page..."
[This small-sample study counts only outgoing links, excluding internal
links to other pages on the same website]
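Two details shape this figure: which links count as "outgoing," and what the total is divided by. The Python sketch below classifies links by comparing host names after resolving relative URLs with urllib.parse; the same-host rule, the function name split_links and the example URLs are hypothetical, not the study's method. The last lines redo the quoted arithmetic: 22,970 links over the 2,728 HTML pages with outgoing links gives 8.42, while spreading them over all 7,312 items in the database, including non-HTML ones, gives about 3.14.

```python
from urllib.parse import urljoin, urlparse


def split_links(page_url, hrefs):
    """Split a page's links into internal (same host) and external.

    'Same host' is a hypothetical rule for illustration; a study could
    instead compare registered domains or treat subdomains differently.
    """
    page_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links first
        host = urlparse(absolute).netloc.lower()
        if host == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


# Made-up example page and links.
internal, external = split_links(
    "http://www.example.edu/dept/index.html",
    ["people.html", "/news/", "http://www.example.org/partner", "https://other.example.com/"],
)
print(len(internal), "internal,", len(external), "external")

# The quoted average divides by HTML pages *with* outgoing links only.
total_links = 22970
print(round(total_links / 2728, 2))  # per HTML page with outgoing links
print(round(total_links / 7312, 2))  # per item in the whole database
```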







18th International Workshop on Database and Expert Systems Applications
Hyperlink Classification: A New Approach to Improve PageRank
DEXA '07: 18th International Conference on Database and Expert Systems
Applications, 3-7 Sept. 2007, pp. 274-277

"...We fetched about 21,717 pages by open source search engine Nutch. We
find that there are about 82 hyperlinks on each page on average, however,
there are only twenty hyperlinks or even less are relating about the
page’s topic while most of the hyperlinks are about the information about
the whole Web site map or the ads..."

[appears to have been Chinese pages, but it's not totally clear]




Semantic prefetching objects of slower web site pages 
Journal of Systems and Software 
Volume 79, Issue 12, December 2006, Pages 1715-1724    

"...The average web page contains 8.87 images per page, ranging from 1 to
25 images, with a standard deviation of 4.27. The average number of
hyperlinks per web page is 10.70, ranging from 1 to 26 hyperlinks, with a
standard deviation of 4.49..."

[appears to be a small sample size]





I trust the information here meets your needs (hopefully in a
non-controversial manner).  But if there's anything more I can do for you,
just say the word.

5 stars Accepted and rated by kevin2kelly on Fri 2 Nov 2007 - 4:18 pm UTC:

David,

That's impressive, and I appreciate your wrap up. Since I am interested in
counting ALL links including invisible, duplicate, and advertising ones,
the answer you found is perfect. That it also confirms (to an order of
magnitude) most of the other smaller studies, including the tiny one that
bobbie found, means I now have excellent confidence in the number.

Thank you.

-- kk




© 2010 Uclue Ltd