Excited about your website notching up hits? Pleased with that counter on your web page clicking over week after week? Perhaps you shouldn't be, at least not until you have taken a closer look at what lies behind those figures. With current technology, counting the visitors to a web site is a bit like measuring the water in a dam with a rusty old sieve.
The problems with hits…
…are numerous, but let's take a look at some of the major issues. Caching is one
of them. Service providers and large companies generally set their new users
up to point at what is called a proxy server. A proxy server reads in the
contents of popular web sites and stores them locally. When you request a page
from a website, let's use CNN.COM as an example, the proxy first looks in its
own store of pages. If the page you are requesting is already there, the proxy
server compares its copy of the page with the original on CNN.COM to see if it
has changed. If it hasn't, the proxy simply shows you its copy and CNN.COM is
none the wiser. If the proxy doesn't have the latest copy, it pulls it in and
stores it locally for you and the next users. This is a major performance
advantage for users, because they get their pages from across town instead of
across the world. Unfortunately, the owners of CNN.COM never learn that
someone has looked at their page. Sites that use hits to calculate how many
people are viewing them never hear about the hits delivered from proxies
located all around the world.
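The caching behaviour described above can be sketched in a few lines of code. Everything here is illustrative: the classes and the in-memory "origin" are stand-ins, and a real proxy would speak HTTP, using the If-Modified-Since request header and 304 Not Modified responses to do the "has it changed?" check.

```python
class Origin:
    """Stands in for a site such as CNN.COM."""
    def __init__(self):
        self.pages = {}   # url -> (content, last_modified)
        self.hits = 0     # hits the site owner actually sees

    def publish(self, url, content, last_modified):
        self.pages[url] = (content, last_modified)

    def fetch(self, url, if_modified_since=None):
        content, modified = self.pages[url]
        if if_modified_since is not None and modified <= if_modified_since:
            return ("304 Not Modified", None)  # cheap check, no full page served
        self.hits += 1                         # origin serves (and logs) a full page
        return ("200 OK", (content, modified))


class Proxy:
    """Caches pages so later readers are served locally."""
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}   # url -> (content, last_modified)

    def get(self, url):
        if url in self.cache:
            cached_content, cached_modified = self.cache[url]
            status, body = self.origin.fetch(url, if_modified_since=cached_modified)
            if status == "304 Not Modified":
                return cached_content          # served from the local store
        else:
            status, body = self.origin.fetch(url)
        self.cache[url] = body                 # keep the fresh copy for the next reader
        return body[0]
```

Run two readers through the proxy and the origin's hit count only moves once; every later reader is served from across town, invisibly to the site owner.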
The next problem is with hits themselves. Without going into the intricacies of what can be done on a web page, a typical page consists of text and graphics. A page of text alone registers 1 hit when a browser views it. A page containing text and 10 graphics delivers 11 hits: 1 for the text and 1 for each graphic. As you can imagine, this badly distorts your web site figures: the more graphics you put on a site, the more hits you receive.
On the flip side of this rather distorted coin lie the search engines. A quick survey I did found more than 200 companies making their living by exploring the web and cataloguing every page they can lay their hands on. These companies use sophisticated computers to browse the web and follow every link on every page. If we take our Stones site as an example, a search engine would start at our home page and then follow every link to our unit trust site, our restaurant guide, our business directory, Hyperactive (our web design company) and so on. At each of those pages it would again follow links to the next level, until it could find no more pages.
Companies that publish web sites rarely take into account the fact that search engines may account for a significant number of the hits on their site. Those that do may quickly (and correctly) conclude that the more pages they have on their site, the more hits they will get from the search engines. One search engine working its way through a site of 10 pages, each with 5 graphics, will notch up 60 hits (10 text-page hits plus 5 graphic hits on each of the 10 pages). If that same site rearranged itself so that, instead of displaying its information on 10 pages, it split the information across 20 pages with links between them, the hits for the site would double (20 text-page hits plus 5 graphic hits on each of the 20 pages) from that one search engine alone. Now take the 200 or so search engines that may be working their way through a site at any one time…
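The example figures above can be checked with a one-line sketch (again, the function name is mine, the numbers come from the example):

```python
# Hits a single search engine generates crawling a whole site:
# one text hit per page, plus one hit per graphic on each page.
def crawl_hits(pages, graphics_per_page):
    return pages * (1 + graphics_per_page)

# 10 pages x 5 graphics: crawl_hits(10, 5) -> 60 hits
# split across 20 pages: crawl_hits(20, 5) -> 120 hits, exactly double
```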
User sessions: a much more accurate way of tracking usage
Most reasonable web statistics software records not only hits but user
sessions. A timeout is set, perhaps 15 minutes (which is what we use), and a
person who enters a site and remains active (never pausing for longer than 15
minutes) counts as one user session. This means that Joe Bloggs entering a
site at 8:15 on a Monday morning and browsing through 30 or 40 pages (perhaps
150 to 200 hits) without a break of longer than 15 minutes would count as 1
user session. Likewise, a search engine doing the same thing would record only
1 user session, even if it spent 6 hours perusing the site. The search engine
dilemma mentioned above is therefore unlikely to distort user session
statistics to any meaningful degree.
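Rolling a raw hit log into user sessions might look like the sketch below. The 15-minute timeout follows the description above; the log format and function name are illustrative, not taken from any real statistics package.

```python
TIMEOUT = 15 * 60  # the session timeout, in seconds

def count_sessions(hit_log, timeout=TIMEOUT):
    """hit_log: list of (visitor_id, unix_timestamp) pairs, one per hit.

    A visitor's hit starts a new session only if more than `timeout`
    seconds have passed since their previous hit.
    """
    last_seen = {}   # visitor -> timestamp of their previous hit
    sessions = 0
    for visitor, ts in sorted(hit_log, key=lambda hit: hit[1]):
        previous = last_seen.get(visitor)
        if previous is None or ts - previous > timeout:
            sessions += 1        # first hit, or the visitor was away too long
        last_seen[visitor] = ts
    return sessions
```

Joe's 30 or 40 pages with no long pauses collapse to 1 session, and a search engine grinding through the site for 6 hours without a break also collapses to 1 session, which is exactly why sessions resist the distortions that plague raw hits.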
If this is taken a step further and unique user sessions are measured, an even more accurate picture of web site usage emerges. Unique user sessions count only the different people who browse a website, so that even if a person spends more than 15 minutes away from a site, they are still measured as one user.
The way forward
The measurement of web site visitors is likely to remain an inexact science
until research companies step into the arena and turn web site statistics into
a business. At that point standards can be agreed and webmasters will be
forced to report usage in a standard manner. Until then, we will always be
attempting to make sense of disparate reporting that depends only on the whim
of the person doing it.