Hits (and misses)

Excited about your website notching up hits? Pleased with that counter on your web page clicking over week after week? Perhaps you shouldn’t be, at least not until you have taken a closer look at what lies behind those figures. Current technology makes calculating the number of visitors to a web site a bit like measuring the water in a dam using a rusty old sieve.

The problems with hits…
…are numerous, but let’s take a look at some of the major issues. Caching is one of them. Service providers and large companies generally set their new users up to point at what is called a proxy server. A proxy server reads in the contents of popular web sites and stores them locally. When you request a page from a website, let’s use CNN.COM as an example, the proxy first looks in its own store of pages. If the page you are requesting is already there, the proxy server checks whether its copy is still up to date. If it is, the proxy just shows you the copy it has, without passing your request for the page on to CNN.COM. If it doesn’t have the latest copy, it pulls it in and stores it locally for you and the next users. This has a major performance advantage for users because they get their pages from across town instead of across the world. Unfortunately, the owners of CNN.COM are none the wiser that someone has looked at their page. Sites that use hits to calculate how many users are viewing their site never hear about the hits that get delivered from proxies located all around the world.
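A rough way to see the effect is a toy model in Python. Everything in it is an assumption made for illustration: the class names OriginSite and CachingProxy are invented, the proxy treats every stored copy as up to date, and 500 users is an arbitrary figure. It is a sketch of the idea, not a description of how any real proxy works.

```python
class OriginSite:
    """Stands in for the real website; it only sees requests that reach it."""

    def __init__(self, pages):
        self.pages = pages
        self.hits = 0  # what the site's own hit counter would report

    def fetch(self, path):
        self.hits += 1
        return self.pages[path]


class CachingProxy:
    """Toy proxy (hypothetical): keeps a local copy of every page it fetches."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def fetch(self, path):
        if path not in self.store:
            # First request for this page: pull it in and keep a copy.
            self.store[path] = self.origin.fetch(path)
        # Simplification: a stored copy is always treated as up to date.
        return self.store[path]


site = OriginSite({"/index.html": "front page"})
proxy = CachingProxy(site)

views = 0
for _ in range(500):  # 500 users behind the same proxy (arbitrary figure)
    proxy.fetch("/index.html")
    views += 1

print(views, site.hits)  # prints: 500 1
```

In this model five hundred people read the page, yet the site's own counter records a single hit.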

The next problem is with hits themselves. Without going into the intricacies of what can be done on a web page, a typical web page consists of text and graphics. A web page that consists of only text registers 1 hit when a browser looks at it. A page that contains text and 10 graphics will deliver 11 hits when a browser looks at it: 1 for the text and 1 for each of the graphics. As you can imagine, this causes a major distortion in your web site figures. The more graphics you put on a site, the more hits you receive.
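The counting can be sketched in a few lines of Python. The function name count_hits, the list of file extensions and the example paths are all assumptions chosen for illustration; real statistics software will have its own rules for what counts as a graphic.

```python
import os

# Assumed set of graphic file types; adjust to suit the site being measured.
GRAPHIC_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}


def count_hits(requested_paths):
    """Split the raw hits for a list of requested files into text and graphics."""
    graphics = sum(
        1
        for path in requested_paths
        if os.path.splitext(path)[1].lower() in GRAPHIC_EXTENSIONS
    )
    total = len(requested_paths)
    return {"total": total, "text": total - graphics, "graphics": graphics}


# One browser looking at one page that carries 10 graphics (hypothetical paths).
one_view = ["/news.html"] + [f"/images/pic{i}.gif" for i in range(10)]
print(count_hits(one_view))  # prints: {'total': 11, 'text': 1, 'graphics': 10}
```

One person looked at one page, but the raw hit count says 11.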

On the flip side of this rather distorted coin lie the search engines. A quick survey I did found more than 200 companies that are making their living by exploring the web and cataloguing every page they can lay their hands on. These companies use sophisticated computers to browse the web and follow every link on every page. If we take our Stones site as an example, a search engine would start at our home page and then follow every link to our unit trust site, our restaurant guide, our business directory, Hyperactive (our web design company) and so on. At each of those pages it would again follow links to the next level until it could find no more pages.

Companies that publish web sites don’t often take into account the fact that search engines may account for a significant number of hits on their site. Those that do may quickly (and correctly) reach the conclusion that the more pages they have on their site, the more hits they will get from the search engines. One search engine working its way through a site with 10 pages, each with 5 graphics, will notch up 60 hits (10 text page hits plus 10 x 5 graphics hits). If that same site were to re-arrange itself so that instead of displaying that information on 10 pages, it split the information across 20 pages with links between them, each page still carrying 5 graphics, the hits for the site would double to 120 (20 text page hits plus 20 x 5 graphics hits) just from that one search engine. Now take the 200 or so search engines that may be working their way through a site at any one time…
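The arithmetic is easy to check. The sketch below uses an invented helper, crawl_hits, and assumes that every search engine visits every page once and pulls in every graphic on it, which is a simplification.

```python
def crawl_hits(pages, graphics_per_page, engines=1):
    """Hits generated when each engine fetches every page and all its graphics."""
    hits_per_page = 1 + graphics_per_page  # 1 for the text, 1 per graphic
    return pages * hits_per_page * engines


print(crawl_hits(10, 5))               # prints: 60
print(crawl_hits(20, 5))               # prints: 120
print(crawl_hits(20, 5, engines=200))  # prints: 24000
```

Under those assumptions, 200 search engines each making one pass through the 20-page version of the site would leave 24,000 hits in the logs without a single human visitor.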

Users: a much more accurate way of tracking usage
Most reasonable web statistics software records not only hits but also user sessions. A timeout is set, perhaps 15 minutes (which is what we use), and a person entering a site and remaining active (not pausing for longer than 15 minutes) counts as one user session. This means that Joe Bloggs entering a site at 8:15 on a Monday morning and browsing through 30 or 40 pages (perhaps 150 to 200 hits) without a break of longer than 15 minutes would count as 1 user session. Likewise, a search engine doing the same thing would also record only 1 user session, even if it spent 6 hours perusing a site. The search engine dilemma mentioned above is therefore unlikely to distort user session statistics to any meaningful degree.
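As a minimal sketch of how such a timeout might be applied, the Python below counts sessions for a single visitor from the times of their requests. The function name count_sessions, the calendar date and the spacing between requests are assumptions made for the example; only the 15-minute timeout comes from the text above.

```python
from datetime import datetime, timedelta


def count_sessions(request_times, timeout=timedelta(minutes=15)):
    """Count user sessions for one visitor, given the times of their requests."""
    sessions = 0
    previous = None
    for moment in sorted(request_times):
        # A first request, or a pause longer than the timeout, starts a session.
        if previous is None or moment - previous > timeout:
            sessions += 1
        previous = moment
    return sessions


start = datetime(2000, 1, 3, 8, 15)  # 8:15 on a Monday; the date is arbitrary

# Joe Bloggs: 40 pages, one every 2 minutes (assumed pace).
joe = [start + timedelta(minutes=2 * i) for i in range(40)]

# A search engine: one request a minute for 6 hours (assumed pace).
crawler = [start + timedelta(minutes=i) for i in range(360)]

print(count_sessions(joe))      # prints: 1
print(count_sessions(crawler))  # prints: 1

# If Joe comes back after a 20-minute break, a second session begins.
print(count_sessions(joe + [joe[-1] + timedelta(minutes=20)]))  # prints: 2
```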

If this is taken a step further and unique user sessions are measured, then it is likely that a more accurate picture of web site usage will appear. Unique user sessions count only the different people who browse a website, so that even if a person spends more than 15 minutes away from a website and then returns, they are still measured as one user.
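The difference between the two measures can be sketched as follows. The function name summarise and the visitor labels are invented for illustration, and the sketch assumes each request can already be tied to a particular visitor, which in practice is the hard part.

```python
from datetime import datetime, timedelta


def summarise(requests, timeout=timedelta(minutes=15)):
    """Return (user sessions, unique users) for (visitor, time) pairs."""
    last_seen = {}
    sessions = 0
    for visitor, moment in sorted(requests, key=lambda pair: pair[1]):
        previous = last_seen.get(visitor)
        # A new visitor, or a pause longer than the timeout, starts a session.
        if previous is None or moment - previous > timeout:
            sessions += 1
        last_seen[visitor] = moment
    return sessions, len(last_seen)


morning = datetime(2000, 1, 3, 8, 15)  # arbitrary date
requests = [
    ("joe", morning),
    ("joe", morning + timedelta(minutes=5)),
    ("joe", morning + timedelta(hours=3)),      # back after a long break
    ("crawler", morning + timedelta(minutes=1)),
]

print(summarise(requests))  # prints: (3, 2)
```

Joe accounts for two sessions because of his long break, but he is still counted as a single unique user.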

The way forward
The measurement of web site visitors is likely to remain an inexact science until such time as research companies step into the arena and turn web site statistics into a business. At that time standards can be agreed and webmasters will be forced to report usage in a standard manner. Until then, it is likely that we will always be attempting to make sense of disparate reporting which depends only on the whim of the person doing the reporting.

Author: Dale Williams

Dale is based in Cape Town on the southern tip of Africa from where he maintains connections with people all over the world through his portfolio life.