Web Moves Blog

Web Moves News and Information

20 Jun 2003

Understanding Your Log Files

What are log files?

Log files are text files that your web server creates automatically to record the requests made to your web site. These files are opened when the web services of your server start and remain open as your server responds to requests. The request information is added to the log files in “real time”.

Logs tell you whether search engine optimization and other marketing operations are successful. They also show you exactly in which areas you are succeeding and where you need to put in more work. A wealth of information about the activities of your visitors is available from your web server log files. This information can be used for marketing plans and for troubleshooting.

A log file is a record of the accesses to your web site for a particular day. It provides detailed information about how many files were transmitted from your web site, how many bytes of data were sent, where your visitors are coming from, and so on. Each individual request is listed on a separate line in a log file. A sample entry is analyzed later in this article.

Where to find the server logs

The log files are located on your server, generally in your www/logs directory. If you cannot find your server logs there, contact your hosting provider, who will be able to tell you where your logs are located.

What information can the server logs provide?

Which pages of your web site were requested, and how often
How many visits your web site received
Who your most frequent visitors are (in some cases)
Where your visitors came from (in some cases)
Which search engine spiders have found your site
Which sections/pages of your site were the most popular during a given period of time (say a week or a month)
Which keyword searches lead people to your site
What browsers and platforms (operating systems) your visitors were using
Which pages people most frequently use to enter your site
How long the average “view time” was for a given page on your site
During which times of the day and days of the week your server is the busiest
What errors people encounter

What is a log file analyzer?

Log files are not easy to understand in their raw format. For future assessment and marketing plans, you need to be able to see totals for the whole site. A log analyzer is a software program that reads raw log files and turns them into easy-to-understand statistics. These statistics can be used to tweak your promotional and marketing plans.

Sample log file entry

Each individual request is listed on a separate line in a log file, called a log file entry. It is automatically created every time someone makes a request to your web site. We will analyze each part of the log entry in this article.

201.58.170.90 - - [18/Aug/2002:01:53:23 -0500] "GET /frames.htm HTTP/1.1" 200 11631 "http://www.searchengineethics.com/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
This line gives us the following information about the request:

IP address or hostname of the visitor
Login [ -]
Authuser [ -]
Date and time [18/Aug/2002:01:53:23 -0500]
Request method [GET]
Request path [frames.htm]
Request protocol [HTTP/1.1]
Response status [200]
Response content size [11631]
Referrer path [http://www.searchengineethics.com]
User agent [Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)]
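
As a rough illustration, the fields listed above can be pulled out of an entry with a regular expression. The sketch below is in Python and assumes the entry uses the common “combined” layout, written with plain hyphens and straight quotes as a server would write it. The names LOG_PATTERN and parse_entry are made up for this example.

```python
import re

# Pattern for one entry in the "combined" log layout (an assumption about
# how the server is configured; other layouts need a different pattern).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<login>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_entry(line):
    """Return a dict of the fields in one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = (
    '201.58.170.90 - - [18/Aug/2002:01:53:23 -0500] '
    '"GET /frames.htm HTTP/1.1" 200 11631 '
    '"http://www.searchengineethics.com/" '
    '"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"'
)

entry = parse_entry(sample)
print(entry["host"], entry["method"], entry["path"], entry["status"])
```

Every value comes back as text, so the status and size would need to be converted with int() before doing arithmetic on them.
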
201.58.170.90 => is the IP address of the visitor. This first field identifies the host, that is, the machine that requested the data. In this case it is recorded as the visitor’s IP address. If your server is configured to “resolve” (look up) IP addresses, this address could have been recorded as a host name instead. Some web servers are set to resolve IP addresses automatically by conducting a reverse DNS lookup.

Example of the log file entry where the IP address has been resolved:
ip218.m4.nwlink.com - - [18/Aug/2002:01:53:23 -0500] "GET /frames.htm HTTP/1.1" 200 11631 "http://www.searchengineethics.com/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
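
If your logs contain only IP addresses, a reverse DNS lookup is one way to turn an address into a host name after the fact. The Python sketch below uses the standard socket.gethostbyaddr call. The function name resolve_host and its lookup parameter are made up for this example, and the lookup needs network access and will not succeed for every address.

```python
import socket


def resolve_host(ip_address, lookup=socket.gethostbyaddr):
    """Return the host name for an IP address, or the address itself if lookup fails."""
    try:
        hostname, _aliases, _addresses = lookup(ip_address)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return ip_address
```

Calling resolve_host("201.58.170.90") returns either a host name or, when no name is registered for the address, the address unchanged.
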

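- - => These two fields are reserved for the identification of the visitor: the first is for the login name and the second is for the authenticated user name. These fields are rarely used and can be faked easily. In this entry, no data has been recorded, so each field shows a hyphen.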

18/Aug/2002 => is the date when the request was made.

01:53:23 => is the time when the request was made.

-0500 => is the offset from GMT (Greenwich Mean Time). -0500 indicates that the server’s clock is 5 hours behind GMT.
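
To make the date, time and offset concrete, here is a small Python sketch that reads the timestamp from the sample entry and converts it to GMT. The function name parse_timestamp is made up for this example, and the month abbreviation is assumed to be in English.

```python
from datetime import datetime, timezone


def parse_timestamp(text):
    """Parse a log timestamp such as '18/Aug/2002:01:53:23 -0500'."""
    # %b matches the abbreviated month name and %z matches the GMT offset.
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")


local_time = parse_timestamp("18/Aug/2002:01:53:23 -0500")
gmt_time = local_time.astimezone(timezone.utc)
print(gmt_time.isoformat())  # prints 2002-08-18T06:53:23+00:00
```

Because the server was 5 hours behind GMT, 01:53:23 local time corresponds to 06:53:23 GMT.
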

GET => is the request method. This records the type of request made by the client’s browser to the server. The most common request methods are:

POST – sends data, such as a submitted form, to the server
HEAD – requests only the header information of the file
GET – requests the file in its entirety

/frames.htm => is the path of the object on the web site that was requested. In this case, it is the HTML file called frames.htm.

HTTP/1.1 => is the request protocol.

200 => is the response status. The value 200 stands for a successful operation. This code can be anything from 100 to 599, depending on the outcome of the request. The classes of status code are:

1xx – informational (for example, 100 continue)
2xx – success
3xx – redirect (also a success)
4xx – client error (failure)
5xx – server error (failure)
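
The first digit of the code is enough to tell which class a response belongs to. A minimal Python sketch follows; the function name status_class and the wording of the labels are choices made for this example.

```python
def status_class(code):
    """Return a short description of the class an HTTP status code belongs to."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 leaves only the first digit of a 3-digit code.
    return classes.get(int(code) // 100, "unknown")


print(status_class(200))  # prints success
print(status_class(404))  # prints client error
```
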

For more detailed information on response status codes, visit the following URL:
http://www.tvpress.com/promote/server/

11631 => is the number of bytes that were returned in response to the request.

http://www.searchengineethics.com => is the referrer path or URL. This is the page that referred the visitor to your web page, so it tells you where your visitors are coming from. In this case, the visitor came from searchengineethics.com.

If the referring site/url/path were a search engine, it would have looked this way: “http://www.google.com/search?q=search+engine+optimization”

This particular referrer indicates that the visitor came to the site from Google. It also shows the keywords that the visitor used to reach your site, in this case “search engine optimization”.
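
The keywords can be read out of such a referrer automatically. The Python sketch below uses the standard urllib.parse module and assumes the search phrase is carried in a parameter named q, as in the Google URL above; other search engines may use a different parameter name. The function name search_keywords is made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def search_keywords(referrer, query_param="q"):
    """Return the search phrase carried in a referrer URL, or None if there is none."""
    query = parse_qs(urlparse(referrer).query)
    values = query.get(query_param)
    return values[0] if values else None


phrase = search_keywords("http://www.google.com/search?q=search+engine+optimization")
print(phrase)  # prints search engine optimization
```
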

Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) => is the user agent. It indicates which browser and platform the visitor was using. “Mozilla” is the name that Netscape browsers report. When there is extra information such as “(compatible; MSIE 5.5; Windows NT 5.0)”, this usually indicates that the browser was Microsoft Internet Explorer identifying itself as Netscape-compatible, here version 5.5 running on Windows NT 5.0.
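
A log analyzer makes this kind of guess for every entry. The Python sketch below shows the idea with two regular expressions. The function name describe_user_agent and the short list of platform names are assumptions made for this example, and real analyzers recognize far more browsers and platforms.

```python
import re


def describe_user_agent(user_agent):
    """Return a rough (browser, platform) guess from a user agent string."""
    browser = "unknown"
    msie = re.search(r"MSIE (\d+(?:\.\d+)?)", user_agent)
    if msie:
        # An MSIE token means Internet Explorer, whatever the string starts with.
        browser = "Internet Explorer " + msie.group(1)
    elif user_agent.startswith("Mozilla/"):
        browser = "Netscape-compatible"
    platform = re.search(r"Windows NT \d+\.\d+|Windows \d+|Macintosh|Linux", user_agent)
    return browser, platform.group(0) if platform else "unknown"


print(describe_user_agent("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"))
```
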

Types Of Log Files

Four separate log files are kept on your Server.

  • Access log
    It records information about which files are being requested from your server. It is located in the www/logs/ directory and called access_log.
  • Agent log
    It records information about the web clients that make requests on your server. It is located in the www/logs/ directory and called agent_log.
  • Referer log
    It records information about the url that the web browser had been viewing immediately before making the request on your server. This is particularly useful when you want to determine where requests on your web server come from and what websites are referring web traffic to your server. It is located in the www/logs/ directory and called referer_log.
  • Error log
    It records information about failed requests to your server. If someone tries to access a file on your server that doesn’t exist, your server automatically generates an error message. Each of these error messages is recorded in the error log. It is located in the www/logs/ directory and called error_log.


Analyze The Log Files

The log files are complex to read in their raw format. It will take quite a few minutes to understand one log entry. While making marketing decisions, you might sometimes have to analyze the data of several months. You might also have to analyze the data related to particular web pages, or requests from a particular search engine.

To make use of the information collected in your server’s log files, the statistics need to be presented in a format that can be easily understood and used. Generally, servers offer two statistical analysis programs. One is getstats. This program allows quick and simple analysis of your access log. To run this program, go to the telnet prompt and type “getstats”. Or type “getstats -c” if you want a concise report.

The other one is analog. It creates a webpage with analysis of your access, referrer, agent and error logs. To run analog, type “analog” at the telnet prompt. Wait till the prompt returns. The output web page will be available at http://www.com/vstats/.

Some popular log analyzer programs

  • WebTrends
    http://www.webtrends.com/
    Offers web analytics software that provides insights into visitors’ behavior and preferences.
  • WebSideStory, Inc.
    http://www.websidestory.com/
    Offers statistical analysis software and services.
  • FreeStats
    http://www.freestats.com/
    A web site tracking service.
  • Coremetrics, Inc.
    http://www.coremetrics.com/
    It provides insight into the browsing and purchasing behavior of internet users.
  • eXTReMe Tracking
    http://www.extreme-dm.com
    It provides numbers, percentages, stats, totals and averages; from simple visitor-counting to tracking the keywords they use to find your site.
  • NetGenesis
    http://www.netgen.com/
    Analytic consulting and services company that helps businesses develop customer intelligence capabilities.
  • DeepMetrix Corporation
    http://www.deepmetrix.com/
    Offers livestats for the tracking and analysis of website visitors and online campaigns.

Glossary

Glossary of the commonly used terms in most log analyzer programs:

Hits

A hit occurs when a web page, image or something else is accessed on your site. If one person views an html file with 5 images on it, that will be counted as 6 hits. This means one for the web page and five for the five image files.
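
The difference between hits and pages can be shown with a few lines of Python. In this sketch the image paths are invented, and treating only .htm and .html files (and paths ending in a slash) as pages is an assumption; analyzers let you configure this.

```python
PAGE_EXTENSIONS = (".htm", ".html")  # assumption: what counts as a "page"


def count_hits_and_pages(paths):
    """Return (hits, page_views) for a list of requested paths."""
    hits = len(paths)
    page_views = sum(
        1 for path in paths
        if path.endswith(PAGE_EXTENSIONS) or path.endswith("/")
    )
    return hits, page_views


# One HTML page plus its five images, as in the example above.
requests = ["/frames.htm"] + [f"/images/pic{n}.gif" for n in range(1, 6)]
print(count_hits_and_pages(requests))  # prints (6, 1)
```
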

Unique visitors

Unique visitors are usually identified by IP address. If a visitor from a specific IP address looks at 10 different pages on your site, it will still be counted as only one visitor.
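
In code, counting unique visitors this way amounts to counting distinct addresses. A minimal Python sketch, in which the second address is invented for the example:

```python
def unique_visitors(hosts):
    """Count distinct hosts in a list of IP addresses or host names."""
    return len(set(hosts))


# Hypothetical data: ten requests from one address and one from another.
hosts = ["201.58.170.90"] * 10 + ["192.0.2.7"]
print(unique_visitors(hosts))  # prints 2
```

Keep in mind that several people can share one IP address, and one person can appear under several, so this figure is an estimate.
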

Visitors by hour/day

This tells you at which hours of the day your traffic is heaviest and when it is lightest. This can be useful when deciding when to implement and upload modifications to the web site.
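
A tally of requests per hour can be built from the timestamps alone. The Python sketch below assumes timestamps in the format of the sample entry; the function name requests_by_hour and the second and third timestamps are made up for this example.

```python
from collections import Counter
from datetime import datetime


def requests_by_hour(timestamps):
    """Count requests for each hour of the day (0 to 23, in the server's local time)."""
    hours = Counter()
    for text in timestamps:
        when = datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")
        hours[when.hour] += 1
    return hours


timestamps = [
    "18/Aug/2002:01:53:23 -0500",
    "18/Aug/2002:01:59:10 -0500",  # invented for the example
    "18/Aug/2002:14:02:45 -0500",  # invented for the example
]
hours = requests_by_hour(timestamps)
print(hours.most_common(1))  # prints [(1, 2)]
```
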

Authenticated Users

Shows users who have gained access to password protected pages.

Visitors Paths

This shows how visitors have navigated through the site and which pages they have visited. This is one of the most important pieces of information that you can get from your server logs.

Top entry pages

This shows which are the pages that users access to enter your site. These are usually the pages that rank high in the search engines or the book-marked pages or pages with many external links pointing to them.

Top exit pages

This shows which pages users viewed last before leaving your site. As you don’t want your visitors to leave quickly, consider improving these pages so that your customers stay longer on your site.
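
Entry and exit pages depend on how requests are grouped into visits, and the log itself does not mark where a visit begins or ends. The Python sketch below assumes that a gap of more than 30 minutes between two requests from the same host starts a new visit. The timeout, the function name entry_and_exit_pages and the sample data are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumption: gap that ends a visit


def entry_and_exit_pages(requests):
    """Count first and last pages per visit.

    `requests` is a list of (host, time, path) tuples sorted by time.
    """
    entries, exits = Counter(), Counter()
    last_seen = {}  # host -> (time, path) of that host's most recent request
    for host, when, path in requests:
        previous = last_seen.get(host)
        if previous is None or when - previous[0] > SESSION_TIMEOUT:
            entries[path] += 1  # a new visit starts on this page
            if previous is not None:
                exits[previous[1]] += 1  # the earlier visit ended on its last page
        last_seen[host] = (when, path)
    for _when, path in last_seen.values():
        exits[path] += 1  # close the visits still open at the end of the log
    return entries, exits


day = datetime(2002, 8, 18)
sample_requests = [  # invented data
    ("201.58.170.90", day.replace(hour=10, minute=0), "/index.htm"),
    ("201.58.170.90", day.replace(hour=10, minute=5), "/products.htm"),
    ("192.0.2.7", day.replace(hour=10, minute=10), "/frames.htm"),
    ("201.58.170.90", day.replace(hour=11, minute=0), "/index.htm"),
]
entries, exits = entry_and_exit_pages(sample_requests)
print(entries.most_common(1))  # prints [('/index.htm', 2)]
```
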

Popular pages

These pages are seen most often. Placing important information or links to important pages on these pages might be very useful.
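
Ranking pages by popularity is a simple count of how often each path was requested. A minimal Python sketch; the function name popular_pages and the list of paths are invented for the example.

```python
from collections import Counter


def popular_pages(paths, top=3):
    """Return the `top` most requested paths with their request counts."""
    return Counter(paths).most_common(top)


paths = [
    "/frames.htm", "/index.htm", "/frames.htm",
    "/contact.htm", "/frames.htm", "/index.htm",
]
print(popular_pages(paths))
```
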

Popular files

These are the files that are downloaded by the users.

Browsers

Shows which browsers the visitors are using. Make sure your site looks good when it is viewed in the browsers your visitors use.

Operating Systems

Shows what operating systems the visitors are using, for example MS Windows 95, Windows NT or Mac.

Countries

Shows which countries your visitors are coming from.

Referring sites

Shows which URLs are sending visitors to your site. These are sites that link to your site. Make sure these pages are indexed in the search engines, and consider linking back to them if their site is related to yours. Another use for these stats is to measure the effect of ads placed on these URLs.

Unknown referrer usually means that visitors either typed in the link to the site or used bookmarked links.
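
A referring-sites report can be sketched in Python by reducing each referrer to its host name and counting. Here a referrer recorded as a hyphen or left empty is counted as unknown; the function name referring_sites and the second Google URL are made up for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def referring_sites(referrers):
    """Count referring host names; '-' or an empty value is counted as unknown."""
    sites = Counter()
    for referrer in referrers:
        host = urlparse(referrer).netloc if referrer not in ("", "-") else ""
        sites[host or "(unknown)"] += 1
    return sites


referrers = [
    "http://www.searchengineethics.com/",
    "http://www.google.com/search?q=search+engine+optimization",
    "-",
    "http://www.google.com/search?q=log+files",  # invented for the example
]
sites = referring_sites(referrers)
print(sites.most_common(1))  # prints [('www.google.com', 2)]
```
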

Search Engines

Shows which search engines send visitors to your site. It shows you where your search engine optimization is working and where it isn’t.

Search phrases

These are the actual terms that people type into the search engines to find your site.

Spiders

Shows the names of the search engine crawlers that have visited your site. Here you can see which search engines have found your site, and you can check whether they have spidered all your pages. If a search engine crawler has spidered your site, it usually means that the site will get into that search engine’s index within a few weeks.
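
Crawlers are usually recognized by a name in their user agent. The Python sketch below checks for a few well-known crawler names. The list is a small illustrative sample rather than a complete one, and the function name identify_spider is made up for this example.

```python
# Assumption: a tiny sample of crawler name fragments and the engines behind them.
CRAWLER_TOKENS = {
    "googlebot": "Google",
    "slurp": "Inktomi",
    "scooter": "AltaVista",
}


def identify_spider(user_agent):
    """Return the search engine a crawler belongs to, or None for other clients."""
    lowered = user_agent.lower()
    for token, engine in CRAWLER_TOKENS.items():
        if token in lowered:
            return engine
    return None


print(identify_spider("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))  # prints Google
```
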

Errors

This shows different errors and status messages.
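
An error report can be sketched in Python by counting the requests whose status code is 400 or higher, grouped by status and path. The function name failed_requests and the sample data are invented for this example.

```python
from collections import Counter


def failed_requests(entries):
    """Count (status, path) pairs for requests that ended in a 4xx or 5xx status."""
    return Counter(
        (status, path) for path, status in entries if status >= 400
    )


entries_with_status = [  # invented (path, status) pairs
    ("/frames.htm", 200),
    ("/missing.htm", 404),
    ("/missing.htm", 404),
    ("/cgi-bin/form.cgi", 500),
]
errors = failed_requests(entries_with_status)
print(errors.most_common(1))  # prints [((404, '/missing.htm'), 2)]
```
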

Author: SearchEngineEthics.com