I spent the past few hours writing a program to parse the browser string from the web server log files. Why didn't I use an existing web analyzer package? Because I wanted the browser strings rewritten to carry correct information in a more consistent style. This meant changing it from, say:
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)
to
MSIE/6.0 Windows/98
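A rewrite like this can be sketched in a few lines of Python. The post does not show the actual program, so the regular expressions and the `normalize` name below are assumptions for illustration. The sketch only covers MSIE-style strings and simple `Name/version` strings; it does not try to recognize Gecko, Opera, or non-Windows systems.

```python
import re

# Hypothetical patterns; the real program's rules are not given in the post.
MSIE_RE = re.compile(r"MSIE (\d[\w.]*)")
WINNT_RE = re.compile(r"Windows NT (\d[\w.]*)")
WIN_RE = re.compile(r"Windows (\d\w*)")
PRODUCT_RE = re.compile(r"([^\s/()]+)(?:/(\S+))?")


def normalize(agent):
    """Rewrite a raw browser string as 'Browser/Version OS/Version'."""
    browser = "-/-"
    os_name = "-/-"

    msie = MSIE_RE.search(agent)
    if msie:
        # IE announces itself as Mozilla; report what it really is.
        browser = "MSIE/" + msie.group(1)
    else:
        # Fall back to the leading 'Name/version' token, if there is one.
        product = PRODUCT_RE.match(agent)
        if product:
            browser = product.group(1) + "/" + (product.group(2) or "-")

    winnt = WINNT_RE.search(agent)
    win = WIN_RE.search(agent)
    if winnt:
        os_name = "WindowsNT/" + winnt.group(1)
    elif win:
        os_name = "Windows/" + win.group(1)

    return browser + " " + os_name


raw = "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
print(normalize(raw))  # MSIE/6.0 Windows/98
```

Anything the patterns don't recognize falls through as `-`, which is the same placeholder the tables below use for missing information.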
This also means I can generate decent stats about the popularity of certain browsers on the fly: using the Unix command line, I can pull out the browser string, feed it through the newly written program, then count unique browsers more easily. An initial run through last month's log file for my blog:
Table: Browser Statistics for The Boston Diaries

# Hits  Browser/Version               OS/Version
1,228   Googlebot/2.1                 -/-
748     MSIE/6.0                      WindowsNT/5.1
712     MSIE/6.0                      Windows/98
641     MSIE/6.0                      WindowsNT/5.0
476     Mercator/2.0                  -/-
371     MSIE/5.5                      Windows/98
303     MSIE/5.0                      Windows/98
302     MSIE/5.5                      WindowsNT/5.0
238     -/-                           -/-
216     MSIE/5.01                     WindowsNT/5.0
137     ia_archiver/-                 -/-
113     Syndic8/1.0                   -/-
101     NCSA/-                        -/-
101     MSIE/5.01                     Windows/98
100     MSIE/6.0                      WindowsNT/4.0
99      Mozilla/3.01                  -/-
89      Gecko/20020529                Linux/i686
88      Gecko/20020523                WindowsNT/5.0
81      MSIE/5.14                     Mac_PowerPC/-
79      Mozilla/5.0                   -/-
68      SlySearch/1.2                 -/-
66      MSIE/5.5                      Windows/95
62      MSIE/5.5                      WindowsNT/4.0
62      Gecko/20020529                PPC/Mac
61      Openfind/-                    -/-
55      MSIE/5.0                      Mac_PowerPC/-
49      Indy-Library/-                -/-
48      Gecko/20020510                Linux/i686
42      Mozilla/3.0                   -/-
41      sitecheck.internetseer.com/-  -/-
40      Gecko/20020311                WindowsNT/5.1
38      MSIE/5.01                     Windows/95
36      bumblebee@relevare.com/-      -/-
33      Gecko/20020530                WindowsNT/5.0
28      bumblebee/1.0                 -/-
28      Gecko/20020510                WinNT4.0/-
27      Opera/6.02                    Windows/2000
27      MSIE/5.0                      WindowsNT/4.0
This gives a decent feel for what's being used to view my site (out of the 7,943 hits last month, about 16% were from the Google spider [1]), but one of the primary reasons I did this was to see just how many people are still using older browsers like Netscape 4x or Internet Explorer 4x (which would show up as Mozilla/4.x and MSIE/4.x respectively). So, stripping out the operating system column and looking at only the major version numbers, we get:
Table: More Specific Browser Statistics for The Boston Diaries

# Hits  Browser/Major Version
2,210   MSIE/6
1,671   MSIE/5
1,228   Googlebot/2
543     Gecko/-
476     Mercator/2
238     -/-
142     Opera/6
141     Mozilla/3
137     ia_archiver/-
134     Mozilla/4
113     Syndic8/1
101     NCSA/-
79      Mozilla/5
68      SlySearch/1
61      Openfind/-
49      Indy-Library/-
45      MSIE/4
41      sitecheck.internetseer.com/-
37      Netscape6/6.2
36      bumblebee@relevare.com/-
28      bumblebee/1
26      linkhype.com/1
26      Netscape/7
24      BlogBot/1
22      Win32/-
22      Konqueror/3.0
20      Frontier/8.0
16      Internet/-
16      Ask-Jeeves/-
15      Mozilla/-
14      Microsoft/-
14      Konqueror/2.2
12      w3m/0.2
12      obidos/bot
12      Mozilla/4.7C-CCK-MCD
11      myownhomeblogindexingservicecrawler/-
11      htdig/3.1
10      Mozilla/3.x
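The step from full entries to major versions can be sketched roughly as below. The names `major_only` and `merge_by_major` are hypothetical, and turning an eight-digit Gecko build date into `-` is a guess based on the `Gecko/-` row above. Rows such as `Netscape6/6.2` and `Konqueror/3.0` show that the real program did not always drop the minor version, so this is an approximation, not a reconstruction. The sample counts are a few rows taken from the first table.

```python
import re
from collections import Counter


def major_only(entry):
    """Reduce 'Browser/Version OS/Version' to 'Browser/Major'."""
    browser = entry.split(" ", 1)[0]           # drop the OS column
    name, _, version = browser.partition("/")
    major = version.split(".", 1)[0] or "-"    # keep text before the first dot
    if re.fullmatch(r"\d{8}", major):          # assumed: build dates become '-'
        major = "-"
    return name + "/" + major


def merge_by_major(counts):
    """Sum hit counts for entries that share a browser and major version."""
    merged = Counter()
    for entry, hits in counts.items():
        merged[major_only(entry)] += hits
    return merged


# A handful of rows from the first table, not the whole log.
sample = {
    "MSIE/6.0 WindowsNT/5.1": 748,
    "MSIE/6.0 Windows/98": 712,
    "Gecko/20020529 Linux/i686": 89,
    "Gecko/20020523 WindowsNT/5.0": 88,
}

for browser, hits in merge_by_major(sample).most_common():
    print(f"{hits:<8,}{browser}")
```

Because the sample holds only four rows, its totals are smaller than the ones in the table, which also include entries too rare to make the first list.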
The bad news: 48% of the browsers were Internet Explorer 5x or 6x (although, surprisingly enough, I did get five hits from a Mozilla [2] based browser under OS/2). The good news, though, is that 58% of the hits were from browsers capable of viewing CSS (Cascading Style Sheets) without crashing. And speaking of horrible browsers that can't support CSS, about 2.5% were running Netscape 4x or IE 4x (they can see the site; it just doesn't look that great).
I also checked the log file for Spring's [3] site (Hi honey!). 53% of her visitors are using Internet Explorer 5 or higher, or Mozilla (or Netscape 6 and higher). Only about 3% are using Netscape 4x or Internet Explorer 4x, which is pretty much on par with my site (the rest are mostly robots or experimental browsers).