Thursday, May 05, 2005

Log Parser Basics... Ok, a little more than basics.

Mark Minasi's Windows Networking Tech Page newsletter this month posted an article to describe details behind using the Microsoft LogParser utility.  It is good reading and will come in handy...

 

Introducing Log Parser, a Tool You Must Learn

I'm too cheap to pay for one of those Web site analysis tools, but I'd like to be able to extract a few statistics from my Web logs -- in particular, it'd be great to know how many hits a particular page had, or how many of you took a moment and read this newsletter.  When I asked my friend and IIS expert Brett Hill, he got this mystical look in his eyes -- you know the way people look when they're about to tell you about the Secrets Of The Universe that they've recently discovered? -- and beatifically intoned, "Log Parser."

Now, I'd already heard about Log Parser, but I'd also heard that it was a [fill in your favorite frustration-related adjective] nightmare to understand syntax-wise.  Brett said no problem, he was going to do a talk about Log Parser at the next Windows Connections conference.  But Brett got hired away by Microsoft -- he's now an IIS 7.0 Evangelist and yes, I did mean to type "7.0" rather than "6.0" -- and so Randy Franklin Smith, a big-time security techie, stepped in.  In his presentation, Randy did just what I needed him to do:  give me a bunch of already-working examples of Log Parser syntax so I could get started.  So I've been working with it and in this article, I'll explain why you really want to learn it and then I'll explain some of its truly nightmarish syntax.

I strongly recommend that you give this article a look.  This is a very useful tool and, of course, the price is right.

What Log Parser Can Work On

Log Parser is a free command-line tool that can analyze and report on a variety of files.  I've already suggested one use, to count the number of times that this newsletter has been viewed.  But Log Parser can also analyze event logs, your Active Directory ("show me all of the managers -- that is, someone who appears in the 'manager' attribute of my user accounts -- and compute how many people each person manages"), the file system ("show me the names of the ten largest files on the hard disk"), any text file ("how many times does the word "really" appear in this document?"), Network Monitor output, the Registry ("how many REG_DWORD values exist in my Registry?") and a number of other formats.  It will then output that data as text, a file, a datagrid, new entries in a SQL database, SYSLOG entries, XML, and so on.

Installing Log Parser and Some Sample Data

You can find Log Parser at Microsoft's downloads section.  It's a simple MSI file and so a snap to install.  Unfortunately it installs itself to a folder in Program Files without modifying the system's PATH environment variable, meaning that you've got to be in Log Parser's directory to run it or you'll get a "bad command or file name" error.  Either add Log Parser's directory to your PATH variable, or if you're lazy like me then just copy the logparser.exe file into System32.  Then open up a command line and you're ready to start parsing.

But we'll need some data to analyze.  I've simplified (and sanitized) two day's logs from my Web site and put them at

http://www.minasi.com/testlogs.zip

Right-click that URL from Internet Explorer and choose "Save target as..." to save the file to your hard disk.  Unzip it and you'll find two log files -- put them in a directory named c:\logfiles.  Make that your default directory ("cd \logfiles") in your command prompt window and your commands can be shorter, as you won't have to type "c:\logfiles" in your Log Parser commands.  With that, we're ready to go.

Note:  here's another strong recommendation:  grab that ZIP file, download Log Parser and try this stuff out.  If you're feeling lazy, remember you can always just copy a Log Parser line from the Web page you're reading and paste it into your command prompt.

A First Query

Let's try out just about the simplest Log Parser command possible:


logparser "select * from *.log" -i:iisw3c

Picked apart, it is the command "logparser," followed by a SQL query statement -- don't run away, I'll show you all the SQL you'll need today! -- followed by the -i option, which explains to Log Parser what kind of file it is (an IIS log file, in this case).  The SQL query is "Select * from *.log", which just means "get everything" -- the asterisk works in SQL the same as it does in DOS commands, meaning "everything" -- from all of the files with the extension "log" in the current directory. 

(Aside: this is why learning Log Parser is difficult -- you're trying to learn two new things at the same time.  Half of what you're trying to learn is Log Parser's syntax, which is ugly enough all by itself.  But every Log Parser query includes a SQL query, and if you've never written SQL queries then you'll find that they're a quite wide field of syntax to master as well.  I strongly recommend taking the time to browse through the logparser.chm Help file that installs in the same directory as Log Parser.  And let me note at this point that I'm not a SQL query expert, so I may not be approaching these problems in the best way.)

Controlling Where Log Parser Puts Its Output: rtp, FROM and Data Grids

You'll get ten lines of output and then a "press a key..." prompt.  Given that there are about 6491 lines in the two logs and assuming that you want to see every line, that'd mean you'd have to press a key about 649 times... yuck.  That's where the -rtp ("records to print?") option comes in; set it to 100 and it'll only pause every 100 lines.  Set it to -1 and you'll never see a "press a key..." again:


logparser "select * from *.log" -i:iisw3c -rtp:-1

Of course, that still takes a long time and is kinda useless in a command prompt window.  We can tell Log Parser to stuff the result into a file by adding the "INTO" option.  It goes in the SQL query before the FROM part.  This takes the output and puts it in a file called OUTPUT.TXT:


logparser "select * into output.txt from *.log" -i:iisw3c -rtp:-1

Open output.txt in Notepad and you'll see that you've got all of the info from the two logs nicely collected in output.txt.  But Log Parser can output its data in other ways as well.  In particular, it can use a built-in thing that 2000, XP and 2003 contain called a "data grid."  We tell Log Parser to output to something other than its "native" format (dumping all of the junk onto the screen) with the -o: option:


logparser "select * from *.log" -i:iisw3c -rtp:-1 -o:datagrid

I guess I shouldn't be surprised given that Windows is 40 or 50 million lines of code these days, but it's always interesting to learn that there's something "new" that I already owned.  Notice the "Auto Resize" button -- click it and the columns figure out how wide they should be; very nice.

Seeing What's In an Input File

Notice what's going on here -- Log Parser used spaces, tabs or commas to separate -- "parse" is the correct phrase -- each line into particular items.  You then see in the column headers the names of those items.  For example, the IP address of the person visiting my Web site is in the c-ip field, and the file that they viewed is in the cs-uri-stem field.  Or alternatively you can ask Log Parser about the "iisw3c" format like so:


logparser -h -i:iisw3c

But what we've seen so far is really just a straight dump of the logs, no computation or analysis.  What if I just wanted to see the IP addresses of my visitors?  I'd do that by restricting the things that I SELECT:


logparser "select c-ip from c:\logfiles\*.log" -o:datagrid -i:iisw3c -rtp:-1

Doing a Little Analysis and Beautifying:  COUNT, GROUP BY and AS

Ah, a much smaller amount of data, but again no analysis.  It'd be more interesting to see how often each one visited.  Again, I modify the SQL SELECT statement. I can create a new field that reports the number of times that a given IP address appears by adding a "count(*)" variable to the SELECT statement.  COUNT does just what you'd expect it to do -- it counts records.  A super-simple example might be (note that this doesn't work, I'm just introducing the idea):


logparser "select c-ip, count(*) from *.log" -i:iisw3c -o:datagrid -rtp:-1

Now, if this did work, it'd list two columns -- each IP address and how often that IP address appears.  If something showed up 30 times you'd see it 30 times with a count of 30 next to it each time.  But, again, this doesn't work, and Log Parser says that it wants a "group by" clause.   So let's do it logparser's way and add a GROUP BY clause.


logparser "select c-ip, count(*) from *.log group by c-ip" -i:iisw3c -o:datagrid -rtp:-1

That works, and we get about 1200 entries instead of almost 6500, as the duplicates are gone.  But the data grid's column label for the c-ip count is "COUNT(ALL *)," which is not all that meaningful.  That count is the number of visits that a particular IP address made, so we'd like Log Parser to call that column something more meaningful, like for example, oh, "visits."  I can do that by adding an AS clause to the list of selected items:


logparser "select c-ip, count(*) as visits from *.log group by c-ip" -i:iisw3c -o:datagrid -rtp:-1

Sorting and Shortening:  ORDER BY, TOP, DESC and HAVING

That query's output has a better column title, but the list shows the IP addresses in no particular order. It'd be nice to have it sorted by frequency, so we add the "ORDER BY" clause (this should be typed as one line although I've  broken it up so that your browser doesn't make you scroll left and right):


logparser "select c-ip, count(*) as visits from *.log group by c-ip
order by visits" -i:iisw3c -o:datagrid -rtp:-1

Neat; now it's easy to see that one IP address visited over 140 times.   But there are an awful lot of one-visit IP addresses; can we just see the top five visitors?  Sure, with the TOP clause (again, type as one line even though I've broken it for a happier browser experience):


logparser "select top 5 c-ip, count(*) as visits from *.log group by c-ip
order by visits" -i:iisw3c -o:datagrid -rtp:-1

Hmmm... that showed me the "top" five, all right, but the "top" of the data grid is the low numbers, as it starts out with the one-visit IPs and ends up with the multi-time visitors at the bottom of the data grid.  How to see the bottom five?  Well, there isn't a BOTTOM clause, but we can tell it to sort descending rather than its default, ascending, by adding the DESC clause to ORDER BY.  Then the most frequently-visiting IP addresses end up at the top of the results and TOP 5 works as we'd hoped (type as one line):


logparser "select top 5 c-ip, count(*) as visits from *.log group by c-ip
order by visits desc" -i:iisw3c -o:datagrid -rtp:-1

Or alternatively I might just want to see all of the IP addresses that have visited me more than 50 times. I can do that with the HAVING clause of the SQL query (type as one line):


logparser "select c-ip, count(*) as visits from *.log group by c-ip
having count(*) >50 order by visits desc" -i:iisw3c -o:datagrid -rtp:-1

Notice that in that case I removed the "top 5" and added "having count(*) > 50" after the "group by."  Log Parser will squawk at you if you put them in a different order.  In SQL queries want to see their clauses in a particular order -- first the SELECT, then the things you're selecting, then the FROM, then the HAVING, then the ORDER BY.  If you're using an INTO, it goes after the things you're selecting and before the the FROM.  So for example if I wanted the output of my query to go to a text file named out2.txt, I'd type (yes, again please type as one line although it's broken):


logparser "select c-ip, count(*) as visits into out2.txt from *.log
group by c-ip having count(*) >50 order by visits desc" -i:iisw3c -rtp:-1

Side Trip:  More Useful SQL Syntax: WHERE, LIKE and More

As long as I'm talking about the format of the SQL SELECT statement, let's take a short side-trip from questing for the perfect query (remember, I'm trying to figure out how many people saw thismonth.htm) and look at what the SELECT statement can do in a bit more detail. 

If you have a really long, ugly SQL query then you can put it in a text file and refer to it.  For example, consider that last query -- it had a huge SELECT statement.  I could type its SQL part, "select c-ip, count(c-ip) as visits from *.log group by c-ip having count(*) >50 order by visits desc" in a text file like so:


select top 5
c-ip,
count(*) as visits
from *.log
group by c-ip
having count(*) > 50
order by visits desc

I then save that in a file I'll call myquery.sql (but I could call it anything).  Then this command gets the job done:


logparser file:myquery.sql -i:iisw3c -o:datagrid -rtp:-1

But I've left out a couple more types of clauses that you can put into a Log Parser SQL query.    There are also the WHERE and USING clauses.  (By the way, Log Parser only supports a subset of honest-to-God SQL.  Thank goodness.  And USING seems not to be a standard SQL clause.)  The order of these clauses in a Select statement is:

  • USING
  • INTO
  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • ORDER BY

And no, I haven't defined what USING does yet, that's kind of beyond the scope of this introductory article.  I just wanted to offer one place where I presented all of the SQL clauses in order for easy reference.  Let's take up WHERE next.

By saying "select c-ip, count(c-ip) from *.log...", I told Log Parser to get the c-ip data from every single line on all of the logs in my local directory, and then do its analysis on those records.  But sometimes I want to tell Log Parser not to fetch every single record, but instead a subset.  I do that with the WHERE clause.  Suppose (I know this is a stretch but I need an example that fits into our data set) I only want to see entries where the IP address starts with "194."  This WHERE clause will do that :


logparser "select top 5 c-ip, count(*) as hits from *.log where c-ip like '194.%.%.%'
group by c-ip order by hits desc" -i:iisw3c -o:datagrid

This command includes the clause "where c-ip like '194.%.%.%' and "LIKE" means "matches a particular pattern."  Patterns can either be particular characters, like if I'd written "cs-ip like '194.44.22.91,'" or they can use the "_" and "%" wildcards.  "_" means "match exactly one character" and "%" means "match zero or more characters."  The pattern '194.%.%.%,' then -- notice that LIKE patterns are always surrounded by single quotes -- would match any IP address that started with 194, followed by a period, followed by anything (the %), followed by another period, followed by another percent, followed by a final period and a percent.  Here are a few more LIKE pattern examples:

Pattern Examples that would match
'Mark' Mark -- "mark" wouldn't do it, case matters
'Mark%' Mark Minasi, Mark77, Marky, Mark
'Mark_' Marky, MarkM; Neither Mark nor Mark Minasi would work
'Mar%k' Mark, Maraardvark
% any string at all, or even nothing


There is also a NOT LIKE command.

So we've seen that we can use WHERE to restrict the things that SELECT does its work on.  But doesn't HAVING do that as well?  Kind of, but not exactly.  WHERE restricts the data that SELECT looks at to do its analysis; HAVING restricts the results of that analysis.  If that's not clear, let's do another query that will make it clear, as well as giving me an excuse to do some more Log Parser examples.

Querying For the Most Popular Files

We've explored our most-visiting-IP-addresses, but recall that wasn't really what I wanted to do -- I wanted to see how often people viewed thismonth.htm.  Can you see how you'd change it so that we're not seeing the most frequently-visiting IP address, but instead to see the most-requested files?  A look at the data grid output shows that the name of a requested file -- default.asp, thismonth.htm, or the like is the field "cs-uri-stem."   Form the same query as before, but replace "c-ip" with "cs-uri-stem."  Additionally, "hits" is probably a better phrase than "visits" and "file-requested" is more meaningful to most than "cs-uri-stem" and so we end up with this query (again broken here but should be typed as one line):


logparser "select top 5 cs-uri-stem as requested-file, count(c-ip) as hits from *.log
group by requested-file order by hits desc" -i:iisw3c -o:datagrid -rtp:-1

But let's recall that I had a particular query in mind when I got started -- how many hits did my newsletter file, thismonth.htm, get?  I could figure that out from the query that showed me the top five most-visited files, but that's kind of lame.  Instead, this query does it with a WHERE clause, reporting right to the screen (again, type as one line):


logparser "select cs-uri-stem as filename, count(*) as hits from *.log
where filename='/thismonth.htm' group by filename" -i:iisw3c

Again, notice that WHERE clause.  SQL queries fetch some subset of the fields in a database (cs-uri-stem in this case, for example) -- Log Parser didn't grab every available field in the IIS logs, just cs-uri-stem, and so it had a smaller bunch of data to work on, which presumably would make the query run faster.  So naming particular fields in the SELECT statement instead of entering * to get all of the fields reduces the number of fields to fetch before doing some kind of analysis and reporting.  In contrast, using a WHERE clause reduces the number of records fetched.  Fewer records also means less work for the query engine, which means a faster query.

Now I can offer an example where WHERE does something similar to HAVING.  We could phrase the query this way (type as one line):


logparser "select cs-uri-stem as filename, count(*) as hits from *.log
group by filename having filename='/thismonth.htm'" -i:iisw3c

In this second query, I told Log Parser to grab the cs-uri-stem data from every single record and do a bit of computation on it (count the frequency of each file).  Once it's done with that, then Log Parser's got the breakdown of file name frequencies for every single file every encountered.  Now, I don't want to see all of those file name frequencies, I just want the frequency for thismonth.htm.  That's what the HAVING clause does -- it says "Log Parser, you've got a huge hunk of data, but I only want you to show me a tiny bit of it." 

I figured that the first query, the one with the WHERE clause would be a bit more efficient as it says to only bother computing the hit count on records about thismonth.htm, where the second computed hit counts on every single file mentioned in the log, and then only showed thismonth.htm.  And my guess was borne out, as Log Parser reports how long it takes to do something.   And yeah, the time difference was about 0.1 second, but remember there's only two log files in our test bunch -- analyzing five and a half years' worth of logs might pay off in terms of a noticeable time difference with a more efficient query.  Of course, I could be wrong --  remember, I'm just an apprentice SQLer.  (Is that pronounced "squealer?"  I'm getting these creepy flashbacks to Ned Beatty in Deliverance for some reason.)

I should also mention that, again, I'm just scratching the surface here, but here's an even more efficient way to tally the thismonth.htms:


logparser "select count(*) from *.log where cs-uri-stem='/thismonth.htm'" -i:iisw3c

Controlling IIS Log Dates To Query From

Now, that's all pretty neat, except for one thing:  I've been running a Web site named www.minasi.com for quite a long time.  But I've only been offering these free newsletters since 1999 and if I recall right I've only used the "thismonth.htm" file name for the past two.  In addition, I'm really only interested in how many people have looked at this in, say, the past month.  How, then, do I tell Log Parser "do that query, but for heaven's sake don't read every IIS log I've got going back to the beginning of time; instead, only look at entries since Friday, 29 April 2005 at 7:00 PM.

As it turns out the particular iisw3c input type has a special option designed to do just that, as I discovered by looking in the "IISW3C Input Format Parameters" page of Log Parser help.  Just add the -mindatemod parameter, followed by the earliest time that you want the log searched in yyyy-mm-dd format.  For example, to only see the "thismonth.htm" hits since 29 April 2005 I'd type (as one line)


logparser "select count(*) from *.log where cs-uri-stem='/thismonth.htm'"
-i:iisw3c -mindatemod 2005-04-29

Or, to include the time as well, add time as hh:mm:ss and put quotes around the date/time combination.  To see all the hits since 7 PM on the 29th of April 2005, I'd type (as one line)


logparser "select count(*) from *.log where cs-uri-stem='/thismonth.htm'" -i:iisw3c
-mindatemod "2005-04-29 19:00:00"

One little quirk to remember is that the log files store time in Greenwich/Universal/Zulu time.  So in my case, as I live in the Eastern time zone, I'd have to ask not for 7 PM (which is Eastern Daylight Time as I write this) but instead for 11 PM Universal, as I'm four hours behind that time zone.  In the winter I'm five hours behind UTC, so 7 PM for me would be the next day's midnight.  

Log Parser Output Fun

Whew, that SQL syntax stuff can be rough going, if useful.  Let's take a break and play with some more fun stuff -- a few of the ways that Log Parser can output the results of your queries.  We've seen the "native" (simple text) and data grid outputs.  Just for the sake of Log Parser weirdness, try changing the output format to "neuroview (type as one line):"


logparser "select top 5 c-ip, count(*) as visits from *.log group by c-ip
order by visits desc" -i:iisw3c -o:neuroview -looprows:1

Which produces a pretty much useless output... but it looks like the credits from the Matrix movies so you'll no doubt impress the crowd at your next presentation.

If you've got Office loaded on your system, then you can do some graphing too.  Try creating a bar chart instead of a data grid (type as one line):


logparser "select top 5 c-ip, count(*) as visits into chart.gif from *.log
group by c-ip order by visits desc" -i:iisw3c -o:chart -charttype:bar3d -view:on

Again, look in Log Parser's Help for more ways to display its query results.  You might sometimes want to do a really complex query that can't be done in one SQL statement; in that case, you'd do the query in parts, where you do the initial query and save those results to some kind of file (probably a comma separated variable or CSV file), then do a second query on that file, and so on.

Querying the File System with Log Parser

Let's try a query or two on something other than an IIS log.  Wondering how many MP3 files are on your system?  Well, we've seen all of the fields in iisw3c-type input files.  Here we'll use the "fs" input type file and we can get a listing of its fields with the logparser -h -i:fs approach, or look in the quite helpful Log Parser help file.  I find that, not surprisingly, there is an attribute called "name"


logparser "select count(*) from c:\* where name like '%.mp3'" -i:fs

You'll probably get a complaint that it couldn't complete the task but that's because it can't read the System Volume Information folder -- the results are still correct.  (If you like, you can give yourself Read permissions to the System Volume Information folder, but you'll get the same results either way.  Unfortunately there is no way that I know of to say to Log Parser, "search all of C:\ except for such-and-such directory.")  Or total up how much space they're taking:


logparser "select sum(size) from c:\* where name like '%.mp3'" -i:fs

Querying Active Directory

Here's a quick Log Parser AD example.  Suppose I want to get a list of all of the first names in the company, and how many people have each first name.  The only trick you've got to know is the AD word for "first names," which (if you read Newsletter 45) you know is "givenname."  Second, the "FROM" part looks like

'ldap://yourusername:yourpassword@yourdomainname/wheretosearch'

So suppose I've got an administrative account called bigguy with password "swordfish" at bigfirm.com and I want to search the whole domain for user names.  The query would look like this (and it's an ugly one, again you'd type as one line although I split it into three on the page):


logparser "select givenname, count(givenname) as tally
from 'ldap://bigguy:swordfish@bigfirm.com/dc=bigfirm,dc=com
group by givenname order by tally desc" -objclass:User -o:datagrid -rtp:1 -i:ADS

More Resources

This was just a start with Log Parser.  The help's got lots of examples that are worth working through, and of course you can Google it for more.  You'll also find more links about Log Parser at www.logparser.com.  Apparently Log Parser's even got a book written about it!

8 comments:

Coco said...

can i make use of len function in logparser?

Anonymous said...

Good Day!!! nbe.blogspot.com is one of the best innovative websites of its kind. I enjoy reading it every day. nbe.blogspot.com rocks!

Anonymous said...

He that makes a good war makes a good peace.


--------------
The University of British Columbia in Canada

Anonymous said...

He that makes a good war makes a good peace.

-----------------------------------

Anonymous said...

I visited multiple websites however the audio
quality for audio songs present at this website is genuinely superb.


Here is my website :: diet that works

Anonymous said...

http://medassist.org/media/buypropecia/#24982 propecia prescription - propecia risks

Anonymous said...

Great information!

Channdeep Singh said...

This is amazing sir. I fixed the problem of duplicate LOG PARSER records in SQL database using your article. Thanks a lot !