Tuesday, March 31, 2009 12:59 AM
JackFree
Advanced Blogging about Advanced Logging
Advanced Blogging about Advanced Logging
One of the many new IIS 7 modules announced at MIX about 2 weeks ago is the IIS Advanced Logging Module, developed by the IIS Media Team. Why is it called "Advanced" Logging? Well, to make a short story long, the new logging feature went through a lot of names, including IIS Media Logging, Custom Logging, and more contentious & quirky names proposed by various team members to varying degrees of frivolity and political correctness, like "Brack" Logging (a combination of Brian/Jack, the primary developers who have been working on it), "Super Fly Logging", "Cupcake Logging" (based on an affectionate, well-meaning yet silly nickname for a certain coworker), and countless others that would have been instantly (and understandably) rejected by Marketing had they even made it that far. In the end we settled on "Advanced Logging", which, while somewhat sterile, is certainly more professional and marketable, and does convey the notion of a richer feature set, and one that is not exclusive to media.
A quick assessment of the typical logging functionality in most web servers today reveals a modest but respectable offering that includes per-website logging, with a choice of log policy (when to roll a log file) and a reasonable (but fixed) set of fields to choose from. This has served the web well for some time now, but there has also been a growing need to accommodate more complex web and media scenarios that demand a lot of flexibility and control. These scenarios may require custom log fields, real-time access to data, greater control over what gets logged and when, extensibility for new and different sources of data, the ability to consolidate log data coming from clients and correlating it to server data, the option of sharing data from various sources to store in multiple logs, capturing system-state information, inclusion in logs of canceled/aborted requests, and even logging multiple times per request. That's a big feature-set to tackle - let's go into detail into how the IIS Advanced Logging module (hereafter referred to as ALM) meets these demands, and others.
One of the first things we tackled and one of the most common requests has been to allow custom log fields to be included in standard logs. Consider a standard W3C style log today:
#Version: 1.0
#Date: 20-Mar-2008 12:34:56
#Fields: time cs-method cs-uri
12:53:23 GET /abc.html
12:55:16 GET /subfolder/oprah.html
12:55:34 GET /subfolder/AtTheBuffet.jpg
12:57:11 GET /dietPlan.xml
There’s not a lot of flexibility in the fields you can log. What if you wanted to include the last-modified time of the file that was downloaded? Or a custom request header set by your client application? Previously you’d have to try to lump this information into the uri query string (cs-uri-query), which can work, but isn’t very clean and doesn’t scale too well. With the ALM you can define as many custom log fields as you want. We support retrieving values from request and response headers, from server variables (an IIS construct), in addition to standard built-in types like time-taken, bytes-sent, etc. Furthermore, the name used to identify origin of the data (the request-header name, or the event item name coming from another IIS module [more on this later], etc) does not have to be the same as the log header name that is stored in the final file. This level of abstraction allows you to use human-readable, friendly names for what gets stored in the file. So even if the source of your data is a cryptic name like “CModPlylstSessionStateGuid” [totally made up] you can log it as “Playlist Session” or whatever you like. Have as many log fields as you want, name them how you choose, and thanks to IIS’s powerful configuration system, the custom fields you define are available at lower configuration path nodes (so a custom log field that you define on a site is available at that site’s virtual directories, subfolders, etc, and a field defined at the server level is visible everywhere).
Here’s a short example from applicationHost.config that defines a custom log field called “ManifestUrl”. (Note that you don’t have to edit appHost manually, we’ve got a nice UI to use too ;)
<field id="ManifestURL" logHeaderName="ManifestURL" sourceName="clientManifestUrl" sourceType="module" category="Client Fields" description="The URL to the manifest for the content played" />
The id attribute uniquely identifies this log field so that many log definitions can reference the same field.
The logHeaderName attribute defines the actual text that will be included in the file as the log header.
The sourceName tells ALM how to identify the incoming data.
The sourceType tells ALM where the data is coming from. (A source type of “module”, for example, tells ALM that another IIS module will be providing this data).
Category and description are used by the ALM Server UI to help administrators easily indentify and categorize log fields to make them easier to find.
The various options in defining custom logs will become clearer as (and if) you read on – the main point I reemphasize is that you are free to define as many custom log fields as you want that will be treated as first class citizens in the log files produced!
Logging data is typically not available instantly – you may have to wait some time for cached (in-memory) log data to be flushed to disk, and long-running requests for large files will (today) not offer you access to the log data until the entire response has been sent. Further, you are typically tied to the file as the one and only output of the data. Not anymore! ALM publishes an event (based on IIS’s core tracing subsystem which is in turn based on ETW) that allows any IIS module to hook into ALM to receive notifications just before a log entry is enqueued (saved in memory but not yet written to disk). This means that we do all the heavy-lifting of aggregating log data from different sources throughout the lifetime of a request/response, ordering and validating that data against the log definition, and then providing a consolidated complete view in real-time. What does all this fancy verbiage mean? It means you can easily write a custom module that listens to these events and offers real-time access to log data, such as an analytics application. You could also publish the log data to another website or archive it in a database the instant it is available.
- Greater control over what gets logged
Presently log data is a pretty simple entity – it’s either there or it’s not. You get a dash if the data is not there, and that’s about it. Not very exciting.
ALM might not be so exciting that your jaw drops in shock while reading this (no small feat for a blog article about logging), but it does offer more control. Specifically, you can mark individual fields as “required” so that a log entry will never be written unless it contains a value for that field. For example, if you have a client session Id that is sent as a cookie or a request header, you can mark it as a required field in your log definition so that requests that are missing this data will not be included. This is very useful in minimizing the data logged by verifying early on that fields critical to log parsing are always present.
Additionally, ALM allows you to specify a default value for a log field if it is absent, offering you a little more flexibility than just “-“. We also expose control over the text that is used for delimiting log fields, for new-lines and you can even shut off writing log headers to disk. This allows you to produce a simple flat-file (comma-separated or whatever you choose) to easily import into Excel, for example. The maximum size of a log field can also be customized, providing greater control of overall log growth, allowing certain fields to be quite large and restricting others to be quite small. It’s all configurable on a per-log basis!
- Extensibility for new and different sources of data
IIS’s Advanced Logging Module has built-in support for retrieving log field values from a number of sources. It inherently understands request and response headers. These values can be included in any log, and, just like any other log field, can be marked required or given a max size. In addition to request/response headers, you can also log any server variable, whether populated by IIS or by a custom module. Have your own custom module for IIS 7? It’s a cinch to make your own fields available to be logged. It’s as simple as a call to RaiseTraceEvent .
- Built-in support for client-provided log data
One of the most common sources of log data is (of course) not the server itself, but in fact the clients who connect to it. The Advanced Logging Module comes with a client handler that is used to accept client POST data in an XML format and publish it to any logs that wish to log the fields provided. The client handler is actually a relatively simple, straightforward and small piece of code. It would be very easy to write your own client handler to accept binary POST data, or to parse values which are part of a query string, and then publish those values for use by any custom log definition. Hopefully we’ll get some sample code online soon that shows just how easy it is to pass along data from any incoming format to the Advanced Logging Module.
- Sharing data from various sources to store in multiple logs
Another very handy feature of our Advanced Logging Module is that it is naturally integrated with IIS’s hierarchical configuration system. Instead of logging every request that reaches a site, you can configure a log on an application, a virtual directory, a folder or even a file. This allows almost unlimited flexibility in logging just the requests of interest. It allows you to exclude uninteresting logs, say for “.jpg” files, and just log requests for HTML or ASPX pages.
You may be wondering how this all works – what is the logic behind getting data from its source to its destination? The answer is both simple and complex (of course). What makes Advanced Logging so useful is that it employs a very generic and extensible model for aggregating log data. There is a deliberately loose coupling of the source of log data and its destination. This means that the providers of log data need only give their fields a unique name, and don’t need to know anything about how many logs are configured, or for which sites, or even whether logging is enabled at all. The Advanced Logging Module does the work of determining which logs are eligible to receive data, all based on the configuration path of the request and the configuration path of the log definitions.
An example is in order. Let’s say you issue a request to your server for http://www.myserver.com/subfolder/a/b/test.html. Logs defined at the following configuration levels would have an opportunity to create a log entry for this request:
/ (site level)
/a (the “a” subfolder, or virtual directory)
/a/b (the “b” subfolder)
/a/b/test.html (just the ‘test.html’ file)
However, a request to http://www.myserver.com/a would just reach the logs defined at the site level (“/”) and the “/a” subfolder (or virtual directory).
You’re not limited to one log at any particular configuration level, so it’s easy and useful to partition log data based on its final destination or purpose. There’s no restriction on which fields can go in which logs, so you can configure separate logs with no overlapping data, or share the same fields between logs at any level.
- Capturing system-state information
Another interesting option that IIS Advanced Logging offers is the ability to include system state information in the logs themselves. We offer built in support for performance counters, allowing administrators to capture network load or CPU utilization at the time a request is logged. This can be very handy in diagnosing server issues such as increased latency, excessive memory consumption, or heavy disk I/O, by correlating system state with the requests that influence it.
- Inclusion in logs of canceled/aborted requests
Previously it was not possible to capture log data for requests that were aborted or canceled. No so anymore – IIS Advanced Logging is implemented in such a way as to ensure that log entries can be generated for all requests, whether canceled, aborted, timed-out, etc.
- Logging multiple times per request
The current paradigm for web server logging is that one line in a log file corresponds directly to one HTTP request. That’s fine most of the time, but it doesn’t offer any information about long-running requests which may take minutes or hours to complete. With IIS Advanced Logging you can configure a log so that each relevant trace event (that occurs within the lifetime of a request/response) generates a new line in the log file. These trace events can be raised by other IIS modules, such as IIS’s Bit Rate Throttling Module, or by any custom module. This can be very useful in debugging problems with long-running requests or for providing real-time data about the progress of a response. Because IIS Advanced Logging allows you to define any number of logs at a given configuration level, you can easily create a separate log for interim data that utilizes this behavior while maintaining another log that just stores per-request data.
- Fine tuning of archive behavior
Another area of customizability that IIS Advanced Logging offers is control over how data is archived. Each log can specify its own archive-to-disk interval so that time-sensitive crucial data can be written very frequently, while other logs may choose to leverage the improved performance achieved by caching more data in memory between disk writes. Alternatively, for each log you could specify a maximum size (in KB) of log data to keep in memory before flushing to disk. Both of these settings (archiveToDiskInterval and inMemoryQueueSizeKB respectively) have reasonable defaults and need not be modified in order for archiving to happen regularly and reliably.
A rather low-level setting that can also be controlled on a per-log basis is the blocking queue size. By nature the Advanced Logging Module does the absolute minimum amount of processing on the thread that is servicing a request – it just saves off a copy of the relevant data. The actual disk I/O occurs asynchronously on another thread. If you have mission-critical log data that mustn’t be lost then you can optionally set a “blocking queue size” which specifies the maximum amount of log data to be stored in memory before requests will actually block and wait for the data to be archived.
If you have your own destination for the log data, or are just using the eventing feature of Advanced Logging, you may wish to turn off disk-writing entirely – a valid and supported scenario. Just set writeLogDataToDisk=false in the configuration for your log.
IIS Advanced Logging also offers a number of options for how to manage your log files – basic log roll options include schedule-based (hourly/daily/weekly/monthly/never), file-size based, or by duration of the entries contained within the log. The files themselves are created with names that you control, and we currently support using environment variables in the file names.
Just as with IIS’s existing logging functionality you can choose to log as UTF-8, or the default codepage of the system (CP_ACP). Since there’s no way to know whether all log servers will be configured with the same codepage, it is recommended to log in UTF-8.
I realize that this is a lot of information, but hopefully some will find it useful and applicable. We've put a lot of time and effort into making IIS Advanced Logging a feature-rich, robust and reliable logging solution that goes above and beyond expectations. We're always glad to hear your suggestions and feedback!
Cool things are still to come - as IIS Advanced Logging continues to develop and mature keep an eye out for even greater filtering and formatting features, and a few other surprises!
Happy logging,
-Jack
Tags: IIS Advanced Logging Log Data Config