Log Watch Overview

How the architectural pieces fit together.

The primary purpose of the log watch daemon is to collect data from a set of log files and transmit it to a central server. The act of harvesting the data from a log file is called a "reap". To accomplish this we must have the following set of logical components:

Configuration
- Which files to reap and how
Watcher
- Watches a file for state changes
- Logically divided into
  - Common watch code (lwatch)
  - One or more implementation-specific watch backends. Some OSes have facilities for monitoring file changes, and we use the OS-specific monitoring whenever possible because it is the most efficient. If no OS support is available for file monitoring we can fall back on polling. At the moment the following backends are implemented:
    - Inotify backend
Persistent state storage
- The log watch daemon will start and stop (and hopefully never crash). However, nothing prevents the file system from being modified while the log watch daemon is not running. Therefore we must store the state of every file we're watching in persistent storage so that we know which data has already been reaped and sent to the central server, what constitutes unreaped data, whether the file has been overwritten, and so on. We store this information in a SQLite database. When the log watch daemon starts up it consults the database and examines the file system to determine what actions it must take. At run time the database is used to "cache" information about a file or a watch.
Central server communications
- When data for a file is reaped it must be sent to the central server.
- The log watch daemon is also responsible for forwarding to the central server the log events (ELAPI) sent to it by applications running locally.
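
The startup check described under persistent state storage can be sketched as follows. This is a minimal Python illustration, not the daemon's actual schema: the table name `file_state`, its columns, and the helper names `open_state`, `save_state`, and `classify` are all assumptions made for the example.

```python
import os
import sqlite3

# Hypothetical schema: one row per watched path name.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_state (
    path         TEXT PRIMARY KEY,  -- path name the file was last seen under
    inode        INTEGER NOT NULL,  -- identifies the file contents
    reaped_bytes INTEGER NOT NULL   -- bytes already reaped and sent
)
"""

def open_state(db_path):
    """Open (or create) the state database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn

def save_state(conn, path, inode, reaped_bytes):
    """Record how far a file has been reaped."""
    conn.execute(
        "INSERT OR REPLACE INTO file_state (path, inode, reaped_bytes) "
        "VALUES (?, ?, ?)",
        (path, inode, reaped_bytes),
    )
    conn.commit()

def classify(conn, path):
    """Compare the stored state with the file system, as done at startup."""
    row = conn.execute(
        "SELECT inode, reaped_bytes FROM file_state WHERE path = ?", (path,)
    ).fetchone()
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"          # nothing to reap right now
    if row is None:
        return "new"              # never seen before: reap from the start
    inode, reaped_bytes = row
    if st.st_ino != inode:
        return "replaced"         # same path name, different contents
    if st.st_size < reaped_bytes:
        return "truncated"        # overwritten in place
    if st.st_size > reaped_bytes:
        return "unreaped"         # data was appended while we were down
    return "current"
```

Each return value stands for one of the decisions the daemon has to make when it starts; what it then does for each case is outside the scope of this sketch.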

How files are watched.

A file the user asks us to watch is called a watch target. A watch target does not need to exist: the log watcher must be able to watch, without error, for when it comes into existence and when it ceases to exist, and must be able to reap all its data whenever it exists.

Establishing a watch target effectively establishes a watch on a logical log file, which consists of a primary log file and its associated set of backups (see Vocabulary Terms). The backup log files must be watched in addition to the primary file because every file in a logical log file has data which needs to be reaped.

There is a further complication introduced by the presence of log rotation, backups, and the behavior of the UNIX file system. In UNIX a path name is only a weak association to the contents of a file. File contents in UNIX are actually identified by an inode, and it is entirely possible for more than one path name to point to the same inode. Another consequence is that a file can be opened under one path name (the OS locates the inode and returns a file descriptor for the file contents) and, while the process has that file open, its path name can be changed. The application is then still reading or writing the same file contents for the same inode, but under a path name different from the one it opened.
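
The weak link between path name and inode is easy to observe. The following Python sketch assumes a POSIX file system; the function name `inode_survives_rename` and the file names are made up for the demonstration.

```python
import os
import tempfile

def inode_survives_rename():
    """Return (same_inode, primary_exists, alias_same_inode, contents)."""
    with tempfile.TemporaryDirectory() as d:
        primary = os.path.join(d, "app.log")
        backup = os.path.join(d, "app.log.1")
        alias = os.path.join(d, "alias.log")
        with open(primary, "w") as f:
            f.write("first\n")
            f.flush()
            inode_before = os.fstat(f.fileno()).st_ino
            os.rename(primary, backup)   # path name changes under the open file
            f.write("second\n")          # still lands in the same inode
            f.flush()
            inode_after = os.fstat(f.fileno()).st_ino
        os.link(backup, alias)           # a second path name for the same inode
        alias_same = os.stat(alias).st_ino == inode_after
        with open(backup) as f:
            contents = f.read()
        return (inode_before == inode_after, os.path.exists(primary),
                alias_same, contents)
```

The writer never notices the rename: both lines end up in the file now called `app.log.1`, and the original path name no longer exists.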

Typically when a log file is rotated the rotation is performed by a process other than the application (e.g. the logrotate utility). Logrotate will rename the primary file (the one the application has open) to a backup name. Logrotate may optionally signal the application after performing the rotation that it should close its log file and reopen it under the primary name. Or it may wait for the application to close and reopen the primary file at some future point in time (some applications open and close their log files after every write). The critical issue for us is that we cannot just watch the primary file. After a rotation the application may continue to write into what is now a backup path name, because it is unaware that the file it has a file descriptor open on has been renamed; it thinks it is still writing to the primary file. Thus we must watch every file in a logical log file (primary plus backups). We also need to watch backups because a rotation may have occurred while the log watch daemon was not running, in which case we received no event for it.
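
One way to cope with writes that land in a renamed file is to track reaped offsets by inode rather than by path name. The Python sketch below illustrates that idea only; the function `reap_logical` and its in-memory `reaped` dictionary are hypothetical and do not describe how the daemon actually stores its state.

```python
import os

def reap_logical(paths, reaped):
    """Reap unreaped bytes from every existing file of a logical log file.

    paths  -- backup and primary path names, oldest first (hypothetical input)
    reaped -- dict mapping inode -> bytes already reaped; updated in place
    """
    chunks = []
    for path in paths:
        try:
            with open(path, "rb") as f:
                inode = os.fstat(f.fileno()).st_ino
                f.seek(reaped.get(inode, 0))   # skip what was already sent
                data = f.read()
                reaped[inode] = f.tell()
        except FileNotFoundError:
            continue                           # this member does not exist now
        if data:
            chunks.append((path, data))
    return chunks
```

Because the offset follows the inode, data the application writes after the rotation is picked up from the backup path name, and the freshly created primary is reaped from its beginning. Truncation is deliberately not handled here.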

It is usually not sufficient to ask a file monitoring backend to monitor a file because operations such as file creation, file deletion, and file renames occur on the directory containing the file, not on the file itself. As a consequence of this we ONLY MONITOR DIRECTORIES. Each directory is assigned a set of files to watch for.

If the directory containing a watched file does not yet exist we can't set a watch on it because watches can only be established on file system objects which exist. Therefore the watch on a directory may also be given the responsibility to watch not only for files it may contain but also for the creation of any sub-directories which might lead to a watched file (descendant watch).

Thus in order to watch changes in a requested target we may also end up watching:

- any directory containing a target
- any directory containing a backup of the target
- any directory which is the ancestor of a directory needing to be watched but which does not yet exist
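
The choice of which directories to watch can be sketched in a few lines of Python. The function names `closest_existing_ancestor` and `plan_watch_points` are illustrative assumptions, and the sketch ignores races with a file system that changes while it runs.

```python
import os
from collections import defaultdict

def closest_existing_ancestor(directory):
    """Walk toward the root until a directory that exists is found."""
    directory = os.path.abspath(directory)
    while not os.path.isdir(directory):
        parent = os.path.dirname(directory)
        if parent == directory:      # reached the root
            break
        directory = parent
    return directory

def plan_watch_points(files):
    """Map each directory to watch to the files it is responsible for."""
    points = defaultdict(set)
    for path in files:
        path = os.path.abspath(path)
        watch_dir = closest_existing_ancestor(os.path.dirname(path))
        points[watch_dir].add(path)
    return dict(points)
```

Passing the primary and backup path names of every target to `plan_watch_points` yields the set of directories to monitor, each with the files it has to look out for.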

Just to make life fun, the set of watches established to accommodate the above needs to be continuously adjusted because the file system is constantly changing underneath us. If a directory containing a watched file is deleted we need to move the watch to its closest existing ancestor. If a directory is created which should have been watched but wasn't because it didn't exist, then we need to move the watch from its ancestor down to the newly created directory, which is now the closest existing ancestor. Deciding that a watched file has new data appended to it which needs to be reaped is actually one of the simpler tasks in log watching.
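
The bookkeeping behind moving watches can be sketched without touching the file system at all. In this Python illustration the mapping `points` (watched directory to the set of files it is responsible for) and the function names are assumptions for the example, and the caller is assumed to supply the ancestor that takes over after a deletion.

```python
import posixpath

def is_under(path, directory):
    """True if path is directory itself or lies somewhere below it."""
    return posixpath.commonpath([path, directory]) == directory

def on_directory_created(points, new_dir):
    """Move files that live under new_dir down from its watched ancestors."""
    moved = set()
    for watched in list(points):
        if watched != new_dir and is_under(new_dir, watched):
            moving = {f for f in points[watched] if is_under(f, new_dir)}
            points[watched] -= moving
            moved |= moving
            if not points[watched]:
                del points[watched]        # nothing left to watch for here
    if moved:
        points.setdefault(new_dir, set()).update(moved)

def on_directory_deleted(points, gone_dir, ancestor):
    """Hand every file watched at or below gone_dir to an existing ancestor."""
    for watched in list(points):
        if is_under(watched, gone_dir):
            points.setdefault(ancestor, set()).update(points.pop(watched))
```

A real implementation would also have to add and remove the corresponding backend watches; this sketch only shows how responsibility for the files shifts between directories.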

Please see How Watch Points Watch Descendants for further explanation of how watch points, descendants, and non-existent entities are handled.

Independent backends for watching.

Different OSes have different support for monitoring file system modifications. We need to be portable across many different OSes but we also need to be as efficient and robust as possible. Therefore we abstract out the OS-specific mechanism for file system monitoring and put it into a monitoring "backend". The monitoring backend sends generic, platform-independent monitoring events to the log watcher, and the log watcher dispatches on these generic events. Thus we can deploy the log watch daemon in a variety of environments and configure it to use the optimal monitoring backend for the system. If an OS does not provide file system monitoring facilities, a backend can be written which relies on polling the file system. A polling backend is less than ideal for two reasons: it may miss file system events which occur between polls, and it is inherently inefficient because it has to check every watched item on every poll interval irrespective of whether the item actually changed. Clearly it is more efficient and robust to use an OS-based file monitoring facility which sends events only when things change.

The basic job of a file monitoring backend is to accept requests to watch specific objects and when a change occurs on that object to formulate it as a generic platform neutral event and dispatch that event to the log watcher.
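
To make the backend contract concrete, here is a minimal Python sketch of a polling backend. No such backend is listed as implemented; the class `PollingBackend`, the `BackendEvent` record, and the use of directory snapshots are assumptions chosen for illustration. It also shows the weaknesses mentioned above: every watched directory is rescanned on every poll, and anything that happens and is undone between two polls goes unnoticed.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class BackendEvent:
    kind: str   # generic event name, e.g. "LWATCH_EVENT_CREATE"
    path: str   # path name the event refers to

class PollingBackend:
    """Hypothetical backend that detects changes by rescanning directories."""

    def __init__(self, dispatch):
        self.dispatch = dispatch   # callable that receives BackendEvent objects
        self.snapshots = {}        # directory -> {name: (inode, size, mtime_ns)}

    def _scan(self, directory):
        snap = {}
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except FileNotFoundError:
                        continue   # vanished between listing and stat
                    snap[entry.name] = (st.st_ino, st.st_size, st.st_mtime_ns)
        except FileNotFoundError:
            pass                   # directory itself is gone
        return snap

    def watch(self, directory):
        """Accept a request to watch a directory."""
        self.snapshots[directory] = self._scan(directory)

    def poll(self):
        """Rescan every watched directory and dispatch generic events."""
        for directory, old in list(self.snapshots.items()):
            new = self._scan(directory)
            for name in sorted(new.keys() - old.keys()):
                self._emit("LWATCH_EVENT_CREATE", directory, name)
            for name in sorted(old.keys() - new.keys()):
                self._emit("LWATCH_EVENT_DELETE", directory, name)
            for name in sorted(new.keys() & old.keys()):
                if new[name] != old[name]:
                    self._emit("LWATCH_EVENT_MODIFY", directory, name)
            self.snapshots[directory] = new

    def _emit(self, kind, directory, name):
        self.dispatch(BackendEvent(kind, os.path.join(directory, name)))
```

The log watcher only ever sees `BackendEvent` objects, so a backend built on an OS facility could be swapped in without changing the code that consumes the events.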

How watch events are utilized.

See also: lwatch_event_t for a description of the generic log watch event structure.

The possible generic log watch event types are:

LWATCH_EVENT_CREATE: a file or directory was created
LWATCH_EVENT_DELETE: a file or directory was deleted
LWATCH_EVENT_RENAME: a file or directory was renamed
LWATCH_EVENT_MODIFY: a file or directory was modified
LWATCH_EVENT_OPEN: a file or directory was opened
LWATCH_EVENT_CLOSE: a file or directory was closed
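
Dispatching on these event types can be sketched as a table of handlers keyed by type. The Python below is only an illustration: the enum mirrors the six names listed above, while the `LwatchEvent` fields and the `Dispatcher` class are hypothetical and are not taken from the real lwatch_event_t definition.

```python
from dataclasses import dataclass
from enum import Enum, auto

class LwatchEventType(Enum):
    CREATE = auto()   # a file or directory was created
    DELETE = auto()   # a file or directory was deleted
    RENAME = auto()   # a file or directory was renamed
    MODIFY = auto()   # a file or directory was modified
    OPEN = auto()     # a file or directory was opened
    CLOSE = auto()    # a file or directory was closed

@dataclass
class LwatchEvent:
    type: LwatchEventType
    path: str         # hypothetical field: the path the event refers to

class Dispatcher:
    """Route generic events to the handlers registered for their type."""

    def __init__(self):
        self.handlers = {}

    def on(self, event_type, handler):
        self.handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event):
        handlers = self.handlers.get(event.type, [])
        for handler in handlers:
            handler(event)
        return len(handlers)   # number of handlers that saw the event
```

A watcher built this way would, for example, register a handler for `LwatchEventType.MODIFY` that triggers a reap and one for `LwatchEventType.CREATE` that adjusts its watches.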

Vocabulary Terms

path watch object: In the Inotify backend watches are established on path names. Inotify assigns the watch an id. When Inotify sends an event it identifies which watch the event is associated with via the id. The path watch object encapsulates both the id and the path name in one object.
watch target: A file we've been explicitly asked to watch by the user. In the course of running we may watch other directories and files in addition to the explicit targets requested, in order to fulfill other needs. To distinguish between watches requested by the user and watches needed for other purposes we refer to user-requested watches as "watch targets", because each is an explicit target established by the user.
descendant watch: Inotify can watch both directories and files. Watch targets are files only, so one might expect us to establish an Inotify watch on the file. However, it is the directory containing the file which reports the file's creation, deletion, and renaming. In addition, an Inotify watch on a directory reports modifications to a file in that directory just as if the watch had been established on the file directly. Also critically important is the fact that Inotify watches can only be established on file system objects which exist. These constraints mean we only establish Inotify watches on directories. An Inotify watch on a directory is responsible for every watched file the directory contains AND for any watched files below it whose directory structure does not yet exist. These are descendants of the directory being watched. Technically, files contained in the directory are also descendants of the directory if one considers a path name as a sequence of path components. A descendant watch, then, is any watched file contained in the directory being watched, or any file below the directory whose creation must be watched for. This means an Inotify path watch object (i.e. a watch point) has a set of descendant watches it is responsible for.
watch point: Any directory which has one or more descendant watches (either because it holds a file being watched or because it's along the path to a non-existent directory) is called a watch point. This is because the watch backend is instructed to watch that directory for any of its descendant watches.
closest existing ancestor: When asked to watch a file we actually establish a watch on the directory containing the file. But what if that directory doesn't exist? It means we have to start at the last directory in the chain of directories comprising the path and pop off directory components, progressively moving toward the root of the file system, until we find a directory which exists in the file system. We call this directory the "closest existing ancestor".
reap: The primary task of the log watch daemon is to collect data written to log files and transmit it to a central server for storage and analysis. The term "reap" means to collect or harvest. Thus when we say "reap" we mean the act of collecting or harvesting the log data for transmission to the central server.
log rotation: Log files are typically rotated, meaning that when they reach a threshold on size or time the log file is moved to a backup name. Typically the N youngest backups are kept: each time a file is rotated the oldest backup is deleted to make room for the new backup, so there are at most N backups at any given time. Think of it as a FIFO.
primary file: A primary file is the log file before being rotated. The primary file and the set of rotated backups derived from the primary file constitute the same logical log file, but partitioned into different physical files.
backup file: A backup file was a primary file prior to rotation. A backup file always has exactly one primary file that it was derived from. There may be many backups that share a single common primary file. One primary file and a collection of zero or more backups derived from the primary constitute a logical log file.
logical log file: A single primary and a collection of zero or more backups derived from the primary constitute a logical log file. This is because the primary and the ordered sequence of backups represent a single continuous data stream which happens to be partitioned into distinct physical files in the file system.
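
The FIFO behavior of log rotation and the ordering of a logical log file can be sketched as follows. This Python example assumes numbered backup names of the form `app.log.1`, `app.log.2`, and so on, with `.1` being the youngest; that naming scheme and the function names `rotate` and `logical_log_file` are assumptions for illustration, since backup naming is configurable in practice.

```python
import os

def rotate(primary, keep):
    """Rotate primary into numbered backups, keeping at most `keep` (>= 1)."""
    oldest = f"{primary}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)                    # oldest backup falls off the end
    for n in range(keep - 1, 0, -1):         # shift .1 -> .2, .2 -> .3, ...
        src = f"{primary}.{n}"
        if os.path.exists(src):
            os.rename(src, f"{primary}.{n + 1}")
    if os.path.exists(primary):
        os.rename(primary, f"{primary}.1")   # primary becomes youngest backup

def logical_log_file(primary, keep):
    """Existing members of the logical log file, oldest data first."""
    backups = [f"{primary}.{n}" for n in range(keep, 0, -1)]
    return [p for p in backups + [primary] if os.path.exists(p)]
```

Reading the files returned by `logical_log_file` in order reproduces the continuous data stream that the physical files partition.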