Merged
25 commits
08d3fdc Merge pull request #4 from datalust/dev (nblumhardt, Feb 11, 2018)
dae74e9 Merge pull request #8 from datalust/dev (nblumhardt, Feb 12, 2018)
a9dffcd Merge pull request #15 from datalust/dev (nblumhardt, Feb 12, 2018)
155b232 Plain text ingestion WIP - read frames of input with trailing lines (nblumhardt, Feb 21, 2018)
68d7750 Merge pull request #18 from nblumhardt/feat/plain (nblumhardt, Feb 21, 2018)
a3359b2 Work-in-progress pattern executor (nblumhardt, Feb 21, 2018)
4d3120c Non-greedy matching (nblumhardt, Feb 21, 2018)
9f02d73 Some notes to self (nblumhardt, Feb 21, 2018)
7c19491 Merge pull request #23 from datalust/dev (nblumhardt, Feb 22, 2018)
2b39a1c Merge branch 'dev' of https://github.com/datalust/seqcli into pattern (nblumhardt, Feb 22, 2018)
a600cca Hook the frame reader and pattern matcher up to IngestCommand (nblumhardt, Feb 22, 2018)
e639dec LogEventBuilder tests (nblumhardt, Feb 22, 2018)
f756b37 Merge pull request #20 from nblumhardt/pattern (nblumhardt, Feb 23, 2018)
3d17962 Initial extraction pattern support (nblumhardt, Feb 23, 2018)
c424b98 -p already taken for properties (nblumhardt, Feb 23, 2018)
7f1703c More placeholder tests, WIP (nblumhardt, Feb 23, 2018)
cbe7336 Enough working to parse default Serilog.Sinks.File formatted events (nblumhardt, Feb 24, 2018)
cf1647d Multiple-token non-greedy lookahead (nblumhardt, Feb 24, 2018)
0144a1e Merge pull request #24 from nblumhardt/pattern-lang (nblumhardt, Feb 25, 2018)
5bf6c18 Groups work-in-progress (nblumhardt, Feb 25, 2018)
36f8a31 Equals for compound match expressions (nblumhardt, Feb 26, 2018)
e912be0 Doc tweaks (nblumhardt, Feb 26, 2018)
685f23c Merge pull request #28 from nblumhardt/compound-patterns (nblumhardt, Feb 28, 2018)
0a43221 Merge remote-tracking branch 'origin/master' into feat/plain (nblumhardt, Feb 28, 2018)
7991c8c Merge remote-tracking branch 'origin/dev' into feat/plain (nblumhardt, Feb 28, 2018)
73 changes: 72 additions & 1 deletion README.md
@@ -55,14 +55,16 @@ Send JSON log events from a file or `STDIN`.
Example:

```
-seqcli ingest -i events.clef --filter="@Level <> 'Debug'" -p Environment=Test
+seqcli ingest -i events.clef --json --filter="@Level <> 'Debug'" -p Environment=Test
```

| Option | Description |
| ------ | ----------- |
| `-i`, `--input=VALUE` | CLEF file to ingest; if not specified, `STDIN` will be used |
| `--invalid-data=VALUE` | Specify how invalid data is handled: fail (default) or ignore |
| `-p`, `--property=VALUE1=VALUE2` | Specify event properties, e.g. `-p Customer=C123 -p Environment=Production` |
| `-x`, `--extract=VALUE` | An extraction pattern to apply to plain-text logs (ignored when `--json` is specified) |
| `--json` | Read the events as JSON (the default assumes plain text) |
| `-f`, `--filter=VALUE` | Filter expression to select a subset of events |
| `-s`, `--server=VALUE` | The URL of the Seq server; by default the `connection.serverUrl` value will be used |
| `-a`, `--apikey=VALUE` | The API key to use when connecting to the server; by default `config.apiKey` value will be used |
@@ -147,3 +149,72 @@ Stream log events matching a filter.
### `version`

Print the current executable version.

## Extraction Patterns

The `seqcli ingest` command can be used for parsing plain text logs into structured log events.

```shell
seqcli ingest -x "{@t:timestamp} [{@l:ident}] {@m:*}{:n}{@x:*}"
```

The `-x` argument above is an _extraction pattern_ that will parse events like:

```
2018-02-21 13:29:00.123 +10:00 [ERR] The operation failed
System.DivideByZeroException: Attempt to divide by zero
at SomeClass.SomeMethod()
```

### Syntax

Extraction patterns have a simple high-level syntax:

* Text that appears in the pattern is matched literally - so a pattern like `Hello, world!` will match logging statements that are made up of this greeting only,
* Text between `{curly braces}` is a _match expression_ that identifies a part of the event to be extracted, and
* Literal curly braces are escaped by doubling, so `{{` will match the literal text `{`, and `}}` matches `}`.
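
The three rules above can be sketched in a few lines of C#. This is only an illustration, not the parser used by `seqcli`: `PatternSplitter` and `Split` are hypothetical names, and the sketch assumes match expressions are not nested (so it would not handle compound matchers).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative sketch only; not the seqcli parser.
static class PatternSplitter
{
    // Splits a pattern into literal text and match expressions, in order.
    public static List<(bool IsExpression, string Text)> Split(string pattern)
    {
        var parts = new List<(bool IsExpression, string Text)>();
        var literal = new StringBuilder();
        var i = 0;
        while (i < pattern.Length)
        {
            var ch = pattern[i];
            var doubled = i + 1 < pattern.Length && pattern[i + 1] == ch;
            if ((ch == '{' || ch == '}') && doubled)
            {
                literal.Append(ch); // "{{" becomes "{", "}}" becomes "}"
                i += 2;
            }
            else if (ch == '{')
            {
                var close = pattern.IndexOf('}', i);
                if (close < 0) throw new FormatException("Unterminated match expression.");
                if (literal.Length > 0) parts.Add((false, literal.ToString()));
                literal.Clear();
                parts.Add((true, pattern.Substring(i + 1, close - i - 1)));
                i = close + 1;
            }
            else if (ch == '}')
            {
                throw new FormatException("Unescaped `}` outside a match expression.");
            }
            else
            {
                literal.Append(ch);
                i++;
            }
        }
        if (literal.Length > 0) parts.Add((false, literal.ToString()));
        return parts;
    }

    static void Main()
    {
        foreach (var (isExpression, text) in Split("{{id}} = {Id:nat}!"))
            Console.WriteLine($"{(isExpression ? "expr" : "text")}: {text}");
    }
}
```

For the pattern `{{id}} = {Id:nat}!`, the sketch yields the literal text `{id} = `, the match expression `Id:nat`, and the literal text `!`.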

Match expressions have the form:

```
{name:matcher}
```

Both the name and the matcher are optional, but at least one of them must be specified. Hence `{@t:timestamp}` specifies a name of `@t` and a matcher of `timestamp`, `{IPAddress}` specifies a name only, and `{:n}` a matcher only (in this case the built-in newline matcher).
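
As a rough illustration of the name/matcher split, the sketch below separates the text between the braces at the first colon. `MatchExpression.Parse` is a hypothetical helper, not part of `seqcli`, and it assumes the first `:` is always the separator.

```csharp
using System;

// Illustrative sketch only; not the seqcli parser.
static class MatchExpression
{
    // Splits the text between the braces into (name, matcher); a missing part is null.
    public static (string Name, string Matcher) Parse(string expression)
    {
        var colon = expression.IndexOf(':');
        var name = colon < 0 ? expression : expression.Substring(0, colon);
        var matcher = colon < 0 ? "" : expression.Substring(colon + 1);
        if (name.Length == 0 && matcher.Length == 0)
            throw new FormatException("A match expression needs a name, a matcher, or both.");
        return (name.Length == 0 ? null : name, matcher.Length == 0 ? null : matcher);
    }

    static void Main()
    {
        foreach (var expression in new[] { "@t:timestamp", "IPAddress", ":n" })
        {
            var (name, matcher) = Parse(expression);
            Console.WriteLine($"name={name ?? "(none)"}, matcher={matcher ?? "(none)"}");
        }
    }
}
```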

The _name_ is the property name to be extracted; there are four built-in property names that get special handling:

* `@t` - the event's timestamp
* `@m` - the textual message associated with the event
* `@l` - the event's level
* `@x` - the exception or backtrace associated with the event

Other property names are attached to the event payload, so `{Elapsed:dec}` will extract a property called `Elapsed`, using the `dec` decimal matcher.

Match expressions with no name are consumed from the input, but are not added to the event payload.

### Matchers

Matchers identify chunks of the input event.

Different matchers are needed so that a piece of text like `200OK` can be split into separate properties, e.g. with `{StatusCode:nat}{Status:alpha}`. Here, the `nat` (natural number) matcher also coerces the result into a numeric value, so that it is attached to the event payload as the number `200` instead of as the text `"200"`.
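
To make the `200OK` example concrete, here is a minimal sketch of two matchers applied one after the other. `TryNat` and `TryAlpha` are hypothetical stand-ins for the built-in `nat` and `alpha` matchers; the exact character classes and numeric type used by `seqcli` are assumptions.

```csharp
using System;

// Illustrative sketch only; the real matchers may accept different input.
static class NamedMatchers
{
    // Consumes leading ASCII digits and coerces them to a number.
    public static bool TryNat(string input, out ulong value, out string remainder)
    {
        var n = 0;
        while (n < input.Length && input[n] >= '0' && input[n] <= '9') n++;
        remainder = input.Substring(n);
        return ulong.TryParse(input.Substring(0, n), out value);
    }

    // Consumes leading letters and keeps them as text.
    public static bool TryAlpha(string input, out string value, out string remainder)
    {
        var n = 0;
        while (n < input.Length && char.IsLetter(input[n])) n++;
        value = input.Substring(0, n);
        remainder = input.Substring(n);
        return n > 0;
    }

    static void Main()
    {
        if (TryNat("200OK", out var statusCode, out var rest) &&
            TryAlpha(rest, out var status, out _))
        {
            // statusCode is a number here, status is text.
            Console.WriteLine($"StatusCode={statusCode}, Status={status}");
        }
    }
}
```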

There are three kinds of matchers:

* Matchers like `alpha` and `nat` are built-in _named_ matchers.
* The special matchers `*`, `**` and so on are _non-greedy content_ matchers; these will match any text up until the next pattern element matches (`*`), the next two elements match (`**`), and so on. We saw this in action with the `{@m:*}{:n}` elements in the example: the message is all of the text up until the next newline.
* More complex _compound_ matchers are described using a sub-expression. These are prefixed with an equals sign `=`, like `{Phone:={:nat}-{:nat}-{:nat}}`. This will extract chunks of text like `123-456-7890` into the `Phone` property.
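
The non-greedy behaviour of `*` can be sketched as "consume as little as possible until the next element matches". The code below is an assumption-laden illustration, not the `seqcli` implementation: the `Matcher` delegate, `Literal` and `ContentUntil` are hypothetical names, and only a single element of lookahead is modelled.

```csharp
using System;

// Illustrative sketch only; models `*` with one element of lookahead.
static class NonGreedy
{
    // Returns the number of characters consumed at `start`, or -1 on failure.
    public delegate int Matcher(string input, int start);

    public static Matcher Literal(string text) => (input, start) =>
        string.CompareOrdinal(input, start, text, 0, text.Length) == 0 ? text.Length : -1;

    // Stops at the first position where `next` matches.
    public static Matcher ContentUntil(Matcher next) => (input, start) =>
    {
        for (var end = start; end <= input.Length; end++)
            if (next(input, end) >= 0)
                return end - start;
        return -1;
    };

    static void Main()
    {
        var input = "The operation failed\nSystem.DivideByZeroException: Attempt to divide by zero";
        var message = ContentUntil(Literal("\n")); // roughly `{@m:*}{:n}`
        var length = message(input, 0);
        Console.WriteLine(input.Substring(0, length)); // The operation failed
    }
}
```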

### Processing

Extraction patterns are processed from left to right. When the first non-matching pattern element is encountered, extraction stops; any remaining text that couldn't be matched is attached to the resulting event in an `@unmatched` property.
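
A minimal sketch of this left-to-right processing follows. The `Extractor` class and `Element` delegate are hypothetical, property values are kept as plain strings, and the sketch assumes that text left over after every element has matched is also reported as `@unmatched`.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; not the seqcli extractor.
static class Extractor
{
    // Returns the matched length at `start`, or -1 if the element does not match.
    public delegate int Element(string input, int start);

    public static Dictionary<string, string> Extract(
        string input, IEnumerable<(string Name, Element Match)> elements)
    {
        var result = new Dictionary<string, string>();
        var position = 0;
        foreach (var (name, match) in elements)
        {
            var length = match(input, position);
            if (length < 0) break; // the first failure stops extraction
            if (name != null) result[name] = input.Substring(position, length);
            position += length;
        }
        if (position < input.Length)
            result["@unmatched"] = input.Substring(position);
        return result;
    }

    static void Main()
    {
        Element digits = (s, i) =>
        {
            var n = 0;
            while (i + n < s.Length && char.IsDigit(s[i + n])) n++;
            return n > 0 ? n : -1;
        };
        Element space = (s, i) => i < s.Length && s[i] == ' ' ? 1 : -1;

        var elements = new (string Name, Element Match)[]
        {
            ("StatusCode", digits), (null, space), ("Elapsed", digits)
        };
        foreach (var pair in Extract("200 fast", elements))
            Console.WriteLine($"{pair.Key} = {pair.Value}");
    }
}
```

With the input `200 fast`, the third element fails on `fast`, so the sketch keeps `StatusCode` and reports `fast` under `@unmatched`.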

Multi-line events are handled by looking for lines that start with the first element of the extraction pattern to be used. This works well if the first line of each event begins with something unambiguous like an `iso8601dt` timestamp; if the lines begin with less specific syntax, the first few elements of the extraction pattern might be grouped to identify the start of events more accurately:

```
{:=[{@t} {@l}]} {@m:*}
```

Here the literal text `[`, a timestamp token, adjacent space ` `, level and closing `]` are all grouped so that they constitute a single logical pattern element to identify the start of events.
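
The framing of multi-line events described above can be sketched as follows. `FrameReader.Frames` is a hypothetical helper, and the date-prefix regular expression is only a stand-in for "the first element of the extraction pattern matches at the start of the line".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative sketch only; not the seqcli frame reader.
static class FrameReader
{
    // Groups lines into events: a line that starts an event closes the previous one.
    public static IEnumerable<string> Frames(IEnumerable<string> lines, Func<string, bool> startsEvent)
    {
        var current = new List<string>();
        foreach (var line in lines)
        {
            if (startsEvent(line) && current.Count > 0)
            {
                yield return string.Join("\n", current);
                current.Clear();
            }
            current.Add(line);
        }
        if (current.Count > 0)
            yield return string.Join("\n", current);
    }

    static void Main()
    {
        var startOfEvent = new Regex(@"^\d{4}-\d{2}-\d{2} "); // assumed timestamp prefix
        var lines = new[]
        {
            "2018-02-21 13:29:00.123 +10:00 [ERR] The operation failed",
            "System.DivideByZeroException: Attempt to divide by zero",
            "at SomeClass.SomeMethod()",
            "2018-02-21 13:29:01.000 +10:00 [INF] Done"
        };
        foreach (var frame in Frames(lines, startOfEvent.IsMatch))
            Console.WriteLine("--- event ---\n" + frame);
    }
}
```

This sketch only flushes an event when the next one begins or the input ends; it does not model the deadline for trailing lines on a live stream.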

When logs are streamed into `seqcli ingest` in real time, a 10 ms deadline is applied, within which any trailing lines that make up the event must be received.
38 changes: 28 additions & 10 deletions src/SeqCli/Cli/Commands/IngestCommand.cs
@@ -19,6 +19,7 @@
 using SeqCli.Cli.Features;
 using SeqCli.Connection;
 using SeqCli.Ingestion;
+using SeqCli.PlainText;
 using Serilog;
 using Serilog.Core;
 using Serilog.Events;
@@ -28,15 +29,16 @@
 namespace SeqCli.Cli.Commands
 {
     [Command("ingest", "Send JSON log events from a file or `STDIN`",
-        Example = "seqcli ingest -i events.clef --filter=\"@Level <> 'Debug'\" -p Environment=Test")]
+        Example = "seqcli ingest -i events.clef --json --filter=\"@Level <> 'Debug'\" -p Environment=Test")]
     class IngestCommand : Command
     {
         readonly SeqConnectionFactory _connectionFactory;
         readonly InvalidDataHandlingFeature _invalidDataHandlingFeature;
         readonly FileInputFeature _fileInputFeature;
         readonly PropertiesFeature _properties;
         readonly ConnectionFeature _connection;
-        string _filter;
+        string _filter, _pattern;
+        bool _json;

         public IngestCommand(SeqConnectionFactory connectionFactory)
         {
@@ -45,10 +47,18 @@ public IngestCommand(SeqConnectionFactory connectionFactory)
             _invalidDataHandlingFeature = Enable<InvalidDataHandlingFeature>();
             _properties = Enable<PropertiesFeature>();

+            Options.Add("x=|extract=",
+                "An extraction pattern to apply to plain-text logs (ignored when `--json` is specified)",
+                v => _pattern = string.IsNullOrWhiteSpace(v) ? null : v.Trim());
+
+            Options.Add("json",
+                "Read the events as JSON (the default assumes plain text)",
+                v => _json = true);
+
             Options.Add("f=|filter=",
                 "Filter expression to select a subset of events",
                 v => _filter = string.IsNullOrWhiteSpace(v) ? null : v.Trim());

             _connection = Enable<ConnectionFeature>();
         }

@@ -71,14 +81,22 @@ protected override async Task<int> Run()
                     ? new StreamReader(File.Open(_fileInputFeature.InputFilename, FileMode.Open, FileAccess.Read,
                         FileShare.ReadWrite))
                     : null)
-                using (var reader = new LogEventReader(inputFile ?? Console.In))
                 {
-                    return await LogShipper.ShipEvents(
-                        _connectionFactory.Connect(_connection),
-                        reader,
-                        enrichers,
-                        _invalidDataHandlingFeature.InvalidDataHandling,
-                        filter);
+                    var input = inputFile ?? Console.In;
+
+                    var reader = _json ?
+                        (ILogEventReader)new ClefLogEventReader(input) :
+                        new PlainTextLogEventReader(input, _pattern);
+
+                    using (reader as IDisposable)
+                    {
+                        return await LogShipper.ShipEvents(
+                            _connectionFactory.Connect(_connection),
+                            reader,
+                            enrichers,
+                            _invalidDataHandlingFeature.InvalidDataHandling,
+                            filter);
+                    }
                 }
             }
             catch (Exception ex)
6 changes: 3 additions & 3 deletions src/SeqCli/Csv/CsvTokenizer.cs
@@ -8,7 +8,7 @@ namespace SeqCli.Csv
 {
     class CsvTokenizer : Tokenizer<CsvToken>
     {
-        static readonly TextParser<TextSpan> Content = Span.While(ch => ch != '"');
+        static readonly TextParser<TextSpan> Content = Span.WithoutAny(ch => ch == '"');

         protected override IEnumerable<Result<CsvToken>> Tokenize(TextSpan span)
         {
@@ -26,9 +26,9 @@ protected override IEnumerable<Result<CsvToken>> Tokenize(TextSpan span)
             if (!next.HasValue) yield break;

             var text = Content(next.Location);
-            while (text.HasValue)
+            while (text.HasValue || !text.Remainder.IsAtEnd)
             {
-                if (text.Value.Length > 0)
+                if (text.HasValue)
                 {
                     if (TryMatchSpecialContent(text.Value, out var specialTokenType) &&
                         !IsEscapedDoubleQuote(text.Remainder))
45 changes: 45 additions & 0 deletions src/SeqCli/Ingestion/ClefLogEventReader.cs
@@ -0,0 +1,45 @@
+// Copyright 2018 Datalust Pty Ltd
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+using System;
+using System.IO;
+using System.Threading.Tasks;
+using Serilog.Events;
+using Serilog.Formatting.Compact.Reader;
+
+namespace SeqCli.Ingestion
+{
+    class ClefLogEventReader : ILogEventReader, IDisposable
+    {
+        readonly LogEventReader _reader;
+
+        public ClefLogEventReader(TextReader input)
+        {
+            _reader = new LogEventReader(input ?? throw new ArgumentNullException(nameof(input)));
+        }
+
+        public Task<LogEvent> TryReadAsync()
+        {
+            if (_reader.TryRead(out var evt))
+                return Task.FromResult(evt);
+
+            return Task.FromResult<LogEvent>(null);
+        }
+
+        public void Dispose()
+        {
+            _reader.Dispose();
+        }
+    }
+}
10 changes: 10 additions & 0 deletions src/SeqCli/Ingestion/ILogEventReader.cs
@@ -0,0 +1,10 @@
+using System.Threading.Tasks;
+using Serilog.Events;
+
+namespace SeqCli.Ingestion
+{
+    interface ILogEventReader
+    {
+        Task<LogEvent> TryReadAsync();
+    }
+}
22 changes: 14 additions & 8 deletions src/SeqCli/Ingestion/LogShipper.cs
@@ -25,7 +25,6 @@
 using Serilog.Core;
 using Serilog.Events;
 using Serilog.Formatting.Compact;
-using Serilog.Formatting.Compact.Reader;

 namespace SeqCli.Ingestion
 {
@@ -38,7 +37,7 @@ static class LogShipper

         public static async Task<int> ShipEvents(
             SeqConnection connection,
-            LogEventReader reader,
+            ILogEventReader reader,
             List<ILogEventEnricher> enrichers,
             InvalidDataHandling invalidDataHandling,
             Func<LogEvent, bool> filter = null)
@@ -47,7 +46,7 @@ public static async Task<int> ShipEvents(
             if (reader == null) throw new ArgumentNullException(nameof(reader));
             if (enrichers == null) throw new ArgumentNullException(nameof(enrichers));

-            var batch = ReadBatch(reader, filter, BatchSize, invalidDataHandling);
+            var batch = await ReadBatchAsync(reader, filter, BatchSize, invalidDataHandling);
             while (batch.Length > 0)
             {
                 StringContent content;
@@ -67,7 +66,7 @@ public static async Task<int> ShipEvents(

                 if (result.IsSuccessStatusCode)
                 {
-                    batch = ReadBatch(reader, filter, BatchSize, invalidDataHandling);
+                    batch = await ReadBatchAsync(reader, filter, BatchSize, invalidDataHandling);
                     continue;
                 }

@@ -80,7 +79,7 @@ public static async Task<int> ShipEvents(

                     Log.Error("Failed with status code {StatusCode}: {ErrorMessage}",
                         result.StatusCode,
-                        (string)error.ErrorMessage);
+                        (string)error.Error);
                 }
                 catch
                 {
@@ -95,16 +94,23 @@ public static async Task<int> ShipEvents(
             return 0;
         }

-        static LogEvent[] ReadBatch(LogEventReader reader, Func<LogEvent, bool> filter,
-            int count, InvalidDataHandling invalidDataHandling)
+        static async Task<LogEvent[]> ReadBatchAsync(
+            ILogEventReader reader,
+            Func<LogEvent, bool> filter,
+            int count,
+            InvalidDataHandling invalidDataHandling)
         {
             var batch = new List<LogEvent>();
             do
             {
                 try
                 {
-                    while (batch.Count < count && reader.TryRead(out var evt))
+                    while (batch.Count < count)
                     {
+                        var evt = await reader.TryReadAsync();
+                        if (evt == null)
+                            break;
+
                         if (filter == null || filter(evt))
                         {
                             batch.Add(evt);
60 changes: 60 additions & 0 deletions src/SeqCli/PlainText/Extraction/ExtractionPatternInterpreter.cs
@@ -0,0 +1,60 @@
+using System;
+using System.Collections.Generic;
+using System.Linq;
+using SeqCli.PlainText.Patterns;
+
+namespace SeqCli.PlainText.Extraction
+{
+    static class ExtractionPatternInterpreter
+    {
+        public static NameValueExtractor MultilineMessageExtractor { get; } = new NameValueExtractor(new[]
+        {
+            new SimplePatternElement(Matchers.MultiLineMessage, ReifiedProperties.Message)
+        });
+
+        static PatternElement[] CreatePatternElements(ExtractionPattern pattern)
+        {
+            if (pattern == null) throw new ArgumentNullException(nameof(pattern));
+
+            var patternElements = new PatternElement[pattern.Elements.Count];
+            for (var i = pattern.Elements.Count - 1; i >= 0; --i)
+            {
+                var element = pattern.Elements[i];
+                switch (element)
+                {
+                    case LiteralTextPatternExpression text:
+                        patternElements[i] = new SimplePatternElement(Matchers.LiteralText(text.Text));
+                        break;
+                    case CapturePatternExpression capture
+                        when capture.Content is NonGreedyContentExpression ngc:
+                        patternElements[i] = new SimplePatternElement(
+                            Matchers.NonGreedyContent(patternElements.Skip(i + 1).Take(ngc.Lookahead).ToArray()),
+                            capture.Name);
+                        break;
+                    case CapturePatternExpression capture
+                        when capture.Content is MatchTypeContentExpression mtc:
+                        patternElements[i] = new SimplePatternElement(
+                            mtc.Type == null ? Matchers.Token : Matchers.GetByType(mtc.Type),
+                            capture.Name);
+                        break;
+                    case CapturePatternExpression capture
+                        when capture.Content is GroupedContentExpression gc:
+                        patternElements[i] = new GroupedPatternElement(
+                            CreatePatternElements(gc.ExtractionPattern),
+                            capture.Name);
+                        break;
+                    default:
+                        throw new InvalidOperationException($"Element `{element}` not recognized.");
+                }
+            }
+
+            return patternElements;
+        }
+
+        public static NameValueExtractor CreateNameValueExtractor(ExtractionPattern pattern)
+        {
+            var patternElements = CreatePatternElements(pattern);
+            return new NameValueExtractor(patternElements);
+        }
+    }
+}