Extraction patterns#24

Merged
nblumhardt merged 5 commits into datalust:feat/plain from nblumhardt:pattern-lang
Feb 25, 2018

@nblumhardt nblumhardt commented Feb 23, 2018

This includes enough functionality to parse log files written by the Serilog.Sinks.File sink using its default output format:

2018-02-21 13:29:00.123 +10:00 [ERR] The operation failed
System.DivideByZeroException: Attempt to divide by zero
  at SomeClass.SomeMethod()

Extraction pattern:

seqcli ingest --extract="{@t:timestamp} [{@l:ident}] {@m:*}{:n}{@x:*}"
  • {@t:timestamp} - @t is the well-known timestamp property; timestamp is a named matcher that will be extended to handle a variety of formats (iso8601dt could be specified here if the file used ISO date times)
  • [ - any text not enclosed in curly braces is a literal part of the pattern
  • {@l:ident} - the level; ident is anything conforming to C-style identifier rules; omitting the :ident would be possible here if the level had whitespace to the right, but since we don't want to pull ] into the level name, we need to be more specific
  • {@m:*} - the matcher * is the "non-greedy content" matcher; it'll read until the pattern following it matches
    • The plan is to extend this so that ** will match until the two following patterns match, and so on
  • {:n} is an anonymous match - the name is empty, and n is a matcher for newlines
  • {@x:*} - exception; also using the "non-greedy content" matcher, but since the following pattern is end-of-input it'll slurp up anything that's left in the frame
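
As a rough illustration of the element-by-element reading above, the same pattern can be approximated with a Python regular expression. This is only a sketch and not how seqcli implements matching: the group names `t`, `l`, `m`, `x` stand in for `@t`, `@l`, `@m`, `@x` (Python group names cannot contain `@`), and the regex fragments chosen for `timestamp`, `ident` and `n` are assumptions based on the sample line.

```python
import re

# Hypothetical regex stand-in for:
#   {@t:timestamp} [{@l:ident}] {@m:*}{:n}{@x:*}
# The fragments below are assumptions, not seqcli's actual matchers.
FRAME = re.compile(
    r"(?P<t>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ [+-]\d{2}:\d{2})"  # {@t:timestamp}
    r" \["                            # literal text between the elements
    r"(?P<l>[A-Za-z_][A-Za-z0-9_]*)"  # {@l:ident}, stops before the "]"
    r"\] "                            # literal text
    r"(?P<m>.*?)"                     # {@m:*}, non-greedy content
    r"(?:\r?\n(?P<x>.*))?"            # {:n} then {@x:*} (optional tail)
    r"\Z",                            # end of the frame
    re.DOTALL,                        # let "." cross newlines inside the frame
)

frame = (
    "2018-02-21 13:29:00.123 +10:00 [ERR] The operation failed\n"
    "System.DivideByZeroException: Attempt to divide by zero\n"
    "  at SomeClass.SomeMethod()"
)
props = FRAME.match(frame).groupdict()
```

Here `props["m"]` ends at the first newline because the non-greedy group gives way as soon as the rest of the expression can match, and `props["x"]` takes everything that is left.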

Patterns stop matching when the frame ends, so the pattern can still handle:

2018-02-21 13:29:00.123 +10:00 [DBG] Starting up

I.e. the newline and exception are not mandatory.
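
A toy interpreter makes the "stop when the frame ends" behaviour easier to see. Everything in this sketch is hypothetical (the `ELEMENTS` list, the `regex` and `literal` helpers, and the `extract` function are illustrative names, not seqcli's types), and it assumes two content matchers never appear back to back.

```python
import re

CONTENT = "*"  # sentinel for the non-greedy content matcher

def regex(pattern):
    """Build a matcher that returns the end index of a match at pos, or None."""
    compiled = re.compile(pattern)
    def match(frame, pos):
        m = compiled.match(frame, pos)
        return m.end() if m else None
    return match

def literal(text):
    """Build a matcher for literal pattern text."""
    def match(frame, pos):
        return pos + len(text) if frame.startswith(text, pos) else None
    return match

# Hypothetical element list for "{@t:timestamp} [{@l:ident}] {@m:*}{:n}{@x:*}".
ELEMENTS = [
    ("@t", regex(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ [+-]\d{2}:\d{2}")),
    (None, literal(" [")),
    ("@l", regex(r"[A-Za-z_][A-Za-z0-9_]*")),
    (None, literal("] ")),
    ("@m", CONTENT),
    (None, regex(r"\r?\n")),
    ("@x", CONTENT),
]

def extract(elements, frame):
    props, pos = {}, 0
    for i, (name, matcher) in enumerate(elements):
        if pos >= len(frame):
            break  # frame ended: the remaining elements are left unmatched
        if matcher is CONTENT:
            # Read until the following element matches, or to the end of the
            # frame when there is no following element.
            following = elements[i + 1][1] if i + 1 < len(elements) else None
            end = pos
            while end < len(frame) and (following is None or following(frame, end) is None):
                end += 1
        else:
            end = matcher(frame, pos)
            if end is None:
                return None  # a required element failed to match
        if name:
            props[name] = frame[pos:end]
        pos = end
    return props

short = extract(ELEMENTS, "2018-02-21 13:29:00.123 +10:00 [DBG] Starting up")
```

For the single-line frame, the content matcher for `@m` runs to the end of the frame, the loop then breaks, and the result simply has no `@x` entry.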

There are some comments in ExtractionPatternInterpreterTests suggesting how we might extend this to a couple more cases.

Needs better error messages, and error handling/"report" mode, but does get the end-to-end scenario going :-)

@nblumhardt nblumhardt changed the title [WIP] Extraction patterns Extraction patterns Feb 24, 2018
@nblumhardt

This should be enough to make seqcli ingest usable in a limited manner, so I'm calling it ready to merge through feat/plain to dev.

@KodrAus KodrAus left a comment

This looks good to me! The loop over pattern elements is pretty subtle. Maybe we should document how it works once the design's nailed down more?

@nblumhardt

Thanks @KodrAus!

I've got another changeset in the works that should bring the whole PatternElement thing up to scratch - it's all a bit odd right now because it's mostly been scaffolding while I've wrapped my head around how recursive patterns should work :-)

I think I've got the essentials figured out, so that the one "recursive pattern" feature will give us both groups (all-or-nothing) and captures that include literal text:

Loaded {SignalId:signal-{:ident}}
// Matches:
//     Loaded signal-12345abc
// Extracts:
//    SignalId = "signal-12345abc"

{:{Year:nat}-{Month:nat}}
// Matches
//    2012-02
// Does not match:
//    2012
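
For comparison, the two proposed behaviours map loosely onto ordinary regex grouping. This is a sketch in Python rather than the planned pattern syntax: `nat` is assumed to mean one or more digits, and the inner `ident` is approximated as `[A-Za-z0-9_]+` so that the example value `12345abc` matches. Both are assumptions about matchers whose exact rules are not spelled out here.

```python
import re

# Capture that includes literal text: {SignalId:signal-{:ident}}
# Assumption: ident is approximated as [A-Za-z0-9_]+ for this example.
SIGNAL = re.compile(r"Loaded (?P<SignalId>signal-[A-Za-z0-9_]+)")

# All-or-nothing group: {:{Year:nat}-{Month:nat}}
# Assumption: nat is approximated as \d+.
YEAR_MONTH = re.compile(r"(?:(?P<Year>\d+)-(?P<Month>\d+))")

def extract(pattern, text):
    """Return the named captures if the whole text matches, else None."""
    m = pattern.fullmatch(text)
    return m.groupdict() if m else None

signal = extract(SIGNAL, "Loaded signal-12345abc")
year_month = extract(YEAR_MONTH, "2012-02")
partial = extract(YEAR_MONTH, "2012")
```

The outer named group keeps the literal `signal-` prefix in the captured value, and the non-capturing group either matches as a whole or not at all, so `partial` is `None` rather than a result with only `Year` set.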

With some small additions we should also be able to squeeze optionals and alternation out of it; still throwing some ideas around.

Cheers!

@nblumhardt nblumhardt closed this Feb 25, 2018
@nblumhardt nblumhardt reopened this Feb 25, 2018
@nblumhardt nblumhardt merged commit 0144a1e into datalust:feat/plain Feb 25, 2018