feat: add basic gateway tracing#8595
Conversation
66b4fe3 to
5d08759
Compare
5cab996 to
9b9691c
Compare
791c811 to
2a4398d
Compare
| return nil, err | ||
| } | ||
|
|
||
| span.SetAttributes( |
There was a problem hiding this comment.
Allocations can be avoided here (and elsewhere when attributes are used) by only setting them when the span is recording. e.g.
if span.IsRecording() {
// set stuffs
}There was a problem hiding this comment.
Admittedly, this is tedious for some of the one-liners that start a span and set attributes together; I wonder if there is an alternative...
There was a problem hiding this comment.
I also wonder if it's worth adding an Attributes method to some of these types like settings, for example. It may be more appropriate for a follow-on.
There was a problem hiding this comment.
My thought process here was that we already know we have way bigger fish to fry in terms of memory allocations, so I wasn't too concerned with this. Hopefully one day we get to the point where these matter...in the meantime, we can enjoy the simpler code
iand
left a comment
There was a problem hiding this comment.
Looks good and I've been using this branch to study gateway performance characteristics.
|
|
||
| // Span starts a new span using the standard IPFS tracing conventions. | ||
| func Span(ctx context.Context, componentName string, spanName string, opts ...traceapi.SpanStartOption) (context.Context, traceapi.Span) { | ||
| return otel.Tracer("go-ipfs").Start(ctx, fmt.Sprintf("%s.%s", componentName, spanName), opts...) |
There was a problem hiding this comment.
I think the argument to otel.Tracer should be an empty string to use the default instrumentating library name as per the documentaion. Note the distinction made between Instrumented Library and Instrumenting Library
There was a problem hiding this comment.
My understanding is that this tracer name ("instrumentation name") is similar to our logging "subsystems", in that the application can use these to know which component a span comes from, and use that to decide things like whether to record the span at all (see e.g. https://opentelemetry.io/docs/instrumentation/java/automatic/agent-config/#suppressing-specific-agent-instrumentation and https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/span_processor_filter_example_test.go#L53 ). E.g. if an application uses this repo as a library, it can decide to drop all spans from go-ipfs using this instrumentation name. If we use empty string here, the default go.opentelemetry.io/otel/sdk/tracer will be used. See also https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#get-a-tracer .
There was a problem hiding this comment.
I think it is intended that the default tracer is used since we aren't changing the format of the traces. We already record the name of the system and the software used to record the trace in the Resource section. See, for example, this section of one trace. The otel documentation is not super clear but I feel that the presence of Version and SchemaURL fields in the InstrumentationLibrary object imply that this is related to the type of trace that has been produced.
"Resource": [
{
"Key": "service.name",
"Value": {
"Type": "STRING",
"Value": "go-ipfs"
}
},
{
"Key": "service.version",
"Value": {
"Type": "STRING",
"Value": "0.13.0-dev"
}
},
{
"Key": "telemetry.sdk.language",
"Value": {
"Type": "STRING",
"Value": "go"
}
},
{
"Key": "telemetry.sdk.name",
"Value": {
"Type": "STRING",
"Value": "opentelemetry"
}
},
{
"Key": "telemetry.sdk.version",
"Value": {
"Type": "STRING",
"Value": "1.6.1"
}
}
],
"InstrumentationLibrary": {
"Name": "go-ipfs",
"Version": "",
"SchemaURL": ""
}There was a problem hiding this comment.
I think the service is more at the "application" scope, so that we can tell from which system the spans of a distributed trace originate.
I'm also new to these concepts, so I've generally been mimicking the provided examples, which seem to use the same distinction: https://github.com/open-telemetry/opentelemetry-go/tree/main/example
There was a problem hiding this comment.
Fair enough. I don't feel strongly about it. It's possible that the SchemaURL/Version fields are intended to describe the schema of the tags that the library is writing.
There was a problem hiding this comment.
Yeah I'm not sure what those are for. This is all clearly communicated as experimental so that we have some wiggle room to adjust these over time as we learn more.
| } | ||
|
|
||
| // Span starts a new span using the standard IPFS tracing conventions. | ||
| func Span(ctx context.Context, componentName string, spanName string, opts ...traceapi.SpanStartOption) (context.Context, traceapi.Span) { |
There was a problem hiding this comment.
This is a candidate for moving to a new library package. As we add tracing to more base libraries we will repeat this multiple times and we can add more helpers to keep things consistent and to reduce the number of imports needed to start a span.
I started one here https://github.com/iand/go-ipfs-tracing as part of my experiments with this branch and instrumenting other related libraries.
There was a problem hiding this comment.
Yeah that makes sense, I was waiting to create something like this until I used OpenTelemetry in a few repos to understand our usage patterns better. I'm also hesitant to make a tracing "kitchen sink", since it seems like a place that would accumulate dependencies over time and significantly contribute to our dependency hell.
| // Span names follow a convention of <Component>.<Span>, some examples: | ||
| // | ||
| // - component=Gateway + span=Request -> Gateway.Request | ||
| // - component=CoreAPI.PinAPI + span=Verify.CheckPin -> CoreAPI.PinAPI.Verify.CheckPin |
There was a problem hiding this comment.
Can we tighten this convention up? I think we need to define what a component is and what convention we have for naming the span - is it the type name and method, or should it include the package somehow? We'll need this as soon as we start adding tracing to other ipfs libraries.
One convention could be that the component is the repo name go-ipfs/go-blockservice/go-namesys and the Span name is the function name, prefixed by the typename if it's a method. Additional spans within a function would use a name derived from this base. The convention would lead to names like:
go-ipfs.PinAPI.Addgo-blockservice.Session.GetBlockgo-namesys.ResolveIPNS
We could include package name but it would make the names much longer for no great benefit, go-ipfs.PinAPI.Add could be ``go-ipfs.core.coreapi.PinAPI.Add` (would also be extra maintenance burden to keep names in sync when refactoring)
This convention is not the prettiest but it would work consistently across all our libraries and gives signposts for developers looking for where a particular span was created.
There was a problem hiding this comment.
This would need to be modified if we keep the library name in the Instrumentation Library field (although I don't know how/if that surfaces in the Jaeger UI for example)
There was a problem hiding this comment.
I'd like to protect our ability to refactor repos, modules, and packages without breaking tracing consumers, so I'd like to avoid coupling it to those as much as possible. Much like we do with logging subsystems (I toyed with the idea of coupling these with logging subsystems but gave up.) The downside though is that it makes it harder to find the code origin of traces, but I still don't think that's very hard with a quick GitHub search. Or we could add some middleware that adds an attribute to all spans with the source file + line info of where the span originated?
Adds more visibility into how long generic lookup takes vs producing specific response type.
|
The only CI test that fails is go-ipfs-http-client, which is unrelated to changes from this PR and being fixed in #8838 Resolved conflicts, merging to avoid doing this again. |
There was a problem hiding this comment.
LGTM, we can adjust / add more spans in future PRs.
- Added Jaeger UI instruction to
docs/environment-variables.md, should be really useful for newcomers / gateway admins. - Drafted a blurb for go-ipfs 0.13 Release Notes in #8595 (comment)
@lidel's blurb for Release Notes (wip)
Added spans
Rateway.Requesthttp.targethas "original" requestResolvedPathhas immutable IPFS path behind the original request (matters when/ipns/)Gateway.ServeRawBlockGateway.ServeCarGateway.ServeUnixFsGateway.ServeFileGateway.ServeDirectoryHow to use Jaeger UI
One can use the
jaegertracing/all-in-oneDocker image to run a full Jaegerstack and configure go-ipfs to publish traces to it (here, in an ephemeral
container):
Then, in other terminal, start go-ipfs with Jaeger tracing enabled:
Finally, the Jaeger UI is available at http://localhost:16686
Demo