
feat(opamp): compute and store effective_config_hash for collector pipelines#6872

Draft
juliaElastic wants to merge 17 commits into elastic:main from juliaElastic:hash-effective-config

Conversation

Contributor

@juliaElastic juliaElastic commented Apr 21, 2026

Summary

  • Computes a SHA-256 hash of the OTel collector's effective config and stores it as effective_config_hash in the .fleet-agents index.
  • Keys are canonicalized via yaml.v3 Marshal (which sorts map keys alphabetically) before hashing, so key order never affects the output.
  • Hash is computed from the raw YAML body (before the sensitive-value redaction pass that produces effective_config).

Closes: https://github.com/elastic/ingest-dev/issues/7064
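The canonicalize-then-hash step can be sketched in Go. The PR canonicalizes with yaml.v3; this sketch approximates that with encoding/json, whose Marshal also writes map keys in sorted order, and the function name is illustrative rather than the PR's actual API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// hashConfig canonicalizes a parsed config map and returns the SHA-256 hex
// digest. json.Marshal emits map keys in sorted order, so the key order of
// the source document never affects the resulting hash.
func hashConfig(cfg map[string]any) (string, error) {
	canonical, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a := map[string]any{"receivers": "otlp", "exporters": "debug"}
	b := map[string]any{"exporters": "debug", "receivers": "otlp"}
	ha, _ := hashConfig(a)
	hb, _ := hashConfig(b)
	fmt.Println(ha == hb) // same content, different key order → true
}
```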

To verify:

GET .fleet-agents/_search
{
    "_source": ["effective_config_hash", "effective_config"]
}

    "hits": [
      {
        "_index": ".fleet-agents-7",
        "_id": "95d87b10-299e-45ce-a5dd-72e500451e48",
        "_score": 1,
        "_source": {
          "effective_config": {
           ...
          },
          "effective_config_hash": "4e13e7566f2c31e563e235dd8e1a7ae8998a6677fae6124806ca842904f6aa0e"
        }
      },

🤖 Generated with Claude Code

…pelines

Computes a SHA-256 hash of the pipeline topology fields (receivers,
processors, exporters, connectors, service.pipelines, service.extensions)
from the OpAMP effective config and stores it as effective_config_hash in
the .fleet-agents index. Non-topology fields (extensions config,
service.telemetry, etc.) are excluded so the hash reflects only what the
pipeline does, not how it is observed. Keys are canonicalized via yaml.v3
Marshal (which sorts alphabetically) before hashing to ensure identical
topologies always produce the same hash regardless of key order.

Closes elastic/ingest-dev#7064

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
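A minimal sketch of the topology extraction this commit message describes. The field names come from the message itself; the helper names and the json-based canonicalization are assumptions of this sketch (the PR uses yaml.v3):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// extractTopology keeps only the fields that describe the pipeline graph.
// Non-topology fields (extension settings, service.telemetry, ...) are
// dropped so they cannot influence the hash.
func extractTopology(cfg map[string]any) map[string]any {
	topology := make(map[string]any)
	for _, k := range []string{"receivers", "processors", "exporters", "connectors"} {
		if v, ok := cfg[k]; ok {
			topology[k] = v
		}
	}
	if svc, ok := cfg["service"].(map[string]any); ok {
		svcTop := make(map[string]any)
		for _, k := range []string{"pipelines", "extensions"} {
			if v, ok := svc[k]; ok {
				svcTop[k] = v
			}
		}
		if len(svcTop) > 0 {
			topology["service"] = svcTop
		}
	}
	return topology
}

// topologyHash canonicalizes the extracted topology and hashes it.
func topologyHash(cfg map[string]any) string {
	canonical, _ := json.Marshal(extractTopology(cfg))
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:])
}

func main() {
	cfg := map[string]any{
		"receivers": map[string]any{"otlp": map[string]any{}},
		"exporters": map[string]any{"debug": map[string]any{}},
		"service": map[string]any{
			"pipelines": map[string]any{"logs": map[string]any{}},
			"telemetry": map[string]any{"level": "debug"}, // dropped before hashing
		},
	}
	fmt.Println(topologyHash(cfg))
}
```

Two configs that differ only in service.telemetry (or any other non-topology field) therefore hash identically.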
@juliaElastic juliaElastic added the enhancement (New feature or request), backport-skip (Skip notification from the automated backport with mergify), and skip-changelog labels Apr 21, 2026
…h_test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

mergify Bot commented Apr 21, 2026

This pull request does not have a backport label. Could you fix it @juliaElastic? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch, where /d is a digit.
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@juliaElastic juliaElastic marked this pull request as ready for review April 22, 2026 07:17
@juliaElastic juliaElastic requested a review from a team as a code owner April 22, 2026 07:17
juliaElastic and others added 2 commits April 22, 2026 10:51
…onfig hash

Adds an adjective-noun label (e.g. "swift-hawk") stored alongside
effective_config_hash in .fleet-agents. The label is derived from the
first two bytes of the SHA-256 hash using two fixed 256-entry wordlists
embedded in source, giving 65,536 possible combinations. Because the
wordlists are frozen in the codebase the mapping is stable across
deployments and dependency updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor Author

CI fails due to an environmental issue, not related to changes in this PR:

The job, Package x86_64, has been canceled as it failed to get an agent after 5 tries.

The job, Package x86_64 FIPS, has been canceled as it failed to get an agent after 5 tries.

Member

ebeahan commented Apr 23, 2026

CI fails due to an environmental issue, not related to changes in this PR

#6881 should address the CI issues.

Comment thread internal/pkg/api/configHash.go Outdated
if effectiveConfig.ConfigMap == nil || effectiveConfig.ConfigMap.ConfigMap[""] == nil {
return "", nil
}
body := effectiveConfig.ConfigMap.ConfigMap[""].Body
Contributor

In theory we should hash everything in the configmap, not just the default config

Contributor Author

Added support for multiple config files: 9ac32c8

Though I'm not sure if two separate files should hash differently from the same content in one file.

Comment thread internal/pkg/api/configHash.go Outdated
}

topology := make(map[string]any)
for _, k := range []string{"receivers", "processors", "exporters", "connectors"} {
Contributor

Why do we need to only use these keys when computing the sha?
How do we ensure that a change to another key is emitted to a collector?

Contributor Author

Why do we need to only use these keys when computing the sha?

The issue scope is deliberate: the hash is meant to answer "are two collectors running the same pipeline wiring?", not "are all their configs byte-for-byte identical." The topology keys describe what data flows where. Non-topology fields like extensions.* configs or service.telemetry describe how the pipeline operates (scrape intervals, log levels, endpoints) but don't change the pipeline graph.

The use case is: two collectors that differ only in their health_check endpoint or telemetry verbosity should be considered "the same topology" and produce the same hash, so you can cluster/query them together.

How do we ensure that a change to another key is emitted to a collector?

Changes to other keys are not surfaced via the hash, but they are captured in full in effective_config (the redacted JSON blob already stored in .fleet-agents). If a non-topology field changes, effective_config changes, but effective_config_hash stays the same.

Contributor

Can you add that to the doc string or in schema.json? It's important to note how this hash should differ from the RemoteConfig hash

Member

Yes, this needs to be differentiated from AgentRemoteConfig.config_hash in the opamp spec.

Reading https://github.com/elastic/ingest-dev/issues/7064 it's not clear why we need this new concept of a topology hash, vs just the hash of the configuration.

The problem this topology hash seems to solve is the case where you have a lot of collector configurations with superfluous components that do not modify the function of the pipeline. This feels like an optimization that can be done later, once we confirm we have this problem.

Is there a reason we can't start with just the config_hash from the spec that includes everything, and only introduce an optimized topology hash as an optimization on that once that's proven to be necessary?

Contributor Author

I think the reason for the topology hash is to group collectors on the UI that run the same pipelines, @andrewvc might have more context.

Here is Claude's take on it:

Why not just hash the full config?

The OTel collector expands environment variables before it reports EffectiveConfig back to
fleet-server. So a collector config with:

service:
  telemetry:
    resource:
      service.instance.id: "${env:HOSTNAME}"

arrives at fleet-server as:

service:
  telemetry:
    resource:
      service.instance.id: "Julias-MacBook-Pro.local"

Every collector in the same group — running identical pipelines — produces a unique full-config
hash just because of its hostname. Grouping collectors by hash becomes impossible.

That is the concrete reason the topology normalization exists: it strips per-instance runtime
values that vary across collectors in the same group. It is not an optimization for superfluous
components.

When is a full hash sufficient?

If the only requirement is per-collector change detection ("has this collector's config
changed since last check-in?"), a full hash is simpler and still useful. The per-instance
expansion issue does not matter when comparing a single collector against its own history.

If the requirement is topology grouping ("show N collectors running pipeline X"), the
topology hash is necessary.

Contributor Author

Removed the topology hashing and using the full effective config to calculate the hash.


@cmacknz with the new opamp metadata around pipeline groups etc. in #6769, I think the hashes will always be unique in the UI, as well as in the case @juliaElastic mentioned. Correct me if I'm wrong, but from what I can tell OpAMP today just uses the bytes of the file, right?

If that's the case the flow around "Tell me if the collectors in this metadata group are all running the same config" no longer can be achieved in the UI or API. So, tracking a config rollout would not be possible.

Member

We can still group collectors by the non_identifying_attributes we tell users to populate, can't we?

The problem we have with environment variable substitution in those attributes can happen in any part of the collector config, not just service telemetry.

I think this grouping feature is good, I just struggle to see how we can make it work 100% reliably with hashing in Fleet Server.

We are storing these configurations in a search engine that can compute document similarity in queries based on specific fields, so I don't know why we need to try to do something based on exact equivalence of fields in a less flexible part of the system design.


It's a good point @cmacknz, I agree the hashing mechanism is not perfect. I hadn't thought of using relevance to match similar configs; it's a neat idea! I assume we would implement that a bit later?

Member
The reason will be displayed to describe this comment to others. Learn more.

Julia did a good analysis in https://github.com/elastic/ingest-dev/issues/7064#issuecomment-4358616898 of why using relevance isn't completely straightforward either.

The conclusion on that thread is asking our PMs about how they view the need for this so we can figure out what the right trade off is. This seems like a completely sensible feature to have, but one where a reliable implementation is much harder than we would like it to be.

Comment thread internal/pkg/api/configLabel_test.go Outdated
Comment thread internal/pkg/api/configLabel_test.go Outdated
Comment thread internal/pkg/api/handleOpAMP.go Outdated
juliaElastic and others added 3 commits April 24, 2026 09:01
Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
…entry

HashEffectiveConfig now iterates all named config files in the OpAMP
ConfigMap in sorted key order, feeding each file's topology fields and
its key name into a single SHA-256. Previously only the default ("") file
was considered. Topology extraction is refactored into extractTopologyFields
for per-file reuse.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
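The sorted multi-file iteration described in this commit message can be sketched as follows. This simplified version hashes each file's full parsed content (the real HashEffectiveConfig extracts topology fields per file), and encoding/json stands in for the yaml.v3 canonicalization:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
)

// hashConfigFiles feeds every named config file into a single SHA-256 in
// sorted key order. The file name is mixed into the hash, so the same
// content under a different file name hashes differently.
func hashConfigFiles(files map[string]map[string]any) (string, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order regardless of map iteration

	h := sha256.New()
	for _, name := range names {
		h.Write([]byte(name))
		canonical, err := json.Marshal(files[name])
		if err != nil {
			return "", err
		}
		h.Write(canonical)
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	files := map[string]map[string]any{
		"":           {"receivers": "otlp"}, // the default, unnamed config
		"extra.yaml": {"exporters": "debug"},
	}
	sum, _ := hashConfigFiles(files)
	fmt.Println(sum)
}
```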
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) label Apr 24, 2026
Comment thread internal/pkg/api/configHash.go Outdated
Comment thread internal/pkg/api/configHash.go Outdated
Comment thread internal/pkg/api/configHash.go Outdated
Comment thread model/schema.json
},
"effective_config_label": {
"description": "Human-readable adjective-noun label derived from the first two bytes of effective_config_hash",
"type": "string"
Contributor

This feels like a presentation concern. Should it be generated from the effective_config_hash in the UI code?

Member

Agree this feels out of place in fleet-server and definitely feels like a presentation concern. This could be calculated in the UI.

Contributor Author

@juliaElastic juliaElastic Apr 28, 2026

If we move it to the UI, it either has to be calculated dynamically or the .fleet-agents docs has to be updated from the UI to store it.

If we calculate it dynamically, it should be a runtime field to support filtering on it.

Removed the effective_config_label changes from here and added to kibana: elastic/kibana#265005

Member

Is there a large performance concern with this?

I would like to keep fleet-server in the architectural role of an action router; this is getting away from that much more than other things we've done. I can also imagine other requirements that would require the UI to write this name.

I could imagine that users might ask for the ability to customize the names of their configurations in the future, like they do for agent policies. The friendly label is just an initial placeholder before we can do that.

Contributor Author

If using a runtime field, it's a small overhead when querying agents, I would start with that.

If kibana wrote the label to .fleet-agents, there is a chance of write conflicts with fleet-server, and we would need to poll from kibana to notice changes in the effective config hash to update the label.
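For illustration, a runtime field computed at query time might look like the sketch below, in the same console style as the verification query above. The effective_config_hash field name matches this PR, but the script, the %4 modulo, and the four-entry word lists are placeholders (real lists would have 256 entries indexed directly by the first two hash bytes):

```
GET .fleet-agents/_search
{
  "runtime_mappings": {
    "effective_config_label": {
      "type": "keyword",
      "script": {
        "source": "if (doc['effective_config_hash'].size() == 0) return; String h = doc['effective_config_hash'].value; emit(params.adjectives[Integer.parseInt(h.substring(0, 2), 16) % 4] + '-' + params.nouns[Integer.parseInt(h.substring(2, 4), 16) % 4]);",
        "params": {
          "adjectives": ["swift", "calm", "brave", "quiet"],
          "nouns": ["hawk", "otter", "pine", "wolf"]
        }
      }
    }
  },
  "query": { "term": { "effective_config_label": "swift-hawk" } }
}
```

Because the field is defined at query time, it supports filtering without storing the label or risking write conflicts between kibana and fleet-server.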

Contributor

we would need to poll from kibana to notice changes in the effective config hash to update the label.

Once a label is set, would we want to keep changing it? As I'm understanding it — and let me know if I'm off here — the hash and the label are answering two different questions for a pipeline:

  • hash: has the topology of this pipeline changed?
  • label: is this conceptually the same pipeline as before?

Contributor Author

I mean the label has to stay consistent with the effective config hash, and the effective config can change on check-in if the collector config changes.

We are moving away from the topology hash, see #6872 (comment)

Member

The way I view this is that we are trying to build a concept similar to that of an agent policy out of the per agent effective configurations.

That is a configuration shared by multiple agents doing the same job in the same role with a human readable and assignable name.

The problem is the per agent configurations can all vary per agent because parts of the configuration are determined at runtime (just like Elastic Agent) so doing this is not totally trivial.

Co-authored-by: Michel Laterman <82832767+michel-laterman@users.noreply.github.com>
Comment thread internal/pkg/api/configHash.go Outdated
juliaElastic and others added 3 commits April 28, 2026 11:40
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread internal/pkg/api/configHash.go Outdated
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@juliaElastic juliaElastic requested a review from cmacknz April 29, 2026 07:36
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread internal/pkg/api/configHash.go
Comment thread internal/pkg/api/handleOpAMP.go Outdated
initialOpts = append(initialOpts, checkin.WithEffectiveConfigHash(configHash))
}

if defaultParsed, ok := parsedFiles[""]; ok {
Member

We only include the effective configuration if it uses a single configuration file? If the collector defines multiple configurations this does nothing?

Contributor Author

changed to handle multiple files

Comment thread internal/pkg/api/handleOpAMP.go Outdated

if defaultParsed, ok := parsedFiles[""]; ok {
redactSensitive(defaultParsed)
effectiveConfigBytes, err := json.Marshal(defaultParsed)
Member

HashEffectiveConfig already marshals each individual configuration to JSON and we throw those bytes away; seems like we should reuse them?

Contributor Author

addressed


if aToS.EffectiveConfig != nil {
effectiveConfigBytes, err := ParseEffectiveConfig(aToS.EffectiveConfig)
parsedFiles, err := parseConfigFiles(aToS.EffectiveConfig)
Member

@cmacknz cmacknz Apr 29, 2026

It would help if you had unit tests showing that this handles all the AgentConfigMap arrangements we can have: no files, one file, multiple files, non-YAML files, etc.

https://github.com/open-telemetry/opamp-spec/blob/2da595f59a0016abe67b4d44aa52afa3549f8742/proto/opamp.proto#L1032-L1039

It's not completely trivial to do this hash so if we don't have a concrete use for it in the UI we could always defer it. I think we have already concluded it's not a great way to group collector configurations together.

Contributor Author

added unit tests

Contributor Author

I moved the PR to draft

- Skip non-text/yaml config files per OpAMP spec (content_type check)
- Store all config files, not just the single unnamed one; multi-file
  and named-file configs are stored as {"name": content} keyed maps
- Return canonical JSON bytes from HashEffectiveConfig so the storage
  path reuses them without a second marshal pass; redaction now happens
  before hashing so the returned bytes are already redacted
- Add TestUpdateAgentEffectiveConfigMap covering nil config, empty map,
  single unnamed file, single named file, multiple files, non-YAML
  files, mixed YAML/non-YAML, empty body, and sensitive field redaction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
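The first two bullets above can be sketched like this. The struct, the function names, and the exact set of accepted content types are assumptions of this sketch, not the PR's actual code (the OpAMP AgentConfigFile message carries a raw body plus a content_type string):

```go
package main

import "fmt"

// configFile is a minimal stand-in for an OpAMP config-map entry.
type configFile struct {
	Body        []byte
	ContentType string
}

// isTextOrYAML reports whether a config file should be parsed and stored.
// An empty content_type is accepted as the common default; the accepted
// MIME strings here are illustrative.
func isTextOrYAML(ct string) bool {
	switch ct {
	case "", "text/yaml", "application/x-yaml", "text/plain":
		return true
	}
	return false
}

// storableFiles builds the {"name": content} keyed map described above,
// skipping empty bodies and files that are neither text nor YAML.
func storableFiles(files map[string]configFile) map[string]string {
	out := make(map[string]string)
	for name, f := range files {
		if isTextOrYAML(f.ContentType) && len(f.Body) > 0 {
			out[name] = string(f.Body)
		}
	}
	return out
}

func main() {
	files := map[string]configFile{
		"":         {Body: []byte("receivers: {}"), ContentType: "text/yaml"},
		"cert.der": {Body: []byte{0x30, 0x82}, ContentType: "application/octet-stream"},
	}
	fmt.Println(storableFiles(files)) // only the YAML file survives
}
```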
@juliaElastic juliaElastic marked this pull request as draft April 30, 2026 07:55
7 participants