Nondeterministic behaviour when scanning in parallel #444

Description

@rafl

Describe the bug
The sequential analyzer always yields the same results (if nothing changed in what we're scanning). The parallel analyzer does not.

To Reproduce
Steps to reproduce the behavior:

  • use gdu as a library like this:
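import (
        "maps"
        "time"

        "github.com/dundee/gdu/v5/pkg/analyze"
        "github.com/dundee/gdu/v5/pkg/device"
        "github.com/dundee/gdu/v5/pkg/fs"
)

type Opts struct {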
        minSize int64
        minAge  time.Duration
        now     time.Time
}

type Result map[string]Meta

type Meta struct {
        Mtime time.Time
        Size  int64
}

func Dir(p string, o Opts) (Result, error) {
        s := analyze.CreateSeqAnalyzer()
        ms, err := device.Getter.GetMounts()
        if err != nil {
                return nil, err
        }
        r := s.AnalyzeDir(p, ignore(device.GetNestedMountpointsPaths(p, ms)), true)
        <-s.GetDone()
        r.UpdateStats(fs.HardLinkedItems{})
        return wannaDel(o.now.Add(-o.minAge), r, o), nil
}

func wannaDel(t time.Time, r fs.Item, o Opts) Result {
        type rec func(rec, fs.Item) Result
        walk := func(walk rec, f fs.Item) Result {
                p, mt, s := f.GetPath(), f.GetMtime(), f.GetSize()
                if mt.Before(t) && s >= o.minSize && p != r.GetPath() {
                        return Result{p: Meta{mt, s}}
                }
                ret := Result{}
                for _, c := range f.GetFiles() {
                        maps.Copy(ret, walk(walk, c))
                }
                return ret
        }

        return walk(walk, r)
}

func ignore(ps []string) func(string, string) bool {
        ign := map[string]struct{}{}
        for _, p := range ps {
                ign[p] = struct{}{}
        }
        return func(_, p string) bool { _, ok := ign[p]; return ok }
}
  • run Dir(p, o) with the same parameters many times on a reasonably large read-only file system
    Observe that the results of every run are identical.
  • replace CreateSeqAnalyzer with CreateAnalyzer
  • re-run the previous test many times with the updated code
    Observe that the results of different runs sometimes differ. Many runs exactly match the results of using the sequential analyzer, but many others do not (usually only with minor differences).
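
To make the differences between runs easier to pin down, a small comparison harness along the following lines could be used. This is only a sketch and not part of gdu: `diff` and `compareRuns` are hypothetical helpers, `Result` and `Meta` mirror the types from the snippet above, and `scan` stands in for a closure around `Dir(p, o)`. The `main` function uses a fake scanner purely to show the shape of the output.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Meta and Result mirror the types used in the reproduction code.
type Meta struct {
	Mtime time.Time
	Size  int64
}

type Result map[string]Meta

// diff returns the sorted paths that are missing from one side or whose
// metadata differs between a and b.
func diff(a, b Result) []string {
	var out []string
	for p, ma := range a {
		mb, ok := b[p]
		if !ok || !ma.Mtime.Equal(mb.Mtime) || ma.Size != mb.Size {
			out = append(out, p)
		}
	}
	for p := range b {
		if _, ok := a[p]; !ok {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

// compareRuns calls scan n times and records, per run index, the paths
// that differ from the first run. Runs that match are not recorded.
func compareRuns(n int, scan func() (Result, error)) (map[int][]string, error) {
	mismatches := map[int][]string{}
	var base Result
	for i := 0; i < n; i++ {
		r, err := scan()
		if err != nil {
			return nil, fmt.Errorf("run %d: %w", i, err)
		}
		if i == 0 {
			base = r
			continue
		}
		if d := diff(base, r); len(d) > 0 {
			mismatches[i] = d
		}
	}
	return mismatches, nil
}

func main() {
	// Fake scanner (assumption for illustration): drops "/b" on the third call.
	calls := 0
	scan := func() (Result, error) {
		calls++
		r := Result{"/a": {Size: 10}, "/b": {Size: 20}}
		if calls == 3 {
			delete(r, "/b")
		}
		return r, nil
	}

	m, err := compareRuns(4, scan)
	if err != nil {
		panic(err)
	}
	for run, paths := range m {
		fmt.Printf("run %d differs from run 0 at: %v\n", run, paths)
	}
}
```

In the real reproduction, `scan` would be something like `func() (Result, error) { return Dir(p, o) }`, run once with each analyzer, so the reported paths show exactly which entries the parallel analyzer gets wrong.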

Expected behavior
Sequential and parallel analyzers should not differ in output.

System (please complete the following information):

  • OS:
    • Ubuntu 24.04 (WSL2)
    • Ubuntu 24.04 (bare metal)
  • Version: Latest git
