
Commit 59bcdb5

aalteres authored and johnharr-intel committed
drm/i915/guc: Don't update engine busyness stats too frequently
Using two different types of workloads, it was observed that guc_update_engine_gt_clks was being called too frequently and/or causing a CPU-to-lmem bandwidth hit over PCIe. Details on the workloads and numbers are in the notes below.

Background: At the moment, guc_update_engine_gt_clks can be invoked in one of 3 ways. #1 and #2 are infrequent under normal operating conditions:
1. When a predefined "ping_delay" timer expires, so that GuC busyness can sample the GTPM clock counter and avoid missing a wrap-around of the 32-bit HW counter. (The ping_delay is 1/8th of the time the counter takes to go from 0x0 to 0xffffffff at the GT frequency. This comes to about once every 28 seconds at a GT frequency of 19.2 MHz.)
2. In preparation for a gt reset.
3. In response to __gt_park events (when gt power management puts the gt into a lower power state because no work is being done).

Root-cause: For both workloads described further below, it was observed that when user space calls IOCTLs that unpark the gt momentarily, and repeats such calls many times in quick succession, every one of the resulting park events triggers a call to guc_update_engine_gt_clks. However, the primary purpose of guc_update_engine_gt_clks is to ensure we don't miss the wrap-around while the counter is ticking. Thus, the solution is to skip that check when gt_park calls this function earlier than necessary.

Solution: Snapshot jiffies whenever we actually update the busyness stats. Then read the current jiffies every time intel_guc_busyness_park is called and bail out if we are being called too soon. Use half of the ping_delay as a safe threshold.

NOTE1: Workload1: IGT's gem_create was modified to create a file handle and allocate memory with sizes ranging from a minimum of 4K to the maximum supported (in power-of-two steps). It maps, modifies and reads back the memory. Allocation and modification are repeated until the total memory allocation reaches the maximum. Then the file handle is closed. With this workload, guc_update_engine_gt_clks was called over 188 thousand times in the span of 15 seconds while this test ran three times. With this patch, the number of calls dropped to 14.

NOTE2: Workload2: 30 transcode sessions are created in quick succession. While these sessions are created, the pcm-iio tool was used to measure I/O read bandwidth consumption, sampled at 100 millisecond intervals over the course of 20 seconds. Without this patch, the bandwidth consumed averaged 311 KBps per sample over the 20 seconds. With this patch, it dropped to about 175 KBps, a savings of about 43%.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220623023157.211650-2-alan.previn.teres.alexis@intel.com
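The timing figures in the message can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the GTPM counter is 32 bits wide, ping_delay is 1/8th of its wrap period (as the message says), and the park-time skip threshold is half of ping_delay (as the Solution paragraph says). The helper names are hypothetical, and they work in seconds for readability, whereas the driver keeps ping_delay in jiffies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not i915 functions) reproducing the commit
 * message's arithmetic. Assumes a 32-bit counter ticking at gt_clk_hz.
 */

/* Time for a 32-bit counter to run from 0x0 through 0xffffffff and wrap. */
static double counter_wrap_seconds(double gt_clk_hz)
{
	assert(gt_clk_hz > 0.0);
	return ((double)UINT32_MAX + 1.0) / gt_clk_hz; /* 2^32 ticks per cycle */
}

/* Periodic ping that samples the counter 8 times per wrap period. */
static double ping_delay_seconds(double gt_clk_hz)
{
	return counter_wrap_seconds(gt_clk_hz) / 8.0;
}

/* Parks arriving sooner than this after the last sample are skipped. */
static double park_skip_threshold_seconds(double gt_clk_hz)
{
	return ping_delay_seconds(gt_clk_hz) / 2.0;
}
```

For gt_clk_hz = 19.2e6 these return roughly 223.7 s, 28.0 s and 14.0 s respectively, so the skip threshold is 1/16th of the wrap period.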
1 parent bcb9aa4 commit 59bcdb5

2 files changed

Lines changed: 21 additions & 0 deletions


drivers/gpu/drm/i915/gt/uc/intel_guc.h

Lines changed: 8 additions & 0 deletions
@@ -230,6 +230,14 @@ struct intel_guc {
 		 * @shift: Right shift value for the gpm timestamp
 		 */
 		u32 shift;
+
+		/**
+		 * @last_stat_jiffies: jiffies at last actual stats collection time
+		 * We use this timestamp to ensure we don't oversample the
+		 * stats because runtime power management events can trigger
+		 * stats collection at much higher rates than required.
+		 */
+		unsigned long last_stat_jiffies;
 	} timestamp;
 
 #ifdef CONFIG_DRM_I915_SELFTEST
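The new field is an unsigned long because jiffies is one, and intel_guc_busyness_park() compares it with time_after() rather than a plain `>`. The sketch below re-creates that idiom in user space to show why: a naive comparison gives the wrong answer once the counter wraps. demo_time_after() and naive_after() are hypothetical names; the real macro lives in <linux/jiffies.h> and additionally type-checks its arguments.

```c
#include <stdbool.h>

/*
 * User-space copy of the kernel's time_after() idiom: true when a is
 * later than b, even across a wrap, provided the two values are less
 * than LONG_MAX ticks apart. The unsigned-to-long conversion is
 * implementation-defined in ISO C; GCC and Clang wrap modulo 2^N,
 * which is the behaviour the kernel relies on.
 */
static bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Naive comparison: wrong as soon as the counter wraps past ULONG_MAX. */
static bool naive_after(unsigned long a, unsigned long b)
{
	return a > b;
}
```

With a stamp taken 6 ticks before the wrap and "now" 10 ticks later (so now == 4), demo_time_after(now, last) is true while naive_after(now, last) is false.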

drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

Lines changed: 13 additions & 0 deletions
@@ -1314,6 +1314,8 @@ static void __update_guc_busyness_stats(struct intel_guc *guc)
 	unsigned long flags;
 	ktime_t unused;
 
+	guc->timestamp.last_stat_jiffies = jiffies;
+
 	spin_lock_irqsave(&guc->timestamp.lock, flags);
 
 	guc_update_pm_timestamp(guc, &unused);
@@ -1386,6 +1388,17 @@ void intel_guc_busyness_park(struct intel_gt *gt)
 		return;
 
 	cancel_delayed_work(&guc->timestamp.work);
+
+	/*
+	 * Before parking, we should sample engine busyness stats if we need to.
+	 * We can skip it if we are less than half a ping from the last time we
+	 * sampled the busyness stats.
+	 */
+	if (guc->timestamp.last_stat_jiffies &&
+	    !time_after(jiffies, guc->timestamp.last_stat_jiffies +
+			(guc->timestamp.ping_delay / 2)))
+		return;
+
 	__update_guc_busyness_stats(guc);
 }
 
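To see how the two hunks in intel_guc_submission.c interact, here is a minimal user-space model of the park-time throttle. It is a sketch, not driver code: struct demo_guc, demo_update_stats() and demo_park() are hypothetical stand-ins for the timestamp fields, __update_guc_busyness_stats() and intel_guc_busyness_park(), and an explicit `now` argument replaces the kernel's global jiffies. As in the patch, a stored value of 0 doubles as "never sampled".

```c
#include <stdbool.h>

/* Hypothetical mirror of the timestamp fields the patch touches. */
struct demo_guc {
	unsigned long last_stat_jiffies; /* 0 means "never sampled" */
	unsigned long ping_delay;        /* wrap-check period, in jiffies */
	unsigned int stats_updates;      /* counts real samples taken */
};

/* Same idiom as the kernel's time_after(): wrap-safe "a is later than b". */
static bool demo_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Stand-in for __update_guc_busyness_stats(): stamp the sample time. */
static void demo_update_stats(struct demo_guc *guc, unsigned long now)
{
	guc->last_stat_jiffies = now;
	guc->stats_updates++;
}

/* Stand-in for intel_guc_busyness_park(): skip if sampled < ping_delay/2 ago. */
static void demo_park(struct demo_guc *guc, unsigned long now)
{
	if (guc->last_stat_jiffies &&
	    !demo_time_after(now, guc->last_stat_jiffies + guc->ping_delay / 2))
		return;

	demo_update_stats(guc, now);
}
```

As a hypothetical illustration (not the workloads measured above): with ping_delay = 1000 and one park per jiffy for jiffies 1 through 10000, demo_park() takes only 20 real samples, one every 501 jiffies, instead of 10,000.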