Add ExecutionPlan::apply_expressions() #20337

Merged
adriangb merged 21 commits into apache:main from LiaCastaneda:lia/add-expressions-function-physical-plan on Mar 2, 2026

@LiaCastaneda LiaCastaneda commented Feb 13, 2026

Which issue does this PR close?

Needed for datafusion-contrib/datafusion-distributed#180

Rationale for this change

Right now, there is no easy way to know if a given node in the plan holds Dynamic Filters or to traverse all physical expressions in an ExecutionPlan. This PR implements apply_expressions() that visits all PhysicalExprs inside an ExecutionPlan using a callback pattern, including DynamicFilterPhysicalExpr. This is similar to the existing apply_expressions() API for LogicalPlan.

What changes are included in this PR?

  • Added apply_expressions() method to the ExecutionPlan trait with no default implementation, forcing all implementors to explicitly handle their expressions
  • Uses a visitor pattern with FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion> to avoid allocations
  • Implemented apply_expressions() for all ExecutionPlan implementations
  • Also added apply_expressions() to FileSource and DataSource traits (required, no default)
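
As a rough illustration of the API shape listed above, here is a minimal sketch with simplified stand-in types. `TreeNodeRecursion`, `PhysicalExpr`, and `ExecutionPlan` below are reduced local definitions rather than the DataFusion ones, and `Column`, `FilterProject`, `count_expressions`, and `sample_plan` are hypothetical names used only for this example.

```rust
use std::any::Any;
use std::fmt::Debug;
use std::sync::Arc;

// Stand-ins for the DataFusion types, reduced so the example compiles alone.
type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Stop,
}

trait PhysicalExpr: Debug {
    fn as_any(&self) -> &dyn Any;
}

#[derive(Debug)]
struct Column(&'static str);

impl PhysicalExpr for Column {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

trait ExecutionPlan {
    // Required method: there is no default body, so every implementor has to
    // decide explicitly which expressions it exposes.
    fn apply_expressions(
        &self,
        f: &mut dyn FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion>,
    ) -> Result<TreeNodeRecursion>;
}

// Hypothetical node that owns a predicate and a list of projection expressions.
struct FilterProject {
    predicate: Arc<dyn PhysicalExpr>,
    projection: Vec<Arc<dyn PhysicalExpr>>,
}

impl ExecutionPlan for FilterProject {
    fn apply_expressions(
        &self,
        f: &mut dyn FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion>,
    ) -> Result<TreeNodeRecursion> {
        // Expressions are visited by reference; nothing is cloned or collected.
        if f(self.predicate.as_ref())? == TreeNodeRecursion::Stop {
            return Ok(TreeNodeRecursion::Stop);
        }
        for expr in &self.projection {
            if f(expr.as_ref())? == TreeNodeRecursion::Stop {
                return Ok(TreeNodeRecursion::Stop);
            }
        }
        Ok(TreeNodeRecursion::Continue)
    }
}

// Counts the top-level expressions of a single node through the callback.
fn count_expressions(plan: &dyn ExecutionPlan) -> Result<usize> {
    let mut n = 0;
    plan.apply_expressions(&mut |_expr: &dyn PhysicalExpr| -> Result<TreeNodeRecursion> {
        n += 1;
        Ok(TreeNodeRecursion::Continue)
    })?;
    Ok(n)
}

fn sample_plan() -> FilterProject {
    FilterProject {
        predicate: Arc::new(Column("a")),
        projection: vec![Arc::new(Column("b")), Arc::new(Column("c"))],
    }
}
```

In this sketch the caller pays only for what the callback does; the node never builds a `Vec` of its expressions.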

Are these changes tested?

Yes, added a test that traverses the plan and discovers dynamic filters using apply_expressions().
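
The kind of traversal that test performs might look roughly like the sketch below. It is not the test from this PR: `Node`, `Literal`, `DynamicFilter` (standing in for `DynamicFilterPhysicalExpr`), `count_dynamic_filters`, and `sample_plan` are hypothetical, and the trait definitions are simplified local stand-ins.

```rust
use std::any::Any;
use std::sync::Arc;

type Result<T> = std::result::Result<T, String>;

#[derive(Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Stop,
}

trait PhysicalExpr {
    fn as_any(&self) -> &dyn Any;
}

struct Literal;
struct DynamicFilter; // stand-in for DynamicFilterPhysicalExpr

impl PhysicalExpr for Literal {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl PhysicalExpr for DynamicFilter {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical plan node: some top-level expressions plus child nodes.
struct Node {
    exprs: Vec<Arc<dyn PhysicalExpr>>,
    children: Vec<Node>,
}

impl Node {
    fn apply_expressions(
        &self,
        f: &mut dyn FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion>,
    ) -> Result<TreeNodeRecursion> {
        for expr in &self.exprs {
            if f(expr.as_ref())? == TreeNodeRecursion::Stop {
                return Ok(TreeNodeRecursion::Stop);
            }
        }
        Ok(TreeNodeRecursion::Continue)
    }
}

// Walks the whole plan and counts every dynamic filter occurrence.
fn count_dynamic_filters(node: &Node) -> Result<usize> {
    let mut count = 0;
    node.apply_expressions(&mut |expr: &dyn PhysicalExpr| -> Result<TreeNodeRecursion> {
        // Downcast through `as_any` to detect the concrete expression type.
        if expr.as_any().downcast_ref::<DynamicFilter>().is_some() {
            count += 1;
        }
        Ok(TreeNodeRecursion::Continue)
    })?;
    // apply_expressions is non-recursive, so children are visited explicitly.
    for child in &node.children {
        count += count_dynamic_filters(child)?;
    }
    Ok(count)
}

fn sample_plan() -> Node {
    let scan = Node {
        exprs: vec![Arc::new(DynamicFilter)],
        children: vec![],
    };
    Node {
        exprs: vec![Arc::new(Literal), Arc::new(DynamicFilter)],
        children: vec![scan],
    }
}
```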

Are there any user-facing changes?

Yes, the new API ExecutionPlan::apply_expressions(), FileSource::apply_expressions(), and DataSource::apply_expressions().

@github-actions github-actions bot added core Core DataFusion crate datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Feb 13, 2026
@LiaCastaneda LiaCastaneda changed the title Implement expressions() Implement ExecutionPlan::expressions() Feb 13, 2026
Contributor

@adriangb adriangb left a comment

This makes sense to me. It mirrors the APIs for Logical expressions, is clean and a relatively small change.

But since this is an API change let's leave this open for a couple of days and get at least 1 more approval from a committer before moving forward with it.

// Check expressions from this node
let exprs = plan.expressions();
for expr in exprs.iter() {
    if let Some(_df) = expr.as_any().downcast_ref::<DynamicFilterPhysicalExpr>() {
Contributor

Should this use expr.apply() to visit nested expressions? Should it deduplicate Arc'ed copies?

Contributor Author

Should this use expr.apply() to visit nested expressions?

iiuc the LogicalPlan counterpart returns just the top level expressions.

Should it deduplicate Arc'ed copies?

yeah deduping is a good idea

Contributor

I was referring to this helper function, not the general API. The general API should only expose top level expressions and do no deduplication.

Contributor Author

About deduping: the objective of this test is to show how many times the dynamic filter appears in the plan and whether each node can count how many dynamic filters it contains. If we dedup, we would count it only once.
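
To make the difference between counting occurrences and deduplicating concrete, here is a small sketch. It assumes the same `Arc` is shared by two nodes; `DynamicFilter`, `occurrences_and_distinct`, and `sample_nodes` are hypothetical names, and each node is reduced to a plain list of expressions.

```rust
use std::any::Any;
use std::collections::HashSet;
use std::sync::Arc;

trait PhysicalExpr {
    fn as_any(&self) -> &dyn Any;
}

struct DynamicFilter; // stand-in for DynamicFilterPhysicalExpr

impl PhysicalExpr for DynamicFilter {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Returns (total occurrences, distinct Arc allocations) of dynamic filters.
// Each inner Vec plays the role of one plan node's top-level expressions.
fn occurrences_and_distinct(nodes: &[Vec<Arc<dyn PhysicalExpr>>]) -> (usize, usize) {
    let mut occurrences = 0;
    let mut seen: HashSet<*const ()> = HashSet::new();
    for exprs in nodes {
        for expr in exprs {
            if expr.as_any().downcast_ref::<DynamicFilter>().is_some() {
                occurrences += 1;
                // Deduplicate by allocation address: clones of one Arc share it.
                seen.insert(Arc::as_ptr(expr) as *const ());
            }
        }
    }
    (occurrences, seen.len())
}

fn sample_nodes() -> Vec<Vec<Arc<dyn PhysicalExpr>>> {
    // One filter shared by two nodes, as a join and a scan might share it.
    let shared: Arc<dyn PhysicalExpr> = Arc::new(DynamicFilter);
    vec![vec![Arc::clone(&shared)], vec![shared]]
}
```

With deduplication the shared filter is reported once, which is why the test counts occurrences instead.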

/// joins).
fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>;

/// Returns all expressions (non-recursively) evaluated by the current
Contributor

This API forces an allocation and also cloning all the PhysicalExprs -- what would you think about adding apply_expressions and map_expressions methods to parallel the ones on LogicalPlan instead?

Maybe you can start with just the apply_expressions one in this PR

I think we should probably also not provide a default implementation to force all implementations to properly visit the expressions

If we provide this default implementation, then downstream implementors will likely not implement the API and if something in the datafusion core depends on the API in the future it will be hard to debug what is going on
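
The concern about a default implementation can be sketched as follows. The trait `PlanWithDefault`, the `Expr` stand-in, and the node types are hypothetical and exist only to show how a default body lets a downstream node silently hide its expressions.

```rust
type Result<T> = std::result::Result<T, String>;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Stop,
}

struct Expr(&'static str); // stand-in for a physical expression

// Hypothetical variant of the trait that DOES provide a default body.
trait PlanWithDefault {
    fn apply_expressions(
        &self,
        _f: &mut dyn FnMut(&Expr) -> Result<TreeNodeRecursion>,
    ) -> Result<TreeNodeRecursion> {
        // Default: pretend the node has no expressions.
        Ok(TreeNodeRecursion::Continue)
    }
}

// A node that overrides the method and exposes its predicate.
struct CoreFilter {
    predicate: Expr,
}

impl PlanWithDefault for CoreFilter {
    fn apply_expressions(
        &self,
        f: &mut dyn FnMut(&Expr) -> Result<TreeNodeRecursion>,
    ) -> Result<TreeNodeRecursion> {
        f(&self.predicate)
    }
}

// A downstream node that forgot to override the method. This still compiles,
// but its predicate is invisible to every caller of apply_expressions.
#[allow(dead_code)]
struct DownstreamFilter {
    predicate: Expr,
}

impl PlanWithDefault for DownstreamFilter {}

// Collects the names of the expressions a node reports through the callback.
fn visible_expressions(plan: &dyn PlanWithDefault) -> Result<Vec<&'static str>> {
    let mut names = Vec::new();
    plan.apply_expressions(&mut |e: &Expr| -> Result<TreeNodeRecursion> {
        names.push(e.0);
        Ok(TreeNodeRecursion::Continue)
    })?;
    Ok(names)
}
```

If the method had no default body, the empty `impl` for `DownstreamFilter` would be rejected by the compiler instead of hiding the predicate at runtime.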

Contributor Author

@LiaCastaneda LiaCastaneda Feb 13, 2026

I think we should probably also not provide a default implementation to force all implementations to properly visit the expressions
If we provide this default implementation, then downstream implementors will likely not implement the API and if something in the datafusion core depends on the API in the future it will be hard to debug what is going on

makes sense, I included a default implementation because I didn't want to introduce a breaking change, but it is better to be safe and force the implementation 👍

what would you think about adding apply_expressions and map_expressions methods to parallel the ones on LogicalPlan instead?

nice catch, I missed the allocation cost, I will give it a try

Contributor

@alamb alamb left a comment

Thanks @LiaCastaneda and @adriangb

I am a little worried about the default implementation here --

I also think a slightly different API might be worth considering

@adriangb
Contributor

adriangb commented Feb 13, 2026

Thanks for reviewing Andrew - that's very good feedback that I missed in my review. I agree that apply_expressions(|expr: &Arc<dyn PhysicalExpr>| ...) would be a better API.

@LiaCastaneda
Contributor Author

Thanks both for the reviews! I will work on your suggestion @alamb

@LiaCastaneda LiaCastaneda marked this pull request as draft February 16, 2026 07:45
@github-actions github-actions bot added optimizer Optimizer rules catalog Related to the catalog crate labels Feb 16, 2026
@LiaCastaneda LiaCastaneda force-pushed the lia/add-expressions-function-physical-plan branch from 10c7c28 to 51dd8d0 Compare February 16, 2026 08:40
@LiaCastaneda LiaCastaneda changed the title Implement ExecutionPlan::expressions() Implement ExecutionPlan::apply_expressions() Feb 16, 2026
@github-actions github-actions bot added the ffi Changes to the ffi crate label Feb 16, 2026
@LiaCastaneda LiaCastaneda force-pushed the lia/add-expressions-function-physical-plan branch from 938297d to bd5b02f Compare February 16, 2026 09:19
@LiaCastaneda LiaCastaneda force-pushed the lia/add-expressions-function-physical-plan branch from bd5b02f to 88730b0 Compare February 16, 2026 09:21
@LiaCastaneda LiaCastaneda marked this pull request as ready for review February 16, 2026 09:27
let mut tnr = TreeNodeRecursion::Continue;
if let Some(ordering) = self.cache.output_ordering() {
    for sort_expr in ordering {
        tnr = tnr.visit_sibling(|| f(sort_expr.expr.as_ref()))?;
Contributor Author

It took me a while to understand visit_sibling -- iiuc it basically short circuits the loop for us. Once f returns Stop, every subsequent tnr.visit_sibling(...) call just skips the next f and passes Stop through, so we don't need a manual match + early return after each call. I added a small test in ExecutionPlan to check that it works when f returns Stop.
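
The short-circuit behaviour described above can be sketched with a reduced stand-in for `TreeNodeRecursion`. This local enum omits the `Jump` variant that the DataFusion type also has, and `visit_until` is a hypothetical helper used only to observe which items get visited.

```rust
type Result<T> = std::result::Result<T, String>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeNodeRecursion {
    Continue,
    Stop,
}

impl TreeNodeRecursion {
    // Runs `f` only if the traversal has not been stopped yet; once the state
    // is Stop, the closure is skipped and Stop is passed through unchanged.
    fn visit_sibling<F>(self, f: F) -> Result<TreeNodeRecursion>
    where
        F: FnOnce() -> Result<TreeNodeRecursion>,
    {
        match self {
            TreeNodeRecursion::Continue => f(),
            TreeNodeRecursion::Stop => Ok(self),
        }
    }
}

// Visits items in order and asks to stop when `stop_at` is seen.
// Returns the final traversal state and the items that were actually visited.
fn visit_until(items: &[i32], stop_at: i32) -> Result<(TreeNodeRecursion, Vec<i32>)> {
    let mut visited = Vec::new();
    let mut f = |item: i32| -> Result<TreeNodeRecursion> {
        visited.push(item);
        Ok(if item == stop_at {
            TreeNodeRecursion::Stop
        } else {
            TreeNodeRecursion::Continue
        })
    };

    let mut tnr = TreeNodeRecursion::Continue;
    for &item in items {
        // No manual `match` or early return: after Stop, `f` is never called again.
        tnr = tnr.visit_sibling(|| f(item))?;
    }
    Ok((tnr, visited))
}
```

The loop still iterates over the remaining items after a Stop, but each iteration is a cheap pass-through that never invokes the callback.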

Contributor

Worth putting in as a code comment? I always struggle to follow the logic of these tree traversals. It makes sense once you get it but yeah they're hard to grok - having a comment that tries to explain in words what is going on may be helpful.

fn apply_expressions(
    &self,
    f: &mut dyn FnMut(
        &dyn datafusion::physical_plan::PhysicalExpr,
Contributor

should this / could this be &Arc<dyn PhysicalExpr>?

Contributor Author

I think it would bring similar issues to #19937. It would be hard to do operations on the expression, like downcasting.

Contributor

@adriangb adriangb Mar 2, 2026

Those that fail to learn from history are doomed to repeat it... seems like I flunked the class 😆

Contributor Author

tbf I had a rough time upgrading DF because of this, so that’s mainly why I remember it so well haha

[#19692]: https://github.com/apache/datafusion/issues/19692

### `ExecutionPlan::apply_expressions` is now a required method
Contributor

@LiaCastaneda the 53.0.0 branch has been cut, so I don't think this will make it in since it's decidedly a new feature. Is that okay with you? If so let's move this section to 54.0.0. I think it's always good to merge a new API right after a release, it gives us time to make non breaking changes if we find issues 2 weeks in.

Contributor Author

ohh, I was not aware of it, no problem from my side, I will add it to 54.0.0

Contributor

@adriangb adriangb left a comment

Some minor comments but I think we can merge this whenever you think it's ready Lía

@LiaCastaneda
Contributor Author

There are some conflicts again, will fix them...

@adriangb
Contributor

adriangb commented Mar 2, 2026

There are some conflicts again, will fix them...

Thank you and sorry for the delays causing conflicts and bump to v54

@LiaCastaneda
Contributor Author

no worries, they were not too complex to solve, I added docs/source/library-user-guide/upgrading/54.0.0.md does it look ok? (it's the first time I add a file from scratch in DF)

if so, I think the PR is good to go

@adriangb adriangb added this pull request to the merge queue Mar 2, 2026
Merged via the queue into apache:main with commit a5f490e Mar 2, 2026
33 checks passed
@askalt
Contributor

askalt commented Mar 11, 2026

Hi! There is a patch #20009 that adds a more expressive API by splitting responsibilities into:

  1. reading expressions
  2. writing expressions

This approach not only helps to check for specific types of expressions in the plan but also enables replacing them, which extends the number of contexts where the API can be used. It looks a bit confusing to have all these methods together (apply_expressions, physical_expressions and with_physical_expressions), so with this, we can implement apply_expressions as a simple helper, like:

pub fn visit_expressions(
    plan: &dyn ExecutionPlan,
    f: &mut dyn FnMut(&dyn PhysicalExpr) -> Result<TreeNodeRecursion>,
) -> Result<TreeNodeRecursion> {
    let mut tnr = TreeNodeRecursion::Continue;
    for expr in plan.physical_expressions() {
        tnr = tnr.visit_sibling(|| f(expr.as_ref()))?;
    }
    Ok(tnr)
}

@LiaCastaneda
Contributor Author

👋 Hey, I was not aware there was already an initiative to build a similar API. This PR implements apply_expressions, which mirrors LogicalPlan::apply_expressions and is intended to be read only and allocation free. Ideally, we should also implement map_expressions (mirroring LogicalPlan::map_expressions) to support modifying PhysicalExprs and rebuilding the node at the same time. Would both of these APIs cover your use case?

@askalt
Contributor

askalt commented Mar 12, 2026

Would both of these APIs cover your use case?

Yes, it would be nice to have a writing API. The important property we need is that map_expressions should not recompute plan properties, assuming that they are not changed (user responsibility), i.e. we avoid a typical plan ::new() call in this case. Is there an issue or branch to track the implementation?

@LiaCastaneda
Contributor Author

I think we can reuse the properties of the rest of the plan (avoiding ::new()), similar to how LogicalPlan::map_expressions does it.

I created this issue #20899. I haven't started working on it yet and probably won't have much time this week, so I'll likely give it a try next week, but feel free to take it if you'd like

@LiaCastaneda
Contributor Author

Actually, now that I think about it, there are some cases where we would need to recompute properties right? for example, if a user changes an expression from a > something to a < something. How do we specify in this API whether we want to recompute properties or not? should map_expressions have a recompute_properties: bool argument? 🤔

@askalt
Contributor

askalt commented Mar 12, 2026

Actually, now that I think about it, there are some cases where we would need to recompute properties right? for example, if a user changes an expression from a > something to a < something. How do we specify in this API whether we want to recompute properties or not? should map_expressions have a recompute_properties: bool argument? 🤔

Yes, it may be useful to explicitly ask for properties re-computation. And it seems to me that by default the safest option is to force properties to be re-computed.

Another way to satisfy it is to introduce "args struct" like:

struct MapExpressionsArgs<'a> {
    f: &'a mut dyn FnMut(&Arc<dyn PhysicalExpr>) -> Result<Arc<dyn PhysicalExpr>>,
    preserve_properties: bool,
}

As is done here:

/// Arguments for scanning a table with [`TableProvider::scan_with_args`].
#[derive(Debug, Clone, Default)]
pub struct ScanArgs<'a> {
    filters: Option<&'a [Expr]>,
    projection: Option<&'a [usize]>,
    limit: Option<usize>,
}

so that we do not have to add a bool argument each time the method's semantics are extended. But maybe this is overkill here and a bool parameter will be enough.
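
For illustration only, the args-struct idea might be sketched as below. Nothing here is an existing DataFusion API: `map_expressions`, `MapExpressionsArgs`, `FilterNode`, `properties_version`, and `flip_comparison` are all hypothetical, `Expr` is a simplified stand-in, and the callback is held as `&mut dyn FnMut` so that it can actually be invoked.

```rust
use std::sync::Arc;

type Result<T> = std::result::Result<T, String>;

#[derive(Debug, PartialEq)]
struct Expr(String); // stand-in for a physical expression

// Hypothetical args struct: new options can be added as fields later without
// changing the method signature.
struct MapExpressionsArgs<'a> {
    f: &'a mut dyn FnMut(&Arc<Expr>) -> Result<Arc<Expr>>,
    preserve_properties: bool,
}

// Hypothetical node; `properties_version` stands in for cached plan
// properties and is bumped whenever they are "recomputed".
#[derive(Debug)]
struct FilterNode {
    predicate: Arc<Expr>,
    properties_version: u32,
}

impl FilterNode {
    fn map_expressions(&self, args: MapExpressionsArgs<'_>) -> Result<FilterNode> {
        let MapExpressionsArgs { f, preserve_properties } = args;
        let predicate = f(&self.predicate)?;
        let properties_version = if preserve_properties {
            // Caller asserts the rewrite does not affect plan properties.
            self.properties_version
        } else {
            // Otherwise simulate recomputing them for the rewritten node.
            self.properties_version + 1
        };
        Ok(FilterNode { predicate, properties_version })
    }
}

// Rewrites `a > x` into `a < x`, the example raised in the discussion.
fn flip_comparison(node: &FilterNode, preserve_properties: bool) -> Result<FilterNode> {
    let mut flip = |e: &Arc<Expr>| -> Result<Arc<Expr>> {
        Ok(Arc::new(Expr(e.0.replace('>', "<"))))
    };
    node.map_expressions(MapExpressionsArgs { f: &mut flip, preserve_properties })
}
```

Whether a flag like `preserve_properties` should default to recomputation, as suggested above, is a design question this sketch does not settle.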

@LiaCastaneda
Contributor Author

let's continue this discussion in the issue

Labels

  • api change: Changes the API exposed to users of the crate
  • catalog: Related to the catalog crate
  • core: Core DataFusion crate
  • datasource: Changes to the datasource crate
  • documentation: Improvements or additions to documentation
  • ffi: Changes to the ffi crate
  • optimizer: Optimizer rules
  • physical-plan: Changes to the physical-plan crate

Development

Successfully merging this pull request may close these issues.

Access DynamicFilterPhysicalExpr expressions from outside the plan

4 participants