[FLINK-39190] Relocate Jackson in filesystem plugin POMs to prevent classpath conflicts #27700
driessamyn-sparta wants to merge 1 commit into apache:master
All Hadoop-based filesystem plugins (s3-hadoop, s3-presto, azure, gs, oss) pull in jackson-databind transitively via AWS SDK or Hadoop, but do not relocate it. If the plugin JAR ends up on the job classpath, its older unshaded Jackson shadows any newer version used by the job, causing version incompatibilities. Add a com.fasterxml.jackson relocation to the maven-shade-plugin configuration of each affected module, using a module-specific shaded package (e.g. org.apache.flink.fs.s3hadoop.shaded.jackson), consistent with how flink-shaded-jackson handles the core runtime.
@driessamyn-sparta Thanks for the patch. I do have some concerns: what happens if the internal implementation of AWS SDK v1 relies on Jackson somewhere? Relocating Jackson would then break that implementation.
Thanks, @MartijnVisser. My understanding is that, given we shade the AWS SDK, relocation should work. This is hard to verify using a unit test as the test would need to operate on the packaged jar.
JAR=$(find flink-filesystems/flink-s3-fs-hadoop/target \
    -name "flink-s3-fs-hadoop-*.jar" \
    -not -name "original-*" \
    -not -name "*sources*" | head -1)
echo "Inspecting: $JAR"
echo ""
echo "Check 1: Relocated Jackson classes are present ==="
jar tf "$JAR" | grep "org/apache/flink/fs/s3hadoop/shaded/jackson/databind/ObjectMapper.class"
echo ""
echo "Check 2: Original com/fasterxml/jackson classes are NOT present ==="
ORIGINAL=$(jar tf "$JAR" | grep "^com/fasterxml/jackson" || true)
if [ -z "$ORIGINAL" ]; then
    echo "PASS: no com/fasterxml/jackson classes found in shaded jar"
else
    echo "FAIL: unrelocated Jackson classes found:"
    echo "$ORIGINAL" | head -5
fi
echo ""
echo "Check 3: AWS SDK bytecode references relocated Jackson ==="
MARKER="org/apache/flink/fs/s3hadoop/shaded/jackson/databind/ObjectMapper"
if unzip -p "$JAR" "com/amazonaws/util/json/Jackson.class" 2>/dev/null | grep -qa "$MARKER"; then
    echo "PASS: com.amazonaws.util.json.Jackson references relocated Jackson ($MARKER)"
else
    echo "FAIL: relocated Jackson reference not found in AWS SDK class"
fi
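Check 2 in the script above could be factored into a small function so the same test can be pointed at each plugin JAR in turn. This is only a sketch and not part of the patch: the function name `find_unrelocated_jackson` is hypothetical, and it assumes the entry listing is piped in from `jar tf`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the patch).
# Reads a JAR entry listing on stdin, e.g. the output of `jar tf`,
# and prints every entry still under the original Jackson package.
# Returns 0 when no such entry exists, 1 otherwise.
find_unrelocated_jackson() {
    local matches
    # `|| true` keeps grep's "no match" exit status from aborting
    # callers that run with `set -e`.
    matches=$(grep '^com/fasterxml/jackson/' || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
        return 1
    fi
    return 0
}
```

It would be used as `jar tf "$JAR" | find_unrelocated_jackson`, with a non-zero exit status meaning that unrelocated Jackson classes are still present in the JAR.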
What is the purpose of the change
This PR relocates Jackson in the filesystem plugin shade configurations.
All Hadoop-based filesystem plugins (s3-hadoop, s3-presto, azure, gs, oss) pull in jackson-databind 2.17.x transitively via AWS SDK v1 or Hadoop, but do not relocate it. If the plugin JAR ends up on the job classpath, its older unshaded Jackson shadows any newer version used by the job, causing for example:
NoSuchFieldError: CLEAR_CURRENT_TOKEN_ON_CLOSE
StreamReadFeature.CLEAR_CURRENT_TOKEN_ON_CLOSE was added in Jackson 2.18.0, so any job using jackson-dataformat-csv 2.18+ fails at runtime with this error.
Brief change log
Added a com.fasterxml.jackson relocation to the maven-shade-plugin configuration of each affected module, using a module-specific shaded package (e.g. org.apache.flink.fs.s3hadoop.shaded.jackson), consistent with how flink-shaded-jackson handles the core runtime.
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation