
Commit e37c516

AliRana30 and rok authored
GH-49104: [C++] Fix Segfault in SparseCSFIndex::Equals with mismatched dimensions (#49105)
### Rationale for This Change

The `SparseCSFIndex::Equals` method can crash when comparing two sparse indices that have a different number of dimensions. The method iterates over the `indices()` and `indptr()` vectors of the current object and accesses the corresponding elements in the `other` object without first verifying that both objects have matching vector sizes. This can lead to out-of-bounds access and a segmentation fault when the dimension counts differ.

### What Changes Are Included in This PR?

This change adds explicit size equality checks for the `indices()` and `indptr()` vectors at the beginning of the `SparseCSFIndex::Equals` method. If the dimensions do not match, the method now safely returns `false` instead of attempting invalid memory access.

### Are These Changes Tested?

Yes. The fix has been validated through targeted reproduction of the crash scenario using mismatched dimension counts, ensuring the method behaves safely and deterministically.

### Are There Any User-Facing Changes?

No. This change improves internal safety and robustness without altering public APIs or observable user behavior.

* GitHub Issue: #49104

Lead-authored-by: Alirana2829 <alimahmoodrana00@gmail.com>
Co-authored-by: Ali Mahmood Rana <159713825+AliRana30@users.noreply.github.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Rok Mihevc <rok@mihevc.org>
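The hazard described in the rationale, and the size guard described under the changes, can be illustrated with a minimal standalone sketch. It does not use Arrow: `FakeTensor`, `TensorVector`, `GuardedEquals`, `MakeTensors`, and `DemoGuardedEquals` are hypothetical names introduced only for this illustration, and the element type stands in for `arrow::Tensor`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for arrow::Tensor; only Equals() matters here.
struct FakeTensor {
  std::vector<int> values;
  bool Equals(const FakeTensor& other) const { return values == other.values; }
};

using TensorVector = std::vector<std::shared_ptr<FakeTensor>>;

// Element-wise comparison that never indexes into `rhs` before the sizes
// are known to match. Without the first check, rhs[i] would read out of
// bounds whenever rhs is shorter than lhs.
bool GuardedEquals(const TensorVector& lhs, const TensorVector& rhs) {
  if (lhs.size() != rhs.size()) return false;
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    if (!lhs[i]->Equals(*rhs[i])) return false;
  }
  return true;
}

// Builds n identical single-element tensors (illustrative helper only).
TensorVector MakeTensors(std::size_t n) {
  TensorVector out;
  for (std::size_t i = 0; i < n; ++i) {
    auto tensor = std::make_shared<FakeTensor>();
    tensor->values.push_back(0);
    out.push_back(tensor);
  }
  return out;
}

// Mirrors the 2D-vs-3D scenario: two index tensors versus three.
void DemoGuardedEquals() {
  assert(!GuardedEquals(MakeTensors(2), MakeTensors(3)));
  assert(!GuardedEquals(MakeTensors(3), MakeTensors(2)));
  assert(GuardedEquals(MakeTensors(2), MakeTensors(2)));
}
```

Calling `DemoGuardedEquals()` exercises both mismatched orderings as well as the equal case, so the comparison is checked to be symmetric.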
1 parent 33f1ea5 commit e37c516

2 files changed

Lines changed: 27 additions & 8 deletions

cpp/src/arrow/sparse_tensor.cc

Lines changed: 4 additions & 7 deletions
@@ -405,13 +405,10 @@ SparseCSFIndex::SparseCSFIndex(const std::vector<std::shared_ptr<Tensor>>& indpt
 std::string SparseCSFIndex::ToString() const { return std::string("SparseCSFIndex"); }
 
 bool SparseCSFIndex::Equals(const SparseCSFIndex& other) const {
-  for (int64_t i = 0; i < static_cast<int64_t>(indices().size()); ++i) {
-    if (!indices()[i]->Equals(*other.indices()[i])) return false;
-  }
-  for (int64_t i = 0; i < static_cast<int64_t>(indptr().size()); ++i) {
-    if (!indptr()[i]->Equals(*other.indptr()[i])) return false;
-  }
-  return axis_order() == other.axis_order();
+  auto eq = [](const auto& a, const auto& b) { return a->Equals(*b); };
+  return axis_order() == other.axis_order() &&
+         std::ranges::equal(indices(), other.indices(), eq) &&
+         std::ranges::equal(indptr(), other.indptr(), eq);
 }
 
 // ----------------------------------------------------------------------

cpp/src/arrow/sparse_tensor_test.cc

Lines changed: 23 additions & 1 deletion
@@ -1641,10 +1641,32 @@ TYPED_TEST_P(TestSparseCSFTensorForIndexValueType, TestNonAscendingShape) {
   ASSERT_TRUE(st->Equals(*sparse_tensor));
 }
 
+TYPED_TEST_P(TestSparseCSFTensorForIndexValueType, TestEqualityMismatchedDimensions) {
+  using IndexValueType = TypeParam;
+  using c_index_value_type = typename IndexValueType::c_type;
+
+  // 2D vs 3D - comparing indices with different dimensionality
+  // 2D CSF: ndim=2, so indptr.size()=1, indices.size()=2
+  std::vector<int64_t> axis_order_2D = {0, 1};
+  std::vector<std::vector<c_index_value_type>> indptr_2D = {{0, 1}};
+  std::vector<std::vector<c_index_value_type>> indices_2D = {{0}, {0}};
+  auto si_2D = this->MakeSparseCSFIndex(axis_order_2D, indptr_2D, indices_2D);
+
+  // 3D CSF: ndim=3, so indptr.size()=2, indices.size()=3
+  std::vector<int64_t> axis_order_3D = {0, 1, 2};
+  std::vector<std::vector<c_index_value_type>> indptr_3D = {{0, 1}, {0, 1}};
+  std::vector<std::vector<c_index_value_type>> indices_3D = {{0}, {0}, {0}};
+  auto si_3D = this->MakeSparseCSFIndex(axis_order_3D, indptr_3D, indices_3D);
+
+  ASSERT_FALSE(si_2D->Equals(*si_3D));
+  ASSERT_FALSE(si_3D->Equals(*si_2D));
+  ASSERT_TRUE(si_2D->Equals(*si_2D));
+}
+
 REGISTER_TYPED_TEST_SUITE_P(TestSparseCSFTensorForIndexValueType, TestCreateSparseTensor,
                             TestTensorToSparseTensor, TestSparseTensorToTensor,
                             TestAlternativeAxisOrder, TestNonAscendingShape,
-                            TestRoundTrip);
+                            TestRoundTrip, TestEqualityMismatchedDimensions);
 
 INSTANTIATE_TYPED_TEST_SUITE_P(TestInt8, TestSparseCSFTensorForIndexValueType, Int8Type);
 INSTANTIATE_TYPED_TEST_SUITE_P(TestUInt8, TestSparseCSFTensorForIndexValueType,
