
crash at GC get_blobs_to_replace #403

@Besroy

Description


Environment

Parameter     Value
Cluster       908
Namespace     nuobject2sh-dev
Deployment    sm-long-running2-1015
HomeObjVer    homeobject/4.1.3@oss/main
HomeStoreVer  homestore/7.5.2@oss/master

Issue Description

During GC, SM2 crashed in GCManager::pdev_gc_actor::get_blobs_to_replace (gc_manager.cpp:521) while querying the blob index for chunk 405.

Test Scenario

  1. Start writing 1 million blobs across 2 PGs
    • blob_size: 150K
    • replica_count: 3
    • write:delete:read ratio: 86:14
  2. Randomly select one member to kill every 3 minutes
  3. After SM2 was killed and restarted, it ran for some time and then crashed during GC
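To make step 2 replayable when trying to reproduce the crash, the kill sequence can be generated from a fixed seed. The C++ sketch below is hypothetical: `KillEvent` and `kill_schedule` are not part of the test harness used here, member indexes 0..2 are assumed to map onto the three replicas (`replica_count: 3`), and it does not model the write/delete/read mix.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical fault-injection plan for step 2 of the scenario: at every interval,
// one member out of replica_count is picked uniformly at random to be killed.
// Member indexes stand in for the real SM instances; performing the kill itself
// is outside this sketch.
struct KillEvent {
    std::chrono::minutes at;  // offset from the start of the test
    std::size_t member;       // index of the member to kill, in [0, replica_count)
};

std::vector<KillEvent> kill_schedule(std::size_t replica_count,
                                     std::chrono::minutes interval,
                                     std::chrono::minutes duration,
                                     std::uint32_t seed) {
    // Guard against an empty distribution range and an endless loop.
    assert(replica_count > 0 && interval.count() > 0);
    std::mt19937 rng(seed);  // fixed seed so a failing run can be replayed
    std::uniform_int_distribution<std::size_t> pick(0, replica_count - 1);
    std::vector<KillEvent> events;
    for (auto t = interval; t <= duration; t += interval) {
        events.push_back({t, pick(rng)});
    }
    return events;
}
```

For example, `kill_schedule(3, std::chrono::minutes(3), std::chrono::minutes(30), 42)` yields ten events, at minutes 3, 6, ..., 30, each naming a member index below 3.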

Symptoms

1. Stack trace

(gdb) bt
#0  0x000063b5e42fde40 in homeobject::BlobRouteByChunkKey::BlobRouteByChunkKey (other=..., this=<synthetic pointer>)
    at /home/jenkins/.conan2/p/b/homeo33192ae4f30c1/b/src/lib/homestore_backend/index_kv.hpp:61
#1  homestore::Btree<homeobject::BlobRouteByChunkKey, homeobject::BlobRouteValue>::query (this=0x63b5f9b1a858, qreq=..., out_values=std::vector of length 141, capacity 230 = {...})
    at /home/jenkins/.conan2/p/homesd362146e71c38/p/include/homestore/btree/btree.ipp:267
#2  0x000063b5e42d6ed6 in homeobject::GCManager::pdev_gc_actor::get_blobs_to_replace (this=this@entry=0x63b602c1a380, move_to_chunk=<optimized out>, move_to_chunk@entry=405,
    valid_blob_indexes=std::vector of length 141, capacity 230 = {...}, task_id=<optimized out>, task_id@entry=2, pg_id=<optimized out>)
    at /home/jenkins/.conan2/p/b/homeo33192ae4f30c1/b/src/lib/homestore_backend/gc_manager.cpp:521
#3  0x000063b5e42e9a61 in homeobject::GCManager::pdev_gc_actor::process_gc_task (this=0x63b602c1a380, move_from_chunk=<optimized out>, priority=<optimized out>, task=...,
    task_id=<optimized out>) at /home/jenkins/.conan2/p/b/homeo33192ae4f30c1/b/src/lib/homestore_backend/gc_manager.cpp:1240
#4  0x000063b5e42eb0ff in operator() (__closure=0x7e8d59dadf40) at /home/jenkins/.conan2/p/b/homeo33192ae4f30c1/b/src/lib/homestore_backend/gc_manager.cpp:389
#5  folly::detail::function::FunctionTraits<void()>::callSmall<homeobject::GCManager::pdev_gc_actor::add_gc_task(uint8_t, homeobject::chunk_id_t)::<lambda()> >(folly::detail::function::Data &) (p=...) at /home/jenkins/.conan2/p/follyfb4eadbee73db/p/include/folly/Function.h:349
#6  0x000063b5e47245b1 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7e8d59dadf40)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/Function.h:378
#7  folly::catch_exception<folly::Function<void ()>&, void (&)(char const*) noexcept, char const*&, void>(folly::Function<void ()>&, void (&)(char const*) noexcept, char const*&) (
    c=<optimized out>, t=...) at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/lang/Exception.h:286
#8  folly::Executor::invokeCatchingExns<folly::Function<void ()> >(char const*, folly::Function<void ()>) (f=..., p=0x63b5e529b770 "ThreadPoolExecutor: func")
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/Executor.h:234
#9  folly::ThreadPoolExecutor::runTask (this=<optimized out>, thread=std::shared_ptr<folly::ThreadPoolExecutor::Thread> (use count 7, weak count 0) = {...}, task=...)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/executors/ThreadPoolExecutor.cpp:102
#10 0x000063b5e4718c2e in operator() (__closure=0x63b603fcf380) at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/executors/IOThreadPoolExecutor.cpp:148
#11 folly::detail::function::FunctionTraits<void()>::callBig<folly::IOThreadPoolExecutor::add(folly::Func, std::chrono::milliseconds, folly::Func)::<lambda()> >(folly::detail::function::Data &) (p=...) at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/Function.h:363
#12 0x000063b5e4738146 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7e8d59dae050)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/Function.h:376
#13 folly::EventBase::FuncRunner::operator()(folly::Function<void ()>) (func=..., this=<optimized out>)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBase.cpp:131
#14 folly::detail::invokeConsumerWithTask<folly::Function<void ()>, folly::EventBase::FuncRunner&, void, void, void>(folly::EventBase::FuncRunner&, folly::Function<void ()>&&, std::shared_ptr<folly::RequestContext>&&) (consumer=..., rctx=..., task=...) at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/AtomicNotificationQueue-inl.h:281
#15 folly::AtomicNotificationQueue<folly::Function<void ()> >::drive<folly::EventBase::FuncRunner&>(folly::EventBase::FuncRunner&) (this=this@entry=0x63b603c0ab00, consumer=...)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/AtomicNotificationQueue-inl.h:339
#16 0x000063b5e473fe41 in folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::drive<folly::EventBase::FuncRunner&>(folly::EventBase::FuncRunner&) (consumer=..., this=0x63b603c0aa00) at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBaseAtomicNotificationQueue-inl.h:265
#17 folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::execute() (this=0x63b603c0aa00)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBaseAtomicNotificationQueue-inl.h:285
#18 0x000063b5e473ff21 in non-virtual thunk to folly::EventBaseAtomicNotificationQueue<folly::Function<void ()>, folly::EventBase::FuncRunner>::handlerReady(unsigned short) ()
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBaseAtomicNotificationQueue-inl.h:275
#19 0x000063b5e4745b56 in folly::EventHandler::libeventCallback (fd=<optimized out>, events=2, arg=0x63b603c0aa28)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventHandler.cpp:161
#20 0x000063b5e4846935 in event_persist_closure (ev=<optimized out>, base=<optimized out>) at /home/jenkins/.conan2/p/b/libev76e83d22d755a/b/src/event.c:1623
#21 event_process_active_single_queue (base=base@entry=0x63b603364b00, activeq=0x63b5f9d0bf20, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0)
    at /home/jenkins/.conan2/p/b/libev76e83d22d755a/b/src/event.c:1682
#22 0x000063b5e4846d87 in event_process_active (base=0x63b603364b00) at /home/jenkins/.conan2/p/b/libev76e83d22d755a/b/src/event.c:1783
#23 event_base_loop (base=0x63b603364b00, flags=1) at /home/jenkins/.conan2/p/b/libev76e83d22d755a/b/src/event.c:2006
#24 0x000063b5e4738dae in (anonymous namespace)::EventBaseBackend::eb_event_base_loop (flags=1, this=<optimized out>)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBase.cpp:76
#25 folly::EventBase::loopMain (this=this@entry=0x63b607cd5900, flags=flags@entry=0, ignoreKeepAlive=ignoreKeepAlive@entry=false)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBase.cpp:472
#26 0x000063b5e4739414 in folly::EventBase::loopBody (this=this@entry=0x63b607cd5900, flags=flags@entry=0, ignoreKeepAlive=ignoreKeepAlive@entry=false)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBase.cpp:401
#27 0x000063b5e47394aa in folly::EventBase::loop (this=this@entry=0x63b607cd5900) at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBase.cpp:380
#28 0x000063b5e473b6aa in folly::EventBase::loopForever (this=0x63b607cd5900) at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/io/async/EventBase.cpp:614
#29 0x000063b5e471b231 in folly::IOThreadPoolExecutor::threadRun (this=0x63b6031c4a40, thread=...)
    at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/executors/IOThreadPoolExecutor.cpp:239
#30 0x000063b5e47271ba in std::__invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__f=<optimized out>, __t=<optimized out>) at /usr/include/c++/13/bits/invoke.h:74
#31 std::__invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>) at /usr/include/c++/13/bits/invoke.h:96
#32 std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (__args=..., this=<optimized out>) at /usr/include/c++/13/functional:506
#33 std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>::operator()<, void>() (this=<optimized out>) at /usr/include/c++/13/functional:591
#34 folly::detail::function::FunctionTraits<void ()>::callSmall<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...) at /home/jenkins/.conan2/p/b/folly8428752782aa2/b/src/folly/Function.h:349
#35 0x00007e8d7daaadb4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#36 0x00007e8d7d72aaa4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#37 0x00007e8d7d7b7c6c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)
(gdb) f 0
#0  0x000063b5e42fde40 in homeobject::BlobRouteByChunkKey::BlobRouteByChunkKey (other=..., this=<synthetic pointer>)
    at /home/jenkins/.conan2/p/b/homeo33192ae4f30c1/b/src/lib/homestore_backend/index_kv.hpp:61
61	in /home/jenkins/.conan2/p/b/homeo33192ae4f30c1/b/src/lib/homestore_backend/index_kv.hpp
(gdb) p this
$1 = (homeobject::BlobRouteByChunkKey * const) <synthetic pointer>
(gdb) p *this
$2 = <optimized out>
(gdb) f 1
#1  homestore::Btree<homeobject::BlobRouteByChunkKey, homeobject::BlobRouteValue>::query (this=0x63b5f9b1a858, qreq=..., out_values=std::vector of length 141, capacity 230 = {...})
    at /home/jenkins/.conan2/p/homesd362146e71c38/p/include/homestore/btree/btree.ipp:267
warning: 267	/home/jenkins/.conan2/p/homesd362146e71c38/p/include/homestore/btree/btree.ipp: No such file or directory
(gdb) p qreq
$3 = (homestore::BtreeQueryRequest<homeobject::BlobRouteByChunkKey> &) @0x7e8d59dacbe0: {<homestore::BtreeRangeRequest<homeobject::BlobRouteByChunkKey>> = {<homestore::BtreeRequest> = {m_app_context = 0x0, m_op_context = 0x0, route_tracing = std::unique_ptr<std::vector<homestore::trace_route_entry, std::allocator<homestore::trace_route_entry> >> = {
        get() = 0x0}}, m_search_state = {m_input_range = {m_start_key = {<homestore::BtreeKey> = {_vptr.BtreeKey = 0x63b5e58513c0 <vtable for homeobject::BlobRouteByChunkKey+16>},
          key_ = {chunk = 405, shard = 0, blob = 0}}, m_end_key = {<homestore::BtreeKey> = {_vptr.BtreeKey = 0x63b5e58513c0 <vtable for homeobject::BlobRouteByChunkKey+16>}, key_ = {
            chunk = 405, shard = 18446744073709551615, blob = 18446744073709551615}}, m_start_incl = true, m_end_incl = true,
        m_multi_selector = homestore::MultiMatchOption::DO_NOT_CARE}, m_working_range = {m_start_key = {<homestore::BtreeKey> = {
            _vptr.BtreeKey = 0x63b5e58513c0 <vtable for homeobject::BlobRouteByChunkKey+16>}, key_ = {chunk = 405, shard = 0, blob = 0}}, m_end_key = {<homestore::BtreeKey> = {
            _vptr.BtreeKey = 0x63b5e58513c0 <vtable for homeobject::BlobRouteByChunkKey+16>}, key_ = {chunk = 405, shard = 18446744073709551615, blob = 18446744073709551615}},
        m_start_incl = true, m_end_incl = true, m_multi_selector = homestore::MultiMatchOption::DO_NOT_CARE}, m_trimmed = false, m_exhausted = false}, m_batch_size = 4294967295},
  m_query_type = homestore::BtreeQueryType::SWEEP_NON_INTRUSIVE_PAGINATION_QUERY,
  m_filter_cb = {<std::_Maybe_unary_or_binary_function<bool, homestore::BtreeKey const&, homestore::BtreeValue const&>> = {<std::binary_function<homestore::BtreeKey const&, homestore::BtreeValue const&, bool>> = {<No data fields>}, <No data fields>}, <std::_Function_base> = {static _M_max_size = 16, static _M_max_align = 8, _M_functor = {_M_unused = {
          _M_object = 0x0, _M_const_object = 0x0, _M_function_pointer = 0x0, _M_member_pointer = NULL}, _M_pod_data = '\000' <repeats 15 times>}, _M_manager = 0x0},
    _M_invoker = 0x0}}
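The `qreq` dump shows that `get_blobs_to_replace` issued a `SWEEP_NON_INTRUSIVE_PAGINATION_QUERY` over the inclusive key range `{chunk = 405, shard = 0, blob = 0}` to `{chunk = 405, shard = UINT64_MAX, blob = UINT64_MAX}`, i.e. every blob in chunk 405, with `m_batch_size = 4294967295` (UINT32_MAX, effectively unbounded). At the time of the crash, `out_values` already held 141 entries, and frame #0 is the `BlobRouteByChunkKey` copy constructor. The C++ sketch below only illustrates what that range covers, assuming keys are ordered lexicographically by (chunk, shard, blob). It uses a `std::map` as a stand-in for the HomeStore B-tree; `ChunkBlobKey`, `chunk_range`, and `sweep` are hypothetical names, not HomeObject or HomeStore APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for the key_ field of homeobject::BlobRouteByChunkKey,
// which gdb prints as {chunk, shard, blob}. shard and blob are uint64_t because
// gdb shows 18446744073709551615 (UINT64_MAX) for them; the chunk width is assumed.
struct ChunkBlobKey {
    std::uint32_t chunk;
    std::uint64_t shard;
    std::uint64_t blob;

    // Assumed ordering: lexicographic by (chunk, shard, blob).
    bool operator<(const ChunkBlobKey& o) const {
        return std::tie(chunk, shard, blob) < std::tie(o.chunk, o.shard, o.blob);
    }
};

// Inclusive range covering every (shard, blob) pair in one chunk, mirroring the
// m_start_key / m_end_key values in the qreq dump.
std::pair<ChunkBlobKey, ChunkBlobKey> chunk_range(std::uint32_t chunk) {
    constexpr std::uint64_t kMax = std::numeric_limits<std::uint64_t>::max();
    return {ChunkBlobKey{chunk, 0, 0}, ChunkBlobKey{chunk, kMax, kMax}};
}

// Copy every key in [start, end] from an ordered index into out.
// std::map stands in for the B-tree; each push_back copy-constructs a key.
template <typename V>
void sweep(const std::map<ChunkBlobKey, V>& blob_index, std::uint32_t chunk,
           std::vector<ChunkBlobKey>& out) {
    const auto [start, end] = chunk_range(chunk);
    assert(!(end < start));  // a well-formed range never ends before it starts
    auto it = blob_index.lower_bound(start);         // first key >= start (m_start_incl = true)
    const auto stop = blob_index.upper_bound(end);   // first key > end (m_end_incl = true)
    for (; it != stop; ++it) out.push_back(it->first);
}
```

`upper_bound(end)` is used rather than `lower_bound(end)` so that a key equal to the end key is still returned, matching `m_end_incl = true` in the dump.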
