[FIX] [16.0] queue_job: Add requeue default config parameter for started_delta + improve README #642
11f845e to 3991b80
|
@simahawk @gurneyalex you are welcome! 😄 |
yajo
left a comment
Just a few typos, but it all looks ok.
3991b80 to 3c8922b
sbidoul
left a comment
Since you can have multiple Odoo instances running jobs and crons on different machines, there is actually no guarantee that the pid stored on the job is a pid of the machine trying to kill it. So it may be ineffective or, worse, kill an unrelated process.
So I'm afraid we can't do this.
|
Yikes, true. However, it's important to understand that we currently have a different problem. Imagine a job that takes 10 minutes to execute, maybe because it's slow, or maybe because it's buggy (e.g. a request without a timeout). After 5 minutes, the cron runs and reschedules it. Then it is picked up by another worker. Since the first worker has not finished yet, the job runs twice. That's a race condition. We could check that the PID is currently running and belongs to an Odoo process before terminating it. Also, we could add this option as a parameter to the cron function directly (False by default). Benefits:
I'd like a better solution, such as being able to check if the job is actually running or not. But then the jobrunner should start 2 threads, one of which would be a keepalive one, or something like that. Way more complex... But do you have any other ideas? |
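The timing in the example above (a 10-minute job, a requeue after 5 minutes) can be written out as a rough sketch. The function name is hypothetical, and it assumes that a second worker picks the job up as soon as the cron requeues it and that nothing kills the first worker in the meantime.

```python
def overlap_minutes(job_minutes: float, requeue_after_minutes: float) -> float:
    """Minutes during which two workers execute the same job.

    Assumption: the requeued job is picked up immediately by another
    worker, and the first worker keeps running until the job ends.
    """
    return float(max(0.0, job_minutes - requeue_after_minutes))


# The case described above: two workers overlap for part of the run.
assert overlap_minutes(10, 5) > 0
# A job that finishes before the requeue delay never runs twice.
assert overlap_minutes(3, 5) == 0
```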
There was this idea of taking a lock on the job record (#423), so we can know for sure that some worker somewhere is still processing the job. I think it is feasible, but it is tricky to get right. Also, with the current implementation, if you configure the cron so that the delay for requeuing is greater than the CPU time limit of the Odoo job workers, then you can be sure that the job will have been killed before being requeued. |
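The rule in the last sentence can be expressed as a small sanity check. This is only a sketch: `is_safe_started_delta` is a hypothetical helper, and it assumes that `started_delta` is expressed in minutes, that a value of 0 disables requeuing of started jobs, and that the worker time limit is expressed in seconds (as Odoo's `limit_time_cpu` option is).

```python
def is_safe_started_delta(started_delta_minutes: int, time_limit_seconds: int) -> bool:
    """True when a started job can only be requeued after its worker was killed."""
    if started_delta_minutes <= 0:
        # Assumption: 0 means "never requeue started jobs", so no overlap is possible.
        return True
    # The requeue delay must be strictly longer than the worker time limit.
    return started_delta_minutes * 60 > time_limit_seconds


# With a 60-second limit, a 2-minute delay is safe but a 1-minute delay is not.
assert is_safe_started_delta(2, 60)
assert not is_safe_started_delta(1, 60)
```

A job blocked on I/O consumes little CPU time, so comparing against the real-time limit (`limit_time_real`) may be the more conservative choice.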
|
With the psutil library, we could hash some process information (pid and create_time at least) and store it on the job to ensure we are killing the right process. Then we can check the process to ensure it is running and is not a zombie. The started_delta=0 parameter passed to the function must be tweaked in every instance to ensure this won't trigger until Odoo has killed its own process (set in the parameters of the environment) |
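A minimal sketch of that idea, not the code of this PR: the helper names are hypothetical, and the fingerprint is only meaningful on the host that created it, so it does not by itself address jobs running on different machines.

```python
import hashlib
from typing import Optional


def fingerprint(pid: int, create_time: float) -> str:
    """Hash the pid and the process start time into one storable token."""
    raw = f"{pid}:{create_time:.2f}".encode()
    return hashlib.sha256(raw).hexdigest()


def current_fingerprint(pid: int) -> Optional[str]:
    """Fingerprint of a live, non-zombie local process, or None."""
    import psutil  # third-party; imported lazily so fingerprint() works without it

    try:
        proc = psutil.Process(pid)
        if not proc.is_running() or proc.status() == psutil.STATUS_ZOMBIE:
            return None
        return fingerprint(proc.pid, proc.create_time())
    except psutil.Error:
        # Covers a vanished process, denied access, and similar failures.
        return None


def is_same_process(stored: str, pid: int) -> bool:
    """True only if the pid still refers to the process that was fingerprinted."""
    current = current_fingerprint(pid)
    return current is not None and current == stored
```

Including the start time guards against pid reuse: a new process that happens to receive the same pid gets a different fingerprint.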
|
However, if we keep this in mind:
Knowing that Odoo will kill the worker after exceeding its allowed time, do we really need to kill it ourselves? Can't we just assume it's being killed and focus on properly rescheduling it? |
Yes, I think it is not needed :/ I'm going to update the README properly to include the job reset configuration
3c8922b to f0470e3
|
This PR has the |
f0470e3 to c01eed5
49bee5b to 24f0dcd
24f0dcd to cd78484
|
All ready |
|
@guewen can you merge this with your blessing? 🙌🏻 |
|
/ocabot merge patch Thanks! |
|
This PR looks fantastic, let's merge it! |
|
Congratulations, your PR was merged at dbfd111. Thanks a lot for contributing to OCA. ❤️ |
Added requeue documentation to readme file
MT-5357 @moduon @yajo @rafaelbn @sbidoul @guewen please review if you want 😄