Bug #594

top_block.unlock() deadlock when flow graph contains Python sync_block

Added by Joshua Lackey over 1 year ago. Updated 5 months ago.

Status: Assigned
Start date: 09/20/2013
Priority: Normal
Due date:
Assignee: Johnathan Corgan
% Done:
Target version: release-3.8.0


When a flow graph contains a sync_block written in Python, calling top_block.lock() and then top_block.unlock() always freezes.

Simple debugging shows the hang is caused by a thread calling sem_wait().

The attached file contains a simple repro.

unconnect_test.py (1.16 KB) Joshua Lackey, 09/20/2013 09:12 pm


#1 Updated by Johnathan Corgan over 1 year ago

  • Category set to gnuradio-runtime
  • Status changed from New to Assigned
  • Assignee set to Johnathan Corgan
  • Priority changed from Normal to High

#2 Updated by Johnathan Corgan over 1 year ago

When unlock() is called, the flowgraph is stopped (in order to apply any reconfiguration done while locked). This issues a thread interrupt to all the flowgraph threads, and the wait() function joins all of these threads. The problem is that the worker thread handling the Python-based block never exits, so the join() on it never returns, and the call to wait() (and thus unlock()) never finishes.
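The stop-then-join pattern described above can be sketched with nothing but the Python standard library. This is only an illustration of the mechanism, not GNU Radio code: `stop_and_wait`, `demo`, and the thread names `cpp_block` and `python_block` are hypothetical, the `Event` stands in for the thread interrupt, and the semaphore that is never released stands in for whatever the real worker is blocked on in sem_wait(). A join timeout is used so the sketch terminates instead of hanging.

```python
import threading


def stop_and_wait(workers, stop_event, timeout=0.2):
    """Signal every worker to stop, then join each one.

    Returns the names of the workers that were still alive when their
    join timed out. With no timeout, join() on those would block forever.
    """
    stop_event.set()
    stuck = []
    for worker in workers:
        worker.join(timeout)
        if worker.is_alive():
            stuck.append(worker.name)
    return stuck


def demo():
    stop_event = threading.Event()
    never_released = threading.Semaphore(0)  # stand-in for the sem_wait() hang

    def cooperative():
        # Wakes up as soon as the stop signal arrives, then exits.
        stop_event.wait()

    def blocked():
        # Ignores the stop signal: it is parked on a semaphore nobody posts.
        never_released.acquire()

    workers = [
        threading.Thread(target=cooperative, name="cpp_block", daemon=True),
        threading.Thread(target=blocked, name="python_block", daemon=True),
    ]
    for worker in workers:
        worker.start()

    stuck = stop_and_wait(workers, stop_event)

    # Clean up so the demo itself does not leave a blocked thread behind.
    never_released.release()
    workers[1].join()
    return stuck


if __name__ == "__main__":
    print(demo())
```

One worker that does not react to the stop signal is enough to keep the whole wait from completing, which matches the symptom reported here.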

This may be related to the handling of the Python GIL (Global Interpreter Lock). When calling up into Python from C++, the Python GIL must be acquired before executing the Python work function, and released on exit. So it might be the case that the thread is in an uninterruptible state while doing this. Still investigating.

#3 Updated by Tom Rondeau over 1 year ago

  • Status changed from Assigned to Feedback

The bug occurs during the call to "stop" in gr::block_gateway_impl.cc, specifically when calling _handler->calleval(0). If you comment this line out, the above program will finish.

The calleval(0) line is calling into py_feval.h, gr::py_feval_ll::calleval and blocking on the line:

ensure_py_gil_state _lock;

Apparently, PyGILState_Ensure() is never returning. I have not been able to figure out why. This same code works fine during a direct call to 'stop' but not through 'unlock'. I cannot see where the GIL is being acquired at any time before this call (and not being released), and there is no indication in the Python docs that this call should ever block like this.

This suggests that the path leading to this point differs in some way between a call to tb.stop and a call to tb.unlock.
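For reference, here is one general way an acquire call like this can block indefinitely: the thread waiting in join() is itself holding the lock that the worker needs in order to finish. The sketch below is a hypothetical model, not a diagnosis. Nothing in this thread confirms that this is what happens in unlock(). A plain `threading.Lock` named `gil` stands in for the GIL, `worker_stop` stands in for the worker taking the lock before running the Python stop handler, and `run_unlock` and its `release_before_join` flag are made up for the illustration. Timeouts keep it from actually hanging.

```python
import threading


def run_unlock(release_before_join, timeout=0.2):
    """Model a caller that joins a worker which needs the caller's lock.

    Returns True if the worker finished within the timeout, False if the
    join timed out (the case that would hang without a timeout).
    """
    gil = threading.Lock()  # hypothetical stand-in for the GIL

    def worker_stop():
        # Models acquiring the lock, running the Python stop callback,
        # and releasing the lock on exit.
        with gil:
            pass

    gil.acquire()  # the caller is "running Python code", so it holds the lock
    worker = threading.Thread(target=worker_stop, daemon=True)
    worker.start()

    if release_before_join:
        gil.release()          # give the lock up while waiting
        worker.join(timeout)
        gil.acquire()          # take it back before continuing
    else:
        worker.join(timeout)   # waiting while still holding the lock

    finished = not worker.is_alive()

    # Clean up: release the lock and let the worker run to completion.
    gil.release()
    worker.join()
    return finished


if __name__ == "__main__":
    print(run_unlock(release_before_join=True))
    print(run_unlock(release_before_join=False))
```

If the two code paths differ in whether the lock is held across the wait, that would produce the "works via stop, hangs via unlock" behaviour, but that remains an assumption to check against the actual call paths.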

#4 Updated by Johnathan Corgan 5 months ago

  • Status changed from Feedback to Assigned
  • Priority changed from High to Normal
  • Target version set to release-3.8.0
