Bug #594

top_block.unlock() deadlock when flow graph contains Python sync_block

Added by Joshua Lackey about 1 year ago. Updated 12 months ago.

Status: Feedback
Start date: 09/20/2013
Priority: High
Due date:
Assignee: Johnathan Corgan
% Done: 0%
Category: gnuradio-runtime
Target version: -
Resolution:

Description

When a flow graph contains a sync_block written in Python, calling top_block.lock() and then top_block.unlock() always freezes.

Simple debugging shows the hang is caused by a thread calling sem_wait().

The attached file contains a simple repro.

unconnect_test.py (1.16 KB) Joshua Lackey, 09/20/2013 09:12 pm

History

#1 Updated by Johnathan Corgan about 1 year ago

  • Category set to gnuradio-runtime
  • Status changed from New to Assigned
  • Assignee set to Johnathan Corgan
  • Priority changed from Normal to High

#2 Updated by Johnathan Corgan 12 months ago

When unlock() is called, the flowgraph is stopped (in order to implement any reconfiguration done while locked). This issues a thread interrupt to all the flowgraph threads, and the wait() function then joins all of these threads. The problem is that join() never returns for the worker thread that is handling the Python-based block, so the call to wait() (and thus unlock()) never finishes.
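The stop-then-join sequence described above can be sketched in plain Python, without GNU Radio. This is only an illustration of the pattern, not the runtime's actual code: the worker functions, thread names, and the `release` event are all hypothetical, and a timeout is passed to join() so the sketch reports the stuck thread instead of hanging.

```python
import threading


def cooperative_worker(stop_requested):
    # Stands in for a block thread that exits promptly once asked to stop.
    stop_requested.wait()


def stuck_worker(stop_requested, release):
    # Stands in for a block thread that is blocked inside some call
    # and therefore does not exit when the stop request arrives.
    stop_requested.wait()
    release.wait()  # stays blocked until the demo lets it go


def stop_and_join(threads, stop_requested, timeout):
    """Request a stop, then join each thread.

    Returns the names of threads that were still alive after the timeout.
    Without the timeout, join() would block forever on a stuck thread.
    """
    stop_requested.set()
    still_running = []
    for t in threads:
        t.join(timeout)
        if t.is_alive():
            still_running.append(t.name)
    return still_running


def run_demo():
    stop_requested = threading.Event()
    release = threading.Event()
    threads = [
        threading.Thread(target=cooperative_worker,
                         args=(stop_requested,), name="native-block"),
        threading.Thread(target=stuck_worker,
                         args=(stop_requested, release), name="python-block"),
    ]
    for t in threads:
        t.start()
    hung = stop_and_join(threads, stop_requested, timeout=0.5)
    release.set()  # unblock the stuck worker so the demo itself terminates
    for t in threads:
        t.join()
    return hung


if __name__ == "__main__":
    print(run_demo())
```

Running the sketch reports only the thread that failed to exit, which mirrors the symptom here: a single thread that never finishes is enough to keep wait(), and therefore unlock(), from returning.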

This may be related to the handling of the Python GIL (Global Interpreter Lock). When calling up into Python from C++, the Python GIL must be acquired before executing the Python work function, and released on exit. So it might be the case that the thread is in an uninterruptible state while doing this. Still investigating.
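To make the acquire-before-call, release-on-exit pattern concrete, here is a small pure-Python sketch in which an ordinary `threading.Lock` stands in for the GIL. It shows one hypothetical way a join could stall: the thread that requests the stop still holds the lock while it waits for the worker. Nothing in this report establishes that this is what happens in GNU Radio. The function names and the `caller_keeps_lock` flag are invented for the illustration, and a timeout is used so the sketch returns instead of deadlocking.

```python
import threading


def call_python_handler(gil, handler, timeout):
    """Acquire the stand-in lock, run the handler, then release the lock.

    Returns False if the lock could not be obtained within the timeout.
    """
    if not gil.acquire(timeout=timeout):
        return False
    try:
        handler()
        return True
    finally:
        gil.release()


def stop_from_caller(gil, caller_keeps_lock, timeout=0.5):
    """Simulate a caller that joins a worker which must call into 'Python'.

    If caller_keeps_lock is True, the caller holds the stand-in lock for the
    whole join, so the worker can never acquire it.
    """
    results = []

    def worker():
        results.append(call_python_handler(gil, lambda: None, timeout))

    gil.acquire()  # the caller is "running Python code" and holds the lock
    if not caller_keeps_lock:
        gil.release()  # released before blocking on the worker

    t = threading.Thread(target=worker)
    t.start()
    t.join()  # returns only because the worker gives up after the timeout

    if caller_keeps_lock:
        gil.release()
    return results[0]


if __name__ == "__main__":
    print(stop_from_caller(threading.Lock(), caller_keeps_lock=True))
    print(stop_from_caller(threading.Lock(), caller_keeps_lock=False))
```

With an untimed acquire, as with a real blocking lock, the first case would never return: the worker waits for the lock and the caller waits for the worker.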

#3 Updated by Tom Rondeau 12 months ago

  • Status changed from Assigned to Feedback

The bug occurs during the call to "stop" in gr::block_gateway_impl.cc, specifically when calling _handler->calleval(0). If you comment this line out, the above program will finish.

The calleval(0) line is calling into py_feval.h, gr::py_feval_ll::calleval and blocking on the line:

ensure_py_gil_state _lock;

Apparently, PyGILState_Ensure() is never returning. I have not been able to figure out why. This same code works fine during a direct call to 'stop' but not through 'unlock'. I cannot see where the GIL is being acquired at any time before this call (and not being released), and there is no indication in the Python docs that this call should ever block like this.

This suggests that the path from a call to tb.stop to this point differs in some way from the path from a call to tb.unlock.
