Fun with Multi-Threading

Today I tried to improve upon a hacky test I wrote for rdebug, a tool within the debugger gem. The particular hack I'm trying to remove from the test is polling one thread until it's ready to respond to work.

The polling is necessary because occasionally (between 1% and 4% of the time) an error message like this appears:

/Users/erichu/.rvm/gems/ruby-1.9.3-p448@debugger/gems/debugger-1.6.1/lib/ruby-debug.rb:108:in `initialize': Connection refused - connect(2) (Errno::ECONNREFUSED)
  from /Users/erichu/.rvm/gems/ruby-1.9.3-p448@debugger/gems/debugger-1.6.1/lib/ruby-debug.rb:108:in `new'
  from /Users/erichu/.rvm/gems/ruby-1.9.3-p448@debugger/gems/debugger-1.6.1/lib/ruby-debug.rb:108:in `start_client'
  from ../../bin/rdebug:188:in `<main>'

Some quick background: rdebug can start in local, client or server mode. This test is for the client and server modes. The server needs to be started and waiting before the client can connect to it.

I’m fairly certain that the cause is running the server in a separate thread. It seems to sometimes take longer to start than the main thread takes to initialize the client.

As far as I can see, the separate threads are necessary for this behavior-style test. The client has to start after the server is accepting requests, or it’ll stop with the error message seen above. The server will block its thread until it receives a request, so there’s no way to start a server and client on the same thread.

I was under the impression that I could inspect the status of the server thread to cut out the polling from my test. Here’s some test code that does that and repeats it:

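100.times do
  # Start the rdebug server in its own thread; the backticks block until the server process exits.
  server_thread = Thread.new {`../../bin/rdebug --server --wait ./simple_loop.rb >> rdebug_server_output`}
  # Wait for the server thread to leave the 'run' state.
  while server_thread.status == 'run'
    puts server_thread.status
    sleep(0.001)
  end
  puts server_thread.status
  # Connect a client and send it a "continue" command.
  `printf '\nc\n' - | ../../bin/rdebug --client ./simple_loop.rb >> rdebug_client_output`
  puts "------------------------------ client done ------------------------------"
end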

The 1-4% figure I gave earlier comes from this code. I ran it a few times, and between 1 and 4 of the 100 iterations failed with the above message. On my machine, I can drive this down to 0 by replacing the entire while block above with:

sleep(0.1)

If it’s that simple, why not just do that? Because it's a hack, just like my original polling solution. It’s actually worse, because it’ll flat out fail sometimes, whereas the polling approach will get a connection eventually. The “fix” with sleep will fail on any machine that is much slower than my laptop, whether that’s due to hard drive speed, RAM, CPU, or just other running processes.
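For comparison, here is a minimal sketch of the polling idea: retry the connection until the server accepts it or a deadline passes. This is not the code from my test. The helper name wait_for_port, the timeout values, and the stand-in TCPServer (used in place of the real rdebug server so the example runs on its own) are all assumptions made for illustration.

```ruby
require 'socket'

# Hypothetical helper (not part of rdebug): retry a TCP connect until the
# server accepts it, or give up once the deadline has passed.
def wait_for_port(host, port, timeout = 5.0, interval = 0.01)
  deadline = Time.now + timeout
  begin
    TCPSocket.new(host, port).close
    true
  rescue Errno::ECONNREFUSED
    return false if Time.now >= deadline
    sleep(interval)
    retry
  end
end

# Pick a free port, then release it so that nothing is listening yet.
listener = TCPServer.new('127.0.0.1', 0)
port = listener.addr[1]
listener.close

# Stand-in server that only starts listening after a delay, like the
# slow-starting server thread in the test.
server_thread = Thread.new do
  sleep(0.2)
  server = TCPServer.new('127.0.0.1', port)
  server.accept.close
  server.close
end

ready = wait_for_port('127.0.0.1', port)
server_thread.join
puts "server ready: #{ready}"
```

One caveat: a probe connection like this is a real connection, so a server that waits for a single client may treat the probe as that client. In the actual test it is probably safer to put the retry around launching the real client rather than around a separate probe.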

Ultimately, I thought I had found a better approach than polling: run the server in a separate thread and watch Thread#status for the change from “run” to “sleep”. It doesn’t look like this works, though.

My evaluation here is that the server thread fires off a message to yet another thread and goes to sleep before it’s ready to listen on a port.
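A small experiment, independent of rdebug, shows why “sleep” is a weak signal. Ruby reports “sleep” for a thread that is sleeping or waiting on I/O, and a thread running a backtick command spends nearly all of its time waiting on the child process. In this sketch the shell command `sleep 0.5` is only a stand-in for the server process (an assumption for illustration), and it never listens on any port.

```ruby
# Stand-in for the rdebug server: any child process that takes a while.
server_thread = Thread.new { `sleep 0.5` }

# Wait for the thread to leave the 'run' state, as in the test loop.
sleep(0.001) while server_thread.status == 'run'

# The child process is still running here, yet the thread is no longer
# 'run': it is blocked waiting for the child's output.
status_while_child_runs = server_thread.status
puts status_while_child_runs

server_thread.join
# Thread#status returns false once a thread has finished normally.
puts server_thread.status.inspect
```

So the status flips as soon as the thread blocks on the child process, which says nothing about whether that process has opened its port yet.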

Aside: If you want to reproduce this test, just pull my fork of the debugger gem, switch to commit a818a9e2f843c7d62dd45e42b58c432a597b3306 and then run the code from an irb instance in <debugger folder>/test/rdebug/.