Distributing a series of tasks

Arjen Markus (5 january 2005) I was inspired by a white paper from Electric Cloud, Inc. to consider a solution for a particular problem me and my colleagues are facing: the automatic tests for part of our software are taking hours and hours to complete.

Now these are all independent tasks: a single case consists of one or more programs that are run and results that are checked. There are a lot of cases. So, using several machines seems a nice way to reduce the time needed to run them all.

The scripts below illustrate how this can be done:

  • The server is started first. It reads a file with the description of the various jobs that need to be done
  • One or more clients (just start the client script multiple times) connect to the server and receive the information about the job at hand.
  • Once the client is finished with that job, it asks for the next one (by sending a message to the server)

Okay, nothing fancy, but the principle works: there is one process in charge, all others cooperate smoothly.


 # server.tcl --
 #     Server application that distributes work among interested clients
 #
 #     General information:
 #     This script sets up a server that listens to a socket.
 #     Clients connect to it and are then passed a "job" that
 #     they should process. The jobs are found in the file "server.inp"
 #

 # setUp --
 #     Create the server
 # Arguments:
 #     None
 # Return value:
 #     None
 #
 proc setUp  { } {
     global srvsock
 
     set port    8085
     set timeout 60000

     set srvsock [socket -server [list acceptClient $timeout] $port]
 }

 # acceptClient --
 #    Accept the service request
 #
 # Arguments:
 #     timelimit  Maximum time to wait
 #     socket     Name of the socket we listen to
 #     ip         IP address of client
 #     args       (Possibly) all other arguments
 #
 # Return value:
 #     None
 #
 proc acceptClient { timelimit socket ip args } {
 
     fconfigure $socket -block false
     fileevent  $socket readable [list sendJob $socket]
 }

 # sendJob --
 #     Send the description of a newjob to the client
 #
 # Arguments:
 #     socket     Name of the socket we listen to
 #
 # Return value:
 #     None
 #
 proc sendJob { socket } {
     global jobfile
     global srvsock

     if { ! [eof $socket] } {
         gets $socket line
         puts "Client: $line"

         #
         # Get the next job description
         #
         set end 1
         while { [gets $jobfile job] >= 0 } {
             set end 0
             if { [lindex [split $job] 0] == "job" } {
                  puts $socket [join [lrange [split $job] 1 end]]
                  puts "   Sent: $job"
                  flush $socket
                  break
             }
         }

         #
         # If end of job file ...
         #
         if { $end } {
             close $jobfile
             close $srvsock
             puts "Server done!"
             exit
         }
     } else {
         close $socket
     }
 }

 # main --
 #     Open the file with jobs and get the server going ...
 #
 set ::jobfile [open "server.inp" "r"]
 setUp

 puts "Server up and running ..."
 vwait forever

 # client.tcl --
 #     Client application that takes work from a server
 #
 #     General information:
 #     This script connects to the server and accepts a job to be done
 #

 # setUp --
 #     Connect to the server
 # Arguments:
 #     None
 # Return value:
 #     None
 #
 proc setUp  { {host localhost} } {
     global channel

     set port    8085
     set channel [socket $host $port]

     puts $channel "client-[pid]"
     flush $channel
     fileevent $channel readable [list acceptJob $channel]
 }

 # acceptJob --
 #     Receive the job that is to be done from the server
 #
 # Arguments:
 #     None
 # Return value:
 #     None
 #
 proc acceptJob {channel} {

     if { [gets $channel line] >= 0 } {
         puts "Received job: $line"

         #
         # Wait a while ...
         #
         after [expr {int(10000*rand())}]

         #
         # Send a string to say we are ready
         #
         puts $channel "client-[pid]"
         flush $channel
     } else {

         #
         # No more jobs ...
         #
         close $channel
         puts "Done"
         exit
     }
 }

 # main --
 #     Get the client going ...
 #
 if { [llength $argv] > 0 } {
     setUp [lindex $argv 0]
 } else {
     setUp
 }
 vwait forever

 # server.inp --
 # Input for the server:
 # Lines that represent jobs ...
 #
 job "Test 1"
 job "Test 2"
 job "Test 3"
 job "Test 4"
 job "Test 5"
 job "Test 6"
 job "Test 7"
 job "Test 8"
 job "Test 9"
 job "Test 10"

"Distributed computation"