Django Gearman JBox


Project maintained by jbox-web

django-gearman-jbox

django-gearman-jbox is a convenience wrapper for the Gearman Python Bindings.

With django-gearman-jbox, you can code workers as well as clients in a Django project with minimal overhead in your application. Server connections etc. all take place in django-gearman-jbox and don't unnecessarily clog your application code.

This library is based in large part on Fred Wenzel's django-gearman and Jozef Ševčík's django-gearman-commands.

It differs from them in a few ways:

Workers are now launched individually, so you have to pass 2 mandatory parameters to start a worker: -a <django_app_name> and -n <worker_name>.

The -q parameter is still here and has the same function as in django-gearman.

Installation

It's the same for both the client and worker instances of your django project :

$ pip install django-gearman-jbox

Add django_gearman_jbox to the INSTALLED_APPS section of settings.py.

Specify the following settings in your local settings.py file:

# One or more gearman servers
GEARMAN_CLIENT_SERVERS = ['127.0.0.1']
GEARMAN_WORKER_SERVERS = ['127.0.0.1']
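
As a rough illustration of what a server list can contain, the sketch below normalizes entries to (host, port) pairs. It is not part of django-gearman-jbox: the helper name normalize_servers is hypothetical, and it assumes entries are written either as 'host' or 'host:port', with a missing port falling back to Gearman's standard port 4730.

```python
DEFAULT_GEARMAN_PORT = 4730  # Gearman's standard port


def normalize_servers(servers):
    """Turn 'host' or 'host:port' strings into (host, port) tuples.

    Hypothetical helper for illustration only.
    """
    normalized = []
    for entry in servers:
        host, sep, port = entry.partition(':')
        # Use the explicit port when one is given, the default otherwise.
        normalized.append((host, int(port) if sep else DEFAULT_GEARMAN_PORT))
    return normalized


GEARMAN_CLIENT_SERVERS = ['127.0.0.1', '10.0.0.2:4731']
print(normalize_servers(GEARMAN_CLIENT_SERVERS))
# [('127.0.0.1', 4730), ('10.0.0.2', 4731)]
```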

Workers

Registering workers

Create a directory gearman_workers in any of your django apps, and define as many workers as you like, one worker per file. Create an empty __init__.py so the directory will be loaded as a package.

Example :

my_django_app
  |_ models.py
  |_ gearman_workers
      |_ __init__.py
      |_ worker_foo.py
      |_ worker_bar.py

Registering tasks

In the worker file, you can define as many tasks as you like, each one as a function. Each function must accept a single argument as passed by the caller and must return the result of the operation, if applicable. (Note: it must accept an argument even if you don't use it.)

Mark each of these functions as gearman tasks by decorating them with :

from django_gearman_jbox.decorators import gearman_task

@gearman_task()
def my_task_function(foo):
  pass

Task naming

By default, a task is named after its import path, with the phrase gearman_task stripped out for readability. You can override the task name with the name parameter of the decorator. Here's how :

from django_gearman_jbox.decorators import gearman_task

@gearman_task(name='my-task-name')
def my_task_function(foo):
  pass
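
To make the naming rule concrete, here is a minimal stand-in decorator. It is a sketch, not the library's implementation: gearman_task_sketch and TASK_REGISTRY are hypothetical names, and the default name is simply the module path plus the function name (the stripping of the gearman_task phrase is left out).

```python
TASK_REGISTRY = {}  # hypothetical: maps task name -> function


def gearman_task_sketch(name=None):
    """Stand-in decorator: register a function under `name` or a default."""
    def decorator(func):
        # Default name: the function's import path (module + function name).
        task_name = name or '%s.%s' % (func.__module__, func.__name__)
        TASK_REGISTRY[task_name] = func
        return func
    return decorator


@gearman_task_sketch()
def default_named(data):
    return data


@gearman_task_sketch(name='my-task-name')
def custom_named(data):
    return data


# One entry is named after the import path, the other is 'my-task-name'.
print(sorted(TASK_REGISTRY))
```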

Task parameters

The gearman docs specify that the task function can accept only one parameter (usually referred to as the data parameter). Additionally, that parameter may only be a string. Sometimes that is not enough. What if you would like to pass a list or a dict? You would need to serialize and deserialize them. Fortunately, django-gearman-jbox takes care of this, so that you can spend your time on coding the actual task.

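# Worker side
@gearman_task(name='my-task-name')
def my_task_function(foo):
  pass

# Client side (client is a GearmanClient instance, Decimal comes from the decimal module)
client.submit_job('my-task-name', {'foo': 'becomes', 'this': 'dict'})
client.submit_job('my-task-name', Decimal(1.0))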
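
The serialize/deserialize round trip that this saves you from writing can be sketched with pickle. This is an assumption made for illustration: the text above does not say which serialization format django-gearman-jbox uses, and encode_payload and decode_payload are hypothetical names.

```python
import pickle
from decimal import Decimal


def encode_payload(value):
    """Serialize any picklable Python object to bytes (client side)."""
    return pickle.dumps(value)


def decode_payload(blob):
    """Restore the original object from its serialized form (worker side)."""
    return pickle.loads(blob)


payload = {'foo': 'becomes', 'this': 'dict'}
restored = decode_payload(encode_payload(payload))
print(restored == payload)  # True

amount = decode_payload(encode_payload(Decimal('1.0')))
print(amount == Decimal('1.0'))  # True
```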

Tasks with more than one parameter

You can pass as many arguments as you want, of whatever (serializable) type you like. Here's an example job definition :

@gearman_task(name='my-task-name')
def my_task_function(one, two, three):
  pass

You can execute this function in two different ways :

client.submit_job('my-task-name', one=1, two=2, three=3)
client.submit_job('my-task-name', args=[1, 2, 3])

Unfortunately, executing it like this:

client.submit_job('my-task-name', 1, 2, 3)

would produce an error, because submit_job from Gearman's Python bindings already takes many arguments of its own. It's much easier to specify task arguments via keyword names or the special args keyword than to type something like seven Nones first :

client.submit_job('my-task-name', None, None, None, None, None, None, None, 1, 2, 3)

The only limitation is the set of keyword parameters reserved by Gearman. As of Gearman 2.0.2 these are :

* data
* unique
* priority
* background
* wait_until_complete
* max_retries
* poll_timeout

So, if you want your task definition to have, for example, unique or background keyword parameters, you need to execute the task in a special, more verbose way. Here's an example of such a task and its execution :

@gearman_task(name='my-task-name')
def my_task_function(background, unique):
  pass

client.submit_job('my-task-name', kwargs={"background": True, "unique": False})
client.submit_job('my-task-name', args=[True, False])

Finally, suppose you mix both forms in a single call:

client.submit_job('my-task-name', background=True, unique=True, kwargs={"background": False, "unique": False})

Don't panic, your task is safe! Because the task arguments are passed explicitly through kwargs, Gearman's bindings receive True for the background and unique parameters of submit_job, while your task receives False for both.

Always remember to double-check your parameter names with the reserved words list.
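
The separation between reserved keywords and task arguments can be sketched as follows. This is a simplified model under stated assumptions, not the library's code: split_submit_arguments is a hypothetical helper, and the reserved names are the ones listed above.

```python
RESERVED = ('data', 'unique', 'priority', 'background',
            'wait_until_complete', 'max_retries', 'poll_timeout')


def split_submit_arguments(args=None, kwargs=None, **named):
    """Separate options meant for the bindings from arguments for the task.

    Returns (binding_options, task_args, task_kwargs).
    """
    binding_options = {}
    task_kwargs = {}
    for key, value in named.items():
        if key in RESERVED:
            binding_options[key] = value  # consumed by submit_job itself
        else:
            task_kwargs[key] = value      # forwarded to the task
    # An explicit kwargs dict always goes to the task, reserved names included.
    task_kwargs.update(kwargs or {})
    return binding_options, list(args or []), task_kwargs


options, task_args, task_kw = split_submit_arguments(
    background=True, unique=True,
    kwargs={'background': False, 'unique': False})
print(options == {'background': True, 'unique': True})   # True
print(task_kw == {'background': False, 'unique': False})  # True
```

In this model, a reserved name passed directly never reaches the task, which is why the explicit kwargs form is needed for tasks whose parameters share a reserved name.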

Starting a worker

To start a worker, run python manage.py gearman_worker -a <django_app_name> -n <worker_name>. It will start serving all registered tasks for that worker.

Example :

$ python manage.py gearman_worker -a django_app_name -n worker_foo
$ python manage.py gearman_worker -a django_app_name -n worker_bar

To spawn more than one worker, see the Supervisord configuration below.
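
If you launch workers from a script rather than by hand, the command line can be assembled programmatically. The sketch below only builds the argument list from the options documented here (-a, -n and the optional -q). The function name build_worker_command and the default interpreter name are assumptions.

```python
def build_worker_command(app_name, worker_name, queue=None, python='python'):
    """Build the argument list for one gearman_worker process."""
    command = [python, 'manage.py', 'gearman_worker',
               '-a', app_name, '-n', worker_name]
    if queue is not None:
        # Restrict the worker to the tasks bound to this queue.
        command += ['-q', queue]
    return command


for name in ('worker_foo', 'worker_bar'):
    print(' '.join(build_worker_command('django_app_name', name)))
# python manage.py gearman_worker -a django_app_name -n worker_foo
# python manage.py gearman_worker -a django_app_name -n worker_bar
```

The returned list is in the form that subprocess.Popen accepts, should you want to start the processes from Python.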

Task queues

Queues are a virtual abstraction layer built on top of gearman tasks. An example describes them best: imagine you have one task for fetching e-mails from a server, another for sending e-mails, and a third for sending SMS via an SMS gateway. The e-mail fetching tasks may effectively "block" the worker: there could be so many of them, and they could be so time-consuming, that no other task gets through. Of course, one solution would be to add more workers (via Supervisord), but that would only solve the problem temporarily. This is where queues come in.

The first thing to do is to pass a queue name in the task definition, like this :

@gearman_task(name="task_foo", queue="foo")
def function_foo(some_arg):
  pass

@gearman_task(name="task_bar", queue="bar")
def function_bar(some_arg):
  pass

@gearman_task(name="task_babar", queue="bar")
def function_babar(some_arg):
  pass

You may then start a worker that serves only the tasks bound to a specific queue :

python manage.py gearman_worker -a <django_app_name> -n <worker_name> -q bar

Be aware of the fact that if you don't specify the queue name, the worker will load all tasks.
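
The selection rule can be modelled in a few lines. This is a sketch of the behaviour described above, not the library's code: the TASKS list mirrors the three example tasks, and tasks_for_queue is a hypothetical name.

```python
# (task name, queue) pairs matching the example task definitions.
TASKS = [
    ('task_foo', 'foo'),
    ('task_bar', 'bar'),
    ('task_babar', 'bar'),
]


def tasks_for_queue(tasks, queue=None):
    """Return the names of the tasks a worker would serve."""
    if queue is None:
        # No queue given: the worker loads every task.
        return [name for name, _ in tasks]
    return [name for name, task_queue in tasks if task_queue == queue]


print(tasks_for_queue(TASKS, 'bar'))  # ['task_bar', 'task_babar']
print(tasks_for_queue(TASKS))         # ['task_foo', 'task_bar', 'task_babar']
```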

Start workers with Supervisord

Supervisor (http://supervisord.org/) is a babysitter for processes. It allows you to launch, restart and monitor running processes, which in our case are the workers. To do so, create one config file per worker and set the number of processes you want with the 'numprocs' parameter :

worker_foo.conf :

[program:worker_foo]
command         = /path-to-your-virtualenv/bin/python /path-to-your-project/manage.py gearman_worker -a <django_app_name> -n %(program_name)s
process_name    = %(program_name)s_%(process_num)02d
numprocs        = 1
autostart       = true
autorestart     = true
user            = myapp
directory       = /home/myapp/
environment     = HOME='/home/myapp',USER='myapp',LOGNAME='myapp',

worker_bar.conf :

[program:worker_bar]
command         = /path-to-your-virtualenv/bin/python /path-to-your-project/manage.py gearman_worker -a <django_app_name> -n %(program_name)s -q bar
process_name    = %(program_name)s_%(process_num)02d
numprocs        = 2
autostart       = true
autorestart     = true
user            = myapp
directory       = /home/myapp/
environment     = HOME='/home/myapp',USER='myapp',LOGNAME='myapp',

You can also create a groups.conf file with this content :

[group:foo]
programs=worker_foo, worker_foo2

[group:bar]
programs=worker_bar, worker_bar2

This creates process groups and allows you to reload all workers belonging to a group at once when you deploy new code.

Once your config files are created, run /etc/init.d/supervisord start to start Supervisord. If you modify the config, run supervisorctl reload, or :

supervisorctl reread
supervisorctl update
supervisorctl restart foo:*
supervisorctl restart bar:*

Execute code when workers die

Workers catch the SIGTERM and SIGINT signals and terminate themselves by calling sys.exit(0) in a callback function. At that point in the code you can add your own function(s), which will be executed before the sys.exit(0). See django_gearman_jbox\management\commands\gearman_worker.py, line 116.

Note that this will impact all workers as it resides in the gearman_worker.py script which is global for all workers.
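
The general pattern looks roughly like the sketch below. It is not a copy of gearman_worker.py: the names CLEANUP_CALLBACKS, on_shutdown, handle_termination and install_handlers are hypothetical and only illustrate running your own functions before the exit.

```python
import signal
import sys

CLEANUP_CALLBACKS = []  # hypothetical: functions to run before exiting


def on_shutdown(func):
    """Register a function to be called when the worker is terminated."""
    CLEANUP_CALLBACKS.append(func)
    return func


def handle_termination(signum, frame):
    """Signal callback: run the registered functions, then exit cleanly."""
    for callback in CLEANUP_CALLBACKS:
        callback()
    sys.exit(0)


def install_handlers():
    """Route SIGTERM and SIGINT to the same callback."""
    signal.signal(signal.SIGTERM, handle_termination)
    signal.signal(signal.SIGINT, handle_termination)


@on_shutdown
def release_resources():
    print('releasing resources before exit')


if __name__ == '__main__':
    install_handlers()
```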

Clients

To make your workers work, you need a client app passing data to them. Create an instance of the django_gearman_jbox.GearmanClient class and call submit_job on it :

from django_gearman_jbox import GearmanClient

sentence = "The quick brown fox jumps over the lazy dog."

client = GearmanClient()
res = client.submit_job("foo", kwargs={"sentence": sentence})
print("Result: '%s'" % res)

Dispatching a background event without waiting for the result is easy as well :

client.submit_job("foo", background=True, kwargs={"sentence": sentence})

Gearman Server Infos

python manage.py gearman_server_info outputs the current status of the Gearman servers. If you installed the Prettytable dependency, the output looks like this :

$ python manage.py gearman_server_info
+---------------------+------------------------+
| Gearman Server Host | Gearman Server Version |
+---------------------+------------------------+
|    127.0.0.1:4730   |        OK 0.29         |
+---------------------+------------------------+

+---------------+---------------+--------------+-------------+
|   Task Name   | Total Workers | Running Jobs | Queued Jobs |
+---------------+---------------+--------------+-------------+
| data_unlock   |       1       |      0       |      0      |
| data_import   |       1       |      1       |      0      |
| cache_cleanup |       1       |      0       |      0      |
+---------------+---------------+--------------+-------------+

+-----------+------------------+-----------+-----------------+
| Worker IP | Registered Tasks | Client ID | File Descriptor |
+-----------+------------------+-----------+-----------------+
| 127.0.0.1 |   data_unlock    |     -     |        35       |
| 127.0.0.1 |   data_import    |     -     |        36       |
| 127.0.0.1 |  cache_cleanup   |     -     |        37       |
+-----------+------------------+-----------+-----------------+

If you have a lot of workers, you can filter the output by passing a command argument (case-sensitive):

$ python manage.py gearman_server_info cleanup
+---------------------+------------------------+--------------------+
| Gearman Server Host | Gearman Server Version | Ping Response Time |
+---------------------+------------------------+--------------------+
|    127.0.0.1:4730   |        OK 1.1.3        | 0.0006871223449707 |
+---------------------+------------------------+--------------------+

+---------------+---------------+--------------+-------------+
|   Task Name   | Total Workers | Running Jobs | Queued Jobs |
+---------------+---------------+--------------+-------------+
| cache_cleanup |       1       |      0       |      0      |
+---------------+---------------+--------------+-------------+

+-----------+------------------+-----------+-----------------+
| Worker IP | Registered Tasks | Client ID | File Descriptor |
+-----------+------------------+-----------+-----------------+
| 127.0.0.1 |  cache_cleanup   |     -     |        37       |
+-----------+------------------+-----------+-----------------+
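
The filtering shown above can be modelled as a case-sensitive substring match on the task name. That matching rule is an assumption inferred from the example (cleanup matching cache_cleanup), and filter_tasks is a hypothetical name.

```python
def filter_tasks(rows, task_filter=None):
    """Keep rows whose task name contains the filter (case-sensitive)."""
    if not task_filter:
        return list(rows)
    return [row for row in rows if task_filter in row[0]]


# (task name, total workers, running jobs, queued jobs)
ROWS = [
    ('data_unlock', 1, 0, 0),
    ('data_import', 1, 1, 0),
    ('cache_cleanup', 1, 0, 0),
]

print(filter_tasks(ROWS, 'cleanup'))  # [('cache_cleanup', 1, 0, 0)]
print(filter_tasks(ROWS, 'Cleanup'))  # []
```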

Licensing

This software is licensed under the Mozilla Tri-License:

***** BEGIN LICENSE BLOCK *****
Version: MPL 1.1/GPL 2.0/LGPL 2.1

The contents of this file are subject to the Mozilla Public License Version
1.1 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.mozilla.org/MPL/

Software distributed under the License is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
for the specific language governing rights and limitations under the
License.

The Original Code is django-gearman.

The Initial Developer of the Original Code is Mozilla.
Portions created by the Initial Developer are Copyright (C) 2010
the Initial Developer. All Rights Reserved.

Contributor(s):
  Frederic Wenzel <fwenzel@mozilla.com>
  Jeff Balogh <me@jeffbalogh.org>
  Jonas <jvp@jonasundderwolf.de>
  Jozef Ševčík <sevcik@codescale.net>
  Nicolas Rodriguez <nrodriguez@jbox-web.com>

Alternatively, the contents of this file may be used under the terms of
either the GNU General Public License Version 2 or later (the "GPL"), or
the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
in which case the provisions of the GPL or the LGPL are applicable instead
of those above. If you wish to allow use of your version of this file only
under the terms of either the GPL or the LGPL, and not to allow others to
use your version of this file under the terms of the MPL, indicate your
decision by deleting the provisions above and replace them with the notice
and other provisions required by the GPL or the LGPL. If you do not delete
the provisions above, a recipient may use your version of this file under
the terms of any one of the MPL, the GPL or the LGPL.

***** END LICENSE BLOCK *****