python - Should Django model object instances be passed to Celery?

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

# models.py
from django.db import models

class Person(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    text_blob = models.CharField(max_length=50000)

# tasks.py
from celery import shared_task

@shared_task
def my_task(person):
    # Example operation: modifies person.
    # It needs only a few of person's attributes,
    # not the entire bulky record.
    person.first_name = person.first_name.title()
    person.last_name = person.last_name.title()
    person.save()

In my application somewhere I have something like:

from models import Person
from tasks import my_task
import celery
g = celery.group([my_task.s(p) for p in Person.objects.all()])
g.apply_async()

Celery pickles p to send it to the worker, right?
If the workers are running on multiple machines, would the entire person object (along with the bulky text_blob, which is mostly not needed) be transmitted over the network? Is there a way to avoid that?
How can I efficiently and evenly distribute the Person records to workers running on multiple machines?

Could this be a better idea? Wouldn't it overwhelm the database if Person has a few million records?

# tasks.py
from celery import shared_task
from models import Person

@shared_task
def my_task(person_pk):
    # Example operation that does not need text_blob.
    # The worker re-fetches the record by primary key.
    person = Person.objects.get(pk=person_pk)
    person.first_name = person.first_name.title()
    person.last_name = person.last_name.title()
    person.save()


# In my application somewhere
from models import Person
from tasks import my_task
import celery
g = celery.group([my_task.s(p.pk) for p in Person.objects.all()])
g.apply_async()

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:58:44+0000

I believe it is better and safer to pass the PK rather than the whole model object. Since a PK is usually just a number, serialization is also much simpler. Most importantly, you can use a safer serializer (JSON/YAML instead of pickle) and have peace of mind that you won't run into problems serializing your model.
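
To make the size difference concrete, here is a rough stdlib-only sketch. The `person_row` dict is a hypothetical stand-in for the field values of one `Person` record (a real Django instance pickles somewhat differently, but it still carries every loaded field), and the 50,000-character blob is made-up data. The `{"args": [...]}` wrapper is only illustrative, not Celery's actual message format.

```python
import json
import pickle

# Hypothetical stand-in for one Person record's field values.
person_row = {
    "pk": 1,
    "first_name": "ada",
    "last_name": "lovelace",
    "text_blob": "x" * 50_000,  # made-up bulky field
}

# Passing the whole record: every field, including the blob, is serialized.
object_payload = pickle.dumps(person_row)

# Passing only the primary key: a handful of bytes, and JSON is sufficient.
pk_payload = json.dumps({"args": [person_row["pk"]]}).encode("utf-8")

print("whole record:", len(object_payload), "bytes")
print("pk only:     ", len(pk_payload), "bytes")
```

The whole-record payload is dominated by `text_blob`, which the task never reads, while the pk payload stays tiny regardless of how large the record is.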

As this article says:

Since Celery is a distributed system, you can't know in which process, or even on what machine, the task will run. So you shouldn't pass Django model objects as arguments to tasks; it's almost always better to re-fetch the object from the database instead, as there are possible race conditions involved.
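
The race condition mentioned in the quote can be sketched without Django or Celery. Everything below is a simulation under stated assumptions: `database` is a hypothetical in-memory dict standing in for the `Person` table, and the two functions only mimic what a task would do with a serialized object versus a primary key.

```python
import copy

# Hypothetical in-memory "table": pk -> row. Not a real database.
INITIAL_ROW = {"first_name": "ada", "last_name": "lovelace", "text_blob": "v1"}
database = {1: copy.deepcopy(INITIAL_ROW)}


def task_with_snapshot(snapshot, pk):
    """Mimics receiving the object itself: works on a copy made at enqueue time."""
    snapshot["first_name"] = snapshot["first_name"].title()
    snapshot["last_name"] = snapshot["last_name"].title()
    database[pk] = snapshot  # full save: writes back every column


def task_with_pk(pk):
    """Mimics receiving the pk: reads the current row when the task runs."""
    row = database[pk]
    row["first_name"] = row["first_name"].title()
    row["last_name"] = row["last_name"].title()


# Enqueue time: serializing the object freezes a snapshot of the row.
snapshot = copy.deepcopy(database[1])

# Before a worker picks the task up, another process edits the row.
database[1]["text_blob"] = "v2"

task_with_snapshot(snapshot, 1)
stale_result = database[1]["text_blob"]  # the stale snapshot overwrote the edit

# Same sequence again, but this time the task only receives the pk.
database[1] = copy.deepcopy(INITIAL_ROW)
database[1]["text_blob"] = "v2"
task_with_pk(1)
fresh_result = database[1]["text_blob"]  # the concurrent edit survives

print("snapshot task left text_blob as:", stale_result)
print("pk task left text_blob as:     ", fresh_result)
```

In a real Django task, re-fetching shortens the window for this problem but does not close it entirely. Assuming the task only touches the name fields, `Person.objects.only("first_name", "last_name").get(pk=person_pk)` followed by `person.save(update_fields=["first_name", "last_name"])` would avoid both loading and rewriting `text_blob`.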
