0 votes
234 views
in Technique by (71.8m points)

python - DynamoDB not receiving the entire SQS message body

I am pulling data from an API in batches and sending it to an SQS queue. Where I am having an issue is processing the messages in order to send the data to DynamoDB. There are supposed to be 147,689 records in the dataset. However, when the code runs, sometimes fewer than 147,689 records are put into DynamoDB, sometimes more, and sometimes exactly 147,689. It does not consistently put 147,689 records into the database.

I have tried everything I can think of to fix this issue, including using a FIFO queue instead of a standard queue, increasing the visibility timeout, increasing the delivery timeout, and using uuid.uuid1() instead of uuid.uuid4(). I am looping through the "Records" list, so I am not sure why the entire batch is not being processed. Below is my latest code to process the message and send the data to DynamoDB:

import boto3
import json
import uuid
import time

dynamo = boto3.client("dynamodb", "us-east-1")

def lambda_handler(event, context):
    for item in json.loads(event["Records"][0]["body"]):
        item["id"] = uuid.uuid1().bytes
        for key, value in item.items():
            if key == "id":
                item[key] = {"B": bytes(value)}
            elif key == "year":
                item[key] = {"N": str(value)}
            elif key == "amt_harvested":
                item[key] = {"N": str(value)}
            elif key == "consumed":
                item[key] = {"N": str(value)}
            else:
                item[key] = {"S": str(value)}

            time.sleep(0.001)

        dynamo.put_item(TableName="TableOne", Item=dict(item))
question from: https://stackoverflow.com/questions/65646101/dynamodb-not-receiving-the-entire-sqs-message-body


1 Reply

0 votes
by (71.8m points)

The Lambda event source mapping for SQS polls for messages and invokes the Lambda function with a batch of records, sized by the configured batch size (10 by default). Processing the batch should be done by looping over the entire event["Records"] array; the handler above only reads event["Records"][0]["body"], so any additional messages delivered in the same batch are never written, while a failed invocation redelivers the whole batch, which is then re-inserted under fresh UUIDs. Together these explain the inconsistent record counts.
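For example, here is a minimal sketch of the handler looping over every record in the batch; it keeps the original table name and attribute mapping, and the put_record helper name is my own:

import json
import uuid

import boto3

dynamo = boto3.client("dynamodb", "us-east-1")

def lambda_handler(event, context):
    # Iterate over every SQS message in the batch, not just the first one.
    for record in event["Records"]:
        for item in json.loads(record["body"]):
            put_record(item)

def put_record(item):
    # Hypothetical helper mirroring the original attribute mapping.
    dynamo_item = {"id": {"B": uuid.uuid1().bytes}}
    for key, value in item.items():
        if key in ("year", "amt_harvested", "consumed"):
            dynamo_item[key] = {"N": str(value)}
        else:
            dynamo_item[key] = {"S": str(value)}
    dynamo.put_item(TableName="TableOne", Item=dynamo_item)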

Key factors to consider when setting the batch size:

  • If Lambda processing fails, the entire batch is resent and retried by AWS. If the function cannot tolerate processing duplicate records, the batch size should be set to 1.
  • If processing a single record takes 20 ms, AWS still bills the 100 ms minimum per invocation, so a batch size of 5 cuts the cost roughly 5x: one 100 ms invocation covers five records instead of five separate 100 ms charges. A sketch of adjusting the batch size follows this list.
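For reference, the batch size lives on the event source mapping and can be changed without touching the function code; a sketch via boto3, where the mapping UUID below is a placeholder:

import boto3

lambda_client = boto3.client("lambda", "us-east-1")

# Lower the batch size on an existing SQS-to-Lambda event source mapping.
lambda_client.update_event_source_mapping(
    UUID="00000000-0000-0000-0000-000000000000",  # placeholder mapping UUID
    BatchSize=5,
)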

It is always recommended to:

  • Set a higher batch size and code the Lambda to be idempotent (a sketch of one idempotent approach follows this list).
  • Code the Lambda to process all records, irrespective of the batch size.
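One way to make the write idempotent is sketched below: it derives a deterministic key from the record's contents instead of generating a fresh UUID per attempt (the key derivation and helper name are assumptions, not part of the original design), and uses a conditional put so a redelivered batch does not create duplicates:

import hashlib
import json

import boto3

dynamo = boto3.client("dynamodb", "us-east-1")

def put_idempotent(item):
    # Assumed approach: hash the record contents so a retried batch
    # writes the same key instead of a brand-new UUID each attempt.
    digest = hashlib.sha256(
        json.dumps(item, sort_keys=True).encode("utf-8")
    ).digest()
    dynamo_item = {"id": {"B": digest}}
    for key, value in item.items():
        if key in ("year", "amt_harvested", "consumed"):
            dynamo_item[key] = {"N": str(value)}
        else:
            dynamo_item[key] = {"S": str(value)}
    try:
        dynamo.put_item(
            TableName="TableOne",
            Item=dynamo_item,
            ConditionExpression="attribute_not_exists(#id)",
            ExpressionAttributeNames={"#id": "id"},
        )
    except dynamo.exceptions.ConditionalCheckFailedException:
        # Duplicate delivery: this item was already written on a prior attempt.
        pass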
