Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
463 views
in Technique[技术] by (71.8m points)

python - App Engine Bulk Loader Performance

I am using the App Engine Bulk loader (Python Runtime) to bulk upload entities to the data store. The data that i am uploading is stored in a proprietary format, so i have implemented by own connector (registerd it in bulkload_config.py) to convert it to the intermediate python dictionary.

import google.appengine.ext.bulkload import connector_interface
class MyCustomConnector(connector_interface.ConnectorInterface):
   ....
   #Overridden method
   def generate_import_record(self, filename, bulkload_state=None):
      ....
      yeild my_custom_dict

To convert this neutral python dictionary to a datastore Entity, i use a custom post import function that i have defined in my YAML.

def feature_post_import(input_dict, entity_instance, bulkload_state):
    ....
    return [all_entities_to_put]

Note: I am not using entity_instance, bulkload_state in my feature_post_import function. I am just creating new data store entities (based on my input_dict), and returning them.

Now, everything works great. However, the process of bulk loading data seems to take way too much time. For e.g. a GB (~ 1,000,000 entities) of data takes ~ 20 hours. How can I improve the performance of the bulk load process. Am i missing something?

Some of the parameters that i use with appcfg.py are (10 threads with a batch size of 10 entities per thread).

Linked a Google App Engine Python group post: http://groups.google.com/group/google-appengine-python/browse_thread/thread/4c8def071a86c840

Update: To test the performance of the Bulk Load process, I loaded entities of a 'Test' Kind. Even though this entity has a very simple FloatProperty, it still took me the same amount of time to bulk load those entities.

I am still going to try to vary the bulk loader parameters, rps_limit, bandwidth_limit and http_limit, to see if i can get any more throughput.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...