I am trying to create a dataframe of Twitter data. Using the Twitter API, I have a list of tweet objects (tweets) and want to populate a dataframe with various fields from those objects, plus the results of some other functions applied to the text. My current method uses a separate list comprehension for each column, so it iterates over all the tweets once per column:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[tweet.all_text for tweet in tweets], columns=["tweets"])
df.loc[:, 'id'] = np.array([tweet.id for tweet in tweets])
df.loc[:, 'len_tweet'] = np.array([len(tweet.all_text) for tweet in tweets])
df.loc[:, 'date_created'] = np.array([tweet.created_at_datetime for tweet in tweets])
df.loc[:, 'author'] = np.array([tweet.name for tweet in tweets])
df.loc[:, 'clean_tweet'] = np.array([self.clean_tweet_eng(tweet) for tweet in df.tweets])
df.loc[:, 'clean_stopwords_tweet'] = np.array([self.stopwords_clean(tweet) for tweet in df.tweets])
etc...
As I scale up the number of tweets, this becomes very slow.
I have looked at two other methods: building the dataframe by iteratively adding elements to a dictionary, and building it one row at a time with iterrows, so that the list of tweets is only traversed once. Both seem to be slower.
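To make the dictionary variant concrete, here is a minimal sketch of a single pass that collects every column into a dict of lists and then calls the DataFrame constructor once at the end. FakeTweet is a hypothetical stand-in for the real Twitter API objects, and the text-cleaning columns are omitted:

```python
import pandas as pd

# Hypothetical stand-in for a tweet object returned by the Twitter API.
class FakeTweet:
    def __init__(self, id_, text, created, name):
        self.id = id_
        self.all_text = text
        self.created_at_datetime = created
        self.name = name

tweets = [
    FakeTweet(1, "hello world", "2021-02-04", "alice"),
    FakeTweet(2, "pandas is fast", "2021-02-05", "bob"),
]

# One pass over the tweets: append to plain Python lists (cheap),
# then hand the whole dict to the DataFrame constructor at once.
records = {"tweets": [], "id": [], "len_tweet": [], "date_created": [], "author": []}
for tweet in tweets:
    records["tweets"].append(tweet.all_text)
    records["id"].append(tweet.id)
    records["len_tweet"].append(len(tweet.all_text))
    records["date_created"].append(tweet.created_at_datetime)
    records["author"].append(tweet.name)

df = pd.DataFrame(records)
```

This avoids both the per-column passes of the list-comprehension version and the per-row DataFrame growth of the iterrows version, but whether it is actually faster on real data would need profiling.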
What is the fastest way to achieve this?
question from:
https://stackoverflow.com/questions/66049573/what-is-the-quickest-way-to-add-rows-to-a-pandas-dataframe-built-from-a-list