Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

csv - Change frequency of x-axis tick label of datetime data in python bar chart using matplotlib

I have a script that takes multiple .csv files and outputs multiple bar plots. The data are daily rainfall totals, so the x-axis is the date in datetime format %d %m %Y. As is, the code tries to label all 365 days and the x-axis gets clogged. What code can I use to include only one label per month, in a format such as "Jan 01"?

import pandas as pd
import time
import os
import matplotlib.pyplot as plt

files = ['w.pod.csv',
't.pod.csv',
'r.pod.csv',
'n.pod.csv',
'm.pod.csv',
'k.pod.csv',
'j.pod.csv',
'h.pod.csv',
'g.pod.csv',
'c.pod.csv',
'b.pod.csv']

for f in files:
    fn = f.split('.')[0]
    dat = pd.read_csv(f)
    df0 = dat.loc[:, ['TimeStamp', 'RF']]
    # Change time format
    df0["time"] = pd.to_datetime(df0["TimeStamp"])
    df0["day"] = df0['time'].map(lambda x: x.day)
    df0["month"] = df0['time'].map(lambda x: x.month)
    df0["year"] = df0['time'].map(lambda x: x.year)
    df0.to_csv('{}_1.csv'.format(fn), na_rep="0")  # write to csv

    # Combine for daily rainfall
    df1 = pd.read_csv('{}_1.csv'.format(fn), encoding='latin-1',
              usecols=['day', 'month', 'year', 'RF', 'TimeStamp'])
    df2 = df1.groupby(['day', 'month', 'year'], as_index=False).sum()
    df2.to_csv('{}_2.csv'.format(fn), na_rep="0", header=None)  # write to csv

    # parse date (pd.datetime is unavailable in recent pandas, so use pd.to_datetime)
    df3 = pd.read_csv('{}_2.csv'.format(fn), header=None, index_col='datetime',
             parse_dates={'datetime': [1,2,3]},
             date_parser=lambda x: pd.to_datetime(x, format='%d %m %Y'))

    def dt_parse(date_string):
        dt = pd.to_datetime(date_string, format='%d %m %Y')
        return dt

    # sort datetime (DataFrame.sort() was removed; sort on the datetime index)
    df4 = df3.sort_index()
    final = df4.reset_index()

    # rename columns
    final.columns = ['date', 'bleh', 'rf']

    final[['date','rf']].plot(kind='bar')
    plt.suptitle('{} Rainfall 2015-2016'.format(fn), fontsize=20)
    plt.xlabel('Date', fontsize=18)
    plt.ylabel('Rain / mm', fontsize=16)
    plt.savefig('{}.png'.format(fn))

This is an extension of my previous question: Automate making multiple plots in python using several .csv files


1 Reply


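Pandas draws bar plots on a categorical x-axis (bar i sits at integer position i), so matplotlib's date locators cannot be applied directly. It is not straightforward, but this works: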

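import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# sample df with dates of one year; rf values are random integers
np.random.seed(100)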
N = 365
start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=N)

final = pd.DataFrame({'date': rng, 'rf': np.random.randint(50, size=N)})  
print (final.head())
        date  rf
0 2015-02-24   8
1 2015-02-25  24
2 2015-02-26   3
3 2015-02-27  39
4 2015-02-28  23

fn = 'suptitle'
# rot - rotation of the x-axis labels, in degrees
ax = final.plot(x='date', y='rf', kind='bar', rot=45)
plt.suptitle('{} Rainfall 2015-2016'.format(fn), fontsize=20)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Rain / mm', fontsize=16)
# set custom format of dates (use '%b %d' instead for labels like "Jan 01")
ticklabels = final.date.dt.strftime('%Y-%m-%d')
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))

# show only every 30th label; hide all the others
spacing = 30
visible = ax.xaxis.get_ticklabels()[::spacing]
for label in ax.xaxis.get_ticklabels():
    if label not in visible:
        label.set_visible(False)

plt.show()
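
A fixed spacing of 30 drifts away from the month boundaries over a year. One possible variation, sketched below, puts ticks only on rows whose date is the first of a month and labels them in the "Jan 01" style from the question. This is not part of the original answer: `month_start_ticks` is a hypothetical helper, and the `Agg` backend is an assumption made only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def month_start_ticks(dates):
    """Hypothetical helper: bar positions and 'Jan 01'-style labels
    for every date that falls on the first day of a month."""
    dates = pd.Series(pd.to_datetime(dates)).reset_index(drop=True)
    is_first = dates.dt.day == 1
    positions = np.flatnonzero(is_first.to_numpy()).tolist()
    labels = dates[is_first].dt.strftime('%b %d').tolist()
    return positions, labels


# same kind of sample data as above: one year of daily values
np.random.seed(100)
rng = pd.date_range('2015-02-24', periods=365)
final = pd.DataFrame({'date': rng, 'rf': np.random.randint(50, size=365)})

ax = final.plot(x='date', y='rf', kind='bar')
positions, labels = month_start_ticks(final['date'])

# pandas draws bar i at x = i, so row positions can be used as tick locations
ax.set_xticks(positions)
ax.set_xticklabels(labels, rotation=45)

ax.figure.canvas.draw()  # render; in practice call plt.show() or plt.savefig(...)
```

This assumes one row per day in date order, as in the sample frame. If days are missing, the first of a month may not be present and that month gets no label.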
