Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How can I build a dataset from the elements of matrices in a DataFrame?

I have a dataset of 3 parameters 'A', 'B', 'C' in a .TXT file. After printing them as 24x20 matrices, I need to collect the 1st elements of 'A', 'B', 'C' into one long row of a pandas DataFrame, followed by the 2nd elements of each, then the 3rd, and so on up to the 480th elements.

My data in the text file looks like this:

id_set: 000
     A: -2.46882615679
     B: -2.26408246559
     C: -325.004619528

I have already built a pandas DataFrame with 3 columns 'A', 'B', 'C' plus an index, and defined functions to print the 24x20 matrices the right way. A simple example with 2x2 matrices:

1st cycle:  A = [1,2,    B = [4,5,     C = [8,9,
                 3,4]         6,7]          10,11]
2nd cycle:  A = [0,8,    B = [1,9,     C = [10,1,
                 2,5]         4,8]          2,7]

These should be reshaped into this form:

          A(1,1),B(1,1),C(1,1),A(1,2),B(1,2),C(1,2),.....
Result=  [1,4,8,2,5,9,3,6,10,4,7,11] #1st cycle
         [0,1,10,8,9,1,2,4,2,5,8,7]  #2nd cycle

My script is as follows:

import numpy as np
import pandas as pd
import os

def normalize(value, min_value, max_value, min_norm, max_norm):
    new_value = ((max_norm - min_norm)*((value - min_value)/(max_value - min_value))) + min_norm
    return new_value

dft = pd.read_csv('D:mc25.TXT', header=None)
id_set = dft[dft.index % 4 == 0].astype('int').values
A = dft[dft.index % 4 == 1].values
B = dft[dft.index % 4 == 2].values
C = dft[dft.index % 4 == 3].values
data = {'A': A[:,0], 'B': B[:,0], 'C': C[:,0]}

df = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])  

#next iteration create all plots, change the number of cycles
cycles = int(len(df)/480)
print(cycles)
for cycle in range(0,10):             
    count =  '{:04}'.format(cycle)
    j = cycle * 480
    for i in df:
        # Create the output folder for this column if it does not exist yet
        os.makedirs(i, exist_ok=True)

        min_val = df[i].min()
        min_nor = -1
        max_val = df[i].max()
        max_nor = 1

        ordered_data = mkdf(df.iloc[j:j+480][i])
        csv = print_df(ordered_data)
        # Write a .csv file containing the matrix of this parameter, named by cycle number
        csv.to_csv(f'{i}/{i}{count}.csv', header=None, index=None)            
        # Normalize only the current column: C to [-40, +150], A and B to [-1, +1]
        # (min_nor and max_nor default to -1 and 1 above)
        if i == 'C':
            min_nor = -40
            max_nor = 150
        new_value = normalize(df[i].iloc[j:j+480], min_val, max_val, min_nor, max_nor)
        norm_df = print_df(mkdf(new_value))
        # One normalized file per column and cycle, so A and B no longer overwrite each other
        norm_df.to_csv(f'{i}/norm{i}{count}.csv', header=None, index=None)
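
As a quick sanity check of the normalize function above, here is a minimal sketch that rescales a few made-up sample values (they are not taken from my data file) to the two target ranges used in the script:

```python
import numpy as np

def normalize(value, min_value, max_value, min_norm, max_norm):
    # Linear min-max rescaling from [min_value, max_value] to [min_norm, max_norm]
    return (max_norm - min_norm) * ((value - min_value) / (max_value - min_value)) + min_norm

# Made-up sample values, for illustration only
values = np.array([0.0, 5.0, 10.0])

# Range used for A and B
scaled_ab = normalize(values, values.min(), values.max(), -1, 1)
# Range used for C
scaled_c = normalize(values, values.min(), values.max(), -40, 150)

print(scaled_ab)
print(scaled_c)
```

The minimum maps to the lower bound, the maximum to the upper bound, and everything in between is scaled linearly.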

Note 2: I have provided a dataset in a text file for 3 cycles: Text dataset


1 Reply


I am not sure I understood your question fully, but here is a solution:

Convert your DataFrame to a 2D NumPy array using to_numpy() (as_matrix() in pandas versions before 1.0), then use ravel() to get a vector of size 480 * 3. Loop over your cycles and use np.vstack to stack the rows on top of each other in your result. Here is the code with your example data:

import numpy as np
import pandas as pd

A = [[1,2,3,4], [10,20,30,40]]
B = [[4,5,6,7], [40,50,60,70]]
C = [[8,9,10,11], [80,90,100,110]]

cycles = 2

for cycle in range(cycles):
    data = {'A': A[cycle], 'B': B[cycle], 'C': C[cycle]}
    df = pd.DataFrame(data)
    # Flatten row by row so A, B, C of each element sit next to each other
    # (as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
    D = df.to_numpy().ravel()
    if cycle == 0:
        Results = np.array(D)
    else:
        # Append this cycle as a new row
        Results = np.vstack((Results, D))
# Output: Results = array([[  1,   4,   8,   2,   5,   9,   3,   6,  10,   4,   7,  11], [ 10,  40,  80,  20,  50,  90,  30,  60, 100,  40,  70, 110]])
np.savetxt("Results.csv", Results, delimiter=",")
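
If your DataFrame already holds all cycles stacked one after another (as the df in your script seems to), the loop above can probably be replaced by a single reshape. This is only a sketch under that assumption: the names per_cycle, results and results_df are mine, and the data is the 2x2 example from the question with 4 rows per cycle instead of 480.

```python
import numpy as np
import pandas as pd

# Example data from the question: two cycles of 2x2 matrices, each flattened
# row by row and stacked one cycle after the other.
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 0, 8, 2, 5],
    'B': [4, 5, 6, 7, 1, 9, 4, 8],
    'C': [8, 9, 10, 11, 10, 1, 2, 7],
})

per_cycle = 4                      # assumed to be 480 for the 24x20 matrices
cycles = len(df) // per_cycle

# to_numpy() has shape (cycles * per_cycle, 3). A row-major reshape keeps
# A, B, C of each element adjacent and puts one cycle on each row.
results = df[['A', 'B', 'C']].to_numpy().reshape(cycles, per_cycle * 3)

# Wrap the result in a DataFrame with one row per cycle
results_df = pd.DataFrame(results)
print(results_df)
```

This relies on the number of rows being an exact multiple of per_cycle; otherwise reshape raises an error.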

Is this what you wanted?

