Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
62 views
in Technique[技术] by (71.8m points)

python - include extra row before duplicated rows

I have two data frames SF and OF.

SF: 

   PartNumber  ParentPartNumber   Webname       Brand        Value_Size     Full Description               ImagePath                      Short Description     Weight       RetailPriceEUR    Isfeatured       
     2.5            2                Sidi         Si              S     Honeycomb elastic          https://link1,https://link2           Honey               2.3             331           yes
     2.6            2                Sidi         Si              M     Honeycomb elastic          https://link1,https://link2            Honey              2.3             331           yes
     2.7            2                Sidi         Si              L     Honeycomb elastic          https://link1,https://link2           Honey               2.3             331           yes
     3.2            3                Shoei        Sho            S      E.Q.R.S.                   https://link3                          ERQS               1.5             331           yes
     3.3            3                Shoei        Sho            M      E.Q.R.S.                   https://link3                          ERQS               1.5             331           yes
     2.9            2                Sidi         Si              XL   Honeycomb elastic           https://link1,https://link2            Honey              2.3             331           yes

OF:

SKU      Published         Name     Parent        Size     Full Description           ImagePath                          ShortDescription       Weight           RetailPriceEUR            Isfeatured           height
 4          1              Bec
 8          1              Lin
  

What I want to do is to add an extra row before each duplicated row present in SF and append it to the OF data frame. For example, if there are duplicates in a parent like 2,2,3,3 the first row of 2 and the second row of 2 needs to be copied, and in addition to all rows, there have to be an extra one added before them with info as in the description. So the end result should look like

Result (SF rows appended in OF):

   SKU      Published         Name      Parent           Size             Full Description           ImagePath                          ShortDescription       Weight           RetailPriceEUR            Isfeatured           height
 4          1              Bec
 8          1              Lin
 2          1              Sidi                        S,M,L,XL          Honeycomb elastic         https://link1,https://link2                                                                            yes
 2.5        0              Honey        2               S                                                                                 Honey                2.3              331                        yes
 2.6        0              Honey        2               M                                                                                 Honey                2.3              331                        yes
 2.7        0              Honey        2               L                                                                                 Honey                2.3              331                        yes
 2.9        0              Honey        2               XL                                                                                Honey                2.3              331                        yes
 3 (extra)  1              Sho                         S,M                E.Q.R.S.                 https://link3                                                                                          yes
 3          0              ERQS         3               S                                                                                 EQRS                 1.5               33                        yes
 3          0              ERQS         3               M                                                                                 ERQS                 1.5               33                        yes
     

This is what I have done so far

# create a DataFrame keep SF all duplicated rows
cond = SF.duplicated(keep=False)
df = SF.loc[cond, ['PartNumber', 'ParentPartNumber', 'Webname' , 'Value_Size']].copy()
df['Published'] = 0

# rename columns, and sorted columns
df.columns = ['SKU', 'Parent', 'Name', 'Size', 'Published']
df = df[OF.columns].copy()

# drop duplicates(keep the first duplicate row), and assign 1 to Published column
dfn = df.drop_duplicates(keep='first').copy()


  # get rows with duplicated ParentPartNumber 
  SFs = SFs[SFs['ParentPartNumber'].duplicated(keep = False)]
  #group by size
gk =( SFs.groupby('ParentPartNumber'['Size'].agg(','.join).reindex(SFs['ParentPartNumber'].unique()).reset_indx())

# drop duplicates(keep the first duplicate row),
SFs = SFs.drop_duplicates(subset='ParentPartNumber', keep='first').copy()
dfn['SKU'] = SFs['ParentPartNumber'].values
dfn['Name'] = SFs['WebName'].values
dfn['Attribute 1 value(s)'] = gk['Size'].values
dfn['Published'] = 1

# append first duplicate row to df, and sort index to keep row order
dfn = dfn.append(df).sort_index()

# append the duplicates rows to OF
print(OF.append(dfn))
   

The issue is it works well with small sample size, however, for large files, it's not matching ParentPartNumber well and producing results where SKU = 3 should be below 3(extra) not above it

SKU      Published         Name      Parent           Size
 4          1              Bec
 8          1              Lin
 2          1              Sidi                        S,M,L,XL
 2.5        0              Honey        2               S
 2.6        0             Honey        2               M
 2.7        0              Honey        2               L
 2.9        0              Honey        2               XL
 *3          0              Sho         3               S*
 3 (extra)  1              ERQS                         S,M
 3          0              ERQS         3               M

EDIT:

I have a source file and have to perform some operations given below to obtain the dataframe shown above. Please note that some of the variables are missing from the sample file. The operations are:

Create new row in OF with values as follows 
OF Type = variable
OF SKU = SF ParentPartNumber
OF Name = SF WebName
OF YMM = ,0,0,,
OF Published = 1
OF Is featured = 0
OF Visibility in catalog = visible
OF Description = SF FullDescription
OF Tax status = taxable
OF In stock? = 1
OF Backorders allowed? = 0
OF Sold individually? = 0
OF Allow customer reviews? = 1
OF Images = SF ImagePath (when there is more than 1 link in SF replace | with , (comma))
OF Position = 0
OF Attribute = Size
OF Attribute 1 value(s) = all values from SF Size that belong to the same ParentPartNumber separated with comma
OF Attribute 1 visible = 0
OF Attribute 1 global = 0

then below that row copy all rows that belong to this ParentPartNumber as follows 

OF Type = variation 
OF SKU = SF PartNumber
OF Name = SF ShortDescription 
OF YMM = ,0,0,,
OF Published = 0
OF Is featured = 0
OF Visibility in catalog = visible
OF Tax status = taxable
OF Tax class = parent
OF In Stock = 1 
OF Backorders allowed? = 0
OF Sold individually? = 0
OF Weight (kg) = SF Weight
OF Allow customer reviews? = 0
OF Regular price = SF RetailPriceEUR
OF Parent = SF ParentPartNumber
question from:https://stackoverflow.com/questions/65546310/include-extra-row-before-duplicated-rows

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...