Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Create a mixed data generator (images, CSV) in Keras

I am building a model with multiple inputs, as shown on pyimagesearch. However, I can't load all the images into RAM, so I am trying to create a generator that uses flow_from_directory and reads the extra attributes for each image being processed from a CSV file.

Question: How do I get the attributes from the CSV to correspond with the images in each batch from the image generator?

def get_combined_generator(images_dir, csv_dir, split, *args):
    """
    Creates train/val generators on images and csv data.

    Arguments:

    images_dir : string
        Path to a directory with subdirectories for each class.

    csv_dir : string
        Path to a directory containing train/val csv files with extra attributes.

    split : string
        Current split being used (train, val or test)
    """
    img_width, img_height, batch_size = args

    datagen = ImageDataGenerator(
        rescale=1. / 255)

    generator = datagen.flow_from_directory(
        f'{images_dir}/{split}',
        target_size=(img_width, img_height),
        batch_size=batch_size,
        shuffle=True,
        class_mode='categorical')

    df = pd.read_csv(f'{csv_dir}/{split}.csv', index_col='image')

    def my_generator(image_gen, data):
        while True:
            i = image_gen.batch_index
            batch = image_gen.batch_size
            row = data[i * batch:(i + 1) * batch]
            images, labels = image_gen.next()
            yield [images, row], labels

    csv_generator = my_generator(generator, df)

    return csv_generator

1 Reply

I would suggest creating a custom generator given this relatively specific case. Something like the following (modified from a similar answer here) should suffice:

import os
import random

import cv2
import numpy as np
import pandas as pd

def generator(image_dir, csv_dir, batch_size, num_classes):
    # preprocess_image() and preprocess_feats() are your own functions; they are not defined here.
    i = 0
    image_file_list = os.listdir(image_dir)
    while True:
        batch_x = {'images': list(), 'other_feats': list()}  # use a dict for multiple inputs
        batch_y = list()
        for b in range(batch_size):
            if i == len(image_file_list):
                # one full pass is done: start over in a new random order
                i = 0
                random.shuffle(image_file_list)
            image_name = image_file_list[i]
            image_file_path = os.path.join(image_dir, image_name)
            csv_file_path = os.path.join(csv_dir,
                                         image_name.replace('.png', '.csv'))
            i += 1
            image = preprocess_image(cv2.imread(image_file_path))
            csv_file = pd.read_csv(csv_file_path)
            other_feat = preprocess_feats(csv_file)
            batch_x['images'].append(image)
            batch_x['other_feats'].append(other_feat)
            # assumption: each csv has a 'class' column with an integer label in its first row
            batch_y.append(int(csv_file['class'].iloc[0]))

        batch_x['images'] = np.array(batch_x['images'])  # convert each list to array
        batch_x['other_feats'] = np.array(batch_x['other_feats'])
        batch_y = np.eye(num_classes)[batch_y]  # one-hot encode the integer labels
        yield batch_x, batch_y

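Then you can train your model by passing the generator to Keras's fit_generator() function. In recent TensorFlow releases fit_generator() is deprecated, and model.fit() accepts a generator directly.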
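
Because the generator above loops forever, the training call generally needs an explicit steps_per_epoch so that Keras knows where one epoch ends. A minimal sketch of that arithmetic, assuming you know how many images you have; the helper name steps_per_epoch_for and the sample counts are made up for illustration:

```python
import math


def steps_per_epoch_for(num_samples, batch_size):
    """Number of batches needed to see every sample about once per epoch."""
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    # Round up so the samples left over after the last full batch are still counted.
    return math.ceil(num_samples / batch_size)


# Hypothetical numbers: 1000 training images, batches of 32.
steps = steps_per_epoch_for(1000, 32)
print(steps)  # 32
```

The result would then be passed as steps_per_epoch=steps alongside the generator in the training call.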

Obviously, this assumes you have csv files with the same names as your image files, and that you have some custom preprocessing functions for images and csv files.
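
If the extra attributes instead live in a single CSV per split with an image column, as in the question, the same idea should work with rows looked up by filename rather than by batch position. The sketch below uses only the standard library and is an illustration under assumptions, not a tested Keras integration: the column names image and class, the load_image callable, and the function name attribute_batches are all hypothetical, and the CSV is assumed to list every image you want to train on.

```python
import csv
import random


def attribute_batches(csv_path, load_image, batch_size, shuffle=True, seed=None):
    """Yield (images, attributes, labels) batches that stay aligned by filename.

    Assumes the CSV has an 'image' column (filename), a 'class' column (label)
    and any number of numeric attribute columns. load_image is your own
    function that turns a filename into a preprocessed image.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    rng = random.Random(seed)
    while True:
        if shuffle:
            rng.shuffle(rows)  # reshuffle the rows at the start of every pass
        for start in range(0, len(rows), batch_size):
            chunk = rows[start:start + batch_size]
            images = [load_image(r["image"]) for r in chunk]
            # Every column other than 'image' and 'class' is treated as a feature.
            attrs = [[float(v) for k, v in r.items() if k not in ("image", "class")]
                     for r in chunk]
            labels = [r["class"] for r in chunk]
            yield images, attrs, labels
```

Since the image, its attributes and its label all come from the same CSV row, shuffling cannot pull them out of step. The last batch of each pass may be smaller than batch_size, and the lists would still need converting with np.array() before being fed to a model.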

