Future Person Localization in First-Person Videos (CVPR 2018)
This repository contains the code and data (note: no raw images are provided) for the paper "Future Person Localization in First-Person Videos" by Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani and Yoichi Sato.
Prediction examples
Requirements
We have confirmed that the code works correctly with the versions listed below.
If you wish to download via the terminal, consider using a custom script (a minimal sketch is given after the extraction command below).
Extract the downloaded tar.gz file in the repository root directory:
tar xvf fpl.tar.gz
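As an alternative to downloading and extracting by hand, the Python sketch below performs both steps. The URL is a hypothetical placeholder, not the actual distribution link; substitute the link you obtained.

```python
import tarfile
import urllib.request

# Hypothetical placeholder -- replace with the actual download link for fpl.tar.gz.
ARCHIVE_URL = "https://example.com/fpl.tar.gz"
ARCHIVE_PATH = "fpl.tar.gz"

# Download the archive into the repository root.
urllib.request.urlretrieve(ARCHIVE_URL, ARCHIVE_PATH)

# Extract it in place (equivalent to `tar xvf fpl.tar.gz`).
with tarfile.open(ARCHIVE_PATH, "r:gz") as tar:
    tar.extractall(".")
```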
Pseudo-video
Since we cannot release the raw images, we have prepared a sample pseudo-video (linked below).
The video shows the automatically extracted location histories and poses. The number shown in each bounding box corresponds to the person id in the processed data.
Background colors are the output of a dilated CNN pre-trained on the MIT Scene Parsing Benchmark.
Download link (pseudo-video)
Create dataset
Run the dataset generation script to preprocess the raw locations, poses, and ego-motions.
A single processed file will be generated in datasets/.
# Test (debug) data
python utils/create_dataset.py utils/id_test.txt --traj_length 20 --traj_skip 2 --nb_splits 5 --seed 1701 --traj_skip_test 5
# All data
python utils/create_dataset.py utils/id_list_20.txt --traj_length 20 --traj_skip 2 --nb_splits 5 --seed 1701 --traj_skip_test 5
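To sanity-check the output, a quick inspection script such as the one below can help. It assumes the processed file is a joblib dump placed under datasets/; the exact file name and the structure of its contents depend on the arguments above, so treat them as illustrative rather than guaranteed.

```python
import glob
import joblib

# Pick up whatever create_dataset.py wrote under datasets/.
# The ".joblib" suffix is an assumption; check the script's log for the real file name.
for path in glob.glob("datasets/*.joblib"):
    data = joblib.load(path)
    print(path, type(data))
    # If the dump is a dict, list its top-level keys and their sizes.
    if isinstance(data, dict):
        for key, value in data.items():
            size = len(value) if hasattr(value, "__len__") else value
            print("  {}: {}".format(key, size))
```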
Prepare training script
Modify the "in_data" arguments in scripts/5fold.json.
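If you prefer to patch the config programmatically rather than editing it by hand, a minimal sketch follows. It only assumes that scripts/5fold.json contains one or more "in_data" entries somewhere in its tree; the replacement path is a hypothetical example.

```python
import json

CONFIG_PATH = "scripts/5fold.json"
NEW_IN_DATA = "datasets/your_processed_file.joblib"  # hypothetical path

with open(CONFIG_PATH) as f:
    config = json.load(f)

# Rewrite every "in_data" entry, wherever it appears in the config tree.
def rewrite_in_data(node):
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "in_data":
                node[key] = NEW_IN_DATA
            else:
                rewrite_in_data(value)
    elif isinstance(node, list):
        for item in node:
            rewrite_in_data(item)

rewrite_in_data(config)

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```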
In our environment (a single TITAN X Pascal w/ CUDA 8, cuDNN 5.1), it took approximately 40 minutes per split.
# Train proposed model and ablation models
python utils/run.py scripts/5fold.json run <gpu id>
# Train proposed model only
python utils/run.py scripts/5fold_proposed_only.json run <gpu id>
Evaluation
python utils/eval.py experiments/5fold_yymmdd_HHMMSS/ 17000 run <gpu id> 10
# Run this code after placing <video_id>.mp4 into data/pseudo_viz/
# Extract images from video
python utils/video2img_all.py data/pseudo_viz/
# Plot images
python utils/plot_prediction.py <experiment>/<fold> --traj_type 0
# Write videos
python utils/write_video.py <experiment>/<fold> --vid GOPRXXXXU20 --frame XXXX --pid XXX
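For reference, the frame-extraction step corresponds roughly to the OpenCV sketch below. This is not the repository's utils/video2img_all.py itself, just an illustration of the step under the assumption that frames are dumped as numbered JPEGs next to the video.

```python
import os
import cv2

def extract_frames(video_path, out_dir):
    """Dump every frame of an mp4 as a numbered jpg (rough stand-in for video2img_all.py)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, "{:06d}.jpg".format(idx)), frame)
        idx += 1
    cap.release()

# Example: extract frames for a pseudo-video placed in data/pseudo_viz/.
# extract_frames("data/pseudo_viz/GOPRXXXXU20.mp4", "data/pseudo_viz/GOPRXXXXU20")
```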
License and Citation
The dataset provided in this repository is only to be used for non-commercial scientific purposes. If you use this dataset in a scientific publication, please cite the following paper:
Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani and Yoichi Sato. Future Person Localization in First-Person Videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
@InProceedings{yagi2018future,
title={Future Person Localization in First-Person Videos},
author={Yagi, Takuma and Mangalam, Karttikeya and Yonetani, Ryo and Sato, Yoichi},
booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2018}
}