Matching in GAN latent space for better bias benchmarking and semantic image editing.
This code projects images into the GAN latent space, after which they can be modified along certain attributes (e.g. age, gender, hair length) and mixed with other faces (e.g. other people, or older/younger versions of the same person). All of this is handled by the projection_manipulation/project_and_manipulate.sh script. The easiest way to get started is to use the Colab notebook, where you can upload your own images, and they will be automatically cropped, aligned, projected, manipulated, and interpolated.
Start with two real images (higher-resolution photos work better, as do photos where the face is front-facing and not obstructed by things like hats or scarves):
Interpolating between the images:
Manipulating an image along pre-specified attributes:
The code can do a lot more, such as blending together many faces or interpolating between different faces of the same person!
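As a rough illustration of what interpolation and attribute manipulation mean once images have been projected to latent vectors, here is a minimal sketch. It assumes plain linear interpolation and a single linear attribute direction; the function names, the toy 4-dimensional vectors, and `age_direction` are hypothetical and are not taken from this repository.

```python
def interpolate(z_a, z_b, t):
    """Linearly interpolate between two latent vectors.

    t = 0 returns z_a, t = 1 returns z_b, values in between blend the two.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def manipulate(z, direction, alpha):
    """Shift a latent vector by alpha along an attribute direction."""
    return [v + alpha * d for v, d in zip(z, direction)]


# Toy 4-dimensional latents (real GAN latents are far higher-dimensional).
z_a = [0.0, 1.0, 2.0, 3.0]
z_b = [4.0, 1.0, 0.0, -1.0]
age_direction = [0.5, 0.0, 0.0, -0.5]  # hypothetical attribute direction

midpoint = interpolate(z_a, z_b, 0.5)        # halfway between the two faces
older = manipulate(z_a, age_direction, 2.0)  # move z_a along the "age" direction

print(midpoint)
print(older)
```

Each resulting vector would then be passed through the GAN generator to produce an image; that step is omitted here because it depends on the model being loaded.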
The matching code here finds images that match across a certain attribute (e.g. perceived gender). This is useful for removing confounding factors when doing downstream analyses of things like gender bias in facial recognition. Matching can also be performed with other methods, such as propensity scores, with the GAN latent space serving as covariates. Some example matches:
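To make the idea of matching across an attribute concrete, the sketch below performs greedy one-to-one matching on a small pairwise distance matrix. It is an illustrative assumption about how such matching could be done, not the repository's implementation; the function name, the `max_dist` threshold, and the toy data are all hypothetical.

```python
def match_across_attribute(dists, attr, max_dist=float("inf")):
    """Greedy one-to-one matching across a binary attribute.

    Repeatedly takes the closest remaining pair (i, j) with attr[i] == 1 and
    attr[j] == 0, skipping images that are already matched.

    dists: square matrix (list of lists), dists[i][j] = distance between images i and j
    attr:  list of 0/1 labels, one per image
    Returns a list of (i, j, distance) tuples.
    """
    group_1 = [i for i, a in enumerate(attr) if a == 1]
    group_0 = [j for j, a in enumerate(attr) if a == 0]
    # All cross-group pairs, sorted so the closest pairs come first.
    candidates = sorted((dists[i][j], i, j) for i in group_1 for j in group_0)
    used, matches = set(), []
    for d, i, j in candidates:
        if d > max_dist:
            break  # every remaining pair is even farther apart
        if i not in used and j not in used:
            matches.append((i, j, d))
            used.update((i, j))
    return matches


# Toy example: images 0 and 1 have the attribute, images 2 and 3 do not.
attr = [1, 1, 0, 0]
dists = [
    [0.0, 0.9, 0.2, 0.7],
    [0.9, 0.0, 0.6, 0.3],
    [0.2, 0.6, 0.0, 0.8],
    [0.7, 0.3, 0.8, 0.0],
]

matches = match_across_attribute(dists, attr)
strict_matches = match_across_attribute(dists, attr, max_dist=0.25)
print(matches)
print(strict_matches)
```

Enumerating every cross-group pair is only practical for small inputs; for a dataset the size of CelebA-HQ a vectorized or approximate nearest-neighbor approach would be needed.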
After performing matching, confounding is much lower on CelebA-HQ. This is illustrated by the fact that the mean values of several key (binary) attributes become much closer after matching:
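One simple way to check this kind of balance is to compare, for each binary attribute, the gap between the two groups' means before and after restricting to matched images. The sketch below assumes that setup; the function name, attribute names, and toy values are hypothetical and are not results from the paper.

```python
def attribute_mean_gaps(attrs, group, indices=None):
    """For each binary attribute, return |mean in group 1 - mean in group 0|.

    attrs:   dict mapping attribute name -> list of 0/1 values, one per image
    group:   list of 0/1 group labels (e.g. perceived gender), one per image
    indices: optional list of image indices to restrict to (e.g. matched images)
    """
    if indices is None:
        indices = range(len(group))
    idx_1 = [i for i in indices if group[i] == 1]
    idx_0 = [i for i in indices if group[i] == 0]
    gaps = {}
    for name, values in attrs.items():
        mean_1 = sum(values[i] for i in idx_1) / len(idx_1)
        mean_0 = sum(values[i] for i in idx_0) / len(idx_0)
        gaps[name] = abs(mean_1 - mean_0)
    return gaps


# Toy data: six images, three per group, two binary attributes.
group = [1, 1, 1, 0, 0, 0]
attrs = {
    "smiling": [1, 1, 0, 0, 0, 1],
    "eyeglasses": [1, 0, 0, 0, 0, 0],
}
matched_indices = [1, 2, 3, 5]  # hypothetical matched pairs: (1, 5) and (2, 3)

gaps_before = attribute_mean_gaps(attrs, group)
gaps_after = attribute_mean_gaps(attrs, group, matched_indices)
print(gaps_before)
print(gaps_after)
```

A smaller gap after matching indicates that the attribute is better balanced between the two groups, so it is less likely to confound a downstream comparison.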
Environment setup: run source activate python3, and then run pip install tensorflow-gpu==1.14.0
data/celeba-hq/ims folder
data/processed folder
dists_pairwise_gan.npy, dists_pairwise_vgg.npy, dists_pairwise_facial.npy, dists_pairwise_facial_facenet.npy, dists_pairwise_facial_facenet_casia.npy, dists_pairwise_facial_vgg2.npy - (30k x 30k) matrices storing the pairwise distances between all the images in celeba-hq using different distance measures
data/processed/gen/generated_images_0.1
celeba_hq_latents_stylegan2.zip - these are used in downstream analysis and are required for the propensity score analysis
config.py file

Both the matching_benchmarking folder and the projection_manipulation folder contain two types of files:
.py files in the scripts subdirectories - these scripts are used to calculate the cached outputs in the gdrive folder. They do not need to be rerun, but they show how the cached outputs were generated and can be rerun on new datasets.
.ipynb notebooks - these are used to reproduce the results from the cached outputs in the gdrive folder. Notebooks beginning with eda are for exploratory analysis, which can be useful but are not required to generate the final results in the paper.

@article{singh2021matched,
title={Matched sample selection with GANs for mitigating attribute confounding},
author={Chandan Singh and Guha Balakrishnan and Pietro Perona},
journal={arXiv preprint arXiv:2103.13455},
year={2021}
}