Image-to-Image Demo - Affine Layer

Interactive Image Translation with pix2pix-tensorflow

Written by Christopher Hesse, February 19th, 2017

Recently, I made a TensorFlow port of pix2pix by Isola et al., covered in the article Image-to-Image Translation in Tensorflow. I've taken a few pre-trained models and made an interactive web thing for trying them out. Chrome is recommended.

The pix2pix model is trained on pairs of images, such as building facade labels paired with the matching building facades, and then attempts to generate the corresponding output image from any input image you give it. The idea is straight from the pix2pix paper, which is a good read.
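To make the "pairs of images" idea concrete, here is a minimal sketch in Python. It assumes each training example is stored as a single image with the input on the left half and the target on the right half; that layout, the helper name `split_pair`, and the list-of-rows image format are assumptions made for illustration, not something taken from the pix2pix code.

```python
def split_pair(combined):
    """Split a side-by-side image into (input, target) halves.

    `combined` is a list of rows; each row is a list of pixels.
    Assumption: the left half is the input (e.g. facade labels) and
    the right half is the target (e.g. the facade photo).
    """
    width = len(combined[0])
    if width % 2 != 0:
        raise ValueError("combined image must have an even width")
    half = width // 2
    inputs = [row[:half] for row in combined]
    targets = [row[half:] for row in combined]
    return inputs, targets


# A tiny 2x4 "image": letters stand in for pixels.
combined = [
    ["a", "b", "X", "Y"],
    ["c", "d", "Z", "W"],
]
inputs, targets = split_pair(combined)
```

During training the model sees `inputs` and is pushed to produce something close to `targets`; at demo time only the input half exists and the model fills in the rest.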

edges2cats

Trained on about 2k stock cat photos and edges automatically generated from those photos. Generates cat-colored objects, some with nightmare faces. The best one I've seen yet was a cat-beholder.

Some of the pictures look especially creepy, I think because it's easier to notice when an animal looks wrong, especially around the eyes. The auto-detected edges are not very good and in many cases missed the cat's eyes, which makes them a bit worse as training data for the image translation model.

facades

Trained on a database of building facades paired with labeled versions of those facades. It doesn't seem sure about what to do with a large empty area, but if you put enough windows on there it often produces reasonable results. Draw "wall" color rectangles to erase things.

I didn't have the names of the different parts of building facades so I just guessed what they were called.

edges2shoes

Trained on a database of ~50k shoe pictures collected from Zappos along with edges generated from those pictures automatically. If you're really good at drawing the edges of shoes, you can try to produce some new designs. Keep in mind it's trained on real objects, so if you can draw more 3D things, it seems to work better.

edges2handbags

Similar to the previous one, trained on a database of ~137k handbag pictures collected from Amazon along with edges automatically generated from those pictures. If you draw a shoe here instead of a handbag, you get a very oddly textured shoe.

Implementation

The models were trained and exported with the pix2pix.py script from pix2pix-tensorflow. The interactive demo is written in JavaScript using the Canvas API and runs the model using deeplearn.js.
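One step any demo like this needs is converting between canvas pixels and the numbers the model works with. Below is a rough sketch of that conversion, written in Python to keep it short. The flat RGBA byte layout mirrors what the Canvas API's `getImageData` returns; the [-1, 1] value range and both function names are assumptions for illustration, not details confirmed from the demo's source.

```python
def rgba_bytes_to_input(rgba):
    """Flat RGBA bytes (0..255) -> flat RGB floats in [-1, 1].

    The alpha channel is dropped. Assumption: the model expects
    inputs scaled to [-1, 1].
    """
    if len(rgba) % 4 != 0:
        raise ValueError("expected 4 values per pixel")
    out = []
    for i in range(0, len(rgba), 4):
        for v in rgba[i:i + 3]:  # r, g, b; skip alpha
            out.append(v / 255.0 * 2.0 - 1.0)
    return out


def output_to_rgba_bytes(rgb):
    """Flat RGB floats in [-1, 1] -> flat RGBA bytes, fully opaque."""
    if len(rgb) % 3 != 0:
        raise ValueError("expected 3 values per pixel")
    out = []
    for i in range(0, len(rgb), 3):
        for v in rgb[i:i + 3]:
            clamped = max(-1.0, min(1.0, v))  # guard against overshoot
            out.append(int(round((clamped + 1.0) / 2.0 * 255.0)))
        out.append(255)  # alpha
    return out
```

In this sketch the model would sit between the two calls: the drawing is read from one canvas, converted with `rgba_bytes_to_input`, run through the generator, and the result written to the output canvas via `output_to_rgba_bytes`.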

The pre-trained models are available in the Datasets section on GitHub. All the ones released alongside the original pix2pix implementation should be available. The models used for the JavaScript implementation are available at pix2pix-tensorflow-models.

The edges for the cat photos were generated using Holistically-Nested Edge Detection. That functionality was added to process.py, and its dependencies were added to the Docker image.
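Holistically-Nested Edge Detection is a trained neural network, so it doesn't fit in a short snippet. To illustrate only the general step of turning a photo into an edge map, here is a sketch that swaps in a plain Sobel filter, which is a much cruder detector and not what was used for the cat photos. The function name, the threshold, and the list-of-rows grayscale format are all made up for the example.

```python
def sobel_edges(gray, threshold=1.0):
    """Return a binary edge map (1 = edge) for a grayscale image.

    `gray` is a list of rows of floats. Border pixels are left as 0.
    Stand-in only: this is a Sobel filter, not HED.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


# A 5x6 image: dark on the left, bright on the right.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edges = sobel_edges(image)
```

A local filter like this only reacts to sharp brightness changes, which gives a feel for why automatically detected edges can miss low-contrast features such as a cat's eyes.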