
Creating art through deep learning

While aimlessly browsing the Internet, I stumbled upon a subreddit named /r/DeepDream.

The entire purpose of the subreddit is to showcase machine-learning-generated art. The results are a bit unexpected and trippy, but always interesting.

Example GIFs from /r/DeepDream, the second of which is named Journey
Now, this article is going to focus on VQGAN+CLIP, a text-to-image algorithm capable of generating astonishing art with only a few prompts and a bit of tweaking.

Getting our hands dirty

Most of the work can be done through Google Colab notebooks, so you won't need a dedicated GPU on your own computer. You'll also need some space on your Google Drive to host the different models you'll have to download.
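If you want the downloaded checkpoints to persist between sessions, you can mount your Drive directly from the notebook. This is the standard Colab mount call; the folder name at the end is just an example:

# Mount Google Drive so downloaded checkpoints survive between sessions
from google.colab import drive
drive.mount('/content/drive')

# Example folder for the checkpoints (the name is arbitrary)
!mkdir -p /content/drive/MyDrive/vqgan_models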

Using Google Colab is free, though you won't have priority access to the GPU.

You can find the first Colab notebook used at this link.

1. Changing the site hosting the different models

As of October 24th, 2021, the site hosting the different models (http://mirror.io.community/) is down.

You'll need to copy the Colab notebook to your own Google Drive and change a few lines in the Selection of models to download section. Here is mine, where I put the correct links for both ImageNet models:

# Download the VQGAN ImageNet checkpoints and configs from the Heidelberg mirror
# (-C - resumes a partially completed download)
if imagenet_1024:
  !curl -L -o vqgan_imagenet_f16_1024.ckpt -C - 'https://heibox.uni-heidelberg.de/f/140747ba53464f49b476/?dl=1'
  !curl -L -o vqgan_imagenet_f16_1024.yaml -C - 'https://heibox.uni-heidelberg.de/f/6ecf2af6c658432c8298/?dl=1'
if imagenet_16384:
  !curl -L -o vqgan_imagenet_f16_16384.ckpt -C - 'https://heibox.uni-heidelberg.de/f/867b05fc8c4841768640/?dl=1'
  !curl -L -o vqgan_imagenet_f16_16384.yaml -C - 'https://heibox.uni-heidelberg.de/f/274fb24ed38341bfa753/?dl=1'

Additional information can be found here.

2. Using the models

A very detailed tutorial on how to use the models can be found here. Also, if you're not feeling inspired, here is a very interesting list of keyword combinations.

Please be careful when choosing the model you're going to use. Some are suitable for commercial use (e.g. NFTs), while others, like ImageNet, are not.

For a first try, I decided to use the following image as the initial image. It's a photo of a friend's apartment:

Initial Image

My friend does have a cool apartment

A first naïve try was to use the prompt "sketch" on the initial image, with the sflckr model.
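For context, running a notebook like this mostly comes down to filling in a settings cell. The exact variable names differ between notebook versions, so the ones below are purely illustrative:

# Illustrative settings cell; actual names vary between notebook versions
prompts = "sketch"            # text prompt guiding the generation
init_image = "apartment.jpg"  # start from a photo instead of random noise
model = "sflckr"              # which VQGAN checkpoint to use
max_iterations = 400          # more iterations = stronger stylization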

That did not go as planned

The first try was a bit... uncanny. It did morph into what could be called a sketch. Unfortunately, the initial image provided was (obviously) not suitable.

With a bit of tweaking, I got better results using imagenet_16384 with the combined prompt "living room:60| voxel:30| 4K:10" (meaning: generate an image weighted 60% toward a living room, 30% toward voxels and 10% toward 4K).
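Under the hood, the notebook splits that string on | and reads each weight after the colon. Here is a rough sketch of that parsing; the real notebook code also handles an optional stop value per prompt:

def parse_prompts(prompt_string):
    """Split 'living room:60| voxel:30| 4K:10' into (text, weight) pairs."""
    pairs = []
    for part in prompt_string.split('|'):
        text, _, weight = part.strip().rpartition(':')
        if not text:                  # no explicit weight given
            text, weight = weight, '1'
        pairs.append((text.strip(), float(weight)))
    return pairs

print(parse_prompts("living room:60| voxel:30| 4K:10"))
# [('living room', 60.0), ('voxel', 30.0), ('4K', 10.0)]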

That went better


Some artifacts are still visible (especially around the plants), but the result looks much better.

3. Going further

The previously shown images are static renders (like the sketched Bernie). But in order to render something like the second GIF, named Journey, we need to use a Zooming VQGAN+CLIP. The main idea behind it is to use a zoomed version of the last iteration as the starting image for the next one, in order to get a continuous zoom effect.
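Here is a minimal sketch of that zoom step, using Pillow; the zoom factor and file names are made up for illustration:

from PIL import Image

def zoom_step(frame_path, out_path, zoom=1.02):
    """Crop the center of the last frame and scale it back up,
    so it can be fed as the init image of the next iteration."""
    img = Image.open(frame_path)
    w, h = img.size
    crop_w, crop_h = int(w / zoom), int(h / zoom)
    left = (w - crop_w) // 2
    top = (h - crop_h) // 2
    img = img.crop((left, top, left + crop_w, top + crop_h))
    img = img.resize((w, h), Image.LANCZOS)
    img.save(out_path)

# Feed each generated frame back in as the next init image
zoom_step("frame_0001.png", "init_0002.png")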

An added side benefit is that you can use time-stamped prompts. You can then create compositions like this one:


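The scheduling idea boils down to picking a prompt based on the current frame number. Here is a hypothetical sketch; the actual syntax varies between notebooks:

# Hypothetical prompt schedule: (starting frame, prompt)
schedule = [
    (0,   "a dense forest"),
    (300, "a forest turning into a city"),
    (600, "a futuristic city at night"),
]

def prompt_for_frame(frame):
    """Return the last prompt whose starting frame has been reached."""
    current = schedule[0][1]
    for start, text in schedule:
        if frame >= start:
            current = text
    return current

print(prompt_for_frame(450))  # "a forest turning into a city"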
Additional resources