Use DALL-E to create infinite zoom movies

Ever since DALL-E launched a few months ago, OpenAI’s machine learning tool has been blowing the Internet’s mind with its uncanny ability to convert any sentence into an image. I’ve been playing around with DALL-E, and one of the things I stumbled into is making zoomable movies. A number of people asked me how to do it, and even though the tooling isn’t perfect, I thought I’d give it a write-up. Warning: it gets a little technical and you’ll need access to DALL-E.

DALL-E can convert sentences into images, and it can optionally do so starting from a pre-existing image. These images have to be partly transparent; DALL-E then fills in the blanks so that the end result matches the prompt as well as possible. People have been using this to “uncrop” artworks to see what could be around, say, the Mona Lisa, but we can go further. Let’s create a Van Gogh movie!

Fire up DALL-E and create an initial image. This will be the last frame in our movie:

The first frame is being generated

For the next step, we need to set up some Python scripts. In a shell, execute the following code:

This should get you a working version of the scripts needed. Now download the image you want to start with and place it in the vangogh directory with the name frame1.png. Then execute in the shell:

This produces an image called zoomed.png which is your frame1, but zoomed out and with the surroundings transparent. Go back to DALL-E and upload this image. DALL-E will ask you to change the crop (which we don’t need to do) and then whether you want to edit the image. Say yes.
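To make the zoom-out step a bit more concrete, here is a minimal sketch of its geometry. It is not the code of the actual script: the function names, the 1024x1024 canvas and the zoom factor of 2 are all assumptions for illustration. The idea is that the old frame shrinks and sits in the center of a canvas of the same size, and everything outside that box stays transparent for DALL-E to fill in.

```python
def zoom_out_box(canvas_size, factor):
    """Return (left, top, right, bottom) of the shrunken frame, centered on the canvas."""
    width, height = canvas_size
    # The old frame shrinks by `factor`; the canvas keeps its original size.
    small_w = round(width / factor)
    small_h = round(height / factor)
    left = (width - small_w) // 2
    top = (height - small_h) // 2
    return (left, top, left + small_w, top + small_h)


def is_transparent(x, y, box):
    """True when pixel (x, y) lies outside the shrunken frame, so DALL-E has to fill it."""
    left, top, right, bottom = box
    return not (left <= x < right and top <= y < bottom)


# Assumed values: a 1024x1024 canvas and a zoom factor of 2.
box = zoom_out_box((1024, 1024), 2)
print(box)  # (256, 256, 768, 768)
```

With these assumed numbers, the old frame covers only the middle quarter of the canvas, so DALL-E has to invent three quarters of each new image.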

You have to click on some part of the transparent image using the brush; otherwise DALL-E doesn’t know it was edited. Then enter a new sentence describing the slightly zoomed-out scene:

Then click generate and pick the best result. Save that as frame2 and execute the same command but now with, eh, frame2:

Repeat the procedure until you have a decent number of frames. I went with completions describing “three windows”, “a house”, “a farm house surrounded by trees” etc — always starting the prompt with “a painting by Vincent Van Gogh depicting …”.
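If you want to keep the prompts consistent from frame to frame, a tiny helper can do the bookkeeping. This is just a sketch; `PREFIX` and `build_prompt` are names I made up for the illustration, and the subjects are the ones mentioned above.

```python
# Fixed opening that every prompt shares, so the style stays the same.
PREFIX = "a painting by Vincent Van Gogh depicting "


def build_prompt(subject):
    """Prepend the shared style prefix to the description of the wider scene."""
    return PREFIX + subject.strip()


subjects = ["three windows", "a house", "a farm house surrounded by trees"]
for subject in subjects:
    print(build_prompt(subject))
```

Only the part after the prefix changes as the scene widens.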

As we zoom out, it is not always easy to keep the image from going completely trippy. I find that reframing sometimes helps: tell the system you want a picture on a wall, for example. In this example I tried a scene switch by turning the original painting into a painting inside the starry night sky.

I stopped at frame14 at which point we have a bunch of cats looking at Van Gogh’s starry night. At that point my 14 frames looked like this:

Now let’s reverse everything and turn it into a movie. Type something like the following into your shell:

frame_count is the number of frames you created. frames is how many frames there will be between each transition. target is the name of the directory with the frames and also determines the name of the movie produced. You should see it process for a bit and then a new file vangogh.mp4 is produced.
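To give an idea of how the two numbers interact, here is a rough sketch of a zoom schedule. I am assuming each key frame is zoomed out by a factor of 2 relative to the previous one and that `frames` counts the movie frames rendered per transition; the function `zoom_schedule` is hypothetical and the real script may interpolate differently.

```python
def zoom_schedule(frame_count, frames, factor=2.0):
    """List (key_frame, scale) pairs for a zoom-in movie.

    key_frame is the 1-based number of the image to show and scale is how
    much to magnify it around its center for that movie frame.
    """
    schedule = []
    # Walk the key frames backwards: the last image generated is shown first.
    for key in range(frame_count, 1, -1):
        for step in range(frames):
            # Geometric interpolation keeps the apparent zoom speed constant.
            scale = factor ** (step / frames)
            schedule.append((key, scale))
    # End on the very first image, unmagnified.
    schedule.append((1, 1.0))
    return schedule


schedule = zoom_schedule(frame_count=3, frames=4)
print(len(schedule))  # 9
```

Under these assumptions a movie has `(frame_count - 1) * frames + 1` frames in total, so more in-between frames give a smoother but longer zoom.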

An animated GIF version of my movie looks like this. It’s a bit choppy due to Medium’s size restrictions, but it gets the meaning across.

Resulting movie as a not very fluent animated gif

Some more examples on YouTube:

Let me know if this technique works for you and what movies it produces!

Douwe Osinga

Entrepreneur, Coding enthusiast and co-founder of Neptyne, the programmable spreadsheet