
Computational photography pipeline that performs multiple inferences from any image or video.

PRISMA is a computational photography pipeline that performs multiple inferences (referred to as "bands") from any image or video. Like light passing through a prism that bends it into different wavelengths, this pipeline expands images into data that can be used for 3D reconstruction or realtime post-processing operations.

It's a combination of different algorithms and open source pre-trained models such as:

The resulting bands are stored in a folder with the same name as the input file. Each band is stored as a single .png or .mp4 file, and can be imported on:



Main dependencies:

git clone git@github.com:patriciogonzalezvivo/prisma.git
cd prisma

conda env create -f environment.yml
conda activate prisma

# Install mmcv (for mmdetection)
pip install -U openmim
mim install mmengine
mim install "mmcv-full==1.7.1"

How does it work?

a. Process

The first step is to process an image or video. Let's start with an image:

python -i data/gog.jpg

Without an --output argument, this creates a folder named after the input file, which contains all the derived bands (rgba, flow, mask and depth_*).

├── depth_patchfusion.png
├── mask.png
├── metadata.json
└── rgba.png

In the folder you will find a metadata.json file that contains all the metadata associated with the original image or video.

{
    "bands": {
        "rgba": {
            "url": "rgba.png"
        },
        "depth_patchfusion": {
            "url": "depth_patchfusion.png",
            "values": {
                "min": {
                    "value": 1.6147574186325073,
                    "type": "float"
                },
                "max": {
                    "value": 11.678544044494629,
                    "type": "float"
                }
            }
        },
        "mask": {
            "url": "mask.png",
            "ids": [ ... ]
        }
    },
    "width": 934,
    "height": 440,
    "principal_point": [ ... ],
    "focal_length": 641.0616195031489,
    "field_of_view": 37.88246641919117
}

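As a minimal sketch of how this metadata could be consumed, the snippet below parses a subset of the sample above (the elided `ids` and `principal_point` fields are left out), collects the min/max range of every band that has a `values` entry, and checks the camera fields against each other. The helper names `band_ranges` and `vertical_fov_degrees` are hypothetical. In this sample, `field_of_view` agrees with the vertical pinhole field of view computed from `height` and `focal_length`; whether PRISMA derives it exactly this way is an assumption.

```python
import json
import math

# Subset of the metadata.json sample shown above (elided fields omitted).
SAMPLE = """
{
    "bands": {
        "rgba": {"url": "rgba.png"},
        "depth_patchfusion": {
            "url": "depth_patchfusion.png",
            "values": {
                "min": {"value": 1.6147574186325073, "type": "float"},
                "max": {"value": 11.678544044494629, "type": "float"}
            }
        }
    },
    "height": 440,
    "focal_length": 641.0616195031489,
    "field_of_view": 37.88246641919117
}
"""


def band_ranges(metadata):
    """Return {band_name: (min, max)} for bands that carry a "values" entry."""
    ranges = {}
    for name, band in metadata["bands"].items():
        values = band.get("values")
        if values and "min" in values and "max" in values:
            ranges[name] = (values["min"]["value"], values["max"]["value"])
    return ranges


def vertical_fov_degrees(height, focal_length):
    """Pinhole vertical field of view, assuming focal_length is in pixels."""
    return math.degrees(2.0 * math.atan(height / (2.0 * focal_length)))


metadata = json.loads(SAMPLE)
ranges = band_ranges(metadata)
fov = vertical_fov_degrees(metadata["height"], metadata["focal_length"])
print(ranges)
print(fov, metadata["field_of_view"])
```

Only bands with a `values` entry appear in the result, so `rgba` is skipped here.
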
PRISMA currently supports multiple depth estimation algorithms. Select one with the --depth|-d argument: depth_midas, depth_zoedepth, depth_patchfusion, depth_marigold or all. By default, images are processed with depth_patchfusion, while videos use depth_anything.
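
The default described above can be pictured as a small lookup on the input file type. This is only an illustrative sketch: `default_depth_band` is a hypothetical helper, and the extension sets are assumptions, not PRISMA's own lists.

```python
from pathlib import Path

# Assumption: illustrative extension sets, not taken from PRISMA.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}


def default_depth_band(input_path):
    """Pick the depth band used when no --depth|-d argument is given."""
    suffix = Path(input_path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "depth_patchfusion"
    if suffix in VIDEO_EXTENSIONS:
        return "depth_anything"
    raise ValueError(f"unsupported input type: {suffix!r}")


print(default_depth_band("data/gog.jpg"))
print(default_depth_band("data/rocky.mp4"))
```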

When processing videos, PRISMA by default creates the least amount of data: a single .png or .mp4 for each band. For videos, data such as min/max values is stored in .csv files.

It's possible to save extra data by setting the --extra|-e level number:

  1. store bands as a single .png and .mp4 (videos usually have an associated .csv file)
  2. store images as .ply point clouds; for videos, extract the resulting frames as .png
  3. store optical flow from videos as .flo files
  4. store inferred depth as .npy files
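
The .flo files in the list above are presumably in the common Middlebury optical flow layout: a float32 tag (202021.25), int32 width and height, then interleaved (u, v) float32 values in row-major order, all little-endian. That layout is an assumption here, not something this document confirms. Under that assumption, a standard-library reader and a matching writer for round-trip checks could look like this (`read_flo` and `write_flo` are hypothetical helper names):

```python
import struct

FLO_MAGIC = 202021.25  # sanity-check tag of the Middlebury .flo layout


def write_flo(path, width, height, uv):
    """Write interleaved (u, v) float32 values, row-major, little-endian."""
    if len(uv) != width * height * 2:
        raise ValueError("expected width * height * 2 flow values")
    with open(path, "wb") as f:
        f.write(struct.pack("<fii", FLO_MAGIC, width, height))
        f.write(struct.pack(f"<{len(uv)}f", *uv))


def read_flo(path):
    """Return (width, height, [u0, v0, u1, v1, ...]) from a .flo file."""
    with open(path, "rb") as f:
        magic, width, height = struct.unpack("<fii", f.read(12))
        if magic != FLO_MAGIC:
            raise ValueError("not a Middlebury .flo file")
        count = width * height * 2
        uv = struct.unpack(f"<{count}f", f.read(count * 4))
    return width, height, list(uv)
```

The flow vector of pixel (x, y) then sits at indices `2 * (y * width + x)` and `2 * (y * width + x) + 1` of the returned list.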

Now let's extract all depth models and individual frames from a video:

python -i data/rocky.mp4 -d all -e 1

This produces the following folder structure:

├── depth_anything/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── depth_anything_max.csv
├── depth_anything_min.csv
├── depth_anything.mp4
├── depth_marigold/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── depth_marigold_max.csv
├── depth_marigold_min.csv
├── depth_marigold.mp4
├── depth_midas/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── depth_midas_max.csv
├── depth_midas_min.csv
├── depth_midas.mp4
├── depth_patchfusion/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── depth_patchfusion_max.csv
├── depth_patchfusion_min.csv
├── depth_patchfusion.mp4
├── depth_zoedepth/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── depth_zoedepth_max.csv
├── depth_zoedepth_min.csv
├── depth_zoedepth.mp4
├── flow_raft/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── flow_raft.csv
├── flow_raft.mp4
├── flow_gmflow/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── flow_gmflow.csv
├── flow_gmflow.mp4
├── images/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── mask/
│   ├── 000000.png
│   ├── 000001.png
│   ├── ...
│   └── 000110.png
├── mask.mp4
├── sparse/
│   └── 0/
│       ├── cameras.bin
│       ├── images.bin
│       ├── points3D.bin
│       └── points3D.txt
├── camera_pose.csv
├── colmap.db
├── metadata.json
└── rgba.mp4
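
Given the layout above, where each per-frame band lives in its own sub-folder with six-digit, zero-padded .png names, a short sketch for walking an output folder could look like the following. The helper names `frame_path` and `list_frame_bands` are hypothetical and not part of PRISMA.

```python
from pathlib import Path


def frame_path(folder, band, index):
    """Path of one extracted frame, e.g. <folder>/depth_midas/000042.png."""
    return Path(folder) / band / f"{index:06d}.png"


def list_frame_bands(folder):
    """Map each sub-folder holding numbered .png frames to its frame count."""
    counts = {}
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_dir():
            frames = [p for p in entry.glob("*.png") if p.stem.isdigit()]
            if frames:
                counts[entry.name] = len(frames)
    return counts
```

Folders without numbered .png frames, such as sparse/, are skipped by `list_frame_bands`.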

b. Visualize

View the resulting bands from the processed image/video using ReRun:

python -i data/rocky

c. Concatenate bands

In order to export the bands as a single image or video you can use the script:

python -i data/gog -o test.png

Licenses and Credits

This pipeline is Copyright (c) 2024, Patricio Gonzalez Vivo, and licensed under CC BY-NC-SA 4.0. Please reach out to patriciogonzalezvivo at gmail dot com to obtain a commercial license.

All the models and software it uses are released under commercial-ready licenses such as MIT, Apache and BSD.

