VNext

VNext is a next-generation video instance recognition framework built on top of Detectron2.
It currently provides advanced online and offline video instance segmentation algorithms, as well as a motion model for object-centric video segmentation tasks.
We will continue to update and improve it, with the goal of providing a unified and efficient framework for the field of video instance recognition.

To date, VNext contains the official implementation of the following algorithms:

InstMove: Instance Motion for Object-centric Video Segmentation (CVPR 2023)

IDOL: In Defense of Online Models for Video Instance Segmentation (ECCV 2022 Oral)

SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)

News

InstMove has been accepted to CVPR 2023; its code and models are included in this repository (see InstMove.md).
IDOL has been accepted to ECCV 2022 as an oral presentation!
SeqFormer has been accepted to ECCV 2022 as an oral presentation!
IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022).

Getting started

For installation and data preparation, please refer to INSTALL.md.
For InstMove training, evaluation, plugin usage, and model zoo, please refer to InstMove.md.
For IDOL training, evaluation, and model zoo, please refer to IDOL.md.
For SeqFormer training, evaluation, and model zoo, please refer to SeqFormer.md.

IDOL

In Defense of Online Models for Video Instance Segmentation

Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

Introduction

In recent years, video instance segmentation (VIS) has been advanced largely by offline models, while online models usually trail contemporaneous offline models by more than 10 AP, a substantial gap.
By dissecting current online and offline models, we show that the main cause of this gap is error-prone association of instances across frames, and we propose IDOL, which outperforms all online and offline methods on three benchmarks.
IDOL won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022).
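To make the association step concrete, here is a minimal Python sketch of online, embedding-based association between frames. It is a generic illustration under stated assumptions, not IDOL's implementation: every detection is assumed to carry an embedding vector, existing tracks are matched greedily by cosine similarity, and the names `associate` and `threshold` (default 0.5) are hypothetical. A wrong match here, such as two similar-looking instances swapping IDs, carries over into every later frame.

```python
"""Generic sketch of online instance association (NOT IDOL's actual method).

Assumption: each detection in a frame comes with an embedding vector. A new
detection inherits the ID of the most similar existing track (greedy matching
on cosine similarity) if that similarity reaches a threshold; otherwise it
starts a new track.
"""
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(tracks: Dict[int, Vector], detections: List[Vector],
              threshold: float = 0.5) -> List[int]:
    """Return one track ID per detection; `tracks` is updated in place."""
    # Score every (track, detection) pair, best pairs first.
    pairs = sorted(
        ((cosine(emb, det), tid, di)
         for tid, emb in tracks.items()
         for di, det in enumerate(detections)),
        reverse=True,
    )
    assigned: Dict[int, int] = {}  # detection index -> track ID
    used = set()
    for score, tid, di in pairs:
        if score < threshold:
            break  # all remaining pairs are even less similar
        if di in assigned or tid in used:
            continue
        assigned[di] = tid
        used.add(tid)
    next_id = max(tracks, default=-1) + 1
    ids = []
    for di, det in enumerate(detections):
        if di not in assigned:  # unmatched detection starts a new track
            assigned[di] = next_id
            next_id += 1
        tracks[assigned[di]] = det  # keep the latest embedding per track
        ids.append(assigned[di])
    return ids


if __name__ == "__main__":
    tracks: Dict[int, Vector] = {}
    print(associate(tracks, [[1.0, 0.0], [0.0, 1.0]]))  # [0, 1]
    # The two instances appear in swapped order in the next frame.
    print(associate(tracks, [[0.1, 0.9], [0.9, 0.1]]))  # [1, 0]
```

Greedy matching keeps the sketch short; an optimal assignment (for example, the Hungarian algorithm) or a learned similarity would be natural replacements, and a real tracker would also need rules for ending tracks whose instances disappear.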

[Figure: Visualization results on the OVIS validation set]

[Tables: Quantitative results on YouTube-VIS 2019 and OVIS 2021]

SeqFormer

SeqFormer: Sequential Transformer for Video Instance Segmentation

Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Introduction

SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of the video-level instance, which is then used to dynamically predict the instance's mask on each frame.
SeqFormer is a robust, accurate, and neat offline model, and instance tracking is achieved naturally, without tracking branches or post-processing.
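The aggregation idea can be illustrated with a short Python sketch. It is not SeqFormer's code, and several simplifications are assumed: the per-frame instance features and per-frame scores are given as inputs (in a real model both come from learned layers), the temporal weights are a softmax over those scores, and the mask head is reduced to a dot product between the video-level embedding and each pixel feature. The function names `aggregate` and `predict_masks` are hypothetical.

```python
"""Illustrative sketch (NOT SeqFormer's implementation) of turning per-frame
instance features into one video-level embedding and using that embedding to
predict a mask in every frame.

Assumptions: per-frame features and scores are given; the mask "head" is a
dot product with each pixel feature, thresholded at sigmoid(z) > 0.5.
"""
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(frame_feats: List[Vector], frame_scores: List[float]) -> Vector:
    """Weighted sum over time of the instance's per-frame features."""
    weights = softmax(frame_scores)
    dim = len(frame_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, frame_feats)) for d in range(dim)]


def predict_masks(video_emb: Vector, pixel_feats: List[List[Vector]]) -> List[List[int]]:
    """pixel_feats[t][p] is the feature of pixel p in frame t; returns binary masks."""
    masks = []
    for frame in pixel_feats:
        logits = [sum(a * b for a, b in zip(video_emb, px)) for px in frame]
        # sigmoid(z) > 0.5 exactly when z > 0, so no exp() is needed here.
        masks.append([1 if z > 0.0 else 0 for z in logits])
    return masks


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]  # one instance seen in 3 frames
    scores = [2.0, 1.0, -3.0]                     # hypothetical per-frame scores
    emb = aggregate(feats, scores)
    pixels = [[[1.0, 0.0], [-1.0, 0.0]]] * 3      # 2 pixels per frame
    print(predict_masks(emb, pixels))  # [[1, 0], [1, 0], [1, 0]]
```

Because a single video-level embedding produces the mask in every frame, the instance's identity is shared across frames by construction, which is the sense in which tracking needs no separate branch or post-processing.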

[Figure: Visualization results on the YouTube-VIS 2019 validation set]

[Tables: Quantitative results on YouTube-VIS 2019 and YouTube-VIS 2021]

Citation

@inproceedings{seqformer,
  title={SeqFormer: Sequential Transformer for Video Instance Segmentation},
  author={Wu, Junfeng and Jiang, Yi and Bai, Song and Zhang, Wenqing and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

@inproceedings{IDOL,
  title={In Defense of Online Models for Video Instance Segmentation},
  author={Wu, Junfeng and Liu, Qihao and Jiang, Yi and Bai, Song and Yuille, Alan and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

Acknowledgement

This repo is based on Detectron2, Deformable DETR, VisTR, and IFC. Thanks for their wonderful work.
