# Sapiens-Pytorch-Inference

Minimal code and examples for running inference with the Sapiens foundation human models in PyTorch.

License: MIT

## SapiensPredictor

The `SapiensPredictor` class allows running multiple tasks simultaneously.

> [!CAUTION]
> - Use the 1B models; the accuracy of the smaller models is poor (especially for segmentation).
> - Exported ONNX models are too slow.
> - Input sizes other than 768x1024 do not produce good results.
> - Running Sapiens models on a cropped person produces worse results, even if you crop a wider rectangle around the person.
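Given the 768x1024 input constraint noted above, one way to feed frames with other shapes without distorting them is to letterbox-pad to the 3:4 aspect ratio first. The helper below is a sketch of my own (not part of this repository, which handles resizing internally), using plain NumPy:

```python
import numpy as np

def letterbox_to_3_4(img: np.ndarray) -> np.ndarray:
    """Pad an HxWx3 image with black borders so its width:height
    ratio becomes 3:4, matching the 768x1024 model input."""
    h, w = img.shape[:2]
    target = 768 / 1024  # width / height = 0.75
    if w / h > target:
        # Too wide: grow the height by padding top and bottom.
        new_h = int(np.ceil(w / target))
        pad = new_h - h
        return np.pad(img, ((pad // 2, pad - pad // 2), (0, 0), (0, 0)))
    # Too tall (or exact): grow the width by padding left and right.
    new_w = int(np.ceil(h * target))
    pad = new_w - w
    return np.pad(img, ((0, 0), (pad // 2, pad - pad // 2), (0, 0)))
```

The padded image can then be resized to exactly 768x1024 without stretching the person.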
## Installation

```shell
pip install sapiens-inference
```

Or, clone this repository:

```shell
git clone https://github.com/ibaiGorordo/Sapiens-Pytorch-Inference.git
cd Sapiens-Pytorch-Inference
pip install -r requirements.txt
```
## Usage

```python
import cv2
from imread_from_url import imread_from_url
from sapiens_inference import SapiensPredictor, SapiensConfig, SapiensDepthType, SapiensNormalType

# Load the model
config = SapiensConfig()
config.depth_type = SapiensDepthType.DEPTH_03B  # Disabled by default
config.normal_type = SapiensNormalType.NORMAL_1B  # Disabled by default
predictor = SapiensPredictor(config)

# Load the image
img = imread_from_url("https://github.com/ibaiGorordo/Sapiens-Pytorch-Inference/blob/assets/test2.png?raw=true")

# Estimate the maps
result = predictor(img)

cv2.namedWindow("Combined", cv2.WINDOW_NORMAL)
cv2.imshow("Combined", result)
cv2.waitKey(0)
```
## API

The `SapiensPredictor` class allows running multiple tasks simultaneously. It has the following methods:

- `SapiensPredictor(config: SapiensConfig)` - Load the model with the specified configuration.
- `__call__(img: np.ndarray) -> np.ndarray` - Estimate the maps for the input image.

### SapiensConfig

The `SapiensConfig` class configures the model. It has the following attributes:
- `dtype: torch.dtype` - Data type to use. Default: `torch.float32`.
- `device: torch.device` - Device to use. Default: `cuda` if available, otherwise `cpu`.
- `depth_type: SapiensDepthType` - Depth model to use. Options: `OFF`, `DEPTH_03B`, `DEPTH_06B`, `DEPTH_1B`, `DEPTH_2B`. Default: `OFF`.
- `normal_type: SapiensNormalType` - Normal model to use. Options: `OFF`, `NORMAL_03B`, `NORMAL_06B`, `NORMAL_1B`, `NORMAL_2B`. Default: `OFF`.
- `segmentation_type: SapiensSegmentationType` - Segmentation model to use (always enabled for the mask). Options: `SEGMENTATION_03B`, `SEGMENTATION_06B`, `SEGMENTATION_1B`. Default: `SEGMENTATION_1B`.
- `detector_config: DetectorConfig` - Configuration for the object detector. Default: `{model_path: str = "models/yolov8m.pt", person_id: int = 0, confidence: float = 0.25}`. Disabled by default, as it produces worse results.
- `minimum_person_height: float` - Minimum height ratio of the person to detect. Default: `0.5` (50%). Not used if the object detector is disabled.

## Examples

```shell
python image_predictor.py
python video_predictor.py
python webcam_predictor.py
python image_normal_estimation.py
python image_segmentation.py
python video_normal_estimation.py
python video_segmentation.py
python webcam_normal_estimation.py
python webcam_segmentation.py
```
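Putting the `SapiensConfig` attributes above together looks roughly like the sketch below. The model sizes are arbitrary choices, `torch.float16` assumes a GPU that supports it, and the import path for `SapiensSegmentationType` is assumed to match the other enums:

```python
import torch
from sapiens_inference import (SapiensPredictor, SapiensConfig,
                               SapiensDepthType, SapiensNormalType,
                               SapiensSegmentationType)

config = SapiensConfig()
config.dtype = torch.float16  # smaller memory footprint on CUDA
config.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config.depth_type = SapiensDepthType.DEPTH_1B      # enable depth estimation
config.normal_type = SapiensNormalType.NORMAL_1B   # enable normal estimation
config.segmentation_type = SapiensSegmentationType.SEGMENTATION_1B
predictor = SapiensPredictor(config)
```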
## ONNX export

To export a model to ONNX, run the following script:

```shell
python export_onnx.py seg03b
```

The available models are `seg03b`, `seg06b`, `seg1b`, `depth03b`, `depth06b`, `depth1b`, `depth2b`, `normal03b`, `normal06b`, `normal1b`, `normal2b`.
## Original models

The original models are available at Hugging Face: https://huggingface.co/facebook/sapiens/tree/main/sapiens_lite_host