
mmpose.apis

mmpose.apis.collect_multi_frames(video, frame_id, indices, online=False)[source]

Collect multiple frames from the video.

Parameters
  • video (mmcv.VideoReader) – A VideoReader of the input video file.

  • frame_id (int) – index of the current frame

  • indices (list(int)) – index offsets of the frames to collect

  • online (bool) – Inference mode. If set to True, information from future frames cannot be used.

Returns

Multiple frames collected from the input video file.

Return type

list(ndarray)
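
A minimal usage sketch; the video path and frame offsets below are placeholders, not part of the API:

>>> import mmcv
>>> from mmpose.apis import collect_multi_frames
>>> video = mmcv.VideoReader('video.mp4')  # hypothetical input file
>>> # gather the previous, current and next frame around frame 10
>>> frames = collect_multi_frames(video, frame_id=10, indices=[-1, 0, 1])
>>> assert len(frames) == 3  # one ndarray per offset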

mmpose.apis.extract_pose_sequence(pose_results, frame_idx, causal, seq_len, step=1)[source]

Extract the target frame from 2D pose results, and pad the sequence to a fixed length.

Parameters
  • pose_results (list[list[dict]]) –

    Multi-frame pose detection results stored in a nested list. Each element of the outer list is the pose detection results of a single frame, and each element of the inner list is the pose information of one person, which contains:

    • keypoints (ndarray[K, 2 or 3]): x, y, [score]

    • track_id (int): unique id of each person, required when with_track_id==True.

    • bbox ((4, ) or (5, )): left, top, right, bottom, [score]

  • frame_idx (int) – The index of the frame in the original video.

  • causal (bool) – If True, the target frame is the last frame in a sequence. Otherwise, the target frame is in the middle of a sequence.

  • seq_len (int) – The number of frames in the input sequence.

  • step (int) – Step size to extract frames from the video.

Returns

Multi-frame pose detection results stored in a nested list with a length of seq_len.

Return type

list[list[dict]]
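
A sketch that pads a centered 27-frame window for pose lifting; the dummy results and window size are illustrative:

>>> import numpy as np
>>> from mmpose.apis import extract_pose_sequence
>>> # dummy 2D results: 50 frames, one person per frame
>>> pose_results = [[{'keypoints': np.random.rand(17, 3), 'track_id': 0}]
>>>                 for _ in range(50)]
>>> # non-causal window centered on frame 25; borders are padded
>>> seq = extract_pose_sequence(pose_results, frame_idx=25, causal=False,
>>>                             seq_len=27, step=1)
>>> assert len(seq) == 27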

mmpose.apis.get_track_id(results, results_last, next_id, min_keypoints=3, use_oks=False, tracking_thr=0.3, use_one_euro=False, fps=None, sigmas=None)[source]

Get the track id for each person instance in the current frame.

Parameters
  • results (list[dict]) – The bbox & pose results of the current frame (bbox_result, pose_result).

  • results_last (list[dict], optional) – The bbox & pose & track_id info of the last frame (bbox_result, pose_result, track_id). None is equivalent to an empty result list. Default: None

  • next_id (int) – The track id for the new person instance.

  • min_keypoints (int) – Minimum number of keypoints required to be recognized as a person. 0 means no minimum threshold is required. Default: 3.

  • use_oks (bool) – Flag to use OKS tracking. Default: False.

  • tracking_thr (float) – The threshold for tracking.

  • use_one_euro (bool) – Option to use the one-euro filter. Default: False.

  • fps (optional) – Frame rate of the video, used to set the d_cutoff parameter when the one-euro filter is enabled.

  • sigmas (np.ndarray) – Standard deviation of keypoint labelling. It is necessary for OKS tracking (use_oks==True). The COCO sigmas will be used by default if it is set to None. Default: None.

Returns

  • results (list[dict]): The bbox & pose & track_id info of the current frame (bbox_result, pose_result, track_id).

  • next_id (int): The track id for the new person instance.

Return type

tuple
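
A minimal tracking-loop sketch; per_frame_results is a hypothetical iterable of per-frame pose results such as those produced by the inference functions below:

>>> from mmpose.apis import get_track_id
>>> results_last = []
>>> next_id = 0
>>> for results in per_frame_results:  # hypothetical input
>>>     results, next_id = get_track_id(results, results_last, next_id,
>>>                                     use_oks=False, tracking_thr=0.3)
>>>     results_last = results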

mmpose.apis.inference_bottom_up_pose_model(model, img_or_path, dataset='BottomUpCocoDataset', dataset_info=None, pose_nms_thr=0.9, return_heatmap=False, outputs=None)[source]

Run inference on a single image with a bottom-up pose model.

Note

  • num_people: P

  • num_keypoints: K

  • bbox height: H

  • bbox width: W

Parameters
  • model (nn.Module) – The loaded pose model.

  • img_or_path (str| np.ndarray) – Image filename or loaded image.

  • dataset (str) – Dataset name, e.g. ‘BottomUpCocoDataset’. It is deprecated. Please use dataset_info instead.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • pose_nms_thr (float) – Retain the poses with OKS overlap < pose_nms_thr. Default: 0.9.

  • return_heatmap (bool) – Flag to return the heatmap. Default: False.

  • outputs (list(str) | tuple(str)) – Names of layers whose outputs need to be returned, default: None.

Returns

  • pose_results (list[np.ndarray]): The predicted pose info. The length of the list is the number of people (P). Each item in the list is a ndarray, containing each person's pose (np.ndarray[Kx3]): x, y, score.

  • returned_outputs (list[dict[np.ndarray[N, K, H, W] | torch.Tensor[N, K, H, W]]]): Output feature maps from layers specified in outputs. Includes 'heatmap' if return_heatmap is True.

Return type

tuple
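
A usage sketch; the config, checkpoint and image paths are placeholders:

>>> from mmpose.apis import (init_pose_model,
>>>                          inference_bottom_up_pose_model)
>>> pose_model = init_pose_model('bottom_up_config.py', 'ckpt.pth')
>>> pose_results, returned_outputs = inference_bottom_up_pose_model(
>>>     pose_model, 'demo.jpg', pose_nms_thr=0.9, return_heatmap=False)
>>> # pose_results holds the predicted pose of each person (see Returns)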

mmpose.apis.inference_interhand_3d_model(model, img_or_path, det_results, bbox_thr=None, format='xywh', dataset='InterHand3DDataset')[source]

Run inference on a single image with a list of hand bounding boxes.

Note

  • num_bboxes: N

  • num_keypoints: K

Parameters
  • model (nn.Module) – The loaded pose model.

  • img_or_path (str | np.ndarray) – Image filename or loaded image.

  • det_results (list[dict]) – The 2D bbox sequences stored in a list. Each element of the list is the bbox of one person, whose shape is (ndarray[4 or 5]), containing 4 box coordinates (and score).

  • dataset (str) – Dataset name.

  • format – bbox format (‘xyxy’ | ‘xywh’). Default: ‘xywh’. ‘xyxy’ means (left, top, right, bottom), ‘xywh’ means (left, top, width, height).

Returns

3D pose inference results. Each element is the result of an instance, which contains the predicted 3D keypoints with shape (ndarray[K,3]). If there is no valid instance, an empty list will be returned.

Return type

list[dict]

mmpose.apis.inference_mesh_model(model, img_or_path, det_results, bbox_thr=None, format='xywh', dataset='MeshH36MDataset')[source]

Run inference on a single image with a list of bounding boxes.

Note

  • num_bboxes: N

  • num_keypoints: K

  • num_vertices: V

  • num_faces: F

Parameters
  • model (nn.Module) – The loaded pose model.

  • img_or_path (str | np.ndarray) – Image filename or loaded image.

  • det_results (list[dict]) – The 2D bbox sequences stored in a list. Each element of the list is the bbox of one person. “bbox” (ndarray[4 or 5]): The person bounding box, which contains 4 box coordinates (and score).

  • bbox_thr (float | None) – Threshold for bounding boxes. Only bboxes with higher scores will be fed into the pose detector. If bbox_thr is None, all boxes will be used.

  • format (str) –

    bbox format (‘xyxy’ | ‘xywh’). Default: ‘xywh’.

    • ’xyxy’ means (left, top, right, bottom),

    • ’xywh’ means (left, top, width, height).

  • dataset (str) – Dataset name.

Returns

3D pose inference results. Each element is the result of an instance, which contains:

  • ’bbox’ (ndarray[4]): instance bounding bbox

  • ’center’ (ndarray[2]): bbox center

  • ’scale’ (ndarray[2]): bbox scale

  • ’keypoints_3d’ (ndarray[K,3]): predicted 3D keypoints

  • ’camera’ (ndarray[3]): camera parameters

  • ’vertices’ (ndarray[V, 3]): predicted 3D vertices

  • ’faces’ (ndarray[F, 3]): mesh faces

If there is no valid instance, an empty list will be returned.

Return type

list[dict]

mmpose.apis.inference_pose_lifter_model(model, pose_results_2d, dataset=None, dataset_info=None, with_track_id=True, image_size=None, norm_pose_2d=False)[source]

Infer 3D poses from 2D pose sequences using a pose lifter model.

Parameters
  • model (nn.Module) – The loaded pose lifter model

  • pose_results_2d (list[list[dict]]) –

    The 2D pose sequences stored in a nested list. Each element of the outer list is the 2D pose results of a single frame, and each element of the inner list is the 2D pose of one person, which contains:

    • ”keypoints” (ndarray[K, 2 or 3]): x, y, [score]

    • ”track_id” (int)

  • dataset (str) – Dataset name, e.g. ‘Body3DH36MDataset’

  • with_track_id – If True, the element in pose_results_2d is expected to contain “track_id”, which will be used to gather the pose sequence of a person from multiple frames. Otherwise, the pose results in each frame are expected to have a consistent number and order of identities. Default is True.

  • image_size (tuple|list) – image width, image height. If None, image size will not be contained in dict data.

  • norm_pose_2d (bool) – If True, scale the bbox (along with the 2D pose) to the average bbox scale of the dataset, and move the bbox (along with the 2D pose) to the average bbox center of the dataset.

Returns

3D pose inference results. Each element is the result of an instance, which contains:

  • ”keypoints_3d” (ndarray[K, 3]): predicted 3D keypoints

  • "keypoints" (ndarray[K, 2 or 3]): from the last frame in pose_results_2d.

  • "track_id" (int): from the last frame in pose_results_2d.

If there is no valid instance, an empty list will be returned.

Return type

list[dict]
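
A hedged end-to-end sketch that chains extract_pose_sequence with the lifter; the config/checkpoint paths and the pose_results_2d_all variable (per-frame 2D results carrying track ids) are assumed to exist from earlier steps:

>>> from mmpose.apis import (init_pose_model, extract_pose_sequence,
>>>                          inference_pose_lifter_model)
>>> lift_model = init_pose_model('pose_lifter_config.py', 'lift_ckpt.pth')
>>> pose_seq_2d = extract_pose_sequence(pose_results_2d_all, frame_idx=25,
>>>                                     causal=False, seq_len=27, step=1)
>>> pose_results_3d = inference_pose_lifter_model(
>>>     lift_model, pose_seq_2d, dataset='Body3DH36MDataset',
>>>     with_track_id=True)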

mmpose.apis.inference_top_down_pose_model(model, imgs_or_paths, person_results=None, bbox_thr=None, format='xywh', dataset='TopDownCocoDataset', dataset_info=None, return_heatmap=False, outputs=None)[source]

Run inference on a single image with a list of person bounding boxes. Both single-frame and multi-frame inference settings are supported.

Note

  • num_frames: F

  • num_people: P

  • num_keypoints: K

  • bbox height: H

  • bbox width: W

Parameters
  • model (nn.Module) – The loaded pose model.

  • imgs_or_paths (str | np.ndarray | list(str) | list(np.ndarray)) – Image filename(s) or loaded image(s).

  • person_results (list(dict), optional) –

    A list of detected persons that contains bbox and/or track_id:

    • bbox (4, ) or (5, ): The person bounding box, which contains 4 box coordinates (and score).

    • track_id (int): The unique id for each human instance.

    If not provided, a dummy person result with a bbox covering the entire image will be used. Default: None.

  • bbox_thr (float | None) – Threshold for bounding boxes. Only bboxes with higher scores will be fed into the pose detector. If bbox_thr is None, all boxes will be used.

  • format (str) –

    bbox format (‘xyxy’ | ‘xywh’). Default: ‘xywh’.

    • xyxy means (left, top, right, bottom),

    • xywh means (left, top, width, height).

  • dataset (str) – Dataset name, e.g. ‘TopDownCocoDataset’. It is deprecated. Please use dataset_info instead.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • return_heatmap (bool) – Flag to return the heatmap. Default: False.

  • outputs (list(str) | tuple(str)) – Names of layers whose outputs need to be returned. Default: None.

Returns

  • pose_results (list[dict]): The bbox & pose info. Each item in the list is a dictionary, containing the bbox: (left, top, right, bottom, [score]) and the pose (ndarray[Kx3]): x, y, score.

  • returned_outputs (list[dict[np.ndarray[N, K, H, W] | torch.Tensor[N, K, H, W]]]): Output feature maps from layers specified in outputs. Includes ‘heatmap’ if return_heatmap is True.

Return type

tuple
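
A single-image usage sketch with a hand-written bounding box; the paths and box values are placeholders:

>>> from mmpose.apis import init_pose_model, inference_top_down_pose_model
>>> pose_model = init_pose_model('top_down_config.py', 'ckpt.pth')
>>> person_results = [{'bbox': [50, 50, 200, 300]}]  # xywh, illustrative
>>> pose_results, returned_outputs = inference_top_down_pose_model(
>>>     pose_model, 'demo.jpg', person_results, bbox_thr=0.3,
>>>     format='xywh')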

mmpose.apis.init_pose_model(config, checkpoint=None, device='cuda:0')[source]

Initialize a pose model from a config file.

Parameters
  • config (str or mmcv.Config) – Config file path or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.

Returns

The constructed pose model.

Return type

nn.Module

mmpose.apis.init_random_seed(seed=None, device='cuda')[source]

Initialize random seed.

If the seed is not set, the seed will be automatically randomized, and then broadcast to all processes to prevent some potential bugs.

Parameters
  • seed (int, Optional) – The seed. Default to None.

  • device (str) – The device where the seed will be put on. Default to ‘cuda’.

Returns

The seed to be used.

Return type

int

mmpose.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False)[source]

Test a model with multiple gpus.

This method tests the model with multiple gpus and collects the results under two different modes: gpu and cpu. By setting gpu_collect=True, it encodes results to gpu tensors and uses gpu communication to collect the results. In cpu mode, it saves the results on different gpus to tmpdir and collects them by the rank 0 worker.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (nn.Dataloader) – Pytorch data loader.

  • tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.

  • gpu_collect (bool) – Option to use either gpu or cpu to collect results.

Returns

The prediction results.

Return type

list

mmpose.apis.process_mmdet_results(mmdet_results, cat_id=1)[source]

Process mmdet results, and return a list of bboxes.

Parameters
  • mmdet_results (list|tuple) – mmdet results.

  • cat_id (int) – category id (default: 1 for human)

Returns

A list of detected bounding boxes.

Return type

person_results (list)
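
A hedged sketch of the usual detector-to-pose handoff; init_detector and inference_detector come from MMDetection, and the paths are placeholders:

>>> from mmdet.apis import init_detector, inference_detector
>>> from mmpose.apis import process_mmdet_results
>>> det_model = init_detector('det_config.py', 'det_ckpt.pth')
>>> mmdet_results = inference_detector(det_model, 'demo.jpg')
>>> # keep only the person class (cat_id=1) as pose inputs
>>> person_results = process_mmdet_results(mmdet_results, cat_id=1)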

mmpose.apis.single_gpu_test(model, data_loader)[source]

Test a model with a single gpu.

This method tests the model with a single gpu and displays a test progress bar.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (nn.Dataloader) – Pytorch data loader.

Returns

The prediction results.

Return type

list

mmpose.apis.train_model(model, dataset, cfg, distributed=False, validate=False, timestamp=None, meta=None)[source]

Train model entry function.

Parameters
  • model (nn.Module) – The model to be trained.

  • dataset (Dataset) – Train dataset.

  • cfg (dict) – The config dict for training.

  • distributed (bool) – Whether to use distributed training. Default: False.

  • validate (bool) – Whether to do evaluation. Default: False.

  • timestamp (str | None) – Local time for runner. Default: None.

  • meta (dict | None) – Meta dict to record some important information. Default: None

mmpose.apis.vis_3d_mesh_result(model, result, img=None, show=False, out_file=None)[source]

Visualize the 3D mesh estimation results.

Parameters
  • model (nn.Module) – The loaded model.

  • result (list[dict]) – 3D mesh estimation results.

mmpose.apis.vis_3d_pose_result(model, result, img=None, dataset='Body3DH36MDataset', dataset_info=None, kpt_score_thr=0.3, radius=8, thickness=2, vis_height=400, num_instances=- 1, axis_azimuth=70, show=False, out_file=None)[source]

Visualize the 3D pose estimation results.

Parameters
  • model (nn.Module) – The loaded model.

  • result (list[dict]) –

mmpose.apis.vis_pose_result(model, img, result, radius=4, thickness=1, kpt_score_thr=0.3, bbox_color='green', dataset='TopDownCocoDataset', dataset_info=None, show=False, out_file=None)[source]

Visualize the detection results on the image.

Parameters
  • model (nn.Module) – The loaded detector.

  • img (str | np.ndarray) – Image filename or loaded image.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • kpt_score_thr (float) – The threshold to visualize the keypoints.

  • skeleton (list[tuple]) – Default: None.

  • show (bool) – Whether to show the image. Default: False.

  • out_file (str|None) – The filename of the output visualization image.
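
A sketch that renders the output of a previous inference_top_down_pose_model call; pose_model, pose_results and 'demo.jpg' are assumed from that step:

>>> from mmpose.apis import vis_pose_result
>>> vis_img = vis_pose_result(pose_model, 'demo.jpg', pose_results,
>>>                           kpt_score_thr=0.3, show=False,
>>>                           out_file='vis_demo.jpg')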

mmpose.apis.vis_pose_tracking_result(model, img, result, radius=4, thickness=1, kpt_score_thr=0.3, dataset='TopDownCocoDataset', dataset_info=None, show=False, out_file=None)[source]

Visualize the pose tracking results on the image.

Parameters
  • model (nn.Module) – The loaded detector.

  • img (str | np.ndarray) – Image filename or loaded image.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • kpt_score_thr (float) – The threshold to visualize the keypoints.

  • skeleton (list[tuple]) – Default: None.

  • show (bool) – Whether to show the image. Default: False.

  • out_file (str|None) – The filename of the output visualization image.

mmpose.apis.webcam

MMPose Webcam API: Tools to build simple interactive webcam applications and demos

Executor

WebcamExecutor

The interface to build and execute webcam applications from configs.

Nodes

Base Nodes

Node

Base class for node, which is the interface of basic function module.

BaseVisualizerNode

Base class for nodes whose function is to create visual effects, like visualizing model predictions, showing graphics or showing text messages.

Model Nodes

DetectorNode

Detect objects from the frame image using MMDetection model.

TopDownPoseEstimatorNode

Perform top-down pose estimation using MMPose model.

PoseTrackerNode

Perform object detection and top-down pose estimation.

Visualizer Nodes

ObjectVisualizerNode

Visualize the bounding box and keypoints of objects.

NoticeBoardNode

Show text messages in the frame.

SunglassesEffectNode

Apply a sunglasses effect (draw sunglasses on the facial area) to the objects with eye keypoints in the frame.

BigeyeEffectNode

Apply big-eye effect to the objects with eye keypoints in the frame.

Helper Nodes

ObjectAssignerNode

Assign the object information to the frame message.

MonitorNode

Show diagnostic information.

RecorderNode

Record the video frames into a local file.

Utils

Buffer and Message

BufferManager

A helper class to manage multiple buffers.

Message

Message base class.

FrameMessage

The message to store information of a video frame.

VideoEndingMessage

The special message to indicate the ending of the input video.

Pose

get_eye_keypoint_ids

A helper function to get the keypoint indices of left and right eyes from the model config.

get_face_keypoint_ids

A helper function to get the keypoint indices of the face from the model config.

get_hand_keypoint_ids

A helper function to get the keypoint indices of left and right hand from the model config.

get_mouth_keypoint_ids

A helper function to get the mouth keypoint index from the model config.

get_wrist_keypoint_ids

A helper function to get the keypoint indices of left and right wrists from the model config.

Event

EventManager

A helper class to manage events.

Misc

copy_and_paste

Copy the image region and paste to the background.

screen_matting

Get screen matting mask.

expand_and_clamp

Expand the bbox and clip it to fit the image shape.

limit_max_fps

A context manager to limit the maximum frequency of entering the context.

is_image_file

Check if a path is an image file by its extension.

get_cached_file_path

Loads the Torch serialized object at the given URL.

load_image_from_disk_or_url

Load an image file from disk or URL.

get_config_path

Get config path from an OpenMMLab codebase.

mmpose.core

evaluation

class mmpose.core.evaluation.DistEvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best=None, rule=None, test_fn=None, greater_keys=['acc', 'ap', 'ar', 'pck', 'auc', '3dpck', 'p-3dpck', '3dauc', 'p-3dauc', 'pcp'], less_keys=['loss', 'epe', 'nme', 'mpjpe', 'p-mpjpe', 'n-mpjpe'], broadcast_bn_buffer=True, tmpdir=None, gpu_collect=False, **eval_kwargs)[source]
class mmpose.core.evaluation.EvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best=None, rule=None, test_fn=None, greater_keys=['acc', 'ap', 'ar', 'pck', 'auc', '3dpck', 'p-3dpck', '3dauc', 'p-3dauc', 'pcp'], less_keys=['loss', 'epe', 'nme', 'mpjpe', 'p-mpjpe', 'n-mpjpe'], **eval_kwargs)[source]
mmpose.core.evaluation.aggregate_scale(feature_maps_list, align_corners=False, project2image=True, size_projected=None, aggregate_scale='average')[source]

Aggregate multi-scale outputs.

Note

  • batch size: N

  • keypoints num: K

  • heatmap width: W

  • heatmap height: H

Parameters
  • feature_maps_list (list[Tensor]) – Aggregated feature maps.

  • project2image (bool) – Option to resize to base scale.

  • size_projected (list[int, int]) – Base size of heatmaps [w, h].

  • align_corners (bool) – Align corners when performing interpolation.

  • aggregate_scale (str) –

    Methods to aggregate multi-scale feature maps. Options: 'average', 'unsqueeze_concat'. Default: 'average'.

    • 'average': Get the average of the feature maps.

    • 'unsqueeze_concat': Concatenate the feature maps along a new axis.

Returns

Aggregated feature maps.

Return type

Tensor

mmpose.core.evaluation.aggregate_stage_flip(feature_maps, feature_maps_flip, index=- 1, project2image=True, size_projected=None, align_corners=False, aggregate_stage='concat', aggregate_flip='average')[source]

Run the model to get multi-stage outputs (heatmaps & tags), and resize them to base sizes.

Parameters
  • feature_maps (list[Tensor]) – feature_maps can be heatmaps, tags, and pafs.

  • feature_maps_flip (list[Tensor] | None) – flipped feature_maps. feature maps can be heatmaps, tags, and pafs.

  • project2image (bool) – Option to resize to base scale.

  • size_projected (list[int, int]) – Base size of heatmaps [w, h].

  • align_corners (bool) – Align corners when performing interpolation.

  • aggregate_stage (str) –

    Methods to aggregate multi-stage feature maps. Options: 'concat', 'average'. Default: 'concat'.

    • 'concat': Concatenate the original and the flipped feature maps.

    • 'average': Get the average of the original and the flipped feature maps.

  • aggregate_flip (str) –

    Methods to aggregate the original and the flipped feature maps. Options: 'concat', 'average', 'none'. Default: 'average'.

    • 'concat': Concatenate the original and the flipped feature maps.

    • 'average': Get the average of the original and the flipped feature maps.

    • 'none': Do not use the flipped feature maps.

Returns

Aggregated feature maps with shape [NxKxWxH].

Return type

list[Tensor]

mmpose.core.evaluation.compute_similarity_transform(source_points, target_points)[source]

Computes a similarity transform (sR, t) that maps a set of 3D points source_points (N x 3) closest to a set of 3D points target_points, where R is a 3x3 rotation matrix, t is a 3x1 translation vector, and s is a scale factor, and returns the transformed 3D points source_points_hat (N x 3), i.e. solves the orthogonal Procrustes problem.

Note

Points number: N

Parameters
  • source_points (np.ndarray) – Source point set with shape [N, 3].

  • target_points (np.ndarray) – Target point set with shape [N, 3].

Returns

Transformed source point set with shape [N, 3].

Return type

np.ndarray
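
A minimal sketch with random point sets; the shapes follow the documented [N, 3] convention:

>>> import numpy as np
>>> from mmpose.core.evaluation import compute_similarity_transform
>>> source_points = np.random.rand(17, 3)
>>> target_points = np.random.rand(17, 3)
>>> aligned = compute_similarity_transform(source_points, target_points)
>>> assert aligned.shape == (17, 3)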

mmpose.core.evaluation.flip_feature_maps(feature_maps, flip_index=None)[source]

Flip the feature maps and swap the channels.

Parameters
  • feature_maps (list[Tensor]) – Feature maps.

  • flip_index (list[int] | None) – Channel-flip indexes. If None, do not flip channels.

Returns

Flipped feature maps.

Return type

list[Tensor]

mmpose.core.evaluation.get_group_preds(grouped_joints, center, scale, heatmap_size, use_udp=False)[source]

Transform the grouped joints back to the image.

Parameters
  • grouped_joints (list) – Grouped person joints.

  • center (np.ndarray[2, ]) – Center of the bounding box (x, y).

  • scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].

  • heatmap_size (np.ndarray[2, ]) – Size of the destination heatmaps.

  • use_udp (bool) – Unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR’2020).

Returns

List of the pose results for each person.

Return type

list

mmpose.core.evaluation.keypoint_3d_auc(pred, gt, mask, alignment='none')[source]

Calculate the Area Under the Curve (3DAUC) computed for a range of 3DPCK thresholds.

Paper ref: Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision, 3DV'2017. This implementation is derived from mpii_compute_3d_pck.m, which is provided as part of the MPI-INF-3DHP test data release.

Note

  • batch_size: N

  • num_keypoints: K

  • keypoint_dims: C

Parameters
  • pred (np.ndarray[N, K, C]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, C]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • alignment (str, optional) –

    Method to align the prediction with the groundtruth. Supported options are:

    • 'none': no alignment will be applied

    • 'scale': align in the least-square sense in scale

    • 'procrustes': align in the least-square sense in scale, rotation and translation.

Returns

AUC computed for a range of 3DPCK thresholds.

Return type

auc

mmpose.core.evaluation.keypoint_3d_pck(pred, gt, mask, alignment='none', threshold=0.15)[source]

Calculate the Percentage of Correct Keypoints (3DPCK) with or without rigid alignment.

Paper ref: Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision, 3DV'2017.

Note

  • batch_size: N

  • num_keypoints: K

  • keypoint_dims: C

Parameters
  • pred (np.ndarray[N, K, C]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, C]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • alignment (str, optional) –

    Method to align the prediction with the groundtruth. Supported options are:

    • 'none': no alignment will be applied

    • 'scale': align in the least-square sense in scale

    • 'procrustes': align in the least-square sense in scale, rotation and translation.

  • threshold – If the L2 distance between the prediction and the groundtruth is less than the threshold, the predicted result is considered correct. Default: 0.15 (m).

Returns

Percentage of correct keypoints.

Return type

pck

mmpose.core.evaluation.keypoint_auc(pred, gt, mask, normalize, num_step=20)[source]

Calculate the Area Under the Curve (AUC) of keypoint PCK accuracy over a range of thresholds.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • pred (np.ndarray[N, K, 2]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • normalize (float) – Normalization factor.

Returns

Area under curve.

Return type

float

mmpose.core.evaluation.keypoint_epe(pred, gt, mask)[source]

Calculate the end-point error.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • pred (np.ndarray[N, K, 2]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

Returns

Average end-point error.

Return type

float

mmpose.core.evaluation.keypoint_mpjpe(pred, gt, mask, alignment='none')[source]

Calculate the mean per-joint position error (MPJPE) and the error after rigid alignment with the ground truth (P-MPJPE).

Note

  • batch_size: N

  • num_keypoints: K

  • keypoint_dims: C

Parameters
  • pred (np.ndarray) – Predicted keypoint location with shape [N, K, C].

  • gt (np.ndarray) – Groundtruth keypoint location with shape [N, K, C].

  • mask (np.ndarray) – Visibility of the target with shape [N, K]. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • alignment (str, optional) –

    Method to align the prediction with the groundtruth. Supported options are:

    • 'none': no alignment will be applied

    • 'scale': align in the least-square sense in scale

    • 'procrustes': align in the least-square sense in scale, rotation and translation.

Returns

A tuple containing joint position errors

  • (float | np.ndarray): mean per-joint position error (mpjpe).

  • (float | np.ndarray): mpjpe after rigid alignment with the ground truth (p-mpjpe).

Return type

tuple
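
A hedged sketch on random data; the array shapes follow the documented [N, K, C] convention:

>>> import numpy as np
>>> from mmpose.core.evaluation import keypoint_mpjpe
>>> pred = np.random.rand(8, 17, 3)
>>> gt = np.random.rand(8, 17, 3)
>>> mask = np.ones((8, 17), dtype=bool)
>>> error = keypoint_mpjpe(pred, gt, mask, alignment='none')
>>> p_error = keypoint_mpjpe(pred, gt, mask, alignment='procrustes')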

mmpose.core.evaluation.keypoint_pck_accuracy(pred, gt, mask, thr, normalize)[source]

Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints for coordinates.

Note

PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

  • batch_size: N

  • num_keypoints: K

Parameters
  • pred (np.ndarray[N, K, 2]) – Predicted keypoint location.

  • gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • thr (float) – Threshold of PCK calculation.

  • normalize (np.ndarray[N, 2]) – Normalization factor for H&W.

Returns

A tuple containing keypoint accuracy.

  • acc (np.ndarray[K]): Accuracy of each keypoint.

  • avg_acc (float): Averaged accuracy across all keypoints.

  • cnt (int): Number of valid keypoints.

Return type

tuple
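
A minimal sketch on random data, normalizing by a per-sample factor (here simply ones; in practice, e.g., bbox sizes):

>>> import numpy as np
>>> from mmpose.core.evaluation import keypoint_pck_accuracy
>>> pred = np.random.rand(4, 17, 2)
>>> gt = np.random.rand(4, 17, 2)
>>> mask = np.ones((4, 17), dtype=bool)
>>> normalize = np.ones((4, 2))  # per-sample H&W normalization factors
>>> acc, avg_acc, cnt = keypoint_pck_accuracy(pred, gt, mask, thr=0.2,
>>>                                           normalize=normalize)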

mmpose.core.evaluation.keypoints_from_heatmaps(heatmaps, center, scale, unbiased=False, post_process='default', kernel=11, valid_radius_factor=0.0546875, use_udp=False, target_type='GaussianHeatmap')[source]

Get final keypoint predictions from heatmaps and transform them back to the image.

Note

  • batch size: N

  • num keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • heatmaps (np.ndarray[N, K, H, W]) – model predicted heatmaps.

  • center (np.ndarray[N, 2]) – Center of the bounding box (x, y).

  • scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.

  • post_process (str/None) – Choice of methods to post-process heatmaps. Currently supported: None, ‘default’, ‘unbiased’, ‘megvii’.

  • unbiased (bool) – Option to use unbiased decoding. Mutually exclusive with megvii. Note: this arg is deprecated and unbiased=True can be replaced by post_process=’unbiased’ Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).

  • kernel (int) – Gaussian kernel size (K) for modulation, which should match the heatmap gaussian sigma when training. K=17 for sigma=3 and K=11 for sigma=2.

  • valid_radius_factor (float) – The radius factor of the positive area in classification heatmap for UDP.

  • use_udp (bool) – Use unbiased data processing.

  • target_type (str) – ‘GaussianHeatmap’ or ‘CombinedTarget’. GaussianHeatmap: Classification target with gaussian distribution. CombinedTarget: The combination of classification target (response map) and regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

Returns

A tuple containing keypoint predictions and scores.

  • preds (np.ndarray[N, K, 2]): Predicted keypoint location in images.

  • maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.

Return type

tuple
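
A decoding sketch with dummy heatmaps; the heatmap size, center and scale values are illustrative:

>>> import numpy as np
>>> from mmpose.core.evaluation import keypoints_from_heatmaps
>>> heatmaps = np.random.rand(1, 17, 64, 48).astype(np.float32)
>>> center = np.array([[128., 128.]])
>>> scale = np.array([[1., 1.]])
>>> preds, maxvals = keypoints_from_heatmaps(heatmaps, center, scale,
>>>                                          post_process='default')
>>> assert preds.shape == (1, 17, 2) and maxvals.shape == (1, 17, 1)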

mmpose.core.evaluation.keypoints_from_heatmaps3d(heatmaps, center, scale)[source]

Get final keypoint predictions from 3d heatmaps and transform them back to the image.

Note

  • batch size: N

  • num keypoints: K

  • heatmap depth size: D

  • heatmap height: H

  • heatmap width: W

Parameters
  • heatmaps (np.ndarray[N, K, D, H, W]) – model predicted heatmaps.

  • center (np.ndarray[N, 2]) – Center of the bounding box (x, y).

  • scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.

Returns

A tuple containing keypoint predictions and scores.

  • preds (np.ndarray[N, K, 3]): Predicted 3d keypoint location in images.

  • maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.

Return type

tuple

mmpose.core.evaluation.keypoints_from_regression(regression_preds, center, scale, img_size)[source]

Get final keypoint predictions from regression vectors and transform them back to the image.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • regression_preds (np.ndarray[N, K, 2]) – model prediction.

  • center (np.ndarray[N, 2]) – Center of the bounding box (x, y).

  • scale (np.ndarray[N, 2]) – Scale of the bounding box wrt height/width.

  • img_size (list(img_width, img_height)) – model input image size.

Returns

  • preds (np.ndarray[N, K, 2]): Predicted keypoint location in images.

  • maxvals (np.ndarray[N, K, 1]): Scores (confidence) of the keypoints.

Return type

tuple

mmpose.core.evaluation.multilabel_classification_accuracy(pred, gt, mask, thr=0.5)[source]

Get multi-label classification accuracy.

Note

  • batch size: N

  • label number: L

Parameters
  • pred (np.ndarray[N, L, 2]) – model predicted labels.

  • gt (np.ndarray[N, L, 2]) – ground-truth labels.

  • mask (np.ndarray[N, 1] or np.ndarray[N, L]) – reliability of ground-truth labels.

Returns

Multi-label classification accuracy.

Return type

float

mmpose.core.evaluation.pose_pck_accuracy(output, target, mask, thr=0.05, normalize=None)[source]

Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from heatmaps.

Note

PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • output (np.ndarray[N, K, H, W]) – Model output heatmaps.

  • target (np.ndarray[N, K, H, W]) – Groundtruth heatmaps.

  • mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

  • thr (float) – Threshold of PCK calculation. Default 0.05.

  • normalize (np.ndarray[N, 2]) – Normalization factor for H&W.

Returns

A tuple containing keypoint accuracy.

  • np.ndarray[K]: Accuracy of each keypoint.

  • float: Averaged accuracy across all keypoints.

  • int: Number of valid keypoints.

Return type

tuple

mmpose.core.evaluation.post_dark_udp(coords, batch_heatmaps, kernel=3)[source]

DARK post-processing. Implemented by UDP. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020). Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).

Note

  • batch size: B

  • num keypoints: K

  • num persons: N

  • height of heatmaps: H

  • width of heatmaps: W

B=1 for bottom_up paradigm where all persons share the same heatmap. B=N for top_down paradigm where each person has its own heatmaps.

Parameters
  • coords (np.ndarray[N, K, 2]) – Initial coordinates of human pose.

  • batch_heatmaps (np.ndarray[B, K, H, W]) – batch_heatmaps

  • kernel (int) – Gaussian kernel size (K) for modulation.

Returns

Refined coordinates.

Return type

np.ndarray([N, K, 2])

mmpose.core.evaluation.split_ae_outputs(outputs, num_joints, with_heatmaps, with_ae, select_output_index)[source]

Split multi-stage outputs into heatmaps & tags.

Parameters
  • outputs (list(Tensor)) – Outputs of network

  • num_joints (int) – Number of joints

  • with_heatmaps (list[bool]) – Option to output heatmaps for different stages.

  • with_ae (list[bool]) – Option to output ae tags for different stages.

  • select_output_index (list[int]) – Indices of the outputs to keep.

Returns

A tuple containing multi-stage outputs.

  • list[Tensor]: multi-stage heatmaps.

  • list[Tensor]: multi-stage tags.

Return type

tuple

fp16

class mmpose.core.fp16.Fp16OptimizerHook(grad_clip=None, coalesce=True, bucket_size_mb=- 1, loss_scale=512.0, distributed=True)[source]

FP16 optimizer hook.

The steps of the fp16 optimizer are as follows. 1. Scale the loss value. 2. Backpropagate in the fp16 model. 3. Copy gradients from the fp16 model to the fp32 weights. 4. Update the fp32 weights. 5. Copy the updated parameters from the fp32 weights to the fp16 model.

Refer to https://arxiv.org/abs/1710.03740 for more details.

Parameters

loss_scale (float) – Scale factor multiplied with loss.

after_train_iter(runner)[source]

Backward optimization steps for Mixed Precision Training.

  1. Scale the loss by a scale factor.

  2. Backward the loss to obtain the gradients (fp16).

  3. Copy gradients from the model to the fp32 weight copy.

  4. Scale the gradients back and update the fp32 weight copy.

  5. Copy back the params from fp32 weight copy to the fp16 model.

Parameters

runner (mmcv.Runner) – The underlying training runner.

before_run(runner)[source]

Preparing steps before Mixed Precision Training.

  1. Make a master copy of fp32 weights for optimization.

  2. Convert the main model from fp32 to fp16.

Parameters

runner (mmcv.Runner) – The underlying training runner.

static copy_grads_to_fp32(fp16_net, fp32_weights)[source]

Copy gradients from the fp16 model to the fp32 weight copy.

static copy_params_to_fp16(fp16_net, fp32_weights)[source]

Copy updated params from the fp32 weight copy to the fp16 model.

mmpose.core.fp16.auto_fp16(apply_to=None, out_fp32=False)[source]

Decorator to enable fp16 training automatically.

This decorator is useful when you write custom modules and want to support mixed precision training. If inputs arguments are fp32 tensors, they will be converted to fp16 automatically. Arguments other than fp32 tensors are ignored.

Parameters
  • apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.

  • out_fp32 (bool) – Whether to convert the output back to fp32.

Examples

>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp16
>>>     @auto_fp16()
>>>     def forward(self, x, y):
>>>         pass
>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp16
>>>     @auto_fp16(apply_to=('pred', ))
>>>     def do_something(self, pred, others):
>>>         pass
mmpose.core.fp16.cast_tensor_type(inputs, src_type, dst_type)[source]

Recursively convert Tensor in inputs from src_type to dst_type.

Parameters
  • inputs – Inputs to be cast.

  • src_type (torch.dtype) – Source type.

  • dst_type (torch.dtype) – Destination type.

Returns

The same type as inputs, but all contained Tensors have been cast.

mmpose.core.fp16.force_fp32(apply_to=None, out_fp16=False)[source]

Decorator to convert input arguments to fp32 in force.

This decorator is useful when you write custom modules and want to support mixed precision training. If there are some inputs that must be processed in fp32 mode, then this decorator can handle it. If inputs arguments are fp16 tensors, they will be converted to fp32 automatically. Arguments other than fp16 tensors are ignored.

Parameters
  • apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.

  • out_fp16 (bool) – Whether to convert the output back to fp16.

Examples

>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp32
>>>     @force_fp32()
>>>     def loss(self, x, y):
>>>         pass
>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp32
>>>     @force_fp32(apply_to=('pred', ))
>>>     def post_process(self, pred, others):
>>>         pass
mmpose.core.fp16.wrap_fp16_model(model)[source]

Wrap the FP32 model to FP16.

  1. Convert FP32 model to FP16.

  2. Keep some necessary layers in FP32, e.g., normalization layers.

Parameters

model (nn.Module) – Model in FP32.

utils

class mmpose.core.utils.ModelSetEpochHook[source]

The hook that tells the model the current epoch during training.

class mmpose.core.utils.WeightNormClipHook(max_norm=1.0, module_param_names='weight')[source]

Apply weight norm clip regularization.

The module's parameters will be clipped to a given maximum norm before each forward pass.

Parameters
  • max_norm (float) – The maximum norm of the parameter.

  • module_param_names (str|list) – The parameter name (or name list) to apply weight norm clip.

hook(module, _input)[source]

Hook function.

property hook_type

Hook type. Subclasses should overwrite this property to return a string value in {forward, forward_pre, backward}.

mmpose.core.utils.allreduce_grads(params, coalesce=True, bucket_size_mb=- 1)[source]

Allreduce gradients.

Parameters
  • params (list[torch.Parameters]) – List of parameters of a model

  • coalesce (bool, optional) – Whether allreduce parameters as a whole. Default: True.

  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Default: -1.

mmpose.core.utils.sync_random_seed(seed=None, device='cuda')[source]

Make sure different ranks share the same seed.

All workers must call this function, otherwise it will deadlock. This method is generally used in DistributedSampler, because the seed should be identical across all processes in the distributed group. In distributed sampling, different ranks should sample non-overlapped data from the dataset. Therefore, this function is used to make sure that each rank shuffles the data indices in the same order based on the same seed. Then different ranks could use different indices to select non-overlapped data from the same data list.

Parameters
  • seed (int, Optional) – The seed. Default to None.

  • device (str) – The device where the seed will be put on. Default to 'cuda'.

Returns

Seed to be used.

Return type

int

post_processing

class mmpose.core.post_processing.Smoother(filter_cfg: Union[Dict, str], keypoint_dim: int = 2, keypoint_key: str = 'keypoints')[source]

Smoother to apply temporal smoothing on pose estimation results with a filter.

Note

  • T: The temporal length of the pose sequence

  • K: The keypoint number of each target

  • C: The keypoint coordinate dimension

Parameters
  • filter_cfg (dict | str) – The filter config. See example config files in configs/_base_/filters/ for details. Alternatively a config file path can be accepted and the config will be loaded.

  • keypoint_dim (int) – The keypoint coordinate dimension, which is also indicated as C. Default: 2

  • keypoint_key (str) – The dict key of the keypoints in the pose results. Default: ‘keypoints’

Examples

>>> import numpy as np
>>> # Build dummy pose result
>>> results = []
>>> for t in range(10):
>>>     results_t = []
>>>     for track_id in range(2):
>>>         result = {
>>>             'track_id': track_id,
>>>             'keypoints': np.random.rand(17, 3)
>>>         }
>>>         results_t.append(result)
>>>     results.append(results_t)
>>> # Example 1: Smooth multi-frame pose results offline.
>>> filter_cfg = dict(type='GaussianFilter', window_size=3)
>>> smoother = Smoother(filter_cfg, keypoint_dim=2)
>>> smoothed_results = smoother.smooth(results)
>>> # Example 2: Smooth pose results online frame-by-frame
>>> filter_cfg = dict(type='GaussianFilter', window_size=3)
>>> smoother = Smoother(filter_cfg, keypoint_dim=2)
>>> for result_t in results:
>>>     smoothed_result_t = smoother.smooth(result_t)
smooth(results)[source]

Apply temporal smoothing on pose estimation sequences.

Parameters

results (list[dict] | list[list[dict]]) –

The pose results of a single frame (non-nested list) or multiple frames (nested list). The result of each target is a dict, which should contain:

  • track_id (optional, Any): The track ID of the target

  • keypoints (np.ndarray): The keypoint coordinates in [K, C]

Returns

Temporally smoothed pose results, which have the same data structure as the input.

Return type

(list[dict] | list[list[dict]])

mmpose.core.post_processing.affine_transform(pt, trans_mat)[source]

Apply an affine transformation to the points.

Parameters
  • pt (np.ndarray) – a 2 dimensional point to be transformed

  • trans_mat (np.ndarray) – 2x3 matrix of an affine transform

Returns

Transformed points.

Return type

np.ndarray

mmpose.core.post_processing.flip_back(output_flipped, flip_pairs, target_type='GaussianHeatmap')[source]

Flip the flipped heatmaps back to the original form.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • output_flipped (np.ndarray[N, K, H, W]) – The output heatmaps obtained from the flipped images.

  • flip_pairs (list[tuple]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).

  • target_type (str) – GaussianHeatmap or CombinedTarget

Returns

Heatmaps that are flipped back to the original image.

Return type

np.ndarray

mmpose.core.post_processing.fliplr_joints(joints_3d, joints_3d_visible, img_width, flip_pairs)[source]

Flip human joints horizontally.

Note

  • num_keypoints: K

Parameters
  • joints_3d (np.ndarray([K, 3])) – Coordinates of keypoints.

  • joints_3d_visible (np.ndarray([K, 1])) – Visibility of keypoints.

  • img_width (int) – Image width.

  • flip_pairs (list[tuple]) – Pairs of keypoints which are mirrored (for example, left ear and right ear).

Returns

Flipped human joints.

  • joints_3d_flipped (np.ndarray([K, 3])): Flipped joints.

  • joints_3d_visible_flipped (np.ndarray([K, 1])): Joint visibility.

Return type

tuple

mmpose.core.post_processing.fliplr_regression(regression, flip_pairs, center_mode='static', center_x=0.5, center_index=0)[source]

Flip human joints horizontally.

Note

  • batch_size: N

  • num_keypoint: K

Parameters
  • regression (np.ndarray([..., K, C])) –

    Coordinates of keypoints, where K is the joint number and C is the dimension. Example shapes are:

    • [N, K, C]: a batch of keypoints where N is the batch size.

    • [N, T, K, C]: a batch of pose sequences, where T is the frame number.

  • flip_pairs (list[tuple]) – Pairs of keypoints which are mirrored (for example, left ear – right ear).

  • center_mode (str) –

    The mode to set the center location on the x-axis to flip around. Options are:

    • static: use a static x value (see center_x also)

    • root: use a root joint (see center_index also)

  • center_x (float) – Set the x-axis location of the flip center. Only used when center_mode=static.

  • center_index (int) – Set the index of the root joint, whose x location will be used as the flip center. Only used when center_mode=root.

Returns

Flipped joints.

Return type

np.ndarray([…, K, C])

mmpose.core.post_processing.get_affine_transform(center, scale, rot, output_size, shift=(0.0, 0.0), inv=False)[source]

Get the affine transform matrix, given the center/scale/rot/output_size.

Parameters
  • center (np.ndarray[2, ]) – Center of the bounding box (x, y).

  • scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].

  • rot (float) – Rotation angle (degree).

  • output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.

  • shift (0-100%) – Shift translation ratio wrt the width/height. Default (0., 0.).

  • inv (bool) – Option to inverse the affine transform direction. (inv=False: src->dst or inv=True: dst->src)

Returns

The transform matrix.

Return type

np.ndarray
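
A sketch that builds a crop transform and maps a point through it; the center/scale values are illustrative, and the scale convention follows the dataset preprocessing:

>>> import numpy as np
>>> from mmpose.core.post_processing import (affine_transform,
>>>                                          get_affine_transform)
>>> center = np.array([100., 100.])
>>> scale = np.array([1., 1.])
>>> trans = get_affine_transform(center, scale, rot=0.,
>>>                              output_size=np.array([192, 256]))
>>> # the bbox center maps to the center of the output crop
>>> pt = affine_transform(np.array([100., 100.]), trans)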

mmpose.core.post_processing.get_warp_matrix(theta, size_input, size_dst, size_target)[source]

Calculate the transformation matrix under the unbiased constraint. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

Parameters
  • theta (float) – Rotation angle in degrees.

  • size_input (np.ndarray) – Size of input image [w, h].

  • size_dst (np.ndarray) – Size of output image [w, h].

  • size_target (np.ndarray) – Size of ROI in input plane [w, h].

Returns

A matrix for transformation.

Return type

np.ndarray

mmpose.core.post_processing.nearby_joints_nms(kpts_db, dist_thr, num_nearby_joints_thr=None, score_per_joint=False, max_dets=- 1)[source]

Nearby joints NMS implementation.

Parameters
  • kpts_db (list[dict]) – keypoints and scores.

  • dist_thr (float) – threshold for judging whether two joints are close.

  • num_nearby_joints_thr (int) – threshold for judging whether two instances are close.

  • max_dets (int) – max number of detections to keep.

  • score_per_joint (bool) – the input scores (in kpts_db) are per joint scores.

Returns

Indexes to keep.

Return type

np.ndarray

mmpose.core.post_processing.oks_iou(g, d, a_g, a_d, sigmas=None, vis_thr=None)[source]

Calculate OKS IoUs.

Parameters
  • g – Ground truth keypoints.

  • d – Detected keypoints.

  • a_g – Area of the ground truth object.

  • a_d – Area of the detected object.

  • sigmas – standard deviation of keypoint labelling.

  • vis_thr – threshold of the keypoint visibility.

Returns

The OKS IoUs.

Return type

list

mmpose.core.post_processing.oks_nms(kpts_db, thr, sigmas=None, vis_thr=None, score_per_joint=False)[source]

OKS NMS implementation.

Parameters
  • kpts_db – keypoints.

  • thr – Retain overlap < thr.

  • sigmas – standard deviation of keypoint labelling.

  • vis_thr – threshold of the keypoint visibility.

  • score_per_joint – the input scores (in kpts_db) are per joint scores

Returns

Indexes to keep.

Return type

np.ndarray
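
A hedged sketch of suppressing duplicate multi-person predictions; each entry is assumed to carry 'keypoints', 'score' and 'area' fields, and the values below are random:

>>> import numpy as np
>>> from mmpose.core.post_processing import oks_nms
>>> kpts_db = [
>>>     dict(keypoints=np.random.rand(17, 3), score=0.9, area=10000.),
>>>     dict(keypoints=np.random.rand(17, 3), score=0.8, area=9500.),
>>> ]
>>> keep = oks_nms(kpts_db, thr=0.9)  # indexes of retained instances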

mmpose.core.post_processing.rotate_point(pt, angle_rad)[source]

Rotate a point by an angle.

Parameters
  • pt (list[float]) – 2 dimensional point to be rotated

  • angle_rad (float) – rotation angle in radians

Returns

Rotated point.

Return type

list[float]

mmpose.core.post_processing.soft_oks_nms(kpts_db, thr, max_dets=20, sigmas=None, vis_thr=None, score_per_joint=False)[source]

Soft OKS NMS implementation.

Parameters
  • kpts_db – keypoints and scores.

  • thr – retain oks overlap < thr.

  • max_dets – max number of detections to keep.

  • sigmas – Keypoint labelling uncertainty.

  • score_per_joint – the input scores (in kpts_db) are per joint scores

Returns

Indexes to keep.

Return type

np.ndarray

mmpose.core.post_processing.transform_preds(coords, center, scale, output_size, use_udp=False)[source]

Get final keypoint predictions from heatmaps and apply scaling and translation to map them back to the image.

Note

num_keypoints: K

Parameters
  • coords (np.ndarray[K, ndims]) –

    • If ndims=2, coords are predicted keypoint locations.

    • If ndims=4, coords are composed of (x, y, scores, tags).

    • If ndims=5, coords are composed of (x, y, scores, tags, flipped_tags).

  • center (np.ndarray[2, ]) – Center of the bounding box (x, y).

  • scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].

  • output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.

  • use_udp (bool) – Use unbiased data processing

Returns

Predicted coordinates in the images.

Return type

np.ndarray

mmpose.core.post_processing.warp_affine_joints(joints, mat)[source]

Apply the affine transformation defined by the transform matrix to the joints.

Parameters
  • joints (np.ndarray[..., 2]) – Origin coordinate of joints.

  • mat (np.ndarray[3, 2]) – The affine matrix.

Returns

Resulting coordinates of the joints.

Return type

np.ndarray[…, 2]

mmpose.models

backbones

class mmpose.models.backbones.AlexNet(num_classes=- 1)[source]

AlexNet backbone.

The input for AlexNet is a 224x224 RGB image.

Parameters

num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmpose.models.backbones.CPM(in_channels, out_channels, feat_channels=128, middle_channels=32, num_stages=6, norm_cfg={'requires_grad': True, 'type': 'BN'})[source]

CPM backbone.

Convolutional Pose Machines. More details can be found in the paper.

Parameters
  • in_channels (int) – The input channels of the CPM.

  • out_channels (int) – The output channels of the CPM.

  • feat_channels (int) – Feature channel of each CPM stage.

  • middle_channels (int) – Feature channel of conv after the middle stage.

  • num_stages (int) – Number of stages.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

Examples

>>> from mmpose.models import CPM
>>> import torch
>>> self = CPM(3, 17)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 368, 368)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
forward(x)[source]

Model forward function.

init_weights(pretrained=None)[source]

Initialize the weights in the backbone.

Parameters

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

class mmpose.models.backbones.HRFormer(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, transformer_norm_cfg={'eps': 1e-06, 'type': 'LN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=- 1)[source]

HRFormer backbone.

This backbone is the implementation of HRFormer: High-Resolution Transformer for Dense Prediction.

Parameters
  • extra (dict) –

    Detailed configuration for each stage of HRNet. There must be 4 stages, the configuration for each stage must have 5 keys:

    • num_modules (int): The number of HRModule in this stage.

    • num_branches (int): The number of branches in the HRModule.

    • block (str): The type of block.

    • num_blocks (tuple): The number of blocks in each branch. The length must be equal to num_branches.

    • num_channels (tuple): The number of channels in each branch. The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Normally 3.

  • conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.

  • norm_cfg (dict) – Config of norm layer. Use SyncBN by default.

  • transformer_norm_cfg (dict) – Config of transformer norm layer. Use LN by default.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

Examples

>>> from mmpose.models import HRFormer
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(2, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='HRFORMER',
>>>         window_sizes=(7, 7),
>>>         num_heads=(1, 2),
>>>         mlp_ratios=(4, 4),
>>>         num_blocks=(2, 2),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='HRFORMER',
>>>         window_sizes=(7, 7, 7),
>>>         num_heads=(1, 2, 4),
>>>         mlp_ratios=(4, 4, 4),
>>>         num_blocks=(2, 2, 2),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=2,
>>>         num_branches=4,
>>>         block='HRFORMER',
>>>         window_sizes=(7, 7, 7, 7),
>>>         num_heads=(1, 2, 4, 8),
>>>         mlp_ratios=(4, 4, 4, 4),
>>>         num_blocks=(2, 2, 2, 2),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRFormer(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
class mmpose.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=- 1)[source]

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions

Parameters
  • extra (dict) – detailed configuration for each stage of HRNet.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – dictionary to construct and config conv layer.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

Examples

>>> from mmpose.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
forward(x)[source]

Forward function.

init_weights(pretrained=None)[source]

Initialize the weights in the backbone.

Parameters

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode.

class mmpose.models.backbones.HourglassAENet(downsample_times=4, num_stacks=1, out_channels=34, stage_channels=(256, 384, 512, 640, 768), feat_channels=256, norm_cfg={'requires_grad': True, 'type': 'BN'})[源代码]

Hourglass-AE Network proposed by Newell et al.

Associative Embedding: End-to-End Learning for Joint Detection and Grouping.

More details can be found in the paper.

Parameters
  • downsample_times (int) – Downsample times in a HourglassModule.

  • num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.

  • stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.

  • out_channels (int) – Number of output channels of the network. Default: 34.

  • feat_channels (int) – Feature channel of conv after a HourglassModule.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

Example

>>> from mmpose.models import HourglassAENet
>>> import torch
>>> self = HourglassAENet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 512, 512)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 34, 128, 128)
forward(x)[source]

Model forward function.

init_weights(pretrained=None)[source]

Initialize the weights in backbone.

Parameters

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

class mmpose.models.backbones.HourglassNet(downsample_times=5, num_stacks=2, stage_channels=(256, 256, 384, 384, 384, 512), stage_blocks=(2, 2, 2, 2, 2, 4), feat_channel=256, norm_cfg={'requires_grad': True, 'type': 'BN'})[source]

HourglassNet backbone.

Stacked Hourglass Networks for Human Pose Estimation. More details can be found in the paper.

Parameters
  • downsample_times (int) – Downsample times in a HourglassModule.

  • num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.

  • stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.

  • stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.

  • feat_channel (int) – Feature channel of conv after a HourglassModule.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

Example

>>> from mmpose.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)
forward(x)[source]

Model forward function.

init_weights(pretrained=None)[source]

Initialize the weights in backbone.

Parameters

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

class mmpose.models.backbones.I3D(in_channels=3, expansion=1.0)[source]

I3D backbone.

Please refer to the paper for details.

Parameters
  • in_channels (int) – Input channels of the backbone, which is decided on the input modality. Default: 3.

  • expansion (float) – The multiplier of in_channels and out_channels. Default: 1.0.

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmpose.models.backbones.LiteHRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False)[source]

Lite-HRNet backbone.

Lite-HRNet: A Lightweight High-Resolution Network.

Code adapted from ‘https://github.com/HRNet/Lite-HRNet’.

Parameters
  • extra (dict) – detailed configuration for each stage of HRNet.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – dictionary to construct and config conv layer.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

Example

>>> from mmpose.models import LiteHRNet
>>> import torch
>>> extra=dict(
>>>    stem=dict(stem_channels=32, out_channels=32, expand_ratio=1),
>>>    num_stages=3,
>>>    stages_spec=dict(
>>>        num_modules=(2, 4, 2),
>>>        num_branches=(2, 3, 4),
>>>        num_blocks=(2, 2, 2),
>>>        module_type=('LITE', 'LITE', 'LITE'),
>>>        with_fuse=(True, True, True),
>>>        reduce_ratios=(8, 8, 8),
>>>        num_channels=(
>>>            (40, 80),
>>>            (40, 80, 160),
>>>            (40, 80, 160, 320),
>>>        )),
>>>    with_head=False)
>>> self = LiteHRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 40, 8, 8)
forward(x)[source]

Forward function.

init_weights(pretrained=None)[source]

Initialize the weights in backbone.

Parameters

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

train(mode=True)[source]

Convert the model into training mode.

class mmpose.models.backbones.MSPN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], norm_cfg={'type': 'BN'}, res_top_channels=64)[source]

MSPN backbone. Paper ref: Li et al. “Rethinking on Multi-Stage Networks for Human Pose Estimation” (CVPR 2020).

Parameters
  • unit_channels (int) – Number of Channels in an upsample unit. Default: 256

  • num_stages (int) – Number of stages in a multi-stage MSPN. Default: 4

  • num_units (int) – Number of downsample/upsample units in a single-stage network. Default: 4. Note: Make sure num_units == len(self.num_blocks)

  • num_blocks (list) – Number of bottlenecks in each downsample unit. Default: [2, 2, 2, 2]

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)

  • res_top_channels (int) – Number of channels of feature from ResNetTop. Default: 64.

Example

>>> from mmpose.models import MSPN
>>> import torch
>>> self = MSPN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)
forward(x)[source]

Model forward function.

init_weights(pretrained=None)[source]

Initialize model weights.

class mmpose.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False)[source]

MobileNetV2 backbone.

Parameters
  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
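
Example

A minimal usage sketch added here for illustration; with the defaults (widen_factor=1.0, out_indices=(7, )) the output comes from the final 1280-channel conv layer, so the printed shape below is indicative:

>>> from mmpose.models import MobileNetV2
>>> import torch
>>> self = MobileNetV2(widen_factor=1.0)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)
>>> print(tuple(output.shape))
(1, 1280, 7, 7)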

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]

Init backbone weights.

Parameters

pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.

make_layer(out_channels, num_blocks, stride, expand_ratio)[source]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters
  • out_channels (int) – out_channels of block.

  • num_blocks (int) – number of blocks.

  • stride (int) – stride of the first block. Default: 1

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmpose.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(-1, ), frozen_stages=-1, norm_eval=False, with_cp=False)[source]

MobileNetV3 backbone.

Parameters
  • arch (str) – Architecture of MobileNetV3, from {small, big}. Default: small.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (-1, ), which means output tensors from final stage.

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
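
Example

A minimal usage sketch added here for illustration; the output is the feature map of the final stage by default, and its channel number depends on the chosen arch, so no exact shape is asserted here:

>>> from mmpose.models import MobileNetV3
>>> import torch
>>> self = MobileNetV3(arch='small')
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)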

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]

Init backbone weights.

Parameters

pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmpose.models.backbones.PyramidVisionTransformer(pretrain_img_size=224, in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 5, 8], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], paddings=[0, 0, 0, 0], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratios=[8, 8, 4, 4], qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=True, norm_after_stage=False, use_conv_ffn=False, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, convert_weights=True, init_cfg=None)[source]

Pyramid Vision Transformer (PVT)

Implementation of Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 64.

  • num_stages (int) – The number of stages. Default: 4.

  • num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].

  • num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 5, 8].

  • patch_sizes (Sequence[int]) – The patch_size of each patch embedding. Default: [4, 2, 2, 2].

  • strides (Sequence[int]) – The stride of each patch embedding. Default: [4, 2, 2, 2].

  • paddings (Sequence[int]) – The padding of each patch embedding. Default: [0, 0, 0, 0].

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].

  • out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).

  • mlp_ratios (Sequence[int]) – The ratio of the mlp hidden dim to the embedding dim of each transformer encode layer. Default: [8, 8, 4, 4].

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0.

  • drop_path_rate (float) – stochastic depth rate. Default 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: True.

  • use_conv_ffn (bool) – If True, use Convolutional FFN to replace FFN. Default: False.

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’).

  • pretrained (str, optional) – model pretrained path. Default: None.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
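
Example

A minimal usage sketch added here for illustration; with the default configuration the four stages have strides 4, 8, 16 and 32 and embedding dims 64, 128, 320 and 512, so the shapes below are indicative for a 224x224 input:

>>> from mmpose.models import PyramidVisionTransformer
>>> import torch
>>> self = PyramidVisionTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 320, 14, 14)
(1, 512, 7, 7)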

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights(pretrained=None)[source]

Initialize the weights.

class mmpose.models.backbones.PyramidVisionTransformerV2(**kwargs)[source]

Implementation of PVTv2: Improved Baselines with Pyramid Vision Transformer.
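
Example

A minimal construction sketch added here for illustration; PVTv2 accepts the same keyword arguments as PyramidVisionTransformer, with improved defaults:

>>> from mmpose.models import PyramidVisionTransformerV2
>>> import torch
>>> self = PyramidVisionTransformerV2()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)  # one feature map per output stage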

class mmpose.models.backbones.RSN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], num_steps=4, norm_cfg={'type': 'BN'}, res_top_channels=64, expand_times=26)[source]

Residual Steps Network backbone. Paper ref: Cai et al. “Learning Delicate Local Representations for Multi-Person Pose Estimation” (ECCV 2020).

Parameters
  • unit_channels (int) – Number of Channels in an upsample unit. Default: 256

  • num_stages (int) – Number of stages in a multi-stage RSN. Default: 4

  • num_units (int) – Number of downsample/upsample units in a single-stage RSN. Default: 4. Note: Make sure num_units == len(self.num_blocks)

  • num_blocks (list) – Number of RSBs (Residual Steps Block) in each downsample unit. Default: [2, 2, 2, 2]

  • num_steps (int) – Number of steps in a RSB. Default: 4.

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)

  • res_top_channels (int) – Number of channels of feature from ResNet_top. Default: 64.

  • expand_times (int) – Times by which the in_channels are expanded in RSB. Default: 26.

Example

>>> from mmpose.models import RSN
>>> import torch
>>> self = RSN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)
forward(x)[source]

Model forward function.

init_weights(pretrained=None)[source]

Initialize model weights.

class mmpose.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True)[source]

RegNet backbone.

More details can be found in the paper.

Parameters
  • arch (dict) – The parameters of RegNets:

    • w0 (int): initial width.

    • wa (float): slope of width.

    • wm (float): quantization parameter to quantize the width.

    • depth (int): depth of the backbone.

    • group_w (int): width of group.

    • bot_mul (float): bottleneck ratio, i.e. expansion of bottleneck.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • base_channels (int) – Base channels after stem layer.

  • in_channels (int) – Number of input image channels. Default: 3.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: “pytorch”.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

Example

>>> from mmpose.models import RegNet
>>> import torch
>>> self = RegNet(
>>>     arch=dict(
>>>         w0=88,
>>>         wa=26.31,
>>>         wm=2.25,
>>>         group_w=48,
>>>         depth=25,
>>>         bot_mul=1.0),
>>>     out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
adjust_width_group(widths, bottleneck_ratio, groups)[source]

Adjusts the compatibility of widths and groups.

Parameters
  • widths (list[int]) – Width of each stage.

  • bottleneck_ratio (float) – Bottleneck ratio.

  • groups (int) – number of groups in each stage

Returns

The adjusted widths and groups of each stage.

Return type

tuple(list)

forward(x)[source]

Forward function.

static generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]

Generates per block width from RegNet parameters.

Parameters
  • initial_width (int) – Initial width of the backbone.

  • width_slope (float) – Slope of the quantized linear function.

  • width_parameter (int) – Parameter used to quantize the width.

  • depth (int) – Depth of the backbone.

  • divisor (int, optional) – The divisor of channels. Defaults to 8.

Returns

A list of widths of each stage and the number of stages.

Return type

tuple(list, int)

get_stages_from_blocks(widths)[source]

Gets widths/stage_blocks of network at each stage.

Parameters

widths (list[int]) – Width in each stage.

Returns

width and depth of each stage

Return type

tuple(list)

static quantize_float(number, divisor)[source]

Converts a float to the closest non-zero int divisible by divisor.

Parameters
  • number (int) – Original number to be quantized.

  • divisor (int) – Divisor used to quantize the number.

Returns

Quantized number that is divisible by divisor.

Return type

int

class mmpose.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]

ResNeSt backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152, 200}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 1.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • radix (int) – Radix of SplitAttentionConv2d. Default: 2.

  • reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
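
Example

A minimal usage sketch added here for illustration, mirroring the other ResNet-style examples; ResNeSt-50 keeps the standard bottleneck channel layout:

>>> from mmpose.models import ResNeSt
>>> import torch
>>> self = ResNeSt(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)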

make_res_layer(**kwargs)[source]

Make a ResLayer.

class mmpose.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

ResNeXt backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
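
Example

A minimal usage sketch added here for illustration; ResNeXt-50 follows the same output layout as the other ResNet-style backbones:

>>> from mmpose.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)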

make_res_layer(**kwargs)[source]

Make a ResLayer.

class mmpose.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True)[source]

ResNet backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • base_channels (int) – Middle channels of the first stage. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

Example

>>> from mmpose.models import ResNet
>>> import torch
>>> self = ResNet(depth=18, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x)[source]

Forward function.

init_weights(pretrained=None)[source]

Initialize the weights in backbone.

Parameters

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

make_res_layer(**kwargs)[source]

Make a ResLayer.

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode.

class mmpose.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant described in Bag of Tricks.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.
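
Example

A minimal usage sketch added here for illustration; ResNetV1d shares the ResNet interface, and with the default out_indices=(3, ) a single feature map is returned:

>>> from mmpose.models import ResNetV1d
>>> import torch
>>> self = ResNetV1d(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)
>>> print(tuple(output.shape))
(1, 2048, 7, 7)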

class mmpose.models.backbones.SCNet(depth, **kwargs)[source]

SCNet backbone.

Improving Convolutional Networks with Self-Calibrated Convolutions, Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng, IEEE CVPR, 2020. http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf

Parameters
  • depth (int) – Depth of scnet, from {50, 101}.

  • in_channels (int) – Number of input image channels. Normally 3.

  • base_channels (int) – Number of base channels of hidden layer.

  • num_stages (int) – SCNet stages, normally 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

Example

>>> from mmpose.models import SCNet
>>> import torch
>>> self = SCNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
class mmpose.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

SEResNeXt backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • se_ratio (int) – Squeeze ratio in SELayer. Default: 16.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

Example

>>> from mmpose.models import SEResNeXt
>>> import torch
>>> self = SEResNeXt(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
make_res_layer(**kwargs)[source]

Make a ResLayer.

class mmpose.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[source]

SEResNet backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152}.

  • se_ratio (int) – Squeeze ratio in SELayer. Default: 16.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

Example

>>> from mmpose.models import SEResNet
>>> import torch
>>> self = SEResNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
make_res_layer(**kwargs)[source]

Make a ResLayer.

class mmpose.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False)[source]

ShuffleNetV1 backbone.

Parameters
  • groups (int, optional) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.

  • widen_factor (float, optional) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (2, )

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
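
Example

A minimal usage sketch added here for illustration; with the default groups=3 and out_indices=(2, ), the returned feature comes from the last stage (960 channels at 1/32 resolution for this configuration):

>>> from mmpose.models import ShuffleNetV1
>>> import torch
>>> self = ShuffleNetV1(groups=3)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)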

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]

Init backbone weights.

Parameters

pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.

make_layer(out_channels, num_blocks, first_block=False)[source]

Stack ShuffleUnit blocks to make a layer.

Parameters
  • out_channels (int) – out_channels of the block.

  • num_blocks (int) – Number of blocks.

  • first_block (bool, optional) – Whether is the first ShuffleUnit of a sequential ShuffleUnits. Default: False, which means using the grouped 1x1 convolution.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmpose.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3, ), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False)[source]

ShuffleNetV2 backbone.

Parameters
  • widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (3, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
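
Example

A minimal usage sketch added here for illustration; by default the output is taken from the final stage:

>>> from mmpose.models import ShuffleNetV2
>>> import torch
>>> self = ShuffleNetV2(widen_factor=1.0)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)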

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]

Init backbone weights.

Parameters

pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmpose.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, convert_weights=False, frozen_stages=-1)[source]

Swin Transformer. A PyTorch implementation of: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

Inspiration from https://github.com/microsoft/Swin-Transformer

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – The num of input channels. Defaults: 3.

  • embed_dims (int) – The feature dimension. Default: 96.

  • patch_size (int | tuple[int]) – Patch size. Default: 4.

  • window_size (int) – Window size. Default: 7.

  • mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).

  • num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).

  • strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • patch_norm (bool) – If add a norm layer for patch embed and patch merging. Default: True.

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer at output of backone. Defaults: dict(type=’LN’).

  • with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). Default: -1 (-1 means not freezing any parameters).
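
Example

A minimal usage sketch added here for illustration; the default configuration corresponds to Swin-T, whose four stages have embedding dims 96, 192, 384 and 768, so the shapes below are indicative for a 224x224 input:

>>> from mmpose.models import SwinTransformer
>>> import torch
>>> self = SwinTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 96, 56, 56)
(1, 192, 28, 28)
(1, 384, 14, 14)
(1, 768, 7, 7)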

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]

Initialize the weights in backbone.

Parameters

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

train(mode=True)[source]

Convert the model into training mode while keeping layers frozen.

class mmpose.models.backbones.TCFormer(in_channels=3, embed_dims=[64, 128, 256, 512], num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'eps': 1e-06, 'type': 'LN'}, num_layers=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], num_stages=4, pretrained=None, k=5, sample_ratios=[0.25, 0.25, 0.25], return_map=False, convert_weights=True)[source]

Token Clustering Transformer (TCFormer)

Implementation of Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer <https://arxiv.org/abs/2204.08680>

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (list[int]) – Embedding dimension. Default: [64, 128, 256, 512].

  • num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 4, 8].

  • mlp_ratios (Sequence[int]) – The ratio of the mlp hidden dim to the embedding dim of each transformer block. Default: [4, 4, 4, 4].

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • drop_rate (float) – Probability of an element to be zeroed. Default: 0.0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default: 0.0.

  • drop_path_rate (float) – stochastic depth rate. Default: 0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’, eps=1e-6).

  • num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer block. Default: [8, 4, 2, 1].

  • num_stages (int) – The number of stages. Default: 4.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • k (int) – number of the nearest neighbor used for local density. Default: 5.

  • sample_ratios (list[float]) – The sample ratios of CTM modules. Default: [0.25, 0.25, 0.25].

  • return_map (bool) – If True, transfer dynamic tokens to feature map at last. Default: False.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: True.
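
Example

A minimal construction sketch added here for illustration; the per-stage outputs are dynamic token dicts unless return_map=True, which transfers them back to feature maps:

>>> from mmpose.models import TCFormer
>>> import torch
>>> self = TCFormer(return_map=True)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 256, 256)
>>> outputs = self.forward(inputs)  # one output per stage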

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmpose.models.backbones.TCN(in_channels, stem_channels=1024, num_blocks=2, kernel_sizes=(3, 3, 3), dropout=0.25, causal=False, residual=True, use_stride_conv=False, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, max_norm=None)[source]

TCN backbone.

Temporal Convolutional Networks. More details can be found in the paper.

Parameters
  • in_channels (int) – Number of input channels, which equals to num_keypoints * num_features.

  • stem_channels (int) – Number of feature channels. Default: 1024.

  • num_blocks (int) – Number of basic temporal convolutional blocks. Default: 2.

  • kernel_sizes (Sequence[int]) – Sizes of the convolving kernel of each basic block. Default: (3, 3, 3).

  • dropout (float) – Dropout rate. Default: 0.25.

  • causal (bool) – Use causal convolutions instead of symmetric convolutions (for real-time applications). Default: False.

  • residual (bool) – Use residual connection. Default: True.

  • use_stride_conv (bool) – Use TCN backbone optimized for single-frame batching, i.e. where batches have input length = receptive field, and output length = 1. This implementation replaces dilated convolutions with strided convolutions to avoid generating unused intermediate results. The weights are interchangeable with the reference implementation. Default: False

  • conv_cfg (dict) – dictionary to construct and config conv layer. Default: dict(type=’Conv1d’).

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN1d’).

  • max_norm (float|None) – if not None, the weight of convolution layers will be clipped to have a maximum norm of max_norm.

Example

>>> from mmpose.models import TCN
>>> import torch
>>> self = TCN(in_channels=34)
>>> self.eval()
>>> inputs = torch.rand(1, 34, 243)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 235)
(1, 1024, 217)
forward(x)[source]

Forward function.

init_weights(pretrained=None)[source]

Initialize the weights.

class mmpose.models.backbones.V2VNet(input_channels, output_channels, mid_channels=32)[source]

V2VNet.

Please refer to the paper <https://arxiv.org/abs/1711.07399> for details.

Parameters
  • input_channels (int) – Number of channels of the input feature volume.

  • output_channels (int) – Number of channels of the output volume.

  • mid_channels (int) – Input and output channels of the encoder-decoder block.
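
Example

A minimal usage sketch added here for illustration, assuming a voxel feature volume of shape (N, C, D, H, W); the channel numbers below are arbitrary:

>>> from mmpose.models import V2VNet
>>> import torch
>>> self = V2VNet(input_channels=17, output_channels=15)
>>> self.eval()
>>> inputs = torch.rand(1, 17, 64, 64, 64)
>>> output = self.forward(inputs)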

forward(x)[source]

Forward function.

class mmpose.models.backbones.VGG(depth, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=-1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True)[source]

VGG backbone.

Parameters
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None, which means no normalization layer.

  • num_classes (int) – number of classes for classification.

  • num_stages (int) – VGG stages, normally 5.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), outputting the last feature map before classifier. If num_classes > 0, the default value is (5, ), outputting the classification score. Default: None.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.

  • with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.
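
Example

A minimal usage sketch added here for illustration; with num_classes unset, out_indices defaults to (4, ) and the last feature map before the classifier is returned (shape indicative):

>>> from mmpose.models import VGG
>>> import torch
>>> self = VGG(depth=16)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> output = self.forward(inputs)
>>> print(tuple(output.shape))
(1, 512, 7, 7)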

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]

Init backbone weights.

Parameters

pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmpose.models.backbones.ViPNAS_MobileNetV3(wid=[16, 16, 24, 40, 80, 112, 160], expan=[None, 1, 5, 4, 5, 5, 6], dep=[None, 1, 4, 4, 4, 4, 4], ks=[3, 3, 7, 7, 5, 7, 5], group=[None, 8, 120, 20, 100, 280, 240], att=[None, True, True, False, True, True, True], stride=[2, 1, 2, 2, 2, 1, 2], act=['HSwish', 'ReLU', 'ReLU', 'ReLU', 'HSwish', 'HSwish', 'HSwish'], conv_cfg=None, norm_cfg={'type': 'BN'}, frozen_stages=-1, norm_eval=False, with_cp=False)[source]

ViPNAS_MobileNetV3 backbone.

“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.

Parameters
  • wid (list(int)) – Searched width config for each stage.

  • expan (list(int)) – Searched expansion ratio config for each stage.

  • dep (list(int)) – Searched depth config for each stage.

  • ks (list(int)) – Searched kernel size config for each stage.

  • group (list(int)) – Searched group number config for each stage.

  • att (list(bool)) – Searched attention config for each stage.

  • stride (list(int)) – Stride config for each stage.

  • act (list(str)) – Activation config for each stage.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
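
Example

A minimal usage sketch added here for illustration, using a typical 256x192 top-down input; with the default searched widths the final stage has 160 channels at an overall stride of 32 (indicative):

>>> from mmpose.models import ViPNAS_MobileNetV3
>>> import torch
>>> self = ViPNAS_MobileNetV3()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 256, 192)
>>> output = self.forward(inputs)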

forward(x)[source]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]

Init backbone weights.

Parameters

pretrained (str | None) – If pretrained is a string, then it initializes backbone weights by loading the pretrained checkpoint. If pretrained is None, then it follows default initializer or customized initializer in subclasses.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmpose.models.backbones.ViPNAS_ResNet(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3, ), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, wid=[48, 80, 160, 304, 608], expan=[None, 1, 1, 1, 1], dep=[None, 4, 6, 7, 3], ks=[7, 3, 5, 5, 5], group=[None, 16, 16, 16, 16], att=[None, True, False, True, True])[source]

ViPNAS_ResNet backbone.

“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.

Parameters
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • wid (list(int)) – Searched width config for each stage.

  • expan (list(int)) – Searched expansion ratio config for each stage.

  • dep (list(int)) – Searched depth config for each stage.

  • ks (list(int)) – Searched kernel size config for each stage.

  • group (list(int)) – Searched group number config for each stage.

  • att (list(bool)) – Searched attention config for each stage.
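
Example

A minimal usage sketch added here for illustration; with the default searched width config the last stage outputs 608 channels (indicative):

>>> from mmpose.models import ViPNAS_ResNet
>>> import torch
>>> self = ViPNAS_ResNet(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 256, 192)
>>> output = self.forward(inputs)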

forward(x)[source]

Forward function.

init_weights(pretrained=None)[source]

Initialize model weights.

make_res_layer(**kwargs)[source]

Make a ViPNAS ResLayer.

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode.

necks

class mmpose.models.necks.FPN(in_channels, out_channels, num_outs, start_level=0, end_level=-1, add_extra_convs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, upsample_cfg={'mode': 'nearest'})[source]

Feature Pyramid Network.

This is an implementation of paper Feature Pyramid Networks for Object Detection.

Parameters
  • in_channels (list[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • num_outs (int) – Number of output scales.

  • start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.

  • end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.

  • add_extra_convs (bool | str) –

    If bool, it decides whether to add conv layers on top of the original feature maps. Default to False. If True, it is equivalent to add_extra_convs=’on_input’. If str, it specifies the source feature map of the extra convs. Only the following options are allowed

    • ’on_input’: Last feat map of neck inputs (i.e. backbone feature).

    • ’on_lateral’: Last feature map after lateral convs.

    • ’on_output’: The last output feature map after fpn convs.

  • relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Default: False.

  • no_norm_on_lateral (bool) – Whether to apply norm on lateral. Default: False.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.

  • upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(mode=’nearest’).

Example

>>> import torch
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = FPN(in_channels, 11, len(in_channels)).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])
forward(inputs)[源代码]

Forward function.

init_weights()[源代码]

Initialize model weights.

class mmpose.models.necks.GlobalAveragePooling[源代码]

Global Average Pooling neck.

Note that we use view to remove extra channel after pooling. We do not use squeeze as it will also remove the batch dimension when the tensor has a batch dimension of size 1, which can lead to unexpected errors.

forward(inputs)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
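
A minimal usage sketch (a hedged example assuming a single 4-D feature map as input; shapes are illustrative):

>>> import torch
>>> from mmpose.models.necks import GlobalAveragePooling
>>> neck = GlobalAveragePooling()
>>> out = neck(torch.rand(2, 512, 8, 8))  # pooled and flattened to (N, C)
>>> out.shape
torch.Size([2, 512])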

class mmpose.models.necks.MTA(in_channels=[64, 128, 256, 512], out_channels=128, num_outs=4, start_level=0, end_level=-1, add_extra_convs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, num_heads=[2, 2, 2, 2], mlp_ratios=[4, 4, 4, 4], sr_ratios=[8, 4, 2, 1], qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, transformer_norm_cfg={'type': 'LN'}, use_sr_conv=False)[源代码]

Multi-stage Token feature Aggregation (MTA) module in TCFormer.

参数
  • in_channels (list[int]) – Number of input channels per stage. Default: [64, 128, 256, 512].

  • out_channels (int) – Number of output channels (used at each scale).

  • num_outs (int) – Number of output scales. Default: 4.

  • start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.

  • end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.

  • add_extra_convs (bool | str) –

    If bool, it decides whether to add conv layers on top of the original feature maps. Default: False. If True, it is equivalent to add_extra_convs=’on_input’. If str, it specifies the source feature map of the extra convs. Only the following options are allowed:

    • ’on_input’: Last feat map of neck inputs (i.e. backbone feature).

    • ’on_output’: The last output feature map after fpn convs.

  • relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Default: False.

  • no_norm_on_lateral (bool) – Whether to apply norm on lateral. Default: False.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer in ConvModule.

  • num_heads (Sequence[int]) – The attention heads of each transformer block. Default: [2, 2, 2, 2].

  • mlp_ratios (Sequence[int]) – The ratio of the mlp hidden dim to the embedding dim of each transformer block.

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer block. Default: [8, 4, 2, 1].

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0.

  • drop_path_rate (float) – stochastic depth rate. Default 0.

  • transformer_norm_cfg (dict) – Config dict for normalization layer in transformer blocks. Default: dict(type=’LN’).

  • use_sr_conv (bool) – If True, use a conv layer for spatial reduction. If False, use a pooling process for spatial reduction. Default: False.
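
For orientation, a hedged sketch of how this neck might be specified in a TCFormer-style model config (field values are illustrative, not a released configuration):

neck=dict(
    type='MTA',
    in_channels=[64, 128, 256, 512],
    out_channels=256,
    start_level=0,
    num_outs=4,
    num_heads=[4, 4, 4, 4],
    mlp_ratios=[4, 4, 4, 4],
    sr_ratios=[8, 4, 2, 1],
    use_sr_conv=False)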

forward(inputs)[源代码]

Forward function.

init_weights()[源代码]

Initialize the weights.

class mmpose.models.necks.PoseWarperNeck(in_channels, out_channels, inner_channels, deform_groups=17, dilations=(3, 6, 12, 18, 24), trans_conv_kernel=1, res_blocks_cfg=None, offsets_kernel=3, deform_conv_kernel=3, in_index=0, input_transform=None, freeze_trans_layer=True, norm_eval=False, im2col_step=80)[源代码]

PoseWarper neck.

“Learning temporal pose estimation from sparsely-labeled videos”.

参数
  • in_channels (int) – Number of input channels from backbone

  • out_channels (int) – Number of output channels

  • inner_channels (int) – Number of intermediate channels of the res block

  • deform_groups (int) – Number of groups in the deformable conv

  • dilations (list|tuple) – different dilations of the offset conv layers

  • trans_conv_kernel (int) – the kernel of the trans conv layer, which is used to get heatmap from the output of backbone. Default: 1

  • res_blocks_cfg (dict|None) –

    config of residual blocks. If None, use the default values. If not None, it should contain the following keys:

    • block (str): the type of residual block, Default: ‘BASIC’.

    • num_blocks (int): the number of blocks, Default: 20.

  • offsets_kernel (int) – the kernel of offset conv layer.

  • deform_conv_kernel (int) – the kernel of deformable conv layer.

  • in_index (int|Sequence[int]) – Input feature index. Default: 0

  • input_transform (str|None) –

    Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.

    • ’resize_concat’: Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in FCN head of HRNet.

    • ’multiple_select’: Multiple feature maps will be bundled into a list and passed into the decode head.

    • None: Only one selected feature map is allowed.

  • freeze_trans_layer (bool) – Whether to freeze the transition layer (stop grad and set eval mode). Default: True.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • im2col_step (int) – the argument im2col_step in deformable conv, Default: 80.
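
A hedged config sketch for this neck (channel numbers are illustrative; in practice in_channels matches the backbone output channels and out_channels matches the number of keypoints):

neck=dict(
    type='PoseWarperNeck',
    in_channels=48,
    out_channels=17,
    inner_channels=128,
    deform_groups=17,
    dilations=(3, 6, 12, 18, 24),
    res_blocks_cfg=dict(block='BASIC', num_blocks=20),
    freeze_trans_layer=True,
    im2col_step=80)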

forward(inputs, frame_weight)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[源代码]

Convert the model into training mode.

detectors

class mmpose.models.detectors.AssociativeEmbedding(backbone, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[源代码]

Associative embedding pose detectors.

参数
  • backbone (dict) – Backbone modules to extract feature.

  • keypoint_head (dict) – Keypoint head to process feature.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

  • loss_pose (None) – Deprecated argument. Please use loss_keypoint for heads instead.
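
In practice this detector is usually built from a config file and driven through the high-level inference API; a sketch (the config and checkpoint paths below are placeholders):

>>> from mmpose.apis import init_pose_model, inference_bottom_up_pose_model
>>> model = init_pose_model('bottom_up_config.py', 'checkpoint.pth')
>>> pose_results, _ = inference_bottom_up_pose_model(model, 'demo.jpg')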

forward(img=None, targets=None, masks=None, joints=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss is True.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

参数
  • img (torch.Tensor[N,C,imgH,imgW]) – Input image.

  • targets (list(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.

  • masks (list(torch.Tensor[N,H,W])) – Masks of multi-scale target heatmaps

  • joints (list(torch.Tensor[N,M,K,2])) – Joints of multi-scale target heatmaps for ae loss

  • img_metas (dict) –

    Information about val & test. By default it includes:

    • ”image_file”: image path

    • ”aug_data”: input

    • ”test_scale_factor”: test scale factor

    • ”base_size”: base size of input

    • ”center”: center of image

    • ”scale”: scale of image

    • ”flip_index”: flip index of keypoints

  • return_loss (bool) – return_loss=True for training, return_loss=False for validation & test.

  • return_heatmap (bool) – Option to return heatmap.

返回

if ‘return_loss’ is true, then return losses. Otherwise, return predicted poses, scores, image paths and heatmaps.

返回类型

dict|tuple

forward_dummy(img)[源代码]

Used for computing network FLOPs.

See tools/get_flops.py.

参数

img (torch.Tensor) – Input image.

返回

Outputs.

返回类型

Tensor

forward_test(img, img_metas, return_heatmap=False, **kwargs)[源代码]

Inference the bottom-up model.

注解

  • batch_size: N (currently only batch_size = 1 is supported)

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

参数
  • flip_index (List(int)) –

  • aug_data (List(Tensor[NxCximgHximgW])) – Multi-scale image

  • test_scale_factor (List(float)) – Multi-scale factor

  • base_size (Tuple(int)) – Base size of image when scale is 1

  • center (np.ndarray) – center of image

  • scale (np.ndarray) – the scale of image

forward_train(img, targets, masks, joints, img_metas, **kwargs)[源代码]

Forward the bottom-up model and calculate the loss.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

参数
  • img (torch.Tensor[N,C,imgH,imgW]) – Input image.

  • targets (List(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.

  • masks (List(torch.Tensor[N,H,W])) – Masks of multi-scale target heatmaps

  • joints (List(torch.Tensor[N,M,K,2])) – Joints of multi-scale target heatmaps for ae loss

  • img_metas (dict) –

    Information about val & test. By default it includes:

    • ”image_file”: image path

    • ”aug_data”: input

    • ”test_scale_factor”: test scale factor

    • ”base_size”: base size of input

    • ”center”: center of image

    • ”scale”: scale of image

    • ”flip_index”: flip index of keypoints

返回

The total loss for bottom-up

返回类型

dict

init_weights(pretrained=None)[源代码]

Weight initialization for model.

show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color=None, pose_kpt_color=None, pose_link_color=None, radius=4, thickness=1, font_scale=0.5, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[源代码]

Draw result over img.

参数
  • img (str or Tensor) – The image to be displayed.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • skeleton (list[list]) – The connection of keypoints. skeleton is 0-based indexing.

  • kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.

  • pose_kpt_color (np.array[Nx3]`) – Color of N keypoints. If None, do not draw keypoints.

  • pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • font_scale (float) – Font scales of texts.

  • win_name (str) – The window name.

  • show (bool) – Whether to show the image. Default: False.

  • show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.

返回

Visualized image, only if not show or out_file.

返回类型

Tensor

property with_keypoint

Check if has keypoint_head.

class mmpose.models.detectors.CID(backbone, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[源代码]

Contextual Instance Decouple for Multi-Person Pose Estimation.

参数
  • backbone (dict) – Backbone modules to extract feature.

  • keypoint_head (dict) – Keypoint head to process feature.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

  • loss_pose (None) – Deprecated argument. Please use loss_keypoint for heads instead.

forward(img=None, multi_heatmap=None, multi_mask=None, instance_coord=None, instance_heatmap=None, instance_mask=None, instance_valid=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss is True.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

参数
  • img (torch.Tensor[N,C,imgH,imgW]) – Input image.

  • multi_heatmap (torch.Tensor[N,C,H,W]) – Multi-person heatmaps

  • multi_mask (torch.Tensor[N,1,H,W]) – Multi-person heatmap mask

  • instance_coord (torch.Tensor[N,M,2]) – Instance center coord

  • instance_heatmap (torch.Tensor[N,M,C,H,W]) – Single person heatmap for each instance

  • instance_mask (torch.Tensor[N,M,C,1,1]) – Single person heatmap mask

  • instance_valid (torch.Tensor[N,M]) – Bool mask to indicate the existence of each person

  • img_metas (dict) –

    Information about val & test. By default it includes:

    • ”image_file”: image path

    • ”aug_data”: input

    • ”test_scale_factor”: test scale factor

    • ”base_size”: base size of input

    • ”center”: center of image

    • ”scale”: scale of image

    • ”flip_index”: flip index of keypoints

  • return_loss (bool) – return_loss=True for training, return_loss=False for validation & test.

  • return_heatmap (bool) – Option to return heatmap.

返回

if ‘return_loss’ is true, then return losses. Otherwise, return predicted poses, scores, image paths and heatmaps.

返回类型

dict|tuple

forward_dummy(img)[源代码]

Used for computing network FLOPs.

See tools/get_flops.py.

参数

img (torch.Tensor) – Input image.

返回

Outputs.

返回类型

Tensor

forward_test(img, img_metas, return_heatmap=False, **kwargs)[源代码]

Inference the bottom-up model.

注解

  • batch_size: N (currently only batch_size = 1 is supported)

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

参数
  • flip_index (List(int)) –

  • aug_data (List(Tensor[NxCximgHximgW])) – Multi-scale image

  • test_scale_factor (List(float)) – Multi-scale factor

  • base_size (Tuple(int)) – Base size of image when scale is 1

  • center (np.ndarray) – center of image

  • scale (np.ndarray) – the scale of image

forward_train(img, multi_heatmap, multi_mask, instance_coord, instance_heatmap, instance_mask, instance_valid, img_metas, **kwargs)[源代码]

Forward CID model and calculate the loss.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

参数
  • img (torch.Tensor[N,C,imgH,imgW]) – Input image.

  • multi_heatmap (torch.Tensor[N,C,H,W]) – Multi-person heatmaps

  • multi_mask (torch.Tensor[N,1,H,W]) – Multi-person heatmap mask

  • instance_coord (torch.Tensor[N,M,2]) – Instance center coord

  • instance_heatmap (torch.Tensor[N,M,C,H,W]) – Single person heatmap for each instance

  • instance_mask (torch.Tensor[N,M,C,1,1]) – Single person heatmap mask

  • instance_valid (torch.Tensor[N,M]) – Bool mask to indicate the existence of each person

  • img_metas (dict) –

    Information about val & test. By default it includes:

    • ”image_file”: image path

    • ”aug_data”: input

    • ”test_scale_factor”: test scale factor

    • ”base_size”: base size of input

    • ”center”: center of image

    • ”scale”: scale of image

    • ”flip_index”: flip index of keypoints

返回

The total loss for bottom-up

返回类型

dict

init_weights(pretrained=None)[源代码]

Weight initialization for model.

show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color=None, pose_kpt_color=None, pose_link_color=None, radius=4, thickness=1, font_scale=0.5, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[源代码]

Draw result over img.

参数
  • img (str or Tensor) – The image to be displayed.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • skeleton (list[list]) – The connection of keypoints. skeleton is 0-based indexing.

  • kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.

  • pose_kpt_color (np.array[Nx3]`) – Color of N keypoints. If None, do not draw keypoints.

  • pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • font_scale (float) – Font scales of texts.

  • win_name (str) – The window name.

  • show (bool) – Whether to show the image. Default: False.

  • show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.

返回

Visualized image, only if not show or out_file.

返回类型

Tensor

property with_keypoint

Check if has keypoint_head.

class mmpose.models.detectors.DetectAndRegress(backbone, human_detector, pose_regressor, train_cfg=None, test_cfg=None, pretrained=None, freeze_2d=True)[源代码]

DetectAndRegress approach for multiview human pose detection.

参数
  • backbone (ConfigDict) – Dictionary to construct the 2D pose detector

  • human_detector (ConfigDict) – dictionary to construct human detector

  • pose_regressor (ConfigDict) – dictionary to construct pose regressor

  • train_cfg (ConfigDict) – Config for training. Default: None.

  • test_cfg (ConfigDict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained 2D model. Default: None.

  • freeze_2d (bool) – Whether to freeze the 2D model in training. Default: True.

forward(img=None, img_metas=None, return_loss=True, targets=None, masks=None, targets_3d=None, input_heatmaps=None, **kwargs)[源代码]

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • feature_maps width: W

  • feature_maps height: H

  • volume_length: cubeL

  • volume_width: cubeW

  • volume_height: cubeH

参数
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.

  • return_loss – Option to return loss. return loss=True for training, return loss=False for validation & test.

  • targets (list(torch.Tensor[NxKxHxW])) – Multi-camera target feature_maps of the 2D model.

  • masks (list(torch.Tensor[NxHxW])) – Multi-camera masks of the input to the 2D model.

  • targets_3d (torch.Tensor[NxcubeLxcubeWxcubeH]) – Ground-truth 3D heatmap of human centers.

  • input_heatmaps (list(torch.Tensor[NxKxHxW])) –

    Multi-camera feature_maps when the 2D model is not available.

    Default: None.

  • **kwargs

返回

if ‘return_loss’ is true, then return losses. Otherwise, return predicted poses, human centers and sample_id.

返回类型

dict

forward_dummy(img, input_heatmaps=None, num_candidates=5)[源代码]

Used for computing network FLOPs.

forward_test(img, img_metas, input_heatmaps=None)[源代码]

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • feature_maps width: W

  • feature_maps height: H

  • volume_length: cubeL

  • volume_width: cubeW

  • volume_height: cubeH

参数
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.

  • input_heatmaps (list(torch.Tensor[NxKxHxW])) –

    Multi-camera feature_maps when the 2D model is not available.

    Default: None.

返回

predicted poses, human centers and sample_id

返回类型

dict

forward_train(img, img_metas, targets=None, masks=None, targets_3d=None, input_heatmaps=None)[源代码]

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • feature_maps width: W

  • feature_maps height: H

  • volume_length: cubeL

  • volume_width: cubeW

  • volume_height: cubeH

参数
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.

  • targets (list(torch.Tensor[NxKxHxW])) – Multi-camera target feature_maps of the 2D model.

  • masks (list(torch.Tensor[NxHxW])) – Multi-camera masks of the input to the 2D model.

  • targets_3d (torch.Tensor[NxcubeLxcubeWxcubeH]) – Ground-truth 3D heatmap of human centers.

  • input_heatmaps (list(torch.Tensor[NxKxHxW])) –

    Multi-camera feature_maps when the 2D model is not available.

    Default: None.

返回

losses.

返回类型

dict

show_result(img, img_metas, visualize_2d=False, input_heatmaps=None, dataset_info=None, radius=4, thickness=2, out_dir=None, show=False)[源代码]

Visualize the results.

train(mode=True)[源代码]

Sets the module in training mode.

参数

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

返回

self

返回类型

Module

train_step(data_batch, optimizer, **kwargs)[源代码]

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

参数
  • data_batch (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

返回

It should contain at least 3 keys: loss, log_vars,

num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

返回类型

dict

class mmpose.models.detectors.DisentangledKeypointRegressor(backbone, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None)[源代码]

Disentangled keypoint regression pose detector.

参数
  • backbone (dict) – Backbone modules to extract feature.

  • keypoint_head (dict) – Keypoint head to process feature.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

forward(img=None, heatmaps=None, masks=None, offsets=None, offset_weights=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss is True.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

参数
  • img (torch.Tensor[N,C,imgH,imgW]) – Input image.

  • targets (list(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.

  • masks (list(torch.Tensor[N,H,W])) – Masks of multi-scale target heatmaps

  • joints (list(torch.Tensor[N,M,K,2])) – Joints of multi-scale target heatmaps for ae loss

  • img_metas (dict) –

    Information about val & test. By default it includes:

    • ”image_file”: image path

    • ”aug_data”: input

    • ”test_scale_factor”: test scale factor

    • ”base_size”: base size of input

    • ”center”: center of image

    • ”scale”: scale of image

    • ”flip_index”: flip index of keypoints

  • return_loss (bool) – return_loss=True for training, return_loss=False for validation & test.

  • return_heatmap (bool) – Option to return heatmap.

返回

if ‘return_loss’ is true, then return losses. Otherwise, return predicted poses, scores, image paths and heatmaps.

返回类型

dict|tuple

forward_dummy(img)[源代码]

Used for computing network FLOPs.

See tools/get_flops.py.

参数

img (torch.Tensor) – Input image.

返回

Outputs.

返回类型

Tensor

forward_test(img, img_metas, return_heatmap=False, **kwargs)[源代码]

Inference the one-stage model.

注解

  • batch_size: N (currently only batch_size = 1 is supported)

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

参数
  • flip_index (List(int)) –

  • aug_data (List(Tensor[NxCximgHximgW])) – Multi-scale image

  • num_joints (int) – Number of joints of an instance.

  • test_scale_factor (List(float)) – Multi-scale factor

  • base_size (Tuple(int)) – Base size of image when scale is 1

  • image_size (int) – Short edge of images when scale is 1

  • heatmap_size (int) – Short edge of outputs when scale is 1

  • center (np.ndarray) – center of image

  • scale (np.ndarray) – the scale of image

  • skeleton (List(List(int))) – Links of joints

forward_train(img, heatmaps, masks, offsets, offset_weights, img_metas, **kwargs)[源代码]

Forward the bottom-up model and calculate the loss.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

参数
  • img (torch.Tensor[N,C,imgH,imgW]) – Input image.

  • targets (List(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.

  • masks (List(torch.Tensor[N,H,W])) – Masks of multi-scale target heatmaps

  • joints (List(torch.Tensor[N,M,K,2])) – Joints of multi-scale target heatmaps for ae loss

  • img_metas (dict) –

    Information about val & test. By default it includes:

    • ”image_file”: image path

    • ”aug_data”: input

    • ”test_scale_factor”: test scale factor

    • ”base_size”: base size of input

    • ”center”: center of image

    • ”scale”: scale of image

    • ”flip_index”: flip index of keypoints

返回

The total loss for bottom-up

返回类型

dict

init_weights(pretrained=None)[源代码]

Weight initialization for model.

show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color=None, pose_kpt_color=None, pose_link_color=None, radius=4, thickness=1, font_scale=0.5, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[源代码]

Draw result over img.

参数
  • img (str or Tensor) – The image to be displayed.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • skeleton (list[list]) – The connection of keypoints. skeleton is 0-based indexing.

  • kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.

  • pose_kpt_color (np.array[Nx3]`) – Color of N keypoints. If None, do not draw keypoints.

  • pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • font_scale (float) – Font scales of texts.

  • win_name (str) – The window name.

  • show (bool) – Whether to show the image. Default: False.

  • show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.

返回

Visualized image, only if not show or out_file.

返回类型

Tensor

property with_keypoint

Check if has keypoint_head.

class mmpose.models.detectors.GestureRecognizer(backbone, neck=None, cls_head=None, train_cfg=None, test_cfg=None, modality='rgb', pretrained=None)[源代码]

Hand gesture recognizer.

参数
  • backbone (dict) – Backbone modules to extract feature.

  • neck (dict) – Neck Modules to process feature.

  • cls_head (dict) – Classification head to process feature.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • modality (str or list or tuple) – Data modality. Default: 'rgb'.

  • pretrained (str) – Path to the pretrained models.

forward(video, label=None, img_metas=None, return_loss=True, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs.

注解

  • batch_size: N

  • num_vid_channel: C (Default: 3)

  • video height: vidH

  • video width: vidW

  • video length: vidL

参数
  • video (list[torch.Tensor[NxCxvidLxvidHxvidW]]) – Input videos.

  • label (torch.Tensor[N]) – Category label of videos.

  • img_metas (list(dict)) –

    Information about data. By default this includes:

    • ”fps”: video frame rate

    • ”modality”: modality of input videos

  • return_loss (bool) – Option to return loss. return loss=True for training, return loss=False for validation & test.

返回

if return loss is true, then return losses. Otherwise, return predicted gestures for clips with a certain length.

返回类型

dict|tuple

forward_test(video, label, img_metas, **kwargs)[源代码]

Defines the computation performed at every call when testing.

forward_train(video, label, img_metas, **kwargs)[源代码]

Defines the computation performed at every call when training.

init_weights(pretrained=None)[源代码]

Weight initialization for model.

set_train_epoch(epoch: int)[源代码]

Set the training epoch of heads to support customized behaviour.

show_result(video, result, **kwargs)[源代码]

Visualize the results.

class mmpose.models.detectors.Interhand3D(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[源代码]

Top-down interhand 3D pose detector of paper ref: Gyeongsik Moon.

“InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”. A child class of TopDown detector.

forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. list[Tensor], list[list[dict]]), with the outer list indicating test time augmentations.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C (Default: 3)

  • img height: imgH

  • img width: imgW

  • heatmaps height: H

  • heatmaps width: W

参数
  • img (torch.Tensor[NxCximgHximgW]) – Input images.

  • target (list[torch.Tensor]) – Target heatmaps, relative hand root depth and hand type.

  • target_weight (list[torch.Tensor]) – Weights for target heatmaps, relative hand root depth and hand type.

  • img_metas (list(dict)) –

    Information about data augmentation By default this includes:

    • ”image_file”: path to the image file

    • ”center”: center of the bbox

    • ”scale”: scale of the bbox

    • ”rotation”: rotation of the bbox

    • ”bbox_score”: score of bbox

    • ”heatmap3d_depth_bound”: depth bound of hand keypoint 3D heatmap

    • ”root_depth_bound”: depth bound of relative root depth 1D heatmap

  • return_loss (bool) – Option to return loss. return loss=True for training, return loss=False for validation & test.

返回

if return loss is true, then return losses. Otherwise, return predicted poses, boxes, image paths, heatmaps, relative hand root depth and hand type.

返回类型

dict|tuple

forward_test(img, img_metas, **kwargs)[源代码]

Defines the computation performed at every call when testing.

show_result(result, img=None, skeleton=None, kpt_score_thr=0.3, radius=8, bbox_color='green', thickness=2, pose_kpt_color=None, pose_link_color=None, vis_height=400, num_instances=-1, axis_azimuth=-115, win_name='', show=False, wait_time=0, out_file=None)[源代码]

Visualize 3D pose estimation results.

参数
  • result (list[dict]) –

    The pose estimation results containing:

    • ”keypoints_3d” ([K,4]): 3D keypoints

    • ”keypoints” ([K,3] or [T,K,3]): Optional for visualizing

      2D inputs. If a sequence is given, only the last frame will be used for visualization

    • ”bbox” ([4,] or [T,4]): Optional for visualizing 2D inputs

    • ”title” (str): title for the subplot

  • img (str or Tensor) – Optional. The image to visualize 2D inputs on.

  • skeleton (list of [idx_i,idx_j]) – Skeleton described by a list of links, each is a pair of joint indices.

  • kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.

  • radius (int) – Radius of circles.

  • bbox_color (str or tuple or Color) – Color of bbox lines.

  • thickness (int) – Thickness of lines.

  • pose_kpt_color (np.array[Nx3]`) – Color of N keypoints. If None, do not draw keypoints.

  • pose_link_color (np.array[Mx3]) – Color of M limbs. If None, do not draw limbs.

  • vis_height (int) – The image height of the visualization. The width will be N*vis_height depending on the number of visualized items.

  • num_instances (int) – Number of instances to be shown in 3D. If smaller than 0, all the instances in the pose_result will be shown. Otherwise, pad or truncate the pose_result to a length of num_instances.

  • axis_azimuth (float) – axis azimuth angle for 3D visualizations.

  • win_name (str) – The window name.

  • show (bool) – Whether to show the image. Default: False.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.

返回

Visualized img, only if not show or out_file.

返回类型

Tensor

class mmpose.models.detectors.MultiTask(backbone, heads, necks=None, head2neck=None, pretrained=None)[源代码]

Multi-task detectors.

参数
  • backbone (dict) – Backbone modules to extract feature.

  • heads (list[dict]) – heads to output predictions.

  • necks (list[dict] | None) – necks to process feature.

  • head2neck (dict{int: int}) – head index to neck index.

  • pretrained (str) – Path to the pretrained models.
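
To make the head-to-neck wiring concrete, a hedged sketch (head and neck types are illustrative, and the routing of heads without a head2neck entry to raw backbone features is an assumption based on the parameter description):

model = dict(
    type='MultiTask',
    backbone=dict(type='ResNet', depth=50),
    necks=[dict(type='GlobalAveragePooling')],
    heads=[
        dict(type='DeepposeRegressionHead',
             in_channels=2048,
             num_joints=17,
             loss_keypoint=dict(type='SmoothL1Loss', use_target_weight=True)),
        dict(type='TopdownHeatmapSimpleHead',
             in_channels=2048,
             out_channels=17,
             loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    ],
    head2neck={0: 0})  # head 0 reads the pooled feature; head 1 takes backbone output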

forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test time augmentations.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C (Default: 3)

  • img height: imgH

  • img width: imgW

  • heatmaps height: H

  • heatmaps width: W

参数
  • img (torch.Tensor[N,C,imgH,imgW]) – Input images.

  • target (list[torch.Tensor]) – Targets.

  • target_weight (List[torch.Tensor]) – Weights.

  • img_metas (list(dict)) –

    Information about data augmentation By default this includes:

    • ”image_file”: path to the image file

    • ”center”: center of the bbox

    • ”scale”: scale of the bbox

    • ”rotation”: rotation of the bbox

    • ”bbox_score”: score of bbox

  • return_loss (bool) – Option to return loss. return loss=True for training, return loss=False for validation & test.

返回

if return loss is true, then return losses. Otherwise, return predicted poses, boxes, image paths and heatmaps.

返回类型

dict|tuple

forward_dummy(img)[源代码]

Used for computing network FLOPs.

See tools/get_flops.py.

参数

img (torch.Tensor) – Input image.

返回

Outputs.

返回类型

list[Tensor]

forward_test(img, img_metas, **kwargs)[源代码]

Defines the computation performed at every call when testing.

forward_train(img, target, target_weight, img_metas, **kwargs)[源代码]

Defines the computation performed at every call when training.

init_weights(pretrained=None)[源代码]

Weight initialization for model.

property with_necks

Check if has necks.

class mmpose.models.detectors.ParametricMesh(backbone, mesh_head, smpl, disc=None, loss_gan=None, loss_mesh=None, train_cfg=None, test_cfg=None, pretrained=None)[源代码]

Model-based 3D human mesh detector. Take a single color image as input and output 3D joints, SMPL parameters and camera parameters.

参数
  • backbone (dict) – Backbone modules to extract feature.

  • mesh_head (dict) – Mesh head to process feature.

  • smpl (dict) – Config for SMPL model.

  • disc (dict) – Discriminator for SMPL parameters. Default: None.

  • loss_gan (dict) – Config for adversarial loss. Default: None.

  • loss_mesh (dict) – Config for mesh loss. Default: None.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

forward(img, img_metas=None, return_loss=False, **kwargs)[源代码]

Forward function.

Calls either forward_train or forward_test depending on whether return_loss=True.

注解

  • batch_size: N

  • num_img_channel: C (Default: 3)

  • img height: imgH

  • img width: imgW

参数
  • img (torch.Tensor[N x C x imgH x imgW]) – Input images.

  • img_metas (list(dict)) –

    Information about data augmentation By default this includes:

    • ”image_file”: path to the image file

    • ”center”: center of the bbox

    • ”scale”: scale of the bbox

    • ”rotation”: rotation of the bbox

    • ”bbox_score”: score of bbox

  • return_loss (bool) – Option to return loss. return loss=True for training, return loss=False for validation & test.

返回

Return predicted 3D joints, SMPL parameters, boxes and image paths.

forward_dummy(img)[源代码]

Used for computing network FLOPs.

See tools/get_flops.py.

参数

img (torch.Tensor) – Input image.

返回

Outputs.

返回类型

Tensor

forward_test(img, img_metas, return_vertices=False, return_faces=False, **kwargs)[源代码]

Defines the computation performed at every call when testing.

forward_train(*args, **kwargs)[源代码]

Forward function for training.

For ParametricMesh, we do not use this interface.

get_3d_joints_from_mesh(vertices)[源代码]

Get 3D joints from 3D mesh using predefined joints regressor.

init_weights(pretrained=None)[源代码]

Weight initialization for model.

show_result(result, img, show=False, out_file=None, win_name='', wait_time=0, bbox_color='green', mesh_color=(76, 76, 204), **kwargs)[源代码]

Visualize 3D mesh estimation results.

参数
  • result (list[dict]) –

    The mesh estimation results containing:

    • ”bbox” (ndarray[4]): instance bounding bbox

    • ”center” (ndarray[2]): bbox center

    • ”scale” (ndarray[2]): bbox scale

    • ”keypoints_3d” (ndarray[K,3]): predicted 3D keypoints

    • ”camera” (ndarray[3]): camera parameters

    • ”vertices” (ndarray[V, 3]): predicted 3D vertices

    • ”faces” (ndarray[F, 3]): mesh faces

  • img (str or Tensor) – Optional. The image to visualize 2D inputs on.

  • win_name (str) – The window name.

  • show (bool) – Whether to show the image. Default: False.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.

  • bbox_color (str or tuple or Color) – Color of bbox lines.

  • mesh_color (str or tuple or Color) – Color of mesh surface.

返回

Visualized img, only if not show or out_file.

返回类型

ndarray

train_step(data_batch, optimizer, **kwargs)[源代码]

Train step function.

In this function, the detector will finish the train step following the pipeline:

  1. get fake and real SMPL parameters

  2. optimize discriminator (if have)

  3. optimize generator

If self.train_cfg.disc_step > 1, each train step runs disc_step discriminator iterations on different input data, followed by a single generator iteration.

参数
  • data_batch (torch.Tensor) – Batch of data as input.

  • optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if have).

返回

Dict with loss, information for logger, the number of samples.

返回类型

outputs (dict)
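
Schematically, the update order described above corresponds to the following loop (a sketch only, not the actual implementation; all helper names are illustrative):

# one generator update after every `disc_step` discriminator updates
for i, data_batch in enumerate(dataloader):
    loss_d = compute_disc_loss(model, data_batch)   # steps 1-2: real vs. fake SMPL params
    optimizer['disc'].zero_grad()
    loss_d.backward()
    optimizer['disc'].step()
    if (i + 1) % disc_step == 0:                    # step 3: optimize the generator
        loss_g = compute_gen_loss(model, data_batch)
        optimizer['generator'].zero_grad()
        loss_g.backward()
        optimizer['generator'].step()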

val_step(data_batch, **kwargs)[源代码]

Forward function for evaluation.

参数

data_batch (dict) – Contain data for forward.

返回

Contain the results from model.

返回类型

dict

class mmpose.models.detectors.PoseLifter(backbone, neck=None, keypoint_head=None, traj_backbone=None, traj_neck=None, traj_head=None, loss_semi=None, train_cfg=None, test_cfg=None, pretrained=None)[源代码]

Pose lifter that lifts 2D pose to 3D pose.

The basic model is a pose model that predicts root-relative pose. If traj_head is not None, a trajectory model that predicts absolute root joint position is also built.

参数
  • backbone (dict) – Config for the backbone of pose model.

  • neck (dict|None) – Config for the neck of pose model.

  • keypoint_head (dict|None) – Config for the head of pose model.

  • traj_backbone (dict|None) – Config for the backbone of trajectory model. If traj_backbone is None and traj_head is not None, trajectory model will share backbone with pose model.

  • traj_neck (dict|None) – Config for the neck of trajectory model.

  • traj_head (dict|None) – Config for the head of trajectory model.

  • loss_semi (dict|None) – Config for semi-supervision loss.

  • train_cfg (dict|None) – Config for keypoint head during training.

  • test_cfg (dict|None) – Config for keypoint head during testing.

  • pretrained (str|None) – Path to pretrained weights.
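
A hedged config sketch of a 2D-to-3D lifting model (the TCN backbone and TemporalRegressionHead follow common mmpose video pose-lifting configs; exact field values are illustrative):

model = dict(
    type='PoseLifter',
    backbone=dict(
        type='TCN',
        in_channels=2 * 17,  # 17 input keypoints with 2D coordinates
        stem_channels=1024,
        num_blocks=2,
        kernel_sizes=(3, 3, 3),
        dropout=0.25),
    keypoint_head=dict(
        type='TemporalRegressionHead',
        in_channels=1024,
        num_joints=17,
        loss_keypoint=dict(type='MPJPELoss')))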

forward(input, target=None, target_weight=None, metas=None, return_loss=True, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss=True.

注解

  • batch_size: N

  • num_input_keypoints: Ki

  • input_keypoint_dim: Ci

  • input_sequence_len: Ti

  • num_output_keypoints: Ko

  • output_keypoint_dim: Co

  • output_sequence_len: To

参数
  • input (torch.Tensor[NxKixCixTi]) – Input keypoint coordinates.

  • target (torch.Tensor[NxKoxCoxTo]) – Output keypoint coordinates. Defaults to None.

  • target_weight (torch.Tensor[NxKox1]) – Weights across different joint types. Defaults to None.

  • metas (list(dict)) – Information about data augmentation

  • return_loss (bool) – Option to return loss. return loss=True for training, return loss=False for validation & test.

返回

If return_loss is true, return losses. Otherwise, return predicted poses.

返回类型

dict|Tensor

forward_dummy(input)[源代码]

Used for computing network FLOPs. See tools/get_flops.py.

参数

input (torch.Tensor) – Input pose

返回

Model output

返回类型

Tensor

forward_test(input, metas, **kwargs)[源代码]

Defines the computation performed at every call when testing.

forward_train(input, target, target_weight, metas, **kwargs)[源代码]

Defines the computation performed at every call when training.

init_weights(pretrained=None)[源代码]

Weight initialization for model.

show_result(result, img=None, skeleton=None, pose_kpt_color=None, pose_link_color=None, radius=8, thickness=2, vis_height=400, num_instances=-1, axis_azimuth=70, win_name='', show=False, wait_time=0, out_file=None)[源代码]

Visualize 3D pose estimation results.

参数
  • result (list[dict]) –

    The pose estimation results containing:

    • ”keypoints_3d” ([K,4]): 3D keypoints

    • ”keypoints” ([K,3] or [T,K,3]): Optional for visualizing

      2D inputs. If a sequence is given, only the last frame will be used for visualization

    • ”bbox” ([4,] or [T,4]): Optional for visualizing 2D inputs

    • ”title” (str): title for the subplot

  • img (str or Tensor) – Optional. The image to visualize 2D inputs on.

  • skeleton (list of [idx_i,idx_j]) – Skeleton described by a list of links, each is a pair of joint indices.

  • pose_kpt_color (np.array[Nx3]`) – Color of N keypoints. If None, do not draw keypoints.

  • pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • vis_height (int) – The image height of the visualization. The width will be N*vis_height depending on the number of visualized items.

  • num_instances (int) – Number of instances to be shown in 3D. If smaller than 0, all the instances in the result will be shown. Otherwise, pad or truncate the result to a length of num_instances.

  • axis_azimuth (float) – axis azimuth angle for 3D visualizations.

  • win_name (str) – The window name.

  • show (bool) – Whether to directly show the visualization.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.

返回

Visualized img, only if not show or out_file.

返回类型

Tensor

property with_keypoint

Check if has keypoint_head.

property with_neck

Check if has keypoint_neck.

property with_traj

Check if has trajectory_head.

property with_traj_backbone

Check if has trajectory_backbone.

property with_traj_neck

Check if has trajectory_neck.

class mmpose.models.detectors.PoseWarper(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None, concat_tensors=True)[源代码]

Top-down pose detectors for multi-frame settings for video inputs.

“Learning temporal pose estimation from sparsely-labeled videos”.

A child class of TopDown detector. The main difference between PoseWarper and TopDown is that the former takes a list of tensors as the input image in its forward method, while the latter takes a single tensor.

参数
  • backbone (dict) – Backbone modules to extract features.

  • neck (dict) – intermediate modules to transform features.

  • keypoint_head (dict) – Keypoint head to process feature.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

  • loss_pose (None) – Deprecated argument. Please use loss_keypoint for heads instead.

  • concat_tensors (bool) – Whether to concat the tensors on the batch dim, which can speed up computation. Default: True.
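
Accordingly, the forward input is a list of frames rather than a single tensor; a shape sketch (frame count and image size are illustrative):

>>> import torch
>>> imgs = [torch.rand(1, 3, 384, 288) for _ in range(5)]  # F=5 frames, each (N, C, imgH, imgW)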

forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test time augmentations.

注解

  • number of frames: F

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C (Default: 3)

  • img height: imgH

  • img width: imgW

  • heatmaps height: H

  • heatmaps width: W

参数
  • imgs (list[F,torch.Tensor[N,C,imgH,imgW]]) – multiple input frames

  • target (torch.Tensor[N,K,H,W]) – Target heatmaps for one frame.

  • target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.

  • img_metas (list(dict)) –

    Information about data augmentation By default this includes:

    • ”image_file”: paths to multiple video frames

    • ”center”: center of the bbox

    • ”scale”: scale of the bbox

    • ”rotation”: rotation of the bbox

    • ”bbox_score”: score of bbox

  • return_loss (bool) – Option to return loss. return loss=True for training, return loss=False for validation & test.

  • return_heatmap (bool) – Option to return heatmap.

返回

if return loss is true, then return losses. Otherwise, return predicted poses, boxes, image paths and heatmaps.

返回类型

dict|tuple

forward_dummy(img)[源代码]

Used for computing network FLOPs.

See tools/get_flops.py.

参数

img (torch.Tensor[N,C,imgH,imgW], or list|tuple of tensors) – multiple input frames, N >= 2.

返回

Output heatmaps.

返回类型

Tensor

forward_test(imgs, img_metas, return_heatmap=False, **kwargs)[源代码]

Defines the computation performed at every call when testing.

forward_train(imgs, target, target_weight, img_metas, **kwargs)[源代码]

Defines the computation performed at every call when training.

class mmpose.models.detectors.TopDown(backbone, neck=None, keypoint_head=None, train_cfg=None, test_cfg=None, pretrained=None, loss_pose=None)[源代码]

Top-down pose detectors.

参数
  • backbone (dict) – Backbone modules to extract feature.

  • keypoint_head (dict) – Keypoint head to process feature.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path to the pretrained models.

  • loss_pose (None) – Deprecated argument. Please use loss_keypoint for heads instead.
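
A hedged config sketch of a typical top-down model (field values follow common mmpose SimpleBaseline-style configs and are illustrative):

model = dict(
    type='TopDown',
    pretrained='torchvision://resnet50',
    backbone=dict(type='ResNet', depth=50),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=2048,
        out_channels=17,
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(flip_test=True, post_process='default', shift_heatmap=True))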

forward(img, target=None, target_weight=None, img_metas=None, return_loss=True, return_heatmap=False, **kwargs)[源代码]

Calls either forward_train or forward_test depending on whether return_loss=True. Note this setting will change the expected inputs. When return_loss=True, img and img_meta are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_meta should be double nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test time augmentations.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C (Default: 3)

  • img height: imgH

  • img width: imgW

  • heatmaps height: H

  • heatmaps width: W

参数
  • img (torch.Tensor[NxCximgHximgW]) – Input images.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.

  • img_metas (list(dict)) –

    Information about data augmentation By default this includes:

    • ”image_file”: path to the image file

    • ”center”: center of the bbox

    • ”scale”: scale of the bbox

    • ”rotation”: rotation of the bbox

    • ”bbox_score”: score of bbox

  • return_loss (bool) – Option to return loss. return loss=True for training, return loss=False for validation & test.

  • return_heatmap (bool) – Option to return heatmap.

返回

if return loss is true, then return losses. Otherwise, return predicted poses, boxes, image paths and heatmaps.

返回类型

dict|tuple

forward_dummy(img)[源代码]

Used for computing network FLOPs.

See tools/get_flops.py.

参数

img (torch.Tensor) – Input image.

返回

Output heatmaps.

返回类型

Tensor

forward_test(img, img_metas, return_heatmap=False, **kwargs)[源代码]

Defines the computation performed at every call when testing.

forward_train(img, target, target_weight, img_metas, **kwargs)[源代码]

Defines the computation performed at every call when training.

init_weights(pretrained=None)[源代码]

Weight initialization for model.

show_result(img, result, skeleton=None, kpt_score_thr=0.3, bbox_color='green', pose_kpt_color=None, pose_link_color=None, text_color='white', radius=4, thickness=1, font_scale=0.5, bbox_thickness=1, win_name='', show=False, show_keypoint_weight=False, wait_time=0, out_file=None)[源代码]

Draw result over img.

参数
  • img (str or Tensor) – The image to be displayed.

  • result (list[dict]) – The results to draw over img (bbox_result, pose_result).

  • skeleton (list[list]) – The connection of keypoints. skeleton is 0-based indexing.

  • kpt_score_thr (float, optional) – Minimum score of keypoints to be shown. Default: 0.3.

  • bbox_color (str or tuple or Color) – Color of bbox lines.

  • pose_kpt_color (np.array[Nx3]`) – Color of N keypoints. If None, do not draw keypoints.

  • pose_link_color (np.array[Mx3]) – Color of M links. If None, do not draw links.

  • text_color (str or tuple or Color) – Color of texts.

  • radius (int) – Radius of circles.

  • thickness (int) – Thickness of lines.

  • font_scale (float) – Font scales of texts.

  • win_name (str) – The window name.

  • show (bool) – Whether to show the image. Default: False.

  • show_keypoint_weight (bool) – Whether to change the transparency using the predicted confidence scores of keypoints.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • out_file (str or None) – The filename to write the image. Default: None.

返回

Visualized img, only if not show or out_file.

返回类型

Tensor

property with_keypoint

Check if has keypoint_head.

property with_neck

Check if has neck.

class mmpose.models.detectors.VoxelCenterDetector(image_size, heatmap_size, space_size, cube_size, space_center, center_net, center_head, train_cfg=None, test_cfg=None)[源代码]

Detect human center by 3D CNN on voxels.

Please refer to the paper <https://arxiv.org/abs/2004.06239> for details.

参数
  • image_size (list) – input size of the 2D model.

  • heatmap_size (list) – output size of the 2D model.

  • space_size (list) – Size of the 3D space.

  • cube_size (list) – Size of the input volume to the 3D CNN.

  • space_center (list) – Coordinate of the center of the 3D space.

  • center_net (ConfigDict) – Dictionary to construct the center net.

  • center_head (ConfigDict) – Dictionary to construct the center head.

  • train_cfg (ConfigDict) – Config for training. Default: None.

  • test_cfg (ConfigDict) – Config for testing. Default: None.

assign2gt(center_candidates, gt_centers, gt_num_persons)[源代码]

Assign gt id to each valid human center candidate.

forward(img, img_metas, return_loss=True, feature_maps=None, targets_3d=None)[源代码]

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

参数
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.

  • return_loss – Option to return loss. return loss=True for training, return loss=False for validation & test.

  • targets_3d (torch.Tensor[NxcubeLxcubeWxcubeH]) – Ground-truth 3D heatmap of human centers.

  • feature_maps (list(torch.Tensor[NxKxHxW])) – Multi-camera feature_maps.

返回

if ‘return_loss’ is true, then return losses. Otherwise, return predicted poses.

返回类型

dict

forward_dummy(feature_maps)[源代码]

Used for computing network FLOPs.

forward_test(img, img_metas, feature_maps=None)[源代码]

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

参数
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.

  • feature_maps (list(torch.Tensor[NxKxHxW])) – Multi-camera feature_maps.

返回

human centers

forward_train(img, img_metas, feature_maps=None, targets_3d=None, return_preds=False)[源代码]

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • heatmaps width: W

  • heatmaps height: H

参数
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.

  • targets_3d (torch.Tensor[NxcubeLxcubeWxcubeH]) – Ground-truth 3D heatmap of human centers.

  • feature_maps (list(torch.Tensor[NxKxHxW])) – Multi-camera feature_maps.

  • return_preds (bool) – Whether to return prediction results

返回

if ‘return_preds’ is true, then return losses and human centers. Otherwise, return losses only.

返回类型

dict

show_result(**kwargs)[源代码]

Visualize the results.

class mmpose.models.detectors.VoxelSinglePose(image_size, heatmap_size, sub_space_size, sub_cube_size, num_joints, pose_net, pose_head, train_cfg=None, test_cfg=None)[源代码]

VoxelPose. Please refer to the paper <https://arxiv.org/abs/2004.06239> for details.

参数
  • image_size (list) – input size of the 2D model.

  • heatmap_size (list) – output size of the 2D model.

  • sub_space_size (list) – Size of the cuboid human proposal.

  • sub_cube_size (list) – Size of the input volume to the pose net.

  • pose_net (ConfigDict) – Dictionary to construct the pose net.

  • pose_head (ConfigDict) – Dictionary to construct the pose head.

  • train_cfg (ConfigDict) – Config for training. Default: None.

  • test_cfg (ConfigDict) – Config for testing. Default: None.

forward(img, img_metas, return_loss=True, feature_maps=None, human_candidates=None, **kwargs)[源代码]

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • feature_maps width: W

  • feature_maps height: H

  • volume_length: cubeL

  • volume_width: cubeW

  • volume_height: cubeH

参数
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • feature_maps (list(torch.Tensor[NxCxHxW])) – Multi-camera input feature_maps.

  • img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.

  • human_candidates (torch.Tensor[NxPx5]) – Human candidates.

  • return_loss – Option to return loss. return loss=True for training, return loss=False for validation & test.

forward_dummy(feature_maps, num_candidates=5)[源代码]

Used for computing network FLOPs.

forward_test(img, img_metas, feature_maps=None, human_candidates=None, **kwargs)[源代码]

Defines the computation performed at every call when testing.

注解

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • feature_maps width: W

  • feature_maps height: H

  • volume_length: cubeL

  • volume_width: cubeW

  • volume_height: cubeH

参数
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • feature_maps (list(torch.Tensor[NxCxHxW])) – Multi-camera input feature_maps.

  • img_metas (list(dict)) – Information about image, 3D groundtruth and camera parameters.

  • human_candidates (torch.Tensor[NxPx5]) – Human candidates.

返回

predicted poses, human centers and sample_id

返回类型

dict

forward_train(img, img_metas, feature_maps=None, human_candidates=None, return_preds=False, **kwargs)[source]

Defines the computation performed at training.

Note

  • batch_size: N

  • num_keypoints: K

  • num_img_channel: C

  • img_width: imgW

  • img_height: imgH

  • feature_maps width: W

  • feature_maps height: H

  • volume_length: cubeL

  • volume_width: cubeW

  • volume_height: cubeH

Parameters
  • img (list(torch.Tensor[NxCximgHximgW])) – Multi-camera input images to the 2D model.

  • feature_maps (list(torch.Tensor[NxCxHxW])) – Multi-camera input feature maps.

  • img_metas (list(dict)) – Information about image, 3D ground truth and camera parameters.

  • human_candidates (torch.Tensor[NxPx5]) – Human candidates.

  • return_preds (bool) – Whether to return prediction results.

Returns

The losses.

Return type

dict

show_result(**kwargs)[source]

Visualize the results.

heads

class mmpose.models.heads.AEHigherResolutionHead(in_channels, num_joints, tag_per_joint=True, extra=None, num_deconv_layers=1, num_deconv_filters=(32), num_deconv_kernels=(4), num_basic_blocks=4, cat_output=None, with_ae_loss=None, loss_keypoint=None)[source]

Associative embedding with higher resolution head. paper ref: Bowen Cheng et al. “HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation”.

Parameters
  • in_channels (int) – Number of input channels.

  • num_joints (int) – Number of joints.

  • tag_per_joint (bool) – If tag_per_joint is True, the dimension of the tags equals num_joints; otherwise the dimension of the tags is 1. Default: True

  • extra (dict) – Configs for extra conv layers. Default: None

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • num_basic_blocks (int) – Number of basic blocks. Default: 4

  • cat_output (list[bool]) – Option to concatenate outputs.

  • with_ae_loss (list[bool]) – Option to use the ae loss.

  • loss_keypoint (dict) – Config for loss. Default: None.

forward(x)[source]

Forward function.

get_loss(outputs, targets, masks, joints)[source]

Calculate bottom-up keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • num_outputs: O

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • outputs (list(torch.Tensor[N,K,H,W])) – Multi-scale output heatmaps.

  • targets (list(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.

  • masks (list(torch.Tensor[N,H,W])) – Masks of the multi-scale target heatmaps.

  • joints (list(torch.Tensor[N,M,K,2])) – Joints of the multi-scale target heatmaps for the ae loss.

init_weights()[source]

Initialize model weights.

class mmpose.models.heads.AEMultiStageHead(in_channels, out_channels, num_stages=1, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, loss_keypoint=None)[source]

Associative embedding multi-stage head. paper ref: Alejandro Newell et al. “Associative Embedding: End-to-end Learning for Joint Detection and Grouping”.

Parameters
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • num_stages (int) – Number of stages.

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • extra (dict) – Configs for extra conv layers. Default: None

  • loss_keypoint (dict) – Config for loss. Default: None.

forward(x)[source]

Forward function.

Returns

A list of heatmaps from multiple stages.

Return type

out (list[Tensor])

get_loss(output, targets, masks, joints)[source]

Calculate bottom-up keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (list(torch.Tensor[NxKxHxW])) – Output heatmaps.

  • targets (list(list(torch.Tensor[NxKxHxW]))) – Multi-stage and multi-scale target heatmaps.

  • masks (list(list(torch.Tensor[NxHxW]))) – Masks of the multi-stage and multi-scale target heatmaps.

  • joints (list(list(torch.Tensor[NxMxKx2]))) – Joints of the multi-stage and multi-scale target heatmaps for the ae loss.

init_weights()[source]

Initialize model weights.

class mmpose.models.heads.AESimpleHead(in_channels, num_joints, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), tag_per_joint=True, with_ae_loss=None, extra=None, loss_keypoint=None)[source]

Associative embedding simple head. paper ref: Alejandro Newell et al. “Associative Embedding: End-to-end Learning for Joint Detection and Grouping”.

Parameters
  • in_channels (int) – Number of input channels.

  • num_joints (int) – Number of joints.

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • tag_per_joint (bool) – If tag_per_joint is True, the dimension of the tags equals num_joints; otherwise the dimension of the tags is 1. Default: True

  • with_ae_loss (list[bool]) – Option to use the ae loss or not.

  • loss_keypoint (dict) – Config for loss. Default: None.

get_loss(outputs, targets, masks, joints)[source]

Calculate bottom-up keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • num_outputs: O

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • outputs (list(torch.Tensor[N,K,H,W])) – Multi-scale output heatmaps.

  • targets (list(torch.Tensor[N,K,H,W])) – Multi-scale target heatmaps.

  • masks (list(torch.Tensor[N,H,W])) – Masks of the multi-scale target heatmaps.

  • joints (list(torch.Tensor[N,M,K,2])) – Joints of the multi-scale target heatmaps for the ae loss.

class mmpose.models.heads.CIDHead(in_channels, gfd_channels, num_joints, multi_hm_loss_factor=1.0, single_hm_loss_factor=4.0, contrastive_loss_factor=1.0, max_train_instances=200, prior_prob=0.01)[source]

CID head. paper ref: Dongkai Wang et al. “Contextual Instance Decouple for Robust Multi-Person Pose Estimation”.

Parameters
  • in_channels (int) – Number of input channels.

  • gfd_channels (int) – Number of instance feature map channels.

  • num_joints (int) – Number of joints.

  • multi_hm_loss_factor (float) – Loss weight for the multi-person heatmap.

  • single_hm_loss_factor (float) – Loss weight for the single-person heatmap.

  • contrastive_loss_factor (float) – Loss weight for the contrastive loss.

  • max_train_instances (int) – Limit on the number of instances during training.

  • prior_prob (float) – Focal loss bias initialization.

forward(features, forward_info=None)[source]

Forward function.

init_weights()[source]

Initialize model weights.

class mmpose.models.heads.CuboidCenterHead(space_size, space_center, cube_size, max_num=10, max_pool_kernel=3)[source]

Get results from the 3D human center heatmap. In this module, human 3D centers are the local maxima obtained from the 3D heatmap via NMS (max-pooling).

Parameters
  • space_size (list[3]) – The size of the 3D space.

  • cube_size (list[3]) – The size of the heatmap volume.

  • space_center (list[3]) – The coordinates of the space center.

  • max_num (int) – Maximum number of human center detections.

  • max_pool_kernel (int) – Kernel size of the max pooling used in NMS.

forward(heatmap_volumes)[source]
Parameters

heatmap_volumes (torch.Tensor(NxLxWxH)) – 3D human center heatmaps predicted by the network.

Returns

Coordinates of human centers.

Return type

human_centers (torch.Tensor(NxPx5))

class mmpose.models.heads.CuboidPoseHead(beta)[source]
forward(heatmap_volumes, grid_coordinates)[source]
Parameters
  • heatmap_volumes (torch.Tensor(NxKxLxWxH)) – 3D human pose heatmaps predicted by the network.

  • grid_coordinates (torch.Tensor(Nx(LxWxH)x3)) – Coordinates of the grids in the heatmap volumes.

Returns

Coordinates of human poses.

Return type

human_poses (torch.Tensor(NxKx3))
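The head is a soft-argmax over the voxel grid: the volume is softmax-normalized (sharpened by beta) and used to take an expectation over the grid coordinates. A minimal usage sketch with random inputs; beta=100.0 is the value commonly seen in VoxelPose configs and is an assumption here:

import torch
from mmpose.models.heads import CuboidPoseHead

head = CuboidPoseHead(beta=100.0)  # beta sharpens the softmax over voxels

N, K, L, W, H = 2, 15, 8, 8, 8
volumes = torch.randn(N, K, L, W, H)          # per-joint 3D heatmaps
grids = torch.rand(N, L * W * H, 3) * 2000.0  # xyz of each voxel (e.g. mm)

poses = head(volumes, grids)
print(poses.shape)  # torch.Size([2, 15, 3])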

class mmpose.models.heads.DEKRHead(in_channels, num_joints, num_heatmap_filters=32, num_offset_filters_per_joint=15, in_index=0, input_transform=None, num_deconv_layers=0, num_deconv_filters=None, num_deconv_kernels=None, extra={'final_conv_kernel': 0}, align_corners=False, heatmap_loss=None, offset_loss=None)[source]

DisEntangled Keypoint Regression head. “Bottom-up human pose estimation via disentangled keypoint regression”, CVPR’2021.

Parameters
  • in_channels (int) – Number of input channels.

  • num_joints (int) – Number of joints.

  • num_heatmap_filters (int) – Number of filters for the heatmap branch.

  • num_offset_filters_per_joint (int) – Number of filters for each joint.

  • in_index (int|Sequence[int]) – Input feature index. Default: 0

  • input_transform (str|None) –

    Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.

    • ’resize_concat’: Multiple feature maps will be resized to the same size as the first one and concatenated together. Usually used in the FCN head of HRNet.

    • ’multiple_select’: Multiple feature maps will be bundled into a list and passed into the decode head.

    • None: Only one selected feature map is allowed.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • heatmap_loss (dict) – Config for heatmap loss. Default: None.

  • offset_loss (dict) – Config for offset loss. Default: None.

forward(x)[source]

Forward function.

get_loss(outputs, heatmaps, masks, offsets, offset_weights)[source]

Calculate the DEKR loss.

Note

  • batch_size: N

  • num_channels: C

  • num_joints: K

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • outputs (list(torch.Tensor[N,C,H,W])) – Multi-scale outputs.

  • heatmaps (list(torch.Tensor[N,K+1,H,W])) – Multi-scale heatmap targets.

  • masks (list(torch.Tensor[N,K+1,H,W])) – Weights of the multi-scale heatmap targets.

  • offsets (list(torch.Tensor[N,K*2,H,W])) – Multi-scale offset targets.

  • offset_weights (list(torch.Tensor[N,K*2,H,W])) – Weights of the multi-scale offset targets.

init_weights()[source]

Initialize model weights.

class mmpose.models.heads.DeconvHead(in_channels=3, out_channels=17, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None)[source]

Simple deconv head.

Parameters
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • in_index (int|Sequence[int]) – Input feature index. Default: 0

  • input_transform (str|None) –

    Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.

    • ’resize_concat’: Multiple feature maps will be resized to the same size as the first one and concatenated together. Usually used in the FCN head of HRNet.

    • ’multiple_select’: Multiple feature maps will be bundled into a list and passed into the decode head.

    • None: Only one selected feature map is allowed.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • loss_keypoint (dict) – Config for loss. Default: None.

forward(x)[source]

Forward function.

get_loss(outputs, targets, masks)[source]

Calculate bottom-up masked MSE loss.

Note

  • batch_size: N

  • num_channels: C

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • outputs (list(torch.Tensor[N,C,H,W])) – Multi-scale outputs.

  • targets (list(torch.Tensor[N,C,H,W])) – Multi-scale targets.

  • masks (list(torch.Tensor[N,H,W])) – Masks of the multi-scale targets.

init_weights()[source]

Initialize model weights.

class mmpose.models.heads.DeepposeRegressionHead(in_channels, num_joints, loss_keypoint=None, out_sigma=False, train_cfg=None, test_cfg=None)[source]

Deeppose regression head with fully connected layers.

“DeepPose: Human Pose Estimation via Deep Neural Networks”.

Parameters
  • in_channels (int) – Number of input channels.

  • num_joints (int) – Number of joints.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

  • out_sigma (bool) – Predict the sigma (the variance of the joint location) together with the joint location. Default: False

decode(img_metas, output, **kwargs)[source]

Decode the keypoints from the output regression.

Parameters
  • img_metas (list(dict)) –

    Information about data augmentation. By default this includes:

    • “image_file”: path to the image file

    • “center”: center of the bbox

    • “scale”: scale of the bbox

    • “rotation”: rotation of the bbox

    • “bbox_score”: score of the bbox

  • output (np.ndarray[N, K, >=2]) – Predicted regression vector.

  • kwargs – dict containing ‘img_size’. img_size (tuple(img_width, img_height)): input image size.

forward(x)[source]

Forward function.

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • output (torch.Tensor[N, K, 2 or 4]) – Output keypoints.

  • target (torch.Tensor[N, K, 2]) – Target keypoints.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • output (torch.Tensor[N, K, 2 or 4]) – Output keypoints.

  • target (torch.Tensor[N, K, 2]) – Target keypoints.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.

Returns

Output regression.

Return type

output_regression (np.ndarray)

Parameters
  • x (torch.Tensor[N, K, 2]) – Input features.

  • flip_pairs (None | list[tuple]) – Pairs of keypoints which are mirrored.
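As a usage sketch (not from the docs): the head expects globally pooled backbone features, so in configs it is typically paired with a GlobalAveragePooling neck. The sizes below are illustrative assumptions:

import torch
from mmpose.models.heads import DeepposeRegressionHead

# Hypothetical sizes: ResNet-50 features (2048-d) and 17 COCO joints.
head = DeepposeRegressionHead(
    in_channels=2048,
    num_joints=17,
    loss_keypoint=dict(type='SmoothL1Loss', use_target_weight=True))

feats = torch.randn(4, 2048)   # pooled features, shape [N, in_channels]
coords = head(feats)           # normalized coordinates, shape [N, 17, 2]

target = torch.rand(4, 17, 2)
target_weight = torch.ones(4, 17, 2)
losses = head.get_loss(coords, target, target_weight)  # dict with the regression loss term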

class mmpose.models.heads.HMRMeshHead(in_channels, smpl_mean_params=None, n_iter=3)[source]

SMPL parameters regressor head of simple baseline. “End-to-end Recovery of Human Shape and Pose”, CVPR’2018.

Parameters
  • in_channels (int) – Number of input channels.

  • smpl_mean_params (str) – The file name of the mean SMPL parameters.

  • n_iter (int) – The number of iterations for estimating the delta parameters.

forward(x)[source]

Forward function.

x is the image feature map, expected to be of shape (batch size x channel number x height x width).

init_weights()[source]

class mmpose.models.heads.Interhand3DHead(keypoint_head_cfg, root_head_cfg, hand_type_head_cfg, loss_keypoint=None, loss_root_depth=None, loss_hand_type=None, train_cfg=None, test_cfg=None)[source]

Interhand 3D head. paper ref: Gyeongsik Moon. “InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”.

Parameters
  • keypoint_head_cfg (dict) – Configs of Heatmap3DHead for hand keypoint estimation.

  • root_head_cfg (dict) – Configs of Heatmap1DHead for relative hand root depth estimation.

  • hand_type_head_cfg (dict) – Configs of MultilabelClassificationHead for hand type classification.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

  • loss_root_depth (dict) – Config for relative root depth loss. Default: None.

  • loss_hand_type (dict) – Config for hand type classification loss. Default: None.

decode(img_metas, output, **kwargs)[source]

Decode hand keypoint, relative root depth and hand type.

Parameters
  • img_metas (list(dict)) –

    Information about data augmentation. By default this includes:

    • “image_file”: path to the image file

    • “center”: center of the bbox

    • “scale”: scale of the bbox

    • “rotation”: rotation of the bbox

    • “bbox_score”: score of the bbox

    • “heatmap3d_depth_bound”: depth bound of the hand keypoint 3D heatmap

    • “root_depth_bound”: depth bound of the relative root depth 1D heatmap

  • output (list[np.ndarray]) – Model-predicted 3D heatmaps, relative root depth and hand type.

forward(x)[source]

Forward function.

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for hand type.

Parameters
  • output (list[Tensor]) – A list of outputs from multiple heads.

  • target (list[Tensor]) – A list of targets for multiple heads.

  • target_weight (list[Tensor]) – A list of target weights for multiple heads.

get_loss(output, target, target_weight)[source]

Calculate loss for hand keypoint heatmaps, relative root depth and hand type.

Parameters
  • output (list[Tensor]) – A list of outputs from multiple heads.

  • target (list[Tensor]) – A list of targets for multiple heads.

  • target_weight (list[Tensor]) – A list of target weights for multiple heads.

inference_model(x, flip_pairs=None)[source]

Inference function.

Returns

A list of output hand keypoint heatmaps, relative root depth and hand type.

Return type

output (list[np.ndarray])

Parameters
  • x (torch.Tensor[N,K,H,W]) – Input features.

  • flip_pairs (None | list[tuple]) – Pairs of keypoints which are mirrored.

class mmpose.models.heads.MultiModalSSAHead(num_classes, modality, in_channels=1024, avg_pool_kernel=(1, 7, 7), dropout_prob=0.0, train_cfg=None, test_cfg=None, **kwargs)[source]

Spatial-temporal Semantic Alignment Head proposed in “Improving the performance of unimodal dynamic hand-gesture recognition with multimodal training”.

Please refer to the paper for details.

Parameters
  • num_classes (int) – Number of classes.

  • modality (list[str]) – Modalities of the input videos for the backbone.

  • in_channels (int) – Number of channels of the feature maps. Default: 1024

  • avg_pool_kernel (tuple[int]) – Kernel size of the pooling layer. Default: (1, 7, 7)

  • dropout_prob (float) – Probability of applying dropout to the input feature map. Default: 0

  • train_cfg (dict) – Training config.

  • test_cfg (dict) – Testing config.

forward(x, img_metas)[source]

Forward function.

get_accuracy(logits, label, img_metas)[source]

Compute the accuracy of the predicted gesture.

Note

  • batch_size: N

  • number of classes: nC

  • logit length: L

Parameters
  • logits (list[NxnCxL]) – Predicted logits for each modality.

  • label (list(dict)) – Category label.

  • img_metas (list(dict)) – Information about the data. By default this includes:

    • “fps”: video frame rate

    • “modality”: modality of the input videos

Returns

Computed accuracy for each modality.

Return type

dict[str, torch.tensor]

get_loss(logits, label, fmaps=None)[source]

Compute the Cross Entropy loss and the SSA loss.

Note

  • batch_size: N

  • number of classes: nC

  • feature map channel: C

  • feature map height: H

  • feature map width: W

  • feature map length: L

  • logit length: Lg

Parameters
  • logits (list[NxnCxLg]) – Predicted logits for each modality.

  • label (list(dict)) – Category label.

  • fmaps (list[torch.Tensor[NxCxLxHxW]]) – Feature maps for each modality.

Returns

Computed losses.

Return type

dict[str, torch.tensor]

init_weights()[source]

Initialize model weights.

set_train_epoch(epoch: int)[source]

Set the epoch to control the activation of the SSA loss.

class mmpose.models.heads.TemporalRegressionHead(in_channels, num_joints, max_norm=None, loss_keypoint=None, is_trajectory=False, train_cfg=None, test_cfg=None)[source]

Regression head of VideoPose3D.

“3D human pose estimation in video with temporal convolutions and semi-supervised training”, CVPR’2019.

Parameters
  • in_channels (int) – Number of input channels.

  • num_joints (int) – Number of joints.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

  • max_norm (float|None) – If not None, the weight of the convolution layers will be clipped to have a maximum norm of max_norm.

  • is_trajectory (bool) – If the model only predicts the root joint position, this argument should be set to True; in this case, traj_loss will be calculated. Otherwise it should be set to False. Default: False.

decode(metas, output)[source]

Decode the keypoints from the output regression.

Parameters
  • metas (list(dict)) –

    Information about data augmentation, including:

    • target_image_path (str): Optional, path to the image file

    • target_mean (float): Optional, normalization parameter of the target pose.

    • target_std (float): Optional, normalization parameter of the target pose.

    • root_position (np.ndarray[3,1]): Optional, global position of the root joint.

    • root_index (np.ndarray[1,]): Optional, original index of the root joint before root-centering.

  • output (np.ndarray[N, K, 3]) – Predicted regression vector.

forward(x)[source]

Forward function.

get_accuracy(output, target, target_weight, metas)[source]

Calculate accuracy for keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • output (torch.Tensor[N, K, 3]) – Output keypoints.

  • target (torch.Tensor[N, K, 3]) – Target keypoints.

  • target_weight (torch.Tensor[N, K, 3]) – Weights across different joint types.

  • metas (list(dict)) –

    Information about data augmentation, including:

    • target_image_path (str): Optional, path to the image file

    • target_mean (float): Optional, normalization parameter of the target pose.

    • target_std (float): Optional, normalization parameter of the target pose.

    • root_position (np.ndarray[3,1]): Optional, global position of the root joint.

    • root_index (np.ndarray[1,]): Optional, original index of the root joint before root-centering.

get_loss(output, target, target_weight)[source]

Calculate keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • output (torch.Tensor[N, K, 3]) – Output keypoints.

  • target (torch.Tensor[N, K, 3]) – Target keypoints.

  • target_weight (torch.Tensor[N, K, 3]) – Weights across different joint types. If self.is_trajectory is True and target_weight is None, target_weight will be set inversely proportional to the joint depth.

inference_model(x, flip_pairs=None)[source]

Inference function.

Returns

Output regression.

Return type

output_regression (np.ndarray)

Parameters
  • x (torch.Tensor[N, K, 2]) – Input features.

  • flip_pairs (None | list[tuple]) – Pairs of keypoints which are mirrored.

init_weights()[source]

Initialize the weights.

class mmpose.models.heads.TopdownHeatmapBaseHead[source]

Base class for top-down heatmap heads.

All top-down heatmap heads should subclass it. All subclasses should override:

  • get_loss, to calculate the loss.

  • get_accuracy, to calculate the accuracy.

  • forward, to run the model forward.

  • inference_model, to run model inference.

decode(img_metas, output, **kwargs)[source]

Decode keypoints from heatmaps.

Parameters
  • img_metas (list(dict)) –

    Information about data augmentation. By default this includes:

    • “image_file”: path to the image file

    • “center”: center of the bbox

    • “scale”: scale of the bbox

    • “rotation”: rotation of the bbox

    • “bbox_score”: score of the bbox

  • output (np.ndarray[N, K, H, W]) – Model-predicted heatmaps.

abstract forward(**kwargs)[source]

Forward function.

abstract get_accuracy(**kwargs)[source]

Gets the accuracy.

abstract get_loss(**kwargs)[source]

Gets the loss.

abstract inference_model(**kwargs)[source]

Inference function.

class mmpose.models.heads.TopdownHeatmapMSMUHead(out_shape, unit_channels=256, out_channels=17, num_stages=4, num_units=4, use_prm=False, norm_cfg={'type': 'BN'}, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

Head for the multi-stage multi-unit networks used in the Multi-Stage Pose estimation Network (MSPN) and Residual Steps Networks (RSN).

Parameters
  • unit_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • out_shape (tuple) – Shape of the output heatmap.

  • num_stages (int) – Number of stages.

  • num_units (int) – Number of units in each stage.

  • use_prm (bool) – Whether to use the pose refine machine (PRM). Default: False.

  • norm_cfg (dict) – Dictionary to construct and configure the norm layer. Default: dict(type=’BN’)

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

forward(x)[source]

Forward function.

Returns

A list of heatmaps from multiple stages and units.

Return type

out (list[Tensor])

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (torch.Tensor[N,K,H,W]) – Output heatmaps.

  • target (torch.Tensor[N,K,H,W]) – Target heatmaps.

  • target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • num_outputs: O

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (torch.Tensor[N,O,K,H,W]) – Output heatmaps.

  • target (torch.Tensor[N,O,K,H,W]) – Target heatmaps.

  • target_weight (torch.Tensor[N,O,K,1]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.

Returns

Output heatmaps.

Return type

output_heatmap (np.ndarray)

Parameters
  • x (list[torch.Tensor[N,K,H,W]]) – Input features.

  • flip_pairs (None | list[tuple]) – Pairs of keypoints which are mirrored.

init_weights()[source]

Initialize model weights.

class mmpose.models.heads.TopdownHeatmapMultiStageHead(in_channels=512, out_channels=17, num_stages=1, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

Top-down heatmap multi-stage head.

TopdownHeatmapMultiStageHead consists of multiple branches, each of which has num_deconv_layers (>=0) deconv layers and a simple conv2d layer.

Parameters
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • num_stages (int) – Number of stages.

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

forward(x)[source]

Forward function.

Returns

A list of heatmaps from multiple stages.

Return type

out (list[Tensor])

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (torch.Tensor[N,K,H,W]) – Output heatmaps.

  • target (torch.Tensor[N,K,H,W]) – Target heatmaps.

  • target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • num_outputs: O

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (torch.Tensor[N,K,H,W]) – Output heatmaps.

  • target (torch.Tensor[N,K,H,W]) – Target heatmaps.

  • target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.

Returns

Output heatmaps.

Return type

output_heatmap (np.ndarray)

Parameters
  • x (List[torch.Tensor[NxKxHxW]]) – Input features.

  • flip_pairs (None | list[tuple]) – Pairs of keypoints which are mirrored.

init_weights()[source]

Initialize model weights.

class mmpose.models.heads.TopdownHeatmapSimpleHead(in_channels, out_channels, num_deconv_layers=3, num_deconv_filters=(256, 256, 256), num_deconv_kernels=(4, 4, 4), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

Top-down heatmap simple head. paper ref: Bin Xiao et al. “Simple Baselines for Human Pose Estimation and Tracking”.

TopdownHeatmapSimpleHead consists of (>=0) deconv layers and a simple conv2d layer.

Parameters
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • in_index (int|Sequence[int]) – Input feature index. Default: 0

  • input_transform (str|None) –

    Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.

    • ’resize_concat’: Multiple feature maps will be resized to the same size as the first one and concatenated together. Usually used in the FCN head of HRNet.

    • ’multiple_select’: Multiple feature maps will be bundled into a list and passed into the decode head.

    • None: Only one selected feature map is allowed.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

forward(x)[source]

Forward function.

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (torch.Tensor[N,K,H,W]) – Output heatmaps.

  • target (torch.Tensor[N,K,H,W]) – Target heatmaps.

  • target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (torch.Tensor[N,K,H,W]) – Output heatmaps.

  • target (torch.Tensor[N,K,H,W]) – Target heatmaps.

  • target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.

Returns

Output heatmaps.

Return type

output_heatmap (np.ndarray)

Parameters
  • x (torch.Tensor[N,K,H,W]) – Input features.

  • flip_pairs (None | list[tuple]) – Pairs of keypoints which are mirrored.

init_weights()[source]

Initialize model weights.
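A minimal usage sketch (sizes are illustrative; the 8x upsampling follows from the three stride-2 deconv layers in the default configuration):

import torch
from mmpose.models.heads import TopdownHeatmapSimpleHead

head = TopdownHeatmapSimpleHead(
    in_channels=2048,   # e.g. ResNet-50 output channels
    out_channels=17,    # e.g. COCO keypoints
    loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True))

feats = torch.randn(2, 2048, 8, 6)   # backbone features [N, C, H, W]
heatmaps = head(feats)               # [2, 17, 64, 48]: each deconv doubles H and W

target = torch.rand(2, 17, 64, 48)
target_weight = torch.ones(2, 17, 1)
losses = head.get_loss(heatmaps, target, target_weight)  # dict with the heatmap loss term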

class mmpose.models.heads.ViPNASHeatmapSimpleHead(in_channels, out_channels, num_deconv_layers=3, num_deconv_filters=(144, 144, 144), num_deconv_kernels=(4, 4, 4), num_deconv_groups=(16, 16, 16), extra=None, in_index=0, input_transform=None, align_corners=False, loss_keypoint=None, train_cfg=None, test_cfg=None)[source]

ViPNAS heatmap simple head.

“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.

ViPNASHeatmapSimpleHead consists of (>=0) deconv layers and a simple conv2d layer.

Parameters
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • num_deconv_layers (int) – Number of deconv layers. num_deconv_layers should be >= 0. Note that 0 means no deconv layers.

  • num_deconv_filters (list|tuple) – Number of filters. If num_deconv_layers > 0, the length of num_deconv_filters should equal num_deconv_layers.

  • num_deconv_kernels (list|tuple) – Kernel sizes.

  • num_deconv_groups (list|tuple) – Group numbers.

  • in_index (int|Sequence[int]) – Input feature index. Default: 0

  • input_transform (str|None) –

    Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.

    • ’resize_concat’: Multiple feature maps will be resized to the same size as the first one and concatenated together. Usually used in the FCN head of HRNet.

    • ’multiple_select’: Multiple feature maps will be bundled into a list and passed into the decode head.

    • None: Only one selected feature map is allowed.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • loss_keypoint (dict) – Config for keypoint loss. Default: None.

forward(x)[source]

Forward function.

get_accuracy(output, target, target_weight)[source]

Calculate accuracy for top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (torch.Tensor[N,K,H,W]) – Output heatmaps.

  • target (torch.Tensor[N,K,H,W]) – Target heatmaps.

  • target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.

get_loss(output, target, target_weight)[source]

Calculate top-down keypoint loss.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

Parameters
  • output (torch.Tensor[N,K,H,W]) – Output heatmaps.

  • target (torch.Tensor[N,K,H,W]) – Target heatmaps.

  • target_weight (torch.Tensor[N,K,1]) – Weights across different joint types.

inference_model(x, flip_pairs=None)[source]

Inference function.

Returns

Output heatmaps.

Return type

output_heatmap (np.ndarray)

Parameters
  • x (torch.Tensor[N,K,H,W]) – Input features.

  • flip_pairs (None | list[tuple]) – Pairs of keypoints which are mirrored.

init_weights()[source]

Initialize model weights.

losses

class mmpose.models.losses.AELoss(loss_type)[source]

Associative Embedding loss.

“Associative Embedding: End-to-End Learning for Joint Detection and Grouping”.

forward(tags, joints)[source]

Accumulate the tag loss for each image in the batch.

Note

  • batch_size: N

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

  • num_keypoints: K

Parameters
  • tags (torch.Tensor[N,KxHxW,1]) – Tag channels of the output.

  • joints (torch.Tensor[N,M,K,2]) – Joints information.

singleTagLoss(pred_tag, joints)[source]

Associative embedding loss for one image.

Note

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

  • num_keypoints: K

Parameters
  • pred_tag (torch.Tensor[KxHxW,1]) – Tag of the output for one image.

  • joints (torch.Tensor[M,K,2]) – Joints information for one image.

class mmpose.models.losses.AdaptiveWingLoss(alpha=2.1, omega=14, epsilon=1, theta=0.5, use_target_weight=False, loss_weight=1.0)[source]

Adaptive wing loss. paper ref: ‘Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression’ Wang et al. ICCV’2019.

Parameters
  • alpha (float), omega (float), epsilon (float), theta (float) – Hyper-parameters of the loss.

  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

criterion(pred, target)[source]

Criterion of the adaptive wing loss.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • pred (torch.Tensor[NxKxHxW]) – Predicted heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

forward(output, target, target_weight)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • output (torch.Tensor[NxKxHxW]) – Output heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

  • target_weight (torch.Tensor[NxKx1]) – Weights across different joint types.
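A quick sanity-check sketch (shapes follow the docstring above; the tensors are random placeholders):

import torch
from mmpose.models.losses import AdaptiveWingLoss

criterion = AdaptiveWingLoss(use_target_weight=True)

output = torch.rand(2, 17, 64, 64)    # predicted heatmaps [N, K, H, W]
target = torch.rand(2, 17, 64, 64)    # target heatmaps
target_weight = torch.ones(2, 17, 1)  # per-joint weights [N, K, 1]

loss = criterion(output, target, target_weight)
print(loss)  # scalar tensor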

class mmpose.models.losses.BCELoss(use_target_weight=False, loss_weight=1.0)[source]

Binary Cross Entropy loss.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_labels: K

Parameters
  • output (torch.Tensor[N, K]) – Output classification.

  • target (torch.Tensor[N, K]) – Target classification.

  • target_weight (torch.Tensor[N, K] or torch.Tensor[N]) – Weights across different labels.

class mmpose.models.losses.BoneLoss(joint_parents, use_target_weight=False, loss_weight=1.0)[source]

Bone length loss.

Parameters
  • joint_parents (list) – Indices of each joint’s parent joint.

  • use_target_weight (bool) – Option to use weighted bone loss. Different bone types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K-1]) – Weights across different bone types.
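For intuition, the loss compares the length of each parent–child bone in the prediction with the same bone in the target. A small sketch with a hypothetical 3-joint skeleton (these joint indices are made up for illustration):

import torch
from mmpose.models.losses import BoneLoss

# Hypothetical skeleton: joint 0 is the root (its own parent),
# joint 1 hangs off 0, joint 2 hangs off 1.
criterion = BoneLoss(joint_parents=[0, 0, 1])

output = torch.rand(4, 3, 3)   # predicted joints [N, K, D], D=3
target = torch.rand(4, 3, 3)   # ground-truth joints

loss = criterion(output, target)  # scalar: mean bone-length difference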

class mmpose.models.losses.FocalHeatmapLoss(alpha=2, beta=4)[source]
forward(pred, gt, mask=None)[source]

Modified focal loss.

Exactly the same as CornerNet; it runs faster and costs a little more memory.

Parameters
  • pred (torch.Tensor[batch, c, h, w]) – Predicted heatmaps.

  • gt (torch.Tensor[batch, c, h, w]) – Ground-truth heatmaps.

  • mask (torch.Tensor, optional) – Mask of the targets.

class mmpose.models.losses.GANLoss(gan_type, real_label_val=1.0, fake_label_val=0.0, loss_weight=1.0)[source]

Define the GAN loss.

Parameters
  • gan_type (str) – Supports ‘vanilla’, ‘lsgan’, ‘wgan’, ‘hinge’.

  • real_label_val (float) – The value for real labels. Default: 1.0.

  • fake_label_val (float) – The value for fake labels. Default: 0.0.

  • loss_weight (float) – Loss weight. Default: 1.0. Note that loss_weight is only for generators; it is always 1.0 for discriminators.

forward(input, target_is_real, is_disc=False)[source]
Parameters
  • input (Tensor) – The input for the loss module, i.e., the network prediction.

  • target_is_real (bool) – Whether the target is real or fake.

  • is_disc (bool) – Whether the loss is computed for a discriminator. Default: False.

Returns

GAN loss value.

Return type

Tensor

get_target_label(input, target_is_real)[source]

Get the target label.

Parameters
  • input (Tensor) – Input tensor.

  • target_is_real (bool) – Whether the target is real or fake.

Returns

Target tensor. Returns a bool for wgan; otherwise returns a Tensor.

Return type

(bool | Tensor)
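A minimal sketch of the generator/discriminator bookkeeping (the logits are random placeholders):

import torch
from mmpose.models.losses import GANLoss

adv = GANLoss(gan_type='vanilla', real_label_val=1.0, fake_label_val=0.0)

fake_logits = torch.randn(8, 1)  # discriminator scores on generated samples

# Generator step: fake samples should be judged "real"; scaled by loss_weight.
g_loss = adv(fake_logits, target_is_real=True, is_disc=False)

# Discriminator step: loss_weight is ignored (always 1.0 for discriminators).
d_loss_fake = adv(fake_logits, target_is_real=False, is_disc=True)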

class mmpose.models.losses.HeatmapLoss(supervise_empty=True)[source]

Accumulate the heatmap loss for each image in the batch.

Parameters

supervise_empty (bool) – Whether to supervise empty channels.

forward(pred, gt, mask)[source]

Forward function.

Note

  • batch_size: N

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

  • num_keypoints: K

Parameters
  • pred (torch.Tensor[N,K,H,W]) – Output heatmaps.

  • gt (torch.Tensor[N,K,H,W]) – Target heatmaps.

  • mask (torch.Tensor[N,H,W]) – Mask of the target.

class mmpose.models.losses.JointsMSELoss(use_target_weight=False, loss_weight=1.0)[source]

MSE loss for heatmaps.

Parameters
  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight)[source]

Forward function.
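The shapes follow the heatmap-head convention above; a quick sketch with random tensors:

import torch
from mmpose.models.losses import JointsMSELoss

criterion = JointsMSELoss(use_target_weight=True)

output = torch.rand(2, 17, 64, 48)    # predicted heatmaps [N, K, H, W]
target = torch.rand(2, 17, 64, 48)    # target heatmaps
target_weight = torch.ones(2, 17, 1)  # [N, K, 1]; zero out invisible joints

loss = criterion(output, target, target_weight)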

class mmpose.models.losses.JointsOHKMMSELoss(use_target_weight=False, topk=8, loss_weight=1.0)[source]

MSE loss with online hard keypoint mining (OHKM).

Parameters
  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • topk (int) – Only the top-k joint losses are kept.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight)[source]

Forward function.

class mmpose.models.losses.L1Loss(use_target_weight=False, loss_weight=1.0)[source]

L1 loss for coordinate regression.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • output (torch.Tensor[N, K, 2]) – Output regression.

  • target (torch.Tensor[N, K, 2]) – Target regression.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

class mmpose.models.losses.MPJPELoss(use_target_weight=False, loss_weight=1.0)[source]

MPJPE (Mean Per Joint Position Error) loss.

Parameters
  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N,K,D]) – Weights across different joint types.

class mmpose.models.losses.MSELoss(use_target_weight=False, loss_weight=1.0)[source]

MSE loss for coordinate regression.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

Parameters
  • output (torch.Tensor[N, K, 2]) – Output regression.

  • target (torch.Tensor[N, K, 2]) – Target regression.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

class mmpose.models.losses.MeshLoss(joints_2d_loss_weight, joints_3d_loss_weight, vertex_loss_weight, smpl_pose_loss_weight, smpl_beta_loss_weight, img_res, focal_length=5000)[source]

Mixed loss for 3D human mesh. It is composed of losses on 2D joints, 3D joints, mesh vertices and SMPL parameters (if any).

Parameters
  • joints_2d_loss_weight (float) – Weight for the loss on 2D joints.

  • joints_3d_loss_weight (float) – Weight for the loss on 3D joints.

  • vertex_loss_weight (float) – Weight for the loss on 3D vertices.

  • smpl_pose_loss_weight (float) – Weight for the loss on SMPL pose parameters.

  • smpl_beta_loss_weight (float) – Weight for the loss on SMPL shape parameters.

  • img_res (int) – Input image resolution.

  • focal_length (float) – Focal length of the camera model. Default: 5000.

forward(output, target)[source]

Forward function.

Parameters
  • output (dict) – dict of network predicted results. Keys: ‘vertices’, ‘joints_3d’, ‘camera’, ‘pose’ (optional), ‘beta’ (optional)

  • target (dict) – dict of ground-truth labels. Keys: ‘vertices’, ‘joints_3d’, ‘joints_3d_visible’, ‘joints_2d’, ‘joints_2d_visible’, ‘pose’, ‘beta’, ‘has_smpl’

Returns

dict of losses.

Return type

dict

joints_2d_loss(pred_joints_2d, gt_joints_2d, joints_2d_visible)[source]

Compute the 2D reprojection loss on the joints.

The loss is weighted by joints_2d_visible.

joints_3d_loss(pred_joints_3d, gt_joints_3d, joints_3d_visible)[source]

Compute the 3D joints loss for the examples for which 3D joint annotations are available.

The loss is weighted by joints_3d_visible.

project_points(points_3d, camera)[source]

Perform an orthographic projection of the 3D points using the camera parameters, and return the projected 2D points in the image plane.

Note

  • batch size: B

  • point number: N

Parameters
  • points_3d (Tensor([B, N, 3])) – 3D points.

  • camera (Tensor([B, 3])) – Camera parameters, with the 3 channels as (scale, translation_x, translation_y).

Returns

Projected 2D points in image space.

Return type

Tensor([B, N, 2])

smpl_losses(pred_rotmat, pred_betas, gt_pose, gt_betas, has_smpl)[source]

Compute the SMPL parameter loss for the examples for which SMPL parameter annotations are available.

The loss is weighted by has_smpl.

vertex_loss(pred_vertices, gt_vertices, has_smpl)[source]

Compute the 3D vertex loss for the examples for which 3D human mesh annotations are available.

The loss is weighted by has_smpl.

class mmpose.models.losses.MultiLossFactory(num_joints, num_stages, ae_loss_type, with_ae_loss, push_loss_factor, pull_loss_factor, with_heatmaps_loss, heatmaps_loss_factor, supervise_empty=True)[source]

Loss for bottom-up models.

Parameters
  • num_joints (int) – Number of keypoints.

  • num_stages (int) – Number of stages.

  • ae_loss_type (str) – Type of the ae loss.

  • with_ae_loss (list[bool]) – Use the ae loss or not in multi-heatmap.

  • push_loss_factor (list[float]) – Parameter of the push loss in multi-heatmap.

  • pull_loss_factor (list[float]) – Parameter of the pull loss in multi-heatmap.

  • with_heatmaps_loss (list[bool]) – Use the heatmap loss or not in multi-heatmap.

  • heatmaps_loss_factor (list[float]) – Parameter of the heatmap loss in multi-heatmap.

  • supervise_empty (bool) – Whether to supervise empty channels.

forward(outputs, heatmaps, masks, joints)[source]

Forward function to calculate losses.

Note

  • batch_size: N

  • heatmaps width: W

  • heatmaps height: H

  • max_num_people: M

  • num_keypoints: K

  • output_channel: C (C=2K if the ae loss is used, otherwise C=K)

Parameters
  • outputs (list(torch.Tensor[N,C,H,W])) – Outputs of the stages.

  • heatmaps (list(torch.Tensor[N,K,H,W])) – Target heatmaps.

  • masks (list(torch.Tensor[N,H,W])) – Masks of the heatmaps.

  • joints (list(torch.Tensor[N,M,K,2])) – Joints for the ae loss.
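In configs this factory is usually passed to a bottom-up head via loss_keypoint. A single-stage sketch with values typical of COCO associative-embedding configs; treat the factors as illustrative assumptions, not verified defaults:

# Hypothetical single-stage setup for 17 COCO keypoints.
loss_keypoint = dict(
    type='MultiLossFactory',
    num_joints=17,
    num_stages=1,
    ae_loss_type='exp',
    with_ae_loss=[True],
    push_loss_factor=[0.001],
    pull_loss_factor=[0.001],
    with_heatmaps_loss=[True],
    heatmaps_loss_factor=[1.0])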

class mmpose.models.losses.RLELoss(use_target_weight=False, size_average=True, residual=True, q_dis='laplace')[source]

RLE Loss.

“Human Pose Regression With Residual Log-Likelihood Estimation”.

The code is modified from the official implementation.

Parameters
  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • size_average (bool) – Option to average the loss by the batch size.

  • residual (bool) – Option to add an L1 loss and let the flow learn the residual error distribution.

  • q_dis (string) – Option for the identity Q(error) distribution. Options: “laplace” or “gaussian”.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • output (torch.Tensor[N, K, D*2]) – Output regression, including coords and sigmas.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.

class mmpose.models.losses.SemiSupervisionLoss(joint_parents, projection_loss_weight=1.0, bone_loss_weight=1.0, warmup_iterations=0)[source]

Semi-supervision loss for unlabeled data. It is composed of a projection loss and a bone loss.

Paper ref: “3D human pose estimation in video with temporal convolutions and semi-supervised training”, Dario Pavllo et al., CVPR’2019.

Parameters
  • joint_parents (list) – Indices of each joint’s parent joint.

  • projection_loss_weight (float) – Weight for the projection loss.

  • bone_loss_weight (float) – Weight for the bone loss.

  • warmup_iterations (int) – Number of warmup iterations. In the first warmup_iterations iterations, the model is trained only on labeled data, and the semi-supervision loss is 0. This is a workaround since the epoch number is currently not accessible inside loss functions. Note that the number of iterations in an epoch can change with the number of GPUs in multi-GPU settings, so set this parameter carefully: warmup_iterations = dataset_size // samples_per_gpu // gpu_num * warmup_epochs

forward(output, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.

static project_joints(x, intrinsics)[source]

Project 3D joint coordinates to the 2D image plane using the camera intrinsic parameters.

Parameters
  • x (torch.Tensor[N, K, 3]) – 3D joint coordinates.

  • intrinsics (torch.Tensor[N, 4] | torch.Tensor[N, 9]) – Camera intrinsics: f (2), c (2), k (3), p (2).
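A small sketch of the projection path. The 4-element intrinsics carry only focal lengths and principal point, while the 9-element form adds the k/p distortion terms; the numeric values below are made up:

import torch
from mmpose.models.losses import SemiSupervisionLoss

# 17 joints placed in front of the camera (positive depth).
x = torch.rand(1, 17, 3) + torch.tensor([0.0, 0.0, 2.0])

# [fx, fy, cx, cy] -- no distortion in the 4-element form.
intrinsics = torch.tensor([[1000.0, 1000.0, 512.0, 512.0]])

joints_2d = SemiSupervisionLoss.project_joints(x, intrinsics)
print(joints_2d.shape)  # torch.Size([1, 17, 2])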

class mmpose.models.losses.SmoothL1Loss(use_target_weight=False, loss_weight=1.0)[source]

Smooth L1 loss.

Parameters
  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.

class mmpose.models.losses.SoftWeightSmoothL1Loss(use_target_weight=False, supervise_empty=True, beta=1.0, loss_weight=1.0)[source]

Smooth L1 loss with soft weight for regression.

Parameters
  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • supervise_empty (bool) – Whether to supervise the output with zero weight.

  • beta (float) – Specifies the threshold at which to change between the L1 and L2 loss.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.

static smooth_l1_loss(input, target, reduction='none', beta=1.0)[source]

Re-implementation of torch.nn.functional.smooth_l1_loss with a beta argument, to support PyTorch <= 1.6.

class mmpose.models.losses.SoftWingLoss(omega1=2.0, omega2=20.0, epsilon=0.5, use_target_weight=False, loss_weight=1.0)[source]

Soft Wing Loss. ‘Structure-Coherent Deep Feature Learning for Robust Face Alignment’ Lin et al. TIP’2021.

loss =
    |x|                                , if |x| < omega1
    omega2 * ln(1 + |x| / epsilon) + B , if |x| >= omega1

where B is a constant chosen so that the two pieces join continuously at |x| = omega1.

Parameters
  • omega1 (float) – The first threshold.

  • omega2 (float) – The second threshold.

  • epsilon (float) – Also referred to as curvature.

  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

criterion(pred, target)[source]

Criterion of the soft wing loss.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • pred (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.

class mmpose.models.losses.WingLoss(omega=10.0, epsilon=2.0, use_target_weight=False, loss_weight=1.0)[source]

Wing Loss. paper ref: ‘Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks’ Feng et al. CVPR’2018.

Parameters
  • omega (float) – Also referred to as width.

  • epsilon (float) – Also referred to as curvature.

  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

criterion(pred, target)[source]

Criterion of the wing loss.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • pred (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

forward(output, target, target_weight=None)[source]

Forward function.

Note

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

Parameters
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N,K,D]) – Weights across different joint types.
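A short sketch of the regression losses above in use (random 2D keypoints; shapes per the docstrings):

import torch
from mmpose.models.losses import WingLoss

criterion = WingLoss(omega=10.0, epsilon=2.0)

pred = torch.rand(4, 17, 2)    # predicted keypoints [N, K, D]
target = torch.rand(4, 17, 2)  # target keypoints

loss = criterion(pred, target)  # scalar; small errors fall on the log branch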

misc

mmpose.datasets

class mmpose.datasets.AnimalATRWDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

ATRW dataset for animal pose estimation.

“ATRW: A Benchmark for Amur Tiger Re-identification in the Wild” ACM MM’2020. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

ATRW keypoint indexes:

0: "left_ear",
1: "right_ear",
2: "nose",
3: "right_shoulder",
4: "right_front_paw",
5: "left_shoulder",
6: "left_front_paw",
7: "right_hip",
8: "right_knee",
9: "right_back_paw",
10: "left_hip",
11: "left_knee",
12: "left_back_paw",
13: "tail",
14: "center"
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to the directory where images are held. Default: None.

  • data_cfg (dict) – Data config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[source]

Evaluate COCO-style keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap

    • bbox_id (list(int)).

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘mAP’.

Returns

Evaluation results for the evaluation metric.

Return type

dict
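For context, datasets in this family are normally built from a config dict. A skeletal sketch; the paths and data_cfg keys below are hypothetical placeholders, and the empty pipeline is only for illustration (real configs carry a full transform pipeline and more data_cfg keys):

from mmpose.datasets import build_dataset

dataset = build_dataset(dict(
    type='AnimalATRWDataset',
    ann_file='data/atrw/annotations/keypoint_train.json',  # hypothetical path
    img_prefix='data/atrw/images/train/',                  # hypothetical path
    data_cfg=dict(
        image_size=[256, 256],
        heatmap_size=[64, 64],
        num_joints=15,
        num_output_channels=15),
    pipeline=[]))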

class mmpose.datasets.AnimalFlyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

AnimalFlyDataset for animal pose estimation.

“Fast animal pose estimation using deep neural networks” Nature methods’2019. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

Vinegar Fly keypoint indexes:

0: "head",
1: "eyeL",
2: "eyeR",
3: "neck",
4: "thorax",
5: "abdomen",
6: "forelegR1",
7: "forelegR2",
8: "forelegR3",
9: "forelegR4",
10: "midlegR1",
11: "midlegR2",
12: "midlegR3",
13: "midlegR4",
14: "hindlegR1",
15: "hindlegR2",
16: "hindlegR3",
17: "hindlegR4",
18: "forelegL1",
19: "forelegL2",
20: "forelegL3",
21: "forelegL4",
22: "midlegL1",
23: "midlegL2",
24: "midlegL3",
25: "midlegL4",
26: "hindlegL1",
27: "hindlegL2",
28: "hindlegL3",
29: "hindlegL4",
30: "wingL",
31: "wingR"
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to the directory where images are held. Default: None.

  • data_cfg (dict) – Data config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[source]

Evaluate Fly keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘Test/source/0.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.AnimalHorse10Dataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

AnimalHorse10Dataset for animal pose estimation.

“Pretraining boosts out-of-domain robustness for pose estimation” WACV’2021. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

Horse-10 keypoint indexes:

0: 'Nose',
1: 'Eye',
2: 'Nearknee',
3: 'Nearfrontfetlock',
4: 'Nearfrontfoot',
5: 'Offknee',
6: 'Offfrontfetlock',
7: 'Offfrontfoot',
8: 'Shoulder',
9: 'Midshoulder',
10: 'Elbow',
11: 'Girth',
12: 'Wither',
13: 'Nearhindhock',
14: 'Nearhindfetlock',
15: 'Nearhindfoot',
16: 'Hip',
17: 'Stifle',
18: 'Offhindhock',
19: 'Offhindfetlock',
20: 'Offhindfoot',
21: 'Ischium'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to the directory where images are held. Default: None.

  • data_cfg (dict) – Data config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[source]

Evaluate Horse-10 keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘Test/source/0.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘NME’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.AnimalLocustDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

AnimalLocustDataset for animal pose estimation.

“DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning”, eLife’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Desert Locust keypoint indexes:

0: "head",
1: "neck",
2: "thorax",
3: "abdomen1",
4: "abdomen2",
5: "anttipL",
6: "antbaseL",
7: "eyeL",
8: "forelegL1",
9: "forelegL2",
10: "forelegL3",
11: "forelegL4",
12: "midlegL1",
13: "midlegL2",
14: "midlegL3",
15: "midlegL4",
16: "hindlegL1",
17: "hindlegL2",
18: "hindlegL3",
19: "hindlegL4",
20: "anttipR",
21: "antbaseR",
22: "eyeR",
23: "forelegR1",
24: "forelegR2",
25: "forelegR3",
26: "forelegR4",
27: "midlegR1",
28: "midlegR2",
29: "midlegR3",
30: "midlegR4",
31: "hindlegR1",
32: "hindlegR2",
33: "hindlegR3",
34: "hindlegR4"
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[源代码]

Evaluate Desert Locust keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘Test/source/0.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.AnimalMacaqueDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

MacaquePose dataset for animal pose estimation.

“MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture”, bioRxiv’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Macaque keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[源代码]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap

    • bbox_id (list(int)).

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.AnimalPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Animal-Pose dataset for animal pose estimation.

“Cross-domain Adaptation For Animal Pose Estimation”, ICCV’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Animal-Pose keypoint indexes:

0: 'L_Eye',
1: 'R_Eye',
2: 'L_EarBase',
3: 'R_EarBase',
4: 'Nose',
5: 'Throat',
6: 'TailBase',
7: 'Withers',
8: 'L_F_Elbow',
9: 'R_F_Elbow',
10: 'L_B_Elbow',
11: 'R_B_Elbow',
12: 'L_F_Knee',
13: 'R_F_Knee',
14: 'L_B_Knee',
15: 'R_B_Knee',
16: 'L_F_Paw',
17: 'R_F_Paw',
18: 'L_B_Paw',
19: 'R_B_Paw'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[源代码]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap

    • bbox_id (list(int)).

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.AnimalZebraDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

AnimalZebraDataset for animal pose estimation.

“DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning”, eLife’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Zebra keypoint indexes:

0: "snout",
1: "head",
2: "neck",
3: "forelegL1",
4: "forelegR1",
5: "hindlegL1",
6: "hindlegR1",
7: "tailbase",
8: "tailtip"
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[源代码]

Evaluate zebra keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘Test/source/0.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.Body3DH36MDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Human3.6M dataset for 3D human pose estimation.

“Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments”, TPAMI’2014. More details can be found in the paper.

Human3.6M keypoint indexes:

0: 'root (pelvis)',
1: 'right_hip',
2: 'right_knee',
3: 'right_foot',
4: 'left_hip',
5: 'left_knee',
6: 'left_foot',
7: 'spine',
8: 'thorax',
9: 'neck_base',
10: 'head',
11: 'left_shoulder',
12: 'left_elbow',
13: 'left_wrist',
14: 'right_shoulder',
15: 'right_elbow',
16: 'right_wrist'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

build_sample_indices()[源代码]

Split original videos into sequences and build frame indices.

This method overrides the default one in the base class.

evaluate(results, res_folder=None, metric='mpjpe', **kwargs)[源代码]

Evaluate keypoint results.

get_camera_param(imgname)[源代码]

Get camera parameters of a frame by its image name.

load_annotations()[源代码]

Load data annotation.

load_config(data_cfg)[源代码]

Initialize dataset attributes according to the config.

Override this method to set dataset specific attributes.

class mmpose.datasets.Body3DMviewDirectCampusDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Campus dataset for direct multi-view human pose estimation.

“3D Pictorial Structures for Multiple Human Pose Estimation”, CVPR’2014. More details can be found in the paper: http://campar.in.tum.de/pub/belagiannis2014cvpr/belagiannis2014cvpr.pdf

The dataset loads both 2D and 3D annotations as well as camera parameters. It is worth mentioning that, due to the limited and incomplete annotations of this dataset, we do not use it to train multi-view 3D pose models. Instead, we use a 2D pose estimator trained on COCO, together with independent 3D human poses from the CMU Panoptic dataset, to train the 3D model. For testing, we first estimate 2D poses and generate 2D heatmaps for this dataset as the input to the 3D model.

Campus keypoint indices:

'Right-Ankle': 0,
'Right-Knee': 1,
'Right-Hip': 2,
'Left-Hip': 3,
'Left-Knee': 4,
'Left-Ankle': 5,
'Right-Wrist': 6,
'Right-Elbow': 7,
'Right-Shoulder': 8,
'Left-Shoulder': 9,
'Left-Elbow': 10,
'Left-Wrist': 11,
'Bottom-Head': 12,
'Top-Head': 13,
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

static coco2campus3D(coco_pose)[源代码]

Transform a 3D pose from COCO order (our method's output) to the Campus dataset order, with interpolation.

参数

coco_pose – np.array with shape 17x3

Returns: 3D pose in campus order with shape 14x3

evaluate(results, res_folder=None, metric='pcp', recall_threshold=500, alpha_error=0.5, **kwargs)[源代码]
参数
  • results (list[dict]) –

    Testing results containing the following items:

    • pose_3d (np.ndarray): predicted 3D human pose

    • sample_id (np.ndarray): sample id of a frame.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘pcp’.

  • recall_threshold – threshold for calculating recall.

  • alpha_error – coefficient when calculating error for correct parts.

  • **kwargs

Returns: Evaluation results for evaluation metric (dict).

static get_new_center(center_list)[源代码]

Generate new center or select from the center list randomly.

The probability and the coordinate-related parameters can also be tuned; just make sure that the center is within the given 3D space.

isvalid(new_center, bbox, bbox_list)[源代码]

Check if the new person bbox is valid. It needs to satisfy two conditions (see the IoU sketch below):

  1. the center is visible in at least 2 views, and

  2. it has a sufficiently small IoU with all other person bboxes.

load_config(data_cfg)[源代码]

Initialize dataset attributes according to the config.

Override this method to set dataset specific attributes.

class mmpose.datasets.Body3DMviewDirectPanopticDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Panoptic dataset for direct multi-view human pose estimation.

“Panoptic Studio: A Massively Multiview System for Social Motion Capture”, ICCV’2015. More details can be found in the paper.

The dataset loads both 2D and 3D annotations as well as camera parameters.

Panoptic keypoint indexes:

'neck': 0,
'nose': 1,
'mid-hip': 2,
'l-shoulder': 3,
'l-elbow': 4,
'l-wrist': 5,
'l-hip': 6,
'l-knee': 7,
'l-ankle': 8,
'r-shoulder': 9,
'r-elbow': 10,
'r-wrist': 11,
'r-hip': 12,
'r-knee': 13,
'r-ankle': 14,
'l-eye': 15,
'l-ear': 16,
'r-eye': 17,
'r-ear': 18,
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mpjpe', **kwargs)[源代码]
参数
  • results (list[dict]) –

    Testing results containing the following items:

    • pose_3d (np.ndarray): predicted 3D human pose

    • sample_id (np.ndarray): sample id of a frame.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mpjpe’.

  • **kwargs

Returns: Evaluation results for evaluation metric (dict).

load_config(data_cfg)[源代码]

Initialize dataset attributes according to the config.

Override this method to set dataset specific attributes.

class mmpose.datasets.Body3DMviewDirectShelfDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Shelf dataset for direct multi-view human pose estimation.

“3D Pictorial Structures for Multiple Human Pose Estimation”, CVPR’2014. More details can be found in the paper: http://campar.in.tum.de/pub/belagiannis2014cvpr/belagiannis2014cvpr.pdf

The dataset loads both 2D and 3D annotations as well as camera parameters. It is worth mentioning that, due to the limited and incomplete annotations of this dataset, we do not use it to train multi-view 3D pose models. Instead, we use a 2D pose estimator trained on COCO, together with independent 3D human poses from the CMU Panoptic dataset, to train the 3D model. For testing, we first estimate 2D poses and generate 2D heatmaps for this dataset as the input to the 3D model.

Shelf keypoint indices:

'Right-Ankle': 0,
'Right-Knee': 1,
'Right-Hip': 2,
'Left-Hip': 3,
'Left-Knee': 4,
'Left-Ankle': 5,
'Right-Wrist': 6,
'Right-Elbow': 7,
'Right-Shoulder': 8,
'Left-Shoulder': 9,
'Left-Elbow': 10,
'Left-Wrist': 11,
'Bottom-Head': 12,
'Top-Head': 13,
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

static coco2shelf3D(coco_pose, alpha=0.75)[源代码]

Transform a 3D pose from COCO order (our method's output) to the Shelf dataset order, with interpolation.

参数

coco_pose – np.array with shape 17x3

Returns: 3D pose in shelf order with shape 14x3

evaluate(results, res_folder=None, metric='pcp', recall_threshold=500, alpha_error=0.5, alpha_head=0.75, **kwargs)[源代码]
参数
  • results (list[dict]) –

    Testing results containing the following items:

    • pose_3d (np.ndarray): predicted 3D human pose

    • sample_id (np.ndarray): sample id of a frame.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘pcp’.

  • recall_threshold – threshold for calculating recall.

  • alpha_error – coefficient when calculating error for correct parts.

  • alpha_head – coefficient for computing the head keypoint positions when converting COCO poses to Shelf poses.

  • **kwargs

Returns: Evaluation results for evaluation metric (dict).

static get_new_center(center_list)[源代码]

Generate new center or select from the center list randomly.

The probability and the coordinate-related parameters can also be tuned; just make sure that the center is within the given 3D space.

static isvalid(bbox, bbox_list)[源代码]

Check if the new person bbox is valid: it needs to have a sufficiently small IoU with all other person bboxes.

load_config(data_cfg)[源代码]

Initialize dataset attributes according to the config.

Override this method to set dataset specific attributes.

class mmpose.datasets.BottomUpAicDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Aic dataset for bottom-up pose estimation.

“AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

AIC keypoint indexes:

0: "right_shoulder",
1: "right_elbow",
2: "right_wrist",
3: "left_shoulder",
4: "left_elbow",
5: "left_wrist",
6: "right_hip",
7: "right_knee",
8: "right_ankle",
9: "left_hip",
10: "left_knee",
11: "left_ankle",
12: "head_top",
13: "neck"
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.BottomUpCocoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

COCO dataset for bottom-up pose estimation.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

COCO keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[源代码]

Evaluate coco keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • num_people: P

  • num_keypoints: K

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (list[np.ndarray(P, K, 3+tag_num)]): Pose predictions for all people in images.

    • scores (list[P]): List of person scores.

    • image_path (list[str]): For example, [‘coco/images/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Defaults: ‘mAP’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.BottomUpCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

CocoWholeBodyDataset dataset for bottom-up pose estimation.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

In total, we have 133 keypoints for wholebody pose estimation.

COCO-WholeBody keypoint indexes:

0-16: 17 body keypoints,
17-22: 6 foot keypoints,
23-90: 68 face keypoints,
91-132: 42 hand keypoints
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.BottomUpCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

CrowdPose dataset for bottom-up pose estimation.

“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

CrowdPose keypoint indexes:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.BottomUpMhpDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

MHPv2.0 dataset for bottom-up pose estimation.

“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

MHP keypoint indexes:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.Compose(transforms)[源代码]

Compose a data pipeline with a sequence of transforms.

参数

transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.

class mmpose.datasets.DeepFashionDataset(ann_file, img_prefix, data_cfg, pipeline, subset='', dataset_info=None, test_mode=False)[源代码]

DeepFashion dataset (full-body clothes) for fashion landmark detection.

“DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations”, CVPR’2016. “Fashion Landmark Detection in the Wild”, ECCV’2016.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

The dataset contains 3 categories of clothes: full-body, upper-body and lower-body.

Fashion landmark indexes for upper-body clothes:

0: 'left collar',
1: 'right collar',
2: 'left sleeve',
3: 'right sleeve',
4: 'left hem',
5: 'right hem'

Fashion landmark indexes for lower-body clothes:

0: 'left waistline',
1: 'right waistline',
2: 'left hem',
3: 'right hem'

Fashion landmark indexes for full-body clothes:

0: 'left collar',
1: 'right collar',
2: 'left sleeve',
3: 'right sleeve',
4: 'left waistline',
5: 'right waistline',
6: 'left hem',
7: 'right hem'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • subset (str) – Which clothes category to load: ‘upper’, ‘lower’ or ‘full’. Default: ''.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[源代码]

Evaluate DeepFashion keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘img_00000001.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0)[源代码]

DistributedSampler inheriting from torch.utils.data.DistributedSampler.

In older versions of PyTorch, DistributedSampler has no shuffle argument. This child class adds one.

class mmpose.datasets.Face300WDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Face300W dataset for top-down face keypoint localization.

“300 faces In-the-wild challenge: Database and results”, Image and Vision Computing (IMAVIS) 2019.

The dataset loads raw images and applies specified transforms to return a dict containing the image tensors and other information.

The landmark annotations follow the 68 points mark-up. The definition can be found in https://ibug.doc.ic.ac.uk/resources/300-W/.

参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='NME', **kwargs)[源代码]

Evaluate 300W keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_path (list[str]): For example, [‘300W/ibug/image_018.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘NME’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.FaceAFLWDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Face AFLW dataset for top-down face keypoint localization.

“Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization”. In Proc. First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.

The dataset loads raw images and applies specified transforms to return a dict containing the image tensors and other information.

The landmark annotations follow the 19 points mark-up. The definition can be found in https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/

参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='NME', **kwargs)[源代码]

Evaluate AFLW keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_path (list[str]): For example, [‘aflw/images/flickr/0/image00002.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘NME’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.FaceCOFWDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Face COFW dataset for top-down face keypoint localization.

“Robust face landmark estimation under occlusion”, ICCV’2013.

The dataset loads raw images and applies specified transforms to return a dict containing the image tensors and other information.

The landmark annotations follow the 29 points mark-up. The definition can be found in http://www.vision.caltech.edu/xpburgos/ICCV13/.

参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='NME', **kwargs)[源代码]

Evaluate COFW keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_path (list[str]): For example, [‘cofw/images/000001.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘NME’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.FaceCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

CocoWholeBodyDataset for face keypoint localization.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

The face landmark annotations follow the 68 points mark-up.

参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='NME', **kwargs)[源代码]

Evaluate COCO-WholeBody Face keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_path (list[str]): For example, [‘coco/train2017/000000000009.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘NME’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.FaceWFLWDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Face WFLW dataset for top-down face keypoint localization.

“Look at Boundary: A Boundary-Aware Face Alignment Algorithm”, CVPR’2018.

The dataset loads raw images and applies specified transforms to return a dict containing the image tensors and other information.

The landmark annotations follow the 98 points mark-up. The definition can be found in https://wywu.github.io/projects/LAB/WFLW.html.

参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='NME', **kwargs)[源代码]

Evaluate WFLW keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[1,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[1,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_path (list[str]): For example, [‘wflw/images/0--Parade/0_Parade_marchingband_1_1015.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘NME’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.FreiHandDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

FreiHand dataset for top-down hand pose estimation.

“FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images”, ICCV’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

FreiHand keypoint indexes:

0: 'wrist',
1: 'thumb1',
2: 'thumb2',
3: 'thumb3',
4: 'thumb4',
5: 'forefinger1',
6: 'forefinger2',
7: 'forefinger3',
8: 'forefinger4',
9: 'middle_finger1',
10: 'middle_finger2',
11: 'middle_finger3',
12: 'middle_finger4',
13: 'ring_finger1',
14: 'ring_finger2',
15: 'ring_finger3',
16: 'ring_finger4',
17: 'pinky_finger1',
18: 'pinky_finger2',
19: 'pinky_finger3',
20: 'pinky_finger4'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[源代码]

Evaluate freihand keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘training/rgb/00031426.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.HandCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

CocoWholeBodyDataset for top-down hand pose estimation.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

COCO-WholeBody Hand keypoint indexes:

0: 'wrist',
1: 'thumb1',
2: 'thumb2',
3: 'thumb3',
4: 'thumb4',
5: 'forefinger1',
6: 'forefinger2',
7: 'forefinger3',
8: 'forefinger4',
9: 'middle_finger1',
10: 'middle_finger2',
11: 'middle_finger3',
12: 'middle_finger4',
13: 'ring_finger1',
14: 'ring_finger2',
15: 'ring_finger3',
16: 'ring_finger4',
17: 'pinky_finger1',
18: 'pinky_finger2',
19: 'pinky_finger3',
20: 'pinky_finger4'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[源代码]

Evaluate COCO-WholeBody Hand keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘Test/source/0.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.InterHand2DDataset(ann_file, camera_file, joint_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

InterHand2.6M 2D dataset for top-down hand pose estimation.

“InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

InterHand2.6M keypoint indexes:

0: 'thumb4',
1: 'thumb3',
2: 'thumb2',
3: 'thumb1',
4: 'forefinger4',
5: 'forefinger3',
6: 'forefinger2',
7: 'forefinger1',
8: 'middle_finger4',
9: 'middle_finger3',
10: 'middle_finger2',
11: 'middle_finger1',
12: 'ring_finger4',
13: 'ring_finger3',
14: 'ring_finger2',
15: 'ring_finger1',
16: 'pinky_finger4',
17: 'pinky_finger3',
18: 'pinky_finger2',
19: 'pinky_finger1',
20: 'wrist'
参数
  • ann_file (str) – Path to the annotation file.

  • camera_file (str) – Path to the camera file.

  • joint_file (str) – Path to the joint file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[源代码]

Evaluate interhand2d keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘Capture12/0390_dh_touchROM/cam410209/image62434.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.InterHand3DDataset(ann_file, camera_file, joint_file, img_prefix, data_cfg, pipeline, use_gt_root_depth=True, rootnet_result_file=None, dataset_info=None, test_mode=False)[源代码]

InterHand2.6M 3D dataset for top-down hand pose estimation.

“InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

InterHand2.6M keypoint indexes:

0: 'r_thumb4',
1: 'r_thumb3',
2: 'r_thumb2',
3: 'r_thumb1',
4: 'r_index4',
5: 'r_index3',
6: 'r_index2',
7: 'r_index1',
8: 'r_middle4',
9: 'r_middle3',
10: 'r_middle2',
11: 'r_middle1',
12: 'r_ring4',
13: 'r_ring3',
14: 'r_ring2',
15: 'r_ring1',
16: 'r_pinky4',
17: 'r_pinky3',
18: 'r_pinky2',
19: 'r_pinky1',
20: 'r_wrist',
21: 'l_thumb4',
22: 'l_thumb3',
23: 'l_thumb2',
24: 'l_thumb1',
25: 'l_index4',
26: 'l_index3',
27: 'l_index2',
28: 'l_index1',
29: 'l_middle4',
30: 'l_middle3',
31: 'l_middle2',
32: 'l_middle1',
33: 'l_ring4',
34: 'l_ring3',
35: 'l_ring2',
36: 'l_ring1',
37: 'l_pinky4',
38: 'l_pinky3',
39: 'l_pinky2',
40: 'l_pinky1',
41: 'l_wrist'
参数
  • ann_file (str) – Path to the annotation file.

  • camera_file (str) – Path to the camera file.

  • joint_file (str) – Path to the joint file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • use_gt_root_depth (bool) – Whether to use the ground-truth depth of the wrist. If False, the depth given in rootnet_result_file is used instead.

  • rootnet_result_file (str) – Path to the wrist depth file.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='MPJPE', **kwargs)[源代码]

Evaluate interhand3d keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • hand_type (np.ndarray[N, 4]): The first two columns are the hand-type predictions, and the last two columns are the corresponding scores.

    • rel_root_depth (np.ndarray[N]): The relative depth of left wrist and right wrist.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘Capture6/0012_aokay_upright/cam410061/image4996.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘MRRPE’, ‘MPJPE’, ‘Handedness_acc’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.MeshAdversarialDataset(train_dataset, adversarial_dataset)[源代码]

Mix Dataset for adversarial training in the 3D human mesh estimation task.

The dataset combines data from two datasets and returns a dict containing data from both.

参数
  • train_dataset (Dataset) – Dataset for 3D human mesh estimation.

  • adversarial_dataset (Dataset) – Dataset for adversarial learning, provides real SMPL parameters.

class mmpose.datasets.MeshH36MDataset(ann_file, img_prefix, data_cfg, pipeline, test_mode=False)[源代码]

Human3.6M Dataset for 3D human mesh estimation. It inherits all functions from MeshBaseDataset and has its own evaluate function.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(outputs, res_folder, metric='joint_error', logger=None)[源代码]

Evaluate 3D keypoint results.

class mmpose.datasets.MeshMixDataset(configs, partition)[源代码]

Mix Dataset for 3D human mesh estimation.

The dataset combines data from multiple datasets (MeshBaseDataset) and samples the data from the different datasets with the provided proportions. The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

参数
  • configs (list) – List of configs for multiple datasets.

  • partition (list) – Sample proportions of the multiple datasets. The length of partition should be the same as that of configs. Its elements should be non-negative and need not sum to one.

示例

>>> from mmpose.datasets import MeshMixDataset
>>> data_cfg = dict(
>>>     image_size=[256, 256],
>>>     iuv_size=[64, 64],
>>>     num_joints=24,
>>>     use_IUV=True,
>>>     uv_type='BF')
>>>
>>> mix_dataset = MeshMixDataset(
>>>     configs=[
>>>         dict(
>>>             ann_file='tests/data/h36m/test_h36m.npz',
>>>             img_prefix='tests/data/h36m',
>>>             data_cfg=data_cfg,
>>>             pipeline=[]),
>>>         dict(
>>>             ann_file='tests/data/h36m/test_h36m.npz',
>>>             img_prefix='tests/data/h36m',
>>>             data_cfg=data_cfg,
>>>             pipeline=[]),
>>>     ],
>>>     partition=[0.6, 0.4])
class mmpose.datasets.MoshDataset(ann_file, pipeline, test_mode=False)[源代码]

Mosh Dataset for the adversarial training in 3D human mesh estimation task.

The dataset returns a dict containing real-world SMPL parameters.

参数
  • ann_file (str) – Path to the annotation file.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmpose.datasets.NVGestureDataset(ann_file, vid_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

NVGesture dataset for gesture recognition.

“Online Detection and Classification of Dynamic Hand Gestures With Recurrent 3D Convolutional Neural Network”, Conference on Computer Vision and Pattern Recognition (CVPR) 2016.

The dataset loads raw videos and applies specified transforms to return a dict containing the image tensors and other information.

参数
  • ann_file (str) – Path to the annotation file.

  • vid_prefix (str) – Path to a directory where videos are held.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='AP', **kwargs)[源代码]

Evaluate nvgesture recognition results. The gesture prediction results will be saved in ${res_folder}/result_gesture.json.

注解

  • batch_size: N

  • heatmap length: L

参数
  • results (dict) –

    Testing results containing the following items:

    • logits (dict[str, torch.Tensor[N,25,L]]): For each item, the key is the modality of the input video and the value is the predicted gesture logits; the three dimensions are batch, category and temporal length, respectively.

    • label (np.ndarray[N]): ground-truth gesture label of each sample.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘AP’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.OneHand10KDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

OneHand10K dataset for top-down hand pose estimation.

“Mask-pose Cascaded CNN for 2D Hand Pose Estimation from Single Color Images”, TCSVT’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

OneHand10K keypoint indexes:

0: 'wrist',
1: 'thumb1',
2: 'thumb2',
3: 'thumb3',
4: 'thumb4',
5: 'forefinger1',
6: 'forefinger2',
7: 'forefinger3',
8: 'forefinger4',
9: 'middle_finger1',
10: 'middle_finger2',
11: 'middle_finger3',
12: 'middle_finger4',
13: 'ring_finger1',
14: 'ring_finger2',
15: 'ring_finger3',
16: 'ring_finger4',
17: 'pinky_finger1',
18: 'pinky_finger2',
19: 'pinky_finger3',
20: 'pinky_finger4'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[源代码]

Evaluate onehand10k keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘Test/source/0.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘AUC’, ‘EPE’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.PanopticDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

Panoptic dataset for top-down hand pose estimation.

“Hand Keypoint Detection in Single Images using Multiview Bootstrapping”, CVPR’2017. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Panoptic keypoint indexes:

0: 'wrist',
1: 'thumb1',
2: 'thumb2',
3: 'thumb3',
4: 'thumb4',
5: 'forefinger1',
6: 'forefinger2',
7: 'forefinger3',
8: 'forefinger4',
9: 'middle_finger1',
10: 'middle_finger2',
11: 'middle_finger3',
12: 'middle_finger4',
13: 'ring_finger1',
14: 'ring_finger2',
15: 'ring_finger3',
16: 'ring_finger4',
17: 'pinky_finger1',
18: 'pinky_finger2',
19: 'pinky_finger3',
20: 'pinky_finger4'
参数
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – config

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCKh', **kwargs)[源代码]

Evaluate panoptic keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

注解

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

参数
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates, score is the third dimension of the array.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘hand_labels/manual_test/000648952_02_l.jpg’]

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCKh’, ‘AUC’, ‘EPE’.

返回

Evaluation results for evaluation metric.

返回类型

dict

class mmpose.datasets.TopDownAicDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[源代码]

AicDataset dataset for top-down pose estimation.

“AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

AIC keypoint indexes:

0: "right_shoulder",
1: "right_elbow",
2: "right_wrist",
3: "left_shoulder",
4: "left_elbow",
5: "left_wrist",
6: "right_hip",
7: "right_knee",
8: "right_ankle",
9: "left_hip",
10: "left_knee",
11: "left_ankle",
12: "head_top",
13: "neck"
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.TopDownCocoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

CocoDataset for top-down pose estimation.

“Microsoft COCO: Common Objects in Context”, ECCV’2014. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

COCO keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[source]

Evaluate COCO keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap

    • bbox_id (list(int)).

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘mAP’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.TopDownCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

CocoWholeBodyDataset for top-down pose estimation.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

COCO-WholeBody keypoint indexes:

0-16: 17 body keypoints,
17-22: 6 foot keypoints,
23-90: 68 face keypoints,
91-132: 42 hand keypoints

In total, we have 133 keypoints for wholebody pose estimation.
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.
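
Given the index layout above, a whole-body prediction can be sliced into parts, e.g. (a sketch; kpts is a hypothetical (133, 3) keypoint array):

import numpy as np

kpts = np.zeros((133, 3))    # hypothetical (x, y, score) per keypoint
body = kpts[0:17]            # 17 body keypoints
foot = kpts[17:23]           # 6 foot keypoints
face = kpts[23:91]           # 68 face keypoints
hands = kpts[91:133]         # 42 hand keypoints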

class mmpose.datasets.TopDownCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

CrowdPoseDataset for top-down pose estimation.

“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

CrowdPose keypoint indexes:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.TopDownFreiHandDataset(*args, **kwargs)[source]

Deprecated TopDownFreiHandDataset.

evaluate(cfg, preds, output_dir, *args, **kwargs)[source]

Evaluate keypoint results.

class mmpose.datasets.TopDownH36MDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

Human3.6M dataset for top-down 2D pose estimation.

“Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments”, TPAMI’2014. More details can be found in the paper.

Human3.6M keypoint indexes:

0: 'root (pelvis)',
1: 'right_hip',
2: 'right_knee',
3: 'right_foot',
4: 'left_hip',
5: 'left_knee',
6: 'left_foot',
7: 'spine',
8: 'thorax',
9: 'neck_base',
10: 'head',
11: 'left_shoulder',
12: 'left_elbow',
13: 'left_wrist',
14: 'right_shoulder',
15: 'right_elbow',
16: 'right_wrist'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[source]

Evaluate Human3.6M 2D keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap

    • bbox_id (list(int)).

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘PCK’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.TopDownJhmdbDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

JhmdbDataset for top-down pose estimation.

“Towards understanding action recognition”, ICCV’2013. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

sub-JHMDB keypoint indexes:

0: "neck",
1: "belly",
2: "head",
3: "right_shoulder",
4: "left_shoulder",
5: "right_hip",
6: "left_hip",
7: "right_elbow",
8: "left_elbow",
9: "right_knee",
10: "left_knee",
11: "right_wrist",
12: "left_wrist",
13: "right_ankle",
14: "left_ankle"
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[source]

Evaluate sub-JHMDB keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_path (list[str])

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘tPCK’. PCK is normalized by the bounding box size, while tPCK is normalized by the torso size.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.TopDownMhpDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

MHPv2.0 dataset for top-down pose estimation.

“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.

Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code at https://github.com/ZhaoJ9014/Multi-Human-Parsing/tree/master/Evaluation/Multi-Human-Pose. Please be cautious if you use these results in papers.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

MHP keypoint indexes:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.TopDownMpiiDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

MPII Dataset for top-down pose estimation.

“2D Human Pose Estimation: New Benchmark and State of the Art Analysis”, CVPR’2014. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

MPII keypoint indexes:

0: 'right_ankle'
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'upper_neck',
9: 'head_top',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCKh', **kwargs)[source]

Evaluate PCKh for MPII dataset. Adapted from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch Copyright (c) Microsoft, under the MIT License.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

  • res_folder (str, optional) – The folder to save the testing results. Default: None.

  • metric (str | list[str]) – Metrics to be performed. Default: ‘PCKh’.

Returns

PCKh for each joint

Return type

dict

class mmpose.datasets.TopDownMpiiTrbDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

MPII-TRB dataset for top-down pose estimation.

“TRB: A Novel Triplet Representation for Understanding 2D Human Body”, ICCV’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

MPII-TRB keypoint indexes:

0: 'left_shoulder'
1: 'right_shoulder'
2: 'left_elbow'
3: 'right_elbow'
4: 'left_wrist'
5: 'right_wrist'
6: 'left_hip'
7: 'right_hip'
8: 'left_knee'
9: 'right_knee'
10: 'left_ankle'
11: 'right_ankle'
12: 'head'
13: 'neck'

14: 'right_neck'
15: 'left_neck'
16: 'medial_right_shoulder'
17: 'lateral_right_shoulder'
18: 'medial_right_bow'
19: 'lateral_right_bow'
20: 'medial_right_wrist'
21: 'lateral_right_wrist'
22: 'medial_left_shoulder'
23: 'lateral_left_shoulder'
24: 'medial_left_bow'
25: 'lateral_left_bow'
26: 'medial_left_wrist'
27: 'lateral_left_wrist'
28: 'medial_right_hip'
29: 'lateral_right_hip'
30: 'medial_right_knee'
31: 'lateral_right_knee'
32: 'medial_right_ankle'
33: 'lateral_right_ankle'
34: 'medial_left_hip'
35: 'lateral_left_hip'
36: 'medial_left_knee'
37: 'lateral_left_knee'
38: 'medial_left_ankle'
39: 'lateral_left_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCKh', **kwargs)[source]

Evaluate PCKh for MPII-TRB dataset.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_ids (list[str]): For example, [‘27407’].

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metrics to be performed. Default: ‘PCKh’.

Returns

PCKh for each joint

Return type

dict

class mmpose.datasets.TopDownOCHumanDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

OChuman dataset for top-down pose estimation.

“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.

The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images and is designed for validation and testing only. To evaluate on OCHuman, the model should be trained on the COCO training set and then tested on OCHuman to measure its robustness to occlusion.

OCHuman keypoint indexes (same as COCO):

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.TopDownOneHand10KDataset(*args, **kwargs)[source]

Deprecated TopDownOneHand10KDataset.

evaluate(cfg, preds, output_dir, *args, **kwargs)[source]

Evaluate keypoint results.

class mmpose.datasets.TopDownPanopticDataset(*args, **kwargs)[source]

Deprecated TopDownPanopticDataset.

evaluate(cfg, preds, output_dir, *args, **kwargs)[source]

Evaluate keypoint results.

class mmpose.datasets.TopDownPoseTrack18Dataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

PoseTrack18 dataset for top-down pose estimation.

“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

PoseTrack2018 keypoint indexes:

0: 'nose',
1: 'head_bottom',
2: 'head_top',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[source]

Evaluate PoseTrack keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • num_keypoints: K

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_id (list(int))

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘mAP’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.TopDownPoseTrack18VideoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False, ph_fill_len=6)[source]

PoseTrack18 dataset for top-down pose estimation.

“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

PoseTrack2018 keypoint indexes:

0: 'nose',
1: 'head_bottom',
2: 'head_top',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where videos/images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

  • ph_fill_len (int) – The length of the placeholder to fill in the image filenames. Default: 6 in PoseTrack18.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[source]

Evaluate PoseTrack keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • num_keypoints: K

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_id (list(int))

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘mAP’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

mmpose.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=True, pin_memory=True, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • drop_last (bool) – Whether to drop the last incomplete batch in each epoch. Default: True.

  • pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True.

  • kwargs – Any keyword arguments used to initialize the DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader
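
A runnable sketch: any PyTorch dataset works, so a tiny dummy dataset stands in for a real mmpose dataset here:

import torch
from torch.utils.data import TensorDataset
from mmpose.datasets import build_dataloader

dataset = TensorDataset(torch.arange(8).float())   # dummy dataset
loader = build_dataloader(dataset, samples_per_gpu=4, workers_per_gpu=0,
                          dist=False, shuffle=False)
for batch in loader:
    print(batch)   # two batches of 4 samples each (drop_last=True)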

mmpose.datasets.build_dataset(cfg, default_args=None)[source]

Build a dataset from config dict.

Parameters
  • cfg (dict) – Config dict. It should at least contain the key “type”.

  • default_args (dict, optional) – Default initialization arguments. Default: None.

Returns

The constructed dataset.

Return type

Dataset
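
A sketch of a config dict (the paths and data_cfg contents are placeholders, not shipped files):

dataset_cfg = dict(
    type='TopDownCocoDataset',
    ann_file='data/coco/annotations/person_keypoints_val2017.json',
    img_prefix='data/coco/val2017/',
    data_cfg=dict(image_size=[192, 256], heatmap_size=[48, 64], num_joints=17),
    pipeline=[dict(type='LoadImageFromFile')],
    test_mode=True)

# from mmpose.datasets import build_dataset
# dataset = build_dataset(dataset_cfg)   # requires the data to be present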

datasets

class mmpose.datasets.datasets.top_down.TopDownAicDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

AicDataset for top-down pose estimation.

“AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

AIC keypoint indexes:

0: "right_shoulder",
1: "right_elbow",
2: "right_wrist",
3: "left_shoulder",
4: "left_elbow",
5: "left_wrist",
6: "right_hip",
7: "right_knee",
8: "right_ankle",
9: "left_hip",
10: "left_knee",
11: "left_ankle",
12: "head_top",
13: "neck"
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownCocoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

CocoDataset for top-down pose estimation.

“Microsoft COCO: Common Objects in Context”, ECCV’2014. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

COCO keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[source]

Evaluate COCO keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap

    • bbox_id (list(int)).

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘mAP’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.datasets.top_down.TopDownCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

CocoWholeBodyDataset for top-down pose estimation.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

COCO-WholeBody keypoint indexes:

0-16: 17 body keypoints,
17-22: 6 foot keypoints,
23-90: 68 face keypoints,
91-132: 42 hand keypoints

In total, we have 133 keypoints for wholebody pose estimation.
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

CrowdPoseDataset for top-down pose estimation.

“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

CrowdPose keypoint indexes:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownH36MDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

Human3.6M dataset for top-down 2D pose estimation.

“Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments”, TPAMI’2014. More details can be found in the paper.

Human3.6M keypoint indexes:

0: 'root (pelvis)',
1: 'right_hip',
2: 'right_knee',
3: 'right_foot',
4: 'left_hip',
5: 'left_knee',
6: 'left_foot',
7: 'spine',
8: 'thorax',
9: 'neck_base',
10: 'head',
11: 'left_shoulder',
12: 'left_elbow',
13: 'left_wrist',
14: 'right_shoulder',
15: 'right_elbow',
16: 'right_wrist'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[source]

Evaluate Human3.6M 2D keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘data/coco/val2017/000000393226.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap

    • bbox_id (list(int)).

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘PCK’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.datasets.top_down.TopDownHalpeDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

HalpeDataset for top-down pose estimation.

https://github.com/Fang-Haoshu/Halpe-FullBody

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

Halpe keypoint indexes:

0-19: 20 body keypoints,
20-25: 6 foot keypoints,
26-93: 68 face keypoints,
94-135: 42 hand keypoints

In total, we have 136 keypoints for wholebody pose estimation.
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownJhmdbDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

JhmdbDataset for top-down pose estimation.

“Towards understanding action recognition”, ICCV’2013. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

sub-JHMDB keypoint indexes:

0: "neck",
1: "belly",
2: "head",
3: "right_shoulder",
4: "left_shoulder",
5: "right_hip",
6: "left_hip",
7: "right_elbow",
8: "left_elbow",
9: "right_knee",
10: "left_knee",
11: "right_wrist",
12: "left_wrist",
13: "right_ankle",
14: "left_ankle"
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCK', **kwargs)[source]

Evaluate sub-JHMDB keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_path (list[str])

    • output_heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Options: ‘PCK’, ‘tPCK’. PCK is normalized by the bounding box size, while tPCK is normalized by the torso size.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.datasets.top_down.TopDownMhpDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

MHPv2.0 dataset for top-down pose estimation.

“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.

Note that the evaluation metric used here is mAP (adapted from COCO), which may differ from the official evaluation code at https://github.com/ZhaoJ9014/Multi-Human-Parsing/tree/master/Evaluation/Multi-Human-Pose. Please be cautious if you use these results in papers.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

MHP keypoint indexes:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownMpiiDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

MPII Dataset for top-down pose estimation.

“2D Human Pose Estimation: New Benchmark and State of the Art Analysis”, CVPR’2014. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

MPII keypoint indexes:

0: 'right_ankle'
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'upper_neck',
9: 'head_top',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCKh', **kwargs)[source]

Evaluate PCKh for MPII dataset. Adapted from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch Copyright (c) Microsoft, under the MIT License.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

  • res_folder (str, optional) – The folder to save the testing results. Default: None.

  • metric (str | list[str]) – Metrics to be performed. Default: ‘PCKh’.

Returns

PCKh for each joint

Return type

dict

class mmpose.datasets.datasets.top_down.TopDownMpiiTrbDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

MPII-TRB dataset for top-down pose estimation.

“TRB: A Novel Triplet Representation for Understanding 2D Human Body”, ICCV’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

MPII-TRB keypoint indexes:

0: 'left_shoulder'
1: 'right_shoulder'
2: 'left_elbow'
3: 'right_elbow'
4: 'left_wrist'
5: 'right_wrist'
6: 'left_hip'
7: 'right_hip'
8: 'left_knee'
9: 'right_knee'
10: 'left_ankle'
11: 'right_ankle'
12: 'head'
13: 'neck'

14: 'right_neck'
15: 'left_neck'
16: 'medial_right_shoulder'
17: 'lateral_right_shoulder'
18: 'medial_right_bow'
19: 'lateral_right_bow'
20: 'medial_right_wrist'
21: 'lateral_right_wrist'
22: 'medial_left_shoulder'
23: 'lateral_left_shoulder'
24: 'medial_left_bow'
25: 'lateral_left_bow'
26: 'medial_left_wrist'
27: 'lateral_left_wrist'
28: 'medial_right_hip'
29: 'lateral_right_hip'
30: 'medial_right_knee'
31: 'lateral_right_knee'
32: 'medial_right_ankle'
33: 'lateral_right_ankle'
34: 'medial_left_hip'
35: 'lateral_left_hip'
36: 'medial_left_knee'
37: 'lateral_left_knee'
38: 'medial_left_ankle'
39: 'lateral_left_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='PCKh', **kwargs)[source]

Evaluate PCKh for MPII-TRB dataset.

Note

  • batch_size: N

  • num_keypoints: K

  • heatmap height: H

  • heatmap width: W

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_ids (list[str]): For example, [‘27407’].

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metrics to be performed. Default: ‘PCKh’.

Returns

PCKh for each joint

Return type

dict

class mmpose.datasets.datasets.top_down.TopDownOCHumanDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

OChuman dataset for top-down pose estimation.

“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.

The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images and is designed for validation and testing only. To evaluate on OCHuman, the model should be trained on the COCO training set and then tested on OCHuman to measure its robustness to occlusion.

OCHuman keypoint indexes (same as COCO):

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.top_down.TopDownPoseTrack18Dataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

PoseTrack18 dataset for top-down pose estimation.

“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

PoseTrack2018 keypoint indexes:

0: 'nose',
1: 'head_bottom',
2: 'head_top',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[source]

Evaluate PoseTrack keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • num_keypoints: K

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_id (list(int))

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘mAP’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.datasets.top_down.TopDownPoseTrack18VideoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False, ph_fill_len=6)[source]

PoseTrack18 dataset for top-down pose estimation.

“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

PoseTrack2018 keypoint indexes:

0: 'nose',
1: 'head_bottom',
2: 'head_top',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where videos/images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

  • ph_fill_len (int) – The length of the placeholder to fill in the image filenames. Default: 6 in PoseTrack18.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[source]

Evaluate PoseTrack keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • num_keypoints: K

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (np.ndarray[N,K,3]): The first two dimensions are coordinates; the third is the confidence score.

    • boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], scale[1], area, score]

    • image_paths (list[str]): For example, [‘val/010016_mpii_test/000024.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model output heatmap.

    • bbox_id (list(int))

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘mAP’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.datasets.bottom_up.BottomUpAicDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

Aic dataset for bottom-up pose estimation.

“AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

AIC keypoint indexes:

0: "right_shoulder",
1: "right_elbow",
2: "right_wrist",
3: "left_shoulder",
4: "left_elbow",
5: "left_wrist",
6: "right_hip",
7: "right_knee",
8: "right_ankle",
9: "left_hip",
10: "left_knee",
11: "left_ankle",
12: "head_top",
13: "neck"
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.bottom_up.BottomUpCocoDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

COCO dataset for bottom-up pose estimation.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

COCO keypoint indexes:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

evaluate(results, res_folder=None, metric='mAP', **kwargs)[source]

Evaluate COCO keypoint results. The pose prediction results will be saved in ${res_folder}/result_keypoints.json.

Note

  • num_people: P

  • num_keypoints: K

Parameters
  • results (list[dict]) –

    Testing results containing the following items:

    • preds (list[np.ndarray(P, K, 3+tag_num)]): Pose predictions for all people in images.

    • scores (list[P]): List of person scores.

    • image_path (list[str]): For example, [‘coco/images/val2017/000000397133.jpg’]

    • heatmap (np.ndarray[N, K, H, W]): model outputs.

  • res_folder (str, optional) – The folder to save the testing results. If not specified, a temp folder will be created. Default: None.

  • metric (str | list[str]) – Metric to be performed. Default: ‘mAP’.

Returns

Evaluation results for the evaluation metric.

Return type

dict

class mmpose.datasets.datasets.bottom_up.BottomUpCocoWholeBodyDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

CocoWholeBodyDataset for bottom-up pose estimation.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

In total, we have 133 keypoints for wholebody pose estimation.

COCO-WholeBody keypoint indexes:

0-16: 17 body keypoints,
17-22: 6 foot keypoints,
23-90: 68 face keypoints,
91-132: 42 hand keypoints
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.bottom_up.BottomUpCrowdPoseDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

CrowdPose dataset for bottom-up pose estimation.

“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

CrowdPose keypoint indexes:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

class mmpose.datasets.datasets.bottom_up.BottomUpMhpDataset(ann_file, img_prefix, data_cfg, pipeline, dataset_info=None, test_mode=False)[source]

MHPv2.0 dataset for bottom-up pose estimation.

“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.

The dataset loads raw features and applies specified transforms to return a dict containing the image tensors and other information.

MHP keypoint indexes:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
Parameters
  • ann_file (str) – Path to the annotation file.

  • img_prefix (str) – Path to a directory where images are held. Default: None.

  • data_cfg (dict) – Dataset config.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • dataset_info (DatasetInfo) – A class containing all dataset info.

  • test_mode (bool) – Set True when building a test or validation dataset. Default: False.

pipelines

class mmpose.datasets.pipelines.loading.LoadImageFromFile(to_float32=False, color_type='color', channel_order='rgb', file_client_args={'backend': 'disk'})[source]

Loading image(s) from file.

Required key: “image_file”.

Added key: “img”.

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

  • color_type (str) – Flags specifying the color type of a loaded image; candidates are ‘color’, ‘grayscale’ and ‘unchanged’.

  • channel_order (str) – Order of channels; candidates are ‘bgr’ and ‘rgb’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').
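
A usage sketch: the transform reads results[‘image_file’] and adds results[‘img’] (the path in the commented call is a placeholder):

from mmpose.datasets.pipelines.loading import LoadImageFromFile

load = LoadImageFromFile(color_type='color', channel_order='rgb')
# results = load(dict(image_file='path/to/image.jpg'))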

class mmpose.datasets.pipelines.loading.LoadVideoFromFile(to_float32=False, file_client_args={'backend': 'disk'})[source]

Loading video(s) from file.

Required key: “video_file”.

Added key: “video”.

Parameters
  • to_float32 (bool) – Whether to convert the loaded video to a float32 numpy array. If set to False, the loaded video is a uint8 array. Defaults to False.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

class mmpose.datasets.pipelines.shared_transform.Albumentation(transforms, keymap=None)[source]

Albumentation augmentation (pixel-level transforms only). Adds custom pixel-level transformations from the Albumentations library. Please visit https://albumentations.readthedocs.io for more information.

Note: we only support pixel-level transforms. Please visit https://github.com/albumentations-team/albumentations#pixel-level-transforms for more information about pixel-level transforms.

An example of transforms is as follows:

[
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=[0.1, 0.3],
        contrast_limit=[0.1, 0.3],
        p=0.2),
    dict(type='ChannelShuffle', p=0.1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.1),
]
Parameters
  • transforms (list[dict]) – A list of Albumentation transformations.

  • keymap (dict) – Contains {‘input key’: ‘albumentation-style key’}, e.g., {‘img’: ‘image’}.
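
A config sketch using keymap to map mmpose’s ‘img’ key to the ‘image’ key that Albumentations expects:

albu = dict(
    type='Albumentation',
    transforms=[dict(type='RandomBrightnessContrast', p=0.2)],
    keymap={'img': 'image'})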

albu_builder(cfg)[source]

Import a module from albumentations.

It resembles some of build_from_cfg() logic.

Parameters

cfg (dict) – Config dict. It should at least contain the key “type”.

Returns

The constructed object.

Return type

obj

static mapper(d, keymap)[source]

Dictionary mapper.

Renames keys according to the keymap provided.

Parameters
  • d (dict) – old dict

  • keymap (dict) – {‘old_key’: ‘new_key’}

Returns

New dict.

Return type

dict

class mmpose.datasets.pipelines.shared_transform.Collect(keys, meta_keys, meta_name='img_metas')[source]

Collect data from the loader relevant to the specific task.

This keeps the items in keys as they are, and collects the items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys=’imgs’, meta_keys=(‘filename’, ‘label’, ‘original_shape’) and meta_name=’img_metas’, the results will be a dict with keys ‘imgs’ and ‘img_metas’, where ‘img_metas’ is a DataContainer of another dict with keys ‘filename’, ‘label’, ‘original_shape’.

Parameters
  • keys (Sequence[str|tuple]) – Required keys to be collected. If a tuple (key, key_new) is given as an element, the item retrieved by key will be renamed as key_new in the collected data.

  • meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_metas”.

  • meta_keys (Sequence[str|tuple]) – Keys that are collected under meta_name. The contents of the meta_name dictionary depend on meta_keys.
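
A config sketch (the meta_keys listed are illustrative):

collect = dict(
    type='Collect',
    keys=['img'],
    meta_keys=['image_file', 'center', 'scale', 'rotation'])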

class mmpose.datasets.pipelines.shared_transform.Compose(transforms)[source]

Compose a data pipeline with a sequence of transforms.

Parameters

transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
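
A minimal pipeline sketch built from config dicts (the image path in the commented call is a placeholder):

from mmpose.datasets.pipelines.shared_transform import Compose

pipeline = Compose([
    dict(type='LoadImageFromFile'),
    dict(type='ToTensor'),
    dict(type='Collect', keys=['img'], meta_keys=['image_file']),
])
# results = pipeline(dict(image_file='path/to/image.jpg'))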

class mmpose.datasets.pipelines.shared_transform.MultiItemProcess(pipeline)[source]

Process each item and merge multi-item results into lists.

Parameters

pipeline (dict) – Dictionary to construct a pipeline for a single item.

class mmpose.datasets.pipelines.shared_transform.MultitaskGatherTarget(pipeline_list, pipeline_indices=None, keys=('target', 'target_weight'))[source]

Gather the targets for multitask heads.

Parameters
  • pipeline_list (list[list]) – List of pipelines for all heads.

  • pipeline_indices (list[int]) – Pipeline index of each head.

class mmpose.datasets.pipelines.shared_transform.NormalizeTensor(mean, std)[source]

Normalize the Tensor image (CxHxW) with mean and std.

Required key: ‘img’. Modifies key: ‘img’.

Parameters
  • mean (list[float]) – Mean values of 3 channels.

  • std (list[float]) – Std values of 3 channels.
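
A config sketch with the ImageNet statistics commonly used in mmpose configs:

normalize = dict(
    type='NormalizeTensor',
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225])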

class mmpose.datasets.pipelines.shared_transform.PhotometricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]

Apply photometric distortion to an image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

  8. randomly swap channels

Parameters
  • brightness_delta (int) – Delta of brightness.

  • contrast_range (tuple) – Range of contrast.

  • saturation_range (tuple) – Range of saturation.

  • hue_delta (int) – Delta of hue.

brightness(img)[source]

Brightness distortion.

contrast(img)[source]

Contrast distortion.

convert(img, alpha=1, beta=0)[source]

Multiply by alpha and add beta, with clipping.
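
A minimal sketch of what convert() computes, per the description above:

import numpy as np

def convert(img, alpha=1, beta=0):
    # Scale by alpha, shift by beta, then clip back to the uint8 range.
    return np.clip(img.astype(np.float32) * alpha + beta, 0, 255).astype(np.uint8)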

class mmpose.datasets.pipelines.shared_transform.RenameKeys(key_pairs)[source]

Rename the keys.

Parameters

key_pairs (Sequence[tuple]) – Required keys to be renamed. If a tuple (key_src, key_tgt) is given as an element, the item retrieved by key_src will be renamed as key_tgt.

class mmpose.datasets.pipelines.shared_transform.ToTensor(device='cpu')[source]

Transform the image to a Tensor.

Required key: 'img'. Modifies key: 'img'.

Parameters

results (dict) – Contains all information about training.

class mmpose.datasets.pipelines.top_down_transform.TopDownAffine(use_udp=False)[source]

Apply an affine transform to the image to produce the model input.

Required keys: 'img', 'joints_3d', 'joints_3d_visible', 'ann_info', 'scale', 'rotation' and 'center'.

Modifies keys: 'img', 'joints_3d', and 'joints_3d_visible'.

Parameters

use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.top_down_transform.TopDownGenerateTarget(sigma=2, kernel=(11, 11), valid_radius_factor=0.0546875, target_type='GaussianHeatmap', encoding='MSRA', unbiased_encoding=False)[source]

Generate the target heatmap.

Required key: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’.

Modified key: ‘target’, and ‘target_weight’.

Parameters
  • sigma – Sigma of the heatmap Gaussian for the 'MSRA' approach.

  • kernel – Kernel of the heatmap Gaussian for the 'Megvii' approach.

  • encoding (str) – Approach used to generate target heatmaps. Currently supported approaches: 'MSRA', 'Megvii', 'UDP'. Default: 'MSRA'.

  • unbiased_encoding (bool) – Option to use unbiased encoding methods. Paper ref: Zhang et al. Distribution-Aware Coordinate Representation for Human Pose Estimation (CVPR 2020).

  • keypoint_pose_distance – Keypoint pose distance for UDP. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

  • target_type (str) – Supported targets: 'GaussianHeatmap', 'CombinedTarget'. Default: 'GaussianHeatmap'. CombinedTarget is the combination of a classification target (response map) and a regression target (offset map). Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).
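
For intuition, a simplified sketch of the 'MSRA'-style Gaussian encoding (target weights, boundary handling and the unbiased/UDP variants are omitted):

import numpy as np

def gaussian_heatmap(x, y, heatmap_size, sigma=2):
    # Place a unit-height Gaussian peak at keypoint location (x, y).
    w, h = heatmap_size
    xs = np.arange(w)[None, :]
    ys = np.arange(h)[:, None]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))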

class mmpose.datasets.pipelines.top_down_transform.TopDownGenerateTargetRegression[source]

Generate the target regression vector (coordinates).

Required key: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified key: ‘target’, and ‘target_weight’.

class mmpose.datasets.pipelines.top_down_transform.TopDownGetBboxCenterScale(padding: float = 1.25)[source]

Convert bbox from [x, y, w, h] to center and scale.

The center is the coordinates of the bbox center, and the scale is the bbox width and height normalized by a scale factor.

Required key: ‘bbox’, ‘ann_info’

Modifies key: ‘center’, ‘scale’

Parameters

padding (float) – Bbox padding factor that will be multiplied to the scale. Default: 1.25.
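
A sketch of the conversion described above; mmpose conventionally normalizes the scale by a pixel standard of 200, and the aspect-ratio correction against ann_info['image_size'] is omitted here:

import numpy as np

def bbox_xywh_to_center_scale(bbox, padding=1.25, pixel_std=200.0):
    x, y, w, h = bbox[:4]
    center = np.array([x + w * 0.5, y + h * 0.5], dtype=np.float32)
    scale = np.array([w, h], dtype=np.float32) / pixel_std * padding
    return center, scale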

class mmpose.datasets.pipelines.top_down_transform.TopDownGetRandomScaleRotation(rot_factor=40, scale_factor=0.5, rot_prob=0.6)[source]

Data augmentation with random scaling & rotating.

Required key: ‘scale’.

Modifies key: ‘scale’ and ‘rotation’.

Parameters
  • rot_factor (int) – Rotating to [-2*rot_factor, 2*rot_factor].

  • scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].

  • rot_prob (float) – Probability of random rotation.
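
A sketch of the sampling this transform performs (clipping details may differ slightly from the implementation):

import numpy as np

def sample_scale_rotation(rot_factor=40, scale_factor=0.5, rot_prob=0.6):
    # Scale is drawn around 1 and clipped to [1-scale_factor, 1+scale_factor].
    s = np.clip(np.random.randn() * scale_factor + 1,
                1 - scale_factor, 1 + scale_factor)
    # Rotation is drawn around 0, clipped to [-2*rot_factor, 2*rot_factor],
    # and only applied with probability rot_prob.
    r = np.clip(np.random.randn() * rot_factor,
                -2 * rot_factor, 2 * rot_factor)
    return s, (r if np.random.rand() <= rot_prob else 0.0)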

class mmpose.datasets.pipelines.top_down_transform.TopDownHalfBodyTransform(num_joints_half_body=8, prob_half_body=0.3)[source]

Data augmentation with half-body transform. Keep only the upper body or the lower body at random.

Required key: ‘joints_3d’, ‘joints_3d_visible’, and ‘ann_info’.

Modifies key: ‘scale’ and ‘center’.

Parameters
  • num_joints_half_body (int) – Threshold for performing the half-body transform. If the body has fewer joints than num_joints_half_body, this step is skipped.

  • prob_half_body (float) – Probability of the half-body transform.

static half_body_transform(cfg, joints_3d, joints_3d_visible)[source]

Get the center & scale for the half-body transform.

class mmpose.datasets.pipelines.top_down_transform.TopDownRandomFlip(flip_prob=0.5)[source]

Data augmentation with random image flip.

Required key: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘ann_info’.

Modifies key: ‘img’, ‘joints_3d’, ‘joints_3d_visible’, ‘center’ and ‘flipped’.

Parameters

flip_prob (float) – Probability of flip.

class mmpose.datasets.pipelines.top_down_transform.TopDownRandomShiftBboxCenter(shift_factor: float = 0.16, prob: float = 0.3)[source]

Random shift the bbox center.

Required key: ‘center’, ‘scale’

Modifies key: ‘center’

Parameters
  • shift_factor (float) – The factor to control the shift range, which is scale * pixel_std * shift_factor. Default: 0.16.

  • prob (float) – Probability of applying the random shift. Default: 0.3.

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGenerateHeatmapTarget(sigma, bg_weight=1.0, gen_center_heatmap=False, use_udp=False)[source]

Generate multi-scale heatmap target for bottom-up.

Required key: ‘joints’, ‘mask’ and ‘center’.

Modifies key: ‘target’, ‘heatmaps’ and ‘masks’.

Parameters
  • sigma (int or tuple) – Sigma of heatmap Gaussian. If sigma is a tuple, the first item should be the sigma of keypoints and the second item should be the sigma of center.

  • bg_weight (float) – Weight for background. Default: 1.0.

  • gen_center_heatmap (bool) – Whether to generate heatmaps for instance centers. Default: False.

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGenerateOffsetTarget(radius=4)[source]

Generate multi-scale offset targets for bottom-up.

Required keys: 'center', 'joints' and 'area'.

Modifies keys: 'offsets', 'offset_weights'.

Parameters

radius (int) – Radius of the labeled area for each instance.

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGeneratePAFTarget(limb_width, skeleton=None)[source]

Generate multi-scale heatmaps and part affinity fields (PAF) targets for bottom-up. Paper ref: Cao et al. Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields (CVPR 2017).

Parameters

limb_width (int) – Limb width of the part affinity fields.

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGenerateTarget(sigma, max_num_people, use_udp=False)[source]

Generate multi-scale heatmap targets for associative embedding.

Parameters
  • sigma (int) – Sigma of heatmap Gaussian

  • max_num_people (int) – Maximum number of people in an image

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpGetImgSize(test_scale_factor, current_scale=1, base_length=64, use_udp=False)[source]

Get multi-scale image sizes for bottom-up, including base_size and test_scale_factor. The aspect ratio is kept, and the image is resized to results['ann_info']['image_size'] × current_scale.

Parameters
  • test_scale_factor (List[float]) – Multi-scale factors.

  • current_scale (int) – Default: 1.

  • base_length (int) – The width and height should be multiples of base_length. Default: 64.

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpRandomAffine(rot_factor, scale_factor, scale_type, trans_factor, use_udp=False)[source]

Data augmentation with random scaling & rotating.

Parameters
  • rot_factor (int) – Rotating to [-rotation_factor, rotation_factor].

  • scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].

  • scale_type – Whether the scale is relative to the long or short side of the image.

  • trans_factor – Translation factor.

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpRandomFlip(flip_prob=0.5)[source]

Data augmentation with random image flip for bottom-up.

Parameters

flip_prob (float) – Probability of flip.

class mmpose.datasets.pipelines.bottom_up_transform.BottomUpResizeAlign(transforms, base_length=64, use_udp=False)[source]

Resize to multi-scale sizes and apply the align transform for bottom-up.

Parameters
  • transforms (list) – ToTensor & Normalize transforms.

  • base_length (int) – The width and height should be multiples of base_length. Default: 64.

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.CIDGenerateTarget(max_num_people)[source]

Generate targets for CID training.

Parameters

max_num_people (int) – Maximum number of people in an image.

class mmpose.datasets.pipelines.bottom_up_transform.GetKeypointCenterArea(minimal_area=32)[source]

Compute the center and area from keypoints for each instance.

Required key: 'joints'.

Modifies keys: 'center' and 'area'.

Parameters

minimal_area (float) – Minimum allowed area. Instances with a smaller area will be ignored in training. Default: 32.

class mmpose.datasets.pipelines.bottom_up_transform.HeatmapGenerator(output_size, num_joints, sigma=-1, use_udp=False)[source]

Generate heatmaps for bottom-up models.

Parameters
  • num_joints (int) – Number of keypoints

  • output_size (np.ndarray) – Size (w, h) of feature map

  • sigma (int) – Sigma of the heatmaps.

  • use_udp (bool) – To use unbiased data processing. Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation (CVPR 2020).

class mmpose.datasets.pipelines.bottom_up_transform.JointsEncoder(max_num_people, num_joints, output_size, tag_per_joint)[source]

Encode the visible joints into (coordinate, score) pairs; the coordinate of a joint and its score are integers:

(idx * output_size**2 + y * output_size + x, 1) if visible, or (0, 0) otherwise.

Parameters
  • max_num_people (int) – Max number of people in an image

  • num_joints (int) – Number of keypoints

  • output_size (np.ndarray) – Size (w, h) of feature map

  • tag_per_joint (bool) – Option to use one tag map per joint.
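
A worked example of the flat index above, assuming a square feature map of side output_size:

output_size = 64
idx, x, y = 3, 10, 20  # joint index and its pixel location
flat = idx * output_size ** 2 + y * output_size + x
print(flat)  # 13578; the encoded pair is (13578, 1) if the joint is visible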

class mmpose.datasets.pipelines.bottom_up_transform.OffsetGenerator(output_size, num_joints, radius=4)[source]

Generate offset maps for bottom-up models.

Parameters
  • num_joints (int) – Number of keypoints

  • output_size (np.ndarray) – Size (w, h) of feature map

  • radius (int) – Radius of area assigned with valid offset

class mmpose.datasets.pipelines.bottom_up_transform.PAFGenerator(output_size, limb_width, skeleton)[source]

Generate part affinity fields.

Parameters
  • output_size (np.ndarray) – Size (w, h) of the feature map.

  • limb_width (int) – Limb width of the part affinity fields.

  • skeleton (list[list]) – Connections of joints.

class mmpose.datasets.pipelines.mesh_transform.IUVToTensor[source]

Transform an IUV image to a part index mask and a UV coordinates image. The 3 channels of an IUV image are: part index, u coordinates, v coordinates.

Required keys: 'iuv', 'ann_info'. Modifies keys: 'part_index', 'uv_coordinates'.

Parameters

results (dict) – Contains all information about training.

class mmpose.datasets.pipelines.mesh_transform.LoadIUVFromFile(to_float32=False)[source]

Load an IUV image from file.

class mmpose.datasets.pipelines.mesh_transform.MeshAffine[source]

Affine transform the image to get the input image, and affine transform the 2D keypoints, 3D keypoints and IUV image as well.

Required keys: 'img', 'joints_2d', 'joints_2d_visible', 'joints_3d', 'joints_3d_visible', 'pose', 'iuv', 'ann_info', 'scale', 'rotation' and 'center'. Modifies keys: 'img', 'joints_2d', 'joints_2d_visible', 'joints_3d', 'pose', 'iuv'.

class mmpose.datasets.pipelines.mesh_transform.MeshGetRandomScaleRotation(rot_factor=30, scale_factor=0.25, rot_prob=0.6)[source]

Data augmentation with random scaling & rotating.

Required key: ‘scale’. Modifies key: ‘scale’ and ‘rotation’.

Parameters
  • rot_factor (int) – Rotating to [-2*rot_factor, 2*rot_factor].

  • scale_factor (float) – Scaling to [1-scale_factor, 1+scale_factor].

  • rot_prob (float) – Probability of random rotation.

class mmpose.datasets.pipelines.mesh_transform.MeshRandomChannelNoise(noise_factor=0.4)[source]

Data augmentation with random channel noise.

Required key: 'img'. Modifies key: 'img'.

Parameters

noise_factor (float) – Multiply each channel by a factor sampled from [1-noise_factor, 1+noise_factor].

class mmpose.datasets.pipelines.mesh_transform.MeshRandomFlip(flip_prob=0.5)[source]

Data augmentation with random image flip.

Required keys: 'img', 'joints_2d', 'joints_2d_visible', 'joints_3d', 'joints_3d_visible', 'center', 'pose', 'iuv' and 'ann_info'. Modifies keys: 'img', 'joints_2d', 'joints_2d_visible', 'joints_3d', 'joints_3d_visible', 'center', 'pose', 'iuv'.

Parameters

flip_prob (float) – Probability of flip.

class mmpose.datasets.pipelines.pose3d_transform.AffineJoints(item='joints', visible_item=None)[source]

Apply an affine transformation to the joints coordinates.

Parameters
  • item (str) – The name of the joints to apply the affine transformation to.

  • visible_item (str) – The name of the visibility item.

Required keys:

item, visible_item (optional)

Modified keys:

item, visible_item (optional)

class mmpose.datasets.pipelines.pose3d_transform.CameraProjection(item, mode, output_name=None, camera_type='SimpleCamera', camera_param=None)[source]

Apply camera projection to joint coordinates.

Parameters
  • item (str) – The name of the pose to apply camera projection.

  • mode (str) –

    The type of camera projection, supported options are

    • world_to_camera

    • world_to_pixel

    • camera_to_world

    • camera_to_pixel

  • output_name (str|None) – The name of the projected pose. If None (default) is given, the projected pose will be stored in place.

  • camera_type (str) – The camera class name (should be registered in CAMERA).

  • camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key “camera_param”.

Required keys:

  • item

  • camera_param (if camera parameters are not given in initialization)

Modified keys:

output_name

class mmpose.datasets.pipelines.pose3d_transform.CollectCameraIntrinsics(camera_param=None, need_distortion=True)[source]

Store camera intrinsics in a 1-dim array, including f, c, k, p.

Parameters
  • camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key "camera_param".

  • need_distortion (bool) – Whether the distortion parameters k and p are needed. Default: True.

Required keys:

camera_param (if camera parameters are not given in initialization)

Modified keys:

intrinsics

class mmpose.datasets.pipelines.pose3d_transform.Generate3DHeatmapTarget(sigma=2, joint_indices=None, max_bound=1.0)[source]

Generate the target 3d heatmap.

Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info’. Modified keys: ‘target’, and ‘target_weight’.

Parameters
  • sigma – Sigma of the heatmap Gaussian.

  • joint_indices (list) – Indices of joints used for heatmap generation. If None (default) is given, all joints will be used.

  • max_bound (float) – The maximum value of the heatmap.

class mmpose.datasets.pipelines.pose3d_transform.GenerateInputHeatmaps(item='joints', visible_item=None, obscured=0.0, from_pred=True, sigma=3, scale=None, base_size=96, target_type='gaussian', heatmap_cfg=None)[source]

Generate 2D input heatmaps for the multi-camera setting when the 2D model is not available.

Required key: 'joints'. Modifies key: 'input_heatmaps'.

Parameters
  • sigma (int) – Sigma of the heatmap Gaussian (mm).

  • base_size (int) – The base size of a human.

  • target_type (str) – Type of target heatmap; only 'gaussian' is currently supported.

class mmpose.datasets.pipelines.pose3d_transform.GenerateVoxel3DHeatmapTarget(sigma=200.0, joint_indices=None)[source]

Generate the target 3d heatmap.

Required keys: ‘joints_3d’, ‘joints_3d_visible’, ‘ann_info_3d’. Modified keys: ‘target’, and ‘target_weight’.

Parameters
  • sigma – Sigma of the heatmap Gaussian (mm).

  • joint_indices (list) – Indices of joints used for heatmap generation. If None (default) is given, all joints will be used.

class mmpose.datasets.pipelines.pose3d_transform.GetRootCenteredPose(item, root_index, visible_item=None, remove_root=False, root_name=None)[source]

Zero-center the pose around a given root joint. Optionally, the root joint can be removed from the original pose and stored as a separate item.

Note that the root-centered joints may no longer align with some annotation information (e.g. flip_pairs, num_joints, inference_channel, etc.) due to the removal of the root joint.

Parameters
  • item (str) – The name of the pose to apply root-centering.

  • root_index (int) – Root joint index in the pose.

  • visible_item (str) – The name of the visibility item.

  • remove_root (bool) – If True, remove the root joint from the pose.

  • root_name (str) – Optional. If not None, it will be used as the key to store the root position separately from the original pose.

Required keys:

item

Modified keys:

item, visible_item, root_name
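
A sketch of the root-centering described above, using numpy:

import numpy as np

def root_center(pose, root_index, remove_root=False):
    # Subtract the root joint so the pose is zero-centered around it.
    root = pose[..., root_index:root_index + 1, :]
    centered = pose - root
    if remove_root:
        # Optionally drop the (now all-zero) root joint from the pose.
        centered = np.delete(centered, root_index, axis=-2)
    return centered, root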

class mmpose.datasets.pipelines.pose3d_transform.ImageCoordinateNormalization(item, norm_camera=False, camera_param=None)[source]

Normalize the 2D joint coordinate with image width and height. Range [0, w] is mapped to [-1, 1], while preserving the aspect ratio.

Parameters
  • item (str|list[str]) – The name of the pose to normalize.

  • norm_camera (bool) – Whether to normalize camera intrinsics. Default: False.

  • camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key “camera_param”.

Required keys:

item

Modified keys:

item (, camera_param)
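
A sketch of the mapping described above: x in [0, w] goes to [-1, 1], and y is scaled by the same factor so the aspect ratio is preserved:

import numpy as np

def normalize_image_coords(joints_2d, w, h):
    # joints_2d: [..., 2] array of (x, y) pixel coordinates.
    return joints_2d / w * 2 - np.array([1, h / w])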

class mmpose.datasets.pipelines.pose3d_transform.NormalizeJointCoordinate(item, mean=None, std=None, norm_param_file=None)[source]

Normalize the joint coordinate with given mean and std.

Parameters
  • item (str) – The name of the pose to normalize.

  • mean (array) – Mean values of joint coordinates in shape [K, C].

  • std (array) – Std values of joint coordinates in shape [K, C].

  • norm_param_file (str) – Optionally load a dict containing mean and std from a file using mmcv.load.

Required keys:

item

Modified keys:

item

class mmpose.datasets.pipelines.pose3d_transform.PoseSequenceToTensor(item)[source]

Convert pose sequence from numpy array to Tensor.

The original pose sequence should have a shape of [T,K,C] or [K,C], where T is the sequence length, K and C are keypoint number and dimension. The converted pose sequence will have a shape of [KxC, T].

Parameters

item (str) – The name of the pose sequence.

Required keys:

item

Modified keys:

item
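
A sketch of the shape conversion described above:

import numpy as np
import torch

def pose_seq_to_tensor(seq):
    seq = np.asarray(seq)
    if seq.ndim == 2:       # [K, C] -> [1, K, C]
        seq = seq[None]
    T = seq.shape[0]
    flat = seq.reshape(T, -1).T  # [T, K, C] -> [T, K*C] -> [K*C, T]
    return torch.from_numpy(np.ascontiguousarray(flat))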

class mmpose.datasets.pipelines.pose3d_transform.RelativeJointRandomFlip(item, flip_cfg, visible_item=None, flip_prob=0.5, flip_camera=False, camera_param=None)[source]

Data augmentation with random horizontal joint flip around a root joint.

Parameters
  • item (str|list[str]) – The name of the pose to flip.

  • flip_cfg (dict|list[dict]) –

    Configurations of the fliplr_regression function. It should contain the following arguments:

    • center_mode: The mode to set the center location on the x-axis to flip around.

    • center_x or center_index: Set the x-axis location or the root joint’s index to define the flip center.

    Please refer to the docstring of the fliplr_regression function for more details.

  • visible_item (str|list[str]) – The name of the visibility item which will be flipped accordingly along with the pose.

  • flip_prob (float) – Probability of flip.

  • flip_camera (bool) – Whether to flip horizontal distortion coefficients.

  • camera_param (dict|None) – The camera parameter dict. See the camera class definition for more details. If None is given, the camera parameter will be obtained during processing of each data sample with the key “camera_param”.

Required keys:

item

Modified keys:

item (, camera_param)

samplers

class mmpose.datasets.samplers.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0)[source]

DistributedSampler inheriting from torch.utils.data.DistributedSampler.

In older PyTorch versions there is no shuffle argument; this subclass adds one to DistributedSampler.

mmpose.utils

class mmpose.utils.StopWatch(window=1)[source]

A helper class to measure FPS and detailed time consuming of each phase in a video processing loop or similar scenarios.

Parameters

window (int) – The sliding window size used to calculate the running average of the time consumed.

Example

>>> from mmpose.utils import StopWatch
>>> import time
>>> stop_watch = StopWatch(window=10)
>>> with stop_watch.timeit('total'):
...     time.sleep(0.1)
...     # 'timeit' supports nested use
...     with stop_watch.timeit('phase1'):
...         time.sleep(0.1)
...     with stop_watch.timeit('phase2'):
...         time.sleep(0.2)
...     time.sleep(0.2)
>>> report = stop_watch.report()
report(key=None)[source]

Report timing information.

Returns

The key is the timer name and the value is the corresponding average time consumed.

Return type

dict

report_strings()[source]

Report timing information as text strings.

Returns

Each element is the information string of a timed event, in the format '{timer_name}: {time_in_ms}'. Specially, if timer_name is '_FPS_', the result will be converted to fps.

Return type

list(str)

timeit(timer_name='_FPS_')[source]

Time a code snippet with an assigned name.

Parameters

timer_name (str) – The unique name of the code snippet of interest, used to handle multiple timers and generate reports. Note that '_FPS_' is a special key for which the measurement will be in fps instead of milliseconds. Also see report and report_strings. Default: '_FPS_'.

Note

This function should always be used in a with statement, as shown in the example.

mmpose.utils.get_root_logger(log_file=None, log_level=20)[source]

Use the get_logger method in mmcv to get the root logger.

The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., "mmpose".

Parameters
  • log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.

  • log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to "Error" and be silent most of the time.

Returns

The root logger.

Return type

logging.Logger
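
A typical usage sketch near the start of a training script (the log path is illustrative):

import logging
from mmpose.utils import get_root_logger

logger = get_root_logger(log_file='work_dirs/train.log',
                         log_level=logging.INFO)
logger.info('Environment set up.')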

mmpose.utils.setup_multi_processes(cfg)[source]

Set up multi-processing environment variables.
