mmpose.apis

class mmpose.apis.MMPoseInferencer(pose2d: Optional[str] = None, pose2d_weights: Optional[str] = None, pose3d: Optional[str] = None, pose3d_weights: Optional[str] = None, device: Optional[str] = None, scope: str = 'mmpose', det_model: Optional[Union[dict, mmengine.config.config.Config, mmengine.config.config.ConfigDict, str]] = None, det_weights: Optional[str] = None, det_cat_ids: Optional[Union[int, List]] = None, show_progress: bool = False)

MMPose Inferencer. A unified inferencer interface for pose estimation tasks. It currently includes Pose2D and can be used to perform 2D keypoint detection.

Parameters

pose2d (str, optional) –
Pretrained 2D pose estimation algorithm. It’s the path to the config file or the model name defined in metafile. For example, it could be:
- model alias, e.g. 'body',
- config name, e.g. 'simcc_res50_8xb64-210e_coco-256x192',
- config path
Defaults to None.
pose2d_weights (str, optional) – Path to the custom checkpoint file of the selected pose2d model. If it is not specified and “pose2d” is a model name of metafile, the weights will be loaded from metafile. Defaults to None.
device (str, optional) – Device to run inference. If None, the available device will be automatically used. Defaults to None.
scope (str, optional) – The scope of the model. Defaults to “mmpose”.
det_model (str, optional) – Config path or alias of detection model. Defaults to None.
det_weights (str, optional) – Path to the checkpoints of detection model. Defaults to None.
det_cat_ids (int or list[int], optional) – Category id for detection model. Defaults to None.
output_heatmaps (bool, optional) – Flag to visualize predicted heatmaps. If set to None, the default setting from the model config will be used. Default is None.

forward(inputs: Union[str, numpy.ndarray], **forward_kwargs) → Union[mmengine.structures.instance_data.InstanceData, List[mmengine.structures.instance_data.InstanceData]]

Forward the inputs to the model.

Parameters: inputs (InputsType) – The inputs to be forwarded.
Returns: The prediction results. Possibly with keys “pose2d”.
Return type: Dict

preprocess(inputs: Union[str, numpy.ndarray, Sequence[Union[str, numpy.ndarray]]], batch_size: int = 1, **kwargs)

Process the inputs into a model-feedable format.

Parameters

inputs (InputsType) – Inputs given by user.
batch_size (int) – Batch size. Defaults to 1.

Yields

Any – Data processed by the pipeline and collate_fn. List[str or np.ndarray]: List of original inputs in the batch.

visualize(inputs: Union[str, numpy.ndarray, Sequence[Union[str, numpy.ndarray]]], preds: Union[mmengine.structures.instance_data.InstanceData, List[mmengine.structures.instance_data.InstanceData]], **kwargs) → List[numpy.ndarray]

Visualize predictions.

Parameters

inputs (list) – Inputs preprocessed by _inputs_to_list().
preds (Any) – Predictions of the model.
return_vis (bool) – Whether to return images with predicted results.
show (bool) – Whether to display the image in a popup window. Defaults to False.
show_interval (int) – The interval of show (s). Defaults to 0
radius (int) – Keypoint radius for visualization. Defaults to 3
thickness (int) – Link thickness for visualization. Defaults to 1
kpt_thr (float) – The threshold to visualize the keypoints. Defaults to 0.3
vis_out_dir (str, optional) – Directory to save visualization results without predictions. If left empty, no file will be saved. Defaults to ‘’.

Returns

Visualization results.

Return type

List[np.ndarray]

class mmpose.apis.Pose2DInferencer(model: Union[dict, mmengine.config.config.Config, mmengine.config.config.ConfigDict, str], weights: Optional[str] = None, device: Optional[str] = None, scope: Optional[str] = 'mmpose', det_model: Optional[Union[dict, mmengine.config.config.Config, mmengine.config.config.ConfigDict, str]] = None, det_weights: Optional[str] = None, det_cat_ids: Optional[Union[int, Tuple]] = None, show_progress: bool = False)

The inferencer for 2D pose estimation.

Parameters

model (str, optional) –
Pretrained 2D pose estimation algorithm. It’s the path to the config file or the model name defined in metafile. For example, it could be:
- model alias, e.g. 'body',
- config name, e.g. 'simcc_res50_8xb64-210e_coco-256x192',
- config path
Defaults to None.
weights (str, optional) – Path to the checkpoint. If it is not specified and “model” is a model name of metafile, the weights will be loaded from metafile. Defaults to None.
device (str, optional) – Device to run inference. If None, the available device will be automatically used. Defaults to None.
scope (str, optional) – The scope of the model. Defaults to “mmpose”.
det_model (str, optional) – Config path or alias of detection model. Defaults to None.
det_weights (str, optional) – Path to the checkpoints of detection model. Defaults to None.
det_cat_ids (int or list[int], optional) – Category id for detection model. Defaults to None.

forward(inputs: Union[dict, tuple], merge_results: bool = True, bbox_thr: float = -1, pose_based_nms: bool = False)

Performs a forward pass through the model.

Parameters

inputs (Union[dict, tuple]) – The input data to be processed. Can be either a dictionary or a tuple.
merge_results (bool, optional) – Whether to merge data samples, default to True. This is only applicable when the data_mode is ‘topdown’.
bbox_thr (float, optional) – A threshold for the bounding box scores. Bounding boxes with scores greater than this value will be retained. Default value is -1 which retains all bounding boxes.

Returns

A list of data samples with prediction instances.
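
The bbox_thr rule described above (keep boxes whose score is greater than the threshold, with -1 keeping everything) can be illustrated with a small NumPy sketch. The helper name `filter_bboxes_by_score` is hypothetical and is not part of MMPose; it only mirrors the documented behaviour on plain arrays.

```python
import numpy as np


def filter_bboxes_by_score(bboxes, scores, bbox_thr=-1.0):
    """Hypothetical helper: keep boxes whose score is strictly greater than bbox_thr."""
    bboxes = np.asarray(bboxes, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    keep = scores > bbox_thr
    return bboxes[keep], scores[keep]


bboxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 4, 4]])
scores = np.array([0.9, 0.2, 0.5])

# The default threshold of -1 retains every box, since scores are non-negative.
kept_all, _ = filter_bboxes_by_score(bboxes, scores)
# A threshold of 0.3 drops the box scored 0.2.
kept_some, kept_scores = filter_bboxes_by_score(bboxes, scores, bbox_thr=0.3)
```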

preprocess_single(input: Union[str, numpy.ndarray], index: int, bbox_thr: float = 0.3, nms_thr: float = 0.3, bboxes: Union[List[List], List[numpy.ndarray], numpy.ndarray] = [])

Process a single input into a model-feedable format.

Parameters

input (InputType) – Input given by user.
index (int) – Index of the input.
bbox_thr (float) – Threshold for bounding box detection. Defaults to 0.3.
nms_thr (float) – IoU threshold for bounding box NMS. Defaults to 0.3.

Yields

Any – Data processed by the pipeline and collate_fn.

update_model_visualizer_settings(draw_heatmap: bool = False, skeleton_style: str = 'mmpose', **kwargs) → None

Update the settings of models and visualizer according to inference arguments.

Parameters

draw_heatmap (bool, optional) – Flag to visualize predicted heatmaps. If not provided, it defaults to False.
skeleton_style (str, optional) – Skeleton style selection. Valid options are ‘mmpose’ and ‘openpose’. Defaults to ‘mmpose’.

mmpose.apis.collate_pose_sequence(pose_results_2d, with_track_id=True, target_frame=-1)

Reorganize multi-frame pose detection results into individual pose sequences.

Note

The temporal length of the pose detection results: T
The number of the person instances: N
The number of the keypoints: K
The channel number of each keypoint: C

Parameters

pose_results_2d (List[List[PoseDataSample]]) –
Multi-frame pose detection results stored in a nested list. Each element of the outer list is the pose detection results of a single frame, and each element of the inner list is the pose information of one person, which contains:
- keypoints (ndarray[K, 2 or 3]): x, y, [score]
- track_id (int): unique id of each person, required when with_track_id==True
with_track_id (bool) – If True, the element in pose_results_2d is expected to contain “track_id”, which will be used to gather the pose sequence of a person from multiple frames. Otherwise, the pose results in each frame are expected to have a consistent number and order of identities. Default is True.
target_frame (int) – The index of the target frame. Default: -1.

Returns

Individual pose sequences, in a list of length N.

Return type

List[PoseDataSample]
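
As a rough illustration of gathering per-person sequences by track_id, the sketch below works on plain dicts instead of PoseDataSample objects. The function `gather_sequences` is hypothetical, and the fallback for a person missing from a frame (reusing the target-frame pose) is an assumption of this sketch, not a statement about the library's behaviour.

```python
import numpy as np


def gather_sequences(frames, target_frame=-1):
    """Hypothetical sketch. frames: list (length T) of per-frame lists of dicts,
    each dict holding 'track_id' (int) and 'keypoints' (ndarray[K, C])."""
    sequences = []
    for person in frames[target_frame]:
        per_frame = []
        for frame in frames:
            match = next(
                (p for p in frame if p['track_id'] == person['track_id']), None)
            # Assumption: fall back to the target-frame pose when the person is missing.
            source = match if match is not None else person
            per_frame.append(source['keypoints'])
        sequences.append({
            'track_id': person['track_id'],
            'keypoints': np.stack(per_frame),  # shape (T, K, C)
        })
    return sequences


def kp(value):
    # Dummy keypoints with K=2, C=2, filled with one value for easy inspection.
    return np.full((2, 2), float(value))


frames = [
    [{'track_id': 1, 'keypoints': kp(10)}, {'track_id': 2, 'keypoints': kp(20)}],
    [{'track_id': 2, 'keypoints': kp(21)}],  # person 1 is missing here
    [{'track_id': 2, 'keypoints': kp(22)}, {'track_id': 1, 'keypoints': kp(12)}],
]
sequences = gather_sequences(frames)
```

The output order follows the identities found in the target frame, so the result has one entry per person visible there.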

mmpose.apis.collect_multi_frames(video, frame_id, indices, online=False)

Collect multiple frames from the video.

Parameters

video (mmcv.VideoReader) – A VideoReader of the input video file.
frame_id (int) – Index of the current frame.
indices (list(int)) – Index offsets of the frames to collect.
online (bool) – Inference mode. If set to True, future frame information cannot be used.

Returns

Multiple frames collected from the input video file.

Return type

list(ndarray)
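
To make the roles of indices and online concrete, here is a minimal sketch that computes which frame indices would be collected. It operates on a frame count rather than an mmcv.VideoReader; the helper name `collect_frame_ids` and the exact clamping rule are assumptions made for illustration.

```python
def collect_frame_ids(num_frames, frame_id, indices, online=False):
    """Hypothetical sketch: return the frame indices selected for frame_id."""
    ids = [frame_id]  # the current frame comes first
    for offset in indices:
        if offset == 0:
            continue  # the current frame is already included
        # In online mode no frame after the current one may be used.
        upper = frame_id if online else num_frames - 1
        ids.append(min(max(frame_id + offset, 0), upper))
    return ids


offline_ids = collect_frame_ids(10, 5, [-2, -1, 0, 1, 2])
online_ids = collect_frame_ids(10, 5, [-2, -1, 0, 1, 2], online=True)
start_ids = collect_frame_ids(10, 0, [-1, 0, 1])
```

In this sketch, offsets that point outside the allowed range are clamped to the nearest valid frame, so the number of collected frames stays constant.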

mmpose.apis.convert_keypoint_definition(keypoints, pose_det_dataset, pose_lift_dataset)

Convert the keypoint definition of the pose detection dataset to that of the pose lifter dataset, so that the keypoints are compatible with the definition required for 3D pose lifting.

Parameters

keypoints (ndarray[N, K, 2 or 3]) – 2D keypoints to be transformed.
pose_det_dataset (str) – Name of the dataset for 2D pose detector.
pose_lift_dataset (str) – Name of the dataset for pose lifter model.

Returns: The transformed 2D keypoints.
Return type: ndarray[K, 2 or 3]

mmpose.apis.extract_pose_sequence(pose_results, frame_idx, causal, seq_len, step=1)

Extract the target frame from 2D pose results, and pad the sequence to a fixed length.

Parameters

pose_results (List[List[PoseDataSample]]) – Multi-frame pose detection results stored in a list.
frame_idx (int) – The index of the frame in the original video.
causal (bool) – If True, the target frame is the last frame in a sequence. Otherwise, the target frame is in the middle of a sequence.
seq_len (int) – The number of frames in the input sequence.
step (int) – Step size to extract frames from the video.

Returns

Multi-frame pose detection results stored in a nested list with a length of seq_len.

Return type

List[List[PoseDataSample]]
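
The interplay of causal, seq_len and step is easier to see on frame indices. The following sketch extracts a fixed-length window around a target frame and pads it at the video boundaries; padding by repeating the first or last frame is an assumption of this sketch, and `extract_window` is a hypothetical name.

```python
def extract_window(results, frame_idx, causal, seq_len, step=1):
    """Hypothetical sketch: fixed-length window around frame_idx, padded at the ends."""
    if causal:
        # The target frame is the last one in the window.
        frames_left, frames_right = seq_len - 1, 0
    else:
        # The target frame sits in the middle of the window.
        frames_left = frames_right = (seq_len - 1) // 2
    num_frames = len(results)
    pad_left = max(0, frames_left - frame_idx // step)
    pad_right = max(0, frames_right - (num_frames - 1 - frame_idx) // step)
    start = max(frame_idx % step, frame_idx - frames_left * step)
    end = min(num_frames - (num_frames - 1 - frame_idx) % step,
              frame_idx + frames_right * step + 1)
    # Assumption: pad by repeating the first and last entries.
    return ([results[0]] * pad_left + results[start:end:step]
            + [results[-1]] * pad_right)


frames = list(range(10))  # stand-in for per-frame pose results
near_start = extract_window(frames, frame_idx=1, causal=False, seq_len=5)
causal_win = extract_window(frames, frame_idx=1, causal=True, seq_len=3)
near_end = extract_window(frames, frame_idx=9, causal=False, seq_len=5)
strided = extract_window(frames, frame_idx=4, causal=False, seq_len=3, step=2)
```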

mmpose.apis.inference_bottomup(model: torch.nn.modules.module.Module, img: Union[numpy.ndarray, str])

Run inference on an image with a bottom-up pose estimator.

Parameters

model (nn.Module) – The bottom-up pose estimator
img (np.ndarray | str) – The loaded image or image file to run inference on

Returns

The inference results. Specifically, the predicted keypoints and scores are saved at data_sample.pred_instances.keypoints and data_sample.pred_instances.keypoint_scores.

Return type

List[PoseDataSample]

mmpose.apis.inference_pose_lifter_model(model, pose_results_2d, with_track_id=True, image_size=None, norm_pose_2d=False)

Infer 3D poses from 2D pose sequences using a pose lifter model.

Parameters

model (nn.Module) – The loaded pose lifter model
pose_results_2d (List[List[PoseDataSample]]) – The 2D pose sequences stored in a nested list.
with_track_id (bool) – If True, the element in pose_results_2d is expected to contain “track_id”, which will be used to gather the pose sequence of a person from multiple frames. Otherwise, the pose results in each frame are expected to have a consistent number and order of identities. Default is True.
image_size (tuple|list) – Image width and image height. If None, image size will not be contained in dict data.
norm_pose_2d (bool) – If True, scale the bbox (along with the 2D pose) to the average bbox scale of the dataset, and move the bbox (along with the 2D pose) to the average bbox center of the dataset.

Returns

3D pose inference results. Specifically, the predicted keypoints and scores are saved at data_sample.pred_instances.keypoints_3d.

Return type

List[PoseDataSample]

mmpose.apis.inference_topdown(model: torch.nn.modules.module.Module, img: Union[numpy.ndarray, str], bboxes: Optional[Union[List, numpy.ndarray]] = None, bbox_format: str = 'xyxy') → List[mmpose.structures.pose_data_sample.PoseDataSample]

Run inference on an image with a top-down pose estimator.

Parameters

model (nn.Module) – The top-down pose estimator
img (np.ndarray | str) – The loaded image or image file to run inference on
bboxes (np.ndarray, optional) – The bboxes in shape (N, 4), each row represents a bbox. If not given, the entire image will be regarded as a single bbox area. Defaults to None
bbox_format (str) – The bbox format indicator. Options are 'xywh' and 'xyxy'. Defaults to 'xyxy'

Returns

The inference results. Specifically, the predicted keypoints and scores are saved at data_sample.pred_instances.keypoints and data_sample.pred_instances.keypoint_scores.

Return type

List[PoseDataSample]
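
Since bbox_format accepts both 'xywh' and 'xyxy', it may help to see how the two layouts relate. The sketch below converts (x, y, w, h) rows to (x1, y1, x2, y2) with NumPy; `xywh_to_xyxy` is a hypothetical helper, and the convention x2 = x + w (no pixel offset) is an assumption.

```python
import numpy as np


def xywh_to_xyxy(bboxes):
    """Hypothetical helper: convert (N, 4) boxes from (x, y, w, h) to (x1, y1, x2, y2)."""
    bboxes = np.asarray(bboxes, dtype=np.float32).reshape(-1, 4)
    converted = bboxes.copy()
    # Assumption: the bottom-right corner is top-left plus width and height.
    converted[:, 2:] = bboxes[:, :2] + bboxes[:, 2:]
    return converted


boxes_xywh = np.array([[10, 20, 30, 40], [0, 0, 5, 5]])
boxes_xyxy = xywh_to_xyxy(boxes_xywh)
```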

mmpose.apis.init_model(config: Union[str, pathlib.Path, mmengine.config.config.Config], checkpoint: Optional[str] = None, device: str = 'cuda:0', cfg_options: Optional[dict] = None) → torch.nn.modules.module.Module

Initialize a pose estimator from a config file.

Parameters

config (str, Path, or mmengine.Config) – Config file path, Path, or the config object.
checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights. Defaults to None
device (str) – The device on which the model will be placed. Defaults to 'cuda:0'.
cfg_options (dict, optional) – Options to override some settings in the used config. Defaults to None

Returns

The constructed pose estimator.

Return type

nn.Module

mmpose.apis.visualize(img: Union[numpy.ndarray, str], keypoints: numpy.ndarray, keypoint_score: Optional[numpy.ndarray] = None, metainfo: Optional[Union[str, dict]] = None, visualizer: Optional[mmpose.visualization.local_visualizer.PoseLocalVisualizer] = None, show_kpt_idx: bool = False, skeleton_style: str = 'mmpose', show: bool = False, kpt_thr: float = 0.3)

Visualize 2D keypoints on an image.

Parameters

img (str | np.ndarray) – The image to be displayed.
keypoints (np.ndarray) – The keypoint to be displayed.
keypoint_score (np.ndarray) – The score of each keypoint.
metainfo (str | dict) – The metainfo of dataset.
visualizer (PoseLocalVisualizer) – The visualizer.
show_kpt_idx (bool) – Whether to show the index of keypoints.
skeleton_style (str) – Skeleton style. Options are ‘mmpose’ and ‘openpose’.
show (bool) – Whether to show the image.
wait_time (int) – Value of waitKey param.
kpt_thr (float) – Keypoint threshold.

mmpose.codecs

class mmpose.codecs.AssociativeEmbedding(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: Optional[float] = None, use_udp: bool = False, decode_keypoint_order: List[int] = [], decode_nms_kernel: int = 5, decode_gaussian_kernel: int = 3, decode_keypoint_thr: float = 0.1, decode_tag_thr: float = 1.0, decode_topk: int = 30, decode_center_shift=0.0, decode_max_instances: Optional[int] = None)

Encode/decode keypoints with the method introduced in “Associative Embedding”. This is an asymmetric codec, where the keypoints are represented as Gaussian heatmaps and position indices during encoding, and restored from predicted heatmaps and group tags during decoding.

See the paper Associative Embedding: End-to-End Learning for Joint Detection and Grouping by Newell et al. (2017) for details.

Note

instance number: N
keypoint number: K
keypoint dimension: D
embedding tag dimension: L
image size: [w, h]
heatmap size: [W, H]

Encoded:

heatmaps (np.ndarray): The generated heatmap in shape (K, H, W) where [W, H] is the heatmap_size

keypoint_indices (np.ndarray): The keypoint position indices in shape (N, K, 2). Each keypoint’s index is [i, v], where i is the position index in the heatmap (\(i=y*w+x\)) and v is the visibility

keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters

input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
sigma (float) – The sigma value of the Gaussian heatmap
use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False
decode_keypoint_order (List[int]) – The grouping order of the keypoint indices. The grouping usually starts from keypoints around the head and torso, and gradually moves out to the limbs
decode_keypoint_thr (float) – The threshold of keypoint response value in heatmaps. Defaults to 0.1
decode_tag_thr (float) – The maximum allowed tag distance when matching a keypoint to a group. A keypoint whose tag distance to every existing group exceeds this value will initialize a new group. Defaults to 1.0
decode_nms_kernel (int) – The kernel size of the NMS during decoding, which should be an odd integer. Defaults to 5
decode_gaussian_kernel (int) – The kernel size of the Gaussian blur during decoding, which should be an odd integer. It is only used when self.use_udp==True. Defaults to 3
decode_topk (int) – The number of top-k candidates of each keypoint that will be retrieved from the heatmaps during decoding. Defaults to 30
decode_max_instances (int, optional) – The maximum number of instances to decode. None means no limitation to the instance number. Defaults to None

Associative Embedding: End-to-End Learning for Joint Detection and Grouping: https://arxiv.org/abs/1611.05424
UDP (CVPR 2020): https://arxiv.org/abs/1911.07524

batch_decode(batch_heatmaps: torch.Tensor, batch_tags: torch.Tensor) → Tuple[List[numpy.ndarray], List[numpy.ndarray]]

Decode the keypoint coordinates from a batch of heatmaps and tagging heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters

batch_heatmaps (Tensor) – Keypoint detection heatmaps in shape (B, K, H, W)
batch_tags (Tensor) – Tagging heatmaps in shape (B, C, H, W), where \(C=L*K\)

Returns

batch_keypoints (List[np.ndarray]): Decoded keypoint coordinates of the batch, each is in shape (N, K, D)
batch_scores (List[np.ndarray]): Decoded keypoint scores of the batch, each is in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type

tuple

decode(encoded: Any) → Tuple[numpy.ndarray, numpy.ndarray]

Decode keypoints.

Parameters

encoded (any) – Encoded keypoint representation using the codec

Returns

keypoints (np.ndarray): Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray): Keypoint visibility in shape (N, K, D)

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Encode keypoints into heatmaps and position indices. Note that the original keypoint coordinates should be in the input image space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns

heatmaps (np.ndarray): The generated heatmap in shape (K, H, W) where [W, H] is the heatmap_size
keypoint_indices (np.ndarray): The keypoint position indices in shape (N, K, 2). Each keypoint’s index is [i, v], where i is the position index in the heatmap (\(i=y*w+x\)) and v is the visibility
keypoint_weights (np.ndarray): The target weights in shape (N, K)

Return type

dict
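
The position index i = y*w + x used in keypoint_indices is the row-major flat index of a heatmap pixel. The sketch below illustrates the mapping in both directions, assuming that w in the formula refers to the heatmap width; the helper names are hypothetical.

```python
import numpy as np


def to_flat_index(x, y, heatmap_width):
    """Row-major flat index of pixel (x, y); assumes w is the heatmap width."""
    return y * heatmap_width + x


def from_flat_index(i, heatmap_width):
    """Inverse mapping: recover (x, y) from the flat index."""
    return i % heatmap_width, i // heatmap_width


W, H = 48, 64
heatmap = np.arange(H * W).reshape(H, W)  # dummy heatmap in shape (H, W)
i = to_flat_index(x=5, y=2, heatmap_width=W)
```

With this convention, indexing the flattened heatmap at i reads the same value as indexing the 2D heatmap at [y, x].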

class mmpose.codecs.DecoupledHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], root_type: str = 'kpt_center', heatmap_min_overlap: float = 0.7, encode_max_instances: int = 30)

Encode/decode keypoints with the method introduced in the paper CID.

See the paper Contextual Instance Decoupling for Robust Multi-Person Pose Estimation by Wang et al. (2022) for details.

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]

Encoded:

heatmaps (np.ndarray): The coupled heatmap in shape (1+K, H, W) where [W, H] is the heatmap_size.
instance_heatmaps (np.ndarray): The decoupled heatmap in shape (M*K, H, W) where M is the number of instances.
keypoint_weights (np.ndarray): The weight for heatmaps in shape (M*K).
instance_coords (np.ndarray): The coordinates of instance roots in shape (M, 2)

Parameters

input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
root_type (str) –
The method to generate the instance root. Options are:
- 'kpt_center': Average coordinate of all visible keypoints.
- 'bbox_center': Center point of bounding boxes outlined by
  all visible keypoints.
Defaults to 'kpt_center'
heatmap_min_overlap (float) – Minimum overlap rate among instances. Used when calculating sigmas for instances. Defaults to 0.7
background_weight (float) – Loss weight of background pixels. Defaults to 0.1
encode_max_instances (int) – The maximum number of instances to encode for each sample. Defaults to 30
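
The two root_type options listed above differ only in how the root point is derived from the visible keypoints. A minimal NumPy sketch of both, for a single instance, is given below; `instance_root` is a hypothetical helper and not the codec's actual implementation.

```python
import numpy as np


def instance_root(keypoints, visible, root_type='kpt_center'):
    """Hypothetical sketch. keypoints: (K, 2); visible: (K,) with values > 0 for visible."""
    points = keypoints[visible > 0]
    if len(points) == 0:
        return None  # no visible keypoint, so no root can be defined
    if root_type == 'kpt_center':
        # Average coordinate of all visible keypoints.
        return points.mean(axis=0)
    if root_type == 'bbox_center':
        # Center of the box outlined by all visible keypoints.
        return (points.min(axis=0) + points.max(axis=0)) / 2
    raise ValueError(f'Unknown root_type: {root_type}')


keypoints = np.array([[0., 0.], [4., 0.], [2., 6.], [100., 100.]])
visible = np.array([1, 1, 1, 0])  # the last keypoint is ignored
kpt_root = instance_root(keypoints, visible, 'kpt_center')
bbox_root = instance_root(keypoints, visible, 'bbox_center')
```

In this example the two roots differ in y because the keypoints are not spread evenly inside their bounding box.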


decode(instance_heatmaps: numpy.ndarray, instance_scores: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]

Decode keypoint coordinates from decoupled heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters

instance_heatmaps (np.ndarray) – Heatmaps in shape (N, K, H, W)
instance_scores (np.ndarray) – Confidence of instance roots prediction in shape (N, 1)

Returns

keypoints (np.ndarray): Decoded keypoint coordinates in shape (N, K, D)
scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None, bbox: Optional[numpy.ndarray] = None) → dict

Encode keypoints into heatmaps.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)
bbox (np.ndarray) – Bounding box in shape (N, 8) which includes coordinates of 4 corners.

Returns

heatmaps (np.ndarray): The coupled heatmap in shape (1+K, H, W) where [W, H] is the heatmap_size.
instance_heatmaps (np.ndarray): The decoupled heatmap in shape (N*K, H, W) where N is the number of instances.
keypoint_weights (np.ndarray): The weight for heatmaps in shape (N*K).
instance_coords (np.ndarray): The coordinates of instance roots in shape (N, 2)

Return type

dict

class mmpose.codecs.EDPoseLabel(num_select: int = 100, num_keypoints: int = 17)

Generate keypoint and label coordinates for ED-Pose by Yang J. et al. (2023).

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]

Encoded:

keypoints (np.ndarray): Keypoint coordinates in shape (N, K, D)

keypoints_visible (np.ndarray): Keypoint visibility in shape (N, K, D)

area (np.ndarray): Area in shape (N)

bbox (np.ndarray): Bbox in shape (N, 4)

Parameters

num_select (int) – The number of candidate instances
num_keypoints (int) – The number of keypoints

decode(input_shapes: numpy.ndarray, pred_logits: numpy.ndarray, pred_boxes: numpy.ndarray, pred_keypoints: numpy.ndarray)

Select the final top-k keypoints, and decode the results from the normalized size to the original input size.

Parameters

input_shapes (Tensor) – The size of the resized input image.
test_cfg (ConfigType) – Config of testing.
pred_logits (Tensor) – The result of score.
pred_boxes (Tensor) – The result of bbox.
pred_keypoints (Tensor) – The result of keypoints.

Returns

Decoded boxes, keypoints, and keypoint scores.

Return type

tuple
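
The idea of selecting the top-k candidates and mapping normalized predictions back to the input size can be sketched as follows. This is not the codec's implementation: `select_topk` is a hypothetical helper, the sigmoid scoring is an assumption, keypoints are assumed to be normalized to [0, 1], and input_shape is assumed to be (width, height).

```python
import numpy as np


def select_topk(pred_logits, pred_keypoints, input_shape, num_select):
    """Hypothetical sketch.
    pred_logits: (Q, num_classes); pred_keypoints: (Q, K, 2), normalized to [0, 1];
    input_shape: (width, height), an assumed ordering."""
    scores = 1.0 / (1.0 + np.exp(-pred_logits))  # assumed sigmoid scoring
    flat = scores.reshape(-1)
    order = np.argsort(-flat)[:num_select]       # indices of the highest scores
    query_idx = order // pred_logits.shape[1]    # map flat index back to the query
    scale = np.asarray(input_shape, dtype=np.float32)
    return pred_keypoints[query_idx] * scale, flat[order]


pred_logits = np.array([[-2.0], [3.0], [0.0], [1.0]])  # Q=4 queries, 1 class
pred_keypoints = np.stack(
    [np.full((1, 2), 0.1 * (q + 1)) for q in range(4)])  # (4, 1, 2)
topk_keypoints, topk_scores = select_topk(
    pred_logits, pred_keypoints, input_shape=(100, 200), num_select=2)
```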

encode(img_shape, keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None, area: Optional[numpy.ndarray] = None, bboxes: Optional[numpy.ndarray] = None) → dict

Encode keypoints, area and bbox from input image space to normalized space.

Parameters

img_shape – The shape of the image in the format of (width, height).
keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D).
keypoints_visible (np.ndarray, optional) – Keypoint visibility in shape (N, K)
area (np.ndarray, optional) – Area in shape (N)
bboxes (np.ndarray, optional) – Bbox in shape (N, 4)

Returns

Contains the following items:

keypoint_labels (np.ndarray): The processed keypoints in shape like (N, K, D).

keypoints_visible (np.ndarray): Keypoint visibility in shape (N, K, D)

area_labels (np.ndarray): The processed target area in shape (N).

bboxes_labels: The processed target bbox in shape (N, 4).

Return type

encoded (dict)

class mmpose.codecs.Hand3DHeatmap(image_size: Tuple[int, int] = [256, 256], root_heatmap_size: int = 64, heatmap_size: Tuple[int, int, int] = [64, 64, 64], heatmap3d_depth_bound: float = 400.0, heatmap_size_root: int = 64, root_depth_bound: float = 400.0, depth_size: int = 64, use_different_joint_weights: bool = False, sigma: int = 2, joint_indices: Optional[list] = None, max_bound: float = 1.0)

Generate target 3D heatmap and relative root depth for hand datasets.

Note

instance number: N
keypoint number: K
keypoint dimension: D

Parameters

image_size (tuple) – Size of image. Default: [256, 256].
root_heatmap_size (int) – Size of heatmap of root head. Default: 64.
heatmap_size (tuple) – Size of heatmap. Default: [64, 64, 64].
heatmap3d_depth_bound (float) – Boundary for 3d heatmap depth. Default: 400.0.
heatmap_size_root (int) – Size of 3d heatmap root. Default: 64.
depth_size (int) – Number of depth discretization size, used for decoding. Defaults to 64.
root_depth_bound (float) – Boundary for 3d heatmap root depth. Default: 400.0.
use_different_joint_weights (bool) – Whether to use different joint weights. Default: False.
sigma (int) – Sigma of heatmap gaussian. Default: 2.
joint_indices (list, optional) – Indices of joints used for heatmap generation. If None (default) is given, all joints will be used. Default: None.
max_bound (float) – The maximal value of heatmap. Default: 1.0.

decode(heatmaps: numpy.ndarray, root_depth: numpy.ndarray, hand_type: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]

Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters

heatmaps (np.ndarray) – Heatmaps in shape (K, D, H, W)
root_depth (np.ndarray) – Root depth prediction.
hand_type (np.ndarray) – Hand type prediction.

Returns

keypoints (np.ndarray): Decoded keypoint coordinates in shape (N, K, D)
scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray], dataset_keypoint_weights: Optional[numpy.ndarray], rel_root_depth: numpy.float32, rel_root_valid: numpy.float32, hand_type: numpy.ndarray, hand_type_valid: numpy.ndarray, focal: numpy.ndarray, principal_pt: numpy.ndarray) → dict

Encode keypoints in the input image space into 3D heatmaps, relative root depth and hand type.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D).
keypoints_visible (np.ndarray, optional) – Keypoint visibilities in shape (N, K).
dataset_keypoint_weights (np.ndarray, optional) – Keypoints weight in shape (K, ).
rel_root_depth (np.float32) – Relative root depth.
rel_root_valid (float) – Validity of relative root depth.
hand_type (np.ndarray) – Type of hand encoded as an array.
hand_type_valid (np.ndarray) – Validity of hand type.
focal (np.ndarray) – Focal length of camera.
principal_pt (np.ndarray) – Principal point of camera.

Returns

Contains the following items:

heatmaps (np.ndarray): The generated heatmap in shape (K * D, H, W) where [W, H, D] is the heatmap_size

keypoint_weights (np.ndarray): The target weights in shape (N, K)

root_depth (np.ndarray): Encoded relative root depth

root_depth_weight (np.ndarray): The weights of relative root depth

type (np.ndarray): Encoded hand type

type_weight (np.ndarray): The weights of hand type

Return type

encoded (dict)

class mmpose.codecs.ImagePoseLifting(num_keypoints: int, root_index: Union[int, List] = 0, remove_root: bool = False, save_index: bool = False, reshape_keypoints: bool = True, concat_vis: bool = False, keypoints_mean: Optional[numpy.ndarray] = None, keypoints_std: Optional[numpy.ndarray] = None, target_mean: Optional[numpy.ndarray] = None, target_std: Optional[numpy.ndarray] = None, additional_encode_keys: Optional[List[str]] = None)

Generate keypoint coordinates for pose lifter.

Note

instance number: N
keypoint number: K
keypoint dimension: D
pose-lifting target dimension: C

Parameters

num_keypoints (int) – The number of keypoints in the dataset.
root_index (Union[int, List]) – Root keypoint index in the pose.
remove_root (bool) – If true, remove the root keypoint from the pose. Default: False.
save_index (bool) – If true, store the root position separated from the original pose. Default: False.
reshape_keypoints (bool) – If true, reshape the keypoints into shape (-1, N). Default: True.
concat_vis (bool) – If true, concat the visibility item of keypoints. Default: False.
keypoints_mean (np.ndarray, optional) – Mean values of keypoints coordinates in shape (K, D).
keypoints_std (np.ndarray, optional) – Std values of keypoints coordinates in shape (K, D).
target_mean (np.ndarray, optional) – Mean values of pose-lifting target coordinates in shape (K, C).
target_std (np.ndarray, optional) – Std values of pose-lifting target coordinates in shape (K, C).

decode(encoded: numpy.ndarray, target_root: Optional[numpy.ndarray] = None) → Tuple[numpy.ndarray, numpy.ndarray]

Decode keypoint coordinates from normalized space to input image space.

Parameters

encoded (np.ndarray) – Coordinates in shape (N, K, C).
target_root (np.ndarray, optional) – The target root coordinate. Default: None.

Returns

keypoints (np.ndarray): Decoded coordinates in shape (N, K, C).
scores (np.ndarray): The keypoint scores in shape (N, K).

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None, lifting_target: Optional[numpy.ndarray] = None, lifting_target_visible: Optional[numpy.ndarray] = None) → dict

Encode keypoints from input image space to normalized space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D).
keypoints_visible (np.ndarray, optional) – Keypoint visibilities in shape (N, K).
lifting_target (np.ndarray, optional) – 3D target coordinate in shape (T, K, C).
lifting_target_visible (np.ndarray, optional) – Target coordinate in shape (T, K, ).

Returns

Contains the following items:

keypoint_labels (np.ndarray): The processed keypoints in shape like (N, K, D) or (K * D, N).

keypoint_labels_visible (np.ndarray): The processed keypoints’ weights in shape (N, K, ) or (N-1, K, ).

lifting_target_label: The processed target coordinate in shape (K, C) or (K-1, C).

lifting_target_weight (np.ndarray): The target weights in shape (K, ) or (K-1, ).

trajectory_weights (np.ndarray): The trajectory weights in shape (K, ).

target_root (np.ndarray): The root coordinate of target in shape (C, ).

In addition, there are some optional items it may contain:

target_root (np.ndarray): The root coordinate of target in shape (C, ). Exists if zero_center is True.

target_root_removed (bool): Indicate whether the root of pose-lifting target is removed. Exists if remove_root is True.

target_root_index (int): An integer indicating the index of root. Exists if remove_root and save_index are True.

Return type

encoded (dict)
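
As a rough sketch of what encoding "to normalized space" and decoding back can look like for a lifting target, the code below centers a (K, C) target on its root keypoint, normalizes it with a mean and std, and then inverts both steps. The helper names are hypothetical and the exact order of operations is an assumption made for illustration.

```python
import numpy as np


def encode_target(target, root_index, mean, std):
    """Hypothetical sketch: root-center a (K, C) target, then normalize it."""
    root = target[root_index].copy()       # root coordinate in shape (C,)
    centered = target - root               # the root keypoint becomes the origin
    return (centered - mean) / std, root


def decode_target(encoded, root, mean, std):
    """Inverse of encode_target: denormalize, then add the root back."""
    return encoded * std + mean + root


rng = np.random.default_rng(0)
K, C = 17, 3
target = rng.normal(size=(K, C))
mean = np.full((K, C), 0.1)
std = np.full((K, C), 2.0)

encoded, root = encode_target(target, root_index=0, mean=mean, std=std)
decoded = decode_target(encoded, root, mean, std)
```

The round trip recovers the original target, which is the property a decode step relies on when target_root is supplied.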

class mmpose.codecs.IntegralRegressionLabel(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: float, unbiased: bool = False, blur_kernel_size: int = 11, normalize: bool = True)

Generate keypoint coordinates and normalized heatmaps. See the paper: DSNT by Nibali et al. (2018).

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]

Encoded:

keypoint_labels (np.ndarray): The normalized regression labels in shape (N, K, D) where D is 2 for 2D coordinates

heatmaps (np.ndarray): The generated heatmap in shape (K, H, W) where [W, H] is the heatmap_size

keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters

input_size (tuple) – Input image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
sigma (float) – The sigma value of the Gaussian heatmap
unbiased (bool) – Whether to use the unbiased method (DarkPose) in 'msra' encoding. See Dark Pose for details. Defaults to False
blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. The kernel size and sigma should follow the empirical formula \(sigma = 0.3*((ks-1)*0.5-1)+0.8\). Defaults to 11
normalize (bool) – Whether to normalize the heatmaps. Defaults to True.
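
The empirical formula relating blur_kernel_size (ks) and sigma can be checked with a few lines of Python. The function name `sigma_from_kernel_size` is hypothetical; the arithmetic is exactly the formula quoted above.

```python
def sigma_from_kernel_size(ks):
    """Evaluate sigma = 0.3 * ((ks - 1) * 0.5 - 1) + 0.8 for an odd kernel size."""
    return 0.3 * ((ks - 1) * 0.5 - 1) + 0.8


# Sigma values for a few odd kernel sizes, including the default of 11.
sigmas = {ks: sigma_from_kernel_size(ks) for ks in (3, 5, 7, 11)}
```

For the default kernel size of 11 the formula gives (11 - 1) * 0.5 - 1 = 4, then 0.3 * 4 + 0.8 = 2.0, so a sigma of 2 pairs with the default kernel.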

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray]

Decode keypoint coordinates from normalized space to input image space.

Parameters

encoded (np.ndarray) – Coordinates in shape (N, K, D)

Returns

keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)
scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[源代码]¶

Encoding keypoints to regression labels and heatmaps.

参数

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

返回

keypoint_labels (np.ndarray): The normalized regression labels in
shape (N, K, D) where D is 2 for 2d coordinates
heatmaps (np.ndarray): The generated heatmap in shape
(K, H, W) where [W, H] is the heatmap_size
keypoint_weights (np.ndarray): The target weights in shape
(N, K)

Return type

dict

class mmpose.codecs.MSRAHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: float, unbiased: bool = False, blur_kernel_size: int = 11)[源代码]¶

Represent keypoints as heatmaps via “MSRA” approach. See the paper: Simple Baselines for Human Pose Estimation and Tracking by Xiao et al (2018) for details.

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]

Encoded:

heatmaps (np.ndarray): The generated heatmap in shape (K, H, W)
where [W, H] is the heatmap_size

keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters

input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
sigma (float) – The sigma value of the Gaussian heatmap
unbiased (bool) – Whether to use the unbiased method (DarkPose) in 'msra' encoding. See Dark Pose for details. Defaults to False
blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. The kernel size and sigma should follow the empirical formula \(sigma = 0.3*((ks-1)*0.5-1)+0.8\). Defaults to 11

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][源代码]¶

Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters

encoded (np.ndarray) – Heatmaps in shape (K, H, W)

Returns

keypoints (np.ndarray): Decoded keypoint coordinates in shape
(N, K, D)
scores (np.ndarray): The keypoint scores in shape (N, K). It
usually represents the confidence of the keypoint prediction

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[源代码]¶

Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns

heatmaps (np.ndarray): The generated heatmap in shape
(K, H, W) where [W, H] is the heatmap_size
keypoint_weights (np.ndarray): The target weights in shape
(N, K)

Return type

dict
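
The MSRAHeatmap encode/decode round trip described above can be illustrated with a minimal NumPy sketch. It is not the library implementation: it handles a single keypoint, renders the Gaussian over the whole heatmap, and decodes by a plain argmax without the DarkPose refinement. The function names are hypothetical.

```python
import numpy as np


def encode_keypoint(keypoint_xy, input_size, heatmap_size, sigma=2.0):
    """Render one keypoint (image space) as a Gaussian on an (H, W) heatmap."""
    W, H = heatmap_size
    # Per-axis ratio between image space and heatmap space.
    scale = np.array(input_size, dtype=np.float32) / np.array([W, H], dtype=np.float32)
    cx, cy = np.asarray(keypoint_xy, dtype=np.float32) / scale
    xs = np.arange(W, dtype=np.float32)[None, :]
    ys = np.arange(H, dtype=np.float32)[:, None]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


def decode_heatmap(heatmap, input_size):
    """Take the peak location and map it back to image space."""
    H, W = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    scale = np.array(input_size, dtype=np.float32) / np.array([W, H], dtype=np.float32)
    return np.array([x, y], dtype=np.float32) * scale, float(heatmap[y, x])


heatmap = encode_keypoint((96, 128), input_size=(192, 256), heatmap_size=(48, 64))
keypoint, score = decode_heatmap(heatmap, input_size=(192, 256))
print(heatmap.shape, keypoint, score)
```

The heatmap is stored as (H, W) while sizes are given as [W, H], which is the same convention the shapes above use.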

class mmpose.codecs.MegviiHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], kernel_size: int)[源代码]¶

Represent keypoints as heatmaps via “Megvii” approach. See MSPN (2019) and CPN (2018) for details.

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]

Encoded:

heatmaps (np.ndarray): The generated heatmap in shape (K, H, W)
where [W, H] is the heatmap_size

keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters

input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
kernel_size (int) – The kernel size of the heatmap Gaussian, used for both the x and y axes

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][源代码]¶

Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters

encoded (np.ndarray) – Heatmaps in shape (K, H, W)

Returns

keypoints (np.ndarray): Decoded keypoint coordinates in shape
(K, D)
scores (np.ndarray): The keypoint scores in shape (K,). It
usually represents the confidence of the keypoint prediction

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[源代码]¶

Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns

heatmaps (np.ndarray): The generated heatmap in shape
(K, H, W) where [W, H] is the heatmap_size
keypoint_weights (np.ndarray): The target weights in shape
(N, K)

Return type

dict

class mmpose.codecs.MotionBERTLabel(num_keypoints: int, root_index: int = 0, remove_root: bool = False, save_index: bool = False, concat_vis: bool = False, rootrel: bool = False, mode: str = 'test')[源代码]¶

Generate keypoint and label coordinates for MotionBERT by Zhu et al. (2022).

Note

instance number: N
keypoint number: K
keypoint dimension: D
pose-lifting target dimension: C

Parameters

num_keypoints (int) – The number of keypoints in the dataset.
root_index (int) – Root keypoint index in the pose. Default: 0.
remove_root (bool) – If true, remove the root keypoint from the pose. Default: False.
save_index (bool) – If true, store the root position separated from the original pose, only takes effect if remove_root is True. Default: False.
concat_vis (bool) – If true, concat the visibility item of keypoints. Default: False.
rootrel (bool) – If true, the root keypoint will be set to the coordinate origin. Default: False.
mode (str) – Indicating whether the current mode is ‘train’ or ‘test’. Default: 'test'.

decode(encoded: numpy.ndarray, w: Optional[numpy.ndarray] = None, h: Optional[numpy.ndarray] = None, factor: Optional[numpy.ndarray] = None) → Tuple[numpy.ndarray, numpy.ndarray][源代码]¶

Decode keypoint coordinates from normalized space to input image space.

Parameters

encoded (np.ndarray) – Coordinates in shape (N, K, C).
w (np.ndarray, optional) – The image widths in shape (N, ). Default: None.
h (np.ndarray, optional) – The image heights in shape (N, ). Default: None.
factor (np.ndarray, optional) – The factor for projection in shape (N, ). Default: None.

Returns

keypoints (np.ndarray): Decoded coordinates in shape (N, K, C).
scores (np.ndarray): The keypoint scores in shape (N, K).

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None, lifting_target: Optional[numpy.ndarray] = None, lifting_target_visible: Optional[numpy.ndarray] = None, camera_param: Optional[dict] = None, factor: Optional[numpy.ndarray] = None) → dict[源代码]¶

Encode keypoints from input image space to normalized space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (B, T, K, D).
keypoints_visible (np.ndarray, optional) – Keypoint visibilities in shape (B, T, K).
lifting_target (np.ndarray, optional) – 3d target coordinate in shape (T, K, C).
lifting_target_visible (np.ndarray, optional) – Visibilities of the 3d target coordinates in shape (T, K, ).
camera_param (dict, optional) – The camera parameter dictionary.
factor (np.ndarray, optional) – The factor mapping camera and image coordinate in shape (T, ).

Returns

encoded (dict): Contains the following items:

keypoint_labels (np.ndarray): The processed keypoints in shape like (N, K, D).

keypoint_labels_visible (np.ndarray): The processed keypoints’ weights in shape (N, K, ) or (N, K-1, ).

lifting_target_label: The processed target coordinate in shape (K, C) or (K-1, C).

lifting_target_weight (np.ndarray): The target weights in shape (K, ) or (K-1, ).

factor (np.ndarray): The factor mapping camera and image coordinate in shape (T, 1).

Return type

dict

class mmpose.codecs.RegressionLabel(input_size: Tuple[int, int])[源代码]¶

Generate keypoint coordinates.

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]

Encoded:

keypoint_labels (np.ndarray): The normalized regression labels in
shape (N, K, D) where D is 2 for 2d coordinates

keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters: input_size (tuple) – Input image size in [w, h]

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][源代码]¶

Decode keypoint coordinates from normalized space to input image space.

Parameters

encoded (np.ndarray) – Coordinates in shape (N, K, D)

Returns

keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)
scores (np.ndarray): The keypoint scores in shape (N, K).
It usually represents the confidence of the keypoint prediction

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[源代码]¶

Encode keypoints from input image space to normalized space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns

keypoint_labels (np.ndarray): The normalized regression labels in
shape (N, K, D) where D is 2 for 2d coordinates
keypoint_weights (np.ndarray): The target weights in shape
(N, K)

Return type

dict
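
The mapping between input image space and normalized space used by RegressionLabel can be sketched as a division by the image size. The code below is only an illustration: the standalone encode and decode functions are hypothetical, and the rule used for keypoint_weights (visible and inside the image) is an assumption rather than a statement of the library's behavior.

```python
import numpy as np


def encode(keypoints, keypoints_visible, input_size):
    """Normalize (N, K, 2) keypoints by the image size and build (N, K) weights."""
    size = np.array(input_size, dtype=np.float32)  # [w, h]
    keypoint_labels = keypoints.astype(np.float32) / size
    # Assumption: a keypoint only gets weight if it is visible and inside the image.
    inside = ((keypoints >= 0) & (keypoints <= size - 1)).all(axis=-1)
    keypoint_weights = np.where(inside & (keypoints_visible > 0.5), 1.0, 0.0)
    return keypoint_labels, keypoint_weights.astype(np.float32)


def decode(keypoint_labels, input_size):
    """Map normalized labels back to input image space."""
    return keypoint_labels * np.array(input_size, dtype=np.float32)


kpts = np.array([[[96.0, 64.0], [300.0, 10.0]]])  # second point lies outside w=192
vis = np.ones((1, 2))
labels, weights = encode(kpts, vis, input_size=(192, 256))
print(labels, weights, decode(labels, (192, 256)))
```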

class mmpose.codecs.SPR(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: Optional[Union[float, Tuple[float]]] = None, generate_keypoint_heatmaps: bool = False, root_type: str = 'kpt_center', minimal_diagonal_length: Union[int, float] = 5, background_weight: float = 0.1, decode_nms_kernel: int = 5, decode_max_instances: int = 30, decode_thr: float = 0.01)[源代码]¶

Encode/decode keypoints with Structured Pose Representation (SPR).

See the paper Single-stage multi-person pose machines by Nie et al. (2019) for details.

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]

Encoded:

heatmaps (np.ndarray): The generated heatmap in shape (1, H, W)
where [W, H] is the heatmap_size. If the keypoint heatmap is generated together, the output heatmap shape is (K+1, H, W)

heatmap_weights (np.ndarray): The target weights for heatmaps, which
have the same shape as heatmaps.

displacements (np.ndarray): The dense keypoint displacement in
shape (K*2, H, W).

displacement_weights (np.ndarray): The target weights for displacements,
which have the same shape as displacements.

Parameters

input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
sigma (float or tuple, optional) – The sigma values of the Gaussian heatmaps. If sigma is a tuple, it includes both sigmas for root and keypoint heatmaps. None means the sigmas are computed automatically from the heatmap size. Defaults to None
generate_keypoint_heatmaps (bool) – Whether to generate Gaussian heatmaps for each keypoint. Defaults to False
root_type (str) –
The method to generate the instance root. Options are:
- 'kpt_center': Average coordinate of all visible keypoints.
- 'bbox_center': Center point of bounding boxes outlined by
  all visible keypoints.
Defaults to 'kpt_center'
minimal_diagonal_length (int or float) – The threshold of diagonal length of instance bounding box. Small instances will not be used in training. Defaults to 5
background_weight (float) – Loss weight of background pixels. Defaults to 0.1
decode_thr (float) – The threshold of keypoint response value in heatmaps. Defaults to 0.01
decode_nms_kernel (int) – The kernel size of the NMS during decoding, which should be an odd integer. Defaults to 5
decode_max_instances (int) – The maximum number of instances to decode. Defaults to 30
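
The two root_type options listed above differ only in how the visible keypoints are summarized. The sketch below shows one plausible reading of them for a single instance; the function name and array layout are hypothetical, not the SPR implementation.

```python
import numpy as np


def instance_root(keypoints, visible, root_type="kpt_center"):
    """Compute the root of one instance from (K, 2) keypoints and a (K,) bool mask."""
    pts = keypoints[visible]
    if len(pts) == 0:
        return None  # no visible keypoint, so no root can be defined
    if root_type == "kpt_center":
        # Average coordinate of all visible keypoints.
        return pts.mean(axis=0)
    if root_type == "bbox_center":
        # Center of the box that tightly encloses the visible keypoints.
        return (pts.min(axis=0) + pts.max(axis=0)) / 2
    raise ValueError(f"unknown root_type: {root_type}")


kpts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 8.0], [100.0, 100.0]])
vis = np.array([True, True, True, False])
print(instance_root(kpts, vis, "kpt_center"))
print(instance_root(kpts, vis, "bbox_center"))
```

The two roots coincide only when the visible keypoints are spread symmetrically; with clustered keypoints 'kpt_center' is pulled toward the cluster.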

decode(heatmaps: torch.Tensor, displacements: torch.Tensor) → Tuple[numpy.ndarray, numpy.ndarray][源代码]¶

Decode the keypoint coordinates from heatmaps and displacements. The decoded keypoint coordinates are in the input image space.

Parameters

heatmaps (Tensor) – Encoded root and keypoints (optional) heatmaps in shape (1, H, W) or (K+1, H, W)
displacements (Tensor) – Encoded keypoints displacement fields in shape (K*D, H, W)

Returns

keypoints (Tensor): Decoded keypoint coordinates in shape
(N, K, D)
scores (tuple):
- root_scores (Tensor): The root scores in shape (N, )
- keypoint_scores (Tensor): The keypoint scores in
  shape (N, K). If keypoint heatmaps are not generated, keypoint_scores will be None

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[源代码]¶

Encode keypoints into root heatmaps and keypoint displacement fields. Note that the original keypoint coordinates should be in the input image space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns

heatmaps (np.ndarray): The generated heatmap in shape
(1, H, W) where [W, H] is the heatmap_size. If keypoint heatmaps are generated together, the shape is (K+1, H, W)
heatmap_weights (np.ndarray): The pixel-wise weights for heatmaps,
which have the same shape as heatmaps
displacements (np.ndarray): The generated displacement fields in
shape (K*D, H, W). The vector at each pixel represents the displacements of the keypoints belonging to the associated instance from this pixel.
displacement_weights (np.ndarray): The pixel-wise weights for
displacements, which have the same shape as displacements

Return type

dict

get_keypoint_scores(heatmaps: torch.Tensor, keypoints: torch.Tensor)[源代码]¶

Calculate the keypoint scores with keypoints heatmaps and coordinates.

Parameters

heatmaps (Tensor) – Keypoint heatmaps in shape (K, H, W)
keypoints (Tensor) – Keypoint coordinates in shape (N, K, D)

Returns

Keypoint scores in shape (N, K)

Return type

Tensor

class mmpose.codecs.SimCCLabel(input_size: Tuple[int, int], smoothing_type: str = 'gaussian', sigma: Union[float, int, Tuple[float]] = 6.0, simcc_split_ratio: float = 2.0, label_smooth_weight: float = 0.0, normalize: bool = True, use_dark: bool = False, decode_visibility: bool = False, decode_beta: float = 150.0)[源代码]¶

Generate keypoint representations via the “SimCC” approach. See the paper “SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation” by Li et al. (2022) for more details. The approach was formerly named SimDR.

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]

Encoded:

keypoint_x_labels (np.ndarray): The generated SimCC label for x-axis.
The label shape is (N, K, Wx) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wx=w*simcc_split_ratio\)

keypoint_y_labels (np.ndarray): The generated SimCC label for y-axis.
The label shape is (N, K, Wy) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wy=h*simcc_split_ratio\)

keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters

input_size (tuple) – Input image size in [w, h]
smoothing_type (str) – The SimCC label smoothing strategy. Options are 'gaussian' and 'standard'. Defaults to 'gaussian'
sigma (float | int | tuple) – The sigma value in the Gaussian SimCC label. Defaults to 6.0
simcc_split_ratio (float) – The ratio of the label size to the input size. For example, if the input width is w, the x label size will be \(w*simcc_split_ratio\). Defaults to 2.0
label_smooth_weight (float) – Label Smoothing weight. Defaults to 0.0
normalize (bool) – Whether to normalize the heatmaps. Defaults to True.
use_dark (bool) – Whether to use the DARK post processing. Defaults to False.
decode_visibility (bool) – Whether to decode the visibility. Defaults to False.
decode_beta (float) – The beta value for decoding visibility. Defaults to 150.0.

Reference: SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation: https://arxiv.org/abs/2107.03332

decode(simcc_x: numpy.ndarray, simcc_y: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][源代码]¶

Decode keypoint coordinates from SimCC representations. The decoded coordinates are in the input image space.

Parameters

simcc_x (np.ndarray) – SimCC label for x-axis
simcc_y (np.ndarray) – SimCC label for y-axis

Returns

keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)
scores (np.ndarray): The keypoint scores in shape (N, K).
It usually represents the confidence of the keypoint prediction

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[源代码]¶

Encode keypoints into SimCC labels. Note that the original keypoint coordinates should be in the input image space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns

keypoint_x_labels (np.ndarray): The generated SimCC label for
x-axis. The label shape is (N, K, Wx) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wx=w*simcc_split_ratio\)
keypoint_y_labels (np.ndarray): The generated SimCC label for
y-axis. The label shape is (N, K, Wy) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wy=h*simcc_split_ratio\)
keypoint_weights (np.ndarray): The target weights in shape
(N, K)

Return type

dict
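
To make the SimCC label shapes above concrete, the following sketch builds a 1-D Gaussian label for a single x coordinate and decodes it again. It is a simplified illustration under stated assumptions: the function names are hypothetical, one coordinate is handled at a time, and decoding is a plain argmax without DARK post-processing or visibility decoding.

```python
import numpy as np


def simcc_gaussian_label(coord, length, split_ratio=2.0, sigma=6.0, normalize=True):
    """Build a 1-D Gaussian label over `length * split_ratio` bins for one coordinate."""
    num_bins = int(np.around(length * split_ratio))
    bins = np.arange(num_bins, dtype=np.float32)
    mu = coord * split_ratio  # position of the coordinate in bin units
    label = np.exp(-((bins - mu) ** 2) / (2 * sigma ** 2))
    if normalize:
        # Scale by the Gaussian normalization constant.
        label = label / (sigma * np.sqrt(2 * np.pi))
    return label


def decode_simcc(label, split_ratio=2.0):
    """Recover the coordinate as the peak bin divided by the split ratio."""
    return float(np.argmax(label)) / split_ratio


label = simcc_gaussian_label(50.0, length=192)
print(label.shape, decode_simcc(label))
```

With w = 192 and simcc_split_ratio = 2.0 the x label has Wx = 384 bins, so each bin covers half a pixel.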

class mmpose.codecs.UDPHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], heatmap_type: str = 'gaussian', sigma: float = 2.0, radius_factor: float = 0.0546875, blur_kernel_size: int = 11)[源代码]¶

Generate keypoint heatmaps by Unbiased Data Processing (UDP). See the paper “The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation” by Huang et al. (2020) for details.

Note

instance number: N
keypoint number: K
keypoint dimension: D
image size: [w, h]
heatmap size: [W, H]

Encoded:

heatmap (np.ndarray): The generated heatmap in shape (C_out, H, W)
where [W, H] is the heatmap_size, and C_out is the output channel number, which depends on the heatmap_type. If heatmap_type=='gaussian', C_out equals the keypoint number K; if heatmap_type=='combined', C_out equals K*3 (x_offset, y_offset and class label)

keypoint_weights (np.ndarray): The target weights in shape (K,)

Parameters

input_size (tuple) – Image size in [w, h]
heatmap_size (tuple) – Heatmap size in [W, H]
heatmap_type (str) –
The heatmap type to encode the keypoints. Options are:
- 'gaussian': Gaussian heatmap
- 'combined': Combination of a binary label map and offset
  maps for X and Y axes.
Defaults to 'gaussian'
sigma (float) – The sigma value of the Gaussian heatmap when heatmap_type=='gaussian'. Defaults to 2.0
radius_factor (float) – The radius factor of the binary label map when heatmap_type=='combined'. The positive region is defined as the neighborhood of the keypoint with the radius \(r=radius_factor*max(W, H)\). Defaults to 0.0546875
blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. Defaults to 11

Reference: The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation: https://arxiv.org/abs/1911.07524
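
As a worked example of the radius formula for heatmap_type=='combined', the sketch below evaluates \(r=radius_factor*max(W, H)\) and tests whether a heatmap pixel falls in the positive region. Both helper names are hypothetical, and the use of a Euclidean distance with an inclusive boundary is an assumption.

```python
def positive_region_radius(heatmap_size, radius_factor=0.0546875):
    """Radius of the positive region, in heatmap pixels."""
    W, H = heatmap_size
    return radius_factor * max(W, H)


def in_positive_region(pixel_xy, keypoint_xy, radius):
    """Assumed rule: a pixel is positive if it lies within `radius` of the keypoint."""
    dx = pixel_xy[0] - keypoint_xy[0]
    dy = pixel_xy[1] - keypoint_xy[1]
    return dx * dx + dy * dy <= radius * radius


r = positive_region_radius((48, 64))
print(r)  # 3.5
print(in_positive_region((27, 32), (24, 32), r))  # True: 3 pixels away
print(in_positive_region((28, 32), (24, 32), r))  # False: 4 pixels away
```

For a 48x64 heatmap the default radius_factor gives a radius of 3.5 heatmap pixels.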

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][源代码]¶

Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters

encoded (np.ndarray) – Heatmaps in shape (K, H, W)

Returns

keypoints (np.ndarray): Decoded keypoint coordinates in shape
(N, K, D)
scores (np.ndarray): The keypoint scores in shape (N, K). It
usually represents the confidence of the keypoint prediction

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[源代码]¶

Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)
keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns

heatmap (np.ndarray): The generated heatmap in shape
(C_out, H, W) where [W, H] is the heatmap_size, and C_out is the output channel number, which depends on the heatmap_type. If heatmap_type=='gaussian', C_out equals the keypoint number K; if heatmap_type=='combined', C_out equals K*3 (x_offset, y_offset and class label)
keypoint_weights (np.ndarray): The target weights in shape
(K,)

Return type

dict

class mmpose.codecs.VideoPoseLifting(num_keypoints: int, zero_center: bool = True, root_index: Union[int, List] = 0, remove_root: bool = False, save_index: bool = False, reshape_keypoints: bool = True, concat_vis: bool = False, normalize_camera: bool = False)[源代码]¶

Generate keypoint coordinates for pose lifter.

Note

instance number: N
keypoint number: K
keypoint dimension: D
pose-lifting target dimension: C

Parameters

num_keypoints (int) – The number of keypoints in the dataset.
zero_center (bool) – Whether to zero-center the target around root. Default: True.
root_index (Union[int, List]) – Root keypoint index in the pose. Default: 0.
remove_root (bool) – If true, remove the root keypoint from the pose. Default: False.
save_index (bool) – If true, store the root position separated from the original pose, only takes effect if remove_root is True. Default: False.
reshape_keypoints (bool) – If true, reshape the keypoints into shape (-1, N). Default: True.
concat_vis (bool) – If true, concat the visibility item of keypoints. Default: False.
normalize_camera (bool) – Whether to normalize camera intrinsics. Default: False.

decode(encoded: numpy.ndarray, target_root: Optional[numpy.ndarray] = None) → Tuple[numpy.ndarray, numpy.ndarray][源代码]¶

Decode keypoint coordinates from normalized space to input image space.

Parameters

encoded (np.ndarray) – Coordinates in shape (N, K, C).
target_root (np.ndarray, optional) – The pose-lifting target root coordinate. Default: None.

Returns

keypoints (np.ndarray): Decoded coordinates in shape (N, K, C).
scores (np.ndarray): The keypoint scores in shape (N, K).

Return type

tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None, lifting_target: Optional[numpy.ndarray] = None, lifting_target_visible: Optional[numpy.ndarray] = None, camera_param: Optional[dict] = None) → dict[源代码]¶

Encode keypoints from input image space to normalized space.

Parameters

keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D).
keypoints_visible (np.ndarray, optional) – Keypoint visibilities in shape (N, K).
lifting_target (np.ndarray, optional) – 3d target coordinate in shape (T, K, C).
lifting_target_visible (np.ndarray, optional) – Visibilities of the 3d target coordinates in shape (T, K, ).
camera_param (dict, optional) – The camera parameter dictionary.

Returns

encoded (dict): Contains the following items:

keypoint_labels (np.ndarray): The processed keypoints in shape like (N, K, D) or (K * D, N).

keypoint_labels_visible (np.ndarray): The processed keypoints’ weights in shape (N, K, ) or (N-1, K, ).

lifting_target_label: The processed target coordinate in shape (K, C) or (K-1, C).

lifting_target_weight (np.ndarray): The target weights in shape (K, ) or (K-1, ).

trajectory_weights (np.ndarray): The trajectory weights in shape (K, ).

In addition, there are some optional items it may contain:

target_root (np.ndarray): The root coordinate of target in shape (C, ). Exists if zero_center is True.

target_root_removed (bool): Indicates whether the root of the pose-lifting target is removed. Exists if remove_root is True.

target_root_index (int): An integer indicating the index of root. Exists if remove_root and save_index are True.

camera_param (dict): The updated camera parameter dictionary. Exists if normalize_camera is True.

Return type

dict
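
The interplay of zero_center, remove_root and target_root described above can be sketched for a single-frame target of shape (K, C). This is an illustrative reconstruction, not the VideoPoseLifting code; both function names are hypothetical.

```python
import numpy as np


def zero_center_target(lifting_target, root_index=0, remove_root=False):
    """Subtract the root joint from a (K, C) target; optionally drop the root row."""
    target_root = lifting_target[root_index].copy()
    label = lifting_target - target_root
    if remove_root:
        label = np.delete(label, root_index, axis=0)  # shape becomes (K-1, C)
    return label, target_root


def restore_target(label, target_root, root_index=0, root_removed=False):
    """Invert zero_center_target so the pose is back in its original frame."""
    if root_removed:
        # Re-insert the root as the origin before shifting everything back.
        label = np.insert(label, root_index, 0.0, axis=0)
    return label + target_root


target = np.array([[1.0, 2.0, 3.0], [2.0, 2.0, 3.0], [1.0, 4.0, 3.0]])
label, root = zero_center_target(target, root_index=0, remove_root=True)
print(label.shape, root)
print(restore_target(label, root, root_index=0, root_removed=True))
```

This also shows why lifting_target_label has shape (K, C) or (K-1, C): the second form appears once the root row is removed.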

class mmpose.codecs.YOLOXPoseAnnotationProcessor(expand_bbox: bool = False, input_size: Optional[Tuple] = None)[源代码]¶

Convert dataset annotations to the input format of YOLOX-Pose.

This processor expands bounding boxes and converts category IDs to labels.

Parameters

expand_bbox (bool, optional) – Whether to expand the bounding box to include all keypoints. Defaults to False.
input_size (tuple, optional) – The size of the input image for the model, formatted as (h, w). This argument is required by the codec in deployment but is not actually used.

encode(keypoints: Optional[numpy.ndarray] = None, keypoints_visible: Optional[numpy.ndarray] = None, bbox: Optional[numpy.ndarray] = None, category_id: Optional[List[int]] = None) → Dict[str, numpy.ndarray][源代码]¶

Encode keypoints, bounding boxes, and category IDs.

Parameters

keypoints (np.ndarray, optional) – Keypoints array. Defaults to None.
keypoints_visible (np.ndarray, optional) – Visibility array for keypoints. Defaults to None.
bbox (np.ndarray, optional) – Bounding box array. Defaults to None.
category_id (List[int], optional) – List of category IDs. Defaults to None.

Returns

Encoded annotations.

Return type

Dict[str, np.ndarray]
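
The effect of expand_bbox can be pictured as growing a box until it covers every visible keypoint. The sketch below assumes boxes in (x1, y1, x2, y2) format and a single instance; the function name and these conventions are assumptions, not taken from the processor itself.

```python
import numpy as np


def expand_bbox_to_keypoints(bbox, keypoints, keypoints_visible):
    """Grow an xyxy box of shape (4,) so it encloses all visible (K, 2) keypoints."""
    pts = keypoints[keypoints_visible > 0]
    expanded = bbox.astype(np.float32).copy()
    if len(pts) == 0:
        return expanded  # nothing visible, keep the box as it is
    expanded[:2] = np.minimum(expanded[:2], pts.min(axis=0))
    expanded[2:] = np.maximum(expanded[2:], pts.max(axis=0))
    return expanded


bbox = np.array([10.0, 10.0, 50.0, 50.0])
kpts = np.array([[5.0, 20.0], [30.0, 60.0], [200.0, 200.0]])
vis = np.array([1, 1, 0])  # the last keypoint is not visible and is ignored
print(expand_bbox_to_keypoints(bbox, kpts, vis))
```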

mmpose.models¶

backbones¶

class mmpose.models.backbones.AlexNet(num_classes=- 1, init_cfg=None)[源代码]¶

AlexNet backbone.

The input for AlexNet is a 224x224 RGB image.

Parameters

num_classes (int) – number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[源代码]¶

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmpose.models.backbones.CPM(in_channels, out_channels, feat_channels=128, middle_channels=32, num_stages=6, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]¶

CPM backbone.

Convolutional Pose Machines. More details can be found in the paper of the same name.

Parameters

in_channels (int) – The input channels of the CPM.
out_channels (int) – The output channels of the CPM.
feat_channels (int) – Feature channel of each CPM stage.
middle_channels (int) – Feature channel of conv after the middle stage.
num_stages (int) – Number of stages.
norm_cfg (dict) – Dictionary to construct and config norm layer.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]

Example

>>> from mmpose.models import CPM
>>> import torch
>>> self = CPM(3, 17)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 368, 368)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)

forward(x)[源代码]¶: Model forward function.

class mmpose.models.backbones.CSPDarknet(arch='P5', deepen_factor=1.0, widen_factor=1.0, out_indices=(2, 3, 4), frozen_stages=- 1, use_depthwise=False, arch_ovewrite=None, spp_kernal_sizes=(5, 9, 13), conv_cfg=None, norm_cfg={'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg={'type': 'Swish'}, norm_eval=False, init_cfg={'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[源代码]¶

CSP-Darknet backbone used in YOLOv5 and YOLOX.

Parameters

arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Default: P5.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Default: 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
use_depthwise (bool) – Whether to use depthwise separable convolution. Default: False.
arch_ovewrite (list) – Overwrite default arch settings. Default: None.
spp_kernal_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Default: (5, 9, 13).
conv_cfg (dict) – Config dict for convolution layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='Swish').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: dict(type='Kaiming', layer='Conv2d', a=2.23606797749979, distribution='uniform', mode='fan_in', nonlinearity='leaky_relu').
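
The deepen_factor and widen_factor parameters above act as multipliers on per-stage block counts and channel counts. The sketch below shows one common way such multipliers are applied; the rounding rule, the helper name and the base stage table are all assumptions made for illustration and are not read from the CSPDarknet source.

```python
def scale_stage(channels, num_blocks, widen_factor=1.0, deepen_factor=1.0):
    """Scale one stage's width and depth (assumed rounding: truncate width, round depth)."""
    width = int(channels * widen_factor)
    depth = max(round(num_blocks * deepen_factor), 1)  # keep at least one block
    return width, depth


# Hypothetical (out_channels, num_blocks) pairs for four stages, for illustration only.
base_stages = [(128, 3), (256, 9), (512, 9), (1024, 3)]
small = [scale_stage(c, n, widen_factor=0.5, deepen_factor=0.33) for c, n in base_stages]
print(small)
```

With both factors at their default of 1.0 the stage table is left unchanged.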

Example

>>> from mmpose.models import CSPDarknet
>>> import torch
>>> self = CSPDarknet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

forward(x)[源代码]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[源代码]¶

Set the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns: self
Return type: Module

class mmpose.models.backbones.CSPNeXt(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, use_depthwise: bool = False, expand_ratio: float = 0.5, arch_ovewrite: Optional[dict] = None, spp_kernel_sizes: Sequence[int] = (5, 9, 13), channel_attention: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[源代码]¶

CSPNeXt backbone used in RTMDet.

Parameters

arch (str) – Architecture of CSPNeXt, from {P5, P6}. Defaults to P5.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
arch_ovewrite (dict, optional) – Overwrite default arch settings. Defaults to None.
spp_kernel_sizes (Sequence[int]) – Sequence of kernel sizes of SPP layers. Defaults to (5, 9, 13).
channel_attention (bool) – Whether to add channel attention in each stage. Defaults to True.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict. Defaults to dict(type='Kaiming', layer='Conv2d', a=2.23606797749979, distribution='uniform', mode='fan_in', nonlinearity='leaky_relu').

forward(x: Tuple[torch.Tensor, ...]) → Tuple[torch.Tensor, ...][源代码]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True) → None[源代码]¶

Set the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns: self
Return type: Module

class mmpose.models.backbones.DSTFormer(in_channels, feat_size=256, depth=5, num_heads=8, mlp_ratio=4, num_keypoints=17, seq_len=243, qkv_bias=True, qk_scale=None, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, att_fuse=True, init_cfg=None)[源代码]¶

Dual-stream Spatio-temporal Transformer Module.

Parameters

in_channels (int) – Number of input channels.
feat_size (int) – Number of feature channels. Default: 256.
depth (int) – The network depth. Default: 5.
num_heads (int) – Number of heads in multi-head self-attention blocks. Default: 8.
mlp_ratio (int, optional) – The expansion ratio of FFN. Default: 4.
num_keypoints (int) – Number of keypoints. Default: 17.
seq_len (int) – The sequence length. Default: 243.
qkv_bias (bool, optional) – If True, add a learnable bias to q, k, v. Default: True.
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.
drop_rate (float, optional) – Dropout ratio of input. Default: 0.
attn_drop_rate (float, optional) – Dropout ratio of attention weight. Default: 0.
drop_path_rate (float, optional) – Stochastic depth rate. Default: 0.
att_fuse (bool) – Whether to fuse the results of attention blocks. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
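The qk_scale default can be worked out by hand. The sketch below assumes head_dim equals feat_size // num_heads, which is the usual convention for multi-head attention but is not stated in this reference; resolve_qk_scale is a hypothetical helper, not an mmpose function.

```python
def resolve_qk_scale(feat_size, num_heads, qk_scale=None):
    """Return the attention scale, falling back to head_dim ** -0.5."""
    # Assumption: the feature channels are split evenly across heads.
    head_dim = feat_size // num_heads
    return qk_scale if qk_scale is not None else head_dim ** -0.5


# With the documented defaults (feat_size=256, num_heads=8), head_dim is 32.
print(resolve_qk_scale(256, 8))
# An explicit value overrides the default.
print(resolve_qk_scale(256, 8, qk_scale=0.5))
```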

Example

>>> from mmpose.models import DSTFormer
>>> import torch
>>> self = DSTFormer(in_channels=3)
>>> self.eval()
>>> inputs = torch.rand(1, 2, 17, 3)
>>> level_outputs = self.forward(inputs)
>>> print(tuple(level_outputs.shape))
(1, 2, 17, 512)

forward(x)[source]

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – Either a torch.Tensor or a tuple of torch.Tensor containing the input data for forward computation.

init_weights()[source]: Initialize the weights in backbone.

class mmpose.models.backbones.HRFormer(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, transformer_norm_cfg={'eps': 1e-06, 'type': 'LN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=-1, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

HRFormer backbone.

This backbone is the implementation of HRFormer: High-Resolution Transformer for Dense Prediction.

Parameters

extra (dict) –
Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:
- num_modules (int): The number of HRModule in this stage.
- num_branches (int): The number of branches in the HRModule.
- block (str): The type of block.
- num_blocks (tuple): The number of blocks in each branch.
  The length must be equal to num_branches.
- num_channels (tuple): The number of channels in each branch.
  The length must be equal to num_branches.
in_channels (int) – Number of input image channels. Normally 3.
conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.
norm_cfg (dict) – Config of norm layer. Default: dict(type='BN', requires_grad=True).
transformer_norm_cfg (dict) – Config of transformer norm layer. Default: dict(type='LN', eps=1e-6).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
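The length rule on num_blocks and num_channels can be checked before a backbone is built. The following is a minimal sketch; check_stage_cfg and REQUIRED_KEYS are hypothetical names introduced for illustration and are not part of mmpose.

```python
REQUIRED_KEYS = ('num_modules', 'num_branches', 'block', 'num_blocks',
                 'num_channels')


def check_stage_cfg(name, cfg):
    """Raise ValueError if a stage config breaks the documented rules."""
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise ValueError(f'{name}: missing keys {missing}')
    # num_blocks and num_channels need one entry per branch.
    for key in ('num_blocks', 'num_channels'):
        if len(cfg[key]) != cfg['num_branches']:
            raise ValueError(
                f'{name}: len({key}) must equal num_branches '
                f'({cfg["num_branches"]}), got {len(cfg[key])}')


stage2 = dict(num_modules=1, num_branches=2, block='HRFORMER',
              num_blocks=(2, 2), num_channels=(32, 64))
check_stage_cfg('stage2', stage2)  # passes silently
```

Running such a check on every stage gives a readable error message instead of a shape mismatch deep inside the model constructor.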

Example

>>> from mmpose.models import HRFormer
>>> import torch
>>> extra = dict(
...     stage1=dict(
...         num_modules=1,
...         num_branches=1,
...         block='BOTTLENECK',
...         num_blocks=(2, ),
...         num_channels=(64, )),
...     stage2=dict(
...         num_modules=1,
...         num_branches=2,
...         block='HRFORMER',
...         window_sizes=(7, 7),
...         num_heads=(1, 2),
...         mlp_ratios=(4, 4),
...         num_blocks=(2, 2),
...         num_channels=(32, 64)),
...     stage3=dict(
...         num_modules=4,
...         num_branches=3,
...         block='HRFORMER',
...         window_sizes=(7, 7, 7),
...         num_heads=(1, 2, 4),
...         mlp_ratios=(4, 4, 4),
...         num_blocks=(2, 2, 2),
...         num_channels=(32, 64, 128)),
...     stage4=dict(
...         num_modules=2,
...         num_branches=4,
...         block='HRFORMER',
...         window_sizes=(7, 7, 7, 7),
...         num_heads=(1, 2, 4, 8),
...         mlp_ratios=(4, 4, 4, 4),
...         num_blocks=(2, 2, 2, 2),
...         num_channels=(32, 64, 128, 256)))
>>> self = HRFormer(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)

class mmpose.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=-1, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions

Parameters

extra (dict) – Detailed configuration for each stage of HRNet.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – Dictionary to construct and config conv layer.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]

Example

>>> from mmpose.models import HRNet
>>> import torch
>>> extra = dict(
...     stage1=dict(
...         num_modules=1,
...         num_branches=1,
...         block='BOTTLENECK',
...         num_blocks=(4, ),
...         num_channels=(64, )),
...     stage2=dict(
...         num_modules=1,
...         num_branches=2,
...         block='BASIC',
...         num_blocks=(4, 4),
...         num_channels=(32, 64)),
...     stage3=dict(
...         num_modules=4,
...         num_branches=3,
...         block='BASIC',
...         num_blocks=(4, 4, 4),
...         num_channels=(32, 64, 128)),
...     stage4=dict(
...         num_modules=3,
...         num_branches=4,
...         block='BASIC',
...         num_blocks=(4, 4, 4, 4),
...         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)

forward(x)[source]: Forward function.

init_weights()[source]: Initialize the weights in backbone.

property norm1

The normalization layer named “norm1”.

Type: nn.Module

property norm2

The normalization layer named “norm2”.

Type: nn.Module

train(mode=True)[source]: Convert the model into training mode.

class mmpose.models.backbones.HourglassAENet(downsample_times=4, num_stacks=1, out_channels=34, stage_channels=(256, 384, 512, 640, 768), feat_channels=256, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

Hourglass-AE Network proposed by Newell et al. in “Associative Embedding: End-to-End Learning for Joint Detection and Grouping”. More details can be found in the paper.

Parameters

downsample_times (int) – Downsample times in a HourglassModule.
num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.
stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.
stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.
feat_channels (int) – Feature channel of conv after a HourglassModule.
norm_cfg (dict) – Dictionary to construct and config norm layer.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]

Example

>>> from mmpose.models import HourglassAENet
>>> import torch
>>> self = HourglassAENet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 512, 512)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 34, 128, 128)

forward(x)[source]: Model forward function.

class mmpose.models.backbones.HourglassNet(downsample_times=5, num_stacks=2, stage_channels=(256, 256, 384, 384, 384, 512), stage_blocks=(2, 2, 2, 2, 2, 4), feat_channel=256, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

HourglassNet backbone.

Implementation of “Stacked Hourglass Networks for Human Pose Estimation”. More details can be found in the paper.

Parameters

downsample_times (int) – Downsample times in a HourglassModule.
num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.
stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.
stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.
feat_channel (int) – Feature channel of conv after a HourglassModule.
norm_cfg (dict) – Dictionary to construct and config norm layer.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]

Example

>>> from mmpose.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)

forward(x)[source]: Model forward function.

class mmpose.models.backbones.LiteHRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

Lite-HRNet backbone.

Lite-HRNet: A Lightweight High-Resolution Network.

Code adapted from ‘https://github.com/HRNet/Lite-HRNet’.

Parameters

extra (dict) – Detailed configuration for each stage of HRNet.
in_channels (int) – Number of input image channels. Default: 3.
conv_cfg (dict) – Dictionary to construct and config conv layer.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]

Example

>>> from mmpose.models import LiteHRNet
>>> import torch
>>> extra=dict(
...     stem=dict(stem_channels=32, out_channels=32, expand_ratio=1),
...     num_stages=3,
...     stages_spec=dict(
...         num_modules=(2, 4, 2),
...         num_branches=(2, 3, 4),
...         num_blocks=(2, 2, 2),
...         module_type=('LITE', 'LITE', 'LITE'),
...         with_fuse=(True, True, True),
...         reduce_ratios=(8, 8, 8),
...         num_channels=(
...             (40, 80),
...             (40, 80, 160),
...             (40, 80, 160, 320),
...         )),
...     with_head=False)
>>> self = LiteHRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 40, 8, 8)

forward(x)[source]: Forward function.

train(mode=True)[source]: Convert the model into training mode.

class mmpose.models.backbones.MSPN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], norm_cfg={'type': 'BN'}, res_top_channels=64, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]

MSPN backbone. Paper ref: Li et al. “Rethinking on Multi-Stage Networks for Human Pose Estimation” (CVPR 2020).

Parameters

unit_channels (int) – Number of channels in an upsample unit. Default: 256.
num_stages (int) – Number of stages in a multi-stage MSPN. Default: 4.
num_units (int) – Number of downsample/upsample units in a single-stage network. Default: 4. Note: Make sure num_units == len(self.num_blocks).
num_blocks (list) – Number of bottlenecks in each downsample unit. Default: [2, 2, 2, 2].
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN').
res_top_channels (int) – Number of channels of feature from ResNetTop. Default: 64.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])]

Example

>>> from mmpose.models import MSPN
>>> import torch
>>> self = MSPN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)

forward(x)[source]: Model forward function.

init_weights()[source]: Initialize model weights.

class mmpose.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

MobileNetV2 backbone.

Parameters

widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU6').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
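The effect of widen_factor on channel counts can be sketched as follows. This assumes that scaled channel counts are rounded to a multiple of 8 and never drop more than 10% below the scaled value, which is a common convention in MobileNet implementations but is not confirmed by this reference; round_channels is a hypothetical helper.

```python
def round_channels(value, divisor=8, min_ratio=0.9):
    """Round a scaled channel count to a multiple of ``divisor``."""
    rounded = max(divisor, int(value + divisor / 2) // divisor * divisor)
    # Avoid shrinking the layer by more than (1 - min_ratio).
    if rounded < min_ratio * value:
        rounded += divisor
    return rounded


base_channels = [16, 24, 32, 64, 96, 160, 320]  # illustrative stage widths
for widen_factor in (0.5, 1.0):
    scaled = [round_channels(c * widen_factor) for c in base_channels]
    print(widen_factor, scaled)
```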

forward(x)[source]

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – Either a torch.Tensor or a tuple of torch.Tensor containing the input data for forward computation.

make_layer(out_channels, num_blocks, stride, expand_ratio)[source]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters

out_channels (int) – Output channels of each block.
num_blocks (int) – Number of blocks.
stride (int) – Stride of the first block. Default: 1
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.

train(mode=True)[source]

Set module status before forward computation.

Parameters: mode (bool) – Whether it is train_mode or test_mode.

class mmpose.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(-1,), frozen_stages=-1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm']}])[source]

MobileNetV3 backbone.

Parameters

arch (str) – Architecture of MobileNetV3, from {small, big}. Default: small.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
out_indices (None or Sequence[int]) – Output from which stages. Default: (-1, ), which means output tensors from final stage.
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm'])]

forward(x)[source]

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – Either a torch.Tensor or a tuple of torch.Tensor containing the input data for forward computation.

train(mode=True)[source]

Set module status before forward computation.

Parameters: mode (bool) – Whether it is train_mode or test_mode.

class mmpose.models.backbones.PyramidVisionTransformer(pretrain_img_size=224, in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 5, 8], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], paddings=[0, 0, 0, 0], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratios=[8, 8, 4, 4], qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=True, norm_after_stage=False, use_conv_ffn=False, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, convert_weights=True, init_cfg=[{'type': 'TruncNormal', 'std': 0.02, 'layer': ['Linear']}, {'type': 'Constant', 'val': 1, 'layer': ['LayerNorm']}, {'type': 'Kaiming', 'layer': ['Conv2d']}])[source]

Pyramid Vision Transformer (PVT)

Implementation of Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

Parameters

pretrain_img_size (int | tuple[int]) – The input image size used during pretraining. Default: 224.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – Embedding dimension. Default: 64.
num_stages (int) – The number of stages. Default: 4.
num_layers (Sequence[int]) – The number of layers in each transformer encoder stage. Default: [3, 4, 6, 3].
num_heads (Sequence[int]) – The number of attention heads of each transformer encoder layer. Default: [1, 2, 5, 8].
patch_sizes (Sequence[int]) – The patch size of each patch embedding. Default: [4, 2, 2, 2].
strides (Sequence[int]) – The stride of each patch embedding. Default: [4, 2, 2, 2].
paddings (Sequence[int]) – The padding of each patch embedding. Default: [0, 0, 0, 0].
sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encoder layer. Default: [8, 4, 2, 1].
out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).
mlp_ratios (Sequence[int]) – The ratio of the MLP hidden dim to the embedding dim of each transformer encoder layer. Default: [8, 8, 4, 4].
qkv_bias (bool) – Enable bias for qkv if True. Default: True.
drop_rate (float) – Probability of an element to be zeroed. Default: 0.0.
attn_drop_rate (float) – The dropout rate for attention layers. Default: 0.0.
drop_path_rate (float) – Stochastic depth rate. Default: 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Default: True.
use_conv_ffn (bool) – If True, use Convolutional FFN to replace FFN. Default: False.
act_cfg (dict) – The activation config for FFNs. Default: dict(type='GELU').
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='LN', eps=1e-6).
pretrained (str, optional) – Model pretrained path. Default: None.
convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='TruncNormal', std=.02, layer=['Linear']), dict(type='Constant', val=1, layer=['LayerNorm']), dict(type='Kaiming', layer=['Conv2d'])]
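The patch_sizes, strides and paddings above determine the feature-map resolution at each stage. The sketch below assumes each patch embedding behaves like a strided convolution, so the usual convolution output-size formula applies; conv_out_size and stage_resolutions are hypothetical helpers, not mmpose functions.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_resolutions(img_size, patch_sizes, strides, paddings):
    """Feature-map side length after each stage's patch embedding."""
    sizes = []
    size = img_size
    for kernel, stride, padding in zip(patch_sizes, strides, paddings):
        size = conv_out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


# Documented defaults with a 224 x 224 input.
print(stage_resolutions(224, [4, 2, 2, 2], [4, 2, 2, 2], [0, 0, 0, 0]))
# [56, 28, 14, 7]
```

Under this assumption the four stages run at 1/4, 1/8, 1/16 and 1/32 of the input resolution.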

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]: Initialize the weights in backbone.

class mmpose.models.backbones.PyramidVisionTransformerV2(**kwargs)[source]: Implementation of PVTv2: Improved Baselines with Pyramid Vision Transformer.

class mmpose.models.backbones.RSN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], num_steps=4, norm_cfg={'type': 'BN'}, res_top_channels=64, expand_times=26, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]

Residual Steps Network backbone. Paper ref: Cai et al. “Learning Delicate Local Representations for Multi-Person Pose Estimation” (ECCV 2020).

Parameters

unit_channels (int) – Number of channels in an upsample unit. Default: 256.
num_stages (int) – Number of stages in a multi-stage RSN. Default: 4.
num_units (int) – Number of downsample/upsample units in a single-stage RSN. Default: 4. Note: Make sure num_units == len(self.num_blocks).
num_blocks (list) – Number of RSBs (Residual Steps Blocks) in each downsample unit. Default: [2, 2, 2, 2].
num_steps (int) – Number of steps in an RSB. Default: 4.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN').
res_top_channels (int) – Number of channels of feature from ResNet_top. Default: 64.
expand_times (int) – Times by which the in_channels are expanded in RSB. Default: 26.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])]

Example

>>> from mmpose.models import RSN
>>> import torch
>>> self = RSN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)

forward(x)[source]: Model forward function.

class mmpose.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

RegNet backbone.

More details can be found in the paper “Designing Network Design Spaces”.

Parameters

arch (dict) –
The parameter of RegNets:
- w0 (int): Initial width.
- wa (float): Slope of width.
- wm (float): Quantization parameter to quantize the width.
- depth (int): Depth of the backbone.
- group_w (int): Width of group.
- bot_mul (float): Bottleneck ratio, i.e. expansion of bottleneck.
strides (Sequence[int]) – Strides of the first block of each stage.
base_channels (int) – Base channels after stem layer.
in_channels (int) – Number of input image channels. Default: 3.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: “pytorch”.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN', requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]

Example

>>> from mmpose.models import RegNet
>>> import torch
>>> self = RegNet(
...     arch=dict(
...         w0=88,
...         wa=26.31,
...         wm=2.25,
...         group_w=48,
...         depth=25,
...         bot_mul=1.0),
...     out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)

adjust_width_group(widths, bottleneck_ratio, groups)[source]

Adjusts the compatibility of widths and groups.

Parameters

widths (list[int]) – Width of each stage.
bottleneck_ratio (float) – Bottleneck ratio.
groups (int) – Number of groups in each stage.

Returns

The adjusted widths and groups of each stage.

Return type

tuple(list)
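One way to make widths compatible with a group width is to round each bottleneck width to the nearest multiple of the group width. The sketch below follows that recipe as an assumption; it has not been checked against the mmpose source, and adjust_widths_to_groups is a hypothetical name.

```python
def adjust_widths_to_groups(widths, bottleneck_ratio, groups):
    """Round each stage width so its bottleneck width divides by the group width."""
    new_widths, new_groups = [], []
    for width in widths:
        bottleneck_width = int(width * bottleneck_ratio)
        # A group cannot be wider than the bottleneck itself.
        group = min(groups, bottleneck_width)
        # Round to the nearest multiple of the group width.
        bottleneck_width = int(round(bottleneck_width / group) * group)
        new_widths.append(int(bottleneck_width / bottleneck_ratio))
        new_groups.append(group)
    return new_widths, new_groups


# Stage widths 88, 200, 448, 1000 with group_w=48 and bot_mul=1.0.
print(adjust_widths_to_groups([88, 200, 448, 1000], 1.0, 48))
# ([96, 192, 432, 1008], [48, 48, 48, 48])
```

The adjusted widths agree with the channel counts printed in the RegNet example above (96, 192, 432, 1008).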

forward(x)[source]: Forward function.

static generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[source]

Generates per block width from RegNet parameters.

Parameters

initial_width (int) – Initial width of the backbone.
width_slope (float) – Slope of the quantized linear function.
width_parameter (float) – Parameter used to quantize the width.
depth (int) – Depth of the backbone.
divisor (int, optional) – The divisor of channels. Defaults to 8.

Returns

A list of widths of each stage and the number of stages.

Return type

list, int
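The width generation can be sketched in plain Python. This follows the quantized linear parameterization commonly used for RegNet and is an assumption about what generate_regnet computes, not a copy of the mmpose implementation; generate_block_widths is a hypothetical name.

```python
import math


def generate_block_widths(initial_width, width_slope, width_parameter, depth,
                          divisor=8):
    """Return per-block widths and the number of distinct widths (stages)."""
    widths = []
    for block in range(depth):
        # Continuous width grows linearly with the block index.
        continuous = initial_width + width_slope * block
        # Snap to the nearest power of width_parameter times initial_width.
        exponent = round(
            math.log(continuous / initial_width) / math.log(width_parameter))
        width = initial_width * width_parameter ** exponent
        # Make the channel count divisible by ``divisor``.
        widths.append(int(round(width / divisor) * divisor))
    return widths, len(set(widths))


widths, num_stages = generate_block_widths(88, 26.31, 2.25, 25)
print(sorted(set(widths)), num_stages)
# [88, 200, 448, 1000] 4
```

With the arch values from the RegNet example (w0=88, wa=26.31, wm=2.25, depth=25), the 25 blocks fall into four distinct widths.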

get_stages_from_blocks(widths)[source]

Gets widths/stage_blocks of network at each stage.

Parameters: widths (list[int]) – Width in each stage.
Returns: Width and depth of each stage.
Return type: tuple(list)
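Grouping consecutive blocks that share a width yields the per-stage widths and depths. A minimal sketch, assuming a stage is simply a run of equal widths; stages_from_blocks is a hypothetical helper, not the mmpose method.

```python
from itertools import groupby


def stages_from_blocks(widths):
    """Collapse per-block widths into (stage_widths, stage_depths)."""
    stage_widths, stage_depths = [], []
    for width, run in groupby(widths):
        stage_widths.append(width)
        stage_depths.append(len(list(run)))
    return stage_widths, stage_depths


print(stages_from_blocks([88, 88, 200, 200, 200, 448]))
# ([88, 200, 448], [2, 3, 1])
```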

static quantize_float(number, divisor)[source]

Converts a float to the closest non-zero int divisible by divisor.

Parameters

number (float) – Original number to be quantized.
divisor (int) – Divisor used to quantize the number.

Returns

Quantized number that is divisible by divisor.

Return type

int
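Rounding to the nearest multiple of a divisor can be written in one line. This sketch is an assumption about the behavior described above; quantize_to_multiple is a hypothetical name, and it does not guard against inputs that round to zero.

```python
def quantize_to_multiple(number, divisor):
    """Round ``number`` to the nearest multiple of ``divisor``."""
    return int(round(number / divisor) * divisor)


print(quantize_to_multiple(445.5, 8))  # 448
print(quantize_to_multiple(88, 48))    # 96
```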

class mmpose.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]

ResNeSt backbone.

Please refer to the paper for details.

Parameters

depth (int) – Network depth, from {50, 101, 152, 200}.
groups (int) – Groups of conv2 in Bottleneck. Default: 1.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
radix (int) – Radix of SplitAttentionConv2d. Default: 2.
reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.
avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]

make_res_layer(**kwargs)[source]: Make a ResLayer.

class mmpose.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

ResNeXt backbone.

Please refer to the paper for details.

Parameters

depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]
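The groups and width_per_group parameters together set the width of the grouped 3x3 convolution. The sketch below uses the formula from the ResNeXt design, floor(channels * width_per_group / 64) * groups; whether mmpose computes it in exactly this way is an assumption, and grouped_mid_channels is a hypothetical helper.

```python
import math


def grouped_mid_channels(stage_channels, groups=32, width_per_group=4,
                         base_channels=64):
    """Width of the grouped conv2 in a ResNeXt bottleneck (assumed formula)."""
    return math.floor(
        stage_channels * width_per_group / base_channels) * groups


# A 32x4d configuration across four stages.
print([grouped_mid_channels(c) for c in (64, 128, 256, 512)])
# [128, 256, 512, 1024]
```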

make_res_layer(**kwargs)[source]: Make a ResLayer.

class mmpose.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

ResNet backbone.

Please refer to the paper for details.

Parameters

depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
base_channels (int) – Middle channels of the first stage. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])].

Example

>>> from mmpose.models import ResNet
>>> import torch
>>> self = ResNet(depth=18, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
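
The shapes printed above follow from the stem and the per-stage strides. The following is a minimal sketch of that arithmetic in plain Python, assuming the standard ResNet layout (a 7x7 stride-2 stem conv with padding 3, a 3x3 stride-2 max pool with padding 1, one 3x3 strided conv with padding 1 per stage, and a block expansion of 1 for depths 18/34 and 4 otherwise). `conv_out` and `resnet_feature_shapes` are hypothetical helpers for illustration, not part of mmpose.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial size after a conv/pool layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_feature_shapes(depth, input_size, base_channels=64,
                          strides=(1, 2, 2, 2), out_indices=(3,)):
    """Return (channels, height, width) for each requested stage.

    Hypothetical helper: it only mirrors the shape arithmetic and
    builds no layers.
    """
    expansion = 1 if depth in (18, 34) else 4  # BasicBlock vs. Bottleneck
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # stem conv
    size = conv_out(size, kernel=3, stride=2, padding=1)        # max pool
    shapes = []
    for i, stride in enumerate(strides):
        # Only the first block of a stage changes the resolution.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        channels = base_channels * 2 ** i * expansion
        if i in out_indices:
            shapes.append((channels, size, size))
    return shapes


print(resnet_feature_shapes(18, 32, out_indices=(0, 1, 2, 3)))
# [(64, 8, 8), (128, 4, 4), (256, 2, 2), (512, 1, 1)]
```

With `depth=50` and a 224x224 input, the same arithmetic gives 256, 512, 1024 and 2048 channels at 56, 28, 14 and 7 pixels per side.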

forward(x)[source]: Forward function.

init_weights()[source]: Initialize the weights in backbone.

make_res_layer(**kwargs)[source]: Make a ResLayer.

property norm1

The normalization layer named “norm1”.

Type: nn.Module

train(mode=True)[source]: Convert the model into training mode.

class mmpose.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant described in Bag of Tricks.

Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. In the downsampling block, a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.

class mmpose.models.backbones.SCNet(depth, **kwargs)[source]

SCNet backbone.

Improving Convolutional Networks with Self-Calibrated Convolutions, Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng, IEEE CVPR, 2020. http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf

Parameters

depth (int) – Depth of SCNet, from {50, 101}.
in_channels (int) – Number of input image channels. Normally 3.
base_channels (int) – Number of base channels of hidden layer.
num_stages (int) – SCNet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
norm_cfg (dict) – Dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

Example

>>> from mmpose.models import SCNet
>>> import torch
>>> self = SCNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)

class mmpose.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[source]

SEResNeXt backbone.

Please refer to the paper for details.

Parameters

depth (int) – Network depth, from {50, 101, 152}.
groups (int) – Groups of conv2 in Bottleneck. Default: 32.
width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])].

Example

>>> from mmpose.models import SEResNeXt
>>> import torch
>>> self = SEResNeXt(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)

make_res_layer(**kwargs)[source]: Make a ResLayer.

class mmpose.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[source]

SEResNet backbone.

Please refer to the paper for details.

Parameters

depth (int) – Network depth, from {50, 101, 152}.
se_ratio (int) – Squeeze ratio in SELayer. Default: 16.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Output channels of the stem layer. Default: 64.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])].

Example

>>> from mmpose.models import SEResNet
>>> import torch
>>> self = SEResNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)

make_res_layer(**kwargs)[source]: Make a ResLayer.

class mmpose.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'bias': 0.0001, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

ShuffleNetV1 backbone.

Parameters

groups (int, optional) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.
widen_factor (float, optional) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (2, )
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.01, layer=['Conv2d']), dict(type='Constant', val=1, bias=0.0001, layer=['_BatchNorm', 'GroupNorm'])].

forward(x)[source]

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]: Initialize the weights.

make_layer(out_channels, num_blocks, first_block=False)[source]

Stack ShuffleUnit blocks to make a layer.

Parameters

out_channels (int) – out_channels of the block.
num_blocks (int) – Number of blocks.
first_block (bool, optional) – Whether this is the first ShuffleUnit in a sequence of ShuffleUnits. Default: False, which means using the grouped 1x1 convolution.

train(mode=True)[source]

Set module status before forward computation.

Parameters: mode (bool) – Whether it is train_mode or test_mode

class mmpose.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'bias': 0.0001, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

ShuffleNetV2 backbone.

Parameters

widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.
out_indices (Sequence[int]) – Output from which stages. Default: (3, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.01, layer=['Conv2d']), dict(type='Constant', val=1, bias=0.0001, layer=['_BatchNorm', 'GroupNorm'])].

forward(x)[source]

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights()[source]: Initialize the weights.

train(mode=True)[source]

Set module status before forward computation.

Parameters: mode (bool) – Whether it is train_mode or test_mode

class mmpose.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, convert_weights=False, frozen_stages=-1, init_cfg=[{'type': 'TruncNormal', 'std': 0.02, 'layer': ['Linear']}, {'type': 'Constant', 'val': 1, 'layer': ['LayerNorm']}])[source]

Swin Transformer.

A PyTorch implementation of “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows” (https://arxiv.org/abs/2103.14030).

Inspired by https://github.com/microsoft/Swin-Transformer

Parameters

pretrain_img_size (int | tuple[int]) – The input image size used during pretraining. Default: 224.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – The feature dimension. Default: 96.
patch_size (int | tuple[int]) – Patch size. Default: 4.
window_size (int) – Window size. Default: 7.
mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.
depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).
num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).
strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True.
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.
patch_norm (bool) – Whether to add a norm layer for patch embedding and patch merging. Default: True.
drop_rate (float) – Dropout rate. Default: 0.
attn_drop_rate (float) – Attention dropout rate. Default: 0.
drop_path_rate (float) – Stochastic depth rate. Default: 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Default: False.
act_cfg (dict) – Config dict for activation layer. Default: dict(type='GELU').
norm_cfg (dict) – Config dict for normalization layer at output of backbone. Default: dict(type='LN').
with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – Path to the pretrained model. Default: None.
convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). Default: -1 (-1 means not freezing any parameters).
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='TruncNormal', std=.02, layer=['Linear']), dict(type='Constant', val=1, layer=['LayerNorm'])].
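
To see how `embed_dims`, `strides` and `out_indices` interact, here is a minimal sketch of the per-stage output shapes in plain Python. It assumes the input size is divisible by the total stride, that each stride equals the kernel size of the patch embedding or patch merging step (as the `strides` entry states), and that the channel count doubles at every patch merging step. `swin_stage_shapes` is a hypothetical helper, not part of mmpose.

```python
def swin_stage_shapes(img_size=224, embed_dims=96, strides=(4, 2, 2, 2),
                      out_indices=(0, 1, 2, 3)):
    """Return (channels, height, width) for each requested stage.

    Hypothetical helper: it only tracks shapes and builds no layers.
    """
    shapes = []
    size, channels = img_size, embed_dims
    for i, stride in enumerate(strides):
        size //= stride        # patch embedding (i == 0) or patch merging
        if i > 0:
            channels *= 2      # assumed: merging doubles the feature dim
        if i in out_indices:
            shapes.append((channels, size, size))
    return shapes


print(swin_stage_shapes())
# [(96, 56, 56), (192, 28, 28), (384, 14, 14), (768, 7, 7)]
```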

forward(x)[source]

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[source]

Initialize the weights in backbone.

Parameters: pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

train(mode=True)[source]: Convert the model into training mode while keeping the specified layers frozen.

class mmpose.models.backbones.TCN(in_channels, stem_channels=1024, num_blocks=2, kernel_sizes=(3, 3, 3), dropout=0.25, causal=False, residual=True, use_stride_conv=False, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, max_norm=None, init_cfg=[{'type': 'Kaiming', 'mode': 'fan_in', 'nonlinearity': 'relu', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

TCN backbone.

Temporal Convolutional Networks. More details can be found in the paper.

Parameters

in_channels (int) – Number of input channels, which equals num_keypoints * num_features.
stem_channels (int) – Number of feature channels. Default: 1024.
num_blocks (int) – Number of basic temporal convolutional blocks. Default: 2.
kernel_sizes (Sequence[int]) – Sizes of the convolving kernel of each basic block. Default: (3, 3, 3).
dropout (float) – Dropout rate. Default: 0.25.
causal (bool) – Use causal convolutions instead of symmetric convolutions (for real-time applications). Default: False.
residual (bool) – Use residual connection. Default: True.
use_stride_conv (bool) – Use TCN backbone optimized for single-frame batching, i.e. where batches have input length = receptive field, and output length = 1. This implementation replaces dilated convolutions with strided convolutions to avoid generating unused intermediate results. The weights are interchangeable with the reference implementation. Default: False.
conv_cfg (dict) – Dictionary to construct and config conv layer. Default: dict(type='Conv1d').
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN1d').
max_norm (float | None) – If not None, the weight of convolution layers will be clipped to have a maximum norm of max_norm.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', mode='fan_in', nonlinearity='relu', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])].

Example

>>> from mmpose.models import TCN
>>> import torch
>>> self = TCN(in_channels=34)
>>> self.eval()
>>> inputs = torch.rand(1, 34, 243)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 235)
(1, 1024, 217)
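
The temporal lengths 235 and 217 printed above can be reproduced by hand. The sketch below assumes unpadded, non-causal convolutions (`use_stride_conv=False`), a stem convolution with kernel size `kernel_sizes[0]`, and a dilation that grows as the running product of the preceding kernel sizes. That schedule is an assumption chosen because it matches the printed shapes, and `tcn_output_lengths` is a hypothetical helper, not part of mmpose.

```python
def tcn_output_lengths(seq_len, kernel_sizes=(3, 3, 3), num_blocks=2):
    """Temporal length after each basic block of a dilated TCN.

    Hypothetical helper: it assumes no padding, so each convolution
    shortens the sequence by (kernel_size - 1) * dilation frames.
    """
    assert len(kernel_sizes) == num_blocks + 1
    length = seq_len - (kernel_sizes[0] - 1)   # stem convolution
    dilation = kernel_sizes[0]
    lengths = []
    for k in kernel_sizes[1:]:
        length -= (k - 1) * dilation           # dilated conv of this block
        lengths.append(length)
        dilation *= k                          # assumed dilation schedule
    return lengths


print(tcn_output_lengths(243))
# [235, 217]
```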

forward(x)[source]: Forward function.

class mmpose.models.backbones.V2VNet(input_channels, output_channels, mid_channels=32, init_cfg={'layer': ['Conv3d', 'ConvTranspose3d'], 'std': 0.001, 'type': 'Normal'})[source]

V2VNet.

Please refer to the paper (https://arxiv.org/abs/1711.07399) for details.

Parameters

input_channels (int) – Number of channels of the input feature volume.
output_channels (int) – Number of channels of the output volume.
mid_channels (int) – Input and output channels of the encoder-decoder block.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: dict(type='Normal', std=0.001, layer=['Conv3d', 'ConvTranspose3d']).

forward(x)[source]: Forward function.

class mmpose.models.backbones.VGG(depth, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=-1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[source]

VGG backbone.

Parameters

depth (int) – Depth of VGG, from {11, 13, 16, 19}.
with_norm (bool) – Use BatchNorm or not.
num_classes (int) – Number of classes for classification.
num_stages (int) – VGG stages, normally 5.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), outputting the last feature map before the classifier. If num_classes > 0, the default value is (5, ), outputting the classification score. Default: None.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.
with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])].
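
The `out_indices=None` rule described in the parameter list can be written out as a small function. This is a sketch of the documented behavior only, and `resolve_vgg_out_indices` is a hypothetical name, not an mmpose function.

```python
def resolve_vgg_out_indices(out_indices=None, num_classes=-1):
    """Apply the documented default for VGG's ``out_indices``.

    Hypothetical helper: with no classifier (num_classes <= 0) the last
    feature map (index 4) is returned; with a classifier the
    classification score (index 5) is returned.
    """
    if out_indices is None:
        out_indices = (5,) if num_classes > 0 else (4,)
    return tuple(out_indices)


print(resolve_vgg_out_indices())                  # (4,)
print(resolve_vgg_out_indices(num_classes=1000))  # (5,)
print(resolve_vgg_out_indices(out_indices=(1, 3)))  # (1, 3)
```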

forward(x)[source]

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[source]

Set module status before forward computation.

Parameters: mode (bool) – Whether it is train_mode or test_mode

class mmpose.models.backbones.ViPNAS_MobileNetV3(wid=[16, 16, 24, 40, 80, 112, 160], expan=[None, 1, 5, 4, 5, 5, 6], dep=[None, 1, 4, 4, 4, 4, 4], ks=[3, 3, 7, 7, 5, 7, 5], group=[None, 8, 120, 20, 100, 280, 240], att=[None, True, True, False, True, True, True], stride=[2, 1, 2, 2, 2, 1, 2], act=['HSwish', 'ReLU', 'ReLU', 'ReLU', 'HSwish', 'HSwish', 'HSwish'], conv_cfg=None, norm_cfg={'type': 'BN'}, frozen_stages=-1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

ViPNAS_MobileNetV3 backbone.

“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.

Parameters

wid (list(int)) – Searched width config for each stage.
expan (list(int)) – Searched expansion ratio config for each stage.
dep (list(int)) – Searched depth config for each stage.
ks (list(int)) – Searched kernel size config for each stage.
group (list(int)) – Searched group number config for each stage.
att (list(bool)) – Searched attention config for each stage.
stride (list(int)) – Stride config for each stage.
act (list(dict)) – Activation config for each stage.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])].

forward(x)[source]

Forward function.

Parameters: x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[source]

Set module status before forward computation.

Parameters: mode (bool) – Whether it is train_mode or test_mode

class mmpose.models.backbones.ViPNAS_ResNet(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, wid=[48, 80, 160, 304, 608], expan=[None, 1, 1, 1, 1], dep=[None, 4, 6, 7, 3], ks=[7, 3, 5, 5, 5], group=[None, 16, 16, 16, 16], att=[None, True, False, True, True], init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

ViPNAS_ResNet backbone.

“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.

Parameters

depth (int) – Network depth, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
num_stages (int) – Stages of the network. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors is returned. Default: (3, ).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – The config dict for conv layers. Default: None.
norm_cfg (dict) – The config dict for norm layers.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
wid (list(int)) – Searched width config for each stage.
expan (list(int)) – Searched expansion ratio config for each stage.
dep (list(int)) – Searched depth config for each stage.
ks (list(int)) – Searched kernel size config for each stage.
group (list(int)) – Searched group number config for each stage.
att (list(bool)) – Searched attention config for each stage.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: [dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])].

forward(x)[source]: Forward function.

make_res_layer(**kwargs)[source]: Make a ViPNAS ResLayer.

property norm1

The normalization layer named “norm1”.

Type: nn.Module

train(mode=True)[source]: Convert the model into training mode.

necks

class mmpose.models.necks.CSPNeXtPAFPN(in_channels: Sequence[int], out_channels: int, out_indices=(0, 1, 2), num_csp_blocks: int = 3, use_depthwise: bool = False, expand_ratio: float = 0.5, upsample_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'mode': 'nearest', 'scale_factor': 2}, conv_cfg: Optional[bool] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'Swish'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]

Path Aggregation Network with CSPNeXt blocks. Modified from RTMDet.

Parameters

in_channels (Sequence[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
out_indices (Sequence[int]) – Output from which stages.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.
use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Default: 0.5.
upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(scale_factor=2, mode='nearest').
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='Swish').
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: dict(type='Kaiming', layer='Conv2d', a=2.23606797749979, distribution='uniform', mode='fan_in', nonlinearity='leaky_relu').

forward(inputs: Tuple[torch.Tensor, ...]) → Tuple[torch.Tensor, ...][source]

Parameters: inputs (tuple[Tensor]) – Input features.
Returns: CSPNeXtPAFPN features.
Return type: tuple[Tensor]

class mmpose.models.necks.ChannelMapper(in_channels: List[int], out_channels: int, kernel_size: int = 3, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'ReLU'}, num_outs: Optional[int] = None, bias: Union[bool, str] = 'auto', init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]]] = {'distribution': 'uniform', 'layer': 'Conv2d', 'type': 'Xavier'})[source]

Channel Mapper to reduce/increase channels of backbone features.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
kernel_size (int, optional) – kernel_size for reducing channels (used at each scale). Default: 3.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Default: None.
norm_cfg (ConfigDict or dict, optional) – Config dict for normalization layer. Default: None.
act_cfg (ConfigDict or dict, optional) – Config dict for activation layer in ConvModule. Default: dict(type='ReLU').
num_outs (int, optional) – Number of output feature maps. There would be extra_convs when num_outs larger than the length of in_channels.

init_cfg (ConfigDict or dict or list[ConfigDict or dict], optional) – Initialization config dict. Default: dict(type='Xavier', layer='Conv2d', distribution='uniform').

Example

>>> import torch
>>> from mmpose.models.necks import ChannelMapper
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = ChannelMapper(in_channels, 11, 3).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])

forward(inputs: Tuple[torch.Tensor]) → Tuple[torch.Tensor][source]¶: Forward function.

class mmpose.models.necks.FPN(in_channels, out_channels, num_outs, start_level=0, end_level=-1, add_extra_convs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, upsample_cfg={'mode': 'nearest'})[source]¶

Feature Pyramid Network.

This is an implementation of the paper "Feature Pyramid Networks for Object Detection".

Parameters

in_channels (list[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
num_outs (int) – Number of output scales.
start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.
end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.
add_extra_convs (bool | str) –
If bool, it decides whether to add conv layers on top of the original feature maps. Defaults to False. If True, it is equivalent to add_extra_convs='on_input'. If str, it specifies the source feature map of the extra convs. Only the following options are allowed:
- 'on_input': Last feature map of the neck inputs (i.e. backbone feature).
- 'on_lateral': Last feature map after lateral convs.
- 'on_output': Last output feature map after FPN convs.
relu_before_extra_convs (bool) – Whether to apply ReLU before the extra conv. Default: False.
no_norm_on_lateral (bool) – Whether to skip normalization on the lateral convs. Default: False.
conv_cfg (dict) – Config dict for convolution layer. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.
upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(mode='nearest').

Example

>>> import torch
>>> from mmpose.models.necks import FPN
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = FPN(in_channels, 11, len(in_channels)).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])

forward(inputs)[source]¶: Forward function.

init_weights()[source]¶: Initialize model weights.

class mmpose.models.necks.FeatureMapProcessor(select_index: Optional[Union[int, Tuple[int]]] = None, concat: bool = False, scale_factor: float = 1.0, apply_relu: bool = False, align_corners: bool = False)[source]¶

A PyTorch module for selecting, concatenating, and rescaling feature maps.

Parameters

select_index (Optional[Union[int, Tuple[int]]], optional) – Index or indices of feature maps to select. Defaults to None, which means all feature maps are used.
concat (bool, optional) – Whether to concatenate the selected feature maps. Defaults to False.
scale_factor (float, optional) – The scaling factor to apply to the feature maps. Defaults to 1.0.
apply_relu (bool, optional) – Whether to apply ReLU on input feature maps. Defaults to False.
align_corners (bool, optional) – Whether to align corners when resizing the feature maps. Defaults to False.

forward(inputs: Union[torch.Tensor, Sequence[torch.Tensor]]) → Union[torch.Tensor, List[torch.Tensor]][source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmpose.models.necks.GlobalAveragePooling[source]¶

Global Average Pooling neck.

Note that we use view to remove the extra spatial dimensions after pooling. We do not use squeeze, as it would also remove the batch dimension when the batch size is 1, which can lead to unexpected errors.

forward(inputs)[source]¶: Forward function.
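The difference between view and squeeze described above can be illustrated on shapes alone. The following is a minimal sketch in plain Python, not the neck's implementation: squeeze_shape and view_shape are hypothetical helpers that only mimic how the two tensor operations change a shape.

```python
from math import prod


def squeeze_shape(shape):
    """Shape after dropping every size-1 dimension, as squeeze() would."""
    return tuple(d for d in shape if d != 1)


def view_shape(shape):
    """Shape after view(N, -1): keep the batch dimension, flatten the rest."""
    return (shape[0], prod(shape[1:]))


# (N, C, 1, 1) after global pooling, with a batch of size 1.
pooled = (1, 256, 1, 1)
print(squeeze_shape(pooled))  # the batch dimension is lost
print(view_shape(pooled))     # the batch dimension is kept
```

With a batch size larger than 1 both approaches give the same shape, which is why the problem only shows up for single-sample batches.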

class mmpose.models.necks.HybridEncoder(encoder_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}, projector: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, num_encoder_layers: int = 1, in_channels: List[int] = [512, 1024, 2048], feat_strides: List[int] = [8, 16, 32], hidden_dim: int = 256, use_encoder_idx: List[int] = [2], pe_temperature: int = 10000, widen_factor: float = 1.0, deepen_factor: float = 1.0, spe_learnable: bool = False, output_indices: Optional[List[int]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'requires_grad': True, 'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'SiLU'})[source]¶

Hybrid encoder neck introduced in RT-DETR by Lyu et al. (2023), combining transformer encoders with a Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN).

Parameters

encoder_cfg (ConfigType) – Configuration for the transformer encoder.
projector (OptConfigType, optional) – Configuration for an optional projector module. Defaults to None.
num_encoder_layers (int, optional) – Number of encoder layers. Defaults to 1.
in_channels (List[int], optional) – Input channels of feature maps. Defaults to [512, 1024, 2048].
feat_strides (List[int], optional) – Strides of feature maps. Defaults to [8, 16, 32].
hidden_dim (int, optional) – Hidden dimension of the MLP. Defaults to 256.
use_encoder_idx (List[int], optional) – Indices of encoder layers to use. Defaults to [2].
pe_temperature (int, optional) – Positional encoding temperature. Defaults to 10000.
widen_factor (float, optional) – Expansion factor for CSPRepLayer. Defaults to 1.0.
deepen_factor (float, optional) – Depth multiplier for CSPRepLayer. Defaults to 1.0.
spe_learnable (bool, optional) – Whether positional encoding is learnable. Defaults to False.
output_indices (Optional[List[int]], optional) – Indices of output layers. Defaults to None.
norm_cfg (OptConfigType, optional) – Configuration for normalization layers. Defaults to dict(type='BN', requires_grad=True).
act_cfg (OptConfigType, optional) – Configuration for activation layers. Defaults to dict(type='SiLU', inplace=True).

forward(inputs: Tuple[torch.Tensor]) → Tuple[torch.Tensor][source]¶: Forward function.

switch_to_deploy(test_cfg)[source]¶: Switch to deploy mode.

class mmpose.models.necks.PoseWarperNeck(in_channels, out_channels, inner_channels, deform_groups=17, dilations=(3, 6, 12, 18, 24), trans_conv_kernel=1, res_blocks_cfg=None, offsets_kernel=3, deform_conv_kernel=3, in_index=0, input_transform=None, freeze_trans_layer=True, norm_eval=False, im2col_step=80)[source]¶

PoseWarper neck.

Reference: "Learning Temporal Pose Estimation from Sparsely-Labeled Videos".

Parameters

in_channels (int) – Number of input channels from the backbone.
out_channels (int) – Number of output channels.
inner_channels (int) – Number of intermediate channels of the res block.
deform_groups (int) – Number of groups in the deformable conv. Default: 17.
dilations (list|tuple) – Dilations of the offset conv layers. Default: (3, 6, 12, 18, 24).
trans_conv_kernel (int) – Kernel size of the trans conv layer, which is used to get heatmaps from the output of the backbone. Default: 1.
res_blocks_cfg (dict|None) –
Config of the residual blocks. If None, the default values are used. If not None, it should contain the following keys:
- block (str): The type of residual block. Default: 'BASIC'.
- num_blocks (int): The number of blocks. Default: 20.
offsets_kernel (int) – Kernel size of the offset conv layer. Default: 3.
deform_conv_kernel (int) – Kernel size of the deformable conv layer. Default: 3.
in_index (int|Sequence[int]) – Input feature index. Default: 0.
input_transform (str|None) –
Transformation type of input features. Options: 'resize_concat', 'multiple_select', None. Default: None.
- 'resize_concat': Multiple feature maps are resized to the same size as the first one and then concatenated. Usually used in the FCN head of HRNet.
- 'multiple_select': Multiple feature maps are bundled into a list and passed to the decode head.
- None: Only one selected feature map is allowed.
freeze_trans_layer (bool) – Whether to freeze the transition layer (stop grad and set eval mode). Default: True.
norm_eval (bool) – Whether to set norm layers to eval mode, i.e. freeze running stats (mean and var). Note: this only affects Batch Norm and its variants. Default: False.
im2col_step (int) – The im2col_step argument of the deformable conv. Default: 80.

forward(inputs, frame_weight)[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]¶: Convert the model into training mode.

class mmpose.models.necks.YOLOXPAFPN(in_channels, out_channels, num_csp_blocks=3, use_depthwise=False, upsample_cfg={'mode': 'nearest', 'scale_factor': 2}, conv_cfg=None, norm_cfg={'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg={'type': 'Swish'}, init_cfg={'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]¶

Path Aggregation Network used in YOLOX.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Default: 3.
use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Default: False.
upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(scale_factor=2, mode='nearest').
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='Swish').
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: dict(type='Kaiming', layer='Conv2d', a=2.23606797749979, distribution='uniform', mode='fan_in', nonlinearity='leaky_relu').

forward(inputs)[source]¶

Parameters: inputs (tuple[Tensor]) – Input features.
Returns: YOLOXPAFPN features.
Return type: tuple[Tensor]

detectors¶

heads¶

losses¶

misc¶

class mmpose.models.utils.CSPLayer(in_channels: int, out_channels: int, expand_ratio: float = 0.5, num_blocks: int = 1, add_identity: bool = True, use_depthwise: bool = False, use_cspnext_block: bool = False, channel_attention: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'Swish'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶

Cross Stage Partial Layer.

Parameters

in_channels (int) – The input channels of the CSP layer.
out_channels (int) – The output channels of the CSP layer.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
num_blocks (int) – Number of blocks. Defaults to 1.
add_identity (bool) – Whether to add identity in blocks. Defaults to True.
use_cspnext_block (bool) – Whether to use CSPNeXt block. Defaults to False.
use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.
channel_attention (bool) – Whether to add channel attention in each stage. Defaults to False.
conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='Swish').
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]¶: Forward function.

class mmpose.models.utils.DetrTransformerEncoder(num_layers: int, layer_cfg: Union[mmengine.config.config.ConfigDict, dict], num_cp: int = -1, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶

Encoder of DETR.

Parameters

num_layers (int) – Number of encoder layers.
layer_cfg (ConfigDict or dict) – The config of each encoder layer. All the layers share the same config.
num_cp (int) – Number of checkpointing blocks in the encoder layers. Defaults to -1.
init_cfg (ConfigDict or dict, optional) – The config to control the initialization. Defaults to None.

forward(query: torch.Tensor, query_pos: torch.Tensor, key_padding_mask: torch.Tensor, **kwargs) → torch.Tensor[source]¶

Forward function of encoder.

Parameters

query (Tensor) – Input queries of the encoder, with shape (bs, num_queries, dim).
query_pos (Tensor) – The positional embeddings of the queries, with shape (bs, num_queries, dim).
key_padding_mask (Tensor) – The key_padding_mask of the self_attn input. ByteTensor with shape (bs, num_queries).

Returns

The encoder output, with shape (bs, num_queries, dim) if batch_first is True, otherwise (num_queries, bs, dim).

Return type

Tensor

class mmpose.models.utils.FrozenBatchNorm2d(n, eps: int = 1e-05)[source]¶

BatchNorm2d where the batch statistics and the affine parameters are fixed.

Copy-pasted from torchvision.ops.misc with eps added before rsqrt; without it, models other than torchvision.models.resnet[18,34,50,101] produce NaNs.

forward(x)[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
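As a rough illustration of why eps is added before the reciprocal square root, the per-channel arithmetic of a frozen batch norm can be written with plain floats. This is a sketch of the usual formulation and an assumption about the computation, not the module's source; frozen_batch_norm is a hypothetical helper operating on a single value and a single channel's statistics.

```python
import math


def frozen_batch_norm(x, weight, bias, running_mean, running_var, eps=1e-5):
    """Apply fixed batch-norm statistics and affine parameters to one value."""
    # eps keeps the division finite when running_var is zero.
    scale = weight / math.sqrt(running_var + eps)
    shift = bias - running_mean * scale
    return x * scale + shift


# A channel whose stored variance is exactly zero still yields a finite result.
print(frozen_batch_norm(1.0, weight=1.0, bias=0.0, running_mean=0.0, running_var=0.0))
```

Because the statistics and affine parameters never change, the whole layer reduces to one multiply and one add per element.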

class mmpose.models.utils.GAUEncoder(in_token_dims, out_token_dims, expansion_factor=2, s=128, eps=1e-05, dropout_rate=0.0, drop_path=0.0, act_fn='SiLU', bias=False, pos_enc: str = 'none', spatial_dim: int = 1)[source]¶

Gated Attention Unit (GAU) Encoder.

Parameters

in_token_dims (int) – The input token dimension.
out_token_dims (int) – The output token dimension.
expansion_factor (int, optional) – The expansion factor of the intermediate token dimension. Defaults to 2.
s (int, optional) – The self-attention feature dimension. Defaults to 128.
eps (float, optional) – The minimum value in clamp. Defaults to 1e-5.
dropout_rate (float, optional) – The dropout rate. Defaults to 0.0.
drop_path (float, optional) – The drop path rate. Defaults to 0.0.
act_fn (str, optional) –
The activation function, which should be one of the following options:
- 'ReLU': ReLU activation.
- 'SiLU': SiLU activation.
Defaults to 'SiLU'.
bias (bool, optional) – Whether to use bias in linear layers. Defaults to False.
pos_enc (str, optional) – The type of positional encoding to use, such as rotary position embedding. Defaults to 'none'.
spatial_dim (int, optional) – The spatial dimension of inputs. Defaults to 1.

Reference: Transformer Quality in Linear Time

forward(x, mask=None, pos_enc=None)[source]¶: Forward function.

class mmpose.models.utils.PatchEmbed(in_channels=3, embed_dims=768, conv_type='Conv2d', kernel_size=16, stride=16, padding='corner', dilation=1, bias=True, norm_cfg=None, input_size=None, init_cfg=None)[source]¶

Image to Patch Embedding.

We use a conv layer to implement PatchEmbed.

Parameters

in_channels (int) – The num of input channels. Default: 3
embed_dims (int) – The dimensions of embedding. Default: 768
conv_type (str) – The type of the embedding conv layer. Default: "Conv2d".
kernel_size (int) – The kernel_size of embedding conv. Default: 16.
stride (int) – The slide stride of embedding conv. If None, it is set to kernel_size. Default: 16.
padding (int | tuple | string) – The padding length of embedding conv. When it is a string, it means the mode of adaptive padding; "same" and "corner" are supported. Default: "corner".
dilation (int) – The dilation rate of embedding conv. Default: 1.
bias (bool) – Bias of embed conv. Default: True.
norm_cfg (dict, optional) – Config dict for normalization layer. Default: None.
input_size (int | tuple | None) – The size of input, which will be used to calculate the out size. Only works when dynamic_size is False. Default: None.
init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None.

forward(x)[source]¶

Parameters

x (Tensor) – Has shape (B, C, H, W). In most cases, C is 3.

Returns

Contains merged results and its spatial shape.

- x (Tensor): Has shape (B, out_h * out_w, embed_dims).
- out_size (tuple[int]): Spatial shape of x, arranged as (out_h, out_w).

Return type

tuple
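To make the relation between the input size and (out_h, out_w) concrete, here is a minimal sketch of the size arithmetic in plain Python. It assumes that adaptive padding enlarges each side just enough for the strided windows to cover the whole input, followed by the standard convolution output-size formula; corner_pad and patch_grid are hypothetical helpers, not part of mmpose.

```python
import math


def corner_pad(size, kernel, stride, dilation=1):
    """Extra padding needed so that strided windows cover the whole side."""
    out = math.ceil(size / stride)
    return max((out - 1) * stride + (kernel - 1) * dilation + 1 - size, 0)


def patch_grid(h, w, kernel=16, stride=16, dilation=1):
    """Spatial shape (out_h, out_w) of the patch grid after the embedding conv."""
    sizes = []
    for size in (h, w):
        padded = size + corner_pad(size, kernel, stride, dilation)
        sizes.append((padded - dilation * (kernel - 1) - 1) // stride + 1)
    return tuple(sizes)


out_h, out_w = patch_grid(224, 224)
num_tokens = out_h * out_w  # length of the token dimension of x
print((out_h, out_w), num_tokens)
```

Under these assumptions, an input whose side is not a multiple of the stride is padded up to the next multiple, so no border pixels are dropped.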

class mmpose.models.utils.RTMCCBlock(num_token, in_token_dims, out_token_dims, expansion_factor=2, s=128, eps=1e-05, dropout_rate=0.0, drop_path=0.0, attn_type='self-attn', act_fn='SiLU', bias=False, use_rel_bias=True, pos_enc=False)[source]¶

Gated Attention Unit (GAU) in RTMBlock.

Parameters

num_token (int) – The number of tokens.
in_token_dims (int) – The input token dimension.
out_token_dims (int) – The output token dimension.
expansion_factor (int, optional) – The expansion factor of the intermediate token dimension. Defaults to 2.
s (int, optional) – The self-attention feature dimension. Defaults to 128.
eps (float, optional) – The minimum value in clamp. Defaults to 1e-5.
dropout_rate (float, optional) – The dropout rate. Defaults to 0.0.
drop_path (float, optional) – The drop path rate. Defaults to 0.0.
attn_type (str, optional) –
Type of attention, which should be one of the following options:
- 'self-attn': Self-attention.
- 'cross-attn': Cross-attention.
Defaults to 'self-attn'.
act_fn (str, optional) –
The activation function, which should be one of the following options:
- 'ReLU': ReLU activation.
- 'SiLU': SiLU activation.
Defaults to 'SiLU'.
bias (bool, optional) – Whether to use bias in linear layers. Defaults to False.
use_rel_bias (bool, optional) – Whether to use relative bias. Defaults to True.
pos_enc (bool, optional) – Whether to use rotary position embedding. Defaults to False.

Reference: Transformer Quality in Linear Time

forward(x)[source]¶: Forward function.

rel_pos_bias(seq_len, k_len=None)[source]¶: Add relative position bias.

class mmpose.models.utils.RepVGGBlock(in_channels: int, out_channels: int, stride: int = 1, padding: int = 1, dilation: int = 1, groups: int = 1, padding_mode: str = 'zeros', norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'ReLU'}, without_branch_norm: bool = True, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶

A block in RepVGG architecture, supporting optional normalization in the identity branch.

This block consists of 3x3 and 1x1 convolutions, with an optional identity shortcut branch that includes normalization.

Parameters

in_channels (int) – The input channels of the block.
out_channels (int) – The output channels of the block.
stride (int) – The stride of the block. Defaults to 1.
padding (int) – The padding of the block. Defaults to 1.
dilation (int) – The dilation of the block. Defaults to 1.
groups (int) – The groups of the block. Defaults to 1.
padding_mode (str) – The padding mode of the block. Defaults to 'zeros'.
norm_cfg (dict) – The config dict for normalization layers. Defaults to dict(type='BN').
act_cfg (dict) – The config dict for activation layers. Defaults to dict(type='ReLU').
without_branch_norm (bool) – Whether to skip branch_norm. Defaults to True.
init_cfg (dict) – The config dict for initialization. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward pass through the RepVGG block.

The output is the sum of 3x3 and 1x1 convolution outputs, along with the normalized identity branch output, followed by activation.

Parameters: x (Tensor) – The input tensor.
Returns: The output tensor.
Return type: Tensor

get_equivalent_kernel_bias()[source]¶

Derives the equivalent kernel and bias in a differentiable way.

Returns: Equivalent kernel and bias.
Return type: tuple

switch_to_deploy(test_cfg: Optional[Dict] = None)[source]¶

Switches the block to deployment mode.

In deployment mode, the block uses a single convolution operation derived from the equivalent kernel and bias, replacing the original branches. This reduces computational complexity during inference.
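The idea behind the equivalent kernel and bias can be checked on a tiny single-channel case. The sketch below is a simplified illustration in plain Python, not the block's implementation: it assumes normalization has already been folded into each branch, uses an unnormalized identity branch, and all helper names (conv3x3, pad_1x1_to_3x3, fuse_branches) are hypothetical.

```python
def conv3x3(image, kernel, bias=0.0):
    """Single-channel 3x3 convolution with zero padding of 1 and stride 1."""
    h, w = len(image), len(image[0])
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di][dj] * image[y][x]
    return out


def pad_1x1_to_3x3(value):
    """Embed a 1x1 kernel at the centre of a zero 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, value, 0.0], [0.0, 0.0, 0.0]]


def fuse_branches(k3, b3, k1, b1, with_identity=True):
    """Sum the 3x3, padded 1x1 and identity kernels into one 3x3 kernel."""
    k1_padded = pad_1x1_to_3x3(k1)
    identity = pad_1x1_to_3x3(1.0 if with_identity else 0.0)
    kernel = [[k3[i][j] + k1_padded[i][j] + identity[i][j] for j in range(3)]
              for i in range(3)]
    return kernel, b3 + b1


image = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
k3 = [[0.5, 0.0, -0.5], [1.0, 0.25, 0.0], [0.0, -1.0, 0.5]]
b3, k1, b1 = 0.1, 2.0, -0.3

# Multi-branch form: 3x3 conv + 1x1 conv + identity, summed element-wise.
out3 = conv3x3(image, k3, b3)
branches = [[out3[i][j] + (k1 * image[i][j] + b1) + image[i][j]
             for j in range(3)] for i in range(3)]

# Deploy form: a single convolution with the equivalent kernel and bias.
kernel, bias = fuse_branches(k3, b3, k1, b1)
fused = conv3x3(image, kernel, bias)

max_diff = max(abs(branches[i][j] - fused[i][j])
               for i in range(3) for j in range(3))
print(max_diff < 1e-9)
```

Since convolution is linear, the two forms agree up to floating-point rounding, which is what allows the branches to be replaced by one convolution at inference time.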

class mmpose.models.utils.SinePositionalEncoding(out_channels: int, spatial_dim: int = 1, temperature: int = 100000.0, learnable: bool = False, eval_size: Optional[Union[int, Sequence[int]]] = None)[source]¶

Sine Positional Encoding Module. This module implements sine positional encoding, which is commonly used in transformer-based models to add positional information to the input sequences. It uses sine and cosine functions to create positional embeddings for each element in the input sequence.

Parameters

out_channels (int) – The number of features in the input sequence.
temperature (int) – A temperature parameter used to scale the positional encodings. Defaults to 1e5.
spatial_dim (int) – The number of spatial dimensions of the input feature. 1 represents sequence data and 2 represents grid data. Defaults to 1.
learnable (bool) – Whether to optimize the frequency base. Defaults to False.
eval_size (int, tuple[int], optional) – The fixed spatial size of input features. Defaults to None.
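For readers unfamiliar with the technique, a generic sine/cosine positional encoding for sequence data (spatial_dim of 1) can be sketched in plain Python as follows. This is the textbook formulation, not this module's code: the frequency schedule and the ordering of the sine and cosine halves are assumptions, and sine_pos_encoding is a hypothetical helper.

```python
import math


def sine_pos_encoding(num_positions, out_channels, temperature):
    """Return a table with one row of sine and cosine features per position."""
    half = out_channels // 2
    # Geometric frequency schedule: 1 down to roughly 1 / temperature.
    freqs = [temperature ** (-i / half) for i in range(half)]
    table = []
    for pos in range(num_positions):
        angles = [pos * f for f in freqs]
        table.append([math.sin(a) for a in angles] +
                     [math.cos(a) for a in angles])
    return table


table = sine_pos_encoding(num_positions=4, out_channels=8, temperature=1e5)
print(len(table), len(table[0]))
```

A larger temperature stretches the lowest frequencies, so distant positions remain distinguishable in the slowest-varying channels.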

static apply_additional_pos_enc(feature: torch.Tensor, pos_enc: torch.Tensor, spatial_dim: int = 1)[source]¶

Apply additional positional encoding to input features.

Parameters

feature (Tensor) – Input feature tensor.
pos_enc (Tensor) – Positional encoding tensor.
spatial_dim (int) – Spatial dimension of input features.

static apply_rotary_pos_enc(feature: torch.Tensor, pos_enc: torch.Tensor, spatial_dim: int = 1)[source]¶

Apply rotary positional encoding to input features.

Parameters

feature (Tensor) – Input feature tensor.
pos_enc (Tensor) – Positional encoding tensor.
spatial_dim (int) – Spatial dimension of input features.

forward(*args, **kwargs)[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

generate_pos_encoding(size: Optional[Union[int, Sequence[int]]] = None, position: Optional[torch.Tensor] = None)[source]¶

Generate positional encoding for input features.

Parameters

size (int or tuple[int]) – Size of the input features. Required if position is None.
position (Tensor, optional) – Position tensor. Required if size is None.

mmpose.models.utils.check_and_update_config(neck: Optional[Union[mmengine.config.config.Config, mmengine.config.config.ConfigDict]], head: Union[mmengine.config.config.Config, mmengine.config.config.ConfigDict]) → Tuple[Optional[Dict], Dict][source]¶

Check and update the configuration of the head and neck components.

Parameters

neck (Optional[ConfigType]) – Configuration for the neck component.
head (ConfigType) – Configuration for the head component.

Returns

Updated configurations for the neck and head components.

Return type

Tuple[Optional[Dict], Dict]

mmpose.models.utils.filter_scores_and_topk(scores, score_thr, topk, results=None)[source]¶

Filter results using score threshold and topk candidates.

Parameters

scores (Tensor) – The scores, shape (num_bboxes, K).
score_thr (float) – The score filter threshold.
topk (int) – The number of topk candidates.
results (dict or list or Tensor, optional) – The results to which the filtering rule is to be applied. The shape of each item is (num_bboxes, N).

Returns

Filtered results:

- scores (Tensor): The scores after being filtered, shape (num_bboxes_filtered, ).
- labels (Tensor): The class labels, shape (num_bboxes_filtered, ).
- anchor_idxs (Tensor): The anchor indexes, shape (num_bboxes_filtered, ).
- filtered_results (dict or list or Tensor, optional): The filtered results. The shape of each item is (num_bboxes_filtered, N).

Return type

tuple
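The filtering rule can be pictured with nested lists instead of tensors. The following is an illustrative sketch only: it assumes a strict "greater than" threshold and descending-score ordering, omits the optional results argument, and filter_scores_and_topk_sketch is a hypothetical stand-in rather than the library function.

```python
def filter_scores_and_topk_sketch(scores, score_thr, topk):
    """Keep (box, class) pairs above the threshold, then the top-k by score."""
    candidates = [
        (score, row, label)
        for row, row_scores in enumerate(scores)
        for label, score in enumerate(row_scores)
        if score > score_thr
    ]
    # Highest scores first; sorted() is stable, so ties keep their order.
    kept = sorted(candidates, key=lambda c: c[0], reverse=True)[:topk]
    kept_scores = [c[0] for c in kept]
    labels = [c[2] for c in kept]
    anchor_idxs = [c[1] for c in kept]
    return kept_scores, labels, anchor_idxs


# Three boxes (rows) scored over two classes (columns).
scores = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.3]]
print(filter_scores_and_topk_sketch(scores, score_thr=0.25, topk=3))
```

Each kept entry carries both the class label (column) and the anchor index (row), which matches the three per-candidate outputs listed above.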

mmpose.models.utils.inverse_sigmoid(x: torch.Tensor, eps: float = 0.001) → torch.Tensor[source]¶

Inverse function of sigmoid.

Parameters

x (Tensor) – The tensor to do the inverse.
eps (float) – Epsilon to avoid numerical overflow. Defaults to 1e-3.

Returns

The result of applying the inverse sigmoid function to x, with the same shape as the input.

Return type

Tensor
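The role of eps is easiest to see on scalars. The sketch below is a plain-Python assumption about the usual clamped formulation (clamp x to [0, 1], then clamp numerator and denominator to at least eps), not the library source; inverse_sigmoid_scalar and sigmoid are hypothetical helpers.

```python
import math


def inverse_sigmoid_scalar(x, eps=1e-3):
    """Logit of x, with clamping so that x = 0 or x = 1 stays finite."""
    x = min(max(x, 0.0), 1.0)
    numerator = max(x, eps)
    denominator = max(1.0 - x, eps)
    return math.log(numerator / denominator)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Away from the boundaries the function inverts the sigmoid.
print(abs(sigmoid(inverse_sigmoid_scalar(0.3)) - 0.3) < 1e-12)
# At the boundaries the result is bounded instead of infinite.
print(inverse_sigmoid_scalar(0.0), inverse_sigmoid_scalar(1.0))
```

Without the clamping, inputs of exactly 0 or 1 would produce minus or plus infinity.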

mmpose.models.utils.nchw_to_nlc(x)[source]¶

Flatten [N, C, H, W] shape tensor to [N, L, C] shape tensor.

Parameters: x (Tensor) – The input tensor of shape [N, C, H, W] before conversion.
Returns: The output tensor of shape [N, L, C] after conversion.
Return type: Tensor

mmpose.models.utils.nlc_to_nchw(x, hw_shape)[source]¶

Convert [N, L, C] shape tensor to [N, C, H, W] shape tensor.

Parameters

x (Tensor) – The input tensor of shape [N, L, C] before conversion.
hw_shape (Sequence[int]) – The height and width of output feature map.

Returns

The output tensor of shape [N, C, H, W] after conversion.

Return type

Tensor
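The two layouts are related by L = H * W, with the token at index h * W + w holding the channel vector of pixel (h, w). The sketch below shows the round trip on nested lists; it assumes row-major flattening, and the *_lists helpers are hypothetical pure-Python analogues rather than the tensor functions documented above.

```python
def nchw_to_nlc_lists(x):
    """[N][C][H][W] nested lists -> [N][L][C] with L = H * W (row-major)."""
    out = []
    for sample in x:
        c, h, w = len(sample), len(sample[0]), len(sample[0][0])
        out.append([[sample[ch][i][j] for ch in range(c)]
                    for i in range(h) for j in range(w)])
    return out


def nlc_to_nchw_lists(x, hw_shape):
    """[N][L][C] nested lists -> [N][C][H][W], given hw_shape = (H, W)."""
    h, w = hw_shape
    out = []
    for sample in x:
        assert len(sample) == h * w, 'L must equal H * W'
        c = len(sample[0])
        out.append([[[sample[i * w + j][ch] for j in range(w)]
                     for i in range(h)] for ch in range(c)])
    return out


# One sample with C=2, H=2, W=3.
x = [[[[0, 1, 2], [3, 4, 5]], [[10, 11, 12], [13, 14, 15]]]]
tokens = nchw_to_nlc_lists(x)
print(len(tokens[0]), tokens[0][5])
print(nlc_to_nchw_lists(tokens, (2, 3)) == x)
```

The height and width cannot be recovered from L alone, which is why the reverse conversion needs hw_shape.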

mmpose.models.utils.rope(x, dim)[source]¶

Applies Rotary Position Embedding to input tensor.

Parameters

x (torch.Tensor) – Input tensor.
dim (int | list[int]) – The spatial dimension(s) to apply rotary position embedding.

Returns

The tensor after applying rotary position embedding.

Return type

torch.Tensor

Reference: RoFormer: Enhanced Transformer with Rotary Position Embedding
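The key property of rotary position embedding is that the dot product of two rotated vectors depends only on their relative position. A minimal plain-Python sketch of that property follows; the interleaved pairing of features and the base of 10000 are assumptions taken from the generic RoFormer formulation, and rotate_pairs is a hypothetical helper, not this function's implementation.

```python
import math


def rotate_pairs(vec, position, base=10000.0):
    """Rotate consecutive feature pairs of vec by position-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]

# Both pairs of positions are two steps apart, so the scores should match.
near = dot(rotate_pairs(q, 3), rotate_pairs(k, 1))
far = dot(rotate_pairs(q, 7), rotate_pairs(k, 5))
print(abs(near - far) < 1e-9)
```

Because each pair is only rotated, the norm of the vector is unchanged as well.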

mmpose.datasets¶

class mmpose.datasets.CombinedDataset(metainfo: dict, datasets: list, pipeline: List[Union[dict, Callable]] = [], sample_ratio_factor: Optional[List[float]] = None, **kwargs)[source]¶

A wrapper of combined dataset.

Parameters

metainfo (dict) – The meta information of combined dataset.
datasets (list) – The configs of datasets to be combined.
pipeline (list, optional) – Processing pipeline. Defaults to [].
sample_ratio_factor (list, optional) – A list of sampling ratio factors for each dataset. Defaults to None.

full_init()[source]¶: Fully initialize all sub datasets.

get_data_info(idx: int) → dict[source]¶

Get annotation by index.

Parameters: idx (int) – Global index of CombinedDataset.
Returns: The idx-th annotation of the datasets.
Return type: dict
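One common way to resolve a global index into a sub-dataset and a local index is a cumulative-length lookup. The sketch below illustrates that idea in plain Python under the assumption that the sub-datasets are simply concatenated in order; locate_sample is a hypothetical helper and not necessarily how CombinedDataset does it.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_sample(idx, lengths):
    """Map a global index to (sub-dataset index, index within that dataset)."""
    ends = list(accumulate(lengths))  # exclusive end offset of each dataset
    if not 0 <= idx < ends[-1]:
        raise IndexError(f'index {idx} out of range for {ends[-1]} samples')
    subset = bisect_right(ends, idx)
    start = ends[subset - 1] if subset > 0 else 0
    return subset, idx - start


lengths = [3, 5, 2]  # sizes of three hypothetical sub-datasets
print(locate_sample(0, lengths), locate_sample(3, lengths), locate_sample(9, lengths))
```

The binary search keeps the lookup cheap even when many datasets are combined.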

property metainfo¶

Get meta information of dataset.

Returns: Meta information collected from BaseDataset.METAINFO, the annotation file and the metainfo argument during instantiation.
Return type: dict

prepare_data(idx: int) → Any[source]¶

Get data processed by self.pipeline. The source dataset depends on the index.

Parameters: idx (int) – The index of data_info.
Returns: Depends on self.pipeline.
Return type: Any

class mmpose.datasets.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, round_up: bool = True, seed: Optional[int] = None)[source]¶

Multi-Source Sampler. According to the sampling ratio, sample data from different datasets to form batches.

Parameters

dataset (Sized) – The dataset.
batch_size (int) – Size of mini-batch.
source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch.
shuffle (bool) – Whether to shuffle the dataset or not. Defaults to True.
round_up (bool) – Whether to add extra samples to make the number of samples evenly divisible by the world size. Defaults to True.
seed (int, optional) – Random seed. If None, set a random seed. Defaults to None.
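To see how batch_size and source_ratio interact, the per-source sample count of one mini-batch can be worked out by hand. The sketch below is an assumption about one reasonable rounding rule (truncate each share, then give the remainder to the first source); samples_per_source is a hypothetical helper, not the sampler's API.

```python
def samples_per_source(batch_size, source_ratio):
    """Split one mini-batch among the sources according to source_ratio."""
    total = sum(source_ratio)
    counts = [int(batch_size * ratio / total) for ratio in source_ratio]
    # Assign whatever truncation left over to the first source so that the
    # counts always add up to batch_size.
    counts[0] = batch_size - sum(counts[1:])
    return counts


print(samples_per_source(8, [3, 1]))
print(samples_per_source(10, [1, 1, 1]))
```

Whatever rounding rule is used, the counts must sum to batch_size so that every mini-batch has the same size.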

set_epoch(epoch: int) → None[source]¶: Compatible with epoch-based runners.

mmpose.datasets.build_dataset(cfg, default_args=None)[source]¶

Build a dataset from config dict.

Parameters

cfg (dict) – Config dict. It should at least contain the key "type".
default_args (dict, optional) – Default initialization arguments. Default: None.

Returns

The constructed dataset.

Return type

Dataset

datasets¶

class mmpose.datasets.datasets.base.BaseCocoStyleDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

Base class for COCO-style datasets.

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Supports using only the first few data in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra cycles to get a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.
sample_interval (int, optional) – The sample interval of the dataset. Default: 1.
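The difference between the two data_mode values can be illustrated with a few toy annotations. This is a conceptual sketch in plain Python, not the dataset's loading code: build_samples is a hypothetical helper, and the 'img_id' and 'person' keys are assumed names used only for this illustration.

```python
from collections import defaultdict


def build_samples(instances, data_mode='topdown'):
    """Group instance annotations into data samples for the given mode."""
    if data_mode == 'topdown':
        # One data sample per instance.
        return [[inst] for inst in instances]
    if data_mode == 'bottomup':
        # One data sample per image, holding all of its instances.
        by_image = defaultdict(list)
        for inst in instances:
            by_image[inst['img_id']].append(inst)
        return list(by_image.values())
    raise ValueError(f'unknown data_mode: {data_mode}')


instances = [
    {'img_id': 1, 'person': 'a'},
    {'img_id': 1, 'person': 'b'},
    {'img_id': 2, 'person': 'c'},
]
print(len(build_samples(instances, 'topdown')))
print(len(build_samples(instances, 'bottomup')))
```

So the same annotation file yields as many samples as instances in 'topdown' mode, but only as many samples as images in 'bottomup' mode.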

filter_data() → List[dict][source]

Filter annotations according to filter_cfg. By default, the full data_list is returned.

If 'bbox_score_thr' is in filter_cfg, annotations with bbox_score below the threshold bbox_score_thr will be filtered out.
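The filtering rule described above amounts to a simple threshold test per annotation. The following plain-Python sketch shows it on toy data; keeping entries that have no 'bbox_score' key is an assumption, and filter_by_bbox_score is a hypothetical helper rather than the method itself.

```python
def filter_by_bbox_score(data_list, filter_cfg=None):
    """Drop annotations whose bbox_score is below filter_cfg['bbox_score_thr']."""
    if not filter_cfg or 'bbox_score_thr' not in filter_cfg:
        return data_list  # no filtering configured: return everything
    thr = filter_cfg['bbox_score_thr']
    return [
        info for info in data_list
        # Assumption: entries without a score (e.g. ground truth) are kept.
        if 'bbox_score' not in info or info['bbox_score'] >= thr
    ]


data_list = [
    {'id': 0, 'bbox_score': 0.9},
    {'id': 1, 'bbox_score': 0.2},
    {'id': 2},
]
kept = filter_by_bbox_score(data_list, {'bbox_score_thr': 0.3})
print([info['id'] for info in kept])
```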

get_data_info(idx: int) → dict[source]

Get data info by index.

Parameters: idx (int) – Index of data info.
Returns: Data info.
Return type: dict

load_data_list() → List[dict][source]: Load data list from COCO annotation file or person detection result file.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw COCO annotation of an instance.

Parameters

raw_data_info (dict) –
Raw data information loaded from ann_file. It should have the following contents:
- 'raw_ann_info': Raw annotation of an instance.
- 'raw_img_info': Raw information of the image that contains the instance.

Returns

Parsed instance annotation.

Return type

dict | None

prepare_data(idx) → Any[source]

Get data processed by self.pipeline.

BaseCocoStyleDataset overrides this method from mmengine.dataset.BaseDataset to add the metainfo into the data_info before it is passed to the pipeline.

Parameters: idx (int) – The index of data_info.
Returns: Depends on self.pipeline.
Return type: Any

class mmpose.datasets.datasets.base.BaseMocapDataset(ann_file: str = '', seq_len: int = 1, multiple_target: int = 0, causal: bool = True, subset_frac: float = 1.0, camera_param_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]

Base class for 3D body datasets.

Parameters

ann_file (str) – Annotation file path. Default: ''.
seq_len (int) – Number of frames in a sequence. Default: 1.
multiple_target (int) – If larger than 0, merge every multiple_target sequences together. Default: 0.
causal (bool) – If set to True, the rightmost input frame will be the target frame. Otherwise, the middle input frame will be the target frame. Default: True.
subset_frac (float) – The fraction by which to reduce the dataset size. If set to 1, the dataset size is not reduced. Default: 1.
camera_param_file (str) – Cameras' parameters file. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.
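
The effect of causal on the choice of target frame can be illustrated with a small sketch. The helper `target_frame_index` is hypothetical, and using integer division to pick the middle frame is an assumption of this example.

```python
def target_frame_index(seq_len: int, causal: bool = True) -> int:
    """Position of the target frame inside an input sequence."""
    # causal=True: the last (rightmost) frame is the target.
    # causal=False: the middle frame is the target.
    return seq_len - 1 if causal else seq_len // 2


last = target_frame_index(27, causal=True)
middle = target_frame_index(27, causal=False)
```

With a 27-frame sequence, the causal setting targets frame 26 and the non-causal setting targets frame 13, so the non-causal model sees 13 frames of context on each side.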

get_camera_param(imgname)[source]

Get camera parameters of a frame by its image name.

Override this method to specify how to get camera parameters.

get_data_info(idx: int) → dict[source]

Get data info by index.

Parameters: idx (int) – Index of data info.
Returns: Data info.
Return type: dict

get_sequence_indices() → List[List[int]][source]

Build sequence indices.

The default method creates sample indices in which each sample is a single frame (i.e. seq_len=1). Override this method in a subclass to define how frames are sampled to form data samples.

Outputs:

sample_indices: The frame indices of each sample. For a sample, all frames will be treated as an input sequence, and the ground-truth pose of the last frame will be the target.
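
As a rough illustration, the single-frame default and one possible override can be written as plain functions. Both function names are hypothetical, and the sliding-window scheme is just one example of how a subclass might group frames, not something this class provides.

```python
from typing import List


def single_frame_indices(num_frames: int) -> List[List[int]]:
    # Default behaviour described above: every sample is one frame.
    return [[i] for i in range(num_frames)]


def sliding_window_indices(num_frames: int, seq_len: int,
                           step: int = 1) -> List[List[int]]:
    # Hypothetical override: consecutive windows of seq_len frames that
    # fit entirely inside the video.
    return [
        list(range(start, start + seq_len))
        for start in range(0, num_frames - seq_len + 1, step)
    ]


default_samples = single_frame_indices(3)
window_samples = sliding_window_indices(5, seq_len=3)
```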

load_data_list() → List[dict][source]

Load data list from COCO annotation file or person detection result file.

prepare_data(idx) → Any[source]

Get data processed by self.pipeline.

BaseMocapDataset overrides this method from mmengine.dataset.BaseDataset to add the metainfo into the data_info before it is passed to the pipeline.

Parameters: idx (int) – The index of data_info.
Returns: Depends on self.pipeline.
Return type: Any

class mmpose.datasets.datasets.body.AicDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

AIC dataset for pose estimation.

“AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper

AIC keypoints:

"right_shoulder",
"right_elbow",
"right_wrist",
"left_shoulder",
"left_elbow",
"left_wrist",
"right_hip",
"right_knee",
"right_ankle",
"left_hip",
"left_knee",
"left_ankle",
"head_top",
"neck"

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.body.CocoDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

COCO dataset for pose estimation.

“Microsoft COCO: Common Objects in Context”, ECCV’2014. More details can be found in the paper.

COCO keypoints:

'nose',
'left_eye',
'right_eye',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle'
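
The position of each name in the list above is its keypoint index. As a small sketch, the snippet below builds a name-to-index lookup and derives left/right pairs purely from the naming pattern. The variable names are hypothetical, and deriving pairs from the `left_`/`right_` prefixes is a convenience of this example rather than how the dataset metainfo is defined.

```python
COCO_KEYPOINTS = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]

# Keypoint name -> index in the list above.
name_to_index = {name: i for i, name in enumerate(COCO_KEYPOINTS)}

# Pair every left_* keypoint with its right_* counterpart; such pairs are
# what a horizontal-flip augmentation has to swap.
flip_pairs = [
    (name_to_index[name], name_to_index[name.replace('left_', 'right_', 1)])
    for name in COCO_KEYPOINTS
    if name.startswith('left_')
]
```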

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.
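
In config files these arguments are usually supplied as a dictionary. The following is a hedged sketch that assumes the MMEngine-style convention of selecting the class through a `type` key; every path is a placeholder, and the empty pipelines stand in for real transform lists.

```python
# Placeholder paths; replace them with the layout of your own COCO copy.
data_root = 'data/coco/'

train_dataset = dict(
    type='CocoDataset',
    data_root=data_root,
    data_mode='topdown',
    ann_file='annotations/person_keypoints_train2017.json',
    data_prefix=dict(img='train2017/'),
    pipeline=[],  # fill in the training transforms
)

val_dataset = dict(
    type='CocoDataset',
    data_root=data_root,
    data_mode='topdown',
    ann_file='annotations/person_keypoints_val2017.json',
    # Only consulted because test_mode=True below.
    bbox_file='path/to/person_detection_results.json',
    data_prefix=dict(img='val2017/'),
    test_mode=True,
    pipeline=[],  # fill in the evaluation transforms
)
```

The training config omits bbox_file because, as described above, detected boxes are ignored when test_mode is False.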

class mmpose.datasets.datasets.body.CrowdPoseDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

CrowdPose dataset for pose estimation.

“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.

CrowdPose keypoints:

'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle',
'top_head',
'neck'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.body.ExlposeDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

ExLPose dataset for pose estimation.

“Human Pose Estimation in Extremely Low-Light Conditions”, CVPR’2023. More details can be found in the paper.

ExLPose keypoints:

'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle',
'head',
'neck'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.body.HumanArt21Dataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

Human-Art dataset for pose estimation with 21 keypoints.

“Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes”, CVPR’2023. More details can be found in the paper.

Human-Art keypoints:

'nose',
'left_eye',
'right_eye',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle',
'left_finger',
'right_finger',
'left_toe',
'right_toe'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw COCO annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict | None

class mmpose.datasets.datasets.body.HumanArtDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

Human-Art dataset for pose estimation with 17 keypoints.

“Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes”, CVPR’2023. More details can be found in the paper.

Human-Art keypoints:

'nose',
'left_eye',
'right_eye',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.body.JhmdbDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

JHMDB dataset for pose estimation.

“Towards understanding action recognition”, ICCV’2013. More details can be found in the paper.

sub-JHMDB keypoints:

"neck",
"belly",
"head",
"right_shoulder",
"left_shoulder",
"right_hip",
"left_hip",
"right_elbow",
"left_elbow",
"right_knee",
"left_knee",
"right_wrist",
"left_wrist",
"right_ankle",
"left_ankle"

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw COCO annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.body.MhpDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

MHPv2.0 dataset for pose estimation.

“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper

MHP keypoints:

"right ankle",
"right knee",
"right hip",
"left hip",
"left knee",
"left ankle",
"pelvis",
"thorax",
"upper neck",
"head top",
"right wrist",
"right elbow",
"right shoulder",
"left shoulder",
"left elbow",
"left wrist"

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.body.MpiiDataset(ann_file: str = '', bbox_file: Optional[str] = None, headbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]

MPII dataset for pose estimation.

“2D Human Pose Estimation: New Benchmark and State of the Art Analysis”, CVPR’2014. More details can be found in the paper.

MPII keypoints:

'right_ankle',
'right_knee',
'right_hip',
'left_hip',
'left_knee',
'left_ankle',
'pelvis',
'thorax',
'upper_neck',
'head_top',
'right_wrist',
'right_elbow',
'right_shoulder',
'left_shoulder',
'left_elbow',
'left_wrist'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting applies only to evaluation, i.e., it is ignored when test_mode is False. Default: None.
headbox_file (str, optional) – The path of mpii_gt_val.mat, which provides the head box information used for PCKh. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only the first few entries of the annotation file, to facilitate training or testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed and the annotation file does not need to be loaded. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.body.MpiiTrbDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

MPII-TRB dataset for pose estimation.

“TRB: A Novel Triplet Representation for Understanding 2D Human Body”, ICCV’2019. More details can be found in the paper.

MPII-TRB keypoints:

'left_shoulder'
'right_shoulder'
'left_elbow'
'right_elbow'
'left_wrist'
'right_wrist'
'left_hip'
'right_hip'
'left_knee'
'right_knee'
'left_ankle'
'right_ankle'
'head'
'neck'

'right_neck'
'left_neck'
'medial_right_shoulder'
'lateral_right_shoulder'
'medial_right_bow'
'lateral_right_bow'
'medial_right_wrist'
'lateral_right_wrist'
'medial_left_shoulder'
'lateral_left_shoulder'
'medial_left_bow'
'lateral_left_bow'
'medial_left_wrist'
'lateral_left_wrist'
'medial_right_hip'
'lateral_right_hip'
'medial_right_knee'
'lateral_right_knee'
'medial_right_ankle'
'lateral_right_ankle'
'medial_left_hip'
'lateral_left_hip'
'medial_left_knee'
'lateral_left_knee'
'medial_left_ankle'
'lateral_left_ankle'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.
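As a rough illustration of the indices option described above, the sketch below selects a subset from a list of samples. The helper select_subset and the sample dictionaries are hypothetical stand-ins, not part of mmpose or mmengine, and only a non-negative int or a sequence of positions is handled here.

```python
from typing import List, Optional, Sequence, Union


def select_subset(data_list: List[dict],
                  indices: Optional[Union[int, Sequence[int]]] = None
                  ) -> List[dict]:
    """Illustrative stand-in for the `indices` option (not library code)."""
    if indices is None:
        # Default: keep every entry.
        return list(data_list)
    if isinstance(indices, int):
        # A non-negative int keeps the first `indices` entries.
        return data_list[:indices]
    # A sequence keeps exactly the listed positions, in the given order.
    return [data_list[i] for i in indices]


samples = [{'img_id': i} for i in range(10)]
first_three = select_subset(samples, 3)
picked = select_subset(samples, [0, 4, 9])
```

Passing a small integer is a convenient way to smoke-test a pipeline before running on the full annotation file.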

class mmpose.datasets.datasets.body.OCHumanDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

OCHuman dataset for pose estimation.

“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.

The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images. It is designed for validation and testing. To evaluate on OCHuman, train the model on the COCO training set, then use OCHuman to test the model's robustness to occlusion.

OCHuman keypoints (same as COCO):

'nose',
'left_eye',
'right_eye',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle'
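The keypoint order above determines the index of each joint in an annotation. A minimal sketch of a name-to-index lookup follows, together with a helper that pairs left and right joints by name. The names OCHUMAN_KEYPOINTS, KEYPOINT_INDEX and flip_pairs are illustrative, and the name-based pairing is an assumption made for this example, not how the library derives its metadata.

```python
OCHUMAN_KEYPOINTS = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]

# Map each keypoint name to its position in the list above.
KEYPOINT_INDEX = {name: i for i, name in enumerate(OCHUMAN_KEYPOINTS)}


def flip_pairs(names):
    """Pair each 'left_*' keypoint index with its 'right_*' counterpart."""
    index = {name: i for i, name in enumerate(names)}
    pairs = []
    for name, i in index.items():
        if name.startswith('left_'):
            partner = 'right_' + name[len('left_'):]
            if partner in index:
                pairs.append((i, index[partner]))
    return pairs


pairs = flip_pairs(OCHUMAN_KEYPOINTS)
```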

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.
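Since OCHuman is meant for evaluation, a dataset entry in a config would typically set test_mode=True. The sketch below shows one possible config-style dictionary. All paths and file names are hypothetical placeholders, and bbox_source is an illustrative helper that only restates the bbox_file rule from the parameter list; it is not a library function.

```python
# Hypothetical paths; adjust them to the local data layout.
val_dataset = dict(
    type='OCHumanDataset',
    data_root='data/ochuman/',                # assumed location
    ann_file='annotations/ochuman_val.json',  # assumed file name
    data_prefix=dict(img='images/'),
    data_mode='topdown',
    bbox_file=None,   # None: ground-truth boxes are used
    test_mode=True,   # evaluation only
    pipeline=[],
)


def bbox_source(cfg: dict) -> str:
    """Restate the documented rule: bbox_file only matters in test mode."""
    if cfg.get('test_mode', False) and cfg.get('bbox_file'):
        return 'detections'
    return 'ground_truth'


source_now = bbox_source(val_dataset)
source_with_dets = bbox_source(dict(val_dataset, bbox_file='dets.json'))
source_training = bbox_source(
    dict(val_dataset, bbox_file='dets.json', test_mode=False))
```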

class mmpose.datasets.datasets.body.PoseTrack18Dataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

PoseTrack18 dataset for pose estimation.

“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.

PoseTrack2018 keypoints:

'nose',
'head_bottom',
'head_top',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.
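The difference between the two data_mode values can be sketched with a few toy annotations. The function build_samples and the dictionary fields below are hypothetical and only mirror the description above: one sample per instance in 'topdown' mode, one sample per image in 'bottomup' mode.

```python
from collections import defaultdict


def build_samples(instances, data_mode='topdown'):
    """Group instance annotations into data samples (illustration only)."""
    if data_mode == 'topdown':
        # One data sample per annotated instance.
        return [{'img_id': inst['img_id'], 'instances': [inst]}
                for inst in instances]
    if data_mode == 'bottomup':
        # One data sample per image, holding all of its instances.
        grouped = defaultdict(list)
        for inst in instances:
            grouped[inst['img_id']].append(inst)
        return [{'img_id': img_id, 'instances': insts}
                for img_id, insts in grouped.items()]
    raise ValueError(f'unsupported data_mode: {data_mode}')


toy_instances = [
    {'img_id': 1, 'person': 'a'},
    {'img_id': 1, 'person': 'b'},
    {'img_id': 2, 'person': 'c'},
]
topdown_samples = build_samples(toy_instances, 'topdown')
bottomup_samples = build_samples(toy_instances, 'bottomup')
```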

class mmpose.datasets.datasets.body.PoseTrack18VideoDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', frame_weights: List[Union[int, float]] = [0.0, 1.0], frame_sampler_mode: str = 'random', frame_range: Optional[Union[int, List[int]]] = None, num_sampled_frame: Optional[int] = None, frame_indices: Optional[Sequence[int]] = None, ph_fill_len: int = 6, metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]

PoseTrack18 dataset for video pose estimation.

“Posetrack: A benchmark for human pose estimation and tracking”, CVPR’2018. More details can be found in the paper.

PoseTrack2018 keypoints:

'nose',
'head_bottom',
'head_top',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
frame_weights (List[Union[int, float]]) – The weight of each frame for aggregation. The first weight is for the center frame; the remaining weights follow the ascending order of frame indices. Note that the length of frame_weights should be consistent with the number of sampled frames. Default: [0.0, 1.0].
frame_sampler_mode (str) – Specifies the mode of the frame sampler: 'fixed' or 'random'. In 'fixed' mode, each frame index relative to the center frame is fixed and specified by frame_indices, while in 'random' mode, each frame index relative to the center frame is sampled from frame_range with certain randomness. Default: 'random'.
frame_range (int | List[int], optional) – The sampling range of supporting frames in the same video for the center frame. Only valid when frame_sampler_mode is 'random'. Default: None.
num_sampled_frame (int, optional) – The number of sampled frames, excluding the center frame. Only valid when frame_sampler_mode is 'random'. Default: None.
frame_indices (Sequence[int], optional) – The sampled frame indices, including the center frame indicated by 0. Only valid when frame_sampler_mode is 'fixed'. Default: None.
ph_fill_len (int) – The length of the placeholder to fill in the image filenames. Default: 6.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.
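The two frame sampler modes can be illustrated with a small stand-alone sketch. The functions sample_frame_offsets and frame_filename are hypothetical, not the library's implementation. The sketch assumes that an integer frame_range r means the symmetric window [-r, r], that a list gives [low, high], and that ph_fill_len controls zero-padding of a numeric frame id in the file name.

```python
import random
from typing import List, Optional, Sequence, Union


def sample_frame_offsets(frame_sampler_mode: str = 'random',
                         frame_range: Optional[Union[int, List[int]]] = None,
                         num_sampled_frame: Optional[int] = None,
                         frame_indices: Optional[Sequence[int]] = None,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Return frame offsets relative to the center frame (offset 0)."""
    if frame_sampler_mode == 'fixed':
        # Offsets are given explicitly and must include the center frame.
        assert frame_indices is not None and 0 in frame_indices
        return list(frame_indices)
    if frame_sampler_mode == 'random':
        assert frame_range is not None and num_sampled_frame is not None
        # Assumption: an int r means the symmetric window [-r, r].
        low, high = ((-frame_range, frame_range)
                     if isinstance(frame_range, int) else frame_range)
        candidates = [i for i in range(low, high + 1) if i != 0]
        rng = rng or random.Random()
        # Center frame first, then supporting frames in ascending order.
        return [0] + sorted(rng.sample(candidates, num_sampled_frame))
    raise ValueError(f'unknown frame_sampler_mode: {frame_sampler_mode}')


def frame_filename(frame_id: int, ph_fill_len: int = 6,
                   ext: str = '.jpg') -> str:
    """Zero-pad a frame id to ph_fill_len digits (assumed naming scheme)."""
    return str(frame_id).zfill(ph_fill_len) + ext


fixed = sample_frame_offsets('fixed', frame_indices=[-2, -1, 0, 1, 2])
sampled = sample_frame_offsets('random', frame_range=[-2, 2],
                               num_sampled_frame=1, rng=random.Random(0))
frame_weights = [0.0, 1.0]  # one weight per sampled frame, center first
```

With num_sampled_frame=1 the sampler yields two frames in total, which matches the length of the default frame_weights.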

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict
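To make the input and output of parse_data_info more concrete, here is a toy parser with the same signature. It is a sketch, not the library's method: the field names inside 'raw_ann_info' and 'raw_img_info' assume COCO-style annotations (flat keypoints as x, y, visibility triplets), and the keys of the returned dictionary are chosen for illustration only.

```python
from typing import Optional


def parse_data_info(raw_data_info: dict) -> Optional[dict]:
    """Toy parser: returns None for instances without labeled keypoints."""
    ann = raw_data_info['raw_ann_info']
    img = raw_data_info['raw_img_info']

    flat = ann.get('keypoints', [])
    # Assumed COCO-style layout: [x1, y1, v1, x2, y2, v2, ...]
    triples = [flat[i:i + 3] for i in range(0, len(flat), 3)]
    if not any(v > 0 for _, _, v in triples):
        return None  # nothing labeled, so skip this instance

    return {
        'img_id': img['id'],
        'img_path': img['file_name'],
        'bbox': ann['bbox'],
        'keypoints': [(x, y) for x, y, _ in triples],
        'keypoints_visible': [int(v > 0) for _, _, v in triples],
    }


raw = {
    'raw_ann_info': {'bbox': [5, 5, 50, 80],
                     'keypoints': [10, 20, 2, 0, 0, 0]},
    'raw_img_info': {'id': 7, 'file_name': 'toy.jpg'},
}
parsed = parse_data_info(raw)
skipped = parse_data_info({
    'raw_ann_info': {'bbox': [0, 0, 1, 1], 'keypoints': []},
    'raw_img_info': {'id': 8, 'file_name': 'empty.jpg'},
})
```

Returning None for unusable instances is what the Optional[dict] return annotation allows for.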

class mmpose.datasets.datasets.face.AFLWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

AFLW dataset for face keypoint localization.

“Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization”. In Proc. First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.

The landmark annotations follow the 19 points mark-up. The definition can be found in https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw Face AFLW annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.face.COFWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

COFW dataset for face keypoint localization.

“Robust face landmark estimation under occlusion”, ICCV’2013.

The landmark annotations follow the 29 points mark-up. The definition can be found in http://www.vision.caltech.edu/xpburgos/ICCV13/.

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.face.CocoWholeBodyFaceDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

CocoWholeBodyDataset for face keypoint localization.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The face landmark annotations follow the 68 points mark-up.

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.
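The lazy_init option follows a common lazy-loading pattern, sketched below with a toy class. LazyAnnotations is hypothetical and is not mmengine's BaseDataset. It only shows the idea that meta information is available immediately while the annotation file is read on first use.

```python
import json
import os
import tempfile


class LazyAnnotations:
    """Toy dataset showing the lazy-initialization pattern."""

    def __init__(self, ann_file, metainfo=None, lazy_init=False):
        self.ann_file = ann_file
        self.metainfo = dict(metainfo or {})
        self.data_list = None
        self._fully_initialized = False
        if not lazy_init:
            self.full_init()

    def full_init(self):
        """Load annotations once; later calls are no-ops."""
        if self._fully_initialized:
            return
        with open(self.ann_file) as f:
            self.data_list = json.load(f)
        self._fully_initialized = True

    def __len__(self):
        self.full_init()  # load on first use
        return len(self.data_list)


with tempfile.TemporaryDirectory() as tmp:
    ann_path = os.path.join(tmp, 'toy_annotations.json')
    with open(ann_path, 'w') as f:
        json.dump([{'id': 1}, {'id': 2}], f)

    dataset = LazyAnnotations(ann_path, metainfo={'dataset_name': 'toy'},
                              lazy_init=True)
    loaded_before = dataset.data_list is not None  # nothing read yet
    num_items = len(dataset)                       # triggers full_init()
```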

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw CocoWholeBody Face annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.face.Face300VWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

300VW dataset for face keypoint tracking.

“The First Facial Landmark Tracking in-the-Wild Challenge: Benchmark and Results”, Proceedings of the IEEE International Conference on Computer Vision Workshops.

The landmark annotations follow the 68 points mark-up. The definition can be found in https://ibug.doc.ic.ac.uk/resources/300-VW/.

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.
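The role of max_refetch can be sketched as a bounded retry loop. The function fetch_valid, the toy prepare_data callback and the random re-draw of the index are assumptions made for illustration; they are not taken from the library's source.

```python
import random


def fetch_valid(prepare_data, idx, num_samples, max_refetch=1000, rng=None):
    """Retry with another index while prepare_data returns None."""
    rng = rng or random.Random()
    data = prepare_data(idx)
    attempts = 0
    while data is None and attempts < max_refetch:
        idx = rng.randrange(num_samples)  # assumed: pick another sample
        data = prepare_data(idx)
        attempts += 1
    if data is None:
        raise RuntimeError(
            f'no valid sample after {max_refetch} extra attempts')
    return data


def toy_prepare_data(idx):
    """Pretend that odd indices point at unreadable images."""
    return None if idx % 2 else {'idx': idx}


result = fetch_valid(toy_prepare_data, idx=1, num_samples=10,
                     rng=random.Random(0))

try:
    fetch_valid(lambda idx: None, idx=0, num_samples=10, max_refetch=0)
    gave_up = False
except RuntimeError:
    gave_up = True
```

Bounding the number of attempts keeps a corrupted dataset from stalling the data loader indefinitely.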

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw Face300VW annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.face.Face300WDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

300W dataset for face keypoint localization.

“300 faces In-the-wild challenge: Database and results”, Image and Vision Computing (IMAVIS) 2019.

The landmark annotations follow the 68 points mark-up. The definition can be found in https://ibug.doc.ic.ac.uk/resources/300-W/.

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw Face300W annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.face.Face300WLPDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

300W-LP dataset for face keypoint localization.

“300 faces In-the-wild challenge: Database and results”, Image and Vision Computing (IMAVIS) 2019.

The landmark annotations follow the 68 points mark-up. The definition can be found in https://ibug.doc.ic.ac.uk/resources/300-W/.

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.face.LapaDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

LaPa dataset for face keypoint localization.

“A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing”, AAAI’2020.

The landmark annotations follow the 106 points mark-up. The definition can be found in https://github.com/JDAI-CV/lapa-dataset/.

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., it is ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading the annotation file instead of loading it during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to get a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.face.WFLWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

WFLW dataset for face keypoint localization.

“Look at Boundary: A Boundary-Aware Face Alignment Algorithm”, CVPR’2018.

The landmark annotations follow the 98 points mark-up. The definition can be found in https://wywu.github.io/projects/LAB/WFLW.html.

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.
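
As an illustration, the parameters above could be written as a config dictionary in the style commonly used by OpenMMLab config files. This is a sketch only: the type key follows the registry convention, and every path below is a hypothetical placeholder that must be adapted to the local data layout.

```python
# Hypothetical config fragment; all paths are placeholders.
dataset_cfg = dict(
    type='WFLWDataset',
    data_root='data/wflw/',                     # assumed directory
    ann_file='annotations/wflw_train.json',     # assumed file name
    data_prefix=dict(img='images/'),            # joined with data_root
    data_mode='topdown',                        # one instance per sample
    pipeline=[],                                # no transforms in this sketch
    test_mode=False,
)
```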

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw Face WFLW annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict
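
To make the shape of raw_data_info concrete, here is a toy parser written under loud assumptions: it presumes COCO-style fields ('bbox' as [x, y, w, h], 'keypoints' as a flat [x, y, v, ...] list, and an image 'id'), and the function toy_parse is hypothetical. It is not the method documented above, only a sketch of the dict-in, dict-or-None-out contract.

```python
from typing import Optional


def toy_parse(raw_data_info: dict) -> Optional[dict]:
    """Hypothetical parser for a COCO-style annotation (illustration only)."""
    ann = raw_data_info['raw_ann_info']
    img = raw_data_info['raw_img_info']
    flat = ann['keypoints']  # assumed layout: [x1, y1, v1, x2, y2, v2, ...]
    if not any(flat[2::3]):
        return None  # no labelled keypoint, so the instance is skipped
    x, y, w, h = ann['bbox']  # assumed layout: [x, y, width, height]
    return {
        'img_id': img['id'],
        'bbox': [x, y, x + w, y + h],  # converted to corner format
        'keypoints': [flat[i:i + 2] for i in range(0, len(flat), 3)],
        'keypoints_visible': [min(1, v) for v in flat[2::3]],
    }


raw = {
    'raw_ann_info': {'bbox': [10, 20, 30, 40],
                     'keypoints': [12, 25, 2, 0, 0, 0]},
    'raw_img_info': {'id': 7},
}
parsed = toy_parse(raw)
```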

class mmpose.datasets.datasets.hand.CocoWholeBodyHandDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

CocoWholeBodyDataset for hand pose estimation.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

COCO-WholeBody Hand keypoints:

'wrist',
'thumb1',
'thumb2',
'thumb3',
'thumb4',
'forefinger1',
'forefinger2',
'forefinger3',
'forefinger4',
'middle_finger1',
'middle_finger2',
'middle_finger3',
'middle_finger4',
'ring_finger1',
'ring_finger2',
'ring_finger3',
'ring_finger4',
'pinky_finger1',
'pinky_finger2',
'pinky_finger3',
'pinky_finger4'
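
The 21 names above follow a regular pattern (the wrist, then four joints for each of five fingers), so a name-to-index lookup can be generated rather than typed out. The sketch below assumes the index of a keypoint equals its position in the list as printed here.

```python
FINGERS = ['thumb', 'forefinger', 'middle_finger', 'ring_finger', 'pinky_finger']

# Wrist first, then joints 1..4 of every finger, in the order listed above.
names = ['wrist'] + [f'{finger}{joint}' for finger in FINGERS for joint in range(1, 5)]
name_to_index = {name: i for i, name in enumerate(names)}

# Indices of the keypoints whose name ends in 4 for each finger.
last_joint_indices = [name_to_index[f'{finger}4'] for finger in FINGERS]
```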

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.
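
The difference between the two data_mode values can be pictured with a few toy annotations. This sketch only illustrates the grouping described above (one instance per sample versus all instances of an image per sample); the field names img_id and ann_id are assumptions, and the real data samples carry far more information.

```python
from collections import defaultdict

instances = [
    {'img_id': 1, 'ann_id': 10},
    {'img_id': 1, 'ann_id': 11},
    {'img_id': 2, 'ann_id': 12},
]

# 'topdown': one data sample per instance.
topdown_samples = [[inst] for inst in instances]

# 'bottomup': one data sample per image, holding all of its instances.
by_image = defaultdict(list)
for inst in instances:
    by_image[inst['img_id']].append(inst)
bottomup_samples = list(by_image.values())
```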

class mmpose.datasets.datasets.hand.FreiHandDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

FreiHand dataset for hand pose estimation.

“FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images”, ICCV’2019. More details can be found in the paper.

FreiHand keypoints:

'wrist',
'thumb1',
'thumb2',
'thumb3',
'thumb4',
'forefinger1',
'forefinger2',
'forefinger3',
'forefinger4',
'middle_finger1',
'middle_finger2',
'middle_finger3',
'middle_finger4',
'ring_finger1',
'ring_finger2',
'ring_finger3',
'ring_finger4',
'pinky_finger1',
'pinky_finger2',
'pinky_finger3',
'pinky_finger4'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.
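
The max_refetch parameter bounds how often another sample is tried when preparing one yields None. The loop below is a minimal sketch of that idea, not the library's code: fetch_with_retries, its prepare callback, and the seeded random fallback are all assumptions made for illustration.

```python
import random
from typing import Callable, Optional


def fetch_with_retries(prepare: Callable[[int], Optional[dict]],
                       idx: int,
                       num_samples: int,
                       max_refetch: int = 1000,
                       seed: int = 0) -> dict:
    """Try `idx` first, then up to `max_refetch` other random indices."""
    rng = random.Random(seed)
    for _ in range(max_refetch + 1):
        data = prepare(idx)
        if data is not None:
            return data
        idx = rng.randrange(num_samples)  # pick another sample and retry
    raise RuntimeError(f'No valid sample after {max_refetch} refetches')


def toy_prepare(i: int) -> Optional[dict]:
    # Pretend that sample 0 is broken and every other sample is fine.
    return None if i == 0 else {'idx': i}


result = fetch_with_retries(toy_prepare, idx=0, num_samples=5)
```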

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw COCO annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.hand.InterHand2DDoubleDataset(ann_file: str = '', camera_param_file: str = '', joint_file: str = '', use_gt_root_depth: bool = True, rootnet_result_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

InterHand2.6M dataset for 2D double hands.

“InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image”, ECCV’2020. More details can be found in the paper.

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

InterHand2.6M keypoint indexes:

'r_thumb4',
'r_thumb3',
'r_thumb2',
'r_thumb1',
'r_index4',
'r_index3',
'r_index2',
'r_index1',
'r_middle4',
'r_middle3',
'r_middle2',
'r_middle1',
'r_ring4',
'r_ring3',
'r_ring2',
'r_ring1',
'r_pinky4',
'r_pinky3',
'r_pinky2',
'r_pinky1',
'r_wrist',
'l_thumb4',
'l_thumb3',
'l_thumb2',
'l_thumb1',
'l_index4',
'l_index3',
'l_index2',
'l_index1',
'l_middle4',
'l_middle3',
'l_middle2',
'l_middle1',
'l_ring4',
'l_ring3',
'l_ring2',
'l_ring1',
'l_pinky4',
'l_pinky3',
'l_pinky2',
'l_pinky1',
'l_wrist'
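
The 42 names above are two copies of the same 21-joint layout, right hand first and left hand second. Assuming the index of a keypoint is its position in this list, the right/left counterparts differ by a fixed offset of 21, which the following sketch makes explicit.

```python
FINGERS = ['thumb', 'index', 'middle', 'ring', 'pinky']

# One hand: joints 4..1 of every finger, then the wrist, as listed above.
one_hand = [f'{finger}{joint}' for finger in FINGERS for joint in (4, 3, 2, 1)] + ['wrist']
names = [f'r_{n}' for n in one_hand] + [f'l_{n}' for n in one_hand]
index = {name: i for i, name in enumerate(names)}

# (right, left) index pairs, e.g. for swapping hands after a horizontal flip.
flip_pairs = [(index[f'r_{n}'], index[f'l_{n}']) for n in one_hand]
```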

Parameters

ann_file (str) – Annotation file path. Default: ''.
camera_param_file (str) – Path to the camera parameter file. Default: ''.
joint_file (str) – Path to the joint file. Default: ''.
use_gt_root_depth (bool) – Whether to use the ground-truth depth of the wrist. If False, the depth given in rootnet_result_file is used instead. Default: True.
rootnet_result_file (str, optional) – Path to the wrist depth file. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.
sample_interval (int, optional) – The sample interval of the dataset. Default: 1.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw COCO annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict | None

class mmpose.datasets.datasets.hand.OneHand10KDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

OneHand10K dataset for hand pose estimation.

“Mask-pose Cascaded CNN for 2D Hand Pose Estimation from Single Color Images”, TCSVT’2019. More details can be found in the paper.

OneHand10K keypoints:

'wrist',
'thumb1',
'thumb2',
'thumb3',
'thumb4',
'forefinger1',
'forefinger2',
'forefinger3',
'forefinger4',
'middle_finger1',
'middle_finger2',
'middle_finger3',
'middle_finger4',
'ring_finger1',
'ring_finger2',
'ring_finger3',
'ring_finger4',
'pinky_finger1',
'pinky_finger2',
'pinky_finger3',
'pinky_finger4'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.
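
The lazy-initialization pattern behind lazy_init can be sketched with a toy class: meta information is available immediately, while the (potentially slow) annotation loading is deferred until full_init() is called. ToyDataset and its fake annotation list are hypothetical and only mirror the idea, not the actual dataset classes.

```python
class ToyDataset:
    """Illustrative stand-in for a dataset with deferred annotation loading."""

    def __init__(self, ann_file: str, lazy_init: bool = False):
        self.ann_file = ann_file
        self.metainfo = {'dataset_name': 'toy'}  # cheap, always available
        self.data_list = None                    # annotations not loaded yet
        if not lazy_init:
            self.full_init()

    def full_init(self) -> None:
        # Load annotations once; repeated calls are no-ops.
        if self.data_list is None:
            # Stands in for reading and parsing `ann_file`.
            self.data_list = [{'id': i} for i in range(3)]


lazy = ToyDataset('annotations.json', lazy_init=True)
eager = ToyDataset('annotations.json')
```

With lazy_init=True a visualization tool could read lazy.metainfo without ever paying the cost of loading the annotation file.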

class mmpose.datasets.datasets.hand.PanopticHand2DDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

Panoptic 2D dataset for hand pose estimation.

“Hand Keypoint Detection in Single Images using Multiview Bootstrapping”, CVPR’2017. More details can be found in the paper.

Panoptic keypoints:

'wrist',
'thumb1',
'thumb2',
'thumb3',
'thumb4',
'forefinger1',
'forefinger2',
'forefinger3',
'forefinger4',
'middle_finger1',
'middle_finger2',
'middle_finger3',
'middle_finger4',
'ring_finger1',
'ring_finger2',
'ring_finger3',
'ring_finger4',
'pinky_finger1',
'pinky_finger2',
'pinky_finger3',
'pinky_finger4'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw COCO annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.hand.Rhd2DDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

Rendered Handpose Dataset for hand pose estimation.

“Learning to Estimate 3D Hand Pose from Single RGB Images”, ICCV’2017. More details can be found in the paper.

Rhd keypoints:

'wrist',
'thumb4',
'thumb3',
'thumb2',
'thumb1',
'forefinger4',
'forefinger3',
'forefinger2',
'forefinger1',
'middle_finger4',
'middle_finger3',
'middle_finger2',
'middle_finger1',
'ring_finger4',
'ring_finger3',
'ring_finger2',
'ring_finger1',
'pinky_finger4',
'pinky_finger3',
'pinky_finger2',
'pinky_finger1'
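
Unlike the other hand datasets in this module, the Rhd list runs from joint 4 down to joint 1 within each finger. The sketch below derives, purely by name, the permutation that reorders an Rhd-ordered array into the wrist, joint 1..4 layout. It assumes list position equals keypoint index, and it does not claim that equally named joints are anatomically identical across datasets.

```python
FINGERS = ['thumb', 'forefinger', 'middle_finger', 'ring_finger', 'pinky_finger']

# Rhd order as listed above: wrist, then joints 4..1 of each finger.
rhd_names = ['wrist'] + [f'{f}{j}' for f in FINGERS for j in (4, 3, 2, 1)]
# Target order: wrist, then joints 1..4 of each finger.
target_names = ['wrist'] + [f'{f}{j}' for f in FINGERS for j in (1, 2, 3, 4)]

# perm[i] is the Rhd index of the keypoint stored at position i of the target.
perm = [rhd_names.index(name) for name in target_names]

# Reordering any per-keypoint sequence is then a simple gather.
reordered = [rhd_names[i] for i in perm]
```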

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.animal.AP10KDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

AP-10K dataset for animal pose estimation.

“AP-10K: A Benchmark for Animal Pose Estimation in the Wild”, NeurIPS Dataset Track’2021. More details can be found in the paper.

AP-10K keypoints:

'L_Eye',
'R_Eye',
'Nose',
'Neck',
'root of tail',
'L_Shoulder',
'L_Elbow',
'L_F_Paw',
'R_Shoulder',
'R_Elbow',
'R_F_Paw',
'L_Hip',
'L_Knee',
'L_B_Paw',
'R_Hip',
'R_Knee',
'R_B_Paw'
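
Because the left and right keypoints above share a name apart from the L_ or R_ prefix, the symmetric pairs can be derived from the names alone. This is a sketch that assumes list position equals keypoint index; the variable names are illustrative.

```python
names = [
    'L_Eye', 'R_Eye', 'Nose', 'Neck', 'root of tail',
    'L_Shoulder', 'L_Elbow', 'L_F_Paw',
    'R_Shoulder', 'R_Elbow', 'R_F_Paw',
    'L_Hip', 'L_Knee', 'L_B_Paw',
    'R_Hip', 'R_Knee', 'R_B_Paw',
]
index = {name: i for i, name in enumerate(names)}

# Pair every 'L_*' keypoint with its 'R_*' counterpart.
flip_pairs = [(i, index['R_' + name[2:]])
              for i, name in enumerate(names) if name.startswith('L_')]

# Keypoints without a left/right counterpart lie on the body midline.
midline = [name for name in names if not name.startswith(('L_', 'R_'))]
```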

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.animal.ATRWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

ATRW dataset for animal pose estimation.

“ATRW: A Benchmark for Amur Tiger Re-identification in the Wild”, ACM MM’2020. More details can be found in the paper.

ATRW keypoints:

"left_ear",
"right_ear",
"nose",
"right_shoulder",
"right_front_paw",
"left_shoulder",
"left_front_paw",
"right_hip",
"right_knee",
"right_back_paw",
"left_hip",
"left_knee",
"left_back_paw",
"tail",
"center"

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.animal.AnimalKingdomDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

Animal Kingdom dataset for animal pose estimation.

“Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding”, CVPR’2022. More details can be found in the paper.

Website: https://sutdcv.github.io/Animal-Kingdom

The dataset loads raw features and applies the specified transforms to return a dict containing the image tensors and other information.

Animal Kingdom keypoint indexes:

'Head_Mid_Top',
'Eye_Left',
'Eye_Right',
'Mouth_Front_Top',
'Mouth_Back_Left',
'Mouth_Back_Right',
'Mouth_Front_Bottom',
'Shoulder_Left',
'Shoulder_Right',
'Elbow_Left',
'Elbow_Right',
'Wrist_Left',
'Wrist_Right',
'Torso_Mid_Back',
'Hip_Left',
'Hip_Right',
'Knee_Left',
'Knee_Right',
'Ankle_Left',
'Ankle_Right',
'Tail_Top_Back',
'Tail_Mid_Back',
'Tail_End_Back'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for the dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Use only part of the annotation file, for example the first few samples, to make training or testing on a smaller dataset easier. Default: None, which means all data_infos are used.
serialize_data (bool, optional) – Whether to hold the data in memory as serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is used in the test phase. Default: False.
lazy_init (bool, optional) – Whether to defer loading annotations instead of loading them during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.animal.AnimalPoseDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

Animal-Pose dataset for animal pose estimation.

“Cross-domain Adaptation For Animal Pose Estimation”, ICCV’2019. More details can be found in the paper.

Animal-Pose keypoints:

'L_Eye',
'R_Eye',
'L_EarBase',
'R_EarBase',
'Nose',
'Throat',
'TailBase',
'Withers',
'L_F_Elbow',
'R_F_Elbow',
'L_B_Elbow',
'R_B_Elbow',
'L_F_Knee',
'R_F_Knee',
'L_B_Knee',
'R_B_Knee',
'L_F_Paw',
'R_F_Paw',
'L_B_Paw',
'R_B_Paw'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.
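
As a rough illustration of how these arguments fit together, the dictionaries below sketch a config-style description of an AnimalPoseDataset for training and for evaluation. All paths are hypothetical placeholders, and the `type` key follows the usual OpenMMLab config convention, which this section does not itself describe.

```python
# Hypothetical config-style description of the dataset.
# Every path below is a placeholder, not a real file shipped with MMPose.
animalpose_train = dict(
    type='AnimalPoseDataset',
    data_root='data/animalpose/',        # assumed directory layout
    ann_file='annotations/train.json',   # resolved relative to data_root
    data_prefix=dict(img='images/'),     # image folder under data_root
    data_mode='topdown',                 # one instance per data sample
    pipeline=[],                         # fill with transform configs
    test_mode=False,
)

# bbox_file only matters for evaluation, so it is set on the test split.
animalpose_test = dict(
    animalpose_train,
    ann_file='annotations/test.json',
    bbox_file='detections/test_bboxes.json',  # hypothetical detection results
    test_mode=True,
)
```

The test split reuses the training entries and overrides only what differs, which keeps the two descriptions in sync.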

class mmpose.datasets.datasets.animal.FlyDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

FlyDataset for animal pose estimation.

“Fast animal pose estimation using deep neural networks”, Nature Methods’2019. More details can be found in the paper.

Vinegar Fly keypoints:

"head",
"eyeL",
"eyeR",
"neck",
"thorax",
"abdomen",
"forelegR1",
"forelegR2",
"forelegR3",
"forelegR4",
"midlegR1",
"midlegR2",
"midlegR3",
"midlegR4",
"hindlegR1",
"hindlegR2",
"hindlegR3",
"hindlegR4",
"forelegL1",
"forelegL2",
"forelegL3",
"forelegL4",
"midlegL1",
"midlegL2",
"midlegL3",
"midlegL4",
"hindlegL1",
"hindlegL2",
"hindlegL3",
"hindlegL4",
"wingL",
"wingR"

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.animal.Horse10Dataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

Horse10Dataset for animal pose estimation.

“Pretraining boosts out-of-domain robustness for pose estimation”, WACV’2021. More details can be found in the paper.

Horse-10 keypoints:

'Nose',
'Eye',
'Nearknee',
'Nearfrontfetlock',
'Nearfrontfoot',
'Offknee',
'Offfrontfetlock',
'Offfrontfoot',
'Shoulder',
'Midshoulder',
'Elbow',
'Girth',
'Wither',
'Nearhindhock',
'Nearhindfetlock',
'Nearhindfoot',
'Hip',
'Stifle',
'Offhindhock',
'Offhindfetlock',
'Offhindfoot',
'Ischium'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.animal.LocustDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

LocustDataset for animal pose estimation.

“DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning”, eLife’2019. More details can be found in the paper.

Desert Locust keypoints:

"head",
"neck",
"thorax",
"abdomen1",
"abdomen2",
"anttipL",
"antbaseL",
"eyeL",
"forelegL1",
"forelegL2",
"forelegL3",
"forelegL4",
"midlegL1",
"midlegL2",
"midlegL3",
"midlegL4",
"hindlegL1",
"hindlegL2",
"hindlegL3",
"hindlegL4",
"anttipR",
"antbaseR",
"eyeR",
"forelegR1",
"forelegR2",
"forelegR3",
"forelegR4",
"midlegR1",
"midlegR2",
"midlegR3",
"midlegR4",
"hindlegR1",
"hindlegR2",
"hindlegR3",
"hindlegR4"

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw Locust annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.animal.MacaqueDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

MacaquePose dataset for animal pose estimation.

“MacaquePose: A novel ‘in the wild’ macaque monkey pose dataset for markerless motion capture”, bioRxiv’2020. More details can be found in the paper.

Macaque keypoints:

'nose',
'left_eye',
'right_eye',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle'

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

class mmpose.datasets.datasets.animal.ZebraDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

ZebraDataset for animal pose estimation.

“DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning”, eLife’2019. More details can be found in the paper.

Zebra keypoints:

"snout",
"head",
"neck",
"forelegL1",
"forelegR1",
"hindlegL1",
"hindlegR1",
"tailbase",
"tailtip"

Parameters

ann_file (str) – Annotation file path. Default: ''.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.

parse_data_info(raw_data_info: dict) → Optional[dict][source]

Parse raw Zebra annotation of an instance.

Parameters

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have the following contents:

'raw_ann_info': Raw annotation of an instance
'raw_img_info': Raw information of the image that contains the instance

Returns

Parsed instance annotation

Return type

dict

class mmpose.datasets.datasets.fashion.DeepFashion2Dataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, sample_interval: int = 1)[source]

DeepFashion2 dataset for fashion landmark detection.

class mmpose.datasets.datasets.fashion.DeepFashionDataset(ann_file: str = '', subset: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]

DeepFashion dataset (full-body clothes) for fashion landmark detection.

“DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations”, CVPR’2016. “Fashion Landmark Detection in the Wild”, ECCV’2016.

The dataset contains 3 categories for full-body, upper-body and lower-body.

Fashion landmark indexes for upper-body clothes:

'left collar',
'right collar',
'left sleeve',
'right sleeve',
'left hem',
'right hem'

Fashion landmark indexes for lower-body clothes:

'left waistline',
'right waistline',
'left hem',
'right hem'

Fashion landmark indexes for full-body clothes:

'left collar',
'right collar',
'left sleeve',
'right sleeve',
'left waistline',
'right waistline',
'left hem',
'right hem'

Parameters

ann_file (str) – Annotation file path. Default: ''.
subset (str) – Specifies the subset of body: 'full', 'upper' or 'lower'. Default: '', which means 'full'.
bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.
data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.
metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.
data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').
filter_cfg (dict, optional) – Config for filtering data. Default: None.
indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects. When enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.
pipeline (list, optional) – Processing pipeline. Default: [].
test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.
lazy_init (bool, optional) – Whether to skip loading annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. BaseDataset can skip loading annotations to save time when lazy_init=True. Default: False.
max_refetch (int, optional) – The maximum number of extra attempts to fetch a valid image when BaseDataset.prepare_data returns None. Default: 1000.
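
The relationship between `subset` and the landmark lists above can be sketched as a simple lookup. The helper name `landmarks_for` and the table `DEEPFASHION_LANDMARKS` are hypothetical and exist only for illustration; how the dataset class resolves the subset internally is not shown in this section.

```python
# Landmark names copied from the lists above, keyed by subset name.
DEEPFASHION_LANDMARKS = {
    'upper': ['left collar', 'right collar', 'left sleeve', 'right sleeve',
              'left hem', 'right hem'],
    'lower': ['left waistline', 'right waistline', 'left hem', 'right hem'],
    'full': ['left collar', 'right collar', 'left sleeve', 'right sleeve',
             'left waistline', 'right waistline', 'left hem', 'right hem'],
}


def landmarks_for(subset: str = '') -> list:
    """Return the landmark names for a subset; '' is treated as 'full'."""
    key = subset or 'full'
    if key not in DEEPFASHION_LANDMARKS:
        raise ValueError(f"subset must be 'full', 'upper' or 'lower', got {subset!r}")
    return DEEPFASHION_LANDMARKS[key]


num_full = len(landmarks_for())          # default subset
num_upper = len(landmarks_for('upper'))
num_lower = len(landmarks_for('lower'))
```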

transforms

class mmpose.datasets.transforms.loading.LoadImage(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]

Load an image from file or from the np.ndarray in results['img'].

Required Keys:

img_path

img (optional)

Modified Keys:

img

img_shape

ori_shape

img_path (optional)

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to 'color'.
imdecode_backend (str) – The image decoding backend type, passed as the backend argument of mmcv.imfrombytes(). See mmcv.imfrombytes() for details. Defaults to 'cv2'.
backend_args (dict, optional) – Arguments to instantiate the file backend corresponding to the URI prefix. Defaults to None.
ignore_empty (bool) – Whether to allow loading an empty image or a nonexistent file path. Defaults to False.

transform(results: dict) → Optional[dict][source]

The transform function of LoadImage.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmpose.datasets.transforms.common_transforms.Albumentation(transforms: List[dict], keymap: Optional[dict] = None)[source]

Albumentation augmentation (pixel-level transforms only).

Adds custom pixel-level transformations from the Albumentations library. Please visit https://albumentations.ai/docs/ for more information.

Note: only pixel-level transforms are supported. Please visit https://github.com/albumentations-team/albumentations#pixel-level-transforms for more information about pixel-level transforms.

Required Keys:

img

Modified Keys:

img

Parameters

transforms (List[dict]) – A list of Albumentation transforms. An example of transforms is as follows:

    [
        dict(
            type='RandomBrightnessContrast',
            brightness_limit=[0.1, 0.3],
            contrast_limit=[0.1, 0.3],
            p=0.2),
        dict(type='ChannelShuffle', p=0.1),
        dict(
            type='OneOf',
            transforms=[
                dict(type='Blur', blur_limit=3, p=1.0),
                dict(type='MedianBlur', blur_limit=3, p=1.0)
            ],
            p=0.1),
    ]

keymap (dict | None) – Key mapping from input key to albumentation-style key. Defaults to None, which will use {'img': 'image'}.
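
To make the role of keymap concrete, here is a minimal sketch of renaming result keys to albumentation-style keys and back. The helper `remap_keys` is hypothetical, and whether the class performs the mapping exactly this way is an assumption; only the default mapping {'img': 'image'} comes from the description above.

```python
def remap_keys(results: dict, keymap: dict) -> dict:
    """Return a copy of `results` with keys renamed according to `keymap`.

    Keys that do not appear in `keymap` are kept unchanged.
    """
    return {keymap.get(key, key): value for key, value in results.items()}


# Default mapping described above, plus its inverse for mapping back.
keymap_to_albu = {'img': 'image'}
keymap_back = {v: k for k, v in keymap_to_albu.items()}

# A string stands in for the image array to keep the sketch dependency-free.
results = {'img': 'IMAGE_ARRAY', 'img_shape': (256, 192)}

albu_inputs = remap_keys(results, keymap_to_albu)   # keys as albumentations expects
restored = remap_keys(albu_inputs, keymap_back)     # keys as the pipeline expects
```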

albu_builder(cfg: dict) → None[source]

Import a module from albumentations.

It resembles part of the build_from_cfg() logic.

Parameters: cfg (dict) – Config dict. It should at least contain the key “type”.
Returns: The constructed transform object.
Return type: albumentations.BasicTransform

transform(results: dict) → dict[source]

The transform function of Albumentation to apply albumentations transforms.

See transform() method of BaseTransform for details.

Parameters: results (dict) – Result dict from the data pipeline.
Returns: Updated result dict.
Return type: dict

class mmpose.datasets.transforms.common_transforms.FilterAnnotations(min_gt_bbox_wh: Tuple[int, int] = (1, 1), min_gt_area: int = 1, min_kpt_vis: int = 1, by_box: bool = False, by_area: bool = False, by_kpt: bool = True, keep_empty: bool = True)[source]

Eliminate undesirable annotations based on specific conditions.

This class is designed to sift through annotations by examining multiple factors such as the size of the bounding box, the visibility of keypoints, and the overall area. Users can fine-tune the criteria to filter out instances that have excessively small bounding boxes, insufficient area, or an inadequate number of visible keypoints.

Required Keys:

bbox (np.ndarray) (optional)
area (np.int64) (optional)
keypoints_visible (np.ndarray) (optional)

Modified Keys:

bbox (optional)
bbox_score (optional)
category_id (optional)
keypoints (optional)
keypoints_visible (optional)
area (optional)

Parameters

min_gt_bbox_wh (tuple[float]) – Minimum width and height of ground truth boxes. Default: (1., 1.)
min_gt_area (int) – Minimum foreground area of instances. Default: 1
min_kpt_vis (int) – Minimum number of visible keypoints. Default: 1
by_box (bool) – Filter instances with bounding boxes not meeting the min_gt_bbox_wh threshold. Default: False
by_area (bool) – Filter instances with area less than the min_gt_area threshold. Default: False
by_kpt (bool) – Filter instances with keypoints_visible not meeting the min_kpt_vis threshold. Default: True
keep_empty (bool) – Whether to return None when it becomes an empty bbox after filtering. Defaults to True.
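
The three criteria can be sketched in plain Python as a per-instance keep mask. This is an illustrative reimplementation, not the library code: the helper name `keep_mask`, the `[x1, y1, x2, y2]` box layout, the strict `>` comparison for box size, and treating any visibility value above 0 as visible are all assumptions.

```python
from typing import List, Sequence


def keep_mask(bboxes: Sequence[Sequence[float]],
              areas: Sequence[float],
              keypoints_visible: Sequence[Sequence[float]],
              min_gt_bbox_wh=(1, 1),
              min_gt_area=1,
              min_kpt_vis=1,
              by_box=False,
              by_area=False,
              by_kpt=True) -> List[bool]:
    """Return True for each instance that passes every enabled criterion."""
    keep = []
    for bbox, area, visible in zip(bboxes, areas, keypoints_visible):
        ok = True
        if by_box:
            x1, y1, x2, y2 = bbox  # assumed corner format
            ok = ok and (x2 - x1) > min_gt_bbox_wh[0] and (y2 - y1) > min_gt_bbox_wh[1]
        if by_area:
            ok = ok and area >= min_gt_area
        if by_kpt:
            num_visible = sum(1 for v in visible if v > 0)
            ok = ok and num_visible >= min_kpt_vis
        keep.append(ok)
    return keep


bboxes = [(0, 0, 50, 80), (0, 0, 0.5, 0.5), (10, 10, 40, 40)]
areas = [4000, 0.25, 900]
visible = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]

default_keep = keep_mask(bboxes, areas, visible)            # keypoint check only
box_keep = keep_mask(bboxes, areas, visible, by_box=True)   # box size and keypoints
```

With the defaults only the keypoint criterion is active, so a tiny box with one visible keypoint is still kept unless by_box or by_area is enabled.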

transform(results: dict) → Optional[dict][source]

Transform function to filter annotations.

Parameters: results (dict) – Result dict.
Returns: Updated result dict.
Return type: dict

class mmpose.datasets.transforms.common_transforms.GenerateTarget(encoder: Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]], target_type: Optional[str] = None, multilevel: bool = False, use_dataset_keypoint_weights: bool = False)[source]

Encode keypoints into Target.

The generated target is usually the supervision signal of the model learning, e.g. heatmaps or regression labels.

Required Keys:

keypoints

keypoints_visible

dataset_keypoint_weights

Added Keys:

The keys of the encoded items from the codec will be updated into the results, e.g. 'heatmaps' or 'keypoint_weights'. See the specific codec for more details.

Parameters

encoder (dict | list[dict]) – The codec config for keypoint encoding. Both a single encoder and multiple encoders (given as a list) are supported.
multilevel (bool) – Determines how multiple encoders are handled. If multilevel==True, generate multilevel targets from a group of encoders of the same type (e.g. multiple MSRAHeatmap encoders with different sigma values); if multilevel==False, generate combined targets from a group of different encoders. This argument has no effect when a single encoder is given. Defaults to False.
use_dataset_keypoint_weights (bool) – Whether to use the keypoint weights from the dataset meta information. Defaults to False.
target_type (str, deprecated) – This argument is deprecated and has no effect. Defaults to None.
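
The difference between a single encoder and a multilevel group might look as follows in config form. The MSRAHeatmap name and its varying sigma come from the description above; the `input_size`, `heatmap_size` and concrete sigma values are assumptions chosen for illustration.

```python
# Shared codec settings; the sizes are illustrative assumptions.
codec_base = dict(type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64))

# Single encoder: `multilevel` has no effect in this case.
single_target = dict(
    type='GenerateTarget',
    encoder=dict(codec_base, sigma=2),
)

# Multilevel targets: several encoders of the same type with different sigmas.
multilevel_target = dict(
    type='GenerateTarget',
    multilevel=True,
    encoder=[dict(codec_base, sigma=sigma) for sigma in (2, 3)],
)

# Sanity check that mirrors the "same type" requirement described above.
encoder_types = {enc['type'] for enc in multilevel_target['encoder']}
same_type = len(encoder_types) == 1
```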

transform(results: Dict) → Optional[dict][source]

The transform function of GenerateTarget.

See transform() method of BaseTransform for details.

class mmpose.datasets.transforms.common_transforms.GetBBoxCenterScale(padding: float = 1.25)[source]

Convert bboxes from [x, y, w, h] to center and scale.

The center is the coordinates of the bbox center, and the scale is the bbox width and height normalized by a scale factor.

Required Keys:

bbox

Added Keys:

bbox_center

bbox_scale

Parameters: padding (float) – The bbox padding scale that bbox_scale will be multiplied by. Defaults to 1.25.
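
The conversion can be sketched in a few lines of plain Python. The function name `bbox_xywh_to_center_scale` is hypothetical, and the sketch assumes the `[x, y, w, h]` layout stated above with `(x, y)` as the top-left corner; the library's own helper may differ in input format and return types.

```python
def bbox_xywh_to_center_scale(bbox, padding=1.25):
    """Convert one [x, y, w, h] box to (center, scale).

    center is the midpoint of the box; scale is the padded width and height.
    """
    x, y, w, h = bbox
    center = (x + w / 2.0, y + h / 2.0)
    scale = (w * padding, h * padding)
    return center, scale


center, scale = bbox_xywh_to_center_scale((10, 20, 100, 200))
# center -> (60.0, 120.0), scale -> (125.0, 250.0)
```

Padding enlarges the region around the instance so that later cropping keeps some context around the box.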

transform(results: Dict) → Optional[dict][source]

The transform function of GetBBoxCenterScale.

See transform() method of BaseTransform for details.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmpose.datasets.transforms.common_transforms.PhotometricDistortion(brightness_delta: int = 32, contrast_range: Sequence[Union[float, int]] = (0.5, 1.5), saturation_range: Sequence[Union[float, int]] = (0.5, 1.5), hue_delta: int = 18)[source]

Apply photometric distortions to the image sequentially. Each transformation is applied with a probability of 0.5. Random contrast is applied either second (mode 0) or second to last (mode 1).

1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels

Required Keys:

img

Modified Keys:

img

Parameters

brightness_delta (int) – Delta of brightness. Defaults to 32.
contrast_range (tuple) – Range of contrast. Defaults to (0.5, 1.5).
saturation_range (tuple) – Range of saturation. Defaults to (0.5, 1.5).
hue_delta (int) – Delta of hue. Defaults to 18.
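
The ordering rule for random contrast is easier to see in code. The sketch below only plans which operations would run and in what order; it does not touch pixels. The function name `plan_photometric_ops`, the way the mode is drawn, and the assumption that the two color-space conversions always run are illustrative choices, not the library's implementation.

```python
import random


def plan_photometric_ops(rng: random.Random) -> list:
    """Return the names of the operations that would be applied, in order."""

    def coin() -> bool:
        # Each distortion is applied with probability 0.5.
        return rng.random() < 0.5

    # mode 0: contrast right after brightness; mode 1: contrast near the end.
    mode = rng.randint(0, 1)

    ops = []
    if coin():
        ops.append('brightness')
    if mode == 0 and coin():
        ops.append('contrast')
    ops.append('bgr2hsv')  # assumed to run unconditionally
    if coin():
        ops.append('saturation')
    if coin():
        ops.append('hue')
    ops.append('hsv2bgr')  # assumed to run unconditionally
    if mode == 1 and coin():
        ops.append('contrast')
    if coin():
        ops.append('swap_channels')
    return ops


plan = plan_photometric_ops(random.Random(0))
```

Contrast is never applied twice in one call: the mode decides at which of the two positions it may occur.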

transform(results: dict) → dict[source]

The transform function of PhotometricDistortion to perform photometric distortion on images.

See transform() method of BaseTransform for details.

Parameters: results (dict) – Result dict from the data pipeline.
Returns: Result dict with images distorted.
Return type: dict

class mmpose.datasets.transforms.common_transforms.RandomBBoxTransform(shift_factor: float = 0.16, shift_prob: float = 0.3, scale_factor: Tuple[float, float] = (0.5, 1.5), scale_prob: float = 1.0, rotate_factor: float = 80.0, rotate_prob: float = 0.6)[source]

Randomly shift, resize and rotate the bounding boxes.

Required Keys:

bbox_center

bbox_scale

Modified Keys:

bbox_center

bbox_scale

Added Keys:

bbox_rotation

Parameters

shift_factor (float) – Randomly shift the bbox in range \([-dx, dx]\) and \([-dy, dy]\) in X and Y directions, where \(dx = \text{bbox\_scale}_x \cdot \text{shift\_factor}\) and \(dy = \text{bbox\_scale}_y \cdot \text{shift\_factor}\), in pixels. Defaults to 0.16.
shift_prob (float) – Probability of applying random shift. Defaults to 0.3.
scale_factor (Tuple[float, float]) – Randomly resize the bbox in range \([\text{scale\_factor}[0], \text{scale\_factor}[1]]\). Defaults to (0.5, 1.5).
scale_prob (float) – Probability of applying random resizing. Defaults to 1.0.
rotate_factor (float) – Randomly rotate the bbox in \([-\text{rotate\_factor}, \text{rotate\_factor}]\) in degrees. Defaults to 80.0.
rotate_prob (float) – Probability of applying random rotation. Defaults to 0.6.
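
The ranges and probabilities above can be illustrated with a small sampling sketch. It draws each quantity uniformly from its stated range, which is an assumption made for simplicity; the actual sampling distribution used by the transform is not specified in this section. The function name `sample_bbox_jitter` is hypothetical.

```python
import random


def sample_bbox_jitter(bbox_scale, rng,
                       shift_factor=0.16, shift_prob=0.3,
                       scale_factor=(0.5, 1.5), scale_prob=1.0,
                       rotate_factor=80.0, rotate_prob=0.6):
    """Sample (offset, scale multiplier, rotation in degrees) for one bbox."""
    w, h = bbox_scale
    dx, dy = w * shift_factor, h * shift_factor  # maximum shift in pixels

    # Each perturbation is applied with its own probability;
    # otherwise the neutral value is used.
    if rng.random() < shift_prob:
        offset = (rng.uniform(-dx, dx), rng.uniform(-dy, dy))
    else:
        offset = (0.0, 0.0)

    if rng.random() < scale_prob:
        scale = rng.uniform(scale_factor[0], scale_factor[1])
    else:
        scale = 1.0

    if rng.random() < rotate_prob:
        rotation = rng.uniform(-rotate_factor, rotate_factor)
    else:
        rotation = 0.0

    return offset, scale, rotation


offset, scale, rotation = sample_bbox_jitter((100.0, 200.0), random.Random(0))
```

In a pipeline, the offset would be added to bbox_center, the multiplier applied to bbox_scale, and the rotation stored as bbox_rotation.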

transform(results: Dict) → Optional[dict][source]

The transform function of RandomBBoxTransform.

See transform() method of BaseTransform for details.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmpose.datasets.transforms.common_transforms.RandomFlip(prob: Union[float, List[float]] = 0.5, direction: Union[str, List[str]] = 'horizontal')[source]

Randomly flip the image, bbox and keypoints.

Required Keys:

img

img_shape

flip_indices

input_size (optional)

bbox (optional)

bbox_center (optional)

keypoints (optional)

keypoints_visible (optional)

img_mask (optional)

Modified Keys:

img

bbox (optional)

bbox_center (optional)

keypoints (optional)

keypoints_visible (optional)

img_mask (optional)

Added Keys:

flip

flip_direction

Parameters

prob (float | list[float]) – The flipping probability. If a list is given, the argument direction should be a list with the same length, and each element in prob indicates the flipping probability of the corresponding element in direction. Defaults to 0.5.
direction (str | list[str]) – The flipping direction. Options are 'horizontal', 'vertical' and 'diagonal'. If a list is given, each data sample’s flipping direction will be sampled from a distribution determined by the argument prob. Defaults to 'horizontal'.
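
How prob and direction interact when both are lists can be sketched as sampling from a categorical distribution in which the leftover probability means "no flip". The helper `sample_flip_direction` is hypothetical and only handles the matching scalar/scalar and list/list forms described above.

```python
import random


def sample_flip_direction(rng, prob=0.5, direction='horizontal'):
    """Return the sampled flip direction, or None if no flip is applied."""
    probs = list(prob) if isinstance(prob, (list, tuple)) else [prob]
    dirs = list(direction) if isinstance(direction, (list, tuple)) else [direction]
    if len(probs) != len(dirs):
        raise ValueError('prob and direction must have the same length')
    if sum(probs) > 1:
        raise ValueError('the flipping probabilities must sum to at most 1')

    # Walk the cumulative distribution; anything beyond the last
    # direction's share corresponds to leaving the sample unflipped.
    r = rng.random()
    cumulative = 0.0
    for p, d in zip(probs, dirs):
        cumulative += p
        if r < cumulative:
            return d
    return None


rng = random.Random(0)
samples = [
    sample_flip_direction(rng, prob=[0.3, 0.2], direction=['horizontal', 'vertical'])
    for _ in range(2000)
]
```

With prob=[0.3, 0.2], roughly 30% of samples flip horizontally, 20% flip vertically, and the remaining half are left unchanged.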

transform(results: dict) → dict[source]

The transform function of RandomFlip.

See transform() method of BaseTransform for details.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmpose.datasets.transforms.common_transforms.RandomHalfBody(min_total_keypoints: int = 9, min_upper_keypoints: int = 2, min_lower_keypoints: int = 3, padding: float = 1.5, prob: float = 0.3, upper_prioritized_prob: float = 0.7)[source]

Data augmentation with half-body transform that keeps only the upper or lower body at random.

Required Keys:

keypoints

keypoints_visible

upper_body_ids

lower_body_ids

Modified Keys:

bbox

bbox_center

bbox_scale

Parameters

min_total_keypoints (int) – The minimum required number of total valid keypoints of a person to apply half-body transform. Defaults to 9.
min_upper_keypoints (int) – The minimum required number of valid upper-body keypoints of a person to apply half-body transform. Defaults to 2.
min_lower_keypoints (int) – The minimum required number of valid lower-body keypoints of a person to apply half-body transform. Defaults to 3.
padding (float) – The bbox padding scale that bbox_scale will be multiplied by. Defaults to 1.5.
prob (float) – The probability to apply half-body transform when the keypoint number meets the requirement. Defaults to 0.3.
upper_prioritized_prob (float) – The probability of choosing the upper body when both halves have enough valid keypoints. Defaults to 0.7.

transform(results: Dict) → Optional[dict][source]

The transform function of RandomHalfBody.

See transform() method of BaseTransform for details.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmpose.datasets.transforms.common_transforms.YOLOXHSVRandomAug(hue_delta: int = 5, saturation_delta: int = 30, value_delta: int = 30)[source]

Apply HSV augmentation to the image sequentially. It is referenced from https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/data/data_augment.py#L21.

Required Keys:

img

Modified Keys:

img

Parameters

hue_delta (int) – Delta of hue. Defaults to 5.
saturation_delta (int) – Delta of saturation. Defaults to 30.
value_delta (int) – Delta of value. Defaults to 30.

transform(results: dict) → dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmpose.datasets.transforms.topdown_transforms.TopdownAffine(input_size: Tuple[int, int], use_udp: bool = False)[source]¶

Get the bbox image as the model input by affine transform.

Required Keys:

img

bbox_center

bbox_scale

bbox_rotation (optional)

keypoints (optional)

Modified Keys:

img

bbox_scale

Added Keys:

input_size

transformed_keypoints

Parameters

input_size (Tuple[int, int]) – The input image size of the model in [w, h]. The bbox region will be cropped and resized to input_size
use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False

transform(results: Dict) → Optional[dict][source]¶

The transform function of TopdownAffine.

See transform() method of BaseTransform for details.

Parameters: results (dict) – The result dict
Returns: The result dict.
Return type: dict

class mmpose.datasets.transforms.bottomup_transforms.BottomupGetHeatmapMask(get_invalid: bool = False)[source]¶

Generate the mask of valid regions from the segmentation annotation.

Required Keys:

img_shape

invalid_segs (optional)

warp_mat (optional)

flip (optional)

flip_direction (optional)

heatmaps (optional)

Added Keys:

heatmap_mask

transform(results: Dict) → Optional[dict][source]¶

The transform function of BottomupGetHeatmapMask to generate the mask of valid regions.

See transform() method of BaseTransform for details.

Parameters: results (dict) – Result dict from the data pipeline.
Returns: Result dict with heatmap_mask added.
Return type: dict

class mmpose.datasets.transforms.bottomup_transforms.BottomupRandomAffine(input_size: Optional[Tuple[int, int]] = None, shift_factor: float = 0.2, shift_prob: float = 1.0, scale_factor: Tuple[float, float] = (0.75, 1.5), scale_prob: float = 1.0, scale_type: str = 'short', rotate_factor: float = 30.0, rotate_prob: float = 1, shear_factor: float = 2.0, shear_prob: float = 1.0, use_udp: bool = False, pad_val: Union[float, Tuple[float]] = 0, border: Tuple[int, int] = (0, 0), distribution='trunc_norm', transform_mode='affine', bbox_keep_corner: bool = True, clip_border: bool = False)[source]¶

Randomly shift, resize and rotate the image.

Required Keys:

img

img_shape

keypoints (optional)

Modified Keys:

img

keypoints (optional)

Added Keys:

input_size

warp_mat

Parameters

input_size (Tuple[int, int]) – The input image size of the model in [w, h]
shift_factor (float) – Randomly shift the image in range \([-dx, dx]\) and \([-dy, dy]\) in X and Y directions, where \(dx(y) = img_w(h) \cdot shift_factor\) in pixels. Defaults to 0.2
shift_prob (float) – Probability of applying random shift. Defaults to 1.0
scale_factor (Tuple[float, float]) – Randomly resize the image in range \([scale_factor[0], scale_factor[1]]\). Defaults to (0.75, 1.5)
scale_prob (float) – Probability of applying random resizing. Defaults to 1.0
scale_type (str) – Whether to scale with respect to the long or short side of the image. Defaults to short
rotate_factor (float) – Randomly rotate the bbox in \([-rotate_factor, rotate_factor]\) in degrees. Defaults to 30.0
use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False

transform(results: Dict) → Optional[dict][source]¶

The transform function of BottomupRandomAffine to randomly shift, resize and rotate the image.

See transform() method of BaseTransform for details.

Parameters: results (dict) – Result dict from the data pipeline.
Returns: Result dict with the image and keypoints transformed.
Return type: dict

class mmpose.datasets.transforms.bottomup_transforms.BottomupRandomChoiceResize(scales: Sequence[Union[int, Tuple]], keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', **resize_kwargs)[source]¶

Resize images & bbox & mask from a list of multiple scales.

This transform resizes the input image to some scale. Bboxes and masks are then resized with the same scale factor. Resize scale will be randomly selected from scales.

How to choose the target scale to resize the image will follow the rules below:

If scales is a list of tuples, the target scale is sampled uniformly from the list.
If scales is a single tuple, the target scale is set to that tuple.
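
The two rules above can be sketched in a few lines. This is a minimal illustration, not the transform's code: choose_scale is a hypothetical helper, and returning the index alongside the scale only mirrors the scale_idx key listed under Added Keys.

```python
import random
from typing import Sequence, Tuple, Union

Scale = Tuple[int, int]


def choose_scale(scales: Union[Scale, Sequence[Scale]],
                 rng: random.Random) -> Tuple[Scale, int]:
    """Return (target_scale, scale_idx) following the two rules above."""
    if isinstance(scales, tuple) and all(isinstance(s, int) for s in scales):
        # A single tuple is used as-is.
        return scales, 0
    # A list of tuples: sample one entry uniformly.
    idx = rng.randrange(len(scales))
    return tuple(scales[idx]), idx


rng = random.Random(0)
scale, idx = choose_scale([(512, 512), (640, 640), (768, 768)], rng)
```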

Required Keys:

img
bbox
keypoints

Modified Keys:

img
img_shape
bbox
keypoints

Added Keys:

scale
scale_factor
scale_idx

Parameters

scales (Union[list, Tuple]) – Image scales for resizing.
**resize_kwargs – Other keyword arguments for the resize_type.

transform(results: dict) → dict[source]¶

Apply resize transforms on results from a list of scales.

Parameters: results (dict) – Result dict containing the data to transform.
Returns: Resized results. The ‘img’, ‘bbox’, ‘keypoints’, ‘scale’, ‘scale_factor’, ‘img_shape’, and ‘keep_ratio’ keys are updated in the result dict.
Return type: dict

class mmpose.datasets.transforms.bottomup_transforms.BottomupRandomCrop(crop_size: tuple, crop_type: str = 'absolute', allow_negative_crop: bool = False, recompute_bbox: bool = False, bbox_clip_border: bool = True)[source]¶

Randomly crop the image & bboxes & masks.

The absolute crop_size is sampled based on crop_type and image_size, then the cropped results are generated.

Required Keys:

img

keypoints

bbox (optional)

masks (BitmapMasks | PolygonMasks) (optional)

Modified Keys:

img

img_shape

keypoints

keypoints_visible

num_keypoints

bbox (optional)

bbox_score (optional)

id (optional)

category_id (optional)

raw_ann_info (optional)

iscrowd (optional)

segmentation (optional)

masks (optional)

Added Keys:

warp_mat

Parameters

crop_size (tuple) – The relative ratio or absolute pixels of (width, height).
crop_type (str, optional) – One of “relative_range”, “relative”, “absolute”, “absolute_range”. “relative” randomly crops (h * crop_size[0], w * crop_size[1]) part from an input of size (h, w). “relative_range” uniformly samples relative crop size from range [crop_size[0], 1] and [crop_size[1], 1] for height and width respectively. “absolute” crops from an input with absolute size (crop_size[0], crop_size[1]). “absolute_range” uniformly samples crop_h in range [crop_size[0], min(h, crop_size[1])] and crop_w in range [crop_size[0], min(w, crop_size[1])]. Defaults to “absolute”.
allow_negative_crop (bool, optional) – Whether to allow a crop that does not contain any bbox area. Defaults to False.
recompute_bbox (bool, optional) – Whether to re-compute the boxes based on cropped instance masks. Defaults to False.
bbox_clip_border (bool, optional) – Whether clip the objects outside the border of the image. Defaults to True.
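
How the four crop_type options turn crop_size into an absolute crop size can be sketched as follows. This is a rough illustration under stated assumptions, not the transform's code: get_crop_size is a hypothetical helper, crop_size is read as (width, height) as in the crop_size entry, and rounding to the nearest pixel is an assumption.

```python
import random
from typing import Tuple


def get_crop_size(image_size: Tuple[int, int],
                  crop_size: Tuple[float, float],
                  crop_type: str,
                  rng: random.Random) -> Tuple[int, int]:
    """Return the absolute (crop_h, crop_w) for an image of size (h, w).

    Assumes crop_size is given as (width, height).
    """
    h, w = image_size
    cw, ch = crop_size
    if crop_type == 'absolute':
        # Never crop more than the image provides.
        return min(int(ch), h), min(int(cw), w)
    if crop_type == 'absolute_range':
        # crop_size is read as a (min, max) range shared by both sides.
        lo, hi = int(crop_size[0]), int(crop_size[1])
        crop_h = rng.randint(min(h, lo), min(h, hi))
        crop_w = rng.randint(min(w, lo), min(w, hi))
        return crop_h, crop_w
    if crop_type == 'relative':
        return int(h * ch + 0.5), int(w * cw + 0.5)
    if crop_type == 'relative_range':
        # Sample each ratio uniformly from [crop_size[i], 1].
        rh = ch + rng.random() * (1 - ch)
        rw = cw + rng.random() * (1 - cw)
        return int(h * rh + 0.5), int(w * rw + 0.5)
    raise ValueError(f'unknown crop_type: {crop_type}')
```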

Note

If the image is smaller than the absolute crop size, return the original image.
If the crop does not contain any gt-bbox region and allow_negative_crop is set to False, skip this image.

transform(results: dict) → Optional[dict][source]¶

Transform function to randomly crop images, bounding boxes, masks, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results. The ‘img_shape’ key in the result dict is updated according to the crop size. None will be returned when there is no valid bbox after cropping.

Return type

results (Union[dict, None])

class mmpose.datasets.transforms.bottomup_transforms.BottomupResize(input_size: Tuple[int, int], aug_scales: Optional[List[float]] = None, size_factor: int = 32, resize_mode: str = 'fit', pad_val: tuple = (0, 0, 0), use_udp: bool = False)[source]¶

Resize the image to the input size of the model. Optionally, the image can be resized to multiple sizes to build an image pyramid for multi-scale inference.

Required Keys:

img

ori_shape

Modified Keys:

img

img_shape

Added Keys:

input_size

warp_mat

aug_scale

Parameters

input_size (Tuple[int, int]) – The input size of the model in [w, h]. Note that the actual size of the resized image will be affected by resize_mode and size_factor, and thus may not exactly equal the input_size
aug_scales (List[float], optional) – The extra input scales for multi-scale testing. If given, the input image will be resized to different scales to build an image pyramid, and heatmaps from all scales will be aggregated to make the final prediction. Defaults to None
size_factor (int) – The actual input size will be ceiled to a multiple of the size_factor value at both sides. Defaults to 32
resize_mode (str) –
The method to resize the image to the input size. Options are:
- 'fit': The image will be resized according to the relatively longer side with the aspect ratio kept. The resized image will entirely fit into the range of the input size
- 'expand': The image will be resized according to the relatively shorter side with the aspect ratio kept. The resized image will exceed the given input size at the longer side
use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False
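
The effect of resize_mode and size_factor on the resized image can be worked through with a small sketch. The helpers resized_size and ceil_to_multiple are hypothetical names used only for illustration; how the transform combines the two steps internally is not shown here.

```python
import math
from typing import Tuple


def ceil_to_multiple(size: Tuple[float, float], base: int) -> Tuple[int, int]:
    """Round each side up to the nearest multiple of base."""
    return tuple(int(math.ceil(s / base) * base) for s in size)


def resized_size(img_size: Tuple[int, int],
                 input_size: Tuple[int, int],
                 resize_mode: str = 'fit') -> Tuple[float, float]:
    """Size (w, h) of the resized image with the aspect ratio kept."""
    img_w, img_h = img_size
    in_w, in_h = input_size
    # Side lengths obtained by matching the other side of the input size.
    w_from_h = in_h * img_w / img_h
    h_from_w = in_w * img_h / img_w
    if resize_mode == 'fit':
        # The whole image fits inside the input size.
        return min(in_w, w_from_h), min(in_h, h_from_w)
    if resize_mode == 'expand':
        # The image covers the input size and may exceed it on one side.
        return max(in_w, w_from_h), max(in_h, h_from_w)
    raise ValueError(f'unknown resize_mode: {resize_mode}')


fit = resized_size((640, 480), (512, 512), 'fit')        # 512 wide, 384 high
expand = resized_size((640, 480), (512, 512), 'expand')  # about 682.7 wide, 512 high
padded = ceil_to_multiple(expand, 32)                    # (704, 512)
```

For a 640x480 image and a 512x512 input size, 'fit' leaves the height short of the input size, while 'expand' overshoots the width, which is then ceiled to a multiple of size_factor.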

transform(results: Dict) → Optional[dict][source]¶

The transform function of BottomupResize to resize the image to the model input size.

See transform() method of BaseTransform for details.

Parameters: results (dict) – Result dict from the data pipeline.
Returns: Result dict with the image resized.
Return type: dict

class mmpose.datasets.transforms.formatting.PackPoseInputs(meta_keys=('id', 'img_id', 'img_path', 'category_id', 'crowd_index', 'ori_shape', 'img_shape', 'input_size', 'input_center', 'input_scale', 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', 'dataset_name'), pack_transformed=False)[source]¶

Pack the inputs data for pose estimation.

The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default it includes:

id: id of the data sample

img_id: id of the image

category_id: the id of the instance category

img_path: path to the image file

crowd_index (optional): measure the crowding level of an image, defined in CrowdPose dataset

ori_shape: original shape of the image as a tuple (h, w, c)

img_shape: shape of the image input to the network as a tuple (h, w). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.

input_size: the input size to the network

flip: a boolean indicating if image flip transform was used

flip_direction: the flipping direction

flip_indices: the indices of each keypoint’s symmetric keypoint

raw_ann_info (optional): raw annotation of the instance(s)

Parameters: meta_keys (Sequence[str], optional) – Meta keys which will be stored in PoseDataSample as meta info. Defaults to ('id', 'img_id', 'img_path', 'category_id', 'crowd_index', 'ori_shape', 'img_shape', 'input_size', 'input_center', 'input_scale', 'flip', 'flip_direction', 'flip_indices', 'raw_ann_info', 'dataset_name')

transform(results: dict) → dict[source]¶

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

- ‘inputs’ (torch.Tensor): The forward data of models.
- ‘data_samples’ (PoseDataSample): The annotation info of the sample.

Return type

dict

mmpose.datasets.transforms.formatting.image_to_tensor(img: Union[numpy.ndarray, Sequence[numpy.ndarray]]) → torch.Tensor[source]¶

Translate image or sequence of images to tensor. Multiple image tensors will be stacked.

Parameters: img (np.ndarray | Sequence[np.ndarray]) – The original image or image sequence
Returns: The output tensor.
Return type: torch.Tensor

mmpose.datasets.transforms.formatting.keypoints_to_tensor(keypoints: Union[numpy.ndarray, Sequence[numpy.ndarray]]) → torch.Tensor[source]¶

Translate keypoints or sequence of keypoints to tensor. Multiple keypoints tensors will be stacked.

Parameters: keypoints (np.ndarray | Sequence[np.ndarray]) – The keypoints or keypoints sequence.
Returns: The output tensor.
Return type: torch.Tensor

mmpose.structures¶

class mmpose.structures.MultilevelPixelData(*, metainfo: Optional[dict] = None, **kwargs)[source]¶

Data structure for multi-level pixel-wise annotations or predictions.

All data items in data_fields of MultilevelPixelData are lists of np.ndarray or torch.Tensor, and should meet the following requirements:

Have the same length, which is the number of levels
At each level, the data should have 3 dimensions in order of channel, height and width
At each level, the data should have the same height and width

Examples

>>> metainfo = dict(num_keypoints=17)
>>> sizes = [(64, 48), (128, 96), (256, 192)]
>>> heatmaps = [np.random.rand(17, h, w) for h, w in sizes]
>>> masks = [torch.rand(1, h, w) for h, w in sizes]
>>> data = MultilevelPixelData(metainfo=metainfo,
...                            heatmaps=heatmaps,
...                            masks=masks)

>>> # get data item
>>> heatmaps = data.heatmaps  # A list of 3 numpy.ndarrays
>>> masks = data.masks  # A list of 3 torch.Tensors

>>> # get level
>>> data_l0 = data[0]  # PixelData with fields 'heatmaps' and 'masks'
>>> data.nlevel
3

>>> # get shape
>>> data.shape
((64, 48), (128, 96), (256, 192))

>>> # set
>>> offset_maps = [torch.rand(2, h, w) for h, w in sizes]
>>> data.offset_maps = offset_maps

cpu() → mmpose.structures.multilevel_pixel_data.MultilevelPixelData[source]¶: Convert all tensors to CPU in data.

cuda() → mmpose.structures.multilevel_pixel_data.MultilevelPixelData[source]¶: Convert all tensors to GPU in data.

detach() → mmpose.structures.multilevel_pixel_data.MultilevelPixelData[source]¶: Detach all tensors in data.

property nlevel¶

Return the level number.

Returns: The level number, or None if the data has not been assigned.
Return type: Optional[int]

numpy() → mmpose.structures.multilevel_pixel_data.MultilevelPixelData[source]¶: Convert all tensors to np.ndarray in data.

pop(*args) → Any[source]¶: Pop a property from data and metainfo, as in Python.

set_data(data: dict) → None[source]¶

Set or change key-value pairs in data_field by parameter data.

Parameters: data (dict) – A dict containing annotations of image or model predictions.

set_field(value: Any, name: str, dtype: Optional[Union[Type, Tuple[Type, ...]]] = None, field_type: str = 'data') → None[source]¶: Special method for setting union fields, used as property.setter functions.

property shape: Optional[Tuple[Tuple]]¶

Get the shape of multi-level pixel data.

Returns: A tuple of data shape at each level, or None if the data has not been assigned.
Return type: Optional[tuple]

to(*args, **kwargs) → mmpose.structures.multilevel_pixel_data.MultilevelPixelData[source]¶: Apply same name function to all tensors in data_fields.

to_tensor() → mmpose.structures.multilevel_pixel_data.MultilevelPixelData[source]¶: Convert all np.ndarray to tensor in data.

class mmpose.structures.PoseDataSample(*, metainfo: Optional[dict] = None, **kwargs)[source]¶

The base data structure of MMPose that is used as the interface between modules.

The attributes of PoseDataSample include:

gt_instances (InstanceData): Ground truth of instances with keypoint annotations

pred_instances (InstanceData): Instances with keypoint predictions

gt_fields (PixelData): Ground truth of spatial distribution annotations like keypoint heatmaps and part affine fields (PAF)

pred_fields (PixelData): Predictions of spatial distributions

Examples

>>> import torch
>>> from mmengine.structures import InstanceData, PixelData
>>> from mmpose.structures import PoseDataSample

>>> pose_meta = dict(img_shape=(800, 1216),
...                  crop_size=(256, 192),
...                  heatmap_size=(64, 48))
>>> gt_instances = InstanceData()
>>> gt_instances.bboxes = torch.rand((1, 4))
>>> gt_instances.keypoints = torch.rand((1, 17, 2))
>>> gt_instances.keypoints_visible = torch.rand((1, 17, 1))
>>> gt_fields = PixelData()
>>> gt_fields.heatmaps = torch.rand((17, 64, 48))

>>> data_sample = PoseDataSample(gt_instances=gt_instances,
...                              gt_fields=gt_fields,
...                              metainfo=pose_meta)
>>> assert 'img_shape' in data_sample
>>> len(data_sample.gt_instances)
1

mmpose.structures.bbox_clip_border(bbox: numpy.ndarray, shape: Tuple[int, int]) → numpy.ndarray[source]¶

Clip bounding box coordinates to fit within a specified shape.

Parameters

bbox (np.ndarray) – Bounding box coordinates of shape (…, 4) or (…, 2).
shape (Tuple[int, int]) – Shape of the image to which bounding boxes are being clipped in the format of (w, h)

Returns

Clipped bounding box coordinates.

Return type

np.ndarray

Example

>>> bbox = np.array([[10, 20, 30, 40], [40, 50, 80, 90]])
>>> shape = (50, 50)  # Example image shape
>>> clipped_bbox = bbox_clip_border(bbox, shape)
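
For readers who want to see what the clipping amounts to, here is a minimal NumPy sketch of the same idea. The function clip_bbox_sketch is a hypothetical stand-in, not the library function; clipping x to [0, w] and y to [0, h] is an assumption about the bounds.

```python
import numpy as np


def clip_bbox_sketch(bbox: np.ndarray, shape) -> np.ndarray:
    """Clip x coordinates to [0, w] and y coordinates to [0, h]."""
    w, h = shape
    # astype returns a copy, so the input array is left untouched.
    out = bbox.astype(np.float32)
    out[..., 0::2] = np.clip(out[..., 0::2], 0, w)  # x columns
    out[..., 1::2] = np.clip(out[..., 1::2], 0, h)  # y columns
    return out


bbox = np.array([[10, 20, 30, 40], [40, 50, 80, 90]])
clipped = clip_bbox_sketch(bbox, (50, 50))
# The second box is cut back to the 50x50 image: [40, 50, 50, 50]
```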

mmpose.structures.bbox_corner2xyxy(bbox: numpy.ndarray)[source]¶

Convert bounding boxes from corner format to xyxy format.

Given a numpy array containing bounding boxes in the corner format (four corner points for each box), this function converts the bounding boxes to the (xmin, ymin, xmax, ymax) format.

Parameters

bbox (numpy.ndarray) – Input array of shape (N, 4, 2) representing N bounding boxes.

Returns

An array of shape (N, 4) containing the bounding boxes in xyxy format.

Return type

numpy.ndarray

Example

>>> corners = np.array([[[0, 0], [100, 0], [100, 50], [0, 50]],
...                     [[10, 20], [200, 20], [200, 150], [10, 150]]])
>>> bbox = bbox_corner2xyxy(corners)
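
The conversion reduces to a per-box minimum and maximum over the four corner points. The sketch below illustrates this with a hypothetical helper, corner_to_xyxy_sketch, which is not the library function.

```python
import numpy as np


def corner_to_xyxy_sketch(corners: np.ndarray) -> np.ndarray:
    """(N, 4, 2) corner points -> (N, 4) boxes as (xmin, ymin, xmax, ymax)."""
    mins = corners.min(axis=1)  # (N, 2): xmin, ymin
    maxs = corners.max(axis=1)  # (N, 2): xmax, ymax
    return np.concatenate([mins, maxs], axis=1)


corners = np.array([[[0, 0], [100, 0], [100, 50], [0, 50]],
                    [[10, 20], [200, 20], [200, 150], [10, 150]]])
boxes = corner_to_xyxy_sketch(corners)
# boxes is [[0, 0, 100, 50], [10, 20, 200, 150]]
```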

mmpose.structures.bbox_cs2xywh(center: numpy.ndarray, scale: numpy.ndarray, padding: float = 1.0) → numpy.ndarray[source]¶

Transform the bbox format from (center, scale) to (x,y,w,h).

Parameters

center (ndarray) – BBox center (x, y) in shape (2,) or (n, 2)
scale (ndarray) – BBox scale (w, h) in shape (2,) or (n, 2)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0

Returns

BBox (x, y, w, h) in shape (4, ) or (n, 4)

Return type

ndarray[float32]
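
The arithmetic behind this conversion is short enough to write out. The sketch below assumes that the scale already includes the padding factor, so the padding is divided out before the top-left corner is computed; cs_to_xywh_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def cs_to_xywh_sketch(center: np.ndarray, scale: np.ndarray,
                      padding: float = 1.0) -> np.ndarray:
    """(center, scale) -> (x, y, w, h), assuming scale = (w, h) * padding."""
    wh = scale / padding    # undo the padding applied to the scale
    xy = center - 0.5 * wh  # top-left corner
    return np.concatenate([xy, wh], axis=-1).astype(np.float32)


box = cs_to_xywh_sketch(np.array([50.0, 60.0]), np.array([25.0, 50.0]),
                        padding=1.25)
# box is [40, 40, 20, 40]
```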

mmpose.structures.bbox_cs2xyxy(center: numpy.ndarray, scale: numpy.ndarray, padding: float = 1.0) → numpy.ndarray[source]¶

Transform the bbox format from (center, scale) to (x1,y1,x2,y2).

Parameters

center (ndarray) – BBox center (x, y) in shape (2,) or (n, 2)
scale (ndarray) – BBox scale (w, h) in shape (2,) or (n, 2)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0

Returns

BBox (x1, y1, x2, y2) in shape (4, ) or (n, 4)

Return type

ndarray[float32]

mmpose.structures.bbox_xywh2cs(bbox: numpy.ndarray, padding: float = 1.0) → Tuple[numpy.ndarray, numpy.ndarray][source]¶

Transform the bbox format from (x,y,w,h) into (center, scale)

Parameters

bbox (ndarray) – Bounding box(es) in shape (4,) or (n, 4), formatted as (x, y, w, h)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0

Returns

A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or (n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or (n, 2)

Return type

tuple
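
As a rough illustration of the (x, y, w, h) to (center, scale) arithmetic: the center is the top-left corner plus half the size, and the scale is the size multiplied by the padding factor. The helper xywh_to_cs_sketch is hypothetical and not the library function.

```python
import numpy as np


def xywh_to_cs_sketch(bbox: np.ndarray, padding: float = 1.0):
    """(x, y, w, h) -> (center, scale) with scale = (w, h) * padding."""
    xy, wh = bbox[..., :2], bbox[..., 2:4]
    center = xy + 0.5 * wh
    scale = wh * padding
    return center.astype(np.float32), scale.astype(np.float32)


center, scale = xywh_to_cs_sketch(np.array([40.0, 40.0, 20.0, 40.0]),
                                  padding=1.25)
# center is [50, 60] and scale is [25, 50]
```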

mmpose.structures.bbox_xywh2xyxy(bbox_xywh: numpy.ndarray) → numpy.ndarray[source]¶

Transform the bbox format from xywh to x1y1x2y2.

Parameters

bbox_xywh (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, width, height, [score])

Returns

Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, right, bottom, [score])

Return type

np.ndarray
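
A minimal sketch of this conversion is given below, assuming right = left + width and bottom = top + height with no pixel offset; whether the library applies an offset is not stated here. The optional score column is carried through unchanged. The name xywh_to_xyxy_sketch is hypothetical.

```python
import numpy as np


def xywh_to_xyxy_sketch(bbox_xywh: np.ndarray) -> np.ndarray:
    """(left, top, width, height, [score]) -> (left, top, right, bottom, [score])."""
    out = bbox_xywh.astype(np.float32)  # copy; leaves the input untouched
    out[:, 2] = out[:, 0] + out[:, 2]   # right = left + width
    out[:, 3] = out[:, 1] + out[:, 3]   # bottom = top + height
    return out


boxes = xywh_to_xyxy_sketch(np.array([[10.0, 20.0, 30.0, 40.0, 0.9]]))
```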

mmpose.structures.bbox_xyxy2corner(bbox: numpy.ndarray)[source]¶

Convert bounding boxes from xyxy format to corner format.

Given a numpy array containing bounding boxes in the format (xmin, ymin, xmax, ymax), this function converts the bounding boxes to the corner format, where each box is represented by four corner points (top-left, top-right, bottom-right, bottom-left).

Parameters

bbox (numpy.ndarray) – Input array of shape (N, 4) representing N bounding boxes.

Returns

An array of shape (N, 4, 2) containing the corner points of the bounding boxes.

Return type

numpy.ndarray

Example

>>> bbox = np.array([[0, 0, 100, 50], [10, 20, 200, 150]])
>>> corners = bbox_xyxy2corner(bbox)
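
The corner layout described above (top-left, top-right, bottom-right, bottom-left) can be built explicitly as in the following sketch. The helper xyxy_to_corner_sketch is hypothetical; it follows the corner order given in the description and is not taken from the library's code.

```python
import numpy as np


def xyxy_to_corner_sketch(bbox: np.ndarray) -> np.ndarray:
    """(N, 4) boxes as (xmin, ymin, xmax, ymax) -> (N, 4, 2) corner points."""
    x1, y1, x2, y2 = bbox[:, 0], bbox[:, 1], bbox[:, 2], bbox[:, 3]
    # Order: top-left, top-right, bottom-right, bottom-left.
    return np.stack([
        np.stack([x1, y1], axis=1),
        np.stack([x2, y1], axis=1),
        np.stack([x2, y2], axis=1),
        np.stack([x1, y2], axis=1),
    ], axis=1)


bbox = np.array([[0, 0, 100, 50], [10, 20, 200, 150]])
corners = xyxy_to_corner_sketch(bbox)
# corners[0] is [[0, 0], [100, 0], [100, 50], [0, 50]]
```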

mmpose.structures.bbox_xyxy2cs(bbox: numpy.ndarray, padding: float = 1.0) → Tuple[numpy.ndarray, numpy.ndarray][source]¶

Transform the bbox format from (x1,y1,x2,y2) into (center, scale)

Parameters

bbox (ndarray) – Bounding box(es) in shape (4,) or (n, 4), formatted as (left, top, right, bottom)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0

Returns

A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or (n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or (n, 2)

Return type

tuple

mmpose.structures.bbox_xyxy2xywh(bbox_xyxy: numpy.ndarray) → numpy.ndarray[source]¶

Transform the bbox format from x1y1x2y2 to xywh.

Parameters

bbox_xyxy (np.ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, right, bottom, [score])

Returns

Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, width, height, [score])

Return type

np.ndarray

mmpose.structures.flip_bbox(bbox: numpy.ndarray, image_size: Tuple[int, int], bbox_format: str = 'xywh', direction: str = 'horizontal') → numpy.ndarray[source]¶

Flip the bbox in the given direction.

Parameters

bbox (np.ndarray) – The bounding boxes. The shape should be (…, 4) if bbox_format is 'xyxy' or 'xywh', and (…, 2) if bbox_format is 'center'
image_size (tuple) – The image shape in [w, h]
bbox_format (str) – The bbox format. Options are 'xywh', 'xyxy' and 'center'.
direction (str) – The flip direction. Options are 'horizontal', 'vertical' and 'diagonal'. Defaults to 'horizontal'

Returns

The flipped bounding boxes.

Return type

np.ndarray

mmpose.structures.flip_keypoints(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray], image_size: Tuple[int, int], flip_indices: List[int], direction: str = 'horizontal') → Tuple[numpy.ndarray, Optional[numpy.ndarray]][source]¶

Flip keypoints in the given direction.

Note

keypoint number: K
keypoint dimension: D

Parameters

keypoints (np.ndarray) – Keypoints in shape (…, K, D)
keypoints_visible (np.ndarray, optional) – The visibility of keypoints in shape (…, K, 1) or (…, K, 2). Set None if the keypoint visibility is unavailable
image_size (tuple) – The image shape in [w, h]
flip_indices (List[int]) – The indices of each keypoint’s symmetric keypoint
direction (str) – The flip direction. Options are 'horizontal', 'vertical' and 'diagonal'. Defaults to 'horizontal'

Returns

- keypoints_flipped (np.ndarray): Flipped keypoints in shape (…, K, D)
- keypoints_visible_flipped (np.ndarray, optional): Flipped keypoints’ visibility in shape (…, K, 1) or (…, K, 2). Return None if the input keypoints_visible is None

Return type

tuple
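
A horizontal flip involves two steps: reordering the keypoints so that left and right joints swap places, and mirroring the x coordinate. The sketch below shows only the horizontal case with a hypothetical helper; the w - 1 - x convention (pixel indices running from 0 to w - 1) is an assumption.

```python
import numpy as np


def flip_keypoints_horizontal_sketch(keypoints, keypoints_visible,
                                     image_size, flip_indices):
    """Horizontally flip keypoints of shape (..., K, D)."""
    w, _ = image_size
    # Swap each keypoint with its symmetric counterpart along the K axis.
    flipped = np.take(keypoints, flip_indices, axis=-2).astype(np.float32)
    # Mirror x; the w - 1 convention is an assumption.
    flipped[..., 0] = w - 1 - flipped[..., 0]
    visible = None
    if keypoints_visible is not None:
        visible = np.take(keypoints_visible, flip_indices, axis=-2)
    return flipped, visible


kpts = np.array([[[10.0, 5.0], [20.0, 6.0], [30.0, 7.0]]])  # (1, 3, 2)
vis = np.array([[[1.0], [0.0], [1.0]]])                     # (1, 3, 1)
flipped, flipped_vis = flip_keypoints_horizontal_sketch(
    kpts, vis, image_size=(100, 50), flip_indices=[0, 2, 1])
```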

mmpose.structures.get_pers_warp_matrix(center: numpy.ndarray, translate: numpy.ndarray, scale: float, rot: float, shear: numpy.ndarray) → numpy.ndarray[source]¶

Compute a perspective warp matrix based on specified transformations.

Parameters

center (np.ndarray) – Center of the transformation.
translate (np.ndarray) – Translation vector.
scale (float) – Scaling factor.
rot (float) – Rotation angle in degrees.
shear (np.ndarray) – Shearing angles in degrees along x and y axes.

Returns

Perspective warp matrix.

Return type

np.ndarray

Example

>>> center = np.array([0, 0])
>>> translate = np.array([10, 20])
>>> scale = 1.2
>>> rot = 30.0
>>> shear = np.array([15.0, 0.0])
>>> warp_matrix = get_pers_warp_matrix(center, translate,
...                                    scale, rot, shear)

mmpose.structures.get_udp_warp_matrix(center: numpy.ndarray, scale: numpy.ndarray, rot: float, output_size: Tuple[int, int]) → numpy.ndarray[source]¶

Calculate the affine transformation matrix under the unbiased constraint. See UDP (CVPR 2020) for details.

Note

The bbox number: N

Parameters

center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (tuple) – Size ([w, h]) of the output image

Returns

A 2x3 transformation matrix

Return type

np.ndarray

mmpose.structures.get_warp_matrix(center: numpy.ndarray, scale: numpy.ndarray, rot: float, output_size: Tuple[int, int], shift: Tuple[float, float] = (0.0, 0.0), inv: bool = False, fix_aspect_ratio: bool = True) → numpy.ndarray[source]¶

Calculate the affine transformation matrix that can warp the bbox area in the input image to the output size.

Parameters

center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
shift (Tuple[float, float]) – Shift translation ratio (0-100%) wrt the width/height. Defaults to (0., 0.).
inv (bool) – Option to invert the affine transform direction. (inv=False: src->dst or inv=True: dst->src)
fix_aspect_ratio (bool) – Whether to fix aspect ratio during transform. Defaults to True.

Returns

A 2x3 transformation matrix

Return type

np.ndarray

mmpose.structures.keypoint_clip_border(keypoints: numpy.ndarray, keypoints_visible: numpy.ndarray, shape: Tuple[int, int]) → Tuple[numpy.ndarray, numpy.ndarray][source]¶

Set the visibility values for keypoints outside the image border.

Parameters

keypoints (np.ndarray) – Input keypoints coordinates.
keypoints_visible (np.ndarray) – Visibility values of keypoints.
shape (Tuple[int, int]) – Shape of the image to which keypoints are being clipped in the format of (w, h).

Note

This function sets the visibility values of keypoints that fall outside the specified frame border to zero (0.0).
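
The behaviour described in the note can be sketched with a boolean mask. The helper below is hypothetical and assumes keypoints of shape (N, K, 2) with a visibility array of shape (N, K); the library function may accept other layouts.

```python
import numpy as np


def mask_outside_keypoints_sketch(keypoints: np.ndarray,
                                  keypoints_visible: np.ndarray,
                                  shape):
    """Zero the visibility of keypoints lying outside a (w, h) frame."""
    w, h = shape
    x, y = keypoints[..., 0], keypoints[..., 1]
    outside = (x < 0) | (x > w) | (y < 0) | (y > h)
    visible = keypoints_visible.astype(np.float32)  # copy of the input
    visible[outside] = 0.0
    return keypoints, visible


kpts = np.array([[[10.0, 10.0], [60.0, 10.0], [-1.0, 5.0]]])  # (1, 3, 2)
vis = np.ones((1, 3))
_, new_vis = mask_outside_keypoints_sketch(kpts, vis, (50, 50))
# Only the first keypoint stays visible.
```

The keypoint coordinates themselves are left unchanged; only the visibility is updated.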

mmpose.structures.merge_data_samples(data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) → mmpose.structures.pose_data_sample.PoseDataSample[source]¶

Merge the given data samples into a single data sample.

This function can be used to merge the top-down predictions with bboxes from the same image. The merged data sample will contain all instances from the input data samples, and the same metainfo as the first input data sample.

Parameters: data_samples (List[PoseDataSample]) – The data samples to merge
Returns: The merged data sample.
Return type: PoseDataSample

mmpose.structures.revert_heatmap(heatmap, input_center, input_scale, img_shape)[source]¶

Revert predicted heatmap on the original image.

Parameters

heatmap (np.ndarray or torch.tensor) – predicted heatmap.
input_center (np.ndarray) – bounding box center coordinate.
input_scale (np.ndarray) – bounding box scale.
img_shape (tuple or list) – size of original image.

mmpose.structures.split_instances(instances: mmengine.structures.instance_data.InstanceData) → List[mmengine.structures.instance_data.InstanceData][source]¶: Convert instances into a list where each element is a dict that contains information about one instance.

bbox¶

mmpose.structures.bbox.bbox_clip_border(bbox: numpy.ndarray, shape: Tuple[int, int]) → numpy.ndarray[source]¶

Clip bounding box coordinates to fit within a specified shape.

Parameters

bbox (np.ndarray) – Bounding box coordinates of shape (…, 4) or (…, 2).
shape (Tuple[int, int]) – Shape of the image to which bounding boxes are being clipped in the format of (w, h)

Returns

Clipped bounding box coordinates.

Return type

np.ndarray

Example

>>> bbox = np.array([[10, 20, 30, 40], [40, 50, 80, 90]])
>>> shape = (50, 50)  # Example image shape
>>> clipped_bbox = bbox_clip_border(bbox, shape)

mmpose.structures.bbox.bbox_corner2xyxy(bbox: numpy.ndarray)[source]¶

Convert bounding boxes from corner format to xyxy format.

Given a numpy array containing bounding boxes in the corner format (four corner points for each box), this function converts the bounding boxes to the (xmin, ymin, xmax, ymax) format.

Parameters

bbox (numpy.ndarray) – Input array of shape (N, 4, 2) representing N bounding boxes.

Returns

An array of shape (N, 4) containing the bounding boxes in xyxy format.

Return type

numpy.ndarray

Example

>>> corners = np.array([[[0, 0], [100, 0], [100, 50], [0, 50]],
...                     [[10, 20], [200, 20], [200, 150], [10, 150]]])
>>> bbox = bbox_corner2xyxy(corners)

mmpose.structures.bbox.bbox_cs2xywh(center: numpy.ndarray, scale: numpy.ndarray, padding: float = 1.0) → numpy.ndarray[source]¶

Transform the bbox format from (center, scale) to (x,y,w,h).

Parameters

center (ndarray) – BBox center (x, y) in shape (2,) or (n, 2)
scale (ndarray) – BBox scale (w, h) in shape (2,) or (n, 2)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0

Returns

BBox (x, y, w, h) in shape (4, ) or (n, 4)

Return type

ndarray[float32]

mmpose.structures.bbox.bbox_cs2xyxy(center: numpy.ndarray, scale: numpy.ndarray, padding: float = 1.0) → numpy.ndarray[source]¶

Transform the bbox format from (center, scale) to (x1,y1,x2,y2).

Parameters

center (ndarray) – BBox center (x, y) in shape (2,) or (n, 2)
scale (ndarray) – BBox scale (w, h) in shape (2,) or (n, 2)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0

Returns

BBox (x1, y1, x2, y2) in shape (4, ) or (n, 4)

Return type

ndarray[float32]

mmpose.structures.bbox.bbox_overlaps(bboxes1, bboxes2, mode='iou', is_aligned=False, eps=1e-06) → torch.Tensor[source]¶

Calculate overlap between two sets of bounding boxes.

Parameters

bboxes1 (torch.Tensor) – Bounding boxes of shape (…, m, 4) or empty.
bboxes2 (torch.Tensor) – Bounding boxes of shape (…, n, 4) or empty.
mode (str) – “iou” (intersection over union), “iof” (intersection over foreground), or “giou” (generalized intersection over union). Defaults to “iou”.
is_aligned (bool, optional) – If True, then m and n must be equal. Default False.
eps (float, optional) – A small constant added to the denominator for numerical stability. Default 1e-6.

Returns

Overlap values of shape (…, m, n) if is_aligned is False, else shape (…, m).

Return type

torch.Tensor

Example

>>> bboxes1 = torch.FloatTensor([
...     [0, 0, 10, 10],
...     [10, 10, 20, 20],
...     [32, 32, 38, 42],
... ])
>>> bboxes2 = torch.FloatTensor([
...     [0, 0, 10, 20],
...     [0, 10, 10, 19],
...     [10, 10, 20, 20],
... ])
>>> overlaps = bbox_overlaps(bboxes1, bboxes2)
>>> assert overlaps.shape == (3, 3)
>>> overlaps = bbox_overlaps(bboxes1, bboxes2, is_aligned=True)
>>> assert overlaps.shape == (3, )
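
To make the “iou” mode concrete, the pairwise computation can be sketched in NumPy. This is an illustrative re-implementation for 2D inputs only, not the library's torch code; iou_matrix_sketch is a hypothetical name, and guarding the denominator with a lower bound of eps is an assumption about how the constant is applied.

```python
import numpy as np


def iou_matrix_sketch(b1: np.ndarray, b2: np.ndarray,
                      eps: float = 1e-6) -> np.ndarray:
    """Pairwise IoU between (m, 4) and (n, 4) boxes given as (x1, y1, x2, y2)."""
    area1 = (b1[:, 2] - b1[:, 0]) * (b1[:, 3] - b1[:, 1])
    area2 = (b2[:, 2] - b2[:, 0]) * (b2[:, 3] - b2[:, 1])
    # Top-left and bottom-right corners of every pairwise intersection.
    lt = np.maximum(b1[:, None, :2], b2[None, :, :2])
    rb = np.minimum(b1[:, None, 2:], b2[None, :, 2:])
    wh = np.clip(rb - lt, 0, None)  # no overlap gives zero width or height
    inter = wh[..., 0] * wh[..., 1]
    union = area1[:, None] + area2[None, :] - inter
    return inter / np.maximum(union, eps)


b1 = np.array([[0, 0, 10, 10], [10, 10, 20, 20], [32, 32, 38, 42]], dtype=float)
b2 = np.array([[0, 0, 10, 20], [0, 10, 10, 19], [10, 10, 20, 20]], dtype=float)
iou = iou_matrix_sketch(b1, b2)
# iou[0, 0] is 0.5 (intersection 100, union 200); iou[1, 2] is 1.0
```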

mmpose.structures.bbox.bbox_xywh2cs(bbox: numpy.ndarray, padding: float = 1.0) → Tuple[numpy.ndarray, numpy.ndarray][source]¶

Transform the bbox format from (x,y,w,h) into (center, scale)

Parameters

bbox (ndarray) – Bounding box(es) in shape (4,) or (n, 4), formatted as (x, y, w, h)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0

Returns

A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or (n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or (n, 2)

Return type

tuple

mmpose.structures.bbox.bbox_xywh2xyxy(bbox_xywh: numpy.ndarray) → numpy.ndarray[source]¶

Transform the bbox format from xywh to x1y1x2y2.

Parameters

bbox_xywh (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, width, height, [score])

Returns

Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, right, bottom, [score])

Return type

np.ndarray

mmpose.structures.bbox.bbox_xyxy2corner(bbox: numpy.ndarray)[source]¶

Convert bounding boxes from xyxy format to corner format.

Given a numpy array containing bounding boxes in the format (xmin, ymin, xmax, ymax), this function converts the bounding boxes to the corner format, where each box is represented by four corner points (top-left, top-right, bottom-right, bottom-left).

Parameters

bbox (numpy.ndarray) – Input array of shape (N, 4) representing N bounding boxes.

Returns

An array of shape (N, 4, 2) containing the corner points of the bounding boxes.

Return type

numpy.ndarray

Example

>>> bbox = np.array([[0, 0, 100, 50], [10, 20, 200, 150]])
>>> corners = bbox_xyxy2corner(bbox)

mmpose.structures.bbox.bbox_xyxy2cs(bbox: numpy.ndarray, padding: float = 1.0) → Tuple[numpy.ndarray, numpy.ndarray][source]¶

Transform the bbox format from (x1,y1,x2,y2) into (center, scale)

Parameters

bbox (ndarray) – Bounding box(es) in shape (4,) or (n, 4), formatted as (left, top, right, bottom)
padding (float) – BBox padding factor that will be multiplied to scale. Default: 1.0

Returns

A tuple containing center and scale.
- np.ndarray[float32]: Center (x, y) of the bbox in shape (2,) or (n, 2)
- np.ndarray[float32]: Scale (w, h) of the bbox in shape (2,) or (n, 2)

Return type

tuple

mmpose.structures.bbox.bbox_xyxy2xywh(bbox_xyxy: numpy.ndarray) → numpy.ndarray[源代码]¶

Transform the bbox format from x1y1x2y2 to xywh.

参数

bbox_xyxy (np.ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, right, bottom, [score])

Returns

Bounding boxes (with scores), shaped (n, 4) or (n, 5). (left, top, width, height, [score])

Return type

np.ndarray

mmpose.structures.bbox.flip_bbox(bbox: numpy.ndarray, image_size: Tuple[int, int], bbox_format: str = 'xywh', direction: str = 'horizontal') → numpy.ndarray[source]¶

Flip the bbox in the given direction.

Parameters

bbox (np.ndarray) – The bounding boxes. The shape should be (…, 4) if bbox_format is 'xyxy' or 'xywh', and (…, 2) if bbox_format is 'center'
image_size (tuple) – The image shape in [w, h]
bbox_format (str) – The bbox format. Options are 'xywh', 'xyxy' and 'center'.
direction (str) – The flip direction. Options are 'horizontal', 'vertical' and 'diagonal'. Defaults to 'horizontal'

Returns

The flipped bounding boxes.

Return type

np.ndarray

mmpose.structures.bbox.get_pers_warp_matrix(center: numpy.ndarray, translate: numpy.ndarray, scale: float, rot: float, shear: numpy.ndarray) → numpy.ndarray[source]¶

Compute a perspective warp matrix based on specified transformations.

Parameters

center (np.ndarray) – Center of the transformation.
translate (np.ndarray) – Translation vector.
scale (float) – Scaling factor.
rot (float) – Rotation angle in degrees.
shear (np.ndarray) – Shearing angles in degrees along x and y axes.

Returns

Perspective warp matrix.

Return type

np.ndarray

Example

>>> center = np.array([0, 0])
>>> translate = np.array([10, 20])
>>> scale = 1.2
>>> rot = 30.0
>>> shear = np.array([15.0, 0.0])
>>> warp_matrix = get_pers_warp_matrix(center, translate,
...                                    scale, rot, shear)

mmpose.structures.bbox.get_udp_warp_matrix(center: numpy.ndarray, scale: numpy.ndarray, rot: float, output_size: Tuple[int, int]) → numpy.ndarray[source]¶

Calculate the affine transformation matrix under the unbiased constraint. See UDP (CVPR 2020) for details.

Note

The bbox number: N

Parameters

center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (tuple) – Size ([w, h]) of the output image

Returns

A 2x3 transformation matrix

Return type

np.ndarray

mmpose.structures.bbox.get_warp_matrix(center: numpy.ndarray, scale: numpy.ndarray, rot: float, output_size: Tuple[int, int], shift: Tuple[float, float] = (0.0, 0.0), inv: bool = False, fix_aspect_ratio: bool = True) → numpy.ndarray[source]¶

Calculate the affine transformation matrix that can warp the bbox area in the input image to the output size.

Parameters

center (np.ndarray[2, ]) – Center of the bounding box (x, y).
scale (np.ndarray[2, ]) – Scale of the bounding box wrt [width, height].
rot (float) – Rotation angle (degree).
output_size (np.ndarray[2, ] | list(2,)) – Size of the destination heatmaps.
shift (tuple[float, float]) – Shift translation ratio wrt the width/height, in the range 0-100%. Default: (0., 0.).
inv (bool) – Option to inverse the affine transform direction. (inv=False: src->dst or inv=True: dst->src)
fix_aspect_ratio (bool) – Whether to fix aspect ratio during transform. Defaults to True.

Returns

A 2x3 transformation matrix

Return type

np.ndarray

keypoint¶

mmpose.structures.keypoint.flip_keypoints(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray], image_size: Tuple[int, int], flip_indices: List[int], direction: str = 'horizontal') → Tuple[numpy.ndarray, Optional[numpy.ndarray]][source]¶

Flip keypoints in the given direction.

Note

keypoint number: K
keypoint dimension: D

Parameters

keypoints (np.ndarray) – Keypoints in shape (…, K, D)
keypoints_visible (np.ndarray, optional) – The visibility of keypoints in shape (…, K, 1) or (…, K, 2). Set None if the keypoint visibility is unavailable
image_size (tuple) – The image shape in [w, h]
flip_indices (List[int]) – The indices of each keypoint’s symmetric keypoint
direction (str) – The flip direction. Options are 'horizontal', 'vertical' and 'diagonal'. Defaults to 'horizontal'

Returns

- keypoints_flipped (np.ndarray): Flipped keypoints in shape (…, K, D)
- keypoints_visible_flipped (np.ndarray, optional): Flipped keypoints’ visibility in shape (…, K, 1) or (…, K, 2). Return None if the input keypoints_visible is None

Return type

tuple
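
A horizontal flip involves two steps: swapping each keypoint with its symmetric partner via flip_indices, then mirroring the x-coordinates. The sketch below is a standalone illustration with a hypothetical helper name; it assumes pixel-index mirroring, x' = w - 1 - x, which may not match the library's exact convention.

```python
import numpy as np


def flip_keypoints_horizontal(keypoints, keypoints_visible, image_size,
                              flip_indices):
    """Hypothetical helper: horizontal flip of (..., K, D) keypoints."""
    w, _ = image_size
    # Step 1: swap left/right keypoints along the K axis.
    flipped = np.take(keypoints, flip_indices, axis=-2).astype(np.float32)
    # Step 2: mirror x (assumed convention: x' = w - 1 - x).
    flipped[..., 0] = w - 1 - flipped[..., 0]
    visible = None
    if keypoints_visible is not None:
        visible = np.take(keypoints_visible, flip_indices, axis=-2)
    return flipped, visible


kpts = np.array([[[10.0, 5.0], [20.0, 5.0]]])   # (1, K=2, D=2)
vis = np.array([[[1.0], [0.0]]])                # (1, K=2, 1)
kpts_flipped, vis_flipped = flip_keypoints_horizontal(
    kpts, vis, image_size=(100, 50), flip_indices=[1, 0])
```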

mmpose.structures.keypoint.flip_keypoints_custom_center(keypoints: numpy.ndarray, keypoints_visible: numpy.ndarray, flip_indices: List[int], center_mode: str = 'static', center_x: float = 0.5, center_index: Union[int, List] = 0)[source]¶

Flip human joints horizontally.

Note

num_keypoint: K
dimension: D

Parameters

keypoints (np.ndarray([..., K, D])) – Coordinates of keypoints.
keypoints_visible (np.ndarray([..., K])) – Visibility item of keypoints.
flip_indices (list[int]) – The indices to flip the keypoints.
center_mode (str) –
The mode to set the center location on the x-axis to flip around. Options are:
- static: use a static x value (see center_x also)
- root: use a root joint (see center_index also)
Defaults to 'static'.
center_x (float) – Set the x-axis location of the flip center. Only used when center_mode is 'static'. Defaults to 0.5.
center_index (Union[int, List]) – Set the index of the root joint, whose x location will be used as the flip center. Only used when center_mode is 'root'. Defaults to 0.

Returns

Flipped joints.

Return type

np.ndarray([…, K, D])
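
The two center modes can be illustrated with a short standalone sketch. The helper `flip_about_center` is hypothetical; it only handles an integer center_index and returns the flipped coordinates, mirroring x about the chosen center as x' = 2 * x_c - x.

```python
import numpy as np


def flip_about_center(keypoints, flip_indices, center_mode='static',
                      center_x=0.5, center_index=0):
    """Hypothetical helper: mirror (..., K, D) keypoints about x = x_c."""
    keypoints = np.asarray(keypoints, dtype=np.float32)
    if center_mode == 'static':
        x_c = center_x
    elif center_mode == 'root':
        # x of the root joint, expanded so it broadcasts against (..., K).
        x_c = np.expand_dims(keypoints[..., center_index, 0], -1)
    else:
        raise ValueError(f'unsupported center_mode: {center_mode}')
    # Swap symmetric joints, then reflect the x-coordinate.
    flipped = keypoints[..., flip_indices, :].copy()
    flipped[..., 0] = 2 * x_c - flipped[..., 0]
    return flipped


joints = np.array([[0.2, 0.1], [0.8, 0.3]])
static_flip = flip_about_center(joints, [1, 0], 'static', center_x=0.5)
root_flip = flip_about_center(joints, [1, 0], 'root', center_index=0)
```

With normalized coordinates, the static default of 0.5 reflects about the image midline, while the root mode reflects about a joint such as the pelvis.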

mmpose.structures.keypoint.keypoint_clip_border(keypoints: numpy.ndarray, keypoints_visible: numpy.ndarray, shape: Tuple[int, int]) → Tuple[numpy.ndarray, numpy.ndarray][source]¶

Set the visibility values for keypoints outside the image border.

Parameters

keypoints (np.ndarray) – Input keypoints coordinates.
keypoints_visible (np.ndarray) – Visibility values of keypoints.
shape (Tuple[int, int]) – Shape of the image to which keypoints are being clipped in the format of (w, h).

Note

This function sets the visibility values of keypoints that fall outside the specified frame border to zero (0.0).
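
A minimal sketch of this behaviour, assuming visibility is stored with shape (…, K) and that points exactly on the border count as inside. The helper name `mark_outside_invisible` is hypothetical.

```python
import numpy as np


def mark_outside_invisible(keypoints, keypoints_visible, shape):
    """Hypothetical helper: zero the visibility of out-of-frame keypoints."""
    w, h = shape
    x, y = keypoints[..., 0], keypoints[..., 1]
    outside = (x < 0) | (x > w) | (y < 0) | (y > h)
    visible = keypoints_visible.astype(np.float32, copy=True)
    visible[outside] = 0.0
    # Coordinates are returned unchanged; only visibility is modified.
    return keypoints, visible


pts = np.array([[[5.0, 5.0], [120.0, 5.0], [-1.0, 3.0]]])
pts_vis = np.ones((1, 3), dtype=np.float32)
_, clipped_vis = mark_outside_invisible(pts, pts_vis, shape=(100, 50))
```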

mmpose.registry¶

MMPose provides the following registry nodes to support using modules across projects.

Each node is a child of the root registry in MMEngine. More details can be found at https://mmengine.readthedocs.io/en/latest/tutorials/registry.html.

mmpose.evaluation¶

metrics¶

class mmpose.evaluation.metrics.AUC(norm_factor: float = 30, num_thrs: int = 20, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶

AUC evaluation metric.

Calculate the Area Under Curve (AUC) of keypoint PCK accuracy.

By altering the threshold percentage in the calculation of PCK accuracy, AUC can be generated to further evaluate the pose estimation algorithms.

Note

length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)

Parameters

norm_factor (float) – AUC normalization factor, Default: 30 (pixels).
num_thrs (int) – number of thresholds to calculate auc. Default: 20.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
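
The idea behind the metric can be sketched in NumPy: compute PCK at a range of thresholds and average the resulting curve. This is an illustrative sketch only. The helper `pck_auc` is hypothetical, and the threshold grid (num_thrs evenly spaced values starting at 0, applied to distances divided by norm_factor) is an assumption rather than a statement about the library's implementation.

```python
import numpy as np


def pck_auc(pred, gt, visible, norm_factor=30.0, num_thrs=20):
    """Hypothetical helper: area under the PCK-vs-threshold curve."""
    # Normalized Euclidean distance per keypoint, shape (N, K).
    dist = np.linalg.norm(pred - gt, axis=-1) / norm_factor
    dist = dist[visible]
    if dist.size == 0:
        return 0.0
    # Assumed threshold grid: 0, 1/num_thrs, ..., (num_thrs - 1)/num_thrs.
    thresholds = np.arange(num_thrs) / num_thrs
    pck_curve = [(dist < thr).mean() for thr in thresholds]
    return float(np.mean(pck_curve))


gt_kpts = np.zeros((2, 3, 2))
all_visible = np.ones((2, 3), dtype=bool)
auc_perfect = pck_auc(gt_kpts.copy(), gt_kpts, all_visible)
auc_far = pck_auc(gt_kpts + 1000.0, gt_kpts, all_visible)
```

Under this grid the zero threshold never counts a keypoint as correct, so even perfect predictions score slightly below 1.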

compute_metrics(results: list) → Dict[str, float][source]¶

Compute the metrics from processed results.

Parameters: results (list) – The processed results of each batch.
Returns: The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.
Return type: Dict[str, float]

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[source]¶

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.

class mmpose.evaluation.metrics.CocoMetric(ann_file: Optional[str] = None, use_area: bool = True, iou_type: str = 'keypoints', score_mode: str = 'bbox_keypoint', keypoint_score_thr: float = 0.2, nms_mode: str = 'oks_nms', nms_thr: float = 0.9, format_only: bool = False, pred_converter: Optional[Dict] = None, gt_converter: Optional[Dict] = None, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶

COCO pose estimation task evaluation metric.

Evaluate AR, AP, and mAP for keypoint detection tasks. Supports the COCO dataset and other datasets in COCO format. Please refer to COCO keypoint evaluation for more details.

Parameters

ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None
use_area (bool) – Whether to use 'area' message in the annotations. If the ground truth annotations (e.g. CrowdPose, AIC) do not have the field 'area', please set use_area=False. Defaults to True
iou_type (str) – The same parameter as iouType in xtcocotools.COCOeval, which can be 'keypoints', or 'keypoints_crowd' (used in CrowdPose dataset). Defaults to 'keypoints'
score_mode (str) –
The mode to score the prediction results which should be one of the following options:
- 'bbox': Take the score of bbox as the score of the
  prediction results.
- 'bbox_keypoint': Use keypoint score to rescore the
  prediction results.
- 'bbox_rle': Use rle_score to rescore the
  prediction results.
Defaults to 'bbox_keypoint'
keypoint_score_thr (float) – The threshold of keypoint score. The keypoints with score lower than it will not be included to rescore the prediction results. Valid only when score_mode is bbox_keypoint. Defaults to 0.2
nms_mode (str) –
The mode to perform Non-Maximum Suppression (NMS), which should be one of the following options:
- 'oks_nms': Use Object Keypoint Similarity (OKS) to
  perform NMS.
- 'soft_oks_nms': Use Object Keypoint Similarity (OKS)
  to perform soft NMS.
- 'none': Do not perform NMS. Typically for bottomup mode
  output.
Defaults to 'oks_nms'
nms_thr (float) – The Object Keypoint Similarity (OKS) threshold used in NMS when nms_mode is 'oks_nms' or 'soft_oks_nms'. Prediction results with OKS lower than nms_thr will be retained. Defaults to 0.9
format_only (bool) – Whether only format the output results without doing quantitative evaluation. This is designed for the need of test submission when the ground truth annotations are absent. If set to True, outfile_prefix should specify the path to store the output results. Defaults to False
pred_converter (dict, optional) – Config dictionary for the prediction converter. The dictionary has the same parameters as ‘KeypointConverter’. Defaults to None.
gt_converter (dict, optional) – Config dictionary for the ground truth converter. The dictionary has the same parameters as ‘KeypointConverter’. Defaults to None.
outfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., 'a/b/prefix'. If not specified, a temp file will be created. Defaults to None
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None
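
As an illustration of how these arguments fit together, a config-style evaluator entry might look like the sketch below. The annotation path is a hypothetical placeholder, and the other values simply restate the documented defaults.

```python
# Sketch of an evaluator config entry; only documented arguments are used.
val_evaluator = dict(
    type='CocoMetric',
    # Hypothetical path: point this at your own COCO-format annotations.
    ann_file='data/coco/annotations/person_keypoints_val2017.json',
    score_mode='bbox_keypoint',   # rescore boxes with keypoint scores
    keypoint_score_thr=0.2,       # ignore low-confidence keypoints
    nms_mode='oks_nms',           # OKS-based NMS
    nms_thr=0.9,
)
test_evaluator = val_evaluator
```

For bottom-up outputs, the parameter notes above suggest nms_mode='none' instead.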

compute_metrics(results: list) → Dict[str, float][source]¶

Compute the metrics from processed results.

Parameters: results (list) – The processed results of each batch.
Returns: The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.
Return type: Dict[str, float]

property dataset_meta: Optional[dict]¶

Meta info of the dataset.

Type: Optional[dict]

gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str) → str[source]¶

Convert ground truth to coco format json file.

Parameters

gt_dicts (Sequence[dict]) –
Ground truth of the dataset. Each dict contains the ground truth information about the data sample. Required keys of each gt_dict in gt_dicts:
- img_id: image id of the data sample
- width: original image width
- height: original image height
- raw_ann_info: the raw annotation information
Optional keys:
- crowd_index: measure the crowding level of an image,
  defined in CrowdPose dataset
It is worth mentioning that, in order to compute CocoMetric, there are some required keys in the raw_ann_info:
- id: the id to distinguish different annotations
- image_id: the image id of this annotation
- category_id: the category of the instance.
- bbox: the object bounding box
- keypoints: the keypoint coordinates along with their
  visibilities. Note that it needs to be aligned with the official COCO format, e.g., a list with length N * 3, in which N is the number of keypoints, and each triplet represents the [x, y, visible] of the keypoint.
- iscrowd: indicating whether the annotation is a crowd.
  It is useful when matching the detection results to the ground truth.
There are some optional keys as well:
- area: it is necessary when self.use_area is True
- num_keypoints: it is necessary when self.iou_type
  is set as keypoints_crowd.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.
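
To illustrate the keys listed above, here is a hand-built example of one ground-truth entry. All values are made up for illustration, and representing raw_ann_info as a single dict is an assumption; the actual container type may differ.

```python
# Three keypoints as [x, y, visible] triplets (illustrative values only).
keypoints_xyv = [[100.0, 50.0, 2], [120.0, 55.0, 1], [0.0, 0.0, 0]]
# COCO stores them flattened into one list of length N * 3.
flat_keypoints = [v for triplet in keypoints_xyv for v in triplet]

raw_ann_info = dict(
    id=1,
    image_id=42,
    category_id=1,
    bbox=[90.0, 40.0, 40.0, 30.0],
    keypoints=flat_keypoints,
    iscrowd=0,
    area=1200.0,        # needed when use_area is True
    num_keypoints=2,    # needed when iou_type is 'keypoints_crowd'
)
gt_dict = dict(img_id=42, width=640, height=480, raw_ann_info=raw_ann_info)
```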

Returns

The filename of the json file.

Return type

str

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[source]¶

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) –
A batch of outputs from the model, each of which has the following keys:
- 'id': The id of the sample
- 'img_id': The image_id of the sample
- 'pred_instances': The prediction results of instance(s)

results2json(keypoints: Dict[int, list], outfile_prefix: str) → str[source]¶

Dump the keypoint detection results to a COCO style json file.

Parameters

keypoints (Dict[int, list]) – Keypoint detection results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.keypoints.json”.

Returns

The json file name of keypoint results.

Return type

str

class mmpose.evaluation.metrics.CocoWholeBodyMetric(ann_file: Optional[str] = None, use_area: bool = True, iou_type: str = 'keypoints', score_mode: str = 'bbox_keypoint', keypoint_score_thr: float = 0.2, nms_mode: str = 'oks_nms', nms_thr: float = 0.9, format_only: bool = False, pred_converter: Optional[Dict] = None, gt_converter: Optional[Dict] = None, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶

COCO-WholeBody evaluation metric.

Evaluate AR, AP, and mAP for COCO-WholeBody keypoint detection tasks. Supports the COCO-WholeBody dataset. Please refer to COCO keypoint evaluation for more details.

Parameters

ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None
use_area (bool) – Whether to use 'area' message in the annotations. If the ground truth annotations (e.g. CrowdPose, AIC) do not have the field 'area', please set use_area=False. Defaults to True
iou_type (str) – The same parameter as iouType in xtcocotools.COCOeval, which can be 'keypoints', or 'keypoints_crowd' (used in CrowdPose dataset). Defaults to 'keypoints'
score_mode (str) –
The mode to score the prediction results which should be one of the following options:
- 'bbox': Take the score of bbox as the score of the
  prediction results.
- 'bbox_keypoint': Use keypoint score to rescore the
  prediction results.
- 'bbox_rle': Use rle_score to rescore the
  prediction results.
Defaults to 'bbox_keypoint'
keypoint_score_thr (float) – The threshold of keypoint score. The keypoints with score lower than it will not be included to rescore the prediction results. Valid only when score_mode is bbox_keypoint. Defaults to 0.2
nms_mode (str) –
The mode to perform Non-Maximum Suppression (NMS), which should be one of the following options:
- 'oks_nms': Use Object Keypoint Similarity (OKS) to
  perform NMS.
- 'soft_oks_nms': Use Object Keypoint Similarity (OKS)
  to perform soft NMS.
- 'none': Do not perform NMS. Typically for bottomup mode
  output.
Defaults to 'oks_nms'
nms_thr (float) – The Object Keypoint Similarity (OKS) threshold used in NMS when nms_mode is 'oks_nms' or 'soft_oks_nms'. Prediction results with OKS lower than nms_thr will be retained. Defaults to 0.9
format_only (bool) – Whether only format the output results without doing quantitative evaluation. This is designed for the need of test submission when the ground truth annotations are absent. If set to True, outfile_prefix should specify the path to store the output results. Defaults to False
outfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., 'a/b/prefix'. If not specified, a temp file will be created. Defaults to None
**kwargs – Keyword parameters passed to mmeval.BaseMetric

gt_to_coco_json(gt_dicts: Sequence[dict], outfile_prefix: str) → str[source]¶

Convert ground truth to coco format json file.

Parameters

gt_dicts (Sequence[dict]) –
Ground truth of the dataset. Each dict contains the ground truth information about the data sample. Required keys of each gt_dict in gt_dicts:
- img_id: image id of the data sample
- width: original image width
- height: original image height
- raw_ann_info: the raw annotation information
Optional keys:
- crowd_index: measure the crowding level of an image,
  defined in CrowdPose dataset
It is worth mentioning that, in order to compute CocoMetric, there are some required keys in the raw_ann_info:
- id: the id to distinguish different annotations
- image_id: the image id of this annotation
- category_id: the category of the instance.
- bbox: the object bounding box
- keypoints: the keypoint coordinates along with their
  visibilities. Note that it needs to be aligned with the official COCO format, e.g., a list with length N * 3, in which N is the number of keypoints, and each triplet represents the [x, y, visible] of the keypoint.
- iscrowd: indicating whether the annotation is a crowd.
  It is useful when matching the detection results to the ground truth.
There are some optional keys as well:
- area: it is necessary when self.use_area is True
- num_keypoints: it is necessary when self.iou_type
  is set as keypoints_crowd.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json file will be named “somepath/xxx.gt.json”.

Returns

The filename of the json file.

Return type

str

results2json(keypoints: Dict[int, list], outfile_prefix: str) → str[source]¶

Dump the keypoint detection results to a COCO style json file.

Parameters

keypoints (Dict[int, list]) – Keypoint detection results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.keypoints.json”.

Returns

The json file name of keypoint results.

Return type

str

class mmpose.evaluation.metrics.EPE(collect_device: str = 'cpu', prefix: Optional[str] = None, collect_dir: Optional[str] = None)[source]¶

EPE evaluation metric.

Calculate the end-point error (EPE) of keypoints.

Note

length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)

Parameters

collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
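
The end-point error is the mean Euclidean distance between predicted and ground-truth keypoints. A minimal standalone sketch, with a hypothetical helper name and the assumption that only visible keypoints are averaged:

```python
import numpy as np


def end_point_error(pred, gt, visible):
    """Hypothetical helper: mean L2 distance over visible keypoints.

    pred, gt: arrays of shape (N, K, D); visible: bool array of shape (N, K).
    """
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float(dist[visible].mean())


epe_gt = np.zeros((1, 4, 2))
epe_pred = epe_gt + np.array([3.0, 4.0])   # every keypoint is 5 px off
epe_value = end_point_error(epe_pred, epe_gt, np.ones((1, 4), dtype=bool))
```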

compute_metrics(results: list) → Dict[str, float][source]¶

Compute the metrics from processed results.

Parameters: results (list) – The processed results of each batch.
Returns: The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.
Return type: Dict[str, float]

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[source]¶

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.

class mmpose.evaluation.metrics.InterHandMetric(modes: List[str] = ['MPJPE', 'MRRPE', 'HandednessAcc'], collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶

compute_metrics(results: list) → Dict[str, float][source]¶

Compute the metrics from processed results.

Parameters: results (list) – The processed results of each batch.
Returns: The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.
Return type: Dict[str, float]

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[source]¶

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.

class mmpose.evaluation.metrics.JhmdbPCKAccuracy(thr: float = 0.05, norm_item: Union[str, Sequence[str]] = 'bbox', collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶

PCK accuracy evaluation metric for Jhmdb dataset.

Calculate the pose accuracy of Percentage of Correct Keypoints (PCK) for each individual keypoint and the averaged accuracy across all keypoints. PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the person bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.
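
The PCK computation described above can be sketched in NumPy. This is a simplified standalone illustration, not the metric class itself: `pck` is a hypothetical helper, and `norm` stands in for the per-sample normalization size (for example the bounding box width and height).

```python
import numpy as np


def pck(pred, gt, visible, norm, thr):
    """Hypothetical helper: fraction of visible keypoints within thr.

    pred, gt: (N, K, 2); visible: bool (N, K); norm: (N, 2).
    """
    # Normalize the offset per sample before taking the distance.
    dist = np.linalg.norm((pred - gt) / norm[:, None, :], axis=-1)
    correct = (dist < thr) & visible
    return float(correct.sum() / max(visible.sum(), 1))


pck_gt = np.zeros((1, 3, 2))
pck_pred = np.array([[[1.0, 0.0], [5.0, 0.0], [20.0, 0.0]]])
pck_norm = np.array([[100.0, 100.0]])
pck_value = pck(pck_pred, pck_gt, np.ones((1, 3), dtype=bool), pck_norm, 0.1)
```

Here the normalized distances are 0.01, 0.05 and 0.2, so two of the three keypoints fall under the 0.1 threshold.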

Note

length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)

Parameters

thr (float) – Threshold of PCK calculation. Default: 0.05.
norm_item (str | Sequence[str]) – The item used for normalization. Valid items include ‘bbox’, ‘head’, ‘torso’, which correspond to ‘PCK’, ‘PCKh’ and ‘tPCK’ respectively. Default: 'bbox'.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.

Examples

>>> from mmpose.evaluation.metrics import JhmdbPCKAccuracy
>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> num_keypoints = 15
>>> keypoints = np.random.random((1, num_keypoints, 2)) * 10
>>> gt_instances = InstanceData()
>>> gt_instances.keypoints = keypoints
>>> gt_instances.keypoints_visible = np.ones(
...     (1, num_keypoints, 1)).astype(bool)
>>> gt_instances.bboxes = np.random.random((1, 4)) * 20
>>> gt_instances.head_size = np.random.random((1, 1)) * 10
>>> pred_instances = InstanceData()
>>> pred_instances.keypoints = keypoints
>>> data_sample = {
...     'gt_instances': gt_instances.to_dict(),
...     'pred_instances': pred_instances.to_dict(),
... }
>>> data_samples = [data_sample]
>>> data_batch = [{'inputs': None}]
>>> jhmdb_pck_metric = JhmdbPCKAccuracy(thr=0.2, norm_item=['bbox', 'torso'])
... UserWarning: The prefix is not set in metric class JhmdbPCKAccuracy.
>>> jhmdb_pck_metric.process(data_batch, data_samples)
>>> jhmdb_pck_metric.evaluate(1)
10/26 17:48:09 - mmengine - INFO - Evaluating JhmdbPCKAccuracy (normalized by "bbox_size")...
10/26 17:48:09 - mmengine - INFO - Evaluating JhmdbPCKAccuracy (normalized by "torso_size")...
{'Head PCK': 1.0, 'Sho PCK': 1.0, 'Elb PCK': 1.0, 'Wri PCK': 1.0,
'Hip PCK': 1.0, 'Knee PCK': 1.0, 'Ank PCK': 1.0, 'PCK': 1.0,
'Head tPCK': 1.0, 'Sho tPCK': 1.0, 'Elb tPCK': 1.0, 'Wri tPCK': 1.0,
'Hip tPCK': 1.0, 'Knee tPCK': 1.0, 'Ank tPCK': 1.0, 'tPCK': 1.0}

compute_metrics(results: list) → Dict[str, float][source]¶

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. If 'bbox' is in self.norm_item, the returned results are the PCK accuracy normalized by bbox_size, with the following keys:
- 'Head PCK': The PCK of head
- 'Sho PCK': The PCK of shoulder
- 'Elb PCK': The PCK of elbow
- 'Wri PCK': The PCK of wrist
- 'Hip PCK': The PCK of hip
- 'Knee PCK': The PCK of knee
- 'Ank PCK': The PCK of ankle
- 'PCK': The mean PCK over all keypoints

If 'torso' is in self.norm_item, the returned results are the PCK accuracy normalized by torso_size, with the following keys:
- 'Head tPCK': The PCK of head
- 'Sho tPCK': The PCK of shoulder
- 'Elb tPCK': The PCK of elbow
- 'Wri tPCK': The PCK of wrist
- 'Hip tPCK': The PCK of hip
- 'Knee tPCK': The PCK of knee
- 'Ank tPCK': The PCK of ankle
- 'tPCK': The mean PCK over all keypoints

Return type

Dict[str, float]

class mmpose.evaluation.metrics.KeypointPartitionMetric(metric: dict, partitions: dict)[source]¶

Wrapper metric for evaluating pose metric on user-defined body parts.

Sometimes one may be interested in the performance of a pose model on certain body parts rather than on all the keypoints. For example, CocoWholeBodyMetric evaluates coco metric on body, foot, face, lefthand and righthand. However, CocoWholeBodyMetric cannot be applied to arbitrary custom datasets. This wrapper metric solves this problem.

Supported metrics:

- CocoMetric
  - Note 1: all keypoint ground truth should be stored in keypoints, not other data fields.
  - Note 2: ann_file is not supported; it will be ignored.
  - Note 3: score_mode other than 'bbox' may produce results different from the CocoWholeBodyMetric.
  - Note 4: nms_mode other than 'none' may produce results different from the CocoWholeBodyMetric.
- PCKAccuracy
  - Note 1: data fields required by PCKAccuracy should be provided, such as bbox, head_size, etc.
  - Note 2: since 'torso' is specifically designed for JhmdbDataset, it is not recommended to use it for other datasets.
- AUC: supported without limitations.
- EPE: supported without limitations.
- NME: only norm_mode = 'use_norm_item' is supported; 'keypoint_distance' is incompatible with KeypointPartitionMetric.

Incompatible metrics:

The following metrics are dataset-specific metrics:
- CocoWholeBodyMetric
- MpiiPCKAccuracy
- JhmdbPCKAccuracy
- PoseTrack18Metric

Keypoint partitioning is already included in these metrics.

Parameters

metric (dict) – arguments to instantiate a metric, please refer to the arguments required by the metric of your choice.
partitions (dict) –
definition of body partitions. For example, if we have 10 keypoints in total, the first 7 keypoints belong to body and the last 3 keypoints belong to foot, this field can be like this:

dict(
    body=[0, 1, 2, 3, 4, 5, 6],
    foot=[7, 8, 9],
    all=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
)

where the numbers are the indices of keypoints and they can be discontinuous.
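
As a rough illustration of what partitioning means, the sketch below slices a keypoint array by the index lists and shows how the wrapper might appear in a config. The wrapped metric `dict(type='EPE')` is one of the supported choices listed above; the array contents are made up.

```python
import numpy as np

partitions = dict(
    body=[0, 1, 2, 3, 4, 5, 6],
    foot=[7, 8, 9],
    all=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
)

# Illustrative keypoints for one instance: shape (N=1, K=10, D=2).
all_keypoints = np.arange(20, dtype=np.float32).reshape(1, 10, 2)

# Each partition sees only its own subset along the keypoint axis.
per_partition = {
    name: all_keypoints[:, indices] for name, indices in partitions.items()
}

# Config-style sketch of the wrapper around a supported metric.
partition_evaluator = dict(
    type='KeypointPartitionMetric',
    metric=dict(type='EPE'),
    partitions=partitions,
)
```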

compute_metrics(results: list) → dict[source]¶

Compute the metrics from processed results.

Parameters: results (list) – The processed results of each batch.
Returns: The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.
Return type: dict

property dataset_meta: Optional[dict]¶

Meta info of the dataset.

Type: Optional[dict]

evaluate(size: int) → dict[source]¶

Run evaluation for each partition.

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[source]¶

Split data samples by partitions, then call metric.process part by part.

class mmpose.evaluation.metrics.MPJPE(mode: str = 'mpjpe', collect_device: str = 'cpu', prefix: Optional[str] = None, skip_list: List[str] = [])[source]¶

MPJPE evaluation metric.

Calculate the mean per-joint position error (MPJPE) of keypoints.

Note

length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)

Parameters

mode (str) –
Method to align the prediction with the ground truth. Supported options are:
- 'mpjpe': no alignment will be applied
- 'p-mpjpe': align in the least-square sense in
  scale, rotation, and translation (Procrustes alignment).
- 'n-mpjpe': align in the least-square sense in scale
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
skip_list (list, optional) – The list of subject and action combinations to be skipped. Default: [].
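
A minimal NumPy sketch of the error computation, covering the unaligned case and a scale-only least-squares alignment. The helper `mean_joint_error` is hypothetical, and the full Procrustes alignment (scale, rotation, and translation) is deliberately left out to keep the example short.

```python
import numpy as np


def mean_joint_error(pred, gt, align_scale=False):
    """Hypothetical helper: mean per-joint position error for (N, K, D)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if align_scale:
        # Least-squares scale per sample: s = <pred, gt> / <pred, pred>.
        num = (pred * gt).sum(axis=(1, 2), keepdims=True)
        den = (pred * pred).sum(axis=(1, 2), keepdims=True)
        pred = pred * (num / den)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


joints_gt = np.arange(1, 13, dtype=np.float64).reshape(1, 4, 3)
joints_pred = 0.5 * joints_gt               # correct pose, wrong scale
err_raw = mean_joint_error(joints_pred, joints_gt)
err_scaled = mean_joint_error(joints_pred, joints_gt, align_scale=True)
```

A prediction that differs from the ground truth only by a global scale has a non-zero raw error but a zero error after scale alignment.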

compute_metrics(results: list) → Dict[str, float][source]¶

Compute the metrics from processed results.

Parameters: results (list) – The processed results of each batch.
Returns: The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.
Return type: Dict[str, float]

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[source]¶

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.

class mmpose.evaluation.metrics.MpiiPCKAccuracy(thr: float = 0.5, norm_item: Union[str, Sequence[str]] = 'head', collect_device: str = 'cpu', prefix: Optional[str] = None)[source]¶

PCKh accuracy evaluation metric for MPII dataset.

Calculate the pose accuracy of Percentage of Correct Keypoints (PCK) for each individual keypoint and the averaged accuracy across all keypoints. PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the person bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

Note

length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)

Parameters

thr (float) – Threshold of PCK calculation. Default: 0.5.
norm_item (str | Sequence[str]) – The item used for normalization. Valid items include ‘bbox’, ‘head’, ‘torso’, which correspond to ‘PCK’, ‘PCKh’ and ‘tPCK’ respectively. Default: 'head'.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.

Examples

>>> from mmpose.evaluation.metrics import MpiiPCKAccuracy
>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> num_keypoints = 16
>>> keypoints = np.random.random((1, num_keypoints, 2)) * 10
>>> gt_instances = InstanceData()
>>> gt_instances.keypoints = keypoints + 1.0
>>> gt_instances.keypoints_visible = np.ones(
...     (1, num_keypoints, 1)).astype(bool)
>>> gt_instances.head_size = np.random.random((1, 1)) * 10
>>> pred_instances = InstanceData()
>>> pred_instances.keypoints = keypoints
>>> data_sample = {
...     'gt_instances': gt_instances.to_dict(),
...     'pred_instances': pred_instances.to_dict(),
... }
>>> data_samples = [data_sample]
>>> data_batch = [{'inputs': None}]
>>> mpii_pck_metric = MpiiPCKAccuracy(thr=0.3, norm_item='head')
... UserWarning: The prefix is not set in metric class MpiiPCKAccuracy.
>>> mpii_pck_metric.process(data_batch, data_samples)
>>> mpii_pck_metric.evaluate(1)
10/26 17:43:39 - mmengine - INFO - Evaluating MpiiPCKAccuracy (normalized by "head_size")...
{'Head PCK': 100.0, 'Shoulder PCK': 100.0, 'Elbow PCK': 100.0,
'Wrist PCK': 100.0, 'Hip PCK': 100.0, 'Knee PCK': 100.0,
'Ankle PCK': 100.0, 'PCK': 100.0, 'PCK@0.1': 100.0}

compute_metrics(results: list) → Dict[str, float][源代码]¶

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. If 'head' is in self.norm_item, the returned results are the PCK accuracy normalized by head_size, with the following keys:

- 'Head PCK': The PCK of head
- 'Shoulder PCK': The PCK of shoulder
- 'Elbow PCK': The PCK of elbow
- 'Wrist PCK': The PCK of wrist
- 'Hip PCK': The PCK of hip
- 'Knee PCK': The PCK of knee
- 'Ankle PCK': The PCK of ankle
- 'PCK': The mean PCK over all keypoints
- 'PCK@0.1': The mean PCK at threshold 0.1

Return type

Dict[str, float]

class mmpose.evaluation.metrics.NME(norm_mode: str, norm_item: Optional[str] = None, keypoint_indices: Optional[Sequence[int]] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]¶

NME evaluation metric.

Calculate the normalized mean error (NME) of keypoints.

Note

length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)

Parameters

norm_mode (str) – The normalization mode. There are two valid modes: ‘use_norm_item’ and ‘keypoint_distance’. When set to ‘use_norm_item’, the argument norm_item must be specified; it names the item in the datainfo that will be used as the normalization factor. When set to ‘keypoint_distance’, the argument keypoint_indices selects the keypoints whose distance is used as the normalization factor.
norm_item (str, optional) – The item used as the normalization factor. For example, ‘bbox_size’ in ‘AFLWDataset’. Only valid when norm_mode is use_norm_item. Default: None.
keypoint_indices (Sequence[int], optional) – The keypoint indices used to calculate the keypoint distance as the normalization factor. Only valid when norm_mode is keypoint_distance. If set as None, will use the default keypoint_indices in DEFAULT_KEYPOINT_INDICES for specific datasets, else use the given keypoint_indices of the dataset. Default: None.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
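
As a rough illustration of the ‘keypoint_distance’ mode described above, the following sketch normalizes each keypoint error by the distance between two reference ground-truth keypoints. The helper nme_by_keypoint_distance is hypothetical and not part of mmpose, and the array shapes ([N, K, 2] keypoints, [N, K] mask) are assumptions made for the example.

```python
import numpy as np


def nme_by_keypoint_distance(pred, gt, mask, keypoint_indices):
    """Hypothetical helper (not an mmpose API): NME normalized by the
    distance between two reference ground-truth keypoints."""
    i, j = keypoint_indices
    # Per-instance normalization factor, shape (N,).
    norm = np.linalg.norm(gt[:, i] - gt[:, j], axis=-1)
    # Per-keypoint Euclidean error divided by the factor, shape (N, K).
    err = np.linalg.norm(pred - gt, axis=-1) / norm[:, None]
    # Average over visible keypoints only.
    return float(err[mask].mean())


gt = np.array([[[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]]])
pred = gt + np.array([1.0, 0.0])  # every keypoint is off by 1 pixel in x
mask = np.ones((1, 3), dtype=bool)
nme = nme_by_keypoint_distance(pred, gt, mask, keypoint_indices=(0, 1))
```

Here the reference distance is 4 pixels and every error is 1 pixel, so the normalized mean error is 0.25.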

compute_metrics(results: list) → Dict[str, float][源代码]¶

Compute the metrics from processed results.

Parameters: results (list) – The processed results of each batch.
Returns: The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.
Return type: Dict[str, float]

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[源代码]¶

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.

class mmpose.evaluation.metrics.PCKAccuracy(thr: float = 0.05, norm_item: Union[str, Sequence[str]] = 'bbox', collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]¶

PCK accuracy evaluation metric. Calculate the pose accuracy of Percentage of Correct Keypoints (PCK) for each individual keypoint and the averaged accuracy across all keypoints. PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the person bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

Note

length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)

Parameters

thr (float) – Threshold of PCK calculation. Default: 0.05.
norm_item (str | Sequence[str]) – The item used for normalization. Valid items include ‘bbox’, ‘head’, ‘torso’, which correspond to ‘PCK’, ‘PCKh’ and ‘tPCK’ respectively. Default: 'bbox'.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.

Examples

>>> from mmpose.evaluation.metrics import PCKAccuracy
>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> num_keypoints = 15
>>> keypoints = np.random.random((1, num_keypoints, 2)) * 10
>>> gt_instances = InstanceData()
>>> gt_instances.keypoints = keypoints
>>> gt_instances.keypoints_visible = np.ones(
...     (1, num_keypoints, 1)).astype(bool)
>>> gt_instances.bboxes = np.random.random((1, 4)) * 20
>>> pred_instances = InstanceData()
>>> pred_instances.keypoints = keypoints
>>> data_sample = {
...     'gt_instances': gt_instances.to_dict(),
...     'pred_instances': pred_instances.to_dict(),
... }
>>> data_samples = [data_sample]
>>> data_batch = [{'inputs': None}]
>>> pck_metric = PCKAccuracy(thr=0.5, norm_item='bbox')
...: UserWarning: The prefix is not set in metric class PCKAccuracy.
>>> pck_metric.process(data_batch, data_samples)
>>> pck_metric.evaluate(1)
10/26 15:37:57 - mmengine - INFO - Evaluating PCKAccuracy (normalized by "bbox_size")...
{'PCK': 1.0}

compute_metrics(results: list) → Dict[str, float][源代码]¶

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. The returned result dict may have the following keys:

- 'PCK': The PCK accuracy normalized by bbox_size.
- 'PCKh': The PCK accuracy normalized by head_size.
- 'tPCK': The PCK accuracy normalized by torso_size.

Return type

Dict[str, float]

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[源代码]¶

Process one batch of data samples and predictions.

The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.

class mmpose.evaluation.metrics.PoseTrack18Metric(ann_file: Optional[str] = None, score_mode: str = 'bbox_keypoint', keypoint_score_thr: float = 0.2, nms_mode: str = 'oks_nms', nms_thr: float = 0.9, format_only: bool = False, outfile_prefix: Optional[str] = None, collect_device: str = 'cpu', prefix: Optional[str] = None)[源代码]¶

PoseTrack18 evaluation metric.

Evaluate AP, and mAP for keypoint detection tasks. Support PoseTrack18 (video) dataset. Please refer to https://github.com/leonid-pishchulin/poseval for more details.

Parameters

ann_file (str, optional) – Path to the coco format annotation file. If not specified, ground truth annotations from the dataset will be converted to coco format. Defaults to None
score_mode (str) –
The mode to score the prediction results which should be one of the following options:
- 'bbox': Take the score of bbox as the score of the
  prediction results.
- 'bbox_keypoint': Use keypoint score to rescore the
  prediction results.
Defaults to 'bbox_keypoint'
keypoint_score_thr (float) – The threshold of keypoint score. The keypoints with score lower than it will not be included to rescore the prediction results. Valid only when score_mode is bbox_keypoint. Defaults to 0.2
nms_mode (str) –
The mode to perform Non-Maximum Suppression (NMS), which should be one of the following options:
- 'oks_nms': Use Object Keypoint Similarity (OKS) to
  perform NMS.
- 'soft_oks_nms': Use Object Keypoint Similarity (OKS)
  to perform soft NMS.
- 'none': Do not perform NMS. Typically for bottomup mode
  output.
Defaults to 'oks_nms'
nms_thr (float) – The Object Keypoint Similarity (OKS) threshold used in NMS when nms_mode is 'oks_nms' or 'soft_oks_nms'. Will retain the prediction results with OKS lower than nms_thr. Defaults to 0.9
format_only (bool) – Whether to only format the output results without doing quantitative evaluation. This is intended for test submissions, when the ground truth annotations are absent. If set to True, outfile_prefix should specify the path to store the output results. Defaults to False
outfile_prefix (str | None) – The prefix of json files. It includes the file path and the prefix of filename, e.g., 'a/b/prefix'. If not specified, a temp file will be created. Defaults to None
**kwargs – Keyword parameters passed to mmeval.BaseMetric

results2json(keypoints: Dict[int, list], outfile_prefix: str) → str[源代码]¶

Dump the keypoint detection results into a json file.

Parameters

keypoints (Dict[int, list]) – Keypoint detection results of the dataset.
outfile_prefix (str) – The filename prefix of the json files. If the prefix is “somepath/xxx”, the json files will be named “somepath/xxx.keypoints.json”.

Returns

The json file name of keypoint results.

Return type

str

class mmpose.evaluation.metrics.SimpleMPJPE(mode: str = 'mpjpe', collect_device: str = 'cpu', prefix: Optional[str] = None, skip_list: List[str] = [])[源代码]¶

MPJPE evaluation metric.

Calculate the mean per-joint position error (MPJPE) of keypoints.

Note

length of dataset: N
num_keypoints: K
number of keypoint dimensions: D (typically D = 2)

Parameters

mode (str) –
Method to align the prediction with the ground truth. Supported options are:
- 'mpjpe': no alignment will be applied
- 'p-mpjpe': align in the least-square sense in
  scale, rotation, and translation (Procrustes alignment).
- 'n-mpjpe': align in the least-square sense in scale
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Default: 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Default: None.
skip_list (list, optional) – The list of subject and action combinations to be skipped. Default: [].

compute_metrics(results: list) → Dict[str, float][源代码]¶

Compute the metrics from processed results.

Parameters: results (list) – The processed results of each batch.
Returns: The computed metrics. The keys are the names of the metrics, and the values are the corresponding results.
Return type: Dict[str, float]

process(data_batch: Sequence[dict], data_samples: Sequence[dict]) → None[源代码]¶

Process one batch of data samples and predictions. The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.

functional¶

mmpose.evaluation.functional.keypoint_auc(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, norm_factor: numpy.ndarray, num_thrs: int = 20) → float[源代码]¶

Calculate the Area under curve (AUC) of keypoint PCK accuracy.

Note

instance number: N
keypoint number: K

Parameters

pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
norm_factor (float) – Normalization factor.
num_thrs (int) – number of thresholds to calculate auc.

Returns

Area under curve (AUC) of keypoint PCK accuracy.

Return type

float
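
To make the idea concrete, the sketch below approximates the AUC by averaging PCK over num_thrs evenly spaced thresholds in [0, 1). It is a simplified illustration, not the library implementation: auc_sketch is a hypothetical name, the threshold spacing is an assumption, and all visible keypoints are pooled together instead of being averaged per keypoint.

```python
import numpy as np


def auc_sketch(pred, gt, mask, norm_factor, num_thrs=20):
    """Hypothetical helper: area under the PCK-vs-threshold curve."""
    # Normalized distance of every visible keypoint (scalar norm_factor).
    dist = np.linalg.norm(pred - gt, axis=-1)[mask] / norm_factor
    # Assumed thresholds: 0, 1/num_thrs, ..., (num_thrs - 1)/num_thrs.
    thrs = np.arange(num_thrs) / num_thrs
    # PCK at each threshold, then the mean over thresholds (rectangle rule).
    pck = np.array([(dist < t).mean() for t in thrs])
    return float(pck.mean())


gt = np.zeros((1, 2, 2))
pred = np.array([[[0.0, 0.0], [5.0, 0.0]]])
mask = np.ones((1, 2), dtype=bool)
auc = auc_sketch(pred, gt, mask, norm_factor=10.0, num_thrs=4)
```

With normalized distances 0.0 and 0.5 and thresholds 0, 0.25, 0.5, 0.75, the PCK values are 0, 0.5, 0.5 and 1, so the sketch returns 0.5.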

mmpose.evaluation.functional.keypoint_epe(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray) → float[源代码]¶

Calculate the end-point error.

Note

instance number: N
keypoint number: K

Parameters

pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.

Returns

Average end-point error.

Return type

float
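
The end-point error can be sketched in a few lines of NumPy: it is the mean Euclidean distance between predicted and ground-truth keypoints, taken over visible keypoints only. The function epe_sketch is a hypothetical stand-in written for this example, not the mmpose function itself.

```python
import numpy as np


def epe_sketch(pred, gt, mask):
    """Hypothetical helper: mean Euclidean error over visible keypoints."""
    dist = np.linalg.norm(pred - gt, axis=-1)  # shape (N, K)
    # Returning 0.0 when nothing is visible is a choice made for this sketch.
    return float(dist[mask].mean()) if mask.any() else 0.0


gt = np.zeros((1, 3, 2))
pred = np.array([[[3.0, 4.0], [0.0, 0.0], [100.0, 0.0]]])
mask = np.array([[True, True, False]])  # the third keypoint is ignored
epe = epe_sketch(pred, gt, mask)
```

The two visible keypoints have errors 5 and 0, so the result is 2.5; the masked keypoint does not contribute.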

mmpose.evaluation.functional.keypoint_mpjpe(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, alignment: str = 'none')[源代码]¶

Calculate the mean per-joint position error (MPJPE) and the error after rigid alignment with the ground truth (P-MPJPE).

Note

batch_size: N
num_keypoints: K
keypoint_dims: C

Parameters

pred (np.ndarray) – Predicted keypoint location with shape [N, K, C].
gt (np.ndarray) – Groundtruth keypoint location with shape [N, K, C].
mask (np.ndarray) – Visibility of the target with shape [N, K]. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
alignment (str, optional) –
method to align the prediction with the groundtruth. Supported options are:
- 'none': no alignment will be applied
- 'scale': align in the least-square sense in scale
- 'procrustes': align in the least-square sense in
  scale, rotation and translation.

Returns

A tuple containing joint position errors:

- (float | np.ndarray): mean per-joint position error (mpjpe).
- (float | np.ndarray): mpjpe after rigid alignment with the ground truth (p-mpjpe).

Return type

tuple
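
The 'none' and 'scale' alignment options can be illustrated with the sketch below. It is a minimal version under stated assumptions: mpjpe_sketch is a hypothetical helper that returns a single float, the least-squares scale is computed per sample as <pred, gt> / <pred, pred>, and the 'procrustes' option is deliberately left out.

```python
import numpy as np


def mpjpe_sketch(pred, gt, mask, alignment='none'):
    """Hypothetical helper: MPJPE with optional least-squares scale alignment."""
    if alignment == 'scale':
        # Per-sample scale s minimizing ||s * pred - gt||^2.
        dot_pg = np.einsum('nkc,nkc->n', pred, gt)
        dot_pp = np.einsum('nkc,nkc->n', pred, pred)
        pred = pred * (dot_pg / dot_pp)[:, None, None]
    elif alignment != 'none':
        raise ValueError(f'unsupported alignment in this sketch: {alignment}')
    # Mean Euclidean error over visible joints.
    return float(np.linalg.norm(pred - gt, axis=-1)[mask].mean())


gt = np.array([[[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]])
pred = 2.0 * gt  # prediction is correct up to a global scale
mask = np.ones((1, 2), dtype=bool)
raw = mpjpe_sketch(pred, gt, mask)               # no alignment
aligned = mpjpe_sketch(pred, gt, mask, 'scale')  # scale removed
```

Without alignment the joint errors are 1 and 2 (mean 1.5); after scale alignment the prediction coincides with the ground truth and the error vanishes.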

mmpose.evaluation.functional.keypoint_nme(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, normalize_factor: numpy.ndarray) → float[源代码]¶

Calculate the normalized mean error (NME).

Note

instance number: N
keypoint number: K

Parameters

pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
normalize_factor (np.ndarray[N, 2]) – Normalization factor.

Returns

Normalized mean error.

Return type

float

mmpose.evaluation.functional.keypoint_pck_accuracy(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, thr: numpy.ndarray, norm_factor: numpy.ndarray) → tuple[源代码]¶

Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints for coordinates.

Note

PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

instance number: N
keypoint number: K

Parameters

pred (np.ndarray[N, K, 2]) – Predicted keypoint location.
gt (np.ndarray[N, K, 2]) – Groundtruth keypoint location.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation.
norm_factor (np.ndarray[N, 2]) – Normalization factor for H&W.

Returns

A tuple containing keypoint accuracy:

- acc (np.ndarray[K]): Accuracy of each keypoint.
- avg_acc (float): Averaged accuracy across all keypoints.
- cnt (int): Number of valid keypoints.

Return type

tuple
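
The computation described above can be sketched as follows. This is an illustration under assumptions, not the mmpose source: pck_sketch is a hypothetical helper, and reporting -1 for keypoints that are never visible is a convention chosen for this sketch.

```python
import numpy as np


def pck_sketch(pred, gt, mask, thr, norm_factor):
    """Hypothetical helper: per-keypoint and average PCK."""
    # Normalize x/y offsets per instance, then take the Euclidean distance.
    dist = np.linalg.norm((pred - gt) / norm_factor[:, None, :], axis=-1)
    correct = (dist < thr) & mask          # (N, K) hits among visible keypoints
    visible = mask.sum(axis=0)             # (K,) visible count per keypoint
    # Accuracy per keypoint; -1 marks keypoints with no visible sample.
    acc = np.where(visible > 0,
                   correct.sum(axis=0) / np.maximum(visible, 1), -1.0)
    valid = acc >= 0
    avg_acc = float(acc[valid].mean()) if valid.any() else 0.0
    return acc, avg_acc, int(valid.sum())


gt = np.zeros((2, 3, 2))
pred = np.array([[[1.0, 0.0], [0.0, 1.0], [10.0, 0.0]],
                 [[1.0, 0.0], [20.0, 0.0], [0.0, 0.0]]])
mask = np.array([[True, True, True], [True, True, False]])
norm_factor = np.full((2, 2), 100.0)
acc, avg_acc, cnt = pck_sketch(pred, gt, mask, thr=0.05,
                               norm_factor=norm_factor)
```

In this example the per-keypoint accuracies are 1.0, 0.5 and 0.0, giving an average of 0.5 over 3 valid keypoints.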

mmpose.evaluation.functional.multilabel_classification_accuracy(pred: numpy.ndarray, gt: numpy.ndarray, mask: numpy.ndarray, thr: float = 0.5) → float[源代码]¶

Get multi-label classification accuracy.

Note

batch size: N
label number: L

Parameters

pred (np.ndarray[N, L, 2]) – model predicted labels.
gt (np.ndarray[N, L, 2]) – ground-truth labels.
mask (np.ndarray[N, 1] or np.ndarray[N, L]) – reliability of ground-truth labels.
thr (float) – Threshold for calculating accuracy.

Returns

Multi-label classification accuracy.

Return type

float

mmpose.evaluation.functional.nearby_joints_nms(kpts_db: List[dict], dist_thr: float = 0.05, num_nearby_joints_thr: Optional[int] = None, score_per_joint: bool = False, max_dets: int = 30)[源代码]¶

Nearby joints NMS implementation. Instances with non-maximum scores will be suppressed if they have too many joints close to those of other instances. This function is modified from the DEKR project (https://github.com/HRNet/DEKR/blob/main/lib/core/nms.py).

Parameters

kpts_db (list[dict]) – keypoints and scores.
dist_thr (float) – threshold for judging whether two joints are close. Defaults to 0.05.
num_nearby_joints_thr (int) – threshold for judging whether two instances are close.
max_dets (int) – max number of detections to keep. Defaults to 30.
score_per_joint (bool) – the input scores (in kpts_db) are per joint scores.

Returns

Indexes to keep.

Return type

np.ndarray

mmpose.evaluation.functional.nms(dets: numpy.ndarray, thr: float) → List[int][源代码]¶

Greedily select boxes with high confidence and overlap <= thr.

Parameters

dets (np.ndarray) – [[x1, y1, x2, y2, score]].
thr (float) – Overlap threshold. Boxes whose overlap with an already selected box is <= thr are retained.

Returns

Indexes to keep.

Return type

list
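
The greedy selection can be sketched as below. This is an illustrative version, not the library code: greedy_nms is a hypothetical name, and box areas are computed as plain width times height, which is an assumption about the coordinate convention.

```python
import numpy as np


def greedy_nms(dets, thr):
    """Hypothetical helper: greedy IoU-based NMS on [x1, y1, x2, y2, score]."""
    x1, y1, x2, y2, scores = dets.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of the selected box with every remaining box.
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Only boxes with overlap <= thr stay in the candidate pool.
        order = rest[iou <= thr]
    return keep


dets = np.array([[0.0, 0.0, 10.0, 10.0, 0.9],
                 [1.0, 1.0, 11.0, 11.0, 0.8],
                 [20.0, 20.0, 30.0, 30.0, 0.7]])
keep = greedy_nms(dets, thr=0.5)
```

The second box overlaps the first with an IoU of about 0.68 and is suppressed, while the third box does not overlap at all, so the indexes 0 and 2 are kept.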

mmpose.evaluation.functional.nms_torch(bboxes: torch.Tensor, scores: torch.Tensor, threshold: float = 0.65, iou_calculator=<function bbox_overlaps>, return_group: bool = False)[源代码]¶

Perform Non-Maximum Suppression (NMS) on a set of bounding boxes using their corresponding scores.

Parameters

bboxes (Tensor) – list of bounding boxes (each containing 4 elements for x1, y1, x2, y2).
scores (Tensor) – scores associated with each bounding box.
threshold (float) – IoU threshold to determine overlap.
iou_calculator (function) – method to calculate IoU.
return_group (bool) – if True, returns groups of overlapping bounding boxes, otherwise returns the main bounding boxes.

mmpose.evaluation.functional.oks_nms(kpts_db: List[dict], thr: float, sigmas: Optional[numpy.ndarray] = None, vis_thr: Optional[float] = None, score_per_joint: bool = False)[源代码]¶

OKS NMS implementations.

Parameters

kpts_db (List[dict]) – The keypoints results of the same image.
thr (float) – The threshold of NMS. Will retain oks overlap < thr.
sigmas (np.ndarray, optional) – Keypoint labelling uncertainty. Please refer to COCO keypoint evaluation for more details. If not given, use the sigmas on COCO dataset. Defaults to None
vis_thr (float, optional) – Threshold of the keypoint visibility. If specified, the OKS is calculated only from keypoints whose visibility is higher than vis_thr. If not given, the OKS is calculated from all keypoints. Defaults to None
score_per_joint (bool) – Whether the input scores (in kpts_db) are per-joint scores. Defaults to False

Returns

Indexes to keep.

Return type

np.ndarray
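
OKS NMS uses Object Keypoint Similarity in place of box IoU as the overlap measure. The sketch below computes the OKS between two poses following the usual COCO-style formula; oks_sketch is a hypothetical helper, and the use of a single area value for normalization is an assumption of this example.

```python
import numpy as np


def oks_sketch(kpts_a, kpts_b, area, sigmas):
    """Hypothetical helper: COCO-style OKS between two poses of shape (K, 2)."""
    d2 = ((kpts_a - kpts_b) ** 2).sum(axis=-1)  # squared distance per keypoint
    variances = (2 * sigmas) ** 2               # per-keypoint tolerance
    e = d2 / variances / (area + np.spacing(1)) / 2
    return float(np.exp(-e).mean())


sigmas = np.array([0.5])
pose_a = np.array([[0.0, 0.0]])
pose_b = np.array([[1.0, 1.0]])
same = oks_sketch(pose_a, pose_a, area=1.0, sigmas=sigmas)
shifted = oks_sketch(pose_a, pose_b, area=1.0, sigmas=sigmas)
```

Identical poses yield an OKS of 1. In the shifted case the squared distance is 2, the variance is 1 and the area is 1, so the OKS is exp(-1). During NMS, a lower-scored pose is retained only if its OKS with every higher-scored kept pose stays below thr.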

mmpose.evaluation.functional.pose_pck_accuracy(output: numpy.ndarray, target: numpy.ndarray, mask: numpy.ndarray, thr: float = 0.05, normalize: Optional[numpy.ndarray] = None) → tuple[源代码]¶

Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from heatmaps.

Note

PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

batch_size: N
num_keypoints: K
heatmap height: H
heatmap width: W

Parameters

output (np.ndarray[N, K, H, W]) – Model output heatmaps.
target (np.ndarray[N, K, H, W]) – Groundtruth heatmaps.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation. Default 0.05.
normalize (np.ndarray[N, 2]) – Normalization factor for H&W.

Returns

A tuple containing keypoint accuracy:

- np.ndarray[K]: Accuracy of each keypoint.
- float: Averaged accuracy across all keypoints.
- int: Number of valid keypoints.

Return type

tuple
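
Before PCK can be computed from heatmaps, each heatmap has to be reduced to a coordinate. A minimal sketch of that decoding step, assuming the keypoint is simply placed at the heatmap maximum, is shown below; heatmap_argmax is a hypothetical helper name.

```python
import numpy as np


def heatmap_argmax(heatmaps):
    """Hypothetical helper: (N, K, H, W) heatmaps -> (N, K, 2) (x, y) peaks."""
    n, k, h, w = heatmaps.shape
    idx = heatmaps.reshape(n, k, -1).argmax(axis=-1)  # flat index of the peak
    xs, ys = idx % w, idx // w                        # column and row
    return np.stack([xs, ys], axis=-1).astype(np.float32)


heatmaps = np.zeros((1, 1, 4, 5), dtype=np.float32)
heatmaps[0, 0, 2, 3] = 1.0  # peak at row y=2, column x=3
coords = heatmap_argmax(heatmaps)
```

The decoded coordinates of the predicted and ground-truth heatmaps can then be compared with the coordinate-based PCK.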

mmpose.evaluation.functional.simcc_pck_accuracy(output: Tuple[numpy.ndarray, numpy.ndarray], target: Tuple[numpy.ndarray, numpy.ndarray], simcc_split_ratio: float, mask: numpy.ndarray, thr: float = 0.05, normalize: Optional[numpy.ndarray] = None) → tuple[源代码]¶

Calculate the pose accuracy of PCK for each individual keypoint and the averaged accuracy across all keypoints from SimCC.

Note

PCK metric measures accuracy of the localization of the body joints. The distances between predicted positions and the ground-truth ones are typically normalized by the bounding box size. The threshold (thr) of the normalized distance is commonly set as 0.05, 0.1 or 0.2 etc.

instance number: N
keypoint number: K

Parameters

output (Tuple[np.ndarray, np.ndarray]) – Model predicted SimCC.
target (Tuple[np.ndarray, np.ndarray]) – Groundtruth SimCC.
mask (np.ndarray[N, K]) – Visibility of the target. False for invisible joints, and True for visible. Invisible joints will be ignored for accuracy calculation.
thr (float) – Threshold of PCK calculation. Default 0.05.
normalize (np.ndarray[N, 2]) – Normalization factor for H&W.

Returns

A tuple containing keypoint accuracy:

- np.ndarray[K]: Accuracy of each keypoint.
- float: Averaged accuracy across all keypoints.
- int: Number of valid keypoints.

Return type

tuple
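
As a hedged illustration of how a SimCC pair can be turned into coordinates before computing PCK, the sketch below takes the argmax of the x and y classification vectors and divides the bin index by simcc_split_ratio. The helper simcc_decode and the assumed shapes ([N, K, Wx] and [N, K, Wy]) are illustrative and not taken from the library.

```python
import numpy as np


def simcc_decode(simcc_x, simcc_y, simcc_split_ratio):
    """Hypothetical helper: SimCC bins -> (N, K, 2) coordinates in pixels."""
    # Each axis is a 1-D classification over bins; pick the most likely bin.
    x = simcc_x.argmax(axis=-1) / simcc_split_ratio
    y = simcc_y.argmax(axis=-1) / simcc_split_ratio
    return np.stack([x, y], axis=-1)


simcc_x = np.zeros((1, 1, 16))
simcc_x[0, 0, 6] = 1.0  # x bin 6
simcc_y = np.zeros((1, 1, 12))
simcc_y[0, 0, 4] = 1.0  # y bin 4
coords = simcc_decode(simcc_x, simcc_y, simcc_split_ratio=2.0)
```

With a split ratio of 2, bins 6 and 4 map to the pixel location (3.0, 2.0).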

mmpose.evaluation.functional.soft_oks_nms(kpts_db: List[dict], thr: float, max_dets: int = 20, sigmas: Optional[numpy.ndarray] = None, vis_thr: Optional[float] = None, score_per_joint: bool = False)[源代码]¶

Soft OKS NMS implementations.

Parameters

kpts_db (List[dict]) – The keypoints results of the same image.
thr (float) – The threshold of NMS. Will retain oks overlap < thr.
max_dets (int) – Maximum number of detections to keep. Defaults to 20
sigmas (np.ndarray, optional) – Keypoint labelling uncertainty. Please refer to COCO keypoint evaluation for more details. If not given, use the sigmas on COCO dataset. Defaults to None
vis_thr (float, optional) – Threshold of the keypoint visibility. If specified, the OKS is calculated only from keypoints whose visibility is higher than vis_thr. If not given, the OKS is calculated from all keypoints. Defaults to None
score_per_joint (bool) – Whether the input scores (in kpts_db) are per-joint scores. Defaults to False

Returns

Indexes to keep.

Return type

np.ndarray

mmpose.evaluation.functional.transform_ann(ann_info: Union[dict, list], num_keypoints: int, mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]])[source]¶

Transforms COCO-format annotations based on the mapping.

mmpose.evaluation.functional.transform_pred(pred_info: Union[dict, list], num_keypoints: int, mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]])[source]¶

Transforms predictions based on the mapping.

mmpose.evaluation.functional.transform_sigmas(sigmas: Union[List, numpy.ndarray], num_keypoints: int, mapping: Union[List[Tuple[int, int]], List[Tuple[Tuple, int]]])[source]¶

Transforms the sigmas based on the mapping.

mmpose.visualization¶

class mmpose.visualization.FastVisualizer(metainfo: Dict, radius: Optional[int] = 6, line_width: Optional[int] = 3, kpt_thr: Optional[float] = 0.3)[源代码]¶

MMPose Fast Visualizer.

A simple yet fast visualizer for video/webcam inference.

Parameters

metainfo (dict) – pose meta information
radius (int, optional) – Keypoint radius for visualization. Defaults to 6.
line_width (int, optional) – Link width for visualization. Defaults to 3.
kpt_thr (float, optional) – Threshold for keypoints’ confidence score, keypoints with score below this value will not be drawn. Defaults to 0.3.

draw_points(img: numpy.ndarray, instances: Union[mmpose.visualization.fast_visualizer.Instances, Dict, numpy.ndarray])[源代码]¶

Draw points on the given image.

This method draws keypoints on the input image using the provided instances.

Parameters

img (numpy.ndarray) – The input image on which to draw the keypoints.
instances (object|dict|np.ndarray) – An object containing keypoints, or a dict containing ‘keypoints’, or a np.ndarray in shape of (Instance_num, Point_num, Point_dim)

Returns

The input image will be modified in place.

Return type

None

draw_pose(img: numpy.ndarray, instances: mmpose.visualization.fast_visualizer.Instances)[源代码]¶

Draw pose estimations on the given image.

This method draws keypoints and skeleton links on the input image using the provided instances.

Parameters

img (numpy.ndarray) – The input image on which to draw the pose estimations.
instances (object) – An object containing detected instances’ information, including keypoints and keypoint_scores.

Returns

The input image will be modified in place.

Return type

None

class mmpose.visualization.Pose3dLocalVisualizer(name: str = 'visualizer', image: Optional[numpy.ndarray] = None, vis_backends: Optional[Dict] = None, save_dir: Optional[str] = None, bbox_color: Optional[Union[str, Tuple[int]]] = 'green', kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = 'red', link_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, text_color: Optional[Union[str, Tuple[int]]] = (255, 255, 255), skeleton: Optional[Union[List, Tuple]] = None, line_width: Union[int, float] = 1, radius: Union[int, float] = 3, show_keypoint_weight: bool = False, backend: str = 'opencv', alpha: float = 0.8, det_kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, det_dataset_skeleton: Optional[Union[str, Tuple[Tuple[int]]]] = None, det_dataset_link_color: Optional[numpy.ndarray] = None)[源代码]¶

MMPose 3d Local Visualizer.

Parameters

name (str) – Name of the instance. Defaults to ‘visualizer’.
image (np.ndarray, optional) – The original image to draw on. The format should be RGB. Defaults to None
vis_backends (list, optional) – Visual backend config list. Defaults to None
save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data. Defaults to None
bbox_color (str, tuple(int), optional) – Color of bbox lines. The tuple of color should be in BGR order. Defaults to 'green'
kpt_color (str, tuple(tuple(int)), optional) – Color of keypoints. The tuple of color should be in BGR order. Defaults to 'red'
link_color (str, tuple(tuple(int)), optional) – Color of skeleton. The tuple of color should be in BGR order. Defaults to None
line_width (int, float) – The width of lines. Defaults to 1
radius (int, float) – The radius of keypoints. Defaults to 3
show_keypoint_weight (bool) – Whether to adjust the transparency of keypoints according to their score. Defaults to False
alpha (int, float) – The transparency of bboxes. Defaults to 0.8
det_kpt_color (str, tuple(tuple(int)), optional) – Keypoints color info for detection. Defaults to None
det_dataset_skeleton (list) – Skeleton info for detection. Defaults to None
det_dataset_link_color (list) – Link color for detection. Defaults to None

add_datasample(name: str, image: numpy.ndarray, data_sample: mmpose.structures.pose_data_sample.PoseDataSample, det_data_sample: Optional[mmpose.structures.pose_data_sample.PoseDataSample] = None, draw_gt: bool = True, draw_pred: bool = True, draw_2d: bool = True, draw_bbox: bool = False, show_kpt_idx: bool = False, skeleton_style: str = 'mmpose', dataset_2d: str = 'coco', dataset_3d: str = 'h36m', convert_keypoint: bool = True, axis_azimuth: float = 70, axis_limit: float = 1.7, axis_dist: float = 10.0, axis_elev: float = 15.0, num_instances: int = - 1, show: bool = False, wait_time: float = 0, out_file: Optional[str] = None, kpt_thr: float = 0.3, step: int = 0) → None[源代码]¶

Draw datasample and save to all backends.

- If GT and prediction are plotted at the same time, they are displayed in a stitched image where the left image is the ground truth and the right image is the prediction.
- If show is True, all storage backends are ignored, and the images will be displayed in a local window.
- If out_file is specified, the drawn image will be saved to out_file. It is usually used when the display is not available.

Parameters

name (str) – The image identifier
image (np.ndarray) – The image to draw
data_sample (PoseDataSample) – The 3d data sample to visualize
det_data_sample (PoseDataSample, optional) – The 2d detection data sample to visualize
draw_gt (bool) – Whether to draw GT PoseDataSample. Defaults to True
draw_pred (bool) – Whether to draw Prediction PoseDataSample. Defaults to True
draw_2d (bool) – Whether to draw 2d detection results. Defaults to True
draw_bbox (bool) – Whether to draw bounding boxes. Defaults to False
show_kpt_idx (bool) – Whether to show the index of keypoints. Defaults to False
skeleton_style (str) – Skeleton style selection. Defaults to 'mmpose'
dataset_2d (str) – Name of 2d keypoint dataset. Defaults to 'coco'
dataset_3d (str) – Name of 3d keypoint dataset. Defaults to 'h36m'
convert_keypoint (bool) – Whether to convert keypoint definition. Defaults to True
axis_azimuth (float) – axis azimuth angle for 3D visualizations.
axis_dist (float) – axis distance for 3D visualizations.
axis_elev (float) – axis elevation view angle for 3D visualizations.
axis_limit (float) –
The axis limit to visualize 3d pose. The xyz range will be set as:
- x: [x_c - axis_limit/2, x_c + axis_limit/2]
- y: [y_c - axis_limit/2, y_c + axis_limit/2]
- z: [0, axis_limit]
Where x_c, y_c are the mean values of the x and y coordinates
num_instances (int) – Number of instances to be shown in 3D. If smaller than 0, all the instances in the pose_result will be shown. Otherwise, pad or truncate the pose_result to a length of num_instances. Defaults to -1
show (bool) – Whether to display the drawn image. Defaults to False
wait_time (float) – The interval of show (s). Defaults to 0
out_file (str) – Path to output file. Defaults to None
kpt_thr (float, optional) – Minimum threshold of keypoints to be shown. Default: 0.3.
step (int) – Global step value to record. Defaults to 0
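
The plotting ranges implied by axis_limit can be worked out with a few lines of plain Python. The helper axis_ranges below is hypothetical and only restates the formulas given for axis_limit; it is not part of the visualizer.

```python
def axis_ranges(x_c, y_c, axis_limit=1.7):
    """Hypothetical helper: xyz plotting ranges derived from axis_limit."""
    half = axis_limit / 2
    return {
        'x': (x_c - half, x_c + half),  # centered on the mean x coordinate
        'y': (y_c - half, y_c + half),  # centered on the mean y coordinate
        'z': (0.0, axis_limit),         # z always starts at 0
    }


ranges = axis_ranges(x_c=0.0, y_c=1.0, axis_limit=2.0)
```

For a pose centered at x_c = 0 and y_c = 1 with axis_limit = 2, the x range is [-1, 1], the y range is [0, 2] and the z range is [0, 2].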

class mmpose.visualization.PoseLocalVisualizer(name: str = 'visualizer', image: Optional[numpy.ndarray] = None, vis_backends: Optional[Dict] = None, save_dir: Optional[str] = None, bbox_color: Optional[Union[str, Tuple[int]]] = 'green', kpt_color: Optional[Union[str, Tuple[Tuple[int]]]] = 'red', link_color: Optional[Union[str, Tuple[Tuple[int]]]] = None, text_color: Optional[Union[str, Tuple[int]]] = (255, 255, 255), skeleton: Optional[Union[List, Tuple]] = None, line_width: Union[int, float] = 1, radius: Union[int, float] = 3, show_keypoint_weight: bool = False, backend: str = 'opencv', alpha: float = 1.0)[源代码]¶

MMPose Local Visualizer.

Parameters

name (str) – Name of the instance. Defaults to ‘visualizer’.
image (np.ndarray, optional) – The original image to draw on. The format should be RGB. Defaults to None
vis_backends (list, optional) – Visual backend config list. Defaults to None
save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data. Defaults to None
bbox_color (str, tuple(int), optional) – Color of bbox lines. The tuple of color should be in BGR order. Defaults to 'green'
kpt_color (str, tuple(tuple(int)), optional) – Color of keypoints. The tuple of color should be in BGR order. Defaults to 'red'
link_color (str, tuple(tuple(int)), optional) – Color of skeleton. The tuple of color should be in BGR order. Defaults to None
line_width (int, float) – The width of lines. Defaults to 1
radius (int, float) – The radius of keypoints. Defaults to 3
show_keypoint_weight (bool) – Whether to adjust the transparency of keypoints according to their score. Defaults to False
alpha (int, float) – The transparency of bboxes. Defaults to 1.0

Examples

>>> import numpy as np
>>> from mmengine.structures import InstanceData
>>> from mmpose.structures import PoseDataSample
>>> from mmpose.visualization import PoseLocalVisualizer

>>> pose_local_visualizer = PoseLocalVisualizer(radius=1)
>>> image = np.random.randint(0, 256,
...                     size=(10, 12, 3)).astype('uint8')
>>> gt_instances = InstanceData()
>>> gt_instances.keypoints = np.array([[[1, 1], [2, 2], [4, 4],
...                                          [8, 8]]])
>>> gt_pose_data_sample = PoseDataSample()
>>> gt_pose_data_sample.gt_instances = gt_instances
>>> dataset_meta = {'skeleton_links': [[0, 1], [1, 2], [2, 3]]}
>>> pose_local_visualizer.set_dataset_meta(dataset_meta)
>>> pose_local_visualizer.add_datasample('image', image,
...                         gt_pose_data_sample)
>>> pose_local_visualizer.add_datasample(
...                       'image', image, gt_pose_data_sample,
...                        out_file='out_file.jpg')
>>> pose_local_visualizer.add_datasample(
...                        'image', image, gt_pose_data_sample,
...                         show=True)
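>>> pred_instances = InstanceData()
>>> pred_instances.keypoints = np.array([[[1, 1], [2, 2], [4, 4],
...                                       [8, 8]]])
>>> # one score per keypoint, shaped (num_instances, num_keypoints)
>>> pred_instances.keypoint_scores = np.array([[0.8, 1, 0.9, 1]])
>>> # add_datasample takes a single data sample, so GT and prediction
>>> # are stored on the same PoseDataSample
>>> pose_data_sample = PoseDataSample()
>>> pose_data_sample.gt_instances = gt_instances
>>> pose_data_sample.pred_instances = pred_instances
>>> pose_local_visualizer.add_datasample('image', image,
...                         pose_data_sample)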

add_datasample(name: str, image: numpy.ndarray, data_sample: mmpose.structures.pose_data_sample.PoseDataSample, draw_gt: bool = True, draw_pred: bool = True, draw_heatmap: bool = False, draw_bbox: bool = False, show_kpt_idx: bool = False, skeleton_style: str = 'mmpose', show: bool = False, wait_time: float = 0, out_file: Optional[str] = None, kpt_thr: float = 0.3, step: int = 0) → None[source]¶

Draw datasample and save to all backends.

If GT and prediction are plotted at the same time, they are displayed in a stitched image where the left image is the ground truth and the right image is the prediction.

- If show is True, all storage backends are ignored, and the images will be displayed in a local window.
- If out_file is specified, the drawn image will be saved to out_file. It is usually used when the display is not available.

Parameters

name (str) – The image identifier
image (np.ndarray) – The image to draw
data_sample (PoseDataSample, optional) – The data sample to visualize
draw_gt (bool) – Whether to draw GT PoseDataSample. Defaults to True
draw_pred (bool) – Whether to draw Prediction PoseDataSample. Defaults to True
draw_bbox (bool) – Whether to draw bounding boxes. Defaults to False
draw_heatmap (bool) – Whether to draw heatmaps. Defaults to False
show_kpt_idx (bool) – Whether to show the index of keypoints. Defaults to False
skeleton_style (str) – Skeleton style selection. Defaults to 'mmpose'
show (bool) – Whether to display the drawn image. Defaults to False
wait_time (float) – The interval of show (s). Defaults to 0
out_file (str) – Path to output file. Defaults to None
kpt_thr (float, optional) – Minimum threshold of keypoints to be shown. Defaults to 0.3
step (int) – Global step value to record. Defaults to 0
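
The stitched layout described above (ground truth on the left, prediction on the right) can be illustrated with plain NumPy. This is only a sketch of the idea, not the visualizer's own code: the helper name stitch_side_by_side is hypothetical, and it assumes both drawn images have the same height.

```python
import numpy as np


def stitch_side_by_side(gt_img: np.ndarray, pred_img: np.ndarray) -> np.ndarray:
    """Return one image with gt_img on the left and pred_img on the right.

    Hypothetical helper for illustration; assumes (H, W, 3) arrays of equal height.
    """
    if gt_img.shape[0] != pred_img.shape[0]:
        raise ValueError('images must have the same height to be stitched')
    # Concatenate along the width axis.
    return np.concatenate((gt_img, pred_img), axis=1)


gt_img = np.zeros((10, 12, 3), dtype=np.uint8)        # stand-in for the drawn GT
pred_img = np.full((10, 12, 3), 255, dtype=np.uint8)  # stand-in for the drawn prediction
stitched = stitch_side_by_side(gt_img, pred_img)
print(stitched.shape)
```

The result keeps the original height and has the combined width of both inputs.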

set_dataset_meta(dataset_meta: Dict, skeleton_style: str = 'mmpose')[source]¶

Assign dataset_meta to the visualizer. The default visualization settings will be overridden.

Parameters: dataset_meta (dict) – Meta information of the dataset.

mmpose.engine¶

hooks¶

class mmpose.engine.hooks.BadCaseAnalysisHook(enable: bool = False, show: bool = False, wait_time: float = 0.0, interval: int = 50, kpt_thr: float = 0.3, out_dir: Optional[str] = None, backend_args: Optional[dict] = None, metric_type: str = 'loss', metric: mmengine.config.config.ConfigDict = {'type': 'KeypointMSELoss'}, metric_key: str = 'PCK', badcase_thr: float = 5)[source]¶

Bad Case Analysis Hook. Used to visualize prediction results during validation and testing.

In the testing phase:

- If show is True, only the prediction results are visualized and no data is stored, so vis_backends needs to be excluded.
- If out_dir is specified, the prediction results are saved to out_dir. To prevent vis_backends from also storing data, vis_backends needs to be excluded.
- vis_backends takes effect if the user specifies neither show nor out_dir. You can set vis_backends to WandbVisBackend or TensorboardVisBackend to store the prediction results in Wandb or Tensorboard.

Parameters

enable (bool) – Whether to draw prediction results. If it is False, no drawing will be done. Defaults to False.
show (bool) – Whether to display the drawn image. Defaults to False.
wait_time (float) – The interval of show (s). Defaults to 0.
interval (int) – The interval of visualization. Defaults to 50.
kpt_thr (float) – The threshold to visualize the keypoints. Defaults to 0.3.
out_dir (str, optional) – Directory where painted images will be saved in the testing process.
backend_args (dict, optional) – Arguments to instantiate the backend corresponding to the prefix of the URI. Defaults to None.
metric_type (str) – The metric type used to decide a bad case: 'loss' or 'accuracy'.
metric (ConfigDict) – The config of the metric.
metric_key (str) – Key of the needed metric value in the dict returned by the metric class.
badcase_thr (float) – Minimum loss or maximum accuracy for a bad case.

after_test_epoch(runner, metrics: Optional[Dict[str, float]] = None) → None[source]¶

All subclasses should override this method, if they need any operations after each test epoch.

Parameters

runner (Runner) – The runner of the testing process.
metrics (Dict[str, float], optional) – Evaluation results of all metrics on test dataset. The keys are the names of the metrics, and the values are corresponding results.

after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmpose.structures.pose_data_sample.PoseDataSample]) → None[source]¶

Run after every testing iteration.

Parameters

runner (Runner) – The runner of the testing process.
batch_idx (int) – The index of the current batch in the test loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[PoseDataSample]) – Outputs from model.

check_badcase(data_batch, data_sample)[source]¶

Check whether the sample is a badcase.

Parameters

data_batch (Sequence[dict]) – A batch of data from the dataloader.
data_sample (Sequence[dict]) – A batch of outputs from the model.

Returns

is_badcase (bool) – Whether the sample is a bad case.
metric_value (float) – The value of the metric for the sample.
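
The relationship between metric_type and badcase_thr ("min loss or max accuracy for a badcase") can be sketched as a small comparison. The function below is a hypothetical illustration, not the hook's implementation; in particular, treating the threshold as inclusive is an assumption.

```python
def is_badcase(metric_value: float, metric_type: str, badcase_thr: float) -> bool:
    """Decide whether a sample counts as a bad case (hypothetical helper).

    Assumption: the threshold is inclusive on both sides.
    """
    if metric_type == 'loss':
        # A loss at or above the threshold marks a bad case.
        return metric_value >= badcase_thr
    if metric_type == 'accuracy':
        # An accuracy at or below the threshold marks a bad case.
        return metric_value <= badcase_thr
    raise ValueError(f'unsupported metric_type: {metric_type!r}')


print(is_badcase(6.0, 'loss', 5.0))
print(is_badcase(0.9, 'accuracy', 0.5))
```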

class mmpose.engine.hooks.ExpMomentumEMA(model: torch.nn.modules.module.Module, momentum: float = 0.0002, gamma: int = 2000, interval=1, device: Optional[torch.device] = None, update_buffers: bool = False)[source]¶

Exponential moving average (EMA) with exponential momentum strategy, which is used in YOLOX.

Ported from the implementation of MMDetection (https://github.com/open-mmlab/mmdetection/blob/3.x/mmdet/models/layers/ema.py).

Parameters

model (nn.Module) – The model to be averaged.
momentum (float) – The momentum used for updating the EMA parameters. The EMA parameters are updated with the formula averaged_param = (1 - momentum) * averaged_param + momentum * source_param. Defaults to 0.0002.
gamma (int) – Use a larger momentum early in training and gradually anneal it to a smaller value so that the EMA model is updated smoothly. The momentum is calculated as (1 - momentum) * exp(-(1 + steps) / gamma) + momentum. Defaults to 2000.
interval (int) – Interval between two updates. Defaults to 1.
device (torch.device, optional) – If provided, the averaged model will be stored on the device. Defaults to None.
update_buffers (bool) – If True, it will compute running averages for both the parameters and the buffers of the model. Defaults to False.

avg_func(averaged_param: torch.Tensor, source_param: torch.Tensor, steps: int) → None[source]¶

Compute the moving average of the parameters using the exponential momentum strategy.

Parameters

averaged_param (Tensor) – The averaged parameters.
source_param (Tensor) – The source parameters.
steps (int) – The number of times the parameters have been updated.
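
The two formulas documented for this class can be worked through on scalars. The sketch below uses only the standard library and plain floats instead of tensors; the function names exp_momentum and ema_update are hypothetical, and the real method updates tensors in place rather than returning a value.

```python
import math


def exp_momentum(steps: int, momentum: float = 0.0002, gamma: int = 2000) -> float:
    """Momentum schedule: close to 1 early on, decaying towards `momentum`."""
    return (1 - momentum) * math.exp(-(1 + steps) / gamma) + momentum


def ema_update(averaged_param: float, source_param: float, steps: int,
               momentum: float = 0.0002, gamma: int = 2000) -> float:
    """One EMA step on a scalar, using the step-dependent momentum."""
    m = exp_momentum(steps, momentum, gamma)
    return (1 - m) * averaged_param + m * source_param


# Early steps follow the source parameter closely; late steps barely move.
for steps in (0, 2000, 20000):
    print(steps, round(exp_momentum(steps), 6))
```

With the default values, the effective momentum starts just below 1 and approaches 0.0002 as steps grows, so the averaged model tracks the source model at first and then changes slowly.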

class mmpose.engine.hooks.PoseVisualizationHook(enable: bool = False, interval: int = 50, kpt_thr: float = 0.3, show: bool = False, wait_time: float = 0.0, out_dir: Optional[str] = None, backend_args: Optional[dict] = None)[source]¶

Pose Estimation Visualization Hook. Used to visualize prediction results during validation and testing.

In the testing phase:

- If show is True, only the prediction results are visualized and no data is stored, so vis_backends needs to be excluded.
- If out_dir is specified, the prediction results are saved to out_dir. To prevent vis_backends from also storing data, vis_backends needs to be excluded.
- vis_backends takes effect if the user specifies neither show nor out_dir. You can set vis_backends to WandbVisBackend or TensorboardVisBackend to store the prediction results in Wandb or Tensorboard.

Parameters

enable (bool) – Whether to draw prediction results. If it is False, no drawing will be done. Defaults to False.
interval (int) – The interval of visualization. Defaults to 50.
kpt_thr (float) – The threshold to visualize the keypoints. Defaults to 0.3.
show (bool) – Whether to display the drawn image. Defaults to False.
wait_time (float) – The interval of show (s). Defaults to 0.
out_dir (str, optional) – Directory where painted images will be saved in the testing process.
backend_args (dict, optional) – Arguments to instantiate the backend corresponding to the prefix of the URI. Defaults to None.
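
As a rough illustration of how these arguments fit together, the fragment below enables the hook in a Python config file. It is a sketch under assumptions: registering the hook under the visualization key of default_hooks follows the usual MMEngine config layout, and the out_dir value is a hypothetical path.

```python
# Config fragment (Python-style config); only documented arguments are used.
default_hooks = dict(
    visualization=dict(
        type='PoseVisualizationHook',
        enable=True,            # drawing is off by default
        interval=50,            # draw every 50 iterations
        kpt_thr=0.3,            # hide keypoints scored below 0.3
        out_dir='vis_results',  # hypothetical output directory
    ))
```

Because out_dir is set here, the drawn images are written to disk in the testing phase and vis_backends should not also store them.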

after_test_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmpose.structures.pose_data_sample.PoseDataSample]) → None[source]¶

Run after every testing iteration.

Parameters

runner (Runner) – The runner of the testing process.
batch_idx (int) – The index of the current batch in the test loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[PoseDataSample]) – Outputs from model.

after_val_iter(runner: mmengine.runner.runner.Runner, batch_idx: int, data_batch: dict, outputs: Sequence[mmpose.structures.pose_data_sample.PoseDataSample]) → None[source]¶

Run after every self.interval validation iterations.

Parameters

runner (Runner) – The runner of the validation process.
batch_idx (int) – The index of the current batch in the val loop.
data_batch (dict) – Data from dataloader.
outputs (Sequence[PoseDataSample]) – Outputs from model.

class mmpose.engine.hooks.RTMOModeSwitchHook(epoch_attributes: Dict[int, Dict])[source]¶

A hook to switch the mode of RTMO during training.

This hook allows for dynamic adjustments of model attributes at specified training epochs. It is designed to modify configurations such as turning off specific augmentations or changing loss functions at different stages of the training process.

Parameters

epoch_attributes (Dict[int, Dict]) – A dictionary where keys are epoch numbers and values are attribute modification dictionaries. Each dictionary specifies the attribute to modify and its new value.

Example

epoch_attributes = {
    5: [{"attr1.subattr": new_value1}, {"attr2.subattr": new_value2}],
    10: [{"attr3.subattr": new_value3}]
}

before_train_epoch(runner: mmengine.runner.runner.Runner)[source]¶

Method called before each training epoch.

It checks if the current epoch is in the epoch_attributes mapping and applies the corresponding attribute changes to the model.
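
The epoch lookup and dotted-attribute update described above can be sketched with the standard library alone. Everything here is hypothetical illustration rather than the hook's code: the helper names, the stand-in model built from SimpleNamespace, and the assumption that each epoch maps to a flat dict of dotted attribute paths (following the Dict[int, Dict] annotation in the signature).

```python
from types import SimpleNamespace


def set_nested_attr(obj, dotted_name: str, value) -> None:
    """Set an attribute addressed by a dotted path such as 'head.loss_weight'."""
    *parents, leaf = dotted_name.split('.')
    for name in parents:
        obj = getattr(obj, name)
    setattr(obj, leaf, value)


def apply_epoch_attributes(model, epoch_attributes: dict, epoch: int) -> None:
    """Apply the modifications registered for `epoch`, if there are any."""
    for dotted_name, value in epoch_attributes.get(epoch, {}).items():
        set_nested_attr(model, dotted_name, value)


# Stand-in for a model with a nested head module (hypothetical attributes).
model = SimpleNamespace(head=SimpleNamespace(loss_weight=1.0, use_aux_loss=False))
epoch_attributes = {
    5: {'head.use_aux_loss': True},
    10: {'head.loss_weight': 0.5},
}

for epoch in range(12):
    apply_epoch_attributes(model, epoch_attributes, epoch)

print(model.head.use_aux_loss, model.head.loss_weight)
```

Epochs that are not keys of epoch_attributes leave the model untouched, which matches the "checks if the current epoch is in the mapping" behavior.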

class mmpose.engine.hooks.SyncNormHook[source]¶

Synchronize Norm states before validation.

before_val_epoch(runner)[source]¶: Synchronize normalization statistics.

class mmpose.engine.hooks.YOLOXPoseModeSwitchHook(num_last_epochs: int = 20, new_train_dataset: Optional[dict] = None, new_train_pipeline: Optional[Sequence[dict]] = None)[source]¶

Switch the mode of YOLOX-Pose during training.

This hook: 1) Turns off mosaic and mixup data augmentation. 2) Uses instance mask to assist positive anchor selection. 3) Uses auxiliary L1 loss in the head.

Parameters

num_last_epochs (int) – The number of last epochs at the end of training to close the data augmentation and switch to L1 loss. Defaults to 20.
new_train_dataset (dict) – New training dataset configuration that will be used in place of the original training dataset. Defaults to None.
new_train_pipeline (Sequence[dict]) – New data augmentation pipeline configuration that will be used in place of the original pipeline during training. Defaults to None.

before_train_epoch(runner: mmengine.runner.runner.Runner)[source]¶: Close mosaic and mixup augmentation, switch to use L1 loss.
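
A config fragment along the following lines would register the hook. It is a sketch under assumptions: listing the hook in custom_hooks follows the usual MMEngine config layout, max_epochs is a hypothetical training length, and the one-step pipeline is only a placeholder for a real augmentation-free pipeline.

```python
max_epochs = 300      # hypothetical total number of training epochs
num_last_epochs = 20  # documented default

custom_hooks = [
    dict(
        type='YOLOXPoseModeSwitchHook',
        num_last_epochs=num_last_epochs,
        # Placeholder pipeline without mosaic/mixup; replace with a real one.
        new_train_pipeline=[dict(type='LoadImage')],
    ),
]

# Number of epochs trained with the original pipeline before the final stage.
epochs_before_switch = max_epochs - num_last_epochs
print(epochs_before_switch)
```

With these values, the last 20 of the 300 epochs run with the replacement pipeline and the L1 loss enabled.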