mmpose.apis

mmpose.codecs

class mmpose.codecs.AssociativeEmbedding(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: Optional[float] = None, use_udp: bool = False, decode_keypoint_order: List[int] = [], decode_nms_kernel: int = 5, decode_gaussian_kernel: int = 3, decode_keypoint_thr: float = 0.1, decode_tag_thr: float = 1.0, decode_topk: int = 20, decode_max_instances: Optional[int] = None)[source]

Encode/decode keypoints with the method introduced in “Associative Embedding”. This is an asymmetric codec, where the keypoints are represented as Gaussian heatmaps and position indices during encoding, and restored from predicted heatmaps and group tags.

See the paper Associative Embedding: End-to-End Learning for Joint Detection and Grouping by Newell et al. (2017) for details.

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • embedding tag dimension: L

  • image size: [w, h]

  • heatmap size: [W, H]

Encoded:

  • heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size

  • keypoint_indices (np.ndarray): The keypoint position indices in shape (N, K, 2). Each keypoint’s index is [i, v], where i is the position index in the heatmap (\(i=y*W+x\)) and v is the visibility

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters
  • input_size (tuple) – Image size in [w, h]

  • heatmap_size (tuple) – Heatmap size in [W, H]

  • sigma (float) – The sigma value of the Gaussian heatmap

  • use_udp (bool) – Whether to use unbiased data processing. See UDP (CVPR 2020) for details. Defaults to False

  • decode_keypoint_order (List[int]) – The grouping order of the keypoint indices. The grouping usually starts from keypoints around the head and torso, and gradually moves out to the limbs

  • decode_keypoint_thr (float) – The threshold of keypoint response value in heatmaps. Defaults to 0.1

  • decode_tag_thr (float) – The maximum allowed tag distance when matching a keypoint to a group. A keypoint with a larger tag distance to all existing groups will initialize a new group. Defaults to 1.0

  • decode_nms_kernel (int) – The kernel size of the NMS during decoding, which should be an odd integer. Defaults to 5

  • decode_gaussian_kernel (int) – The kernel size of the Gaussian blur during decoding, which should be an odd integer. It is only used when self.use_udp==True. Defaults to 3

  • decode_topk (int) – The number of top-k candidates of each keypoint that will be retrieved from the heatmaps during decoding. Defaults to 20

  • decode_max_instances (int, optional) – The maximum number of instances to decode. None means no limit on the instance number. Defaults to None

References:
  • Associative Embedding: End-to-End Learning for Joint Detection and Grouping: https://arxiv.org/abs/1611.05424
  • UDP (CVPR 2020): https://arxiv.org/abs/1911.07524
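The heatmap-and-index encoding described above can be illustrated with a small NumPy sketch. The sizes, sigma, and function name here are made up for illustration; this is not the codec's actual implementation:

```python
import numpy as np

# Illustrative sizes: a heatmap of size [W, H] = [64, 48], sigma = 2.0
W, H = 64, 48
sigma = 2.0

def encode_keypoint(x, y):
    """Place a Gaussian at (x, y) on an (H, W) heatmap and return the
    flattened position index i = y * W + x used during encoding."""
    xs = np.arange(W)[None, :]
    ys = np.arange(H)[:, None]
    heatmap = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    index = y * W + x
    return heatmap, index

heatmap, index = encode_keypoint(10, 5)
# The Gaussian peaks at the keypoint, so a row-major argmax recovers the index
assert heatmap.argmax() == index == 5 * W + 10
```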

batch_decode(batch_heatmaps: torch.Tensor, batch_tags: torch.Tensor) → Tuple[List[numpy.ndarray], List[numpy.ndarray]][source]

Decode the keypoint coordinates from a batch of heatmaps and tagging heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters
  • batch_heatmaps (Tensor) – Keypoint detection heatmaps in shape (B, K, H, W)

  • batch_tags (Tensor) – Tagging heatmaps in shape (B, C, H, W), where \(C=L*K\)

Returns
  • batch_keypoints (List[np.ndarray]): Decoded keypoint coordinates of the batch, each in shape (N, K, D)

  • batch_scores (List[np.ndarray]): Decoded keypoint scores of the batch, each in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type
  tuple
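The role of decode_tag_thr in grouping can be sketched with 1-D tags and a greedy assignment (a simplification: real embedding tags are L-dimensional and candidates are visited in decode_keypoint_order; all values here are illustrative):

```python
import numpy as np

decode_tag_thr = 1.0  # maximum tag distance for joining an existing group

def group_by_tags(tags):
    """Greedily assign keypoint candidates to instance groups: a candidate
    joins the group with the nearest mean tag if that distance is within
    decode_tag_thr, otherwise it starts a new group."""
    groups = []
    for tag in tags:
        dists = [abs(tag - np.mean(g)) for g in groups]
        if dists and min(dists) <= decode_tag_thr:
            groups[int(np.argmin(dists))].append(tag)
        else:
            groups.append([tag])
    return groups

# Two well-separated tag clusters decode into two instances
assert len(group_by_tags([0.1, 0.2, 5.0, 5.1, 0.15])) == 2
```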

decode(encoded: Any) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode keypoints.

Parameters
  encoded (any) – Encoded keypoint representation using the codec

Returns
  • keypoints (np.ndarray): Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray): Keypoint visibility in shape (N, K)

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]

Encode keypoints into heatmaps and position indices. Note that the original keypoint coordinates should be in the input image space.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns
  • heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size

  • keypoint_indices (np.ndarray): The keypoint position indices in shape (N, K, 2). Each keypoint’s index is [i, v], where i is the position index in the heatmap (\(i=y*W+x\)) and v is the visibility

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Return type
  tuple

class mmpose.codecs.DecoupledHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], root_type: str = 'kpt_center', heatmap_min_overlap: float = 0.7, encode_max_instances: int = 30)[source]

Encode/decode keypoints with the method introduced in the paper CID.

See the paper Contextual Instance Decoupling for Robust Multi-Person Pose Estimation by Wang et al. (2022) for details.

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • image size: [w, h]

  • heatmap size: [W, H]

Encoded:
  • heatmaps (np.ndarray): The coupled heatmap in shape (1+K, H, W), where [W, H] is the heatmap_size.

  • instance_heatmaps (np.ndarray): The decoupled heatmap in shape (M*K, H, W), where M is the number of instances.

  • keypoint_weights (np.ndarray): The weight for heatmaps in shape (M*K).

  • instance_coords (np.ndarray): The coordinates of instance roots in shape (M, 2)

Parameters
  • input_size (tuple) – Image size in [w, h]

  • heatmap_size (tuple) – Heatmap size in [W, H]

  • root_type (str) – The method to generate the instance root. Options are:
    • 'kpt_center': Average coordinate of all visible keypoints.
    • 'bbox_center': Center point of the bounding box outlined by all visible keypoints.
    Defaults to 'kpt_center'

  • heatmap_min_overlap (float) – Minimum overlap rate among instances. Used when calculating sigmas for instances. Defaults to 0.7

  • background_weight (float) – Loss weight of background pixels. Defaults to 0.1

  • encode_max_instances (int) – The maximum number of instances to encode for each sample. Defaults to 30
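The two root_type options can be sketched in NumPy (an illustrative helper, not the codec's actual code):

```python
import numpy as np

def instance_root(keypoints, visible, root_type='kpt_center'):
    """Compute an instance root from the visible keypoints, following the
    two options described above."""
    vis = keypoints[visible > 0]
    if root_type == 'kpt_center':        # mean of visible keypoints
        return vis.mean(axis=0)
    if root_type == 'bbox_center':       # center of their tight bbox
        return (vis.min(axis=0) + vis.max(axis=0)) / 2
    raise ValueError(f'unknown root_type: {root_type}')

kpts = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 2.0]])
vis = np.array([1, 1, 1])
assert np.allclose(instance_root(kpts, vis, 'kpt_center'), [4 / 3, 2 / 3])
assert np.allclose(instance_root(kpts, vis, 'bbox_center'), [2.0, 1.0])
```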

decode(instance_heatmaps: numpy.ndarray, instance_scores: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode keypoint coordinates from decoupled heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters
  • instance_heatmaps (np.ndarray) – Heatmaps in shape (N, K, H, W)

  • instance_scores (np.ndarray) – Confidence of instance root predictions in shape (N, 1)

Returns
  • keypoints (np.ndarray): Decoded keypoint coordinates in shape (N, K, D)

  • scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None, bbox: Optional[numpy.ndarray] = None) → dict[source]

Encode keypoints into heatmaps.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

  • bbox (np.ndarray) – Bounding box in shape (N, 8), which includes the coordinates of 4 corners.

Returns
  • heatmaps (np.ndarray): The coupled heatmap in shape (1+K, H, W), where [W, H] is the heatmap_size.

  • instance_heatmaps (np.ndarray): The decoupled heatmap in shape (N*K, H, W), where N is the number of instances.

  • keypoint_weights (np.ndarray): The weight for heatmaps in shape (N*K).

  • instance_coords (np.ndarray): The coordinates of instance roots in shape (N, 2)

Return type
  dict

class mmpose.codecs.IntegralRegressionLabel(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: float, unbiased: bool = False, blur_kernel_size: int = 11, normalize: bool = True)[source]

Generate keypoint coordinates and normalized heatmaps. See the paper DSNT by Nibali et al. (2018).

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • image size: [w, h]

Encoded:

  • keypoint_labels (np.ndarray): The normalized regression labels in shape (N, K, D), where D is 2 for 2d coordinates

  • heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters
  • input_size (tuple) – Input image size in [w, h]

  • heatmap_size (tuple) – Heatmap size in [W, H]

  • sigma (float) – The sigma value of the Gaussian heatmap

  • unbiased (bool) – Whether to use the unbiased method (DarkPose) in 'msra' encoding. See Dark Pose for details. Defaults to False

  • blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. The kernel size and sigma should follow the empirical formula \(sigma = 0.3*((ks-1)*0.5-1)+0.8\). Defaults to 11

  • normalize (bool) – Whether to normalize the heatmaps. Defaults to True.
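The integral ("soft-argmax") decoding idea behind DSNT — taking the expected coordinate under a normalized heatmap instead of a hard argmax — can be sketched as follows (sizes are illustrative; this is not the library's implementation):

```python
import numpy as np

def soft_argmax(heatmap):
    """Expected (x, y) coordinate under the normalized heatmap."""
    p = heatmap / heatmap.sum()          # normalize to a distribution
    H, W = p.shape
    xs = np.arange(W)[None, :]
    ys = np.arange(H)[:, None]
    return np.array([(p * xs).sum(), (p * ys).sum()])

# A symmetric Gaussian centered at (20, 20) decodes back to (20, 20)
xs = np.arange(64)[None, :]
ys = np.arange(48)[:, None]
hm = np.exp(-((xs - 20) ** 2 + (ys - 20) ** 2) / (2 * 2.0 ** 2))
assert np.allclose(soft_argmax(hm), [20.0, 20.0], atol=1e-6)
```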

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode keypoint coordinates from normalized space to input image space.

Parameters
  encoded (np.ndarray) – Coordinates in shape (N, K, D)

Returns
  • keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)

  • scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[source]

Encode keypoints into regression labels and heatmaps.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns
  • keypoint_labels (np.ndarray): The normalized regression labels in shape (N, K, D), where D is 2 for 2d coordinates

  • heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Return type
  dict

class mmpose.codecs.MSRAHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: float, unbiased: bool = False, blur_kernel_size: int = 11)[source]

Represent keypoints as heatmaps via the “MSRA” approach. See the paper Simple Baselines for Human Pose Estimation and Tracking by Xiao et al. (2018) for details.

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • image size: [w, h]

  • heatmap size: [W, H]

Encoded:

  • heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters
  • input_size (tuple) – Image size in [w, h]

  • heatmap_size (tuple) – Heatmap size in [W, H]

  • sigma (float) – The sigma value of the Gaussian heatmap

  • unbiased (bool) – Whether to use the unbiased method (DarkPose) in 'msra' encoding. See Dark Pose for details. Defaults to False

  • blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. The kernel size and sigma should follow the empirical formula \(sigma = 0.3*((ks-1)*0.5-1)+0.8\). Defaults to 11
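As a worked example of the empirical formula above, the default blur_kernel_size=11 corresponds to the commonly used heatmap sigma of 2.0:

```python
def sigma_for_kernel(ks):
    """Empirical sigma for a DarkPose blur kernel of size ks."""
    return 0.3 * ((ks - 1) * 0.5 - 1) + 0.8

# ks=11: 0.3 * ((11 - 1) * 0.5 - 1) + 0.8 = 0.3 * 4 + 0.8 = 2.0
assert abs(sigma_for_kernel(11) - 2.0) < 1e-9
```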

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters
  encoded (np.ndarray) – Heatmaps in shape (K, H, W)

Returns
  • keypoints (np.ndarray): Decoded keypoint coordinates in shape (N, K, D)

  • scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[source]

Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns
  • heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Return type
  dict

class mmpose.codecs.MegviiHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], kernel_size: int)[source]

Represent keypoints as heatmaps via the “Megvii” approach. See MSPN (2019) and CPN (2018) for details.

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • image size: [w, h]

  • heatmap size: [W, H]

Encoded:

  • heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters
  • input_size (tuple) – Image size in [w, h]

  • heatmap_size (tuple) – Heatmap size in [W, H]

  • kernel_size (int) – The kernel size of the heatmap Gaussian

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters
  encoded (np.ndarray) – Heatmaps in shape (K, H, W)

Returns
  • keypoints (np.ndarray): Decoded keypoint coordinates in shape (K, D)

  • scores (np.ndarray): The keypoint scores in shape (K,). It usually represents the confidence of the keypoint prediction

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[source]

Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns
  • heatmaps (np.ndarray): The generated heatmap in shape (K, H, W), where [W, H] is the heatmap_size

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Return type
  dict

class mmpose.codecs.RegressionLabel(input_size: Tuple[int, int])[source]

Generate keypoint coordinates.

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • image size: [w, h]

Encoded:

  • keypoint_labels (np.ndarray): The normalized regression labels in shape (N, K, D), where D is 2 for 2d coordinates

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters
  input_size (tuple) – Input image size in [w, h]
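The encode/decode round trip of this codec amounts to normalizing coordinates by the input size and scaling them back. A minimal sketch, with an illustrative input size (not the codec's actual implementation):

```python
import numpy as np

input_size = np.array([192.0, 256.0])   # illustrative [w, h]

def encode(keypoints):
    """Normalize image-space coordinates to the [0, 1] range."""
    return keypoints / input_size

def decode(labels):
    """Map normalized labels back to the input image space."""
    return labels * input_size

kpts = np.array([[[96.0, 128.0], [48.0, 64.0]]])   # (N=1, K=2, D=2)
assert np.allclose(encode(kpts)[0, 0], [0.5, 0.5])
assert np.allclose(decode(encode(kpts)), kpts)
```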

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode keypoint coordinates from normalized space to input image space.

Parameters
  encoded (np.ndarray) – Coordinates in shape (N, K, D)

Returns
  • keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)

  • scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[source]

Encode keypoints from input image space to normalized space.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns
  • keypoint_labels (np.ndarray): The normalized regression labels in shape (N, K, D), where D is 2 for 2d coordinates

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Return type
  dict

class mmpose.codecs.SPR(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], sigma: Optional[Union[float, Tuple[float]]] = None, generate_keypoint_heatmaps: bool = False, root_type: str = 'kpt_center', minimal_diagonal_length: Union[int, float] = 5, background_weight: float = 0.1, decode_nms_kernel: int = 5, decode_max_instances: int = 30, decode_thr: float = 0.01)[source]

Encode/decode keypoints with Structured Pose Representation (SPR).

See the paper Single-stage multi-person pose machines by Nie et al. (2019) for details.

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • image size: [w, h]

  • heatmap size: [W, H]

Encoded:

  • heatmaps (np.ndarray): The generated heatmap in shape (1, H, W), where [W, H] is the heatmap_size. If the keypoint heatmaps are generated together, the output heatmap shape is (K+1, H, W)

  • heatmap_weights (np.ndarray): The target weights for heatmaps, which have the same shape as heatmaps.

  • displacements (np.ndarray): The dense keypoint displacements in shape (K*2, H, W).

  • displacement_weights (np.ndarray): The target weights for displacements, which have the same shape as displacements.

Parameters
  • input_size (tuple) – Image size in [w, h]

  • heatmap_size (tuple) – Heatmap size in [W, H]

  • sigma (float or tuple, optional) – The sigma values of the Gaussian heatmaps. If sigma is a tuple, it includes the sigmas for both the root and keypoint heatmaps. None means the sigmas are computed automatically from the heatmap size. Defaults to None

  • generate_keypoint_heatmaps (bool) – Whether to generate Gaussian heatmaps for each keypoint. Defaults to False

  • root_type (str) – The method to generate the instance root. Options are:
    • 'kpt_center': Average coordinate of all visible keypoints.
    • 'bbox_center': Center point of the bounding box outlined by all visible keypoints.
    Defaults to 'kpt_center'

  • minimal_diagonal_length (int or float) – The threshold of the diagonal length of the instance bounding box. Smaller instances will not be used in training. Defaults to 5

  • background_weight (float) – Loss weight of background pixels. Defaults to 0.1

  • decode_thr (float) – The threshold of keypoint response value in heatmaps. Defaults to 0.01

  • decode_nms_kernel (int) – The kernel size of the NMS during decoding, which should be an odd integer. Defaults to 5

  • decode_max_instances (int) – The maximum number of instances to decode. Defaults to 30
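The core decoding idea — recovering each keypoint as the instance root plus the displacement vector read off at the root pixel — can be sketched as below. The (dx, dy)-per-keypoint channel layout here is an assumption for illustration, not necessarily the codec's actual layout:

```python
import numpy as np

K, H, W = 2, 8, 8
displacements = np.zeros((K * 2, H, W))   # dense (K*2, H, W) field

root_y, root_x = 3, 4                     # a detected root pixel
# Assumed layout: channels [dx0, dy0, dx1, dy1] at each pixel
displacements[:, root_y, root_x] = [1.0, -1.0, 2.0, 0.5]

vecs = displacements[:, root_y, root_x].reshape(K, 2)
keypoints = np.array([root_x, root_y]) + vecs   # root (x, y) + (dx, dy)
assert np.allclose(keypoints, [[5.0, 2.0], [6.0, 3.5]])
```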

decode(heatmaps: torch.Tensor, displacements: torch.Tensor) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode the keypoint coordinates from heatmaps and displacements. The decoded keypoint coordinates are in the input image space.

Parameters
  • heatmaps (Tensor) – Encoded root (and optionally keypoint) heatmaps in shape (1, H, W) or (K+1, H, W)

  • displacements (Tensor) – Encoded keypoint displacement fields in shape (K*D, H, W)

Returns
  • keypoints (Tensor): Decoded keypoint coordinates in shape (N, K, D)

  • scores (tuple):
    • root_scores (Tensor): The root scores in shape (N, )
    • keypoint_scores (Tensor): The keypoint scores in shape (N, K). If keypoint heatmaps are not generated, keypoint_scores will be None

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[source]

Encode keypoints into root heatmaps and keypoint displacement fields. Note that the original keypoint coordinates should be in the input image space.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns
  • heatmaps (np.ndarray): The generated heatmap in shape (1, H, W), where [W, H] is the heatmap_size. If keypoint heatmaps are generated together, the shape is (K+1, H, W)

  • heatmap_weights (np.ndarray): The pixel-wise weight for heatmaps, which has the same shape as heatmaps

  • displacements (np.ndarray): The generated displacement fields in shape (K*D, H, W). The vector at each pixel represents the displacement of the keypoints of the associated instance from this pixel.

  • displacement_weights (np.ndarray): The pixel-wise weight for displacements, which has the same shape as displacements

Return type
  dict

get_keypoint_scores(heatmaps: torch.Tensor, keypoints: torch.Tensor)[source]

Calculate the keypoint scores with keypoint heatmaps and coordinates.

Parameters
  • heatmaps (Tensor) – Keypoint heatmaps in shape (K, H, W)

  • keypoints (Tensor) – Keypoint coordinates in shape (N, K, D)

Returns
  Keypoint scores in shape (N, K)

Return type
  Tensor
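The idea of get_keypoint_scores can be sketched in NumPy as reading each keypoint's heatmap value at its (rounded) location; the actual method may use sub-pixel sampling:

```python
import numpy as np

def keypoint_scores(heatmaps, keypoints):
    """Score each keypoint by the heatmap response at its rounded
    location. heatmaps: (K, H, W); keypoints: (N, K, 2) in (x, y)."""
    scores = np.empty(keypoints.shape[:2])
    for n, instance in enumerate(keypoints):
        for k, (x, y) in enumerate(instance):
            scores[n, k] = heatmaps[k, int(round(y)), int(round(x))]
    return scores

hm = np.zeros((1, 4, 4))
hm[0, 2, 1] = 0.9                     # a response at (x=1, y=2)
assert keypoint_scores(hm, np.array([[[1.0, 2.0]]]))[0, 0] == 0.9
```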

class mmpose.codecs.SimCCLabel(input_size: Tuple[int, int], smoothing_type: str = 'gaussian', sigma: Union[float, int, Tuple[float]] = 6.0, simcc_split_ratio: float = 2.0, label_smooth_weight: float = 0.0, normalize: bool = True, use_dark: bool = False)[source]

Generate keypoint representation via the “SimCC” approach. See the paper SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation by Li et al. (2022) for more details. Old name: SimDR

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • image size: [w, h]

Encoded:

  • keypoint_x_labels (np.ndarray): The generated SimCC label for the x-axis. The label shape is (N, K, Wx) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wx=w*simcc_split_ratio\)

  • keypoint_y_labels (np.ndarray): The generated SimCC label for the y-axis. The label shape is (N, K, Wy) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wy=h*simcc_split_ratio\)

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Parameters
  • input_size (tuple) – Input image size in [w, h]

  • smoothing_type (str) – The SimCC label smoothing strategy. Options are 'gaussian' and 'standard'. Defaults to 'gaussian'

  • sigma (float | int | tuple) – The sigma value in the Gaussian SimCC label. Defaults to 6.0

  • simcc_split_ratio (float) – The ratio of the label size to the input size. For example, if the input width is w, the x label size will be \(w*simcc_split_ratio\). Defaults to 2.0

  • label_smooth_weight (float) – Label smoothing weight. Defaults to 0.0

  • normalize (bool) – Whether to normalize the heatmaps. Defaults to True.

  • use_dark (bool) – Whether to use the DARK post processing. Defaults to False.

Paper: https://arxiv.org/abs/2107.03332
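The 1-D classification view can be sketched for a single x coordinate: the label lives on a grid of Wx = w * simcc_split_ratio bins, and decoding maps the argmax bin back to pixels (values are illustrative; this is not the codec's implementation):

```python
import numpy as np

w, simcc_split_ratio, sigma = 192, 2.0, 6.0
Wx = int(w * simcc_split_ratio)      # x-label length: 384 bins

def encode_x(x):
    """Gaussian SimCC label over the x bins for one keypoint."""
    bins = np.arange(Wx)
    mu = x * simcc_split_ratio       # keypoint position in bin units
    return np.exp(-((bins - mu) ** 2) / (2 * sigma ** 2))

def decode_x(label):
    """Argmax over bins, mapped back to input-image pixels."""
    return label.argmax() / simcc_split_ratio

assert Wx == 384
assert decode_x(encode_x(100.0)) == 100.0
```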

decode(simcc_x: numpy.ndarray, simcc_y: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode keypoint coordinates from SimCC representations. The decoded coordinates are in the input image space.

Parameters
  • simcc_x (np.ndarray) – SimCC label for the x-axis

  • simcc_y (np.ndarray) – SimCC label for the y-axis

Returns
  • keypoints (np.ndarray): Decoded coordinates in shape (N, K, D)

  • scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[source]

Encode keypoints into SimCC labels. Note that the original keypoint coordinates should be in the input image space.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns
  • keypoint_x_labels (np.ndarray): The generated SimCC label for the x-axis. The label shape is (N, K, Wx) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wx=w*simcc_split_ratio\)

  • keypoint_y_labels (np.ndarray): The generated SimCC label for the y-axis. The label shape is (N, K, Wy) if smoothing_type=='gaussian' and (N, K) if smoothing_type=='standard', where \(Wy=h*simcc_split_ratio\)

  • keypoint_weights (np.ndarray): The target weights in shape (N, K)

Return type
  dict

class mmpose.codecs.UDPHeatmap(input_size: Tuple[int, int], heatmap_size: Tuple[int, int], heatmap_type: str = 'gaussian', sigma: float = 2.0, radius_factor: float = 0.0546875, blur_kernel_size: int = 11)[source]

Generate keypoint heatmaps by Unbiased Data Processing (UDP). See the paper The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation by Huang et al. (2020) for details.

Note

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • image size: [w, h]

  • heatmap size: [W, H]

Encoded:

  • heatmap (np.ndarray): The generated heatmap in shape (C_out, H, W), where [W, H] is the heatmap_size and C_out is the output channel number, which depends on the heatmap_type. If heatmap_type=='gaussian', C_out equals the keypoint number K; if heatmap_type=='combined', C_out equals K*3 (x_offset, y_offset and class label)

  • keypoint_weights (np.ndarray): The target weights in shape (K,)

Parameters
  • input_size (tuple) – Image size in [w, h]

  • heatmap_size (tuple) – Heatmap size in [W, H]

  • heatmap_type (str) – The heatmap type to encode the keypoints. Options are:
    • 'gaussian': Gaussian heatmap
    • 'combined': Combination of a binary label map and offset maps for the X and Y axes.

  • sigma (float) – The sigma value of the Gaussian heatmap when heatmap_type=='gaussian'. Defaults to 2.0

  • radius_factor (float) – The radius factor of the binary label map when heatmap_type=='combined'. The positive region is defined as the neighborhood of the keypoint with the radius \(r=radius_factor*max(W, H)\). Defaults to 0.0546875

  • blur_kernel_size (int) – The Gaussian blur kernel size of the heatmap modulation in DarkPose. Defaults to 11

Paper: https://arxiv.org/abs/1911.07524
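The output channel count and the binary-label radius described above can be checked with a small sketch (the heatmap size is illustrative):

```python
def out_channels(K, heatmap_type):
    """C_out of the UDP codec as described above."""
    if heatmap_type == 'gaussian':
        return K          # one Gaussian map per keypoint
    if heatmap_type == 'combined':
        return K * 3      # class label + x_offset + y_offset per keypoint
    raise ValueError(f'unknown heatmap_type: {heatmap_type}')

W, H = 64, 48                          # illustrative heatmap size
radius = 0.0546875 * max(W, H)         # r = radius_factor * max(W, H)
assert radius == 3.5
assert out_channels(17, 'gaussian') == 17
assert out_channels(17, 'combined') == 51
```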

decode(encoded: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Decode keypoint coordinates from heatmaps. The decoded keypoint coordinates are in the input image space.

Parameters
  encoded (np.ndarray) – Heatmaps in shape (K, H, W)

Returns
  • keypoints (np.ndarray): Decoded keypoint coordinates in shape (N, K, D)

  • scores (np.ndarray): The keypoint scores in shape (N, K). It usually represents the confidence of the keypoint prediction

Return type
  tuple

encode(keypoints: numpy.ndarray, keypoints_visible: Optional[numpy.ndarray] = None) → dict[source]

Encode keypoints into heatmaps. Note that the original keypoint coordinates should be in the input image space.

Parameters
  • keypoints (np.ndarray) – Keypoint coordinates in shape (N, K, D)

  • keypoints_visible (np.ndarray) – Keypoint visibilities in shape (N, K)

Returns
  • heatmap (np.ndarray): The generated heatmap in shape (C_out, H, W), where [W, H] is the heatmap_size and C_out is the output channel number, which depends on the heatmap_type. If heatmap_type=='gaussian', C_out equals the keypoint number K; if heatmap_type=='combined', C_out equals K*3 (x_offset, y_offset and class label)

  • keypoint_weights (np.ndarray): The target weights in shape (K,)

Return type
  dict

mmpose.models

backbones

class mmpose.models.backbones.AlexNet(num_classes=-1, init_cfg=None)[source]

AlexNet backbone.

The input for AlexNet is a 224x224 RGB image.

Parameters
  • num_classes (int) – Number of classes for classification. The default value is -1, which uses the backbone as a feature extractor without the top classifier.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]

Forward function.

Parameters
  x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

class mmpose.models.backbones.CPM(in_channels, out_channels, feat_channels=128, middle_channels=32, num_stages=6, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

CPM backbone.

Convolutional Pose Machines. More details can be found in the paper Convolutional Pose Machines by Wei et al. (2016).

Parameters
  • in_channels (int) – The input channels of the CPM.

  • out_channels (int) – The output channels of the CPM.

  • feat_channels (int) – Feature channels of each CPM stage.

  • middle_channels (int) – Feature channels of the conv after the middle stage.

  • num_stages (int) – Number of stages.

  • norm_cfg (dict) – Dictionary to construct and config the norm layer.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

Example

>>> from mmpose.models import CPM
>>> import torch
>>> self = CPM(3, 17)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 368, 368)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
(1, 17, 46, 46)
forward(x)[source]

Model forward function.

class mmpose.models.backbones.HRFormer(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, transformer_norm_cfg={'eps': 1e-06, 'type': 'LN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=-1, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

HRFormer backbone.

This backbone is the implementation of HRFormer: High-Resolution Transformer for Dense Prediction.

Parameters
  • extra (dict) – Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:
    • num_modules (int): The number of HRModules in this stage.
    • num_branches (int): The number of branches in the HRModule.
    • block (str): The type of block.
    • num_blocks (tuple): The number of blocks in each branch. The length must be equal to num_branches.
    • num_channels (tuple): The number of channels in each branch. The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Normally 3.

  • conv_cfg (dict) – Dictionary to construct and config the conv layer. Default: None.

  • norm_cfg (dict) – Config of the norm layer. Default: dict(type='BN', requires_grad=True).

  • transformer_norm_cfg (dict) – Config of the transformer norm layer. Uses LN by default.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for the last norm layer in resblocks to let them behave as identities. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

Example

>>> from mmpose.models import HRFormer
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(2, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='HRFORMER',
>>>         window_sizes=(7, 7),
>>>         num_heads=(1, 2),
>>>         mlp_ratios=(4, 4),
>>>         num_blocks=(2, 2),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='HRFORMER',
>>>         window_sizes=(7, 7, 7),
>>>         num_heads=(1, 2, 4),
>>>         mlp_ratios=(4, 4, 4),
>>>         num_blocks=(2, 2, 2),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=2,
>>>         num_branches=4,
>>>         block='HRFORMER',
>>>         window_sizes=(7, 7, 7, 7),
>>>         num_heads=(1, 2, 4, 8),
>>>         mlp_ratios=(4, 4, 4, 4),
>>>         num_blocks=(2, 2, 2, 2),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRFormer(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
class mmpose.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=False, frozen_stages=-1, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[source]

HRNet backbone.

High-Resolution Representations for Labeling Pixels and Regions

Parameters
  • extra (dict) – detailed configuration for each stage of HRNet.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – dictionary to construct and config conv layer.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

Example

>>> from mmpose.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
forward(x)[源代码]

Forward function.

init_weights()[源代码]

Initialize the weights in backbone.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[源代码]

Convert the model into training mode.

class mmpose.models.backbones.HourglassAENet(downsample_times=4, num_stacks=1, out_channels=34, stage_channels=(256, 384, 512, 640, 768), feat_channels=256, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

Hourglass-AE Network proposed by Newell et al.

Associative Embedding: End-to-End Learning for Joint Detection and Grouping.

More details can be found in the paper.

Parameters
  • downsample_times (int) – Downsample times in a HourglassModule.

  • num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.

  • stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.

  • stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.

  • feat_channels (int) – Feature channel of conv after a HourglassModule.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

Example

>>> from mmpose.models import HourglassAENet
>>> import torch
>>> self = HourglassAENet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 512, 512)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 34, 128, 128)
forward(x)[源代码]

Model forward function.

class mmpose.models.backbones.HourglassNet(downsample_times=5, num_stacks=2, stage_channels=(256, 256, 384, 384, 384, 512), stage_blocks=(2, 2, 2, 2, 2, 4), feat_channel=256, norm_cfg={'requires_grad': True, 'type': 'BN'}, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

HourglassNet backbone.

Stacked Hourglass Networks for Human Pose Estimation. More details can be found in the paper.

Parameters
  • downsample_times (int) – Downsample times in a HourglassModule.

  • num_stacks (int) – Number of HourglassModule modules stacked, 1 for Hourglass-52, 2 for Hourglass-104.

  • stage_channels (list[int]) – Feature channel of each sub-module in a HourglassModule.

  • stage_blocks (list[int]) – Number of sub-modules stacked in a HourglassModule.

  • feat_channel (int) – Feature channel of conv after a HourglassModule.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

Example

>>> from mmpose.models import HourglassNet
>>> import torch
>>> self = HourglassNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     print(tuple(level_output.shape))
(1, 256, 128, 128)
(1, 256, 128, 128)
forward(x)[源代码]

Model forward function.

class mmpose.models.backbones.LiteHRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'type': 'BN'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

Lite-HRNet backbone.

Lite-HRNet: A Lightweight High-Resolution Network.

Code adapted from https://github.com/HRNet/Lite-HRNet.

Parameters
  • extra (dict) – detailed configuration for each stage of HRNet.

  • in_channels (int) – Number of input image channels. Default: 3.

  • conv_cfg (dict) – dictionary to construct and config conv layer.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

Example

>>> from mmpose.models import LiteHRNet
>>> import torch
>>> extra=dict(
>>>    stem=dict(stem_channels=32, out_channels=32, expand_ratio=1),
>>>    num_stages=3,
>>>    stages_spec=dict(
>>>        num_modules=(2, 4, 2),
>>>        num_branches=(2, 3, 4),
>>>        num_blocks=(2, 2, 2),
>>>        module_type=('LITE', 'LITE', 'LITE'),
>>>        with_fuse=(True, True, True),
>>>        reduce_ratios=(8, 8, 8),
>>>        num_channels=(
>>>            (40, 80),
>>>            (40, 80, 160),
>>>            (40, 80, 160, 320),
>>>        )),
>>>    with_head=False)
>>> self = LiteHRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 40, 8, 8)
forward(x)[源代码]

Forward function.

train(mode=True)[源代码]

Convert the model into training mode.

class mmpose.models.backbones.MSPN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], norm_cfg={'type': 'BN'}, res_top_channels=64, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[源代码]

MSPN backbone. Paper ref: Li et al. “Rethinking on Multi-Stage Networks for Human Pose Estimation” (CVPR 2020).

Parameters
  • unit_channels (int) – Number of Channels in an upsample unit. Default: 256

  • num_stages (int) – Number of stages in a multi-stage MSPN. Default: 4

  • num_units (int) – Number of downsample/upsample units in a single-stage network. Default: 4. Note: Make sure num_units == len(self.num_blocks)

  • num_blocks (list) – Number of bottlenecks in each downsample unit. Default: [2, 2, 2, 2]

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)

  • res_top_channels (int) – Number of channels of feature from ResNetTop. Default: 64.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])]``

Example

>>> from mmpose.models import MSPN
>>> import torch
>>> self = MSPN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)
forward(x)[源代码]

Model forward function.

init_weights()[源代码]

Initialize model weights.

class mmpose.models.backbones.MobileNetV2(widen_factor=1.0, out_indices=(7,), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

MobileNetV2 backbone.

Parameters
  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

forward(x)[源代码]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

make_layer(out_channels, num_blocks, stride, expand_ratio)[源代码]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters
  • out_channels (int) – out_channels of block.

  • num_blocks (int) – number of blocks.

  • stride (int) – stride of the first block. Default: 1

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.
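The expansion and stride arithmetic above can be made concrete with a small pure-Python sketch; the helper `layer_plan` and its dict layout are hypothetical illustrations, not part of the mmpose API:

```python
def layer_plan(in_channels, out_channels, num_blocks, stride, expand_ratio):
    """Hypothetical sketch of the block configuration built by make_layer."""
    blocks = []
    for i in range(num_blocks):
        blocks.append(dict(
            in_channels=in_channels,
            # InvertedResidual widens its *input* channels by expand_ratio
            hidden_channels=in_channels * expand_ratio,
            out_channels=out_channels,
            # only the first block in a layer applies the stride
            stride=stride if i == 0 else 1))
        in_channels = out_channels  # later blocks start from the new width
    return blocks

plan = layer_plan(in_channels=16, out_channels=24, num_blocks=2,
                  stride=2, expand_ratio=6)
```

For this plan, the first block downsamples with stride 2 through a 96-channel hidden layer, and the second block keeps stride 1 at 24 channels.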

train(mode=True)[源代码]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is set to training mode (True) or evaluation mode (False)

class mmpose.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(-1,), frozen_stages=-1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm']}])[源代码]

MobileNetV3 backbone.

Parameters
  • arch (str) – Architecture of MobileNetV3, from {small, big}. Default: small.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (-1, ), which means output tensors from final stage.

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm'])]``

forward(x)[源代码]

Forward function.

Parameters

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[源代码]

Set module status before forward computation.

Parameters

mode (bool) – Whether the module is set to training mode (True) or evaluation mode (False)

class mmpose.models.backbones.PyramidVisionTransformer(pretrain_img_size=224, in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 5, 8], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], paddings=[0, 0, 0, 0], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratios=[8, 8, 4, 4], qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=True, norm_after_stage=False, use_conv_ffn=False, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, convert_weights=True, init_cfg=[{'type': 'TruncNormal', 'std': 0.02, 'layer': ['Linear']}, {'type': 'Constant', 'val': 1, 'layer': ['LayerNorm']}, {'type': 'Kaiming', 'layer': ['Conv2d']}])[源代码]

Pyramid Vision Transformer (PVT)

Implementation of Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 64.

  • num_stages (int) – The number of stages. Default: 4.

  • num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].

  • num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 5, 8].

  • patch_sizes (Sequence[int]) – The patch_size of each patch embedding. Default: [4, 2, 2, 2].

  • strides (Sequence[int]) – The stride of each patch embedding. Default: [4, 2, 2, 2].

  • paddings (Sequence[int]) – The padding of each patch embedding. Default: [0, 0, 0, 0].

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].

  • out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).

  • mlp_ratios (Sequence[int]) – The ratio of the mlp hidden dim to the embedding dim of each transformer encode layer. Default: [8, 8, 4, 4].

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0.

  • drop_path_rate (float) – stochastic depth rate. Default 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: True.

  • use_conv_ffn (bool) – If True, use Convolutional FFN to replace FFN. Default: False.

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’).

  • pretrained (str, optional) – model pretrained path. Default: None.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='TruncNormal', std=0.02, layer=['Linear']), dict(type='Constant', val=1, layer=['LayerNorm']), dict(type='Kaiming', layer=['Conv2d'])]``

forward(x)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[源代码]

Initialize the weights in backbone.

class mmpose.models.backbones.PyramidVisionTransformerV2(**kwargs)[源代码]

Implementation of PVTv2: Improved Baselines with Pyramid Vision Transformer.

class mmpose.models.backbones.RSN(unit_channels=256, num_stages=4, num_units=4, num_blocks=[2, 2, 2, 2], num_steps=4, norm_cfg={'type': 'BN'}, res_top_channels=64, expand_times=26, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[源代码]

Residual Steps Network backbone. Paper ref: Cai et al. “Learning Delicate Local Representations for Multi-Person Pose Estimation” (ECCV 2020).

Parameters
  • unit_channels (int) – Number of Channels in an upsample unit. Default: 256

  • num_stages (int) – Number of stages in a multi-stage RSN. Default: 4

  • num_units (int) – Number of downsample/upsample units in a single-stage RSN. Default: 4. Note: Make sure num_units == len(self.num_blocks)

  • num_blocks (list) – Number of RSBs (Residual Steps Block) in each downsample unit. Default: [2, 2, 2, 2]

  • num_steps (int) – Number of steps in an RSB. Default: 4

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)

  • res_top_channels (int) – Number of channels of feature from ResNet_top. Default: 64.

  • expand_times (int) – Times by which the in_channels are expanded in an RSB. Default: 26.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])]``

Example

>>> from mmpose.models import RSN
>>> import torch
>>> self = RSN(num_stages=2, num_units=2, num_blocks=[2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 511, 511)
>>> level_outputs = self.forward(inputs)
>>> for level_output in level_outputs:
...     for feature in level_output:
...         print(tuple(feature.shape))
...
(1, 256, 64, 64)
(1, 256, 128, 128)
(1, 256, 64, 64)
(1, 256, 128, 128)
forward(x)[源代码]

Model forward function.

class mmpose.models.backbones.RegNet(arch, in_channels=3, stem_channels=32, base_channels=32, strides=(2, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

RegNet backbone.

More details can be found in the paper.

Parameters
  • arch (dict) – The parameters of the RegNet:
    - w0 (int): initial width
    - wa (float): slope of width
    - wm (float): quantization parameter to quantize the width
    - depth (int): depth of the backbone
    - group_w (int): width of group
    - bot_mul (float): bottleneck ratio, i.e. expansion of bottleneck.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • base_channels (int) – Base channels after stem layer.

  • in_channels (int) – Number of input image channels. Default: 3.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: “pytorch”.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

Example

>>> from mmpose.models import RegNet
>>> import torch
>>> self = RegNet(
>>>     arch=dict(
>>>         w0=88,
>>>         wa=26.31,
>>>         wm=2.25,
>>>         group_w=48,
>>>         depth=25,
>>>         bot_mul=1.0),
>>>     out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 8, 8)
(1, 192, 4, 4)
(1, 432, 2, 2)
(1, 1008, 1, 1)
adjust_width_group(widths, bottleneck_ratio, groups)[源代码]

Adjusts the compatibility of widths and groups.

Parameters
  • widths (list[int]) – Width of each stage.

  • bottleneck_ratio (float) – Bottleneck ratio.

  • groups (int) – number of groups in each stage

Returns

The adjusted widths and groups of each stage.

Return type

tuple(list)
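As a concrete illustration, the adjustment can be sketched in pure Python: the group count is capped by the bottleneck width, and each width is then rounded to the nearest multiple of its group count. This is a sketch of the documented behaviour under the assumption of a scalar bottleneck_ratio and a scalar initial group count, not the exact library code:

```python
def adjust_width_group(widths, bottleneck_ratio, groups):
    """Sketch: make per-stage widths and group counts mutually compatible."""
    # The grouped conv operates on the bottleneck width.
    bottleneck_widths = [int(w * bottleneck_ratio) for w in widths]
    # A group count larger than the width itself is impossible.
    group_list = [min(groups, w_bot) for w_bot in bottleneck_widths]
    # Round each bottleneck width to the nearest multiple of its group count.
    bottleneck_widths = [int(round(w_bot / g) * g)
                         for w_bot, g in zip(bottleneck_widths, group_list)]
    # Map the adjusted bottleneck widths back to stage widths.
    widths = [int(w_bot / bottleneck_ratio) for w_bot in bottleneck_widths]
    return widths, group_list
```

With raw stage widths 88, 200, 448 and 1000 and the example arch's group_w=48, bot_mul=1.0, this yields the stage widths 96, 192, 432 and 1008 seen in the RegNet example output.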

forward(x)[源代码]

Forward function.

static generate_regnet(initial_width, width_slope, width_parameter, depth, divisor=8)[源代码]

Generates per block width from RegNet parameters.

Parameters
  • initial_width ([int]) – Initial width of the backbone

  • width_slope ([float]) – Slope of the quantized linear function

  • width_parameter ([int]) – Parameter used to quantize the width.

  • depth ([int]) – Depth of the backbone.

  • divisor (int, optional) – The divisor of channels. Defaults to 8.

Returns

A list of widths of each stage and the number of stages.

Return type

list, int
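In other words: widths grow linearly with the block index, are snapped to the nearest power of width_parameter, and are rounded to a multiple of divisor; the number of distinct widths gives the stage count. A pure-Python sketch of this recipe (illustrative only; the library version operates on numpy arrays, whose handling of exact .5 rounding ties can differ):

```python
import math

def generate_regnet(initial_width, width_slope, width_parameter, depth,
                    divisor=8):
    """Sketch: per-block widths from the RegNet quantized-linear schedule."""
    widths = []
    for i in range(depth):
        w_cont = initial_width + width_slope * i            # linear schedule
        k = round(math.log(w_cont / initial_width, width_parameter))
        w = initial_width * width_parameter ** k            # snap to a power
        widths.append(int(round(w / divisor) * divisor))    # make divisible
    return widths, len(set(widths))

# The arch used in the RegNet example above (regnetx-3.2gf style)
widths, num_stages = generate_regnet(88, 26.31, 2.25, 25)
```

These are the raw per-block widths (88, 200, 448 and 1000 here); the stage widths reported in the example output (96, 192, 432, 1008) are obtained only after adjust_width_group reconciles them with the group width.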

get_stages_from_blocks(widths)[源代码]

Gets widths/stage_blocks of network at each stage.

Parameters

widths (list[int]) – Width in each stage.

Returns

Width and depth of each stage.

Return type

tuple(list)
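Conceptually, consecutive blocks that share a width belong to the same stage, so the per-block width list collapses into (stage widths, stage depths). A minimal sketch of that grouping (illustrative, not the library implementation):

```python
def get_stages_from_blocks(widths):
    """Sketch: collapse equal consecutive per-block widths into stages."""
    stage_widths, stage_blocks = [], []
    for w in widths:
        if stage_widths and stage_widths[-1] == w:
            stage_blocks[-1] += 1      # same width: one more block this stage
        else:
            stage_widths.append(w)     # a new width opens a new stage
            stage_blocks.append(1)
    return stage_widths, stage_blocks
```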

static quantize_float(number, divisor)[源代码]

Converts a float to the closest non-zero int divisible by divisor.

Parameters
  • number (int) – Original number to be quantized.

  • divisor (int) – Divisor used to quantize the number.

Returns

The quantized number that is divisible by divisor.

Return type

int
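The rounding itself is a one-liner; this sketch mirrors the documented behaviour (guaranteeing a strictly non-zero result for inputs below half a divisor would additionally require a floor of one divisor):

```python
def quantize_float(number, divisor):
    """Sketch: round `number` to the nearest multiple of `divisor`."""
    return int(round(number / divisor) * divisor)
```

For example, quantize_float(90, 8) gives 88 and quantize_float(198, 8) gives 200.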

class mmpose.models.backbones.ResNeSt(depth, groups=1, width_per_group=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[源代码]

ResNeSt backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152, 200}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 1.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • radix (int) – Radix of SplitAttentionConv2d. Default: 2

  • reduction_factor (int) – Reduction factor of SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

make_res_layer(**kwargs)[源代码]

Make a ResLayer.

class mmpose.models.backbones.ResNeXt(depth, groups=32, width_per_group=4, **kwargs)[源代码]

ResNeXt backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

make_res_layer(**kwargs)[源代码]

Make a ResLayer.

class mmpose.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, expansion=None, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

ResNet backbone.

Please refer to the paper for details.

Parameters
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • base_channels (int) – Middle channels of the first stage. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned, otherwise multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

Example

>>> from mmpose.models import ResNet
>>> import torch
>>> self = ResNet(depth=18, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x)[源代码]

Forward function.

init_weights()[源代码]

Initialize the weights in backbone.

make_res_layer(**kwargs)[源代码]

Make a ResLayer.

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[源代码]

Convert the model into training mode.

class mmpose.models.backbones.ResNetV1d(**kwargs)[源代码]

ResNetV1d variant described in Bag of Tricks.

Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs, and in the downsampling block a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.

class mmpose.models.backbones.SCNet(depth, **kwargs)[源代码]

SCNet backbone.

Improving Convolutional Networks with Self-Calibrated Convolutions, Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Changhu Wang, Jiashi Feng, IEEE CVPR, 2020. http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf

Parameters
  • depth (int) – Depth of SCNet, from {50, 101}.

  • in_channels (int) – Number of input image channels. Normally 3.

  • base_channels (int) – Number of base channels of hidden layer.

  • num_stages (int) – SCNet stages, normally 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.

示例

>>> from mmpose.models import SCNet
>>> import torch
>>> self = SCNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
class mmpose.models.backbones.SEResNeXt(depth, groups=32, width_per_group=4, **kwargs)[源代码]

SEResNeXt backbone.

Please refer to the paper for details.

参数
  • depth (int) – Network depth, from {50, 101, 152}.

  • groups (int) – Groups of conv2 in Bottleneck. Default: 32.

  • width_per_group (int) – Width per group of conv2 in Bottleneck. Default: 4.

  • se_ratio (int) – Squeeze ratio in SELayer. Default: 16.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

示例

>>> from mmpose.models import SEResNeXt
>>> import torch
>>> self = SEResNeXt(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
make_res_layer(**kwargs)[源代码]

Make a ResLayer.

class mmpose.models.backbones.SEResNet(depth, se_ratio=16, **kwargs)[源代码]

SEResNet backbone.

Please refer to the paper for details.

参数
  • depth (int) – Network depth, from {50, 101, 152}.

  • se_ratio (int) – Squeeze ratio in SELayer. Default: 16.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Output channels of the stem layer. Default: 64.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

示例

>>> from mmpose.models import SEResNet
>>> import torch
>>> self = SEResNet(depth=50, out_indices=(0, 1, 2, 3))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 56, 56)
(1, 512, 28, 28)
(1, 1024, 14, 14)
(1, 2048, 7, 7)
make_res_layer(**kwargs)[源代码]

Make a ResLayer.
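The se_ratio argument shared by SEResNet and SEResNeXt controls how aggressively the SE layer bottlenecks the channel descriptor. A minimal NumPy sketch of the squeeze-excitation computation (random weights, illustrative only, not the mmpose SELayer implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
channels, se_ratio = 64, 16
x = rng.standard_normal((1, channels, 8, 8))

w_down = rng.standard_normal((channels // se_ratio, channels)) * 0.1  # 64 -> 4
w_up = rng.standard_normal((channels, channels // se_ratio)) * 0.1    # 4 -> 64

s = x.mean(axis=(2, 3))[0]                 # squeeze: global average pool -> (64,)
h = np.maximum(w_down @ s, 0.0)            # bottleneck + ReLU -> (4,)
gate = 1.0 / (1.0 + np.exp(-(w_up @ h)))   # excitation: per-channel gate in (0, 1)
y = x * gate[None, :, None, None]          # reweight the original feature map
```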

class mmpose.models.backbones.ShuffleNetV1(groups=3, widen_factor=1.0, out_indices=(2,), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'bias': 0.0001, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

ShuffleNetV1 backbone.

参数
  • groups (int, optional) – The number of groups to be used in grouped 1x1 convolutions in each ShuffleUnit. Default: 3.

  • widen_factor (float, optional) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (2, )

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='Normal', std=0.01, layer=['Conv2d']), dict(type='Constant', val=1, bias=0.0001, layer=['_BatchNorm', 'GroupNorm'])]``
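The grouped 1x1 convolutions above rely on ShuffleNet's channel shuffle so that information can flow across groups. The operation is just a reshape and transpose; a stand-alone sketch with groups=3:

```python
import numpy as np

groups, ch_per_group = 3, 4
x = np.arange(groups * ch_per_group)  # channel ids: [0..3 | 4..7 | 8..11]

# Channel shuffle: view channels as (groups, ch_per_group), transpose,
# and flatten, interleaving one channel from each group.
shuffled = x.reshape(groups, ch_per_group).T.reshape(-1)
```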

forward(x)[源代码]

Forward function.

参数

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[源代码]

Initialize the weights.

make_layer(out_channels, num_blocks, first_block=False)[源代码]

Stack ShuffleUnit blocks to make a layer.

参数
  • out_channels (int) – out_channels of the block.

  • num_blocks (int) – Number of blocks.

  • first_block (bool, optional) – Whether it is the first ShuffleUnit of a sequential stack of ShuffleUnits. Default: False, which means using the grouped 1x1 convolution.

train(mode=True)[源代码]

Set module status before forward computation.

参数

mode (bool) – Whether it is train_mode or test_mode

class mmpose.models.backbones.ShuffleNetV2(widen_factor=1.0, out_indices=(3,), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.01, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'bias': 0.0001, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

ShuffleNetV2 backbone.

参数
  • widen_factor (float) – Width multiplier - adjusts the number of channels in each layer by this amount. Default: 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Default: (3, ).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='Normal', std=0.01, layer=['Conv2d']), dict(type='Constant', val=1, bias=0.0001, layer=['_BatchNorm', 'GroupNorm'])]``

forward(x)[源代码]

Forward function.

参数

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights()[源代码]

Initialize the weights.

train(mode=True)[源代码]

Set module status before forward computation.

参数

mode (bool) – Whether it is train_mode or test_mode

class mmpose.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, convert_weights=False, frozen_stages=- 1, init_cfg=[{'type': 'TruncNormal', 'std': 0.02, 'layer': ['Linear']}, {'type': 'Constant', 'val': 1, 'layer': ['LayerNorm']}])[源代码]

Swin Transformer backbone. A PyTorch implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

Inspiration from https://github.com/microsoft/Swin-Transformer

参数
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – The num of input channels. Defaults: 3.

  • embed_dims (int) – The feature dimension. Default: 96.

  • patch_size (int | tuple[int]) – Patch size. Default: 4.

  • window_size (int) – Window size. Default: 7.

  • mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).

  • num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).

  • strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • patch_norm (bool) – Whether to add a norm layer for patch embedding and patch merging. Default: True.

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer at the output of the backbone. Default: dict(type='LN').

  • with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • convert_weights (bool) – The flag indicates whether the pre-trained model is from the original repo. We may need to convert some keys to make it compatible. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). Default: -1 (-1 means not freezing any parameters).

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='TruncNormal', std=0.02, layer=['Linear']), dict(type='Constant', val=1, layer=['LayerNorm'])]``
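With the defaults above (pretrain_img_size=224, strides (4, 2, 2, 2), window_size=7), the feature-map side length shrinks by each stage's stride and stays divisible by the window size. A quick check of the arithmetic:

```python
img_size, window = 224, 7
strides = (4, 2, 2, 2)

sides, side = [], img_size
for s in strides:
    side //= s            # patch embedding / patch merging stride
    sides.append(side)

# number of 7x7 attention windows per stage
windows = [(s // window) ** 2 for s in sides]
```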

forward(x)[源代码]

Forward function.

参数

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

init_weights(pretrained=None)[源代码]

Initialize the weights in backbone.

参数

pretrained (str, optional) – Path to pre-trained weights. Defaults to None.

train(mode=True)[源代码]

Convert the model into training mode while keeping layers frozen.

class mmpose.models.backbones.TCN(in_channels, stem_channels=1024, num_blocks=2, kernel_sizes=(3, 3, 3), dropout=0.25, causal=False, residual=True, use_stride_conv=False, conv_cfg={'type': 'Conv1d'}, norm_cfg={'type': 'BN1d'}, max_norm=None, init_cfg=[{'type': 'Kaiming', 'mode': 'fan_in', 'nonlinearity': 'relu', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

TCN backbone.

Temporal Convolutional Networks. More details can be found in the paper.

参数
  • in_channels (int) – Number of input channels, which equals to num_keypoints * num_features.

  • stem_channels (int) – Number of feature channels. Default: 1024.

  • num_blocks (int) – Number of basic temporal convolutional blocks. Default: 2.

  • kernel_sizes (Sequence[int]) – Sizes of the convolving kernel of each basic block. Default: (3, 3, 3).

  • dropout (float) – Dropout rate. Default: 0.25.

  • causal (bool) – Use causal convolutions instead of symmetric convolutions (for real-time applications). Default: False.

  • residual (bool) – Use residual connection. Default: True.

  • use_stride_conv (bool) – Use TCN backbone optimized for single-frame batching, i.e. where batches have input length = receptive field, and output length = 1. This implementation replaces dilated convolutions with strided convolutions to avoid generating unused intermediate results. The weights are interchangeable with the reference implementation. Default: False

  • conv_cfg (dict) – dictionary to construct and config conv layer. Default: dict(type=’Conv1d’).

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN1d’).

  • max_norm (float|None) – if not None, the weight of convolution layers will be clipped to have a maximum norm of max_norm.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='Kaiming', mode='fan_in', nonlinearity='relu', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

示例

>>> from mmpose.models import TCN
>>> import torch
>>> self = TCN(in_channels=34)
>>> self.eval()
>>> inputs = torch.rand(1, 34, 243)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 1024, 235)
(1, 1024, 217)
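The output lengths printed above follow from valid (unpadded) 1D convolutions whose dilation grows with depth; assuming the reference behavior where each block's second conv is a 1x1 that leaves the length unchanged, the arithmetic can be checked by hand:

```python
def conv1d_out_len(n, kernel, dilation=1):
    # Valid (no padding), stride-1 1D convolution output length.
    return n - dilation * (kernel - 1)

n = conv1d_out_len(243, 3)                          # stem conv, kernel 3 -> 241
after_block1 = conv1d_out_len(n, 3, dilation=3)     # block 1, dilation 3 -> 235
after_block2 = conv1d_out_len(after_block1, 3, dilation=9)  # block 2     -> 217
```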
forward(x)[源代码]

Forward function.

class mmpose.models.backbones.V2VNet(input_channels, output_channels, mid_channels=32, init_cfg={'layer': ['Conv3d', 'ConvTranspose3d'], 'std': 0.001, 'type': 'Normal'})[源代码]

V2VNet.

Please refer to the paper <https://arxiv.org/abs/1711.07399> for details.

参数
  • input_channels (int) – Number of channels of the input feature volume.

  • output_channels (int) – Number of channels of the output volume.

  • mid_channels (int) – Input and output channels of the encoder-decoder block.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``dict(type='Normal', std=0.001, layer=['Conv3d', 'ConvTranspose3d'])``

forward(x)[源代码]

Forward function.

class mmpose.models.backbones.VGG(depth, num_classes=- 1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=None, frozen_stages=- 1, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, norm_eval=False, ceil_mode=False, with_last_pool=True, init_cfg=[{'type': 'Kaiming', 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}, {'type': 'Normal', 'std': 0.01, 'layer': ['Linear']}])[源代码]

VGG backbone.

参数
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • with_norm (bool) – Use BatchNorm or not.

  • num_classes (int) – number of classes for classification.

  • num_stages (int) – VGG stages, normally 5.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. When it is None, the default behavior depends on whether num_classes is specified. If num_classes <= 0, the default value is (4, ), outputting the last feature map before the classifier. If num_classes > 0, the default value is (5, ), outputting the classification score. Default: None.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • ceil_mode (bool) – Whether to use ceil_mode of MaxPool. Default: False.

  • with_last_pool (bool) – Whether to keep the last pooling before classifier. Default: True.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='Kaiming', layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm']), dict(type='Normal', std=0.01, layer=['Linear'])]``
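The default out_indices rule described above can be summarized in a few lines (a hypothetical helper mirroring the documented behavior, not mmpose internals):

```python
def default_out_indices(num_classes):
    # num_classes > 0: the classifier is built, so output stage 5
    # (the classification score); otherwise stop at the last feature map.
    return (5,) if num_classes > 0 else (4,)
```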

forward(x)[源代码]

Forward function.

参数

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[源代码]

Set module status before forward computation.

参数

mode (bool) – Whether it is train_mode or test_mode

class mmpose.models.backbones.ViPNAS_MobileNetV3(wid=[16, 16, 24, 40, 80, 112, 160], expan=[None, 1, 5, 4, 5, 5, 6], dep=[None, 1, 4, 4, 4, 4, 4], ks=[3, 3, 7, 7, 5, 7, 5], group=[None, 8, 120, 20, 100, 280, 240], att=[None, True, True, False, True, True, True], stride=[2, 1, 2, 2, 2, 1, 2], act=['HSwish', 'ReLU', 'ReLU', 'ReLU', 'HSwish', 'HSwish', 'HSwish'], conv_cfg=None, norm_cfg={'type': 'BN'}, frozen_stages=- 1, norm_eval=False, with_cp=False, init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

ViPNAS_MobileNetV3 backbone.

“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.

参数
  • wid (list(int)) – Searched width config for each stage.

  • expan (list(int)) – Searched expansion ratio config for each stage.

  • dep (list(int)) – Searched depth config for each stage.

  • ks (list(int)) – Searched kernel size config for each stage.

  • group (list(int)) – Searched group number config for each stage.

  • att (list(bool)) – Searched attention config for each stage.

  • stride (list(int)) – Stride config for each stage.

  • act (list(dict)) – Activation config for each stage.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

forward(x)[源代码]

Forward function.

参数

x (Tensor | tuple[Tensor]) – x could be a torch.Tensor or a tuple of torch.Tensor, containing input data for forward computation.

train(mode=True)[源代码]

Set module status before forward computation.

参数

mode (bool) – Whether it is train_mode or test_mode

class mmpose.models.backbones.ViPNAS_ResNet(depth, in_channels=3, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(3,), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, zero_init_residual=True, wid=[48, 80, 160, 304, 608], expan=[None, 1, 1, 1, 1], dep=[None, 4, 6, 7, 3], ks=[7, 3, 5, 5, 5], group=[None, 16, 16, 16, 16], att=[None, True, False, True, True], init_cfg=[{'type': 'Normal', 'std': 0.001, 'layer': ['Conv2d']}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}])[源代码]

ViPNAS_ResNet backbone.

“ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search”. More details can be found in the paper.

参数
  • depth (int) – Network depth, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • num_stages (int) – Stages of the network. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. If only one stage is specified, a single tensor (feature map) is returned; if multiple stages are specified, a tuple of tensors will be returned. Default: (3, ).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – The config dict for conv layers. Default: None.

  • norm_cfg (dict) – The config dict for norm layers.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • wid (list(int)) – Searched width config for each stage.

  • expan (list(int)) – Searched expansion ratio config for each stage.

  • dep (list(int)) – Searched depth config for each stage.

  • ks (list(int)) – Searched kernel size config for each stage.

  • group (list(int)) – Searched group number config for each stage.

  • att (list(bool)) – Searched attention config for each stage.

  • init_cfg (dict or list[dict], optional) –

    Initialization config dict. Default: ``[dict(type='Normal', std=0.001, layer=['Conv2d']), dict(type='Constant', val=1, layer=['_BatchNorm', 'GroupNorm'])]``

forward(x)[源代码]

Forward function.

make_res_layer(**kwargs)[源代码]

Make a ViPNAS ResLayer.

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[源代码]

Convert the model into training mode.

necks

class mmpose.models.necks.FPN(in_channels, out_channels, num_outs, start_level=0, end_level=- 1, add_extra_convs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, upsample_cfg={'mode': 'nearest'})[源代码]

Feature Pyramid Network.

This is an implementation of paper Feature Pyramid Networks for Object Detection.

参数
  • in_channels (list[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • num_outs (int) – Number of output scales.

  • start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.

  • end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.

  • add_extra_convs (bool | str) –

    If bool, it decides whether to add conv layers on top of the original feature maps. Defaults to False. If True, it is equivalent to add_extra_convs=’on_input’. If str, it specifies the source feature map of the extra convs. Only the following options are allowed:

    • ’on_input’: Last feat map of neck inputs (i.e. backbone feature).

    • ’on_lateral’: Last feature map after lateral convs.

    • ’on_output’: The last output feature map after fpn convs.

  • relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Default: False.

  • no_norm_on_lateral (bool) – Whether to apply norm on lateral. Default: False.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.

  • upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(mode=’nearest’).

示例

>>> import torch
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = FPN(in_channels, 11, len(in_channels)).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])
forward(inputs)[源代码]

Forward function.

init_weights()[源代码]

Initialize model weights.

class mmpose.models.necks.FeatureMapProcessor(select_index: Optional[Union[int, Tuple[int]]] = None, concat: bool = False, scale_factor: float = 1.0, apply_relu: bool = False, align_corners: bool = False)[源代码]

A PyTorch module for selecting, concatenating, and rescaling feature maps.

参数
  • select_index (Optional[Union[int, Tuple[int]]], optional) – Index or indices of feature maps to select. Defaults to None, which means all feature maps are used.

  • concat (bool, optional) – Whether to concatenate the selected feature maps. Defaults to False.

  • scale_factor (float, optional) – The scaling factor to apply to the feature maps. Defaults to 1.0.

  • apply_relu (bool, optional) – Whether to apply ReLU on input feature maps. Defaults to False.

  • align_corners (bool, optional) – Whether to align corners when resizing the feature maps. Defaults to False.
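The operations above compose as select, concatenate, then rescale. A NumPy mimic of select_index=1 with scale_factor=2.0 and nearest-neighbour resizing (illustrative only, not the mmpose implementation):

```python
import numpy as np

feats = [np.zeros((1, 64, 32, 32)), np.ones((1, 128, 16, 16))]

selected = feats[1]                       # select_index=1 keeps the second map
# scale_factor=2.0 with nearest-neighbour: repeat each pixel along H and W
rescaled = selected.repeat(2, axis=2).repeat(2, axis=3)
```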

forward(inputs: Union[torch.Tensor, Sequence[torch.Tensor]]) Union[torch.Tensor, List[torch.Tensor]][源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

备注

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmpose.models.necks.GlobalAveragePooling[源代码]

Global Average Pooling neck.

Note that we use view to remove extra channel after pooling. We do not use squeeze as it will also remove the batch dimension when the tensor has a batch dimension of size 1, which can lead to unexpected errors.
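The view-versus-squeeze caveat above is easy to reproduce with NumPy: for a batch of size 1, squeeze drops the batch dimension along with the pooled spatial ones.

```python
import numpy as np

x = np.ones((1, 512, 7, 7))     # a batch of one pooled feature map

pooled = x.mean(axis=(2, 3))              # reshape/"view" style: (1, 512)
squeezed = x.mean(axis=(2, 3)).squeeze()  # (512,) -- batch dim of size 1 lost
```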

forward(inputs)[源代码]

Forward function.

class mmpose.models.necks.PoseWarperNeck(in_channels, out_channels, inner_channels, deform_groups=17, dilations=(3, 6, 12, 18, 24), trans_conv_kernel=1, res_blocks_cfg=None, offsets_kernel=3, deform_conv_kernel=3, in_index=0, input_transform=None, freeze_trans_layer=True, norm_eval=False, im2col_step=80)[源代码]

PoseWarper neck.

“Learning temporal pose estimation from sparsely-labeled videos”.

参数
  • in_channels (int) – Number of input channels from backbone

  • out_channels (int) – Number of output channels

  • inner_channels (int) – Number of intermediate channels of the res block

  • deform_groups (int) – Number of groups in the deformable conv

  • dilations (list|tuple) – different dilations of the offset conv layers

  • trans_conv_kernel (int) – the kernel of the trans conv layer, which is used to get heatmap from the output of backbone. Default: 1

  • res_blocks_cfg (dict|None) –

    config of residual blocks. If None, use the default values. If not None, it should contain the following keys:

    • block (str): the type of residual block, Default: ‘BASIC’.

    • num_blocks (int): the number of blocks, Default: 20.

  • offsets_kernel (int) – the kernel of offset conv layer.

  • deform_conv_kernel (int) – the kernel of deformable conv layer.

  • in_index (int|Sequence[int]) – Input feature index. Default: 0

  • input_transform (str|None) –

    Transformation type of input features. Options: ‘resize_concat’, ‘multiple_select’, None. Default: None.

    • ’resize_concat’: Multiple feature maps will be resized to the same size as the first one and then concatenated together. Usually used in the FCN head of HRNet.

    • ’multiple_select’: Multiple feature maps will be bundled into a list and passed into the decode head.

    • None: Only one select feature map is allowed.

  • freeze_trans_layer (bool) – Whether to freeze the transition layer (stop grad and set eval mode). Default: True.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • im2col_step (int) – the argument im2col_step in deformable conv, Default: 80.

forward(inputs, frame_weight)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

备注

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
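The note above can be illustrated with a stripped-down, plain-Python imitation of how torch.nn.Module dispatches hooks; this MiniModule class is a toy stand-in, not the real API. Calling the instance runs registered hooks, while calling forward() directly skips them.

```python
# Simplified illustration of PyTorch-style hook dispatch.
class MiniModule:
    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        # Calling the instance runs forward() AND the registered hooks.
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out

calls = []
m = MiniModule()
m.register_forward_hook(lambda mod, inp, out: calls.append(out))
m(3)          # hook runs, so calls receives the output 6
m.forward(3)  # hook silently skipped, calls unchanged
```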

train(mode=True)[源代码]

Convert the model into training mode.

detectors

class mmpose.models.pose_estimators.BottomupPoseEstimator(backbone: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[源代码]

Base class for bottom-up pose estimators.

参数
  • backbone (dict) – The backbone config

  • neck (dict, optional) – The neck config. Defaults to None

  • head (dict, optional) – The head config. Defaults to None

  • train_cfg (dict, optional) – The runtime config for training process. Defaults to None

  • test_cfg (dict, optional) – The runtime config for testing process. Defaults to None

  • data_preprocessor (dict, optional) – The data preprocessing config to build the instance of BaseDataPreprocessor. Defaults to None.

  • init_cfg (dict, optional) – The config to control the initialization. Defaults to None

add_pred_to_datasample(batch_pred_instances: List[mmengine.structures.instance_data.InstanceData], batch_pred_fields: Optional[List[mmengine.structures.pixel_data.PixelData]], batch_data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) List[mmpose.structures.pose_data_sample.PoseDataSample][源代码]

Add predictions into data samples.

参数
  • batch_pred_instances (List[InstanceData]) – The predicted instances of the input data batch

  • batch_pred_fields (List[PixelData], optional) – The predicted fields (e.g. heatmaps) of the input batch

  • batch_data_samples (List[PoseDataSample]) – The input data batch

返回

A list of data samples where the predictions are stored in the pred_instances field of each data sample. The length of the list is the batch size when merge==False, or 1 when merge==True.

返回类型

List[PoseDataSample]

loss(inputs: torch.Tensor, data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) dict[源代码]

Calculate losses from a batch of inputs and data samples.

参数
  • inputs (Tensor) – Inputs with shape (N, C, H, W).

  • data_samples (List[PoseDataSample]) – The batch data samples.

返回

A dictionary of losses.

返回类型

dict

predict(inputs: Union[torch.Tensor, List[torch.Tensor]], data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) List[mmpose.structures.pose_data_sample.PoseDataSample][源代码]

Predict results from a batch of inputs and data samples with post-processing.

参数
  • inputs (Tensor | List[Tensor]) – Input image in tensor or image pyramid as a list of tensors. Each tensor is in shape [B, C, H, W]

  • data_samples (List[PoseDataSample]) – The batch data samples

返回

The pose estimation results of the input images. The return value is PoseDataSample instances with pred_instances and pred_fields (optional) fields, and pred_instances usually contains the following keys:

  • keypoints (Tensor): predicted keypoint coordinates in shape

    (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (Tensor): predicted keypoint scores in shape

    (num_instances, K)

返回类型

list[PoseDataSample]

class mmpose.models.pose_estimators.TopdownPoseEstimator(backbone: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]]] = None, metainfo: Optional[dict] = None)[源代码]

Base class for top-down pose estimators.

参数
  • backbone (dict) – The backbone config

  • neck (dict, optional) – The neck config. Defaults to None

  • head (dict, optional) – The head config. Defaults to None

  • train_cfg (dict, optional) – The runtime config for training process. Defaults to None

  • test_cfg (dict, optional) – The runtime config for testing process. Defaults to None

  • data_preprocessor (dict, optional) – The data preprocessing config to build the instance of BaseDataPreprocessor. Defaults to None

  • init_cfg (dict, optional) – The config to control the initialization. Defaults to None

  • metainfo (dict) – Meta information for the dataset, such as keypoint definitions and properties. If set, the metainfo of the input data batch will be overridden. For more details, please refer to https://mmpose.readthedocs.io/en/latest/user_guides/prepare_datasets.html#create-a-custom-dataset-info-config-file-for-the-dataset. Defaults to None

add_pred_to_datasample(batch_pred_instances: List[mmengine.structures.instance_data.InstanceData], batch_pred_fields: Optional[List[mmengine.structures.pixel_data.PixelData]], batch_data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) List[mmpose.structures.pose_data_sample.PoseDataSample][源代码]

Add predictions into data samples.

参数
  • batch_pred_instances (List[InstanceData]) – The predicted instances of the input data batch

  • batch_pred_fields (List[PixelData], optional) – The predicted fields (e.g. heatmaps) of the input batch

  • batch_data_samples (List[PoseDataSample]) – The input data batch

返回

A list of data samples where the predictions are stored in the pred_instances field of each data sample.

返回类型

List[PoseDataSample]

loss(inputs: torch.Tensor, data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) dict[源代码]

Calculate losses from a batch of inputs and data samples.

参数
  • inputs (Tensor) – Inputs with shape (N, C, H, W).

  • data_samples (List[PoseDataSample]) – The batch data samples.

返回

A dictionary of losses.

返回类型

dict

predict(inputs: torch.Tensor, data_samples: List[mmpose.structures.pose_data_sample.PoseDataSample]) List[mmpose.structures.pose_data_sample.PoseDataSample][源代码]

Predict results from a batch of inputs and data samples with post-processing.

参数
  • inputs (Tensor) – Inputs with shape (N, C, H, W)

  • data_samples (List[PoseDataSample]) – The batch data samples

返回

The pose estimation results of the input images. The return value is PoseDataSample instances with pred_instances and pred_fields (optional) fields, and pred_instances usually contains the following keys:

  • keypoints (Tensor): predicted keypoint coordinates in shape

    (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (Tensor): predicted keypoint scores in shape

    (num_instances, K)

返回类型

list[PoseDataSample]

heads

class mmpose.models.heads.AssociativeEmbeddingHead(in_channels: Union[int, Sequence[int]], num_keypoints: int, tag_dim: int = 1, tag_per_keypoint: bool = True, deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, final_layer: dict = {'kernel_size': 1}, keypoint_loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KeypointMSELoss'}, tag_loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'AssociativeEmbeddingLoss'}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]
forward(feats: Tuple[torch.Tensor]) Tuple[torch.Tensor, torch.Tensor][源代码]

Forward the network. The input is multi scale feature maps and the output is the heatmaps and tags.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

  • heatmaps (Tensor): output heatmaps

  • tags (Tensor): output tags

返回类型

tuple

loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

参数
  • feats (Tuple[Tensor]) – The multi-stage features

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • train_cfg (dict) – The runtime config for training process. Defaults to {}

返回

A dictionary of losses.

返回类型

dict

predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from features.

参数
  • feats (Features) –

    The features which could be in following forms:

    • Tuple[Tensor]: multi-stage features from the backbone

    • List[Tuple[Tensor]]: multiple features for TTA where either

      flip_test or multiscale_test is applied

    • List[List[Tuple[Tensor]]]: multiple features for TTA where

      both flip_test and multiscale_test are applied

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

返回

If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.

The pose prediction is a list of InstanceData, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

The heatmap prediction is a list of PixelData, each contains the following fields:

  • heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)

返回类型

Union[InstanceList | Tuple[InstanceList | PixelDataList]]
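As a rough sketch of the tag-based grouping behind this head's decoding, the toy function below greedily assigns each detected keypoint to the group with the nearest mean tag, and starts a new group when every distance exceeds a threshold (compare decode_tag_thr in the codec). The function and its data are illustrative only, not mmpose API.

```python
import numpy as np

def group_by_tag(tags, thr=1.0):
    """Greedy associative-embedding-style grouping of 1-D tag values."""
    groups = []      # each group is a list of tag values
    assignment = []  # group index per input tag
    for t in tags:
        dists = [abs(t - np.mean(g)) for g in groups]
        if dists and min(dists) <= thr:
            gi = int(np.argmin(dists))
            groups[gi].append(t)   # join the closest existing group
        else:
            groups.append([t])     # too far from all groups: new instance
            gi = len(groups) - 1
        assignment.append(gi)
    return assignment

# Two well-separated tag clusters should yield two instance groups.
labels = group_by_tag([0.1, 0.2, 5.0, 5.1, 0.15], thr=1.0)
```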

class mmpose.models.heads.BaseHead(init_cfg: Optional[Union[dict, List[dict]]] = None)[源代码]

Base head. A subclass should override predict() and loss().

参数

init_cfg (dict, optional) – The extra init config of layers. Defaults to None.

decode(batch_outputs: Union[torch.Tensor, Tuple[torch.Tensor]]) List[mmengine.structures.instance_data.InstanceData][源代码]

Decode keypoints from outputs.

参数

batch_outputs (Tensor | Tuple[Tensor]) – The network outputs of a data batch

返回

A list of InstanceData, each contains the decoded pose information of the instances of one data sample.

返回类型

List[InstanceData]

abstract forward(feats: Tuple[torch.Tensor])[源代码]

Forward the network.

abstract loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

abstract predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from features.

class mmpose.models.heads.CIDHead(in_channels: Union[int, Sequence[int]], gfd_channels: int, num_keypoints: int, prior_prob: float = 0.01, coupled_heatmap_loss: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'FocalHeatmapLoss'}, decoupled_heatmap_loss: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'FocalHeatmapLoss'}, contrastive_loss: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'InfoNCELoss'}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Contextual Instance Decoupling head introduced in `Contextual Instance Decoupling for Robust Multi-Person Pose Estimation (CID)`_ by Wang et al (2022). The head is composed of an Instance Information Abstraction (IIA) module and a Global Feature Decoupling (GFD) module.

参数
  • in_channels (int | Sequence[int]) – Number of channels in the input feature map

  • num_keypoints (int) – Number of keypoints

  • gfd_channels (int) – Number of filters in GFD module

  • max_train_instances (int) – Maximum number of instances in a batch during training. Defaults to 200

  • heatmap_loss (Config) – Config of the heatmap loss. Defaults to use KeypointMSELoss

  • coupled_heatmap_loss (Config) – Config of the loss for coupled heatmaps. Defaults to use FocalHeatmapLoss

  • decoupled_heatmap_loss (Config) – Config of the loss for decoupled heatmaps. Defaults to use FocalHeatmapLoss

  • contrastive_loss (Config) – Config of the contrastive loss for representation vectors of instances. Defaults to use InfoNCELoss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings


forward(feats: Tuple[torch.Tensor]) torch.Tensor[源代码]

Forward the network. The input is multi scale feature maps and the output is the heatmap.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

output heatmap.

返回类型

Tensor

loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

参数
  • feats (Tuple[Tensor]) – The multi-stage features

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • train_cfg (dict) – The runtime config for training process. Defaults to {}

返回

A dictionary of losses.

返回类型

dict

predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from features.

参数
  • feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

返回

If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.

The pose prediction is a list of InstanceData, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

The heatmap prediction is a list of PixelData, each contains the following fields:

  • heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)

返回类型

Union[InstanceList | Tuple[InstanceList | PixelDataList]]

class mmpose.models.heads.CPMHead(in_channels: Union[int, Sequence[int]], out_channels: int, num_stages: int, deconv_out_channels: Optional[Sequence[int]] = None, deconv_kernel_sizes: Optional[Sequence[int]] = None, final_layer: dict = {'kernel_size': 1}, loss: Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Multi-stage heatmap head introduced in Convolutional Pose Machines by Wei et al (2016) and used by Stacked Hourglass Networks by Newell et al (2016). The head consists of multiple branches, each of which has some deconv layers and a simple conv2d layer.

参数
  • in_channels (int | Sequence[int]) – Number of channels in the input feature maps.

  • out_channels (int) – Number of channels in the output heatmaps.

  • num_stages (int) – Number of stages.

  • deconv_out_channels (Sequence[int], optional) – The output channel number of each deconv layer. Defaults to (256, 256, 256)

  • deconv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)

  • final_layer (dict) – Arguments of the final Conv2d layer. Defaults to dict(kernel_size=1)

  • loss (Config | List[Config]) – Config of the keypoint loss of different stages. Defaults to use KeypointMSELoss.

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings
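The multi-stage supervision described above can be sketched in a few lines: each stage's predicted heatmaps are scored against the same target, and the per-stage losses are summed. The mse helper below is a plain stand-in for KeypointMSELoss, and the shapes are arbitrary example values.

```python
import numpy as np

def mse(pred, target):
    """Plain mean squared error over heatmaps; stand-in for KeypointMSELoss."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
gt = rng.random((17, 64, 48))                               # (K, H, W) target
stage_preds = [rng.random((17, 64, 48)) for _ in range(3)]  # 3 stage outputs

# Intermediate supervision: every stage contributes a loss term.
loss_total = sum(mse(p, gt) for p in stage_preds)
```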

forward(feats: Sequence[torch.Tensor]) List[torch.Tensor][源代码]

Forward the network. The input is multi-stage feature maps and the output is a list of heatmaps from multiple stages.

参数

feats (Sequence[Tensor]) – Multi-stage feature maps.

返回

A list of output heatmaps from multiple stages.

返回类型

List[Tensor]

loss(feats: Sequence[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

参数
  • feats (Sequence[Tensor]) – Multi-stage feature maps.

  • batch_data_samples (List[PoseDataSample]) – The Data Samples. It usually includes information such as gt_instances.

  • train_cfg (Config, optional) – The training config.

返回

A dictionary of loss components.

返回类型

dict

predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from multi-stage feature maps.

参数
  • feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

返回

If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.

The pose prediction is a list of InstanceData, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

The heatmap prediction is a list of PixelData, each contains the following fields:

  • heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)

返回类型

Union[InstanceList | Tuple[InstanceList | PixelDataList]]

class mmpose.models.heads.DEKRHead(in_channels: Union[int, Sequence[int]], num_keypoints: int, num_heatmap_filters: int = 32, num_displacement_filters_per_keypoint: int = 15, heatmap_loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, displacement_loss: Union[mmengine.config.config.ConfigDict, dict] = {'supervise_empty': False, 'type': 'SoftWeightSmoothL1Loss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, rescore_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

DisEntangled Keypoint Regression head introduced in Bottom-up human pose estimation via disentangled keypoint regression by Geng et al (2021). The head is composed of a heatmap branch and a displacement branch.

参数
  • in_channels (int | Sequence[int]) – Number of channels in the input feature map

  • num_keypoints (int) – Number of keypoints

  • num_heatmap_filters (int) – Number of filters for the heatmap branch. Defaults to 32

  • num_displacement_filters_per_keypoint (int) – Number of filters for each keypoint in the displacement branch. Defaults to 15

  • heatmap_loss (Config) – Config of the heatmap loss. Defaults to use KeypointMSELoss

  • displacement_loss (Config) – Config of the displacement regression loss. Defaults to use SoftWeightSmoothL1Loss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • rescore_cfg (Config, optional) – The config for rescore net which estimates OKS via predicted keypoints and keypoint scores. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings
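A toy decode in the spirit of the heatmap-plus-displacement design: locate an instance center on the heatmap, then read that pixel's displacement field to place the keypoints. All names, shapes, and values below are illustrative, not the actual mmpose decoder.

```python
import numpy as np

K, H, W = 3, 8, 8
center_heatmap = np.zeros((H, W))
center_heatmap[5, 2] = 1.0              # one instance centered at (x=2, y=5)
displacements = np.ones((K * 2, H, W))  # every keypoint offset by (+1, +1)

# Peak of the center heatmap gives the instance center.
cy, cx = np.unravel_index(np.argmax(center_heatmap), center_heatmap.shape)

# The displacement field at the center pixel holds K (dx, dy) offsets.
offsets = displacements[:, cy, cx].reshape(K, 2)
keypoints = np.array([cx, cy]) + offsets   # (K, 2) coordinates
```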

decode(heatmaps: Tuple[torch.Tensor], displacements: Tuple[torch.Tensor], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}, metainfo: dict = {}) List[mmengine.structures.instance_data.InstanceData][源代码]

Decode keypoints from outputs.

参数
  • heatmaps (Tuple[Tensor]) – The output heatmaps inferred from one image or multi-scale images.

  • displacements (Tuple[Tensor]) – The output displacement fields inferred from one image or multi-scale images.

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

  • metainfo (dict) – The metainfo of test dataset. Defaults to {}

返回

A list of InstanceData, each contains the decoded pose information of the instances of one data sample.

返回类型

List[InstanceData]

forward(feats: Tuple[torch.Tensor]) torch.Tensor[源代码]

Forward the network. The input is multi scale feature maps and the output is a tuple of heatmap and displacement.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

output heatmap and displacement.

返回类型

Tuple[Tensor]

loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

参数
  • feats (Tuple[Tensor]) – The multi-stage features

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • train_cfg (dict) – The runtime config for training process. Defaults to {}

返回

A dictionary of losses.

返回类型

dict

predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from features.

参数
  • feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-scale features in TTA)

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

返回

If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.

The pose prediction is a list of InstanceData, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

The heatmap prediction is a list of PixelData, each contains the following fields:

  • heatmaps (Tensor): The predicted heatmaps in shape (1, h, w)

    or (K+1, h, w) if keypoint heatmaps are predicted

  • displacements (Tensor): The predicted displacement fields

    in shape (K*2, h, w)

返回类型

Union[InstanceList | Tuple[InstanceList | PixelDataList]]

class mmpose.models.heads.DSNTHead(in_channels: Union[int, Sequence[int]], in_featuremap_size: Tuple[int, int], num_joints: int, lambda_t: int = - 1, debias: bool = False, beta: float = 1.0, deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, final_layer: dict = {'kernel_size': 1}, loss: Union[mmengine.config.config.ConfigDict, dict] = {'losses': [{'type': 'SmoothL1Loss', 'use_target_weight': True}, {'type': 'JSDiscretLoss', 'use_target_weight': True}], 'type': 'MultipleLossWrapper'}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Top-down integral regression head introduced in DSNT by Nibali et al (2018). The head contains a differentiable spatial to numerical transform (DSNT) layer that performs a soft-argmax operation on the predicted heatmaps to regress the coordinates.

This head is used for algorithms that require supervision of heatmaps in DSNT approach.
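The soft-argmax at the core of this head can be sketched with NumPy: softmax the heatmap into a probability map, then take the expected (x, y) coordinate under it. This is a minimal illustration under simplified assumptions, not the mmpose implementation.

```python
import numpy as np

def soft_argmax_2d(heatmap, beta=1.0):
    """Differentiable soft-argmax: softmax the heatmap into a probability
    map, then return the expected (x, y) coordinate under it."""
    H, W = heatmap.shape
    p = np.exp(beta * heatmap - np.max(beta * heatmap))
    p /= p.sum()
    xs, ys = np.arange(W), np.arange(H)
    x = float((p.sum(axis=0) * xs).sum())  # expectation over the x-marginal
    y = float((p.sum(axis=1) * ys).sum())  # expectation over the y-marginal
    return x, y

# A sharp peak at (x=4, y=2) is recovered (approximately) by soft-argmax.
hm = np.zeros((8, 8))
hm[2, 4] = 10.0
x, y = soft_argmax_2d(hm, beta=1.0)
```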

参数
  • in_channels (int | sequence[int]) – Number of input channels

  • in_featuremap_size (int | sequence[int]) – Size of input feature map

  • num_joints (int) – Number of joints

  • lambda_t (int) – Discard heatmap-based loss when current epoch > lambda_t. Defaults to -1.

  • debias (bool) – Whether to remove the bias of Integral Pose Regression. see `Removing the Bias of Integral Pose Regression`_ by Gu et al (2021). Defaults to False.

  • beta (float) – A smoothing parameter in softmax. Defaults to 1.0.

  • deconv_out_channels (sequence[int]) – The output channel number of each deconv layer. Defaults to (256, 256, 256)

  • deconv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)

  • conv_out_channels (sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None

  • conv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None

  • final_layer (dict) – Arguments of the final Conv2d layer. Defaults to dict(kernel_size=1)

  • loss (Config) – Config for keypoint loss. Defaults to use DSNTLoss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

loss(inputs: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

class mmpose.models.heads.HeatmapHead(in_channels: Union[int, Sequence[int]], out_channels: int, deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, final_layer: dict = {'kernel_size': 1}, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Top-down heatmap head introduced in Simple Baselines by Xiao et al (2018). The head is composed of a few deconvolutional layers followed by a convolutional layer to generate heatmaps from low-resolution feature maps.

参数
  • in_channels (int | Sequence[int]) – Number of channels in the input feature map

  • out_channels (int) – Number of channels in the output heatmap

  • deconv_out_channels (Sequence[int], optional) – The output channel number of each deconv layer. Defaults to (256, 256, 256)

  • deconv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)

  • conv_out_channels (Sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None

  • conv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None

  • final_layer (dict) – Arguments of the final Conv2d layer. Defaults to dict(kernel_size=1)

  • loss (Config) – Config of the keypoint loss. Defaults to use KeypointMSELoss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

  • extra (dict, optional) – Extra configurations. Defaults to None
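A naive way to read keypoints off predicted heatmaps is a per-channel argmax; the sketch below shows the idea. Actual decoding in mmpose is delegated to the configured decoder codec, which also rescales coordinates to the input space, so this is only an illustration.

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Naive decoding: per-keypoint argmax location and its response value."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)          # flat index of each channel's peak
    scores = flat.max(axis=1)          # peak response as a crude score
    xs, ys = idx % W, idx // W         # recover (x, y) from the flat index
    return np.stack([xs, ys], axis=1), scores   # (K, 2), (K,)

hm = np.zeros((2, 4, 6))
hm[0, 1, 3] = 0.9     # keypoint 0 peaks at (x=3, y=1)
hm[1, 2, 5] = 0.7     # keypoint 1 peaks at (x=5, y=2)
kpts, scores = decode_heatmaps(hm)
```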

forward(feats: Tuple[torch.Tensor]) torch.Tensor[源代码]

Forward the network. The input is multi scale feature maps and the output is the heatmap.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

output heatmap.

返回类型

Tensor

loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

参数
  • feats (Tuple[Tensor]) – The multi-stage features

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • train_cfg (dict) – The runtime config for training process. Defaults to {}

返回

A dictionary of losses.

返回类型

dict

predict(feats: Union[Tuple[torch.Tensor], List[Tuple[torch.Tensor]], List[List[Tuple[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from features.

参数
  • feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

返回

If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.

The pose prediction is a list of InstanceData, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

The heatmap prediction is a list of PixelData, each contains the following fields:

  • heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)

返回类型

Union[InstanceList | Tuple[InstanceList | PixelDataList]]

class mmpose.models.heads.IntegralRegressionHead(in_channels: Union[int, Sequence[int]], in_featuremap_size: Tuple[int, int], num_joints: int, debias: bool = False, beta: float = 1.0, deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, final_layer: dict = {'kernel_size': 1}, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'SmoothL1Loss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Top-down integral regression head introduced in IPR by Xiao et al (2018). The head contains a differentiable spatial to numerical transform (DSNT) layer that performs a soft-argmax operation on the predicted heatmaps to regress the coordinates.

This head is used for algorithms that only supervise the coordinates.
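As a plain-Python sketch (not the mmpose implementation, which operates on torch Tensors), the DSNT soft-argmax computes the expected coordinate under a softmax of the heatmap values, with `beta` acting as the smoothing temperature described below:

```python
import math

def soft_argmax_1d(heatmap, beta=1.0):
    """Differentiable soft-argmax: the expected index under a softmax
    over the heatmap values, sharpened by the temperature ``beta``."""
    exps = [math.exp(beta * v) for v in heatmap]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Expected coordinate = sum_i i * p_i
    return sum(i * p for i, p in enumerate(probs))
```

A larger `beta` makes the softmax sharper, so the expectation approaches the hard argmax while remaining differentiable.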

参数
  • in_channels (int | sequence[int]) – Number of input channels

  • in_featuremap_size (int | sequence[int]) – Size of input feature map

  • num_joints (int) – Number of joints

  • debias (bool) – Whether to remove the bias of Integral Pose Regression. see `Removing the Bias of Integral Pose Regression`_ by Gu et al (2021). Defaults to False.

  • beta (float) – A smoothing parameter in softmax. Defaults to 1.0.

  • deconv_out_channels (sequence[int]) – The output channel number of each deconv layer. Defaults to (256, 256, 256)

  • deconv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)

  • conv_out_channels (sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None

  • conv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None

  • final_layer (dict) – Arguments of the final Conv2d layer. Defaults to dict(kernel_size=1)

  • loss (Config) – Config for keypoint loss. Defaults to use SmoothL1Loss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

forward(feats: Tuple[torch.Tensor]) Union[torch.Tensor, Tuple[torch.Tensor]][源代码]

Forward the network. The input is multi scale feature maps and the output is the coordinates.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

Output coordinates (and optionally sigmas).

返回类型

Tensor

loss(inputs: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from features.

参数
  • feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

返回

If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.

The pose prediction is a list of InstanceData, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

The heatmap prediction is a list of PixelData, each contains the following fields:

  • heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)

返回类型

Union[InstanceList | Tuple[InstanceList | PixelDataList]]

class mmpose.models.heads.MSPNHead(num_stages: int = 4, num_units: int = 4, out_shape: tuple = (64, 48), unit_channels: int = 256, out_channels: int = 17, use_prm: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, level_indices: Sequence[int] = [], loss: Union[mmengine.config.config.ConfigDict, dict, List[Union[mmengine.config.config.ConfigDict, dict]]] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Multi-stage multi-unit heatmap head introduced in `Multi-Stage Pose estimation Network (MSPN)`_ by Li et al (2019), and used by `Residual Steps Networks (RSN)`_ by Cai et al (2020). The head consists of multiple stages and each stage consists of multiple units. Each unit of each stage has some conv layers.

参数
  • num_stages (int) – Number of stages.

  • num_units (int) – Number of units in each stage.

  • out_shape (tuple) – The output shape of the output heatmaps.

  • unit_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • use_prm (bool) – Whether to use pose refine machine (PRM). Defaults to False.

  • norm_cfg (Config) – Config to construct the norm layer. Defaults to dict(type='BN')

  • loss (Config | List[Config]) – Config of the keypoint loss for different stages and different units. Defaults to use KeypointMSELoss.

  • level_indices (Sequence[int]) – The indices that specified the level of target heatmaps.

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

property default_init_cfg

Default config for weight initialization.

forward(feats: Sequence[Sequence[torch.Tensor]]) List[torch.Tensor][源代码]

Forward the network. The input is multi-stage multi-unit feature maps and the output is a list of heatmaps from multiple stages.

参数

feats (Sequence[Sequence[Tensor]]) – Feature maps from multiple stages and units.

返回

A list of output heatmaps from multiple stages and units.

返回类型

List[Tensor]

loss(feats: Sequence[Sequence[torch.Tensor]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

备注

  • batch_size: B

  • num_output_heatmap_levels: L

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

  • num_instances: N (usually 1 in topdown heatmap heads)

参数
  • feats (Sequence[Sequence[Tensor]]) – Feature maps from multiple stages and units

  • batch_data_samples (List[PoseDataSample]) – The Data Samples. It usually includes information such as gt_instance_labels and gt_fields.

  • train_cfg (Config, optional) – The training config

返回

A dictionary of loss components.

返回类型

dict

predict(feats: Union[Sequence[Sequence[torch.Tensor]], List[Sequence[Sequence[torch.Tensor]]]], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from multi-stage feature maps.

参数
  • feats (Sequence[Sequence[Tensor]]) – Multi-stage multi-unit features (or multiple MSMU features for TTA)

  • batch_data_samples (List[PoseDataSample]) – The Data Samples. It usually includes information such as gt_instance_labels.

  • test_cfg (Config, optional) – The testing/inference config

返回

If test_cfg['output_heatmap']==True, return both pose and heatmap prediction; otherwise only return the pose prediction.

The pose prediction is a list of InstanceData, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

The heatmap prediction is a list of PixelData, each contains the following fields:

  • heatmaps (Tensor): The predicted heatmaps in shape (K, h, w)

返回类型

Union[InstanceList | Tuple[InstanceList | PixelDataList]]

class mmpose.models.heads.RLEHead(in_channels: Union[int, Sequence[int]], num_joints: int, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RLELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Top-down regression head introduced in RLE by Li et al (2021). The head is composed of fully-connected layers to predict the coordinates and sigma (the variance of the coordinates) together.

参数
  • in_channels (int | sequence[int]) – Number of input channels

  • num_joints (int) – Number of joints

  • loss (Config) – Config for keypoint loss. Defaults to use RLELoss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

forward(feats: Tuple[torch.Tensor]) torch.Tensor[源代码]

Forward the network. The input is multi scale feature maps and the output is the coordinates.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

Output coordinates (and optionally sigmas).

返回类型

Tensor

loss(inputs: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from outputs.

class mmpose.models.heads.RTMCCHead(in_channels: Union[int, Sequence[int]], out_channels: int, input_size: Tuple[int, int], in_featuremap_size: Tuple[int, int], simcc_split_ratio: float = 2.0, final_layer_kernel_size: int = 1, gau_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'act_fn': 'ReLU', 'drop_path': 0.0, 'dropout_rate': 0.0, 'expansion_factor': 2, 'hidden_dims': 256, 'pos_enc': False, 's': 128, 'use_rel_bias': False}, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KLDiscretLoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Top-down head introduced in RTMPose (2023). The head is composed of a large-kernel convolutional layer, a fully-connected layer and a Gated Attention Unit to generate a 1d representation from low-resolution feature maps.

参数
  • in_channels (int | sequence[int]) – Number of channels in the input feature map.

  • out_channels (int) – Number of channels in the output heatmap.

  • input_size (tuple) – Size of input image in shape [w, h].

  • in_featuremap_size (int | sequence[int]) – Size of input feature map.

  • simcc_split_ratio (float) – Split ratio of pixels. Default: 2.0.

  • final_layer_kernel_size (int) – Kernel size of the convolutional layer. Default: 1.

  • gau_cfg (Config) –

    Config dict for the Gated Attention Unit. Default: dict(

    hidden_dims=256, s=128, expansion_factor=2, dropout_rate=0., drop_path=0., act_fn=’ReLU’, use_rel_bias=False, pos_enc=False).

  • loss (Config) – Config of the keypoint loss. Defaults to use KLDiscretLoss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

forward(feats: Tuple[torch.Tensor]) Tuple[torch.Tensor, torch.Tensor][源代码]

Forward the network.

The input is multi scale feature maps and the output is the heatmap.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

pred_x (Tensor): 1d representation of x. pred_y (Tensor): 1d representation of y.

返回类型

Tuple[Tensor, Tensor]

loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) List[mmengine.structures.instance_data.InstanceData][源代码]

Predict results from features.

参数
  • feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

返回

The pose predictions, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

  • keypoint_x_labels (np.ndarray, optional): The predicted 1-D

    intensity distribution in the x direction

  • keypoint_y_labels (np.ndarray, optional): The predicted 1-D

    intensity distribution in the y direction

返回类型

List[InstanceData]

class mmpose.models.heads.RegressionHead(in_channels: Union[int, Sequence[int]], num_joints: int, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'SmoothL1Loss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Top-down regression head introduced in Deeppose by Toshev et al (2014). The head is composed of fully-connected layers to predict the coordinates directly.

参数
  • in_channels (int | sequence[int]) – Number of input channels

  • num_joints (int) – Number of joints

  • loss (Config) – Config for keypoint loss. Defaults to use SmoothL1Loss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

forward(feats: Tuple[torch.Tensor]) torch.Tensor[源代码]

Forward the network. The input is multi scale feature maps and the output is the coordinates.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

Output coordinates (and optionally sigmas).

返回类型

Tensor

loss(inputs: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Union[mmengine.config.config.ConfigDict, dict] = {}) Union[List[mmengine.structures.instance_data.InstanceData], Tuple[List[mmengine.structures.instance_data.InstanceData], List[mmengine.structures.pixel_data.PixelData]]][源代码]

Predict results from outputs.

class mmpose.models.heads.SimCCHead(in_channels: Union[int, Sequence[int]], out_channels: int, input_size: Tuple[int, int], in_featuremap_size: Tuple[int, int], simcc_split_ratio: float = 2.0, deconv_type: str = 'heatmap', deconv_out_channels: Optional[Sequence[int]] = (256, 256, 256), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), deconv_num_groups: Optional[Sequence[int]] = (16, 16, 16), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, final_layer: dict = {'kernel_size': 1}, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KLDiscretLoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

Top-down heatmap head introduced in SimCC by Li et al (2022). The head is composed of a few deconvolutional layers followed by a fully-connected layer to generate a 1d representation from low-resolution feature maps.

参数
  • in_channels (int | sequence[int]) – Number of channels in the input feature map

  • out_channels (int) – Number of channels in the output heatmap

  • input_size (tuple) – Input image size in shape [w, h]

  • in_featuremap_size (int | sequence[int]) – Size of input feature map

  • simcc_split_ratio (float) – Split ratio of pixels

  • deconv_type (str, optional) –

    The type of deconv head which should be one of the following options:

    • 'heatmap': make deconv layers in HeatmapHead

    • 'vipnas': make deconv layers in ViPNASHead

    Defaults to 'heatmap'

  • deconv_out_channels (sequence[int]) – The output channel number of each deconv layer. Defaults to (256, 256, 256)

  • deconv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)

  • deconv_num_groups (Sequence[int], optional) – The group number of each deconv layer. Defaults to (16, 16, 16)

  • conv_out_channels (sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None

  • conv_kernel_sizes (sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None

  • final_layer (dict) – Arguments of the final Conv2d layer. Defaults to dict(kernel_size=1)

  • loss (Config) – Config of the keypoint loss. Defaults to use KLDiscretLoss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

forward(feats: Tuple[torch.Tensor]) Tuple[torch.Tensor, torch.Tensor][源代码]

Forward the network. The input is multi scale feature maps and the output is the heatmap.

参数

feats (Tuple[Tensor]) – Multi scale feature maps.

返回

pred_x (Tensor): 1d representation of x. pred_y (Tensor): 1d representation of y.

返回类型

Tuple[Tensor, Tensor]

loss(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) dict[源代码]

Calculate losses from a batch of inputs and data samples.

predict(feats: Tuple[torch.Tensor], batch_data_samples: Optional[List[mmpose.structures.pose_data_sample.PoseDataSample]], test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}) List[mmengine.structures.instance_data.InstanceData][源代码]

Predict results from features.

参数
  • feats (Tuple[Tensor] | List[Tuple[Tensor]]) – The multi-stage features (or multiple multi-stage features in TTA)

  • batch_data_samples (List[PoseDataSample]) – The batch data samples

  • test_cfg (dict) – The runtime config for testing process. Defaults to {}

返回

The pose predictions, each contains the following fields:

  • keypoints (np.ndarray): predicted keypoint coordinates in

    shape (num_instances, K, D) where K is the keypoint number and D is the keypoint dimension

  • keypoint_scores (np.ndarray): predicted keypoint scores in

    shape (num_instances, K)

  • keypoint_x_labels (np.ndarray, optional): The predicted 1-D

    intensity distribution in the x direction

  • keypoint_y_labels (np.ndarray, optional): The predicted 1-D

    intensity distribution in the y direction

返回类型

List[InstanceData]
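The decoding step can be sketched in plain Python (a hypothetical `simcc_decode` helper, not the actual mmpose decoder): each axis's 1-D intensity vector is reduced by argmax, and the bin index is mapped back to image pixels via `simcc_split_ratio`:

```python
def simcc_decode(x_labels, y_labels, simcc_split_ratio=2.0):
    """Decode one keypoint from 1-D SimCC intensity vectors: take the
    argmax bin on each axis and divide by the split ratio to return to
    image-pixel coordinates."""
    x_bin = max(range(len(x_labels)), key=x_labels.__getitem__)
    y_bin = max(range(len(y_labels)), key=y_labels.__getitem__)
    # A common confidence heuristic: the weaker of the two peak values
    score = min(max(x_labels), max(y_labels))
    return (x_bin / simcc_split_ratio, y_bin / simcc_split_ratio), score
```

With `simcc_split_ratio=2.0`, the 1-D representation has twice as many bins as image pixels, giving sub-pixel localization.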

class mmpose.models.heads.ViPNASHead(in_channels: Union[int, Sequence[int]], out_channels: int, deconv_out_channels: Optional[Sequence[int]] = (144, 144, 144), deconv_kernel_sizes: Optional[Sequence[int]] = (4, 4, 4), deconv_num_groups: Optional[Sequence[int]] = (16, 16, 16), conv_out_channels: Optional[Sequence[int]] = None, conv_kernel_sizes: Optional[Sequence[int]] = None, final_layer: dict = {'kernel_size': 1}, loss: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'KeypointMSELoss', 'use_target_weight': True}, decoder: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[源代码]

ViPNAS heatmap head introduced in ViPNAS by Xu et al (2021). The head is composed of a few deconvolutional layers followed by a convolutional layer to generate heatmaps from low-resolution feature maps. Specifically, different from the HeatmapHead introduced by Simple Baselines, the group numbers in the deconvolutional layers are elastic and thus can be optimized by neural architecture search (NAS).

参数
  • in_channels (int | Sequence[int]) – Number of channels in the input feature map

  • out_channels (int) – Number of channels in the output heatmap

  • deconv_out_channels (Sequence[int], optional) – The output channel number of each deconv layer. Defaults to (144, 144, 144)

  • deconv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each deconv layer. Each element should be either an integer for both height and width dimensions, or a tuple of two integers for the height and the width dimension respectively. Defaults to (4, 4, 4)

  • deconv_num_groups (Sequence[int], optional) – The group number of each deconv layer. Defaults to (16, 16, 16)

  • conv_out_channels (Sequence[int], optional) – The output channel number of each intermediate conv layer. None means no intermediate conv layer between deconv layers and the final conv layer. Defaults to None

  • conv_kernel_sizes (Sequence[int | tuple], optional) – The kernel size of each intermediate conv layer. Defaults to None

  • final_layer (dict) – Arguments of the final Conv2d layer. Defaults to dict(kernel_size=1)

  • loss (Config) – Config of the keypoint loss. Defaults to use KeypointMSELoss

  • decoder (Config, optional) – The decoder config that controls decoding keypoint coordinates from the network output. Defaults to None

  • init_cfg (Config, optional) – Config to control the initialization. See default_init_cfg for default settings

mmpose.models.losses

class mmpose.models.losses.AdaptiveWingLoss(alpha=2.1, omega=14, epsilon=1, theta=0.5, use_target_weight=False, loss_weight=1.0)[源代码]

Adaptive wing loss. Paper ref: ‘Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression’ by Wang et al (ICCV 2019).

参数
  • alpha (float), omega (float), epsilon (float), theta (float) – Hyper-parameters of the adaptive wing loss.

  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

criterion(pred, target)[源代码]

Criterion of the adaptive wing loss.

备注

  • batch_size: N

  • num_keypoints: K

参数
  • pred (torch.Tensor[NxKxHxW]) – Predicted heatmaps.

  • target (torch.Tensor[NxKxHxW]) – Target heatmaps.

forward(output: torch.Tensor, target: torch.Tensor, target_weights: Optional[torch.Tensor] = None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

参数
  • output (torch.Tensor[N, K, H, W]) – Output heatmaps.

  • target (torch.Tensor[N, K, H, W]) – Target heatmaps.

  • target_weight (torch.Tensor[N, K]) – Weights across different joint types.
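A plain-Python, element-wise sketch of the adaptive wing loss (the mmpose implementation operates on torch Tensors; default hyper-parameters follow the signature above):

```python
import math

def adaptive_wing(pred, target, alpha=2.1, omega=14, epsilon=1, theta=0.5):
    """Element-wise adaptive wing loss: a log-shaped region for small
    errors and a linear region for large ones, with the exponent
    ``alpha - target`` adapting the curvature to the ground-truth
    heatmap value (targets lie in [0, 1])."""
    delta = abs(target - pred)
    if delta < theta:
        return omega * math.log(1 + (delta / epsilon) ** (alpha - target))
    # Linear branch; A and C are chosen so the two branches join
    # continuously (and smoothly) at delta == theta.
    a = (omega * (1 / (1 + (theta / epsilon) ** (alpha - target)))
         * (alpha - target) * ((theta / epsilon) ** (alpha - target - 1))
         / epsilon)
    c = theta * a - omega * math.log(1 + (theta / epsilon) ** (alpha - target))
    return a * delta - c
```

Zero error gives zero loss, and the piecewise definition is continuous at `delta == theta`.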

class mmpose.models.losses.AssociativeEmbeddingLoss(loss_weight: float = 1.0, push_loss_factor: float = 0.5)[源代码]

Associative Embedding loss.

Details can be found in Associative Embedding

备注

  • batch size: B

  • instance number: N

  • keypoint number: K

  • keypoint dimension: D

  • embedding tag dimension: L

  • heatmap size: [W, H]

参数
  • loss_weight (float) – Weight of the loss. Defaults to 1.0

  • push_loss_factor (float) – A factor that controls the weight between the push loss and the pull loss. Defaults to 0.5

forward(tags: torch.Tensor, keypoint_indices: Union[List[torch.Tensor], torch.Tensor])[源代码]

Compute associative embedding loss on a batch of data.

参数
  • tags (Tensor) – Tagging heatmaps in shape (B, L*K, H, W)

  • keypoint_indices (Tensor|List[Tensor]) – Ground-truth keypoint position indices represented by a Tensor in shape (B, N, K, 2), or a list of B Tensors in shape (N_i, K, 2). Each keypoint’s index is represented as [i, v], where i is the position index in the heatmap (\(i=y*w+x\)) and v is the visibility

返回

  • pull_loss (Tensor)

  • push_loss (Tensor)

返回类型

tuple
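The pull/push mechanism can be sketched in plain Python for 1-D tags (an illustrative simplification, not the batched Tensor implementation): tags of one instance are pulled toward their mean, and the mean tags of different instances are pushed apart with a Gaussian-style penalty:

```python
import math

def ae_loss(tags_per_instance, push_loss_factor=0.5):
    """Associative embedding losses for one image: ``tags_per_instance``
    is a list of instances, each a list of scalar keypoint tags."""
    # Reference tag of each instance: the mean of its keypoint tags
    refs = [sum(ts) / len(ts) for ts in tags_per_instance]
    # Pull loss: squared distance of each tag to its instance reference
    pull = sum((t - r) ** 2
               for ts, r in zip(tags_per_instance, refs) for t in ts)
    pull /= sum(len(ts) for ts in tags_per_instance)
    # Push loss: penalize pairs of instances whose reference tags are close
    push, n = 0.0, len(refs)
    for i in range(n):
        for j in range(i + 1, n):
            push += math.exp(-(refs[i] - refs[j]) ** 2)
    if n > 1:
        push /= n * (n - 1) / 2
    return pull, push * push_loss_factor
```

Well-separated instance tags yield a near-zero push loss; identical tags within an instance yield a zero pull loss.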

class mmpose.models.losses.BCELoss(use_target_weight=False, loss_weight=1.0)[源代码]

Binary Cross Entropy loss.

参数
  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_labels: K

参数
  • output (torch.Tensor[N, K]) – Output classification.

  • target (torch.Tensor[N, K]) – Target classification.

  • target_weight (torch.Tensor[N, K] or torch.Tensor[N]) – Weights across different labels.

class mmpose.models.losses.BoneLoss(joint_parents, use_target_weight=False, loss_weight=1.0)[源代码]

Bone length loss.

参数
  • joint_parents (list) – Indices of each joint’s parent joint.

  • use_target_weight (bool) – Option to use weighted bone loss. Different bone types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K-1]) – Weights across different bone types.
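A minimal plain-Python sketch of the bone loss for a single sample (hypothetical helpers; the mmpose implementation is batched and supports target weights):

```python
import math

def bone_lengths(keypoints, joint_parents):
    """Length of each bone (joint -> parent); a root joint that is its
    own parent contributes no bone."""
    lengths = []
    for j, p in enumerate(joint_parents):
        if j == p:
            continue  # root joint
        lengths.append(math.dist(keypoints[j], keypoints[p]))
    return lengths

def bone_loss(pred, target, joint_parents):
    """Mean absolute difference between predicted and target bone lengths."""
    pl = bone_lengths(pred, joint_parents)
    tl = bone_lengths(target, joint_parents)
    return sum(abs(a - b) for a, b in zip(pl, tl)) / len(pl)
```

Because only lengths are compared, the loss is invariant to global translation of the pose.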

class mmpose.models.losses.CombinedLoss(losses: Dict[str, Union[mmengine.config.config.ConfigDict, dict]])[源代码]

A wrapper to combine multiple loss functions. These loss functions can have different input types (e.g. heatmaps or regression values), and can only be invoked individually and explicitly.

参数

losses (Dict[str, ConfigType]) – The names and configs of loss functions to be wrapped

Example::
>>> heatmap_loss_cfg = dict(type='KeypointMSELoss')
>>> ae_loss_cfg = dict(type='AssociativeEmbeddingLoss')
>>> loss_module = CombinedLoss(
...     losses=dict(
...         heatmap_loss=heatmap_loss_cfg,
...         ae_loss=ae_loss_cfg))
>>> loss_hm = loss_module.heatmap_loss(pred_heatmap, gt_heatmap)
>>> loss_ae = loss_module.ae_loss(pred_tags, keypoint_indices)
class mmpose.models.losses.JSDiscretLoss(use_target_weight=True, size_average: bool = True)[源代码]

Discrete JS Divergence loss for DSNT with Gaussian Heatmap.

Modified from the official implementation.

参数
  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • size_average (bool) – Option to average the loss by the batch_size.

forward(pred_hm, gt_hm, target_weight=None)[源代码]

Forward function.

参数
  • pred_hm (torch.Tensor[N, K, H, W]) – Predicted heatmaps.

  • gt_hm (torch.Tensor[N, K, H, W]) – Target heatmaps.

  • target_weight (torch.Tensor[N, K] or torch.Tensor[N]) – Weights across different labels.

返回

Loss value.

返回类型

torch.Tensor

js(pred_hm, gt_hm)[源代码]

Jensen-Shannon Divergence.

kl(p, q)[源代码]

Kullback-Leibler Divergence.
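The two divergences can be sketched in plain Python over 1-D discrete distributions (an illustrative simplification; the mmpose implementation works on flattened heatmap Tensors):

```python
import math

def kl(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence KL(p || q)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def js(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike KL, JS is symmetric and bounded above by ln 2, which makes it a better-behaved training signal for heatmap matching.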

class mmpose.models.losses.KLDiscretLoss(beta=1.0, label_softmax=False, use_target_weight=True)[源代码]

Discrete KL Divergence loss for SimCC with Gaussian Label Smoothing. Modified from the official implementation <https://github.com/leeyegy/SimCC>.

参数
  • beta (float) – Temperature factor of Softmax.

  • label_softmax (bool) – Whether to use Softmax on labels.

  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

criterion(dec_outs, labels)[源代码]

Criterion function.

forward(pred_simcc, gt_simcc, target_weight)[源代码]

Forward function.

参数
  • pred_simcc (Tuple[Tensor, Tensor]) – Predicted SimCC vectors of x-axis and y-axis.

  • gt_simcc (Tuple[Tensor, Tensor]) – Target representations.

  • target_weight (torch.Tensor[N, K] or torch.Tensor[N]) – Weights across different labels.
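For a single 1-D axis, the loss can be sketched in plain Python (illustrative only; `beta` is the temperature from the constructor, and the ground truth here is assumed to already be a probability vector):

```python
import math

def softmax(xs, beta=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(beta * x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl_discret(pred_logits, gt_probs, beta=1.0):
    """KL(gt || softmax(beta * pred)) for one 1-D SimCC axis."""
    log_p = [math.log(p) for p in softmax(pred_logits, beta)]
    return sum(g * (math.log(g + 1e-12) - lp)
               for g, lp in zip(gt_probs, log_p) if g > 0)
```

The loss is zero when the softmaxed prediction matches the smoothed label distribution exactly, and positive otherwise.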

class mmpose.models.losses.KeypointMSELoss(use_target_weight: bool = False, skip_empty_channel: bool = False, loss_weight: float = 1.0)[源代码]

MSE loss for heatmaps.

参数
  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights. Defaults to False

  • skip_empty_channel (bool) – If True, heatmap channels with no non-zero value (which means no visible ground-truth keypoint in the image) will not be used to calculate the loss. Defaults to False

  • loss_weight (float) – Weight of the loss. Defaults to 1.0

forward(output: torch.Tensor, target: torch.Tensor, target_weights: Optional[torch.Tensor] = None, mask: Optional[torch.Tensor] = None) torch.Tensor[源代码]

Forward function of loss.

备注

  • batch_size: B

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

参数
  • output (Tensor) – The output heatmaps with shape [B, K, H, W]

  • target (Tensor) – The target heatmaps with shape [B, K, H, W]

  • target_weights (Tensor, optional) – The target weights of different keypoints, with shape [B, K] (keypoint-wise) or [B, K, H, W] (pixel-wise).

  • mask (Tensor, optional) – The masks of valid heatmap pixels in shape [B, K, H, W] or [B, 1, H, W]. If None, no mask will be applied. Defaults to None

返回

The calculated loss.

返回类型

Tensor
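A plain-Python sketch of the keypoint-wise weighting for a single sample, with each heatmap channel flattened to a list (illustrative; the real loss is a Tensor MSE that also supports pixel-wise weights and masks):

```python
def keypoint_mse(output, target, target_weights=None):
    """Mean squared error over heatmap pixels; each keypoint channel is
    scaled by its target weight when ``target_weights`` is given."""
    total, count = 0.0, 0
    for k, (out_ch, tgt_ch) in enumerate(zip(output, target)):
        w = 1.0 if target_weights is None else target_weights[k]
        for o, t in zip(out_ch, tgt_ch):
            total += w * (o - t) ** 2
            count += 1
    return total / count
```

Setting a keypoint's weight to zero removes invisible joints from the loss without changing the pixel count.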

class mmpose.models.losses.KeypointOHKMMSELoss(use_target_weight: bool = False, topk: int = 8, loss_weight: float = 1.0)[源代码]

MSE loss with online hard keypoint mining.

参数
  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights. Defaults to False

  • topk (int) – Only top k joint losses are kept. Defaults to 8

  • loss_weight (float) – Weight of the loss. Defaults to 1.0

forward(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor) torch.Tensor[源代码]

Forward function of loss.

备注

  • batch_size: B

  • num_keypoints: K

  • heatmaps height: H

  • heatmaps width: W

参数
  • output (Tensor) – The output heatmaps with shape [B, K, H, W].

  • target (Tensor) – The target heatmaps with shape [B, K, H, W].

  • target_weights (Tensor) – The target weights of different keypoints, with shape [B, K].

返回

The calculated loss.

返回类型

Tensor
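Online hard keypoint mining ranks per-keypoint losses and averages only the top-k hardest ones. A minimal sketch for a single sample (plain Python in place of torch Tensors):

```python
def ohkm_mse(output, target, topk=2):
    """MSE with online hard keypoint mining for one sample.

    output/target: nested lists shaped [K][H][W]. Compute a per-keypoint
    MSE, then keep only the top-k largest (hardest) keypoint losses.
    """
    per_kpt = []
    for ho, ht in zip(output, target):  # iterate over keypoints
        vals = [(o - t) ** 2
                for row_o, row_t in zip(ho, ht)
                for o, t in zip(row_o, row_t)]
        per_kpt.append(sum(vals) / len(vals))
    hardest = sorted(per_kpt, reverse=True)[:topk]
    return sum(hardest) / len(hardest)
```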

class mmpose.models.losses.L1Loss(use_target_weight=False, loss_weight=1.0)[源代码]

L1 loss for coordinate regression.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

参数
  • output (torch.Tensor[N, K, 2]) – Output regression.

  • target (torch.Tensor[N, K, 2]) – Target regression.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

class mmpose.models.losses.MPJPELoss(use_target_weight=False, loss_weight=1.0)[源代码]

MPJPE (Mean Per Joint Position Error) loss.

参数
  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
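MPJPE is the mean Euclidean distance between predicted and ground-truth joints. A small sketch of the unweighted case:

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: the Euclidean distance between
    each predicted and ground-truth joint, averaged over all joints.
    pred/gt: nested lists shaped [N][K][D]."""
    dists = [
        math.dist(p, g)
        for pred_i, gt_i in zip(pred, gt)
        for p, g in zip(pred_i, gt_i)
    ]
    return sum(dists) / len(dists)
```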

class mmpose.models.losses.MSELoss(use_target_weight=False, loss_weight=1.0)[源代码]

MSE loss for coordinate regression.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

参数
  • output (torch.Tensor[N, K, 2]) – Output regression.

  • target (torch.Tensor[N, K, 2]) – Target regression.

  • target_weight (torch.Tensor[N, K, 2]) – Weights across different joint types.

class mmpose.models.losses.MultipleLossWrapper(losses: list)[源代码]

A wrapper to collect multiple loss functions together and return a list of losses in the same order.

参数

losses (list) – List of Loss Config

forward(input_list, target_list, keypoint_weights=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • input_list (List[Tensor]) – List of inputs.

  • target_list (List[Tensor]) – List of targets.

  • keypoint_weights (Tensor[N, K, D]) – Weights across different joint types.

class mmpose.models.losses.RLELoss(use_target_weight=False, size_average=True, residual=True, q_distribution='laplace')[源代码]

RLE Loss.

Paper ref: Human Pose Regression With Residual Log-Likelihood Estimation, Li et al. ICCV’2021.

Code is modified from the official implementation.

参数
  • use_target_weight (bool) – Option to use weighted loss. Different joint types may have different target weights.

  • size_average (bool) – Option to average the loss by the batch_size.

  • residual (bool) – Option to add L1 loss and let the flow learn the residual error distribution.

  • q_distribution (str) – The Q(error) distribution. Options: “laplace” or “gaussian”

forward(pred, sigma, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • pred (Tensor[N, K, D]) – Output regression.

  • sigma (Tensor[N, K, D]) – Output sigma.

  • target (Tensor[N, K, D]) – Target regression.

  • target_weight (Tensor[N, K, D]) – Weights across different joint types.
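For intuition, the Laplace choice of the Q distribution amounts to a per-coordinate negative log-likelihood. The sketch below shows only this density term; the actual RLE loss additionally involves a learned normalizing flow for the residual and the optional L1 term:

```python
import math

def laplace_nll(pred, sigma, target):
    """Per-coordinate negative log-likelihood of the target under a
    Laplace distribution centered at pred with scale sigma:
    -log p(target | pred, sigma) = log(2*sigma) + |pred - target| / sigma.
    Only the Q-distribution part of RLE, not the full loss."""
    return math.log(2.0 * sigma) + abs(pred - target) / sigma
```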

class mmpose.models.losses.SemiSupervisionLoss(joint_parents, projection_loss_weight=1.0, bone_loss_weight=1.0, warmup_iterations=0)[源代码]

Semi-supervision loss for unlabeled data. It is composed of projection loss and bone loss.

Paper ref: 3D human pose estimation in video with temporal convolutions and semi-supervised training Dario Pavllo et al. CVPR’2019.

参数
  • joint_parents (list) – Indices of each joint’s parent joint.

  • projection_loss_weight (float) – Weight for projection loss.

  • bone_loss_weight (float) – Weight for bone loss.

  • warmup_iterations (int) – Number of warmup iterations. In the first warmup_iterations iterations, the model is trained only on labeled data, and semi-supervision loss will be 0. This is a workaround since currently we cannot access epoch number in loss functions. Note that the iteration number in an epoch can be changed due to different GPU numbers in multi-GPU settings. So please set this parameter carefully. warmup_iterations = dataset_size // samples_per_gpu // gpu_num * warmup_epochs

forward(output, target)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

备注

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

static project_joints(x, intrinsics)[源代码]

Project 3D joint coordinates to 2D image plane using camera intrinsic parameters.

参数
  • x (torch.Tensor[N, K, 3]) – 3D joint coordinates.

  • intrinsics (torch.Tensor[N, 4] | torch.Tensor[N, 9]) – Camera intrinsics: f (2), c (2), k (3), p (2).
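With a plain pinhole model (and the distortion terms k and p omitted), the projection reduces to the familiar f * X / Z + c form. A hypothetical single-point helper:

```python
def project_point(x, y, z, fx, fy, cx, cy):
    """Project one 3D camera-frame point onto the image plane using a
    pinhole model. Distortion coefficients (k, p) from the intrinsics
    are deliberately left out of this sketch."""
    return fx * x / z + cx, fy * y / z + cy
```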

class mmpose.models.losses.SmoothL1Loss(use_target_weight=False, loss_weight=1.0)[源代码]

Smooth L1 loss.

参数
  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.

class mmpose.models.losses.SoftWeightSmoothL1Loss(use_target_weight=False, supervise_empty=True, beta=1.0, loss_weight=1.0)[源代码]

Smooth L1 loss with soft weight for regression.

参数
  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • supervise_empty (bool) – Whether to supervise the output with zero weight.

  • beta (float) – Specifies the threshold at which to change between L1 and L2 loss.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.

static smooth_l1_loss(input, target, reduction='none', beta=1.0)[源代码]

Re-implement torch.nn.functional.smooth_l1_loss with beta to support pytorch <= 1.6.
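The beta-parameterized smooth L1 is quadratic for small residuals and linear beyond beta. An elementwise sketch:

```python
def smooth_l1(x, t, beta=1.0):
    """Elementwise smooth L1: quadratic for |x - t| < beta, linear
    beyond, matching the behavior of torch.nn.functional.smooth_l1_loss
    with the beta argument."""
    d = abs(x - t)
    if d < beta:
        return 0.5 * d * d / beta
    return d - 0.5 * beta
```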

class mmpose.models.losses.SoftWingLoss(omega1=2.0, omega2=20.0, epsilon=0.5, use_target_weight=False, loss_weight=1.0)[源代码]

Soft Wing Loss ‘Structure-Coherent Deep Feature Learning for Robust Face Alignment’ Lin et al. TIP’2021.

loss =
    |x|,                               if |x| < omega1
    omega2 * ln(1 + |x|/epsilon) + B,  if |x| >= omega1

where B = omega1 - omega2 * ln(1 + omega1/epsilon) keeps the loss continuous at |x| = omega1.

参数
  • omega1 (float) – The first threshold.

  • omega2 (float) – The second threshold.

  • epsilon (float) – Also referred to as curvature.

  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

criterion(pred, target)[源代码]

Criterion of wingloss.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • pred (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.

class mmpose.models.losses.WingLoss(omega=10.0, epsilon=2.0, use_target_weight=False, loss_weight=1.0)[源代码]

Wing Loss. paper ref: ‘Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks’ Feng et al. CVPR’2018.

参数
  • omega (float) – Also referred to as width.

  • epsilon (float) – Also referred to as curvature.

  • use_target_weight (bool) – Option to use weighted MSE loss. Different joint types may have different target weights.

  • loss_weight (float) – Weight of the loss. Default: 1.0.

criterion(pred, target)[源代码]

Criterion of wingloss.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • pred (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

forward(output, target, target_weight=None)[源代码]

Forward function.

备注

  • batch_size: N

  • num_keypoints: K

  • dimension of keypoints: D (D=2 or D=3)

参数
  • output (torch.Tensor[N, K, D]) – Output regression.

  • target (torch.Tensor[N, K, D]) – Target regression.

  • target_weight (torch.Tensor[N, K, D]) – Weights across different joint types.
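Elementwise, wing loss behaves logarithmically near zero and like L1 far from it, with the constant C chosen so the two pieces meet at |x| = omega. A sketch:

```python
import math

def wing_loss(pred, target, omega=10.0, epsilon=2.0):
    """Elementwise wing loss: logarithmic near zero (sensitive to small
    errors), linear for large errors. C makes the two branches
    continuous at |x| = omega."""
    x = abs(pred - target)
    C = omega - omega * math.log(1.0 + omega / epsilon)
    if x < omega:
        return omega * math.log(1.0 + x / epsilon)
    return x - C
```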

misc

class mmpose.models.utils.PatchEmbed(in_channels=3, embed_dims=768, conv_type='Conv2d', kernel_size=16, stride=16, padding='corner', dilation=1, bias=True, norm_cfg=None, input_size=None, init_cfg=None)[源代码]

Image to Patch Embedding.

We use a conv layer to implement PatchEmbed.

参数
  • in_channels (int) – The num of input channels. Default: 3

  • embed_dims (int) – The dimensions of embedding. Default: 768

  • conv_type (str) – The type of the embedding conv layer. Default: “Conv2d”.

  • kernel_size (int) – The kernel_size of embedding conv. Default: 16.

  • stride (int) – The stride of the embedding conv. Default: 16.

  • padding (int | tuple | string) – The padding length of embedding conv. When it is a string, it means the mode of adaptive padding, support “same” and “corner” now. Default: “corner”.

  • dilation (int) – The dilation rate of embedding conv. Default: 1.

  • bias (bool) – Bias of embed conv. Default: True.

  • norm_cfg (dict, optional) – Config dict for normalization layer. Default: None.

  • input_size (int | tuple | None) – The size of input, which will be used to calculate the out size. Only work when dynamic_size is False. Default: None.

  • init_cfg (mmcv.ConfigDict, optional) – The Config for initialization. Default: None.

forward(x)[源代码]
参数

x (Tensor) – Has shape (B, C, H, W). In most case, C is 3.

返回

Contains merged results and its spatial shape.

  • x (Tensor): Has shape (B, out_h * out_w, embed_dims)

  • out_size (tuple[int]): Spatial shape of x, arranged as (out_h, out_w).

返回类型

tuple
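The sequence length in the returned (B, out_h * out_w, embed_dims) tensor follows the usual conv output-size arithmetic. A hypothetical helper, assuming plain zero padding (i.e. not the adaptive “corner” mode):

```python
def patch_embed_out_size(h, w, kernel=16, stride=16, padding=0):
    """Output spatial size of the embedding conv, i.e. the number of
    patches per axis. The token sequence length is out_h * out_w."""
    out_h = (h + 2 * padding - kernel) // stride + 1
    out_w = (w + 2 * padding - kernel) // stride + 1
    return out_h, out_w
```

For a 224x224 image with 16x16 non-overlapping patches this gives a 14x14 grid, i.e. 196 tokens.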

class mmpose.models.utils.RTMCCBlock(num_token, in_token_dims, out_token_dims, expansion_factor=2, s=128, eps=1e-05, dropout_rate=0.0, drop_path=0.0, attn_type='self-attn', act_fn='SiLU', bias=False, use_rel_bias=True, pos_enc=False)[源代码]

Gated Attention Unit (GAU) in RTMBlock.

参数
  • num_token (int) – The number of tokens.

  • in_token_dims (int) – The input token dimension.

  • out_token_dims (int) – The output token dimension.

  • expansion_factor (int, optional) – The expansion factor of the intermediate token dimension. Defaults to 2.

  • s (int, optional) – The self-attention feature dimension. Defaults to 128.

  • eps (float, optional) – The minimum value in clamp. Defaults to 1e-5.

  • dropout_rate (float, optional) – The dropout rate. Defaults to 0.0.

  • drop_path (float, optional) – The drop path rate. Defaults to 0.0.

  • attn_type (str, optional) –

    Type of attention which should be one of the following options:

    • ’self-attn’: Self-attention.

    • ’cross-attn’: Cross-attention.

    Defaults to ‘self-attn’.

  • act_fn (str, optional) –

    The activation function which should be one of the following options:

    • ’ReLU’: ReLU activation.

    • ’SiLU’: SiLU activation.

    Defaults to ‘SiLU’.

  • bias (bool, optional) – Whether to use bias in linear layers. Defaults to False.

  • use_rel_bias (bool, optional) – Whether to use relative bias. Defaults to True.

  • pos_enc (bool, optional) – Whether to use rotary position embedding. Defaults to False.

Reference:

Transformer Quality in Linear Time

forward(x)[源代码]

Forward function.

rel_pos_bias(seq_len, k_len=None)[源代码]

Add relative position bias.

mmpose.models.utils.check_and_update_config(neck: Optional[Union[mmengine.config.config.Config, mmengine.config.config.ConfigDict]], head: Union[mmengine.config.config.Config, mmengine.config.config.ConfigDict]) Tuple[Optional[Dict], Dict][源代码]

Check and update the configuration of the head and neck components.

参数
  • neck (Optional[ConfigType]) – Configuration for the neck component.

  • head (ConfigType) – Configuration for the head component.

返回

Updated configurations for the neck and head components.

返回类型

Tuple[Optional[Dict], Dict]

mmpose.models.utils.nchw_to_nlc(x)[源代码]

Flatten [N, C, H, W] shape tensor to [N, L, C] shape tensor.

参数

x (Tensor) – The input tensor of shape [N, C, H, W] before conversion.

返回

The output tensor of shape [N, L, C] after conversion.

返回类型

Tensor

mmpose.models.utils.nlc_to_nchw(x, hw_shape)[源代码]

Convert [N, L, C] shape tensor to [N, C, H, W] shape tensor.

参数
  • x (Tensor) – The input tensor of shape [N, L, C] before conversion.

  • hw_shape (Sequence[int]) – The height and width of output feature map.

返回

The output tensor of shape [N, C, H, W] after conversion.

返回类型

Tensor
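The two conversions are pure reshapes: token l corresponds to pixel (h, w) with l = h * W + w. A list-based sketch of the NCHW-to-NLC direction:

```python
def nchw_to_nlc(x):
    """Flatten [N][C][H][W] nested lists to [N][L][C] with L = H * W.
    Token l = h * W + w holds the C channel values at pixel (h, w)."""
    return [
        [[x[n][c][h][w] for c in range(len(x[n]))]
         for h in range(len(x[n][0]))
         for w in range(len(x[n][0][0]))]
        for n in range(len(x))
    ]
```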

mmpose.models.utils.rope(x, dim)[源代码]

Applies Rotary Position Embedding to input tensor.

参数
  • x (torch.Tensor) – Input tensor.

  • dim (int | list[int]) – The spatial dimension(s) to apply rotary position embedding.

返回

The tensor after applying rotary position embedding.

返回类型

torch.Tensor

Reference:

RoFormer: Enhanced Transformer with Rotary Position Embedding
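At its core, RoPE rotates consecutive (even, odd) feature pairs by position-dependent angles, which preserves vector norms. A sketch for a single token, assuming the 10000^(-2i/d) frequency schedule from RoFormer:

```python
import math

def rope_1d(vec, pos):
    """Apply rotary position embedding to one token vector.
    Each (even, odd) feature pair is rotated by pos * 10000^(-i/d),
    where i steps over even indices. Assumes an even-length vector."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * (10000.0 ** (-i / d))
        c, s = math.cos(theta), math.sin(theta)
        out += [vec[i] * c - vec[i + 1] * s,
                vec[i] * s + vec[i + 1] * c]
    return out
```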

mmpose.datasets

class mmpose.datasets.CombinedDataset(metainfo: dict, datasets: list, pipeline: List[Union[dict, Callable]] = [], **kwargs)[源代码]

A wrapper of combined dataset.

参数
  • metainfo (dict) – The meta information of combined dataset.

  • datasets (list) – The configs of datasets to be combined.

  • pipeline (list, optional) – Processing pipeline. Defaults to [].

full_init()[源代码]

Fully initialize all sub datasets.

get_data_info(idx: int) dict[源代码]

Get annotation by index.

参数

idx (int) – Global index of CombinedDataset.

返回

The idx-th annotation of the datasets.

返回类型

dict

property metainfo

Get meta information of dataset.

返回

meta information collected from BaseDataset.METAINFO, annotation file and metainfo argument during instantiation.

返回类型

dict

prepare_data(idx: int) Any[源代码]

Get data processed by self.pipeline. The source dataset is determined by the index.

参数

idx (int) – The index of data_info.

返回

Depends on self.pipeline.

返回类型

Any
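Mapping a global index to a (sub-dataset, local index) pair is a cumulative-length lookup. A sketch of the bookkeeping such a wrapper needs (a hypothetical helper, not the actual CombinedDataset method):

```python
from bisect import bisect_right
from itertools import accumulate

def map_index(idx, lengths):
    """Map a global index into (subset_index, local_index) using the
    cumulative lengths of the sub-datasets."""
    cum = list(accumulate(lengths))
    subset = bisect_right(cum, idx)
    local = idx - (cum[subset - 1] if subset else 0)
    return subset, local
```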

class mmpose.datasets.MultiSourceSampler(dataset: Sized, batch_size: int, source_ratio: List[Union[int, float]], shuffle: bool = True, round_up: bool = True, seed: Optional[int] = None)[源代码]

Multi-Source Sampler. According to the sampling ratio, sample data from different datasets to form batches.

参数
  • dataset (Sized) – The dataset

  • batch_size (int) – Size of mini-batch

  • source_ratio (list[int | float]) – The sampling ratio of different source datasets in a mini-batch

  • shuffle (bool) – Whether shuffle the dataset or not. Defaults to True

  • round_up (bool) – Whether to add extra samples to make the number of samples evenly divisible by the world size. Defaults to True.

  • seed (int, optional) – Random seed. If None, set a random seed. Defaults to None

set_epoch(epoch: int) None[源代码]

Compatible with epoch-based runners.
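Turning a sampling ratio into per-batch counts is simple integer arithmetic. One common scheme (a sketch in which the first source absorbs the rounding remainder; not necessarily the exact mmpose implementation):

```python
def samples_per_source(batch_size, source_ratio):
    """Number of samples each source dataset contributes to one
    mini-batch, proportional to source_ratio. The first source takes
    the rounding remainder so the batch size stays exact."""
    total = sum(source_ratio)
    counts = [int(batch_size * r / total) for r in source_ratio]
    counts[0] = batch_size - sum(counts[1:])
    return counts
```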

mmpose.datasets.build_dataset(cfg, default_args=None)[源代码]

Build a dataset from config dict.

参数
  • cfg (dict) – Config dict. It should at least contain the key “type”.

  • default_args (dict, optional) – Default initialization arguments. Default: None.

返回

The constructed dataset.

返回类型

Dataset

datasets

class mmpose.datasets.datasets.base.BaseCocoStyleDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

Base class for COCO-style datasets.

参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filter data. Default: None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Default: None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects, when enabled, data loader workers can use shared RAM from master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when Basedataset.prepare_data gets a None image. Default: 1000.

filter_data() List[dict][源代码]

Filter annotations according to filter_cfg. By default, the full data_list is returned.

If bbox_score_thr is set in filter_cfg, annotations whose bbox_score is below the threshold will be filtered out.

get_data_info(idx: int) dict[源代码]

Get data info by index.

参数

idx (int) – Index of data info.

返回

Data info.

返回类型

dict

load_data_list() List[dict][源代码]

Load data list from COCO annotation file or person detection result file.

parse_data_info(raw_data_info: dict) Optional[dict][源代码]

Parse raw COCO annotation of an instance.

参数

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have following contents:

  • 'raw_ann_info': Raw annotation of an instance

  • 'raw_img_info': Raw information of the image that contains the instance

返回

Parsed instance annotation

返回类型

dict | None

prepare_data(idx) Any[源代码]

Get data processed by self.pipeline.

BaseCocoStyleDataset overrides this method from mmengine.dataset.BaseDataset to add the metainfo into the data_info before it is passed to the pipeline.

参数

idx (int) – The index of data_info.

返回

Depends on self.pipeline.

返回类型

Any

class mmpose.datasets.datasets.body.AicDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

AIC dataset for pose estimation.

“AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding”, arXiv’2017. More details can be found in the paper

AIC keypoints:

0: "right_shoulder",
1: "right_elbow",
2: "right_wrist",
3: "left_shoulder",
4: "left_elbow",
5: "left_wrist",
6: "right_hip",
7: "right_knee",
8: "right_ankle",
9: "left_hip",
10: "left_knee",
11: "left_ankle",
12: "head_top",
13: "neck"
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filter data. Default: None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Default: None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects, when enabled, data loader workers can use shared RAM from master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when Basedataset.prepare_data gets a None image. Default: 1000.

class mmpose.datasets.datasets.body.CocoDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

COCO dataset for pose estimation.

“Microsoft COCO: Common Objects in Context”, ECCV’2014. More details can be found in the paper.

COCO keypoints:

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filter data. Default: None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Default: None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects, when enabled, data loader workers can use shared RAM from master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when Basedataset.prepare_data gets a None image. Default: 1000.

class mmpose.datasets.datasets.body.CrowdPoseDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

CrowdPose dataset for pose estimation.

“CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark”, CVPR’2019. More details can be found in the paper.

CrowdPose keypoints:

0: 'left_shoulder',
1: 'right_shoulder',
2: 'left_elbow',
3: 'right_elbow',
4: 'left_wrist',
5: 'right_wrist',
6: 'left_hip',
7: 'right_hip',
8: 'left_knee',
9: 'right_knee',
10: 'left_ankle',
11: 'right_ankle',
12: 'top_head',
13: 'neck'
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filter data. Default: None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Default: None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects, when enabled, data loader workers can use shared RAM from master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when Basedataset.prepare_data gets a None image. Default: 1000.

class mmpose.datasets.datasets.body.JhmdbDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

JHMDB dataset for pose estimation.

“Towards understanding action recognition”, ICCV’2013. More details can be found in the paper

sub-JHMDB keypoints:

0: "neck",
1: "belly",
2: "head",
3: "right_shoulder",
4: "left_shoulder",
5: "right_hip",
6: "left_hip",
7: "right_elbow",
8: "left_elbow",
9: "right_knee",
10: "left_knee",
11: "right_wrist",
12: "left_wrist",
13: "right_ankle",
14: "left_ankle"
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filter data. Default: None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Default: None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects, when enabled, data loader workers can use shared RAM from master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so loading the annotation file is unnecessary. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when Basedataset.prepare_data gets a None image. Default: 1000.

parse_data_info(raw_data_info: dict) Optional[dict][源代码]

Parse raw COCO annotation of an instance.

参数

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have following contents:

  • 'raw_ann_info': Raw annotation of an instance

  • 'raw_img_info': Raw information of the image that contains the instance

返回

Parsed instance annotation

返回类型

dict

class mmpose.datasets.datasets.body.MhpDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

MHPv2.0 dataset for pose estimation.

“Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing”, ACM MM’2018. More details can be found in the paper.

MHP keypoints:

0: "right ankle",
1: "right knee",
2: "right hip",
3: "left hip",
4: "left knee",
5: "left ankle",
6: "pelvis",
7: "thorax",
8: "upper neck",
9: "head top",
10: "right wrist",
11: "right elbow",
12: "right shoulder",
13: "left shoulder",
14: "left elbow",
15: "left wrist",
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.
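The data_mode parameter described above controls the granularity of a data sample. A toy sketch (not mmpose's actual loading code) makes the difference concrete: 'topdown' yields one sample per annotated instance, 'bottomup' one sample per image.

```python
# Hypothetical annotations: image name -> instances it contains.
annotations = {
    'img_a.jpg': ['person_1', 'person_2', 'person_3'],
    'img_b.jpg': ['person_4'],
}

def build_samples(annotations: dict, data_mode: str = 'topdown') -> list:
    """Toy illustration of how data_mode changes sample granularity."""
    if data_mode == 'topdown':
        # one sample per annotated instance
        return [(img, [inst]) for img, insts in annotations.items()
                for inst in insts]
    if data_mode == 'bottomup':
        # one sample per image, holding all of its instances
        return [(img, insts) for img, insts in annotations.items()]
    raise ValueError(f'invalid data_mode {data_mode!r}')

print(len(build_samples(annotations, 'topdown')))   # 4 instance samples
print(len(build_samples(annotations, 'bottomup')))  # 2 image samples
```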

class mmpose.datasets.datasets.body.MpiiDataset(ann_file: str = '', bbox_file: Optional[str] = None, headbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

MPII Dataset for pose estimation.

“2D Human Pose Estimation: New Benchmark and State of the Art Analysis”, CVPR’2014. More details can be found in the paper.

MPII keypoints:

0: 'right_ankle',
1: 'right_knee',
2: 'right_hip',
3: 'left_hip',
4: 'left_knee',
5: 'left_ankle',
6: 'pelvis',
7: 'thorax',
8: 'upper_neck',
9: 'head_top',
10: 'right_wrist',
11: 'right_elbow',
12: 'right_shoulder',
13: 'left_shoulder',
14: 'left_elbow',
15: 'left_wrist'
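The left/right symmetry in the keypoint list above is what flip augmentation relies on. As an illustration (mmpose itself ships these pairs in the dataset metainfo; this is not its API), the flip pairs can be derived from the names:

```python
# MPII keypoint names, in index order as listed above.
MPII_KEYPOINTS = [
    'right_ankle', 'right_knee', 'right_hip', 'left_hip', 'left_knee',
    'left_ankle', 'pelvis', 'thorax', 'upper_neck', 'head_top',
    'right_wrist', 'right_elbow', 'right_shoulder', 'left_shoulder',
    'left_elbow', 'left_wrist',
]
name_to_idx = {name: i for i, name in enumerate(MPII_KEYPOINTS)}

# Pair each 'right_*' keypoint with its 'left_*' counterpart.
flip_pairs = [
    (idx, name_to_idx['left_' + name[len('right_'):]])
    for idx, name in enumerate(MPII_KEYPOINTS)
    if name.startswith('right_')
]
print(flip_pairs)
# [(0, 5), (1, 4), (2, 3), (10, 15), (11, 14), (12, 13)]
```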
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • headbox_file (str, optional) – The path of mpii_gt_val.mat which provides the headboxes information used for PCKh. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.

class mmpose.datasets.datasets.body.MpiiTrbDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

MPII-TRB dataset for pose estimation.

“TRB: A Novel Triplet Representation for Understanding 2D Human Body”, ICCV’2019. More details can be found in the paper.

MPII-TRB keypoints:

0: 'left_shoulder'
1: 'right_shoulder'
2: 'left_elbow'
3: 'right_elbow'
4: 'left_wrist'
5: 'right_wrist'
6: 'left_hip'
7: 'right_hip'
8: 'left_knee'
9: 'right_knee'
10: 'left_ankle'
11: 'right_ankle'
12: 'head'
13: 'neck'

14: 'right_neck'
15: 'left_neck'
16: 'medial_right_shoulder'
17: 'lateral_right_shoulder'
18: 'medial_right_bow'
19: 'lateral_right_bow'
20: 'medial_right_wrist'
21: 'lateral_right_wrist'
22: 'medial_left_shoulder'
23: 'lateral_left_shoulder'
24: 'medial_left_bow'
25: 'lateral_left_bow'
26: 'medial_left_wrist'
27: 'lateral_left_wrist'
28: 'medial_right_hip'
29: 'lateral_right_hip'
30: 'medial_right_knee'
31: 'lateral_right_knee'
32: 'medial_right_ankle'
33: 'lateral_right_ankle'
34: 'medial_left_hip'
35: 'lateral_left_hip'
36: 'medial_left_knee'
37: 'lateral_left_knee'
38: 'medial_left_ankle'
39: 'lateral_left_ankle'
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.

class mmpose.datasets.datasets.body.OCHumanDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

OCHuman dataset for pose estimation.

“Pose2Seg: Detection Free Human Instance Segmentation”, CVPR’2019. More details can be found in the paper.

The “Occluded Human (OCHuman)” dataset contains 8110 heavily occluded human instances within 4731 images, and is designed for validation and testing only. To evaluate on OCHuman, the model should be trained on the COCO train set, and its robustness to occlusion is then tested on OCHuman.

OCHuman keypoints (same as COCO):

0: 'nose',
1: 'left_eye',
2: 'right_eye',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.

class mmpose.datasets.datasets.body.PoseTrack18Dataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

PoseTrack18 dataset for pose estimation.

“PoseTrack: A Benchmark for Human Pose Estimation and Tracking”, CVPR’2018. More details can be found in the paper.

PoseTrack2018 keypoints:

0: 'nose',
1: 'head_bottom',
2: 'head_top',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.

class mmpose.datasets.datasets.body.PoseTrack18VideoDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', frame_weights: List[Union[int, float]] = [0.0, 1.0], frame_sampler_mode: str = 'random', frame_range: Optional[Union[int, List[int]]] = None, num_sampled_frame: Optional[int] = None, frame_indices: Optional[Sequence[int]] = None, ph_fill_len: int = 6, metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

PoseTrack18 dataset for video pose estimation.

“PoseTrack: A Benchmark for Human Pose Estimation and Tracking”, CVPR’2018. More details can be found in the paper.

PoseTrack2018 keypoints:

0: 'nose',
1: 'head_bottom',
2: 'head_top',
3: 'left_ear',
4: 'right_ear',
5: 'left_shoulder',
6: 'right_shoulder',
7: 'left_elbow',
8: 'right_elbow',
9: 'left_wrist',
10: 'right_wrist',
11: 'left_hip',
12: 'right_hip',
13: 'left_knee',
14: 'right_knee',
15: 'left_ankle',
16: 'right_ankle'
参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • frame_weights (List[Union[int, float]]) – The weight of each frame for aggregation. The first weight is for the center frame, then in ascending order of frame indices. Note that the length of frame_weights should be consistent with the number of sampled frames. Default: [0.0, 1.0]

  • frame_sampler_mode (str) – Specifies the mode of frame sampler: 'fixed' or 'random'. In 'fixed' mode, each frame index relative to the center frame is fixed, specified by frame_indices, while in 'random' mode, each frame index relative to the center frame is sampled from frame_range with certain randomness. Default: 'random'.

  • frame_range (int | List[int], optional) – The sampling range of supporting frames in the same video for center frame. Only valid when frame_sampler_mode is 'random'. Default: None.

  • num_sampled_frame (int, optional) – The number of sampled frames, except the center frame. Only valid when frame_sampler_mode is 'random'. Default: None.

  • frame_indices (Sequence[int], optional) – The sampled frame indices, including the center frame indicated by 0. Only valid when frame_sampler_mode is 'fixed'. Default: None.

  • ph_fill_len (int) – The length of the placeholder to fill in the image filenames. Default: 6

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.
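The frame_sampler_mode, frame_range, frame_indices, and ph_fill_len parameters above can be sketched with a toy helper (hypothetical, not the actual mmpose implementation): 'fixed' mode uses explicit offsets relative to the center frame, 'random' mode draws offsets from frame_range, and frame ids are zero-padded to ph_fill_len digits as in PoseTrack file names.

```python
import random

def sample_support_frames(center_id, frame_sampler_mode='random',
                          frame_range=None, num_sampled_frame=1,
                          frame_indices=None, ph_fill_len=6):
    """Toy sketch of the two frame-sampling modes described above."""
    if frame_sampler_mode == 'fixed':
        # offsets relative to the center frame are given explicitly;
        # 0 denotes the center frame itself
        offsets = list(frame_indices)
    elif frame_sampler_mode == 'random':
        if isinstance(frame_range, int):
            low, high = -frame_range, frame_range
        else:
            low, high = frame_range
        offsets = [0] + [random.randint(low, high)
                         for _ in range(num_sampled_frame)]
    else:
        raise ValueError(f'invalid frame_sampler_mode {frame_sampler_mode!r}')
    # zero-pad frame ids to `ph_fill_len` digits, as in PoseTrack file names
    return [f'{center_id + off:0{ph_fill_len}d}.jpg' for off in offsets]

print(sample_support_frames(42, 'fixed', frame_indices=[-1, 0, 1]))
# ['000041.jpg', '000042.jpg', '000043.jpg']
```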

parse_data_info(raw_data_info: dict) Optional[dict][源代码]

Parse raw annotation of an instance.

参数

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have following contents:

  • 'raw_ann_info': Raw annotation of an instance

  • 'raw_img_info': Raw information of the image that

    contains the instance

返回

Parsed instance annotation

返回类型

dict

class mmpose.datasets.datasets.face.AFLWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

AFLW dataset for face keypoint localization.

“Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization”. In Proc. First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.

The landmark annotations follow the 19 points mark-up. The definition can be found at https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/.

参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.

parse_data_info(raw_data_info: dict) Optional[dict][源代码]

Parse raw Face AFLW annotation of an instance.

参数

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have following contents:

  • 'raw_ann_info': Raw annotation of an instance

  • 'raw_img_info': Raw information of the image that

    contains the instance

返回

Parsed instance annotation

返回类型

dict

class mmpose.datasets.datasets.face.COFWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

COFW dataset for face keypoint localization.

“Robust face landmark estimation under occlusion”, ICCV’2013.

The landmark annotations follow the 29 points mark-up. The definition can be found at http://www.vision.caltech.edu/xpburgos/ICCV13/.

参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.

class mmpose.datasets.datasets.face.CocoWholeBodyFaceDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

CocoWholeBodyDataset for face keypoint localization.

“Whole-Body Human Pose Estimation in the Wild”, ECCV’2020. More details can be found in the paper.

The face landmark annotations follow the 68 points mark-up.

参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.

parse_data_info(raw_data_info: dict) Optional[dict][源代码]

Parse raw CocoWholeBody Face annotation of an instance.

参数

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have following contents:

  • 'raw_ann_info': Raw annotation of an instance

  • 'raw_img_info': Raw information of the image that

    contains the instance

返回

Parsed instance annotation

返回类型

dict

class mmpose.datasets.datasets.face.Face300WDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

300W dataset for face keypoint localization.

“300 faces In-the-wild challenge: Database and results”, Image and Vision Computing (IMAVIS) 2019.

The landmark annotations follow the 68 points mark-up. The definition can be found in https://ibug.doc.ic.ac.uk/resources/300-W/.

参数
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance; while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all samples.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – test_mode=True means the dataset is in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image when BaseDataset.prepare_data gets a None image. Default: 1000.

parse_data_info(raw_data_info: dict) Optional[dict][源代码]

Parse raw Face300W annotation of an instance.

参数

raw_data_info (dict) –

Raw data information loaded from ann_file. It should have following contents:

  • 'raw_ann_info': Raw annotation of an instance

  • 'raw_img_info': Raw information of the image that

    contains the instance

返回

Parsed instance annotation

返回类型

dict

class mmpose.datasets.datasets.face.LapaDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[源代码]

LaPa dataset for face keypoint localization.

“A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing”, AAAI’2020.

The landmark annotations follow the 106-point mark-up. The definition can be found in `https://github.com/JDAI-CV/lapa-dataset/`__ .

Parameters
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – Whether the dataset is used in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image if Basedataset.prepare_data returns a None image. Default: 1000.
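A minimal, hypothetical config fragment in the usual mmpose config style, showing how LapaDataset is typically specified. The file paths are placeholders and the pipeline is left empty; a real config would list the processing transforms there.

```python
# Hypothetical training config for LapaDataset; all paths are placeholders.
train_dataset = dict(
    type='LapaDataset',
    data_root='data/LaPa/',                     # root for ann_file and data_prefix
    ann_file='annotations/lapa_trainval.json',  # placeholder annotation file
    data_mode='topdown',                        # one instance per data sample
    data_prefix=dict(img='train/images/'),      # image directory prefix
    pipeline=[],                                # processing transforms go here
    test_mode=False,
)
```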

class mmpose.datasets.datasets.face.WFLWDataset(ann_file: str = '', bbox_file: Optional[str] = None, data_mode: str = 'topdown', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000)[source]

WFLW dataset for face keypoint localization.

“Look at Boundary: A Boundary-Aware Face Alignment Algorithm”, CVPR’2018.

The landmark annotations follow the 98-point mark-up. The definition can be found in `https://wywu.github.io/projects/LAB/WFLW.html`__ .

Parameters
  • ann_file (str) – Annotation file path. Default: ‘’.

  • bbox_file (str, optional) – Detection result file path. If bbox_file is set, detected bboxes loaded from this file will be used instead of ground-truth bboxes. This setting is only for evaluation, i.e., ignored when test_mode is False. Default: None.

  • data_mode (str) – Specifies the mode of data samples: 'topdown' or 'bottomup'. In 'topdown' mode, each data sample contains one instance, while in 'bottomup' mode, each data sample contains all instances in an image. Default: 'topdown'.

  • metainfo (dict, optional) – Meta information for dataset, such as class information. Default: None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Default: None.

  • data_prefix (dict, optional) – Prefix for training data. Default: dict(img='').

  • filter_cfg (dict, optional) – Config for filtering data. Default: None.

  • indices (int or Sequence[int], optional) – Support using only the first few samples in the annotation file to facilitate training/testing on a smaller dataset. Default: None, which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Default: True.

  • pipeline (list, optional) – Processing pipeline. Default: [].

  • test_mode (bool, optional) – Whether the dataset is used in the test phase. Default: False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. Basedataset can skip loading annotations to save time by setting lazy_init=True. Default: False.

  • max_refetch (int, optional) – The maximum number of extra cycles to fetch a valid image if Basedataset.prepare_data returns a None image. Default: 1000.
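As a sketch of how bbox_file interacts with test_mode, a hypothetical evaluation config for WFLWDataset might look like the following. The paths are placeholders; what matters is that bbox_file only substitutes detected boxes for ground-truth boxes when test_mode=True.

```python
# Hypothetical evaluation config for WFLWDataset; all paths are placeholders.
# bbox_file supplies detected bboxes in place of ground-truth ones, and it
# takes effect only when test_mode=True.
val_dataset = dict(
    type='WFLWDataset',
    data_root='data/WFLW/',
    ann_file='annotations/face_landmarks_wflw_test.json',
    bbox_file='detections/wflw_test_bboxes.json',
    data_prefix=dict(img='images/'),
    pipeline=[],
    test_mode=True,  # bbox_file would be ignored if this were False
)
```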