Feature extraction

On the previous page, we created a dataset with two groups of animals, each group with three individuals (trajectories). We then created a DataFrame of observations, which we used to create an annotated dataset. Let’s recap and display the observations in a table.

display(Table(dataset.observations))

The dataset consists of groups and can be iterated over. Similarly, groups consist of dyads (note the target parameter when creating groups/datasets) and can also be iterated over. Each of these objects is a SampleableMixin that provides methods for feature extraction, compatible with our feature extraction workflow.

for group_id, group in dataset:
    for dyad_id, dyad in group:
        print(f"Group {group_id}, Dyad {dyad_id} with {len(dyad)} samples")
Group a, Dyad ('a_1', 'a_2') with 196 samples
Group a, Dyad ('a_1', 'a_3') with 196 samples
Group a, Dyad ('a_2', 'a_1') with 196 samples
Group a, Dyad ('a_2', 'a_3') with 199 samples
Group a, Dyad ('a_3', 'a_1') with 196 samples
Group a, Dyad ('a_3', 'a_2') with 199 samples
Group b, Dyad ('b_1', 'b_2') with 199 samples
Group b, Dyad ('b_1', 'b_3') with 196 samples
Group b, Dyad ('b_2', 'b_1') with 199 samples
Group b, Dyad ('b_2', 'b_3') with 196 samples
Group b, Dyad ('b_3', 'b_1') with 196 samples
Group b, Dyad ('b_3', 'b_2') with 196 samples
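
The six dyads per group follow from treating each dyad as an ordered (actor, recipient) pair, so three individuals yield 3 × 2 = 6 dyads. The sketch below reproduces the dyad identifiers of group a using only the standard library. It illustrates the pairing and does not use the package's own classes.

```python
from itertools import permutations

# Identifiers of the three individuals in group "a" (as printed above).
individuals = ["a_1", "a_2", "a_3"]

# Every ordered pair (actor, recipient) is one dyad; the order matters,
# so ('a_1', 'a_2') and ('a_2', 'a_1') are distinct dyads.
dyad_ids = list(permutations(individuals, 2))

for dyad_id in dyad_ids:
    print(dyad_id)
```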

Manual feature calculation

Each timestamp at which both animals are present is considered a sample. For each sample, we can extract features such as keypoint_distances(), posture_angles(), or speed(). We can do this manually with the feature functions defined in the features module:

Hint

All feature functions return an ndarray containing the computed features. Its shape depends on the number of postural elements and on whether the feature is computed element_wise with regard to these elements. All feature functions also have a flat parameter. If it is specified, the features are returned as an ndarray with shape (n_samples, n_features).

import numpy as np

from vassi.features import (
    keypoint_distances, posture_angles, speed
)

# calculate all pairwise keypoint distances (for four keypoints)
# element_wise=True to only calculate four distances
# (would otherwise calculate 16 distances)
distances = keypoint_distances(
    dyad.trajectory,
    trajectory_other=dyad.trajectory_other,
    keypoints_1=(0, 1, 2, 3),
    keypoints_2=(0, 1, 2, 3),
    element_wise=True,
    flat=True,
)
# calculate one posture angle for the 'actor'
# (the first individual in the dyad)
angles_actor = posture_angles(
    dyad.trajectory,
    keypoint_pairs_1=((1, 0), ),
    keypoint_pairs_2=((3, 2), ),
    flat=True,
)
# calculate one angle between posture segments of
# both individuals using the same function
angles = posture_angles(
    dyad.trajectory,
    trajectory_other=dyad.trajectory_other,
    keypoint_pairs_1=((1, 0), ),
    keypoint_pairs_2=((1, 0), ),
    flat=True,
)
# calculate speed for both individuals with a step size of 15
speed_actor = speed(
    dyad.trajectory,
    keypoints=(0, 2),
    step=15,
    flat=True,
)
speed_recipient = speed(
    dyad.trajectory_other,
    keypoints=(0, 2),
    step=15,
    flat=True,
)

features = np.concatenate(
    (
        distances,
        angles_actor,
        angles,
        speed_actor,
        speed_recipient,
    ),
    axis=1,
)
print(features.shape)
(196, 10)
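
To make the effect of element_wise and flat more concrete, here is a minimal NumPy sketch of the difference between all pairwise distances and element-wise distances. It assumes keypoints are stored as arrays of shape (n_samples, n_keypoints, 2) and uses random data. This layout and the variable names are assumptions for illustration, not the package's internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical keypoint arrays with shape (n_samples, n_keypoints, 2).
n_samples, n_keypoints = 5, 4
actor = rng.random((n_samples, n_keypoints, 2))
recipient = rng.random((n_samples, n_keypoints, 2))

# All pairwise distances: every actor keypoint against every recipient
# keypoint, giving a (n_samples, 4, 4) array, i.e. 16 values per sample.
pairwise = np.linalg.norm(
    actor[:, :, np.newaxis, :] - recipient[:, np.newaxis, :, :], axis=-1
)

# Element-wise distances: keypoint i against keypoint i only,
# giving a (n_samples, 4) array, i.e. 4 values per sample.
element_wise = np.linalg.norm(actor - recipient, axis=-1)

# Flattening keeps the sample axis and merges all remaining axes.
pairwise_flat = pairwise.reshape(n_samples, -1)

print(pairwise.shape, element_wise.shape, pairwise_flat.shape)
# (5, 4, 4) (5, 4) (5, 16)
```

The element-wise result is the diagonal of the pairwise result, which is why four keypoints give four instead of 16 distances.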

For more transparency (especially in subsequent classification steps), feature functions can be wrapped using the as_dataframe() decorator, which then returns a DataFrame with named columns.

import pandas as pd

from vassi.features import as_dataframe

speed_actor_df = as_dataframe(speed)(
    dyad.trajectory,
    keypoints=(0, 2),
    step=15,
    flat=True,
)
speed_recipient_df = as_dataframe(speed)(
    dyad.trajectory_other,
    keypoints=(0, 2),
    step=15,
    flat=True,
)

print(pd.concat([speed_actor_df, speed_recipient_df], axis=1).head())
   speed_t(15)-0  speed_t(15)-2  speed_t(15)-0  speed_t(15)-2
0       0.011664       0.033358       0.016175       0.042869
1       0.011664       0.033358       0.016175       0.042869
2       0.011664       0.033358       0.016175       0.042869
3       0.011664       0.033358       0.016175       0.042869
4       0.011664       0.033358       0.016175       0.042869

Note

In the example above, speed is a temporal feature and needs some padding (step // 2) to align with other features, which is why the first values are repeated. Feature names are independent of the input trajectory, so they cannot distinguish between actor and recipient in the example above.
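
The padding can be illustrated with a small NumPy sketch. It assumes that speed is the displacement over step frames divided by step, and that the result is aligned to the input by repeating the edge values. The helper padded_speed is hypothetical and only shows why leading values repeat; the package's own implementation may differ in detail.

```python
import numpy as np


def padded_speed(positions, step):
    """Displacement per frame over `step` frames, edge-padded to input length.

    positions: array of shape (n_samples, 2) with n_samples > step.
    Illustrative helper, not the package's implementation.
    """
    # Displacement between frames t and t + step: n_samples - step values.
    displacement = np.linalg.norm(positions[step:] - positions[:-step], axis=-1)
    raw = displacement / step
    # Centre the window: repeat the first value step // 2 times and the
    # last value for the remaining frames so the output has n_samples rows.
    pad_before = step // 2
    pad_after = step - pad_before
    return np.pad(raw, (pad_before, pad_after), mode="edge")


# A point accelerating along x: position t**2 at frame t.
positions = np.stack([np.arange(40.0) ** 2, np.zeros(40)], axis=1)
speed_values = padded_speed(positions, step=15)

print(speed_values.shape)  # (40,)
print(speed_values[:9])    # eight identical values (15.0), then 17.0
```

With step=15, seven values are padded in front, so the first eight entries are identical, analogous to the repeated rows in the table above.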

Using feature extractors

A more reproducible workflow can be achieved by using the DataFrameFeatureExtractor class. This allows you to define a set of features and their parameters in a more structured way.

from vassi.features import DataFrameFeatureExtractor

extractor = DataFrameFeatureExtractor(
    features=[
        (
            posture_angles,
            dict(
                keypoint_pairs_1=((1, 0), ),
                keypoint_pairs_2=((3, 2), ),
            ),
        ),
        (
            speed,
            dict(
                keypoints=(0, 2),
                step=15,
            ),
        ),
        (
            speed,
            dict(
                keypoints=(0, 2),
                step=15,
                reversed_dyad=True,
            ),
        )
    ],
    dyadic_features=[
        (
            keypoint_distances,
            dict(
                keypoints_1=(0, 1, 2, 3),
                keypoints_2=(0, 1, 2, 3),
                element_wise=True,
            ),
        ),
        (
            posture_angles,
            dict(
                keypoint_pairs_1=((1, 0), ),
                keypoint_pairs_2=((3, 2), ),
            ),
        ),
    ],
    cache_mode=False,
)

print(extractor.extract(dyad.trajectory, dyad.trajectory_other).head())
   posture_angles-1_0-3_2  speed_t(15)-0  speed_t(15)-2  r_speed_t(15)-0  \
0                0.385173       0.011664       0.033358         0.016175   
1                1.203246       0.011664       0.033358         0.016175   
2                1.245478       0.011664       0.033358         0.016175   
3                2.264221       0.011664       0.033358         0.016175   
4                0.527311       0.011664       0.033358         0.016175   

   r_speed_t(15)-2  keypoint_distances-0-0  keypoint_distances-1-1  \
0         0.042869                0.435789                0.252969   
1         0.042869                0.550464                0.418999   
2         0.042869                0.679210                0.585478   
3         0.042869                0.534971                0.323831   
4         0.042869                0.409749                0.404508   

   keypoint_distances-2-2  keypoint_distances-3-3  posture_angles-1_0-3_2  
0                0.143071                0.394625                0.851134  
1                0.249795                0.229722                0.609156  
2                0.384948                0.199233                0.068361  
3                0.158834                0.323625                0.926760  
4                0.387330                0.589616               -2.473069  

This computes the same 10 features that were obtained before by concatenating the results of separate feature functions, but has the advantage of being more reproducible and easier to maintain. For example, we can easily save and load the configuration to and from a YAML file.

extractor.save_yaml('config.yaml')
extractor = DataFrameFeatureExtractor(cache_mode=False).read_yaml('config.yaml')

The saved configuration file is shown below. You can also start by creating such a YAML file and adding individual and dyadic features with their respective arguments.

individual:
- - posture_angles
  - keypoint_pairs_1:
    - - 1
      - 0
    keypoint_pairs_2:
    - - 3
      - 2
- - speed
  - keypoints:
    - 0
    - 2
    step: 15
- - speed
  - keypoints:
    - 0
    - 2
    step: 15
    reversed_dyad: true
dyadic:
- - keypoint_distances
  - keypoints_1:
    - 0
    - 1
    - 2
    - 3
    keypoints_2:
    - 0
    - 1
    - 2
    - 3
    element_wise: true
- - posture_angles
  - keypoint_pairs_1:
    - - 1
      - 0
    keypoint_pairs_2:
    - - 3
      - 2
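
The configuration is a list of [name, arguments] pairs per section. As a rough sketch of how such a structure can drive feature extraction, the example below resolves names through a registry and concatenates the results. The functions centroid_speed and centroid_distance, the REGISTRY dictionary, and the extract helper are all hypothetical stand-ins, and the trajectories are plain (n_samples, 2) arrays. This is not how DataFrameFeatureExtractor is implemented.

```python
import numpy as np


# Hypothetical stand-ins for feature functions; each returns an array
# of shape (n_samples, n_features).
def centroid_speed(trajectory, step=1):
    displacement = np.linalg.norm(trajectory[step:] - trajectory[:-step], axis=-1)
    return np.pad(displacement / step, (0, step), mode="edge")[:, np.newaxis]


def centroid_distance(trajectory, trajectory_other):
    return np.linalg.norm(trajectory - trajectory_other, axis=-1)[:, np.newaxis]


# Registry that resolves the names used in the configuration.
REGISTRY = {
    "centroid_speed": centroid_speed,
    "centroid_distance": centroid_distance,
}

# Same layout as a parsed YAML file: [name, keyword arguments] pairs.
config = {
    "individual": [["centroid_speed", {"step": 2}]],
    "dyadic": [["centroid_distance", {}]],
}


def extract(config, trajectory, trajectory_other):
    """Apply all configured features and stack them column-wise."""
    columns = []
    for name, kwargs in config["individual"]:
        columns.append(REGISTRY[name](trajectory, **kwargs))
    for name, kwargs in config["dyadic"]:
        columns.append(REGISTRY[name](trajectory, trajectory_other, **kwargs))
    return np.concatenate(columns, axis=1)


rng = np.random.default_rng(1)
trajectory = rng.random((10, 2))
trajectory_other = rng.random((10, 2))
features = extract(config, trajectory, trajectory_other)
print(features.shape)  # (10, 2)
```

Keeping names and arguments in plain data is what makes the setup easy to store in a file and to reuse across datasets.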

Have a look at the API documentation (submodules features and temporal_features) for all implemented features and their respective arguments.

Hint

Feature extractors (inheriting from BaseExtractor) accept additional arguments alongside those of the feature functions, which adds flexibility:

  • as_absolute to compute absolute values (e.g., helpful for positive and negative angles).

  • reversed_dyad to switch actor and recipient trajectories, for example to calculate specific individual features for the recipient.

  • as_sign_change_latency to compute the latency between sign changes of a feature, for example to measure the time between changes from a clockwise to an anticlockwise posture angle.
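
As a rough illustration of as_absolute and as_sign_change_latency, the NumPy sketch below takes the absolute value of a signed angle series and counts the samples since the sign last changed. This reading of "latency" is an assumption made for the example, and sign_change_latency is a hypothetical helper; the package may define the latency differently.

```python
import numpy as np


def sign_change_latency(values):
    """Number of samples since the sign of `values` last changed.

    Hypothetical interpretation for illustration only.
    """
    signs = np.sign(values)
    indices = np.arange(len(values))
    # True wherever the sign differs from the previous sample.
    changed = np.concatenate(([False], signs[1:] != signs[:-1]))
    # Index of the most recent change (0 before any change occurred).
    last_change = np.maximum.accumulate(np.where(changed, indices, 0))
    return indices - last_change


angles = np.array([0.4, 0.9, -0.2, -0.7, -0.1, 0.3, 0.5])

absolute = np.abs(angles)  # magnitude only, direction discarded
latency = sign_change_latency(angles)

print(absolute)
print(latency)  # [0 1 0 1 2 0 1]
```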

The DataFrameFeatureExtractor provides additional arguments to specify which columns to discard from the resulting DataFrame. You can specify one or more strings (patterns) to drop, and use keep to specify exceptions for columns that should be kept regardless of these patterns.
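
The interplay of drop patterns and keep exceptions can be sketched in plain Python. The helper filter_columns, its discard parameter name, and the use of simple substring matching are assumptions for illustration; consult the API documentation for the actual argument names and matching rules.

```python
def filter_columns(columns, discard=(), keep=()):
    """Drop columns containing any `discard` pattern, unless they also
    contain a `keep` pattern. Substring matching is an assumption."""
    selected = []
    for column in columns:
        dropped = any(pattern in column for pattern in discard)
        kept = any(pattern in column for pattern in keep)
        if not dropped or kept:
            selected.append(column)
    return selected


columns = [
    "speed_t(15)-0",
    "speed_t(15)-2",
    "r_speed_t(15)-0",
    "r_speed_t(15)-2",
    "keypoint_distances-0-0",
]

# Drop every speed column except the recipient's speed at keypoint 0.
remaining = filter_columns(
    columns, discard=("speed",), keep=("r_speed_t(15)-0",)
)
print(remaining)
# ['r_speed_t(15)-0', 'keypoint_distances-0-0']
```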

Feature extractors can not only be used via their extract() method, but also as an argument for all dataset types (SampleableMixin). This includes individuals, dyads, groups, and datasets.