Getting started

This notebook showcases the main functionalities of tri3d. It can be run with any of the supported datasets.

[1]:
import k3d
import matplotlib.pyplot as plt
import numpy as np

from tri3d.datasets import NuScenes, Once, SemanticKITTI, Waymo, ZODFrames
from tri3d.geometry import test_box_in_frame
from tri3d.misc import nearest_sorted
from tri3d.plot import (
    gen_discrete_cmap,
    plot_annotations_bev,
    plot_bbox_cam,
    plot_bboxes_3d,
    to_k3d_colors,
)

Uncomment the desired dataset below to visualize a different one. The rest of the notebook should run as-is thanks to the shared API.

[2]:
# dataset = NuScenes("../../datasets/nuscenes", "v1.0-mini")
# camera, image, lidar = "CAM_FRONT", "IMG_FRONT", "LIDAR_TOP"

# dataset = Once("../../datasets/once", "train")
# camera, image, lidar = "cam01", "img01", "lidar_roof"

# dataset = SemanticKITTI("../../datasets/semantickitti")
# camera, image, lidar = None, None, "velodyne"

dataset = Waymo("../../datasets/waymo", split="training")
camera, image, lidar = "CAM_FRONT", "IMG_FRONT", "LIDAR_TOP"

# dataset = ZODFrames(
#     "../../datasets/zodframes",
#     "trainval-frames-mini.json",
#     "train",
# )
# camera, image, lidar = "front", "img_front", "velodyne"

if isinstance(dataset, Waymo):  # waymo does not have segmentation for all frames
    seq = 0
    frame = dataset.frames(seq, sensor="SEG_LIDAR_TOP")[0]
elif isinstance(dataset, NuScenes):
    seq = 5
    frame = dataset.keyframes(seq, lidar)[13]
elif isinstance(dataset, ZODFrames):
    seq = 1
    ann_ts = dataset.timestamps(seq, sensor="boxes")[0]
    frame = nearest_sorted(dataset.timestamps(seq, lidar), ann_ts)
else:
    seq, frame = 1, 0

cam_frame = nearest_sorted(
    dataset.timestamps(seq, camera), dataset.timestamps(seq, lidar)[frame]
)

Pose and calibration

To plot sensor poses, we first use Dataset.cam_sensors and Dataset.pcl_sensors to get the sensor names. Dataset.frames() returns the list of frames available for each of these sensors.
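
As a quick check, we can list them directly (a minimal sketch; sensor names and frame counts depend on the chosen dataset):

# list the available sensors and count the frames recorded by the selected lidar
print("point cloud sensors:", dataset.pcl_sensors)
print("camera sensors:", dataset.cam_sensors)
print("lidar frames:", len(dataset.frames(seq, sensor=lidar)))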

Dataset.poses() returns the pose of a given sensor at every sample frame available for that sensor. Poses are expressed in an arbitrary world coordinate system defined per sequence.

A pose is the geometric transform which takes a point in sensor-local coordinates and returns it in world coordinates. In tri3d, poses are stored as instances of tri3d.geometry.Transform, each holding a batch of transformations (one per frame). These transform objects support inversion, composition and, of course, application to 3D points.
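
For instance, composing a camera pose's inverse with the lidar pose maps lidar coordinates into camera coordinates. The sketch below assumes integer indexing into the batched poses (as used in the next cell) and ignores the small timestamp offset between the two sensors:

# sketch: build a lidar -> camera transform by composing poses
lidar_pose = dataset.poses(seq=seq, sensor=lidar)[frame]     # lidar -> world
cam_pose = dataset.poses(seq=seq, sensor=camera)[cam_frame]  # camera -> world
lidar_to_cam = cam_pose.inv() @ lidar_pose                   # lidar -> camera

# the lidar origin expressed in camera coordinates
print(lidar_to_cam.apply(np.zeros([3])))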

[3]:
plot = k3d.plot(name="pcl", camera_mode="orbit", camera_auto_fit=False)

# select the first lidar pose as reference coordinates
p0 = dataset.poses(seq=seq, sensor=lidar)[0]

# iterate over sensors
for sensor in dataset.pcl_sensors + dataset.cam_sensors:
    if len(dataset.timestamps(seq=seq, sensor=sensor)) == 0:
        print(f"no data for sensor {sensor}")
        continue

    # get the sensor poses as a batched geometric transformation
    poses = dataset.poses(seq=seq, sensor=sensor)

    print(f"{sensor:15s} : {len(poses)} frames")

    # express poses relative to p0 instead of world coordinates
    poses = p0.inv() @ poses

    # Get sensor positions at each frame by projecting [0, 0, 0].
    # Note that [0, 0, 0] is automatically broadcasted to len(poses).
    origins = poses.apply(np.zeros([3]))

    # plot pose axes
    for edge, color in zip(np.eye(3) * 0.2, [0xFF0000, 0x00FF00, 0x0000FF]):
        # get the axis direction by projecting the (scaled) basis vector
        vectors = poses.apply(edge) - origins

        plot += k3d.vectors(
            origins.astype(np.float32),
            vectors.astype(np.float32),
            color=color,
            head_size=0.3,
            group=sensor,
        )

plot.camera = [6.44, -4.52, 3.66, 11.35, 6.17, -3.55, 0.0, 0.0, 1.0]
plot
LIDAR_TOP       : 198 frames
LIDAR_FRONT     : 198 frames
LIDAR_SIDE_LEFT : 198 frames
LIDAR_SIDE_RIGHT : 198 frames
LIDAR_REAR      : 198 frames
CAM_FRONT       : 198 frames
CAM_FRONT_LEFT  : 198 frames
CAM_FRONT_RIGHT : 198 frames
CAM_SIDE_LEFT   : 198 frames
CAM_SIDE_RIGHT  : 198 frames
no data for sensor CAM_REAR_LEFT
no data for sensor CAM_REAR
no data for sensor CAM_REAR_RIGHT
[3]:

Annotations in frame

Requesting annotations in image coordinates automatically interpolates object trajectories to the camera timestamp and projects the boxes into the image plane.

Two helpers reduce the boilerplate: tri3d.geometry.test_box_in_frame and tri3d.plot.plot_bbox_cam.

[4]:
# retrieve image
sample_image = dataset.image(seq, cam_frame, sensor=camera)

# retrieve boxes in image coordinates, with trajectories interpolated
# at the image timestamp if necessary
sample_annotations = dataset.boxes(seq, cam_frame, coords=image)
[5]:
img_size = sample_image.size
cmap = gen_discrete_cmap(len(dataset.det_labels))

plt.figure(dpi=125, figsize=(10, 6))
plt.xlim((0, img_size[0]))
plt.ylim((img_size[1], 0))

plt.imshow(sample_image)

for i, ann in enumerate(sample_annotations):
    # skip boxes where no vertex is in the frame
    if test_box_in_frame(ann.transform, ann.size, img_size):
        obj_id = str(getattr(ann, "instance", i))[:4]
        hdg = ann.heading / np.pi * 180
        color = cmap(dataset.det_labels.index(ann.label))
        u, v, z = ann.center
        print(f"  {obj_id:4s}: {ann.label:20s} at {z:3.0f}m heading {hdg:+.0f}°")

        # plot the bounding box
        plot_bbox_cam(ann.transform, ann.size, img_size, ec=color)

        # plot the uid
        plt.text(
            np.clip(u, 0, img_size[0]),
            np.clip(v, 0, img_size[1]),
            obj_id,
            horizontalalignment="center",
            verticalalignment="center",
            bbox=dict(facecolor="white", linewidth=0, alpha=0.7),
        )
  1   : VEHICLE              at  17m heading -16°
  2   : VEHICLE              at  28m heading -13°
  5   : SIGN                 at  56m heading -194°
  9   : SIGN                 at  21m heading -194°
  12  : VEHICLE              at  22m heading -102°
  16  : VEHICLE              at  47m heading -14°
  19  : VEHICLE              at  66m heading -8°
  21  : SIGN                 at  66m heading +53°
  22  : SIGN                 at  21m heading -194°
  25  : VEHICLE              at  76m heading -18°
  27  : SIGN                 at  21m heading -195°
_images/example_8_1.png

Lidar points over images

Requesting points in image coordinates automatically compensates for ego-car motion between the two sensor timestamps, and applies the extrinsic and intrinsic calibrations.

[6]:
# Retrieve image.
sample_image = dataset.image(seq, cam_frame, sensor=camera)

# Retrieve points in image coordinates.
# The Lidar frame with the nearest timestamp will be used and ego car movement between
# the lidar and camera timestamps will be compensated.
points = dataset.points(seq, cam_frame, sensor=lidar, coords=image)

# Discard extra dims (intensity, etc.).
uvz = points[:, :3]

# Only keep points that are visible in the image.
in_frame = (
    (uvz[:, 0] > 0)
    & (uvz[:, 1] > 0)
    & (uvz[:, 2] > 0) # in front of camera
    & (uvz[:, 0] < sample_image.size[0])
    & (uvz[:, 1] < sample_image.size[1])
)
uvz = uvz[in_frame]
[7]:
plt.figure(dpi=125, figsize=(10, 6))

plt.imshow(sample_image)
plt.scatter(uvz[:, 0], uvz[:, 1], c=uvz[:, 2], clim=(0, 40), s=1.0, alpha=1)

plt.xlim(0, sample_image.size[0])
plt.ylim(sample_image.size[1], 0)
plt.gca().set_aspect("equal")
_images/example_11_0.png

Lidar points and boxes

[8]:
# retrieve point cloud
sample_pcl = dataset.points(seq, frame, coords=lidar, sensor=lidar)

# retrieve boxes in lidar coordinates
sample_annotations = dataset.boxes(seq, frame, coords=lidar)
[9]:
plt.figure(dpi=125, figsize=(8, 8))

plt.scatter(
    sample_pcl[:, 0], sample_pcl[:, 1], s=5, c="k", alpha=0.6, edgecolors="none"
)

cmap = gen_discrete_cmap(len(dataset.det_labels))

plot_annotations_bev(
    [a.center for a in sample_annotations],
    [a.size for a in sample_annotations],
    [a.heading for a in sample_annotations],
    ax=plt.gca(),
    c=cmap([dataset.det_labels.index(a.label) for a in sample_annotations]),
)

plt.xlim(-25, 25)
plt.ylim(-25, 25)
plt.gca().set_aspect("equal")
_images/example_14_0.png
[10]:
xyz = dataset.points(seq, frame=frame, coords=lidar, sensor=lidar)[:, :3].astype(np.float32)

sample_annotations = dataset.boxes(seq, frame=frame, coords=lidar)
sample_annotations = [a for a in sample_annotations if a.label != "DontCare"]

c = to_k3d_colors(plt.get_cmap("viridis")((xyz[:, 2] + 2) / 4))

plot = k3d.plot(name="pcl", camera_mode="orbit", camera_auto_fit=False)

plot += k3d.points(positions=xyz, point_size=0.07, shader="flat", colors=c)

plot_bboxes_3d(
    plot,
    [a.transform for a in sample_annotations],
    [a.size for a in sample_annotations],
    c=0xFF0000,
)

plot.camera = [-0.52, -12.19, 17.0, 4.94, -1.7, 5.19, 0.0, 0.0, 1.0]
plot
/home/ngranger/venvs/tri3d/lib64/python3.12/site-packages/traittypes/traittypes.py:97: UserWarning: Given trait value dtype "int64" does not match required type "float32". A coerced copy has been created.
  warnings.warn(
[10]:
[11]:
xyz = dataset.points(seq, frame, sensor=lidar)[:, :3].astype(np.float32)
semantic = dataset.semantic(seq, frame, sensor=lidar)

cmap = gen_discrete_cmap(len(dataset.sem_labels))
c = to_k3d_colors(cmap(semantic)) * (semantic >= 0)

plot = k3d.plot(name="pcl", camera_mode="orbit", camera_auto_fit=False)

plot += k3d.points(positions=xyz, point_size=0.07, shader="flat", colors=c)

plot.camera = [-0.52, -12.19, 17.0, 4.94, -1.7, 5.19, 0.0, 0.0, 1.0]
plot
[11]: