{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting started\n", "\n", "This notebook showcases the main functionalities of tri3d.\n", "It can be run for any of the supported dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "import k3d\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "from tri3d.datasets import NuScenes, Waymo, ZODFrames\n", "from tri3d.geometry import test_box_in_frame\n", "from tri3d.misc import nearest_sorted\n", "from tri3d.plot import (\n", " gen_discrete_cmap,\n", " plot_annotations_bev,\n", " plot_bbox_cam,\n", " plot_bboxes_3d,\n", " to_k3d_colors,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uncomment the desired dataset below to visualize another dataset.\n", "The rest of the notebook should run as-is thanks to the shared API." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false }, "tags": [] }, "outputs": [], "source": [ "# dataset = NuScenes(\"../../datasets/nuscenes\", \"v1.0-mini\")\n", "# camera, image, lidar = \"CAM_FRONT\", \"IMG_FRONT\", \"LIDAR_TOP\"\n", "\n", "# dataset = Once(\"../../datasets/once\", \"train\")\n", "# camera, image, lidar = \"cam01\", \"img01\", \"lidar_roof\"\n", "\n", "# dataset = SemanticKITTI(\"../../datasets/semantickitti\")\n", "# camera, image, lidar = None, None, \"velodyne\"\n", "\n", "dataset = Waymo(\"../../datasets/waymo\", split=\"training\")\n", "camera, image, lidar = \"CAM_FRONT\", \"IMG_FRONT\", \"LIDAR_TOP\"\n", "\n", "# dataset = ZODFrames(\n", "# \"../../datasets/zodframes\",\n", "# \"trainval-frames-mini.json\",\n", "# \"train\",\n", "# )\n", "# camera, image, lidar = \"front\", \"img_front\", \"velodyne\"\n", "\n", "if isinstance(dataset, Waymo): # waymo does not have segmentation for all frames\n", " seq = 0\n", " frame = dataset.frames(seq, sensor=\"SEG_LIDAR_TOP\")[0]\n", "elif isinstance(dataset, NuScenes):\n", " seq = 5\n", " frame = dataset.keyframes(seq, lidar)[13]\n", "elif isinstance(dataset, ZODFrames):\n", " seq = 1\n", " ann_ts = dataset.timestamps(seq, sensor=\"boxes\")[0]\n", " frame = nearest_sorted(dataset.timestamps(seq, lidar), ann_ts)\n", "else:\n", " seq, frame = 1, 0\n", "\n", "cam_frame = nearest_sorted(\n", " dataset.timestamps(seq, camera), dataset.timestamps(seq, lidar)[frame]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pose and calibration\n", "\n", "In order to plot sensor poses, we first use `Dataset.cam_sensors` and `Dataset.pcl_sensors` to get the sensor names.\n", "The list of available frames for these sensors is given by `Dataset.frames()`.\n", "\n", "Using `Dataset.poses()` returns the poses of a given sensor at every sample frame available for that sensor.\n", "Poses are expressed in an arbitrary *world* coordinate system which is defined per sequence.\n", "\n", "A pose is actually the geometric transform which takes a point in the sensor-local coordinates and returns it in *world* woordinates.\n", "In tri3d, poses are stored as instances of `tri3d.geometry.Transform`, which contain a batch of transformations (one for each frame).\n", "The geometric transformations objects support inversion, composition and, of course, application to 3D points." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot = k3d.plot(name=\"pcl\", camera_mode=\"orbit\", camera_auto_fit=False)\n", "\n", "# select the first lidar pose as reference coordinates\n", "p0 = dataset.poses(seq=seq, sensor=lidar)[0]\n", "\n", "# iterate over sensors\n", "for sensor in dataset.pcl_sensors + dataset.cam_sensors:\n", " if len(dataset.timestamps(seq=seq, sensor=sensor)) == 0:\n", " print(f\"no data for sensor {sensor}\")\n", " continue\n", "\n", " # get the sensor poses as a batched geometric transformation\n", " poses = dataset.poses(seq=seq, sensor=sensor)\n", "\n", " print(f\"{sensor:15s} : {len(poses)} frames\")\n", "\n", " # express poses relative to p0 instead of world coordinates\n", " poses = p0.inv() @ poses\n", "\n", " # Get sensor positions at each frame by projecting [0, 0, 0].\n", " # Note that [0, 0, 0] is automatically broadcasted to len(poses).\n", " origins = poses.apply(np.zeros([3]))\n", "\n", " # plot pose axes\n", " for edge, color in zip(np.eye(3) * 0.2, [0xFF0000, 0x00FF00, 0x0000FF]):\n", " # get axis vector by projecting [1, 0, 0], [0, 1, 0], [0, 0, 1]\n", " vectors = poses.apply(edge) - origins\n", "\n", " plot += k3d.vectors(\n", " origins.astype(np.float32),\n", " vectors.astype(np.float32),\n", " color=color,\n", " head_size=0.3,\n", " group=sensor,\n", " )\n", "\n", "plot.camera = [6.44, -4.52, 3.66, 11.35, 6.17, -3.55, 0.0, 0.0, 1.0]\n", "plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Annotations in frame\n", "\n", "Requesting annotations in image coordinates will automatically interpolate and project annotations to the timestamps and coordinates of the camera.\n", "\n", "Two helpers are used to reduce the boilerplate: `tri3d.plot.test_box_in_frame` and `tri3d.plot.plot_bbox_cam`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false }, "tags": [] }, "outputs": [], "source": [ "# retrieve image\n", "sample_image = dataset.image(seq, cam_frame, sensor=camera)\n", "\n", "# retrieve boxes in image coordinates, with trajectories interpolated\n", "# at the image timestamp if necessary\n", "sample_annotations = dataset.boxes(seq, cam_frame, coords=image)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img_size = sample_image.size\n", "cmap = gen_discrete_cmap(len(dataset.det_labels))\n", "\n", "plt.figure(dpi=125, figsize=(10, 6))\n", "plt.xlim((0, img_size[0]))\n", "plt.ylim((img_size[1], 0))\n", "\n", "plt.imshow(sample_image)\n", "\n", "for i, ann in enumerate(sample_annotations):\n", " # skip boxes where no vertex is in the frame\n", " if test_box_in_frame(ann.transform, ann.size, img_size):\n", " obj_id = str(getattr(ann, \"instance\", i))[:4]\n", " hdg = ann.heading / np.pi * 180\n", " color = cmap(dataset.det_labels.index(ann.label))\n", " u, v, z = ann.center\n", " print(f\" {obj_id:4s}: {ann.label:20s} at {z:3.0f}m heading {hdg:+.0f}°\")\n", "\n", " # plot the bounding box\n", " plot_bbox_cam(ann.transform, ann.size, img_size, ec=color)\n", "\n", " # plot the uid\n", " plt.text(\n", " np.clip(u, 0, img_size[0]),\n", " np.clip(v, 0, img_size[1]),\n", " obj_id,\n", " horizontalalignment=\"center\",\n", " verticalalignment=\"center\",\n", " bbox=dict(facecolor=\"white\", linewidth=0, alpha=0.7),\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lidar points over images\n", "\n", "Requesting points in image coordinates will automatically compensate for ego car movement between the two sensor timestamps.\n", "It will also apply extrinsic and intrinsic calibrations." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "# Retrieve image.\n", "sample_image = dataset.image(seq, cam_frame, sensor=camera)\n", "\n", "# Retrieve points in image coordinates.\n", "# The Lidar frame with the nearest timestamp will be used and ego car movement between \n", "# the lidar and camera timestamps will be compensated.\n", "points = dataset.points(seq, cam_frame, sensor=lidar, coords=image)\n", "\n", "# Discard extra dims (intensity, etc.).\n", "uvz = points[:, :3]\n", "\n", "# Only keep points that are visible in the image.\n", "in_frame = (\n", " (uvz[:, 0] > 0)\n", " & (uvz[:, 1] > 0)\n", " & (uvz[:, 2] > 0) # in front of camera\n", " & (uvz[:, 0] < sample_image.size[0])\n", " & (uvz[:, 1] < sample_image.size[1])\n", ")\n", "uvz = uvz[in_frame]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(dpi=125, figsize=(10, 6))\n", "plt.xlim((0, img_size[0]))\n", "plt.ylim((img_size[1], 0))\n", "\n", "plt.imshow(sample_image)\n", "plt.scatter(uvz[:, 0], uvz[:, 1], c=uvz[:, 2], clim=(0, 40), s=1., alpha=1)\n", "\n", "plt.xlim(0, sample_image.size[0])\n", "plt.ylim(sample_image.size[1], 0)\n", "plt.gca().set_aspect(\"equal\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lidar points and boxes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# retrieve point cloud\n", "sample_pcl = dataset.points(seq, frame, coords=lidar, sensor=lidar)\n", "\n", "# retrieve boxes in lidar coordinates\n", "sample_annotations = dataset.boxes(seq, frame, coords=lidar)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(dpi=125, figsize=(8, 8))\n", "\n", "plt.scatter(\n", " sample_pcl[:, 0], sample_pcl[:, 1], s=5, c=\"k\", alpha=0.6, edgecolors=\"none\"\n", ")\n", "\n", "cmap = gen_discrete_cmap(len(dataset.det_labels))\n", "\n", "plot_annotations_bev(\n", " [a.center for a in sample_annotations],\n", " [a.size for a in sample_annotations],\n", " [a.heading for a in sample_annotations],\n", " ax=plt.gca(),\n", " c=cmap([dataset.det_labels.index(a.label) for a in sample_annotations]),\n", ")\n", "\n", "plt.xlim(-25, 25)\n", "plt.ylim(-25, 25)\n", "plt.gca().set_aspect(\"equal\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xyz = dataset.points(seq, frame=frame, coords=lidar, sensor=lidar)[:, :3].astype(np.float32)\n", "\n", "sample_annotations = dataset.boxes(seq, frame=frame, coords=lidar)\n", "sample_annotations = [a for a in sample_annotations if a.label != \"DontCare\"]\n", "\n", "c = to_k3d_colors(plt.get_cmap(\"viridis\")((xyz[:, 2] + 2) / 4))\n", "\n", "plot = k3d.plot(name=\"pcl\", camera_mode=\"orbit\", camera_auto_fit=False)\n", "\n", "plot += k3d.points(positions=xyz, point_size=0.07, shader=\"flat\", colors=c)\n", "\n", "plot_bboxes_3d(\n", " plot,\n", " [a.transform for a in sample_annotations],\n", " [a.size for a in sample_annotations],\n", " c=0xFF0000,\n", ")\n", "\n", "plot.camera = [-0.52, -12.19, 17.0, 4.94, -1.7, 5.19, 0.0, 0.0, 1.0]\n", "plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xyz = dataset.points(seq, frame, sensor=lidar)[:, :3].astype(np.float32)\n", "semantic = dataset.semantic(seq, frame, sensor=lidar)\n", "\n", "cmap = gen_discrete_cmap(len(dataset.sem_labels))\n", "c = to_k3d_colors(cmap(semantic)) 
* (semantic >= 0)\n", "\n", "plot = k3d.plot(name=\"pcl\", camera_mode=\"orbit\", camera_auto_fit=False)\n", "\n", "plot += k3d.points(positions=xyz, point_size=0.07, shader=\"flat\", colors=c)\n", "\n", "plot.camera = [-0.52, -12.19, 17.0, 4.94, -1.7, 5.19, 0.0, 0.0, 1.0]\n", "plot" ] } ], "metadata": { "kernelspec": { "display_name": "tri3d", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.11" } }, "nbformat": 4, "nbformat_minor": 4 }