Abstract:Indoor environments lack the spatial intelligence infrastructure that GPS provides outdoors; first responders arriving at unfamiliar buildings typically have no machine-readable map of safety equipment. Prior work on 3D semantic segmentation for public safety identified two barriers: scarcity of labeled indoor training data and poor recognition of small safety-critical features by native point-cloud methods. This paper presents INSIGHT, a zero-target-domain-annotation pipeline that projects 2D image understanding into 3D metric space via registered RGB-D data. Two interchangeable vision stacks share a common 3D back end: a SAM3 foundation-model stack for text-prompted segmentation, and a traditional CV stack (open-set detection, VQA, OCR) whose intermediate outputs are independently inspectable. Evaluated on all seven subareas of Stanford 2D-3D-S (70{,}496 images), the pipeline produces Pointcept-schema-compatible labeled point clouds and ISO~19164-compliant scene graphs with ${\sim}10^{4}{\times}$ compression; role-filtered payloads transmit in ${<}15$\,s at 1\,Mbps over FirstNet Band~14. We report per-point labeling accuracy on 7 shared classes, detection sensitivity for 15 safety-critical classes absent from public 3D benchmarks alongside code-capped deployable estimates, and inter-pipeline complementarity, demonstrating that 2D-to-3D semantic transfer addresses the labeled-data bottleneck while scene graphs provide building intelligence compact enough for field deployment.
Abstract:This study analyzes semantic segmentation performance across heterogeneously labeled point-cloud datasets relevant to public safety applications, including pre-incident planning systems derived from lidar scans. Using NIST's Point Cloud City dataset (Enfield and Memphis collections), we investigate challenges in unifying differently labeled 3D data. Our methodology employs a graded schema with the KPConv architecture, evaluating performance through IoU metrics on safety-relevant features. Results indicate performance variability: geometrically large objects (e.g. stairs, windows) achieve higher segmentation performance, suggesting potential for navigational context, while smaller safety-critical features exhibit lower recognition rates. Performance is impacted by class imbalance and the limited geometric distinction of smaller objects in typical lidar scans, indicating limitations in detecting certain safety-relevant features using current point-cloud methods. Key identified challenges include insufficient labeled data, difficulties in unifying class labels across datasets, and the need for standardization. Potential directions include automated labeling and multi-dataset learning strategies. We conclude that reliable point-cloud semantic segmentation for public safety necessitates standardized annotation protocols and improved labeling techniques to address data heterogeneity and the detection of small, safety-critical elements.