LANe: Lighting-Aware Neural Fields for Compositional Scene Synthesis

Akshay Krishnan*1, Amit Raj*1, Xianling Zhang2, Alexandra Carlson2, Nathan Tseng2, Sandhya Sridhar2, Nikita Jaipuria2, James Hays1
1Georgia Institute of Technology 2Ford Autonomy

[Teaser figure; scene panels: DIV2K, Desert]

We present Lighting-Aware Neural Fields (LANe) for compositional scene synthesis. By disentangling the learned world model from a class-specific object model, our approach can arbitrarily compose objects into the scene. Our novel light-field-modulated representation allows the object model to be rendered in scenes in a lighting-aware manner.

The figure above shows the same world model learned and reused as the background scene in each row, with object models composed into the scene at arbitrary poses and locations under different lighting conditions. Note that the newly synthesized objects are shaded appropriately for the lighting of the scene in which they are placed, indicating that LANe can compose multiple object models with a world model to produce physically realistic and consistent results.

Abstract

Neural fields have recently enjoyed great success in representing and rendering 3D scenes. However, most state-of-the-art implicit representations model static or dynamic scenes as a whole, with only minor variations. Existing work on learning disentangled world and object fields does not consider the problem of composing objects into different world fields in a lighting-aware manner.

We present Lighting-Aware Neural Fields (LANe) for the compositional synthesis of driving scenes in a physically consistent manner. Specifically, we learn a scene representation that disentangles the static background and transient elements into a world-NeRF and class-specific object-NeRFs to allow compositional synthesis of multiple objects in the scene. Furthermore, we explicitly design both the world and object models to handle lighting variation, which allows us to compose objects into scenes with spatially varying lighting. This is achieved by constructing a light field of the scene and using it in conjunction with a learned shader to modulate the appearance of the object NeRFs. We demonstrate the performance of our model on a synthetic dataset of diverse lighting conditions rendered with the CARLA simulator, as well as a novel real-world dataset of cars collected at different times of the day.
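To make the shading mechanism concrete, the sketch below, assuming PyTorch, shows one way a learned shader could combine an object field's appearance feature with a lighting code queried from a scene light field at the same world position. LearnedShader, object_nerf, light_field, and all dimensions are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

class LearnedShader(nn.Module):
    # Hypothetical shader: maps (appearance feature, local lighting code) -> RGB.
    def __init__(self, feat_dim=32, light_dim=16, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + light_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, obj_feat, light_code):
        # The lighting code modulates the object's appearance per sample.
        return self.mlp(torch.cat([obj_feat, light_code], dim=-1))

# Trivial stand-ins for the object NeRF and the scene light field.
object_nerf = nn.Linear(3, 32)   # world position -> appearance feature
light_field = nn.Linear(3, 16)   # world position -> lighting code
shader = LearnedShader()

x_world = torch.rand(1024, 3)    # ray samples inside the object's bounds
rgb = shader(object_nerf(x_world), light_field(x_world))
print(rgb.shape)                 # torch.Size([1024, 3])

Because the lighting code is queried at the object's world position, moving the object to a differently lit region changes its shading without retraining the object model.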

Our approach composes objects learned from one scene into an entirely different scene while still respecting the lighting variations in the novel scene, and we show that it outperforms the state of the art in compositional scene synthesis on this challenging dataset setup.

Video

Method

Overview of the proposed approach. We model the scene with a separate world-NeRF, which is made lighting-aware by training on the same scene under different lighting conditions, and class-specific object-NeRFs, which use information from the world-NeRF during training.
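As a concrete illustration of composing the two fields at render time, the sketch below, again assuming PyTorch, queries both fields at each ray sample, sums their densities, blends their colors in proportion to density, and alpha-composites along the ray as in standard NeRF. The blending rule and shapes are assumptions for illustration; the paper's exact compositing scheme may differ.

import torch

def composite_ray(sigma_w, rgb_w, sigma_o, rgb_o, deltas):
    # sigma_*: (S,) densities; rgb_*: (S, 3) colors; deltas: (S,) step sizes.
    sigma = sigma_w + sigma_o
    w = (sigma_o / (sigma + 1e-8))[:, None]          # object's share per sample
    rgb = (1.0 - w) * rgb_w + w * rgb_o              # density-weighted color blend
    alpha = 1.0 - torch.exp(-sigma * deltas)
    trans = torch.cumprod(
        torch.cat([alpha.new_ones(1), 1.0 - alpha + 1e-8])[:-1], dim=0)
    return ((alpha * trans)[:, None] * rgb).sum(dim=0)  # final pixel RGB

S = 64
deltas = torch.full((S,), 0.05)
sigma_w, rgb_w = torch.rand(S), torch.rand(S, 3)     # world-NeRF outputs
sigma_o, rgb_o = torch.rand(S), torch.rand(S, 3)     # object-NeRF outputs
print(composite_ray(sigma_w, rgb_w, sigma_o, rgb_o, deltas).shape)  # torch.Size([3])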

Comparison

Comparison: LANe can synthesize scenes with object models that respect spatially varying lighting. This figure shows the object model moving through a scene with spatially varying lighting; the object becomes brighter as it moves from a region of shadow into a region of light.

Downstream Task Validation

The lack of diversity in lighting conditions is a known issue with manually curated autonomous driving datasets. For instance, the KITTI dataset was captured around noon, with similar lighting and shadow conditions across its sequences. Previous work such as SIMBAR has shown that deep learning models trained with such limited lighting conditions are unable to generalize to the plethora of lighting conditions encountered in the real world, and has validated the effectiveness of scene relighting as a useful data augmentation methodology for vision tasks. Our approach, LANe, can likewise be leveraged for lighting-aware data manipulation and data augmentation for downstream autonomous driving vision tasks.
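As a rough illustration of this use case, the self-contained sketch below re-renders one composed scene layout under several lighting codes and pairs each render with the unchanged labels (object geometry and pose are fixed, so annotations carry over). fake_lane_render is a trivial placeholder standing in for a trained LANe model, not its actual API.

import torch

def fake_lane_render(layout_code, light_code):
    # Placeholder render: returns an H x W x 3 image; brightness is crudely
    # scaled by the light code's norm to mimic relighting.
    base = torch.rand(128, 128, 3)
    return (base * light_code.norm().clamp(0.2, 1.0)).clamp(0.0, 1.0)

layout = torch.randn(8)                  # hypothetical scene/object layout code
labels = {"boxes": [[10, 20, 60, 80]]}   # 2D boxes, unchanged across relights
augmented = [(fake_lane_render(layout, torch.randn(4) * 0.3), labels)
             for _ in range(4)]
print(len(augmented), augmented[0][0].shape)  # 4 torch.Size([128, 128, 3])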