Can NeRFs "See" Without Cameras?

NeurIPS 2025
1 University of Illinois Urbana-Champaign; 2 Amazon
* Equal contribution

1. NeRFs can see with cameras; can we generalize them to RF signals?

Left: vanilla NeRF—cameras capture light from an object (e.g., a bulldozer). Right: EchoNeRF—phones capture RF multipath echoes from a house.

In NeRFs (left), the sensors are cameras that record light (as RGB images) coming from an object of interest (the bulldozer). From these multi-view RGB images, (optical) NeRFs learn to synthesize novel views of the object and reconstruct its shape. Similarly, in EchoNeRF (right), the sensors are phones that measure RF signals (as signal strength) in an environment (the house), except that the measurements are now made inside the environment rather than outside it. Given a few such RF measurements, EchoNeRF learns to infer the spatial layout of the house.

TL;DR: NeRFs use cameras to see objects; EchoNeRF uses RF signals to see environments.

2. Challenges

Overview of Camera NeRFs vs EchoNeRF

A camera pixel (left) primarily captures light arriving directly from the object in front of it—along the line-of-sight (LoS) path (see [1]). In contrast, an RF sensor (right) such as a phone receives the LoS path and multiple unknown reflections, or "echoes," from walls and other objects in the surroundings. Previous works [2], [3] attempted to reuse NeRF-style reasoning for RF signals; although signal prediction was demonstrated, the core inverse problem of reconstructing geometry remains unsolved. EchoNeRF redesigns NeRFs to learn from unknown multipath signals, enabling it to "see" the environment, thereby solving the inverse problem.
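Schematically, the two measurement models can be contrasted as follows (a simplified sketch: the camera case uses the standard alpha-compositing form of [1], and the RF case previews the notation made precise in Section 3):

\[ \hat{C}(\mathbf{r}) \;=\; \sum_{i} \Big(\prod_{k<i}\big(1-\alpha_k\big)\Big)\,\alpha_i\,\mathbf{c}_i \qquad \text{(camera: samples along one known ray per pixel)} \]

\[ \psi^{*} \;\approx\; \psi_{\mathrm{LoS}} \;+\; \sum_{j} \psi_{\mathrm{ref}}(v_j) \qquad \text{(RF: LoS path plus echoes from unknown voxels } v_j\text{)} \]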

3. Contribution

In EchoNeRF, every voxel is parameterized by its opacity \(\delta\) and a (discrete) orientation \(\omega\). Using these quantities, we analytically model both the line-of-sight and first-order reflected signals, which are then aggregated to approximate the received signal \(\psi^*\).

\[ \tilde{\psi} = \psi_{\mathrm{LoS}} + \psi_{\mathrm{ref}_1} \]

\[ \psi_{\mathrm{LoS}} = K \;\frac{\displaystyle \prod_{\{i \mid v_i \in \mathrm{LoS}\}} \big(1-\delta_i\big)}{d^2} \]

\[ \psi_{\mathrm{ref}_1} = \displaystyle \sum_{\{j \mid v_j \in \mathcal{V}\}} \psi_{\mathrm{ref}}(v_j) \]

\[ \psi_{\mathrm{ref}}(v_j) = \delta_{j}\; f(\theta,\beta)\; \frac{\displaystyle \prod_{k \in \{ \mathrm{Rx}: v_j \}} \big(1-\delta_{k}\big)\; \prod_{l \in \{ v_j : \mathrm{Tx} \}} \big(1-\delta_{l}\big) }{\left(d_{\mathrm{Tx}:v_j} + d_{v_j:\mathrm{Rx}}\right)^2} \]
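To make the model concrete, below is a minimal NumPy sketch of this forward model on a 2D voxel grid. It is an illustrative approximation, not the paper's implementation: the constant \(K\), the reflection gain \(f(\theta,\beta)\) (replaced here by a fixed scalar), the grid size, and the naive sampling of voxels along each segment are all assumptions.

    import numpy as np

    def transmittance(delta, p0, p1, n_samples=64):
        """Product of (1 - delta_i) over voxels sampled along the segment p0 -> p1
        (endpoint excluded, so a reflecting voxel does not attenuate its own echo)."""
        ts = np.linspace(0.0, 1.0, n_samples, endpoint=False)
        pts = p0[None, :] + ts[:, None] * (p1 - p0)[None, :]
        idx = np.clip(pts.astype(int), 0, np.array(delta.shape) - 1)
        return np.prod(1.0 - delta[idx[:, 0], idx[:, 1]])

    def psi_los(delta, tx, rx, K=1.0):
        """Line-of-sight term: K * prod(1 - delta_i) / d^2."""
        d = np.linalg.norm(rx - tx)
        return K * transmittance(delta, tx, rx) / d**2

    def psi_ref1(delta, tx, rx, f_reflect=0.5):
        """First-order reflections, summed over voxels v_j:
        delta_j * f(theta, beta) * T(Rx:v_j) * T(v_j:Tx) / (d_Tx:vj + d_vj:Rx)^2.
        f(theta, beta) is replaced by the constant f_reflect (assumption)."""
        total = 0.0
        for j in np.argwhere(delta > 1e-3):        # skip near-transparent voxels
            vj = j.astype(float)
            d1 = np.linalg.norm(vj - tx)
            d2 = np.linalg.norm(rx - vj)
            total += (delta[j[0], j[1]] * f_reflect
                      * transmittance(delta, rx, vj)
                      * transmittance(delta, tx, vj)
                      / (d1 + d2)**2)
        return total

    def psi_tilde(delta, tx, rx):
        """Approximate received signal: LoS term plus first-order reflections."""
        return psi_los(delta, tx, rx) + psi_ref1(delta, tx, rx)

    # Toy example: a 32x32 grid with one opaque "wall", one Tx and one Rx.
    delta = np.zeros((32, 32))
    delta[16, 5:28] = 0.9
    tx, rx = np.array([5.0, 10.0]), np.array([25.0, 20.0])
    print("predicted power:", psi_tilde(delta, tx, rx))

Evaluating \(\tilde{\psi}\) in this way over a grid of receiver locations yields the kind of RSSI map used for the signal-power prediction task in Section 5.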

We also introduce a two-stage training algorithm to stabilize learning, since the line-of-sight component dominates the received signal.

4. Results

Qualitative comparison of reconstructed floorplans against ground truth and baselines

We evaluate EchoNeRF on the Zillow Indoor Dataset [4], which contains real-world floorplans. The figure above shows a qualitative comparison of ground-truth floorplans against the baselines and our models. In the first row, red stars denote Tx locations and light-gray dots denote Rx measurement locations. The bottom two rows show floorplans learned by our Stage 1 and Stage 2 models, which recover sharper walls and boundaries.

5. Downstream tasks EchoNeRF can solve

Because EchoNeRF solves the inverse problem, it enables several downstream tasks:

  • Signal Power (RSSI) Prediction: We use EchoNeRF's signal model to predict the RSSI at uniform grid locations across the predicted floorplan. We do this for the transmitters (Tx) used during training (red star) as well as for new, unseen Tx locations (green star). Observe that the baselines overfit to the training Tx locations and fail to generalize to new ones; EchoNeRF, on the other hand, generalizes well to unseen Tx locations.
  • Channel Impulse Response (CIR) Prediction: We randomly select a Tx-Rx pair inside the predicted floorplan and generate the CIRs using NVIDIA's Sionna simulator [5] (an illustrative Sionna sketch follows this list).
  • Basic ray tracing: We also show ray-tracing results by generating higher-order reflections (up to order 3) with Sionna on the predicted floorplans. Tx and Rx are shown in the image as red and green stars, respectively.
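As an illustration of how such CIRs and ray-traced paths can be generated, the sketch below drives Sionna's ray tracer [5] on a reconstructed layout. API names follow Sionna 0.x releases and may differ in newer versions; the scene file "predicted_floorplan.xml" is a placeholder (exporting the predicted floorplan to a Sionna/Mitsuba scene is a separate step not shown), and the Tx/Rx positions are arbitrary.

    from sionna.rt import load_scene, Transmitter, Receiver, PlanarArray

    # Load a scene exported from the predicted floorplan (placeholder path).
    scene = load_scene("predicted_floorplan.xml")
    scene.frequency = 2.4e9        # carrier frequency in Hz (assumed value)

    # Single isotropic antenna at each end.
    scene.tx_array = PlanarArray(num_rows=1, num_cols=1,
                                 vertical_spacing=0.5, horizontal_spacing=0.5,
                                 pattern="iso", polarization="V")
    scene.rx_array = scene.tx_array

    # Arbitrary Tx/Rx positions inside the reconstructed layout.
    scene.add(Transmitter(name="tx", position=[1.0, 2.0, 1.5]))
    scene.add(Receiver(name="rx", position=[4.0, 3.0, 1.5]))

    # Trace propagation paths up to third-order reflections.
    paths = scene.compute_paths(max_depth=3)

    # Channel impulse response: per-path complex gains `a` and delays `tau`.
    a, tau = paths.cir()
    print(a.shape, tau.shape)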

References

  1. B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis," ECCV, 2020. [arXiv]
  2. X. Zhao, Z. An, Q. Pan, and L. Yang, "NeRF2: Neural Radio-Frequency Radiance Fields," Proceedings of the 29th Annual International Conference on Mobile Computing and Networking (MobiCom), 2023. [arXiv]
  3. H. Lu, C. Vattheuer, B. Mirzasoleiman, and O. Abari, "NeWRF: A Deep Learning Framework for Wireless Radiation Field Reconstruction and Channel Prediction," in Forty-first International Conference on Machine Learning (ICML), 2024. [arXiv]
  4. S. Cruz, W. Hutchcroft, Y. Li, R. Martin-Brualla, D. B. Goldman, M. W. Turek, and S. Izadi, "Zillow Indoor Dataset: Annotated Floor Plans with 360° Panoramas and 3D Room Layouts," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. [PDF]
  5. J. Hoydis, S. Cammerer, F. Ait Aoudia, A. Vem, N. Binder, G. Marcus, and A. Keller, "Sionna: An Open-Source Library for Next-Generation Physical Layer Research," arXiv preprint arXiv:2203.11854, 2022. [arXiv]

BibTeX

    @inproceedings{echo-nerf-2025,
      title     = {Can NeRFs "See" Without Cameras?},
      author    = {Amballa, Chaitanya and Basu, Sattwik and Wei, Yu-Lin and Yang, Zhijian and Ergezer, Mehmet and Choudhury, Romit Roy},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      year      = {2025},
      url       = {https://arxiv.org/pdf/2505.22441},
      eprint    = {2505.22441},
      archivePrefix = {arXiv}
    }