Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, and so methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras. Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors that are now available on modern smartphones.
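To illustrate why a processed depth map loses information that the raw measurements retain, the sketch below uses a textbook, idealized continuous-wave ToF model. It is not the authors' formulation. It assumes a single sinusoidal modulation frequency, a single bounce with no noise, and a measurement represented as a complex phasor. The function names (`unambiguous_range`, `phasor`, `naive_depth`) are hypothetical. Depth is encoded in the phasor's phase, which wraps every c / (2f) meters, so converting to depth discards the amplitude (which is low in dark regions) and aliases far points into the near range.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def unambiguous_range(freq_hz: float) -> float:
    """Maximum depth before the CW-ToF phase wraps around: c / (2 f)."""
    return C / (2.0 * freq_hz)

def phasor(depth_m: float, amplitude: float, freq_hz: float) -> tuple[float, float]:
    """Idealized single-path CW-ToF measurement for a point at depth_m.

    Assumption: light travels 2 * depth, so the phase shift is 4*pi*f*d/c.
    The amplitude (tied to reflectance) scales the phasor's magnitude.
    Returns (real, imag) = amplitude * (cos(phi), sin(phi)).
    """
    phi = 4.0 * math.pi * freq_hz * depth_m / C
    return amplitude * math.cos(phi), amplitude * math.sin(phi)

def naive_depth(re: float, im: float, freq_hz: float) -> float:
    """Depth recovered from phase alone; correct only modulo the unambiguous range."""
    phi = math.atan2(im, re) % (2.0 * math.pi)  # wrap phase into [0, 2*pi)
    return phi * C / (4.0 * math.pi * freq_hz)

if __name__ == "__main__":
    f = 30e6  # hypothetical 30 MHz modulation frequency
    print(f"unambiguous range: {unambiguous_range(f):.3f} m")
    for d in (1.0, 6.0):
        re, im = phasor(d, amplitude=1.0, freq_hz=f)
        print(f"true depth {d:.1f} m -> phase-only depth {naive_depth(re, im, f):.3f} m")
```

At 30 MHz the unambiguous range is about 5 m, so a point at 6 m is reported at roughly 1 m by the phase-only conversion. A model that reasons over the raw phasor, combined with other scene constraints, is not forced into that single wrapped interpretation.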