Abstract
Vision-based robot manipulation uses cameras to capture one or more images of a scene containing the objects to be manipulated. Taking multiple images can help if an object is occluded from one viewpoint but more visible from another. However, the camera has to be moved to a sequence of suitable positions to capture multiple images, which takes time and may not always be possible due to reachability constraints. So while additional images can produce more accurate grasp poses due to the extra information available, the time cost goes up with the number of additional views sampled. Scene representations like Gaussian Splatting are capable of rendering accurate photorealistic virtual images from user-specified novel viewpoints. In this work, we show initial results which indicate that novel view synthesis can provide additional context in generating grasp poses. Our experiments on the GraspNet-1Billion dataset show that novel views contributed force-closure grasps in addition to the force-closure grasps obtained from sparsely sampled real views, while also improving grasp coverage. In the future, we hope this work can be extended to improve grasp extraction from radiance fields constructed with a single input image, using for example diffusion models or generalizable radiance fields.