We tackle the task of synthesizing novel views of an object given a few input images and associated camera viewpoints. Our work is inspired by recent 'geometry-free' approaches where multi-view images are encoded as a (global) set-latent representation, which is then used to predict the color for arbitrary query rays. While this representation yields (coarsely) accurate images corresponding to novel viewpoints, the lack of geometric reasoning limits the quality of these outputs. To overcome this limitation, we propose 'Geometry-biased Transformers' (GBTs) that incorporate geometric inductive biases in the set-latent representation-based inference to encourage multi-view geometric consistency. We induce the geometric bias by augmenting the dot-product attention mechanism to also incorporate 3D distances between rays associated with tokens as a learnable bias. We find that this, along with camera-aware embeddings as input, allows our models to generate significantly more accurate outputs. We validate our approach on the real-world CO3D dataset, where we train our system over 10 categories and evaluate its view-synthesis ability for novel objects as well as unseen categories. We empirically validate the benefits of the proposed geometric biases and show that our approach significantly improves over prior works.
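
To make the geometric bias concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that the distance between two rays is the shortest distance between the infinite lines through them, and that the bias enters the attention logits as a subtracted term `gamma**2 * dist**2`; the abstract specifies neither choice. The names `ray_distance`, `biased_attention`, and `gamma` are hypothetical, and `gamma` is a fixed float here rather than a learned parameter.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def _norm(a):
    return math.sqrt(_dot(a, a))


def ray_distance(o1, d1, o2, d2, eps=1e-8):
    """Shortest distance between the infinite lines through two rays.

    Each ray is an origin `o` and a (not necessarily unit) direction `d`.
    Treating rays as lines is an assumption made for this sketch.
    """
    d1 = [x / _norm(d1) for x in d1]
    d2 = [x / _norm(d2) for x in d2]
    diff = _sub(o2, o1)
    normal = _cross(d1, d2)
    normal_len = _norm(normal)
    if normal_len < eps:
        # Parallel lines: distance from o2 to the line through o1.
        return _norm(_cross(diff, d1))
    # Skew or intersecting lines: project the offset onto the common normal.
    return abs(_dot(diff, normal)) / normal_len


def biased_attention(query, keys, values, dists, gamma):
    """Single-query attention with a distance penalty on the logits.

    Assumed form: logit_j = q . k_j / sqrt(d) - gamma^2 * dist_j^2,
    so keys whose rays pass far from the query ray are down-weighted.
    """
    scale = math.sqrt(len(query))
    logits = [
        _dot(query, k) / scale - (gamma ** 2) * (d ** 2)
        for k, d in zip(keys, dists)
    ]
    # Numerically stable softmax over the keys.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return out, weights
```

With `gamma = 0` this reduces to ordinary scaled dot-product attention. As `gamma` grows, the weights concentrate on keys whose rays pass close to the query ray, which is how such a bias could encourage multi-view consistency.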