Abstract
Long-range tasks demand reasoning over long inputs. Current solutions require large compute budgets, training data, model weight access, or complex task-specific designs. We introduce PRISM, which processes information as a stream of chunks while maintaining a structured in-context memory specified with a typed hierarchical schema. PRISM outperforms baselines on diverse tasks while using at least 4x shorter contexts than long-context models. This approach is token-efficient, producing concise outputs and efficiently leveraging key-value (KV) caches to reduce costs by up to 54% compared to alternative short-context methods. PRISM scales down to tiny chunks (<500 tokens) without increasing encoding costs or sacrificing quality, and generalizes to new tasks with minimal effort by automatically generating schemas from task descriptions.
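The chunk-by-chunk processing with a structured memory described above can be illustrated with a minimal sketch. This is not the PRISM implementation: the `Memory` class, the `propose_revision` callback, and the toy reviser are hypothetical names introduced here for illustration, and the real system would use a language model call where the toy reviser appears. The sketch only shows the general shape of the loop, in which each step sees the current memory plus one short chunk and returns a revision that is merged into the memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical revision type: topic name -> list of short facts to add.
Revision = Dict[str, List[str]]


@dataclass
class Memory:
    """Assumed typed, hierarchical memory: topic -> list of facts."""
    facts: Dict[str, List[str]] = field(default_factory=dict)

    def apply(self, revision: Revision) -> None:
        # Merge a proposed revision, skipping facts already stored.
        for topic, new_facts in revision.items():
            bucket = self.facts.setdefault(topic, [])
            for fact in new_facts:
                if fact not in bucket:
                    bucket.append(fact)


def chunk_tokens(tokens: List[str], size: int) -> Iterable[List[str]]:
    # Split the input into consecutive chunks of at most `size` tokens.
    for start in range(0, len(tokens), size):
        yield tokens[start:start + size]


def process_stream(
    tokens: List[str],
    size: int,
    propose_revision: Callable[[Memory, List[str]], Revision],
) -> Memory:
    # Each step sees only the current memory and one chunk,
    # never the full input.
    memory = Memory()
    for chunk in chunk_tokens(tokens, size):
        memory.apply(propose_revision(memory, chunk))
    return memory


def toy_reviser(memory: Memory, chunk: List[str]) -> Revision:
    # Stand-in for a language model call (an assumption of this sketch):
    # record capitalized tokens under an "entities" topic.
    names = [t for t in chunk if t[:1].isupper()]
    return {"entities": names} if names else {}


tokens = "Alice met Bob in Paris and Alice left".split()
result = process_stream(tokens, 3, toy_reviser)
print(result.facts)
```

Because the memory is updated through small revisions rather than rewritten in full, each step's output stays short, which is in the spirit of the token efficiency claimed above.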