A novel deep multi-agent reinforcement learning framework is proposed to identify and resolve conflicts among a variable number of aircraft in a high-density, stochastic, and dynamic sector in en route airspace. Currently, sector capacity is limited by the cognitive limitations of human air traffic controllers. To scale up to high-density airspace, we investigate the feasibility of a new concept (autonomous separation assurance) and a new approach (multi-agent reinforcement learning) to push sector capacity beyond human cognitive limits. We propose using distributed vehicle autonomy to ensure separation, instead of a centralized sector air traffic controller. Our proposed framework uses an actor-critic model, Proximal Policy Optimization (PPO), which we customize to incorporate an attention network. The attention network encodes the information from a variable number of intruder aircraft into a fixed-length vector and allows the agents to learn which intruder aircraft's information is critical to achieving optimal performance. This gives the agents access to information about a variable number of aircraft in the sector in a scalable, efficient manner, so that they can achieve high traffic throughput under uncertainty. The agents are trained using a centralized learning, decentralized execution scheme in which one neural network is learned and shared by all agents in the environment. To validate the proposed framework, we designed three challenging case studies in the BlueSky air traffic control environment. Numerical results show that the proposed framework significantly reduces offline training time without sacrificing performance.
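
The fixed-length encoding of a variable number of intruders can be illustrated with a minimal sketch of scaled dot-product attention pooling. This is not the network used in this work: the function names (`attention_pool`, `matvec`, `rand_matrix`), the dimensions, the randomly initialized weight matrices, and the choice of dot-product scoring are all assumptions made for illustration. In a trained agent the weights would be learned jointly with the PPO actor and critic.

```python
import math
import random

STATE_DIM, D_K, D_V = 5, 8, 16  # hypothetical sizes, not taken from the paper


def rand_matrix(rows, cols, rng):
    """Random stand-in for a learned weight matrix (illustration only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(vec, mat):
    """Row vector (d_in,) times matrix (d_in x d_out) -> list of length d_out."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def attention_pool(own_state, intruder_states, W_q, W_k, W_v):
    """Encode any number of intruder states into one vector of length D_V.

    Returns (context, weights); weights[i] is the attention paid to intruder i.
    """
    d_v = len(W_v[0])
    if not intruder_states:
        # Assumption: with no intruders, fall back to an all-zero context.
        return [0.0] * d_v, []
    query = matvec(own_state, W_q)                      # ownship query
    keys = [matvec(s, W_k) for s in intruder_states]    # one key per intruder
    values = [matvec(s, W_v) for s in intruder_states]  # one value per intruder
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)                                   # stabilize the softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of values: the length is D_V regardless of intruder count.
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
    return context, weights


rng = random.Random(0)
W_q = rand_matrix(STATE_DIM, D_K, rng)
W_k = rand_matrix(STATE_DIM, D_K, rng)
W_v = rand_matrix(STATE_DIM, D_V, rng)
own = [rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)]

for n in (0, 1, 3, 10):
    intruders = [[rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)] for _ in range(n)]
    context, weights = attention_pool(own, intruders, W_q, W_k, W_v)
    print(n, len(context), len(weights))
```

Whatever the intruder count, the context vector has the same length, so it can be concatenated with the ownship state and passed to a fixed-size policy network. The attention weights indicate which intruders contribute most to that summary.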