In a treatment allocation problem, the individuals to be treated often arrive gradually. Initially, when the first treatments are made, little is known about the effect of the treatments, but as more treatments are assigned the policy maker learns about their effects by observing outcomes. Thus, there is a tradeoff between exploring the available treatments to learn about their merits and exploiting the best treatment, i.e., administering it as often as possible, in order to maximise the cumulative welfare of all the assignments made. Furthermore, a policy maker may be interested not only in the expected effect of a treatment but also in its riskiness. We therefore allow the welfare function to depend on the first and second moments of the distribution of treatment outcomes. We propose a dynamic treatment policy that attains the minimax optimal regret relative to the unknown best treatment in this dynamic setting. We allow the data to arrive in batches because, for example, unemployment programs may only start once a month, or blood samples may only be sent to the laboratory in batches. Furthermore, we show that the minimax optimality does not come at the price of overly aggressive experimentation, as we provide upper bounds on the expected number of times any suboptimal treatment is assigned. We also consider the case where the outcome of a treatment is only observed with delay, as it may take time for the treatment to work. A doctor thus faces a tradeoff between getting imprecise information quickly, by measuring the outcome soon after the treatment is given, and getting precise information later, at the expense of less information for the individuals treated in the meantime. Finally, using Danish register data, we show how our treatment policy can be used to assign unemployed individuals to active labor market policy programs in order to maximise the probability of ending the unemployment spell.
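The abstract does not spell out the policy itself, so the following is only a minimal sketch of the kind of batched, explore-then-exploit allocation it describes, under explicit assumptions. Welfare is assumed to be mean minus `lam` times variance (one possible function of the first two moments). Outcomes are assumed to lie in [0, 1]. Treatments are assigned by batched successive elimination, a standard bandit technique used here for illustration and not claimed to be the proposed policy. The names `mean_variance_welfare` and `batched_elimination` and the confidence width are hypothetical.

```python
import math
import random


def mean_variance_welfare(samples, lam):
    """Assumed welfare: sample mean minus lam times (population) sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean - lam * var


def batched_elimination(draw, n_arms, n_batches, batch_size, lam=0.5, delta=0.05):
    """Hypothetical batched successive-elimination policy (illustrative only).

    draw(arm) returns one outcome in [0, 1]. Outcomes are only used for
    decisions after the whole batch has been assigned.
    """
    assert batch_size >= n_arms, "each active arm needs data in every batch"
    active = list(range(n_arms))
    data = {a: [] for a in range(n_arms)}
    counts = [0] * n_arms
    for _ in range(n_batches):
        # Spread the batch round-robin over the treatments still active.
        for i in range(batch_size):
            arm = active[i % len(active)]
            data[arm].append(draw(arm))
            counts[arm] += 1
        if len(active) == 1:
            continue
        # Estimated welfare and a Hoeffding-style width (illustrative constant).
        est = {a: mean_variance_welfare(data[a], lam) for a in active}
        log_term = math.log(2 * n_arms * n_batches / delta)
        rad = {a: (1 + lam) * math.sqrt(log_term / (2 * len(data[a]))) for a in active}
        best_lower = max(est[a] - rad[a] for a in active)
        # Eliminate treatments whose upper bound falls below the best lower bound.
        active = [a for a in active if est[a] + rad[a] >= best_lower]
    return active, counts


# Usage: three hypothetical binary-outcome treatments (e.g. "spell ended").
probs = [0.3, 0.5, 0.7]
rng = random.Random(0)
active, counts = batched_elimination(
    lambda a: 1.0 if rng.random() < probs[a] else 0.0,
    n_arms=3, n_batches=20, batch_size=300,
)
print(active, counts)
```

With binary outcomes the assumed welfare is p - lam * p * (1 - p), so the third treatment is best here. Once the width shrinks below the welfare gaps, the weaker treatments stop receiving assignments, which mirrors the abstract's point that suboptimal treatments should be assigned only a limited number of times.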