Abstract
We present a scalable algorithm for learning parametric constraints in highdimensions from safe expert demonstrations. To reduce the illposedness of theconstraint recovery problem, our method uses hitandrun sampling to generatelower cost, and thus unsafe, trajectories. Both safe and unsafe trajectoriesare used to obtain a representation of the unsafe set that is compatible withthe data by solving an integer program in that representation's parameterspace. Our method can either leverage a known parameterization or incrementallygrow a parameterization while remaining consistent with the data, and weprovide theoretical guarantees on the conservativeness of the recovered unsafeset. We evaluate our method on highdimensional constraints forhighdimensional systems by learning constraints for 7DOF arm, quadrotor, andplanar pushing examples, and show that our method outperforms baselineapproaches.
Quick Read (beta)
Learning Parametric Constraints in High Dimensions from Demonstrations
Abstract
We present a scalable algorithm for learning parametric constraints in high dimensions from safe expert demonstrations. To reduce the illposedness of the constraint recovery problem, our method uses hitandrun sampling to generate lower cost, and thus unsafe, trajectories. Both safe and unsafe trajectories are used to obtain a representation of the unsafe set that is compatible with the data by solving an integer program in that representation’s parameter space. Our method can either leverage a known parameterization or incrementally grow a parameterization while remaining consistent with the data, and we provide theoretical guarantees on the conservativeness of the recovered unsafe set. We evaluate our method on highdimensional constraints for highdimensional systems by learning constraints for 7DOF arm, quadrotor, and planar pushing examples, and show that our method outperforms baseline approaches.
Learning Parametric Constraints in High Dimensions from Demonstrations
Glen Chou, Necmiye Ozay, and Dmitry Berenson 

Department of Electrical Engineering and Computer Science 
University of Michigan, Ann Arbor 
{gchou, necmiye, dmitryb}@umich.edu 
noticebox[b]3rd Conference on Robot Learning (CoRL 2019), Osaka, Japan.\[email protected]
Keywords: learning from demonstration, safe learning, constraint inference
1 Introduction
Learning from demonstration is a powerful paradigm for enabling robots to perform complex tasks. Inverse optimal control and inverse reinforcement learning (IOC/IRL) ([irl_1, irl_2, lfd3, ng_irl]) methods have been used to learn a cost function to replicate the behavior of an expert demonstrator. However, planning problems generally also require knowledge of constraints, which define the states or trajectories that are safe. For example, to get a robot arm to efficiently transport a cup of coffee without spilling it, one can optimize a cost function describing the length of the path, subject to constraints on the pose of the end effector. Constraints can represent safety requirements more strictly than cost functions, especially in safetycritical situations: enforcing a hard constraint can enable the robot to guarantee safe behavior, as opposed to using a “softened” cost penalty term. Furthermore, learning a global constraint shared across many tasks can help the robot generalize. Consider the arm, which must avoid spilling the coffee regardless of where the cup started off or needs to go.
While constraints are important, it can be impractical to exhaustively program all the possible constraints a robot should obey across all tasks. Thus, we consider the problem of extracting the latent constraints within expert demonstrations that are shared across tasks. We adopt the insight of [extended_version] that each safe, optimal demonstration induces a set of lowercost trajectories that must be unsafe due to violation of an unknown constraint. As in [extended_version], we sample these unsafe trajectories, ensuring that they are also consistent with the system dynamics, control constraints, and start/goal constraints. The unsafe trajectories are used together with the safe demonstrations in an “inverse” integer program that recovers an unsafe set consistent with the safe and unsafe trajectories. We make the following additional contributions in this paper. First, by using a (potentially known) parameterization of the constraints, our method enables the inference of safe and unsafe sets in highdimensional constraint spaces. Second, we relax the known parametrization assumption and propose a means to incrementally grow a parameterization with the data. Third, we introduce a method for extracting volumes of states which are guaranteed safe or guaranteed unsafe according to the data and parameterization. Fourth, we provide theoretical analysis showing that our method is guaranteed to output conservative estimates of the unsafe and safe sets under mild assumptions. Finally, we evaluate our method on highdimensional constraints for highdimensional systems by learning constraints for 7DOF arm, quadrotor, and planar pushing examples, showing that our method outperforms baseline approaches.
2 Related Work
Inverse optimal control [kalman, boyd] (IOC) and inverse reinforcement learning (IRL) [ng_irl] aim to recover an objective function that replicates provided expert demonstrations when optimized. Our method is complementary to these approaches; if the demonstrator solves a constrained optimization problem, we are finding its constraints, given the cost; IOC/IRL finds the cost, given the constraints [toussaint]. Risksensitive IRL [sumeet] is complementary to our work, which learns hard constraints. Similarly, [satinder] learns a statespace constraint shared across tasks as a penalty term in the reward function of an MDP. However, when representing a constraint as a penalty, it is unclear if a demonstrated action was made to avoid a penalty or to improve the trajectory cost in terms of the true cost function (or both). Thus, learning a penalty generalizing across cost functions becomes difficult. To avoid this, we assume a known cost function to explicitly reason about the constraint. Also relevant is safe reinforcement learning, which aims to perform exploration while minimizing visitation of unsafe states. Several methods [safe_exploration, krause, claire] use Gaussian process models to incrementally explore safe regions in the state space. We take a complementary approach to safe learning by using demonstrations in place of exploration to guide the learning of safe behaviors. Methods exist for learning geometric state space constraints [vijayakumar, shah], task space equality constraints [howard1, howard2], and convex constraints [melanie], which our algorithm generalizes by being able to learn arbitrary nonconvex parametric inequality constraints defined in some constraint space (not limited to the state space). Other methods aim to learn local trajectorybased constraints [dmitry, anca, lfdc1, lfdc2, lfdc3, lfdc4] by reasoning over the constraints within a single trajectory or task. In contrast, our method aims to learn a global constraint shared across tasks.
The method closest to our work is [extended_version], which learns a global shared constraint on a gridded constraint space; hence, the resulting constraint recovery method scales exponentially with the constraint space dimension and cannot exploit any side information on the structure of the constraint. This often leads to very conservative estimates of the unsafe set, and only grid cells visited by demonstrations can be learned guaranteed safe. We overcome these shortcomings with a novel algorithm that exploits constraint parameterizations for scalability and integration of prior knowledge, and also enables learning volumes of guaranteed safe/unsafe states in the original nondiscretized constraint space, yielding less conservative estimates of the safe/unsafe sets under weaker assumptions than [extended_version].
3 Problem Setup
Consider a system with discretetime dynamics ${x}_{t+1}=f({x}_{t},{u}_{t},t)$ or continuoustime dynamics $\dot{x}=f(x,u,t)$, where $x\in \mathcal{X}$ and $u\in \mathcal{U}$. The system performs tasks $\mathrm{\Pi}$ represented as constrained optimization problems over state/control trajectories ${\xi}_{x}$/${\xi}_{u}$ in state/control trajectory space ${\mathcal{T}}^{x}$/${\mathcal{T}}^{u}$:
Problem 1 (Forward problem / “task” $\mathrm{\Pi}$).
$$\begin{array}{ccc}\hfill \underset{{\xi}_{x},{\xi}_{u}}{\text{min}}\hfill & {c}_{\mathrm{\Pi}}({\xi}_{x},{\xi}_{u})\hfill & \\ \hfill \text{s.t.}\hfill & \varphi ({\xi}_{x},{\xi}_{u})\in \mathcal{S}(\theta )\subseteq \mathcal{C}\hfill & \\ & \overline{\varphi}({\xi}_{x},{\xi}_{u})\in \overline{\mathcal{S}}\subseteq \overline{\mathcal{C}}\hfill & \\ & {\varphi}_{\mathrm{\Pi}}({\xi}_{x},{\xi}_{u})\in {\mathcal{S}}_{\mathrm{\Pi}}\subseteq {\mathcal{C}}_{\mathrm{\Pi}}\hfill & \end{array}$$  (1) 
where ${c}_{\mathrm{\Pi}}(\cdot ):{\mathcal{T}}^{x}\times {\mathcal{T}}^{u}\to \mathbb{R}$ is a cost function for task $\mathrm{\Pi}$, and $\varphi (\cdot ,\cdot ):{\mathcal{T}}^{x}\times {\mathcal{T}}^{u}\to \mathcal{C}$ is a known mapping from statecontrol trajectories to a constraint space $\mathcal{C}$, elements of which are referred to as “constraint states”. Mappings $\overline{\varphi}(\cdot ,\cdot ):{\mathcal{T}}^{x}\times {\mathcal{T}}^{u}\to \overline{\mathcal{C}}$ and ${\varphi}_{\mathrm{\Pi}}(\cdot ,\cdot ):{\mathcal{T}}^{x}\times {\mathcal{T}}^{u}\to {\mathcal{C}}_{\mathrm{\Pi}}$ are known and map to constraint spaces $\overline{\mathcal{C}}$ and ${\mathcal{C}}_{\mathrm{\Pi}}$, containing a known shared safe set $\overline{\mathcal{S}}$ and a known taskdependent safe set ${\mathcal{S}}_{\mathrm{\Pi}}$, respectively. In this paper, we take ${\mathcal{T}}_{{\mathcal{S}}_{\mathrm{\Pi}}}$ to be the set of trajectories satisfying start/goal state constraints and ${\mathcal{T}}_{\overline{\mathcal{S}}}$ to be the set of dynamicallyfeasible trajectories obeying control constraints, though the dynamics may not be known in closed form. $\mathcal{S}(\theta )=\{k\in \mathcal{C}\mid g(k,\theta )>0\}$ is an unknown safe set defined by an unknown parameter $\theta \in \mathrm{\Theta}$ and a possibly unknown parameterization $g(\cdot ,\cdot )$. A demonstration, ${\xi}_{xu}\doteq ({\xi}_{x},{\xi}_{u})\in {\mathcal{T}}^{xu}$, is a statecontrol trajectory which approximately solves Problem 1, i.e. it satisfies all constraints and its cost is at most a factor of $\delta $ above the cost of a globally optimal solution ${\xi}_{xu}^{*}$, i.e. $c({\xi}_{x},{\xi}_{u})\le (1+\delta )c({\xi}_{x}^{*},{\xi}_{u}^{*})$. For convenience, we summarize our frequently used notation in Appendix F. In this paper, our goal is to recover the safe set $\mathcal{S}(\theta )$ and its complement, the unsafe set $\mathcal{A}(\theta )\doteq \mathcal{S}{(\theta )}^{c}$, given ${N}_{s}$ demonstrations ${\{{\xi}_{{s}_{j}}^{*}\}}_{j=1}^{{N}_{s}}$, ${N}_{\mathrm{\neg}s}$ inferred unsafe trajectories ${\{{\xi}_{\mathrm{\neg}{s}_{k}}\}}_{k=1}^{{N}_{\mathrm{\neg}s}}$, the cost function ${c}_{\mathrm{\Pi}}(\cdot )$, taskdependent constraints ${\mathcal{S}}_{\mathrm{\Pi}}$, and a simulator generating dynamicallyfeasible trajectories satisfying control constraints.
4 Method
In this section, we describe our method (a full algorithm block is presented in Appendix A). In Section 4.1, we describe how to sample unsafe trajectories. In Sections 4.2 and 4.3, we present mixed integer programs which recover a consistent constraint for a fixed parameterization and extract volumes of guaranteed safe/unsafe states. In Section 4.4, we present how our method can be extended to the case of unknown parameterizations.
4.1 Sampling lowercost trajectories
In this section, we describe the general sampling framework presented in [extended_version] while also relaxing the assumption of known closedform dynamics made in [extended_version]. We define the set of unsafe statecontrol trajectories induced by an optimal, safe demonstration ${\xi}_{xu}^{*}$, ${\mathcal{T}}_{\mathcal{A}}^{{\xi}_{xu}^{*}}$, as the set of statecontrol trajectories of lower cost that obey the known constraints, $$. We sample from ${\mathcal{T}}_{\mathcal{A}}^{{\xi}_{xu}^{*}}$ to obtain lowercost trajectories obeying the known constraints using hitandrun sampling [hit_and_run], a method guaranteeing convergence to a uniform distribution of samples over ${\mathcal{T}}_{\mathcal{A}}^{{\xi}_{xu}^{*}}$ in the limit; an illustration is shown in Fig. 4.1. Hitandrun starts from an initial point within the set, chooses a direction uniformly at random, moves a random amount in that direction such that the new point remains within the set, and repeats [extended_version]. We sample from ${\mathcal{T}}_{\mathcal{A}}^{{\xi}_{xu}^{*}}$ indirectly by sampling control sequences and rolling them out through the dynamics to generate dynamicallyfeasible trajectories. We emphasize that $f(x,u,t)$ does not need to be known in closed form. Given a control sequence sampled by hitandrun, a simulator can instead be used to output the resulting dynamicallyfeasible trajectory, which can then be checked for membership in ${\mathcal{T}}_{\mathcal{A}}^{{\xi}_{xu}^{*}}$ exactly as if the dynamics were known in closed form. Also, $\delta $suboptimality of the demonstration ${\xi}_{xu}^{\text{dem}}$ can be handled in this framework by sampling instead from $$. Optimal substructure in the cost function can be exploited to sample unsafe subtrajectories over shorter time windows on the demonstrations; shorter unsafe trajectories provide less ambiguous information regarding $\mathcal{A}$ and can better reduce the illposedness of the constraint recovery problem [extended_version].
4.2 Recovering the constraint
Recall that the unsafe set can be described by some parameterization $\mathcal{A}(\theta )\doteq \{k\in \mathcal{C}g(k,\theta )\le 0\}$, where we assume for now that $g(\cdot ,\cdot )$ is known, and $\theta $ are parameters to be learned. Intuitively, $g(k,\theta )$ tells us if constraint state $k$ (which is any element of constraint space $\mathcal{C}$) is safe according to parameter $\theta $. Then, a feasibility problem can be written to find a $\theta $ consistent with the data:
Problem 2 (Parametric constraint recovery problem).
find  $\theta $  
s.t.  $g({k}_{i},\theta )>0,\forall {k}_{i}\in \varphi ({\xi}_{{s}_{j}}^{*}),\forall j=1,\mathrm{\dots},{N}_{s}$  (2a)  
$\exists {k}_{i}\in \varphi ({\xi}_{\mathrm{\neg}{s}_{k}}),g({k}_{i},\theta )\le 0,\forall k=1,\mathrm{\dots},{N}_{\mathrm{\neg}s}$  (2b) 
Constraint (2a) enforces that each safe constraint state lies outside $\mathcal{A}(\theta )$ and constraint (2b) enforces that at least one constraint state on each unsafe trajectory lies inside $\mathcal{A}(\theta )$. Denote $\mathcal{F}$ as the feasible set of Problem 2. Further denote ${\mathcal{G}}_{\mathrm{\neg}s}$ and ${\mathcal{G}}_{s}$ as the set of constraint states which are learned guaranteed unsafe and safe, respectively; that is, a constraint state $k\in {\mathcal{G}}_{\mathrm{\neg}s}$ or $k\in {\mathcal{G}}_{s}$ if $k$ is classified unsafe or safe for all $\theta \in \mathcal{F}$: $${\mathcal{G}}_{\mathrm{\neg}s}\doteq \bigcap _{\theta \in \mathcal{F}}\{kg(k,\theta )\le 0\}$$ (3) $${\mathcal{G}}_{s}\doteq \bigcap _{\theta \in \mathcal{F}}\{kg(k,\theta )>0\}$$ (4)
In Problem 2, it is possible to learn that a constraint state is guaranteed safe/unsafe even if it does not lie directly on a demonstration/unsafe trajectory. This is due to the parameterization: for the given set of safe and unsafe trajectories, there may be no feasible $\theta \in \mathcal{F}$ where $k$ is classified unsafe/safe. It is precisely this extrapolation which will enable us to learn constraints in highdimensional spaces. We now identify classes of parameterizations for which Problem 2 can be efficiently solved:
Problem 3 (Polytopic constraint recovery problem).
find  $\theta ,{\{{b}_{s}^{i}\}}_{i=1}^{{N}_{s}},{\{{b}_{\mathrm{\neg}s}^{i}\}}_{i=1}^{{N}_{\mathrm{\neg}s}}$  
s.t.  $H(\theta ){k}_{i}>h(\theta )M(1{b}_{s}^{i}),{b}_{{s}_{j}}^{i}\in {\{0,1\}}^{{N}_{h}},$  
$\mathrm{}{\displaystyle \sum _{i=1}^{{N}_{h}}}{b}_{{s}_{j}}^{i}\ge 1,\forall {k}_{i}\in \varphi ({\xi}_{{s}_{j}}^{*}),i=1,\mathrm{\dots},{T}_{j},j=1,\mathrm{\dots},{N}_{s}$  (5a)  
$H(\theta ){k}_{i}\le h(\theta )+M(1{b}_{\mathrm{\neg}{s}_{k}}^{i}){\mathrm{\U0001d7cf}}_{{N}_{h}},{b}_{\mathrm{\neg}{s}_{k}}^{i}\in \{0,1\},$  
$\mathrm{}{\displaystyle \sum _{i=1}^{{T}_{k}}}{b}_{\mathrm{\neg}{s}_{k}}^{i}\ge 1,\forall {k}_{i}\in \varphi ({\xi}_{\mathrm{\neg}{s}_{k}}),\forall k=1,\mathrm{\dots},{N}_{\mathrm{\neg}s}$  (5b) 
Linear case: $g(k,\theta )$ is defined by a Boolean conjunction of linear inequalities, i.e. $\mathcal{A}(\theta )$ can be defined as the union and intersection of halfspaces. For this case, mixedinteger programming can be employed. If $g(k,\theta )\le 0$ is a single polytope, i.e. $g(k,\theta )\le 0\iff H(\theta )k\le h(\theta )$, where $H(\theta )$ and $h(\theta )$ are affine in $\theta $, we can solve Problem 3, a mixed integer feasibility problem, to find a feasible $\theta $. In Problem 3, $M$ is a large positive number and ${\mathrm{\U0001d7cf}}_{{N}_{h}}$ is a vector of ones of length ${N}_{h}$, where ${N}_{h}$ is the number of rows in $H(\theta )$. Constraints (5a) and (5b) use the bigM formulation [bigM] to enforce that each safe constraint state lies outside $\mathcal{A}(\theta )$ and that at least one constraint state on each unsafe trajectory lies inside $\mathcal{A}(\theta )$. Similar problems can be solved when the safe/unsafe set can be described by unions of polytopes. As an alternative to integer programming, satisfiability modulo theories (SMT) solvers [smt] can also be used to solve Problem 2 if $g(k,\theta )$ is defined by a Boolean conjunction of linear inequalities.
Convex case: $g(k,\theta )$ is defined by a Boolean conjunction of convex inequalities, i.e. $\mathcal{A}(\theta )$ can be described as the union and intersection of convex sets. For this case, satisfiability modulo convex optimization (SMC) [smc] can be employed to find a feasible $\theta $.

•
For suboptimal demonstrations / imperfect lowercost trajectory sampling, Problem 3 can become infeasible. To address this, slack variables can be introduced: replace constraint ${\sum}_{i=1}^{{T}_{k}}{b}_{\mathrm{\neg}{s}_{k}}^{i}\ge 1$ with ${\sum}_{i=1}^{{T}_{k}}{b}_{\mathrm{\neg}{s}_{k}}^{i}\ge {v}_{k},{v}_{k}\in \{0,1\}$ and change the feasibility problem to minimization of ${\sum}_{k=1}^{{N}_{\mathrm{\neg}s}}(1{v}_{k})$; this finds a $\theta $ that is consistent with as many unsafe trajectories as possible.

•
In addition to recovering sets of guaranteed learned unsafe and safe constraint states, a probability distribution over possibly unsafe constraint states can be estimated by sampling unsafe sets $\mathcal{A}(\theta )$ from the feasible set of Problem 2 using hitandrun sampling, starting from a feasible $\theta $.
4.3 Extracting guaranteed safe and unsafe states
One can check if a constraint state $k\in {\mathcal{G}}_{s}$ or $k\in {\mathcal{G}}_{\mathrm{\neg}s}$ by adding a constraint $g(k,\theta )\le 0$ or $g(k,\theta )>0$ to Problem 2 and checking feasibility of the resulting program; if the program is infeasible, $k\in {\mathcal{G}}_{s}$ or $k\in {\mathcal{G}}_{\mathrm{\neg}s}$. In other words, solving this modified integer program can be seen as querying an oracle about the safety of a constraint state $k$. The oracle can then return that $k$ is guaranteed safe (program infeasible after forcing $k$ to be unsafe), guaranteed unsafe (program infeasible after forcing $k$ to be safe), or unsure (program remains feasible despite forcing $k$ to be safe or unsafe).
Unlike the gridded formulation in [extended_version], Problem 2 works in the continuous constraint space. Thus, it is not possible to exhaustively check if each $k\in {\mathcal{G}}_{\mathrm{\neg}s}$ or $k\in {\mathcal{G}}_{s}$. To address this, the neighborhood of some constraint state ${k}_{\text{query}}$ can be checked for membership in ${\mathcal{G}}_{\mathrm{\neg}s}$ by solving the following problem:
Problem 4 (Volume extraction).
$$\begin{array}{ccc}\hfill \underset{\theta ,\epsilon}{\text{min}}\hfill & \epsilon \hfill & \\ \hfill \text{s.t.}\hfill & g({k}_{i},\theta )>0,\forall {k}_{i}\in \varphi ({\xi}_{{s}_{j}}^{*}),\forall j=1,\mathrm{\dots},{N}_{s}\hfill & \\ & \exists {k}_{i}\in \varphi ({\xi}_{\mathrm{\neg}{s}_{k}}),g({k}_{i},\theta )\le 0,\forall k=1,\mathrm{\dots},{N}_{\mathrm{\neg}s}\hfill & \\ & \exists {k}_{\text{\mathit{n}\mathit{e}\mathit{a}\mathit{r}}}\in \{{k}_{\text{\mathit{n}\mathit{e}\mathit{a}\mathit{r}}}\mid {\parallel {k}_{\text{\mathit{n}\mathit{e}\mathit{a}\mathit{r}}}{k}_{\text{\mathit{q}\mathit{u}\mathit{e}\mathit{r}\mathit{y}}}\parallel}_{\mathrm{\infty}}\le \epsilon \},g({k}_{\text{\mathit{n}\mathit{e}\mathit{a}\mathit{r}}},\theta )>0\hfill & \end{array}$$ 
In words, Problem 4 finds the smallest $\epsilon $hypercube centered at ${k}_{\text{query}}$ containing a $k\notin {\mathcal{G}}_{\mathrm{\neg}s}$; thus, any hypercube of size $$ is contained within ${\mathcal{G}}_{\mathrm{\neg}s}$: $\{k\mid {\parallel k{k}_{\text{query}}\parallel}_{\mathrm{\infty}}\le \widehat{\u03f5}\}\subseteq {\mathcal{G}}_{\mathrm{\neg}s}$. We can write a similar problem to check the neighborhood of ${k}_{\text{query}}$ for membership in ${\mathcal{G}}_{s}$. For some common parameterizations (axisaligned hyperrectangles, convex sets), there are even more efficient methods for recovering subsets of ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$, which are described in Appendix B. Volumes of safe/unsafe space can thus be produced by repeatedly solving Problem 4 for different ${k}_{\text{query}}$, and these volumes can be passed to a planner to generate new trajectories that are guaranteed safe.
4.4 Unknown parameterizations
For many realistic applications, we do not have access to a known parameterization which can represent the unsafe set. Despite this, complex unsafe/safe sets can often be approximated as the union of many simple unsafe/safe sets. Along this line of thought, we present a method for incrementally growing a parameterization based on the complexity of the demonstrations and unsafe trajectories.
Suppose that the true parameterization $g(k,\theta )$ of the unsafe set $\mathcal{A}(\theta )=\{k\mid g(k,\theta )\le 0\}$ is unknown but can be exactly or approximately expressed as the union of ${N}^{*}$ simple sets $\mathcal{A}(\theta )\approxeq {\bigcup}_{i=1}^{{N}^{*}}\{k\mid {g}_{s}(k,{\theta}_{i})\le 0\}\doteq {\bigcup}_{i=1}^{{N}^{*}}\mathcal{A}({\theta}_{i})$, where each simple set $\mathcal{A}({\theta}_{i})$ has a known parameterization ${g}_{s}(\cdot ,\cdot )$ and ${N}^{*}$, the minimum number of simple sets needed to reconstruct $\mathcal{A}$, is unknown.
A lower bound on ${N}^{*}$, $\underset{\xaf}{N}$, can be estimated by incrementally adding simple sets until Problem 2 becomes feasible. However, for $$, the extracted ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ are not guaranteed to be conservative estimates of $\mathcal{S}$ and $\mathcal{A}$ (Theorem 5.3), and ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ are only guaranteed to be conservative if $\widehat{N}\ge {N}^{*}$, where $\widehat{N}$ is the chosen number of simple sets (see Theorem 5.2). Unfortunately, inferring a guaranteed overestimation of ${N}^{*}$ only from data is not possible, as there can always be subsets of the constraint which are not activated by the given demonstrations. Two facts mitigate this:

•
If an upper bound on the number of simple sets needed to describe $\mathcal{A}(\theta )$, ${\overline{N}}_{\text{loose}}\ge {N}^{*}$, is known (where this bound can be trivially loose), ${\mathcal{G}}_{s}\subseteq \mathcal{S}$ and ${\mathcal{G}}_{\mathrm{\neg}s}\subseteq \mathcal{A}$ by using ${\overline{N}}_{\text{loose}}$ simple sets in solving Problem 2. Hence, by using ${\overline{N}}_{\text{loose}}$, ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ can be made guaranteed conservative (see Theorem 5.2), at the cost of the resulting ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ being potentially small.

•
As the demonstrations begin to cover the space, $\underset{\xaf}{N}\to {N}^{*}$. Hence, by using $\underset{\xaf}{N}$ simple sets, ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ are asymptotically conservative.
In our experiments, we choose our simple sets as axisaligned hyperrectangles in $\mathcal{C}$, which is motivated by: 1) any open set in $\mathcal{C}$ can be approximated as a countable/finite union of open axisaligned hyperrectangles [tao]; 2) unions of hyperrectangles are easily representable in Problem 3.
5 Theoretical Analysis
In this section, we present theoretical analysis on our parametric constraint learning algorithm. In particular, we analyze the conditions under which our algorithm is guaranteed to learn a conservative estimate of the safe and unsafe sets. For space, the proofs and additional results on conservativeness (Section C.2) and the learnability of a constraint (Section C.1) are presented in the appendix. We develop the theory for $\mathcal{C}=\mathcal{X}$ for legibility, but the results can be easily extended to general $\mathcal{C}$.
Theorem 5.1 (Conservativeness: Known parameterization).
Suppose the parameterization $g\mathit{}\mathrm{(}x\mathrm{,}\theta \mathrm{)}$ is known exactly. Then, for a discretetime system, extracting ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}$ and ${\mathrm{G}}_{s}$ (as defined in (3) and (4), respectively) from the feasible set of Problem 2 returns ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}\mathrm{\subseteq}\mathrm{A}$ and ${\mathrm{G}}_{s}\mathrm{\subseteq}\mathrm{S}$. Further, if the known parameterization is $H\mathit{}\mathrm{(}\theta \mathrm{)}\mathit{}{x}_{i}\mathrm{\le}h\mathit{}\mathrm{(}\theta \mathrm{)}$ and $M$ in Problem 3 is chosen to be greater than
$$\mathrm{max}(\underset{{x}_{i}\in {\xi}_{s}}{\mathrm{max}}\underset{\theta}{\mathrm{max}}\underset{j}{\mathrm{max}}{(H(\theta ){x}_{i}h(\theta ))}_{j},\underset{{x}_{i}\in {\xi}_{\mathrm{\neg}s}}{\mathrm{max}}\underset{\theta}{\mathrm{max}}\underset{j}{\mathrm{max}}{(H(\theta ){x}_{i}h(\theta ))}_{j}),$$ 
then extracting ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}$ and ${\mathrm{G}}_{s}$ from the feasible set of Problem 3 recovers ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}\mathrm{\subseteq}\mathrm{A}$ and ${\mathrm{G}}_{s}\mathrm{\subseteq}\mathrm{S}$.
We also present conservativeness results for continuoustime dynamics in Corollary C.2.
Now, let’s consider the case where the true parameterization is not known and we use the incremental method described in Section 4.4, where ${g}_{s}(x,\theta )$ is the simple parameterization. We consider the overparameterized case (Theorem 5.2) and the underparameterized case (Theorem 5.3). We analyze the case where the true, under, and overparameterization are defined respectively as:
$$g(x,\theta )\le 0\iff \underset{i=1}{\overset{{N}^{*}}{\bigvee}}({g}_{s}(x,{\theta}_{i})\le 0)$$  (6) 
$$  (7) 
$$g(x,\theta )\le 0\iff \underset{i=1}{\overset{\overline{N}}{\bigvee}}({g}_{s}(x,{\theta}_{i})\le 0),\overline{N}>{N}^{*}.$$ (8)
Theorem 5.2 (Conservativeness: Overparameterization).
Theorem 5.3 (Conservativeness: Underparameterization).
Suppose the true parameterization and underparameterization are defined as in (6) and (7). Furthermore, assume that we incrementally grow the parameterization as described in Section 4.4. Then, the following are true:

1.
${\mathcal{G}}_{\mathrm{\neg}s}$ and ${\mathcal{G}}_{s}$ are not guaranteed to be contained in $\mathcal{A}$ (unsafe set) and $\mathcal{S}$ (safe set), respectively.

2.
Each recovered simple unsafe set $\mathcal{A}({\theta}_{i})$, $i=1,\mathrm{\dots},\underset{\xaf}{N}$, for any ${\theta}_{1},\mathrm{\dots},{\theta}_{\underset{\xaf}{N}}\in \mathcal{F}$, touches the true unsafe set (there are no spurious simple unsafe sets): for $i=1,\mathrm{\dots},\underset{\xaf}{N}$, for ${\theta}_{1},\mathrm{\dots},{\theta}_{\underset{\xaf}{N}}\in \mathcal{F}$, $\mathcal{A}({\theta}_{i})\cap \mathcal{A}\ne \mathrm{\varnothing}$ ($\underset{\xaf}{N}$ is as defined in Section 4.4).
6 Results
We evaluate our method, showing that our method can be applied to constraints with unknown parameterizations (Section 6.1), highdimensional constraints defined for highdimensional systems (Section 6.2), and settings where the dynamics are not known in closed form (Section 6.3). We also compare our performance with a neural network (NN) baseline^{1}^{1} 1 In all experiments, 1) the NN is trained with the safe/unsafe trajectories and predicts at test time if a queried constraint state is safe/unsafe; 2) error bars are generated by initializing the NN with 10 different random seeds and evaluating accuracy after training. The architectures/training details are presented in Appendix E.. We further compare with the gridbased method [extended_version] on the 2D examples. For space, experimental details are provided in Appendix E.
6.1 Unknown parameterization
Ushape: We first present a kinematic 2D example where a Ushape $\mathcal{A}$ is to be learned, but the number of simple unsafe sets needed to represent $\mathcal{A}$ (three) is unknown. In Row 1, Column 1 of Fig. 6.1, we outline $\mathcal{A}$ in black and overlay ${\mathcal{G}}_{\mathrm{\neg}s}$, ${\mathcal{G}}_{s}$, and the six provided demonstrations, synthetically generated via trajectory optimization. We note that due to the chosen control constraints and Ushape, there are parts of $\mathcal{A}$ (a subset of the white region in Fig. 6.1, Row 1, Column 1) which cannot be implied unsafe by sampled unsafe trajectories and the parameterization (see Theorem C.1). As a result, ${\mathcal{G}}_{\mathrm{\neg}s}$ may not fully cover $\mathcal{A}$, even with more demonstrations (Fig. 6.1, Row 1, Column 3). Note that the decrease in coverage^{2}^{2} 2 Coverage is measured as the intersection over union (IoU) of the relevant sets (see legends for exact formula). at the third demonstration is due to a increase from a twobox parameterization to a threebox parameterization. Likewise, the accuracy^{3}^{3} 3 In all experiments, computed accuracies are: IP (safe) = $\text{Vol}({\mathcal{G}}_{s}\cap \mathcal{S})/\text{Vol}({\mathcal{G}}_{s})$, IP (unsafe) = $\text{Vol}({\mathcal{G}}_{\mathrm{\neg}s}\cap \mathcal{A})/\text{Vol}({\mathcal{G}}_{\mathrm{\neg}s})$, NN (safe) = $({\sum}_{i=1}^{q}{\mathbf{I}}_{({x}_{i}\in \mathcal{S})\wedge (\text{NN classified}{x}_{i}\text{as safe})})/{\sum}_{i=1}^{q}{\mathbf{I}}_{{x}_{i}\in \mathcal{S}}$, NN (unsafe) = $({\sum}_{i=1}^{q}{\mathbf{I}}_{({x}_{i}\in \mathcal{A})\wedge (\text{NN classified}{x}_{i}\text{as unsafe}})/{\sum}_{i=1}^{q}{\mathbf{I}}_{{x}_{i}\in \mathcal{A}}$, where ${x}_{1},\mathrm{\dots},{x}_{q}$ are query states sampled from ${\mathcal{G}}_{\mathrm{\neg}s}\cup {\mathcal{G}}_{s}$ and ${\mathbf{I}}_{(\cdot )}$ is the indicator function. Note that NN accuracy is computed only on $({\mathcal{G}}_{s}\cup {\mathcal{G}}_{\mathrm{\neg}s})\subseteq \mathcal{C}$. decreases at the second demonstration due to overapproximation of $\mathcal{A}$ with two boxes (Fig. 6.1, Row 1, Column 4), but this overapproximation vanishes when switching to the threebox parameterization (which is exact; hence ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ are guaranteed conservative, c.f. Theorem 5.1). The gridbased method in [extended_version] always has perfect accuracy, since it does not extrapolate beyond the observed trajectories. However, as a result of that, it also yields low coverage (Fig. 6.1, Row 1, Column 2). The NN baseline achieves lower accuracy for the unsafe set as it misclassifies some corners of the U. Recovering a feasible $\theta $ using a multibox variant of Problem 3 recovers $\mathcal{A}$ exactly (Fig. 6.1, Row 1, Column 5). Finally, we note that in this (and future) examples, demonstrations were specifically chosen to be informative about the constraint. We present a version of this example in Appendix D with random demonstrations and show that the constraint is still learned (albeit needing more demonstrations).
Infinite boxes: To show that our method can still learn a constraint that cannot be easily expressed using a chosen parameterization, we limit our parameterization to an unknown number of axisaligned boxes and attempt to learn a diagonal “I” unsafe set (see Fig. 6.1, Row 2). This is a particularly difficult example, since an infinite number of axisaligned boxes will be needed to recover $\mathcal{A}$ exactly. However, for finite data, only a finite number of boxes will be needed; in particular, for 1, 2, 3, and 4 demonstrations (which are synthetically generated assuming kinematic system constraints), 3, 5, 6, and 6 boxes are required to generate a parameterization consistent with the data (see Fig. 6.1, Row 2, Column 1). Also overlaid in Fig. 6.1, Row 2, Column 1 are ${\mathcal{G}}_{\mathrm{\neg}s}$ and ${\mathcal{G}}_{s}$, which are approximated by solving Problem 4 for randomly sampled ${k}_{\text{center}}$. Compared to the gridded formulation in [extended_version] (see Fig. 6.1, Row 2, Column 3), ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ cover $\mathcal{S}$ and $\mathcal{A}$ far better due to the parameterization enabling the IP to extrapolate more from the demonstrations. Furthermore, we note that while the gridded case has perfect accuracy for the safe set, it does not for the unsafe set, due to grid alignment [extended_version]. Overall, the multibox variant of Problem 3 recovers $\mathcal{A}$ well (Fig. 6.1, Row 2, Column 5), and the remaining gap can be improved with more data. Last, we note that the NN baseline reaches comparable accuracies here (Fig. 6.1, Row 2, Column 4), since our method suffers from a few disadvantages for this particular example. First, attempting to represent the “I” with a finite number of boxes introduces a modeling bias that the NN does not have. Second, since the system is kinematic and the constraint is lowdimensional, many unsafe trajectories can be sampled, providing good coverage of the unsafe set. We show later that for higher dimensional constraints/systems with highly constrained dynamics, it becomes difficult to gather enough data for the NN to perform well.
6.2 Highdimensional examples
6D pose constraint for a 7DOF robot arm: In this example, we learn a 6D hyperrectangular pose constraint for the end effector of a 7DOF Kuka iiwa arm. One such setting is when the robot is to bring a cup to a human while ensuring its contents do not spill (angle constraint) and proxemics constraints (i.e. the end effector never gets too close to the human) are satisfied (position constraint). We examine this problem for the cases of optimal and suboptimal demonstrations.
Demonstration setup: The end effector orientation (parametrized in Euler angles) and position are constrained to satisfy $(\alpha ,\beta ,\gamma )\in [\underset{\xaf}{\alpha},\overline{\alpha}]\times [\underset{\xaf}{\beta},\overline{\beta}]\times [\underset{\xaf}{\gamma},\overline{\gamma}]$ and $(x,y,z)\in [\underset{\xaf}{x},\overline{x}]\times [\underset{\xaf}{y},\overline{y}]\times [\underset{\xaf}{z},\overline{z}]$ (see Fig. 6.2, Column 1). For the optimal case, we synthetically generate seven demonstrations minimizing jointspace trajectory length. For the suboptimal case, five suboptimal continuoustime demonstrations approximately optimizing jointspace trajectory length are recorded in a virtual reality environment, where a human demonstrator moves the arm from desired start to goal end effector configurations using an HTC Vive (see Fig. E.1). The demonstrations are timediscretized for lowercost trajectory sampling [extended_version]. In both cases, the constraint is recovered with Problem 3, where $H(\theta )={[I,I]}^{\top}$ and $h(\theta )=\theta ={[\overline{x},\overline{y},\overline{z},\overline{\alpha},\overline{\beta},\overline{\gamma},\underset{\xaf}{x},\underset{\xaf}{y},\underset{\xaf}{z},\underset{\xaf}{\alpha},\underset{\xaf}{\beta},\underset{\xaf}{\gamma}]}^{\top}$. For the suboptimal case, slack variables are added to ensure feasibility of Problem 3, and for a suboptimal demonstration of cost $\widehat{c}$, we only use trajectories of cost less than $0.9\widehat{c}$ as unsafe trajectories.
Results: The coverage plots (Fig. 6.2, Rows 1 and 3, Col. 2) show that as the number of demonstrations increases, ${\mathcal{G}}_{s}$/${\mathcal{G}}_{\mathrm{\neg}s}$ approach the true safe/unsafe sets $\mathcal{S}$/$\mathcal{A}$ ^{4}^{4} 4 For the unsafe sets, the IoUs are computed between ${\mathcal{G}}_{\mathrm{\neg}s}^{c}$ and ${\mathcal{A}}^{c}$, as in high dimensions, the IoU changes more smoothly for the complements than the IoU between ${\mathcal{G}}_{\mathrm{\neg}s}$ and $\mathcal{A}$, so we plot the the former for visual clarity.. For the suboptimal case, the low IoU values for lower numbers of demonstrations is due to overapproximation of the unsafe set in the $\alpha $ component (arising from continuoustime discretization and imperfect knowledge of the suboptimality bound); the fifth demonstration, where $\alpha $ takes values near $\pi ,\pi $ greatly reduces this overapproximation. The accuracy plots (Fig. 6.2, Rows 2 and 4, Col. 2) present results consistent with the theory: for the optimal case, all constraint states in ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ are truly safe and unsafe (Theorem 5.1), and the small overapproximation for the suboptimal case is consistent with the continuoustime conservativeness (Theorem C.2). Note that the NN accuracy is lower and can oscillate with demonstrations, since it finds just a single constraint which is approximately consistent with the data, while our method classifies safety by consulting all possible constraints which are exactly consistent with the data, thus performing more consistently. The NN performs better on the suboptimal case than it does on the optimal case, as more unsafe trajectories are sampled due to the suboptimality, improving coverage of the unsafe set. The projections of ${\widehat{\mathcal{G}}}_{\mathrm{\neg}s}^{c}$ (Fig. 6.2, Cols. 34, in red), where ${\widehat{\mathcal{G}}}_{\mathrm{\neg}s}\subseteq {\mathcal{G}}_{\mathrm{\neg}s}$ is obtained using the method in Appendix B, are compared to the safe set (blue outline), showing that the two match nearly exactly (though the gap for the suboptimal case is larger), and the gap can be likely reduced with more demonstrations. The projections of ${\mathcal{G}}_{s}$ (Fig. 6.2, Col. 5) match exactly with $\mathcal{A}$ for the optimal case (true safe set is outlined in blue) and match closely for the suboptimal case. Note that ${\mathcal{G}}_{s}\subseteq \mathcal{S}$, as is the case for all axisaligned box parameterizations.
3D constraint for 12D quadrotor model: We learn a 3D box angular velocity constraint for a quadrotor with discretetime 12D dynamics (see Appendix E for details). In this scenario, the quadrotor must avoid an a priori known unsafe set in position space while also ensuring that angular velocities are below a threshold: $(\dot{\alpha},\dot{\beta},\dot{\gamma})\in [\underset{\xaf}{\overset{\dot{}}{\alpha}},\overline{\dot{\alpha}}]\times [\underset{\xaf}{\overset{\dot{}}{\beta}},\overline{\dot{\beta}}]\times [\underset{\xaf}{\overset{\dot{}}{\gamma}},\overline{\dot{\gamma}}]$. The $(\dot{\alpha},\dot{\beta},\dot{\gamma})$ safe set is to be inferred from two demonstrations (see Fig. 6.3). The constraint is recovered with Problem 3, where $H(\theta )={[I,I]}^{\top}$ and $h(\theta )=\theta ={[\overline{\dot{\alpha}},\overline{\dot{\beta}},\overline{\dot{\gamma}},\underset{\xaf}{\overset{\dot{}}{\alpha}},\underset{\xaf}{\overset{\dot{}}{\beta}},\underset{\xaf}{\overset{\dot{}}{\gamma}}]}^{\top}$. Fig. 6.4 shows that with more demonstrations, ${\mathcal{G}}_{s}$ approaches the true safe set $\mathcal{S}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$ approaches the true unsafe set $\mathcal{A}$, respectively. Consistent with Theorem 5.1, our method has perfect accuracy in ${\mathcal{G}}_{\mathrm{\neg}s}$ and ${\mathcal{G}}_{s}$. Here, the NN struggles more compared to the arm examples since due to the more constrained dynamics, fewer unsafe trajectories can be sampled, and a parameterization needs to be leveraged in order to say more about the unsafe set. The remaining columns of Fig. 6.4 show that we recover ${\mathcal{G}}_{\mathrm{\neg}s}$ and ${\mathcal{G}}_{s}$ exactly (the true safe set is outlined in blue).
6.3 Planar pushing example
In this section, using the FetchPushv1 environment in OpenAI Gym [openai], we aim to learn a 2D box unsafe set on the centerofmass (CoM) of a block pushed by the Fetch arm (see Fig. 6.5) using two demonstrations. Here, the dynamics of the block CoM are not known in closed form, but rollouts can still be sampled using the simulator. Since the block CoM is highly underactuated, it is not possible to sample short subtrajectories. Thus, without leveraging a parameterization, the constraint recovery problem is very illposed. Furthermore, while our method can explicitly consider the unsafeness in longer unsafe trajectories (at least one state is unsafe), the NN struggles with this example as it fails to accurately model that fact. Overall, Fig. 6.5 presents that ${\mathcal{G}}_{\mathrm{\neg}s}$/${\mathcal{G}}_{s}$ match up well with $\mathcal{A}$/$\mathcal{S}$, and our classification accuracy for safeness/unsafeness is perfect across demonstrations.
7 Discussion and Conclusion
In this paper, we present a method capable of learning parametric constraints in highdimensional spaces with and without known parameterizations. We also present a method for extracting volumes of guaranteed safe and guaranteed unsafe states, information which can be directly used in a planner to enforce safety constraints. We analyze our algorithm, showing that these recovered guaranteed safe/unsafe states are truly safe/unsafe under mild assumptions. We evaluate the method by learning a variety of constraints defined in highdimensional spaces for systems with highdimensional dynamics. One shortcoming of our work is scalability with the amount of data, due to the number of integer variables growing linearly with the number of safe/unsafe trajectories. As a result, learning constraints without extensive sampling of unsafe trajectories is a direction of future work.
Acknowledgments
This work was supported in part by a National Defense Science and Engineering Graduate (NDSEG) Fellowship, Office of Naval Research (ONR) grants N000141812501 and N000141712050, and National Science Foundation (NSF) grants ECCS1553873 and IIS1750489.
References
Appendix A Detailed algorithm block
Appendix B Extraction of ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$
In this section, we discuss specific ways of extracting sets of guaranteed safe/unsafe states for axisaligned hyperrectangles (this method is used for all numerical examples in Section 6.2 and Section 6.3) and for convex parameterizations.
B.1 Axisaligned hyperrectangle parameterization
In this parameterization, $\mathcal{C}\subseteq {\mathbb{R}}^{n}$, $\theta =[{\underset{\xaf}{k}}_{1},{\overline{k}}_{1},\mathrm{\dots},{\underset{\xaf}{k}}_{n},{\overline{k}}_{n}]$, and $g(k,\theta )\le 0\iff H(\theta )k\le h(\theta )$, where $H(\theta )k={[{I}_{n\times n},{I}_{n\times n}]}^{\top}$ and $h(\theta )={[{\overline{k}}_{1},\mathrm{\dots},{\overline{k}}_{n},{\underset{\xaf}{k}}_{1},\mathrm{\dots},{\underset{\xaf}{k}}_{n}]}^{\top}$. Here, ${\underset{\xaf}{k}}_{i}$ and ${\overline{k}}_{i}$ are the lower and upper bounds of the hyperrectangle for coordinate $i$.
As the set of axisaligned hyperrectangles is closed under intersection, ${\mathcal{G}}_{\mathrm{\neg}s}$ is also an axisaligned hyperrectangle, the axisaligned bounding box of any two constraint states ${k}_{1},{k}_{2}\in {\mathcal{G}}_{\mathrm{\neg}s}$ is also contained in ${\mathcal{G}}_{\mathrm{\neg}s}$. This also implies that ${\mathcal{G}}_{\mathrm{\neg}s}$ can be fully described by finding the top and bottom corners ${[{\underset{\xaf}{k}}_{1},\mathrm{\dots},{\underset{\xaf}{k}}_{n}]}^{\top}$ and ${[{\overline{k}}_{1},\mathrm{\dots},{\overline{k}}_{n}]}^{\top}$. Suppose we start with a known $k\in {\mathcal{G}}_{\mathrm{\neg}s}$. Then, finding ${[{\underset{\xaf}{k}}_{1},\mathrm{\dots},{\underset{\xaf}{k}}_{n}]}^{\top}$ amounts to performing a binary search for each of the $n$ dimensions, and the same holds for finding ${[{\overline{k}}_{1},\mathrm{\dots},{\overline{k}}_{n}]}^{\top}$.
Recovering ${\mathcal{G}}_{s}$ is not as straightforward, as the complement of axisaligned boxes is not closed under intersection. While we can still solve Problem 4 to recover ${\mathcal{G}}_{s}$, an inner approximation of ${\mathcal{G}}_{s}$ can be more efficiently obtained: starting at a constraint state $k\in {\mathcal{G}}_{\mathrm{\neg}s}$, $2n$ line searches can be performed to find the two points of transition to ${\mathcal{G}}_{\mathrm{\neg}s}$ in each constraint coordinate. Denote as ${\widehat{\mathcal{G}}}_{s}$ the complement of the axisaligned bounding box of these $2n$ points; ${\widehat{\mathcal{G}}}_{s}$ is an inner approximation of ${\mathcal{G}}_{s}$, as ${\mathcal{G}}_{s}={({\bigcap}_{\theta \in \mathcal{F}}\{xg(x,\theta )\le 0\})}^{c}\supseteq \text{AABB}{({\bigcap}_{\theta \in \mathcal{F}}\{xg(x,\theta )\le 0\})}^{c}$, where $\text{AABB}(\cdot )$ denotes the axisaligned bounding box of a set of points. For example, consider the scenario in Fig. B.1 where there are only two feasible parameters, ${\theta}_{1}$ and ${\theta}_{2}$. Here, ${\mathcal{G}}_{s}$ is ${(\mathcal{A}({\theta}_{1})\cup \mathcal{A}({\theta}_{2}))}^{c}$ and ${\widehat{\mathcal{G}}}_{s}$ underapproximates the safe set (${\mathcal{G}}_{s}$ is in general not representable as the complement of an axisaligned box).
B.2 Convex parameterization
In this parameterization, for fixed $\theta $, $\{kg(k,\theta )\le 0\}$ is convex.
While apart from solving Problem 4 it is hard to recover ${\mathcal{G}}_{\mathrm{\neg}s}$ exactly, an inner approximation of ${\mathcal{G}}_{\mathrm{\neg}s}$ can be extracted more efficiently by taking the convex hull of any ${k}_{1},{k}_{2},\mathrm{\dots}\in {\mathcal{G}}_{\mathrm{\neg}s}$, as the convex hull is the minimal convex set containing ${k}_{1},{k}_{2},\mathrm{\dots}$.
The same approaches apply for recovering ${\mathcal{G}}_{s}$ when it is instead the safe set which is an axisaligned hyperrectangle or a convex set.
Appendix C Theoretical Analysis (Expanded)
In this section, we present theoretical analysis on our parametric constraint learning algorithm. In particular, we analyze the limits of what constraint states can be learned guaranteed unsafe/safe (Section C.1) as well as the conditions under which our algorithm is guaranteed to learn a conservative estimate of the safe and unsafe sets (Section C.2). For ease of reading, we repeat the theorem statements from the main body (the corresponding theorem numbers from the main body are listed in the theorem statement). We develop the theory for $\mathcal{C}=\mathcal{X}$ for legibility, but the results can be easily extended to general $\mathcal{C}$.
C.1 Learnability
In this section, we develop results for learnability of the unsafe set in the parametric case. We begin with the following notation:
Definition C.1 (Signed distance).
Signed distance from point $p\mathrm{\in}{\mathrm{R}}^{m}$ to set $\mathrm{S}\mathrm{\subseteq}{\mathrm{R}}^{m}$, $\text{\U0001d5cc\U0001d5bd}\mathit{}\mathrm{(}p\mathrm{,}\mathrm{S}\mathrm{)}\mathrm{=}\mathrm{}{\mathrm{inf}}_{y\mathrm{\in}\mathrm{\partial}\mathit{}\mathrm{S}}\mathit{}\mathrm{\parallel}p\mathrm{}y\mathrm{\parallel}$ if $p\mathrm{\in}\mathrm{S}$; ${\mathrm{inf}}_{y\mathrm{\in}\mathrm{\partial}\mathit{}\mathrm{S}}\mathit{}\mathrm{\parallel}p\mathrm{}y\mathrm{\parallel}$ if $p\mathrm{\in}{\mathrm{S}}^{c}$.
Definition C.2 ($\mathrm{\Delta}x$shell).
For a discrete time system satisfying $\mathrm{\parallel}{x}_{t\mathrm{+}\mathrm{1}}\mathrm{}{x}_{t}\mathrm{\parallel}\mathrm{\le}\mathrm{\Delta}\mathit{}x$ for all $t$, denote the $\mathrm{\Delta}\mathit{}x$ shell of the unsafe set as: ${\mathrm{A}}_{\mathrm{\Delta}\mathit{}x}\mathrm{\doteq}\mathrm{\{}x\mathrm{\in}\mathrm{A}\mathrm{}\mathrm{}\mathrm{\Delta}\mathit{}x\mathrm{\le}\text{\U0001d5cc\U0001d5bd}\mathit{}\mathrm{(}x\mathrm{,}\mathrm{A}\mathrm{)}\mathrm{\le}\mathrm{0}\mathrm{\}}$.
Definition C.3 (Implied unsafe set).
For some set $\mathrm{B}\mathrm{\subseteq}\mathrm{\Theta}$, denote $I\mathit{}\mathrm{(}\mathrm{B}\mathrm{)}\mathrm{\doteq}{\mathrm{\bigcap}}_{\theta \mathrm{\in}\mathrm{B}}\mathrm{\{}x\mathrm{}g\mathit{}\mathrm{(}x\mathrm{,}\theta \mathrm{)}\mathrm{\le}\mathrm{0}\mathrm{\}}$ as the set of states that are implied unsafe by restricting the parameter set to $\mathrm{B}$. In words, $I\mathit{}\mathrm{(}\mathrm{B}\mathrm{)}$ is the set of states for which all $\theta \mathrm{\in}\mathrm{B}$ mark as unsafe.
Definition C.4 (Feasible set $\mathcal{F}$).
Denote as $\mathrm{F}$ the feasible set of Problem 2 with ${N}_{s}$ demonstrations and ${N}_{\mathrm{\neg}\mathit{}s}$ unsafe trajectories sampled using the hitandrun method presented in Section 4.1:
$\mathcal{F}=\{\theta $  $\forall i\in \{1,\mathrm{\dots},{N}_{s}\},\forall x\in {\xi}_{i}^{*},g(x,\theta )>0,$  
$\forall j\in \{1,\mathrm{\dots},{N}_{\mathrm{\neg}s}\},\exists x\in {\xi}_{j},g(x,\theta )\le 0\}.$ 
Definition C.5 (Learnability and learnable set ${\mathcal{G}}_{\mathrm{\neg}s}^{*}$).
A state $x\mathrm{\in}\mathrm{A}$ is learnable if there exists any set of ${N}_{s}$ demonstrations and ${N}_{\mathrm{\neg}\mathit{}s}$ unsafe trajectories sampled using the hitandrun method presented in Section 4.1, where ${N}_{s}$ and ${N}_{\mathrm{\neg}\mathit{}s}$ may be infinite, such that $x\mathrm{\in}\mathrm{I}\mathit{}\mathrm{(}\mathrm{F}\mathrm{)}$. Accordingly, we define the learnable set of unsafe states ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}^{\mathrm{*}}$ as the union of all learnable states. Note that by this definition, a state ${x}_{s}\mathrm{\in}\mathrm{S}$ is always learnable, since there always exists some safe demonstration passing through ${x}_{s}$.
Lemma C.1.
Suppose $\mathrm{B}\mathrm{\subseteq}\widehat{\mathrm{B}}$, for some other set $\widehat{\mathrm{B}}$. Then, $I\mathit{}\mathrm{(}\widehat{\mathrm{B}}\mathrm{)}\mathrm{\subseteq}I\mathit{}\mathrm{(}\mathrm{B}\mathrm{)}$.
Proof.
By definition,
$I(\widehat{\mathcal{B}})$  $={\displaystyle \bigcap _{\theta \in \widehat{\mathcal{B}}}}\{xg(x,\theta )\le 0\}$  
$={\displaystyle \bigcap _{\theta \in \left(\mathcal{B}\cup (\widehat{\mathcal{B}}\setminus \mathcal{B})\right)}}\{xg(x,\theta )\le 0\}$  
$\subseteq {\displaystyle \bigcap _{\theta \in \mathcal{B}}}\{xg(x,\theta )\le 0\}$  
$=I(\mathcal{B}).$ 
∎
Lemma C.2.
Each unsafe trajectory ${\xi}_{j}$ with start and goal states in the safe set contains at least one state in the $\mathrm{\Delta}\mathit{}x$shell ${\mathrm{A}}_{\mathrm{\Delta}\mathit{}x}$: $\mathrm{\forall}j\mathrm{\in}\mathrm{\{}\mathrm{1}\mathrm{,}\mathrm{\dots}\mathrm{,}{N}_{\mathrm{\neg}\mathit{}s}\mathrm{\}}\mathrm{,}\mathrm{\exists}x\mathrm{\in}{\xi}_{j}\mathrm{,}x\mathrm{\in}{\mathrm{A}}_{\mathrm{\Delta}\mathit{}x}$.
Proof.
For each unsafe trajectory ${\xi}_{j}$ with start and goal states in the safe set, there exists $x\in {\xi}_{j},x\in \mathcal{A}$. Further, if there exists $x\in {\xi}_{j}\in (\mathcal{A}\setminus {\mathcal{A}}_{\mathrm{\Delta}x})$, then there also exists $x\in {\xi}_{j}\in {\mathcal{A}}_{\mathrm{\Delta}x}$. For contradiction, suppose there exists a time $\widehat{t}\in \{1,\mathrm{\dots},{T}_{j}\}$ for which ${\xi}_{j}(\widehat{t})\in (\mathcal{A}\setminus {\mathcal{A}}_{\mathrm{\Delta}x})$ and $\mathrm{\nexists}t\in \{1,\mathrm{\dots},{T}_{j}\}$ for which ${\xi}_{j}(t)\in {\mathcal{A}}_{\mathrm{\Delta}x}$. But this implies $$ or $\exists t>\widehat{t},\parallel \xi (t)\xi (t1)\parallel >\mathrm{\Delta}x$, i.e. to skip deeper than $\mathrm{\Delta}x$ into the unsafe set without first entering the $\mathrm{\Delta}x$ shell, the state must have changed by more than $\mathrm{\Delta}x$ in a single timestep. Contradiction. An analogous argument holds for the continuoustime case. ∎
The following result states that in discrete time, the learnable set of unsafe states ${\mathcal{G}}_{\mathrm{\neg}s}^{*}$ is contained by the set of states which must be implied unsafe by setting ${\mathcal{A}}_{\mathrm{\Delta}x}$ as unsafe. Furthermore, in continuous time, the same holds, except the ${\mathcal{A}}_{\mathrm{\Delta}x}$ is replaced by the boundary of the unsafe set, $\partial \mathcal{A}$.
Theorem C.1 (Discrete time learnability for parametric constraints).
For trajectories generated by discrete time systems, ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}\mathrm{\subseteq}{\mathrm{G}}_{\mathrm{\neg}\mathit{}s}^{\mathrm{*}}\mathrm{\subseteq}I\mathit{}\mathrm{(}{\mathrm{F}}_{\mathrm{\Delta}\mathit{}x}\mathrm{)}$, where
${\mathcal{F}}_{\mathrm{\Delta}x}=\{\theta $  $\forall i\in \{1,\mathrm{\dots},{N}_{s}\},\forall x\in {\xi}_{i}^{*},g(x,\theta )>0,\forall x\in {\mathcal{A}}_{\mathrm{\Delta}x},g(x,\theta )\le 0\}.$ 
Proof.
Recall that ${\mathcal{G}}_{\mathrm{\neg}s}\doteq {\bigcap}_{\theta \in \mathcal{F}}\{xg(x,\theta )\le 0\}$, where as previously defined, $\mathcal{F}$ is the feasible set of Problem 2. We can then show that ${\mathcal{F}}_{\mathrm{\Delta}x}\subseteq \mathcal{F}$, since enforcing that $g(x,\theta )\le 0$ for all $x\in {\mathcal{A}}_{\mathrm{\Delta}x}$ implies that there exists $x\in {\xi}_{j}$, for all $j\in \{1,\mathrm{\dots},{N}_{\mathrm{\neg}s}\}$ such that $g(x,\theta )\le 0$, via Lemma C.2. Then, via Lemma C.1, ${\mathcal{G}}_{\mathrm{\neg}s}=I(\mathcal{F})\subseteq I({\mathcal{F}}_{\mathrm{\Delta}x})$. As this holds for any arbitrary set of trajectories, ${\mathcal{G}}_{\mathrm{\neg}s}^{*}\subseteq I({\mathcal{F}}_{\mathrm{\Delta}x})$ as well, and ${\mathcal{G}}_{\mathrm{\neg}s}\subseteq {\mathcal{G}}_{\mathrm{\neg}s}^{*}$. ∎
Corollary C.1 (Continuoustime learnability for parametric constraints).
For trajectories generated by continuous time systems, ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}\mathrm{\subseteq}{\mathrm{G}}_{\mathrm{\neg}\mathit{}s}^{\mathrm{*}}\mathrm{\subseteq}I\mathit{}\mathrm{(}{\mathrm{F}}_{\mathrm{\partial}\mathit{}\mathrm{A}}\mathrm{)}$, where
${\mathcal{F}}_{\partial \mathcal{A}}=\{\theta $  $\forall x\in {\xi}_{i}^{*},\forall i\in \{1,\mathrm{\dots},{N}_{s}\},g(x,\theta )>0,\forall x\in \partial \mathcal{A},g(x,\theta )\le 0\}.$ 
Proof.
Since going from discrete time to continuous time implies $\mathrm{\Delta}x\to 0$, ${\mathcal{A}}_{\mathrm{\Delta}x}\to \partial \mathcal{A}$. Then, the logic from the proof of Theorem C.1 can be similarly applied to show the result. ∎
C.2 Conservativeness: Parametric
We write conditions for conservative recovery of the unsafe set and safe set when solving Problems 2 and 3 for discrete time and continuous time systems.
Theorem C.2 (Conservativeness: Known parameterization (Theorem 5.1 in the main body) ).
Suppose the parameterization $g\mathit{}\mathrm{(}x\mathrm{,}\theta \mathrm{)}$ is known exactly. Then, for a discretetime system, extracting ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}$ and ${\mathrm{G}}_{s}$ (as defined in (3) and (4), respectively) from the feasible set of Problem 2 returns ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}\mathrm{\subseteq}\mathrm{A}$ and ${\mathrm{G}}_{s}\mathrm{\subseteq}\mathrm{S}$. Further, if the known parameterization is $H\mathit{}\mathrm{(}\theta \mathrm{)}\mathit{}{x}_{i}\mathrm{\le}h\mathit{}\mathrm{(}\theta \mathrm{)}$ and $M$ in Problem 3 is chosen to be greater than
$$\mathrm{max}(\underset{{x}_{i}\in {\xi}_{s}}{\mathrm{max}}\underset{\theta}{\mathrm{max}}\underset{j}{\mathrm{max}}{(H(\theta ){x}_{i}h(\theta ))}_{j},\underset{{x}_{i}\in {\xi}_{\mathrm{\neg}s}}{\mathrm{max}}\underset{\theta}{\mathrm{max}}\underset{j}{\mathrm{max}}{(H(\theta ){x}_{i}h(\theta ))}_{j}),$$ 
then extracting ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}$ and ${\mathrm{G}}_{s}$ from the feasible set of Problem 3 recovers ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}\mathrm{\subseteq}\mathrm{A}$ and ${\mathrm{G}}_{s}\mathrm{\subseteq}\mathrm{S}$.
Proof.
We first prove that ${\mathcal{G}}_{\mathrm{\neg}s}\subseteq \mathcal{A}$. Consider first the case of Problem 2, or equivalently the case of Problem 3 where $M=\mathrm{\infty}$ (in this case, Problem 3 exactly enforces that at least one state in each unsafe trajectory is unsafe and all states on demonstrations are safe).
Suppose for contradiction that there exists some $x\in {\mathcal{G}}_{\mathrm{\neg}s},x\notin \mathcal{A}$. By definition of ${\mathcal{G}}_{\mathrm{\neg}s}$, $g(x,\theta )\le 0$, for all $\theta \in \mathcal{F}$, where $\mathcal{F}$ is the feasible set of parameters $\theta $ in Problem 2. However, as $x\notin \mathcal{A}$, but for all $\theta \in \mathcal{F},g(x,\theta )\le 0$ we know that ${\theta}_{\mathcal{A}}\notin \mathcal{F}$, where ${\theta}_{\mathcal{A}}$ is the parameter associated with the true unsafe set $\mathcal{A}$. However, $\mathcal{F}$ will always contain ${\theta}_{\mathcal{A}}$, since:

•
${\theta}_{\mathcal{A}}$ satisfies $g(x,{\theta}_{\mathcal{A}})>0$ for all $x$ in safe demonstrations, since all demonstrations are safe with respect to the true ${\theta}_{\mathcal{A}}$.

•
For each trajectory ${\xi}_{\mathrm{\neg}s}$ sampled using the hitandrun procedure in Section 4.1, there exists $x\in {\xi}_{\mathrm{\neg}s}$ such that $g(x,{\theta}_{\mathcal{A}})\le 0$.
We come to a contradiction, and hence for Problem 2 and for Problem 3 where $M=\mathrm{\infty}$, ${\mathcal{G}}_{\mathrm{\neg}s}\subseteq \mathcal{A}$.
Now, we consider the conditions on $M$ such that choosing $M\ge \text{const}$ or $M=\mathrm{\infty}$ causes no changes in the solution of Problem 3. $M$ must be chosen such that 1) $H(\theta ){x}_{i}h(\theta )>M\mathrm{\U0001d7cf}\iff H(\theta ){x}_{i}h(\theta )>\mathrm{\infty}\mathrm{\U0001d7cf}$, for all safe states ${x}_{i}\in {\xi}_{s}$, and 2) $H(\theta ){x}_{i}h(\theta )\le M\mathrm{\U0001d7cf}\iff H(\theta ){x}_{i}h(\theta )\le \mathrm{\infty}\mathrm{\U0001d7cf}$ for all states ${x}_{i}$ on unsafe trajectories ${\xi}_{\mathrm{\neg}s}$. Condition 1 is met if $$, where ${v}_{j}$ denotes the $j$th element of vector $v$; denote as ${M}_{1}$ an $M$ which satisfies this inequality. Condition 2 is met if $M\ge {\mathrm{max}}_{{x}_{i}\in {\xi}_{\mathrm{\neg}s}}{\mathrm{max}}_{\theta}{\mathrm{max}}_{j}{(H(\theta ){x}_{i}h(\theta ))}_{j}$; denote as ${M}_{2}$ an $M$ which satisfies this inequality. Then, $M$ should be chosen to satisfy $M>\mathrm{max}({M}_{1},{M}_{2})$.
The proof that ${\mathcal{G}}_{s}\subseteq \mathcal{S}$ is analogous. If there exists $x\in {\mathcal{G}}_{s},x\notin \mathcal{S}$, $g(x,\theta )>0$, for all $\theta \in \mathcal{F}$, then ${\theta}_{\mathcal{A}}\notin \mathcal{F}$. We follow the same reasoning from before to show that ${\theta}_{\mathcal{A}}\in \mathcal{F}$ for $M=\mathrm{\infty}$. Now, provided the condition on $M$ holds, we reach a contradiction. ∎
Remark.
A simple corollary from Theorem C.2 is that by solving Problem 4 repeatedly for different query centers ${x}_{\text{query}}$ for a discretetime system and unioning over the resulting volumes will also provide conservative estimates of ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$. Further, if the assumption on $M$ holds, then the volume extraction analogue of Problem 3 will also return conservative estimates of ${\mathcal{G}}_{s}$ and ${\mathcal{G}}_{\mathrm{\neg}s}$.
As discussed in [extended_version], with continuoustime system dynamics, assigning unsafeness in lowercost trajectories difficult since there are an infinite number of states on the continuous trajectory. To ameliorate this, as in [extended_version], we timediscretize the sampled lowercost trajectories and feed the resulting discretetime trajectories into Problems 2 and 3. This can potentially cause a mild overapproximation of the unsafe set, which we quantify after introducing some notation.
Definition C.6 (Normal vectors).
Denote the outwardpointing normal vector at a point $p\mathrm{\in}\mathrm{\partial}\mathit{}\mathrm{A}$ as $\widehat{n}\mathit{}\mathrm{(}p\mathrm{)}$. Furthermore, at nondifferentiable points on $\mathrm{\partial}\mathit{}\mathrm{A}$, $\widehat{n}\mathit{}\mathrm{(}p\mathrm{)}$ is replaced by the set of normal vectors for the subgradient of the Lipschitz function describing $\mathrm{\partial}\mathit{}\mathrm{A}$ at that point ([thickness]).
Definition C.7 ($\gamma $offset padding).
Define the $\gamma $offset padding $\mathrm{\partial}\mathit{}{\mathrm{A}}_{\gamma}$ as: $\mathrm{\partial}\mathit{}{\mathrm{A}}_{\gamma}\mathrm{=}\mathrm{\{}x\mathrm{\in}\mathrm{X}\mathrm{}x\mathrm{=}y\mathrm{+}d\mathit{}\widehat{n}\mathit{}\mathrm{(}y\mathrm{)}\mathrm{,}d\mathrm{\in}\mathrm{[}\mathrm{0}\mathrm{,}\gamma \mathrm{]}\mathrm{,}y\mathrm{\in}\mathrm{\partial}\mathit{}\mathrm{A}\mathrm{\}}$.
Definition C.8 ($\gamma $padded set).
We define the $\gamma $padded set of the unsafe set $\mathrm{A}$, $\mathrm{A}\mathit{}\mathrm{(}\gamma \mathrm{)}$, as the union of the $\gamma $offset padding and $\mathrm{A}$: $\mathrm{A}\mathit{}\mathrm{(}\gamma \mathrm{)}\mathrm{\doteq}\mathrm{\partial}\mathit{}{\mathrm{A}}_{\gamma}\mathrm{\cup}\mathrm{A}$.
Definition C.9 (Maximum distance on trajectories).
Denote ${D}_{\xi}\mathit{}\mathrm{(}\mathrm{[}a\mathrm{,}b\mathrm{]}\mathrm{)}\mathrm{\doteq}{\mathrm{sup}}_{{t}_{\mathrm{1}}\mathrm{\in}\mathrm{[}a\mathrm{,}b\mathrm{]}\mathrm{,}{t}_{\mathrm{2}}\mathrm{\in}\mathrm{[}{t}_{\mathrm{1}}\mathrm{,}b\mathrm{]}}\mathit{}{\mathrm{\parallel}\xi \mathit{}\mathrm{(}{t}_{\mathrm{1}}\mathrm{)}\mathrm{}\xi \mathit{}\mathrm{(}{t}_{\mathrm{2}}\mathrm{)}\mathrm{\parallel}}_{\mathrm{2}}$, for some trajectory $\xi $. Denote ${D}^{\mathrm{*}}\mathrm{\doteq}{\mathrm{max}}_{i\mathrm{\in}\mathrm{\{}\mathrm{1}\mathrm{,}\mathrm{\dots}\mathrm{,}{N}_{\mathrm{\neg}\mathit{}s}\mathrm{\}}}\mathit{}{D}_{{\xi}_{i}}^{\mathrm{*}}\mathit{}\mathrm{(}\mathrm{[}{a}_{i}\mathrm{,}{b}_{i}\mathrm{]}\mathrm{)}$. In words, ${D}_{\xi}\mathit{}\mathrm{(}\mathrm{[}a\mathrm{,}b\mathrm{]}\mathrm{)}$ is the maximum distance between any two points on trajectory $\xi $ from time $a$ to time $b$, and ${D}^{\mathrm{*}}$ takes the maximum distance over all ${N}_{\mathrm{\neg}\mathit{}s}$ trajectories.
Lemma C.3 (Maximum distance).
Consider a continuous time trajectory $\xi \mathrm{:}\mathrm{[}\mathrm{0}\mathrm{,}T\mathrm{]}\mathrm{\to}\mathrm{X}$. Suppose it is known that in some time interval $\mathrm{[}a\mathrm{,}b\mathrm{]}\mathrm{,}a\mathrm{\le}b\mathrm{,}a\mathrm{,}b\mathrm{\in}\mathrm{[}\mathrm{0}\mathrm{,}T\mathrm{]}$, $\xi $ is unsafe; denote this subsegment as $\xi \mathit{}\mathrm{(}\mathrm{[}a\mathrm{,}b\mathrm{]}\mathrm{)}$. Consider any $t\mathrm{\in}\mathrm{[}a\mathrm{,}b\mathrm{]}$. Then, the signed distance from $\xi \mathit{}\mathrm{(}t\mathrm{)}$ to the unsafe set, $\text{\U0001d5cc\U0001d5bd}\mathit{}\mathrm{(}\xi \mathit{}\mathrm{(}t\mathrm{)}\mathrm{,}\mathrm{A}\mathrm{)}$, is bounded by ${D}_{\xi}\mathit{}\mathrm{(}\mathrm{[}a\mathrm{,}b\mathrm{]}\mathrm{)}\mathrm{\doteq}{\mathrm{sup}}_{{t}_{\mathrm{1}}\mathrm{\in}\mathrm{[}a\mathrm{,}b\mathrm{]}\mathrm{,}{t}_{\mathrm{2}}\mathrm{\in}\mathrm{[}{t}_{\mathrm{1}}\mathrm{,}b\mathrm{]}}\mathit{}{\mathrm{\parallel}\xi \mathit{}\mathrm{(}{t}_{\mathrm{1}}\mathrm{)}\mathrm{}\xi \mathit{}\mathrm{(}{t}_{\mathrm{2}}\mathrm{)}\mathrm{\parallel}}_{\mathrm{2}}$.
Proof.
Since there exists $\stackrel{~}{t}\in [a,b]$ such that $\xi (\stackrel{~}{t})\in \mathcal{A}$, ${sup}_{t\in [a,b]}\text{\U0001d5cc\U0001d5bd}(\xi (t),\mathcal{A})={sup}_{t\in [a,b]}\text{\U0001d5cc\U0001d5bd}(\xi (t),\xi (\stackrel{~}{t}))\le {sup}_{{t}_{1}\in [a,b],{t}_{2}\in [{t}_{1},b]}{\parallel \xi ({t}_{1})\xi ({t}_{2})\parallel}_{2}$. ∎
Corollary C.2.
For a continuoustime system where demonstrations and sampled unsafe trajectories are timediscretized, if $M$ is chosen as in Theorem C.2, ${\mathrm{G}}_{s}\mathrm{\subseteq}\mathrm{S}$, where $\mathrm{S}$ is the safe set, and ${\mathrm{G}}_{\mathrm{\neg}\mathit{}s}\mathrm{\subseteq}\mathrm{A}\mathit{}\mathrm{(}{D}^{\mathrm{*}}\mathrm{)}$, where ${D}^{\mathrm{*}}$ is as defined in Definition C.9.
Proof.
The reasoning for ${\mathcal{G}}_{s}\subseteq \mathcal{S}$ follows from the proof of ${\mathcal{G}}_{\mathrm{\neg}s}\subseteq \mathcal{A}$ in the proof of Theorem C.2.
Now we prove ${\mathcal{G}}_{\mathrm{\neg}s}\subseteq \mathcal{A}({D}^{*})$. Suppose in this case, there exists a state $x={\xi}_{j}({t}_{i})\notin \mathcal{A}$ which is truly safe but lies on a sampled unsafe trajectory ${\xi}_{j}([{a}_{j},{b}_{j}])$, and suppose that $\{{t}_{1},\mathrm{\dots},{t}_{N}\}$ is chosen such that for all $k\in \{1,\mathrm{\dots},N\}\setminus \{i\}$, ${\xi}_{j}({t}_{k})$ belongs to a known safe cell. Then, we may incorrectly learn that ${\xi}_{j}({t}_{i})$ is unsafe, as we force at least one point in the sampled trajectory to be unsafe. Via Lemma C.3, we know that ${\xi}_{j}({t}_{i})$ is at most ${D}_{{\xi}_{j}}([{a}_{j},{b}_{j}])$ signed distance away from $\mathcal{A}$. Hence, for this trajectory, any learned guaranteed unsafe state must be contained in the ${D}_{{\xi}_{j}}([{a}_{j},{b}_{j}])$padded unsafe set. For this to hold for all unsafe trajectories sampled with the hitandrun procedure presented in Section 4.1, we must pad the unsafe set by ${D}^{*}$. Hence, under this assumption, the algorithm returns a conservative estimate of the ${D}^{*}$padded unsafe set. ∎
Let’s consider the case where the true parameterization is not known and we use the method described in Section 4.4, where ${g}_{s}(x,\theta )$ is the simple parameterization. We consider the underparameterized case (Theorem 5.3) and the overparameterized case (Theorem 5.2). In particular, we analyze the case where the true parameterization, the underparameterization, and the overparameterization are defined respectively as:
$$g(x,\theta )\le 0\iff \underset{i=1}{\overset{{N}^{*}}{\bigvee}}({g}_{s}(x,{\theta}_{i})\le 0)$$  (9) 
$$  (10) 
$$g(x,\theta )\le 0\iff \underset{i=1}{\overset{\overline{N}}{\bigvee}}({g}_{s}(x,{\theta}_{i})\le 0),\overline{N}>{N}^{*}.$$ (11)
Theorem C.3 (Conservativeness: Overparameterization (Theorem 5.2 in the main body) ).
Proof.
Note that (9) is equivalent to $\left({\bigvee}_{i=1}^{\overline{N}}({g}_{s}(x,{\theta}_{i})\le 0)\right)$, where ${\theta}_{{N}^{*}+1},\mathrm{\dots},{\theta}_{\overline{N}}$ are constrained to satisfy $\{x\mid {g}_{s}(x,{\theta}_{i})\le 0\}=\mathrm{\varnothing},i={N}^{*}+1,\mathrm{\dots},\overline{N}$. Thus, the true $\theta $ is equivalent to adding additional constraints on a loosened parameterization (the overparameterization). Let $\widehat{\mathcal{F}}$ be the feasible set of Problem 2 with $\theta $ loosened as above, i.e. $\mathcal{F}=\widehat{\mathcal{F}}\cap \{\theta \mid \{x\mid {g}_{s}(x,{\theta}_{i})\le 0\}=\mathrm{\varnothing},i={N}^{*}+1,\mathrm{\dots},\overline{N}\}$. Via Lemma C.1, $\mathcal{F}\subseteq \widehat{\mathcal{F}}$; thus, ${I}_{\mathrm{\neg}s}(\widehat{\mathcal{F}})\subseteq {I}_{\mathrm{\neg}s}(\mathcal{F})\subseteq \mathcal{A}$, where the last set containment follows from Theorem 5.1. Vice versa, ${I}_{s}(\widehat{\mathcal{F}})\subseteq {I}_{s}(\mathcal{F})\subseteq \mathcal{S}$, where again the last set containment follows from Theorem 5.1. ∎
Theorem C.4 (Conservativeness: Underparameterization (Theorem 5.3 in the main body) ).
Suppose the true parameterization and underparameterization are defined as in (9) and (10). Furthermore, assume that we incrementally grow the parameterization as described in Section 4.4. Then, the following are true:

1.
${\mathcal{G}}_{\mathrm{\neg}s}$ and ${\mathcal{G}}_{s}$ are not guaranteed to be contained in $\mathcal{A}$ (unsafe set) and $\mathcal{S}$ (safe set), respectively.

2.
Each recovered simple unsafe set $\mathcal{A}({\theta}_{i})$, $i=1,\mathrm{\dots},\underset{\xaf}{N}$, for any ${\theta}_{1},\mathrm{\dots},{\theta}_{\underset{\xaf}{N}}\in \mathcal{F}$, touches the true unsafe set (there are no spurious simple unsafe sets): for $i=1,\mathrm{\dots},\underset{\xaf}{N}$, for ${\theta}_{1},\mathrm{\dots},{\theta}_{\underset{\xaf}{N}}\in \mathcal{F}$, $\mathcal{A}({\theta}_{i})\cap \mathcal{A}\ne \mathrm{\varnothing}$ ($\underset{\xaf}{N}$ is as defined in Section 4.4).
Proof.

1.
We first formally prove the statement with a counterexample and then follow up with logic related to the proof of Theorem C.3.
Consider the example in Fig. C.1, where the parameterization is chosen as a single axisaligned box ${[{I}_{2\times 2},{I}_{2\times 2}]}^{\top}x\le \theta $ but $\mathcal{A}$ is only representable with at least two boxes. Suppose demonstrations are provided which imply that $({a}_{l},{b}_{l})$ and $({a}_{u},{b}_{u})$ are unsafe; then AABB$(\{({a}_{l},{b}_{l}),({a}_{u},{b}_{u})\})\u2288\mathcal{A}$ is implied unsafe.
Note that (10) is equivalent to $\left({\bigvee}_{i=1}^{{N}^{*}}({g}_{s}(x,{\theta}_{i})\le 0)\right)$, where ${\theta}_{\underset{\xaf}{N}+1},\mathrm{\dots},{\theta}_{{N}^{*}}$ are constrained to satisfy $\{x\mid {g}_{s}(x,{\theta}_{i})\le 0\}=\mathrm{\varnothing},i=\underset{\xaf}{N}+1,\mathrm{\dots},{N}^{*}$. Thus, restricting the parameterization is equivalent to adding additional constraints on the true $\theta $. Let $\widehat{\mathcal{F}}$ be the feasible set of Problem 2 with $\theta $ restricted as above, i.e. $\widehat{\mathcal{F}}=\mathcal{F}\cap \{\theta \mid \{x\mid {g}_{s}(x,{\theta}_{i})\le 0\}=\mathrm{\varnothing},i=\underset{\xaf}{N}+1,\mathrm{\dots},{N}^{*}\}$. Via Lemma C.1, $\widehat{\mathcal{F}}\subseteq \mathcal{F}$; thus, ${I}_{\mathrm{\neg}s}(\mathcal{F})\subseteq {I}_{\mathrm{\neg}s}(\widehat{\mathcal{F}})$. Since ${I}_{\mathrm{\neg}s}(\mathcal{F})$ can equal $\mathcal{A}$, potentially ${\mathcal{G}}_{\mathrm{\neg}s}={I}_{\mathrm{\neg}s}(\widehat{\mathcal{F}})\cap \mathcal{S}\ne \mathrm{\varnothing}$. Vice versa, ${I}_{s}(\mathcal{F})\subseteq {I}_{s}(\widehat{\mathcal{F}})$, and since ${I}_{s}(\mathcal{F})$ can equal $\mathcal{S}$, potentially ${\mathcal{G}}_{s}={I}_{s}(\widehat{\mathcal{F}})\cap \mathcal{S}\ne \mathrm{\varnothing}$.

2.
Assume, by contradiction, that Problem 2 outputs a simple unsafe set $\mathcal{A}({\theta}_{i}),i\in \{1,\mathrm{\dots},\underset{\xaf}{N}\}$, which does not touch the true unsafe set: $\exists i\in \{1,\mathrm{\dots},\underset{\xaf}{N}\},\mathcal{A}({\theta}_{i})\cap \mathcal{A}({\theta}^{*})=\mathrm{\varnothing}$. Then, ${\theta}_{j},j\in \{1,\mathrm{\dots},\underset{\xaf}{N}\}\setminus \{i\}$ would be a feasible point for Problem 2 with a parametrization that contains only $\underset{\xaf}{N}1$ simple sets. However, we know Problem 2 with $\underset{\xaf}{N}1$ simple sets is infeasible. Contradiction.
∎
Appendix D Extra numerical examples
D.1 Ushape (random demonstrations)
In this example, we show what the performance of our method looks like with random demonstrations on the Ushape example. On the left of Fig. D.1, we show that our coverage grows more slowly than for the case where demonstrations are chosen for their informativeness; furthermore, coverage for the safe set is higher and coverage for the unsafe set is lower in the random demonstration case. This is because by using random demonstrations, we cover a good deal of $\mathcal{S}$, so ${\mathcal{G}}_{s}$ becomes larger; on the other hand, many of these safe demonstrations may not come in contact with the constraint, so there are relatively few unsafe trajectories that can be sampled, so ${\mathcal{G}}_{\mathrm{\neg}s}$ is not as large. In the center of Fig. D.1, we show that the accuracy of our method doesn’t change much, though the relative performance of the NN gets worse for classifying safe states; this is because the accuracy for the NN is now being evaluated on a larger region since ${\mathcal{G}}_{s}$ is larger due to more demonstrations. As in previous examples, the NN error bars are generated by training the NN ten times with initializations using different random seeds. On the right of Fig. D.1, we display a feasible $\mathcal{A}(\theta )$ recovered by solving a multibox variant of Problem 3. With more demonstrations, the gap between $\mathcal{A}(\theta )$ and the true unsafe set $\mathcal{A}$ will continue to shrink.
The main takeaways from this experiment are: 1) when demonstrations are not informative (in the sense that they do not interact with the constraint), it can take many demonstrations to learn the unsafe set (this holds for any constraint recovery method), and 2) our accuracy remains just as high as for the case with specifically chosen demonstrations and is not much affected by the coverage.
Appendix E Experimental details
For all neural network baseline results in every experiment, the network is trained with weights initialized using ten different random seeds, and the resulting performance range (displayed as a shaded region) and average performance over the ten random seeds are plotted in the figures.
E.1 Unknown parameterizations
We emphasize that for all examples with unknown parameterization, by following the incremental procedure detailed in Section 4.4, we are finding the minimum number of boxes required to represent the data; in other words, we are always operating with the minimal feasible parameterization.
Ushape and infinite boxes:

•
For both experiments, the system dynamics are ${x}_{t+1}\doteq {[{\chi}_{t+1},{y}_{t+1}]}^{\top}={[{\chi}_{t},{y}_{t}]}^{\top}+{[{u}_{t}^{\chi},{u}_{t}^{y}]}^{\top}$. The Ushape experiment uses control constraints ${\parallel [{u}_{t}^{\chi},{u}_{t}^{y}]\parallel}_{2}\le 0.5$, while the infinitebox experiment uses control constraints ${\parallel [{u}_{t}^{\chi},{u}_{t}^{y}]\parallel}_{2}\le 1$.

•
For both experiments, the cost function is $c({\xi}_{x},{\xi}_{u})={\sum}_{i=1}^{T1}{\parallel {x}_{t+1}{x}_{t}\parallel}_{2}^{2}$.

•
Since the cost function has optimal substructure, 100000 unsafe trajectories for each subtrajectory are sampled. The dataset is downsampled to 50 unsafe trajectories for each subtrajectory, which are to be fed into the multibox variant of Problem 3.

•
For both experiments, the initial parameter set is restricted to ${[5,5,3,3]}^{\top}\le {\theta}_{i}\le {[8,8,3,3]}^{\top}$, for each ${\theta}_{i}$ (the parameter for box $i$). For the infinitebox experiment, each box is restricted to be at least $1.25\times 1.25$ in width/height.

•
Sampling time is around 15 seconds per demonstration (for the Ushape experiment) and 10 seconds per demonstration (for the infinitebox experiment). Computation time for solving Problem 3 is around 40 seconds (for the Ushape experiment) and 1520 seconds (for the infinitebox experiment).

•
The same data is used for training the neural network (7800 trajectories total for the Ushape case, 2000 trajectories for the infinitebox case). The neural network architecture used for this example is a fully connected (FC) layer, $2\times 10$ $\to $ LSTM, $10\times 10\to $ FC $10\times 1$ (the recurrent layer is used since we have variable length trajectories as training input). The network is trained using Adam.
Ushape with random demonstrations:

•
The system dynamics are ${x}_{t+1}\doteq {[{\chi}_{t+1},{y}_{t+1}]}^{\top}={[{\chi}_{t},{y}_{t}]}^{\top}+{[{u}_{t}^{\chi},{u}_{t}^{y}]}^{\top}$ with control constraints ${\parallel [{u}_{t}^{\chi},{u}_{t}^{y}]\parallel}_{2}\le 0.5$.

•
The cost function is $c({\xi}_{x},{\xi}_{u})={\sum}_{i=1}^{T1}{\parallel {x}_{t+1}{x}_{t}\parallel}_{2}^{2}$.

•
Demonstrations are generated for 35 pairs of start/goal states sampled uniformly at random over $(\chi ,y)\in [2,2]\times [2,2]$, rejecting any start/goal states that lie in $\mathcal{A}$.

•
Since the cost function has optimal substructure, 10000 unsafe trajectories for each subtrajectory are sampled. The dataset is downsampled to 25 unsafe trajectories for each subtrajectory, which are to be fed into the multibox variant of Problem 3.

•
The initial parameter set is restricted to ${[5,5,3,3]}^{\top}\le {\theta}_{i}\le {[8,8,3,3]}^{\top}$, for each ${\theta}_{i}$ (the parameter for box $i$).

•
Sampling time is around 2 minutes total. Computation time for solving the multibox variant of Problem 3 is around 90 seconds.

•
The same data is used for training the neural network (10100 trajectories total). The neural network architecture used for this example is a fully connected (FC) layer, $2\times 10$ $\to $ LSTM, $10\times 10\to $ FC $10\times 1$. The network is trained using Adam.
E.2 Highdimensional examples
7DOF arm, optimal/suboptimal demonstrations

•
The system dynamics are $\doteq {\theta}_{t+1}^{i}={\theta}_{t}^{i}+{u}_{t}^{i}$, $i=1,\mathrm{\dots},7$, with control constraints $2\le {u}_{t}^{i}\le 2$, $i=1,\mathrm{\dots},7$, where the state is $x=[{\theta}^{1},\mathrm{\dots},{\theta}^{7}]$.

•
The cost function is $c({\xi}_{x},{\xi}_{u})={\sum}_{i=1}^{T1}{\parallel {x}_{t+1}{x}_{t}\parallel}_{2}^{2}$. Note that the generate demonstrations (displayed in Fig. 6.2) push up against the position constraint, since the trajectory minimizing jointspace path length without the position constraint is an arc that exceeds the bounds of the position constraint; the position constraint ends up increasing the cost by truncating that arc.

•
The true safe set is $(x,y,z,\alpha ,\beta ,\gamma )\in [0.51,0.51]\times [0.3,1.1]\times [0.51,0.51]\times [\pi ,\pi ]\times [\pi /120,\pi /120]\times [\pi /120,\pi /120]$ for the optimal case and the true safe set is $(x,y,z,\alpha ,\beta ,\gamma )\in [0.57,0.47]\times [0.10,1.17]\times [0.56,0.56]\times [\pi ,\pi ]\times [0.12,0.12]\times [0.125,0.125]$ for the suboptimal case.

•
Since the cost function has optimal substructure, 250000 unsafe trajectories for each subtrajectory are sampled. For the suboptimal case, the continuoustime demonstrations are timediscretized down to $T=10$ timesteps. The dataset is downsampled to 500 unsafe trajectories for each subtrajectory, which are to be fed into Problem 3.

•
For the optimal case, the demonstrations are obtained by solving trajectory optimization problems solved with the IPOPT solver [ipopt]. For the suboptimal case, the demonstrations are recorded in a virtual reality (VR) environment displayed in Fig. E.1.

•
The initial parameter set is restricted to ${[1.5,1.5,1.5,\pi ,\pi ,\pi ]}^{\top}\le {[x,y,z,\alpha ,\beta ,\gamma ]}^{\top}\le {[1.5,1.5,1.5,\pi ,\pi ,\pi ]}^{\top}$.

•
Sampling time is 12.5 minutes total for the optimal case and 9 minutes total for the suboptimal case. Computation time for solving Problem 2 is around 2 seconds for both the optimal/suboptimal case.

•
The same data is used for training the neural network (70000 trajectories total for the optimal case, 49900 trajectories total for the suboptimal case). The neural network architecture used for this example is a fully connected (FC) layer, $3\times 20$ $\to $ LSTM, $20\times 20\to $ FC $20\times 1$. The network is trained using Adam.
12D quadrotor example

•
The system dynamics [quad_kth] are
$$\left[\begin{array}{c}\hfill \dot{\chi}\hfill \\ \hfill \dot{y}\hfill \\ \hfill \dot{z}\hfill \\ \hfill \dot{\alpha}\hfill \\ \hfill \dot{\beta}\hfill \\ \hfill \dot{\gamma}\hfill \\ \hfill \ddot{\chi}\hfill \\ \hfill \ddot{y}\hfill \\ \hfill \ddot{z}\hfill \\ \hfill \ddot{\alpha}\hfill \\ \hfill \ddot{\beta}\hfill \\ \hfill \ddot{\gamma}\hfill \end{array}\right]=\left[\begin{array}{c}\hfill \dot{\chi}\hfill \\ \hfill \dot{y}\hfill \\ \hfill \dot{z}\hfill \\ \hfill \dot{\beta}\frac{\mathrm{sin}(\gamma )}{\mathrm{cos}(\beta )}+\dot{\gamma}\frac{\mathrm{cos}(\gamma )}{\mathrm{cos}(\beta )}\hfill \\ \hfill \beta \mathrm{cos}(\gamma )\dot{\gamma}\mathrm{sin}(\gamma )\hfill \\ \hfill \dot{\alpha}+\dot{\beta}\mathrm{sin}(\gamma )\mathrm{tan}(\beta )+\dot{\gamma}\mathrm{cos}(\gamma )\mathrm{tan}(\beta )\hfill \\ \hfill \frac{1}{m}[\mathrm{sin}(\gamma )\mathrm{sin}(\alpha )+\mathrm{cos}(\gamma )\mathrm{cos}(\alpha )\mathrm{sin}(\beta )]{u}_{1}\hfill \\ \hfill \frac{1}{m}[\mathrm{cos}(\alpha )\mathrm{sin}(\gamma )\mathrm{cos}(\gamma )\mathrm{sin}(\alpha )\mathrm{sin}(\beta )]{u}_{1}\hfill \\ \hfill g\frac{1}{m}[\mathrm{cos}(\gamma )\mathrm{cos}(\beta )]{u}_{1}\hfill \\ \hfill \frac{{I}_{y}{I}_{z}}{{I}_{x}}\dot{\beta}\dot{\gamma}+\frac{1}{{I}_{x}}{u}_{2}\hfill \\ \hfill \frac{{I}_{z}{I}_{x}}{{I}_{y}}\dot{\alpha}\dot{\gamma}+\frac{1}{{I}_{y}}{u}_{3}\hfill \\ \hfill \frac{{I}_{x}{I}_{y}}{{I}_{z}}\dot{\alpha}\dot{\beta}+\frac{1}{{I}_{z}}{u}_{4}\hfill \end{array}\right],$$ (12) with control constraints ${[0,0.02,0.02,0.02]}^{\top}\le {u}_{t}\le {[mg,0.02,0.02,0.02]}^{\top}$. For our purposes, we convert the dynamics to discrete time by performing forward Euler integration with discretization time $\delta t=0.4$ seconds. The state is $x={[\chi ,y,z,\alpha ,\beta ,\gamma ,\dot{x},\dot{y},\dot{z},\dot{\alpha},\dot{\beta},\dot{\gamma}]}^{\top}$, and the constants are $g=9.81\text{m}/{\text{s}}^{2}$, $m=1$kg, ${I}_{x}=0.5\text{kg}\cdot {\text{m}}^{2}$, ${I}_{y}=0.1\text{kg}\cdot {\text{m}}^{2}$, and ${I}_{z}=0.3\text{kg}\cdot {\text{m}}^{2}$.

•
The known unsafe set in $(\chi ,y,z)$ is $(\chi ,y,z)\notin [0.5,0.5]\times [0.5,0.5]\times [0.5,0.5]$.

•
The true safe set in $(\dot{\alpha},\dot{\beta},\dot{\gamma})$ is $(\dot{\alpha},\dot{\beta},\dot{\gamma})\in {[0.006,0.006]}^{3}$.

•
The cost function is $c({\xi}_{x},{\xi}_{u})={\sum}_{i=1}^{T1}{\parallel {[{\chi}_{i+1},{y}_{i+1},{z}_{i+1},{\dot{\alpha}}_{i+1},{\dot{\beta}}_{i+1},{\dot{\gamma}}_{i+1}]}^{\top}{[{\chi}_{i},{y}_{i},{z}_{i},{\dot{\alpha}}_{i},{\dot{\beta}}_{i},{\dot{\gamma}}_{i}]}^{\top}\parallel}_{2}$ (penalizing acceleration and path length).

•
The demonstrations are obtained by solving trajectory optimization problems solved with the IPOPT solver [ipopt].

•
Since the cost function has optimal substructure, 10000 unsafe trajectories for each subtrajectory are sampled. The dataset is downsampled to 500 unsafe trajectories for each subtrajectory, which are to be fed into Problem 3.

•
The initial parameter set is restricted to ${[\pi /2,\pi /2,\pi /2]}^{\top}\le {[\dot{\alpha},\dot{\beta},\dot{\gamma}]}^{\top}\le {[\pi /2,\pi /2,\pi /2]}^{\top}$.

•
Sampling time is 8.5 minutes total for the optimal case and 9 minutes total for the suboptimal case. Computation time for solving Problem 2 is 12 seconds.

•
The same data is used for training the neural network (30000 trajectories total). The neural network architecture used for this example is a fully connected (FC) layer, $6\times 36$ $\to $ LSTM, $36\times 42\to $ FC $42\times 1$. The network is trained using Adam.
E.3 Blackbox system dynamics
Pushing example

•
The cost function is $c({\xi}_{x},{\xi}_{u})={\sum}_{i=1}^{T1}{\parallel {x}_{t+1}{x}_{t}\parallel}_{2}^{2}$. The two demonstrations are manually generated and are not exactly optimal.

•
1000 unsafe trajectories for each demonstrations are sampled.

•
The initial parameter set is restricted to ${[5,5,3,3]}^{\top}\le {\theta}_{i}\le {[8,8,3,3]}^{\top}$.

•
Sampling time is 2 hours for each demonstration (using the simulator is slower than using the closed form dynamics). Computation time for solving Problem 2 is around 1 second.

•
Demonstrations are timediscretized to 40 simulator timesteps when input to Problem 3.

•
The same data is used for training the neural network (2700 trajectories total). The neural network architecture used for this example is a fully connected (FC) layer, $8\times 10$ $\to $ FC, $10\times 10\to $ FC $10\times 1$. No recurrent layer is used this time since all trajectories are of the same length (no subtrajectories were sampled this time due to speed). The network is trained using Adam.
Appendix F Summary of frequently used notation
Meaning  Notation 

State, state space  $x$, $\mathcal{X}$ 
Control, control space  $u$, $\mathcal{U}$ 
State/control trajectory  ${\xi}_{x}$, ${\xi}_{u}$ 
Constraint state, constraint space  $k$, $\mathcal{C}$ 
Safe set, unsafe set  $\mathcal{S}$, $\mathcal{A}$ 
Parameterized safe set  $\mathcal{S}(\theta )=\{k\mid g(k,\theta )>0\}$ 
Parameterized unsafe set  $\mathcal{A}(\theta )=\{k\mid g(k,\theta )\le 0\}$ 
Safe demonstration $j$  ${\xi}_{{s}_{j}}^{*}$ 
Sampled unsafe trajectory $k$  ${\xi}_{\mathrm{\neg}{s}_{k}}$ 
Guaranteed safe set  ${\mathcal{G}}_{s}$ 
Guaranteed unsafe set  ${\mathcal{G}}_{\mathrm{\neg}s}$ 