Algorithms

The toolbox provides a collection of algorithms for Markov Decision Processes and Partially Observable Markov Decision Processes with resource constraints. On this page we provide an overview of the high-level architecture of the algorithms, and we provide specific information about the algorithms that have been implemented.

Overview of algorithms

Column generation (Yost and Washburn, 2000)
CMDP linear program (Altman, 1999)
Deterministic preallocation (Wu and Durfee, 2010)
Dynamic relaxation (De Nijs et al, 2017)
CGCP (Walraven and Spaan, 2018)
CALP (Poupart et al, 2015)
CC-POMCP (Lee et al, 2018)

Algorithms for Markov Decision Processes

Algorithms for Markov Decision Processes with constraints implement the CMDPAlgorithm interface. This interface provides a general skeleton which ensures that the input and output format of the algorithms is consistent. Each algorithm takes a CMDPInstance object as an input, and returns a CMDPSolution object. This structure is depicted in the figure below. More information about instance objects and the type of constraints is available here, and more information about solution objects can be found here.

Solve MDP with resource constraints

In the overview below we list the MDP algorithms that are available in the toolbox. Parameters of algorithms have to be defined in the code. Other settings can be changed in the CONFIG file.

Column generation

The column generation algorithm is a conditional preallocation method which has been introduced by Yost and Washburn (2000). The algorithm iteratively generates policies and adds the corresponding columns to a linear program. The algorithm derives a probability distribution over deterministic policies using this linear program, which it eventually returns as a solution.

Class: algorithms.mdp.colgen.ColGen

Supported constraints: budget, instantaneous

Number of agents: n

Parameters: none

Settings in config file: tolerance that is used to determine whether dual prices converge, time limit

Solution returned: probability distribution over deterministic policies

Literature: Yost, K. A., & Washburn, A. R. (2000). The LP/POMDP Marriage: Optimization with Imperfect Information. Naval Research Logistics, 47(8), 607–619.

CMDP linear program

The linear program for Constrained MDPs defines a stochastic policy which represents a conditional preallocation. The algorithm itself only initializes the linear program, and derives the stochastic policy from the solution after solving the linear program. More information about linear programs for Constrained MDPs can be found in a book by Altman (1999).

Class: algorithms.mdp.constrainedmdp.ConstrainedMDP

Supported constraints: budget, instantaneous

Number of agents: n

Parameters: none

Settings in config file: none

Solution returned: stochastic policy

Literature: Altman, E. (1999). Constrained Markov Decision Processes. CRC Press.

Deterministic preallocation

The deterministic preallocation algorithm in the toolbox uses the same linear program as discussed above, except that it introduces additional binary variables to enforce that resource constraints are never violated. Similar variables have been used in algorithms by Wu and Durfee (2010) and De Nijs et al. (2018).

Class: algorithms.mdp.deterministicpreallocation.DeterministicPreallocation

Supported constraints: budget, instantaneous

Number of agents: n

Parameters: none

Settings in config file: none

Solution returned: stochastic policy

Literature: Wu, J., & Durfee, E. H. (2010). Resource-Driven Mission-Phasing Techniques for Constrained Agents in Stochastic Environments. Journal of Artificial Intelligence Research, 38, 415–473.

De Nijs, F., Spaan, M. T. J., & De Weerdt, M. M. (2018). Preallocation and Planning under Stochastic Resource Constraints. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (pp. 4662–4669).

Dynamic relaxation

The dynamic constraint relaxation algorithm proposed by De Nijs et al. (2017) is a hybrid algorithm that bridges the gap between conditional preallocation methods and deterministic preallocation methods. Rather than enforcing that resource constraints are respected at all times, the algorithm computes a solution for which the resource constraint violation probability is (empirically) bounded by a given parameter alpha. The algorithm uses either Column Generation or the CMDP linear program as a subroutine, which have been discussed above. These algorithms need to be provided as the first argument of the DynamicRelaxation constructor. This is illustrated in the example java file.

Class: algorithms.mdp.dynamicrelaxation.DynamicRelaxation

Supported constraints: budget, instantaneous

Number of agents: n

Parameters: tolerance on constraint violation probability (alpha), factor used for constraint relaxation (beta)

Settings in config file: tolerance to determine whether resource limits have stabilized during constraint relaxation, time limit

Solution returned: probability distribution over deterministic policies (with column generation), stochastic policy (with CMDP linear program)

Literature: De Nijs, F., Walraven, E., De Weerdt, M. M., & Spaan, M. T. J. (2017). Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (pp. 3562–3568).

Algorithms for Partially Observable Markov Decision Processes

Algorithms for Partially Observable Markov Decision Processes with constraints implement the CPOMDPAlgorithm interface. This interface provides a general skeleton which ensures that the input and output format of the algorithms is consistent. Each algorithm takes a CPOMDPInstance object as an input, and returns a CPOMDPSolution object. This structure is depicted in the figure below. More information about instance objects and the type of constraints is available here, and more information about solution objects can be found here.

Solve POMDP with resource constraints

In the overview below we list the MDP algorithms that are available in the toolbox. Parameters of algorithms have to be defined in the code. Other settings can be changed in the CONFIG file.

CGCP

The Column Generation algorithm for Constrained POMDPs (CGCP) proposed by Walraven and Spaan (2018) enhances the scalability of the original column generation algorithm by integrating tailored approximate POMDP algorithms. The algorithm uses point-based value iteration to generate the policies, and these policies are converted to a finite state controller before the corresponding column is added to the linear program. Similar to the MDP variant of the algorithm, which we discussed above, CGCP derives a probability distribution over policies from the linear program, which it returns as a solution.

Class: algorithms.pomdp.cgcp.CGCP

Supported constraints: budget

Number of agents: n

Parameters: none

Settings in config file: time limit subproblem solver, amount of time by which time limit subproblem solver is increased upon convergence, minimum number of iterations required with an objective increase before increasing time limit subproblem solver, time limit

Solution returned: probability distribution over deterministic policies

Literature: Walraven, E., & Spaan, M. T. J. (2018). Column Generation Algorithms for Constrained POMDPs. Journal of Artificial Intelligence Research, 62, 489–533.

CALP

The Constrained Approximate Linear Programming (CALP) algorithm by Poupart et al. (2015) formulates a Constrained POMDP as a Constrained MDP defined in terms of belief states rather than states. The algorithm searches iteratively for additional beliefs, which it uses to expand the Constrained MDP model. This model is solved using the standard linear program for Constrained MDPs, as discussed above.

Class: algorithms.pomdp.calp.FiniteCALP

Supported constraints: budget

Number of agents: 1

Parameters: none

Settings in config file: maximum number of new beliefs added in an iteration, maximum number of iterations, tolerance on reachability probability for the belief search, time limit

Solution returned: stochastic finite state controller

Literature: Poupart, P., Malhotra, A., Pei, P., Kim, K.E., Goh, B., & Bowling, M. (2015). Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (pp. 3342–3348).

CC-POMCP

The Cost-Constrained POMCP (CC-POMCP) algorithm by Lee et al. (2018) is an online algorithm which performs Monte-Carlo Tree Search to choose actions that do not violate a budget constraint in expectation. The algorithm is an adapted version of the standard POMCP algorithm for unconstrained POMDPs. The solution object returned by the algorithm performs online search upon calling the getActions method. The algorithm in the toolbox considers a finite horizon, whereas the original version of the algorithm in the paper is infinite horizon.

Class: algorithms.pomdp.ccpomcp.CCPOMCP

Supported constraints: budget

Number of agents: 1

Parameters: parameter to multiply exploration bonus (k), number of iterations of the online search (iter), scalar for alpha learning rate (alphaScalar). The actual alpha_n used by the search procedure is defined as alphaScalar*(1/n), where n is the iteration number of the search.

Settings in config file: none

Solution returned: solution object which performs online search

Literature: Lee, J., Kim, G., Poupart, P., & Kim, K. (2018). Monte-Carlo Tree Search for Constrained POMDPs. In Advances in Neural Information Processing Systems (pp. 7934–7943).

Create your own algorithm

New algorithms can be added by creating a new package in the algorithms package. The new package should contain an implementation of the CMDPAlgorithm interface or the CPOMDPAlgorithm interface. The algorithm should retrieve relevant information about the problem domain in the setInstance() method, and it should return a proper solution when calling the solve() method. There are no additional requirements regarding the internal operation of the algorithm. It is only important that the algorithm respects the input and output format defined by the interface that it implements. The existing algorithms in the toolbox serve as an example.

The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.