Allow MDPs to use continuous distributions as rewards in Simulation

Currently, the following MDP is not allowed:

const real P0;
const real P1;

const int maxsteps = 1000;
transient real reward;
transient int step;

property P_MaxReward = Xmax(S(reward), (step >= maxsteps));

do {
    :: Arm(P0)
    :: Arm(P1)
}

process Arm(real mean) {
    {= step = 1, reward = Normal(mean, 1.0) =}
}

This is because MDPs are discrete models and cannot use continuous distributions. However, since the continuous distribution is only used for the reward here, and never to determine the successor state, it should be possible to allow this in simulation.
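To illustrate why simulation remains well-defined, here is a minimal Python sketch of one run of the model above under a fixed scheduler that always plays the same arm (the scheduler and the `simulate_arm` helper are assumptions for illustration, not part of the toolset):

```python
import random

def simulate_arm(mean: float, maxsteps: int = 1000) -> float:
    """Simulate one run of the two-armed model above, always
    choosing the arm with the given mean (a fixed scheduler).

    The discrete state (the step counter) evolves deterministically;
    the Normal sample is only accumulated into the reward and never
    influences which transition is taken, so the underlying model
    stays a discrete MDP.
    """
    total_reward = 0.0
    for _ in range(maxsteps):
        # reward = Normal(mean, 1.0), as in the Arm process
        total_reward += random.gauss(mean, 1.0)
    return total_reward

# Estimate the expected accumulated reward for one arm by averaging runs
estimate = sum(simulate_arm(0.5) for _ in range(100)) / 100
```

Averaging many such runs per scheduler is exactly what statistical model checking would do to approximate Xmax/Xmin of the accumulated reward.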