Allow MDPs to use continuous distributions as rewards in Simulation
Currently, the following MDP is not allowed:
const real P0;
const real P1;
const int maxsteps = 1000;

transient real reward;
int step; // non-transient, so step >= maxsteps can actually be reached

property P_MaxReward = Xmax(S(reward), (step >= maxsteps));

do {
   :: Arm(P0)
   :: Arm(P1)
}

process Arm(real mean)
{
   {= step = step + 1, reward = Normal(mean, 1.0) =}
}
It is rejected because MDPs are discrete models and may not use continuous distributions. However, since the continuous distribution only determines the value of the transient reward, and never influences the state space or the branching structure, it should be possible to allow this, at least in simulation.
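To illustrate why this is sound for simulation, here is a small Python sketch (hypothetical, not the simulator's actual implementation) of a run of the model above under a fixed scheduler. The choice between Arm(P0) and Arm(P1) remains a discrete nondeterministic decision, and the Normal sample only feeds the transient reward, so the reachable state space (the step counter) stays discrete:

```python
import random

# Example parameter values; the model above leaves P0 and P1 open.
P0, P1 = 0.5, 1.0
MAXSTEPS = 1000

def simulate(scheduler, seed=0):
    """Accumulate reward until step >= MAXSTEPS, mirroring P_MaxReward."""
    rng = random.Random(seed)
    step, total = 0, 0.0
    while step < MAXSTEPS:
        mean = scheduler(step)         # discrete resolution of the choice
        total += rng.gauss(mean, 1.0)  # continuous sample, used for reward only
        step += 1                      # discrete state update
    return total

# A memoryless scheduler that always picks the second arm:
total = simulate(lambda step: P1)
```

The state visited at each point is just the integer step counter; the continuous values never need to be stored in, or compared against, the state, which is what makes the restriction unnecessary for reward expressions.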