A hybrid constraint programming solver in Julia, enhanced by reinforcement-learning-driven search.
BSD-3-CLAUSE License
SeaPearl is a Constraint Programming solver that can use Reinforcement Learning agents as value-selection heuristics, with graphs as inputs to the agent's approximator. It is intended as a research tool that lets researchers extend and go beyond the work already done with it.
The paper accompanying this solver can be found on the arXiv. If you use SeaPearl in your research, please cite our work.
The RL agents are defined using ReinforcementLearning.jl, and their inputs are handled with Flux.jl. The CP part, inspired by MiniCP, focuses on readability. The code is meant to be clear and modular so that researchers can easily access CP data and use it as input for their ML models.
]add SeaPearl
Working examples can be found in SeaPearlZoo and documentation can be found here.
SeaPearl can be used either as a classic CP solver that uses predefined variable and value selection heuristics, or as a Reinforcement Learning driven CP solver that learns by solving automatically generated instances of a given problem (knapsack, tsptw, graphcoloring, EternityII, ...).
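To make the difference between the two modes concrete, here is a minimal sketch in plain Julia. It is an illustration only: `classic_value`, `learned_value`, and `score` are hypothetical names and not part of SeaPearl's API, and ε-greedy exploration is an assumed (though common) choice rather than something the solver is stated to use.

```julia
using Random

# Hypothetical names for illustration; this is not SeaPearl's API.

# Classic heuristic: a fixed rule, here "largest value in the domain".
classic_value(domain::Vector{Int}) = maximum(domain)

# Learned heuristic: score each candidate value with a model, pick the best,
# and explore at random with probability ε (assumed exploration scheme).
function learned_value(rng::AbstractRNG, domain::Vector{Int}, score::Function; ε=0.1)
    rand(rng) < ε && return rand(rng, domain)
    return domain[argmax(score.(domain))]
end

rng = MersenneTwister(1)
domain = [2, 5, 7]
score(v) = -abs(v - 5)   # stand-in for a trained approximator

# With ε = 0 the choice is purely greedy with respect to `score`.
greedy_choice = learned_value(rng, domain, score; ε=0.0)
```

In both cases the heuristic has the same job: given the domain of the variable being branched on, return the value to try first. Only the rule that produces that value changes.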
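To use SeaPearl as a classic CP solver, one needs to declare a variable selection heuristic, declare a value selection heuristic, and create a Constraint Programming model: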
YourVariableSelectionHeuristic{TakeObjective} <: SeaPearl.AbstractVariableSelection{TakeObjective}
BasicHeuristic <: ValueSelection
trailer = SeaPearl.Trailer()
model = SeaPearl.CPModel(trailer)
# create variables:
SeaPearl.addVariable!(...)
# add constraints:
SeaPearl.addConstraint!(model, SeaPearl.AbstractConstraint(...))
# add an optional objective function:
SeaPearl.addObjective!(model, ObjectiveVar)
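The roles of the variable selection heuristic, the value selection heuristic, and the model can be sketched with a toy depth-first solver in plain Julia. Everything below is a hypothetical illustration: `select_variable`, `order_values`, and `solve` are not SeaPearl functions, and domains are copied at each node instead of being restored through a trailer, purely to keep the sketch short.

```julia
# Toy illustration only: plain Julia, none of these names are SeaPearl's API.

# Variable selection: pick the unassigned variable with the smallest domain.
function select_variable(domains::Vector{Vector{Int}}, assigned::Vector{Bool})
    best, best_size = 0, typemax(Int)
    for i in eachindex(domains)
        if !assigned[i] && length(domains[i]) < best_size
            best, best_size = i, length(domains[i])
        end
    end
    return best  # 0 means every variable is assigned
end

# Value selection: try the smallest value first.
order_values(domain::Vector{Int}) = sort(domain)

# Depth-first search over a graph-coloring model with "not equal" edge constraints.
function solve(domains, edges, assigned=fill(false, length(domains)))
    var = select_variable(domains, assigned)
    var == 0 && return [d[1] for d in domains]  # all assigned: solution found
    for value in order_values(domains[var])
        # Copy the state instead of trailing, to keep the sketch short.
        new_domains = [copy(d) for d in domains]
        new_domains[var] = [value]
        new_assigned = copy(assigned)
        new_assigned[var] = true
        # Propagate: remove `value` from the domains of the neighbours of `var`.
        feasible = true
        for (a, b) in edges
            other = a == var ? b : (b == var ? a : 0)
            other == 0 && continue
            filter!(!=(value), new_domains[other])
            isempty(new_domains[other]) && (feasible = false)
        end
        if feasible
            solution = solve(new_domains, edges, new_assigned)
            solution !== nothing && return solution
        end
    end
    return nothing  # no value works for `var`: backtrack
end

# Triangle graph with three colours available per node.
edges = [(1, 2), (2, 3), (1, 3)]
solution = solve([collect(1:3) for _ in 1:3], edges)
```

Swapping `select_variable` or `order_values` for another rule changes the order in which the search tree is explored without touching the model itself, which is the separation the steps above rely on.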
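To use SeaPearl as an RL-driven CP solver, one needs to declare a variable selection heuristic and a learned value selection heuristic, define an agent, optionally declare a custom reward, state representation, and featurization, create an instance generator for the problem, choose the number of epochs, the search strategy, the evaluator, and an optional metrics function, and then train: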
CustomVariableSelectionHeuristic{TakeObjective} <: SeaPearl.AbstractVariableSelection{TakeObjective}
LearnedHeuristic{SR<:AbstractStateRepresentation, R<:AbstractReward, A<:ActionOutput} <: ValueSelection
agent = RL.Agent(
    policy=(...),
    trajectory=(...),
)
CustomReward <: SeaPearl.AbstractReward
CustomStateRepresentation <: SeaPearl.AbstractStateRepresentation
CustomFeaturization <: SeaPearl.AbstractFeaturization
CustomProblemGenerator <: AbstractModelGenerator
nb_epochs = 3000
CustomStrategy <: SearchStrategy # DFS, RBS, ILDS
CustomEvaluator <: AbstractEvaluator # or use a predefined one: SeaPearl.SameInstancesEvaluator(...)
function CustomMetricsFun
metricsArray, eval_metricsArray = SeaPearl.train!(
    valueSelectionArray=valueSelectionArray,
    generator=tsptw_generator,
    nbEpisodes=nbEpisodes,
    strategy=strategy,
    eval_strategy=eval_strategy,
    variableHeuristic=variableSelection,
    out_solver=true,
    verbose=true,
    evaluator=SeaPearl.SameInstancesEvaluator(valueSelectionArray, tsptw_generator; evalFreq=evalFreq, nbInstances=nbInstances, evalTimeOut=evalTimeOut),
    restartPerInstances=restartPerInstances
)
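The overall shape of such a training run (a fresh generated instance per episode, plus periodic evaluation on a fixed set of instances) can be sketched in plain Julia. This is a conceptual sketch under stated assumptions, not SeaPearl's implementation: `KnapsackInstance`, `generate`, `greedy_value`, and `run_training` are hypothetical names, the greedy routine merely stands in for a solver run, and no learning update is performed. The keyword names `nbEpisodes`, `evalFreq`, and `nbInstances` are borrowed from the call above only for readability.

```julia
using Random

# Hypothetical stand-ins; none of these are SeaPearl types or functions.
struct KnapsackInstance
    weights::Vector{Int}
    profits::Vector{Int}
    capacity::Int
end

# Generator: draws a new random instance on each call.
function generate(rng::AbstractRNG, n::Int)
    weights = rand(rng, 1:10, n)
    profits = rand(rng, 1:10, n)
    return KnapsackInstance(weights, profits, sum(weights) ÷ 2)
end

# Placeholder for a solver run: greedy packing by profit/weight ratio.
function greedy_value(inst::KnapsackInstance)
    order = sortperm(inst.profits ./ inst.weights; rev=true)
    remaining, total = inst.capacity, 0
    for i in order
        if inst.weights[i] <= remaining
            remaining -= inst.weights[i]
            total += inst.profits[i]
        end
    end
    return total
end

function run_training(; nbEpisodes, evalFreq, nbInstances, seed=0)
    rng = MersenneTwister(seed)
    # Evaluation instances are generated once and reused at every evaluation.
    eval_set = [generate(rng, 8) for _ in 1:nbInstances]
    metrics, eval_metrics = Int[], Float64[]
    for episode in 1:nbEpisodes
        inst = generate(rng, 8)             # new training instance per episode
        push!(metrics, greedy_value(inst))  # a real agent would also learn here
        if episode % evalFreq == 0
            push!(eval_metrics, sum(greedy_value, eval_set) / nbInstances)
        end
    end
    return metrics, eval_metrics
end

metrics, eval_metrics = run_training(nbEpisodes=10, evalFreq=5, nbInstances=3)
```

Because the evaluation set is fixed and the placeholder solver never changes, the two evaluation scores here are identical. With a learning agent, any change in those scores would reflect the agent rather than the instances.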
All contributions are welcome! Have a look at our contributing guidelines.