Fine-tuning E Heuristics
on given benchmarks
Jan Jakubův & Josef Urban @ PIW 2016
AI4REASON @ ARG @ CIIRC @ CTU Prague
Given a benchmark problem set, find a collection of complementary E heuristics
maximizing the number of solved problems.
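One way to read the goal above is as a coverage problem. The following is only a minimal sketch of a greedy selection, not the method used by the authors: it assumes a hypothetical table `solved_by` that maps each heuristic name to the set of problems it solves, and repeatedly picks the heuristic that adds the most new problems.

```python
def greedy_cover(solved_by, k):
    """Pick up to k heuristics, each time taking the one that solves
    the most problems not yet covered (hypothetical input format)."""
    chosen, covered = [], set()
    for _ in range(k):
        # Iterate names in sorted order so ties are broken deterministically.
        best = max(sorted(solved_by), key=lambda h: len(solved_by[h] - covered))
        if not solved_by[best] - covered:
            break  # no heuristic adds anything new
        chosen.append(best)
        covered |= solved_by[best]
    return chosen, covered


# Toy data, invented for illustration only.
solved_by = {
    "h1": {"p1", "p2", "p3"},
    "h2": {"p3", "p4"},
    "h3": {"p1", "p2"},
}
chosen, covered = greedy_cover(solved_by, 2)
```

Here `h3` is never chosen because everything it solves is already covered by `h1`, which is the sense in which a collection should be complementary.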
The Plan
- E Prover Brief Intro
- BliStr: Evolving E Heuristics for Benchmark Problems
- Conjecture-related weights for E
- BliStrTune: Evolving E Heuristics Reloaded
E Prover
by Stephan Schulz
- automated theorem prover for FOL with equality
- predefined --auto-schedule mode
- command line arguments to guide proof search
E Prover
Guiding Proof Search
- term ordering (KBO, LPO, ...)
- literal selection (to perform superpositions on)
- clause selection (to select a given clause)
- axiom relevancy pruning (SInE)
E Prover
Clause Selection / Priority Functions
- assign integer to each clause
- the smaller the better
- user can use predefined priority functions
- ConstPrio, PreferUnits, PreferGround, ...
E Prover
Clause Selection / Weight Functions
- assign real to each clause
- the smaller the better
- user can use predefined weight functions
- user can specify parameters
- e.g. Clauseweight: basic symbol counting weight
- fweight - symbol weight
- vweight - variable weight
- pos_mult - positive literal multiplier
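The symbol counting idea can be sketched as follows. This is an illustration under stated assumptions, not E's implementation: a clause is modelled as a list of `(positive, symbols)` pairs, and a symbol is treated as a variable when it starts with an upper-case letter (the TPTP convention). The function name `clause_weight` and the data layout are hypothetical.

```python
def clause_weight(clause, fweight=1, vweight=1, pos_mult=1.0):
    """Sum symbol weights over all literals; positive literals are
    scaled by pos_mult. Sketch only, not E's actual code."""
    total = 0.0
    for positive, symbols in clause:
        # Upper-case initial = variable (assumed TPTP-style naming).
        lit = sum(vweight if s[0].isupper() else fweight for s in symbols)
        total += pos_mult * lit if positive else lit
    return total


# The clause  p(f(X), a) | ~q(X)  as flat symbol lists.
clause = [(True, ["p", "f", "X", "a"]), (False, ["q", "X"])]
w = clause_weight(clause, fweight=2, vweight=1, pos_mult=1.5)
```

With these parameters the positive literal contributes 1.5 * (2+2+1+2) and the negative literal contributes 2+1.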
E Prover
Clause Selection / Clause Evaluation Function
- CEF defined by
- weight function
- priority function
- weight function parameters
- syntax: Clauseweight(PreferUnits,10,1,1.5)
- assign pair (prio,weight) to each clause
- select the clause with the smallest pair
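Selecting the clause with the smallest (prio, weight) pair amounts to a lexicographic comparison, which a priority queue handles directly. The sketch below assumes clauses are lists of literals and literals are lists of symbols; `prefer_units` and `symbol_count` are hypothetical stand-ins for a priority function and a weight function.

```python
import heapq


def prefer_units(clause):
    # Hypothetical priority function: unit clauses get the better (smaller) value.
    return 0 if len(clause) == 1 else 1


def symbol_count(clause):
    # Hypothetical weight function: total number of symbols in the clause.
    return float(sum(len(lit) for lit in clause))


def make_queue(clauses, prio, weight):
    # The index i breaks ties so that clauses themselves are never compared.
    heap = [((prio(c), weight(c)), i, c) for i, c in enumerate(clauses)]
    heapq.heapify(heap)
    return heap


clauses = [
    [["p", "X"], ["q", "a", "b"]],       # two literals, 5 symbols
    [["r", "f", "a", "b", "c", "d"]],    # unit, 6 symbols
    [["q", "a"]],                        # unit, 2 symbols
]
queue = make_queue(clauses, prefer_units, symbol_count)
first = heapq.heappop(queue)[2]
second = heapq.heappop(queue)[2]
```

The 6-symbol unit clause is selected before the 5-symbol non-unit clause: the priority is compared first, and the weight only decides among clauses of equal priority.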
E Prover
Clause Selection / Heuristic
- combines several clause evaluation functions (CEFs)
- command line syntax:
-H'(3*ConjectureTermPrefixWeight(SimulateSOS,1,3,0,1,1,4,1.5,3), \
3*RelevanceLevelWeight2(DeferSOS,1,1,2,1,400,10,18,200,5,4,2), \
5*ConjectureTermPrefixWeight(PreferGroundGoals,1,3,5,10,1,1,1.5,4))'
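A sketch of how the integer multipliers in such a heuristic could be interpreted, assuming they act as weighted round-robin frequencies (with `3*`, `3*`, `5*`, three clauses are taken from the first CEF's queue, three from the second, five from the third, then the cycle repeats). The function name `selection_schedule` is hypothetical.

```python
from itertools import cycle, islice


def selection_schedule(frequencies):
    """Return an endless iterator over CEF indices in weighted
    round-robin order (assumed reading of the n*CEF syntax)."""
    pattern = [i for i, n in enumerate(frequencies) for _ in range(n)]
    return cycle(pattern)


# First 13 selections for the frequencies 3, 3, 5.
schedule = list(islice(selection_schedule([3, 3, 5]), 13))
```

After the eleven selections of one full round, the schedule starts again with the first CEF.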
E Prover
Clause Selection / Summary
- priority functions
- weight functions
- clause evaluation functions
- heuristics
- protocol: proof search control arguments
The Plan
- E Prover Brief Intro
- BliStr: Evolving E Heuristics for Benchmark Problems
- Conjecture-related weights for E
- BliStrTune: Evolving E Heuristics Reloaded
ParamILS
Iterated Local Search
- method for parameter tuning and algorithm configuration
- by Hutter, Hoos, Stützle, Leyton-Brown, Fawcett
- from University of British Columbia (UBC)
- implementation available for download
Using ParamILS
as a blackbox
- describe configuration parameters and their domains
- write a wrapper to run with a specific configuration
- provide test problems
- run & hope
Using ParamILS
to improve E protocols
tord {Auto,LPO4,KBO,KBO6} [Auto]
sel {SelectMaxLComplexAvoidPosPred,SelectNewComplexAHP,...} [SelectComplexG]
prord {arity,invfreq,invfreqconstmin} [invfreqconstmin]
simparamod {none,normal,oriented} [normal]
srd {0,1} [1]
forwardcntxtsr {0,1} [1]
splaggr {0,1} [0]
...
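Parameter lines of the form `name {values} [default]` shown above are easy to load into a search space. The following is a minimal sketch of such a reader; the function name `parse_param` is hypothetical, and only fully spelled-out lines from the slide are used as input.

```python
import re

# name {v1,v2,...} [default]
LINE = re.compile(r"^(\w+)\s*\{([^}]*)\}\s*\[([^\]]*)\]$")


def parse_param(line):
    """Split one parameter line into (name, domain, default)."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a parameter line: {line!r}")
    name, values, default = m.groups()
    domain = [v.strip() for v in values.split(",")]
    return name, domain, default


lines = [
    "tord {Auto,LPO4,KBO,KBO6} [Auto]",
    "simparamod {none,normal,oriented} [normal]",
    "srd {0,1} [1]",
]
space = {}
for line in lines:
    name, domain, default = parse_param(line)
    space[name] = (domain, default)
```

All values stay strings here, since the tuner only needs to enumerate them and pass them on to the wrapper.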
BliStr: Blind Strategy Maker
by Josef Urban
BliStr: Blind Strategy Maker
- the protocols are like giraffes, the problems are their food
- the better the giraffe specializes for eating problems
unsolvable by others, the more it gets fed and further
evolved
BliStr: Brief Overview
1. start with initial protocols
2. evaluate current protocols on all problems
3. for each protocol, collect best cheap problems
4. improve each strategy on its best cheap problems
   (using iterated local search)
5. evaluate new strategies
6. re-collect best cheap problems (goto 3)
7. end when there is no improvement
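The "collect best cheap problems" step can be sketched as below. The precise criteria are assumptions made for illustration: a problem counts as "best" for a protocol if no other protocol solves it faster, and as "cheap" if that fastest time is within a given limit. The input format `times[protocol][problem]` and the function name are hypothetical.

```python
def best_cheap_problems(times, limit):
    """times[protocol][problem] = runtime in seconds; unsolved problems
    are simply absent. Returns protocol -> set of its best cheap problems."""
    best = {p: set() for p in times}
    problems = set().union(*(t.keys() for t in times.values()))
    for prob in problems:
        solvers = {p: t[prob] for p, t in times.items() if prob in t}
        fastest = min(solvers.values())
        if fastest > limit:
            continue  # solved, but not cheaply by anyone
        for p, secs in solvers.items():
            if secs == fastest:
                best[p].add(prob)
    return best


# Toy runtimes, invented for illustration only.
times = {
    "protoA": {"p1": 0.2, "p2": 3.0},
    "protoB": {"p1": 0.5, "p3": 0.1, "p4": 9.0},
}
best = best_cheap_problems(times, limit=1.0)
```

Each protocol would then be tuned only on its own set, which keeps the tuning runs short and pushes the protocols toward different parts of the benchmark.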
The Plan
- E Prover Brief Intro
- BliStr: Evolving E Heuristics for Benchmark Problems
- Conjecture-related weights for E
- BliStrTune: Evolving E Heuristics Reloaded
Conjecture-related Weights
our previous research
- The weight function ConjectureRelativeSymbolWeight counts symbols,
giving smaller weights to conjecture symbols.
- Question: Does it make sense to consider also term structure, not just symbols?
- To answer: We have implemented several new weight functions which measure a clause's relatedness to the conjecture using different metrics
Conjecture-related Weights
new weight functions
- TermWeight - shared subterms with conjecture
- PrefixWeight - common prefix with conjecture terms
- LevDistanceWeight - Levenshtein distance
- TreeDistanceWeight - Tree Edit Distance
- TermTfIdfWeight - TF/IDF
- StrucDistanceWeight - structural distance
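To make one of these metrics concrete, here is a minimal sketch of the Levenshtein distance between two terms, assuming each term is flattened into a sequence of symbols and all edit operations cost 1. How E turns such a distance into a clause weight (and which parameters it exposes) is not shown here.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences,
    with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


# f(g(X), a) versus a conjecture term f(g(b), a), as flat symbol lists.
clause_term = ["f", "g", "X", "a"]
conjecture_term = ["f", "g", "b", "a"]
d = levenshtein(clause_term, conjecture_term)
```

A small distance means the clause term is syntactically close to a conjecture term, so a weight built on it would favour such clauses.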
Conjecture-related Weights
evaluation
- Are these new weights helpful?
- How complementary are they to the previous E weights?
- What are the best parameters for these weights?
- Can we use them with BliStr?
The Plan
- E Prover Brief Intro
- BliStr: Evolving E Heuristics for Benchmark Problems
- Conjecture-related weights for E
- BliStrTune: Evolving E Heuristics Reloaded
BliStrTune
- BliStr does not change weight parameters
- it has only 12 (or so) hardcoded CEFs
- Idea:
- Extend BliStr to change weight parameters
- Problem: The parameter space is too big
- ParamILS does not perform well
- Solution: Use two phases.
- tune global parameters
- tune weight function arguments
Global Tuning Phase
-tKBO6 -WSelectComplexG ...
3*ConjectureRelativeTermWeight(ConstPrio,0,1,0.1,18,400,50,300,1,4,0.8,1),
34*ConjectureRelativeTermWeight(PreferUnits,1,1,0.1,100,9999,100,5,1,9999.9,2,0.7),
8*ConjectureRelativeSymbolWeight(PreferGround,0.2,50,100,5,10,0.5,2,0.2)
Fine Tuning Phase
-tKBO6 -WSelectComplexG
3*ConjectureRelativeTermWeight(ConstPrio,0,1,0.1,18,400,50,300,1,4,0.8,1),
34*ConjectureRelativeTermWeight(PreferUnits,1,1,0.1,100,9999,100,5,1,9999,2,0.7),
8*ConjectureRelativeSymbolWeight(PreferGround,0.2,50,100,5,10,0.5,2,0.2)
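The CEF strings above can be split into the parts the two phases would work on. This is a sketch under the assumption that the fine tuning phase varies only the numeric arguments of each weight function, while the frequency, weight function name and priority function are treated as fixed at that point; the function name `parse_cef` is hypothetical.

```python
import re

# freq*WeightName(PriorityName,arg1,arg2,...)
CEF = re.compile(r"^(\d+)\*(\w+)\((\w+)((?:,[^,()]+)*)\)$")


def parse_cef(text):
    """Split a CEF string into (frequency, weight, priority, numeric args)."""
    m = CEF.match(text.strip().rstrip(","))
    if m is None:
        raise ValueError(f"not a CEF: {text!r}")
    freq, weight, prio, rest = m.groups()
    # rest looks like ",0.2,50,..." so the first split item is empty.
    args = [float(a) for a in rest.split(",")[1:]]
    return int(freq), weight, prio, args


cef = parse_cef(
    "8*ConjectureRelativeSymbolWeight(PreferGround,0.2,50,100,5,10,0.5,2,0.2)"
)
```

Splitting the string this way makes the size difference visible: the global part is a handful of discrete choices, whereas each CEF carries its own list of numeric arguments.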
BliStrTune Evaluation
- use data from MZR@Turing division at CASC'12
- 1000 training problems were provided beforehand
- 400 new problems were used in the competition
- all problems exported from Mizar by Josef
- Plan: train BliStr and BliStrTune, then compare
- (in progress, only first training run finished)
Impact of new weights and fine tuning
(results in progress)
E and Vampire in 60 seconds
(results in progress)
Statistics
most often used CEFs
CEF | # used |
FIFOWeight(DeferSOS) | 30 |
FIFOWeight(PreferNonGoals) | 27 |
FIFOWeight(PreferProcessed) | 22 |
StaggeredWeight(DeferSOS,1) | 17 |
StaggeredWeight(DeferSOS,2) | 15 |
Statistics
most often used weights
weight | # used |
ConjectureRelativeTermWeight | 370 |
ConjectureRelativeSymbolWeight | 354 |
ConjectureTermPrefixWeight | 346 |
ConjectureGeneralSymbolWeight | 281 |
RelevanceLevelWeight2 | 276 |
Statistics
most often used priorities
prio | # used |
PreferNonGoals | 426 |
PreferProcessed | 283 |
PreferWatchlist | 282 |
PreferUnitGroundGoals | 247 |
ConstPrio | 235 |
Statistics
usage of new weights
weight | # used |
ConjectureRelativeTermWeight | 370 |
ConjectureTermPrefixWeight | 346 |
ConjectureStrucDistanceWeight | 77 |
ConjectureLevDistanceWeight | 40 |
ConjectureTermTfIdfWeight | 29 |
ConjectureTreeDistanceWeight | 18 |
Statistics
most often used parameters
ConjectureRelativeSymbolWeight(
PreferGroundGoals,0.1,100,100,100,20,1.5,1.5,1.5)
ConjectureTermPrefixWeight(
PreferNonGoals,1,3,100,9999.9,0,9999.9,3,5)
ConjectureTermPrefixWeight(
DeferSOS,1,3,0.1,10,0,0.1,4,4)
ConjectureRelativeTermWeight(
PreferProcessed,1,1,0.1,10,100,50,50,1,3,2,2)
ConjectureRelativeSymbolWeight(
PreferNonGoals,0.1,100,50,20,18,0.1,1.5,1.5)