
Fine-tuning E Heuristics

on given benchmarks

Jan Jakubův & Josef Urban @ PIW 2016

AI4REASON @ ARG @ CIIRC @ CTU Prague

Given a benchmark problem set, find a collection of complementary E heuristics maximizing the number of solved problems.

The Plan

E Prover Brief Intro
BliStr: Evolving E Heuristics for Benchmark Problems
Conjecture-related weights for E
BliStrTune: Evolving E Heuristics Reloaded

E Prover

by Stephan Schulz

automated theorem prover for FOL with equality
predefined --auto-schedule mode
command line arguments to guide proof search

E Prover

Guiding Proof Search

term ordering (KBO, LPO, ...)
literal selection (to perform superpositions on)
clause selection (to select a given clause)
axiom relevancy pruning (SInE)

E Prover

Clause Selection / Priority Functions

assign an integer to each clause
the smaller the better
users choose from predefined priority functions
e.g. ConstPrio, PreferUnits, PreferGround, ...
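A minimal sketch of what such priority functions compute. It assumes a toy clause representation that is not E's: terms are nested tuples `(symbol, args...)`, variables are strings, and a clause is a list of `(is_positive, atom)` literals. The numeric priority values are also an assumption.

```python
# Toy representation (hypothetical, not E's internals):
#   term    = variable name (str) or tuple (symbol, arg1, ..., argN)
#   literal = (is_positive, atom)
#   clause  = list of literals

PREFERRED, DEFAULT = 0, 1  # smaller priority value = selected earlier


def is_var(t):
    return isinstance(t, str)


def is_ground_term(t):
    """A term is ground if it contains no variables."""
    if is_var(t):
        return False
    return all(is_ground_term(arg) for arg in t[1:])


def const_prio(clause):
    """ConstPrio-like: every clause gets the same priority."""
    return DEFAULT


def prefer_units(clause):
    """PreferUnits-like: unit (single-literal) clauses come first."""
    return PREFERRED if len(clause) == 1 else DEFAULT


def prefer_ground(clause):
    """PreferGround-like: variable-free clauses come first."""
    ground = all(is_ground_term(atom) for _, atom in clause)
    return PREFERRED if ground else DEFAULT
```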

E Prover

Clause Selection / Weight Functions

assign a real number to each clause
the smaller the better
users choose from predefined weight functions
users can specify their parameters
e.g. Clauseweight: basic symbol-counting weight
- fweight - function symbol weight
- vweight - variable weight
- pos_mult - positive literal multiplier
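A sketch of a Clauseweight-style symbol count, assuming each function symbol (or constant) occurrence costs `fweight`, each variable occurrence costs `vweight`, and the weight of every positive literal is multiplied by `pos_mult`. It uses a toy term representation (nested tuples, variables as strings), not E's, and ignores details such as how E encodes non-equational atoms.

```python
# Toy representation (hypothetical): variables are str, other terms are
# tuples (symbol, args...); a clause is a list of (is_positive, atom).

def term_weight(t, fweight, vweight):
    """Count symbols: fweight per function symbol, vweight per variable."""
    if isinstance(t, str):  # variable
        return vweight
    return fweight + sum(term_weight(a, fweight, vweight) for a in t[1:])


def clauseweight(clause, fweight, vweight, pos_mult):
    """Sum literal weights, scaling positive literals by pos_mult."""
    total = 0.0
    for positive, atom in clause:
        w = term_weight(atom, fweight, vweight)
        total += w * pos_mult if positive else w
    return total
```

With the parameters 10, 1, 1.5 from the CEF example on the next slide, `p(f(X),a) | ~q(Y)` weighs (10+10+1+10)·1.5 + (10+1) = 57.5.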

E Prover

Clause Selection / Clause Evaluation Function

CEF defined by
- weight function
- priority function
- weight function parameters
syntax: Clauseweight(PreferUnits,10,1,1.5)
assigns a pair (prio, weight) to each clause
selects the clause with the lexicographically smallest pair
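A minimal sketch of given-clause selection with one CEF: the priority is compared first, and the weight only breaks ties. The two component functions are simplified stand-ins (a PreferUnits-like priority and a plain symbol count), not E's implementations.

```python
import heapq

# Toy representation (hypothetical): variables are str, other terms are
# tuples (symbol, args...); a clause is a list of (is_positive, atom).


def prefer_units(clause):
    return 0 if len(clause) == 1 else 1


def symbol_count(clause):
    def size(t):
        return 1 if isinstance(t, str) else 1 + sum(size(a) for a in t[1:])
    return sum(size(atom) for _, atom in clause)


def make_cef(prio_fn, weight_fn):
    """A CEF maps a clause to the pair (priority, weight)."""
    return lambda clause: (prio_fn(clause), weight_fn(clause))


def select_given(clauses, cef):
    """Return the clauses in the order this CEF would select them."""
    heap = [(cef(c), i, c) for i, c in enumerate(clauses)]  # i breaks ties
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, clause = heapq.heappop(heap)
        order.append(clause)
    return order
```

Python tuples already compare lexicographically, so the heap needs no custom ordering.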

E Prover

Clause Selection / Heuristic

combines several clause evaluation functions (CEFs), each with a relative selection frequency
command line syntax:

-H'(3*ConjectureTermPrefixWeight(SimulateSOS,1,3,0,1,1,4,1.5,3),
    3*RelevanceLevelWeight2(DeferSOS,1,1,2,1,400,10,18,200,5,4,2),
    5*ConjectureTermPrefixWeight(PreferGroundGoals,1,3,5,10,1,1,1.5,4))'
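A sketch of how the frequencies in such a heuristic can interleave the CEFs: each CEF keeps its own queue over the same clauses, and given clauses are taken 3 times from the first queue, 3 times from the second, 5 times from the third, and then the cycle repeats. The lazy skipping of clauses already chosen through another queue is an implementation choice of this sketch.

```python
import heapq


def round_robin_selection(clauses, weighted_cefs):
    """Return clause indices in selection order.

    weighted_cefs is a list of (frequency, cef) pairs; frequencies must be
    positive. Each cef maps a clause to a comparable key (smaller = better).
    """
    queues = []
    for freq, cef in weighted_cefs:
        heap = [(cef(c), i) for i, c in enumerate(clauses)]
        heapq.heapify(heap)
        queues.append((freq, heap))

    selected, order = set(), []
    while len(order) < len(clauses):
        for freq, heap in queues:
            picked = 0
            while picked < freq and heap:
                _, i = heapq.heappop(heap)
                if i in selected:  # already taken via another queue
                    continue
                selected.add(i)
                order.append(i)
                picked += 1
    return order
```

For example, with clauses `[0, 1, 2, 3, 4, 5]`, a queue preferring small values at frequency 2 and one preferring large values at frequency 1, the selection order is `[0, 1, 5, 2, 3, 4]`.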

E Prover

Clause Selection / Summary

priority functions
weight functions
clause evaluation functions
heuristics
protocol: proof search control arguments

The Plan

E Prover Brief Intro
BliStr: Evolving E Heuristics for Benchmark Problems
Conjecture-related weights for E
BliStrTune: Evolving E Heuristics Reloaded

ParamILS

Iterated Local Search

method for parameter tuning and algorithm configuration
by Hutter, Hoos, Stützle, Leyton-Brown, Fawcett
from University of British Columbia (UBC)
implementation available for download
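The core of iterated local search can be sketched in a few lines. This is a simplified illustration in the spirit of ParamILS, not its code: first-improvement local search in the one-exchange neighbourhood (change a single parameter), a random perturbation of `kick` parameters to escape local optima, and acceptance of the new local optimum if it is no worse. `cost` stands for any hypothetical evaluation, e.g. the number of unsolved benchmark problems.

```python
import random


def local_search(config, domains, cost):
    """First-improvement descent: change one parameter at a time."""
    improved = True
    while improved:
        improved = False
        for name, values in domains.items():
            for v in values:
                if v == config[name]:
                    continue
                neighbour = dict(config, **{name: v})
                if cost(neighbour) < cost(config):
                    config, improved = neighbour, True
                    break
            if improved:
                break
    return config


def iterated_local_search(domains, cost, init, iterations=20, kick=2, seed=0):
    rng = random.Random(seed)
    current = local_search(dict(init), domains, cost)
    best = current
    for _ in range(iterations):
        # Perturbation: randomly reassign `kick` parameters.
        candidate = dict(current)
        for name in rng.sample(sorted(domains), k=min(kick, len(domains))):
            candidate[name] = rng.choice(domains[name])
        candidate = local_search(candidate, domains, cost)
        if cost(candidate) <= cost(current):  # acceptance criterion
            current = candidate
        if cost(current) < cost(best):
            best = current
    return best
```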

Using ParamILS

as a blackbox

describe configuration parameters and their domains
write a wrapper to run with a specific configuration
provide test problems
run & hope
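A hypothetical sketch of the wrapper's two ends. It assumes the calling convention `wrapper <instance> <instance_info> <cutoff_time> <cutoff_length> <seed> -name value ...` and the result line `Result for ParamILS: <status>, <runtime>, <runlength>, <quality>, <seed>` from the ParamILS documentation; check both against the version you use. Only the `-t` (term ordering) and `-W` (literal selection) options that appear elsewhere in these slides are mapped, and the process call itself is left out.

```python
def parse_call(argv):
    """Split a ParamILS-style call into instance, cutoff, seed, parameters."""
    instance, _info, cutoff, _length, seed = argv[:5]
    rest = argv[5:]
    params = {rest[i].lstrip("-"): rest[i + 1] for i in range(0, len(rest), 2)}
    return instance, float(cutoff), int(seed), params


def e_command(instance, cutoff, params):
    """Build an E command line (only tord and sel are mapped here)."""
    cmd = ["eprover", f"--cpu-limit={int(cutoff)}"]
    if "tord" in params:
        cmd.append("-t" + params["tord"])
    if "sel" in params:
        cmd.append("-W" + params["sel"])
    cmd.append(instance)
    return cmd


def result_line(solved, runtime, seed):
    """Report back to ParamILS; 'SAT' is used here simply to mean 'solved'
    (proof found), and -1 fills the unused runlength/quality fields."""
    status = "SAT" if solved else "TIMEOUT"
    return f"Result for ParamILS: {status}, {runtime:.2f}, -1, -1, {seed}"
```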

Using ParamILS

to improve E protocols

tord {Auto,LPO4,KBO,KBO6} [Auto]
sel {SelectMaxLComplexAvoidPosPred,SelectNewComplexAHP,...} [SelectComplexG]
prord {arity,invfreq,invfreqconstmin} [invfreqconstmin]
simparamod {none,normal,oriented} [normal]
srd {0,1} [1]
forwardcntxtsr {0,1} [1]
splaggr {0,1} [0]
...
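The parameter lines above follow the pattern `name {value1,value2,...} [default]`. A small reader for that pattern, as a sketch; lines that do not match (blank lines, the elided `...`) are skipped.

```python
import re

# name {v1,v2,...} [default]
LINE = re.compile(r"^\s*(\w+)\s*\{([^}]*)\}\s*\[([^\]]*)\]\s*$")


def parse_params(text):
    """Map each parameter name to (list of values, default value)."""
    params = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip blank lines and elisions such as "..."
        name, values, default = m.group(1), m.group(2), m.group(3).strip()
        params[name] = ([v.strip() for v in values.split(",")], default)
    return params
```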

BliStr: Blind Strategy Maker

by Josef Urban

BliStr: Blind Strategy Maker

the protocols are like giraffes, the problems are their food
the better a giraffe specializes in eating problems unsolvable by others, the more it gets fed and further evolved

BliStr: Brief Overview

1. start with initial protocols
2. evaluate current protocols on all problems
3. for each protocol, collect its best cheap problems
4. improve each strategy on its best cheap problems (using iterated local search)
5. evaluate new strategies
6. re-collect best cheap problems (goto 3)
7. end when there is no improvement
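A minimal sketch of this loop under two assumptions: `results[p][q]` is protocol `p`'s solving time on problem `q` (`None` if unsolved), and `q` counts as one of `p`'s best cheap problems if `p` is its fastest solver and needs at most `cheap` seconds. `evaluate` and `improve` are hypothetical callbacks (e.g. running E, and a ParamILS run restricted to the given problems).

```python
def best_cheap(results, cheap):
    """Assign each problem to its fastest protocol, if that run is cheap."""
    best = {p: [] for p in results}
    problems = {q for times in results.values() for q in times}
    for q in sorted(problems):
        solved = [(times[q], p) for p, times in results.items()
                  if times.get(q) is not None]
        if solved:
            t, p = min(solved)
            if t <= cheap:
                best[p].append(q)
    return best


def solved_count(results):
    return len({q for times in results.values()
                for q, t in times.items() if t is not None})


def blistr(protocols, problems, evaluate, improve, cheap=1.0):
    results = {p: evaluate(p, problems) for p in protocols}
    while True:
        best = best_cheap(results, cheap)
        new = [improve(p, qs) for p, qs in best.items() if qs]
        new_results = {p: evaluate(p, problems) for p in new if p not in results}
        merged = {**results, **new_results}
        if solved_count(merged) <= solved_count(results):
            return results  # no improvement: stop
        results = merged
```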

BliStr Life

The Plan

E Prover Brief Intro
BliStr: Evolving E Heuristics for Benchmark Problems
Conjecture-related weights for E
BliStrTune: Evolving E Heuristics Reloaded

Conjecture-related Weights

our previous research

The weight function ConjectureRelativeSymbolWeight counts symbols, giving smaller weights to symbols that occur in the conjecture.
Question: Does it make sense to also consider term structure, not just symbols?
To answer this, we have implemented several new weight functions which measure how closely a clause is related to the conjecture, using different metrics.
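An illustrative sketch of the idea behind a conjecture-relative symbol weight: symbols that occur in the conjecture are multiplied by a factor `conj_mult` < 1, so clauses built mostly from conjecture symbols look lighter and tend to be selected earlier. The formula and parameter names are assumptions for illustration, not E's actual ConjectureRelativeSymbolWeight arguments.

```python
# Toy representation (hypothetical): variables are str, other terms are
# tuples (symbol, args...).


def symbols(t, acc):
    """Collect the function/predicate symbols of a term into acc."""
    if not isinstance(t, str):
        acc.add(t[0])
        for a in t[1:]:
            symbols(a, acc)
    return acc


def conj_relative_weight(term, conj_symbols, fweight=2.0, vweight=1.0,
                         conj_mult=0.5):
    """Symbol count with conjecture symbols discounted by conj_mult."""
    if isinstance(term, str):
        return vweight
    w = fweight * conj_mult if term[0] in conj_symbols else fweight
    return w + sum(conj_relative_weight(a, conj_symbols, fweight, vweight,
                                        conj_mult)
                   for a in term[1:])
```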

Conjecture-related Weights

new weight functions

TermWeight - shared subterms with conjecture
PrefixWeight - common prefix with conjecture terms
LevDistanceWeight - Levenshtein distance
TreeDistanceWeight - Tree Edit Distance
TermTfIdfWeight - TF/IDF
StrucDistanceWeight - structural distance
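Two of these measures sketched with their textbook definitions on toy terms (nested tuples, variables as strings); the exact weight formulas in E are not reproduced here. Shared subterms counts the distinct subterms a clause term has in common with the conjecture (cf. TermWeight); the Levenshtein distance compares the preorder symbol sequences of two terms (cf. LevDistanceWeight). Treating every variable as one anonymous symbol in the sequence, and same-named variables as equal subterms, are assumptions of this sketch.

```python
def subterms(t, acc=None):
    """All distinct subterms of t, including t itself."""
    acc = set() if acc is None else acc
    acc.add(t)
    if not isinstance(t, str):
        for a in t[1:]:
            subterms(a, acc)
    return acc


def shared_subterms(term, conjecture):
    return len(subterms(term) & subterms(conjecture))


def preorder(t):
    """Symbols of t in preorder; variables become the placeholder 'VAR'."""
    if isinstance(t, str):
        return ["VAR"]
    return [t[0]] + [s for a in t[1:] for s in preorder(a)]


def levenshtein(xs, ys):
    """Edit distance between two sequences (insert/delete/substitute = 1)."""
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, 1):
        cur = [i]
        for j, y in enumerate(ys, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]
```

For the conjecture term `f(g(X),a)` and the clause term `f(g(X),b)`, two subterms are shared (`g(X)` and `X`) and the preorder sequences differ by one substitution.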

Conjecture-related Weights

evaluation

Are these new weights helpful?
How complementary are they to the previous E weights?
What are the best parameters for these weights?
Can we use them with BliStr?

The Plan

E Prover Brief Intro
BliStr: Evolving E Heuristics for Benchmark Problems
Conjecture-related weights for E
BliStrTune: Evolving E Heuristics Reloaded

BliStrTune

BliStr does not change weight function parameters
it uses only about 12 hardcoded CEFs
Idea:
- extend BliStr to also change weight function parameters
Problem: the parameter space becomes too big
- ParamILS does not perform well on it
Solution: use two phases
1. tune global parameters
2. tune weight function arguments
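A hypothetical sketch of why two phases help: phase 1 searches the global options with all CEF arguments frozen, phase 2 freezes the global choices and searches only the arguments, so the two spaces are explored one after the other instead of as their product. `tune` stands in for any configurator (e.g. a ParamILS run); here it is a plain exhaustive search, and `cost` is a hypothetical evaluation.

```python
from itertools import product


def tune(domains, fixed, cost):
    """Exhaustively search `domains`, keeping all other keys as in `fixed`."""
    names = sorted(domains)
    best = None
    for values in product(*(domains[n] for n in names)):
        config = dict(fixed, **dict(zip(names, values)))
        if best is None or cost(config) < cost(best):
            best = config
    return best


def two_phase(global_domains, arg_domains, initial, cost):
    phase1 = tune(global_domains, initial, cost)  # CEF arguments as in `initial`
    return tune(arg_domains, phase1, cost)        # global choices as in phase 1
```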

Global Tuning Phase

-tKBO6 -WSelectComplexG ...
3*ConjectureRelativeTermWeight(ConstPrio,0,1,0.1,18,400,50,300,1,4,0.8,1),
34*ConjectureRelativeTermWeight(PreferUnits,1,1,0.1,100,9999,100,5,1,9999.9,2,0.7),
8*ConjectureRelativeSymbolWeight(PreferGround,0.2,50,100,5,10,0.5,2,0.2)

Fine Tuning Phase

-tKBO6 -WSelectComplexG
3*ConjectureRelativeTermWeight(ConstPrio,0,1,0.1,18,400,50,300,1,4,0.8,1),
34*ConjectureRelativeTermWeight(PreferUnits,1,1,0.1,100,9999,100,5,1,9999,2,0.7),
8*ConjectureRelativeSymbolWeight(PreferGround,0.2,50,100,5,10,0.5,2,0.2)
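For fine tuning, each numeric CEF argument has to become a separate tunable parameter. A sketch that splits CEFs written as on these slides into (frequency, weight function, priority function, arguments); the `cefI_argJ` naming of the resulting parameters is an assumption made for illustration.

```python
import re

# freq*WeightFunction(PriorityFunction,arg1,arg2,...)
CEF = re.compile(r"(\d+)\*(\w+)\((\w+)((?:,[^,()]+)*)\)")


def parse_cefs(text):
    cefs = []
    for freq, weight, prio, args in CEF.findall(text):
        arg_list = [a.strip() for a in args.split(",")[1:]]  # drop leading ""
        cefs.append((int(freq), weight, prio, arg_list))
    return cefs


def tunable_params(cefs):
    """Expose every CEF argument as its own (hypothetically named) parameter."""
    return {f"cef{i}_arg{j}": arg
            for i, (_, _, _, args) in enumerate(cefs)
            for j, arg in enumerate(args)}
```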

BliStrTune Evaluation

use data from the MZR@Turing division of CASC'12
1000 training problems were provided beforehand
400 new problems were used in the competition
all problems were exported from Mizar by Josef
Plan: train BliStr and BliStrTune, then compare
(in progress: only the first training run has finished)

Impact of new weights and fine tuning

(results in progress)

E and Vampire in 60 seconds

(results in progress)

Statistics

most often used CEFs

CEF	# used
FIFOWeight(DeferSOS)	30
FIFOWeight(PreferNonGoals)	27
FIFOWeight(PreferProcessed)	22
StaggeredWeight(DeferSOS,1)	17
StaggeredWeight(DeferSOS,2)	15

Statistics

most often used weights

weight	# used
ConjectureRelativeTermWeight	370
ConjectureRelativeSymbolWeight	354
ConjectureTermPrefixWeight	346
ConjectureGeneralSymbolWeight	281
RelevanceLevelWeight2	276
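A hypothetical sketch of how counts like these can be gathered from a set of heuristic strings: count weight-function and priority-function names per CEF occurrence. Whether the tables here count per CEF occurrence or per protocol is not stated; this sketch counts occurrences.

```python
import re
from collections import Counter

# freq*WeightFunction(PriorityFunction ...
CEF_HEAD = re.compile(r"\d+\*(\w+)\((\w+)")


def usage(heuristics):
    weights, prios = Counter(), Counter()
    for h in heuristics:
        for weight, prio in CEF_HEAD.findall(h):
            weights[weight] += 1
            prios[prio] += 1
    return weights, prios
```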

Statistics

most often used priorities

prio	# used
PreferNonGoals	426
PreferProcessed	283
PreferWatchlist	282
PreferUnitGroundGoals	247
ConstPrio	235

Statistics

usage of new weights

weight	# used
ConjectureRelativeTermWeight	370
ConjectureTermPrefixWeight	346
ConjectureStrucDistanceWeight	77
ConjectureLevDistanceWeight	40
ConjectureTermTfIdfWeight	29
ConjectureTreeDistanceWeight	18

Statistics

most often used parameters

ConjectureRelativeSymbolWeight(
   PreferGroundGoals,0.1,100,100,100,20,1.5,1.5,1.5)
ConjectureTermPrefixWeight(
   PreferNonGoals,1,3,100,9999.9,0,9999.9,3,5)
ConjectureTermPrefixWeight(
   DeferSOS,1,3,0.1,10,0,0.1,4,4)
ConjectureRelativeTermWeight(
   PreferProcessed,1,1,0.1,10,100,50,50,1,3,2,2)
ConjectureRelativeSymbolWeight(
   PreferNonGoals,0.1,100,50,20,18,0.1,1.5,1.5)

Thank you