Fine-tuning E Heuristics

on given benchmarks

Jan Jakubův & Josef Urban @ PIW 2016

AI4REASON @ ARG @ CIIRC @ CTU Prague

Given a benchmark problem set, find a collection of complementary E heuristics maximizing the number of solved problems.

The Plan

  1. E Prover Brief Intro
  2. BliStr: Evolving E Heuristics for Benchmark Problems
  3. Conjecture-related weights for E
  4. BliStrTune: Evolving E Heuristics Reloaded

E Prover

by Stephan Schulz
  • automated theorem prover for FOL with equality
  • predefined --auto-schedule mode
  • command line arguments to guide proof search

E Prover

Guiding Proof Search
  • term ordering (KBO, LPO, ...)
  • literal selection (to perform superpositions on)
  • clause selection (to select a given clause)
  • axiom relevancy pruning (SInE)

E Prover

Clause Selection / Priority Functions
  • assign integer to each clause
  • the smaller the better
  • user can use predefined priority functions
  • ConstPrio, PreferUnits, PreferGround, ...

E Prover

Clause Selection / Weight Functions
  • assign real to each clause
  • the smaller the better
  • user can use predefined weight functions
  • user can specify parameters
  • e.g. Clauseweight: basic symbol-counting weight
    • fweight - symbol weight
    • vweight - variable weight
    • pos_mult - positive literal multiplier
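
A minimal Python sketch of a symbol-counting weight with the three parameters listed above. The clause representation (a list of signed literals, each a flat list of symbol names) and the function name clause_weight are assumptions made for illustration, not E's internal data structures; names starting with an uppercase letter are treated as variables, following the TPTP convention.

```python
def clause_weight(clause, fweight=1, vweight=1, pos_mult=1.0):
    """Symbol-counting weight of a clause (illustrative sketch only).

    clause: list of (is_positive, symbols) pairs, where symbols is the
    flat list of symbol names occurring in the literal.
    """
    total = 0.0
    for is_positive, symbols in clause:
        # Variables cost vweight, every other symbol costs fweight.
        lit = sum(vweight if s[0].isupper() else fweight for s in symbols)
        # Positive literals are scaled by pos_mult.
        total += pos_mult * lit if is_positive else lit
    return total


# The clause  p(f(X), a) | ~q(X)
example_clause = [(True, ["p", "f", "X", "a"]), (False, ["q", "X"])]
example_weight = clause_weight(example_clause, fweight=2, vweight=1, pos_mult=1.5)
print(example_weight)  # 1.5 * (2+2+1+2) + (2+1) = 13.5
```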

E Prover

Clause Selection / Clause Evaluation Function
  • CEF defined by
    • weight function
    • priority function
    • weight function parameters
  • syntax: Clauseweight(PreferUnits,10,1,1.5)
  • assign pair (prio,weight) to each clause
  • select the clause with the smallest pair
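
Selecting "the smallest pair" can be read as a lexicographic comparison: priority first, weight as tie-breaker. The sketch below illustrates this with a heap; prefer_units and symbol_count are toy stand-ins written for this example, not E's PreferUnits or Clauseweight implementations.

```python
import heapq


def prefer_units(clause):
    """Toy priority function: 0 for unit clauses, 1 otherwise."""
    return 0 if len(clause) == 1 else 1


def symbol_count(clause):
    """Toy weight function: total number of symbols in the clause."""
    return float(sum(len(literal) for literal in clause))


def selection_order(clauses, prio, weight):
    """Return the clauses in the order a single CEF would select them."""
    # Python tuples compare lexicographically, which matches
    # "smallest (prio, weight) pair first"; the index breaks ties.
    heap = [((prio(c), weight(c)), i) for i, c in enumerate(clauses)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, i = heapq.heappop(heap)
        order.append(clauses[i])
    return order


# Each clause is a list of literals; each literal is a list of symbols.
demo_clauses = [
    [["p", "X"], ["q", "a"]],    # non-unit, weight 4
    [["r", "f", "X", "b"]],      # unit, weight 4
    [["s", "a"]],                # unit, weight 2
]
demo_order = selection_order(demo_clauses, prefer_units, symbol_count)
```

With these toy functions both unit clauses come before the non-unit clause, and the lighter unit clause comes first.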

E Prover

Clause Selection / Heuristic
  • combines several clause evaluation functions (CEFs)
  • command line syntax:
-H'(3*ConjectureTermPrefixWeight(SimulateSOS,1,3,0,1,1,4,1.5,3), \
    3*RelevanceLevelWeight2(DeferSOS,1,1,2,1,400,10,18,200,5,4,2), \
    5*ConjectureTermPrefixWeight(PreferGroundGoals,1,3,5,10,1,1,1.5,4))'
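
One way to read the integer multipliers is as a weighted round-robin: in each round, that many clauses are taken from each CEF's queue in turn. The slides do not spell this out, so treat the reading as an assumption. The sketch only computes which CEF picks at each step; the labels CEF1..CEF3 are placeholders for the three functions above.

```python
from itertools import cycle


def round_robin_schedule(heuristic, steps):
    """heuristic: list of (count, cef_label) pairs.

    Returns the label of the CEF that selects the given clause at each of
    the first `steps` selections, assuming a weighted round-robin.
    """
    slots = [label for count, label in heuristic for _ in range(count)]
    picker = cycle(slots)
    return [next(picker) for _ in range(steps)]


demo_heuristic = [(3, "CEF1"), (3, "CEF2"), (5, "CEF3")]
demo_schedule = round_robin_schedule(demo_heuristic, 22)
print(demo_schedule[:11])
```

Over two full rounds (22 selections) the three CEFs pick 6, 6 and 10 clauses respectively.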

E Prover

Clause Selection / Summary
  • priority functions
  • weight functions
  • clause evaluation functions
  • heuristics
  • protocol: proof search control arguments

ParamILS

Iterated Local Search
  • method for parameter tuning and algorithm configuration
  • by Hutter, Hoos, Stützle, Leyton-Brown, Fawcett
  • from University of British Columbia (UBC)
  • implementation available for download

Using ParamILS

as a blackbox
  • describe configuration parameters and their domains
  • write a wrapper to run with a specific configuration
  • provide test problems
  • run & hope

Using ParamILS

to improve E protocols
tord {Auto,LPO4,KBO,KBO6} [Auto]
sel {SelectMaxLComplexAvoidPosPred,SelectNewComplexAHP,...} [SelectComplexG]
prord {arity,invfreq,invfreqconstmin} [invfreqconstmin]
simparamod {none,normal,oriented} [normal]
srd {0,1} [1]
forwardcntxtsr {0,1} [1]
splaggr {0,1} [0]
...
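
The parameter description above follows the pattern "name {values} [default]". A small sketch of how such lines could be read and how a one-parameter-change neighbourhood for local search could be enumerated; the helper names parse_params and neighbours are hypothetical and this is not ParamILS code.

```python
import re

# name {v1,v2,...} [default]
PARAM_LINE = re.compile(r"^(\w+)\s*\{([^}]*)\}\s*\[([^\]]*)\]$")


def parse_params(text):
    """Return (domains, defaults) parsed from a parameter description."""
    domains, defaults = {}, {}
    for line in text.strip().splitlines():
        match = PARAM_LINE.match(line.strip())
        if not match:
            continue  # skip elided or malformed lines such as "..."
        name, values, default = match.groups()
        domains[name] = [v.strip() for v in values.split(",")]
        defaults[name] = default.strip()
    return domains, defaults


def neighbours(config, domains):
    """Yield every configuration that differs in exactly one parameter."""
    for name, values in domains.items():
        for value in values:
            if value != config[name]:
                yield {**config, name: value}


DEMO_SPEC = """
tord {Auto,LPO4,KBO,KBO6} [Auto]
simparamod {none,normal,oriented} [normal]
srd {0,1} [1]
...
"""
demo_domains, demo_defaults = parse_params(DEMO_SPEC)
demo_neighbours = list(neighbours(demo_defaults, demo_domains))
```

A local search would evaluate each neighbour through the wrapper on the test problems and move to one that performs better.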

BliStr: Blind Strategy Maker

by Josef Urban

BliStr: Blind Strategy Maker

  • the protocols are like giraffes, the problems are their food
  • the better the giraffe specializes for eating problems unsolvable by others, the more it gets fed and further evolved

BliStr: Brief Overview

  1. start with initial protocols
  2. evaluate current protocols on all problems
  3. for each protocol, collect best cheap problems
  4. improve each protocol on its best cheap problems
    1. (using iterated local search)
  5. evaluate new protocols
  6. re-collect best cheap problems (goto 3)
  7. end when there is no improvement
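
The loop above can be sketched in Python as follows. The callbacks run and improve are hypothetical stand-ins for running E and for the local search, and the reading of "best cheap problems" (solved within a small time limit, and by no current protocol faster) is an assumption made for this sketch. The toy protocols and problems at the end exist only to make the example runnable.

```python
def blistr(protocols, problems, run, improve, cheap_limit=1.0, max_rounds=10):
    """run(protocol, problem) -> runtime, or None if unsolved.
    improve(protocol, training_problems) -> a new or unchanged protocol."""
    protocols = list(protocols)
    results = {p: {q: run(p, q) for q in problems} for p in protocols}

    def solved():
        return {q for r in results.values() for q, t in r.items() if t is not None}

    for _ in range(max_rounds):
        before = len(solved())
        new = []
        for p in protocols:
            # Best cheap problems: solved quickly by p, and no faster by others.
            best_cheap = [
                q for q in problems
                if results[p][q] is not None
                and results[p][q] <= cheap_limit
                and all(results[o][q] is None or results[o][q] >= results[p][q]
                        for o in protocols)
            ]
            if not best_cheap:
                continue
            candidate = improve(p, best_cheap)
            if candidate not in results:
                results[candidate] = {q: run(candidate, q) for q in problems}
                new.append(candidate)
        protocols.extend(new)
        if len(solved()) == before:
            break  # no improvement: stop
    return protocols, solved()


# Toy world: protocol k solves problem d when |d - k| <= 2.
def toy_run(k, d):
    return abs(d - k) * 0.5 if abs(d - k) <= 2 else None


def toy_improve(k, training):
    return k + 1 if max(training) > k else k


demo_protocols, demo_solved = blistr([1], [1, 2, 3, 4, 5, 6], toy_run, toy_improve)
```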

BliStr Life

Conjecture-related Weights

our previous research
  • The weight function ConjectureRelativeSymbolWeight counts symbols, assigning smaller weights to symbols that occur in the conjecture.
  • Question: Does it make sense to also consider term structure, not just symbols?
  • To answer: We have implemented several new weight functions that measure a clause's relatedness to the conjecture using different metrics.

Conjecture-related Weights

new weight functions
  • TermWeight - shared subterms with conjecture
  • PrefixWeight - common prefix with conjecture terms
  • LevDistanceWeight - Levenshtein distance
  • TreeDistanceWeight - Tree Edit Distance
  • TermTfIdfWeight - TF/IDF
  • StrucDistanceWeight - structural distance
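
To illustrate two of the metrics, the sketch below computes the common prefix length and the Levenshtein distance between terms written as prefix-notation symbol sequences. Flattening terms to symbol lists is an assumption made for this example; how E's weight functions turn these measures into clause weights is not shown here.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two symbol sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or match
        prev = cur
    return prev[-1]


conjecture_term = ["f", "g", "c", "b"]   # f(g(c), b)
clause_term = ["f", "g", "a", "b"]       # f(g(a), b)
short_term = ["f", "a"]                  # f(a)

print(common_prefix_len(clause_term, conjecture_term))  # 2
print(levenshtein(clause_term, conjecture_term))        # 1
```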

Conjecture-related Weights

evaluation
  • Are these new weights helpful?
  • How complementary are they to the existing E weights?
  • What are the best parameters for these weights?
  • Can we use them with BliStr?

BliStrTune

  • BliStr does not change weight parameters
  • it has only 12 (or so) hardcoded CEFs
  • Idea:
    • Extend BliStr to change weight parameters
  • Problem: The parameter space is too big
    • ParamILS does not perform well
  • Solution: Use two phases.
    1. tune global parameters
    2. tune weight function arguments
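
A rough sketch of the two-phase idea: search over the global parameters first with the weight function arguments held fixed, then fix the global parameters and search over the arguments. The greedy search, the toy score function and the tiny domains are all assumptions made for illustration; they stand in for ParamILS and for real runs of E.

```python
def tune(config, domains, keys, score):
    """Greedy search that changes only the parameters listed in `keys`."""
    best, best_score = dict(config), score(config)
    improved = True
    while improved:
        improved = False
        for key in keys:
            for value in domains[key]:
                candidate = {**best, key: value}
                s = score(candidate)
                if s > best_score:
                    best, best_score, improved = candidate, s, True
    return best


toy_domains = {
    "tord": ["LPO4", "KBO6"],
    "sel": ["SelectNewComplexAHP", "SelectComplexG"],
    "fweight": [1, 2, 3],
    "vweight": [1, 2],
}


def toy_score(c):
    """Made-up objective standing in for 'number of problems solved'."""
    return ((c["tord"] == "KBO6") + (c["sel"] == "SelectComplexG")
            + 0.5 * (c["fweight"] == 2) + 0.5 * (c["vweight"] == 1))


start = {"tord": "LPO4", "sel": "SelectNewComplexAHP", "fweight": 1, "vweight": 2}
after_global = tune(start, toy_domains, ["tord", "sel"], toy_score)          # phase 1
after_fine = tune(after_global, toy_domains, ["fweight", "vweight"], toy_score)  # phase 2
```

Each phase searches a much smaller space than a joint search over all four parameters would.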

Global Tuning Phase

-tKBO6 -WSelectComplexG ...
3*ConjectureRelativeTermWeight(ConstPrio,0,1,0.1,18,400,50,300,1,4,0.8,1),
34*ConjectureRelativeTermWeight(PreferUnits,1,1,0.1,100,9999,100,5,1,9999.9,2,0.7),
8*ConjectureRelativeSymbolWeight(PreferGround,0.2,50,100,5,10,0.5,2,0.2)

Fine Tuning Phase

-tKBO6 -WSelectComplexG
3*ConjectureRelativeTermWeight(ConstPrio,0,1,0.1,18,400,50,300,1,4,0.8,1),
34*ConjectureRelativeTermWeight(PreferUnits,1,1,0.1,100,9999,100,5,1,9999,2,0.7),
8*ConjectureRelativeSymbolWeight(PreferGround,0.2,50,100,5,10,0.5,2,0.2)

BliStrTune Evaluation

  • use data from MZR@Turing division at CASC'12
  • 1000 training problems were provided beforehand
  • 400 new problems were used in the competition
  • all problems exported from Mizar by Josef
  • Plan: train BliStr and BliStrTune, then compare
  • (in progress, only first training run finished)

Impact of new weights and fine tuning

(results in progress)

E and Vampire in 60 seconds

(results in progress)

Statistics

most often used CEFs
CEF                              # used
FIFOWeight(DeferSOS)             30
FIFOWeight(PreferNonGoals)       27
FIFOWeight(PreferProcessed)      22
StaggeredWeight(DeferSOS,1)      17
StaggeredWeight(DeferSOS,2)      15

Statistics

most often used weights
weight                           # used
ConjectureRelativeTermWeight     370
ConjectureRelativeSymbolWeight   354
ConjectureTermPrefixWeight       346
ConjectureGeneralSymbolWeight    281
RelevanceLevelWeight2            276

Statistics

most often used priorities
prio                             # used
PreferNonGoals                   426
PreferProcessed                  283
PreferWatchlist                  282
PreferUnitGroundGoals            247
ConstPrio                        235

Statistics

usage of new weights
weight                           # used
ConjectureRelativeTermWeight     370
ConjectureTermPrefixWeight       346
ConjectureStrucDistanceWeight    77
ConjectureLevDistanceWeight      40
ConjectureTermTfIdfWeight        29
ConjectureTreeDistanceWeight     18

Statistics

most often used parameters
ConjectureRelativeSymbolWeight(
   PreferGroundGoals,0.1,100,100,100,20,1.5,1.5,1.5)
ConjectureTermPrefixWeight(
   PreferNonGoals,1,3,100,9999.9,0,9999.9,3,5)
ConjectureTermPrefixWeight(
   DeferSOS,1,3,0.1,10,0,0.1,4,4)
ConjectureRelativeTermWeight(
   PreferProcessed,1,1,0.1,10,100,50,50,1,3,2,2)
ConjectureRelativeSymbolWeight(
   PreferNonGoals,0.1,100,50,20,18,0.1,1.5,1.5)

Thank you