Entrants' System Descriptions
ATPBoost 1.0
Bartosz Piotrowski
University of Warsaw, Poland
Architecture
ATPBoost 1.0
[PU18],
is a metasystem for ATP in large theories relying on consistent naming of
symbols and formulae.
It learns premise selection from examples of successful proofs using several
machine learning methods (k-nearest neighbors, gradient boosted trees,
recurrent neural networks, and graph neural networks).
The underlying deductive system is (currently) E prover.
The system takes into account the fact that a conjecture typically has
multiple possible proofs.
The learning process is reinforced by a feedback loop between the learners and
the prover during which new training proofs may be found and wrong
classifications of premises may be corrected.
The version for CASC-J10 will (likely) use non-neural machine learning methods
only (due to time and hardware limitations, as the neural methods are
time-consuming and often require GPU for efficient training).
Strategies
The main strategy of ATPBoost 1.0 for learning premise selection is to run
multiple iterations of the learning-proving feedback loop.
During the learning part, machine learning models are trained on the available
proofs to select relevant axioms.
During the proving part, the prover is run multiple times with axioms advised
by the trained models.
Newly found proofs give a feedback signal to the learners: highly-ranked
premises which were not used in any of the proofs are returned as negative
training examples, and new positive examples are extracted from the new proofs.
The machine learning models trained in such a way provide premise-selection
advice for ATP attempts for new conjectures. For each conjecture, the prover
is run with several axiom limits.
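In outline, one round of this loop can be sketched as follows (a minimal,
runnable Python sketch of the idea, not ATPBoost's actual code; the prove
callback is a hypothetical stand-in for an E prover run, and axiom frequency
stands in for the real learners):

    from collections import Counter

    def feedback_loop(conjectures, axioms, prove, rounds=3, limits=(2, 8, 32)):
        # prove(conjecture, premises) is a hypothetical stand-in for an E run;
        # it returns the set of premises used in a proof, or None on failure.
        proofs, negatives = {}, {}
        for _ in range(rounds):
            # "learning" here is just axiom frequency in the proofs so far
            model = Counter(ax for used in proofs.values() for ax in used)
            for c in conjectures:
                ranking = sorted(axioms, key=lambda ax: -model[ax])
                for k in limits:    # the prover runs with several axiom limits
                    used = prove(c, ranking[:k])
                    if used is not None:
                        proofs[c] = used            # new positive examples
                        # highly ranked but unused premises become negatives
                        negatives[c] = [ax for ax in ranking[:k]
                                        if ax not in used]
                        break
        return proofs, negatives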
Implementation
The metasystem is implemented in Python. It uses several machine learning
libraries (XGBoost, OpenNMT, TensorFlow, Scikit-learn).
ATPBoost is available from
https://github.com/BartoszPiotrowski/ATPboost
Expected Competition Performance
ATPBoost is most useful when the number of available axioms is very large but
the problems themselves are not very difficult once relevant axioms are
selected.
ATPBoost has good performance on the MPTP Challenge.
The performance on the type of problems in the current LTB division is unknown!
CSE 1.3
Feng Cao
Southwest Jiaotong University, China
Architecture
CSE 1.3 is a further development of CSE 1.2.
It is an automated theorem prover for first-order logic without equality
mainly based on a novel inference mechanism called Contradiction
Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18],
which is able to handle multiple (two or more) clauses dynamically in a
synergized way in one deduction step, while binary resolution is its special
case.
CSE 1.3 also adopts conventional factoring, equality resolution (ER rule),
and variable renaming.
Some pre-processing techniques, including pure literal deletion and
simplification based on the distance to the goal clause, are applied as well,
along with a number of standard redundancy criteria for pruning the search
space, such as tautology deletion and forward and backward subsumption.
CSE 1.3 has been improved, compared with CSE 1.2, mainly from the following
aspects:
- Optimization of the contradiction separation algorithm based on full usage
of clauses, which increases the extent to which clauses participate in the
deduction.
- Optimization of the contradiction separation algorithm based on an
optimized deduction path, which effectively controls the unifier
complexity in the deduction process.
- Dynamic adjustment of clause and literal weight update in the deduction
process.
Internally, CSE 1.3 works only with clausal normal form.
E prover
[Sch13]
is adopted with thanks for clausification of full first-order logic problems
during preprocessing.
Strategies
CSE 1.3 inherited most of the strategies in CSE 1.2.
The main new strategies are:
- SCS control strategy.
Control the number of literals in SCSs generated in the deduction
process; SCSs that meet the set conditions are retained.
- Clause weight update strategy.
Different update methods are applied in different clause deduction stages.
- Literal weight update strategy.
Flexible update methods are used for different literal deduction stages.
Implementation
CSE 1.3 is implemented mainly in C++, and Java is used to implement batch
problem running.
A shared data structure is used for storing constants and shared variables.
In addition, special data structures are designed for describing properties
of clauses, literals, and terms, so as to support the multiple-strategy mode.
E prover is used for clausification of FOF problems, and then TPTP4X is
applied to convert the CNF format into TPTP format.
Expected Competition Performance
CSE 1.3 has made some improvements compared to CSE 1.2, and so we expect a
better performance in this year's competition.
Acknowledgement:
Development of CSE 1.3 has been partially supported by the National Natural
Science Foundation of China (NSFC) (Grant No.61673320).
CSE_E 1.2
Feng Cao
Southwest Jiaotong University, China
Architecture
CSE_E 1.2 is an automated theorem prover for first-order logic by combining
CSE 1.3 and E 2.4, where CSE is based on the Contradiction Separation Based
Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18]
and E is mainly based on superposition.
The combination works as follows: E and CSE are applied to the given
problem sequentially.
If either prover solves the problem, then the proof process completes.
If neither CSE nor E can solve the problem, some clauses inferred by CSE,
especially unit clauses, are fed to E as lemmas, along with the original
clauses, for further proof search.
This kind of combination is expected to take advantage of both CSE and E,
and produce a better performance.
Concretely, CSE is able to generate a good number of unit clauses, which are
helpful for proof search and equality handling, while E has strong equality
handling capabilities.
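A schematic rendering of this control flow (a hedged Python sketch, not the
actual Java dispatcher; run_e, run_cse, and is_unit are hypothetical
stand-ins for the prover invocations, and the E-before-CSE ordering shown is
one possible choice):

    def cse_e(clauses, run_e, run_cse, is_unit):
        # each run_* callback returns (solved, derived_clauses)
        solved, _ = run_e(clauses)
        if solved:
            return "proved by E"
        solved, derived = run_cse(clauses)
        if solved:
            return "proved by CSE"
        # neither prover succeeded: feed CSE-inferred clauses, preferring
        # unit clauses, to E as lemmas together with the original clauses
        lemmas = [c for c in derived if is_unit(c)]
        solved, _ = run_e(clauses + lemmas)
        return "proved by E with CSE lemmas" if solved else "unknown"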
Strategies
The CSE part of CSE_E 1.2 uses the same strategies as standalone CSE 1.3,
e.g., clause/literal selection, strategy selection, and the SCS control
strategy.
The main new strategies for the combined systems are:
- Lemma filtering mainly based on variable appearance.
- Dynamic time allocation scheme in different run stages.
Implementation
CSE_E 1.2 is implemented mainly in C++, and Java is used to implement batch
problem running.
The job dispatch between CSE and E is implemented in Java.
Expected Competition Performance
We expect CSE_E 1.2 to solve some hard problems and have a satisfying
performance.
Acknowledgement:
Development of CSE 1.3 has been partially supported by the National Natural
Science Foundation of China (NSFC) (Grant No.61673320).
Thanks to Stephan Schulz for his kind permission to use his E prover, which
makes CSE_E possible.
CVC4 1.8
Andrew Reynolds
University of Iowa, USA
Architecture
CVC4
[BC+11]
is an SMT solver based on the DPLL(T) architecture
[NOT06]
that includes built-in support for many theories, including linear arithmetic,
arrays, bit vectors, datatypes, finite sets and strings.
It incorporates approaches for handling universally quantified formulas.
For problems involving free function and predicate symbols, CVC4 primarily
uses heuristic approaches based on conflict-based instantiation and E-matching
for theorems, and finite model finding approaches for non-theorems.
For problems in pure arithmetic,
CVC4 uses techniques for counterexample-guided quantifier instantiation
[RKK17].
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach.
First, quantified formulas are replaced by fresh Boolean predicates and the
ground theory solver(s) are used in conjunction with the underlying SAT solver
to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either add
instances of quantified formulas to the problem, answer "satisfiable", or
return unknown.
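Schematically, the loop looks as follows (an illustrative Python sketch of
the two-tiered scheme described above, not CVC4's C++ code; ground_solve and
instantiate are hypothetical stand-ins for the ground SAT/theory core and
the quantifier instantiation module):

    def two_tier_solve(ground_clauses, quantified, ground_solve, instantiate):
        while True:
            # quantified formulas are abstracted by fresh Boolean predicates,
            # so only the ground solver and SAT solver run here
            result, model = ground_solve(ground_clauses)
            if result == "unsat":
                return "unsatisfiable"
            # otherwise the quantifier instantiation module is invoked
            instances = instantiate(quantified, model)
            if instances == "model-complete":
                return "satisfiable"    # e.g., finite model finding succeeded
            if not instances:
                return "unknown"        # no instances could be added
            ground_clauses = ground_clauses + instances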
Finite model finding in CVC4 targets problems containing background theories
whose quantification is limited to finite and uninterpreted sorts.
In finite model finding mode, CVC4 uses a ground theory of finite cardinality
constraints that minimizes the number of ground equivalence classes, as
described in
[RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
It then adds instances of quantified formulas that are in conflict with the
candidate model, as described in
[RT+13].
If no instances are added, it reports "satisfiable".
CVC4 has native support for problems in higher-order logic, as described in
recent work
[BR+19].
It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via
lambda lifting.
The approach extends the theory solver for quantifier-free uninterpreted
functions (UF) and E-matching.
For the former, the theory solver for UF in CVC4 now handles equalities
between functions using an extensionality inference.
Partial applications of functions are handled using a (lazy) applicative
encoding where some function applications are equated to the applicative
encoding. For the latter, several of the data structures for E-matching have
been modified to incorporate matching in the presence of equalities between
functions, function variables, and partial function applications.
Strategies
For handling theorems, CVC4 primarily uses conflict-based quantifier
instantiation
[RTd14,BFR17],
enumerative instantiation
[RBF18]
and E-matching.
CVC4 uses a handful of orthogonal trigger selection strategies for E-matching.
For handling non-theorems, CVC4 primarily uses finite model finding techniques.
Since CVC4 with finite model finding is also capable of establishing
unsatisfiability, it is used as a strategy for theorems as well.
For problems in pure arithmetic, CVC4 uses variations of counterexample-guided
quantifier instantiation
[RD+15],
which select relevant quantifier instantiations based on models for
counterexamples to quantified formulas.
At the quantifier-free level, CVC4 uses standard decision procedures for
linear arithmetic and uninterpreted functions, as well as heuristic approaches
for handling non-linear arithmetic
[RT+17].
Implementation
CVC4 is implemented in C++. The code is available from:
https://github.com/CVC4
Expected Competition Performance
The first-order theorem proving and finite model finding capabilities of CVC4
have not changed much in the past year.
Its heuristics for linear and non-linear arithmetic have changed slightly,
which should impact TFA.
Its higher-order capabilities have undergone some minor improvements, which
should lead to slightly better performance in THF.
E 2.4
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E 2.4pre
[Sch02,
Sch13,
SCV19]
is a purely equational theorem prover for many-sorted first-order logic with
equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an
instance of the superposition calculus with negative literal selection and
a number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e.,
a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
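The given-clause core is easy to sketch (a toy, runnable Python rendering of
the DISCOUNT loop, not E's C implementation; clauses are assumed to be
frozensets of literals, and infer and weight are caller-supplied):

    import heapq

    def discount_loop(initial, infer, weight):
        # strict separation of active and passive facts: only the active set
        # takes part in inferences with the selected given clause
        active, passive, tick = set(), [], 0
        for c in initial:
            heapq.heappush(passive, (weight(c), tick, c))
            tick += 1
        while passive:
            _, _, given = heapq.heappop(passive)  # select by heuristic weight
            if given == frozenset():              # empty clause: proof found
                return "unsatisfiable"
            active.add(given)
            for new in infer(given, active):      # generate consequences
                heapq.heappush(passive, (weight(new), tick, new))
                tick += 1
        return "saturated"    # for a complete calculus: no proof exists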
As of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state for
propositional unsatisfiability.
For the LTB divisions, a control program uses a SInE-like analysis to extract
reduced axiomatizations that are handed to several instances of E.
E will not use on-the-fly learning this year.
Strategies
Proof search in E is primarily controlled by a literal selection strategy, a
clause selection heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies.
Clause selection heuristics can be constructed on the fly by combining
various parameterized primitive evaluation functions, or can be selected
from a set of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take other
clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO), which can
be lifted in different ways to literal orderings.
For CASC-27, E implements a strategy-scheduling automatic mode.
The total CPU time available is broken into several (unequal) time slices.
For each time slice, the problem is classified into one of several classes,
based on a number of simple features (number of clauses, maximal symbol
arity, presence of equality, presence of non-unit and non-Horn clauses, ...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the one that solves the
most problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
About 230 different strategies have been thoroughly evaluated on all untyped
first-order problems from TPTP 7.2.0.
In addition, we have explored some parts of the heuristic parameter space
with a short time limit of 5 seconds.
This allowed us to test about 650 strategies on all TPTP problems, and an
extra 7000 strategies on all 1193 UEQ problems from TPTP 7.2.0.
About 100 of these strategies are used in the automatic mode, and about 450
are used in at least one schedule.
Implementation
E is built around perfectly shared terms, i.e., each distinct term is
represented only once in a term bank.
The whole set of terms thus consists of a number of interconnected directed
acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented using
perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite
link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing
[Sch13].
Superposition and backward rewriting use fingerprint indexing
[Sch12],
a new technique combining ideas from feature vector indexing and path indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in
[Loe06].
The prover and additional information are available at
https://www.eprover.org
Expected Competition Performance
E 2.4 was the CASC-27 UEQ winner.
E 2.5
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E 2.5pre
[Sch02,
Sch13,
SCV19]
is a purely equational theorem prover for many-sorted first-order logic with
equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a number of
redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
As of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state
for propositional unsatisfiability.
For the LTB divisions, a control program uses a SInE-like analysis to extract
reduced axiomatizations that are handed to several instances of E.
E will not use on-the-fly learning this year.
Strategies
Proof search in E is primarily controlled by a literal selection strategy, a
clause selection heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies.
Clause selection heuristics can be constructed on the fly by combining various
parameterized primitive evaluation functions, or can be selected from a set
of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take other
clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO), which can
be lifted in different ways to literal orderings.
For CASC-J10, E implements a strategy-scheduling automatic mode.
The total CPU time available is broken into several (unequal) time slices.
For each time slice, the problem is classified into one of several classes,
based on a number of simple features (number of clauses, maximal symbol arity,
presence of equality, presence of non-unit and non-Horn clauses, possibly
presence of certain axiom patterns, ...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the one that solves the most
problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
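The greedy construction is simple enough to show directly (a small runnable
re-implementation of the idea in Python, not E's own evaluation tooling;
results maps each strategy to the set of problems it solves within a time
slice):

    def greedy_schedule(results, slices):
        schedule, covered = [], set()
        for _ in range(slices):
            # pick the strategy solving most problems not yet covered
            best = max(results, key=lambda s: len(results[s] - covered))
            if not results[best] - covered:
                break                     # no strategy adds new solutions
            schedule.append(best)
            covered |= results[best]
        return schedule

    # e.g. greedy_schedule({"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5}}, 3)
    # yields ["s1", "s3"]: after "s1" covers {1, 2, 3}, "s3" adds more
    # new solutions than "s2"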
About 130 different strategies have been thoroughly evaluated on all untyped
first-order problems from TPTP 7.3.0.
We have also explored some parts of the heuristic parameter space with a short
time limit of 5 seconds.
This allowed us to test about 650 strategies on all TPTP problems, and an
extra 7000 strategies on UEQ problems from TPTP 7.2.0.
About 100 of these strategies are used in the automatic mode, and
about 450 are used in at least one schedule.
Implementation
E is built around perfectly shared terms, i.e., each distinct term is only
represented once in a term bank.
The whole set of terms thus consists of a number of interconnected directed
acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented using
perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite link
in the term bank.
As a result, not only terms, but also rewrite steps are shared.
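The effect of perfect sharing can be illustrated in a few lines of Python (a
minimal hash-consing sketch, not E's C term bank):

    class TermBank:
        def __init__(self):
            self.bank = {}

        def term(self, symbol, *args):
            # args are already shared, so identity is a valid cache key
            key = (symbol,) + tuple(id(a) for a in args)
            return self.bank.setdefault(key, (symbol, args))

    bank = TermBank()
    x = bank.term("x")
    t1 = bank.term("f", x, bank.term("g", x))
    t2 = bank.term("f", x, bank.term("g", x))
    assert t1 is t2  # one shared copy; equality becomes pointer comparison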
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing
[Sch13].
Superposition and backward rewriting use fingerprint indexing
[Sch12],
a new technique combining ideas from feature vector indexing and path indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in
[Loe06].
The prover and additional information are available at
https://www.eprover.org
Expected Competition Performance
The inference core of E 2.5pre has been slightly modified since last year's
pre-release.
We have also been able to evaluate some more search strategies.
As a result, we expect performance to be somewhat better than in previous
years, especially in UEQ.
The system is expected to perform well in most proof classes, but will at best
complement top systems in the disproof classes.
Enigma 0.5.1
Jan Jakubuv
Czech Technical University in Prague, Czech Republic
Architecture
ENIGMA (Efficient learNing-based Inference Guiding MAchine)
[JU17,JU18,JU19,GJU19,CJ+19,JC+20]
is an efficient implementation of learning-based guidance for given clause
selection in saturation-based automated theorem provers.
Clauses from many proof searches are classified as positive and negative based
on their participation in the proofs.
An efficient classification model is trained on this data, using fast
feature-based characterization of the clauses.
The learned model is then tightly linked with the core prover and used as a
basis of a parameterized evaluation heuristic that provides fast ranking of
all generated clauses.
ENIGMA uses E as its underlying prover.
The CASC-J10 FOF competition version will most likely use a classifier based
on gradient-boosted trees (XGBoost or LightGBM) and clause features that
abstract from symbol and formula/clause names.
We may also include neural classifiers based on clause structure.
The system will be pre-trained on the latest public TPTP library and also use
a portfolio of strategies pre-trained for TPTP with BliStr(Tune)
[Urb13,JU17].
The system may also include strategies for large TPTP problems used previously
in the E.T. system.
Strategies
The core system is a modification of E adding learning-based evaluation of
the generated clauses.
The system is pre-trained on TPTP using abstracted characterization of clauses,
and the trained classifier is used during the competition as an additional
clause evaluation heuristic that is combined in various ways with the core
E strategies.
Implementation
The system modifies E's implementation by adding various feature extraction
methods and linking E with efficient learning-based classifiers (tree-based,
linear, and neural).
ENIGMA is available at:
https://github.com/ai4reason/enigmatic
Expected Competition Performance
ENIGMA should be able to improve on E's auto schedule in FOF.
Etableau 0.2
John Hester
University of Florida, USA
Architecture
Etableau 0.2 is an automated theorem prover built alongside E
[Sch13],
using it as a library.
Etableau at a basic level implements the clausal connection calculus, with
enforced regularity, folding up, local unification, and other improvements.
The greatest distinguishing feature of Etableau is a novel calculus using E's
saturation to close local branches of tableaux.
If a branch is local (shares no variables with other branches of the tableau)
and cannot be closed by other means, the branch is saturated as an independent
problem.
If a contradiction is found, this local branch can now be marked as closed.
If a local branch is found to be satisfiable, the problem is satisfiable.
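The locality test at the heart of this calculus is simple (an illustrative
Python fragment, not Etableau's C code; for brevity, each literal is
represented just by its set of variables):

    def is_local(branch, others):
        # branch: list of per-literal variable sets; others: the remaining
        # branches of the tableau
        own = set().union(*branch)
        rest = set().union(*(vs for b in others for vs in b))
        return not (own & rest)  # no shared variables: saturate independently

    tableau = [[{"X"}, {"Y"}], [{"X"}], [{"Z"}]]
    print([is_local(b, tableau[:i] + tableau[i+1:])
           for i, b in enumerate(tableau)])   # [False, False, True]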
Strategies
Etableau uses iterative deepening to control search through the space of
tableaux.
At a given maximum depth, the tableaux to be expanded are chosen based on the
ratio of branches that have been closed.
In other words, tableaux that have many closed branches and few remaining
open branches are processed first.
In situations where attempts to close branches by saturation are made, E is used
with its auto heuristic.
Implementation
Etableau is written in C, using many data structures and functions from E,
with many others added.
Etableau is available at this GitHub repository:
https://github.com/hesterj/E-TAB
Expected Competition Performance
Etableau is expected to perform well on problems with smaller numbers of
axioms, and struggle on problems with many axioms.
GKC 0.5.1
Tanel Tammet
Tallinn University of Technology, Estonia
Architecture
GKC
[Tam19]
is a resolution prover optimized for search in large knowledge bases.
It is used as a foundation (GK Core) for building a common-sense reasoner GK.
In particular, GK can handle inconsistencies and perform probabilistic and
nonmonotonic reasoning, see
[Tam20].
We envision natural language question answering systems as the main potential
application for these specialized methods.
These standard inference rules have been implemented in GKC:
- Binary resolution with optionally the set of support strategy, negative
or positive ordered resolution or unit restriction.
- Hyperresolution.
- Factorization.
- Paramodulation and demodulation with the Knuth-Bendix ordering.
GKC does not currently implement any propositional inferences or instance
generation.
GKC splits the multiple strategies it decides to try between several forked
instances.
For the competition the plan is to use eight forks.
Each fork runs a subset of strategies sequentially.
Strategies
For CASC, GKC uses multiple strategies run sequentially, with the time limit
starting at one second for each, increased fivefold once the whole batch has
been performed.
The strategy selection takes into consideration the basic properties of the
problem, in particular its approximate size.
There is no interplay between different strategies.
We perform the selection of a given clause by using several queues in order to
spread the selection relatively uniformly over these categories of derived
clauses and their descendants: axioms, external axioms, assumptions and goals.
The queues are organized in two layers.
As a first layer we use the common ratio-based algorithm of alternating
between selecting n clauses from a weight-ordered queue and one clause from
the FIFO queue in derivation order.
As a second layer we use four separate queues based on the derivation history
of a clause.
Each queue in the second layer contains the two sub-queues of the first layer.
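The two-layer scheme can be pictured with a small runnable model
(illustrative Python, not GKC's C implementation; the four
derivation-history categories and the n:1 ratio follow the description
above):

    from itertools import cycle

    class LayeredQueues:
        CATS = ("axioms", "external", "assumptions", "goals")

        def __init__(self, n=4):
            self.n = n                               # weight picks per FIFO pick
            self.store = {c: [] for c in self.CATS}  # (seq, weight, clause)
            self.rotate = cycle(self.CATS)
            self.picks = dict.fromkeys(self.CATS, 0)
            self.seq = 0

        def add(self, clause, weight, category):
            self.store[category].append((self.seq, weight, clause))
            self.seq += 1

        def pop(self):
            for _ in self.CATS:       # second layer: rotate over categories
                cat = next(self.rotate)
                queue = self.store[cat]
                if not queue:
                    continue
                self.picks[cat] += 1
                # first layer: n weight-ordered picks, then one FIFO pick
                if self.picks[cat] % (self.n + 1) == 0:
                    item = min(queue)                   # derivation order
                else:
                    item = min(queue, key=lambda t: (t[1], t[0]))  # lightest
                queue.remove(item)
                return item[2]
            return None               # all queues empty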
GKC only looks for proofs and does not try to show non-provability.
Implementation
GKC is implemented in C.
The data representation machinery is built upon a shared memory graph database
whitedb (http://whitedb.org/), enabling it to solve multiple different queries
in parallel processes without the need to repeatedly parse or load the large
knowledge base from disk.
An interesting aspect of GKC is the pervasive use of hash indexes, feature
vectors and fingerprints, while no tree indexes are used.
GKC can be obtained from
https://github.com/tammet/gkc/
Expected Competition Performance
Compared to its performance in previous CASCs, GKC 0.5.1 should perform
somewhat better.
In particular, more search strategies have been implemented and the selection
of search strategies is wider and more varied.
The core algorithms and data structures remain the same.
We expect GKC to be in the middle of the final ranking for FOF and below
the average in UEQ and LTB.
We expect GKC to perform well on very large problems.
iProver 3.3
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver
[Kor08]
is an automated theorem prover based on an instantiation calculus Inst-Gen
[Kor13,
GK03],
which is complete for first-order logic.
iProver approximates first-order clauses using propositional abstractions
which are solved using MiniSAT
[ES04]
and refined using model-guided instantiations.
iProver combines instantiation, resolution and superposition
[DK20]
and is extended with a general abstraction-refinement framework for under- and
over-approximations
[HK18,HK19].
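The propositional abstraction at the heart of Inst-Gen is easy to illustrate
(a toy Python fragment, not iProver's OCaml; literals are (sign, predicate,
args) triples whose arguments are constants or upper-case variables, and in
iProver the abstraction is passed to MiniSAT, whose model then guides the
next instantiations):

    def abstract(clause, bottom="$bot"):
        # map every variable to one distinguished constant; if the resulting
        # propositional clause set is unsatisfiable, so is the original one
        def ground(term):
            return bottom if term[0].isupper() else term
        return [(sign, pred, tuple(ground(a) for a in args))
                for sign, pred, args in clause]

    # P(X) | ~Q(X, a)   is abstracted to   P($bot) | ~Q($bot, a)
    print(abstract([(True, "P", ("X",)), (False, "Q", ("X", "a"))]))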
Recent features in iProver include:
- Superposition calculus together with simplifications such as demodulation,
light normalisation, subsumption, subsumption resolution and global
subsumption.
iProver's simplification setup
[DK20]
is tunable via command-line options and generalises common architectures
such as DISCOUNT or Otter.
- Heuristics used in iProver are learnt via dynamic clustering and
hyper-parameter optimisation with SMAC
[HHL12],
as described in
[HK19].
- In the LTB division iProver combines abstraction-refinement with axiom
selection based on the SinE algorithm
[HV11]
as implemented in Vampire
[KV13].
iProver will run the most promising learnt heuristics in parallel.
Strategies
iProver has around 100 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the SAT
solver, simplifications and options for combination of instantiation with
resolution and superposition.
By default iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
In the competition we will also use a selection of learnt strategies.
Implementation
iProver is implemented in OCaml; for ground reasoning it uses MiniSat
[ES04].
iProver accepts FOF, TFF and CNF formats.
Vampire
[KV13,HK+12]
and E prover
[Sch13]
are used for proof-producing clausification of FOF/TFF problems;
Vampire is also used for SInE axiom selection
[HV11]
in the LTB division.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
We expect improvement in performance compared to the previous year due to
improved implementation of superposition, simplifications and heuristic
selection.
lazyCoP 0.1
Michael Rawson
University of Manchester, United Kingdom
Architecture
lazyCoP 0.1 is a connection-tableaux system for first-order logic with
equality.
It implements the lazy paramodulation calculus described in
[Pas08],
with some additional inferences such as "shortcut" strict rules and equality
lemmas.
The system implements well-known refinements of the predicate connection
calculus, such as tautology elimination and strong regularity, and these are
lifted to equalities where appropriate.
The resulting system appears to be complete, but we make no theoretical
claim one way or another.
The system was originally conceived to efficiently accommodate a
machine-learned heuristic guidance system: this system is not yet guided in
this way, but learned heuristics are intended for a future version.
Strategies
The system explores a tableaux-level search space using the classic A*
informed-search algorithm.
The (admissible) heuristic function is the number of open branches.
Typical connection systems explore via some kind of iterative deepening:
A* search is a necessity for future learned guidance, and is not as
catastrophic in memory consumption as might be expected.
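A bare-bones version of this search (runnable illustrative Python, not
lazyCoP's Rust; expand enumerates successor tableaux, and each inference is
taken to cost 1):

    import heapq

    def astar(start, expand, open_branches):
        # f = g + h, where the admissible heuristic h counts open branches
        frontier = [(open_branches(start), 0, 0, start)]
        tick = 1
        while frontier:
            _, g, _, tableau = heapq.heappop(frontier)
            if open_branches(tableau) == 0:
                return tableau            # all branches closed: proof found
            for succ in expand(tableau):
                f = g + 1 + open_branches(succ)
                heapq.heappush(frontier, (f, g + 1, tick, succ))
                tick += 1                 # tie-breaker keeps tuples comparable
        return None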
No form of strategy scheduling is yet implemented and the system will run
for the entire time allowed on all available cores.
Implementation
A finite tree of inference rules forms a search space.
To expand a selected leaf node, the system traverses from root to leaf,
applying each rule to a new empty tableau.
Possible inferences from the resulting tableau are added to the leaf and the
resulting nodes are enqueued.
The system does not yet include a custom clausification routine: a recent
build of Vampire is employed for this purpose.
lazyCoP is implemented entirely in the Rust programming language, allowing
tight control over memory allocation and layout while avoiding some classes
of memory- and thread-safety bugs.
The source code (likely to be incomplete and/or buggy up to and including
the competition!) is available at:
https://github.com/MichaelRawson/lazycop
Expected Competition Performance
Performance on problems without equality is hoped to be comparable with other
connection systems, if slightly slower.
Problems requiring a modest amount of equational reasoning (or problems
requiring no equational reasoning but containing extraneous equality axioms)
are not expected to perform well, but should not cause catastrophic blowup
either.
Pure-equality problems (such as UEQ) are not the intended domain and do not
perform well, but the first author remains hopeful for At Least One Problem.
leanCoP 2.2
Jens Otten
University of Oslo, Norway
Architecture
leanCoP
[OB03,
Ott08]
is an automated theorem prover for classical first-order logic with equality.
It is a very compact implementation of the connection (tableau) calculus
[Bib87,
LS01].
Strategies
The reduction rule of the connection calculus is applied before the
extension rule.
Open branches are selected in a depth-first way.
Iterative deepening on the proof depth is performed in order to achieve
completeness.
Additional inference rules and techniques include regularity, lemmata,
and restricted backtracking
[Ott10].
leanCoP uses an optimized structure-preserving transformation
into clausal form
[Ott10]
and a fixed strategy schedule that is controlled by a shell script.
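The overall control structure of the search can be summarised in a few lines
(a schematic Python sketch of the iterative deepening idea; the actual core
prover is a few lines of Prolog, and prove_to_depth is a hypothetical
stand-in for the depth-bounded connection search):

    def leancop_driver(prove_to_depth, max_depth=1000):
        # iterative deepening on the proof depth restores completeness
        # of the underlying depth-first connection search
        for depth in range(1, max_depth + 1):
            proof = prove_to_depth(depth)
            if proof is not None:
                return proof          # compact connection proof found
        return None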
Implementation
leanCoP is implemented in Prolog.
The source code of the core prover consists only of a few lines of code.
Prolog's built-in indexing mechanism is used to quickly find connections
when the extension rule is applied.
leanCoP can read formulae in leanCoP syntax and in
TPTP first-order syntax. Equality axioms and axioms to support
distinct objects are automatically added if required.
The leanCoP core prover returns a very compact connection proof,
which is then translated into a more comprehensive output format,
e.g., into a lean (TPTP-style) connection proof or into a
readable text proof.
The source code of leanCoP 2.2 is available under the GNU general
public license.
It can be downloaded from the leanCoP website at:
http://www.leancop.de
The website also contains information about ileanCoP
[Ott08]
and MleanCoP
[Ott12,
Ott14],
two versions of leanCoP for first-order intuitionistic logic and first-order
modal logic, respectively.
Expected Competition Performance
As the prover has not changed, the performance of leanCoP 2.2
is expected to be the same as last year.
LEO-II 1.7.0
Alexander Steen
University of Luxembourg, Luxembourg
Architecture
LEO-II
[BP+08],
the successor of LEO
[BK98],
is a higher-order ATP system based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution
[Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E
[Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II uses some basic strategy scheduling to try different
search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in OCaml 4, and its problem representation language
is the TPTP THF language
[BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure
[SB10].
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately, the LEO-II system still uses only a very simple sequential
collaboration model with first-order ATPs, instead of the more advanced,
concurrent and resource-adaptive OANTS architecture
[BS+08]
exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from
http://www.leoprover.org
Expected Competition Performance
LEO-II is not actively being developed anymore, hence there are no
expected improvements over last year's CASC results.
Leo-III 1.4
Alexander Steen
University of Luxembourg, Luxembourg
Architecture
Leo-III
[SB18],
the successor of LEO-II
[BP+08],
is a higher-order ATP system based on extensional higher-order paramodulation
with inference restrictions using a higher-order term ordering.
The calculus contains dedicated extensionality rules and is augmented with
equational simplification routines that have their intellectual roots in
first-order superposition-based theorem proving.
The saturation algorithm is a variant of the given clause loop procedure
inspired by the first-order ATP system E.
Leo-III cooperates with external first-order ATPs which are called
asynchronously during proof search; a focus is on cooperation with systems
that support typed first-order (TFF) input.
For this year's CASC, CVC4
[BC+11]
and E
[Sch02,
Sch13]
are used as external systems.
However, cooperation is in general not limited to first-order systems.
Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or
counter model generators) may be included using simple command-line arguments.
If the saturation procedure loop (or one of the external provers) finds a
proof, the system stops, generates the proof certificate and returns the
result.
For the LTB division, Leo-III is augmented by an external Python3 driver which
schedules Leo-III on the batches.
Strategies
Leo-III comes with several configuration parameters that influence its proof
search by applying different heuristics and/or restricting inferences.
These parameters can be chosen manually by the user on start-up.
Leo-III currently offers neither portfolio scheduling (time slicing) nor
automatic selection of configuration parameters that seem beneficial for the
reasoning problem at hand.
Strategies and strategy scheduling will be addressed in further upcoming
versions.
Implementation
Leo-III utilizes and instantiates the associated LeoPARD system platform
[WSB15]
for higher-order (HO) deduction systems implemented in Scala (currently using
Scala 2.12 and running on a JVM with Java 8).
The prover makes use of LeoPARD's sophisticated data structures and implements
its own reasoning logic on top.
A generic parser is provided that supports all TPTP syntax dialects.
It is implemented using ANTLR4 and converts its produced concrete syntax tree
to an internal TPTP AST data structure which is then transformed into
polymorphically typed lambda terms.
As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF, TFF,
THF) as well as its polymorphic variants
[BP13,
KSR16].
The term data structure of Leo-III uses a polymorphically typed spine term
representation augmented with explicit substitutions and De Bruijn-indices.
Furthermore, terms are perfectly shared during proof search, permitting
constant-time equality checks between alpha-equivalent terms.
Leo-III's saturation procedure may at any point invoke external reasoning
tools.
To that end, Leo-III includes an encoding module which translates (polymorphic)
higher-order clauses to polymorphic and monomorphic typed first-order clauses,
whichever is supported by the external system.
While LEO-II relied on cooperation with untyped first-order provers,
Leo-III exploits the native type support in first-order provers (TFF logic)
to remove clutter during translation and, in turn, to increase the
effectiveness of external cooperation.
Leo-III is available on GitHub:
https://github.com/leoprover/Leo-III
Expected Competition Performance
Leo-III 1.4 was the CASC-27 LTB winner.
Leo-III 1.5
Alexander Steen
University of Luxembourg, Luxembourg
Architecture
Leo-III
[SB18],
the successor of LEO-II
[BP+08],
is a higher-order ATP system based on extensional higher-order paramodulation
with inference restrictions using a higher-order term ordering.
The calculus contains dedicated extensionality rules and is augmented with
equational simplification routines that have their intellectual roots in
first-order superposition-based theorem proving.
The saturation algorithm is a variant of the given clause loop procedure
inspired by the first-order ATP system E.
Leo-III cooperates with external first-order ATPs that are called
asynchronously during proof search; a focus is on cooperation with systems
that support typed first-order (TFF) input.
For this year's CASC, CVC4
[BC+11]
and E
[Sch02,Sch13]
are used as external systems.
However, cooperation is in general not limited to first-order systems.
Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or
counter model generators) may be included using simple command-line arguments.
If the saturation procedure loop (or one of the external provers) finds a
proof, the system stops, generates the proof certificate and returns the
result.
For the LTB division, Leo-III is augmented by an external Python3 driver
which schedules Leo-III on the batches.
Strategies
Leo-III comes with several configuration parameters that influence its proof
search by applying different heuristics and/or restricting inferences.
These parameters can be chosen manually by the user on start-up.
Leo-III implements a naive time slicing approach of some of these strategies
for this CASC.
Implementation
Leo-III utilizes and instantiates the associated
LeoPARD system platform
[WSB15]
for higher-order (HO) deduction systems implemented in Scala (currently using
Scala 2.13 and running on a JVM with Java 8).
The prover makes use of LeoPARD's sophisticated
data structures and implements its own reasoning logic on top.
A generic parser is provided that supports all TPTP syntax dialects.
It is implemented using ANTLR4 and converts its produced concrete syntax tree
to an internal TPTP AST data structure, which is then transformed into
polymorphically typed lambda terms.
As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF, TFF,
THF) as well as its polymorphic variants
[BP13,KSR16].
The term data structure of Leo-III uses a polymorphically typed spine term
representation augmented with explicit substitutions and De Bruijn-indices.
Furthermore, terms are perfectly shared during proof search, permitting
constant-time equality checks between alpha-equivalent terms.
Leo-III's saturation procedure may at any point invoke external reasoning
tools.
To that end, Leo-III includes an encoding module which translates (polymorphic)
higher-order clauses to polymorphic and monomorphic typed first-order clauses,
whichever is supported by the external system.
While LEO-II relied on cooperation with untyped first-order provers, Leo-III
exploits the native type support in first-order provers (TFF logic) to remove
clutter during translation and, in turn, to increase the effectiveness of
external cooperation.
Leo-III is available on GitHub:
https://github.com/leoprover/Leo-III
Expected Competition Performance
Version 1.5 only marginally improves the previous release by fixing some bugs.
As CASC is using wall clock (WC) time instead of CPU time usage in all
divisions, the Java VM version of Leo-III is used in the competition (as
opposed to a native build used last year).
We hope that the JRE performs, after a slow start-up, quite well on longer
(wrt. WC time) runs.
We expect performance similar to that in last year's CASC.
Also in the LTB mode, there are no major novelties: only some timing
parameters have been changed compared to last year.
Stemming from Leo-III's support for polymorphic HOL reasoning, we expect a
reasonable performance compared to the other systems.
On the other hand, Leo-III's LTB mode does not do any learning and/or analysis
of the learning samples.
MaLARea 0.9
Josef Urban
Czech Technical University in Prague, Czech Republic
Architecture
MaLARea
[Urb07,US+08,KUV15]
is a metasystem for ATP in large theories where symbol and formula names are
used consistently.
It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace), as
well as complementary AI techniques like machine learning
based on symbol-based similarity, model-based similarity, term-based
similarity, and obviously previous successful proofs.
The version for CASC-J10 will use the E prover with the BliStr(Tune)
[Urb13,JU17]
large-theory strategies, possibly also Prover9, Mace and Paradox.
The premise selection methods will likely also use the distance-weighted
k-nearest neighbor
[KU13],
ATPBoost
[PU18],
and E's implementation of SInE.
Strategies
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use
the most relevant axioms for next ATP attempts.
This is iterated, using different time limits and axiom limits.
Various features are used for learning, and the learning is complemented by
other criteria like model-based reasoning, symbol and term-based similarity,
etc.
Implementation
The metasystem is implemented in ca. 2500 lines of Perl.
It uses many external programs - the above mentioned ATPs and machine learner,
TPTP utilities, LADR utilities for work with models, and some standard
Unix tools.
MaLARea is available at:
https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2.
Expected Competition Performance
Thanks to machine learning, MaLARea is strongest on batches of many related
problems with many redundant axioms where some of the problems are easy to
solve and can be used for learning the axiom relevance.
MaLARea is not very good when all problems are too difficult (nothing to
learn from), or when the problems (are few and) have nothing in common.
Some of its techniques (selection by symbol and term-based similarity,
model-based reasoning) could however make it slightly stronger than standard
ATPs even in such cases.
MaLARea has a very good performance on the MPTP Challenge, which is a
predecessor of the LTB division, and on several previous LTB competitions.
Prover9 1109a
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3
[McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, a proof of the transformation to clauses
is not given.
Completeness is not guaranteed, so termination does not indicate
satisfiability.
Strategies
Prover9 has available many strategies; the following
statements apply to CASC.
Given a problem, Prover9 adjusts its inference rules and strategy according
to syntactic properties of the input clauses such as the presence of
equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
For CASC Prover9 uses KBO to order terms for demodulation and for the
inference rules, with a simple rule for determining symbol precedence.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP
[McC97].
(LADR has some AC functions, but Prover9 does not use them).
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, and FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing
[Sch04]
is used for forward and backward nonunit subsumption.
Prover9 is available from
http://www.cs.unm.edu/~mccune/prover9/
Expected Competition Performance
Prover9 is the CASC fixed point, against which progress can be judged.
Each year it is expected to do worse than the previous year, relative to the
other systems.
PyRes 1.3
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
PyRes
[SP20]
is a simple resolution-style theorem prover for first-order logic, implemented
in very clear and well-commented Python.
It has been written as a pedagogical tool to illustrate the architecture and
basic algorithms of a saturation-style theorem prover.
The prover consists of a parser for (most of) TPTP-3 format, a simple
clausifier to convert full first-order format into clause normal form, and a
saturation core trying to derive the empty clause from the resulting clause
set.
The saturation core is based on the DISCOUNT-loop variant of the
given-clause algorithm, i.e., a strict separation of active and
passive facts.
It implements simple binary resolution and factoring
[Rob65],
optionally with selection of negative literals
[BG+01].
Redundancy elimination is restricted to forward and backward subsumption and
tautology deletion.
There are no inference rules for equality - if equality is detected, the
necessary axioms are added.
Strategies
The prover supports several negative literal selection strategies, as well as
selection of the given clause from a set of differently weighted priority
queues in the style of E
[SCV19].
In the competition, it will always select the syntactically largest literal,
and will use weight-age interleaved clause selection with a pick-given ratio
of 5 to 1.
Implementation
The prover is implemented in Python 3, with maximal emphasis on clear and
well-documented code.
Terms are represented as nested lists (equivalent to LISP-style s-expressions).
Literals, clauses, and formulas are implemented as classes using an
object-oriented style.
One of the changes compared to last year's version is the introduction of
simple indices for unification (which now uses top-symbol hashing) and
subsumption.
Subsumption now uses a new, simple technique we call predicate abstraction
indexing, which represents a clause as an ordered sequence of the
predicate symbols of its literals.
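The idea can be rendered in a few lines (an illustrative Python fragment
close in spirit to, but not copied from, PyRes; it assumes the usual multiset
view of clause subsumption):

    from collections import Counter

    def abstraction(clause):
        # clause: list of (sign, predicate, args) literals
        return tuple(sorted(pred for sign, pred, args in clause))

    def may_subsume(abs_c, abs_d):
        # necessary condition: every predicate of C occurs often enough in D;
        # only candidates passing this cheap test reach the full check
        return not (Counter(abs_c) - Counter(abs_d))

    c = [(True, "p", ("X",)), (False, "q", ("X", "Y"))]
    d = [(True, "p", ("a",)), (False, "q", ("a", "b")), (True, "r", ("b",))]
    print(may_subsume(abstraction(c), abstraction(d)))  # True: run full check
    print(may_subsume(abstraction(d), abstraction(c)))  # False: pruned cheaply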
PyRes builds a proof object on the fly, and can print a TPTP-3 style proof or
saturation derivation.
The system source is available at:
https://github.com/eprover/PyRes
Expected Competition Performance
Performance is expected to be better than last year, but still mediocre for
non-equational problems, and abysmal for problems with equality.
However, per CASC rules, PyRes will still be assumed superior to any
non-participating prover.
Satallax 3.4
Michael Färber
ENS Paris-Saclay, France
Architecture
Satallax 3.4
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
Proof search:
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat to test the current set of propositional
clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Optionally, Satallax generates lambda-free higher-order logic (lfHOL) formulae
in addition to the propositional clauses
[VB+19].
If this option is used, then Satallax periodically calls the theorem prover E
[Sch13] to test for lfHOL unsatisfiability.
If the set of lfHOL formulae is unsatisfiable, then the original branch is
unsatisfiable.
Upon request, Satallax attempts to reconstruct a proof which can be output
in the TSTP format.
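At a high level, the proof search can be sketched as follows (toy Python, not
Satallax's OCaml; expand stands for generating tableau-rule instances and
sat_solve for the MiniSat call):

    def satallax_loop(branch, expand, sat_solve, max_steps=100000):
        prop_clauses = []
        queue = list(branch)           # axioms plus negated conjecture
        for _ in range(max_steps):
            if not queue:
                return "Satisfiable"   # nothing left to process on this branch
            formula = queue.pop(0)
            new_formulae, new_clauses = expand(formula)  # tableau instances
            queue.extend(new_formulae)
            prop_clauses.extend(new_clauses)
            if not sat_solve(prop_clauses):    # propositionally unsatisfiable
                return "Unsatisfiable"         # so the original branch is too
        return "Unknown"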
Strategies
There are about 150 flags that control the order in which formulae and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Before deciding on the schedule to use, Satallax parses the problem and
determines if it is big enough that a SInE-based premise selection algorithm
[HV11]
should be used.
If SInE is not activated, then Satallax uses a strategy schedule consisting
of 37 modes.
If SInE is activated, then Satallax is run with a SInE-specific schedule
consisting of 58 modes with different SInE parameter values selecting different
premises.
Each mode is tried for time limits ranging from less than a second to just
over 1 minute.
Implementation
Satallax is implemented in OCaml, making use of the external tools MiniSat
(via a foreign function interface) and E.
Satallax is available at:
http://cl-informatik.uibk.ac.at/~mfaerber/satallax.html
Expected Competition Performance
Satallax 3.4 was the CASC-27 THF winner.
Satallax 3.5
Michael Färber
ENS Paris-Saclay, France
Architecture
Satallax
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church’s
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat to test the current set of propositional
clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Optionally, Satallax generates lambda-free higher-order logic (lfHOL) formulae
in addition to the propositional clauses
[VB+19].
If this option is used, then Satallax periodically calls the theorem prover E
[Sch13]
to test for lfHOL unsatisfiability.
If the set of lfHOL formulae is unsatisfiable, then the original branch is
unsatisfiable.
Upon request, Satallax attempts to reconstruct a proof which can be output
in the TSTP format.
Strategies
There are about 150 flags that control the order in which formulae and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Before deciding on the schedule to use, Satallax parses the problem and
determines if it is big enough that a SInE-based premise selection algorithm
[HV11]
should be used.
If SInE is not activated, then Satallax uses a strategy schedule consisting of
37 modes.
If SInE is activated, then Satallax is run with a SInE-specific schedule
consisting of 58 modes with different SInE parameter values selecting
different premises.
Each mode is tried for time limits ranging from less than a second to just
over 1 minute.
Implementation
Satallax is implemented in OCaml, making use of the external tools MiniSat
(via a foreign function interface) and E.
Satallax is available at:
http://cl-informatik.uibk.ac.at/~mfaerber/satallax.html
Expected Competition Performance
The main change from Satallax 3.4 is the switch to a branch of E that fixes a
higher-order unsoundness.
The performance of Satallax 3.5 should be comparable to Satallax 3.4.
Twee 2.2.1
Nick Smallbone
Chalmers University of Technology, Sweden
Architecture
Twee 2.2.1 is an equational prover based on unfailing completion
[BDP89].
It features ground joinability testing
[MN90]
and a connectedness test
[BD88],
which together eliminate many redundant inferences in the presence of
unorientable equations.
Twee's implementation of ground joinability testing performs case splits on
the order of variables, in the style of
[MN90],
and discharges individual cases by rewriting modulo a variable ordering.
The case splitting strategy chooses only useful case splits, which
prevents the number of cases from blowing up.
Horn clauses are encoded as equations as described in
[CS18].
The CASC version of Twee "handles" non-Horn clauses by discarding them.
Strategies
Twee's strategy is simple and it does not tune its heuristics or strategy
based on the input problem.
The term ordering is always KBO; functions are ordered by number of
occurrences and always have weight 1.
The main loop is a DISCOUNT loop.
The active set contains rewrite rules and unorientable equations, which are
used for rewriting, and the passive set contains unprocessed critical pairs.
Twee often interreduces the active set, and occasionally simplifies the passive
set with respect to the active set.
Each critical pair is scored using a weighted sum of the weights of its two
terms.
Terms are treated as DAGs when computing weights, i.e., duplicate subterms
are only counted once per term.
The weights of critical pairs that correspond to Horn clauses are adjusted
by the heuristic described in
[CS18],
section 5.
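A sketch of this scoring is below: terms are nested tuples, variables
are strings, every symbol has weight 1 (as in Twee's KBO setup), and
duplicate subterms within one term are counted once. The two weighting
factors are illustrative parameters, not Twee's exact coefficients.

    # Critical pairs are scored by a weighted sum of the weights of
    # their two terms, where terms are weighed as DAGs: a duplicate
    # subterm is counted only once per term. Symbols all have weight 1.
    def dag_weight(term):
        seen = set()

        def visit(t):
            if t in seen:      # duplicate subterm: counted only once
                return 0
            seen.add(t)
            if isinstance(t, str):         # variable
                return 1
            head, *args = t                # compound term (head, args...)
            return 1 + sum(visit(a) for a in args)

        return visit(term)

    def score(critical_pair, lhs_factor=2, rhs_factor=1):
        lhs, rhs = critical_pair
        return lhs_factor * dag_weight(lhs) + rhs_factor * dag_weight(rhs)

    # Example: in f(g(x), g(x)) the repeated subterm g(x) counts once,
    # so the weight is 3 (one each for f, g, x).
    assert dag_weight(("f", ("g", "x"), ("g", "x"))) == 3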
For CASC, to take advantage of multiple cores, several versions of
Twee run in parallel using different parameters.
Implementation
Twee is written in Haskell. Terms are represented as array-based flatterms
for efficient unification and matching.
Rewriting uses an imperfect discrimination tree.
The passive set is represented compactly (12 bytes per critical pair) by
storing only the information needed to reconstruct the critical pair, not
the critical pair itself.
Because of this, Twee can run for an hour or more without exhausting memory.
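One way to realise such a representation is sketched below: each
passive entry packs the ids of its two parent rules, the overlap
position, and a precomputed score into exactly 12 bytes, and the
critical pair is rebuilt from its parents only when selected. The
field layout and the overlap helper are illustrative, not Twee's
actual encoding.

    # A 12-byte passive-set entry: two 4-byte parent-rule ids, a 2-byte
    # overlap position, and a 2-byte score. The critical pair itself is
    # recomputed on demand from the active set.
    import struct

    ENTRY = struct.Struct("<IIHH")      # 4 + 4 + 2 + 2 = 12 bytes

    def pack_entry(rule1_id, rule2_id, position, score):
        return ENTRY.pack(rule1_id, rule2_id, position, score)

    def reconstruct(active_rules, blob):
        rule1_id, rule2_id, position, _score = ENTRY.unpack(blob)
        # overlap is a hypothetical helper that re-derives the critical
        # pair from its two parent rules at the stored position.
        return overlap(active_rules[rule1_id], active_rules[rule2_id],
                       position)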
Twee uses an LCF-style kernel: all rules in the active set come with a
certified proof object which traces back to the input axioms.
When a conjecture is proved, the proof object is transformed into a
human-readable proof.
Proof construction does not harm efficiency because the proof kernel is
invoked only when a new rule is accepted.
In particular, reasoning about the passive set does not invoke the kernel.
The translation from Horn clauses to equations is not yet certified.
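The idea of an LCF-style kernel can be conveyed in a few lines:
theorems can only be constructed by the kernel's inference rules, so
every derived equation carries a derivation tracing back to the
axioms. The sketch below mirrors the idea in miniature; Twee's Haskell
kernel is of course richer.

    # A toy LCF-style kernel for equational reasoning: Theorem objects
    # can only be created via the kernel rules below, so each one
    # records a derivation that traces back to the input axioms.
    _KERNEL_TOKEN = object()

    class Theorem:
        """An equation lhs = rhs together with its derivation."""
        def __init__(self, lhs, rhs, rule, parents, token=None):
            if token is not _KERNEL_TOKEN:
                raise ValueError("Theorems are built only by kernel rules")
            self.lhs, self.rhs = lhs, rhs
            self.rule, self.parents = rule, parents

    def axiom(lhs, rhs):
        return Theorem(lhs, rhs, "axiom", [], token=_KERNEL_TOKEN)

    def symmetry(thm):
        return Theorem(thm.rhs, thm.lhs, "symmetry", [thm],
                       token=_KERNEL_TOKEN)

    def transitivity(t1, t2):
        if t1.rhs != t2.lhs:
            raise ValueError("middle terms do not match")
        return Theorem(t1.lhs, t2.rhs, "transitivity", [t1, t2],
                       token=_KERNEL_TOKEN)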
Twee can be downloaded from:
http://nick8325.github.io/twee
Expected Competition Performance
Twee is quite strong at UEQ, but will be at a disadvantage compared to provers
with smart timeslicing (such as E and Vampire), which can make better use of
the multiple cores.
Prediction: Twee will solve more problems than last year but finish below
E and Vampire.
It could still solve some problems that no-one else does, though.
Twee will do badly in FOF since it throws away all non-Horn clauses.
It may get lucky and solve a few hard problems, especially if some
mostly-equational problems show up.
Vampire 4.4
Giles Reger
University of Manchester, United Kingdom
There are no major changes to the main part of Vampire since 4.3, beyond
some new proof search heuristics and new default values for some options.
The biggest addition is support for higher-order reasoning via translation
to applicative form and combinators, addition of axioms and extra inference
rules, and a new form of combinatory unification.
Architecture
Vampire
[KV13]
4.4 is an automatic theorem prover for first-order logic with extensions to
theory-reasoning and higher-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-Gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,
RB+16].
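A highly simplified sketch of this interplay follows, again assuming
the python-sat bindings; the first-order engine is abstracted into a
hypothetical refute_under function, and the component labelling is
illustrative rather than Vampire's actual machinery.

    # AVATAR in miniature: variable-disjoint clause components are named
    # by propositional labels, the SAT solver picks which components to
    # assert, and every first-order refutation found under a set of
    # assertions is fed back as a blocking propositional clause.
    from pysat.solvers import Minisat22

    def avatar_loop(split_clauses, refute_under):
        """split_clauses: propositional clauses over component labels.
        refute_under: hypothetical first-order engine; given the set of
        asserted labels, it returns the subset used in a refutation,
        or None if no contradiction is found."""
        with Minisat22(bootstrap_with=split_clauses) as sat:
            while sat.solve():
                model = sat.get_model()
                asserted = {lit for lit in model if lit > 0}
                core = refute_under(asserted)
                if core is None:
                    return "Satisfiable under this splitting branch"
                # Forbid every future model that asserts this core.
                sat.add_clause([-label for label in core])
            return "Unsatisfiable"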
Both resolution- and instantiation-based proof search make use of global
subsumption.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
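The SInE idea [HV11] is simple enough to sketch: a symbol triggers the
axioms in which it is among the rarest symbols, and selection grows
outward from the conjecture's symbols. The tolerance and depth
parameters below are the standard knobs; the data representation is
illustrative.

    # SInE-style axiom selection: symbol s triggers axiom a iff s occurs
    # in a and occ(s) <= tolerance * (rarest symbol occurrence in a).
    # Selection starts from the conjecture's symbols and iterates.
    from collections import Counter

    def sine_select(axioms, conjecture_symbols, tolerance=1.5,
                    max_depth=3):
        """axioms: mapping from axiom name to the set of its symbols."""
        occ = Counter(s for syms in axioms.values() for s in syms)

        def triggered_by(sym):
            return {name for name, syms in axioms.items()
                    if sym in syms
                    and occ[sym] <= tolerance * min(occ[t] for t in syms)}

        selected, frontier = set(), set(conjecture_symbols)
        for _ in range(max_depth):
            new = set()
            for sym in frontier:
                new |= triggered_by(sym)
            new -= selected
            if not new:
                break
            selected |= new
            frontier = {s for name in new for s in axioms[name]}
        return selected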
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.4 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of
comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred
  for inference selection (see the sketch after this list).
- Set-of-support strategy.
- For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent
splitting branches
[RB+16].
- Specialised theory instantiation and unification
[RSV18].
- Extensionality resolution with detection of extensionality axioms.
- For higher-order problems:
- Translation to applicative and combinator form.
- Addition of combinator axioms.
- Addition of shortcut inference rules that encode axioms.
- Proof search heuristics targeting the growth of combinator axioms.
- Restricted combinatory unification
[RSV18].
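As promised in the age-weight item above, here is a sketch of one
standard way to implement that heuristic with two priority queues; the
ratio handling and queue layout are generic, not Vampire's actual
code.

    # Age-weight clause selection: for every pick by age (oldest first)
    # the prover makes 'weight_ratio' picks by weight (lightest first).
    import heapq
    import itertools

    class ClauseQueues:
        def __init__(self, age_ratio=1, weight_ratio=4):
            self.by_age, self.by_weight = [], []
            self.next_age = itertools.count()   # insertion order = age
            self.pattern = itertools.cycle(
                ["age"] * age_ratio + ["weight"] * weight_ratio)
            self.selected = set()

        def add(self, clause, weight):
            age = next(self.next_age)
            heapq.heappush(self.by_age, (age, clause))
            heapq.heappush(self.by_weight, (weight, age, clause))

        def pop(self):
            prefer = (self.by_age if next(self.pattern) == "age"
                      else self.by_weight)
            for queue in (prefer, self.by_age, self.by_weight):
                while queue:
                    *_, clause = heapq.heappop(queue)
                    if clause not in self.selected:
                        # a popped clause may still sit in the other
                        # queue; the 'selected' set filters duplicates
                        self.selected.add(clause)
                        return clause
            return None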
Implementation
Vampire 4.4 is implemented in C++.
It makes use of MiniSat and Z3.
Expected Competition Performance
Vampire 4.4 was the CASC-27 TFA, FOF, and FNT winner.
Vampire 4.5
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13]
4.5 is an automatic theorem prover for first-order logic with extensions to
theory-reasoning and higher-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-Gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14, RB+16].
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and
code tree indexes are used to implement all major operations on sets of terms,
literals and clauses.
Internally, Vampire works only with clausal normal form. Problems in the full
first-order logic syntax are clausified during preprocessing
[RSV16].
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.5 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of
comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
This has been extended with a layered clause selection approach
[GS20].
- Set-of-support strategy with extensions for theory reasoning.
- For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent
splitting branches
[RB+16].
- Specialised theory instantiation and unification
[RSV18].
- Extensionality resolution with detection of extensionality axioms.
- For higher-order problems:
- Translation to polymorphic first-order logic using applicative form
and combinators.
- A new superposition calculus
[BG20]
utilising a KBO-like ordering
[BG20]
for orienting combinator equations.
The calculus introduces an inference, narrow, for rewriting with
combinator equations.
- Proof search heuristics targeting the growth of clauses resulting
from narrowing.
- An extension of unification with abstraction to deal with functional
and boolean extensionality.
- Various inferences to deal with Booleans.
Implementation
Vampire 4.5 is implemented in C++.
It makes use of MiniSat and Z3.
See the website
https://vprover.github.io/
for more information and access to the GitHub repository.
Expected Competition Performance
There are four areas of improvement in Vampire 4.5.
Firstly, a new layered clause selection approach
[GS20]
gives Vampire more fine-grained control over clause selection, in particular
the way in which clauses involving theory axioms are selected.
Secondly, theory evaluation and instantiation methods have been overhauled.
Thirdly, a new subsumption demodulation rule
[GKR20]
improves support for reasoning with conditional equalities.
Finally, higher-order reasoning (introduced in Vampire 4.4) has been rewritten
based on a new superposition calculus
[BG20]
utilising a KBO-like ordering
[BG20]
for orienting combinator equations.
Vampire 4.5 should be an improvement on Vampire 4.4.
Zipperposition 2.0
Petar Vukmirović
Vrije Universiteit Amsterdam, The Netherlands
Architecture
Zipperposition is a superposition-based theorem prover for typed first-order
logic with equality and higher-order logic. It is a pragmatic implementation of
a complete calculus for Boolean-free higher-order logic
[BB+19].
It features a number of extensions that include polymorphic types; user-defined
rewriting on terms and formulas ("deduction modulo theories"); a lightweight
variant of AVATAR for case splitting; and Boolean reasoning
[VN20].
The core architecture of the prover is based on saturation with an extensible
set of rules for inferences and simplifications.
Zipperposition uses a recently developed full higher-order unification
algorithm that enables efficient integration of procedures for decidable
fragments of higher-order unification
[VBN20].
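For contrast with the higher-order case, the classical first-order
core that such an algorithm generalises is sketched below; the
procedure of [VBN20] additionally enumerates bindings for applied
variables and delegates decidable fragments to special solvers, none
of which is shown here. Variables are strings and compound terms are
(head, args...) tuples.

    # First-order syntactic unification with a triangular substitution
    # and occurs check - the base case that higher-order unification
    # procedures extend.
    def unify(s, t, subst=None):
        subst = dict(subst or {})

        def walk(u):                      # chase variable bindings
            while isinstance(u, str) and u in subst:
                u = subst[u]
            return u

        def occurs(v, u):
            u = walk(u)
            if u == v:
                return True
            return (not isinstance(u, str)
                    and any(occurs(v, a) for a in u[1:]))

        stack = [(s, t)]
        while stack:
            a, b = (walk(u) for u in stack.pop())
            if a == b:
                continue
            if isinstance(a, str):        # variable: bind (occurs check)
                if occurs(a, b):
                    return None
                subst[a] = b
            elif isinstance(b, str):
                stack.append((b, a))      # flip so the variable is first
            elif a[0] == b[0] and len(a) == len(b):
                stack.extend(zip(a[1:], b[1:]))
            else:
                return None               # head-symbol clash
        return subst

    # unify(("f", "x"), ("f", ("g", "y"))) == {"x": ("g", "y")}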
The initial calculus and main loop were imitations of an old version of E
[Sch02],
but there are many more rules nowadays.
A summary of the calculus for integer arithmetic and induction can be found in
[Cru15].
Strategies
The system uses various strategies in a portfolio.
The strategies are run in parallel, making use of all CPU cores available.
We designed the portfolio of strategies by manual inspection of different
TPTP problems.
Heuristics used in Zipperposition are inspired by efficient heuristics used
in E.
Portfolio mode differentiates higher-order problems from first-order ones.
If the problem is first-order, all higher-order features of the prover are
turned off.
Other than that, the portfolio is static and does not depend on the syntactic
properties of the problem.
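A sketch of such a parallel portfolio is below; prove_with is a
hypothetical stand-in for one single-strategy attempt, and a real
implementation would also kill the losing attempts rather than merely
cancel their futures.

    # Parallel portfolio: one worker process per strategy, first proof
    # wins. prove_with is hypothetical, standing in for a proof attempt
    # with a single strategy.
    import os
    from concurrent.futures import (ProcessPoolExecutor, TimeoutError,
                                    as_completed)

    def portfolio(problem, strategies, timeout):
        workers = min(len(strategies), os.cpu_count() or 1)
        with ProcessPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(prove_with, problem, s, timeout)
                       for s in strategies]
            try:
                for fut in as_completed(futures, timeout=timeout):
                    result = fut.result()
                    if result is not None:  # some strategy found a proof
                        for other in futures:
                            other.cancel()  # best effort only: running
                        return result       # attempts are not killed here
            except TimeoutError:
                pass
        return None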
Implementation
The prover is implemented in OCaml, and has been around for eight years.
Term indexing is done using fingerprints for unification, perfect
discrimination trees for rewriting, and feature vectors for subsumption.
Some inference rules such as contextual literal cutting make heavy use of
subsumption.
For higher-order problems, some strategies use the E prover, running in
lambda-free higher-order mode, as an end-game backend prover.
The code can be found at
https://github.com/sneeuwballen/zipperposition
and is entirely free software (BSD-licensed).
Zipperposition can also output graphical proofs using Graphviz.
Some tools to perform type inference and clausification for typed formulas
are also provided, as well as a separate library for dealing with terms and
formulas
[Cru15].
Expected Competition Performance
The prover is expected to have average performance on FOF, similar to Prover9,
and good performance on THF, at the level of last year's CASC winner.