Entrants' System Descriptions
CSE 1.4
Feng Cao
Southwest Jiaotong University, China
Architecture
CSE 1.4 is a further development of its predecessor, CSE 1.3.
It is an automated theorem prover for first-order logic without equality,
based mainly on a novel inference mechanism called Contradiction
Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18].
S-CS is able to handle multiple (two or more) clauses dynamically in a
synergized way in one deduction step, while binary resolution is a special
case.
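For orientation, the two-clause special case can be written down directly: the
following is a minimal, purely illustrative Python sketch of propositional
binary resolution (not CSE code), which the S-CS rule generalizes to a
synergized step over several clauses at once.

    # Illustrative only: propositional binary resolution, the two-clause
    # special case of the multi-clause S-CS deduction step.
    def negate(lit):
        return lit[1:] if lit.startswith('~') else '~' + lit

    def resolvents(c1, c2):
        """All clauses obtainable by resolving clause c1 against clause c2."""
        return [(c1 - {lit}) | (c2 - {negate(lit)})
                for lit in c1 if negate(lit) in c2]

    print(resolvents(frozenset({'p', 'q'}), frozenset({'~p', 'r'})))
    # [frozenset({'q', 'r'})]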
CSE 1.4 also adopts conventional factoring, equality resolution (the ER
rule), and variable renaming. Some pre-processing techniques, including pure
literal deletion and simplification based on the distance to the goal clause,
are applied, together with a number of standard redundancy criteria for
pruning the search space: tautology deletion and subsumption (forward and
backward).
CSE 1.4 has been improved compared with CSE 1.3, mainly in the following
respects:
- Optimization of the contradiction separation algorithm based on full
usage of clauses, which makes it possible to evaluate whether an input
clause has been fully used in the deduction process.
- Optimization of the contradiction separation algorithm based on an
optimized deduction path, which increases the deduction usage of the
original clauses.
Internally CSE 1.4 works only with clausal normal form.
The E prover
[Sch13]
is adopted with thanks for clausification of full first-order logic problems
during preprocessing.
Strategies
CSE 1.4 inherits most of the strategies of CSE 1.3. The main new
strategies are:
- Deduction control strategy. Dynamically evaluate the complexity of a
deduction based on the unified clause structure.
- CSC control strategy. Evaluate whether the contradiction separation
clauses (CSCs) generated in the deduction process are retained, according
to their term structure.
- Clause selection strategy. Select different types of clauses to
participate in S-CS deduction according to a dynamic sorting method.
Implementation
CSE 1.4 is implemented mainly in C++, with Java used for batch problem
running.
A shared data structure is used to store constants and shared variables.
In addition, a special data structure is designed to describe the properties
of clauses, literals, and terms, in support of the multiple-strategy mode.
E prover is used for clausification of FOF problems, and then TPTP4X is
applied to convert the CNF format into TPTP format.
Expected Competition Performance
CSE 1.4 has made some improvements compared to CSE 1.3, and so
we expect a better performance in this year's competition.
Acknowledgement:
Development of CSE 1.4 has been partially supported by the National
Natural Science Foundation of China (NSFC) (Grant No.61673320) and
the General Research Project of Jiangxi Education Department (Grant
No. GJJ200818).
CSE_E 1.3
Peiyao Liu
Southwest Jiaotong University, China
Architecture
CSE_E 1.3 is an automated theorem prover for first-order logic, combining
CSE-F 1.0 and E 2.5, where CSE-F is based on the Contradiction Separation
Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18]
and E is mainly based on superposition.
The combination mechanism works as follows: E and CSE-F are applied to the
given problem sequentially.
If either prover solves the problem, then the proof process completes.
If neither CSE-F nor E can solve the problem, some inferred clauses with no
more than two literals, especially unit clauses, from CSE-F are fed to E
as lemmas, along with the original clauses, for further proof search.
This kind of combination is expected to take advantage of both CSE-F and E,
and produce a better performance.
Concretely, CSE-F is able to generate a good number of unit clauses, and
unit clauses are helpful for proof search and equality handling.
E, on the other hand, has a strong ability for equality handling.
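The three-stage control flow can be sketched as follows (a hedged illustration
in Python with assumed binary names, file handling, and time limits; not the
actual CSE_E driver):

    import subprocess

    def proves(cmd, timeout):
        """Run a prover command and report whether it claims a proof."""
        try:
            run = subprocess.run(cmd, capture_output=True, text=True,
                                 timeout=timeout)
            return 'SZS status Theorem' in run.stdout
        except subprocess.TimeoutExpired:
            return False

    def cse_e(problem):
        if proves(['eprover', problem], 100):        # stage 1: E alone
            return 'Theorem'
        if proves(['cse-f', problem], 100):          # stage 2: CSE-F alone
            return 'Theorem'
        # stage 3: clauses with at most two literals (especially units),
        # assumed to have been dumped by CSE-F to 'lemmas.p', are fed to E
        # as lemmas along with the original clauses
        if proves(['eprover', 'lemmas.p', problem], 100):
            return 'Theorem'
        return 'GaveUp'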
Strategies
The CSE-F part of CSE_E 1.3 uses almost the same strategies as the
standalone CSE-F 1.0, e.g., clause/literal selection, strategy selection, and
the CSC strategy.
The only difference is that the equality handling strategies of the CSE-F
part of the combined system are blocked.
The main new strategies for the combined system are:
- Lemma filtering mainly based on deduction weight of binary clauses.
- Fine-grained dynamic time allocation scheme in different run stages.
Implementation
CSE_E 1.3 is implemented mainly in C++, with Java used for batch problem
running.
The job dispatch between CSE-F and E is implemented in C++.
Expected Competition Performance
We expect CSE_E 1.3 to solve some hard problems that E cannot solve, and to
have a satisfying performance.
Acknowledgement:
Development of CSE-F 1.0 has been partially supported by the National
Natural Science Foundation of China (NSFC) (Grant No.61673320).
Acknowledgements to Prof. Stephan Schulz for his kind permission to use his
E prover, which makes CSE_E possible.
CSE-F 1.0
Peiyao Liu
Southwest Jiaotong University, China
Architecture
CSE-F 1.0 is an automated theorem prover for first-order logic without
equality, mainly based on a novel inference mechanism called Contradiction
Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18].
S-CS is able to handle multiple (two or more) clauses dynamically in a
synergized way in one deduction step, while binary resolution is its special
case.
CSE-F 1.0 is the successor of CSE 1.3, and retains the advantages of CSE 1.3.
At the same time, it improves the deduction framework and implements a new
S-CS inference algorithm.
With this new inference algorithm, binary clauses are fully used until no
binary clause can be added to the contradiction during a deduction epoch,
exploiting the fact that unit clauses are helpful for proof search.
Based on this, CSE-F 1.0 can solve some hard problems that CSE 1.3 cannot.
CSE-F 1.0 also adopts conventional factoring, equality resolution (the ER
rule), and variable renaming.
Some pre-processing techniques, including pure literal deletion and
simplification based on the distance to the goal clause, are applied,
together with a number of standard redundancy criteria for pruning the
search space: tautology deletion and subsumption (forward and backward).
Internally, CSE-F 1.0 works only with clausal normal form.
The E prover
[Sch13]
is adopted with thanks for clausification of full first-order logic
problems during preprocessing.
Strategies
CSE-F 1.0 inherits most of the strategies of CSE 1.3, and introduces some
new strategies as well, e.g., for pre-processing, clause (literal) selection,
and clause filtering. At the same time, some strategies of CSE 1.3 have been
optimized.
The main new strategies are:
- Pre-processing strategy.
Mark clauses containing independent literals that can form complementary
pairs with certain unit clauses, for subsequent pre-processing.
- Clause (literal) selection strategy.
This strategy mainly considers a comprehensive weight involving the
complexity, function nesting level, and deduction distance of a clause
(literal).
- Clause filtering strategy.
Filter out binary clauses that do not meet the deduction conditions.
The main optimized strategies include:
- Clause deduction distance.
Dynamically update the clause deduction distance while constructing
contradictions, in order to better guide clause selection, rather than
using it only as a pre-processing strategy.
- Contradiction separation clause (CSC) strategy.
Strictly limit the number of literals in a CSC: initially limit it to
zero, and gradually expand it during the deduction process.
Implementation
CSE-F 1.0 is implemented mainly in C++, with Java used for batch problem
running.
A shared data structure is used to store constants and shared variables.
In addition, a special data structure is designed to describe the properties
of clauses, literals, and terms, in support of the multiple-strategy mode.
E prover is used for clausification of FOF problems, and then TPTP4X is
applied to convert the CNF format into TPTP format.
Expected Competition Performance
CSE-F 1.0 adopts a new algorithmic framework, and so we expect better
performance than CSE 1.3 in this year's competition.
Acknowledgement:
Development of CSE-F 1.0 has been partially supported by the National
Natural Science Foundation of China (NSFC) (Grant No.61673320).
cvc5 1.0
Andrew Reynolds
University of Iowa, USA
Architecture
cvc5 is the successor of CVC4
[BC+11].
It is an SMT solver based on the CDCL(T) architecture
[NOT06]
that includes built-in support for many theories, including linear arithmetic,
arrays, bit vectors, datatypes, finite sets and strings.
It incorporates approaches for handling universally quantified formulas.
For problems involving free function and predicate symbols, cvc5 primarily
uses heuristic approaches based on conflict-based instantiation and E-matching
for theorems, and finite model finding approaches for non-theorems.
Like other SMT solvers, cvc5 treats quantified formulas using a two-tiered
approach.
First, quantified formulas are replaced by fresh Boolean predicates and the
ground theory solver(s) are used in conjunction with the underlying SAT solver
to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either add
instances of quantified formulas to the problem, answer "satisfiable", or
return unknown.
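The shape of this loop can be illustrated with a self-contained toy (an
assumption-laden Python sketch, not cvc5 internals: the ground solver is
brute force, and the only quantified formula is forall x. P(x)):

    from itertools import product

    def ground_sat(clauses):
        """Brute-force SAT over the atoms occurring in the ground clauses."""
        atoms = sorted({l.lstrip('~') for c in clauses for l in c})
        for bits in product([True, False], repeat=len(atoms)):
            model = dict(zip(atoms, bits))
            if all(any(model[l.lstrip('~')] != l.startswith('~') for l in c)
                   for c in clauses):
                return model
        return None

    def solve(clauses, terms):
        while True:
            model = ground_sat(clauses)
            if model is None:
                return 'unsatisfiable'
            # instances of forall x. P(x) falsified by the candidate model
            new = [[f'P({t})'] for t in terms
                   if not model.get(f'P({t})', False)
                   and [f'P({t})'] not in clauses]
            if not new:
                return 'satisfiable'
            clauses = clauses + new

    print(solve([['~P(a)']], ['a']))   # -> unsatisfiable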
Finite model finding in cvc5 targets problems containing background theories
whose quantification is limited to finite and uninterpreted sorts.
In finite model finding mode, cvc5 uses a ground theory of finite cardinality
constraints that minimizes the number of ground equivalence classes, as
described in
[RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
It then adds instances of quantified formulas that are in conflict with the
candidate model, as described in
[RT+13].
If no instances are added, it reports "satisfiable".
cvc5 has native support for problems in higher-order logic, as described in
[BR+19].
It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via
lambda lifting.
The approach extends the theory solver for quantifier-free uninterpreted
functions (UF) and E-matching.
For the former, the theory solver for UF in cvc5 now handles equalities between
functions using an extensionality inference.
Partial applications of functions are handled using a (lazy) applicative
encoding, where some function applications are equated to the applicative
encoding.
For the latter, several of the data structures for E-matching have been
modified to incorporate matching in the presence of equalities between
functions, function variables, and partial function applications.
Strategies
For handling theorems, cvc5 primarily uses conflict-based quantifier
instantiation
[RTd14,BFR17],
enumerative instantiation
[RBF18]
and E-matching.
cvc5 uses a handful of orthogonal trigger selection strategies for E-matching,
and several orthogonal ordering heuristics for enumerative instantiation.
For handling non-theorems, cvc5 primarily uses finite model finding techniques.
Since cvc5 with finite model finding is also capable of establishing
unsatisfiability, it is used as a strategy for theorems as well.
Implementation
cvc5 is implemented in C++.
The code is available from
https://github.com/cvc5/cvc5
Expected Competition Performance
cvc5 has support for fine-grained proofs, which will be generated in solutions
this year.
Due to performance overhead in generating proofs, we expect the performance of
cvc5 to degrade slightly with respect to CVC4 last year.
Otherwise, the first-order theorem proving and finite model finding
capabilities of cvc5 have undergone minor improvements, particularly to
enumerative instantiation.
Hence, cvc5 will perform comparably to previous versions of CVC4.
Drodi 3.1.5
Oscar Contreras
Amateur programmer, Spain
Architecture
Drodi 3.1 is a very basic and lightweight automated theorem prover.
It implements ordered resolution and equality paramodulation inferences as
well as demodulation and some other standard simplifications.
It also includes its own basic implementations of clausal normal form
conversion
[NW01],
AVATAR architecture with a SAT solver
[Vor14],
Limited Resource Strategy
[RV03],
discrimination trees, as well as KBO, non-recursive, and lexicographic
reduction orderings.
Drodi produces a (hopefully) verifiable proof in TPTP format.
Strategies
Drodi 3.1 has a fair number of selectable strategies including, but not
limited to, the following:
- Otter, Discount and Limited Resource Strategy
[RV03]
saturation algorithms.
- A basic implementation of the AVATAR architecture
[Vor14].
- Several literal and term reduction orderings.
- Several literal selection options.
- Several clause selection heuristics with adjustable selection ratios, including several types of clause weight queues and one age queue.
- Classical clause relevancy pruning.
- Drodi can generate a learning file from successful proofs and use the
file to guide the clause selection strategy.
Drodi's integrated learning functions are a generalization of the enhanced
ENIGMA method
[JU17,
JU18].
However, unlike ENIGMA, the learning file is completely general and can be
used with any kind of problem.
This generality allows the same learning file to be used in both the FOF
and UEQ CASC divisions.
Literal polarity, equality, and Skolem and variable occurrences are stored
in clause feature vectors (see the sketch after this list).
Unlike ENIGMA, instead of storing the specific functions and predicates
themselves, only the general properties of functions and non-equality
predicates are stored in the clause feature vectors.
Predicates are differentiated from functions.
In addition, the following properties are also stored:
- Predicate or function is in the conjecture.
- Predicate or function is in the problem file but not in the
conjecture.
- Predicate or function is only in axiom files.
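As an illustration of such symbol-agnostic feature vectors, here is a Python
sketch with an assumed feature set and naming scheme (not Drodi's actual
encoding):

    def feature_vector(clause, conjecture_syms, problem_syms):
        """clause: list of (sign, predicate, args) literals; args are nested
        tuples like ('f', ('X',)); variables are plain strings."""
        fv = {'pos': 0, 'neg': 0, 'eq': 0, 'skolem': 0, 'vars': 0,
              'in_conjecture': 0, 'in_problem_only': 0, 'in_axioms_only': 0}

        def walk(t):
            if isinstance(t, str):                 # variable occurrence
                fv['vars'] += 1
                return
            if t[0].startswith('sk'):              # assumed Skolem prefix
                fv['skolem'] += 1
            if t[0] in conjecture_syms:            # general symbol properties
                fv['in_conjecture'] += 1
            elif t[0] in problem_syms:
                fv['in_problem_only'] += 1
            else:
                fv['in_axioms_only'] += 1
            for arg in t[1:]:
                walk(arg)

        for sign, pred, args in clause:
            fv['pos' if sign else 'neg'] += 1
            if pred == '=':
                fv['eq'] += 1
            for arg in args:
                walk(arg)
        return fv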
Implementation
Drodi 3.1 is implemented in C.
It includes discrimination trees and hashing indexing.
All the code is original, without special code libraries or code taken from
other sources.
Expected Competition Performance
This is the first time that Drodi has participated in CASC.
It will enter the FOF and UEQ divisions.
Program tests with the 2020 CASC-J10 problems indicate that Drodi will score
in the second half of the table, probably in last or next-to-last position.
E 2.5
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E 2.5pre
[Sch02,
Sch13,
SCV19]
is a purely equational theorem prover for many-sorted first-order logic with
equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a number of
redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
As of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state
for propositional unsatisfiability.
For the LTB divisions, a control program uses a SInE-like analysis to extract
reduced axiomatizations that are handed to several instances of E.
E will not use on-the-fly learning this year.
Strategies
Proof search in E is primarily controlled by a literal selection strategy, a
clause selection heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies.
Clause selection heuristics can be constructed on the fly by combining various
parameterized primitive evaluation functions, or can be selected from a set
of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take other
clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
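A toy version of such a symbol-counting evaluation with a goal-symbol
preference might look as follows (a sketch of the general idea only; E's
actual heuristics are parameterized evaluation functions combined into
priority queues):

    def clause_weight(clause, goal_symbols, fweight=2, vweight=1,
                      goal_factor=0.5):
        """clause: list of literals, each a list of symbol strings;
        TPTP-style variables start with an upper-case letter."""
        total = 0.0
        for literal in clause:
            for sym in literal:
                w = vweight if sym[0].isupper() else fweight
                if sym in goal_symbols:
                    w *= goal_factor     # prefer symbols from the goal
                total += w
        return total

    print(clause_weight([['p', 'X'], ['q', 'a']], goal_symbols={'q'}))  # 6.0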
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO), which can
be lifted in different ways to literal orderings.
For CASC-J10, E implements a strategy-scheduling automatic mode.
The total CPU time available is broken into several (unequal) time slices.
For each time slice, the problem is classified into one of several classes,
based on a number of simple features (number of clauses, maximal symbol arity,
presence of equality, presence of non-unit and non-Horn clauses, possibly
presence of certain axiom patterns, ...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the one that solves the most
problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
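This greedy construction is essentially greedy set cover; a simplified sketch
in Python (the real procedure also assigns a time slice to each strategy):

    def greedy_schedule(solved_by, slots):
        """solved_by: dict mapping a strategy to the set of problems in the
        class that it solves within the slice."""
        schedule, covered = [], set()
        for _ in range(slots):
            best = max(solved_by, key=lambda s: len(solved_by[s] - covered))
            gain = solved_by[best] - covered
            if not gain:
                break                    # no strategy adds new solutions
            schedule.append(best)
            covered |= gain
        return schedule

    print(greedy_schedule({'s1': {1, 2, 3}, 's2': {3, 4}, 's3': {4}}, 3))
    # ['s1', 's2'] -- s3 adds nothing once s2 is scheduled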
About 130 different strategies have been thoroughly evaluated on all untyped
first-order problems from TPTP 7.3.0.
We have also explored some parts of the heuristic parameter space with a short
time limit of 5 seconds.
This allowed us to test about 650 strategies on all TPTP problems, and an
extra 7000 strategies on UEQ problems from TPTP 7.2.0.
About 100 of these strategies are used in the automatic mode, and
about 450 are used in at least one schedule.
Implementation
E is built around perfectly shared terms, i.e., each distinct term is only
represented once in a term bank.
The whole set of terms thus consists of a number of interconnected directed
acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
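The idea of perfect sharing can be captured in a few lines of hash-consing
(an illustration of the principle in Python, not E's term bank):

    class TermBank:
        def __init__(self):
            self.table = {}
        def term(self, f, *args):
            key = (f, args)              # args are already-shared terms
            if key not in self.table:
                self.table[key] = key    # one representative per distinct term
            return self.table[key]

    bank = TermBank()
    t1 = bank.term('f', bank.term('a'))
    t2 = bank.term('f', bank.term('a'))
    assert t1 is t2                      # equality becomes pointer identity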
Unconditional (forward) rewriting using unit clauses is implemented using
perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite link
in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing
[Sch13].
Superposition and backward rewriting use fingerprint indexing
[Sch12],
a new technique combining ideas from feature vector indexing and path indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in
[Loe06,
Loe06].
The prover and additional information are available at
https://www.eprover.org
Expected Competition Performance
The inference core of E 2.5pre has been slightly modified since last year's
pre-release.
We have also been able to evaluate some more different search strategies.
As a result, we expect performance to be somewhat better than in previous
years, especially in UEQ.
The system is expected to perform well in most proof classes, but will at best
complement top systems in the disproof classes.
E 2.6
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E
[Sch02,
Sch13,
SCV19]
is a purely equational theorem prover for many-sorted first-order logic with
equality, with some extensions for higher-order logic.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a number of
redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a
strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
As of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state for
propositional unsatisfiability.
For the LTB divisions, a control program uses a SInE-like analysis to extract
reduced axiomatizations that are handed to several instances of E.
E will not use on-the-fly learning this year.
Strategies
Proof search in E is primarily controlled by a literal selection strategy, a
clause selection heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies.
Clause selection heuristics can be constructed on the fly by combining various
parameterized primitive evaluation functions, or can be selected from a set of
predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take other
clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO), which can
be lifted in different ways to literal orderings.
For CASC-28, E implements a strategy-scheduling automatic mode.
The total CPU time available is broken into several (unequal) time slices.
For each time slice, the problem is classified into one of several classes,
based on a number of simple features (number of clauses, maximal symbol arity,
presence of equality, presence of non-unit and non-Horn clauses, possibly
presence of certain axiom patterns...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the one that solves the most
problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
About 140 different strategies have been thoroughly evaluated on all untyped
first-order problems from TPTP 7.3.0.
We have also explored some parts of the heuristic parameter space with a short
time limit of 5 seconds.
This allowed us to test about 650 strategies on all TPTP problems, and an extra
7000 strategies on UEQ problems from TPTP 7.2.0.
About 100 of these strategies are used in the automatic mode, and about 450 are
used in at least one schedule.
Implementation
E is built around perfectly shared terms, i.e., each distinct term is only
represented once in a term bank.
The whole set of terms thus consists of a number of interconnected directed
acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented using
perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite link
in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing
[Sch13].
Superposition and backward rewriting use fingerprint indexing
[Sch12],
a new technique combining ideas from feature vector indexing and path indexing.
Finally, LPO and KBO are implemented using the elegant and efficient algorithms
developed by Bernd Löchner in
[Loe06,Loe06].
The prover and additional information are available at
https://www.eprover.org
Expected Competition Performance
The inference core of E 2.6 has been slightly modified since last year.
We have also been able to evaluate some more different search strategies.
As a result, we expect performance to be somewhat better than in previous
years, especially in UEQ.
The system is expected to perform well in most proof classes, but will at best
complement top systems in the disproof classes.
Ehoh 2.7
Petar Vukmirović
Vrije Universiteit Amsterdam, The Netherlands
Architecture
Ehoh is a higher-order superposition-based theorem prover implementing
lambda-free higher-order superposition
[BB+21].
Recently, Ehoh has been extended to support not only lambda-free, but full
higher-order syntax.
Internally, Ehoh unfolds all definitions of predicate symbols, lifts lambdas
and removes all Boolean subterms through a FOOL-like
[KK+16]
preprocessing transformation.
After these steps are performed, the problem lies in the lambda-free fragment
and the standard lambda-free superposition applies.
Ehoh also supports TFX $ite and $let syntax.
On the reasoning side, modest additions to the calculus have been made:
We implemented the NegExt and PosExt rules and the Ext-* family of rules
described by Bentkamp et al.
[BB+19].
Full support for lambda-terms and calculus-level treatment of Boolean terms
is expected in the next version of Ehoh.
Strategies
The system uses exactly the same portfolio of strategies as E 2.7, with the
only difference that the NegExt, PosExt, and Ext-* rules are turned on
regardless of the chosen strategy.
Implementation
Ehoh 2.7 shares the codebase of E 2.7:
it is a version of the E prover compiled with the compile-time option
ENABLE_LFHO.
Ehoh is available from
https://github.com/eprover/eprover
which includes more details on Ehoh's compilation and installation.
Expected Competition Performance
The prover is expected to have poor performance on THF problems, slightly
worse than CVC4 1.8.
On SLH problems, it is expected to perform better, on a par with
Zipperposition.
Etableau 0.67
John Hester
University of Florida, USA
Architecture
Etableau is a theorem prover for first order logic based on combining the
strong connection calculus and the superposition calculus.
Etableau's proof search centers on the idea of local variables in tableaux.
Branches that are local (contain only local variables) are sent to the core
proof procedure of E.
Saturating along branches allows the automatic generation of unit lemmata.
Strategies
Etableau uses a depth-first branch selection function, and maintains a small
number of distinct tableaux in memory simultaneously.
During superposition proof search on local branches, E's "--auto" mode is used.
Etableau can backtrack when proof search fails, and remembers previous
attempts at using superposition search on branches so that the search does not
have to repeat itself.
Implementation
Etableau is implemented in C and compiled alongside E, using E as a library
and orthogonal prover.
This allows Etableau to use the clause and formula datatypes of E, so that
E's proof search functions can be called directly with clauses from the
tableau, rather than starting a new process each time an attempt is made to
saturate a branch.
Etableau also uses the clausification and preprocessing of E.
Etableau can be obtained from
https://github.com/hesterj/Etableau
Expected Competition Performance
Etableau will solve fewer problems than E, but may solve some that others
cannot.
Etableau will solve significantly more problems than last year.
GKC 0.7
Tanel Tammet
Tallinn University of Technology, Estonia
Architecture
GKC
[Tam19]
is a resolution prover optimized for search in large knowledge bases.
It is used as a foundation (GK Core) for building a common-sense reasoner GK.
In particular, GK can handle inconsistencies and perform probabilistic and
nonmonotonic reasoning, see
[Tam21].
We envision natural language question answering systems as the main potential
application for these specialized methods.
The WASM version of the previous GKC 0.6 is used as the prover engine in the
educational http://logictools.org
system.
It can read and output proofs in the TPTP, simplified TPTP, and JSON formats,
the latter compatible with JSON-LD; see
[TS21].
The following standard inference rules have been implemented in GKC:
- Binary resolution, optionally with the set-of-support strategy, negative
or positive ordered resolution, or the unit restriction.
- Hyperresolution.
- Factorization.
- Paramodulation and demodulation with the Knuth-Bendix ordering.
GKC does not currently implement any propositional inferences or instance
generation.
It only looks for proofs and does not try to show non-provability.
Strategies
GKC runs multiple strategies sequentially, with the time limit for each
starting at 0.1 seconds and increased by a factor of 5 or 10 once the whole
batch has been performed.
The strategy selection takes into consideration the basic properties of the
problem: the presence of equality and the approximate size of the problem.
We perform given-clause selection using several queues, in order to spread
the selection relatively uniformly over these categories of derived clauses
and their descendants: axioms, external axioms, assumptions, and goals.
The queues are organized in two layers.
As the first layer we use the common ratio-based algorithm of alternating
between selecting n clauses from a weight-ordered queue and one clause from
the derivation-order FIFO queue.
As the second layer we use four separate queues based on the derivation
history of a clause.
Each queue in the second layer contains the two sub-queues of the first
layer.
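The two-layer selection might be sketched as below (structure only, in
illustrative Python; the categories and the ratio mechanism follow the
description above, the rest is assumption):

    import heapq
    from collections import deque

    CATEGORIES = ('axioms', 'external_axioms', 'assumptions', 'goals')

    class TwoLayerSelection:
        def __init__(self, ratio=4):
            self.ratio = ratio           # n weight picks per one FIFO pick
            self.queues = {c: {'weight': [], 'fifo': deque(), 'picks': 0}
                           for c in CATEGORIES}
            self.counter = 0             # tie-breaker for the weight heap

        def add(self, clause, weight, category):
            q = self.queues[category]
            self.counter += 1
            heapq.heappush(q['weight'], (weight, self.counter, clause))
            q['fifo'].append(clause)

        def pick(self, category):
            """Alternate `ratio` weight-queue picks with one FIFO pick.
            (Skipping clauses already picked via the other sub-queue is
            omitted for brevity.)"""
            q = self.queues[category]
            q['picks'] += 1
            if q['picks'] % (self.ratio + 1) == 0:
                return q['fifo'].popleft()
            return heapq.heappop(q['weight'])[2]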
Implementation
GKC is implemented in C. The data representation machinery is built upon the
shared-memory graph database WhiteDB, enabling GKC to solve multiple
different queries in parallel processes without having to repeatedly parse
or load the large parsed knowledge base from disk.
An interesting aspect of GKC is the pervasive use of hash indexes, feature
vectors and fingerprints, while no tree indexes are used.
GKC can be obtained from
https://github.com/tammet/gkc/
Expected Competition Performance
Compared to its performance at previous CASCs, GKC 0.7 should perform
somewhat better.
In particular, more search strategies have been implemented and the selection
of search strategies is wider and more varied.
The core algorithms and data structures remain the same.
We expect GKC to be in the middle of the final ranking for FOF and below the
average in UEQ and LTB.
We expect GKC to perform well on very large problems.
iProver 3.5
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver interleaves the instantiation calculus Inst-Gen
[Kor13,Kor08,GK03]
with the ordered resolution and superposition calculi
[DK20].
iProver approximates first-order clauses using propositional abstractions
which are solved using MiniSAT
[ES04]
and refined using model-guided instantiations.
iProver also implements a general abstraction-refinement framework for
under- and over-approximations of first-order clauses
[HK18,HK19].
First-order clauses are exchanged between calculi during the proof search.
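The grounding step at the heart of the propositional abstraction can be
sketched as follows (illustrative Python; iProver is OCaml and its
abstraction-refinement machinery is far more elaborate). Every variable is
collapsed to one designated constant; if the resulting propositional clause
set is unsatisfiable, so is the original, and otherwise the SAT model guides
the choice of instances.

    def propositional_abstraction(clause):
        """Literals are (sign, predicate, args); variables are capitalized
        strings; compound terms are tuples ('f', arg1, ...)."""
        def ground(t):
            if isinstance(t, tuple):                   # compound term
                return (t[0],) + tuple(ground(a) for a in t[1:])
            return '_bot' if t[0].isupper() else t     # variable -> bottom
        return [(sign, pred, tuple(ground(a) for a in args))
                for sign, pred, args in clause]

    print(propositional_abstraction([(True, 'p', ('X', ('f', 'Y')))]))
    # [(True, 'p', ('_bot', ('f', '_bot')))]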
Recent features in iProver include:
- AC joinability and AC normalisation
[DK21].
- Support for quantified linear and non-linear arithmetic.
- Superposition calculus with simplifications including: demodulation,
light normalisation, subsumption, subsumption resolution and global
subsumption.
iProver's simplification setup
[DK20]
is tunable via command-line options and generalises common architectures
such as Discount or Otter.
- HOS-ML framework for learning heuristics using a combination of
hyper-parameter optimisation and dynamic clustering, together with
schedule optimisation using constraint solving
[HK21,HK19].
Strategies
iProver has around 100 options that control the proof search, including
options for literal selection, passive clause selection, the frequency of
calling the SAT solver, simplifications, and options for combining
instantiation with resolution and superposition.
For the competition HOS-ML
[HK21]
was used to build a multi-core schedule from heuristics learnt over a sample
of FOF problems.
Implementation
iProver is implemented in OCaml; for ground reasoning it uses MiniSat
[ES04]
and Z3
[dMB08].
iProver accepts FOF, TFF and CNF formats.
Vampire
[KV13,HK+12]
and E prover
[Sch13]
are used for proof-producing clausification of FOF/TFF problems.
Vampire is also used for SInE axiom selection
[HV11]
in the LTB division.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
We expect improved performance compared to the previous year, due to
improvements in superposition, AC reasoning, simplifications, and heuristic
selection.
Heuristic tuning is still work in progress; in particular, we reused
heuristics trained for FOF in the LTB division, which might not be ideal as
the nature of the problems is quite different.
JavaRes 1.3.0
Adam Pease
Articulate Software, USA
Architecture
JavaRes is a simple, resolution-based theorem prover, primarily created for
teaching theorem proving.
It implements the basic calculus from Robinson's seminal paper
[Rob65],
extended with negative literal selection and some redundancy elimination as
described by Bachmair and Ganzinger
[BG+01].
The core is a given-clause based clausal saturation algorithm.
The system also supports full first-order input via clausification, and
equality handling via automatic addition of equality axioms.
Strategies
JavaRes includes all the optimization strategies in PyRes.
For clause selection it implements two methods, which are combined.
The most basic is a first-in-first-out (FIFO) strategy that will eventually
try every clause.
A symbol-counting strategy picks the clause with the fewest symbols.
The combination results in a strong bias towards smaller clauses while
ensuring that all clauses will eventually be tried.
JavaRes supports indexing for subsumption and resolution.
Subsumption removes clauses from the set of clauses to be processed (forward
subsumption) and from the set already processed (backward subsumption),
thereby decreasing the search space.
More general clauses subsume more specific ones.
The subsumption index records only signs and predicate symbols, so that
candidate clauses can be accepted or rejected more rapidly than by
attempting unification.
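The sign/predicate pre-filter amounts to a necessary-condition check before
unification is attempted (an illustrative Python sketch; JavaRes itself is
Java):

    from collections import namedtuple

    Lit = namedtuple('Lit', 'sign pred args')     # toy literal record

    def may_subsume(general, specific):
        """Necessary condition for `general` to subsume `specific`: every
        (sign, predicate) record of the general clause must also occur in
        the specific clause. Only candidates passing this cheap test are
        handed to the unification-based subsumption check."""
        available = {(l.sign, l.pred) for l in specific}
        return all((l.sign, l.pred) in available for l in general)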
JavaRes also implements PyRes' approach to literal selection;
largest-literal selection is the default strategy.
For large theories, JavaRes implements the SInE algorithm
[HV11],
although this simple prover's performance on the LTB problems is so poor,
compared to modern provers such as E and Vampire, that we do not enter
JavaRes in that division.
Implementation
JavaRes is largely a re-implementation of PyRes in Java, with additional
features that are not part of the core inference algorithm: an
implementation of SInE, parsing of SUO-KIF syntax, and graphical proof
output using GraphViz.
The implementation is designed to be straightforward, and does not use newer
Java language features such as lambda expressions.
It is available from
https://github.com/ontologyportal/JavaRes
Expected Competition Performance
JavaRes is faster than PyRes simply due to the implementation language.
It solves a few more problems than PyRes and is significantly faster on
problems that both provers solve.
It is inferior to most modern superposition-based provers.
It is expected to perform reasonably well on problems without equality.
LEO-II 1.7.0
Alexander Steen
University of Luxembourg, Luxembourg
Architecture
LEO-II
[BP+08],
the successor of LEO
[BK98],
is a higher-order ATP system based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution
[Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E
[Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations (e.g., n = 10) of its (standard) resolution proof search loop.
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
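Schematically, the cooperation works as below (a sketch under assumptions,
with the resolution step and the external call abstracted as callables; not
LEO-II's OCaml code):

    def cooperative_loop(resolution_step, external_fo_check, n=10,
                         max_iters=10**6):
        """resolution_step() performs one iteration of LEO-II's loop and
        returns True when LEO-II itself finds a refutation;
        external_fo_check() ships the first-order-translatable clauses to
        the cooperating ATP (e.g., E) and returns True on refutation."""
        for i in range(1, max_iters + 1):
            if resolution_step():
                return 'Theorem (found by LEO-II)'
            if i % n == 0 and external_fo_check():
                return 'Theorem (found by the first-order ATP)'
        return 'GaveUp'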
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II uses some basic strategy scheduling to try different
search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in OCaml 4, and its problem representation language
is the TPTP THF language
[BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure
[SB10].
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately, the LEO-II system still uses only a very simple sequential
collaboration model with first-order ATPs, instead of the more advanced,
concurrent, and resource-adaptive OANTS architecture
[BS+08]
exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from
http://www.leoprover.org
Expected Competition Performance
LEO-II is not actively being developed anymore, hence there are no
expected improvements to last year's CASC results.
Leo-III 1.6
Alexander Steen
University of Luxembourg, Luxembourg
Architecture
Leo-III
[SB21],
the successor of LEO-II
[BP+08],
is a higher-order ATP system based on extensional higher-order paramodulation
with inference restrictions using a higher-order term ordering.
The calculus contains dedicated extensionality rules and is augmented with
equational simplification routines that have their intellectual roots in
first-order superposition-based theorem proving.
The saturation algorithm is a variant of the given clause loop procedure
inspired by the first-order ATP system E.
Leo-III cooperates with external first-order ATPs which are called
asynchronously during proof search; a focus is on cooperation with systems
that support typed first-order (TFF) input.
For this year's CASC, CVC4
[BC+11]
and E
[Sch02,Sch13]
are used as external systems.
However, cooperation is in general not limited to first-order systems.
Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or
counter model generators) may be included using simple command-line arguments.
If the saturation procedure loop (or one of the external provers) finds a
proof, the system stops, generates the proof certificate and returns the
result.
For the LTB division, Leo-III is augmented by an external Python3 driver which schedules Leo-III on the batches.
Strategies
Leo-III comes with several configuration parameters that influence its proof
search by applying different heuristics and/or restricting inferences.
These parameters can be chosen manually by the user on start-up.
Since the last CASC, Leo-III implements a naive time-slicing approach over
some of these strategies.
Implementation
Leo-III utilizes and instantiates the associated LeoPARD system platform
[WSB15]
for higher-order (HO) deduction systems implemented in Scala (currently using
Scala 2.13 and running on a JVM with Java 8).
The prover makes use of LeoPARD's data structures and implements its own
reasoning logic on top.
A hand-crafted parser is provided that supports all TPTP syntax dialects.
It converts its produced concrete syntax tree to an internal TPTP AST data
structure which is then transformed into polymorphically typed lambda terms.
As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF, TFF,
THF) as well as their polymorphic variants
[BP13,KRS16].
The term data structure of Leo-III uses a polymorphically typed spine term
representation augmented with explicit substitutions and De Bruijn-indices.
Furthermore, terms are perfectly shared during proof search, permitting
constant-time equality checks between alpha-equivalent terms.
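A toy conversion shows why this works: with De Bruijn indices,
alpha-equivalent terms have literally identical representations, so perfect
sharing turns their equality check into a pointer comparison (illustrative
Python; Leo-III is Scala):

    def to_de_bruijn(term, bound=()):
        """term: a variable name, ('lam', var, body), or ('app', fun, arg)."""
        if isinstance(term, str):
            return bound.index(term)     # bound variable -> its index
        if term[0] == 'lam':
            return ('lam', to_de_bruijn(term[2], (term[1],) + bound))
        return ('app', to_de_bruijn(term[1], bound),
                       to_de_bruijn(term[2], bound))

    # \x. x and \y. y become the same representation: ('lam', 0)
    assert to_de_bruijn(('lam', 'x', 'x')) == to_de_bruijn(('lam', 'y', 'y'))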
Leo-III's saturation procedure may at any point invoke external reasoning
tools.
To that end, Leo-III includes an encoding module that translates (polymorphic)
higher-order clauses to polymorphic and monomorphic typed first-order clauses,
whichever is supported by the external system.
While LEO-II relied on cooperation with untyped first-order provers, Leo-III
exploits the native type support of first-order provers (TFF logic), which
removes clutter during translation and, in turn, makes external cooperation
more effective.
Leo-III is available on GitHub:
https://github.com/leoprover/Leo-III
Expected Competition Performance
Version 1.6 only marginally improves on the previous release by fixing some
bugs; in addition, the old (and very slow) ANTLR-based parser was replaced
by a new (hand-crafted) TPTP parser.
As CASC is using wall clock (WC) time instead of CPU time usage in all
divisions (except for SLH), the Java VM version of Leo-III is used in the
competition (as opposed to a native build used last year).
We hope that the JRE performs quite well on longer runs (wrt. WC time),
after a slow start-up.
We do not expect Leo-III to be competitive in the SLH division as it imposes
strong CPU time limits that Leo-III's JRE will quickly exceed.
For LTB and THF, we expect a similar performance as in last year's CASC.
In the LTB mode, Leo-III is testing a preliminary SInE-based axiom selection.
Stemming from Leo-III's support for polymorphic HOL reasoning, we expect a
reasonable performance.
On the other hand, Leo-III's performance when reasoning with a large number
of axioms is quite poor.
Leo-III's LTB mode does not do any learning and/or analysis of the learning
samples.
Prover9 1109a
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3
[McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, a proof of the transformation to clauses is
not given.
Completeness is not guaranteed, so termination does not indicate
satisfiability.
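In outline, the given clause algorithm with age-weight selection looks as
follows (a generic Python sketch of the Otter loop, not Prover9's C code):

    import heapq
    from collections import deque

    def given_clause_loop(initial, infer, ratio=4):
        """Clauses are frozensets of literal strings; infer(given, used)
        returns the newly derived clauses."""
        weight_q, age_q, used, done = [], deque(), [], set()
        counter = 0
        def add(clause):
            nonlocal counter
            counter += 1
            heapq.heappush(weight_q, (len(clause), counter, clause))
            age_q.append(clause)
        for c in initial:
            add(c)
        picks = 0
        while weight_q or age_q:
            picks += 1
            if picks % (ratio + 1) == 0 and age_q:    # one pick by age ...
                given = age_q.popleft()
            elif weight_q:                            # ... per `ratio` by weight
                given = heapq.heappop(weight_q)[2]
            else:
                given = age_q.popleft()
            if given in done:                         # clause sits in both queues
                continue
            done.add(given)
            if not given:                             # empty clause: refutation
                return 'proof'
            for new in infer(given, used):
                add(new)
            used.append(given)
        return 'no proof found'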
Strategies
Prover9 has many strategies available; the following statements apply to
CASC.
Given a problem, Prover9 adjusts its inference rules and strategy according
to syntactic properties of the input clauses such as the presence of
equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
For CASC Prover9 uses KBO to order terms for demodulation and for the
inference rules, with a simple rule for determining symbol precedence.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP
[McC97].
(LADR has some AC functions, but Prover9 does not use them).
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, and FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing
[Sch04]
is used for forward and backward nonunit subsumption.
Prover9 is available from
http://www.cs.unm.edu/~mccune/prover9/
Expected Competition Performance
Prover9 is the CASC fixed point, against which progress can be judged.
Each year it is expected to do worse than the previous year, relative to the
other systems.
RPx 1.0
Anders Schlichtkrull
Aalborg University Copenhagen, Denmark
Architecture
RPx 1.0
[SBT19,SBT18]
implements the nondeterministic ordered resolution prover by Bachmair and
Ganzinger
[BG+01].
It therefore uses their ordered resolution rule together with their definitions
of tautology deletion, subsumption rules and reduction rules.
The ordered resolution rule is restricted to binary resolution.
Strategies
The prover loop is loosely modelled after the one described by Voronkov
[Vor14].
It applies the ordered resolution rule together with tautology deletion,
subsumption and reduction.
This strategy is applied to all problems.
Implementation
The prover is built in Isabelle/HOL
[RPXISA]
as a data refinement from the calculus down to a fully executable program.
This goes through the following refinement layers: (1) the nondeterministic
ordered resolution prover by Bachmair and Ganzinger
[SB+20,SB+18,SB+18,BG+01],
(2) a nondeterministic ordered resolution prover that enforces fairness, (3) a
deterministic prover that represents clauses and the clauses database as lists,
and commits to a strategy for assigning priorities to clauses, and (4) a fully
executable program with a concrete datatype for atoms and executable
definitions for most general unifiers, clause subsumption, and the order on
atoms.
The soundness and completeness of each layer's prover are proved by
refinement from the soundness and completeness of the previous layer.
From the fourth layer Standard ML code is extracted using Isabelle's code
generator
[HN10].
The fourth layer uses several formalizations from IsaFoR
[ST13,TS09].
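For illustration, here is a minimal first-order unification routine of the
kind the executable layer must provide (a Python sketch written for this
description; RPx's actual version is verified Standard ML extracted from
Isabelle/HOL):

    def unify(s, t):
        """Most general unifier as a triangular substitution, or None.
        Variables are capitalized strings; compound terms are tuples."""
        sub = {}
        def walk(u):                     # follow bindings to a representative
            while isinstance(u, str) and u[0].isupper() and u in sub:
                u = sub[u]
            return u
        def occurs(v, u):
            u = walk(u)
            return u == v or (isinstance(u, tuple)
                              and any(occurs(v, a) for a in u[1:]))
        stack = [(s, t)]
        while stack:
            a, b = map(walk, stack.pop())
            if a == b:
                continue
            if isinstance(a, str) and a[0].isupper():
                if occurs(a, b):
                    return None          # occurs check fails
                sub[a] = b
            elif isinstance(b, str) and b[0].isupper():
                stack.append((b, a))
            elif (isinstance(a, tuple) and isinstance(b, tuple)
                  and a[0] == b[0] and len(a) == len(b)):
                stack.extend(zip(a[1:], b[1:]))
            else:
                return None              # clash of function symbols
        return sub

    print(unify(('f', 'X', ('g', 'X')), ('f', ('a',), 'Y')))
    # {'Y': ('g', 'X'), 'X': ('a',)} -- triangular: Y ultimately maps to g(a)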
RPx employs the TPTP parser and clausifier of Metis
[Hur03].
Coupling Metis and the generated RPx code together required a simple
conversion between their very similar datatypes for clauses.
To be able to solve problems with equality in the competition, RPx uses a
script written by Geoff Sutcliffe that uses the TPTP4X tool to add the
axioms of equality to problems with equality.
RPx is available at
https://github.com/anderssch/RPx
Expected Competition Performance
RPx competes in the divisions FOF and FNT.
RPx uses a magnificent calculus but the data structures are mediocre.
A benchmark done when developing RPx concluded that it is not a competitive
prover.
This was concluded by comparing its performance with that of Vampire, E and
Metis.
Vampire and E were far ahead.
Metis also performed better, but by a smaller margin
[SBT19].
SATCoP 0.1
Michael Rawson
University of Manchester, United Kingdom
Architecture
SATCoP 0.1
[RR21]
implements a typical connection-tableau system with a SAT twist: first-order
clauses (partially) instantiated while building tableaux are continuously
grounded and fed to a boolean satisfiability routine.
When the growing set of propositional clauses becomes unsatisfiable, we have
found a proof.
In the meantime, ground information can influence search.
Satisfying assignments focus SATCoP somewhat: goal literals are only attempted
if their ground abstraction is assigned true.
Ground information can also be used to control a combination of pseudo-random
shuffling and iterative deepening.
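The grounding side of this can be sketched as follows (structure only, in
illustrative Python with flat literals and a single collapsing constant;
SATCoP itself is Rust):

    def ground(lit):
        """Collapse all variables (capitalized) to one fixed constant."""
        sign, pred, args = lit
        return (sign, pred,
                tuple('_c' if a[0].isupper() else a for a in args))

    def record_clause(sat_clauses, solver_add, clause):
        """Add the ground abstraction of a clause used on the tableau to the
        SAT solver; propositional unsatisfiability of the accumulated set
        means a first-order proof has been found."""
        g = frozenset(ground(lit) for lit in clause)
        if g not in sat_clauses:
            sat_clauses.add(g)
            solver_add(g)        # e.g., push to an incremental SAT solver
            return True          # new propositional information
        return False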
We note that this system has been developed slightly since
[RR21]:
the system is very new and there are still many directions and optimisations
to explore.
Strategies
There are no specially-designed strategies in SATCoP.
However, some parts of the system employ a pseudo-random number generator
(PRNG), which can change proof search significantly.
Therefore, to make use of multiple cores, we launch an appropriate number of
threads, each using a different seed for their PRNG.
Each thread runs to the time limit uninterrupted.
Implementation
The system is implemented compactly in a few thousand lines of Rust.
The internal SAT routine is by far the biggest bottleneck: we first try cheap
stochastic local search, then fall back to PicoSAT
[Bie08]
if we fail to find a satisfying assignment quickly.
See the website
https://github.com/MichaelRawson/satcop/
Expected Competition Performance
Based on the 2020 competition, we hope to prove at least 150 of the 2021
FOF problems.
SATCoP can in principle attempt unit equality problems, but is not very good
at them.
It cannot show non-theorems, or deal with other logics.
Twee 2.4
Nick Smallbone
Chalmers University of Technology, Sweden
Architecture
Twee
[Sma21]
is a theorem prover for unit equality problems based on unfailing completion
[BDP89].
It implements a DISCOUNT loop, where the active set contains rewrite rules
(and unorientable equations) and the passive set contains critical pairs.
The basic calculus is not goal-directed, but Twee implements a transformation
which improves goal direction for many problems.
Twee features ground joinability testing
[MN90]
and a connectedness test
[BD88],
which together eliminate many redundant inferences in the presence of
unorientable equations.
The ground joinability test performs case splits on the order of variables,
in the style of
[MN90],
and discharges individual cases by rewriting modulo a variable ordering.
Horn clauses are encoded as equations as described in
[CS18].
For CASC, Twee accepts non-Horn problems but throws away all the non-Horn
clauses.
Strategies
Twee's strategy is simple and it does not tune its heuristics or strategy based
on the input problem.
The term ordering is always KBO; by default, functions are ordered by number
of occurrences and have weight 1.
The proof loop repeats the following steps:
- Select and normalise the lowest-scored critical pair, and if it is not
redundant, add it as a rewrite rule to the active set.
- Normalise the active rules with respect to each other.
- Normalise the goal with respect to the active rules.
Each critical pair is scored using a weighted sum of the weights of its two
terms.
Terms are treated as DAGs when computing weights, i.e., duplicate subterms are
only counted once per term.
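For instance, under DAG-based counting the term f(g(x), g(x)) has weight 3
rather than 5 (a toy Python illustration; Twee itself is Haskell):

    def dag_weight(term, seen=None):
        """Weight of a term counting each distinct subterm once (DAG view).
        Variables/constants are strings; applications are tuples."""
        seen = set() if seen is None else seen
        if term in seen:
            return 0             # shared subterm: already counted
        seen.add(term)
        if isinstance(term, str):
            return 1
        return 1 + sum(dag_weight(a, seen) for a in term[1:])

    print(dag_weight(('f', ('g', 'x'), ('g', 'x'))))   # 3, not 5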
The weights of critical pairs that correspond to Horn clauses are adjusted by
the heuristic described in [CS18], section 5.
For CASC, to take advantage of multiple cores, several versions of Twee run
in parallel using different parameters (e.g., with the goal-directed
transformation on or off).
Implementation
Twee is written in Haskell.
Terms are represented as array-based flatterms for efficient unification and
matching.
Rewriting uses a perfect discrimination tree.
The passive set is represented compactly (12 bytes per critical pair) by only
storing the information needed to reconstruct the critical pair, not the
critical pair itself.
Because of this, Twee can run for an hour or more without exhausting memory.
Twee uses an LCF-style kernel: all rules in the active set come with a
certified proof object which traces back to the input axioms.
When a conjecture is proved, the proof object is transformed into a
human-readable proof.
Proof construction does not harm efficiency because the proof kernel is
invoked only when a new rule is accepted.
In particular, reasoning about the passive set does not invoke the kernel.
The translation from Horn clauses to equations is not yet certified.
Twee can be downloaded as open source from:
http://nick8325.github.io/twee
Expected Competition Performance
Twee is quite strong at UEQ, and ought to compete with the top provers.
It should perform better than last year, thanks to the goal-directed
transformation mentioned above and some other performance improvements.
It may suffer in COL (because its handling of existential goals is mediocre)
and in RNG (where many problems are best solved with LPO or RPO).
As Twee only supports Horn clauses, it will do badly in FOF.
It may get lucky and solve a few hard problems, especially if some
mostly-equational problems show up.
Vampire 4.5
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13]
4.5 is an automatic theorem prover for first-order logic, with extensions
for theory reasoning and higher-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-Gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,RB+16].
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and
code tree indexes are used to implement all major operations on sets of terms,
literals and clauses.
Internally, Vampire works only with clausal normal form. Problems in the full
first-order logic syntax are clausified during preprocessing
[RSV16].
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.
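The splitting step at the heart of the AVATAR architecture mentioned above
can be sketched as follows (component computation only, in illustrative
Python; the SAT solver's branch selection and the assertion bookkeeping are
omitted):

    def split_components(clause, lit_vars):
        """Partition a clause into variable-disjoint components; each
        component becomes a propositional assertion that the SAT solver can
        switch on or off. lit_vars(lit) returns the variables of a literal."""
        components = []                  # list of (variable set, literals)
        for lit in clause:
            vs, lits = set(lit_vars(lit)), [lit]
            remaining = []
            for cvs, clits in components:
                if vs & cvs:             # shares a variable: merge components
                    vs |= cvs
                    lits += clits
                else:
                    remaining.append((cvs, clits))
            components = remaining + [(vs, lits)]
        return [lits for _, lits in components]

    # p(X) | q(X) | r(Y) has two components: {p(X), q(X)} and {r(Y)}
    print(split_components(['p(X)', 'q(X)', 'r(Y)'],
                           lambda l: {c for c in l if c.isupper()}))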
Strategies
Vampire 4.5 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of
comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
This has been extended with a layered clause selection approach
[GS20].
- Set-of-support strategy with extensions for theory reasoning.
- For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent
splitting branches
[RB+16].
- Specialised theory instantiation and unification
[RSV18].
- Extensionality resolution with detection of extensionality axioms
- For higher-order problems:
- Translation to polymorphic first-order logic using applicative form
and combinators.
- A new superposition calculus
[BR20]
utilising a KBO-like ordering
[BR20]
for orienting combinator equations.
The calculus introduces an inference, narrow, for rewriting with
combinator equations.
- Proof search heuristics targeting the growth of clauses resulting
from narrowing.
- An extension of unification with abstraction to deal with functional
and boolean extensionality.
- Various inferences to deal with booleans
Implementation
Vampire 4.5 is implemented in C++.
It makes use of MiniSat and Z3.
See the website
https://vprover.github.io/
for more information and access to the GitHub repository.
Expected Competition Performance
There are four areas of improvement in Vampire 4.5.
Firstly, a new layered clause selection approach
[GS20]
gives Vampire more fine-grained control over clause selection, in particular
the way in which clauses involving theory axioms are selected.
Secondly, theory evaluation and instantiation methods have been overhauled.
Thirdly, a new subsumption demodulation rule
[GKR20]
improves support for reasoning with conditional equalities.
Finally, higher-order reasoning (introduced in Vampire 4.4) has been rewritten
based on a new superposition calculus
[BR20]
utilising a KBO-like ordering
[BR20]
for orienting combinator equations.
Vampire 4.5 should be an improvement on Vampire 4.4.
Vampire 4.6
Giles Reger
University of Manchester, United Kingdom
There are only small changes between Vampire 4.5 and Vampire 4.6 in the
tracks relevant to CASC.
Most of our efforts have been spent on theory reasoning (not relevant here,
as TFA is not running) and on parallelising Vampire, which is too immature
for CASC this year.
One significant engineering effort has been to incorporate higher-order and
polymorphic reasoning into the "main branch" such that a single executable
is used for all divisions.
Architecture
Vampire
[KV13]
is an automatic theorem prover for first-order logic, with extensions for
theory reasoning and higher-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-Gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,RB+16].
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
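As a concrete example of the simplest such criterion (a minimal sketch, not
Vampire's implementation; literals are reduced to signed atom strings),
tautology deletion discards any clause that contains a pair of complementary
literals:

    // Minimal sketch of syntactic tautology deletion (illustrative).
    // A clause containing both L and ~L for some atom L is valid, can
    // never contribute to a refutation, and is simply discarded.
    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    struct Literal {
        bool positive;
        std::string atom;  // atoms kept as plain strings in this sketch
    };
    using Clause = std::vector<Literal>;

    bool isTautology(const Clause& c) {
        std::set<std::string> pos, neg;
        for (const auto& lit : c)
            (lit.positive ? pos : neg).insert(lit.atom);
        for (const auto& a : pos)
            if (neg.count(a)) return true;  // complementary pair found
        return false;
    }

    int main() {
        Clause c1 = {{true, "p(a)"}, {false, "p(a)"}, {true, "q(b)"}};
        Clause c2 = {{true, "p(a)"}, {false, "q(b)"}};
        std::cout << isTautology(c1) << " " << isTautology(c2) << "\n";
        // prints "1 0": c1 is deleted, c2 is kept
    }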
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing
[RSV16].
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
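The core of the SinE idea can be sketched compactly: a symbol triggers the
axioms in which it is among the rarest symbols of the whole problem, and
axioms are selected by following triggers transitively from the symbols of
the conjecture. The simplified C++ sketch below (the tolerance and
generality thresholds of the real algorithm are omitted, and the data types
are assumptions of this illustration) shows the selection loop:

    // Simplified sketch of SinE-style axiom selection (illustrative).
    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    struct Axiom { std::string name; std::set<std::string> symbols; };

    std::set<std::string> sineSelect(const std::vector<Axiom>& axioms,
                                     const std::set<std::string>& conj) {
        std::map<std::string, int> occ;  // global occurrence counts
        for (const auto& ax : axioms)
            for (const auto& s : ax.symbols) ++occ[s];

        // trigger[s] = axioms in which s is a rarest symbol
        std::map<std::string, std::vector<const Axiom*>> trigger;
        for (const auto& ax : axioms) {
            int rarest = 1 << 30;
            for (const auto& s : ax.symbols)
                rarest = std::min(rarest, occ[s]);
            for (const auto& s : ax.symbols)
                if (occ[s] == rarest) trigger[s].push_back(&ax);
        }

        std::set<std::string> selected, seen = conj;
        std::vector<std::string> work(conj.begin(), conj.end());
        while (!work.empty()) {         // walk the trigger relation
            std::string s = work.back(); work.pop_back();
            for (const Axiom* ax : trigger[s])
                if (selected.insert(ax->name).second)
                    for (const auto& t : ax->symbols)
                        if (seen.insert(t).second) work.push_back(t);
        }
        return selected;
    }

    int main() {
        std::vector<Axiom> axs = {{"a1", {"f", "g"}},
                                  {"a2", {"g", "h"}}, {"a3", {"k"}}};
        for (const auto& n : sineSelect(axs, {"f"}))
            std::cout << n << " ";
        std::cout << "\n";  // prints "a1": g does not trigger a2,
                            // because h is rarer there
    }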
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.6 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm (a toy given-clause loop sketch follows
this list):
- Limited Resource Strategy
[RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of
comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
This has been extended with a layered clause selection approach
[GS20].
- Set-of-support strategy with extensions for theory reasoning.
- For theory reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions
[RSV21].
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent
splitting branches
[RB+16].
- Specialised theory instantiation and unification
[RSV18].
- Extensionality resolution with detection of extensionality axioms.
- For higher-order problems:
- Translation to polymorphic first-order logic using applicative form
and combinators.
- A superposition calculus
[BR20]
utilising a KBO-like ordering
[BR20]
for orienting combinator equations.
The calculus introduces an inference, narrow, for rewriting with
combinator equations.
- Proof search heuristics targeting the growth of clauses resulting
from narrowing.
- An extension of unification with abstraction to deal with functional
and boolean extensionality.
- Various inferences to deal with booleans.
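As promised in the first item above, here is a toy given-clause loop in the
DISCOUNT style (illustrative only; the inference function is a stub, not
Vampire's calculus). The loop keeps a passive and an active clause set and
performs inferences only between the selected given clause and the active
clauses:

    // Toy DISCOUNT-style given-clause loop (illustrative only).
    #include <deque>
    #include <iostream>
    #include <string>
    #include <vector>

    using Clause = std::string;

    // Stub: a real prover applies resolution/superposition here.
    std::vector<Clause> infer(const Clause& given, const Clause& active) {
        if (given == "~p" && active == "p") return {"[]"};  // empty clause
        return {};
    }

    int main() {
        std::deque<Clause> passive = {"p", "p | q", "~p"};  // input
        std::vector<Clause> active;

        while (!passive.empty()) {
            Clause given = passive.front();  // heuristic in practice
            passive.pop_front();
            for (const Clause& a : active)
                for (const Clause& c : infer(given, a)) {
                    if (c == "[]") {
                        std::cout << "proof found\n";
                        return 0;
                    }
                    passive.push_back(c);  // new clauses start passive
                }
            active.push_back(given);
        }
        std::cout << "saturated without a proof\n";
    }

In the Otter loop, by contrast, passive clauses also take part in
simplification; the Limited Resource Strategy additionally discards clauses
that are unlikely to be processed within the time limit.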
Implementation
Vampire 4.6 is implemented in C++.
It makes use of MiniSat and Z3.
See the website
https://vprover.github.io/
for more information and access to the GitHub repository.
Expected Competition Performance
Vampire 4.6 should be roughly the same as Vampire 4.5.
Zipperposition 2.0
Petar Vukmirović
Vrije Universiteit Amsterdam, The Netherlands
Architecture
Zipperposition is a superposition-based theorem prover for typed first-order
logic with equality and higher-order logic. It is a pragmatic implementation of
a complete calculus for Boolean-free higher-order logic
[BB+19].
It features a number of extensions that include polymorphic types; user-defined
rewriting on terms and formulas ("deduction modulo theories"); a lightweight
variant of AVATAR for case splitting; boolean reasoning
[VN20].
The core architecture of the prover is based on saturation with an extensible
set of rules for inferences and simplifications.
Zipperposition uses a recently developed full higher-order unification
algorithm that enables efficient integration of procedures for decidable
fragments of higher-order unification
[VBN20].
The initial calculus and main loop were imitations of an old version of E
[Sch02],
but there are many more rules nowadays.
A summary of the calculus for integer arithmetic and induction can be found in
[Cru15].
Strategies
The system uses various strategies in a portfolio.
The strategies are run in parallel, making use of all CPU cores available.
We designed the portfolio of strategies by manual inspection of different
TPTP problems.
Heuristics used in Zipperposition are inspired by efficient heuristics used
in E.
Portfolio mode distinguishes higher-order problems from first-order ones.
If the problem is first-order, all higher-order prover features are turned off.
Other than that, the portfolio is static and does not depend on the syntactic
properties of the problem.
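A minimal sketch of such a parallel portfolio (strategies are reduced to
stub functions with invented time budgets; this is not Zipperposition's
actual scheduler):

    // Toy parallel strategy portfolio (illustrative only).
    // Each strategy runs in its own thread; the first to succeed wins
    // and the others notice via a shared flag and stop.
    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <utility>
    #include <vector>

    // Stub: pretend to search for `ms` milliseconds, then succeed or
    // give up, aborting early if another strategy has already won.
    bool runStrategy(int ms, bool succeeds, std::atomic<bool>& done) {
        auto deadline = std::chrono::steady_clock::now()
                      + std::chrono::milliseconds(ms);
        while (std::chrono::steady_clock::now() < deadline) {
            if (done.load()) return false;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return succeeds;
    }

    int main() {
        std::atomic<bool> done{false};
        std::atomic<int> winner{-1};
        // (time budget, would succeed) pairs standing in for strategies
        std::vector<std::pair<int, bool>> strategies =
            {{50, false}, {20, true}, {80, true}};

        std::vector<std::thread> pool;
        for (int i = 0; i < (int)strategies.size(); ++i)
            pool.emplace_back([&, i] {
                if (runStrategy(strategies[i].first,
                                strategies[i].second, done)
                    && !done.exchange(true))
                    winner = i;
            });
        for (auto& t : pool) t.join();
        std::cout << "winning strategy: " << winner.load() << "\n";
    }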
Implementation
The prover is implemented in OCaml, and has been around for eight years.
Term indexing is done using fingerprints for unification, perfect
discrimination trees for rewriting, and feature vectors for subsumption.
Some inference rules such as contextual literal cutting make heavy use of
subsumption.
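The feature-vector idea can be stated in one line: clauses are mapped to
vectors of features that cannot decrease when a clause is instantiated or
extended, so a clause D can subsume a clause C only if D's vector is
componentwise no larger than C's. A toy prefilter in C++ (the three
features are an assumption of this sketch; a real index uses many more):

    // Toy feature-vector prefilter for subsumption (illustrative).
    // A mismatch is a definite "no"; a match means the expensive
    // subsumption test still has to run.
    #include <array>
    #include <iostream>

    // Features: {#positive literals, #negative literals, max depth}.
    using Features = std::array<int, 3>;

    bool mightSubsume(const Features& d, const Features& c) {
        for (int i = 0; i < 3; ++i)
            if (d[i] > c[i]) return false;  // rules subsumption out
        return true;
    }

    int main() {
        Features d = {1, 1, 2}, c1 = {2, 1, 3}, c2 = {1, 0, 3};
        std::cout << mightSubsume(d, c1) << " "    // 1: candidate kept
                  << mightSubsume(d, c2) << "\n";  // 0: pruned, since
        // c2 has no negative literal that could match d's
    }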
For higher-order problems some strategies use E prover, running in lambda-free
higher-order mode, as an end-game backend prover.
The code can be found at
https://github.com/sneeuwballen/zipperposition
and is entirely free software (BSD-licensed).
Zipperposition can also output graphical proofs using Graphviz.
Some tools to perform type inference and clausification for typed formulas
are also provided, as well as a separate library for dealing with terms and
formulas
[Cru15].
Expected Competition Performance
The prover is expected to have average performance on FOF, similar to Prover9,
and good performance on THF, at the level of last year's CASC winner.
Zipperposition 2.1
Petar Vukmirović
Vrije Universiteit Amsterdam, The Netherlands
Architecture
Zipperposition is a superposition-based theorem prover for typed first-order
logic with equality and for higher-order logic.
It is a pragmatic implementation of a complete calculus for full higher-order
logic
[BB+21].
It features a number of extensions that include polymorphic types; user-defined
rewriting on terms and formulas ("deduction modulo theories"); a lightweight
variant of AVATAR for case splitting
[EBT21];
pragmatic boolean reasoning
[VN20].
The core architecture of the prover is based on saturation with an extensible
set of rules for inferences and simplifications.
Zipperposition uses a full higher-order unification algorithm that enables
efficient integration of procedures for decidable fragments of higher-order
unification
[VBN20].
The initial calculus and main loop were imitations of an old version of E
[Sch02].
With the implementation of higher-order superposition, the main loop had to be
adapted to deal with possibly infinite sets of unifiers
[VB+21].
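One way to picture this adaptation (a toy sketch; real streams yield
substitutions rather than integers): each inference owns a lazy stream of
unifiers, and the main loop polls all streams in a round-robin fashion, so
that an inference with infinitely many unifiers cannot starve the others:

    // Toy fair interleaving of possibly infinite unifier streams.
    #include <functional>
    #include <iostream>
    #include <memory>
    #include <optional>
    #include <vector>

    // A stream yields the next "unifier", or nullopt when exhausted.
    using Stream = std::function<std::optional<int>()>;

    Stream finite(int n) {        // yields 0 .. n-1, then ends
        auto i = std::make_shared<int>(0);
        return [=]() -> std::optional<int> {
            if (*i < n) return (*i)++;
            return std::nullopt;
        };
    }
    Stream infinite(int start) {  // never ends
        auto i = std::make_shared<int>(start);
        return [=]() -> std::optional<int> { return (*i)++; };
    }

    int main() {
        std::vector<Stream> streams = {finite(2), infinite(100)};
        // At most one element per stream per pass: fair interleaving.
        for (int pass = 0; pass < 3; ++pass)
            for (size_t s = 0; s < streams.size(); ++s)
                if (auto u = streams[s]())
                    std::cout << "stream " << s << " -> " << *u << "\n";
    }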
A summary of the calculus for integer arithmetic and induction can be found in
[Cru15].
Strategies
The system uses various strategies in a portfolio.
The strategies are run in parallel, making use of all CPU cores available.
We designed the portfolio of strategies by manual inspection of different
TPTP problems.
Heuristics used in Zipperposition are inspired by efficient heuristics used
in E.
A detailed overview of various calculus extensions used by the strategies
is available
[VB+21].
Portfolio mode distinguishes higher-order problems from first-order ones.
If the problem is first-order, all higher-order prover features are turned off.
In particular, the prover uses the standard first-order superposition calculus
and disables collaboration with the backend prover.
Other than that, the portfolio is static and does not depend on the syntactic
properties of the problem.
Implementation
The prover is implemented in OCaml, and has been around for nine years.
Term indexing is done using fingerprints for unification, perfect
discrimination trees for rewriting, and feature vectors for subsumption.
Some inference rules such as contextual literal cutting make heavy use of
subsumption.
For higher-order problems some strategies use E prover, running in lambda-free
higher-order mode, as an end-game backend prover.
The code can be found at
https://github.com/sneeuwballen/zipperposition
and is entirely free software (BSD-licensed).
Zipperposition can also output graphical proofs using Graphviz.
Some tools to perform type inference and clausification for typed formulas
are also provided, as well as a separate library for dealing with terms and
formulas
[Cru15].
Expected Competition Performance
The prover is expected to have average performance on FOF.
It is expected to perform well on THF, at least as well as last year's version.
In the SLH and LTB divisions we expect reasonable performance, on a par with
Ehoh.