Entrants' System Descriptions
Connect++ 0.6.0
Sean Holden
University of Cambridge, United Kingdom
Architecture
Connect++ Version 0.6.0 is the current publicly-available release of the Connect++ prover for
first-order logic, introduced in
[Hol23].
It is a connection prover using the same calculus and inference rules as leanCoP
[Ott10].
That is, it uses the connection calculus with regularity and (optionally) with lemmas.
Strategies
The (default) proof search is essentially the same as that used by leanCoP Version 2.1.
That is, it employs a search trying left branches of extensions first, with restricted
backtracking as described in
[Ott10],
and using the leftmost literal of the relevant clause for all inference rules.
It does not attempt to exactly reproduce leanCoP's search order.
When not using a schedule, command-line options allow many of these choices to be altered; for
example, allowing backtracking restriction for extensions while still fully exploring left
branches, or other modifications of the backtracking restrictions.
If run with its default schedule it uses one similar to that of leanCoP Version 2.1, including
the various applications of definitional clause conversion.
Alternatively it can read and apply arbitrary schedules if desired.
At present Connect++ does not attempt to tune its proof search based on the characteristics of
individual problems.
Implementation
Connect++ is implemented in C++ - minimally the 2017 standard - and built using cmake.
Libraries from the Boost collection are used for parsing, hashing, some random number generation,
and processing of command-line options.
The system has a built-in proof checker for verifying its own output, but also includes a
standalone checker implemented in SWI Prolog.
As substitutions need to apply to an entire proof tree, the system represents each variable
only once and shares that representation, while simultaneously maintaining a stack of
substitutions that makes removal of substitutions under backtracking trivial.
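As a minimal sketch of this arrangement (with hypothetical class names, not Connect++'s actual
ones): every occurrence of a variable points at one shared object, and a trail of bindings is
kept so that undoing a backtracked branch costs constant time per binding.

    #include <cstddef>
    #include <vector>

    struct Term;                        // terms are shared; see below

    // Each variable exists exactly once; all of its occurrences point here,
    // so one assignment to `binding` applies across the entire proof tree.
    struct Variable {
        Term* binding = nullptr;
    };

    class SubstitutionStack {
        std::vector<Variable*> trail;   // variables bound since the start
    public:
        std::size_t mark() const { return trail.size(); }
        void bind(Variable* v, Term* t) {
            v->binding = t;
            trail.push_back(v);
        }
        // Undo every binding made after `m`: constant work per removed
        // binding, which is what makes backtracking trivial.
        void undo_to(std::size_t m) {
            while (trail.size() > m) {
                trail.back()->binding = nullptr;
                trail.pop_back();
            }
        }
    };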
It also creates subterms only once and shares them; these are indexed, allowing constant-time
lookup, and nothing is ever removed from the index, meaning that if a term is constructed again
after its initial construction, no new memory allocation takes place and the term itself is
obtained in constant time.
At the same time, fresh copies of variables are recycled under backtracking - these two design
choices appear to interact very effectively, as the recycling of the variables seems to make
it quite likely that subterms already in the index can be reused.
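The subterm index can be pictured as a hash-consed term bank. The sketch below is illustrative
only (hypothetical names, simplistic string keys); it shows why re-constructing an existing
term allocates nothing and returns in constant expected time.

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Term {
        std::string functor;
        std::vector<const Term*> args;   // shared subterms
    };

    class TermBank {
        // Key: functor plus the identities of the already-shared arguments.
        std::unordered_map<std::string, Term*> index;
    public:
        const Term* make(const std::string& f, std::vector<const Term*> args) {
            std::string key = f;
            for (const Term* a : args)
                key += '#' + std::to_string(reinterpret_cast<std::uintptr_t>(a));
            auto it = index.find(key);
            if (it != index.end()) return it->second;   // reuse: no allocation
            Term* t = new Term{f, std::move(args)};     // never removed later
            index.emplace(std::move(key), t);
            return t;
        }
    };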
By default a standard recursive unification algorithm is used, but a polynomial-time version is
optional.
If a schedule is used, it is assumed that different approaches to definitional clause conversion
may be needed - typically all clauses, conjecture clauses only, or no clauses.
As these choices can lead to different matrices, and the conversion itself can be expensive,
the system stores and switches between the different matrices rather than converting multiple
times.
As the system was developed with two guiding aims - to provide a clear implementation easily
modified by others, somewhat in the spirit of MiniSAT
[ES04],
and to support experiments in machine learning for guiding the proof search - the implementation
avoids the use of direct recursion in favour of a pair of stacks and an iterative implementation
based on these, as described in
[Hol23].
This allows complete and arbitrary control of backtracking restriction and other modifications
to the proof search using typically quite simple modifications to the code.
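The pattern can be sketched as follows (hypothetical types, heavily elided): because both the
untried alternatives and the undo information live in explicit stacks, a backtracking
restriction is just a decision to discard stack entries instead of trying them.

    #include <vector>

    struct Alternative { int depth = 0; };   // an untried inference and its depth
    struct UndoAction  { };                  // reverts one applied step

    bool prove(std::vector<Alternative> alternatives) {
        std::vector<UndoAction> undo;
        while (!alternatives.empty()) {
            Alternative alt = alternatives.back();
            alternatives.pop_back();
            // Roll the undo stack back to the alternative's depth, apply the
            // inference, and push any new alternatives and undo actions
            // (elided in this sketch).
            while (static_cast<int>(undo.size()) > alt.depth) undo.pop_back();
            // if (all branches closed) return true;    // proof found
        }
        return false;                        // alternatives exhausted
    }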
The source and documentation are available at
http://www.cl.cam.ac.uk/~sbh11/connect++.html
Expected Competition Performance
The system remains at an early stage of development and is currently undergoing systematic
profiling and improvement.
It is not expected at this stage to be competitive.
CSE 1.7
Feng Cao
JiangXi University of Science and Technology, China
Architecture
CSE 1.7 is a further development of the previous version, CSE 1.6.
It is an automated theorem prover for first-order logic without equality, based mainly on a novel
inference mechanism called Contradiction Separation Based Dynamic Multi-Clause Synergized
Automated Deduction (S-CS)
[XL+18].
S-CS is able to handle multiple (two or more) clauses dynamically in a synergized way in one
deduction step, while binary resolution is a special case.
CSE 1.7 also adopts conventional factoring, equality resolution (ER rule), and variable renaming.
Some pre-processing techniques are applied as well, including pure literal deletion and
simplification based on the distance to the goal clause, together with a number of standard
redundancy criteria for pruning the search space, such as tautology deletion and subsumption
(forward and backward).
CSE 1.7 has been improved compared with CSE 1.6, mainly from the following aspects:
- A multi-clause contradiction separation deduction algorithm based on unit clauses has been
proposed, which can effectively enhance the reasoning ability of unit clauses for
multi-clause deduction.
- A multi-clause synergized deduction algorithm with full use of synergized clauses is
proposed, which orders the unification substitutions of candidate literals involved in
deduction from simple to complex, thus effectively optimizing the multi-clause deduction
search path.
Internally CSE 1.7 works only with clausal normal form.
The E prover
[Sch13]
is adopted with thanks for clausification of full first-order logic problems during preprocessing.
Strategies
CSE 1.7 inherited most of the strategies in CSE 1.6.
The main new strategies are:
- By analyzing clause influence degree, the number of remaining literals, and literal deduction
ability in the process of multi-clause dynamic deduction, a clause evaluation method based on
the clause comprehensive weight is proposed, which can effectively control the number of
literals in the contradiction separation clause.
- By analyzing the relationships and differences between variables, functions, and constants,
a stability-based clause measurement and calculation method is proposed, which can
effectively evaluate clauses with different term structures.
The lower the stability of a clause, the more readily it participates in deduction.
Implementation
CSE 1.7 is implemented mainly in C++, and Java is used to implement batch problem running.
A shared data structure is used for storing constants and shared variables.
In addition, a special data structure is designed to describe the properties of clauses,
literals, and terms, so that it can support the multiple-strategy mode.
The E prover is used for clausification of FOF problems, and then TPTP4X is applied to convert
the resulting CNF into TPTP format.
Expected Competition Performance
CSE 1.7 has made some improvements compared to CSE 1.6, and so we expect a better performance
in this year's competition.
Acknowledgement:
Development of CSE 1.7 has been partially supported by the General Research Project of Jiangxi
Education Department (Grant No. GJJ200818).
CSE_E 1.6
Peiyao Liu
Southwest Jiaotong University, China
Architecture
CSE_E 1.6 is an automated theorem prover for first-order logic by combining CSE 1.6 and E 3.1,
where CSE 1.6 is based on the Contradiction Separation Based Dynamic Multi-Clause Synergized
Automated Deduction (S-CS)
[XL+18]
and E is mainly based on superposition.
The combination mechanism works as follows: E and CSE are applied to the given problem sequentially.
If either prover solves the problem, then the proof process completes.
If neither CSE nor E can solve the problem, some clauses inferred by CSE with no more than two
literals, especially unit clauses, will be fed to E as lemmas, along with the original clauses,
for further proof search.
This kind of combination is expected to take advantage of both CSE and E, and produce a better
performance.
Concretely, CSE is able to generate a good number of unit clauses, which are helpful for proof
search and equality handling.
On the other hand, E is good at equality handling.
Strategies
The CSE part of CSE_E 1.6 uses almost the same strategies as standalone CSE 1.6, e.g.,
clause/literal selection, strategy selection, and the CSC strategy.
The only difference is that the equality handling strategies of the CSE part of the combined
system are blocked.
The main new strategies for the combined systems are:
- Lemma filtering mainly based on deduction weight of binary clauses.
- Fine-grained dynamic time allocation scheme in different run stages.
Implementation
CSE_E 1.6 is implemented mainly in C++, and Java is used to implement batch problem running.
The job dispatch between CSE and E is implemented in C++.
Expected Competition Performance
We expect CSE_E 1.6 to solve some hard problems that E cannot solve and have a satisfying
performance.
Acknowledgement:
Development of CSE_E 1.6 has been partially supported by the National Natural Science Foundation
of China (NSFC) (Grant No. 61976130).
Thanks to Stephan Schulz for his kind permission to use the E prover, which makes CSE_E possible.
CSG_E 1.0
Peiyao Liu
Southwest Jiaotong University, China
Architecture
CSG_E 1.0 is an automated theorem prover for first-order logic by combining CSG 1.0 and E 3.1,
where CSG 1.0 is based on the Contradiction Separation Based Dynamic Multi-Clause Synergized
Automated Deduction (S-CS)
[XL+18]
and E is mainly based on superposition.
CSG uses a new deduction calculus based on the S-CS rule, called the gridle construction method.
The main idea of the method is to select N literals to construct a maximal standard
contradiction (called a gridle), then use the gridle to match clauses until the gridle is
filled; finally, the literals that cannot be matched are composed into the new inferred clause.
The combination mechanism works as follows: E and CSG are applied to the given problem sequentially.
If either prover solves the problem, then the proof process completes.
If neither CSG nor E can solve the problem, some clauses inferred by CSG with no more than two
literals, especially unit clauses, will be fed to E as lemmas, along with the original clauses,
for further proof search.
This kind of combination is expected to take advantage of both CSG and E, and produce a better
performance.
Concretely, CSG is able to generate a good number of unit clauses, which are helpful for proof
search and equality handling.
On the other hand, E is good at equality handling.
Strategies
The CSG part of CSG_E 1.0 uses almost the same strategies as standalone CSG 1.0, e.g.,
clause/literal selection, the gridle generation literal strategy, the gridle-size adjustment
strategy, and the CSC strategy.
The only difference is that the equality handling strategies of the CSG part of the combined
system are blocked.
Implementation
CSG_E 1.0 is implemented mainly in C++, and Java is used to implement batch problem running.
The job dispatch between CSG and E is implemented in C++.
Expected Competition Performance
We expect CSG_E 1.0 to solve some hard problems that E cannot solve and have a satisfying
performance.
Acknowledgement:
Development of CSG_E 1.0 has been partially supported by the National Natural Science Foundation
of China (NSFC) (Grant No. 61976130).
Thanks to Stephan Schulz for his kind permission to use the E prover, which makes CSG_E possible.
CSI_E 1.0
Guoyan Zeng
Southwest Jiaotong University, China
Architecture
CSI_E 1.0 is an automated theorem prover for first-order logic, combining CSI 1.0 and E, where
CSI 1.0 is a multi-layer inverse and parallel prover based on the Contradiction Separation Based
Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18]
and E is mainly based on superposition.
The combination mechanism works as follows: E and CSI are applied to the given problem sequentially.
If either prover solves the problem, then the proof process completes.
If neither CSI nor E can solve the problem, some clauses inferred by CSI with no more than two
literals, especially unit clauses, will be fed to E as lemmas, along with the original clauses,
for further proof search.
This kind of combination is expected to take advantage of both CSI and E, and produce a better
performance.
Concretely, CSI is able to generate a good number of unit clauses, which are helpful for proof
search and equality handling.
On the other hand, E is good at equality handling.
Strategies
The CSI part of CSI_E 1.0 uses almost the same strategies as standalone CSE 1.6, e.g.,
clause/literal selection, strategy selection, and the CSC strategy.
The only difference is that the equality handling strategies of the CSI part of the combined
system are blocked. The main new strategies for the combined system are:
- Complementary ratio strategy.
A method for measuring and calculating the complementary relation between two clauses, which
can guide the selection of clauses participating in deduction and plan the deduction path
effectively.
- Portfolio strategy clause selection schemes for different run stages.
Implementation
CSI_E 1.0 is implemented mainly in C++, and Java is used to implement batch problem running.
The job dispatch between CSI and E is implemented in C++.
Expected Competition Performance
We expect CSI_E 1.0 to solve some hard problems that E cannot solve and have a satisfying
performance.
Acknowledgement:
Development of CSI_E 1.0 has been partially supported by the National Natural Science Foundation
of China (NSFC) (Grant No. 62106206, 62206227).
Thanks to Stephan Schulz for his kind permission to use the E prover, which makes CSI_E possible.
cvc5 1.1.3
Andy Reynolds
University of Iowa, USA
Architecture
cvc5
[BB+22]
is the successor of CVC4
[BC+11].
It is an SMT solver based on the CDCL(T) architecture
[NOT06]
that includes built-in support for many theories, including linear arithmetic, arrays, bit
vectors, datatypes, finite sets and strings.
It incorporates approaches for handling universally quantified formulas.
For problems involving free function and predicate symbols, cvc5 primarily uses heuristic
approaches based on conflict-based instantiation and E-matching for theorems, and finite model
finding approaches for non-theorems.
Like other SMT solvers, cvc5 treats quantified formulas using a two-tiered approach.
First, quantified formulas are replaced by fresh Boolean predicates and the ground theory
solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either add instances of
quantified formulas to the problem, answer "satisfiable", or return unknown.
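A rough rendering of that loop, using hypothetical stand-ins rather than cvc5's actual API:

    #include <vector>

    enum class Result { Unsat, Sat, Unknown };

    struct Ground {                       // SAT solver plus ground theory solvers
        Result check() { return Result::Unsat; }      // stub
        void add(int /*ground instance*/) {}           // stub
    };
    struct Quantifiers {                  // quantifier instantiation module
        std::vector<int> instantiate() { return {}; }  // stub: new instances
        bool model_is_complete() { return false; }     // stub: e.g., a model was found
    };

    // Decide the ground abstraction first; only when it is satisfiable, ask
    // the instantiation module to refine it with new instances.
    Result solve(Ground& g, Quantifiers& q) {
        for (;;) {
            if (g.check() == Result::Unsat) return Result::Unsat;
            std::vector<int> insts = q.instantiate();
            if (insts.empty())
                return q.model_is_complete() ? Result::Sat : Result::Unknown;
            for (int i : insts) g.add(i);   // add instances and re-check
        }
    }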
Finite model finding in cvc5 targets problems containing background theories whose quantification
is limited to finite and uninterpreted sorts.
In finite model finding mode, cvc5 uses a ground theory of finite cardinality constraints that
minimizes the number of ground equivalence classes, as described in
[RT+13].
When the problem is satisfiable at the ground level, a candidate model is constructed that
contains complete interpretations for all predicate and function symbols.
It then adds instances of quantified formulas that are in conflict with the candidate model, as
described in
[RT+13].
If no instances are added, it reports "satisfiable".
cvc5 has native support for problems in higher-order logic, as described in
[BR+19].
It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via lambda lifting.
The approach extends the theory solver for quantifier-free uninterpreted functions (UF) and
E-matching.
For the former, the theory solver for UF in cvc5 now handles equalities between functions using
an extensionality inference.
Partial applications of functions are handled using a (lazy) applicative encoding, where some
function applications are equated to their applicative encoding.
For the latter, several of the data structures for E-matching have been modified to incorporate
matching in the presence of equalities between functions, function variables, and partial function
applications.
Strategies
For handling theorems, cvc5 primarily uses conflict-based quantifier instantiation
[RTd14,
BFR17],
enumerative instantiation
[RBF18]
and E-matching.
cvc5 uses a handful of orthogonal trigger selection strategies for E-matching, and several
orthogonal ordering heuristics for enumerative instantiation.
For handling non-theorems, cvc5 primarily uses finite model finding techniques.
Since cvc5 with finite model finding is also capable of establishing unsatisfiability, it is
used as a strategy for theorems as well.
Implementation
cvc5 is implemented in C++. The code is available from
https://github.com/cvc5/cvc5
Expected Competition Performance
The performance of cvc5 will be comparable to last year.
We continue to rely on a conversion from TPTP to smt2 as a preprocessing step.
This year, we have added various new strategies for TFN and THF, including a new instantiation
strategy that combines model-based quantifier instantiation
[Gd09]
with enumerative syntax-guided synthesis
[RB+19].
Drodi 3.6.0
Oscar Contreras
Amateur Programmer, Spain
Architecture
Drodi 3.6.0 is a very basic and lightweight automated theorem prover.
Its main features are described below.
Strategies
Drodi has a fair number of selectable strategies including but not limited to the following:
- Otter, Discount and Limited Resource Strategy
[RV03]
saturation algorithms.
- A basic implementation of the AVATAR architecture
[Vor14].
- Several literal and term reduction orderings.
- Several literal selection options
[HR+16].
- Several layered clause selection heuristics with adjustable selection ratios
[GS20].
- Classical clause relevancy pruning.
- Some strategies are run a second time with a randomization previously applied to the problem
[Sud22].
- Drodi can generate a learning file from successful proofs and use the file to guide the
clause selection strategy.
It is based on the enhanced ENIGMA method.
However, unlike ENIGMA, the learning file is completely general and can be used with any kind
of problem.
This generality allows the use of the same learning file in both FOF and UEQ CASC competition
divisions.
The learning file is generated over a set of TPTP problems before the CASC competition using
built-in Drodi functions that include an L2 Support Vector Machine.
Drodi integrated learning functions are a generalization of ENIGMA
[JU17,
JU18].
Literal polarity, equality, Skolem, and variable occurrences are stored in clause feature
vectors.
Unlike ENIGMA, instead of storing the specific functions and predicates themselves, only the
SinE distance and arity of functions and non-equality predicates are stored in the clause
feature vectors, with different features assigned to predicates and functions.
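A rough illustration of such a symbol-identity-free feature vector; the particular fields and
their number are assumptions made for this sketch, not Drodi's actual encoding:

    #include <array>
    #include <utility>
    #include <vector>

    struct Literal {                          // hypothetical clause representation
        bool positive;
        bool is_equality;
        int  skolem_count;                    // Skolem symbol occurrences
        int  variable_count;                  // variable occurrences
        std::vector<std::pair<int,int>> symbols;  // (SinE distance, arity) per symbol
    };

    // General counts only, no function/predicate identities, so the same
    // learning file transfers across unrelated problem sets.
    std::array<int, 6> features(const std::vector<Literal>& clause) {
        std::array<int, 6> f{};               // zero-initialised
        for (const Literal& l : clause) {
            f[l.positive ? 0 : 1] += 1;       // positive / negative literals
            f[2] += l.is_equality ? 1 : 0;    // equality literals
            f[3] += l.skolem_count;           // Skolem occurrences
            f[4] += l.variable_count;         // variable occurrences
            for (auto [sine, arity] : l.symbols)
                f[5] += sine + arity;         // crude symbol summary (assumption)
        }
        return f;
    }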
Implementation
Drodi is implemented in C.
It includes discrimination trees and hashing indexing.
All the code is original, without special code libraries or code taken from other sources.
Expected Competition Performance
Drodi 3.6.0 is basically the same solver as last year with only minor improvements, so
performance will be similar to CASC-29.
E 3.1
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E
[Sch02,
Sch13,
SCV19]
is a purely equational theorem prover for many-sorted first-order logic with
equality, and for monomorphic higher-order logic.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a number of
redundancy elimination techniques, optionally with higher-order extensions
[VB+21,
VBS23].
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a
strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
As of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state for
propositional unsatisfiability.
Strategies
Proof search in E is primarily controlled by a literal selection strategy, a clause selection
heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection strategies.
Clause selection heuristics can be constructed on the fly by combining various parameterized
primitive evaluation functions, or can be selected from a set of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take other clause properties
into account.
In particular, the search can prefer clauses from the set of support, or containing many symbols
also present in the goal.
Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and
Lexicographic Path Ordering (LPO), which can be lifted in different ways to literal orderings.
For CASC-29, E implements a two-stage multi-core strategy-scheduling automatic mode.
The total CPU time available is broken into several (unequal) time slices.
For each time slice, the problem is classified into one of several classes, based on a number of
simple features (number of clauses, maximal symbol arity, presence of equality, presence of
non-unit and non-Horn clauses, possibly presence of certain axiom patterns...).
For each class, a schedule of strategies is greedily constructed from experimental data as follows:
The first strategy assigned to a schedule is the one that solves the most problems from this
class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on problems not already
solved by a preceding strategy.
The strategies are then scheduled onto the available cores and run in parallel.
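The greedy construction amounts to a set-cover loop over the evaluation data. A simplified
sketch, ignoring per-slice timing and treating each strategy as solving a fixed problem set:

    #include <set>
    #include <vector>

    // solves[s] = problems (by id) strategy s solves within the slice.
    std::vector<int> make_schedule(const std::vector<std::set<int>>& solves) {
        std::set<int> covered;
        std::vector<int> schedule;
        for (;;) {
            int best = -1, best_gain = 0;
            for (int s = 0; s < static_cast<int>(solves.size()); ++s) {
                int gain = 0;
                for (int p : solves[s])
                    if (!covered.count(p)) ++gain;
                if (gain > best_gain) { best_gain = gain; best = s; }
            }
            if (best < 0) break;              // no strategy adds new solutions
            schedule.push_back(best);
            covered.insert(solves[best].begin(), solves[best].end());
        }
        return schedule;
    }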
About 140 different strategies have been thoroughly evaluated on all untyped first-order problems
from TPTP 7.3.0.
We have also explored some parts of the heuristic parameter space with a short time limit of 5
seconds.
This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies
on UEQ problems from TPTP 7.2.0.
About 100 of these strategies are used in the automatic mode, and about 450 are used in at least
one schedule.
Implementation
E is built around perfectly shared terms, i.e., each distinct term is represented only once in a
term bank.
The whole set of terms thus consists of a number of interconnected directed acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination
trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption resolution) is supported
using feature vector indexing
[Sch13].
Superposition and backward rewriting use fingerprint indexing
[Sch12],
a new technique combining ideas from feature vector indexing and path indexing.
Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by
Bernd Löchner in
[Loe06].
The prover and additional information are available at
https://www.eprover.org
Expected Competition Performance
E 3.1 is the CASC-29 SLH winner.
E 3.2.0
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E
[Sch02,
Sch13,
SCV19]
is a purely equational theorem prover for many-sorted first-order logic with equality, and for
monomorphic higher-order logic.
It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal
form, and a saturation algorithm implementing an instance of the superposition calculus with
negative literal selection and a number of redundancy elimination techniques, optionally with
higher-order extensions
[VB+21,
VBS23].
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation
of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
As of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state for propositional
unsatisfiability.
Strategies
Proof search in E is primarily controlled by a literal selection strategy, a clause selection
heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection strategies.
Clause selection heuristics can be constructed on the fly by combining various parameterized
primitive evaluation functions, or can be selected from a set of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take other clause properties
into account.
In particular, the search can prefer clauses from the set of support, or containing many symbols
also present in the goal.
Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and
Lexicographic Path Ordering (LPO), which can be lifted in different ways to literal orderings.
For CASC-J12, E implements a two-stage multi-core strategy-scheduling automatic mode.
The total CPU time available is broken into several (unequal) time slices.
For each time slice, the problem is classified into one of several classes, based on a number of
simple features (number of clauses, maximal symbol arity, presence of equality, presence of
non-unit and non-Horn clauses, possibly presence of certain axiom patterns, ...).
For each class, a schedule of strategies is greedily constructed from experimental data as follows:
The first strategy assigned to a schedule is the one that solves the most problems from this
class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on problems not already
solved by a preceding strategy.
The strategies are then scheduled onto the available cores and run in parallel.
About 140 different strategies have been thoroughly evaluated on all untyped first-order problems
from TPTP 7.3.0.
We have also explored some parts of the heuristic parameter space with a short time limit of 5
seconds.
This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies
on UEQ problems from TPTP 7.2.0.
About 100 of these strategies are used in the automatic mode, and about 450 are used in at least
one schedule.
Implementation
E is built around perfectly shared terms, i.e., each distinct term is represented only once in a
term bank.
The whole set of terms thus consists of a number of interconnected directed acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination
trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption resolution) is supported
using feature vector indexing
[Sch13].
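Feature vector indexing rests on a simple necessary condition: if clause C subsumes D, then
every feature count of C is bounded by the corresponding count of D. A sketch of the resulting
pre-filter, assuming plain symbol-count features of equal length:

    #include <cstddef>
    #include <vector>

    using FeatureVec = std::vector<int>;   // e.g., per-symbol occurrence counts

    // Candidates failing this componentwise test cannot subsume, so the
    // expensive full subsumption check is skipped for them.
    bool may_subsume(const FeatureVec& c, const FeatureVec& d) {
        for (std::size_t i = 0; i < c.size(); ++i)
            if (c[i] > d[i]) return false;
        return true;
    }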
Superposition and backward rewriting use fingerprint indexing
[Sch12],
a new technique combining ideas from feature vector indexing and path indexing.
Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by
Bernd Löchner in
[Loe06].
The prover and additional information are available at
https://www.eprover.org
Expected Competition Performance
E 3.2.0 is basically E 3.1 made more robust and somewhat more efficient.
We have not yet been able to evaluate and integrate new search strategies making full use of all
new features.
As a result, we expect performance to be similar to last year's version.
The system is expected to perform well in most proof classes, but will at best complement top
systems in the disproof classes.
GKC 0.8
Tanel Tammet
Tallinn University of Technology, Estonia
Architecture
GKC
[Tam19]
is a resolution prover optimized for search in large knowledge bases.
The GKC version 0.8 running at CASC-29 is a marginally improved version of the GKC 0.7 running
in two previous CASCs.
Almost all of the GKC development effort this year has gone to the commonsense superstructure GK
(https://logictools.org/gk/) and the natural
language reasoning pipeline
(https://github.com/tammet/nlpsolver)
[TJ+23].
GKC is used as a foundation (GK Core) for building a common-sense reasoner GK.
In particular, GK can handle inconsistencies and perform probabilistic and nonmonotonic reasoning
[Tam21,
Tam22].
The WASM version of the previous GKC 0.6 is used as the prover engine in the educational
http://logictools.org system.
It can read and output proofs in the TPTP, simplified TPTP, and JSON formats, the latter
compatible with JSON-LD
[TS21].
GKC only looks for proofs and does not try to show non-provability.
These standard inference rules have been implemented in GKC:
- Binary resolution, optionally with the set-of-support strategy, negative or positive ordered
resolution, or the unit restriction.
- Hyperresolution.
- Factorization.
- Paramodulation and demodulation with the Knuth-Bendix ordering.
GKC includes an experimental implementation of propositional inferencing and instance generation,
which we do not plan to use during the current CASC.
Strategies
GKC uses multiple strategies run sequentially, with the time limit starting at 0.1 seconds for
each, and multiplied by 10 or 5 once the whole batch has been performed.
The strategy selection takes into consideration basic properties of the problem: the presence
of equality and the approximate size of the problem.
We perform the selection of a given clause by using several queues in order to spread the selection
relatively uniformly over these categories of derived clauses and their descendants: axioms,
external axioms, assumptions and goals.
The queues are organized in two layers.
As a first layer we use the common ratio-based algorithm of alternating between selecting
n clauses from a weight-ordered queue and one clause from the FIFO queue in derivation
order.
As a second layer we use four separate queues based on the derivation history of a clause.
Each queue in the second layer contains the two sub-queues of the first layer.
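One first-layer queue pair might look as follows (hypothetical types; a real implementation
must also skip clauses already taken from the sibling queue, and pick() assumes the pair is
non-empty):

    #include <deque>
    #include <queue>
    #include <vector>

    struct Clause { int id; int weight; };
    struct Heavier {
        bool operator()(const Clause& a, const Clause& b) const {
            return a.weight > b.weight;        // lightest clause on top
        }
    };

    class QueuePair {
        std::priority_queue<Clause, std::vector<Clause>, Heavier> by_weight;
        std::deque<Clause> fifo;               // derivation order
        int n, picks = 0;
    public:
        explicit QueuePair(int ratio) : n(ratio) {}
        void add(const Clause& c) { by_weight.push(c); fifo.push_back(c); }
        Clause pick() {                        // n weight picks, then 1 FIFO pick
            if (++picks % (n + 1) == 0) {
                Clause c = fifo.front(); fifo.pop_front(); return c;
            }
            Clause c = by_weight.top(); by_weight.pop(); return c;
        }
    };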
Implementation
GKC is implemented in C.
The data representation machinery is built upon the shared-memory graph database
WhiteDB, enabling it to solve multiple
different queries in parallel processes without a need to repeatedly parse or load the large
parsed knowledge base from disk.
An interesting aspect of GKC is the pervasive use of hash indexes, feature vectors and
fingerprints, while no tree indexes are used.
GKC can be obtained from
https://github.com/tammet/gkc/
Expected Competition Performance
We expect GKC to be in the middle of the final ranking for FOF and below the average in UEQ.
We expect GKC to perform well on very large problems.
iProver 3.9
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver
[Kor08,
DK20]
is a theorem prover for quantified first-order logic with theories.
iProver interleaves instantiation calculus Inst-Gen
[Kor13,
Kor08,
GK03]
with ordered resolution and superposition calculi
[DK20].
iProver approximates first-order clauses using propositional abstractions
that are solved using MiniSAT
[ES04]
or Z3
[dMB08]
and refined using model-guided instantiations.
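Inst-Gen's propositional abstraction can be sketched very compactly: replace every variable by
one distinguished constant and hand the grounded clauses to the SAT solver. A propositional
refutation then lifts to a first-order one; otherwise, the propositional model guides which
instances to generate next. (Hypothetical term type below; iProver itself is OCaml.)

    #include <string>
    #include <vector>

    struct Term {
        bool is_var;
        std::string name;                 // variable name or function symbol
        std::vector<Term> args;
    };

    // Map every variable to the distinguished constant "$bot".
    Term ground(const Term& t) {
        if (t.is_var) return Term{false, "$bot", {}};
        Term g{false, t.name, {}};
        for (const Term& a : t.args) g.args.push_back(ground(a));
        return g;
    }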
iProver also implements a general abstraction-refinement framework for
under- and over-approximations of first-order clauses
[HK18,
HK19].
First-order clauses are exchanged between calculi during the proof search.
Recent features in iProver include:
- Ground joinability and connectedness in the superposition calculus
[DK22].
- Support for quantified reasoning with arithmetic and arrays.
- AC joinability and AC normalisation
[DK21].
- Superposition calculus with simplifications including: demodulation,
light normalisation, subsumption, subsumption resolution and global subsumption.
iProver's simplification setup
[DK20]
is tunable via command line options and generalises common architectures
such as Discount or Otter.
- HOS-ML framework for learning heuristics using a combination of hyper-parameter
optimisation and dynamic clustering, together with schedule optimisation using
constraint solving
[HK21,
HK19].
Strategies
iProver has around 100 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the
SAT/SMT solvers, simplifications, and options for combination of instantiation
with resolution and superposition.
For the competition HOS-ML
[HK21]
was used to build a multi-core schedule from heuristics learnt over a sample
of FOF problems.
Some theories and fragments, such as EPR, UEQ, Horn, groups, rings, and lattices, are
recognised, and options are adapted accordingly.
Implementation
iProver is implemented in OCaml.
For ground reasoning it uses MiniSat
[ES04]
and Z3
[dMB08].
iProver accepts FOF, TFF and CNF formats.
Vampire
[KV13,
RSV15]
and E prover
[Sch13]
are used for proof-producing clausification of FOF/TFF problems.
Vampire is also used for SInE axiom selection
[HV11]
in the LTB division and for theory axioms in the TFA division.
iProver is available at:
https://gitlab.com/korovin/iprover
Expected Competition Performance
iProver is regularly in the top three in FOF, UEQ, TFN, TFA.
We expect an improved performance compared to the previous year due to polishing new methods for
equational reasoning, and heuristic optimization.
LEO-II 1.7.0
Alexander Steen
University of Greifswald, Germany
Architecture
LEO-II
[BP+08],
the successor of LEO
[BK98],
is a higher-order ATP system based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution
[Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E
[Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n iterations
(e.g., n = 10) of its (standard) resolution proof search loop.
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II uses some basic strategy scheduling to try different
search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in OCaml 4, and its problem representation language
is the TPTP THF language
[BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure
[SB10].
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately the LEO-II system still uses only a very simple
sequential collaboration model with first-order ATPs instead of using
the more advanced, concurrent and resource-adaptive OANTS architecture
[BS+08]
as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from
http://www.leoprover.org
Expected Competition Performance
LEO-II is not actively being developed anymore, hence there are no expected improvements to last
year's CASC results.
Leo-III 1.7.15
Alexander Steen
University of Greifswald, Germany
Architecture
Leo-III
[SB21],
the successor of LEO-II
[BP+08],
is a higher-order ATP system based on extensional higher-order paramodulation
with inference restrictions using a higher-order term ordering.
The calculus contains dedicated extensionality rules and is augmented with
equational simplification routines that have their intellectual roots in
first-order superposition-based theorem proving.
The saturation algorithm is a variant of the given clause loop procedure
inspired by the first-order ATP system E.
Leo-III cooperates with external first-order ATPs that are called
asynchronously during proof search; a focus is on cooperation with systems
that support typed first-order (TFF) input.
For this year's CASC
E
[Sch02,
Sch13]
is used as the external system.
However, cooperation is in general not limited to first-order systems.
Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or
counter model generators) may be included using simple command-line arguments.
If the saturation procedure loop (or one of the external provers) finds a
proof, the system stops, generates the proof certificate and returns the
result.
Strategies
Leo-III comes with several configuration parameters that influence its proof
search by applying different heuristics and/or restricting inferences.
These parameters can be chosen manually by the user on start-up.
Leo-III implements a very naive time slicing approach in which at most three manually
fixed parameter configurations are used, one after the other.
In practice, this time slicing hardly ever kicks in, and Leo-III will just run with its
default parameter setting.
Implementation
Leo-III utilizes and instantiates the associated
LeoPARD system platform
[WSB15]
for higher-order (HO) deduction systems implemented in Scala (currently using
Scala 2.13 and running on a JVM with Java >= 8).
The prover makes use of LeoPARD's data structures
and implements its own reasoning logic on top.
A hand-crafted parser is provided that supports all TPTP syntax dialects.
It converts its produced concrete syntax tree to an internal TPTP AST data
structure which is then transformed into polymorphically typed lambda terms.
As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF,
TFF, THF) as well as their polymorphic variants
[BP13,
KRS16].
Since version 1.6.X (X >= 0), Leo-III also accepts non-classical problem input
represented in non-classical TPTP, see
https://tptp.org/NonClassicalLogic/
The term data structure of Leo-III uses a polymorphically typed spine term
representation augmented with explicit substitutions and De Bruijn-indices.
Furthermore, terms are perfectly shared during proof search, permitting
constant-time equality checks between alpha-equivalent terms.
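A skeletal rendition of such a representation, transposed into C++ purely for illustration
(Leo-III itself is Scala, and these names are hypothetical):

    #include <memory>
    #include <vector>

    struct STerm;
    using Ref = std::shared_ptr<const STerm>;

    // Spine notation: a head applied to a spine of arguments; bound variables
    // are De Bruijn indices, so alpha-equivalent terms look identical.
    struct STerm {
        enum Kind { Bound, Const, Lambda, App } kind;
        int index = 0;            // De Bruijn index (Bound) or symbol id (Const)
        Ref body;                 // Lambda body
        Ref head;                 // App head
        std::vector<Ref> spine;   // App arguments
    };

    // With perfect sharing on top of this representation, checking
    // alpha-equivalence degenerates to a pointer comparison.
    bool alpha_equal(const Ref& a, const Ref& b) { return a == b; }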
Leo-III's saturation procedure may at any point invoke external reasoning tools.
To that end, Leo-III includes an encoding module which translates (polymorphic)
higher-order clauses to polymorphic and monomorphic typed first-order clauses,
whichever is supported by the external system.
While LEO-II relied on cooperation with untyped first-order provers, Leo-III
exploits the native type support of first-order provers (TFF logic) to remove
clutter during translation and, in turn, achieve higher effectiveness of
external cooperation.
Leo-III is available on GitHub:
https://github.com/leoprover/Leo-III
Expected Competition Performance
Version 1.7.15 is, for all intents and purposes of CASC, equivalent to the version from last
year except that some minor bugs have been fixed, and the support for reasoning in various
quantified non-classical logics (not relevant to CASC) was improved.
We do not expect Leo-III to be strongly competitive against more recent
higher-order provers as Leo-III does not implement several standard features
of effective systems (including time slicing and proper axiom selection).
Prover9 1109a
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with
equality.
Its overall architecture is very similar to that of Otter-3.3
[McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting
and for other inference operations (sometimes called the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative
ordered (and nonordered) resolution, factoring, positive and negative hyperresolution,
UR-resolution, and demodulation (term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, a proof of the transformation to clauses is not given.
Completeness is not guaranteed, so termination does not indicate satisfiability.
Strategies
Prover9 has available many strategies; the following statements apply to CASC.
Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic
properties of the input clauses such as the presence of equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a
simple rule for determining symbol precedence.
For the FOF problems, a preprocessing step attempts to reduce the problem to independent
subproblems by a miniscope transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search procedure; if the problem reduction
fails, the original problem is clausified and given to the search procedure.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP
[McC97].
(LADR has some AC functions, but Prover9 does not use them).
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules
and subsuming units, and FPA/path indexing for finding subsumed units, rewritable terms, and
resolvable literals.
Feature vector indexing
[Sch04]
is used for forward and backward nonunit subsumption.
Prover9 is available from
http://www.cs.unm.edu/~mccune/prover9/
Expected Competition Performance
Prover9 is the CASC fixed point, against which progress can be judged.
Each year it is expected to do worse than the previous year, relative to the other systems.
Twee 2.4.2
Nick Smallbone
Chalmers University of Technology, Sweden
Architecture
Twee 2.4.2
[Sma21]
is a theorem prover for unit equality problems based on unfailing completion
[BDP89].
It implements a DISCOUNT loop, where the active set contains rewrite rules (and unorientable
equations) and the passive set contains critical pairs.
The basic calculus is not goal-directed, but Twee implements a transformation which improves goal
direction for many problems.
Twee features ground joinability testing
[MN90]
and a connectedness test
[BD88],
which together eliminate many redundant inferences in the presence of unorientable equations.
The ground joinability test performs case splits on the order of variables, in the style of
[MN90],
and discharges individual cases by rewriting modulo a variable ordering.
Strategies
Twee's strategy is simple and it does not tune its heuristics or strategy based on the input
problem.
The term ordering is always KBO; by default, functions are ordered by number of occurrences and
have weight 1.
The proof loop repeats the following steps:
- Select and normalise the lowest-scored critical pair, and if it is not redundant, add it as
a rewrite rule to the active set.
- Normalise the active rules with respect to each other.
- Normalise the goal with respect to the active rules.
Each critical pair is scored using a weighted sum of the weights of its two terms.
Terms are treated as DAGs when computing weights, i.e., duplicate subterms are counted only once
per term.
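The DAG-based weight can be computed by counting each shared node once; the sketch below
assumes subterms are physically shared so that node identity detects duplicates:

    #include <set>
    #include <vector>

    struct Term {
        int weight = 1;                       // symbol weight (default 1)
        std::vector<const Term*> args;        // shared subterms form a DAG
    };

    int dag_weight(const Term* t, std::set<const Term*>& seen) {
        if (!seen.insert(t).second) return 0; // duplicate subterm: count once
        int w = t->weight;
        for (const Term* a : t->args) w += dag_weight(a, seen);
        return w;
    }

    int dag_weight(const Term* t) {
        std::set<const Term*> seen;
        return dag_weight(t, seen);
    }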
For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using
different parameters (e.g., with the goal-directed transformation on or off).
Implementation
Twee is written in Haskell.
Terms are represented as array-based flatterms for efficient unification and matching.
Rewriting uses a perfect discrimination tree.
The passive set is represented compactly (12 bytes per critical pair) by storing only the
information needed to reconstruct the critical pair, not the critical pair itself.
Because of this, Twee can run for an hour or more without exhausting memory.
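For illustration, one hypothetical 12-byte layout with this flavour (Twee is written in
Haskell; this is not its actual encoding): two rule identifiers and an overlap position
suffice to recreate the critical pair from the active set on demand.

    #include <cstdint>

    struct Passive {
        std::int32_t rule1;      // id of the first overlapping rule
        std::int32_t rule2;      // id of the second rule
        std::int16_t position;   // overlap position in the second rule
        std::int16_t score;      // heuristic score for queue ordering
    };
    static_assert(sizeof(Passive) == 12, "12 bytes per critical pair");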
Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object
which traces back to the input axioms.
When a conjecture is proved, the proof object is transformed into a human-readable proof.
Proof construction does not harm efficiency because the proof kernel is invoked only when a new
rule is accepted.
In particular, reasoning about the passive set does not invoke the kernel.
Twee can be downloaded as open source from:
https://nick8325.github.io/twee
Expected Competition Performance
Twee 2.4.2 is the CASC-29 UEQ winner.
Twee 2.5.0
Nick Smallbone
Chalmers University of Technology, Sweden
Architecture
Twee 2.5.0
[Sma21]
is a theorem prover for unit equality problems based on unfailing completion
[BDP89].
It implements a DISCOUNT loop, where the active set contains rewrite rules (and unorientable
equations) and the passive set contains critical pairs.
The basic calculus is not goal-directed, but Twee implements a transformation which improves goal
direction for many problems.
Twee features ground joinability testing
[MN90]
and a connectedness test
[BD88],
which together eliminate many redundant inferences in the presence of unorientable equations.
The ground joinability test performs case splits on the order of variables, in the style of
[MN90],
and discharges individual cases by rewriting modulo a variable ordering.
New this year is a mode which rewrites backwards from the goal instead of enumerating critical
pairs, but it is still rather rough.
Strategies
Twee's strategy is simple and it does not tune its heuristics or strategy based on the input
problem.
The term ordering is always KBO; by default, functions are ordered by number of occurrences and
have weight 1.
The proof loop repeats the following steps:
- Select and normalise the lowest-scored critical pair, and if it is not redundant, add it as
a rewrite rule to the active set.
- Normalise the active rules with respect to each other.
- Normalise the goal with respect to the active rules.
Each critical pair is scored using a weighted sum of the weights of its two terms.
Terms are treated as DAGs when computing weights, i.e., duplicate subterms are counted only once
per term.
For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using
different parameters (e.g., with the goal-directed transformation on or off).
Implementation
Twee is written in Haskell.
Terms are represented as array-based flatterms for efficient unification and matching.
Rewriting uses a perfect discrimination tree.
The passive set is represented compactly (12 bytes per critical pair) by storing only the
information needed to reconstruct the critical pair, not the critical pair itself.
Because of this, Twee can run for an hour or more without exhausting memory.
Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object
which traces back to the input axioms.
When a conjecture is proved, the proof object is transformed into a human-readable proof.
Proof construction does not harm efficiency because the proof kernel is invoked only when a new
rule is accepted.
In particular, reasoning about the passive set does not invoke the kernel.
Twee can be downloaded as open source from:
https://nick8325.github.io/twee
Expected Competition Performance
Similar to last year, but we hope the new goal-directed mode might solve a few interesting
problems.
Vampire 4.8
Michael Rawson
TU Wien, Austria
There have been a number of changes and improvements since Vampire 4.7, although it is still the
same beast.
Most significant from a competition point of view are long-awaited refreshed strategy schedules.
As a result, several features present in previous competitions will now come into full force,
including new rules for the evaluation and simplification of theory literals.
A large number of completely new features and improvements also landed this year: highlights
include a significant refactoring of the substitution tree implementation, the arrival of
encompassment demodulation in Vampire, and support for parametric datatypes.
Vampire's higher-order support has also been re-implemented from the ground up.
The new implementation is still at an early stage and its theoretical underpinnings are being
developed.
There is currently no documentation of either.
Architecture
Vampire
[KV13]
is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic.
Vampire implements the calculi of ordered binary resolution, and superposition for handling equality.
It also implements the Inst-gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,
RB+16].
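The splitting criterion itself is easy to state: literals that share variables must stay
together, so a clause splits into variable-disjoint components, each named by a propositional
variable for the SAT solver. A toy version of the component computation (not Vampire's
implementation):

    #include <algorithm>
    #include <numeric>
    #include <set>
    #include <vector>

    struct Literal { std::set<int> vars; };

    static int find(std::vector<int>& p, int x) {
        while (p[x] != x) x = p[x] = p[p[x]];   // union-find with path halving
        return x;
    }

    // Group literal indices into variable-disjoint components.
    std::vector<std::vector<int>> split(const std::vector<Literal>& clause) {
        int k = static_cast<int>(clause.size());
        std::vector<int> p(k);
        std::iota(p.begin(), p.end(), 0);
        for (int i = 0; i < k; ++i)
            for (int j = i + 1; j < k; ++j)
                for (int v : clause[i].vars)
                    if (clause[j].vars.count(v)) { p[find(p, i)] = find(p, j); break; }
        std::vector<std::vector<int>> comps(k);
        for (int i = 0; i < k; ++i) comps[find(p, i)].push_back(i);
        comps.erase(std::remove_if(comps.begin(), comps.end(),
                                   [](const std::vector<int>& c) { return c.empty(); }),
                    comps.end());
        return comps;
    }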
A number of standard redundancy criteria and simplification techniques are used for pruning the
search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered
unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major operations on sets of
terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during preprocessing
[RSV16].
Vampire implements many useful preprocessing transformations including the SinE axiom selection
algorithm.
When a theorem is proved, the system produces a verifiable proof, which validates both the
clausification phase and the refutation of the CNF.
Strategies
Vampire 4.8 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred for inference
selection.
This has been extended with a layered clause selection approach
[GS20].
- Set-of-support strategy with extensions for theory reasoning.
- For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions
[RSV21].
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches
[RB+16].
- Specialised theory instantiation and unification
[RSV18].
- Extensionality resolution with detection of extensionality axioms.
The schedule for the new HOL implementation was developed using Snake, a strategy schedule
construction tool described in more
detail last year.
The Snake schedule this year fully embraces Vampire's randomisation support
[Sud22]
and, in particular, every strategy independently shuffles the input problem, to nullify (in
expectation) the effect of problem scrambling done by the organisers.
Implementation
Vampire 4.8 is implemented in C++.
It makes use of fixed versions of Minisat and Z3.
See the website for more information and access to
the GitHub repository.
Expected Competition Performance
Vampire 4.8 is the CASC-29 THF, TFA, TFN, and FOF winner.
Vampire 4.9
Michael Rawson
TU Wien, Austria
There have been a number of improvements since Vampire 4.8, although it is still the same beast.
For the first time this year, Vampire's schedules were constructed mostly using the Snake strategy
selection tool, although a return of the traditional Spider is still possible in future.
Improvements from the past year include:
- A runtime-specialised version of unidirectional term ordering checks.
- Improvements to unification with abstraction.
- Surprising improvements to Vampire's basic routines such as renaming and unification.
- A simple interactive mode.
- Revitalisation of code trees.
- Experimental features not yet fully understood, mostly aimed at unit-equational reasoning.
- Portability: Vampire is much more standards-compliant and portable than previously, with
much-reduced dependence on platform-specific APIs and hardware architectures, aided by C++17.
Vampire's higher-order support remains very similar to last year, although a re-implementation
intended for mainline Vampire is already underway.
Architecture
Vampire
[KV13]
is an automatic theorem prover for first-order logic with extensions to theory-reasoning and
higher-order logic.
Vampire implements the calculi of ordered binary resolution, and superposition for handling
equality.
It also implements a MACE-style finite model builder for finding finite counter-examples
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a
SAT or SMT solver to make splitting decisions
[Vor14,
RB+16].
A number of standard redundancy criteria and simplification techniques are used for pruning the
search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered
unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major operations on sets of
terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during preprocessing
[RSV16].
Vampire implements many useful preprocessing transformations including the SInE axiom selection
algorithm.
When a theorem is proved, the system produces a verifiable proof, which validates both the
clausification phase and the refutation of the CNF.
Strategies
Vampire 4.9 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03]
- DISCOUNT loop
- Otter loop
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred for inference
selection.
This has been extended with a layered clause selection approach
[GS20].
- Set-of-support strategy with extensions for theory reasoning.
- For theory-reasoning:
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions
[RSV21].
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches
[RB+16].
- Specialised theory instantiation and unification
[RSV18].
- Extensionality resolution with detection of extensionality axioms.
Implementation
Vampire 4.9 is implemented in C++.
It makes use of fixed versions of Minisat and Z3.
See the GitHub repository and associated wiki
for more information.
Expected Competition Performance
Vampire 4.9 should be an improvement on the previous version.
A reasonably strong performance across all divisions is therefore expected.
In the higher-order divisions, performance should be the same as last year.
Zipperposition 2.1.9999
Jasmin Blanchette
Ludwig-Maximilians-Universität München, Germany
Architecture
Zipperposition is a superposition-based theorem prover for typed first-order
logic with equality and for higher-order logic.
It is a pragmatic implementation of a complete calculus for full higher-order
logic
[BB+21].
It features a number of extensions that include polymorphic types, user-defined
rewriting on terms and formulas ("deduction modulo theories"), a lightweight
variant of AVATAR for case splitting
[EBT21],
and Boolean reasoning
[VN20].
The core architecture of the prover is based on saturation with an extensible
set of rules for inferences and simplifications.
Zipperposition uses a full higher-order unification algorithm that enables
efficient integration of procedures for decidable fragments of higher-order
unification
[VBN20].
The initial calculus and main loop were imitations of an earlier version of E
[Sch02].
With the implementation of higher-order superposition, the main loop had to be
adapted to deal with possibly infinite sets of unifiers
[VB+21].
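One way to picture the adapted loop: unifiers arrive as lazy streams, and the prover draws a
bounded number from each pending stream per iteration, so no single infinite stream starves
the rest of the search. A sketch under these assumptions (not Zipperposition's actual OCaml
mechanism):

    #include <deque>
    #include <functional>
    #include <optional>
    #include <vector>

    struct Substitution { };                   // placeholder payload
    using UnifierStream = std::function<std::optional<Substitution>()>;

    // Round-robin over possibly infinite unifier streams: draw at most k
    // unifiers from each, and requeue streams that may still produce more.
    std::vector<Substitution> draw(std::deque<UnifierStream>& pending, int k) {
        std::vector<Substitution> out;
        int n = static_cast<int>(pending.size());
        for (int i = 0; i < n; ++i) {
            UnifierStream s = std::move(pending.front());
            pending.pop_front();
            bool alive = true;
            for (int j = 0; j < k; ++j) {
                std::optional<Substitution> u = s();
                if (!u) { alive = false; break; }      // stream exhausted
                out.push_back(*u);
            }
            if (alive) pending.push_back(std::move(s));
        }
        return out;
    }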
Strategies
The system uses various strategies in a portfolio.
The strategies are run in parallel, making use of all CPU cores available.
We designed the portfolio of strategies by manual inspection of TPTP problems.
Zipperposition's heuristics are inspired by efficient heuristics used in E.
Various calculus extensions are used by the strategies
[VB+21].
The portfolio mode distinguishes between first-order and higher-order problems.
If the problem is first-order, all higher-order prover features are turned off.
In particular, the prover uses standard first-order superposition calculus
and disables collaboration with the backend prover (described below).
Other than that, the portfolio is static and does not depend on the syntactic
properties of the problem.
Implementation
The prover is implemented in OCaml.
Term indexing is done using fingerprints for unification, perfect discrimination trees for
rewriting, and feature vectors for subsumption.
Some inference rules such as contextual literal cutting make heavy use of subsumption.
For higher-order problems, some strategies use the E prover as an end-game backend prover.
Zipperposition's code can be found at
https://github.com/sneeuwballen/zipperposition
and is entirely free software (BSD-licensed).
Zipperposition can also output graphic proofs using graphviz.
Some tools to perform type inference and clausification for typed formulas are also provided, as
well as a separate library for dealing with terms and formulas
[Cru15].
Expected Competition Performance
The prover is expected to perform well on THF, about as well as last year's version.
We expect to beat E.