https://github.com/BartoszPiotrowski/ATPboost
CSE 1.3
Feng Cao
Southwest Jiaotong University, China
Architecture
CSE 1.3 is a developed prover based on the last version of CSE 1.2.
It is an automated theorem prover for first-order logic without equality
mainly based on a novel inference mechanism, called as Contradiction
Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18],
which is able to handle multiple (two or more) clauses dynamically in a
synergized way in one deduction step, while binary resolution is its special
case.
CSE 1.3 also adopts conventional factoring, equality resolution (ER rule),
and variable renaming.
Some pre-processing techniques, including pure literal deletion and
simplification based on the distance to the goal clause, and a number of
standard redundancy criteria for pruning the search space: tautology deletion,
subsumption (forward and backward) are applied as well.
CSE 1.3 has been improved, compared with CSE 1.2, mainly from the following aspects:
CSE_E 1.2
Feng Cao
Southwest Jiaotong University, China
Architecture
CSE_E 1.2 is an automated theorem prover for first-order logic by combining
CSE 1.3 and E 2.4, where CSE is based on the Contradiction Separation Based
Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18]
and E is mainly based on superposition.
The combination mechanism is like this: E and CSE are applied to the given
problem sequentially.
If either prover solves the problem, then the proof process completes.
If neither CSE nor E can solve the problem, some inferred clauses, especially
unit clauses, by CSE will be fed to E as lemmas, along with the original
clauses, for further proof search.
This kind of combination is expected to take advantage of both CSE and E,
and produce a better performance.
Concretely, CSE is able to generate a good number of unit clauses, based on
the fact that unit clauses are helpful for proof search and equality handling.
On the other hand, E has a good ability on equality handling.
CVC4 1.8
Andrew Reynolds
University of Iowa, USA
Architecture
CVC4
[BC+11]
is an SMT solver based on the DPLL(T) architecture
[NOT06]
that includes built-in support for many theories, including linear arithmetic,
arrays, bit vectors, datatypes, finite sets and strings.
It incorporates approaches for handling universally quantified formulas.
For problems involving free function and predicate symbols, CVC4 primarily
uses heuristic approaches based on conflict-based instantiation and E-matching
for theorems, and finite model finding approaches for non-theorems.
For problems in pure arithmetic,
CVC4 uses techniques for counterexample-guided quantifier instantiation
[RKK17].
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in CVC4 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".
CVC4 has native support for problems in higher-order logic, as described in recent work [BR+19]. It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via lambda lifting. The approach extends the theory solver for quantifier-free uninterpreted functions (UF) and E-matching. For the former, the theory solver for UF in CVC4 now handles equalities between functions using an extensionality inference. Partial applications of functions are handle using a (lazy) applicative encoding where some function applications are equated to the applicative encoding. For the latter, several of the data structures for E-matching have been modified to incorporate matching in the presence of equalities between functions, function variables, and partial function applications.
https://github.com/CVC4
E 2.4
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E 2.4pre
[Sch02,
Sch13,
SCV19]
is a purely equational theorem prover for many-sorted first-order logic with
equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an
instance of the superposition calculus with negative literal selection and
a number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e.,
a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
As of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state for
propositional unsatisfiability.
For the LTB divisions, a control program uses a SInE-like analysis to extract reduced axiomatizations that are handed to several instances of E. E will not use on-the-fly learning this year.
For CASC-27, E implements a strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, ...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.
About 230 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.2.0. In addition, we have explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on all 1193 UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.
https://www.eprover.org
E 2.5
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E 2.5pre
[Sch02,
Sch13,
SCV19]
is a purely equational theorem prover for many-sorted first-order logic with
equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a number of
redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
As of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state
for propositional unsatisfiability.
For the LTB divisions, a control program uses a SInE-like analysis to extract reduced axiomatizations that are handed to several instances of E. E will not use on-the-fly learning this year.
For CASC-J10, E implements a strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, possibly presence of certain axiom patterns, ...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.
About 130 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.3.0. We have also explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.
https://www.eprover.org
Enigma 0.5.1
Jan Jakubuv
Czech Technical University in Prague, Czech Republic
Architecture
ENIGMA (Efficient learNing-based Inference Guiding MAchine)
[JU17,JU18,JU19,GJU19,CJ+19,JC+20]
is an efficient implementation of learning-based guidance for given clause
selection in saturation-based automated theorem provers.
Clauses from many proof searches are classified as positive and negative based
on their participation in the proofs.
An efficient classification model is trained on this data, using fast
feature-based characterization of the clauses.
The learned model is then tightly linked with the core prover and used as a
basis of a parameterized evaluation heuristic that provides fast ranking of
all generated clauses.
ENIGMA 0.4 uses E as its underlying prover.
The CASC-27 FOF competition version will most likely use a classifier based
on gradient-boosted trees (XGBoost or LightGBM) and clause features that
abstract from symbol and formula/clause names.
We may also include neural classifiers based on clause structure.
The system will be pre-trained on the latest public TPTP library and also use
a portfolio of strategies pre-trained for TPTP with BliStr(Tune)
[Urb13,JU17].
The system may also include strategies for large TPTP problems used previously
in the E.T. system.
https://github.com/ai4reason/enigmatic
Etableau 0.2
John Hester
University of Florida, USA
Architecture
Etableau 0.2 is an automated theorem prover built alongside E
[Sch13],
using it as a library.
Etableau at a basic level implements the clausal connection calculus, with
enforced regularity, folding up, local unification, and other improvements.
The greatest distinguishing feature of Etableau is a novel calculus using E's
saturation to close local branches of tableaux.
If a branch is local (shares no variables with other branches of the tableau)
and cannot be closed by other means, the branch is saturated as an independent
problem.
If a contradiction is found, this local branch can now be marked as closed.
If a local branch is found to be satisfiable, the problem is satisfiable.
https://github.com/hesterj/E-TAB
GKC 0.5.1
Tanel Tammet
Tallinn University of Technology, Estonia
Architecture
GKC
[Tam19]
is a resolution prover optimized for search in large knowledge bases.
It is used as a foundation (GK Core) for building a common-sense reasoner GK.
In particular, GK can handle inconsistencies and perform probabilistic and
nonmonotonic reasoning, see
[Tam20].
We envision natural language question answering systems as the main potential
application for these specialized methods.
These standard inference rules have been implemented in GKC:
GKC splits the multiple strategies it decides to try between several forked instances. For the competition the plan is to use eight forks. Each fork runs a subset of strategies sequentially.
We perform the selection of a given clause by using several queues in order to spread the selection relatively uniformly over these categories of derived clauses and their descendants: axioms, external axioms, assumptions and goals. The queues are organized in two layers. As a first layer we use the common ratio-based algorithm of alternating between selecting n clauses from a weight-ordered queue and one clause from the FIFO queue with the derivation order. As a second layer we use four separate queues based on the derivation history of a clause. Each queue in the second layer contains the two sub-queues of the first layer.
GKC only looks for proofs and does not try to show non-provability.
GKC can be obtained from
https://github.com/tammet/gkc/
iProver 3.3
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver
[Kor08]
is an automated theorem prover based on an instantiation calculus Inst-Gen
[Kor13,
GK03],
which is complete for first-order logic.
iProver approximates first-order clauses using propositional abstractions
which are solved using MiniSAT
[ES04]
and refined using model-guided instantiations.
iProver combines instantiation, resolution and superposition
[DK20]
and is extended with a general abstraction-refinement framework for under-and
over-approximations
[HK18,HK19].
Recent features in iProver include:
http://www.cs.man.ac.uk/~korovink/iprover/
lazyCoP 0.1
Michael Rawson
University of Manchester, United Kingdom
Architecture
lazyCoP 0.1 is a connection-tableaux system for first-order logic with
equality.
It implements the lazy paramodulation calculus described in
[Pas08],
with some additional inferences such as "shortcut" strict rules and equality
lemmas.
The system implements well-known refinements of the predicate connection
calculus, such as tautology elimination and strong regularity, and these are
lifted to equalities where appropriate.
The resulting system appears to be complete, but we make no theoretical
claim one way or another.
The system was originally conceived to efficiently accommodate a machine-learned heuristic guidance system: this system is not yet guided in this way, but learned heuristics are intended for a future version.
https://github.com/MichaelRawson/lazycop
leanCoP 2.2
Jens Otten
University of Oslo, Norway
Architecture
leanCoP
[OB03,
Ott08]
is an automated theorem prover for classical first-order logic with equality.
It is a very compact implementation of the connection (tableau) calculus
[Bib87,
LS01].
leanCoP can read formulae in leanCoP syntax and in TPTP first-order syntax. Equality axioms and axioms to support distinct objects are automatically added if required. The leanCoP core prover returns a very compact connection proof, which is then translated into a more comprehensive output format, e.g., into a lean (TPTP-style) connection proof or into a readable text proof.
The source code of leanCoP 2.2 is available under the GNU general public license. It can be downloaded from the leanCoP website at:
http://www.leancop.deThe website also contains information about ileanCoP [Ott08] and MleanCoP [Ott12, Ott14], two versions of leanCoP for first-order intuitionistic logic and first-order modal logic, respectively.
LEO-II 1.7.0
Alexander Steen
University of Luxembourg, Luxembourg
Architecture
LEO-II
[BP+08],
the successor of LEO
[BK98],
is a higher-order ATP system based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution
[Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E
[Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is available from
http://www.leoprover.org
Leo-III 1.4
Alexander Steen
University of Luxembourg, Luxembourg
Architecture
Leo-III
[SB18]
[SB18],
the successor of LEO-II
[BP+08]
is a higher-order ATP system based on extensional higher-order paramodulation
with inference restrictions using a higher-order term ordering.
The calculus contains dedicated extensionality rules and is augmented with
equational simplification routines that have their intellectual roots in
first-order superposition-based theorem proving.
The saturation algorithm is a variant of the given clause loop procedure
inspired by the first-order ATP system E.
Leo-III cooperates with external first-order ATPs which are called asynchronously during proof search; a focus is on cooperation with systems that support typed first-order (TFF) input. For this year's CASC, CVC4 [BC+11] and E [Sch02, Sch13] are used as external systems. However, cooperation is in general not limited to first-order systems. Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or counter model generators) may be included using simple command-line arguments. If the saturation procedure loop (or one of the external provers) finds a proof, the system stops, generates the proof certificate and returns the result.
For the LTB division, Leo-III is augmented by an external Python3 driver which schedules Leo-III of the batches.
The term data structure of Leo-III uses a polymorphically typed spine term representation augmented with explicit substitutions and De Bruijn-indices. Furthermore, terms are perfectly shared during proof search, permitting constant-time equality checks between alpha-equivalent terms.
Leo-III's saturation procedure may at any point invoke external reasoning tools. To that end, Leo-III includes an encoding module which translates (polymorphic) higher-order clauses to polymorphic and monomorphic typed first-order clauses, whichever is supported by the external system. While LEO-II relied on cooperation with untyped first-order provers, Leo-III exploits the native type support in first-order provers (TFF logic) for removing clutter during translation and, in turn, higher effectivity of external cooperation.
Leo-III is available on GitHub:
https://github.com/leoprover/Leo-III
Leo-III 1.5
Alexander Steen
University of Luxembourg, Luxembourg
Architecture
Leo-III
[SB18],
the successor of LEO-II
[BP+08],
is a higher-order ATP system based on extensional higher-order paramodulation
with inference restrictions using a higher-order term ordering.
The calculus contains dedicated extensionality rules and is augmented with
equational simplification routines that have their intellectual roots in
first-order superposition-based theorem proving.
The saturation algorithm is a variant of the given clause loop procedure
inspired by the first-order ATP system E.
Leo-III cooperates with external first-order ATPs that are called
asynchronously during proof search; a focus is on cooperation with systems
that support typed first-order (TFF) input.
For this year's CASC, CVC4
[BC+11]
and E
[Sch02,Sch13]
are used as external systems.
However, cooperation is in general not limited to first-order systems.
Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or
counter model generators) may be included using simple command-line arguments.
If the saturation procedure loop (or one of the external provers) finds a
proof, the system stops, generates the proof certificate and returns the
result.
For the LTB division, Leo-III is augmented by an external Python3 driver which schedules Leo-III on the batches.
The term data structure of Leo-III uses a polymorphically typed spine term representation augmented with explicit substitutions and De Bruijn-indices. Furthermore, terms are perfectly shared during proof search, permitting constant-time equality checks between alpha-equivalent terms.
Leo-III's saturation procedure may at any point invoke external reasoning tools. To that end, Leo-III includes an encoding module which translates (polymorphic) higher-order clauses to polymorphic and monomorphic typed first-order clauses, whichever is supported by the external system. While LEO-II relied on cooperation with untyped first-order provers, Leo-III exploits the native type support in first-order provers (TFF logic) for removing clutter during translation and, in turn, higher effectivity of external cooperation.
Leo-III is available on GitHub:
https://github.com/leoprover/Leo-III
Also in the LTB mode, there are no major novelties: only some timing parameters have been changed compared to last year. Stemming from Leo-III's support for polymorphic HOL reasoning, we expect a reasonable performance compared to the other systems. On the other hand, Leo-III's LTB mode does not do any learning and/or analysis of the learning samples.
MaLARea 0.9
Josef Urban
Czech Technical University in Prague, Czech Republic
Architecture
MaLARea 0.8
[Urb07,US+08,KUV15]
is a metasystem for ATP in large theories where symbol and formula names are
used consistently.
It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace), as
well as complementary AI techniques like machine learning
based on symbol-based similarity, model-based similarity, term-based
similarity, and obviously previous successful proofs.
The version for CASC-27 will use the E prover with the BliStr(Tune)
[Urb13,JU17]
large-theory strategies, possibly also Prover9, Mace and Paradox.
The premise selection methods will likely also use the distance-weighted
k-nearest neighbor
[KU13],
ATPBoost
[PU18],
and E's implementation of SInE.
https://github.com/JUrban/MPTP2/tree/master/MaLAReaThe metasystem's Perl code is released under GPL2.
Prover9 1109a
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3
[McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.
Completeness is not guaranteed, so termination does not indicate satisfiability.
Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.
For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.
For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.
http://www.cs.unm.edu/~mccune/prover9/
PyRes 1.3
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
PyRes
[SP20]
is a simple resolution-style theorem prover for first-order logic, implemented
in very clear and well-commented Python.
It has been written as a pedagogical tool to illustrate the architecture and
basic algorithms of a saturation-style theorem prover.
The prover consists of a parser for (most of) TPTP-3 format, a simple
clausifier to convert full first-order format into clause normal form, and a
saturation core trying to derive the empty clause from the resulting clause
set.
The saturation core is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. It implements simple binary resolution and factoring [Rob65], optionally with selection of negative literals [BG+01]. Redundancy elimination is restricted to forward and backward subsumption and tautology deletion. There are no inference rules for equality - if equality is detected, the necessary axioms are added.
One of the changes compared to last years version is the introduction of simple indices for unification (which now uses top symbol hashing) and subsumption. Subsumption now uses a new, simple technique we call predicate abstraction indexing, which represents a clause as an ordered sequence of the predicate symbols of its literals. PyRes builds a proof object on the fly, and can print a TPTP-3 style proof or saturation derivation.
The system source is available at:
https://github.com/eprover/PyRes
Satallax 3.4
Michael Färber
ENS Paris-Saclay, France
Architecture
Satallax 3.4
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates lambda-free higher-order logic (lfHOL) formulae in addition to the propositional clauses [VB+19]. If this option is used, then Satallax periodically calls the theorem prover E [Sch13] to test for lfHOL unsatisfiability. If the set of lfHOL formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format.
http://cl-informatik.uibk.ac.at/~mfaerber/satallax.html
Satallax 3.5
Michael Färber
ENS Paris-Saclay, France
Architecture
Satallax
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church’s
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat to test the current set of propositional
clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Optionally, Satallax generates lambda-free higher-order logic (lfHOL) formulae
in addition to the propositional clauses
[VB+19].
If this option is used, then Satallax periodically calls the theorem prover E
[Sch13]
to test for lfHOL unsatisfiability.
If the set of lfHOL formulae is unsatisfiable, then the original branch is
unsatisfiable.
Upon request, Satallax attempts to reconstruct a proof which can be output
in the TSTP format.
Satallax is available at:
http://cl-informatik.uibk.ac.at/~mfaerber/satallax.html
Twee 2.2.1
Nick Smallbone
Chalmers University of Technology, Sweden
Architecture
Twee 2.2.1 is an equational prover based on unfailing completion
[BDP89].
It features ground joinability testing
[MN90]
and a connectedness test
[BD88],
which together eliminate many redundant inferences in the presence of
unorientable equations.
Twee's implementation of ground joinability testing performs case splits on the order of variables, in the style of [MN90], and discharges individual cases by rewriting modulo a variable ordering. The case splitting strategy chooses only useful case splits, which prevents the number of cases from blowing up.
Horn clauses are encoded as equations as described in [CS18]. The CASC version of Twee "handles" non-Horn clauses by discarding them.
The main loop is a DISCOUNT loop. The active set contains rewrite rules and unorientable equations, which are used for rewriting, and the passive set contains unprocessed critical pairs. Twee often interreduces the active set, and occasionally simplifies the passive set with respect to the active set. Each critical pair is scored using a weighted sum of the weight of both of its terms. Terms are treated as DAGs when computing weights, i.e., duplicate subterms are only counted once per term. The weights of critical pairs that correspond to Horn clauses are adjusted by the heuristic described in [CS18], section 5.
For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using different parameters.
The passive set is represented compactly (12 bytes per critical pair) by storing only the information needed to reconstruct the critical pair, not the critical pair itself. Because of this, Twee can run for an hour or more without exhausting memory.
Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object which traces back to the input axioms. When a conjecture is proved, the proof object is transformed into a human-readable proof. Proof construction does not harm efficiency because the proof kernel is invoked only when a new rule is accepted. In particular, reasoning about the passive set does not invoke the kernel. The translation from Horn clauses to equations is not yet certified.
Twee can be downloaded from:
http://nick8325.github.io/twee
There are no major changes to the main part of Vampire since 4.4, beyond
some new proof search heuristics and new default values for some options.
The biggest addition is support for higher-order reasoning via translation
to applicative form and combinators, addition of axioms and extra inference
rules, and a new form of combinatory unification.
Vampire 4.4
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13]
4.4 is an automatic theorem prover for first-order logic with extensions to
theory-reasoning and higher-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,
RB+16].
Both resolution and instantiation based proof search make use of global
subsumption.
A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing. Vampire implements many useful preprocessing transformations including the SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF. Vampire 4.4 provides a very large number of options for strategy selection. The most important ones are:
Vampire 4.5
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13]
4.5 is an automatic theorem prover for first-order logic with extensions to
theory-reasoning and higher-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,RB+16].
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and
code tree indexes are used to implement all major operations on sets of terms,
literals and clauses.
Internally, Vampire works only with clausal normal form. Problems in the full
first-order logic syntax are clausified during preprocessing
[RSV16].
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.
https://vprover.github.io/for more information and access to the GitHub repository.
Zipperposition 2.0
Petar Vukmirović
Vrije Universiteit Amsterdam, The Netherlands
Architecture
Zipperposition is a superposition-based theorem prover for typed first-order
logic with equality and higher-order logic. It is a pragmatic implementation of
a complete calculus for Boolean-free higher-order logic
[BB+19].
It features a number of extensions that include polymorphic types; user-defined
rewriting on terms and formulas ("deduction modulo theories"); a lightweight
variant of AVATAR for case splitting; boolean reasoning
[VN20].
The core architecture of the prover is based on saturation with an extensible
set of rules for inferences and simplifications.
Zipperposition uses a recently developed full higher-order unification
algorithm that enables efficient integration of procedures for decidable
fragments of higher-order unification
[VBN20].
The initial calculus and main loop were imitations of an old version of E
[Sch02],
but there are many more rules nowadays.
A summary of the calculus for integer arithmetic and induction can be found in
[Cru15].
https://github.com/sneeuwballen/zipperpositionand is entirely free software (BSD-licensed). Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].