For CASC-29, E implements a two-stage multi-core strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, possibly the presence of certain axiom patterns...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: the first strategy assigned to a schedule is the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected by the number of problems it solves that were not already solved by a preceding strategy. The strategies are then scheduled onto the available cores and run in parallel.
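The greedy construction can be sketched in Python as below. This is a simplified illustration, not E's code: it assumes the experimental data for one problem class is given as a mapping from a (hypothetical) strategy name to the set of problems it solved, and it ignores the division into time slices.

```python
from typing import Dict, List, Set


def greedy_schedule(solved: Dict[str, Set[str]], slots: int) -> List[str]:
    """Greedily choose up to `slots` strategies for one problem class.

    `solved` maps each strategy to the set of problems of this class that it
    solved in the experiments.
    """
    schedule: List[str] = []
    covered: Set[str] = set()
    candidates = dict(solved)
    while candidates and len(schedule) < slots:
        # Most problems not solved by an earlier strategy; ties go to the
        # alphabetically first name so the result is deterministic.
        best = max(sorted(candidates), key=lambda s: len(candidates[s] - covered))
        gain = candidates.pop(best) - covered
        if not gain:
            break  # no remaining strategy solves anything new
        schedule.append(best)
        covered |= gain
    return schedule


experiments = {
    "s1": {"p1", "p2", "p3"},
    "s2": {"p1", "p2"},
    "s3": {"p4", "p5"},
}
print(greedy_schedule(experiments, slots=3))  # ['s1', 's3']
```

Strategy `s2` is never scheduled: it solves two problems, but both are already covered by `s1`, which is exactly the redundancy the greedy rule is meant to avoid.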
About 140 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.3.0. We have also explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.
https://www.eprover.org
Prover9 1109a
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").
Prover9 provides positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. The given clause is selected according to an age-weight ratio.
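A minimal sketch of age-weight ratio selection follows. The class name `GivenClauseSelector` and the parameters `age_part` and `weight_part` are illustrative assumptions, not Prover9's implementation or option names: in every cycle of `age_part + weight_part` selections, the oldest not-yet-given clause is taken `age_part` times and the lightest one `weight_part` times.

```python
import heapq
from itertools import count


class GivenClauseSelector:
    """Age-weight ratio selection: per cycle of age_part + weight_part picks,
    age_part picks take the oldest clause and weight_part the lightest."""

    def __init__(self, age_part=1, weight_part=4):
        self.age_part = age_part
        self.weight_part = weight_part
        self.by_age = []       # heap of (age, clause_id)
        self.by_weight = []    # heap of (weight, age, clause_id)
        self.pending = {}      # clause_id -> clause, for clauses not yet given
        self.ages = count()
        self.picks = 0

    def add(self, clause_id, clause, weight):
        age = next(self.ages)
        self.pending[clause_id] = clause
        heapq.heappush(self.by_age, (age, clause_id))
        heapq.heappush(self.by_weight, (weight, age, clause_id))

    def _pop(self, heap):
        # Entries already given via the other heap are discarded lazily here.
        while heap:
            clause_id = heapq.heappop(heap)[-1]
            if clause_id in self.pending:
                return clause_id, self.pending.pop(clause_id)
        return None

    def select(self):
        """Return (clause_id, clause) of the next given clause, or None."""
        if not self.pending:
            return None
        cycle = self.age_part + self.weight_part
        by_age = self.picks % cycle < self.age_part
        self.picks += 1
        return self._pop(self.by_age if by_age else self.by_weight)


sel = GivenClauseSelector(age_part=1, weight_part=2)
for cid, weight in [("c1", 5), ("c2", 2), ("c3", 1), ("c4", 3)]:
    sel.add(cid, f"clause {cid}", weight)
order = [sel.select()[0] for _ in range(4)]
print(order)  # ['c1', 'c3', 'c2', 'c4']
```

Taking some clauses by age keeps heavy but old clauses from being postponed forever, while weight-based picks favour small clauses that tend to lead to short proofs.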
Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, the proof does not include the transformation to clauses.
Completeness is not guaranteed, so termination does not indicate satisfiability.
Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.
For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.
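The Knuth-Bendix ordering can be sketched as follows for terms represented as nested tuples, where a plain string is a variable and `("a",)` is a constant. Two simplifying assumptions: every symbol and variable has weight 1, and the precedence is passed in explicitly as a list, because the precedence rule Prover9 uses is not described here.

```python
from collections import Counter
from typing import List, Tuple, Union

Term = Union[str, Tuple]  # a str is a variable; ("f", t1, ..., tn) applies f


def var_counts(t: Term) -> Counter:
    """Number of occurrences of each variable in t."""
    if isinstance(t, str):
        return Counter([t])
    total = Counter()
    for arg in t[1:]:
        total += var_counts(arg)
    return total


def weight(t: Term) -> int:
    """Term weight, assuming weight 1 for every symbol and variable."""
    return 1 if isinstance(t, str) else 1 + sum(weight(a) for a in t[1:])


def kbo_greater(s: Term, t: Term, precedence: List[str]) -> bool:
    """True if s > t in KBO; `precedence` lists symbols from greatest down."""
    if isinstance(s, str):
        return False  # a variable is never greater than another term
    vs = var_counts(s)
    if any(vs[x] < n for x, n in var_counts(t).items()):
        return False  # t has some variable more often than s
    ws, wt = weight(s), weight(t)
    if ws != wt:
        return ws > wt
    if isinstance(t, str):
        return False  # unreachable with unit weights, kept as a guard
    if s[0] != t[0]:
        return precedence.index(s[0]) < precedence.index(t[0])
    # Same head symbol: the first differing argument decides.
    for a, b in zip(s[1:], t[1:]):
        if a != b:
            return kbo_greater(a, b, precedence)
    return False  # s and t are identical


prec = ["g", "f", "a", "b"]
assoc_l = ("f", ("f", "X", "Y"), "Z")
assoc_r = ("f", "X", ("f", "Y", "Z"))
print(kbo_greater(assoc_l, assoc_r, prec))  # True
print(kbo_greater(assoc_r, assoc_l, prec))  # False
```

With unit weights, the associativity law f(f(X,Y),Z) = f(X,f(Y,Z)) is oriented left to right: both sides have the same weight and variables, so the first arguments decide, and f(X,Y) > X.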
For FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation. If the reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if it fails, the original problem is clausified and given to the search procedure.
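One way such a reduction can work is sketched below; this is an assumption about the idea, not Prover9's code. Miniscoping pushes universal quantifiers inward (over conjunctions always, over disjunctions only when one disjunct does not mention the variable), after which a conjunctive conjecture can be split into one subgoal per conjunct, since the axioms entail A ∧ B exactly when they entail both A and B. Formulas use a hypothetical tuple encoding tagged `"all"`, `"and"`, `"or"`, `"not"` and `"atom"`; existential quantifiers are not handled.

```python
from typing import List, Set, Tuple

# Hypothetical formula encoding: ("atom", pred, *terms), ("not", F),
# ("and", F, G), ("or", F, G), ("all", var, F); a str term is a variable.
Formula = Tuple


def term_vars(t) -> Set[str]:
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def free_vars(f: Formula) -> Set[str]:
    tag = f[0]
    if tag == "atom":
        return set().union(*(term_vars(a) for a in f[2:]))
    if tag == "not":
        return free_vars(f[1])
    if tag == "all":
        return free_vars(f[2]) - {f[1]}
    return free_vars(f[1]) | free_vars(f[2])  # "and" or "or"


def miniscope(f: Formula) -> Formula:
    """Push universal quantifiers as far inward as possible."""
    tag = f[0]
    if tag == "atom":
        return f
    if tag == "not":
        return ("not", miniscope(f[1]))
    if tag in ("and", "or"):
        return (tag, miniscope(f[1]), miniscope(f[2]))
    x, body = f[1], miniscope(f[2])
    if x not in free_vars(body):
        return body  # vacuous quantifier
    if body[0] == "and":  # all x (A & B)  ->  (all x A) & (all x B)
        return ("and", miniscope(("all", x, body[1])), miniscope(("all", x, body[2])))
    if body[0] == "or":   # all x (A | B)  ->  A | (all x B) if x not free in A
        a, b = body[1], body[2]
        if x not in free_vars(a):
            return ("or", a, miniscope(("all", x, b)))
        if x not in free_vars(b):
            return ("or", miniscope(("all", x, a)), b)
    return ("all", x, body)


def conjuncts(f: Formula) -> List[Formula]:
    if f[0] == "and":
        return conjuncts(f[1]) + conjuncts(f[2])
    return [f]


def split_goal(conjecture: Formula) -> List[Formula]:
    """One independent subgoal per top-level conjunct after miniscoping."""
    return conjuncts(miniscope(conjecture))


goal = ("all", "X", ("and", ("atom", "p", "X"), ("atom", "q", "X")))
print(split_goal(goal))
# [('all', 'X', ('atom', 'p', 'X')), ('all', 'X', ('atom', 'q', 'X'))]
```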
http://www.cs.unm.edu/~mccune/prover9/
Twee 2.4.2
Nick Smallbone
Chalmers University of Technology, Sweden
Architecture
Twee 2.4.2 [Sma21] is a theorem prover for unit equality problems based on unfailing completion [BDP89]. It implements a DISCOUNT loop, where the active set contains rewrite rules (and unorientable equations) and the passive set contains critical pairs. The basic calculus is not goal-directed, but Twee implements a transformation which improves goal direction for many problems.
Twee features ground joinability testing [MN90] and a connectedness test [BD88], which together eliminate many redundant inferences in the presence of unorientable equations. The ground joinability test performs case splits on the order of variables, in the style of [MN90], and discharges individual cases by rewriting modulo a variable ordering.
Each critical pair is scored using a weighted sum of the weights of its two terms. Terms are treated as DAGs when computing weights, i.e., duplicate subterms are counted only once per term.
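The DAG-based weight and the resulting score can be sketched as below, with terms as nested tuples and plain strings as variables. Counting each distinct subterm once is the DAG reading described above, assuming unit symbol weights; the coefficients `heavy_coeff` and `light_coeff` (applied to the heavier and the lighter term) are hypothetical placeholders, not Twee's actual parameters.

```python
from typing import Set, Tuple, Union

Term = Union[str, Tuple]  # a str is a variable; ("f", t1, ..., tn) applies f


def dag_weight(t: Term) -> int:
    """Weight of t with shared subterms counted once (unit symbol weights)."""
    seen: Set[Term] = set()
    stack = [t]
    while stack:
        u = stack.pop()
        if u in seen:
            continue  # duplicate subterm: already counted
        seen.add(u)
        if not isinstance(u, str):
            stack.extend(u[1:])
    return len(seen)  # one unit per distinct subterm, i.e. per DAG node


def score(lhs: Term, rhs: Term, heavy_coeff: int = 2, light_coeff: int = 1) -> int:
    """Hypothetical critical-pair score: weighted sum of both terms' weights."""
    a, b = dag_weight(lhs), dag_weight(rhs)
    return heavy_coeff * max(a, b) + light_coeff * min(a, b)


t = ("f", ("g", "X"), ("g", "X"))
print(dag_weight(t))   # 3: f(...), g(X) and X, although the tree has 5 nodes
print(score(t, "X"))   # 2 * 3 + 1 * 1 = 7
```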
For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using different parameters (e.g., with the goal-directed transformation on or off).
The passive set is represented compactly (12 bytes per critical pair) by storing only the information needed to reconstruct the critical pair, not the critical pair itself. Because of this, Twee can run for an hour or more without exhausting memory.
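The idea of storing only what is needed to rebuild a critical pair can be illustrated with Python's `struct` module. The layout here is hypothetical, not Twee's: each entry holds a 32-bit score, the id of the second rule and an overlap position, while the id of the first rule is stored once per batch, which gives 12 bytes per entry. The sketch also assumes that lower scores are preferred.

```python
import struct

# Hypothetical entry layout (not Twee's): signed 32-bit score, unsigned
# 32-bit id of the second rule, unsigned 32-bit overlap position.
ENTRY = struct.Struct("<iII")


class PassiveBatch:
    """Critical pairs between one rule and other rules, packed 12 bytes each."""

    def __init__(self, rule_id: int):
        self.rule_id = rule_id  # stored once per batch, not once per entry
        self.data = bytearray()

    def add(self, score: int, other_rule: int, position: int) -> None:
        self.data += ENTRY.pack(score, other_rule, position)

    def __len__(self) -> int:
        return len(self.data) // ENTRY.size

    def entries(self):
        """Yield (score, rule1, rule2, position); no critical pair is stored."""
        for score, other, pos in ENTRY.iter_unpack(self.data):
            yield score, self.rule_id, other, pos

    def best(self):
        """Entry with the lowest score (ties broken by the remaining fields)."""
        return min(self.entries())


batch = PassiveBatch(rule_id=7)
batch.add(score=30, other_rule=2, position=0)
batch.add(score=12, other_rule=5, position=3)
print(ENTRY.size, len(batch), len(batch.data))  # 12 2 24
print(batch.best())                            # (12, 7, 5, 3)
```

Under this scheme, the actual critical pair would be recomputed, by overlapping the two rules at the stored position, only when its entry is selected.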
Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object which traces back to the input axioms. When a conjecture is proved, the proof object is transformed into a human-readable proof. Proof construction does not harm efficiency because the proof kernel is invoked only when a new rule is accepted. In particular, reasoning about the passive set does not invoke the kernel.
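The LCF-style discipline can be illustrated with a small equational kernel: a `Theorem` can only be built by the kernel functions, each of which checks its premises and records a proof step that traces back to the input axioms. This Python sketch is an illustration under assumptions, not Twee's kernel: the choice of rules (axiom, reflexivity, symmetry, transitivity, congruence and instantiation) is ours, and the private `_KERNEL` token only discourages, rather than strictly prevents, forging theorems.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple, Union

Term = Union[str, Tuple]  # a str is a variable; ("f", t1, ..., tn) applies f
_KERNEL = object()        # private token held only by the kernel functions


@dataclass(frozen=True)
class Theorem:
    """A proved equation lhs = rhs together with its proof object."""
    lhs: Term
    rhs: Term
    proof: Tuple
    token: object = field(default=None, repr=False, compare=False)

    def __post_init__(self):
        if self.token is not _KERNEL:
            raise ValueError("theorems can only be built by kernel rules")


def subst(t: Term, sigma: Dict[str, Term]) -> Term:
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(subst(a, sigma) for a in t[1:])


def axiom(lhs: Term, rhs: Term, axioms: FrozenSet[Tuple[Term, Term]]) -> Theorem:
    if (lhs, rhs) not in axioms:
        raise ValueError("not an input axiom")
    return Theorem(lhs, rhs, ("axiom", lhs, rhs), _KERNEL)


def refl(t: Term) -> Theorem:
    return Theorem(t, t, ("refl", t), _KERNEL)


def symm(th: Theorem) -> Theorem:
    return Theorem(th.rhs, th.lhs, ("symm", th.proof), _KERNEL)


def trans(th1: Theorem, th2: Theorem) -> Theorem:
    if th1.rhs != th2.lhs:
        raise ValueError("trans: middle terms differ")
    return Theorem(th1.lhs, th2.rhs, ("trans", th1.proof, th2.proof), _KERNEL)


def cong(f: str, ths: Tuple[Theorem, ...]) -> Theorem:
    """From s1 = t1, ..., sn = tn derive f(s1..sn) = f(t1..tn)."""
    lhs = (f,) + tuple(th.lhs for th in ths)
    rhs = (f,) + tuple(th.rhs for th in ths)
    return Theorem(lhs, rhs, ("cong", f, tuple(th.proof for th in ths)), _KERNEL)


def inst(th: Theorem, sigma: Dict[str, Term]) -> Theorem:
    """Instantiate the variables of a theorem."""
    step = ("inst", th.proof, tuple(sorted(sigma.items())))
    return Theorem(subst(th.lhs, sigma), subst(th.rhs, sigma), step, _KERNEL)


# Input axiom f(X, e) = X; derive its instance f(a, e) = a.
axioms = frozenset({(("f", "X", ("e",)), "X")})
th = inst(axiom(("f", "X", ("e",)), "X", axioms), {"X": ("a",)})
print(th.lhs, th.rhs)  # ('f', ('a',), ('e',)) ('a',)
```

Because every `Theorem` carries its proof tree, a rule admitted to the active set in this style always has a proof that can later be printed, and the checks run only when a theorem is built.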
Twee can be downloaded as open source from:
https://nick8325.github.io/twee
There have been a number of changes and improvements since Vampire 4.7, although it is still the
same beast.
Most significant from a competition point of view are the long-awaited refreshed strategy schedules.
As a result, several features present in previous competitions will now come into full force,
including new rules for the evaluation and simplification of theory literals.
A large number of completely new features and improvements also landed this year: highlights
include a significant refactoring of the substitution tree implementation, the arrival of
encompassment demodulation to Vampire, and support for parametric datatypes.
Vampire's higher-order support has also been re-implemented from the ground up.
The new implementation is still at an early stage and its theoretical underpinnings are being
developed.
There is currently no documentation of either.
Vampire 4.8
Michael Rawson
TU Wien, Austria
Architecture
Vampire [KV13] is an automatic theorem prover for first-order logic with extensions to theory reasoning and higher-order logic.
Vampire implements the calculi of ordered binary resolution and superposition for handling equality.
It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR architecture, which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16].
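The first step of AVATAR splitting, dividing a clause into variable-disjoint components and naming each component with a propositional variable for the SAT solver, can be sketched as follows. This is a simplified illustration, not Vampire's implementation: literals are `(sign, atom)` pairs with atoms as nested tuples and variables as plain strings, and components are identified by a plain syntactic key, whereas a real implementation would also recognise components that are variants of each other up to variable renaming.

```python
from typing import Dict, List, Tuple, Union

Term = Union[str, Tuple]         # a str is a variable
Literal = Tuple[bool, Term]      # (positive?, atom)


def term_vars(t: Term) -> set:
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))


def components(clause: List[Literal]) -> List[List[Literal]]:
    """Split a clause into variable-disjoint components (union-find)."""
    parent = list(range(len(clause)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner: Dict[str, int] = {}  # variable -> a literal index containing it
    for i, (_, atom) in enumerate(clause):
        for v in term_vars(atom):
            if v in owner:
                parent[find(i)] = find(owner[v])
            else:
                owner[v] = i
    groups: Dict[int, List[Literal]] = {}
    for i, lit in enumerate(clause):
        groups.setdefault(find(i), []).append(lit)
    return list(groups.values())


def avatar_split(clause: List[Literal], names: Dict[Tuple, int]) -> List[int]:
    """Name each component; the returned list is the clause seen by SAT."""
    sat_clause = []
    for comp in components(clause):
        key = tuple(sorted(comp, key=repr))  # syntactic key, no renaming
        if key not in names:
            names[key] = len(names) + 1
        sat_clause.append(names[key])
    return sat_clause


# p(X) | q(X) | ~r(Y) | s(a) has components {p(X), q(X)}, {~r(Y)}, {s(a)}.
clause = [(True, ("p", "X")), (True, ("q", "X")), (False, ("r", "Y")), (True, ("s", ("a",)))]
print(len(components(clause)))    # 3
print(avatar_split(clause, {}))   # [1, 2, 3]
```

The SAT solver then picks one true name per SAT clause, and the first-order search only works with the components whose names are currently asserted.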
A number of standard redundancy criteria and simplification techniques are used for pruning the
search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered
unit equalities.
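Subsumption, one of the redundancy criteria above, can be sketched as a backtracking search for a substitution σ with Cσ ⊆ D. The encoding is a hypothetical one chosen for illustration, not Vampire's data structures: terms are nested tuples with plain strings as variables, and literals are `(sign, atom)` pairs. This sketch uses the set-based definition; the multiset variant additionally requires distinct literals of C to map to distinct literals of D.

```python
from typing import Dict, List, Optional, Tuple, Union

Term = Union[str, Tuple]      # a str is a variable
Literal = Tuple[bool, Term]   # (positive?, atom)


def match(pattern: Term, target: Term, sigma: Dict[str, Term]) -> Optional[Dict[str, Term]]:
    """Extend sigma so that pattern under sigma equals target, or return None.

    Variables of target are treated as constants; sigma is never mutated.
    """
    if isinstance(pattern, str):
        if pattern in sigma:
            return sigma if sigma[pattern] == target else None
        extended = dict(sigma)
        extended[pattern] = target
        return extended
    if isinstance(target, str) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        sigma = match(p, t, sigma)
        if sigma is None:
            return None
    return sigma


def subsumes(c: List[Literal], d: List[Literal], sigma: Optional[Dict[str, Term]] = None) -> bool:
    """True if one substitution maps every literal of c onto a literal of d."""
    sigma = {} if sigma is None else sigma
    if not c:
        return True
    (sign, atom), rest = c[0], c[1:]
    for d_sign, d_atom in d:
        if d_sign == sign:
            extended = match(atom, d_atom, sigma)
            # Backtrack to the next literal of d if the rest cannot be matched.
            if extended is not None and subsumes(rest, d, extended):
                return True
    return False


c = [(True, ("p", "X")), (False, ("q", "X"))]
print(subsumes(c, [(True, ("p", ("a",))), (False, ("q", ("a",))), (True, ("r",))]))  # True
print(subsumes(c, [(True, ("p", ("a",))), (False, ("q", ("b",)))]))                 # False
```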
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major operations on sets of
terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during preprocessing [RSV16].
Vampire implements many useful preprocessing transformations including the SinE axiom selection
algorithm.
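The SinE selection loop can be sketched as below, assuming each axiom has already been reduced to the set of symbols it contains. A symbol triggers an axiom when it occurs in it and is at most `tolerance` times as common (counted over the axioms) as the rarest symbol of that axiom; starting from the conjecture's symbols, triggered axioms are selected and their symbols are added, up to an optional `depth`. The function name, the parameter names and counting occurrences over the axioms only are assumptions of this sketch, not Vampire's options.

```python
from collections import Counter
from typing import Dict, Optional, Set


def sine_select(axioms: Dict[str, Set[str]], goal_symbols: Set[str],
                tolerance: float = 1.0, depth: Optional[int] = None) -> Set[str]:
    """Select axioms reachable from the goal symbols via the trigger relation."""
    occ = Counter(s for syms in axioms.values() for s in syms)
    triggers: Dict[str, Set[str]] = {}  # symbol -> axioms it triggers
    for name, syms in axioms.items():
        if not syms:
            continue
        rarest = min(occ[s] for s in syms)
        for s in syms:
            if occ[s] <= tolerance * rarest:
                triggers.setdefault(s, set()).add(name)
    selected: Set[str] = set()
    frontier = set(goal_symbols)
    seen_symbols = set(frontier)
    level = 0
    while frontier and (depth is None or level < depth):
        new_axioms = set().union(*(triggers.get(s, set()) for s in frontier)) - selected
        selected |= new_axioms
        new_symbols = set().union(*(axioms[a] for a in new_axioms)) - seen_symbols
        seen_symbols |= new_symbols
        frontier = new_symbols
        level += 1
    return selected


axioms = {
    "a1": {"subset", "member"},
    "a2": {"member", "union"},
    "a3": {"member", "empty"},
    "a4": {"plus", "zero"},
}
print(sorted(sine_select(axioms, {"subset"})))                 # ['a1']
print(sorted(sine_select(axioms, {"subset"}, tolerance=3.0)))  # ['a1', 'a2', 'a3']
```

With the strict default tolerance, the common symbol `member` triggers nothing, so only `a1` is selected; raising the tolerance lets it pull in `a2` and `a3`, while the unrelated `a4` is never selected.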
When a theorem is proved, the system produces a verifiable proof, which validates both the
clausification phase and the refutation of the CNF.