Entrants' System Descriptions


ConnectPP 0.6.1

Sean Holden
University of Cambridge, United Kingdom

Architecture

Connect++ Version 0.6.1 is the current publicly-available release of the Connect++ prover for first-order logic, introduced in [
Hol23]. It is a connection prover using the same calculus and inference rules as leanCoP [Ott10]. That is, it uses the connection caclulus with regularity and (optionally) with lemmas.

Strategies

The (default) proof search is essentially the same as that used by leanCoP Version 2.1. That is, it employs a search trying left branches of extensions first, with restricted backtracking as described in [Ott10], and using the leftmost literal of the relevant clause for all inference rules. It does not attempt to exactly reproduce leanCoP's search order. When not using a schedule, command-line options allow many of these choices to be altered; for example allowing backtracking restriction for extensions while still fully exploring left branches, or other modification of the backtracking restrictions. If run with its default schedule it uses one similar to that of leanCoP Version 2.1, including the various applications of definitional clause conversion. Alternatively it can read and apply arbitrary schedules if desired. At present Connect++ does not attempt to tune its proof search based on the characteristics of individual problems.

Implementation

Connect++ is implemented in C++ - minimally the 2017 standard - and built using cmake. Libraries from the Boost collection are used for parsing, hashing, some random number generation, and processing of command-line options. The system has a built-in proof checker for verifying its own output, but also includes a standalone checker implemented in SWI Prolog.

As substitutions need to apply to an entire proof tree the system only represents each variable once and shares the representation, simultaneously maintaining a stack of substitutions making removal of substitutions under backtracking trivial. It also creates subterms only once and shares them; these are indexed allowing constant-time lookup, and nothing is ever removed from the index, meaning that if a term is constructed again after its initial construction no new memory allocation takes place and the term itself is obtained in constant time. At the same time, fresh copies of variables are recycled under backtracking - these two design choices appear to interact very effectively, as the recycling of the variables seems to make it quite likely that subterms already in the index can be reused.

By default a standard recursive unification algorithm is used, but a polynomial-time version is optional.

If a schedule is used, it is assumed that different approaches to definitional clause conversion may be needed - typically all clauses, conjecture clauses only, or no clauses. As these choices can lead to different matrices, and the conversion itself can be expensive, the system stores and switches between the different matrices rather than converting multiple times.

As the system was developed with two guiding aims - to provide a clear implementation easily modified by others, somewhat in the spirit of MiniSAT [ES04], and to support experiments in machine learning for guiding the proof search - the implementation avoids the use of direct recusion in favour of a pair of stacks and an iterative implementation based on these, as described in [Hol23]. This allows complete and arbitrary control of backtracking restriction and other modifications to the proof search using typically quite simple modifications to the code.

The source and documentation are available at

    http://www.cl.cam.ac.uk/~sbh11/connect++.html

Expected Competition Performance

The system remains at an early stage of development and is currently undergoing systematic profiling and improvement. It is not expected at this stage to be competitive with the state-of-the-art, but is expected to be a distinct improvement on last year's version 0.6.0.


CSE_E 1.7

Peiyao Liu
Xihua University, China

Architecture

CSE_E 1.7 is an automated theorem prover for first-order logic by combining CSE 1.8 and E 2.6, where CSE 1.8 is based on the Contradiction Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS) [
XL+18], and E is mainly based on superposition. The combination mechanism is like this: E and CSE are applied to the given problem sequentially. If either prover solves the problem, then the proof process completes. If neither CSE nor E can solve the problem, some inferred clauses with no more than two literals, especially unit clauses, by CSE will be fed to E as lemmas, along with the original clauses, for further proof search. This kind of combination is expected to take advantage of both CSE and E, and produce a better performance. Concretely, CSE is able to generate a good number of unit clauses, based on the fact that unit clauses are helpful for proof search and equality handling. On the other hand, E has a good ability on equality handling.

Strategies

The primary enhancement in version CSE_E 1.7, relative to its predecessors, lies in the adoption of a parallelization strategy. The core concept of this strategy is to leverage CSE's inherent capability to efficiently derive a large number of unit clauses. First, CSE is invoked on a single processor core to generate a set of unit clauses, which serves as the initial lemma set. Subsequently, multiple partitioning schemes are applied to divide this lemma set into N distinct subsets. N processor cores are then activated, with each core executing the theorem prover E to independently check the unsatisfiability of the combination of the original CNF clause set and its assigned subset of lemmas.

Implementation

CSE_E 1.7 is implemented mainly in C++. The job dispatch between CSE and E is implemented in C++.

Expected Competition Performance

We expect CSE_E 1.7 to solve some hard problems that E cannot solve and have a satisfying performance.

Acknowledgements

Development of CSE_E 1.7 has been supported by the Key Project of Sichuan Science and Technology Innovation and Entrepreneurship Seeding Program (Grant No. 2024JDRC0084). Stephan Schulz for his kind permission on using his E prover that makes CSE_E possible.


CSI_Enigma 1.0.6

Guoyan Zeng
Southwest Jiaotong University, China

Architecture

CSI_Enigma 1.0.6 is an automated theorem prover for first-order logic, combining CSI 1.1 and Enigma, where CSI 1.1 is a multi-layer inverse and parallel prover based on the Contradiction Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS) [
XL+18], and Enigma is an efficient implementation of learning-based guidance for given clause selection in saturation-based automated theorem provers. This kind of combination is expected to take advantage of both CSI and Enigma, and produce a better performance. Concretely, CSI is able to construct a good number of unit clauses, based on the fact that unit clauses are helpful for proof search and equality handling. On the other hand, Enigma has a good ability.

Strategies

The CSI part of CSI_Enigma 1.0.6 takes almost the same strategies as in that in CSE 1.6 standalone, e.g., clause/literal selection, strategy selection, and CSC strategy. The only difference is that equality handling strategies of CSE part of the combined system are blocked. The main new strategies for the combined systems are:

Implementation

CSI_Enigma 1.0.6 is implemented mainly in C++.

Expected Competition Performance

We expect CSI_Enigma 1.0.6 to solve some hard problems that CSI_E cannot solve and have a satisfying performance. CSI_Enigma 1.0.6 can solve Satisfiable and Unsatisfiable problems.

Acknowledgement

Development of CSI_Enigma 1.0.6 has been partially supported by the National Natural Science Foundation of China (NSFC) (Grant No. 62106206, 62206227).


cvc5 1.3.0

Andrew Reynolds
University of Iowa, USA

Architecture

cvc5 [
BB+22] is the successor of CVC4 [BC+11]. It is an SMT solver based on the CDCL(T) architecture [NOT06] that includes built-in support for many theories, including linear arithmetic, arrays, bit vectors, datatypes, finite sets and strings. It incorporates approaches for handling universally quantified formulas. For problems involving free function and predicate symbols, cvc5 primarily uses heuristic approaches based on conflict-based instantiation and E-matching for theorems, and finite model finding approaches for non-theorems.

Like other SMT solvers, cvc5 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in cvc5 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, cvc5 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".

cvc5 has native support for problems in higher-order logic, as described in [BR+19]. It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via lambda lifting. The approach extends the theory solver for quantifier-free uninterpreted functions (UF) and E-matching. For the former, the theory solver for UF in cvc5 now handles equalities between functions using an extensionality inference. Partial applications of functions are handle using a (lazy) applicative encoding where some function applications are equated to the applicative encoding. For the latter, several of the data structures for E-matching have been modified to incorporate matching in the presence of equalities between functions, function variables, and partial function applications.

cvc5 integrates MBQI-Enum [KRB25], a new instantiation strategy that combines model-based quantifier instantiation [Gd09] with syntax-guided synthesis [RB+19]. This approach improves HOL support by generating instantiations that include lambda-terms, identity functions, and terms with uninterpreted symbols. Additionally, cvc5 incorporates an extended version of MBQI-Enum that supports Hilbert's choice operator. This extension improves cvc5's success on higher-order logic benchmarks by generating instantiations that involve choice terms.

Strategies

For handling theorems, cvc5 primarily uses conflict-based quantifier instantiation [RTd14, BFR17], enumerative instantiation [RBF18] and E-matching. cvc5 uses a handful of orthogonal trigger selection strategies for E-matching, and several orthogonal ordering heuristics for enumerative instantiation. For handling non-theorems, cvc5 primarily uses finite model finding techniques. Since cvc5 with finite model finding is also capable of establishing unsatisfiability, it is used as a strategy for theorems as well.

Implementation

cvc5 is implemented in C++. The code is available from
    https://github.com/cvc5/cvc5

Expected Competition Performance

This year, we have added several new strategies for the THF division, including an extension of MBQI-Enum [KRB25] for reasoning about Hilbert's choice operator, which significantly improves cvc5's performance on benchmarks involving higher-order quantifiers and choice terms. We continue to rely on a conversion from TPTP to smt2 as a preprocess step. Overall, our performance should mostly be the same as last year on the other divisions apart from THF.


Drodi 4.1.0

Oscar Contreras
Amateur Programmer, Spain

Architecture

Drodi 4.1.0 is a very basic and lightweight automated theorem prover. It implements the following main features:

Strategies

Drodi has a fair number of selectable strategies including but not limited to the following:

Implementation

Drodi is implemented in C. It includes discrimination trees and hashing indexing. All the code is original, without special code libraries or code taken from other sources.

Expected Competition Performance

Drodi 4.1.0 solves around 5% more problemas than last year version. Anyway we expect that performance will be similar to last year's version.

E 3.3.0

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

E [
Sch02, Sch13, SCV19] is a purely equational theorem prover for many-sorted first-order logic with equality, and for monomorphic higher-order logic. It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal form, and a saturation algorithm implementing an instance of the superposition calculus with negative literal selection and a number of redundancy elimination techniques, optionally with higher-order extensions [VB+21, VBS23]. E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. No special rules for non-equational literals have been implemented. Resolution is effectively simulated by paramodulation and equality resolution. As of E 2.1, PicoSAT [Bie08] can be used to periodically check the (on-the-fly grounded) proof state for propositional unsatisfiability.

Strategies

Proof search in E is primarily controlled by a literal selection strategy, a clause selection heuristic, and a simplification ordering. The prover supports a large number of pre-programmed literal selection strategies. Clause selection heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Clause evaluation heuristics are based on symbol-counting, but also take other clause properties into account. In particular, the search can prefer clauses from the set of support, or containing many symbols also present in the goal. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO), which can be lifted in different ways to literal orderings.

For CASC-30, E implements a two-stage multi-core strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, possibly presence of certain axiom patterns, ...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy. The strategies are then scheduled onto the available cores and run in parallel.

About 140 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.3.0. We have also explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.

Implementation

E is build around perfectly shared terms, i.e., each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector. Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared. Subsumption and contextual literal cutting (also known as subsumption resolution) is supported using feature vector indexing [Sch13]. Superposition and backward rewriting use fingerprint indexing [Sch12], a new technique combining ideas from feature vector indexing and path indexing. Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loe06, Loe06]. The prover and additional information are available at
    https://www.eprover.org

Expected Competition Performance

E 3.3.0 is basically E 3.2.0 with bug fixes. We have not yet been able to evaluate and integrate new search strategies making full use of all new features. As a result, we expect performance to be similar to last year's version. The system is expected to perform well in most proof classes, but will at best complement top systems in the disproof classes.


hopCoP 0.1

Michael Rawson
University of Souhhampton, United Kingdom

Architecture

hopCoP is a system in the connection family of theorem provers that includes SETHEO, leanCoP, and this year's Connect++. These systems all attempt to produce a proof object from beginning to end, backtracking one step on failure. However, hopCoP differs by analysing the position that caused the failure and learning (in the sense of CDCL rather than gradient descent) a reason for failure. This reason is recorded (and used to avoid similar failures in future), then the system backjumps, undoing multiple decisions at once until the reasons for failure are corrected [
REK25].

Strategies

hopCoP does not employ strategy scheduling. Search begins at the conjecture (or positive clauses, failing that), and continues with some pseudo-random aspects until either a proof is found or the system determines that there is not a proof within a certain tableau depth. In the latter case, the depth is increased and search tried again. Multiple cores are used to search multiple iterative deepening levels at once, a trick learned from Vampire's finite model builder.

Implementation

The implementation is imperative in nature and the skeleton was lifted from SATCoP [RR21], but it has been adapted considerably to support a new core search routine of around 500 lines. The source is available at:
    http://github.com/MichaelRawson/hopcop
and should be somewhat hackable.

Expected Competition Performance

Having gone to some effort to improve the behaviour of connection-driven proof search, we hope to prove more theorems than the closest-related system leanCoP would have done. However, we expect fierce competition from Connect++ and to be roundly beaten by state-of-the art systems.


iProver 3.9

Konstantin Korovin
University of Manchester, United Kingdom

Architecture

iProver [
Kor08, DK20] is a theorem prover for quantified first-order logic with theories. iProver interleaves instantiation calculus Inst-Gen [Kor13, Kor08, GK03] with ordered resolution and superposition calculi [DK20]. iProver approximates first-order clauses using propositional abstractions that are solved using MiniSAT [ES04] or Z3 [dMB08] and refined using model-guided instantiations. iProver also implements a general abstraction-refinement framework for under-and over-approximations of first-order clauses [HK18, HK19]. First-order clauses are exchanged between calculi during the proof search.

Recent features in iProver include:

Strategies

iProver has around 100 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT/SMT solvers, simplifications, and options for combination of instantiation with resolution and superposition. For the competition HOS-ML [HK21] was used to build a multi-core schedule from heuristics learnt over a sample of FOF problems. Some theories and fragments are recognised such as EPR, UEQ, Horn, groups, rings and lattices for which options are adapted accordingly.

Implementation

iProver is implemented in OCaml. For the ground reasoning uses MiniSat [ES04] and Z3 [dMB08]. iProver accepts FOF, TFF and CNF formats. Vampire [KV13, RSV15] and E prover [Sch13] are used for proof-producing clausification of FOF/TFF problems. Vampire is also used for SInE axiom selection [HV11] in the LTB division and for theory axioms in the TFA division. iProver is available at:
    https://gitlab.com/korovin/iprover

Expected Competition Performance

iProver 3.9 is the CASC-J12 TFN winner.


iProver 3.9.3

Konstantin Korovin
The University of Manchester, United Kingdom

Architecture

iProver [
Kor08, DK20] is a theorem prover for quantified first-order logic with theories. iProver interleaves instantiation calculus Inst-Gen [Kor13, Kor08, GK03] with ordered resolution and superposition calculi [DK20]. iProver approximates first-order clauses using propositional abstractions that are solved using MiniSAT [ES04] or Z3 [dMB08] and refined using model-guided instantiations. iProver also implements a general abstraction-refinement framework for under-and over-approximations of first-order clauses [HK18, HK19]. First-order clauses are exchanged between calculi during the proof search.

Recent features in iProver include:

Strategies

iProver has around 100 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT/SMT solvers, simplifications, and options for combination of instantiation with resolution and superposition. For the competition HOS-ML [HK21] was used to build a multi-core schedule from heuristics learnt over a sample of FOF problems. Some theories and fragments are recognised such as EPR, UEQ, Horn, groups, rings and lattices for which options are adapted accordingly.

Implementation

iProver is implemented in OCaml. For the ground reasoning uses MiniSat [ES04] and Z3 [dMB08]. iProver accepts FOF, TFF and CNF formats. Vampire [KV13, RSV15] and E prover [Sch13] are used for proof-producing clausification of FOF/TFF problems. Vampire is also used for SInE axiom selection [HV11] of theory axioms in the TFA division. iProver is available at:
    https://gitlab.com/korovin/iprover

Expected Competition Performance

iProver is regularly in the top 3 in FOF, EPR, UEQ, TFN, TFA. We expect an improved performance compared to the previous year due to efficiency improvements and heuristic optimization.


LastButNotLeast 0

Julie Cailler
University of Lorraine, CNRS, Inria, LORIA, Nancy, France

Architecture

LastButNotLeast: The fastest way to give up - now with 0 bugs! Probably the fastest prover in the competition: Unfortunately, designing this prover may force you to share a meal with very weird people - but hey, that's the price of true innovation.

Strategies

The strategy is simple: always return GaveUp. It's fast, reliable, and guarantees we never solve anything - by design.

Implementation

#!/usr/bin/env python

print("% LastButNotLeast: The fastest way to give up!")
print("% For best results, do not expect results.")
print("% SZS status GaveUp")
print("% It's not a bug - it's a philosophical stance.")
print("% Thanks for trying LastButNotLeast :)")

Expected Competition Performance

The prover is expected to come last, and any other result would be genuinely surprising - possibly even concerning. In fact, LastButNotLeast proudly exists to ensure that no other prover - no matter how underwhelming - has to carry the burden of being last.


Leo-III 1.7.19

Alexander Steen
University of Greifswald, Germany

Architecture

Leo-III [
SB21], the successor of LEO-II [BP+08], is a higher-order ATP system based on extensional higher-order paramodulation with inference restrictions using a higher-order term ordering. The calculus contains dedicated extensionality rules and is augmented with equational simplification routines that have their intellectual roots in first-order superposition-based theorem proving. The saturation algorithm is a variant of the given clause loop procedure inspired by the first-order ATP system E.

Leo-III cooperates with external first-order ATPs that are called asynchronously during proof search; a focus is on cooperation with systems that support typed first-order (TFF) input. For this year's CASC E [Sch02, Sch13] is used as external system. However, cooperation is in general not limited to first-order systems. Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or counter model generators) may be included using simple command-line arguments. If the saturation procedure loop (or one of the external provers) finds a proof, the system stops, generates the proof certificate and returns the result.

Strategies

Leo-III comes with several configuration parameters that influence its proof search by applying different heuristics and/or restricting inferences. These parameters can be chosen manually by the user on start-up. Leo-III implements a very naive time slicing approach in which at most three manually fixed parameter configurations are used, one after each other. In practice, this hardly ever happens and Leo-III will just run with its default parameter setting.

Implementation

Leo-III utilizes and instantiates the associated LeoPARD system platform [WSB15] for higher-order (HO) deduction systems implemented in Scala (currently using Scala 2.13 and running on a JVM with Java >= 8). The prover makes use of LeoPARD's data structures and implements its own reasoning logic on top. A hand-crafted parser is provided that supports all TPTP syntax dialects. It converts its produced concrete syntax tree to an internal TPTP AST data structure which is then transformed into polymorphically typed lambda terms. As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF, TFF, THF) as well as their polymorphic variants [BP13, KRS16]. Since version 1.6.X (X >= 0) Leo-III also accepts non-classical problem input represented in non-classical TPTP, see ...
    https://tptp.org/NonClassicalLogic/

The term data structure of Leo-III uses a polymorphically typed spine term representation augmented with explicit substitutions and De Bruijn-indices. Furthermore, terms are perfectly shared during proof search, permitting constant-time equality checks between alpha-equivalent terms.

Leo-III's saturation procedure may at any point invoke external reasoning tools. To that end, Leo-III includes an encoding module which translates (polymorphic) higher-order clauses to polymorphic and monomorphic typed first-order clauses, whichever is supported by the external system. While LEO-II relied on cooperation with untyped first-order provers, Leo-III exploits the native type support in first-order provers (TFF logic) for removing clutter during translation and, in turn, higher effectivity of external cooperation.

Leo-III is available on GitHub:

    https://github.com/leoprover/Leo-III

Expected Competition Performance

Version 1.7.19 is, for all intents and purposes of CASC, equivalent to the version from the previous year except that some minor bugs were fixed, and the support for reasoning in various quantified non-classical logics (not relevant to CASC) has been improved. We do not expect Leo-III to be strongly competitive against more recent higher-order provers, as Leo-III does not implement several standard features of effective systems (including time slicing and proper axiom selection).


LisaTT 0.9.1

Simon Guilloud
EPFL, Switzerland

Architecture

Lisa is a proof assistant based on first order logic and set theory. Lisa offers a unified proof script, tactic and implementation by implementing an expressive DSL directly within the Host language Scala. Lisa also foundationally relies on Orthologic, a generalization of Boolean Algebra without the distributivity law that admits quadratic time normalization and validity checking. This makes proofs shorter and simpler, and crucially improve on automation, in particular in Lisa's Tableau tactic.

Strategies

Lisa's Tableau tactic, is, unsurprisingly, a tableau-based solver. It's main strategy and differentiating point is the use of Orthologic, which simplifies the initial input. We know of classes that Orthologic solve quickly but that are difficult for ATP's and Sat solver, though this tends to be more significant. Beyond this, the implementation is standard, partially inspired from the Zenon prover for branch elimination.

Implementation

The implementation (in Scala) is functional and produces low level proofs that are accepted by Lisa's kernel. It is available from:
    https://github.com/epfl-lara/lisa/tree/main

Expected Competition Performance

We do not expect great performances from the system, as it is the result of two weeks of work of one PhD student and the heuristic are rather simple. Nonetheless, it could in theory prove problems that no other ATP is expected to solve, thanks to orthologic normalization (we do not know if such problems exist in the TPTP library).


Prover9 1109a

Bob Veroff on behalf of William McCune
University of New Mexico, USA

Architecture

Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [
McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Strategies

Prover9 has available many strategies; the following statements apply to CASC.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.

Implementation

Prover9 is coded in C, and it uses the LADR libraries. Some of the code descended from EQP [McC97]. (LADR has some AC functions, but Prover9 does not use them). Term data structures are not shared (as they are in Otter). Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules and subsuming units, FPA/Path indexing for finding subsumed units, rewritable terms, and resolvable literals. Feature vector indexing [Sch04] is used for forward and backward nonunit subsumption. Prover9 is available from
    http://www.cs.unm.edu/~mccune/prover9/

Expected Competition Performance

Prover9 is the CASC fixed point, against which progress can be judged. Each year it is expected do worse than the previous year, relative to the other systems.


SPASS-SCL 0.1

Christoph Weidenbach
Max Planck Institute for Informatics, Germany

Architecture

SPASS-SCL-FOL 0.1 is a prototype implementing SCL(FOL) [
BSW23] for first-order logic without equality and an SMT-style support for equality in the case of BSR. Equality beyond BSR is not supported. Currently, proof documentation is not supported. Models are always given explicitly [BK+24]. SPASS-SCL-FOL 0.1 is mainly built to study the properties of SCL-based calculi.

Strategies

SPASS-SCL-FOL 0.1 does not have any complex strategy parameters. It is a pure SCL(FOL) solver without the use of a portfolio. It is in no way adapted to the problems of the TPTP. It runs five different SCL(FOL) strategies in parallel.

Implementation

SPASS-SCL-FOL is implemented in C on top of the SPASS-Workbench and will become our next system after our SAT solver SPASS-SAT and our SMT solver SPASS-SATT. Due to its prototypical status, it is currently not published on our website.

Expected Competition Performance

SPASS-SCL-FOL 0.1 won't win any TPTP category, but should show reasonable performance in the sense that even our prototype version should not be subsumed by any other ATP.


Toma 0.7

Teppei Saito
Japan Advanced Institute of Science and Technology, Japan

Architecture

Toma is an automatic equational theorem prover based on a DISCOUNT loop [
DKS97] implementing ordered completion [BDP89]. In addition to LPO and KBO, the tool also implements non-standard term orders such as weighted path orders [YKS15] and monotonic semantic path orders [BFR00].

Strategies

For this competition, the tool tries two runs for a given problem: one with a fixed KBO and another with an LPO. Based on an idea from [WM18], the precedence of the LPO is chosen in a way that minimizes the number of critical pairs between axioms, using the SMT solver Z3.

Implementation

For efficiency, Toma implements fingerprint indexing [Sch12], Waidmeister's AC heuristics [AHL03], and a connectedness testing [BD88, Sma21]. The source code is available at
    https://www.jaist.ac.jp/project/maxcomp/.

Expected Competition Performance

Toma would solve almost half as many problems as state-of-the-art solvers.


Twee 2.6.0

Nick Smallbone
Chalmers University of Technology, Sweden

Architecture

Twee 2.6.0 [
Sma21] is a theorem prover for unit equality problems based on unfailing completion [BDP89]. It implements a DISCOUNT loop, where the active set contains rewrite rules (and unorientable equations) and the passive set contains critical pairs. The basic calculus is not goal-directed, but Twee implements a transformation which improves goal direction for many problems.

Twee features ground joinability testing [MN90] and a connectedness test [BD88], which together eliminate many redundant inferences in the presence of unorientable equations. The ground joinability test performs case splits on the order of variables, in the style of [MN90], and discharges individual cases by rewriting modulo a variable ordering. This year's version has fixes for some completeness bugs which allows some slightly more aggressive redundancy criteria to be used.

Strategies

Twee's strategy is simple and it does not tune its heuristics or strategy based on the input problem. The term ordering is always KBO; by default, functions are ordered by number of occurrences and have weight 1. The proof loop repeats the following steps:

Each critical pair is scored using a weighted sum of the weight of both of its terms. Terms are treated as DAGs when computing weights, i.e., duplicate subterms are counted only once per term.

For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using different parameters (e.g., with the goal-directed transformation on or off).

Implementation

Twee is written in Haskell. Terms are represented as array-based flatterms for efficient unification and matching. Rewriting uses a perfect discrimination tree.

The passive set is represented compactly (12 bytes per critical pair) by storing only the information needed to reconstruct the critical pair, not the critical pair itself. Because of this, Twee can run for an hour or more without exhausting memory.

Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object which traces back to the input axioms. When a conjecture is proved, the proof object is transformed into a human-readable proof. Proof construction does not harm efficiency because the proof kernel is invoked only when a new rule is accepted. In particular, reasoning about the passive set does not invoke the kernel.

Twee can be downloaded as open source from:

    https://nick8325.github.io/twee

Expected Competition Performance

Competing with the top provers.


Vampire 4.4

Giles Reger
University of Manchester, United Kingdom

There are no major changes to the main part of Vampire since 4.4, beyond some new proof search heuristics and new default values for some options. The biggest addition is support for higher-order reasoning via translation to applicative form and combinators, addition of axioms and extra inference rules, and a new form of combinatory unification.

Architecture

Vampire [
KV13] 4.4 is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. Both resolution and instantiation based proof search make use of global subsumption.

A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing. Vampire implements many useful preprocessing transformations including the SinE axiom selection algorithm.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF. Vampire 4.4 provides a very large number of options for strategy selection. The most important ones are:

Implementation

Vampire 4.4 is implemented in C++. It makes use of Minisat and Z3.

Expected Competition Performance

Vampire 4.4 is the CASC-27 EPR winner.


Vampire 4.9

Michael Rawson
TU Wien, Austria

There have been a number of improvements since Vampire 4.8, although it is still the same beast. For the first time this year, Vampire's schedules were constructed mostly using the Snake strategy selection tool, although a return of the traditional Spider is still possible in future. Improvements from the past year include:

Vampire's higher-order support remains very similar to last year, although a re-implementation intended for mainline Vampire is already underway.

Architecture

Vampire [
KV13] is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements the calculi of ordered binary resolution, and superposition for handling equality. It also implements a MACE-style finite model builder for finding finite counter-examples [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing [RSV16]. Vampire implements many useful preprocessing transformations including the SInE axiom selection algorithm. When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.9 provides a very large number of options for strategy selection. The most important ones are:

Implementation

Vampire 4.9 is implemented in C++. It makes use of fixed versions of Minisat and Z3. See the GitHub repository and associated wiki for more information.

Expected Competition Performance

Vampire 4.9 is the CASC-J12 THF, TFA, FOF, UEQ, SLH, and ICU winner.


Vampire 5.0

Michael Rawson
University of Southampton, United Kongdom

Vampire 5.0 remains similar in spirit to all previous versions, but a bumper crop of changes have been merged this competition cycle. Various non-competition improvements to Vampire including a program synthesis mode [HA+24] and partial support for the polymorphic SMT-LIB 2.7 standard landed, but for the competition we mention:

Vampire's higher-order support remains very similar to last year, although a re-implementation intended for mainline Vampire is being merged in stages.

Architecture

Vampire [BB+25] is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements several extensions of a core superposition calculus. It also implements a MACE-style finite model builder for finding finite counter-examples [RSV16]. Splitting in saturation-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. Substitution tree and code tree indices are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form: problems not already in CNF are clausified during preprocessing [RSV16]. Vampire implements many preprocessing transformations, including the SInE axiom selection algorithm for large theories and blocked clause elimination.

Strategies

Vampire 5.0 provides a very large number of options for strategy selection. The most important ones are:

Implementation

Vampire 5.0 is implemented in C++. It makes use of fixed versions of Minisat, CaDiCaL, GMP, VIRAS, and Z3. See the GitHub repository and associated wiki for more information.

Expected Competition Performance

Vampire 5.0 should be an improvement on the previous version. A reasonably strong performance across all divisions is therefore expected. In the higher-order divisions, performance should be the same as last year.


Zipperposition 2.1.9999

Jasmin Blanchette
Ludwig-Maximilians-Universität München, Germany

Architecture

Zipperposition is a superposition-based theorem prover for typed first-order logic with equality and for higher-order logic. It is a pragmatic implementation of a complete calculus for full higher-order logic [
BB+21]. It features a number of extensions that include polymorphic types, user-defined rewriting on terms and formulas ("deduction modulo theories"), a lightweight variant of AVATAR for case splitting [EBT21], and Boolean reasoning [VN20]. The core architecture of the prover is based on saturation with an extensible set of rules for inferences and simplifications. Zipperposition uses a full higher-order unification algorithm that enables efficient integration of procedures for decidable fragments of higher-order unification [VBN20]. The initial calculus and main loop were imitations of an earlier version of E [Sch02]. With the implementation of higher-order superposition, the main loop had to be adapted to deal with possibly infinite sets of unifiers [VB+21].

Strategies

The system uses various strategies in a portfolio. The strategies are run in parallel, making use of all CPU cores available. We designed the portfolio of strategies by manual inspection of TPTP problems. Zipperposition's heuristics are inspired by efficient heuristics used in E. Various calculus extensions are used by the strategies [VB+21]. The portfolio mode distinguishes between first-order and higher-order problems. If the problem is first-order, all higher-order prover features are turned off. In particular, the prover uses standard first-order superposition calculus and disables collaboration with the backend prover (described below). Other than that, the portfolio is static and does not depend on the syntactic properties of the problem.

Implementation

The prover is implemented in OCaml. Term indexing is done using fingerprints for unification, perfect discrimination trees for rewriting, and feature vectors for subsumption. Some inference rules such as contextual literal cutting make heavy use of subsumption. For higher-order problems, some strategies use the E prover as an end-game backend prover.

Zipperposition's code can be found at

    https://github.com/sneeuwballen/zipperposition
and is entirely free software (BSD-licensed).

Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].

Expected Competition Performance

The prover is expected to perform well on THF, about as well as last year's version. We expect to beat E.