Entrants' System Descriptions


Beagle 0.9.47

Peter Baumgartner
Data61, Australia

Architecture

Beagle [BBW15] is an automated theorem prover for sorted first-order logic with equality over built-in theories. The theories currently supported are integer arithmetic, linear rational arithmetic and linear real arithmetic. It accepts formulas in the FOF and TFF formats of the TPTP syntax, and formulas in the SMT-LIB version 2 format.

Beagle first converts the input formulas into clause normal form. Pure arithmetic (sub-)formulas are treated by eager application of quantifier elimination. The core reasoning component implements the Hierarchic Superposition Calculus with Weak Abstraction (HSPWA) [BW13]. Extensions are a splitting rule for clauses that can be divided into variable-disjoint parts (sketched below), a partial instantiation rule for variables with finite domain, and two kinds of background-sorted variables that trade off completeness against search space.
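
To illustrate the splitting rule, here is a minimal Python sketch (Beagle itself is written in Scala; the vars_of helper is hypothetical) that groups a clause's literals into variable-disjoint parts, each of which can then be treated as a separate clause:

    # Sketch: group a clause's literals into variable-disjoint parts.
    # Literals sharing a variable must end up in the same part; vars_of
    # is assumed to return the set of variables of a literal.
    def split_clause(literals, vars_of):
        parts = []  # list of (variable set, literal list) pairs
        for lit in literals:
            vs = set(vars_of(lit))
            sharing = [p for p in parts if p[0] & vs]
            for p in sharing:
                parts.remove(p)
                vs |= p[0]
            parts.append((vs, [lit] + [l for p in sharing for l in p[1]]))
        return [ls for _, ls in parts]

For example, p(X) | q(X,Y) | r(Z) splits into the parts {p(X), q(X,Y)} and {r(Z)}, which share no variables and can be refuted independently.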

The HSPWA calculus generalizes the superposition calculus by integrating theory reasoning in a black-box style. For the theories mentioned above, Beagle combines quantifier elimination procedures and other solvers to dispatch proof obligations over these theories. The default solvers are an improved version of Cooper's algorithm for linear integer arithmetic, and the CVC4 SMT solver for linear real/rational arithmetic. Non-linear integer arithmetic is treated by partial instantiation and additional lemmas.

Strategies

Beagle uses the DISCOUNT loop for saturating a clause set under the calculus's inference rules. Simplification techniques include standard ones, such as subsumption deletion, demodulation by ordered unit equations, and tautology deletion. It also includes theory-specific simplification rules for evaluating ground (sub)terms, and for exploiting cancellation laws and properties of neutral elements, among others. In the competition an aggressive form of arithmetic simplification is used, which seems to perform best in practice.

Beagle uses strategy scheduling by trying (at most) three flag settings sequentially.

Implementation

Beagle is implemented in Scala. It is a full implementation of the HSPWA calculus. It uses a simple form of indexing, essentially top-symbol hashes, stored with each term and computed in a lazy way. Fairness is achieved through a combination of measuring clause weights and their derivation-age. It can be fine-tuned with a weight-age ratio parameter, as in other provers. Beagle's web site is
    https://bitbucket.org/peba123/beagle
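
As an illustration of the weight-age fairness scheme mentioned above, here is a minimal Python sketch (not Beagle's actual code) in which r weight-based picks alternate with one age-based pick:

    import heapq, itertools

    class PassiveQueue:
        # Fairness sketch: alternate r weight-based picks with one
        # age-based (FIFO) pick; a real prover must also remove the
        # picked clause's stale twin entry from the other heap.
        def __init__(self, ratio=5):
            self.ratio, self.tick = ratio, 0
            self.age = itertools.count()
            self.by_weight, self.by_age = [], []

        def add(self, clause, weight):
            a = next(self.age)
            heapq.heappush(self.by_weight, (weight, a, clause))
            heapq.heappush(self.by_age, (a, clause))

        def pick(self):
            self.tick += 1
            if self.tick % (self.ratio + 1) == 0:
                return heapq.heappop(self.by_age)[1]
            return heapq.heappop(self.by_weight)[2]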

Expected Competition Performance

Beagle 0.9.47 is the CASC-J8 TFN winner.


CSE 1.6

Feng Cao
JiangXi University of Science and Technology, China

Architecture

CSE 1.6 is a prover developed from the previous version, CSE 1.5. It is an automated theorem prover for first-order logic without equality, based mainly on a novel inference mechanism called Contradiction Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS) [XL+18]. S-CS is able to handle multiple (two or more) clauses dynamically in a synergized way in one deduction step, with binary resolution as a special case. CSE 1.6 also adopts conventional factoring, equality resolution (the ER rule), and variable renaming. Some pre-processing techniques, including pure literal deletion and simplification based on the distance to the goal clause, as well as a number of standard redundancy criteria for pruning the search space, such as tautology deletion and forward and backward subsumption, are applied as well. CSE 1.6 improves on CSE 1.5 mainly in the following aspects:
  1. A multi-clause dynamic deduction algorithm that makes full use of clauses, driven by a different type of clause adequacy evaluation method, has been added.
  2. A multi-clause synergized deduction algorithm that makes full use of synergized clauses has been added, which allows the substitutions of candidate literals involved in a deduction to grow from simple to complex.
Internally CSE 1.6 works only with clausal normal form. The E prover [Sch13] is adopted with thanks for clausification of full first-order logic problems during preprocessing.

Strategies

CSE 1.6 inherits most of the strategies of CSE 1.5. The main new strategies support the multi-clause deduction algorithms described in the Architecture section.

Implementation

CSE 1.6 is implemented mainly in C++, with Java used to run batches of problems. A shared data structure is used to store constants and shared variables. In addition, special data structures are designed to describe the properties of clauses, literals, and terms, in support of the multiple-strategy mode. The E prover is used for clausification of FOF problems, and TPTP4X is then applied to convert the resulting CNF into TPTP format.

Expected Competition Performance

CSE 1.6 includes some improvements over CSE 1.5, so we expect better performance in this year's competition.

Acknowledgement: Development of CSE 1.6 has been partially supported by the General Research Project of Jiangxi Education Department (Grant No. GJJ200818).


CSE_E 1.5

Peiyao Liu
Southwest Jiaotong University, China

Architecture

CSE_E 1.5 is an automated theorem prover for first-order logic that combines CSE 1.6 and E 3.0, where CSE 1.6 is based on the Contradiction Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS) [XL+18] and E is mainly based on superposition. The combination works as follows: E and CSE are applied to the given problem sequentially. If either prover solves the problem, the proof process ends. If neither CSE nor E can solve the problem, some clauses inferred by CSE with at most two literals, especially unit clauses, are fed to E as lemmas, together with the original clauses, for further proof search. This combination is intended to exploit the strengths of both systems: CSE is able to generate a good number of unit clauses, which are helpful for proof search and equality handling, while E has strong equality handling.
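
The sequential combination could be pictured as follows (a schematic Python sketch; the prover commands and the helpers cse_short_clauses and write_extended are placeholders, not the actual CSE_E interface):

    import subprocess

    def proved(cmd, timeout):
        try:
            r = subprocess.run(cmd, capture_output=True, text=True,
                               timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        return "SZS status Theorem" in r.stdout

    def cse_e(problem):
        if proved(["eprover", problem], 60):    # stage 1: E alone
            return "Theorem"
        if proved(["cse", problem], 60):        # stage 2: CSE alone
            return "Theorem"
        # stage 3: clauses inferred by CSE with at most two literals
        # (especially units) become lemmas for a final E run
        lemmas = cse_short_clauses(problem)
        extended = write_extended(problem, lemmas)
        return "Theorem" if proved(["eprover", extended], 120) else "GaveUp"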

Strategies

The CSE part of CSE_E 1.5 uses almost the same strategies as standalone CSE 1.6, e.g., clause/literal selection, strategy selection, and the CSC strategy. The only difference is that the equality handling strategies of the CSE part of the combined system are disabled. The combined system also adds some new strategies specific to the combination.

Implementation

CSE_E 1.5 is implemented mainly in C++, with Java used to run batches of problems. The job dispatch between CSE and E is implemented in C++.

Expected Competition Performance

We expect CSE_E 1.5 to solve some hard problems that E cannot solve on its own, and to deliver satisfying performance.

Acknowledgement: Development of CSE_E 1.5 has been partially supported by the National Natural Science Foundation of China (NSFC) (Grant No. 61976130). We thank Stephan Schulz for his kind permission to use his E prover, which makes CSE_E possible.


cvc5 1.0

Andrew Reynolds
University of Iowa, USA

Architecture

cvc5 is the successor of CVC4 [BC+11]. It is an SMT solver based on the CDCL(T) architecture [NOT06] that includes built-in support for many theories, including linear arithmetic, arrays, bit vectors, datatypes, finite sets and strings. It incorporates approaches for handling universally quantified formulas. For problems involving free function and predicate symbols, cvc5 primarily uses heuristic approaches based on conflict-based instantiation and E-matching for theorems, and finite model finding approaches for non-theorems.

Like other SMT solvers, cvc5 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in cvc5 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, cvc5 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".
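
The two-tiered approach can be summarised by the following schematic loop (a Python sketch with hypothetical helpers, not cvc5's C++ API):

    def solve(ground, quantified):
        # quantified formulas are abstracted to fresh Boolean predicates
        abstraction = [fresh_predicate(q) for q in quantified]
        while True:
            model = check_ground(ground + abstraction)
            if model is None:
                return "unsatisfiable"   # unsat already at the ground level
            # instantiation module: conflict-based, E-matching, or
            # model-based (finite model finding) instantiation
            status, instances = instantiate(quantified, model)
            if status == "model-complete":   # e.g. FMF built a total model
                return "satisfiable"
            if not instances:                # no instances could be added
                return "unknown"
            ground = ground + instances      # refine the abstraction, loop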

cvc5 has native support for problems in higher-order logic, as described in [BR+19]. It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via lambda lifting. The approach extends the theory solver for quantifier-free uninterpreted functions (UF) and E-matching. For the former, the theory solver for UF in cvc5 now handles equalities between functions using an extensionality inference. Partial applications of functions are handled using a (lazy) applicative encoding where some function applications are equated to the applicative encoding. For the latter, several of the data structures for E-matching have been modified to incorporate matching in the presence of equalities between functions, function variables, and partial function applications.

Strategies

For handling theorems, cvc5 primarily uses conflict-based quantifier instantiation [RTd14, BFR17], enumerative instantiation [RBF18], and E-matching. cvc5 uses a handful of orthogonal trigger selection strategies for E-matching, and several orthogonal ordering heuristics for enumerative instantiation. For handling non-theorems, cvc5 primarily uses finite model finding techniques. Since cvc5 with finite model finding is also capable of establishing unsatisfiability, it is used as a strategy for theorems as well.

Implementation

cvc5 is implemented in C++. The code is available from
    https://github.com/cvc5/cvc5

Expected Competition Performance

cvc5 1.0 is the CASC-J11 TFA winner.


cvc5 1.0.5

Andrew Reynolds
University of Iowa, USA

Architecture

cvc5 [BB+22] is the successor of CVC4 [BC+11]. It is an SMT solver based on the CDCL(T) architecture [NOT06] that includes built-in support for many theories, including linear arithmetic, arrays, bit vectors, datatypes, finite sets and strings. It incorporates approaches for handling universally quantified formulas. For problems involving free function and predicate symbols, cvc5 primarily uses heuristic approaches based on conflict-based instantiation and E-matching for theorems, and finite model finding approaches for non-theorems.

Like other SMT solvers, cvc5 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in cvc5 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, cvc5 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".

cvc5 has native support for problems in higher-order logic, as described in [BR+19]. It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via lambda lifting. The approach extends the theory solver for quantifier-free uninterpreted functions (UF) and E-matching. For the former, the theory solver for UF in cvc5 now handles equalities between functions using an extensionality inference. Partial applications of functions are handled using a (lazy) applicative encoding where some function applications are equated to the applicative encoding. For the latter, several of the data structures for E-matching have been modified to incorporate matching in the presence of equalities between functions, function variables, and partial function applications.

Strategies

For handling theorems, cvc5 primarily uses conflict-based quantifier instantiation [RTd14, BFR17], enumerative instantiation [RBF18], and E-matching. cvc5 uses a handful of orthogonal trigger selection strategies for E-matching, and several orthogonal ordering heuristics for enumerative instantiation. For handling non-theorems, cvc5 primarily uses finite model finding techniques. Since cvc5 with finite model finding is also capable of establishing unsatisfiability, it is used as a strategy for theorems as well.

Implementation

cvc5 is implemented in C++. The code is available from
    https://github.com/cvc5/cvc5

Expected Competition Performance

First-order theorem proving, finite model finding, and proof generation [BB+22] in cvc5 have undergone minor improvements in the past year. Additionally, this year cvc5 uses a translator from TPTP to smt2 as a preprocessing step, and a new, faster parser for smt2. Overall, cvc5 should perform similarly to previous years.


Drodi 3.5.1

Oscar Contreras
Amateur Programmer, Spain

Architecture

Drodi 3.5.1 is a very basic and lightweight automated theorem prover. It implements ordered resolution and equality paramodulation inferences as well as demodulation and some other standard simplifications. It also includes its own basic implementations of clausal normal form conversion [NW01], the AVATAR architecture with a SAT solver [Vor14], the Limited Resource Strategy [RV03], discrimination trees, as well as KBO, non-recursive, and lexicographic reduction orderings. Drodi produces a (hopefully) verifiable proof in TPTP format.

Several new features have been added since the last CASC competition.

Strategies

Drodi 3.5.1 has a fair number of selectable strategies.

Implementation

Drodi 3.5.1 is implemented in C. It includes discrimination tree and hash indexing. All the code is original, without special code libraries or code taken from other sources.

Expected Competition Performance

Several new features have been added to Drodi 3.5.1, so a significant improvement is expected in FOF with respect to last year's CASC, and a more modest improvement in UEQ.


Duper 1.0

Joshua Clune
Carnegie Mellon University, USA

Architecture

Duper is a superposition-based theorem prover for dependent type theory, designed to prove theorems in the proof assistant Lean 4. To solve problems in TPTP format, Duper first translates them into Lean goals, using a shallow embedding of first-order and higher-order logic into dependent type theory. Translation is currently possible for FOF, TFF, and THF problems without arithmetic. When run in Lean, Duper produces proof terms which get verified by Lean's kernel. When run as a standalone executable for the purposes of this competition, Duper still produces Lean proof terms but they are not checked by Lean's kernel.

Duper's core architecture is based on saturation with a set of inference and simplification rules that operate on dependently typed terms but primarily perform first-order and some higher-order reasoning. The calculus is closely based on Zipperposition's [BB+21], though it has been adapted to be sound in Lean's type theory. The unification procedure extends [VBN20] to dependent type theory, and remains complete for the higher-order fragment of Lean. Duper uses streams to represent conclusions of inference rules and interleaves unification and inference, as in [BB+21].

Strategies

Currently, Duper has only a single strategy without time slicing and does not take advantage of multiple cores. The strategy is simple and does not tune its heuristics based on the input problem. Duper uses a KBO term ordering with a basic precedence and weight heuristic.

Implementation

The prover is implemented in Lean 4 as a Lean tactic. The script provided to solve TPTP problems parses the given file, converts it to a Lean 4 goal, then attempts to solve said goal with the Duper tactic. Duper's source code can be found at
    https://github.com/leanprover-community/duper

Expected Competition Performance

Duper is still at an early stage of development and is not expected to compete with mature tools.


E 3.0

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

E [Sch02, Sch13, SCV19] is a purely equational theorem prover for many-sorted first-order logic with equality, and for monomorphic higher-order logic. It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal form, and a saturation algorithm implementing an instance of the superposition calculus with negative literal selection and a number of redundancy elimination techniques, optionally with higher-order extensions [VB+21]. E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. No special rules for non-equational literals have been implemented. Resolution is effectively simulated by paramodulation and equality resolution. As of E 2.1, PicoSAT [Bie08] can be used to periodically check the (on-the-fly grounded) proof state for propositional unsatisfiability. For the LTB divisions, a control program uses a SInE-like analysis to extract reduced axiomatizations that are handed to several instances of E. E will not use on-the-fly learning this year.
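
The DISCOUNT-loop variant can be sketched as follows (illustrative Python with hypothetical helpers, not E's C code); its defining trait is that only the selected given clause, never the whole passive set, is simplified against the active set:

    def discount_loop(axioms):
        active, passive = [], list(axioms)
        while passive:
            given = pop_best(passive)          # heuristic clause selection
            given = simplify(given, active)    # forward simplification
            if given is None:                  # shown redundant: discard
                continue
            if is_empty_clause(given):
                return "proof found"
            # backward simplification: drop active clauses that the
            # given clause subsumes or rewrites
            active = [c for c in active if not redundant(c, given)]
            active.append(given)
            passive.extend(infer(given, active))  # superposition etc.
        return "saturated without proof"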

Strategies

Proof search in E is primarily controlled by a literal selection strategy, a clause selection heuristic, and a simplification ordering. The prover supports a large number of pre-programmed literal selection strategies. Clause selection heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Clause evaluation heuristics are based on symbol-counting, but also take other clause properties into account. In particular, the search can prefer clauses from the set of support, or containing many symbols also present in the goal. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO), which can be lifted in different ways to literal orderings.

For CASC-J11, E implements a multi-core strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, possibly presence of certain axiom patterns...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.
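
In sketch form, the greedy construction reads as follows (hypothetical Python; solved_by maps each strategy to the set of class problems it solves within one slice, under the simplifying assumption of equal slices):

    def greedy_schedule(solved_by, max_slices):
        schedule, covered = [], set()
        for _ in range(max_slices):
            best = max(solved_by,
                       key=lambda s: len(solved_by[s] - covered),
                       default=None)
            if best is None or not (solved_by[best] - covered):
                break                    # no strategy adds new solutions
            schedule.append(best)
            covered |= solved_by[best]
        return schedule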

About 140 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.3.0. We have also explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.

Implementation

E is built around perfectly shared terms, i.e., each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector. Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared. Subsumption and contextual literal cutting (also known as subsumption resolution) is supported using feature vector indexing [Sch13]. Superposition and backward rewriting use fingerprint indexing [Sch12], a technique combining ideas from feature vector indexing and path indexing. Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loe06a, Loe06b]. The prover and additional information are available at
    https://www.eprover.org

Expected Competition Performance

E 3.0 is the CASC-J11 SLH winner.


E 3.1

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

E [Sch02, Sch13, SCV19] is a purely equational theorem prover for many-sorted first-order logic with equality, and for monomorphic higher-order logic. It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal form, and a saturation algorithm implementing an instance of the superposition calculus with negative literal selection and a number of redundancy elimination techniques, optionally with higher-order extensions [VB+21, VBS23]. E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. No special rules for non-equational literals have been implemented. Resolution is effectively simulated by paramodulation and equality resolution. As of E 2.1, PicoSAT [Bie08] can be used to periodically check the (on-the-fly grounded) proof state for propositional unsatisfiability.

Strategies

Proof search in E is primarily controlled by a literal selection strategy, a clause selection heuristic, and a simplification ordering. The prover supports a large number of pre-programmed literal selection strategies. Clause selection heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Clause evaluation heuristics are based on symbol-counting, but also take other clause properties into account. In particular, the search can prefer clauses from the set of support, or containing many symbols also present in the goal. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO), which can be lifted in different ways to literal orderings.

For CASC-29, E implements a two-stage multi-core strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, possibly presence of certain axiom patterns...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy. The strategies are then scheduled onto the available cores and run in parallel.

About 140 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.3.0. We have also explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.

Implementation

E is built around perfectly shared terms, i.e., each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector. Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared. Subsumption and contextual literal cutting (also known as subsumption resolution) is supported using feature vector indexing [Sch13]. Superposition and backward rewriting use fingerprint indexing [Sch12], a technique combining ideas from feature vector indexing and path indexing. Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loe06a, Loe06b]. The prover and additional information are available at
    https://www.eprover.org

Expected Competition Performance

E 3.1 is basically E 3.0 made more robust and somewhat more efficient. We have not yet been able to evaluate and integrate new search strategies making full use of all new features. As a result, we expect performance to be similar to last year's version. The system is expected to perform well in most proof classes, but will at best complement top systems in the disproof classes.


GKC 0.8

Tanel Tammet
Tallinn University of Technology, Estonia

Architecture

GKC [Tam19] is a resolution prover optimized for search in large knowledge bases. The GKC version 0.8 running at CASC-29 is a marginally improved version of the GKC 0.7 that ran in the two previous CASCs. Almost all of the GKC development effort this year has gone into the commonsense superstructure GK (https://logictools.org/gk/) and the natural language reasoning pipeline (https://github.com/tammet/nlpsolver) [TJ+23].

GKC is used as a foundation (GK Core) for building a common-sense reasoner GK. In particular, GK can handle inconsistencies and perform probabilistic and nonmonotonic reasoning [Tam21, Tam22].

The WASM version of the previous GKC 0.6 is used as the prover engine in the educational http://logictools.org system. It can read and output proofs in the TPTP, simplified TPTP, and JSON formats, the latter compatible with JSON-LD [TS21].

GKC only looks for proofs and does not try to show non-provability; the standard resolution and paramodulation inference rules have been implemented in GKC.

GKC includes an experimental implementation of propositional inferencing and instance generation, which we do not plan to use during the current CASC.

Strategies

GKC uses multiple strategies run sequentially, with the time limit starting at 0.1 seconds for each, increased 10 or 5 times once the whole batch has been performed. The strategy selection takes into account basic properties of the problem: the presence of equality and the approximate size of the problem.

We perform the selection of a given clause by using several queues in order to spread the selection relatively uniformly over these categories of derived clauses and their descendants: axioms, external axioms, assumptions, and goals. The queues are organized in two layers. As a first layer we use the common ratio-based algorithm of alternating between selecting n clauses from a weight-ordered queue and one clause from the FIFO queue in derivation order. As a second layer we use four separate queues based on the derivation history of a clause. Each queue in the second layer contains the two sub-queues of the first layer.
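
A stripped-down sketch of this two-layer scheme (illustrative Python, not GKC's C data structures):

    import heapq
    from collections import deque

    CATEGORIES = ("axiom", "external", "assumption", "goal")

    class LayeredQueues:
        def __init__(self, ratio=4):
            self.ratio = ratio
            # second layer: per-category pair of first-layer sub-queues
            self.q = {c: {"weight": [], "fifo": deque(), "picks": 0}
                      for c in CATEGORIES}
            self.rr = deque(CATEGORIES)     # round-robin over categories

        def add(self, clause, weight, cat):
            heapq.heappush(self.q[cat]["weight"],
                           (weight, id(clause), clause))
            self.q[cat]["fifo"].append(clause)

        def pick(self):
            # rotate through categories; within one, alternate ratio
            # weight-based picks with one FIFO pick (a real implementation
            # must also skip clauses already taken from the twin queue)
            for _ in range(len(self.rr)):
                cat = self.rr[0]; self.rr.rotate(-1)
                q = self.q[cat]
                if not q["fifo"]:
                    continue
                q["picks"] += 1
                if q["picks"] % (self.ratio + 1) == 0:
                    return q["fifo"].popleft()
                return heapq.heappop(q["weight"])[2]
            return None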

Implementation

GKC is implemented in C. The data representation machinery is built upon the shared-memory graph database WhiteDB, enabling GKC to solve multiple different queries in parallel processes without needing to repeatedly parse or load the large knowledge base from disk. An interesting aspect of GKC is the pervasive use of hash indexes, feature vectors, and fingerprints, while no tree indexes are used. GKC can be obtained from
    https://github.com/tammet/gkc/

Expected Competition Performance

We expect GKC to be in the middle of the final ranking for FOF and below the average in UEQ. We expect GKC to perform well on very large problems.


iProver 3.8

Konstantin Korovin
University of Manchester, United Kingdom

None provided


Lash 1.13

Cezary Kaliszyk
University of Innsbruck, Austria

Architecture

Lash [BK22] is a higher-order automated theorem prover created as a fork of the theorem prover Satallax. The basic underlying calculus of Satallax is a ground tableau calculus whose rules only use shallow information about the terms and formulas taking part in the rule.

Strategies

There are about 113 flags that control Lash's behavior, most of them inherited from Satallax. A mode is a collection of flag values. Starting from 10 Satallax modes, Grackle was used to derive 61 modes automatically, which were grouped into two schedules of 15 modes each.

Implementation

Lash uses new, efficient C representations of vital structures and operations. Most importantly, Lash uses a C representation of (normal) terms with perfect sharing, along with a C implementation of normalizing substitutions. Version 1.12 of Lash additionally introduced a new term enumeration scheme and a Grackle-based strategy schedule.

Expected Competition Performance

Comparable to Satallax.


LEO-II 1.7.0

Alexander Steen
University of Greifswald, Germany

Architecture

LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system based on extensional higher-order resolution. More precisely, LEO-II employs a refinement of extensional higher-order RUE resolution [Ben99]. LEO-II is designed to cooperate with specialist systems for fragments of higher-order logic. By default, LEO-II cooperates with the first-order ATP system E [Sch02]. LEO-II is often too weak to find a refutation amongst the steadily growing set of clauses on its own. However, some of the clauses in LEO-II's search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. Therefore, LEO-II launches a cooperating first-order ATP system every n iterations (e.g., every 10 iterations) of its standard resolution proof search loop. If the first-order ATP system finds a refutation, it communicates its success to LEO-II in the standard SZS format. Communication between LEO-II and the cooperating first-order ATP system uses the TPTP language and standards.
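
Schematically, the cooperation looks as follows (a Python sketch with hypothetical helpers; LEO-II itself is written in OCaml):

    import itertools

    def leo_loop(clauses, n=10):
        # runs until a refutation is found or resources run out
        for i in itertools.count(1):
            clauses = resolution_round(clauses)     # one loop iteration
            if contains_empty_clause(clauses):
                return "Theorem"
            if i % n == 0:
                # ship the clauses that are first-order modulo the
                # transformation function to the cooperating FO prover
                fo = [transform(c) for c in clauses if fo_like(c)]
                if eprover_reports_unsat(fo):       # SZS Unsatisfiable
                    return "Theorem"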

Strategies

LEO-II employs an adapted "Otter loop". Moreover, LEO-II uses some basic strategy scheduling to try different search strategies or flag settings. These search strategies also include some different relevance filters.

Implementation

LEO-II is implemented in OCaml 4, and its problem representation language is the TPTP THF language [BRS08]. In fact, the development of LEO-II has largely paralleled the development of the TPTP THF language and related infrastructure [SB10]. LEO-II's parser supports the TPTP THF0 language and also the TPTP languages FOF and CNF.

Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from

    http://www.leoprover.org

Expected Competition Performance

LEO-II is not actively being developed anymore, hence no improvements over last year's CASC results are expected.


Leo-III 1.7.8

Alexander Steen
University of Greifswald, Germany

Architecture

Leo-III [SB21], the successor of LEO-II [BP+08], is a higher-order ATP system based on extensional higher-order paramodulation with inference restrictions using a higher-order term ordering. The calculus contains dedicated extensionality rules and is augmented with equational simplification routines that have their intellectual roots in first-order superposition-based theorem proving. The saturation algorithm is a variant of the given clause loop procedure inspired by the first-order ATP system E.

Leo-III cooperates with external first-order ATPs that are called asynchronously during proof search; a focus is on cooperation with systems that support typed first-order (TFF) input. For this year's CASC, CVC4 [BC+11] and E [Sch02, Sch13] are used as external systems. However, cooperation is in general not limited to first-order systems. Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or counter model generators) may be included using simple command-line arguments. If the saturation procedure loop (or one of the external provers) finds a proof, the system stops, generates the proof certificate and returns the result.

Strategies

Leo-III comes with several configuration parameters that influence its proof search by applying different heuristics and/or restricting inferences. These parameters can be chosen manually by the user on start-up. Leo-III implements a very naive time slicing approach in which at most three different parameter configurations are used, one after the other. In practice, this hardly ever happens and Leo-III will just run with its default parameter setting.

Implementation

Leo-III utilizes and instantiates the associated LeoPARD system platform [WSB15] for higher-order (HO) deduction systems implemented in Scala (currently using Scala 2.13 and running on a JVM with Java 8). The prover makes use of LeoPARD's data structures and implements its own reasoning logic on top. A hand-crafted parser is provided that supports all TPTP syntax dialects. It converts its produced concrete syntax tree to an internal TPTP AST data structure which is then transformed into polymorphically typed lambda terms. As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF, TFF, THF) as well as their polymorphic variants [BP13, KRS16]. Since version 1.6.X (X >= 0) Leo-III also accepts non-classical problem input represented in non-classical TPTP; see
    https://tptp.org/NonClassicalLogic/

The term data structure of Leo-III uses a polymorphically typed spine term representation augmented with explicit substitutions and De Bruijn-indices. Furthermore, terms are perfectly shared during proof search, permitting constant-time equality checks between alpha-equivalent terms.
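
Perfect sharing can be pictured by hash-consing, as in the following minimal Python sketch (Leo-III's actual Scala data structures are richer): terms in De Bruijn form are built once, so alpha-equivalent terms are the same object and equality is a pointer comparison.

    class TermBank:
        def __init__(self):
            self.pool = {}                      # structural key -> term
        def mk(self, head, *args):
            # args are already-shared terms, so identity is a sound key
            key = (head,) + tuple(id(a) for a in args)
            return self.pool.setdefault(key, (head, args))

    bank = TermBank()
    s = bank.mk("f", bank.mk("x"))
    t = bank.mk("f", bank.mk("x"))
    assert s is t       # constant-time equality of shared terms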

Leo-III's saturation procedure may at any point invoke external reasoning tools. To that end, Leo-III includes an encoding module which translates (polymorphic) higher-order clauses to polymorphic or monomorphic typed first-order clauses, whichever is supported by the external system. While LEO-II relied on cooperation with untyped first-order provers, Leo-III exploits the native type support of first-order provers (TFF logic) to remove clutter during translation and, in turn, increase the effectiveness of external cooperation.

Leo-III is available on GitHub:

    https://github.com/leoprover/Leo-III

Expected Competition Performance

Version 1.7.8 is, for all intents and purposes of CASC, equivalent to last year's version, except that support for reasoning in various quantified non-classical logics was added and/or improved. We do not expect Leo-III to be strongly competitive against more recent higher-order provers, as Leo-III does not implement several standard features of effective systems (including time slicing and proper axiom selection). For the first time, a native LLVM build of Leo-III is used instead of the JVM-based set-up (at least for the SLH division).


Princess 230619

Philipp Rümmer
University of Regensburg, Germany

Architecture

Princess [Rue08] is a theorem prover for first-order logic modulo linear integer arithmetic. The prover uses a combination of techniques from the areas of first-order reasoning and SMT solving. The main underlying calculus is a free-variable tableau calculus, which is extended with constraints to enable backtracking-free proof expansion, and positive unit hyper-resolution for lightweight instantiation of quantified formulae. Linear integer arithmetic is handled using a set of built-in proof rules resembling the Omega test, which altogether yields a calculus that is complete for full Presburger arithmetic, for first-order logic, and for a number of further fragments. Princess also contains theory modules for, among others, non-linear arithmetic, rationals, bit-vectors, arrays, heaps, algebraic data-types, and strings.

The list of contributors is available at https://github.com/uuverifiers/princess/blob/master/AUTHORS.

Strategies

For CASC, Princess will run a fixed schedule of configurations for each problem (portfolio method). Configurations determine, among others, the mode of proof expansion (depth-first, breadth-first), selection of triggers in quantified formulae, clausification, and the handling of functions. The portfolio was chosen based on training with a random sample of problems from the TPTP library.

The following options are used in the competition:

-portfolio=casc +threads +printProof -inputFormat=tptp

The version 230619 submitted to CASC-29 is the standard version of Princess. This is in contrast to CASC-26 (2017), the last time Princess participated, when a version tailor-made for CASC was used.

Implementation

Princess is entirely written in Scala and runs on any recent Java virtual machine; besides the standard Scala and Java libraries, only the Cup parser library is used.

Sources and binaries are available from

    https://github.com/uuverifiers/princess

Expected Competition Performance

The system has not competed at CASC since 2017. Most of the implementation effort since then has focused on theory solvers, not on the core first-order proof procedure, so the results should be similar to those in 2017. As in 2017, there is still limited support for proof generation in the context of quantifiers (free variables used in the constructed tableau), so Princess will lose points here.


Prover9 1109a

Bob Veroff on behalf of William McCune
University of New Mexico, USA

Architecture

Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Strategies

Prover9 has available many strategies; the following statements apply to CASC.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.

Implementation

Prover9 is coded in C, and it uses the LADR libraries. Some of the code is descended from EQP [McC97]. (LADR has some AC functions, but Prover9 does not use them.) Term data structures are not shared (as they are in Otter). Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules and subsuming units, and FPA/Path indexing for finding subsumed units, rewritable terms, and resolvable literals. Feature vector indexing [Sch04] is used for forward and backward nonunit subsumption. Prover9 is available from
    http://www.cs.unm.edu/~mccune/prover9/

Expected Competition Performance

Prover9 is the CASC fixed point, against which progress can be judged. Each year it is expected to do worse than the previous year, relative to the other systems.


Satallax 3.4

Cezary Kaliszyk
Universität Innsbruck, Austria

Architecture

Satallax 3.4 [Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the proof search. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. Problems are given in the THF format.

Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates lambda-free higher-order logic (lfHOL) formulae in addition to the propositional clauses [VB+19]. If this option is used, then Satallax periodically calls the theorem prover E [Sch13] to test for lfHOL unsatisfiability. If the set of lfHOL formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format.
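
The interplay between tableau rule instances and the SAT solver can be sketched as follows (schematic Python with hypothetical helpers, not Satallax's OCaml internals):

    def satallax_search(branch):
        clauses = initial_clauses(branch)
        while True:
            if not sat_check(clauses):          # MiniSat: prop. unsat
                return "Unsatisfiable"          # => branch unsatisfiable
            new = next_rule_instances(branch)   # generate HO formulae and
            if not new:                         # matching prop. clauses
                return "Satisfiable"
            clauses += [prop_clause(inst) for inst in new]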

Strategies

There are about 150 flags that control the order in which formulae and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time each mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Before deciding on the schedule to use, Satallax parses the problem and determines if it is big enough that a SInE-based premise selection algorithm [HV11] should be used. If SInE is not activated, then Satallax uses a strategy schedule consisting of 37 modes. If SInE is activated, then Satallax is run with a SInE-specific schedule consisting of 58 modes with different SInE parameter values selecting different premises. Each mode is tried for time limits ranging from less than a second to just over 1 minute.

Implementation

Satallax is implemented in OCaml, making use of the external tools MiniSat (via a foreign function interface) and E. Satallax is available at:
    http://cl-informatik.uibk.ac.at/~mfaerber/satallax.html

Expected Competition Performance

Satallax 3.4 was the CASC-27 THF winner.


Toma 0.4

Teppei Saito
Japan Advanced Institute of Science and Technology, Japan

Architecture

Toma 0.4 is an automatic equational theorem prover. It proves unsatisfiability of a UEQ problem as follows: A given problem is transformed into a word problem whose validity entails unsatisfiability of the original problem. The word problem is solved by a new variant of maximal (ordered) completion [WM18, Hir21].

Strategies

Toma performs ordered completion in the following way: (1) Given an equational system E, the tool constructs a lexicographic path order > that maximizes reducibility of the ordered rewrite system (E, >) [WM18]. (2) Using the order >, the tool runs ordered completion [BDP89] on E without the deduce rule (critical pair generation). Such a run eventually ends with an inter-reduced version (E', >). (3) The tool then checks joinability of the goal. If the goal is joinable in (E', >), the tool outputs the proof and terminates. Otherwise, assigning the union of E' and a set of critical pairs of (E', >) to E, the tool goes back to step (1). After a certain number of iterations of the loop, the tool skips the order generation step (1) and repeats only steps (2) and (3) with a fixed lexicographic path order.

Implementation

Toma is written in Haskell. Z3 is used to solve the reducibility maximization problems. The source code is available at:
    https://www.jaist.ac.jp/project/maxcomp/
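
The loop (1)-(3) from the Strategies section can be pictured as follows (a Python sketch with hypothetical helpers; the real tool encodes step (1) as a Z3 maximization problem):

    import itertools

    def maximal_completion(E, goal, order_rounds=10):
        gt = best_lpo(E)                      # (1) maximize reducibility
        for i in itertools.count():
            if i and i < order_rounds:
                gt = best_lpo(E)              # regenerate the order
            E = orient_and_interreduce(E, gt) # (2) completion w/o deduce
            if joinable(goal, E, gt):         # (3) goal joinability
                return "proof found"
            E = E | critical_pairs(E, gt)     # extend E and iterate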

Expected Competition Performance

Toma is still at an experimental stage and unable to compete with mature tools. It should solve several easy problems from the BOO, GRP, and RNG categories.


Twee 2.4.1

Nick Smallbone
Chalmers University of Technology, Sweden

Architecture

Twee [Sma21] is a theorem prover for unit equality problems based on unfailing completion [BDP89]. It implements a DISCOUNT loop, where the active set contains rewrite rules (and unorientable equations) and the passive set contains critical pairs. The basic calculus is not goal-directed, but Twee implements a transformation which improves goal direction for many problems.

Twee features ground joinability testing [MN90] and a connectedness test [BD88], which together eliminate many redundant inferences in the presence of unorientable equations. The ground joinability test performs case splits on the order of variables, in the style of [MN90], and discharges individual cases by rewriting modulo a variable ordering.

Strategies

Twee's strategy is simple and it does not tune its heuristics or strategy based on the input problem. The term ordering is always KBO; by default, functions are ordered by number of occurrences and have weight 1. The proof loop repeatedly selects the lowest-scored critical pair from the passive set and turns it into a new rule. Each critical pair is scored using a weighted sum of the weights of both of its terms. Terms are treated as DAGs when computing weights, i.e., duplicate subterms are counted only once per term.
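
The DAG-based weighting can be sketched as follows (illustrative Python, not Twee's Haskell; it assumes shared subterms are represented by the same object, as in a DAG representation):

    def dag_weight(term):
        seen = set()
        def go(t):
            if id(t) in seen:       # duplicate subterm: count only once
                return 0
            seen.add(id(t))
            if is_variable(t):      # hypothetical representation test
                return 1
            f, args = t             # compound term: (symbol, subterms)
            return 1 + sum(go(a) for a in args)
        return go(term)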

For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using different parameters (e.g., with the goal-directed transformation on or off).

Implementation

Twee is written in Haskell. Terms are represented as array-based flatterms for efficient unification and matching. Rewriting uses a perfect discrimination tree. The passive set is represented compactly (12 bytes per critical pair) by only storing the information needed to reconstruct the critical pair, not the critical pair itself. Because of this, Twee can run for an hour or more without exhausting memory.
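
The compact passive representation amounts to storing, per critical pair, only the identities of its parent rules, the overlap position, and the score, along the lines of this hypothetical sketch (the overlap helper stands in for critical pair computation):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Passive:           # ~12 bytes when packed into machine words
        rule1: int           # index of the first parent rule
        rule2: int           # index of the second parent rule
        pos: int             # overlap position inside rule1
        score: int           # selection heuristic value

    def reconstruct(p, active):
        # recompute the actual critical pair only when p is selected
        return overlap(active[p.rule1], active[p.rule2], p.pos)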

Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object which traces back to the input axioms. When a conjecture is proved, the proof object is transformed into a human-readable proof. Proof construction does not harm efficiency because the proof kernel is invoked only when a new rule is accepted. In particular, reasoning about the passive set does not invoke the kernel. The translation from Horn clauses to equations is not yet certified.

Twee can be downloaded as open source from:

    http://nick8325.github.io/twee

Expected Competition Performance

Twee 2.4.1 is the CASC-J11 UEQ winner.


Twee 2.4.2

Nick Smallbone
Chalmers University of Technology, Sweden

Architecture

Twee 2.4.2 [Sma21] is a theorem prover for unit equality problems based on unfailing completion [BDP89]. It implements a DISCOUNT loop, where the active set contains rewrite rules (and unorientable equations) and the passive set contains critical pairs. The basic calculus is not goal-directed, but Twee implements a transformation which improves goal direction for many problems.

Twee features ground joinability testing [MN90] and a connectedness test [BD88], which together eliminate many redundant inferences in the presence of unorientable equations. The ground joinability test performs case splits on the order of variables, in the style of [MN90], and discharges individual cases by rewriting modulo a variable ordering.

Strategies

Twee's strategy is simple and it does not tune its heuristics or strategy based on the input problem. The term ordering is always KBO; by default, functions are ordered by number of occurrences and have weight 1. The proof loop repeatedly selects the lowest-scored critical pair from the passive set and turns it into a new rule.

Each critical pair is scored using a weighted sum of the weight of both of its terms. Terms are treated as DAGs when computing weights, i.e., duplicate subterms are counted only once per term.

For CASC, to take advantage of multiple cores, several versions of Twee run in parallel using different parameters (e.g., with the goal-directed transformation on or off).

Implementation

Twee is written in Haskell. Terms are represented as array-based flatterms for efficient unification and matching. Rewriting uses a perfect discrimination tree.

The passive set is represented compactly (12 bytes per critical pair) by storing only the information needed to reconstruct the critical pair, not the critical pair itself. Because of this, Twee can run for an hour or more without exhausting memory.

Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object which traces back to the input axioms. When a conjecture is proved, the proof object is transformed into a human-readable proof. Proof construction does not harm efficiency because the proof kernel is invoked only when a new rule is accepted. In particular, reasoning about the passive set does not invoke the kernel.

Twee can be downloaded as open source from:

    https://nick8325.github.io/twee

Expected Competition Performance

Either slightly better or slightly worse than Twee 2.4.1.


Vampire 4.7

Giles Reger
University of Manchester, United Kingdom

There are only small changes between Vampire 4.7 and Vampire 4.6 in the tracks relevant to CASC. As TFA did not run in 2021, the updates related to the paper "Making Theory Reasoning Simpler" [RSV21] that were present last year should have an impact this year. This work introduces a new set of rules for the evaluation and simplification of theory literals. We have also added some optional preprocessing steps inspired by Twee (see "Twee: An Equational Theorem Prover" [Sma21]) but these have not been fully incorporated into our strategy portfolio so are unlikely to make a significant impact.

Architecture

Vampire [KV13] is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing [RSV16]. Vampire implements many useful preprocessing transformations including the SinE axiom selection algorithm. When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.
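
The AVATAR interplay can be caricatured as follows (a loose Python sketch with hypothetical interfaces; see [Vor14] for the real architecture): clause components are named by propositional variables, the SAT solver proposes a model, and the first-order prover asserts exactly the components the model selects.

    def avatar_step(clauses, sat, names):
        # names: dict mapping each component to its propositional variable
        for clause in clauses:
            comps = components(clause)          # variable-disjoint parts
            for c in comps:
                names.setdefault(c, fresh_var())
            # the clause holds iff at least one of its components holds
            sat.add_clause([names[c] for c in comps])
        model = sat.model()   # maps variables to truth values; no model
                              # at all would refute the original problem
        return [c for c, v in names.items() if model[v]]  # asserted parts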

Strategies

Vampire 4.7 provides a very large number of options for strategy selection.

Implementation

Vampire 4.7 is implemented in C++. It makes use of Minisat and Z3 (the tagged GitHub commit details which Z3 commit). See the website for more information and access to the GitHub repository:
    https://vprover.github.io/

Expected Competition Performance

Vampire 4.7 is the CASC-J11 FOF and FNT winner.


Vampire 4.8

Michael Rawson
TU Wien, Austria

There have been a number of changes and improvements since Vampire 4.7, although it is still the same beast. Most significant from a competition point of view are long-awaited refreshed strategy schedules. As a result, several features present in previous competitions will now come into full force, including new rules for the evaluation and simplification of theory literals. A large number of completely new features and improvements also landed this year: highlights include a significant refactoring of the substitution tree implementation, the arrival of encompassment demodulation to Vampire, and support for parametric datatypes.

Vampire's higher-order support has also been re-implemented from the ground up. The new implementation is still at an early stage and its theoretical underpinnings are being developed. There is currently no documentation of either.

Architecture

Vampire [KV13] is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing [RSV16]. Vampire implements many useful preprocessing transformations including the SinE axiom selection algorithm. When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.8 provides a very large number of options for strategy selection. The schedule for the new HOL implementation was developed using Snake, a strategy schedule construction tool described in more detail last year. The Snake schedule will this year fully embrace Vampire's randomisation support [Sud22]; in particular, every strategy will independently shuffle the input problem, to nullify (in expectation) the effect of problem scrambling done by the organisers.

Implementation

Vampire 4.8 is implemented in C++. It makes use of fixed versions of Minisat and Z3. See the website for more information and access to the GitHub repository.

Expected Competition Performance

Vampire 4.8 should be an improvement on the previous version. A reasonably strong performance across all divisions is therefore expected. In the higher-order divisions, performance should be better than the previous year, but not yet on par with category-leading solvers.


Zipperposition 2.1.999

Jasmin Blanchette
Vrije Universiteit Amsterdam, The Netherlands

Architecture

Zipperposition is a superposition-based theorem prover for typed first-order logic with equality and for higher-order logic. It is a pragmatic implementation of a complete calculus for full higher-order logic [BB+21]. It features a number of extensions that include polymorphic types, user-defined rewriting on terms and formulas ("deduction modulo theories"), a lightweight variant of AVATAR for case splitting [EBT21], and Boolean reasoning [VN20]. The core architecture of the prover is based on saturation with an extensible set of rules for inferences and simplifications. Zipperposition uses a full higher-order unification algorithm that enables efficient integration of procedures for decidable fragments of higher-order unification [VBN20]. The initial calculus and main loop were imitations of an earlier version of E [Sch02]. With the implementation of higher-order superposition, the main loop had to be adapted to deal with possibly infinite sets of unifiers [VB+21].
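
One way to picture the adapted main loop (an illustrative Python sketch, not Zipperposition's OCaml code): each inference yields a lazy, possibly infinite stream of conclusions, and the loop visits the streams fairly, taking a bounded number of elements at a time.

    import itertools

    def fair_merge(streams, chunk=1):
        # round-robin over possibly infinite conclusion streams
        streams = list(streams)
        while streams:
            for s in list(streams):
                batch = list(itertools.islice(s, chunk))
                if not batch:
                    streams.remove(s)      # exhausted stream
                yield from batch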

Strategies

The system uses various strategies in a portfolio. The strategies are run in parallel, making use of all CPU cores available. We designed the portfolio of strategies by manual inspection of TPTP problems. Zipperposition's heuristics are inspired by efficient heuristics used in E. Various calculus extensions are used by the strategies [VB+21]. The portfolio mode distinguishes between first-order and higher-order problems. If the problem is first-order, all higher-order prover features are turned off. In particular, the prover uses standard first-order superposition calculus and disables collaboration with the backend prover (described below). Other than that, the portfolio is static and does not depend on the syntactic properties of the problem.

Implementation

The prover is implemented in OCaml. Term indexing is done using fingerprints for unification, perfect discrimination trees for rewriting, and feature vectors for subsumption. Some inference rules such as contextual literal cutting make heavy use of subsumption. For higher-order problems, some strategies use the E prover as an end-game backend prover.

Zipperposition's code can be found at

    https://github.com/sneeuwballen/zipperposition
and is entirely free software (BSD-licensed).

Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].

Expected Competition Performance

Zipperposition 2.1.999 is the CASC-J11 THF winner.


Zipperposition 2.1.9999

Jasmin Blanchette
Ludwig-Maximilians-Universität München, Germany

Architecture

Zipperposition is a superposition-based theorem prover for typed first-order logic with equality and for higher-order logic. It is a pragmatic implementation of a complete calculus for full higher-order logic [BB+21]. It features a number of extensions that include polymorphic types, user-defined rewriting on terms and formulas ("deduction modulo theories"), a lightweight variant of AVATAR for case splitting [EBT21], and Boolean reasoning [VN20]. The core architecture of the prover is based on saturation with an extensible set of rules for inferences and simplifications. Zipperposition uses a full higher-order unification algorithm that enables efficient integration of procedures for decidable fragments of higher-order unification [VBN20]. The initial calculus and main loop were imitations of an earlier version of E [Sch02]. With the implementation of higher-order superposition, the main loop had to be adapted to deal with possibly infinite sets of unifiers [VB+21].

Strategies

The system uses various strategies in a portfolio. The strategies are run in parallel, making use of all CPU cores available. We designed the portfolio of strategies by manual inspection of TPTP problems. Zipperposition's heuristics are inspired by efficient heuristics used in E. Various calculus extensions are used by the strategies [VB+21]. The portfolio mode distinguishes between first-order and higher-order problems. If the problem is first-order, all higher-order prover features are turned off. In particular, the prover uses standard first-order superposition calculus and disables collaboration with the backend prover (described below). Other than that, the portfolio is static and does not depend on the syntactic properties of the problem.

Implementation

The prover is implemented in OCaml. Term indexing is done using fingerprints for unification, perfect discrimination trees for rewriting, and feature vectors for subsumption. Some inference rules such as contextual literal cutting make heavy use of subsumption. For higher-order problems, some strategies use the E prover as an end-game backend prover.

Zipperposition's code can be found at

    https://github.com/sneeuwballen/zipperposition
and is entirely free software (BSD-licensed).

Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].

Expected Competition Performance

The prover is expected to perform well on THF, about as well as last year's version. We expect to beat E. In the SLH division, we expect respectable performance, but E will probably win.