Beagle 0.4
Peter Baumgartner
NICTA, Australia
Architecture
Beagle is an automated theorem prover for sorted first-order logic with
equality over built-in theories.
The theories currently supported are linear integer, linear rational and
linear real arithmetic.
It accepts formulas in the FOF, TFF, and TFF-INT formats of the TPTP syntax.
A first-order prover, beagle accepts arbitrarily quantified input formulas
and converts them to clause normal form. The core reasoning component
implements the Hierarchic Superposition Calculus with Weak Abstraction [BW13].
That calculus generalizes the well-known superposition calculus by integrating
black-box reasoning for specific theories.
For the theories mentioned above, beagle combines quantifier elimination
procedures and other solvers to discard proof obligations over these
theories.
Strategies
Beagles uses the well-know Discount loop for saturating a clause set under the
calculus' inference rules. Simplification techniques include standard ones,
such as subsumption deletion, rewriting by ordered unit equations, tautology
deletion. It also includes theory specific simplification rules for evaluating
ground (sub)terms, and for exploiting cancellation laws and properties of
neutral elements, among others.
Beagle features a splitting rule for clauses that can be divided into variable
disjoint parts. Splitting is particularly useful in combination with the
quantifier elimination procedure mentioned above, as the theory reasoner then
need to deal with conjunctions of quantifier-free unit clauses only.
Beagle uses a single, uniform strategy for every problem.
Implementation
Beagle has been implemented in Scala.
It is a full implementation of the calculus mentioned above, albeit a slow
one. Fairness is achieved through a combination of measuring clause
lengths, depths and their derivation-age.
Beagle's web site is
cocATP 0.1.8
Cristobal Camarero
University of Cantabria, Spain
Architecture
cocATP is a Coq-inspired [Coq] automated theorem prover made on the free
time of the author.
It implements (the non-inductive) part of Coq's logic (calculus of
constructions) and syntax.
The proof terms it creates are accepted as proofs by Coq by the addition
of a few definitions (substituting the respective inductive definitions of
Coq).
As in Coq, equality and logical connectives other than implication (which
is a dependent product of the logic) are defined adding the proper axioms.
The reasoning is tried to be done the more general possible, avoiding to
give any special treatment to both equality and logical connectives.
cocATP is currently very underdeveloped.
Recently the axiom of choice and excluded middle have been added, but the
latter is not actually exploited.
The functional extensionality axiom have some records of appropriate usage.
The propositional extensionality axiom is not included yet.
In addition to axioms, cocATP includes a little library of basic lemmas,
most of which with the proof that cocATP has constructed.
Strategies
Makes a search tree, doing steps trying to be some similar to human
reasoning (or more closely a human guessing at Coq's interface).
Some pattern-matching is done to to apply hypothesis.
Implements existential variables (similar to Coq's tactics evar and eapply).
Implementation
cocATP is fully implemented in Python-2.7.
It uses the Ply-3.4 library to build the parsers for both the Coq and
TPTP syntaxes.
Includes a type-verifier of Calculus of Constructions without induction
constructions, which must be defined with axioms.
That is, a buggy partial clone of Coq.
There is support for most of the TPTP syntax, problems are translated to
a set of calculus of constructions terms.
cocATP has been specially prepared for THF, but all the other TPTP formulae
without numbers should be accepted.
However cocATP does NOT include a SAT solver, thus it will probably not
solve your trivial cnf problems.
Expected Competition Performance
Belong to the THF division.
Based on local tests with the 2012 battery it will perform worse than TPS.
CVC4 1.2
Andrew Reynolds
University of Iowa, USA
Architecture
CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that
includes built-in support for many theories including linear arithmetic,
arrays, bit vectors and datatypes.
Additionally, it supports universally quantified formulas.
When quantified formulas are present, CVC4 typically uses E-matching for
answering unsatisfiable, and a finite model finding method for answering
satisfiable.
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach.
First, quantified formulas are replaced by fresh boolean predicates and the
ground theory solver(s) are used in conjunction with the underlying SAT
solver to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either
add instances of quantified formulas to the problem, answer "satisfiable", or
return unknown.
The finite model finding has been developed to target problems containing
background theories, whose quantification is limited to finite and
uninterpreted sorts.
When finite model finding is turned on, CVC4 uses a ground theory of finite
cardinality constraints that minimizes the number of ground equivalence
classes, as described in [RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
Quantifier instantiation strategies are then invoked to add instances of
quantified formulas that are in conflict with the candidate model, as
described in [4]. If no instances are added, then the solver reports
"satisfiable".
Strategies
For handling theorems, CVC4 uses a variation of E-matching.
Quantifier instantiation based on E-matching is performed lazily in CVC4,
after the ground solver reports satisfiable.
For handling non-theorems, the finite model finding feature of CVC4 will
use a small number of orthogonal quantifier instantiation strategies.
Since CVC4 with finite model finding is also capable of answering
unsatisfiable, it will be used as a strategy for theorems as well.
Implementation
CVC4 is implemented in C++.
The code is available from
E 1.8
Stephan Schulz
Technische Universität München, Germany
Architecture
E 1.8pre [Sch02] is a purely equational theorem prover for full
first-order logic with equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an
instance of the superposition calculus with negative literal selection and
a number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
For the LTB divisions, a control program uses a SInE-like analysis to
extract reduced axiomatizations that are handed to several instances
of E.
Strategies
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies. Clause evaluation heuristics can be constructed on the fly
by combining various parameterized primitive evaluation functions, or can
be selected from a set of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take
other clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).
For CASC-24, E implements a new strategy-scheduling automatic mode.
The total CPU time available is broken into 5 (unequal) time slices.
For each time slice, the problem is classified into one of
several classes, based on a number of simple features (number of
clauses, maximal symbol arity, presence of equality, presence of
non-unit and non-Horn clauses,...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the the one that solves
the most problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
About 170 different strategies have been evaluated on all untyped
first-order problems from TPTP 5.4.0, and about 140 of these
strategies are used in the automatic mode.
Implementation
E is build around perfectly shared terms, i.e. each distinct term is
only represented once in a term bank.
The whole set of terms thus consists of a number of interconnected directed
acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite
link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing [Sch04].
Superposition and backward rewriting use fingerprint indexing [Sch12],
a new technique combining ideas from feature vector indexing and path
indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in [Loe06,Loe06].
The prover and additional information are available from
E-KRHyper 1.4
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-KRHyper [PW07] is a theorem proving and model generation system for
first-order logic with equality.
It is an implementation of the E-hyper tableau calculus [BFP07], which
integrates a superposition-based handling of equality [BG98] into the hyper
tableau calculus [BFN96].
The system is an extension of the KRHyper theorem prover [Wer03], which
implements the original hyper tableau calculus.
An E-hyper tableau is a tree whose nodes are labeled with clauses and which
is built up by the application of the inference rules of the E-hyper tableau
calculus.
The calculus rules are designed such that most of the reasoning is performed
using positive unit clauses.
Splitting is done without rigid variables.
Instead, variables which would be shared between branches are prevented by
ground substitutions, which are guessed from the Herbrand universe and
constrained by rewrite rules.
Redundancy rules allow the detection and removal of clauses that are redundant
with respect to a branch.
The hyper extension inference from the original hyper tableau calculus is
equivalent to a series of E-hyper tableau calculus inference applications.
Therefore the implementation of the hyper extension in KRHyper by a variant
of semi-naive evaluation [Ull89] is retained in E-KRHyper, where it serves
as a shortcut inference for the resolution of non-equational literals.
Strategies
E-KRHyper has traditionally used a uniform search strategy for all problems,
based on the aforementioned semi-naive evaluation.
As of version 1.4 the prover also contains a prototypical implementation
of the common Given-Clause-Algorithm ("Otter-loop") [McC03],
which may be used as an alternative on problems of a certain size.
The E-hyper tableau is generated depth-first, with E-KRHyper always working
on a single branch. Refutational completeness and a fair search control are
ensured by an iterative deepening strategy with a limit on the maximum term
weight of generated clauses.
In the LTB division E-KRHyper sequentially tries three axiom selection
strategies:
an implementation of Krystof Hoder's SInE algorithm,
another incomplete selection based on the CNF representations of the axioms,
and finally the complete axiom set.
Implementation
E-KRHyper is implemented in the functional/imperative language OCaml.
The system runs on Unix and MS-Windows platforms.
the GNU Public License from the E-KRHyper website at:
The system accepts input in the TPTP-format and in the TPTP-supported
Protein-format.
The calculus implemented by E-KRHyper works on clauses, so first order
formula input is converted into CNF by an algorithm similar to the one
used by Otter [McC03], with some additional connector literals to prevent
explosive clause growth when dealing with DNF-like structures.
E-KRHyper operates on an E-hyper tableau which is represented by linked
node records.
Several layered discrimination-tree based indexes (both perfect and
non-perfect) provide access to the clauses in the tableau and support
backtracking.
E-KRHyper is available from
iProver-Eq 0.85
Christoph Sticksel, Konstantin Korovin
University of Iowa, USA
Architecture
iProver-Eq [KS10] extends the iProver system [Kor08] with built-in
equational reasoning, along the lines of [GK04].
As in the iProver system, first-order reasoning is combined with ground
satisfiability checking where the latter is delegated to an off-the-shelf
ground solver.
iProver-Eq consists of three core components: i) ground reasoning by an
SMT solver, ii) first-order equational reasoning on literals in a
candidate model by a labelled unit superposition calculus [KS10,KS10] and
iii) instantiation of clauses with substitutions obtained by ii).
Given a set of first-order clauses, iProver-Eq first abstracts it to a
set of ground clauses which are then passed to the ground solver.
If the ground abstraction is unsatisfiable, then the set of first-order
clauses is also unsatisfiable.
Otherwise, literals are selected from the first-order clauses based on
the model of the ground solver.
The labelled unit superposition calculus checks whether selected literals
are conflicting.
If they are conflicting, then clauses are instantiated such that the
ground solver has to refine its model in order to resolve the conflict.
Otherwise, satisfiability of the initial first-order clause set is shown.
Clause selection and literal selection in the unit superposition calculus
are implemented in separate given clause algorithms.
Relevant substitutions are accumulated in labels during unit superposition
inferences and then used to instantiate clauses.
For redundancy elimination iProver-Eq uses demodulation, dismatching
constraints and global subsumption.
In order to efficiently propagate redundancy elimination from instantiation
into unit superposition, we implemented different representations of labels
based on sets, AND/OR-trees and OBDDs.
Non-equational resolution and equational superposition inferences provide
further simplifications.
Strategies
Proof search options in iProver-Eq control clause and literal selection
in the respective given clause algorithms.
Equally important is the global distribution of time between the inference
engines and the ground solver.
At CASC, iProver-Eq will execute a fixed schedule of selected options.
If no equational literals occur in the input, iProver-Eq falls back to
the inference rules of iProver, otherwise the latter are disabled and
only unit superposition is used.
If all clauses are unit equations, no instances need to be generated and
the calculus is run without the otherwise necessary bookkeeping.
Implementation
iProver-Eq is implemented in OCaml and the CVC4 SMT solver [BC+11] for the
ground reasoning in the equational case and MiniSat [ES04] in the
non-equational case.
iProver-Eq accepts FOF and CNF formats, where Vampire [HKV11,HK+12] is
used for clausification and preprocessing of FOF problems.
iProver-Eq is available from
iProverModulo 0.7-0.2
Guillaume Burel
ENSIIE/Cedric, France
Architecture
iProverModulo [Bur11] is a patch to iProver [Kor08] to integrate polarized
resolution modulo [Dow10].
Polarized resolution modulo consists in presenting the theory in which the
problem has to be solved by means of polarized rewriting rules instead of
axioms.
It can also be seen as a combination of the set-of-support strategy and
selection of literals.
The integration of polarized resolution modulo in iProver only affects
its ordered resolution calculus, so that the instantiation calculus is
untouched.
To be able to use polarized resolution modulo, the theory has to be
presented as a rewriting system. A theory preprocessor is therefore
used to convert the axioms of the problem into such a rewriting
system.
Strategies
A theory preprocessor is first run to transform the formulas of the
problem whose role is "axiom" into polarized rewriting rules.
This preprocessor offers a set of strategies to that purpose.
For the competition, the Equiv(ClausalAll) and the ClausalAll strategies
will be used.
The former strategy may be incomplete, depending on the shape of the axioms,
so that the prover may give up in certain cases.
However, it shows interesting results on some problems.
The second strategy should be complete, at least when equality is not
involved.
The rewriting system for the first strategy is tried for half the time
given for the problem, then the prover is restarted with the second strategy
if no proof has been found.
The patched version of iProver is run on the remaining formulas
modulo the rewriting rules produced by the preprocessor. No scheduling is
performed. To be compatible with polarized resolution modulo, literals
are selected only when they are maximal w.r.t. a KBO ordering, and orphans
are not eliminated. To take advantage of polarized resolution modulo,
the resolution calculus is triggered more often than the instantiation
calculus, on the contrary to the original iProver.
Normalization of clauses w.r.t. the term rewriting system produced by
the preprocessor is performed by transforming these rules into an
OCaml program, compiling this program, and dynamically linking it with
the prover.
Implementation
iProverModulo is available as a patch to iProver.
The most important additions are the plugin-based normalization engine and
the handling of polarized rewriting rules.
iProverModulo is available from
Isabelle 2012
Jasmin C. Blanchette1, Lawrence C. Paulson2,
Tobias Nipkow1, Makarius Wenzel3
1Technische Universität München, Germany,
2University of Cambridge, United Kingdom,
3Université Paris Sud, France
Architecture
Isabelle/HOL 2012 [NPW12] is the higher-order logic incarnation of the generic
proof assistant Isabelle.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer [PB10,BBP11].
Previous versions of Isabelle relied on the TPTP2X tool to translate TPTP
files to Isabelle theory files.
Starting this year, Isabelle includes a parser for the TPTP syntaxes CNF,
FOF, TFF0, and THF0 as well as TPTP versions of its popular tools, invokable
on the command line as isabelle tptp_tool max_secs
file.p.
For example:
isabelle tptp_isabelle_demo 100 SEU/SEU824^3.p
Two versions of Isabelle participate this year.
The demo version includes its competitors LEO-II [BP+08] and Satallax
[Bro12] as Sledgehammer backends, whereas the competition version
leaves them out.
Strategies
The Isabelle tactic submitted to the competition simply tries the
following tactics sequentially:
- sledgehammer
- Invokes the following sequence of provers as oracles via Sledgehammer:
- satallax - Satallax 2.4 [Bro12] (demo only);
- leo2 - LEO-II 1.3.2 [BP+08] (demo only);
- spass - SPASS 3.8ds [BP+12];
- vampire - Vampire 1.8 (revision 1435) [RV02];
- e - E 1.4 [Sch04];
- z3_tptp - Z3 3.2 with TPTP syntax [dMB08].
- nitpick
- For problems involving only the type $o of Booleans, checks
whether a finite model exists using Nitpick [BN10].
- simp
- Performs equational reasoning using rewrite rules [Nip89].
- blast
- Searches for a proof using a fast untyped tableau prover and then
attempts to reconstruct the proof using Isabelle tactics [Pau99].
- auto+spass
- Combines simplification and classical reasoning [PN94] under one roof;
then invoke Sledgehammer with SPASS on any subgoals that emerge.
- z3
- Invokes the SMT solver Z3 3.2 [dMB08].
- cvc3
- Invokes the SMT solver CVC3 2.2 [BT07].
- fast
- Searches for a proof using sequent-style reasoning, performing a
depth-first search [PN94].
Unlike blast, it construct proofs directly in Isabelle.
That makes it slower but enables it to work in the presence of the more
unusual features of HOL, such as type classes and function unknowns.
- best
- Similar to fast, except that it performs a best-first search.
- force
- Similar to auto, but more exhaustive.
- meson
- Implements Loveland's MESON procedure [Lov78].
Constructs proofs directly in Isabelle.
- fastforce
- Combines fast and force.
Implementation
Isabelle is a generic theorem prover written in Standard ML.
Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of
higher-order logic.
The HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers.
Other object logics are available, notably FOL (first-order logic) and ZF
(Zermelo-Fraenkel set theory).
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype.
Assuming the kernel is correct, all values of type theorem are
correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
Isabelle/HOL is available for all major platforms under a BSD-style license
from
Isabelle 2013
Jasmin C. Blanchette1, Lawrence C. Paulson2,
Tobias Nipkow1, Makarius Wenzel3
1Technische Universität München, Germany,
2University of Cambridge, United Kingdom,
3Université Paris Sud, France
Architecture
Isabelle/HOL 2013 [NPW12] is the higher-order logic incarnation of the generic
proof assistant Isabelle.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer [PB10,BBP11].
Previous versions of Isabelle relied on the TPTP2X tool to translate TPTP
files to Isabelle theory files.
Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0
as well as TPTP versions of its popular tools, invokable on the command line
as isabelle tptp_tool max_secs file.p.
For example:
isabelle tptp_isabelle_demo 100 SEU/SEU824^3.p
Strategies
The Isabelle tactic submitted to the competition simply tries the
following tactics sequentially:
- sledgehammer
- Invokes the following sequence of provers as oracles via Sledgehammer:
- spass - SPASS 3.8ds [BP+12];
- vampire - Vampire 1.8 (revision 1435) [RV02];
- e - E 1.4 [Sch04];
- z3_tptp - Z3 3.2 with TPTP syntax [dMB08].
- nitpick
- For problems involving only the type $o of Booleans, checks
whether a finite model exists using Nitpick [BN10].
- simp
- Performs equational reasoning using rewrite rules [Nip89].
- blast
- Searches for a proof using a fast untyped tableau prover and then
attempts to reconstruct the proof using Isabelle tactics [Pau99].
- auto+spass
- Combines simplification and classical reasoning [PN94] under one roof;
then invoke Sledgehammer with SPASS on any subgoals that emerge.
- z3
- Invokes the SMT solver Z3 3.2 [dMB08].
- cvc3
- Invokes the SMT solver CVC3 2.2 [BT07].
- fast
- Searches for a proof using sequent-style reasoning, performing a
depth-first search [PN94].
Unlike blast, it construct proofs directly in Isabelle.
That makes it slower but enables it to work in the presence of the more
unusual features of HOL, such as type classes and function unknowns.
- best
- Similar to fast, except that it performs a best-first search.
- force
- Similar to auto, but more exhaustive.
- meson
- Implements Loveland's MESON procedure [Lov78].
Constructs proofs directly in Isabelle.
- fastforce
- Combines fast and force.
Implementation
Isabelle is a generic theorem prover written in Standard ML.
Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of
higher-order logic.
The HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers.
Other object logics are available, notably FOL (first-order logic) and ZF
(Zermelo-Fraenkel set theory).
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype.
Assuming the kernel is correct, all values of type theorem are
correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
Isabelle/HOL is available for all major platforms under a BSD-style license
from
Nitrox 2013
Jasmin C. Blanchette1, Emina Torlak2
1Technische Universität München, Germany,
2University of California, USA
Architecture
Nitrox is the first-order version of Nitpick [BN10], an open source
counterexample generator for Isabelle/HOL [NPW12].
It builds on Kodkod [TJ07], a highly optimized first-order relational
model finder based on SAT.
The name Nitrox is a portmanteau of Nitpick
and Paradox (clever, eh?).
Strategies
Nitrox employs Kodkod to find a finite model of the negated conjecture.
It performs a few transformations on the input, such as pushing quantifiers
inside, but 99% of the solving logic is in Kodkod and the underlying SAT
solver.
The translation from HOL to Kodkod's first-order relational logic (FORL) is
parameterized by the cardinalities of the atomic types occurring in it.
Nitrox enumerates the possible cardinalities for the universe.
If a formula has a finite counterexample, the tool eventually finds it,
unless it runs out of resources.
Nitpick is optimized to work with higher-order logic (HOL) and its definitional
principles (e.g., (co)inductive predicates, (co)inductive datatypes,
(co)recursive functions).
When invoked on untyped first-order problem, few of its optimizations come
into play, and the problem handed to Kodkod is essentially a first-order
relational logic (FORL) rendering of the TPTP FOF problem.
There are two main exceptions:
- Nested quantifiers are moved as far inside the formula as possible before
Kodkod gets a chance to look at them [BN10].
- Definitions invoked with fixed arguments are specialized.
Implementation
Nitrox, like most of Isabelle/HOL, is written in Standard ML.
Unlike Isabelle itself, which adheres to the LCF small-kernel discipline,
Nitrox does not certify its results and must be trusted.
Kodkod is written in Java.
MiniSat 1.14 is used as the SAT solver.
Expected Competition Performance
Since Nitpick was designed for HOL, it doesn't have any type inference à
la Paradox.
It also doesn't use the SAT solver incrementally, which penalizes it a bit
(but not as much as the missing type inference).
Kodkod itself is known to perform less well on FOF than Paradox, because it
is designed and optimized for a somewhat different logic, FORL.
On the other hand, Kodkod's symmetry breaking seems better calibrated than
Paradox's.
Hence, we expect Nitrox to end up in second or third place in the FNT division.
PEPR 0.0ps
Tianyi Liang, Cesare Tinelli
University of Iowa, USA
Architecture
PEPR is a hybrid parallel automated theorem prover for the effectively
propositional fragment of first-order logic (EPR).
Its core is based on a variant of the Model Evolution calculus [BT03,BT05],
which is a decision procedure for the satisfiability of EPR formulas.
During reasoning, the prover maintains a set of possibly non-ground literals,
the context, to represent a candidate Herbrand model of the input clause set;
it uses resolution-like operations between literals in the context and input
clauses to identify instances of the latter that are falsified by the
candidate model.
These instances are then used to modify ("repair") the context
non-deterministically in an attempt to satisfy them.
If a context is not repairable, the input set has been proven to be
unsatisfiable.
A model is found when the context is saturated.
PEPR combines synergistically a portfolio approach, with multiple sub-provers
that differ on the strategies they use to repair their context, and
MapReduce-style clause-level parallelism for the computation of input clause
instances [LT12].
The parallel architecture of PEPR is based on the Actor model with
asynchronous message passing.
To minimize communication all literals/clauses, including the input ones,
are created locally in each sub-prover, which allows PEPR to run also in a
distributed environment.
Because CASC measures performance in terms of total CPU time, the version
of PEPR submitted to CASC-24 is actually sequential.
Strategies
In addition to the strategies implemented in Darwin, a theorem prover for
the full ME calculus, PEPR implements conjecture distance, candidate
coverage and predicate credits as literal selection heuristics, as well
as thread scheduling for the parallel part.
Implementation
The sequential core of PEPR is fully implemented in C++, and it features
new functionalities from the new C++11 standards.
It requires either GCC 4.7 (or above) or MSVC 2012 to compile.
To parse the TPTP syntax, it also requires the ANTLR 3.4 library.
PEPR accepts the TPTP CNF format for EPR formula natively, and requires an
external clausifier for the FOF format.
The parallel part is built strictly upon the sequential core, and requires
the Charm++ library to compile.
Although PEPR shows some promising results in its ability to exploit multiple
cores, it still a prototype.
Many configurations inside PEPR need to be further tuned.
Expected Competition Performance
PEPR will enter only the EPR division.
Based on local tests with the CASC-23 benchmarks we expect it to rank in
the middle range.
Princess 2012-06-04
Philipp Rümmer, Aleksandar Zeljić
Uppsala University, Sweden
Architecture
Princess [Rue08,Rue12] is a theorem prover for first-order logic modulo
linear integer arithmetic.
The prover has been under development since 2007, and represents a
combination of techniques from the areas of first-order reasoning and
SMT solving.
The main underlying calculus is a free-variable tableau calculus, which
is extended with constraints to enable backtracking-free proof expansion,
and positive unit hyper-resolution for lightweight instantiation of quantified
formulae.
Linear integer arithmetic is handled using a set of built-in proof rules
resembling the Omega test, which altogether yields a calculus that is
complete for full Presburger arithmetic, for first-order logic, and for
a number of further fragments.
The internal calculus of Princess only supports uninterpreted
predicates; uninterpreted functions are encoded as predicates,
together with the usual axioms.
Through appropriate translation of quantified formulae with functions,
the e-matching technique common in SMT solvers can be simulated;
triggers in quantified formulae are chosen based on heuristics similar
to those in the Simplify prover.
Strategies
Princess supports a number of different proof expansion strategies
(e.g., depth-first, breadth-first), which are chosen based on
syntactic properties of a problem (in particular, the kind of
quantifiers occurring in the problem). Further options exist to
control, for instance, the selection of triggers in quantified
formulae, clausification, and the handling of functions.
For CASC, Princess will run a schedule with a small number of
configurations for each problem (portfolio method). The schedule is
determined either statically, or dynamically using syntactic
attributes of problems (such as number and kind of quantifiers, etc),
based on training using a random sample of problems from the TPTP
library.
Implementation
Princess is entirely written in Scala and runs on any recent Java
virtual machine; besides the standard Scala and Java libraries, only
the Cup parser library is employed.
Princess is available from
Prover9 2009-11A
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3 [McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, proof of transformation to clauses
is not given.
Completeness is not guaranteed, so termination does not indicate
satisfiability.
Strategies
Like Otter, Prover9 has available many strategies; the following
statements apply to CASC-2012.
Given a problem, Prover9 adjusts its inference rules and strategy according
to syntactic properties of the input clauses such as the presence of
equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
In previous CASC competitions, Prover9 has used LPO to order terms for
demodulation and for the inference rules, with a simple rule for determining
symbol precedence.
For CASC 2012, we are going to use KBO instead.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP [McC97].
(LADR has some AC functions, but Prover9 does not use them).
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing [Sch04] is used for forward and backward nonunit
subsumption.
Prover9 is available from
Finishes in the middle of the pack are anticipated in all categories
in which Prover9 competes.
Satallax 2.7
Chad E. Brown
Saarland University, Germany
Architecture
Satallax 2.7 [Bro12] is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat [ES04] is responsible for much of the search for a
proof.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic [BS10] with a choice operator [BB11].
A problem is given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses [Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat as an engine to test the current set
of propositional clauses for unsatisfiability. If the clauses are
unsatisfiable, then the original branch is unsatisfiable.
Additionally, Satallax may optionally generate first-order formulas in
addition to the propositional clauses.
If this option is used, then Satallax peroidically calls the first-order
theorem prover E to test for first-order unsatisfiability.
If the set of first-order formulas is unsatisfiable, then the original branch
is unsatisfiable.
Strategies
There are about a hundred flags that control the order in which formulas
and instantiation terms are considered and propositional clauses are
generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Satallax 2.7 has strategy schedule consisting of 68 modes.
Each mode is tried for time limits ranging from 0.1 seconds to 54.9 seconds.
The strategy schedule was determined through experimentation using the
THF problems in version 5.4.0 of the TPTP library.
Implementation
Satallax 2.7 is implemented in OCaml.
A foreign function interface is used to in teract with MiniSat 2.2.0.
Satallax is available from
TEMPLAR::leanCoP 0.8
Mario Frank, Jens Otten
University of Potsdam, Germany
Architecture
TEMPLAR is a formula set pruner which has the scope to reduce the set of
given axioms to a size which can be handled by current state-of-the-art
automated theorem provers.
It reads and analyses the given axiom set and searches for a set of formulae
which seem to be relevant for the proof of a given formula.
The selected formulae can be given to an automated theorem prover with
additional information and the theorem prover can attempt a proof.
Multiple instances of a theorem prover can be started in parallel while TEMPLAR
calculates other formula sets.
The input formulae are transformed into negation normal form (but not into
clause normal form) and multiple indexing techniques are applied to speed
up the formula selection.
The selection itself is done by a dynamic module which uses two different
algorithms and triggers them depending on the structure of the current
formula set.
TEMPLAR conceptually supports every theorem prover for which an executor (a
module) is implemented.
Currently, one executor is existent, which is the leanCoP-Executor.
leanCop [OB03,Ott08] 2.2, a compact connection calculus based theorem prover
for classical first order logic is used as inference engine.
The CASC version of TEMPLAR is a beta stadium version of the project and
lacks of some features and performance optimizations in order to reduce the
memory consumption.
Strategies
TEMPLAR analyses all axioms and gathers information about included predicate
symbols, their arity and additional information like their polarities and for
instance the complete formula graph.
TEMPLAR uses - depending on the metrics of the formula set (e.g. size of
the set) - two different search approaches which are chained, if appropriate.
The one approach which is tailored for smaller sets is an unification
based graph search with multiple search graph pruning techniques.
This is an reconception and extension of the concept of ARDE which was
described in [Fra12] and used in the CASC-J6.
For this approach, the negation normal forms of all formulae are used.
The other approach is an implementation of a frequency based concept commonly
used for natural language processing (e.g. automated text summarization) and
is tailored for bigger formula sets. This concept aims to recognise the
topics of a set of documents and can recognise common symbols and thus
should have similar effects like SInE [HV11] but uses another weighting
approach.
Implementation
TEMPLAR is fully implemented in C++ conforming to the C++ standard of 2011.
The TPTP and batch file parsing is done by a parser which is implemented in
a descriptive meta-language (a grammar) conforming to the C++ syntax. This
grammar is transformed into C++ instructions by the boost library (boost spirit
Qi).
The selected formulae can be output in the original form or the Skolem normal
form, conforming to the TPTP syntax.
TEMPLAR can be compiled with the GNU Compiler Collection (GCC) with the
minimum version 4.6.3 and the Boost library, minimum version 1.42 (1.46 is
better).
leanCop can be run with SWI Prolog or Eclipse Prolog (5.x) for example.
Currently, TEMPLAR is not available since it is a diploma thesis project
but will be published afterwards in the final version.
Expected Competition Performance
TEMPLAR extends a theorem prover by the described pruning techniques but
gives the prover the chance to attempt a proof on the original file, too.
Thus, the performance is expected to be at least the performance of the
underlying prover.
Due to the restrictions, some proofs are expected to be found faster and
some normally not (in the given time) provable problems are expected to
become provable.
Vampire 2.6
Krystof Hoder, Andrei Voronkov
University of Manchester, England
Architecture
Vampire 2.6 is an automatic theorem prover for first-order classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
It also implements the Inst-gen calculus.
The splitting rule in kernel adds propositional parts to clauses, which are
being manipulated using binary decision diagrams and a SAT solver.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Although the kernel of the system works only with clausal normal form, the
shell accepts a problem in the full first-order logic syntax, clausifies it
and performs a number of useful transformations before passing the result
to the kernel.
Also the axiom selection algorithm Sine [HV11] can be enabled as part of
the preprocessing.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
The Vampire 2.6 kernel provides a fairly large number of options for strategy se
lection. The most important ones are:
- Choice of the main procedure:
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
- Goal oriented mode based on tabulation
- Instantiation using the Inst-gen calculus
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are
preferred for inference selection.
- Set-of-support strategy.
Implementation
Vampire 2.6 is implemented in C++.
Expected Competition Performance
Vampire 2.6 is the CASC-J6 FOF and LTB division winner.