A first-order prover, Beagle accepts arbitrarily quantified input formulas
and converts them to clause normal form. The core reasoning component
implements the Hierarchic Superposition Calculus with Weak Abstraction [BW13].
That calculus generalizes the well-known superposition calculus by integrating
black-box reasoning for specific theories.
For the theories mentioned above, Beagle combines quantifier elimination
procedures and other solvers to discard proof obligations over these
theories.
Beagle features a splitting rule for clauses that can be divided into variable
disjoint parts. Splitting is particularly useful in combination with the
quantifier elimination procedure mentioned above, as the theory reasoner then
needs to deal with conjunctions of quantifier-free unit clauses only.
Beagle uses a single, uniform strategy for every problem.
Beagle's web site is
cocATP is currently very underdeveloped.
Recently the axiom of choice and excluded middle have been added, but the
latter is not actually exploited.
The functional extensionality axiom has some record of appropriate usage.
The propositional extensionality axiom is not included yet.
In addition to axioms, cocATP includes a small library of basic lemmas,
most of them with proofs that cocATP has constructed itself.
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach.
First, quantified formulas are replaced by fresh Boolean predicates and the
ground theory solver(s) are used in conjunction with the underlying SAT
solver to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either
add instances of quantified formulas to the problem, answer "satisfiable", or
return "unknown".
The finite model finding capability has been developed to target problems
containing background theories, where quantification is limited to finite and
uninterpreted sorts.
When finite model finding is turned on, CVC4 uses a ground theory of finite
cardinality constraints that minimizes the number of ground equivalence
classes, as described in [RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
Quantifier instantiation strategies are then invoked to add instances of
quantified formulas that are in conflict with the candidate model, as
described in [4]. If no instances are added, then the solver reports
"satisfiable".
For handling non-theorems, the finite model finding feature of CVC4 will
use a small number of orthogonal quantifier instantiation strategies.
Since CVC4 with finite model finding is also capable of answering
unsatisfiable, it will be used as a strategy for theorems as well.
For the LTB divisions, a control program uses a SInE-like analysis to
extract reduced axiomatizations that are handed to several instances
of E.
For CASC-24, E implements a new strategy-scheduling automatic mode.
The total CPU time available is broken into 5 (unequal) time slices.
For each time slice, the problem is classified into one of
several classes, based on a number of simple features (number of
clauses, maximal symbol arity, presence of equality, presence of
non-unit and non-Horn clauses,...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the one that solves
the most problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
About 170 different strategies have been evaluated on all untyped
first-order problems from TPTP 5.4.0, and about 140 of these
strategies are used in the automatic mode.
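A simplified Python sketch of this greedy construction (ignoring the unequal
slice lengths; solves maps each strategy to the set of problems of the class
it solves within the slice, taken from the experimental data):

    def greedy_schedule(solves, num_slices=5):
        schedule, covered = [], set()
        for _ in range(num_slices):
            # Pick the strategy solving the most problems not yet covered.
            best = max(solves, key=lambda s: len(solves[s] - covered))
            gain = solves[best] - covered
            if not gain:
                break                    # no remaining strategy adds solutions
            schedule.append(best)
            covered |= gain
        return schedule

    # Example: greedy_schedule({"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {5}})
    # returns ["s1", "s2", "s3"].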
An E-hyper tableau is a tree whose nodes are labeled with clauses and which
is built up by the application of the inference rules of the E-hyper tableau
calculus.
The calculus rules are designed such that most of the reasoning is performed
using positive unit clauses.
Splitting is done without rigid variables.
Instead, variables that would otherwise be shared between branches are
eliminated by ground substitutions, which are guessed from the Herbrand
universe and constrained by rewrite rules.
Redundancy rules allow the detection and removal of clauses that are redundant
with respect to a branch.
The hyper extension inference from the original hyper tableau calculus is
equivalent to a series of E-hyper tableau calculus inference applications.
Therefore the implementation of the hyper extension in KRHyper by a variant
of semi-naive evaluation [Ull89] is retained in E-KRHyper, where it serves
as a shortcut inference for the resolution of non-equational literals.
E-KRHyper is available from
MaLeS (Machine Learning of Strategies) is a framework for automatic tuning
of ATPs.
It combines strategy finding (similar to ParamILS [HH+09] and BliStr [Urb13])
with strategy scheduling (similar to SatZilla [XH+08]).
A description of E can be found above.
The learning algorithm is described in [KUS13].
Further information can be found on the project website.
Major additions in the current version are:
iProver is available from
Some of iProver's features are summarised below.
iProver is available from
iProver-Eq consists of three core components: i) ground reasoning by an
SMT solver, ii) first-order equational reasoning on literals in a
candidate model by a labelled unit superposition calculus [KS10] and
iii) instantiation of clauses with substitutions obtained by ii).
Given a set of first-order clauses, iProver-Eq first abstracts it to a
set of ground clauses which are then passed to the ground solver.
If the ground abstraction is unsatisfiable, then the set of first-order
clauses is also unsatisfiable.
Otherwise, literals are selected from the first-order clauses based on
the model of the ground solver.
The labelled unit superposition calculus checks whether selected literals
are conflicting.
If they are conflicting, then clauses are instantiated such that the
ground solver has to refine its model in order to resolve the conflict.
Otherwise, satisfiability of the initial first-order clause set is shown.
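The overall loop can be sketched as follows (a schematic Python rendering;
all component functions are placeholders for the ground solver, literal
selection, labelled unit superposition and instantiation modules described
above):

    def inst_gen_eq_loop(clauses, ground_abstraction, check_ground,
                         select_literals, find_conflict, instantiate,
                         max_rounds=10000):
        current = list(clauses)
        for _ in range(max_rounds):
            status, ground_model = check_ground(ground_abstraction(current))
            if status == "unsat":
                return "Unsatisfiable"          # abstraction already unsat
            selected = select_literals(current, ground_model)
            conflict = find_conflict(selected)  # labelled unit superposition
            if conflict is None:
                return "Satisfiable"            # selected literals consistent
            # Substitutions accumulated in the labels instantiate clauses,
            # forcing the ground solver to refine its model.
            current.extend(instantiate(current, conflict))
        return "Unknown"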
Clause selection and literal selection in the unit superposition calculus
are implemented in separate given clause algorithms.
Relevant substitutions are accumulated in labels during unit superposition
inferences and then used to instantiate clauses.
For redundancy elimination iProver-Eq uses demodulation, dismatching
constraints and global subsumption.
In order to efficiently propagate redundancy elimination from instantiation
into unit superposition, we implemented different representations of labels
based on sets, AND/OR-trees and OBDDs.
Non-equational resolution and equational superposition inferences provide
further simplifications.
If no equational literals occur in the input, iProver-Eq falls back to
the inference rules of iProver, otherwise the latter are disabled and
only unit superposition is used.
If all clauses are unit equations, no instances need to be generated and
the calculus is run without the otherwise necessary bookkeeping.
iProver-Eq is available from
The integration of polarized resolution modulo in iProver only affects
its ordered resolution calculus, so that the instantiation calculus is
untouched.
To be able to use polarized resolution modulo, the theory has to be
presented as a rewriting system. A theory preprocessor is therefore
used to convert the axioms of the problem into such a rewriting
system.
The patched version of iProver is run on the remaining formulas
modulo the rewriting rules produced by the preprocessor. No scheduling is
performed. To be compatible with polarized resolution modulo, literals
are selected only when they are maximal w.r.t. a KBO ordering, and orphans
are not eliminated. To take advantage of polarized resolution modulo,
the resolution calculus is triggered more often than the instantiation
calculus, in contrast to the original iProver.
Normalization of clauses w.r.t. the term rewriting system produced by
the preprocessor is performed by transforming these rules into an
OCaml program, compiling this program, and dynamically linking it with
the prover.
iProverModulo is available from
The theory preprocessor is available independently from iProverModulo from
Previous versions of Isabelle relied on the TPTP2X tool to translate TPTP
files to Isabelle theory files.
Starting this year, Isabelle includes a parser for the TPTP syntaxes CNF,
FOF, TFF0, and THF0 as well as TPTP versions of its popular tools, invokable
on the command line as isabelle tptp_tool max_secs
file.p.
For example:
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype.
Assuming the kernel is correct, all values of type theorem are
correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
Isabelle/HOL is available for all major platforms under a BSD-style license
from
Previous versions of Isabelle relied on the TPTP2X tool to translate TPTP
files to Isabelle theory files.
Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0
as well as TPTP versions of its popular tools, invokable on the command line
as isabelle tptp_tool max_secs file.p.
For example:
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype.
Assuming the kernel is correct, all values of type theorem are
correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
Isabelle/HOL is available for all major platforms under a BSD-style license
from
Unfortunately, the LEO-II system still uses only a very simple
sequential collaboration model with first-order ATPs instead of using
the more advanced, concurrent and resource-adaptive OANTS architecture
[BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD-style license, and it is
available from
MaLARea is available from
The system is also able to work with second-order statements.
It may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and [Pas93].
These two possibilities are not used while working with the TPTP Library.
Muscadet is available from
The translation from HOL to Kodkod's first-order relational logic (FORL) is
parameterized by the cardinalities of the atomic types occurring in it.
Nitrox enumerates the possible cardinalities for the universe.
If a formula has a finite counterexample, the tool eventually finds it,
unless it runs out of resources.
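This cardinality enumeration can be sketched as follows (Python; find_model
is a hypothetical stand-in for the Kodkod/SAT search at one fixed universe
size):

    def find_counterexample(negated_conjecture, find_model, max_size=10):
        for size in range(1, max_size + 1):   # universes of size 1, 2, 3, ...
            model = find_model(negated_conjecture, size)
            if model is not None:
                return model                  # finite counterexample found
        return None                           # gave up within the resource limit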
Nitpick is optimized to work with higher-order logic (HOL) and its definitional
principles (e.g., (co)inductive predicates, (co)inductive datatypes,
(co)recursive functions).
When invoked on an untyped first-order problem, few of its optimizations come
into play, and the problem handed to Kodkod is essentially a first-order
relational logic (FORL) rendering of the TPTP FOF problem.
There are two main exceptions:
Paradox incorporates the following features: Polynomial-time
clause splitting heuristics, the use of incremental SAT,
static symmetry reduction techniques, and the use of sort
inference.
PEPR combines synergistically a portfolio approach, with multiple sub-provers
that differ on the strategies they use to repair their context, and
MapReduce-style clause-level parallelism for the computation of input clause
instances [LT12].
The parallel architecture of PEPR is based on the Actor model with
asynchronous message passing.
To minimize communication, all literals/clauses, including the input ones,
are created locally in each sub-prover, which allows PEPR to run also in a
distributed environment.
Because CASC measures performance in terms of total CPU time, the version
of PEPR submitted to CASC-24 is actually sequential.
Although PEPR shows some promising results in its ability to exploit multiple
cores, it is still a prototype.
Many configurations inside PEPR need to be further tuned.
The internal calculus of Princess only supports uninterpreted
predicates; uninterpreted functions are encoded as predicates,
together with the usual axioms.
Through appropriate translation of quantified formulae with functions,
the e-matching technique common in SMT solvers can be simulated;
triggers in quantified formulae are chosen based on heuristics similar
to those in the Simplify prover.
For CASC, Princess will run a schedule with a small number of
configurations for each problem (portfolio method). The schedule is
determined either statically, or dynamically using syntactic
attributes of problems (such as number and kind of quantifiers, etc),
based on training using a random sample of problems from the TPTP
library.
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
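The effect of an age-weight ratio can be illustrated with a small Python
sketch (the ratio and the clause representation are invented for
illustration, not Prover9's defaults):

    def select_given(passive, step, ratio=4):
        # Out of every (ratio + 1) selections, `ratio` clauses are picked by
        # lowest weight (symbol count) and one by lowest age (oldest first).
        if step % (ratio + 1) == ratio:
            key = lambda clause: clause["age"]
        else:
            key = lambda clause: clause["weight"]
        given = min(passive, key=key)
        passive.remove(given)
        return given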
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, proof of transformation to clauses
is not given.
Completeness is not guaranteed, so termination does not indicate
satisfiability.
Given a problem, Prover9 adjusts its inference rules and strategy according
to syntactic properties of the input clauses such as the presence of
equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
In previous CASC competitions, Prover9 has used LPO to order terms for
demodulation and for the inference rules, with a simple rule for determining
symbol precedence.
For CASC 2012, we are going to use KBO instead.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
Finishes in the middle of the pack are anticipated in all categories
in which Prover9 competes.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses [Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat as an engine to test the current set
of propositional clauses for unsatisfiability. If the clauses are
unsatisfiable, then the original branch is unsatisfiable.
Satallax may optionally generate first-order formulas in addition to the
propositional clauses.
If this option is used, then Satallax periodically calls the first-order
theorem prover E to test for first-order unsatisfiability.
If the set of first-order formulas is unsatisfiable, then the original branch
is unsatisfiable.
The learning algorithm is described in [KUS13].
Further information can be found on the project website.
The input formulae are transformed into negation normal form (but not into
clause normal form) and multiple indexing techniques are applied to speed
up the formula selection.
The selection itself is done by a dynamic module which uses two different
algorithms and triggers them depending on the structure of the current
formula set.
TEMPLAR conceptually supports every theorem prover for which an executor (a
module) is implemented.
Currently, only one executor exists, the leanCoP executor.
leanCoP 2.2 [OB03,Ott08], a compact connection-calculus-based theorem prover
for classical first-order logic, is used as the inference engine.
The CASC version of TEMPLAR is a beta-stage version of the project and lacks
some features and performance optimizations, in order to reduce memory
consumption.
Depending on metrics of the formula set (e.g., its size), TEMPLAR uses two
different search approaches, which are chained if appropriate.
The approach tailored for smaller sets is a unification-based graph search
with multiple search-graph pruning techniques.
This is a reconception and extension of the ARDE concept described in [Fra12]
and used at CASC-J6.
For this approach, the negation normal forms of all formulae are used.
The other approach, tailored for bigger formula sets, is an implementation of
a frequency-based concept commonly used in natural language processing
(e.g., automated text summarization). This concept aims to recognise the
topics of a set of documents; applied to a formula set it recognises common
symbols, and thus should have an effect similar to SInE [HV11], but with a
different weighting approach.
TEMPLAR can be compiled with the GNU Compiler Collection (GCC), minimum
version 4.6.3, and the Boost library, minimum version 1.42 (1.46 is better).
leanCoP can be run with SWI-Prolog or ECLiPSe Prolog (5.x), for example.
Currently, TEMPLAR is not publicly available, as it is a diploma thesis
project, but the final version will be published afterwards.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Although the kernel of the system works only with clausal normal form, the
shell accepts a problem in the full first-order logic syntax, clausifies it
and performs a number of useful transformations before passing the result
to the kernel.
Also the axiom selection algorithm SInE [HV11] can be enabled as part of
the preprocessing.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
The prover also uses a so-called knowledge base that contains descriptions of
algebraic theories, individual axioms (such as associativity and
commutativity), lemmas, and theory-specific rewrite systems.
This base currently contains few definitions and lemmas.
Zipperposition is available from
Beagle 0.4
Peter Baumgartner
NICTA, Australia
Architecture
Beagle is an automated theorem prover for sorted first-order logic with
equality over built-in theories.
The theories currently supported are linear integer, linear rational and
linear real arithmetic.
It accepts formulas in the FOF, TFF, and TFF-INT formats of the TPTP syntax.
Strategies
Beagle uses the well-known DISCOUNT loop for saturating a clause set under the
calculus' inference rules. Simplification techniques include standard ones,
such as subsumption deletion, rewriting by ordered unit equations, and tautology
deletion. It also includes theory-specific simplification rules for evaluating
ground (sub)terms, and for exploiting cancellation laws and properties of
neutral elements, among others.
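A schematic rendering of such a DISCOUNT-style given-clause loop (Python; the
selection, simplification and inference functions are placeholders, not
Beagle's code):

    def discount_loop(initial, select, simplify, generate, is_empty_clause):
        active, passive = [], list(initial)
        while passive:
            given = select(passive)            # pick a passive clause
            passive.remove(given)
            given = simplify(given, active)    # forward simplification
            if given is None:
                continue                       # redundant: subsumed, tautology, ...
            if is_empty_clause(given):
                return "Unsatisfiable"
            active.append(given)
            passive.extend(generate(given, active))   # new inferences -> passive
        return "Satisfiable"                   # saturated without empty clause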
Implementation
Beagle has been implemented in Scala.
It is a full implementation of the calculus mentioned above, albeit a slow
one. Fairness is achieved through a combination of measuring clause
lengths, depths and their derivation-age.
http://users.cecs.anu.edu.au/~baumgart/systems/beagle/
Expected Competition Performance
Beagle is implemented in a straightforward way and would benefit from
optimized data structures. We do not expect it to come in among the first.
cocATP 0.1.8
Cristobal Camarero
University of Cantabria, Spain
Architecture
cocATP is a Coq-inspired [Coq] automated theorem prover made on the free
time of the author.
It implements (the non-inductive) part of Coq's logic (calculus of
constructions) and syntax.
The proof terms it creates are accepted as proofs by Coq by the addition
of a few definitions (substituting the respective inductive definitions of
Coq).
As in Coq, equality and logical connectives other than implication (which
is a dependent product of the logic) are defined adding the proper axioms.
The reasoning is kept as general as possible, avoiding any special treatment
of equality and the logical connectives.
Strategies
cocATP builds a search tree, taking steps intended to be somewhat similar to
human reasoning (or, more closely, to a human guessing at Coq's interface).
Some pattern matching is done to apply hypotheses.
It implements existential variables (similar to Coq's tactics evar and eapply).
Implementation
cocATP is fully implemented in Python-2.7.
It uses the Ply-3.4 library to build the parsers for both the Coq and
TPTP syntaxes.
It includes a type verifier for the Calculus of Constructions without
inductive constructions, which must instead be defined with axioms;
that is, a buggy partial clone of Coq.
There is support for most of the TPTP syntax; problems are translated into a
set of Calculus of Constructions terms.
cocATP has been specially prepared for THF, but all other TPTP formulae
without numbers should be accepted.
However, cocATP does NOT include a SAT solver, so it will probably not solve
your trivial CNF problems.
Expected Competition Performance
cocATP belongs in the THF division.
Based on local tests with the 2012 problem battery, it will perform worse
than TPS.
CVC4 1.2
Andrew Reynolds
University of Iowa, USA
Architecture
CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that
includes built-in support for many theories including linear arithmetic,
arrays, bit vectors and datatypes.
Additionally, it supports universally quantified formulas.
When quantified formulas are present, CVC4 typically uses E-matching for
answering unsatisfiable, and a finite model finding method for answering
satisfiable.
Strategies
For handling theorems, CVC4 uses a variation of E-matching.
Quantifier instantiation based on E-matching is performed lazily in CVC4,
after the ground solver reports satisfiable.
Implementation
CVC4 is implemented in C++.
The code is available from
https://github.com/CVC4
Expected Competition Performance
This is the first year CVC4 has entered the FOF division, where it will
likely finish around the middle.
The finite model finding feature in CVC4 has undergone a lot of development,
so it should perform better than last year.
E 1.8
Stephan Schulz
Technische Universität München, Germany
Architecture
E 1.8pre [Sch02] is a purely equational theorem prover for full
first-order logic with equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an
instance of the superposition calculus with negative literal selection and
a number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
Strategies
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies. Clause evaluation heuristics can be constructed on the fly
by combining various parameterized primitive evaluation functions, or can
be selected from a set of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take
other clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
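As an illustration, a symbol-counting evaluation with a bonus for goal
symbols might look like this (Python sketch; the concrete weights are
invented, not E's actual parameter values):

    def clause_weight(clause_symbols, goal_symbols,
                      base_weight=2.0, goal_factor=0.5):
        weight = 0.0
        for sym in clause_symbols:        # clause given as its symbol occurrences
            w = base_weight
            if sym in goal_symbols:
                w *= goal_factor          # prefer clauses sharing goal symbols
            weight += w
        return weight

    # Example: clause_weight(["f", "a", "g", "a"], {"f", "a"}) == 5.0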
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).
Implementation
E is built around perfectly shared terms, i.e., each distinct term is
only represented once in a term bank.
The whole set of terms thus consists of a number of interconnected directed
acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite
link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing [Sch04].
Superposition and backward rewriting use fingerprint indexing [Sch12],
a new technique combining ideas from feature vector indexing and path
indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in [Loe06].
The prover and additional information are available from
http://www.eprover.org
Expected Competition Performance
E 1.8 has slightly better strategies than previous versions, and now
can produce proof objects internally and much more efficiently.
The system is expected to perform well in most proof classes, but will at
best complement top systems in the disproof classes.
E-KRHyper 1.4
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-KRHyper [PW07] is a theorem proving and model generation system for
first-order logic with equality.
It is an implementation of the E-hyper tableau calculus [BFP07], which
integrates a superposition-based handling of equality [BG98] into the hyper
tableau calculus [BFN96].
The system is an extension of the KRHyper theorem prover [Wer03], which
implements the original hyper tableau calculus.
Strategies
E-KRHyper has traditionally used a uniform search strategy for all problems,
based on the aforementioned semi-naive evaluation.
As of version 1.4 the prover also contains a prototypical implementation
of the common Given-Clause-Algorithm ("Otter-loop") [McC03],
which may be used as an alternative on problems of a certain size.
The E-hyper tableau is generated depth-first, with E-KRHyper always working
on a single branch. Refutational completeness and a fair search control are
ensured by an iterative deepening strategy with a limit on the maximum term
weight of generated clauses.
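The iterative deepening control can be sketched as follows (Python;
expand_branch stands for the depth-first branch expansion under a given
term-weight limit, and the concrete limits are illustrative):

    def iterative_deepening(expand_branch, start_limit=10, step=5, max_limit=1000):
        limit = start_limit
        while limit <= max_limit:
            # expand_branch returns "closed" (refutation found), "open"
            # (saturated below the limit) or "limit_hit".
            result = expand_branch(weight_limit=limit)
            if result == "closed":
                return "Unsatisfiable"
            if result == "open":
                return "Satisfiable"
            limit += step                 # the limit was hit: deepen and retry
        return "Unknown"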
In the LTB division E-KRHyper sequentially tries three axiom selection
strategies:
an implementation of Krystof Hoder's SInE algorithm,
another incomplete selection based on the CNF representations of the axioms,
and finally the complete axiom set.
Implementation
E-KRHyper is implemented in the functional/imperative language OCaml.
The system runs on Unix and MS-Windows platforms.
E-KRHyper is available under the GNU Public License from the E-KRHyper website at:
The system accepts input in the TPTP-format and in the TPTP-supported
Protein-format.
The calculus implemented by E-KRHyper works on clauses, so first order
formula input is converted into CNF by an algorithm similar to the one
used by Otter [McC03], with some additional connector literals to prevent
explosive clause growth when dealing with DNF-like structures.
E-KRHyper operates on an E-hyper tableau which is represented by linked
node records.
Several layered discrimination-tree based indexes (both perfect and
non-perfect) provide access to the clauses in the tableau and support
backtracking.
http://www.uni-koblenz.de/~bpelzer/ekrhyper
Expected Competition Performance
There has been some streamlining regarding memory consumption,
but overall we expect no big change over last year's performance.
E-MaLeS 1.2
Daniel Kühlwein1, Josef Urban1, Stephan Schulz2
1Radboud University Nijmegen, The Netherlands
2Technische Universität München, Germany
Architecture
E-MaLeS 1.2 is the result of applying the MaLeS 1.2 framework to E prover
(version 1.7) [Sch02].
Strategies
Given a set of problems, MaLeS 1.2 finds good parameter settings by using a
local random search algorithm.
The found settings are stored as strategies.
For each strategy, MaLeS learns a function that, given a new problem,
predicts the time the ATP needs to solve this problem when using this
particular strategy.
When trying to solve a new problem, MaLeS uses these prediction functions to create a strategy schedule.
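The schedule construction from the learned prediction functions can be
sketched as follows (Python; predict maps each strategy to its learned
runtime-prediction function, and details such as capping or re-prediction
between runs are omitted):

    def strategy_schedule(features, predict, total_time):
        # Run strategies in order of increasing predicted solution time.
        ranked = sorted(predict, key=lambda s: predict[s](features))
        schedule, time_left = [], total_time
        for strategy in ranked:
            slot = min(predict[strategy](features), time_left)
            if slot <= 0:
                break
            schedule.append((strategy, slot))   # run `strategy` for `slot` seconds
            time_left -= slot
        return schedule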
Implementation
MaLeS is implemented in Python, using the NumPy and SciPy libraries.
MaLeS is open source and available from
https://code.google.com/p/males/
E prover is available from
http://www.eprover.org
Expected Competition Performance
E-MaLeS 1.2 should perform better than E 1.7.
iProver 0.9
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor09] which is complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat [ES04] for reasoning
with ground abstractions of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution; see
[Kor08] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
iProver uses non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications.
A compressed feature vector index is used
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
In the LTB division, iProver-SInE uses axiom selection based
on the SInE algorithm [HV11] as implemented in Vampire [HKV10],
i.e., axiom selection is done by Vampire and
proof attempts are done by iProver.
Strategies
iProver has around 40 options to control the proof
search including options for literal selection,
passive clause selection, frequency of calling the SAT solver,
simplifications and options for combination of instantiation with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts FOF and CNF formats,
where Vampire [HKV10] is used for clausification of FOF problems.
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 0.9 is the CASC-23 EPR division winner.
iProver 1.0
Konstantin Korovin, Christoph Sticksel
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor13] which is complete for first-order logic.
iProver combines first-order reasoning with ground reasoning for which it
uses MiniSat [ES04] and was recently extended with PicoSAT [Bie08] and
Lingeling [Bie12] (only MiniSat will be used at this CASC).
iProver also combines instantiation with ordered resolution;
see [Kor08] for the implementation details.
The proof search is implemented using a saturation process based on
the given clause algorithm. iProver uses non-perfect discrimination trees
for the unification indexes, priority queues for passive clauses, and
a compressed vector index for subsumption and subsumption resolution
(both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications. A compressed feature vector index is used
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
In the LTB division, iProver uses axiom selection based on the SInE
algorithm [HV11] as implemented in Vampire [HKV11], i.e., axiom selection
is done by Vampire and proof attempts are done by iProver.
Type inference is targeted at improving finite model finding and symmetry
breaking.
Semantic filtering is used in preprocessing to eliminate irrelevant clauses.
Proof extraction is challenging due to simplifications such as global
subsumption, which involve global reasoning with the whole clause set and can
be computationally expensive.
Strategies
iProver has around 60 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the
SAT solver, simplifications and options for combination of instantiation
with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
The strategy for satisfiable problems (FNT division) includes finite model
finding.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses
MiniSat [ES04].
iProver accepts FOF and CNF formats.
Vampire [HKV11,HK+12] is used for proof-producing clausification of FOF
problems as well as for axiom selection [HV11] in the LTB division.
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
Compared to last year, iProver has had a number of changes in its internal
data structures, mainly affecting the extensibility of the code.
We expect similar performance in the EPR division and
good performance on satisfiable problems (FNT division).
iProver-Eq 0.85
Christoph Sticksel, Konstantin Korovin
University of Iowa, USA
Architecture
iProver-Eq [KS10] extends the iProver system [Kor08] with built-in
equational reasoning, along the lines of [GK04].
As in the iProver system, first-order reasoning is combined with ground
satisfiability checking where the latter is delegated to an off-the-shelf
ground solver.
Strategies
Proof search options in iProver-Eq control clause and literal selection
in the respective given clause algorithms.
Equally important is the global distribution of time between the inference
engines and the ground solver.
At CASC, iProver-Eq will execute a fixed schedule of selected options.
Implementation
iProver-Eq is implemented in OCaml and uses the CVC4 SMT solver [BC+11] for
the ground reasoning in the equational case and MiniSat [ES04] in the
non-equational case.
iProver-Eq accepts FOF and CNF formats, where Vampire [HKV11,HK+12] is
used for clausification and preprocessing of FOF problems.
http://www.divms.uiowa.edu/~csticksel/iprover-eq
Expected Competition Performance
iProver-Eq has seen some optimisations from the version in the previous
CASCs, in particular the integration of CVC4.
We expect reasonably good performance in all divisions, including the EPR
divisions where instantiation-based methods are particularly strong.
iProverModulo 0.7-0.2
Guillaume Burel
ENSIIE/Cedric, France
Architecture
iProverModulo [Bur11] is a patch to iProver [Kor08] to integrate polarized
resolution modulo [Dow10].
Polarized resolution modulo consists in presenting the theory in which the
problem has to be solved by means of polarized rewriting rules instead of
axioms.
It can also be seen as a combination of the set-of-support strategy and
selection of literals.
Strategies
A theory preprocessor is first run to transform the formulas of the
problem whose role is "axiom" into polarized rewriting rules.
This preprocessor offers a set of strategies to that purpose.
For the competition, the Equiv(ClausalAll) and the ClausalAll strategies
will be used.
The former strategy may be incomplete, depending on the shape of the axioms,
so that the prover may give up in certain cases.
However, it shows interesting results on some problems.
The second strategy should be complete, at least when equality is not
involved.
The rewriting system for the first strategy is tried for half the time
given for the problem, then the prover is restarted with the second strategy
if no proof has been found.
Implementation
iProverModulo is available as a patch to iProver.
The most important additions are the plugin-based normalization engine and
the handling of polarized rewriting rules.
http://www.ensiie.fr/~guillaume.burel/blackandwhite_iProverModulo.html.en
http://www.ensiie.fr/~guillaume.burel/blackandwhite_autotheo.html.en
Both of them are written in OCaml.
Expected Competition Performance
Although iProver Modulo is based on iProver, we do not expect it to perform
better than the original iProver.
This can be justified by the following arguments:
First, iProver Modulo is based on a somewhat old version of iProver (v0.7).
Second, the theory preprocessor may produce rewriting systems for which
polarized resolution modulo is incomplete.
Third, we had to switch off scheduling to cope with the constraints of
polarized resolution modulo.
Nevertheless, we expect to solve some problems that are difficult for
other provers.
Isabelle 2012
Jasmin C. Blanchette1, Lawrence C. Paulson2,
Tobias Nipkow1, Makarius Wenzel3
1Technische Universität München, Germany,
2University of Cambridge, United Kingdom,
3Université Paris Sud, France
Architecture
Isabelle/HOL 2012 [NPW12] is the higher-order logic incarnation of the generic
proof assistant Isabelle.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer [PB10,BBP11].
Two versions of Isabelle participate this year.
The demo version includes its competitors LEO-II [BP+08] and Satallax
[Bro12] as Sledgehammer backends, whereas the competition version
leaves them out.
isabelle tptp_isabelle_demo 100 SEU/SEU824^3.p
Strategies
The Isabelle tactic submitted to the competition simply tries the
following tactics sequentially:
Implementation
Isabelle is a generic theorem prover written in Standard ML.
Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of
higher-order logic.
The HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers.
Other object logics are available, notably FOL (first-order logic) and ZF
(Zermelo-Fraenkel set theory).
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
Isabelle 2012 is the CASC-J6 THF division winner.
Isabelle 2013
Jasmin C. Blanchette1, Lawrence C. Paulson2,
Tobias Nipkow1, Makarius Wenzel3
1Technische Universität München, Germany,
2University of Cambridge, United Kingdom,
3Université Paris Sud, France
Architecture
Isabelle/HOL 2013 [NPW12] is the higher-order logic incarnation of the generic
proof assistant Isabelle.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer [PB10,BBP11].
isabelle tptp_isabelle_demo 100 SEU/SEU824^3.p
Strategies
The Isabelle tactic submitted to the competition simply tries the
following tactics sequentially:
Implementation
Isabelle is a generic theorem prover written in Standard ML.
Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of
higher-order logic.
The HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers.
Other object logics are available, notably FOL (first-order logic) and ZF
(Zermelo-Fraenkel set theory).
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
Isabelle won last year by a thin margin.
This year: first or second place.
LEO-II 1.6.0
Christoph Benzmüller
Freie Universität Berlin
Architecture
LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system
based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution [Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E [Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II uses some basic strategy scheduling to try different
search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in OCaml 4, and its problem representation language
is the TPTP THF language [BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure [SB10].
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
http://www.leoprover.org
Expected Competition Performance
Recent improvements of LEO-II include a revised ATP interface, new
translations into first-order logic, rule support for the axiom of
choice, detection of defined equality, modified extensional
unification, and more flexible strategy scheduling.
LEO-II can now also classify some Satisfiable problems and detect some
CounterSatisfiable problems.
However, these improvements are very likely not strong enough to attack
Satallax or Isabelle in CASC.
MaLARea 0.5
Josef Urban1, Stephan Schulz2, Jiri Vyskocil3
1Radboud University Nijmegen, The Netherlands
2Technische Universität München, Germany
3Czech Technical University, Czech Republic
Architecture
MaLARea 0.5 [Urb07,US+08] is a metasystem for ATP in large theories
where symbol and formula names are used consistently.
It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace),
as well as complementary AI techniques like machine learning (the SNoW system)
based on symbol-based similarity, model-based similarity, term-based
similarity, and obviously previous successful proofs.
The version for CASC 2013 will mainly use E prover with the BliStr [Urb13]
large-theory strategies, possibly also Prover9, Mace and Paradox.
The premise selection methods will likely also use the distance-weighted
k-nearest neighbor [KU13] and E's implementation of SInE.
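Distance-weighted k-nearest-neighbour premise selection can be sketched as
follows (Python; feature vectors are reduced to plain symbol sets and Jaccard
similarity for illustration, whereas the real system uses richer features and
weighting):

    def knn_premises(conjecture_features, solved, k=10, n_axioms=100):
        def similarity(a, b):                  # Jaccard similarity of feature sets
            return len(a & b) / max(1, len(a | b))
        neighbours = sorted(solved, key=lambda p: -similarity(
            conjecture_features, p["features"]))[:k]
        votes = {}
        for problem in neighbours:             # solved neighbours vote for the
            w = similarity(conjecture_features, problem["features"])
            for axiom in problem["proof_axioms"]:   # axioms used in their proofs
                votes[axiom] = votes.get(axiom, 0.0) + w
        return sorted(votes, key=votes.get, reverse=True)[:n_axioms]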
Strategies
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use
the most relevant axioms for next ATP attempts.
This is iterated, using different timelimits and axiom limits.
Various features are used for learning, and the learning is complemented
by other criteria like model-based reasoning, symbol and term-based
similarity, etc.
Implementation
The metasystem is implemented in ca. 2500 lines of Perl. It uses
many external programs: the above-mentioned ATPs and machine learner,
TPTP utilities, LADR utilities for working with models, and some standard
Unix tools.
https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2.
Expected Competition Performance
Thanks to machine learning, MaLARea is strongest on batches of many
related problems with many redundant axioms where some of the problems
are easy to solve and can be used for learning the axiom
relevance.
MaLARea is not very good when all problems are too difficult (nothing to
learn from), or the problems (are few and) have nothing in common.
Some of its techniques (selection by symbol- and term-based similarity,
model-based reasoning) could, however, make it slightly stronger than
standard ATPs even there.
MaLARea has a very good performance on the MPTP Challenge, which is a
predecessor of the LTB division, and it is the winner of the 2008 MZR LTB
category.
MaLARea 0.4 came second in the 2012 MZR@Turing competition and solved
most problems in the Assurance class.
Muscadet 4.3
Dominique Pastre
University Paris Descartes, France
Architecture
The Muscadet theorem prover is a knowledge-based system.
It is based on Natural Deduction, following the terminology of [Ble71] and
[Pas78], and uses methods that resemble those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
Strategies
There are specific strategies for existential, universal, conjunctive or
disjunctive hypotheses and conclusions, and for equalities.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new hypotheses
from old ones), backward deduction (replace the conclusion by a new one),
refutation (only if the conclusion is a negation), search for objects
satisfying the conclusion or dynamic building of new rules.
Implementation
Muscadet [Pas01] is implemented in SWI-Prolog.
Rules are written as more or less declarative Prolog clauses.
Metarules are written as sets of Prolog clauses.
The inference engine includes the Prolog interpreter and some procedural
Prolog clauses.
A theorem may be split into several subtheorems, structured as a tree with
"and" and "or" nodes.
All the proof search steps are memorized as facts, including all the elements
that will later be necessary to extract the useful steps (the name of the
executed action or applied rule, the new facts added or the rule dynamically
built, the antecedents, and a brief explanation).
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
Expected Competition Performance
The best performances of Muscadet will be for problems manipulating many
concepts in which all statements (conjectures, definitions, axioms) are
expressed in a manner similar to the practice of humans, especially of
mathematicians [Pas02,Pas07].
It will have poor performances for problems using few concepts but large
and deep formulas leading to many splittings.
Its best results will be in set theory, especially for functions and relations.
Its originality is that proofs are given in natural style.
Changes since last year are that some bugs have been fixed, as well as some
imperfections in the writing of the proofs, and some heuristics have been
improved, which shortens a number of proofs.
Nitrox 2013
Jasmin C. Blanchette1, Emina Torlak2
1Technische Universität München, Germany,
2University of California, USA
Architecture
Nitrox is the first-order version of Nitpick [BN10], an open source
counterexample generator for Isabelle/HOL [NPW12].
It builds on Kodkod [TJ07], a highly optimized first-order relational
model finder based on SAT.
The name Nitrox is a portmanteau of Nitpick
and Paradox (clever, eh?).
Strategies
Nitrox employs Kodkod to find a finite model of the negated conjecture.
It performs a few transformations on the input, such as pushing quantifiers
inside, but 99% of the solving logic is in Kodkod and the underlying SAT
solver.
Implementation
Nitrox, like most of Isabelle/HOL, is written in Standard ML.
Unlike Isabelle itself, which adheres to the LCF small-kernel discipline,
Nitrox does not certify its results and must be trusted.
Kodkod is written in Java.
MiniSat 1.14 is used as the SAT solver.
Expected Competition Performance
Since Nitpick was designed for HOL, it doesn't have any type inference à
la Paradox.
It also doesn't use the SAT solver incrementally, which penalizes it a bit
(but not as much as the missing type inference).
Kodkod itself is known to perform less well on FOF than Paradox, because it
is designed and optimized for a somewhat different logic, FORL.
On the other hand, Kodkod's symmetry breaking seems better calibrated than
Paradox's.
Hence, we expect Nitrox to end up in second or third place in the FNT division.
Paradox 3.0
Koen Claessen, Niklas Sörensson
Chalmers University of Technology, Sweden
Architecture
Paradox [CS03] is a finite-domain model generator.
It is based on a MACE-style [McC03] flattening and instantiating of the
first-order clauses into propositional clauses, and then the use of
a SAT solver to solve the resulting problem.
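The instantiation step for one candidate domain size can be sketched as
follows (Python; clauses are assumed to be already flattened, and function
symbols are omitted for brevity):

    from itertools import product

    def ground_clauses(clauses, n):
        # A clause is (variables, literals); a literal is (sign, pred, args).
        # Every resulting ground atom becomes one propositional variable.
        domain = range(1, n + 1)
        out = []
        for variables, literals in clauses:
            for values in product(domain, repeat=len(variables)):
                env = dict(zip(variables, values))
                out.append([(sign, (pred, tuple(env[a] for a in args)))
                            for sign, pred, args in literals])
        return out

    # Example: the clause  p(X) | ~q(X)  over a 2-element domain:
    # ground_clauses([(("X",), [(True, "p", ("X",)), (False, "q", ("X",))])], 2)
    # yields one propositional clause for X=1 and one for X=2.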
Strategies
There is only one strategy in Paradox:
In this way, Paradox can be used both as a model finder and as an EPR solver.
Implementation
The main part of Paradox is implemented in Haskell using the GHC compiler.
Paradox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's Foreign
Function Interface.
Expected Competition Performance
Paradox 3.0 is the CASC-J6 FNT division winner.
PEPR 0.0ps
Tianyi Liang, Cesare Tinelli
University of Iowa, USA
Architecture
PEPR is a hybrid parallel automated theorem prover for the effectively
propositional fragment of first-order logic (EPR).
Its core is based on a variant of the Model Evolution calculus [BT03,BT05],
which is a decision procedure for the satisfiability of EPR formulas.
During reasoning, the prover maintains a set of possibly non-ground literals,
the context, to represent a candidate Herbrand model of the input clause set;
it uses resolution-like operations between literals in the context and input
clauses to identify instances of the latter that are falsified by the
candidate model.
These instances are then used to modify ("repair") the context
non-deterministically in an attempt to satisfy them.
If a context is not repairable, the input set has been proven to be
unsatisfiable.
A model is found when the context is saturated.
Strategies
In addition to the strategies implemented in Darwin, a theorem prover for
the full ME calculus, PEPR implements conjecture distance, candidate
coverage and predicate credits as literal selection heuristics, as well
as thread scheduling for the parallel part.
Implementation
The sequential core of PEPR is fully implemented in C++, and it makes use of
new features of the C++11 standard.
It requires either GCC 4.7 (or above) or MSVC 2012 to compile.
To parse the TPTP syntax, it also requires the ANTLR 3.4 library.
PEPR accepts the TPTP CNF format for EPR formulas natively, and requires an
external clausifier for the FOF format.
The parallel part is built strictly upon the sequential core, and requires
the Charm++ library to compile.
Expected Competition Performance
PEPR will enter only the EPR division.
Based on local tests with the CASC-23 benchmarks we expect it to rank in
the middle range.
Princess 2012-06-04
Philipp Rümmer, Aleksandar Zeljić
Uppsala University, Sweden
Architecture
Princess [Rue08,Rue12] is a theorem prover for first-order logic modulo
linear integer arithmetic.
The prover has been under development since 2007, and represents a
combination of techniques from the areas of first-order reasoning and
SMT solving.
The main underlying calculus is a free-variable tableau calculus, which
is extended with constraints to enable backtracking-free proof expansion,
and positive unit hyper-resolution for lightweight instantiation of quantified
formulae.
Linear integer arithmetic is handled using a set of built-in proof rules
resembling the Omega test, which altogether yields a calculus that is
complete for full Presburger arithmetic, for first-order logic, and for
a number of further fragments.
Strategies
Princess supports a number of different proof expansion strategies
(e.g., depth-first, breadth-first), which are chosen based on
syntactic properties of a problem (in particular, the kind of
quantifiers occurring in the problem). Further options exist to
control, for instance, the selection of triggers in quantified
formulae, clausification, and the handling of functions.
Implementation
Princess is entirely written in Scala and runs on any recent Java
virtual machine; besides the standard Scala and Java libraries, only
the Cup parser library is employed.
Princess is available from
http://www.philipp.ruemmer.org/princess.shtml
Expected Competition Performance
Princess 2012-06-04 is the CASC-J6 TFA division winner.
Prover9 2009-11A
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3 [McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Strategies
Like Otter, Prover9 has available many strategies; the following
statements apply to CASC-2012.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP [McC97].
(LADR has some AC functions, but Prover9 does not use them).
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing [Sch04] is used for forward and backward nonunit
subsumption.
Prover9 is available from
http://www.cs.unm.edu/~mccune/prover9/
Expected Competition Performance
Some of the strategy development for CASC was done by experimentation with
the CASC-2004 competition "selected" problems.
(Prover9 has not yet been run on other TPTP problems.)
Prover9 is unlikely to challenge the CASC leaders, because
(1) extensive testing and tuning over TPTP problems has not been done,
(2) theories (e.g., ring, combinatory logic, set theory) are not recognized,
(3) term orderings and symbol precedences are not fine-tuned, and
(4) multiple searches with differing strategies are not run.
Satallax 2.7
Chad E. Brown
Saarland University, Germany
Architecture
Satallax 2.7 [Bro12] is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat [ES04] is responsible for much of the search for a
proof.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic [BS10] with a choice operator [BB11].
A problem is given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Strategies
There are about a hundred flags that control the order in which formulas
and instantiation terms are considered and propositional clauses are
generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Satallax 2.7 has a strategy schedule consisting of 68 modes.
Each mode is tried for time limits ranging from 0.1 seconds to 54.9 seconds.
The strategy schedule was determined through experimentation using the
THF problems in version 5.4.0 of the TPTP library.
Implementation
Satallax 2.7 is implemented in OCaml.
A foreign function interface is used to interact with MiniSat 2.2.0.
Satallax is available from
http://satallax.com
Expected Competition Performance
Satallax 2.1 won the THF division at CASC in 2011
and Satallax 2.4 came in second among the competing systems in 2012.
Satallax has traditionally been weak on problems involving equational
reasoning.
Satallax 2.7 should be much better on such problems, since most of them
are in practice first-order and can be solved by E; hence Satallax 2.7
should perform better in the competition than Satallax 2.4 would.
Satallax-MaLeS 1.2
Daniel Kühlwein1, Josef Urban1, Chad Brown2
1Radboud University Nijmegen, The Netherlands
2Saarland University, Germany
Architecture
Satallax-MaLeS 1.2 is the result of applying the MaLeS 1.2 framework to
Satallax (version 2.7) [Bro12].
MaLeS (Machine Learning of Strategies) is a framework for automatic tuning
of ATPs.
It combines strategy finding (similar to ParamILS [HH+09] and BliStr [Urb13])
with strategy scheduling (similar to SatZilla [XH+08]).
A description of Satallax can be found above.
Strategies
Given a set of problems, MaLeS 1.2 finds good parameter settings by using
a local random search algorithm.
The found settings are stored as strategies.
For each strategy, MaLeS learns a function that given a new problem predicts
the time the ATP needs to solve this problem when using this particular
strategy.
When trying to solve a new problem, MaLeS uses these prediction functions
to create a strategy schedule.
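A minimal sketch of this idea follows, assuming simple numeric problem
features and an ordinary least-squares model in place of the learning
method MaLeS actually uses; all names are illustrative.

    import numpy as np

    def fit_runtime_model(features, runtimes):
        # features: (n_problems, n_features) array; runtimes: observed
        # solving times of one fixed strategy.  A plain linear model stands
        # in for the learning MaLeS really performs.
        X = np.hstack([features, np.ones((features.shape[0], 1))])  # bias
        coeffs, *_ = np.linalg.lstsq(X, runtimes, rcond=None)
        return coeffs

    def predict_runtime(coeffs, problem_features):
        return float(np.append(problem_features, 1.0) @ coeffs)

    def make_schedule(models, problem_features):
        # Order the strategies by their predicted runtime on the new problem.
        predicted = {name: predict_runtime(c, problem_features)
                     for name, c in models.items()}
        return sorted(predicted, key=predicted.get)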
Implementation
MaLeS is implemented in Python, using the NumPy and SciPy libraries.
MaLeS is open source and available from
https://code.google.com/p/males/
Satallax is available from
http://satallax.com
Expected Competition Performance
Satallax-MaLeS 1.2 should perform similarly to Satallax 2.7.
SPASS+T 2.2.19
Uwe Waldmann
Max-Planck-Institut für Informatik, Germany
Architecture
SPASS+T is an extension of the superposition-based theorem prover SPASS
that integrates algebraic knowledge into SPASS in three complementary ways:
by passing derived formulas to an external SMT procedure
(currently Yices or CVC3), by adding standard axioms,
and by built-in arithmetic simplification and inference rules.
A first version of the system has been described in [PW06];
later a much more sophisticated coupling of the SMT procedure has been
added [Zim07].
The latest version provides improved support for isint/1, israt/1, floor/1
and ceiling/1 and adds partial input abstraction and history-dependent weights
for numerical constants.
Strategies
Standard axioms and built-in arithmetic simplification and inference rules
are integrated into the standard main loop of SPASS.
Inferences between standard axioms are excluded, so the
user-supplied formulas are taken as the set of support.
The external SMT procedure runs in parallel in a separate process,
leading occasionally to non-deterministic behaviour.
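The coupling with the external procedure can be sketched as follows (a
hypothetical Python driver, not the prover's C code): the SMT solver runs
as a separate operating-system process on a file of derived formulas while
the main loop continues, and is polled between inferences; racing two
independent processes is also what makes the behaviour occasionally
non-deterministic.

    import subprocess

    def start_smt(formula_file, solver="yices"):
        # Launch the external SMT solver in a separate process, non-blocking.
        # The solver invocation and input format are assumptions here.
        return subprocess.Popen([solver, formula_file],
                                stdout=subprocess.PIPE, text=True)

    def poll_smt(process):
        # Check, without blocking, whether the SMT process has finished.
        # Returns "unsat", "sat", or None if it is still running; the
        # textual answers assumed here are the usual SMT conventions,
        # not SPASS+T output.
        if process.poll() is None:
            return None
        answer = process.stdout.read().strip()
        return "unsat" if "unsat" in answer else "sat"

    # The saturation loop would call poll_smt between inferences and stop
    # as soon as either side establishes unsatisfiability.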
Implementation
SPASS+T is implemented in C.
SPASS+T is available from
http://www.mpi-inf.mpg.de/~uwe/software/#TSPASS
Expected Competition Performance
SPASS+T 2.2.16 came second in the TFA division at the last CASC;
we expect a similar performance in CASC 2013.
TEMPLAR::leanCoP 0.8
Mario Frank, Jens Otten
University of Potsdam, Germany
Architecture
TEMPLAR is a formula set pruner whose purpose is to reduce a given axiom
set to a size that current state-of-the-art automated theorem provers can
handle.
It reads and analyses the given axiom set and searches for the formulae
that appear relevant to the proof of a given formula.
The selected formulae, together with additional information, can then be
passed to an automated theorem prover, which attempts a proof.
Multiple instances of a theorem prover can be started in parallel while TEMPLAR
calculates other formula sets.
Strategies
TEMPLAR analyses all axioms and gathers information about the predicate
symbols they contain, their arities and polarities, and further
information such as the complete formula graph.
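As an illustration of symbol-based pruning of this kind, the following
hypothetical Python sketch (not TEMPLAR's C++ implementation) selects
axioms by iterating over shared symbols, starting from the symbols of the
formula to be proved.

    def select_axioms(goal_symbols, axioms, max_depth=3):
        # axioms is a list of symbol sets, one per axiom; a real tool would
        # extract these sets (and arities, polarities, the formula graph,
        # ...) from the parsed formulas.
        reached = set(goal_symbols)
        selected = set()
        for _ in range(max_depth):
            new = [i for i, syms in enumerate(axioms)
                   if i not in selected and syms & reached]
            if not new:
                break
            for i in new:
                selected.add(i)
                reached |= axioms[i]
        return sorted(selected)

    # Example: with a goal over {"subset", "member"}, only axioms that
    # (transitively) mention these symbols survive the pruning.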
Implementation
TEMPLAR is fully implemented in C++, conforming to the 2011 C++ standard.
TPTP and batch files are parsed by a parser written as a grammar in a
descriptive meta-language embedded in C++ syntax; this grammar is
transformed into C++ instructions by the Boost library (Boost.Spirit Qi).
Expected Competition Performance
TEMPLAR extends a theorem prover with the pruning techniques described
above, but also gives the prover the chance to attempt a proof on the
original file.
Thus, the performance is expected to be at least that of the underlying
prover.
Due to the reduced axiom sets, some proofs are expected to be found
faster, and some problems that are normally not provable within the given
time are expected to become provable.
TPS 3.120601S1b
Chad E. Brown1, Peter Andrews2
1Saarland University, Germany,
2Carnegie Mellon University, USA
Architecture
TPS is a higher-order theorem proving system that has been developed over
several decades under the supervision of Peter B. Andrews with substantial
work by Eve Longini Cohen, Dale A. Miller, Frank Pfenning, Sunil Issar,
Carl Klapper, Dan Nesmith, Hongwei Xi, Matthew Bishop, Chad E. Brown,
Mark Kaminski, Rémy Chrétien and Cris Perdue.
TPS can be used to prove theorems of Church's type theory automatically,
interactively, or semi-automatically [AB+96,AB06].
When searching for a proof, TPS first searches for an expansion proof
[Mil87] or an extensional expansion proof [Bro07] of the theorem.
Part of this process involves searching for acceptable matings [And81].
Using higher-order unification, a pair of occurrences of subformulae (which
are usually literals) is mated appropriately on each vertical path through
an expanded form of the theorem to be proved.
The expansion proof thus obtained is then translated [Pfe87] without further
search into a proof of the theorem in natural deduction style.
Strategies
Strategies used by TPS in the search process include:
Implementation
TPS has been developed as a research tool for developing, investigating, and
refining a variety of methods of searching for expansion proofs, and
variations of these methods.
Its behavior is controlled by hundreds of flags.
A set of flags, with values for them, is called a mode.
The strategy of the current version of TPS consists of 71 modes.
When searching for a proof in automatic mode, TPS tries each of these modes
in turn for a specified amount of time (at least 1 second and at most 41
seconds).
If TPS succeeds in finding an expansion proof, it translates the expansion
proof to a natural deduction proof.
This final step ensures that TPS will not incorrectly report that a formula
has been proven.
TPS is implemented in Common Lisp, and is available from
http://gtps.math.cmu.edu/tps.html
Expected Competition Performance
TPS was the CASC-22 THF division winner in 2009.
In the CASC competitions since 2009, TPS has come in last, behind
Isabelle, LEO-II, and Satallax.
There have been no changes to TPS since last year, so there is no reason
to believe TPS will win this year.
Vampire 2.6
Krystof Hoder, Andrei Voronkov
University of Manchester, England
Architecture
Vampire 2.6 is an automatic theorem prover for first-order classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
It also implements the Inst-gen calculus.
The splitting rule in the kernel adds propositional parts to clauses,
which are manipulated using binary decision diagrams and a SAT solver.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
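The first step of such a splitting rule, grouping a clause's literals into
variable-disjoint components (each of which would then be labelled by a
propositional variable), can be sketched as follows; this is a
hypothetical Python illustration, not Vampire's C++ code.

    def split_components(clause):
        # clause is a list of (literal, variables) pairs, where variables
        # is the set of variable names occurring in that literal.
        # Components sharing no variables can be split off; naming each
        # component with a fresh propositional variable is omitted here.
        components = []            # list of (variable_set, literal_list)
        for lit, vars_ in clause:
            touching = [c for c in components if c[0] & vars_]
            merged_vars, merged_lits = set(vars_), [lit]
            for c in touching:
                merged_vars |= c[0]
                merged_lits += c[1]
                components.remove(c)
            components.append((merged_vars, merged_lits))
        return [lits for _, lits in components]

    # p(X) | q(X) | r(Y) has two variable-disjoint components:
    # {p(X), q(X)} and {r(Y)}.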
Strategies
The Vampire 2.6 kernel provides a fairly large number of options for
strategy selection. The most important ones are:
Implementation
Vampire 2.6 is implemented in C++.
Expected Competition Performance
Vampire 2.6 is the CASC-J6 FOF and LTB division winner.
Zipperposition 0.2
Guillaume Burel, Simon Cruanes
ENSIIE/Cedric, France
Architecture
Zipperposition 0.2 is a superposition prover for first-order logic with
equality. It is written in OCaml, and largely inspired by E [Sch02].
It features lazy transformation to CNF [GS03], the Superposition, Equality
Resolution and Equality Factoring inference rules, and many simplification
rules including unit rewriting, condensation, subsumption and contextual
literal cutting.
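As an example of one of these simplification rules, unit rewriting
(demodulation) with an oriented unit equation can be sketched as follows.
The term representation, matching, and one-pass rewriting below are
deliberately simplified and hypothetical; they are not Zipperposition's
OCaml code.

    def match(pattern, term, subst=None):
        # Match pattern against term; strings starting with an uppercase
        # letter are variables.  Returns a substitution (dict) or None.
        subst = dict(subst or {})
        if isinstance(pattern, str) and pattern[:1].isupper():
            if pattern in subst and subst[pattern] != term:
                return None
            subst[pattern] = term
            return subst
        if isinstance(pattern, str) or isinstance(term, str):
            return subst if pattern == term else None
        if pattern[0] != term[0] or len(pattern) != len(term):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst

    def substitute(subst, term):
        if isinstance(term, str):
            return subst.get(term, term)
        return (term[0],) + tuple(substitute(subst, t) for t in term[1:])

    def rewrite(term, lhs, rhs):
        # One pass of rewriting with the oriented unit equation lhs -> rhs,
        # replacing every outermost subterm that matches lhs.
        s = match(lhs, term)
        if s is not None:
            return substitute(s, rhs)
        if isinstance(term, str):
            return term
        return (term[0],) + tuple(rewrite(t, lhs, rhs) for t in term[1:])

    # Rewriting f(g(a)) with g(X) -> X yields f(a):
    # rewrite(("f", ("g", "a")), ("g", "X"), "X") == ("f", "a")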
Strategies
Zipperposition allows one to choose a term ordering from the KBO and RPO
families (RPO is used by default).
Otherwise, it is fully automated and tries to select a symbol precedence
based on its input (e.g., if it detects a symbol definition in the input,
it will try to orient it so that demodulation eliminates the symbol).
Implementation
The prover is written in OCaml. For indexing, it uses perfect discrimination
trees for demodulation, feature vector indexes for subsumption, and
fingerprint indexing for inferences.
It depends on a few OCaml libraries to serialize/deserialize its knowledge
base and to provide a Datalog reasoner at the meta-level.
Since Zipperposition is designed to be a prototype for experimenting with
superposition, its performance is average at best.
Zipperposition is available from
http://www.rocq.inria.fr/deducteam/Zipperposition/
Expected Competition Performance
We expect Zipperposition to have decent performance on FOF problems with
equality, and poor performance on large CNF problems without equality.
The system is not tuned for CASC in any way.