CVC3 works with a version of first-order logic with polymorphic types and
has a wide variety of features including:
EP 1.4pre is just a combination of E 1.4pre in verbose mode
and a proof analysis tool extracting the used inference steps. For the
LTB division, a control program uses a SInE-like analysis to extract
reduced axiomatizations that are handed to several instances of E.
The automatic mode is based on a static partition of the set of all
clausal problems based on a number of simple features (number of
clauses, maximal symbol arity, presence of equality, presence of
non-unit and non-Horn clauses,...). Each class of clauses is
automatically assigned a heuristic that performs well on problems from
this class in test runs. About 100 different strategies have been
evaluated on all untyped first-order problems from TPTP 4.1.0.
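As a rough illustration of such a class-based automatic mode (a minimal sketch, not E's actual code; the features, classes, and strategy names below are invented for the example):

    # Sketch: assign a search strategy to a problem via a static partition
    # on simple syntactic features, as in a class-based automatic mode.
    def features(clauses):
        return (
            len(clauses) > 100,                                  # "many clauses"
            any(len(c) > 1 for c in clauses),                    # non-unit clauses present
            any(sum(lit > 0 for lit in c) > 1 for c in clauses)  # non-Horn clauses present
        )

    # Table filled in advance from test runs: best-performing strategy per class.
    STRATEGY_TABLE = {
        (False, False, False): "unit-equational-heuristic",
        (False, True,  False): "horn-heuristic",
    }

    def pick_strategy(clauses, default="generic-symbol-counting"):
        return STRATEGY_TABLE.get(features(clauses), default)

    # Clauses as lists of signed literal ids: positive = positive literal.
    print(pick_strategy([[1], [2], [-1, 3]]))   # -> horn-heuristic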
An E-hyper tableau is a tree whose nodes are labeled with clauses and which
is built up by the application of the inference rules of the E-hyper tableau
calculus.
The calculus rules are designed such that most of the reasoning is performed
using positive unit clauses.
Splitting is done without rigid variables.
Instead, variables which would be shared between branches are prevented by
ground substitutions, which are guessed from the Herbrand universe and
constrained by rewrite rules.
Redundancy rules allow the detection and removal of clauses that are redundant
with respect to a branch.
The hyper extension inference from the original hyper tableau calculus is
equivalent to a series of E-hyper tableau calculus inference applications.
Therefore the implementation of the hyper extension in KRHyper by a variant
of semi-naive evaluation [Ull89] is retained in E-KRHyper, where it serves
as a shortcut inference for the resolution of non-equational literals.
In the LTB division E-KRHyper sequentially tries three axiom selection
strategies:
an implementation of Krystof Hoder's SInE algorithm,
another incomplete selection based on the CNF representations of the axioms,
and finally the complete axiom set.
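A minimal sketch of this kind of sequential fall-back over axiom selections, with hypothetical function names rather than E-KRHyper's actual interfaces:

    # Sketch: try increasingly complete axiom selections until one proof attempt succeeds
    # (the real selections are SInE, a CNF-based selection, and the full axiom set).
    def prove_with_selections(conjecture, axioms, selections, prove, time_slice):
        for select in selections:
            result = prove(conjecture, select(conjecture, axioms), timeout=time_slice)
            if result == "Theorem":
                return result
        return "GaveUp"

    # Dummy demonstration: the first selection is too aggressive, the second suffices.
    selections = [lambda c, ax: ax[:1], lambda c, ax: ax]
    prove = lambda c, ax, timeout: "Theorem" if len(ax) >= 2 else "GaveUp"
    print(prove_with_selections("goal", ["a1", "a2"], selections, prove, 60))  # -> Theorem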
Major additions in the current version are:
iProver is available from:
iProver-Eq consists of three core components: i) ground reasoning
by an SMT solver, ii) first-order equational reasoning on literals
in a candidate model by a labelled unit superposition calculus
[KS10,KS10] and iii) instantiation of clauses with substitutions
obtained by ii).
Given a set of first-order clauses, iProver-Eq first abstracts it
to a set of ground clauses which are then passed to the ground solver.
If the ground abstraction is unsatisfiable, then the set of first-order
clauses is also unsatisfiable.
Otherwise, literals are selected from the first-order clauses based on
the model of the ground solver.
The labelled unit superposition calculus checks whether selected literals
are conflicting.
If they are conflicting, then clauses are instantiated such that the
ground solver has to refine its model in order to resolve the conflict.
Otherwise, satisfiability of the initial first-order clause set is shown.
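The loop just described can be rendered abstractly as follows (a sketch with placeholder functions, not iProver-Eq's actual code or data structures):

    # Sketch of the abstraction-refinement loop described above.
    def inst_gen_eq(clauses, ground_abstraction, sat_solve, select_literals,
                    find_conflict, instantiate, max_rounds=1000):
        for _ in range(max_rounds):
            model = sat_solve(ground_abstraction(clauses))
            if model is None:                       # ground abstraction unsatisfiable
                return "Unsatisfiable"
            selected = select_literals(clauses, model)
            conflict = find_conflict(selected)      # unit superposition on selected literals
            if conflict is None:                    # no conflict: the model lifts
                return "Satisfiable"
            clauses = clauses | instantiate(clauses, conflict)  # refine by new instances
        return "Unknown"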
Clause selection and literal selection in the unit superposition
calculus are implemented in separate given clause algorithms.
Relevant substitutions are accumulated in labels during unit superposition
inferences and then used to instantiate clauses.
For redundancy elimination iProver-Eq uses demodulation, dismatching
constraints and global subsumption.
In order to efficiently propagate redundancy elimination from instantiation
into unit superposition, we implemented different representations of labels
based on sets, AND/OR-trees and OBDDs.
Non-equational resolution and equational superposition inferences provide
further simplifications.
For the LTB division, iProver-Eq-SInE uses axiom selection based on the
SInE algorithm [HV11] as implemented in Vampire [HKV10], i.e., axiom
selection is done by Vampire and proof attempts are done by iProver-Eq.
If no equational literals occur in the input, iProver-Eq falls back
to the inference rules of iProver, otherwise the latter are disabled
and only unit superposition is used.
If all clauses are unit equations, no instantiations need to be generated
and the calculus is run without the otherwise necessary bookkeeping.
Although Isabelle is designed for interactive proof development,
it is a little-known fact that Isabelle can be run from
the command line, passing in a theory file with a formula to solve.
Isabelle theory files can include Standard ML code to be executed when the
file is processed.
The TPTP2X Isabelle format module outputs a THF problem in Isabelle/HOL
syntax, augmented with ML code that (1) runs the ten tactics in sequence,
each with a CPU time limit, until one succeeds or all fail, and
(2) reports the result and proof (if found) using the SZS
standards.
A Perl script is used to insert the CPU time limit (equally divided
over the ten tactics) into TPTP2X's Isabelle format output, and then
run the command line isabelle-process on the resulting theory file.
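A minimal sketch of this scheduling scheme, written in Python for illustration rather than the Perl/ML actually used, with dummy tactics standing in for the real ones:

    # Sketch: divide the overall CPU limit equally over the tactics and
    # stop at the first success, reporting an SZS status.
    def run_tactics(goal, tactics, total_limit):
        per_tactic = total_limit / len(tactics)
        for name, tactic in tactics:
            if tactic(goal, timeout=per_tactic):
                return "% SZS status Theorem ({})".format(name)
        return "% SZS status GaveUp"

    # Dummy tactics for illustration.
    tactics = [("simp", lambda g, timeout: False), ("blast", lambda g, timeout: True)]
    print(run_tactics("goal", tactics, 300))   # -> % SZS status Theorem (blast)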
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype. Assuming the kernel is correct, all values of type theorem
are correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
A notable exception is the metis proof method, which was taken from
the HOL4 theorem prover (also implemented in ML).
Isabelle/HOL is available for all major platforms under a BSD-style license
from
leanCoP can read formulae using the leanCoP syntax as well as the
(raw) TPTP syntax format.
Equality axioms are automatically added if required.
The core leanCoP prover returns a very compact connection proof, which is
translated into a readable proof.
Several output formats are available.
As the main enhancement leanCoP 2.2 now supports the output of proofs in
an unofficial TPTP syntax format for representing derivations in connection
(tableau) calculi [OS10].
Furthermore, besides the Linux/Unix and MacOS platforms, most Windows
platforms are now supported as well.
The source code of leanCoP 2.2 is available under the GNU general
public license. Together with more information it can be found on the
leanCoP website at
The improved performance of LEO-II in comparison to its predecessor LEO
(implemented in LISP) is due to several novel features including the
exploitation of term sharing and term indexing techniques [TB06], support
for primitive equality reasoning (extensional higher-order RUE resolution),
and improved heuristics at the calculus level.
One recent development is LEO-II's new parser: in addition to the TPTP
THF language, this parser now also supports the TPTP FOF and CNF
languages.
Hence, LEO-II can now also be used for FOF and CNF problems.
Unfortunately the LEO-II system still uses only a very simple
sequential collaboration model with first-order ATPs instead of using
the more advanced, concurrent and resource-adaptive OANTS architecture
[BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from:
The improved performance of LEO-II in comparison to its predecessor LEO
(implemented in LISP) is due to several novel features including the
exploitation of term sharing and term indexing techniques [TB06], support
for primitive equality reasoning (extensional higher-order RUE resolution),
and improved heuristics at the calculus level.
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately the LEO-II system still uses only a very simple
sequential collaboration model with first-order ATPs instead of using
the more advanced, concurrent and resource-adaptive OANTS architecture
[BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from:
LEO-II will again participate in the FOF and CNF categories in
order to evaluate its performance for these fragments.
For this, note that LEO-II still employs its own input processing and
normalization techniques, and that calls to the prover E are applied only
modulo Hurd's fully typed translation.
MELIA accepts formulas in the TFF format (typed TPTP formulas, see
the TPTP technical
report).
It includes a pre-processor for transforming such formulas into (sorted)
clausal logic over foreground sorts specified in the input files and
built-in linear integer arithmetic.
Expectations are not high.
Experiments with various first order calculi [Hur03]
have shown a given clause algorithm and ordered resolution to best
suit this application, and that is what Metis 2.3 implements.
Since equality often appears in interactive theorem prover goals, Metis 2.3
also implements the ordered paramodulation calculus.
In addition to standard age and size measures, Metis 2.3 uses
finite models to weight clauses in the Passive set.
When integrated with higher order logic, an interpretation of known
functions and relations is manually constructed to make many of their
standard properties valid in the finite model. For example, the domain
of the model is the set {0,...,7}, and the higher order logic
arithmetic functions are interpreted in the model modulo 8.
Unknown functions and relations are interpreted randomly, but with a
bias towards making supporting theorems valid in the model.
The finite model strategy carries over to TPTP problems, by manually
interpreting a collection of functions and relations that appear in TPTP axiom
files in such a way as to make the axioms valid in the model.
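The following sketch illustrates the idea of weighting a clause by its validity in the 8-element model; it is not Metis' actual code, and the term representation and weighting scheme are invented for the example:

    # Sketch: score a clause by how often it holds under random variable
    # assignments into the 8-element model (arithmetic interpreted mod 8).
    import random

    DOMAIN = range(8)
    FUNCS = {"plus": lambda x, y: (x + y) % 8, "times": lambda x, y: (x * y) % 8}
    PREDS = {"leq": lambda x, y: x <= y, "eq": lambda x, y: x == y}

    def eval_term(t, env):
        if isinstance(t, str):                        # variable
            return env[t]
        f, args = t
        return FUNCS[f](*[eval_term(a, env) for a in args])

    def eval_literal(lit, env):
        positive, p, args = lit
        value = PREDS[p](*[eval_term(a, env) for a in args])
        return value if positive else not value

    def model_score(clause, variables, samples=100):
        hits = 0
        for _ in range(samples):
            env = {v: random.choice(list(DOMAIN)) for v in variables}
            hits += any(eval_literal(lit, env) for lit in clause)
        return hits / samples                         # 1.0 = looks valid in the model

    # The clause  plus(X,Y) = plus(Y,X)  scores 1.0; a false clause scores lower.
    commutativity = [(True, "eq", [("plus", ["X", "Y"]), ("plus", ["Y", "X"])])]
    print(model_score(commutativity, ["X", "Y"]))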
Metis 2.3 reads problems in TPTP format and outputs detailed proofs
in TSTP format, where each refutation step is one of 6 simple
inference rules.
Metis 2.3 implements a complete calculus, so when the set of clauses is
saturated it can soundly declare the input problem to be unprovable (and
outputs the saturation set).
Metis 2.3 is free software, released under the MIT license.
It can be downloaded from
The system is also able to work with second order statements. It
may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and [Pas93]. These two possibilities
are not used while working with the TPTP Library.
Muscadet is available from:
SAT solvers are particularly sensitive to the encoding of problems, so special
care is needed when translating HOL formulas.
As a rule, HOL scalars are mapped to FORL singletons and functions are mapped to
FORL relations accompanied by a constraint.
An n-ary first-order function (curried or not) can be coded as an
(n + 1)-ary relation accompanied by a constraint. However, if the return
type is the type of Booleans, the function is more efficiently coded as an
unconstrained n-ary relation.
Higher-order quantification and functions bring complications of their own. A
function from σ to τ cannot be directly passed as an argument in FORL;
Nitpick's workaround is to pass |σ| arguments of type τ that encode a
function table.
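A small sketch of the function-as-relation encoding over a finite scope (illustrative only, not Nitpick's actual FORL translation):

    from itertools import product

    def function_table(f, domain, arity):
        # Graph of an n-ary function as an (n+1)-ary relation.
        return {args + (f(*args),) for args in product(domain, repeat=arity)}

    def is_functional(relation, domain, arity):
        # The constraint accompanying the relation: exactly one result per argument tuple.
        return all(sum(1 for t in relation if t[:arity] == args) == 1
                   for args in product(domain, repeat=arity))

    dom = range(3)
    r = function_table(lambda x, y: (x + y) % 3, dom, 2)
    print(is_functional(r, dom, 2))    # -> True

    # A Boolean-valued function needs no such constraint: it is just the set of
    # argument tuples on which it returns True.
    leq = {args for args in product(dom, repeat=2) if args[0] <= args[1]}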
Nitpick is available as part of Isabelle/HOL for all major platforms under a
BSD-style license from
The translation from HOL to Kodkod's first-order relational logic (FORL) is
parameterized by the cardinalities of the atomic types occurring in it. Nitrox
enumerates the possible cardinalities for the universe. If a formula has a
finite counterexample, the tool eventually finds it, unless it runs out of
resources.
Nitpick is optimized to work with higher-order logic (HOL) and its definitional
principles (e.g., (co)inductive predicates, (co)inductive datatypes,
(co)recursive functions). When invoked on an untyped first-order problem, few of
its optimizations come into play, and the problem handed to Kodkod is
essentially a first-order relational logic (FORL) rendering of the TPTP FOF
problem. One exception is nested quantifiers, which
Nitpick optimizes before Kodkod gets a chance to look at them [BN10].
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and
Rusty Lusk contributed to the development of Otter.
Paradox incorporates the following features: Polynomial-time
clause splitting heuristics, the use of incremental SAT,
static symmetry reduction techniques, and the use of sort
inference.
Refute is available as part of Isabelle/HOL for all major platforms under a
BSD-style license from
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic [BS10] with a choice operator [BB11].
A problem is given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses [Bro11].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat as an engine to test the current set
of propositional clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
If there are no quantifiers at function types, the generation of higher-order
formulae and corresponding clauses may terminate [BS09,BS09].
In such a case, if MiniSat reports the final set of clauses as satisfiable,
then the original set of higher-order formulae is satisfiable (by a standard
model in which all types are interpreted as finite sets).
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Satallax 2.1 has eight strategy schedules which were determined through
experimentation using the THF problems in version 5.1.0 of the TPTP library.
One of these eight strategy schedules is chosen based on the amount of time
Satallax is given to solve the problem.
For example, if Satallax is given 180 seconds to solve the problem, then
a schedule with 38 modes is chosen.
For the TFA division, the TFF formulae are converted to FOF using the
standard approach [Wal83], with type predicates to check the types of
numeric variables.
Numbers are represented internally as special constant symbols that carry
both the type and the value.
The basic mathematical functionality is provided by a new inference rule
called ground arithmetic rewriting.
Given a clause as a premise, it traverses the term structure of all its
literals in a bottom up fashion, and performs the symbolically represented
arithmetic operations whenever all arguments are numbers (thus even terms
below uninterpreted symbols may get simplified).
Arithmetic predicates (including equality) may get evaluated, which simplifies
the clause, or might show its redundancy.
The GMP arithmetic library is used for arbitrary precision computation.
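A minimal sketch of the bottom-up evaluation underlying ground arithmetic rewriting (illustrative only; the real rule also evaluates arithmetic predicates and uses GMP numbers rather than machine integers):

    from numbers import Number

    ARITH = {"$sum": lambda x, y: x + y, "$product": lambda x, y: x * y}

    def rewrite(term):
        if isinstance(term, (Number, str)):               # number or variable
            return term
        f, args = term
        args = [rewrite(a) for a in args]                 # bottom-up traversal
        if f in ARITH and all(isinstance(a, Number) for a in args):
            return ARITH[f](*args)                        # all arguments are numbers: compute
        return (f, args)                                  # keep uninterpreted structure

    # f($sum(2,3), X)  rewrites to  f(5, X), even though f is uninterpreted.
    print(rewrite(("f", [("$sum", [2, 3]), "X"])))        # -> ('f', [5, 'X'])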
To extend the mathematical capabilities beyond ground arithmetic rewriting,
Mathematica is used as an external source of axioms.
SPASS-XDB generates requests from negative arithmetic literals, taking
advantage of the fact that Mathematica understands all arithmetic and
logical symbols.
When SPASS-XDB selects a negative literal that contains arithmetic symbols,
all negative literals in the clause are scanned to check whether they can
be conjoined with the selected literal into a single request.
The S2M2S mediator translates the request into the Mathematica
language, and calls the FindInstance function of Mathematica.
FindInstance inputs an expression and a set of variables that
occur in the expression, and finds instances of the variables that make
the expression true.
Axioms that are instances of the first literal in the request are returned.
The mediator reads the answer, and creates new axioms that are passed back
to SPASS-XDB.
When searching for a proof, TPS first searches for an expansion proof
[Mil87] or an extensional expansion proof [Bro07] of the theorem.
Part of this process involves searching for acceptable matings [And81].
Using higher-order unification, a pair of occurrences of subformulae (which
are usually literals) is mated appropriately on each vertical path through
an expanded form of the theorem to be proved.
The expansion proof thus obtained is then translated [Pfe87] without further
search into a proof of the theorem in natural deduction style.
TPS is implemented in Common Lisp, and is available from
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Although the kernel of the system works only with clausal normal form, the
shell accepts a problem in the full first-order logic syntax, clausifies
it and performs a number of useful transformations before passing the result
to the kernel.
Also the axiom selection algorithm SInE [HV11] can be enabled as part of
the preprocessing.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
CVC3 2.4
Clark Barrett1, Cesare Tinelli2
1New York University, 2University of Iowa
Architecture
CVC3 [BT07] is a DPLL-based theorem prover for Satisfiability Modulo
Theories (SMT) problems.
It can be used to prove the validity (or, dually, the satisfiability) of
first-order formulas in a large number of built-in logical theories and
their combination.
CVC3 is the last offspring of a series of popular SMT provers, which
originated at Stanford University with the SVC system. In particular,
it builds on the code base of CVC Lite, its most recent predecessor.
Its high level design follows that of the Sammy prover.
Strategies
CVC3 uses congruence closure for equality and uninterpreted functions,
and Fourier-Motzkin for arithmetic.
Perhaps most relevant to CASC are the strategies for quantifiers.
CVC3 uses E-matching and instantiation heuristics to search for quantifier
instantiations that can close search branches.
In addition, some heuristics for complete instantiation are available.
Implementation
CVC3 is implemented in C++.
For details of the implementation, downloads, additional publications, and
a user's guide, please refer to the CVC3 web site
http://www.cs.nyu.edu/acsys/cvc3
Expected Competition Performance
CVC3 is being entered as an experimental joint venture of the SMT and
ATP community.
We expect its performance to be somewhere in the middle of the pack as we
have made no specific effort to tune it for CASC.
We hope that its performance will shed light on areas where SMT solvers
are strong as opposed to ATP systems.
E(P/LTB) 1.4pre
Stephan Schulz
Technische Universität München, Germany
Architecture
E 1.4pre [Sch02,Sch04] is described in this section.
E is a purely equational theorem prover for full first-order logic with
equality.
It consists of an (optional) clausifier for pre-processing full
first-order formulae into clausal from, and a saturation algorithm
implementing an instance of the superposition calculus with negative
literal selection and a number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality
resolution.
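The given-clause skeleton that E instantiates can be sketched abstractly as follows (placeholder functions; E's real loop interleaves simplification, literal selection, and indexing):

    # Abstract sketch of a DISCOUNT-style given-clause loop.
    def discount_loop(axioms, pick, generate, is_empty_clause, max_iter=100000):
        active, passive = set(), set(axioms)
        for _ in range(max_iter):
            if not passive:
                return "Saturated"                 # no proof: clause set is saturated
            given = pick(passive)                  # clause evaluation heuristic
            passive.discard(given)
            if is_empty_clause(given):
                return "Proof found"
            active.add(given)
            passive |= generate(given, active)     # all inferences between given and active
        return "ResourceOut"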
Strategies
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification
ordering. The prover supports a large number of pre-programmed literal
selection strategies. Clause evaluation heuristics can be constructed
on the fly by combining various parametrized primitive evaluation
functions, or can be selected from a set of predefined
heuristics. Clause evaluation heuristics are based on symbol-counting,
but also take other clause properties into account. In particular, the
search can prefer clauses from the set of support, or containing many
symbols also present in the goal. Supported term orderings are several
parametrized instances of Knuth-Bendix-Ordering (KBO) and
Lexicographic Path Ordering (LPO).
Implementation
E is built around perfectly shared terms, i.e., each distinct term is
only represented once in a term bank. The whole set of terms thus
consists of a number of interconnected directed acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite
link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing [Sch04].
Superposition and backward rewriting use fingerprint indexing, a new
technique combining ideas from feature vector indexing and path
indexing.
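As an illustration of the feature-vector idea: a clause C can subsume a clause D only if every feature of C is at most the corresponding feature of D, which yields a cheap pre-filter. A sketch with only two features follows (E's index uses many more features and stores the vectors in a trie):

    def feature_vector(clause):
        pos = [lit for lit in clause if lit > 0]
        neg = [lit for lit in clause if lit < 0]
        return (len(pos), len(neg))        # counts of positive/negative literals

    def may_subsume(c, d):
        # Necessary condition: every feature of c is <= the corresponding feature of d.
        return all(fc <= fd for fc, fd in zip(feature_vector(c), feature_vector(d)))

    # Only clauses passing this cheap test are handed to the full subsumption check.
    print(may_subsume([1], [1, -2]))       # -> True  (candidate)
    print(may_subsume([1, -2], [1]))       # -> False (cannot subsume)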
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in [Loe06,Loe06].
The prover and additional information are available at
http://www.eprover.org
Expected Competition Performance
E 1.4pre is relatively little changed from last year's entry.
The system is expected to perform well in most proof classes, but will at
best complement top systems in the disproof classes.
E-Darwin 1.4
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-Darwin 1.4 [BFT04,BPT11] is an automated theorem prover for first order
clausal logic with equality.
It is a modified version of the Darwin prover [BFT04],
intended as a testbed for variants of the Model Evolution calculus [BT03].
Among other things it implements several different approaches [BT05,BPT11]
to incorporating equality reasoning into Model Evolution.
Three principal data structures are used:
the context (a set of rewrite literals),
the set of constrained clauses,
and the set of derived candidates.
The prover always selects one candidate,
which may be a new clause or a new context literal,
and exhaustively computes inferences with this candidate and
the context and clause set, moving the results to the candidate set.
Afterwards the candidate is inserted into one of the context or the clause set,
respectively, and the next candidate is selected.
The inferences are superposition-based.
Demodulation and various means of redundancy detection are used as well.
Strategies
The uniform search strategy is identical to the one employed in the original
Darwin, slightly adapted to account for derived clauses.
Implementation
E-Darwin is implemented in the functional/imperative language OCaml.
Darwin's method of storing partial unifiers
has been adapted to equations and subterm positions
for the superposition inferences in E-Darwin.
A combination of perfect and non-perfect discrimination tree indexes
is used to store the context and the clauses.
The system has been tested on Unix and is available under the GNU Public
License from the E-Darwin website at
http://www.uni-koblenz.de/~bpelzer/edarwin
Expected Competition Performance
There have been some calculus changes since last year,
but the performance should remain similar.
While the original Darwin performs strongly in EPR, E-Darwin is more of
a generalist, less effective in EPR, yet stronger in the other divisions.
E-KRHyper 1.2
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-KRHyper [PW07] is a theorem proving and model generation system for
first-order logic with equality.
It is an implementation of the E-hyper tableau calculus [BFP07], which
integrates a superposition-based handling of equality [BG98] into the hyper
tableau calculus [BFN96].
The system is an extension of the KRHyper theorem prover [Wer03],
which implements the original hyper tableau calculus.
Strategies
E-KRHyper uses a uniform search strategy for all problems.
The E-hyper tableau is generated depth-first, with E-KRHyper always working
on a single branch.
Refutational completeness and a fair search control are ensured by an
iterative deepening strategy with a limit on the maximum term weight of
generated clauses.
Implementation
E-KRHyper is implemented in the functional/imperative language OCaml.
The system accepts input in the TPTP-format and in the TPTP-supported
Protein-format.
The calculus implemented by E-KRHyper works on clauses, so first order
formula input is converted into CNF by an algorithm similar to the one
used by Otter [McC03], with some additional connector literals to prevent
explosive clause growth when dealing with DNF-like structures.
E-KRHyper operates on an E-hyper tableau which is represented by linked
node records.
Several layered discrimination-tree based indexes (both perfect and
non-perfect) provide access to the clauses in the tableau and support
backtracking.
The system runs on Unix and MS-Windows platforms, and is available under
the GNU Public License from the E-KRHyper
website at
http://www.uni-koblenz.de/~bpelzer/ekrhyper
Expected Competition Performance
There have been minor improvements since the last version,
but overall E-KRHyper will remain in the middle ground.
E-MaLeS 1.0
Daniel Kuehlwein1, Josef Urban1, Stephan Schulz2
1Radboud Universiteit Nijmegen, The Netherlands,
2Technische Universität München, Germany
Architecture
E-MaLeS is a meta system for E 1.3.
It uses kernel methods to learn which of E's strategies are most likely
to solve a problem.
Furthermore, E-MaLeS runs several strategies, each for a shorter time, instead
of one strategy for the whole time.
Strategies
Since E-MaLeS is based on E, please refer to E's description for its
internal procedures.
The performance of E's strategies was evaluated over the TPTP problems.
Each problem was characterised by E's problem features.
Then kernel methods were used to learn which strategy is most likely
to solve a problem given its features.
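A minimal sketch of this kind of kernel-based prediction (illustrative only; the features and the simple weighted vote below are stand-ins for E-MaLeS' actual learning setup):

    import numpy as np

    def gaussian_kernel(x, y, sigma=1.0):
        return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

    def predict_strategy(problem_features, train_features, train_best_strategy):
        # Kernel-weighted vote over previously evaluated problems.
        weights = [gaussian_kernel(problem_features, f) for f in train_features]
        scores = {}
        for w, s in zip(weights, train_best_strategy):
            scores[s] = scores.get(s, 0.0) + w
        return max(scores, key=scores.get)

    train = [np.array([10.0, 2.0]), np.array([200.0, 5.0])]   # e.g., (#clauses, max arity)
    best = ["strategy_A", "strategy_B"]
    print(predict_strategy(np.array([12.0, 2.0]), train, best))   # -> strategy_A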
Implementation
E-MaLeS is implemented in Python using the Numpy/Scipy library.
Expected Competition Performance
Since E-MaLeS is based on E, we expect it to perform at least as well as E.
FIMO 0.2
Orkunt Sabuncu
University of Potsdam, Germany
Architecture
FIMO is a system for computing finite models of first-order formulas by
incremental Answer Set Programming (iASP).
The input theory is transformed to an incremental logic program.
The answer sets of this program, if any, represent finite models of the input
theory.
iClingo is used for computing answer sets of iASP programs.
Strategies
FIMO is the successor of the system fmc2iasp [GSS10].
Unlike fmc2iasp, FIMO does not rely on flattening for translating the
input theory to a satisfiability problem.
FIMO features symmetry breaking and incremental answer set solving
provided by the underlying iASP system iClingo.
Implementation
FIMO is developed in Python.
Being the successor of fmc2iasp, it will be available from
http://potassco.sourceforge.net
Expected Competition Performance
fmc2iasp performed well against Paradox on the 2009 FNT division problems
(not in the competition, but as published in [GSS10]).
However, FIMO is based on a different strategy.
H2WO4 11.07
David Stanovsky
Charles University in Prague, Czech Republic
Architecture
Tungstic acid is a substance reacting with problems in first-order
logic with special arithmetical functions (as defined in
the TPTP library), producing code in Wolfram's Mathematica
that attempts to find a solution.
The first version, H2WO4 11.07, is a simple script
translating between the two languages and calling built-in functions
of Mathematica to solve the problem.
Strategies
The current version uses various combinations of three functions:
Implementation
This is a pair of Perl scripts: one for parsing TPTP problems into
Mathematica, the other for processing Mathematica's output.
The scripts are available at
http://www.karlin.mff.cuni.cz/~stanovsk/h2wo4/
They were tested with the text-based interface of Mathematica 7.0.
Expected Competition Performance
My motivation is to compare the other entrants with a commercial,
state-of-the-art computer algebra system (Mathematica, in my case).
I hope it does not do well compared to the specially developed systems :-)
Mathematica is strong in pure arithmetic and weak in logical reasoning.
It does well on last year's set of problems from the TPTP.
iProver 0.8
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor09] which is complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat [ES04] for reasoning
with ground abstractions of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution; see
[Kor08] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
We use non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications.
We implemented a compressed feature vector index
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality.
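The ground abstraction can be sketched as follows, assuming the standard Inst-Gen approach of mapping every variable to one distinguished constant (illustrative only, not iProver's data structures):

    # Sketch: abstract first-order clauses to ground clauses by replacing every
    # variable with one distinguished constant; the result is treated propositionally.
    BOTTOM = "c0"

    def abstract_term(t):
        if isinstance(t, str):
            return BOTTOM if t[0].isupper() else t      # TPTP convention: variables are capitalised
        f, args = t
        return (f, tuple(abstract_term(a) for a in args))

    def abstract_clause(clause):
        # literal = (sign, predicate, argument terms)
        return frozenset((sign, p, tuple(abstract_term(a) for a in args))
                         for sign, p, args in clause)

    # p(X, f(a)) abstracts to p(c0, f(a)); ~p(a, Y) abstracts to ~p(a, c0).
    print(abstract_clause([(True, "p", ["X", ("f", ["a"])])]))
    print(abstract_clause([(False, "p", ["a", "Y"])]))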
Major additions in the current version are:
Strategies
iProver has around 40 options to control the proof
search, including options for literal selection,
passive clause selection, the frequency of calling the SAT solver,
simplifications, and the combination of instantiation with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts FOF and CNF formats, where either Vampire [RV02] or E prover
[Sch04] is used for clausification of FOF problems.
iProver is available from:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 0.8 is the CASC-J5 EPR division winner.
iProver(-SInE) 0.9
Konstantin Korovin
The University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor09] which is complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat [ES04] for reasoning
with ground abstractions of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution; see
[Kor08] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
iProver uses non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications.
A compressed feature vector index is used
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
In the LTB division, iProver-SInE uses axiom selection based
on the SInE algorithm [HV11] as implemented in Vampire [HKV10],
i.e., axiom selection is done by Vampire and
proof attempts are done by iProver.
Strategies
iProver has around 40 options to control the proof
search, including options for literal selection,
passive clause selection, the frequency of calling the SAT solver,
simplifications, and the combination of instantiation with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts FOF and CNF formats,
where Vampire [HKV10] is used for clausification of FOF problems.
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 0.9 is expected to perform slightly better than the previous version.
iProver-Eq(-SInE) 0.7
Christoph Sticksel, Konstantin Korovin
The University of Manchester, United Kingdom
Architecture
iProver-Eq [KS10] extends the iProver system [Kor08] with built-in
equational reasoning, along the lines of [GK04].
As in the iProver system, first-order reasoning is combined with ground
satisfiability checking where the latter is delegated to an off-the-shelf
ground solver.
Strategies
Proof search options in iProver-Eq control clause and literal
selection in the respective given clause algorithms.
Equally important is the global distribution of time between the inference
engines and the ground solver.
At CASC, iProver-Eq will execute a fixed schedule of selected options.
Implementation
iProver-Eq is implemented in OCaml and uses CVC3 [BT07] for the
ground reasoning in the equational case and MiniSat [ES04] in the
non-equational case.
iProver-Eq accepts FOF and CNF formats, where Vampire [HKV10] is used for
clausification of FOF problems.
iProver-Eq is available at
http://www.cs.man.ac.uk/~sticksec/iprover-eq
Expected Competition Performance
iProver-Eq has seen many optimisations since the versions entered in previous CASCs.
We expect reasonably good performance in all divisions, including the EPR
divisions where instantiation-based methods are particularly strong.
Isabelle/HOL (a.k.a. IsabelleP) 2011
Jasmin C. Blanchette1, Lawrence C. Paulson2,
Tobias Nipkow1, Makarius Wenzel1,
Stefan Berghofer1
1Technische Universität München, Germany
2University of Cambridge, United Kingdom
Architecture
Isabelle/HOL 2011 [NPW09] is the higher-order logic incarnation of the
generic proof assistant
Isabelle2011.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner [Nip89], a classical reasoner [PN94], a tableau prover [Pau99], and
a first-order resolution-based prover [Hur03].
Strategies
The IsabelleP tactic submitted to the competition simply tries the
following tactics sequentially:
Implementation
Isabelle is a generic theorem prover written in Standard ML. Its meta-logic,
Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The
HOL object logic extends Pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers. Other object
logics are available, notably FOL (first-order logic) and ZF
(Zermelo–Fraenkel set theory).
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
Results from last year would suggest that Isabelle will finish third
in the THF category, after Satallax and LEO-II. However, since last year,
we have added the sledgehammer proof methods, which we expect will
improve our chances.
leanCoP 2.2
Jens Otten
University of Potsdam, Germany
Architecture
leanCoP [OB03,Ott08] is an automated theorem prover for classical
first-order logic with equality.
It is a very compact implementation of the connection (tableau) calculus
[Bib87,LS01].
Strategies
The reduction rule of the connection calculus is applied before the extension
rule.
Open branches are selected in a depth-first way.
Iterative deepening on the proof depth is used to achieve completeness.
Additional inference rules and strategies include regularity, lemmata, and
restricted backtracking [Ott10].
leanCoP uses an optimized structure-preserving transformation into clausal
form [Ott10] and a fixed strategy scheduling.
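A minimal sketch of the iterative-deepening driver, written in Python for illustration although leanCoP itself is a few lines of Prolog:

    # Sketch: run the depth-first connection search under an increasing depth bound.
    def iterative_deepening(prove_with_depth, max_depth=50):
        for depth in range(1, max_depth + 1):
            result = prove_with_depth(depth)      # bounded depth-first connection search
            if result is not None:
                return result                     # compact connection proof
        return None

    # Dummy search that "finds" a proof at depth 3.
    print(iterative_deepening(lambda d: "proof" if d >= 3 else None))   # -> proof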
Implementation
leanCoP is implemented in Prolog (ECLiPSe, SICStus and SWI Prolog
are currently supported).
The source code of the core prover is only a few lines long and fits on
half a page.
Prolog's built-in indexing mechanism is used to quickly find connections.
http://www.leancop.de
Expected Competition Performance
As the core prover has not changed, we expect the performance of leanCoP 2.2
to be similar to the performance of leanCoP 2.1.
LEO-II 1.2
Christoph Benzmüller1,
Frank Theiss2
1Articulate Software, USA,
2Saarland University, Germany
Architecture
LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system
based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution [Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E [Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
The default transformation is Hurd's fully typed translation [Hur03].
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g.,
n = 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
In contrast to its competitor systems (such as Satallax, TPS, and IsabelleP)
LEO-II so far only employs a monolithic search strategy, that is, it does not
use strategy scheduling to try different search strategies or flag settings.
However, LEO-II version 1.2 for the first time includes some very
naive relevance filtering and selectively applies some simple scheduling for
different relevance filters.
Implementation
LEO-II is implemented in Objective Caml version 3.10, and its problem
representation language is the new TPTP THF language [BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure [SB10].
http://leoprover.org
Expected Competition Performance
LEO-II 1.2 is the CASC-J5 THF division winner.
LEO-II 1.2.8
Christoph Benzmüller1,
Frank Theiss2
1Freie Universität Berlin, Germany,
2Saarland University, Germany
Architecture
LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system
based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution [Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E [Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
The default transformation is Hurd's fully typed translation [Hur03].
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II now also uses some very basic strategy scheduling to try
different search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in Objective Caml version 3.12, and its problem
representation language is the TPTP THF language [BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure [SB10].
http://leoprover.org
Expected Competition Performance
LEO-II has not improved much over the last year.
The main modifications concern proof output in order to enable proof
reconstruction/verification of LEO-II proofs in the Isabelle system.
I doubt that LEO-II will be able to defend its championship at this
year's CASC, since some of its competitor systems, such as Satallax,
have changed significantly over the last year.
However, it is great to have such strong dynamics in the THF category.
MELIA 0.1.2
Peter Baumgartner
NICTA and ANU, Australia
Architecture
MELIA is a theorem prover for the Model Evolution Calculus with Equality
and Linear Integer Arithmetic [BT11,BPT11].
It also integrates most of the theoretical developments of the Model
Evolution calculus, in particular superposition-like inference rules for
equality handling and built-in inference rules for linear integer arithmetic.
Strategies
MELIA features a variety of flag settings to control its search, e.g., for
selecting literals to focus inferences on.
In the competition, MELIA uses the same search strategy for all problems.
It includes heuristics to select the next literal to split on and to
select the next superposition inference.
Implementation
MELIA has been written (from scratch) in Scala, and runs on the Java virtual
machine.
Expected Competition Performance
MELIA participates in the TFA division only.
It is in a very early stage and has not been tuned for performance (or for
the competition).
It lacks term indexing techniques to make it more efficient.
When integer arithmetic is involved, MELIA is incomplete even where this is
(theoretically) unnecessary, and it is likely to miss proving some
theorems because of that.
Metis 2.3
Joe Hurd
Galois Inc., USA
Architecture
Metis 2.3 [Hur03] is a proof tactic used in the
HOL4 interactive theorem prover.
It works by converting a higher order logic goal to a set of clauses in
first order logic, with the property that a refutation of the clause set
can be translated to a higher order logic proof of the original goal.
Strategies
Metis 2.3 uses a fixed strategy for every input problem. Negative
literals are always chosen over positive literals, and terms are
ordered using the Knuth-Bendix ordering with uniform symbol weight and
precedence favouring reduced arity.
Implementation
Metis 2.3 is written in Standard ML, for ease of integration with
HOL4. It uses indexes for resolution, paramodulation, (forward)
subsumption and demodulation. It keeps the Active clause set
reduced with respect to all the unit equalities so far derived.
http://www.gilith.com/software/metis
Expected Competition Performance
There have been only minor changes to Metis 2.3 since CASC-J5, so
it is expected to perform at approximately the same level in CASC-23
and end up in the lower third of the table.
MetiTarski 1.8
Lawrence C. Paulson
University of Cambridge, United Kingdom
Architecture
MetiTarski [AP10,DA+09] is an automatic theorem prover based on a combination
of resolution and QEPCAD-B [Bro03], a decision procedure for the theory of
real closed fields.
It is designed to prove theorems involving real-valued special functions such
as log, exp, sin, cos, atan and sqrt.
In particular, it is designed to prove universally quantified inequalities
involving such functions.
Support for existentially quantified inequalities is very limited.
MetiTarski is a modified version of Joe Hurd's theorem prover, Metis [Hur03].
Strategies
MetiTarski employs resolution, augmented with axiom files that specify upper
and lower bounds of the special functions mentioned in the problem.
MetiTarski also has code to simplify polynomials and put them into canonical
form.
The resolution calculus is extended with a literal deletion rule: if the
decision procedure finds a literal to be inconsistent with its context
(which consists of known facts and the negation of the other literals in
the clause), then it is deleted.
From 2011, MetiTarski also implements case-splitting with backtracking.
MetiTarski is incomplete, and nothing can be inferred if it fails to prove
a conjecture.
Implementation
MetiTarski, like Metis, is implemented in Standard ML.
QEPCAD is implemented in C and C++.
The latest version of MetiTarski can be downloaded from
http://www.cl.cam.ac.uk/~lp15/papers/Arith/
Expected Competition Performance
No expectation provided.
Muscadet 4.1
Dominique Pastre
University Paris Descartes, France
Architecture
The Muscadet theorem prover is a knowledge-based system.
It is based on Natural Deduction, following the terminology of
[Ble71] and [Pas78],
and uses methods which resemble those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
Strategies
There are specific strategies for existential, universal, conjunctive or
disjunctive hypotheses and conclusions, and equalities.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new hypotheses
from old ones), backward deduction (replace the conclusion by a new one),
refutation (only if the conclusion is a negation), search for objects
satisfying the conclusion or dynamic building of new rules.
Implementation
Muscadet [Pas01] is implemented in SWI-Prolog.
Rules are written as more or less declarative Prolog clauses.
Metarules are written as sets of Prolog clauses.
The inference engine includes the Prolog interpreter and some procedural Prolog
clauses.
A theorem may be split into several subtheorems, structured as a tree with "and" and "or" nodes.
All the proof search steps are recorded as facts, including all the elements
that will later be necessary to extract the useful steps
(the name of the executed action or applied rule, the new facts added or the
rule dynamically built, the antecedents, and a brief explanation).
http://www.math-info.univ-
Expected Competition Performance
The best performances of Muscadet will be for problems
manipulating many concepts in which all statements (conjectures,
definitions, axioms) are expressed in a manner similar to the
practice of humans, especially of mathematicians [Pas02,Pas07].
It will perform poorly on problems using few concepts but large
and deep formulas leading to many splittings.
Its best results will be in set theory, especially for functions and relations.
Its originality is that proofs are given in natural style.
Nitpick (a.k.a. IsabelleN) 2011
Jasmin C. Blanchette
Technische Universität München, Germany
Architecture
Nitpick [BN10] is an open source counterexample generator for Isabelle/HOL
[NPW09].
It builds on Kodkod [TJ07], a highly optimized first-order relational model
finder based on SAT.
The name Nitpick is appropriated from a now retired Alloy precursor.
Strategies
Nitpick employs Kodkod to find a finite model of the negated conjecture. The
translation from HOL to Kodkod's first-order relational logic (FORL) is
parameterized by the cardinalities of the atomic types occurring in it. Nitpick
enumerates the possible cardinalities for each atomic type, exploiting
monotonicity to prune the search space [BK11].
If a formula has a finite counterexample, the tool eventually finds it,
unless it runs out of resources.
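A minimal sketch of this scope-enumeration loop (illustrative only; Nitpick additionally exploits monotonicity to prune scopes, and Kodkod does the actual model finding):

    from itertools import product

    def find_counterexample(atomic_types, negated_conjecture_has_model, max_card=8):
        # Enumerate cardinality assignments (scopes) for the atomic types.
        for cards in product(range(1, max_card + 1), repeat=len(atomic_types)):
            scope = dict(zip(atomic_types, cards))
            model = negated_conjecture_has_model(scope)   # delegated to the model finder
            if model is not None:
                return scope, model                       # finite counterexample found
        return None

    # Dummy example: a "counterexample" appears once type 'a' has at least 3 elements.
    print(find_counterexample(["a"], lambda s: "model" if s["a"] >= 3 else None))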
Implementation
Nitpick, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle
itself, which adheres to the LCF small-kernel discipline, Nitpick does not
certify its results and must be trusted.
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
Thanks to Kodkod's amazing power, we expect that Nitpick will beat both
Satallax and Refute with its hands tied behind its back in the TNT category.
Nitrox 0.2
Jasmin C. Blanchette1, Emina Torlak2
1Technische Universität München, Germany
2IBM Research, USA
Architecture
Nitrox is the first-order version of Nitpick [BN10], an open source
counterexample generator for Isabelle/HOL [NPW09].
It builds on Kodkod [TJ07], a highly optimized first-order relational
model finder based on SAT.
The name Nitrox is a portmanteau of Nitpick
and Paradox (clever, eh?).
Strategies
Nitrox employs Kodkod to find a finite model of the negated conjecture. It
performs a few transformations on the input, such as pushing quantifiers inside,
but 99% of the solving logic is in Kodkod and the underlying SAT solver.
Implementation
Nitrox, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle
itself, which adheres to the LCF small-kernel discipline, Nitrox does not
certify its results and must be trusted. Kodkod is written in Java. MiniSat 1.14
is used as the SAT solver.
Expected Competition Performance
Since Nitpick was designed for HOL, it doesn't have any type inference
à la Paradox.
It also doesn't use the SAT solver incrementally, which penalizes it
a bit (but not as much as the missing type inference).
Kodkod itself is known to perform less well on FOF than Paradox, because
it is designed and optimized for a somewhat different logic, FORL.
On the other hand, Kodkod's symmetry breaking seems better calibrated
than Paradox's.
Hence, we expect Nitrox to end up in second place at best in the TNT category.
Otter 3.3
William McCune
Argonne National Laboratory, USA
Architecture
Otter 3.3 [McC03] is an ATP system for statements in first-order (unsorted)
logic with equality.
Otter is based on resolution and paramodulation applied to clauses.
An Otter search uses the "given clause algorithm", and typically involves
a large database of clauses; subsumption and demodulation play an important
role.
Strategies
Otter's original automatic mode, which reflects no tuning to the TPTP
problems, will be used.
Implementation
Otter is written in C.
Otter uses shared data structures for clauses and terms, and it uses
indexing for resolution, paramodulation, forward and backward subsumption,
forward and backward demodulation, and unit conflict.
Otter is available from:
http://www.cs.unm.edu/~mccune/otter/
Expected Competition Performance
Otter has been entered into CASC as a stable benchmark against which
progress can be judged (there have been only minor changes to Otter since
1996 [MW97], nothing that really affects its performance in CASC).
This is not an ordinary entry, and we do not hope for Otter to do well
in the competition.
Paradox 3.0
Koen Claessen, Niklas Sörensson
Chalmers University of Technology, Sweden
Architecture
Paradox [CS03] is a finite-domain model generator.
It is based on a MACE-style [McC03] flattening and instantiating of the
first-order clauses into propositional clauses, and then the use of
a SAT solver to solve the resulting problem.
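A minimal sketch of the MACE-style instantiation step (illustrative only; Paradox adds clause splitting, sort inference, symmetry reduction, and incremental SAT on top of this idea):

    from itertools import product

    def ground_instances(clause, variables, domain_size):
        # All ground instances of a (flattened) clause over domain {0,..,n-1}.
        for values in product(range(domain_size), repeat=len(variables)):
            env = dict(zip(variables, values))
            yield [(sign, pred, tuple(env.get(a, a) for a in args))
                   for sign, pred, args in clause]

    # p(X) | ~q(X, Y)  over a 2-element domain gives 4 propositional clauses,
    # each ground atom becoming one SAT variable.
    clause = [(True, "p", ("X",)), (False, "q", ("X", "Y"))]
    for inst in ground_instances(clause, ["X", "Y"], 2):
        print(inst)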
Strategies
There is only one strategy in Paradox:
In this way, Paradox can be used both as a model finder and as an EPR solver.
Implementation
The main part of Paradox is implemented in Haskell using the GHC compiler.
Paradox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's Foreign
Function Interface.
Expected Competition Performance
Paradox 3.0 is the CASC-J5 FNT division winner.
Refute (a.k.a. IsabelleM) 2011
Jasmin C. Blanchette1, Tjark Weber2
1Technische Universität München, Germany
2University of Cambridge, United Kingdom
Architecture
Refute [Web08] is an open source counterexample generator for Isabelle/HOL
[NPW09] based on a SAT solver, and Nitpick's [BN10] precursor.
Strategies
Refute employs a SAT solver to find a finite model of the negated conjecture.
The translation from HOL to propositional logic is parameterized by the
cardinalities of the atomic types occurring in the conjecture. Refute enumerates
the possible cardinalities for each atomic type. If a formula has a finite
counterexample, the tool eventually finds it, unless it runs out of resources.
Implementation
Refute, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle
itself, which adheres to the LCF small-kernel discipline, Refute does not
certify its results and must be trusted.
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
We expect that Refute will solve about 75% of the problems solved by Nitpick in
the TNT category, and perhaps a few problems that Nitpick cannot solve.
Satallax 2.1
Chad E. Brown
Saarland University, Germany
Architecture
Satallax [Bro11] is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat [ES04] is responsible for much of the search for a
proof.
Strategies
There are a number of flags that control the order in which formulas and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure.
A collection of flag settings is called a mode.
Approximately 250 modes have been tried so far.
Regardless of the mode, the search procedure is sound and complete for
higher-order logic with choice.
This implies that if search terminates with a particular mode, then we can
conclude that the original set of formulae is unsatisfiable or satisfiable.
Implementation
Satallax 2.1 is implemented in OCaml.
A foreign function interface is used to interact with MiniSat.
Satallax is available from
http://satallax.com
Expected Competition Performance
Satallax 1.4 proved to be competitive in the THF division of CASC last year,
coming in second out of four systems.
Since last year, Satallax has been reimplemented in OCaml (instead of Lisp)
and the integration with MiniSat has been improved.
In addition, a number of new heuristics and flags to control these heuristics
have been added.
Based on these improvements, Satallax is expected to perform well in the
THF division of CASC this year.
On the other hand, Satallax is typically weak on problems requiring equality
reasoning or nontrivial higher-order instantiations.
Since Satallax can sometimes be used to determine satisfiability of a set
of formulas, it will also compete in the TNT demonstration division.
SPASS+T 2.2.14
Uwe Waldmann1,
Stephan Zimmer2
1Max-Planck-Institut für Informatik, Germany,
2AbsInt GmbH, Germany
Architecture
SPASS+T is an extension of the superposition-based theorem prover SPASS
that integrates algebraic knowledge into SPASS in three complementary ways:
by passing derived formulas to an external SMT procedure
(currently Yices or CVC3), by adding standard axioms,
and by built-in arithmetic simplification and inference rules.
A first version of the system has been described in [PW06].
In the current version, a much more sophisticated coupling of
the SMT procedure has been added [Zim07].
Strategies
Standard axioms and built-in arithmetic simplification and inference rules
are integrated into the standard main loop of SPASS.
Inferences between standard axioms are excluded, so the
user-supplied formulas are taken as the set of support.
The external SMT procedure runs in parallel in a separate process,
leading occasionally to non-deterministic behaviour.
Implementation
SPASS+T is implemented in C. The system is available from
http://www.mpi-inf.mpg.de/~uwe/software/#TSPASS
Expected Competition Performance
SPASS+T came a close second in the TFA division of last CASC,
and it has improved noticeably since then.
SPASS-XDB 3.01X0.6
Geoff Sutcliffe1,
Martin Suda2,3
1University of Miami, USA,
2Max-Planck-Institut für Informatik and Saarland University, Germany,
3Charles University in Prague, Czech Republic
Architecture
SPASS-XDB [SS+09,SS+10] is an extended version of the well-known,
state-of-the-art, SPASS automated theorem proving system [WF+09].
The original SPASS reads a problem, consisting of axioms and a conjecture,
in TPTP format from a file, and searches for a proof by refutation of the
negated conjecture.
SPASS-XDB adds the capability of retrieving extra positive unit axioms
(facts) from external sources during the proof search (hence the "XDB",
standing for eXternal DataBases).
The axioms are retrieved asynchronously, on-demand, based on an expectation
that they will contribute to completing the proof.
The axioms are retrieved from a range of external sources, including SQL
databases, SPARQL endpoints, WWW services, computation sources (e.g.,
computer algebra systems), etc., using a TPTP standard protocol.
Strategies
Generally, SPASS-XDB follows SPASS' strategies.
However, SPASS, like most (all?) ATP systems, was designed under the
assumption that all formulae are in the problem file, i.e., it is ignorant
that external axioms might be delivered.
To regain completeness, constraints on SPASS’ search are relaxed in SPASS-XDB.
This increases the search space, so the constraints are relaxed
in a controlled, incremental fashion [SS+10].
The search space is also affected by the number of external axioms that can
be delivered, and mechanisms to control the delivery and focus the consequent
search are used [SS+10].
Implementation
SPASS-XDB, as an extension of SPASS, is written in C.
The internal arithmetic is done using the GMP multiple precision arithmetic
library.
Reals are converted to rationals for computation, but results are presented
in real format.
SPASS-XDB is available for use online in the SystemOnTPTP interface:
http://www.tptp.org/cgi-bin/SystemOnTPTP
Expected Competition Performance
SPASS-XDB should do well, particularly on problems that use uncommon
parts of the TPTP TFA syntax, e.g., the evaleq predicate.
TPS 3.110228S1a
Chad E. Brown1,
Peter B. Andrews2
1Saarland University, Germany,
2Carnegie Mellon University, USA
Architecture
TPS is a higher-order theorem proving system that has been developed over
several decades under the supervision of Peter B. Andrews with substantial
work by Eve Longini Cohen, Dale A. Miller, Frank Pfenning, Sunil Issar,
Carl Klapper, Dan Nesmith, Hongwei Xi, Matthew Bishop, Chad E. Brown,
Mark Kaminski, Rémy Chrétien, and Cris Perdue.
TPS can be used to prove theorems of Church's type theory automatically,
interactively, or semi-automatically [AB+96,AB06].
Strategies
Strategies used by TPS in the search process include:
Implementation
TPS has been developed as a research tool for developing, investigating, and
refining a variety of methods of searching for expansion proofs, and
variations of these methods.
Its behavior is controlled by hundreds of flags.
A set of flags, with values for them, is called a mode.
When searching for a proof during the competition, TPS tries each of 80
selected modes in turn for a specified amount of time.
If TPS succeeds in finding an expansion proof, it translates the expansion
proof to a natural deduction proof.
This final step ensures that TPS will not incorrectly report that a formula
has been proven.
http://gtps.math.cmu.edu/tps.html
Expected Competition Performance
TPS 3.080227G1d was the CASC-22 THF division winner.
The main difference between the older version and the newer version is that
more modes are included during the search and the final translation to
natural deduction has been enabled.
Vampire 1.8
Krystof Hoder, Andrei Voronkov
The University of Manchester, United Kingdom
Architecture
Vampire 1.8 is an automatic theorem prover for first-order
classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
The splitting rule in the kernel adds propositional parts to clauses, which are
manipulated using binary decision diagrams and a SAT solver.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Implementation
Vampire 1.8 is implemented in C++.
Strategies
The Vampire 1.8 kernel provides a fairly large number of options for
strategy selection. The most important ones are:
Expected Competition Performance
We expect Vampire 1.8 to slightly outperform last year's Vampire 0.6.
Waldmeister 710
Thomas Hillenbrand
Max-Planck-Institut für Informatik, Germany
Architecture
Waldmeister 710 [Hil03] is a system for unit equational deduction.
Its theoretical basis is unfailing completion in the sense of
[BDP89] with refinements towards ordered completion (cf. [AHL03]).
The system saturates the input axiomatization, distinguishing active facts,
which induce a rewrite relation, and passive facts, which are the one-step
conclusions of the active ones up to redundancy.
The saturation process is parameterized by a reduction ordering and a heuristic
assessment of passive facts [HJL99].
This year's version is the result of polishing and fixing a few things in
last year's.
Strategies
The approach taken to control the proof search is to choose the search
parameters – reduction ordering and heuristic assessment –
according to the algebraic structure given in the problem specification [HJL99].
This is based on the observation that proof tasks sharing major parts of
their axiomatization often behave similarly.
Implementation
The prover is coded in ANSI-C. It runs on Solaris, Linux,
MacOS X, and Windows/Cygwin. The central data structures are: perfect
discrimination trees for the active facts; group-wise compressions for
the passive ones; and sets of rewrite successors for the conjectures.
Visit the Waldmeister web pages at:
http://www.waldmeister.org
Expected Competition Performance
Waldmeister 710 is the CASC-J5 UEQ division winner.
Z3 2./20
Nikolaj Bjorner, Leonardo de Moura
Microsoft Research, USA
System description not supplied.