TFF, THF, BNF Distribution

Geoff Sutcliffe
University of Miami
geoff[@]cs.miami.edu

Introduction

The syntax for simply typed first order form, and typed higher order form, is now reasonably stable. The THF is now in use by a few people, including Chris Benzmueller and Viktor Kunak. No-one is using the TFF syntax, but if the SPASS people ever get off the DFG horse maybe they'll find it appropriate. Some further information about the simply typed first order form is available from the original proposal with Koen Claessen. There have been a few improvements to the THF syntax since its first release, and the latest version is given at the end of this document. A key change was motivated by Florian Rabe, to relax the syntax somewhat so that it is easier to use, but makes more nonsense syntactically legal.

The final goal is to add THF problems to the TPTP, as proposed by Chris Benzmueller and Chad Brown in this TPHOLs paper.

Is this all too Overwhelming?

The combined THF, TFF, FOF, and CNF syntax is quite large. In order to make it more managable I plan to split it into pieces:

A base syntax, which includes the FOF and CNF components, and common elements.
A TFF extension, which can be added to the base
A THF extension, which can be added to the base

Status wrt Semantics

While FOL semantics is simple and standard, in HOL there are alternative semantics, e.g., standard semantics, Henkin semantics, non-extensional semantics, etc. A full discussion is in this JSL paper by Chris Benzmueller, Chad Brown, and Michael Kohlhase. Different semantics lead to different (SZS) statuses for problems. The TPTP syntax (original and THF) does not provide any way of regording these alternatives. Generally, the THF needs ways of specifying the semantics information. This requires a simple extension of the header section conventions for TPTP problem files.

Question: Does a THF ATP system need to know the semantics wrt which the user wants an answer? If so, this information will have to be specified, but it seems most likely to be a command line option for the system rather than a statement in the problem.

Libraries of Axioms

Just like for the FOF and CNF cases, there is a need for someone to develop librarues of THF axioms that can be used in THF problems. Viktor Kunak has sugggested that the most important things are:

sets and relations
transitive closure
inductive definitions
integers
algebraic data types, sequences

Looking at e.g. Isabelle or HOL libraries, would be a good starting point.

%----v3.2.0.10 (TPTP version.internal development number)
%------------------------------------------------------------------------------
%----README ... this header provides important meta- and usage information
%----
%----Intended uses of the various parts of the TPTP syntax are explained
%----in the TPTP technical manual, linked from www.tptp.org.
%----
%----Four kinds of separators are used, to indicate different types of rules:
%----  ::= is used for regular grammar rules, for syntactic parsing.
%----  :== is used for semantic grammar rules. These define specific values
%----      that make semantic sense when more general syntactic rules apply.
%----  ::- is used for rules that produce tokens.
%----  ::: is used for rules that define character classes used in the
%----       construction of tokens.
%----
%----White space may occur between any two tokens. White space is not specified
%----in the grammar, but there are some restrictions to ensure that the grammar
%----is compatible with standard Prolog: a  should be readable with
%----read/1.
%----
%----The syntax of comments is defined by the  rule. Comments may
%----occur between any two tokens, but do not act as white space. Comments
%----will normally be discarded at the lexical level, but may be processed
%----by systems that understand them (e.g., if the system comment convention
%----is followed).
%------------------------------------------------------------------------------
%----Files. Empty file is OK.
          ::= *
         ::=  | 

%----Formula records
  ::=  |  |  |
                         
%----Future languages may include ...  english | efof | tfof | mathml | ...
      ::= thf(,,).
      ::= tff(,,).
      ::= fof(,,).
      ::= cnf(,,).
        ::=  | ,
%----In derivations the annotated formulae names must be unique, so that
%----parent references (see ) are unambiguous.

%----Types for problems.
%----Note: The previous  from ...
%----    ::= -
%----... is now gone. Parsers may choose to be tolerant of it for backwards
%----compatibility.
       ::= 
       :== axiom | hypothesis | definition | lemma | theorem |
                         conjecture | negated_conjecture | plain | fi_domain |
                         fi_functors | fi_predicates | type | unknown
%----"axiom"s are accepted, without proof, as a basis for proving "conjecture"s
%----in FOF problems. In CNF problems "axiom"s are accepted as part of the set
%----whose satisfiability has to be established. There is no guarantee that the
%----axioms of a problem are consistent.
%----"hypothesis"s are assumed to be true for a particular problem, and are
%----used like "axiom"s.
%----"definition"s are used to define symbols, and are used like "axiom"s.
%----See  for details of thf usage.
%----"lemma"s and "theorem"s have been proven from the "axiom"s, can be used
%----like "axiom"s, but are redundant wrt the "axiom"s. A problem containing
%----a non-redundant "lemma" or theorem" is ill-formed. "theorem"s are more
%----important than "lemma"s from the user perspective.
%----"conjecture"s occur in only FOF problems, and all are to be proven from
%----the "axiom"(-like) formulae. A problem is solved only when all
%----"conjecture"s are proven.
%----"negated_conjecture"s occur in only CNF problems, and are formed from
%----negation of a "conjecture" in a FOF to CNF conversion.
%----"plain"s have no special user semantics, and can be used like "axiom"s.
%----"fi_domain", "fi_functors", and "fi_predicates" are used to record the
%----domain, interpretation of functors, and interpretation of predicates, for
%----a finite interpretation.
%----"type" defines the type globally for one symbol; treat as $true.
%----"unknown"s have unknown role, and this is an error situation.
%------------------------------------------------------------------------------
%----THF formulae. All formulae must be closed.
        ::=  | 
  ::=  | 
 ::=  |  |
                         
%----Only some binary connectives can be written without ()s.
%----There's no precedence among binary connectives
    ::=  
                         
 ::=  | =
%----Associative connectives & and | are in .
   ::=  |  |
                         
     ::=    |
                           
    ::=  &  |
                          & 
  ::=  @  |
                          @ 
 ::=  | @
%---- are in ()s or do not have a 
%----at the top level. Essentially, a  is any
%----lambda expression that "has enough parentheses" to be used inside
%----a larger lambda expression. However, lambda notation might not be used.
 ::=  |  |
                          |  | () |
                          | ()
 ::=  [] :
                         
     ::= ! | ?
%----@ (denoting apply) is left-associative and lambda is right-associative.
%----$lambda [X] : $lambda [Y] : f @ g (where f is a 
%----and g is a ) should be parsed as:
%----($lambda [X] : ($lambda [Y] : f)) @ g.
%----That is, g is not in the scope of either $lambda.
    ::=  [] :
                         
         ::= $lambda | ^
  ::=  | ,
       ::=  | 
 ::=  : 

%----Unary connectives bind more tightly than binary
  ::=  
 ::= 

%----THF atoms
           ::=  | 
     ::=  : 
   ::=  | () | 
   ::= 
   :==  | 
%----If the entire formula is a , it should be treated as an
%----assertion that the atom is in this type, globally. If a thf_typed_atom>
%----appears inside a larger expression, it is presently unspecified what the
%----scope of this type declaration is. Users producing such formulas should
%----be sure to clarify their intent; ad-hoc polymorphism and subtyping are
%----two of several possibilities.

%----thf_tuple is really the "pair" function of lambda calculus, so it
%----should be right-associative.  It is also "cons" in Lisp.
%----So "(U, V)" is "syntactic sugar" for "lambda [X]: (X @ U @ V)".
%----Some useful functions to work with tuples might be:
%----"first := $lambda [E]: (E @ $lambda [F, R]: F)" and
%----"rest  := $lambda [E]: (E @ $lambda [F, R]: R)".
%----Then "(first @ (U, V)) = U" and "(rest @ (U, V)) = V" are valid.
%---- is not a  because it would confuse Prolog
%----if it appeared at top level without parentheses.
          ::= , |
                         ,

%----If "name := expr" is the entire formula, it has global scope, i.e., the
%----entire input. If "name := expr" is in the context of a $letrec, e.g.,
%----"$letrec [name1 := expr1, name2 := expr2, ...] : formula" (where the
%----symbols "name1", "name2", ... normally appear in "formula", and might
%----appear in "expr1", "expr2", ...), each occurrence of namei" should be
%----replaced by "expri" within "formula", but not elsewhere, i.e., "formula"
%----is the scope of the definition. For "cascading" definitions use
%----"$letrec [a := b]: $letrec [c := d]: e". As a formula a thf_definition>
%----evaluates to true.
         ::=  [] : 
      ::=  |
                         ,
     ::=  :=  |
                         ()
             ::= $letrec | :=

%---- appears after ":", where a type is being specified
%----for a constant or variable (or other expression, in the future), or
%----within a larger . However, this is just for readability,
%----and  expands simply to thf_unitary_formula>, so the
%----syntax allows just about any lambda expression with "enough" parentheses
%----to serve as a type. The expected use of this flexibility is parametric
%----polymorphism in types, expressed with lambda abstraction:
%----  list := ^ [T]: ((T * (list @ T)) + (emptylist @ T))
%----  ! [L : (list @ int)]: $let [sum := ^ [L1]: ...] : .
%----If a  that looks like a type appears in the function
%----position of an apply formula, it is considered to be a function into bool
%----that is true exactly when its parameter is of the type denoted by the
%----, e.g., ((nat -> nat) @ fibonacci) might evaluate to
%----$true.
%---- is right-associative: o->o->o means o->(o->o).
%---- is left-associative: o * o * o means (o * o) * o.
%---- is left-associative: o + o + o means (o + o) + o.
   ::= 
    ::=  |  |
                         
   ::=    |
                           
     ::=    |
                           
     ::=    |
                           
          ::= -> | >
%------------------------------------------------------------------------------
%----TFF formulae. All formulae must be closed.
%----TFF is like FOF except that predicate and function symbols must be typed
%----exactly once at top level (with formula role "type"), and variables must
%----be typed when they are bound in quantifier lists. atomic_formula>s are
%----just like FOF. See the porposal linked from the TPTP web page for details
%----of the semantics.
        ::=  | 
  ::=  | 
 ::=  | 
 ::=  
                         
   ::=  | 
     ::=   
                         *
 ::=  
    ::=  & 
                         *
 ::= & 
 ::=  |  |
                          | ()
 ::=  [] :
                         
  ::=  | ,
       ::=  : 
  ::=  

%---- can appear only at top level, unlike in THF.
     ::=  :  |
                         ()
   ::=  | 
%----The types of the s and s are:
%----     : $real
%----     : $int
%----     : $int
%----     : $distinct_object
%----    $true : $oType
%----    $false : $oType

           ::=  | 
   ::=  |  |
                         () |  ()
%----For Josef Urban, change  to , and leave
%----out  (to avoid conflict on . Also,
%---- can be ()ed, so leave out (), but
%----that means  doesn't have (()) option.
   ::=    |
                         ()
     ::=    |
                           
%------------------------------------------------------------------------------
%----FOF formulae. All formulae must be closed.
        ::=  | 
     ::=  | 
%----Only some binary connectives are associative
%----There's no precedence among binary connectives
    ::=   
  ::= <=> | => | <= | <~> | ~ | ~&
%----Associative connectives & and | are in 
       ::=  | 
         ::=   
                         *
    ::=  
        ::=  & 
                         *
   ::= & 
   ::=  | &
%---- are in ()s or do not have a  at the
%----top level.
    ::=  |  |
                          | ()
 ::=  [] : 
         ::= ! | ?
%----! is universal quantification and ? is existential. Syntactically, the
%----quantification is the left operand of :, and the  is
%----the right operand. Although : is a binary operator syntactically, it is
%----not a , and thus a  is a
%----.
%----Universal   example: ! [X,Y] : ((p(X) & p(Y)) => q(X,Y)).
%----Existential example: ? [X,Y] : (p(X) & p(Y)) & ~ q(X,Y).
%----Quantifiers have higher precedence than binary connectives, so in
%----the existential example the quantifier applies to only (p(X) & p(Y)).
      ::=  | ,
%----Future variables may have existential counting
%----Unary connectives bind more tightly than binary
      ::=  
   ::= ~
%------------------------------------------------------------------------------
%----CNF formulae (variables implicitly universally quantified)
        ::= () | 
        ::=  *
   ::=  
            ::=  | ~ 
%------------------------------------------------------------------------------
%----Special higher order terms
     ::=  |  |
                          |  |
                         

%----First order atoms
     ::=  |  |
                         
 ::= 
%----A  looks like a , but really we mean
%----   ::=  | ()
%----          ::= 
%----            ::= 
%----Using  removes a reduce/reduce ambiguity in lex/yacc.
%----Note: "defined" means a word starting with one $ and "system" means $$.
          ::=  | ,
 ::=  | () |
                           
 ::= = | !=
%----Some systems still interpret equal/2 as equality. The use of equal/2
%----for other purposes is therefore discouraged. Please refrain from either
%----use. Use infix '=' for equality. Note:  !=  is equivalent
%----to ~  = 
       ::= 
       :== $true | $false
       ::= 
       :==
       ::= 
       :== $oType | $o | $iType | $i | $tType |
                         $real | $int | $distinct_object
%----$oType/$o is the Boolean type, i.e., the type of $true and $false.
%----$iType/$i is type of individuals. tType is the type of all types.
%----$real is the type of s. $int is the type of s and
%----s. $distinct_object is the type of <$distinct_object>s.

 ::= 
%----s are used for evaluable predicates that are
%----available in particular tools. The predicate names are not controlled
%----by the TPTP syntax, so use with due care. The same is true for
%----s.

%----First order terms
               ::=  | 
      ::=  |  | 
         ::=  | ()
           ::= 
            ::= 
       ::= 
%---- is split off so it can also be used by THF
       ::=  | 
%----This is a more general formulation, not working yet
%        ::=  | 
% ::=  | () |
%                            
%    ::= 
%    :==
%     ::= 
%     :==
%  ::=
%----System terms have system specific interpretations
        ::=  | ()
    ::= 
     ::= 
           ::= 

%----Formula sources
             ::= 
             :==  |  |  |
                         unknown
%----Only a  can be a , i.e., derived formulae can be
%----identified by a  or an 
         :==  | 
   :== inference(,,
                             [])
     :== 
%----Examples are        deduction | modus_tollens | modus_ponens | rewrite |
%                        resolution | paramodulation | factorization |
%                        cnf_conversion | cnf_refutation | ...
        :==  | ,
        :== 
     :== : | 
    :== introduced()
         :== definition | axiom_of_choice | tautology
%----This should be used to record the symbol being defined, or the function
%----for the axiom of choice
    :==  |  | 
        :== file()
          :== , | 
             :== theory()
        :== equality | ac
%----More theory names may be added in the future. The  is
%----used to store, e.g., which axioms of equality have been implicitly used,
%----e.g., theory(equality,[rst]). Standard format still to be decided.
     :== creator()
       :== 

%----Useful info fields
      ::= , | 
        ::= 
        :== [] | []
         :==  | ,
          :==  |  | 
%----Useful info for formula records
       :==  | 
   :== description()
        :== iquote()
%----s are used for recording exactly what the system output about
%----the inference step. In the future it is planned to encode this information
%----in standardized forms as  in each .
%----Useful info for inference records
     :==  |  | 
   :== status() | 
%----These are the status values from the SZS ontology
       :== tau | tac | eqv | thm | sat | cax | noc | csa | cth |
                         ceq | unc | uns | sab | sam | sar | sap | csp | csr |
                         csm | csb
%----The most commonly used status values are:
%----  thm - Every model (and there are some) of the parent formulae is a
%----        model of the inferred formula. Regular logical consequences.
%----  cth - Every model (and there are some) of the parent formulae is a
%----        model of the negation of the inferred formula. Used for negation
%----        of conjectures in FOF to CNF conversion.
%----  sab - There is a bijection between the models (and there are some) of
%----        the parent formulae and models of the inferred formula. Used for
%----        Skolemization steps.
%----For the full hierarchy see the SZSOntology file distributed with the TPTP.
  :== assumptions([])
     :== (,)
         :== refutation()
%----Useful info for creators is just 

%----Include directives
            ::= include().
  ::= ,[] | 
          ::=  | ,

%----Non-logical data
       ::=  | : |
                         
       ::=  | () |
                          | 
  ::=  | ,
       ::= [] | []
  ::=  | ,

%----General purpose
               ::=  | 
        ::=  | 
 ::= 
 ::= 
             ::=  |  | 
%----Numbers are always interpreted as themselves, and are thus implicitly
%----distinct if they have different values, e.g., 1 != 2 is an implicit axiom.
%----All numbers are base 10 at the moment.
          ::= 
               ::=
%------------------------------------------------------------------------------
%----Rules from here on down are for defining tokens (terminal symbols) of the
%----grammar, assuming they will be recognized by a lexical scanner.
%----A ::- rule defines a token, a ::: rule defines a macro that is not a
%----token. Usual regexp notation is used. Single characters are always placed
%----in []s to disable any special meanings (for uniformity this is done to
%----all characters, not only those with special meanings).

%----These are tokens that appear in the syntax rules above. No rules
%----defined here because they appear explicitly in the syntax rules,
%----except that , ,  denote "|", "*", "+", respectively.
%----Keywords:    fof cnf thf tff include
%----Punctuation: ( ) , . [ ] :
%----Operators:   ! ? ~ & | <=> => <= <~> ~| ~& * +
%----Predicates:  = != $true $false

%----For lex/yacc there cannot be spaces on either side of the | here
            ::: |
       ::: [%]*
      ::: [/][*][*][*]*[/]
     ::: ([^*]*[*][*]*[^/*])*[^*]*
%----Defined comments are a convention used for annotations that are used as
%----additional input for systems. They look like comments, but start with %$
%----or /*$. A wily user of the syntax can notice the $ and extract information
%----from the "comment" and pass that on as input to the system. They are
%----analogous to pragmas in programming languages. To extract these separately
%----from regular comments, the rules are:
%----      ::- |
%----     ::: [%]*
%----    ::: [/][*][*][*]*[/]
%----A string that matches both  and  should be
%----recognized as , so put these before .
%----Defined comments that are in use include:
%----    TO BE ANNOUNCED
%----System comments are a convention used for annotations that may used as
%----additional input to a specific system. They look like comments, but start
%----with %$$ or /*$$. A wily user of the syntax can notice the $$ and extract
%----information from the "comment" and pass that on as input to the system.
%----The specific system for which the information is intended should be
%----identified after the $$, e.g., /*$$Otter 3.3: Demodulator */
%----To extract these separately from regular comments, the rules are:
%----       ::- |
%----     ::: [%]*
%----    ::: [/][*][*][*]*[/]
%----A string that matches both  and  should
%----be recognized as , so put these before .

      ::- [']([^\\']|[\\][']|[\\][\\])*[']
%----      ::- '*', but ' and \ are escaped.
%----\ is used as the escape character for ' and \, i.e., if \' is  encountered
%----the ' is not the end of the , and if \\ is encountered the
%----second \ is not an escape. Both characters (the escape \ and the following
%----' or \) are retained and printed on output. Behaviour is undefined if the
%----escape \ is followed by anything other than ' or \. Behaviour is undefined
%----if a non- is encountered. If the contents of a  constitute a , then the ''s should be stripped to
%----produce a .

    ::- ["]([^\\"]|[\\]["]|[\\][\\])*["]
%----    ::- "*", but " and \ are escaped. The
%----comments for  apply, with ' replaced by ".
%----Distinct objects are always interpreted as themselves, and are thus
%----implicitly distinct if they look different, e.g., "Apple" != "Microsoft"
%----is an implicit axiom.

        ::- 
 ::- 
         ::- *
         ::- *
              ::- [|]
               ::- [*]
               ::- [+]

%----Numbers
               ::- (|)
     ::- 
   ::- 
%----s and s have to be different tokens
%----because s are allowed as annotated formula names.
   ::: (|)
   ::: 
            ::: (|)
     ::: 
    ::: ([0]|*)
        ::: [.]*
               ::: [+-]
           ::: [Ee]

%----Character classes
            ::: [0-9]
   ::: [1-9]
        ::: [a-z]
        ::: [A-Z]
      ::: (|||[_])
             ::: [$]
     ::: .
%---- is any printable ASCII character, codes 32 (space) to 126
%----(tilde). printable_char> does not include tabs, newlines, bells, etc. The
%----use of . does not not exclude tab, so this is a bit loose.
      ::: [.\n]
%------------------------------------------------------------------------------