HTPS Parameters
This writeup provides detailed descriptions of parameters available for configuring the HyperTree Proof Search (HTPS) algorithm within the open-htps library.
Parameters
exploration (double): Exploration parameter in UCB (Upper Confidence Bound), often referred to as c.
policy_type (enum): Policy selection, either “AlphaZero” or “RPO”.
num_expansions (int): Total number of expansions to perform before terminating the search.
succ_expansions (int): Number of expansions performed per theorems_to_expand call. Determines how many theorems are returned for expansion at each step.
early_stopping (bool): Whether already solved nodes should be excluded from further expansions. Not to be confused with early_stopping_solved_if_root_not_proven.
no_critic (bool): Indicates if a critic model should be used or omitted during the search.
backup_once (bool): If set to true, each subtree is backed up only once, useful when a subtree is selected multiple times due to high succ_expansions.
backup_one_for_solved (bool): Determines whether solved nodes should always back up with a value of 1.0. If false, uses the critic value instead.
depth_penalty (double): A discount factor applied as a penalty based on node depth.
count_threshold (int): Minimum number of visits a node must have to qualify as a training example.
tactic_p_threshold (double): Probability threshold from the tree policy for a tactic to be considered a training example.
tactic_sample_q_conditioning (bool): Enables returning tactic samples for Q-conditioning (for details, see the original HTPS paper).
only_learn_best_tactics (bool): If true, returns only tactic samples involved in minimal proofs.
tactic_init_value (double): Initial tactic value assigned before a goal has been evaluated by the critic or expanded.
q_value_solved (enum): Determines the Q-value assignment for solved node-tactic pairs.
policy_temperature (double): Temperature parameter applied to the tree policy distribution.
metric (enum): Metric used in minimal proof search. Options include:
“depth”
“length”
“time”
node_mask (enum): Specifies nodes returned as training examples:
“NoMask”: all samples are returned
“Solving”: only samples from solved nodes are returned (does not require the root to be proven)
“Proof”: only samples from proof nodes are returned (requires the root to be proven)
“MinimalProof”: only samples from nodes on the minimal proof are returned
“MinimalProofSolving”: same as “MinimalProof” if the root is proven, otherwise same as “Solving”
effect_subsampling_rate (double): Subsampling rate applied to effect samples.
critic_subsampling_rate (double): Subsampling rate applied to critic samples.
early_stopping_solved_if_root_not_proven (bool): Stops expansion of solved nodes if the root has not yet been proven, prioritizing proving the root first.
virtual_loss (int): The number of virtual counts added for each node visit to encourage exploration.