
DPO-MCTS-ToT-Training

Public

This module implements a post-training mechanism in which a language model explores multiple reasoning branches (chains of thought) within a Monte Carlo Tree Search (MCTS) framework. Each branch's candidate answer is scored by its cosine similarity to a known correct answer, and the highest-scoring branch is selected.
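The search-and-score loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the `propose` function is a hypothetical stand-in for the language model that generates reasoning continuations, the bag-of-words `cosine_similarity` stands in for whatever embedding-based evaluator the project uses, and names such as `Node`, `mcts`, and `VOCAB` are invented for this example.

```python
import math
import random
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[token] for token, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


class Node:
    def __init__(self, text, parent=None):
        self.text = text        # reasoning text accumulated along this branch
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0        # sum of rewards backed up through this node

    def ucb1(self, c=1.4):
        # Unvisited children are tried first; otherwise balance mean reward and exploration.
        if self.visits == 0:
            return float("inf")
        exploit = self.value / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts(propose, reference, iterations=200, max_depth=4, seed=0):
    """Return the (text, score) of the best-scoring branch found."""
    rng = random.Random(seed)
    root = Node("")
    best_text, best_score = "", 0.0
    for _ in range(iterations):
        # Selection: follow the highest-UCB1 child down to a leaf.
        node, depth = root, 0
        while node.children:
            node = max(node.children, key=Node.ucb1)
            depth += 1
        # Expansion: grow the root, or a leaf that has already been evaluated once.
        if depth < max_depth and (node is root or node.visits > 0):
            node.children = [Node(t, parent=node) for t in propose(node.text)]
            if node.children:
                node = rng.choice(node.children)
        # Evaluation: score the candidate against the known correct answer.
        reward = cosine_similarity(node.text, reference)
        if reward > best_score:
            best_text, best_score = node.text, reward
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return best_text, best_score


# Hypothetical toy "model": each step appends one word from a tiny vocabulary.
VOCAB = ["the", "answer", "is", "42", "maybe", "banana"]


def propose(text):
    return [(text + " " + w).strip() for w in VOCAB]


best_text, best_score = mcts(propose, reference="the answer is 42")
print(best_text, round(best_score, 3))
```

In a real setup, `propose` would call the language model to extend a chain of thought, and the evaluator would compare embeddings rather than word counts; the selection, expansion, evaluation, and backpropagation steps would keep the same shape.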

Created: 2025-02-12T02:04:58
Updated: 2025-03-07T19:37:30
https://swarms.ai
Stars: 6
Stars increase: 0
