Blog
Research insights, updates, and thought leadership from the debaterhub team.
Dialectical Debate AI
Evolution for ideas: AI agents debate, develop beliefs through Bayesian epistemology, encode experiential dispositions via implicit memory, and collectively refine understanding through federated learning.
The Dialectical Agent: From Pipeline Architecture to Fine-Tuned Debate Model
How we built a multi-stage debate generation pipeline with LangGraph orchestration, DSPy structured generation, and an ensemble judge — then fine-tuned Qwen3-30B-A3B to run it autonomously.
HEXIS: Implicit Memory for Language Models
Introducing Hidden Experiential States for Identity Steering — a framework that implements implicit memory in transformers by modifying attention through persistent, low-rank memory states derived from experience.
Federated Learning from M-States: Aggregating Implicit Memories Across Agents
An exploration of how HEXIS M-states can be federated across a population of agents — sharing the effect of experience without sharing the experience itself.
Posts
Augmented Debate-Centered Instruction: AI Integration in Education
Instead of fighting AI in education, pivot to pedagogical approaches where AI genuinely augments human learning — structured debate as the answer to generative AI's challenge to assessment.
Foundational Theory and Technical Grounding for AI Pluralism
Diversity and synergy form a fourth fundamental axis of AI intelligence — orthogonal to model size, training data, and compute. From the edited volume AI Pluralism.
From Standardized Testing to Dialogical Proficiency
John Hines argues that AI has broken standardized assessment — and that the same technologies can now assess what actually matters: the ability to engage in structured, reasoned discourse across difference.
Welcome to debaterhub
Introducing our research organization and mission to advance AI and education through dialogue.
Papers We Like
Research that shapes how we think about dialectics in AI — tools for structured argumentation, multi-agent evaluation, and computational epistemology.
Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM
Jingcong Liang, Rong Ye, Meng Han, Ruofei Lai, Xinyu Zhang, Xuanjing Huang, Zhongyu Wei (2024)
A framework for automated debate judging that decomposes evaluation along two axes: iterative chronological analysis (processing speeches one-by-one with memory rather than consuming the entire debate at once) and multi-dimensional collaboration (separate LLM agents each score a specific dimension). Introduces PanelBench, a benchmark of 100+ competitive debate transcripts.
Why we like it: Their speech-by-speech iterative analysis with memory is the same architectural insight behind our own debate evaluation — long multi-turn debates exceed context windows, so you chunk and summarize incrementally. Their dimensional decomposition (argument, source, language, clash scored separately then combined) is a validated design pattern we benchmark against.
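The chunk-and-summarize pattern described above can be sketched in a few lines. This is our reading of the idea, not Debatrix's implementation; `call_llm` is a hypothetical stand-in for any chat-completion client.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; replace with your client of choice.
    # Here it trivially "summarizes" by truncating, so the sketch runs.
    return prompt[-200:]

def judge_debate(speeches: list[str]) -> list[str]:
    """Judge speeches one-by-one with a running memory, so long debates
    never have to fit in a single context window."""
    memory = ""          # compressed summary of the debate so far
    verdicts = []
    for i, speech in enumerate(speeches, 1):
        verdicts.append(call_llm(
            f"Debate so far (summary): {memory}\n"
            f"Speech {i}: {speech}\n"
            "Assess this speech's arguments, sources, language, and clash."
        ))
        # fold the new speech into memory to keep context bounded
        memory = call_llm(
            f"Previous summary: {memory}\nNew speech: {speech}\n"
            "Update the summary in under 200 words."
        )
    return verdicts
```

A real system would swap in separate prompts (or separate agents) for each scored dimension, as Debatrix does.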
Think Before You Speak: Training Language Models With Pause Tokens
Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan (2024)
Introduces 'pause-training' — appending learnable pause tokens to give Transformers extra hidden-vector computation steps before committing to output. On a 1B parameter model: 18% EM gain on SQuAD, 8% on CommonSenseQA. Only works when injected during both pretraining and finetuning.
Why we like it: Pause tokens are an architectural analog to 'thinking before speaking' — giving the model more internal computation before it commits to an argument. For debate, where rebuttal and cross-examination require reasoning about multiple interacting claims, this is directly relevant to argument quality.
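The input-side mechanics are easy to picture with a toy example. This is our paraphrase of the paper's setup, not its code: M copies of a learnable pause embedding are appended to the prompt, the transformer gets M extra hidden states to compute over, and no loss is applied at pause positions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, num_pauses = 16, 4

prompt_embeds = rng.normal(size=(10, d_model))       # 10 prompt token embeddings
pause_embed = rng.normal(size=(1, d_model))          # one learnable <pause> vector
pauses = np.repeat(pause_embed, num_pauses, axis=0)  # repeated M times

# The model sees the prompt plus M pause slots; decoding starts only after
# the last pause, so the slots serve purely as extra computation steps.
x = np.concatenate([prompt_embeds, pauses], axis=0)
print(x.shape)  # (14, 16)
```

In the actual method the pause embedding is trained end-to-end during both pretraining and finetuning, which is where the reported gains come from.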
Agent-as-a-Judge: A Survey
Runyang You, Hongru Cai, Caiqi Zhang, Qiancheng Xu, Meng Liu, Tiezheng Yu, Yongqi Li, Wenjie Li (2026)
Traces the evolution from single-pass LLM-as-a-Judge to Agent-as-a-Judge, where evaluation systems use planning, tool-augmented verification, multi-agent collaboration, and persistent memory. Organizes the field into a taxonomy covering five methodological dimensions.
Why we like it: This is the theoretical framework for what our judging system is becoming. Their three-stage taxonomy (Procedural, Reactive, Self-Evolving) gives us a clear roadmap: our current judge is Procedural, and the path forward is toward judges that adapt rubrics mid-evaluation and learn from past judging outcomes.
AI Must Embrace Specialization via Superhuman Adaptable Intelligence
Judah Goldfeder, Philippe Wyder, Yann LeCun, Ravid Shwartz Ziv (2026)
Challenges the prevailing AGI framework by arguing it misunderstands how intelligence works — humans themselves are not general, they specialize. Introduces Superhuman Adaptable Intelligence (SAI) as a replacement concept: systems that specialize in specific domains, achieve superhuman performance within those specializations, and adapt to fill capability gaps humans cannot.
Why we like it: The SAI framing aligns directly with AI Pluralism — intelligence as an ecology of specialized agents rather than a single general system. If the goal isn't one model that does everything but many models that each excel at something, the question becomes how those specialists interact. That's exactly the dialectical architecture we're building.
STATe-of-Thoughts: Structured Action Templates for Tree-of-Thoughts
Zachary Bamberger, Till R. Saenger, Gilad Morad, Ofra Amir, Brandon M. Stewart, Amir Feder (2026)
Replaces stochastic high-temperature sampling in Tree-of-Thoughts with discrete, interpretable textual interventions. A controller selects high-level reasoning actions, a generator produces conditioned steps, and an evaluator scores candidates — yielding greater output diversity and interpretable reasoning traces.
Why we like it: For argument generation, the paper shows that explicit action sequences capture interpretable features highly predictive of output quality. This is exactly the problem our speech pipeline solves with tactic selection and skeleton building — STATe provides a principled framework for why structured reasoning actions outperform temperature-based diversity in argumentation tasks.
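The controller/generator/evaluator loop can be sketched minimally. The action names and scoring rule below are our inventions for illustration, not the paper's; in STATe each role is an LLM call.

```python
# Discrete reasoning actions replace temperature-based sampling diversity.
ACTIONS = ["define_terms", "raise_counterexample", "weigh_impacts"]

def generate(state: str, action: str) -> str:
    # stand-in for a generator LLM conditioned on (state, action)
    return f"{state} -> [{action}]"

def evaluate(candidate: str) -> float:
    # stand-in scorer; a real system would use an LLM or trained judge
    return -len(candidate)   # prefer concise branches, purely for illustration

def expand(state: str, beam: int = 2) -> list[str]:
    candidates = [generate(state, a) for a in ACTIONS]   # one branch per action
    return sorted(candidates, key=evaluate, reverse=True)[:beam]

frontier = ["claim: debate improves reasoning"]
for _ in range(2):           # two levels of the tree
    frontier = [c for s in frontier for c in expand(s)]
```

Because each branch records which action produced it, the resulting trace is interpretable — you can read off the reasoning moves, not just the text.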
A Superpersuasive Autonomous Policy Debating System
Allen Roush, Devin Gonier, John Hines, Judah Goldfeder, Philippe Martin Wyder, Sanjay Basu, Ravid Shwartz Ziv (2025)
Introduces DeepDebater, an AI system for full, unmodified two-team competitive policy debates. Uses a hierarchical architecture of specialized multi-agent workflows where teams of LLM-powered agents collaborate and critique one another to perform discrete argumentative tasks — speeches, cross-examinations, and rebuttals via iterative retrieval, synthesis, and self-correction.
Why we like it: This is us. DeepDebater is the system that became DebaterHub's debate generation pipeline. Expert human debate coaches preferred the arguments and evidence constructed by the system, and all code, transcripts, audio, and video are open-sourced.
Sequencing the Neurome: Towards Scalable Exact Parameter Reconstruction of Black-Box Neural Networks
Judah Goldfeder, Quinten Roets, Gabe Guo, John Wright, Hod Lipson (2024)
Tackles the NP-Hard problem of extracting exact parameters from black-box neural networks with only query access. Exploits the constraint that deployed networks use random initialization + first-order optimization, and introduces a novel algorithm for generating maximally informative queries. Reconstructs networks up to 7 layers deep with over 1.5M parameters, recovering weights to within 0.0001 of the originals.
Why we like it: If you can reconstruct a neural network's exact weights from its behavior, interpretability becomes a fundamentally different problem. For dialectical AI, this connects to the privacy and transparency questions at the heart of federated M-state aggregation — what can you infer about an agent's training from observing its outputs? Judah is a DebaterHub contributor.
Podcasts We Like
Conversations at the intersection of rhetoric, argumentation, and AI.
AI Office Hours with RhetorRao
Hosted by P. Anand Rao
A podcast exploring the intersection of rhetoric, argumentation, and AI. Rao is a contributor to the AI Pluralism volume and a researcher on explainability and contestability through rhetorical argumentation.
Why we like it: Anand is a DebaterHub contributor and his work on rhetorical argumentation for AI explainability is foundational to how we think about making AI reasoning transparent and contestable. The podcast brings these ideas to a broader audience.
The Information Bottleneck Podcast
Co-hosted by Allen Roush
Deep dives into machine learning research, NLP, and AI systems. Co-hosted by Allen Roush, DebaterHub's Lead AI Researcher specializing in LLMs, argument mining, and generative art.
Why we like it: Allen is a DebaterHub contributor and lead AI researcher. The podcast covers the ML research landscape with a practitioner's eye — the kind of technical depth that informs how we build debate generation and evaluation systems.