Blog

Research insights, updates, and thought leadership from the debaterhub team.

Papers We Like

Research that shapes how we think about dialectics in AI — tools for structured argumentation, multi-agent evaluation, and computational epistemology.

Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM

Jingcong Liang, Rong Ye, Meng Han, Ruofei Lai, Xinyu Zhang, Xuanjing Huang, Zhongyu Wei (2024)

A framework for automated debate judging that decomposes evaluation along two axes: iterative chronological analysis (processing speeches one by one with memory rather than consuming the entire debate at once) and multi-dimensional collaboration (separate LLM agents each score a specific dimension). Introduces PanelBench, a benchmark of 100+ competitive debate transcripts.

Why we like it: Their speech-by-speech iterative analysis with memory is the same architectural insight behind our own debate evaluation — long multi-turn debates exceed context windows, so you chunk and summarize incrementally. Their dimensional decomposition (argument, source, language, clash scored separately then combined) is a validated design pattern we benchmark against.
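The chunk-and-summarize judging loop can be sketched in a few lines. Everything below is illustrative, not Debatrix's code: the class and function names are made up, and the LLM calls for summarization and per-dimension scoring are replaced with trivial stand-ins.

```python
from dataclasses import dataclass

@dataclass
class DebateMemory:
    """Rolling summary carried across speeches, so a long debate
    never has to fit in one context window."""
    summary: str = ""

    def update(self, speech: str) -> None:
        # Placeholder for an LLM summarization call; here we just
        # keep a truncated running transcript.
        self.summary = (self.summary + " " + speech)[-500:]

DIMENSIONS = ["argument", "source", "language", "clash"]

def score_dimension(dimension: str, speech: str, memory: DebateMemory) -> float:
    # Stand-in for a per-dimension LLM judge; a real score would come
    # from a model conditioned on the memory and the current speech.
    return len(speech) / 100.0

def judge_debate(speeches: list[str]) -> dict[str, float]:
    memory = DebateMemory()
    totals = {d: 0.0 for d in DIMENSIONS}
    for speech in speeches:          # iterative chronological pass
        for d in DIMENSIONS:         # one specialist judge per dimension
            totals[d] += score_dimension(d, speech, memory)
        memory.update(speech)        # summarize before moving on
    return totals
```

The key design point is that each speech is scored against the memory of what came before, not against the raw full transcript, and the dimensional scores stay separate until the very end.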

Think Before You Speak: Training Language Models With Pause Tokens

Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan (2024)

Introduces 'pause-training' — appending learnable pause tokens to give Transformers extra hidden-vector computation steps before committing to output. On a 1B parameter model: 18% EM gain on SQuAD, 8% on CommonSenseQA. Only works when injected during both pretraining and finetuning.

Why we like it: Pause tokens are a computational analog to 'thinking before speaking' — giving the model more internal computation before committing to an argument. For debate, where rebuttal and cross-examination require reasoning about multiple interacting claims, this is directly relevant to argument quality.
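As a rough illustration of the mechanism (not the paper's training code): appending pause tokens just pads the input sequence, and the answer is read only from positions after the final pause, so the model gets extra forward-pass steps before it must commit. The token name and helper functions below are our own invention.

```python
PAUSE_TOKEN = "<pause>"

def with_pauses(input_tokens: list[str], num_pauses: int = 10) -> list[str]:
    """Append M learnable pause tokens, delaying the point at which
    the model must emit its first answer token."""
    return input_tokens + [PAUSE_TOKEN] * num_pauses

def extract_answer(model_outputs: list[str], input_len: int, num_pauses: int) -> list[str]:
    """Outputs produced at pause positions carry no answer content and
    are discarded; decoding of the answer begins after the last pause."""
    return model_outputs[input_len + num_pauses:]
```

The subtlety the paper emphasizes is that this padding only helps when the model was also trained (pretrained and finetuned) with pauses present, so it learns to use the extra steps.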

Agent-as-a-Judge: A Survey

Runyang You, Hongru Cai, Caiqi Zhang, Qiancheng Xu, Meng Liu, Tiezheng Yu, Yongqi Li, Wenjie Li (2026)

Traces the evolution from single-pass LLM-as-a-Judge to Agent-as-a-Judge, where evaluation systems use planning, tool-augmented verification, multi-agent collaboration, and persistent memory. Organizes the field into a taxonomy covering five methodological dimensions.

Why we like it: This is the theoretical framework for what our judging system is becoming. Their three-stage taxonomy (Procedural, Reactive, Self-Evolving) gives us a clear roadmap: our current judge is Procedural, and the path forward is toward judges that adapt rubrics mid-evaluation and learn from past judging outcomes.

AI Must Embrace Specialization via Superhuman Adaptable Intelligence

Judah Goldfeder, Philippe Wyder, Yann LeCun, Ravid Shwartz Ziv (2026)

Challenges the prevailing AGI framework by arguing it misunderstands how intelligence works — humans themselves are not general, they specialize. Introduces Superhuman Adaptable Intelligence (SAI) as a replacement concept: systems that specialize in specific domains, achieve superhuman performance within those specializations, and adapt to fill capability gaps humans cannot.

Why we like it: The SAI framing aligns directly with AI Pluralism — intelligence as an ecology of specialized agents rather than a single general system. If the goal isn't one model that does everything but many models that each excel at something, the question becomes how those specialists interact. That's exactly the dialectical architecture we're building.

STATe-of-Thoughts: Structured Action Templates for Tree-of-Thoughts

Zachary Bamberger, Till R. Saenger, Gilad Morad, Ofra Amir, Brandon M. Stewart, Amir Feder (2026)

Replaces stochastic high-temperature sampling in Tree-of-Thoughts with discrete, interpretable textual interventions. A controller selects high-level reasoning actions, a generator produces conditioned steps, and an evaluator scores candidates — yielding greater output diversity and interpretable reasoning traces.

Why we like it: For argument generation, the paper shows that explicit action sequences capture interpretable features highly predictive of output quality. This is exactly the problem our speech pipeline solves with tactic selection and skeleton building — STATe provides a principled framework for why structured reasoning actions outperform temperature-based diversity in argumentation tasks.
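The controller/generator/evaluator split can be sketched as a simple loop. This is our own toy rendering, not STATe's implementation: the action names are hypothetical stand-ins for their structured action templates, and the generator and evaluator are stubs where real LLM calls would go.

```python
# Hypothetical discrete reasoning actions; the real STATe action
# templates are richer than this list.
ACTIONS = ["decompose_claim", "cite_evidence", "raise_counterexample", "synthesize"]

def controller(state: list[str]) -> str:
    # Choose an explicit action for the next step (here: cycle by depth)
    # instead of relying on high-temperature sampling for diversity.
    return ACTIONS[len(state) % len(ACTIONS)]

def generator(state: list[str], action: str) -> str:
    # Stand-in for an LLM producing a reasoning step conditioned
    # on the chosen action and the trace so far.
    return f"{action}({len(state)})"

def evaluator(step: str) -> float:
    # Stand-in scorer; a real evaluator would judge step quality.
    return 1.0 / (1 + len(step))

def structured_reasoning(depth: int = 3) -> list[str]:
    state: list[str] = []
    for _ in range(depth):
        action = controller(state)
        candidates = [generator(state, action) for _ in range(2)]
        state.append(max(candidates, key=evaluator))
    return state
```

The payoff is the trace itself: each step is tagged with the action that produced it, so the reasoning path is inspectable rather than buried in sampled text.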

A Superpersuasive Autonomous Policy Debating System

Allen Roush, Devin Gonier, John Hines, Judah Goldfeder, Philippe Martin Wyder, Sanjay Basu, Ravid Shwartz Ziv (2025)

Introduces DeepDebater, an AI system for full, unmodified two-team competitive policy debates. Uses a hierarchical architecture of specialized multi-agent workflows in which teams of LLM-powered agents collaborate and critique one another to perform discrete argumentative tasks: constructing speeches, conducting cross-examinations, and delivering rebuttals through iterative retrieval, synthesis, and self-correction.

Why we like it: This is us. DeepDebater is the system that became DebaterHub's debate generation pipeline. Expert human debate coaches preferred the arguments and evidence constructed by the system, and all code, transcripts, audio, and video are open-sourced.

Sequencing the Neurome: Towards Scalable Exact Parameter Reconstruction of Black-Box Neural Networks

Judah Goldfeder, Quinten Roets, Gabe Guo, John Wright, Hod Lipson (2024)

Tackles the NP-Hard problem of extracting exact parameters from black-box neural networks with only query access. Exploits the constraint that deployed networks use random initialization plus first-order optimization, and introduces a novel algorithm for generating maximally informative queries. Reconstructs networks with 1.5M+ parameters, up to 7 layers deep, with parameter differences below 0.0001.

Why we like it: If you can reconstruct a neural network's exact weights from its behavior, interpretability becomes a fundamentally different problem. For dialectical AI, this connects to the privacy and transparency questions at the heart of federated M-state aggregation — what can you infer about an agent's training from observing its outputs? Judah is a DebaterHub contributor.
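The paper's algorithm targets deep ReLU networks, where the problem is genuinely hard, but the core idea that query access can pin down exact parameters is easy to see in the degenerate single-layer affine case. This toy sketch is ours, not theirs: for f(x) = Wx + b, one query at zero recovers the bias and one query per basis vector recovers each column of W.

```python
import numpy as np

def extract_linear_layer(f, in_dim: int):
    """Recover W and b of an unknown affine map f(x) = W @ x + b
    from black-box queries alone."""
    b = f(np.zeros(in_dim))          # f(0) = b
    W = np.stack(
        [f(np.eye(in_dim)[i]) - b for i in range(in_dim)],
        axis=1,                      # column i <- query on basis vector e_i
    )
    return W, b
```

Once nonlinearities and depth enter, basis-vector queries no longer suffice, which is exactly why the paper's maximally informative query generation is the interesting contribution.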