Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires

1 Technical University of Darmstadt, 2 Philipps-University Marburg,
3 Justus Liebig University Giessen, 4 University of Münster,
5 ELLIS Institute Finland, 6 University of Turku

*Work done while with TU Darmstadt

Abstract

The development of AI for mental health is hindered by a lack of authentic therapy dialogues, due to strict privacy regulations and the fact that clinical sessions have historically rarely been recorded. We present an LLM-driven pipeline that generates synthetic counseling dialogues based on structured client profiles and psychological questionnaires. Grounded in the principles of Cognitive Behavioral Therapy (CBT), our method creates synthetic therapeutic conversations for clinical disorders such as anxiety and depression. Our framework, SQPsych (Structured Questionnaire-based Psychotherapy), converts structured psychological input into natural language dialogues through therapist-client simulations. Because data governance policies and privacy restrictions prohibit the transmission of clinical questionnaire data to third-party services, previous methodologies relying on proprietary models are infeasible in our setting. We address this limitation by generating a high-quality corpus, SQPsychConv, using open-weight LLMs, validated through human expert evaluation and LLM-based assessments. Our SQPsychLLM models, fine-tuned on SQPsychConv, achieve strong performance on counseling benchmarks, surpassing baselines in key therapeutic skills. Our findings highlight the potential of synthetic data to enable scalable, data-secure, and clinically informed AI for mental health support.

SQPsychConv Datasets

We provide several variations of the SQPsychConv dataset, generated by different large language models. The finetuned versions represent a larger, more diverse corpus. All datasets are available on Hugging Face.

🤗 Dataset | Conversations | Generating Model
AIMH/SQPsychConv_qwq | 4.18k | Qwen/QwQ-32B
AIMH/SQPsychConv_nemotron | 2.09k | nvidia/Llama-3_3-Nemotron-Super-49B-v1
AIMH/SQPsychConv_llama3 | 2.09k | meta-llama/Llama-3.3-70B-Instruct
AIMH/SQPsychConv_qwen-2.5 | 2.09k | Qwen/Qwen2.5-72B-Instruct
AIMH/SQPsychConv_mistral | 2.09k | mistralai/Mistral-Large-Instruct-2407
AIMH/SQPsychConv_command | 2.09k | CohereLabs/c4ai-command-a-03-2025
AIMH/SQPsychConv_gemma | 2.09k | google/gemma-3-27b-it
AIMH/SQPsychConv_command_finetune | 29.2k | CohereLabs/c4ai-command-a-03-2025 (Finetuned)
AIMH/SQPsychConv_gemma_finetune | 32.4k | google/gemma-3-27b-it (Finetuned)
AIMH/SQPsychConv_llama3_finetune | 47.7k | meta-llama/Llama-3.3-70B-Instruct (Finetuned)
AIMH/SQPsychConv_qwen-2.5_finetune | 29.1k | Qwen/Qwen2.5-72B-Instruct (Finetuned)
AIMH/SQPsychConv_qwq_finetune | 69.8k | Qwen/QwQ-32B (Finetuned)
AIMH/SQPsychConv_mistral_finetune | 46k | mistralai/Mistral-Large-Instruct-2407 (Finetuned)
AIMH/SQPsychConv_nemotron_finetune | 29k | nvidia/Llama-3_3-Nemotron-Super-49B-v1 (Finetuned)
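
The repository names in the table follow a regular pattern, so they can be built programmatically. The snippet below is a minimal sketch, assuming the Hugging Face `datasets` library is installed. The helper names `dataset_repo_id` and `load_sqpsychconv` are hypothetical and not part of any released code; the split and column names of each dataset are not stated here, so inspect the returned object before relying on them.

```python
# Generator keys exactly as they appear in the repository names of the table.
GENERATORS = {"qwq", "nemotron", "llama3", "qwen-2.5", "mistral", "command", "gemma"}


def dataset_repo_id(generator: str, finetuned: bool = False) -> str:
    """Build the Hugging Face repository ID for one SQPsychConv variant.

    Hypothetical helper: it only reproduces the naming pattern in the table.
    """
    if generator not in GENERATORS:
        raise ValueError(f"unknown generator key: {generator!r}")
    suffix = "_finetune" if finetuned else ""
    return f"AIMH/SQPsychConv_{generator}{suffix}"


def load_sqpsychconv(generator: str, finetuned: bool = False):
    """Download one variant from the Hugging Face Hub (requires network access)."""
    # Imported lazily so dataset_repo_id works without the library installed.
    from datasets import load_dataset

    return load_dataset(dataset_repo_id(generator, finetuned))
```

Calling `load_sqpsychconv("qwq")` would fetch `AIMH/SQPsychConv_qwq`; printing the result shows the available splits and columns.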

SQPsychLLM Models

We also release the 8B-parameter SQPsychLLM models, each fine-tuned on the synthetic conversations from one of the datasets above.

🤗 Model | Size | Training Data
AIMH/SQPsychLLM-8b-qwen-2.5 | 8B | SQPsychConv (Qwen 2.5)
AIMH/SQPsychLLM-8b-mistral | 8B | SQPsychConv (Mistral)
AIMH/SQPsychLLM-8b-gemma | 8B | SQPsychConv (Gemma)
AIMH/SQPsychLLM-8b-qwq | 8B | SQPsychConv (QwQ)
AIMH/SQPsychLLM-8b-command | 8B | SQPsychConv (Command A)
AIMH/SQPsychLLM-8b-llama3.3 | 8B | SQPsychConv (Llama 3.3)
AIMH/SQPsychLLM-8b-nemotron | 8B | SQPsychConv (Nemotron)

Dataset Statistics

The table below compares the SQPsychConv variants with previous mental health counseling datasets.

Dataset | # Utt. | Avg. turns | Avg. tok./utt.
CACTUS | 995,512 | 15.263 | 27.051
Psych8k | 16,374 | 1 | 54.685
SQPsychConv (command) | 64,760 | 17.451 | 51.019
SQPsychConv (gemma) | 71,000 | 16.999 | 51.790
SQPsychConv (nemotron) | 64,238 | 15.911 | 51.432
SQPsychConv (mistral) | 98,342 | 23.119 | 31.098
SQPsychConv (llama3.3) | 101,694 | 24.599 | 32.627
SQPsychConv (qwen2.5) | 64,488 | 15.534 | 34.489
SQPsychConv (qwq) | 77,134 | 18.601 | 26.291
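
Statistics of this kind can be computed with a few lines of code. The sketch below is illustrative only and rests on assumptions that are not confirmed by this page: a conversation is a list of `(speaker, text)` pairs, a turn is one exchange of two utterances, and tokens are whitespace-separated words. The tokenizer and turn definition actually used for the table may differ, so the numbers above will not necessarily be reproduced. The function name `corpus_stats` is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: a conversation is a list of (speaker, text) pairs.
Conversation = List[Tuple[str, str]]


def corpus_stats(conversations: List[Conversation]) -> Dict[str, float]:
    """Compute utterance count, average turns, and average tokens per utterance."""
    n_utterances = sum(len(conv) for conv in conversations)
    if n_utterances == 0:
        raise ValueError("corpus contains no utterances")
    # Assumption: whitespace tokenization stands in for the real tokenizer.
    n_tokens = sum(len(text.split()) for conv in conversations for _, text in conv)
    # Assumption: one turn is one exchange, i.e. two utterances.
    avg_turns = sum(len(conv) / 2 for conv in conversations) / len(conversations)
    return {
        "utterances": n_utterances,
        "avg_turns": avg_turns,
        "tokens_per_utterance": n_tokens / n_utterances,
    }


# Toy corpus, invented purely for illustration.
example = [
    [("therapist", "How are you feeling today?"), ("client", "A bit anxious.")],
    [
        ("therapist", "Hello."),
        ("client", "Hi there."),
        ("therapist", "What brings you in?"),
        ("client", "I cannot sleep well."),
    ],
]
stats = corpus_stats(example)
print(stats)
```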

BibTeX

@article{vu2025roleplayingstructuresynthetictherapistclient,
      title={Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires}, 
      author={Doan Nam Long Vu and Rui Tan and Lena Moench and Svenja Jule Francke and Daniel Woiwod and Florian Thomas-Odenthal and Sanna Stroth and Tilo Kircher and Christiane Hermann and Udo Dannlowski and Hamidreza Jamalabadi and Shaoxiong Ji},
      year={2025},
      journal={arXiv preprint arXiv:2510.25384},
      url={https://arxiv.org/abs/2510.25384}, 
}