What the Research Actually Says About AI Companions in 2026. Not What You Want to Hear.

Last Updated: March 2026

Quick Answer: The research on AI companions shows real effects, small-to-medium in size, that do not last without continued use. Most studies are methodologically weak: short timeframes, self-reported outcomes, no adequate control groups. What users report as most valuable, immediate availability and shame-free interaction, has not been adequately studied. The honest picture: something real is happening. We do not yet know how real, for whom, or for how long.

  • Studies on Woebot and similar apps show modest reductions in anxiety and depression symptoms. The effect sizes are real but small-to-medium.
  • Most AI companion research has serious methodological problems: short study periods (2-8 weeks), no control groups, self-reported outcomes, and high dropout rates.
  • The research does not show treatment-level outcomes. AI companions are not equivalent to therapy and no credible study claims they are.
  • What users report consistently, immediate availability and shame-free interaction, has barely been studied as a mechanism. The research has not caught up to actual usage patterns.
  • The effects appear real but not durable. Symptoms tend to return to baseline when users stop using the tools. This is consistent with the mechanism being availability rather than skill-building.

Why This Review Is Different From Most You Will Read

Most AI companion coverage is either enthusiastic promotion or reflexive dismissal. Companies publish research designed to validate their products. Critics dismiss the entire category based on concern rather than evidence. Both approaches tell you more about the author’s priors than about what the data actually shows.

I am going to do something different. I am going to tell you exactly what the research shows, exactly where it falls short, and exactly what remains genuinely unknown. This is going to be less satisfying than a clean verdict, because the honest state of the science is genuinely messy.

If you are looking for permission to use AI companions because they definitely work, this piece will not give you that. If you are looking for permission to dismiss them because they definitely do not work, this piece will not give you that either. What it will give you is an accurate picture of where the evidence actually stands.

The Woebot Studies: What They Show and What They Don’t

Woebot was one of the first AI companion apps to publish peer-reviewed research, and its 2017 study in JMIR Mental Health remains the most cited piece of research in this space. The study involved 70 college students randomly assigned to either Woebot or an informational control condition. After two weeks, the Woebot group showed significantly greater reductions in anxiety and depression symptoms as measured by validated scales.

That is the good news. Here is what the study did not show. The sample was 70 people, all university students, mostly female, over two weeks. Two weeks is not a meaningful timeframe for assessing an intervention’s durability. University students are not representative of the broader population of people with anxiety and depression. The dropout rate was significant. And the effect sizes, while statistically significant, were in the small-to-medium range, not the large effects that characterize effective psychotherapy.

Woebot published a follow-up study in 2021 with a larger sample (n=179) and an eight-week timeframe. The results showed similar modest effects. Participants reported reduced anxiety and depression symptoms compared to a control group receiving a self-help book. Importantly, the study showed no significant difference between Woebot and the self-help book on some measures, suggesting that some of the effect may be attributable to the act of engaging with any structured mental health resource rather than to Woebot specifically.

This is an important methodological point that gets overlooked in most coverage. Showing that an AI companion outperforms doing nothing is a very low bar. Showing that it outperforms other accessible resources, and by how much, is the question that matters for practical recommendations. Most studies in this space have not cleared that bar convincingly.

The Methodological Problems That Keep Showing Up

Before discussing specific findings, you need to understand the structural weaknesses that run through almost all AI companion research. These are not minor quibbles. They are fundamental limitations that constrain what conclusions you can legitimately draw.

Short study periods. Most studies run two to eight weeks. Mental health conditions are chronic and fluctuating. Two to eight weeks is enough to detect whether an acute intervention reduces current symptom levels. It is not enough to determine whether the effect persists, whether the user gains anything durable, or whether they are better off six months later than they would have been without the intervention.

Self-reported outcomes. The primary outcome measures in most studies are validated symptom scales like the PHQ-9 for depression and the GAD-7 for anxiety. These scales are self-reported, meaning participants fill them out based on how they perceive themselves. Participants in AI companion studies know they are using an AI companion tool, which creates expectation effects. People who believe they are receiving help tend to report feeling better.
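To make concrete what these scales measure: the PHQ-9 is nine items, each self-rated 0 to 3 by the participant, summed to a 0-27 total and bucketed into standard severity bands. Here is a minimal scoring sketch (item wording omitted; the scoring rule and cutoffs are the standard published ones):

```python
def score_phq9(responses: list[int]) -> tuple[int, str]:
    """Score a PHQ-9: nine self-rated items, each 0-3, summed to 0-27.
    Severity bands are the standard published cutoffs."""
    if len(responses) != 9 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("PHQ-9 requires nine responses, each 0-3")
    total = sum(responses)
    bands = [(4, "minimal"), (9, "mild"), (14, "moderate"),
             (19, "moderately severe"), (27, "severe")]
    severity = next(label for cutoff, label in bands if total <= cutoff)
    return total, severity

# A participant rating most symptoms "several days" (1) lands in "mild":
print(score_phq9([1, 1, 2, 1, 0, 1, 1, 0, 0]))  # (7, 'mild')
```

A shift of one point on a couple of items moves a participant across a severity band, which is part of why expectation effects matter at this resolution.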

No adequate control groups. A proper control condition for AI companion research would give participants a similar-looking tool that lacks the specific active ingredients you are testing. Most studies use waiting list controls (no intervention) or informational controls (a document or book). These controls do not rule out nonspecific factors: attention, novelty, expectation of benefit, or the simple act of reflecting on one’s symptoms.

High dropout rates. Engagement with AI companion apps drops off sharply after the first few days in most studies. Studies that report outcomes only for users who completed the full study period are measuring a self-selected group of people who found the intervention useful enough to continue, not the average user.
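A toy simulation makes the completer bias concrete. Every number below is invented for illustration; the only claim is structural: when dropout correlates with benefit, the completers-only average overstates the true average effect.

```python
import random

random.seed(0)

# 1,000 hypothetical enrollees. Each has a "true" 8-week symptom change
# (negative = improvement). Assume, purely for illustration, that people
# who improve are far more likely to keep using the app to study end.
cohort = []
for _ in range(1000):
    change = random.gauss(-1.0, 4.0)            # true average change: -1 point
    p_finish = 0.7 if change < -2.0 else 0.2    # improvers finish more often
    cohort.append((change, random.random() < p_finish))

everyone = sum(c for c, _ in cohort) / len(cohort)
finishers = [c for c, kept_going in cohort if kept_going]
completers_only = sum(finishers) / len(finishers)

print(f"mean change, all enrollees:   {everyone:+.2f}")        # near -1.0
print(f"mean change, completers only: {completers_only:+.2f}")  # noticeably better
```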

Sponsor influence. A substantial proportion of AI companion research is funded by or conducted in partnership with the companies whose products are being studied. This does not automatically invalidate the findings, but it warrants additional scrutiny regarding outcome measure selection, publication decisions, and how results are framed.

What the Research Does Show

Having outlined the limitations, let me be clear: they do not mean nothing is happening. They mean we cannot be as confident about what is happening, and why, as the promotional coverage suggests.

Across multiple studies, AI companion apps consistently produce modest reductions in self-reported anxiety and depression symptoms in the short term. The effect is real. It shows up across different apps (Woebot, Wysa, Replika-adjacent tools), different populations, and different validated measures. When multiple methodologically limited studies point in the same direction, the direction is more credible even if the size remains uncertain.

A 2022 meta-analysis published in JMIR Mental Health pooled results from 13 studies on conversational AI for mental health support. The overall effect size for symptom reduction was small-to-medium (d = 0.56 for depression, d = 0.42 for anxiety). These are statistically meaningful effects. They are not clinically negligible. They are also substantially smaller than the effects produced by evidence-based psychotherapy, which typically shows large effects for depression (d = 0.8-1.2) in comparable studies.
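For readers unfamiliar with these numbers: Cohen's d is the difference between group means divided by the pooled standard deviation. A minimal sketch, using hypothetical PHQ-9 scores not drawn from any specific study, shows what d = 0.5 looks like in raw terms:

```python
from math import sqrt

def cohens_d(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled = sqrt(((n_tx - 1) * sd_tx**2 + (n_ctl - 1) * sd_ctl**2)
                  / (n_tx + n_ctl - 2))
    return (mean_ctl - mean_tx) / pooled  # lower symptom scores = better

# Hypothetical post-study PHQ-9 scores, invented for illustration:
# treatment mean 9.0, control mean 11.5, both with sd 5.0, 90 per arm.
print(f"d = {cohens_d(9.0, 5.0, 90, 11.5, 5.0, 90):.2f}")  # d = 0.50
```

In these invented numbers, d = 0.5 corresponds to a 2.5-point PHQ-9 gap between groups: detectable, but roughly half of what a large therapy effect (d near 1.0) would produce on the same scale.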

The comparison to therapy is the right reference point. In both marketing and the literature, AI companions are positioned as complements to professional care, not equivalents. The research supports that framing: the effect size gap between AI companions and therapy is large and consistent.

What the Research Has Not Studied (But Should)

Here is where the literature has a serious gap that no published research has adequately addressed. The most consistently reported benefits of AI companions, across user surveys, forum discussions, and qualitative studies, are not the symptom-reduction outcomes that clinical research typically measures.

Users report three things most consistently. First, the value of immediate availability. Having something that responds at 2am without social cost or time zone friction. Second, the reduction in shame that comes from non-judgmental interaction. Not having to perform your distress appropriately for a human audience. Third, the ability to express thoughts that feel too small, too repetitive, or too embarrassing to bring to a friend or therapist.

These are not the same as reduced PHQ-9 scores. They describe a different kind of value: the value of having a low-friction processing space. This is real and meaningful, and the quantitative clinical research framework is not well-designed to detect or measure it.

To put it bluntly: the research is measuring the wrong outcomes for the actual user experience. It is looking for therapy-like symptom reduction and finding small effects. Users are primarily using these tools for availability and shame-reduction, and they report substantial subjective value. These are different things, and the gap between them explains much of the disconnect between what the evidence shows and what users experience.

[Figure: Effect sizes in AI companion research are real but modest compared to evidence-based psychotherapy benchmarks.]
| Claim | What the Evidence Shows | Confidence Level |
| --- | --- | --- |
| AI companions reduce anxiety symptoms | Small-to-medium effect in short-term studies | Moderate — consistent across studies, method limits |
| AI companions reduce depression symptoms | Small-to-medium effect in short-term studies | Moderate — consistent across studies, method limits |
| AI companions are as effective as therapy | No evidence for this claim; effect sizes are substantially smaller than therapy | High confidence — this claim is not supported |
| Effects are durable after stopping use | Limited follow-up data; available evidence suggests effects diminish after stopping | Low confidence — not adequately studied |
| Availability and shame-reduction are valuable | Consistently reported by users; not adequately studied as outcomes | Moderate — user reports consistent, no clinical validation |
| AI companions are harmful | No robust evidence of harm in general population; concern about dependency noted | Low confidence — harms not well studied |

The Durability Problem: What Happens When You Stop

This is the research finding that gets the least attention and deserves the most. The few studies that have included follow-up assessments after users stopped using AI companion apps show a consistent pattern: symptom levels tend to return toward baseline.

A 2023 study following Wysa users for 12 weeks after the active intervention period found that gains made during active use partially eroded over the follow-up period. Users did not fully return to baseline, but the maintenance of gains was significantly weaker than during active use.

This is consistent with a specific mechanism: availability. If AI companions work primarily by providing an accessible space to process emotions in real time, then the benefits are contingent on using the tool. When you stop using it, you lose the availability benefit. This is fundamentally different from therapy, which ideally builds skills and insights that persist after the treatment ends.
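A toy trajectory model, not fitted to any dataset, illustrates what an availability-contingent benefit looks like: improvement tracks active use, then mostly erodes after stopping, with only a small retained fraction. That is the pattern the follow-up data suggest.

```python
# Toy symptom trajectory, invented for illustration: the benefit is
# modeled as contingent on use, with only a small fraction retained
# after stopping (no durable skill-building assumed).
baseline = 12.0        # hypothetical starting PHQ-9 score
benefit = 4.0          # hypothetical reduction while actively using the tool
retained = 0.15        # hypothetical fraction of the gain that persists

score = baseline
for week in range(1, 21):
    if week <= 8:                                  # active use: weeks 1-8
        score = max(baseline - benefit, score - 1.5)
    else:                                          # post-use: drift back up
        floor = baseline - retained * benefit      # partial erosion, not full
        score = min(floor, score + 0.8)
    print(f"week {week:2d}: PHQ-9 ~ {score:4.1f}")
```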

The implication for users is straightforward. AI companions may be best understood as ongoing tools rather than interventions with a course of treatment. Like exercise or sleep hygiene, the benefit is present when you do it and diminishes when you stop. This is not a devastating limitation. It is just a different category of tool than the clinical research framing implies.

The Dependency Question: What Does the Evidence Actually Say?

Concerns about AI companion dependency get significant coverage in the popular press and almost none in the peer-reviewed literature. This is a meaningful gap: the concern is repeated frequently but not adequately studied.

What we know is that some users report high engagement levels with AI companion apps, and a subset report feeling distressed when the app is unavailable. This is a documented pattern from user surveys and qualitative interviews. It is not well-characterized in terms of prevalence, severity, or whether it meets any clinical definition of problematic use.

The research on technology dependency more broadly suggests that high engagement with a tool is not automatically pathological, and that the relevant question is whether the engagement interferes with other functioning. By that standard, the available evidence on AI companion dependency is insufficient to draw strong conclusions either way. Users report some dependency concern; clinicians note it as a theoretical risk; no study has adequately characterized the phenomenon in a population sample.

The honest position: dependency is a real risk worth monitoring individually. It is not a documented population-level harm that justifies dismissing AI companions as a category.

What Should You Actually Conclude From All This?

The honest summary of the research is this. AI companions produce real short-term benefits for anxiety and depression symptoms. The effects are modest, not treatment-level, and appear contingent on continued use rather than building durable capabilities. The research has significant methodological limitations that make confident conclusions difficult. The outcomes users report as most valuable, availability and shame-reduction, have not been adequately studied as mechanisms.

This does not mean AI companions are ineffective. It means the research has not yet characterized what they actually do well. A tool that reduces the friction of emotional processing, that is available at 2am without social cost, that allows you to say things you would be embarrassed to say to a human, provides real value even if that value does not show up cleanly in PHQ-9 score reductions.

Use them as one tool in a larger toolkit. Do not use them as a replacement for professional support if you need professional support. Do not expect durable effects from short-term use. Do not treat the absence of clinical-trial-level evidence as evidence of absence of effect.

Platforms like Candy AI, Replika, and others are providing something real to their users. The research has not yet caught up to characterizing what that something is with the precision the field eventually needs.

The Research That Needs to Happen

To be genuinely useful, AI companion research needs longer study periods, active control conditions that isolate specific mechanisms, outcome measures that capture availability and shame-reduction rather than just symptom scales, follow-up assessments to characterize durability, and adequately powered samples drawn from non-student populations.
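On the "adequately powered" point, a standard power calculation shows why small samples cannot resolve small-to-medium effects. A sketch using statsmodels' two-sample t-test power solver:

```python
# How many participants per arm does a two-sample t-test need to detect
# a given effect size at 80% power, alpha = 0.05?
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
for d in (0.3, 0.5, 0.8):
    n = solver.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d}: ~{n:.0f} per arm")
# d = 0.3 needs ~175 per arm; d = 0.5 needs ~64; d = 0.8 needs ~26.
# A 70-person study split across two arms is powered only for large effects.
```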

None of these studies exist yet at scale. The field is young. The tools have evolved faster than the research infrastructure required to study them properly. The result is a significant mismatch between public perception (both positive and negative) and what the evidence base actually supports.

That mismatch matters. Overstating the evidence pushes vulnerable people toward tools as substitutes for care they actually need. Understating it dismisses real value that millions of users are already extracting from these tools every day. The research needs to get better. Until it does, honest uncertainty is the most defensible position.

Key Takeaways

  • AI companion research shows real short-term effects on anxiety and depression symptoms. Effect sizes are small-to-medium, not treatment-level.
  • Most studies have serious limitations: short timeframes, self-reported outcomes, no adequate control groups. These limits do not invalidate the findings; they limit confidence in them.
  • Effects appear to diminish when users stop using the tools. AI companions are best understood as ongoing tools, not courses of treatment.
  • The outcomes users report as most valuable, availability and shame-free interaction, have not been adequately studied as mechanisms.
  • AI companions are not therapy equivalents and no credible research claims they are. Use them as one tool among several, not a replacement for professional support when needed.

Frequently Asked Questions

Is there peer-reviewed evidence that AI companions actually help with mental health?

Yes. Multiple peer-reviewed studies, including a 2022 meta-analysis in JMIR Mental Health covering 13 studies, show small-to-medium reductions in anxiety and depression symptoms from AI companion app use. The evidence is real and consistent across studies, despite significant methodological limitations that prevent strong conclusions about effect size or durability.

Can AI companions replace therapy?

No. The effect sizes in AI companion research are substantially smaller than those produced by evidence-based psychotherapy. No credible published study positions AI companions as therapy equivalents. They are most accurately described as accessible adjuncts that can provide low-friction emotional processing, not treatments for clinical conditions.

Are AI companions safe for people with serious mental health conditions?

Research on this specific population is very limited. Most studies exclude people with severe mental health conditions or acute suicidality. People with serious mental health conditions should discuss AI companion use with their treatment providers rather than using it as a self-managed intervention.

Why do users report such strong positive experiences if the effect sizes are only small-to-medium?

This is the central gap in the research. The outcomes users report as most valuable, immediate availability and non-judgmental interaction, are not well-captured by standard symptom scales. The subjective experience of “having somewhere to put your thoughts at 2am” may be real and meaningful without producing large scores on the PHQ-9. The research framework is measuring the wrong things for the actual use case.

What should I look for when choosing an AI companion if I have anxiety?

Prioritize platforms designed around emotional attunement rather than entertainment. Replika is the most studied in the emotional support context and its free tier is adequate for most use cases. For users who want memory continuity across sessions, Candy AI offers indexed memory that reduces the re-establishment burden in each session. Neither platform is a clinical tool, and both recommend professional support for clinical conditions in their documentation.
