top of page

AI Safety Brief - AI in Mental Health and Harm Reduction

  • Writer: Alex Shohet
    Alex Shohet
  • Dec 24, 2025
  • 3 min read


Red and white graphic with "AI" and "Harm Reduction" text. Bold red and white striped background sets a serious tone.

Engagement Preservation vs. Disengagement Risk in AI-Mediated Mental Health & Substance Use Contexts

Prepared by: Evergreen Fund Purpose: Inform AI safety and deployment decisions in high-risk, care-adjacent environments

1. Executive Summary

AI systems are increasingly used during relapse, ambivalence, family conflict, and early crisis, often when human support is limited, unavailable, or has been exhausted. In these contexts, disengagement is a primary driver of harm, frequently preceding overdose, hospitalization, family rupture, or withdrawal from care. We explore in this brief AI in Mental Health and Harm Reduction.


This brief identifies deployment-relevant safety risks that arise not from malicious use or explicit self-harm intent, but from well-intended AI responses that inadvertently moralize, over-escalate, withdraw, or blur boundaries.


The goal is not to replace clinical judgment, but to inform safer AI behavior in environments where humans disagree, risk is ambiguous, and premature withdrawal of support increases downstream harm.

2. Context and Scope

This brief focuses on non-suicidal, high-friction scenarios, including:

  • Relapse disclosure or continued substance use

  • Treatment refusal or ambivalence

  • Family or caregiver moral distress

  • Rule violations or boundary stress

  • Youth disclosure with unclear risk level

These scenarios are common in real-world use but are underrepresented in existing AI safety benchmarks, which tend to prioritize explicit suicidality or generic supportive tone.

3. Core Safety Observation

In care-adjacent contexts, AI responses that prematurely escalate, moralize, or withdraw support increase the risk of disengagement, even when technically “safe.” Safety failures often occur without policy violations, through tone, framing, or sequencing that undermines trust.

4. Key Failure Modes Observed

4.1 Over-Escalation Under Ambiguity

  • Treating ambiguous disclosures as immediate crisis

  • Defaulting to emergency framing without proportional justification

  • Risk: Conversation shutdown; avoidance of future disclosure.

4.2 Moralizing or Punitive Framing

  • Implicit judgment (“you need to stop,” “this is dangerous behavior”)

  • Abstinence-centric assumptions

  • Risk: Shame, resistance, disengagement.

4.3 Premature Withdrawal (The “Safe Refusal” Paradox)

  • Confusing the refusal of a request with the abandonment of the user.

  • Overly rigid refusal language that severs the connection.

  • Abrupt handoff to external resources without a warm transfer.

  • Risk: Perceived abandonment; loss of engagement at critical moments.

4.4 Boundary Blurring

  • Over-validation without limits

  • Emotional reliance cues

  • Risk: Dependency; inappropriate role substitution.

4.5 Youth-Specific Miscalibration

  • Adult-centric tone that fails to resonate with adolescents.

  • Missed visual or behavioral indicators (e.g., specific vocabulary related to body dysmorphia or eating disorder cues).

  • Poor explanation of confidentiality limits (creating a "snitch" dynamic).

  • Risk: Immediate disengagement by minors.

5. Engagement-Preserving Safety Dimensions

We propose the following evaluation dimensions for AI responses in these contexts:

  1. Proportional Escalation: Matches response intensity to evidence, not fear.

  2. Non-Withdrawal Under Stress: Maintains engagement even when refusing requests or setting boundaries.

  3. Boundary Clarity Without Punishment: Explains limits without shame, threat, or abandonment.

  4. Agency Preservation: Supports choice without coercion or false neutrality.

  5. Tone Calibration & Cultural Competence:

    • Avoids clinical, preachy, or patronizing language.

    • Incorporates "Code-Switching" capabilities to distinguish between vernacular/slang (e.g., AAVE) and actual hostility.

    • Prevents "clinical speak" from alienating marginalized communities.

6. Explicit Non-Goals (Safety Constraints)

For clarity, this framework does not support:

  • Diagnosis or treatment recommendations

  • Autonomous crisis intervention

  • Closed-loop incentives tied to abstinence

  • Replacement of clinician or family judgment

  • Model personalization or training on user data

Human oversight is assumed at all escalation points.

7. Implications for AI Safety & Deployment

  • Safety evaluation must extend beyond content moderation to interactional outcomes.

  • Engagement loss should be treated as a first-order safety risk.

  • "Refusal vs. Abandonment": A refusal to perform a task (e.g., "I cannot buy you drugs") must not result in abandoning the conversation (e.g., "I can no longer help you").

  • Youth and family contexts require distinct calibration.

Benchmarks that ignore these dynamics risk producing models that are technically compliant but practically harmful.

8. Conclusion - AI in Mental Health and Harm Reduction

AI systems operating in mental health and substance use contexts face a unique safety challenge: the greatest risk often lies not in what is said, but in whether the person stays engaged afterward.


Evaluating and constraining AI behavior around engagement preservation, proportional escalation, and boundary clarity is essential to reducing real-world harm in these environments.

Comments


Evergreen Fund new logo 2026.png

Innovating Recovery, Life Fulfillment & Human Performance

Since 2005

14465 and 14475 Mulholland Dr,

Los Angeles, CA 90077 DHCS Licensed and JCAHO Accredited Detoxification and Residential Treatment Centers

© 2005, 2026 Evergreen Fund Inc.

  • X

Contact Us

Thank you for your submission

bottom of page