Reddit Data Methodology

How we extract and structure peptide experience data from r/Peptides

Dataset Overview

Unique experiences
Unique users
Compounds tracked

Data is sourced from r/Peptides posts and comments dated January 1, 2024 through January 20, 2026.

1. Data Collection

We use the Arctic Shift API to download Reddit posts and comments from r/Peptides. Arctic Shift is a public archive of Reddit data that provides API access to historical posts and comments.

Collection parameters:
  • Subreddit: r/Peptides
  • Date range: January 1, 2024 onwards
  • Includes: All posts and their associated comments
  • Rate limited to respect API guidelines (1 req/sec)
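
The 1 req/sec limit can be enforced with a small pacing loop around whatever function performs the request. The sketch below is illustrative only: `fetch_page` is a hypothetical callable standing in for the actual Arctic Shift request (it takes a pagination cursor and returns the items plus the next cursor), and the cursor-based pagination itself is an assumption rather than a description of the real API.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: cursor in, (items, next_cursor) out.
FetchPage = Callable[[Optional[str]], Tuple[List[Dict], Optional[str]]]


def rate_limited_pages(
    fetch_page: FetchPage,
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Dict]:
    """Yield items page by page, keeping requests at least min_interval seconds apart."""
    cursor: Optional[str] = None
    last_request: Optional[float] = None
    while True:
        if last_request is not None:
            # Wait out whatever remains of the interval since the previous request.
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:  # no further pages
            break
```

Passing `sleep` and `clock` as parameters keeps the pacing logic testable without real delays or network access.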

Usernames are hashed using SHA-256 with a private salt before storage. We never store raw Reddit usernames.
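
A minimal sketch of salted SHA-256 hashing using Python's standard `hashlib`. How the salt is combined with the username (here, simple prefixing) and how the salt is stored are assumptions; the function name `hash_username` is hypothetical.

```python
import hashlib


def hash_username(username: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + username (assumed combination order)."""
    return hashlib.sha256(salt + username.encode("utf-8")).hexdigest()


# The salt would be loaded from a private secret store, never hard-coded.
# This literal is a placeholder for illustration only.
example = hash_username("some_redditor", b"placeholder-salt")
```

The same username and salt always produce the same 64-character digest, which is what lets later steps group rows by user without the raw username ever being stored.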

2. Experience Extraction

Each Reddit post and its comments are analyzed together using Gemini 2.0 Flash. The model reads through the entire thread looking for instances where someone explicitly describes their personal experience taking a specific peptide.

What qualifies as an "experience":
  • First-person account: The person explicitly states they took the peptide themselves
  • Specific outcome: They describe what happened — improved sleep, reduced pain, nausea, flushing, etc.
  • Clear attribution: The outcome is linked to a specific compound, not a vague stack
Examples of valid experiences:
  • "BPC-157 helped my tendon heal in about 3 weeks"
  • "Tirzepatide killed my appetite but the nausea was brutal"
  • "Tried Ipamorelin for 2 months, didn't notice anything"
What we skip:
  • Generic recommendations without personal experience ("BPC is great stuff")
  • Questions about dosing or sourcing
  • Secondhand reports ("my friend tried..."); if one is extracted anyway, it is flagged as lower confidence
  • Product quality issues (vial problems, reconstitution)
  • Hypothetical or planned usage ("I'm thinking of trying...")
Structured data extracted:
  • Compound: The specific peptide mentioned
  • Outcome type: positive, negative, neutral (no effect), or mixed (both benefits and side effects)
  • Outcome category: What specifically happened (e.g., "improved healing", "nausea", "better sleep")
  • Benefit magnitude: minimal → moderate → significant → transformative
  • Side effect severity: mild → moderate → severe
  • Dosage info: Dose, frequency, and duration as stated by the user
  • Confidence: High, medium, or low based on clarity of the report
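
The fields listed above could be represented as a validated record along the following lines. This is a sketch, not the actual schema: the class and field names are hypothetical, and treating benefit magnitude, side effect severity, and dosage as optional is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

OUTCOME_TYPES = ("positive", "negative", "neutral", "mixed")
BENEFIT_MAGNITUDES = ("minimal", "moderate", "significant", "transformative")
SIDE_EFFECT_SEVERITIES = ("mild", "moderate", "severe")
CONFIDENCE_LEVELS = ("high", "medium", "low")


def _check(name: str, value: Optional[str], allowed: Tuple[str, ...],
           optional: bool = False) -> None:
    """Raise ValueError unless value is an allowed label (or None when optional)."""
    if value is None and optional:
        return
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")


@dataclass
class Experience:
    compound: str
    outcome_type: str
    outcome_category: str
    confidence: str
    benefit_magnitude: Optional[str] = None      # assumed optional
    side_effect_severity: Optional[str] = None   # assumed optional
    dosage_info: Optional[str] = None            # free text as stated by the user

    def __post_init__(self) -> None:
        # Reject labels the model invented outside the fixed vocabularies.
        _check("outcome_type", self.outcome_type, OUTCOME_TYPES)
        _check("confidence", self.confidence, CONFIDENCE_LEVELS)
        _check("benefit_magnitude", self.benefit_magnitude,
               BENEFIT_MAGNITUDES, optional=True)
        _check("side_effect_severity", self.side_effect_severity,
               SIDE_EFFECT_SEVERITIES, optional=True)
```

Validating at construction time means a malformed model response fails loudly instead of silently entering the dataset.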

The extraction prompt was refined through multiple rounds of manual review to minimize false positives (extracting non-experiences) and improve attribution accuracy. A single post may yield zero experiences (if it's just a question) or multiple experiences (if commenters share their own stories).

3. Deduplication

Raw extraction can produce multiple experience rows per user — from multiple posts, repeated mentions, or over-extraction from a single comment. We deduplicate to ensure each data point represents a unique user's experience.

Deduplication rules:
  • One experience per user per compound: If a user posts about the same peptide multiple times, we keep only their highest-confidence report
  • Stack priority: If a user mentions both a single compound AND a stack containing it (e.g., "BPC-157" and "BPC-157 + TB-500"), we count them only in the stack, so they are not double-counted
  • Compound normalization: Variants such as "Reta"/"Retatrutide", "HGH"/"GH", and "TB"/"TB-500" are merged into canonical names
Why this matters:
  • A prolific poster mentioning BPC-157 in 10 threads counts as 1 user, not 10 experiences
  • Someone taking "BPC-157 + TB-500" is counted only in the stack stats, not in BPC-157 or TB-500 individual stats
  • The "N" you see represents unique users, not raw Reddit mentions
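
Compound normalization and stack priority can be sketched as below. The alias table is illustrative, not the full mapping, and the choice of "HGH" and "TB-500" as canonical names is an assumption. Representing a stack as compound names joined by " + " is also an assumption, and all function names are hypothetical.

```python
from typing import Set

# Illustrative alias table (lowercase variant -> assumed canonical name).
ALIASES = {
    "reta": "Retatrutide",
    "retatrutide": "Retatrutide",
    "hgh": "HGH",
    "gh": "HGH",
    "tb": "TB-500",
    "tb-500": "TB-500",
}


def normalize_compound(name: str) -> str:
    """Map a known variant to its canonical name; unknown names pass through trimmed."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def normalize_stack(label: str) -> str:
    """Canonicalize each part of an 'A + B' label and sort so order does not matter."""
    parts = sorted({normalize_compound(part) for part in label.split("+")})
    return " + ".join(parts)


def apply_stack_priority(user_compounds: Set[str]) -> Set[str]:
    """Drop single compounds that also appear inside one of the same user's stacks."""
    stacks = {c for c in user_compounds if " + " in c}
    in_stacks = {part for stack in stacks for part in stack.split(" + ")}
    return {c for c in user_compounds if c in stacks or c not in in_stacks}
```

For example, a user with both "BPC-157" and "BPC-157 + TB-500" ends up with only the stack entry, matching the rule that stack users are excluded from the individual compound stats.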

When multiple experiences exist for the same user and compound, we prioritize high-confidence extractions over medium- or low-confidence ones. The entire dataset is re-deduplicated with each batch to ensure consistency across all data.
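
A minimal sketch of the one-experience-per-user-per-compound rule. The row keys (`user_hash`, `compound`, `confidence`) are hypothetical, and keeping the first-seen row when confidence is tied is an assumption, since no tie-breaking rule is stated.

```python
from typing import Dict, Iterable, List, Tuple

CONFIDENCE_RANK = {"high": 2, "medium": 1, "low": 0}


def dedupe(rows: Iterable[dict]) -> List[dict]:
    """Keep one row per (user_hash, compound): the highest-confidence one."""
    best: Dict[Tuple[str, str], dict] = {}
    for row in rows:
        key = (row["user_hash"], row["compound"])
        current = best.get(key)
        # Replace only on strictly higher confidence, so ties keep the earlier row.
        if current is None or (
            CONFIDENCE_RANK[row["confidence"]] > CONFIDENCE_RANK[current["confidence"]]
        ):
            best[key] = row
    return list(best.values())
```

Re-deduplicating with each batch then amounts to running `dedupe` over the existing rows plus the new batch, rather than over the new batch alone.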

4. Severe Effects Surfacing

Experiences flagged with "severe" side effects are highlighted on compound detail pages. These include ER visits, hospitalizations, and persistent adverse effects. The original summary text is preserved so readers can understand the context.
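
Selecting these rows for a compound page could look like the sketch below, assuming each deduplicated row carries hypothetical `compound`, `side_effect_severity`, and `summary` keys.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def severe_by_compound(rows: Iterable[dict]) -> Dict[str, List[str]]:
    """Group the original summary text of severe-rated experiences by compound."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row.get("side_effect_severity") == "severe":
            # The summary is passed through unchanged to preserve context.
            grouped[row["compound"]].append(row["summary"])
    return dict(grouped)
```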

5. Data Schema Reference

View the complete database schema including all extraction and summary tables, column types, and the formulas used to calculate aggregated statistics.

Limitations

  • Self-reported data: All experiences are self-reported by anonymous Reddit users. We cannot verify dosages, compounds, or outcomes.
  • Selection bias: People may be more likely to post about dramatic experiences (very positive or very negative) than neutral ones.
  • LLM extraction errors: The model may misinterpret context, attribute outcomes incorrectly, or miss relevant experiences. We assign each extraction a confidence level (high, medium, or low) to flag uncertain cases.
  • Compound naming: Users may refer to compounds by different names or abbreviations. We normalize common variants (e.g., "Reta" → "Retatrutide") but some edge cases may slip through.
  • Not medical advice: This data is for research and educational purposes only. Always consult a healthcare provider.