Scrub Private Data from GitHub

Use this skill when sensitive content has been accidentally committed or exposed in a public GitHub repository. Covers the full response: immediate containment, history rewriting, GitHub support escalation, and post-incident prevention.

When to Use

  • A PR exposes private strategy, pricing, or internal plans in a public repo
  • Credentials, API keys, or secrets are committed to a public branch
  • PII (personal user data) is pushed to a public repo
  • Any content that should not be publicly visible is found in GitHub

Prerequisites

  • GITHUB_TOKEN environment variable with repo scope (or fine-grained PAT with Contents, Pull Requests, and Administration permissions)
  • gh CLI available, or curl for direct GitHub API calls
  • Write access to the affected repository

Severity Classification

Before proceeding, classify the exposure:

| Severity | Examples | Response |
|----------|----------|----------|
| Critical | PII, credentials, secrets, API keys | Full procedure + GitHub support escalation (Step 4) |
| High | Strategy docs, pricing, internal plans | Full procedure (Steps 1–3, 5) |
| Medium | Internal process docs, draft content | Squash + cleanup (Steps 1–3) |
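
The classification can be encoded as a small lookup so that a script or agent following this skill picks the same steps each time. This is a minimal sketch, not part of any existing tool: the function name `steps_for_severity` is hypothetical, and treating Critical as Steps 1 to 5 is one reading of "Full procedure + GitHub support escalation".

```shell
# Hypothetical helper: print the workflow steps for a severity level,
# following the Severity Classification table.
steps_for_severity() {
  case "$1" in
    Critical) echo "1 2 3 4 5" ;;  # full procedure plus support escalation
    High)     echo "1 2 3 5" ;;    # full procedure, no escalation
    Medium)   echo "1 2 3" ;;      # squash and cleanup only
    *)        echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

For example, `steps_for_severity High` prints `1 2 3 5`. Step 6 is left out because it is a post-incident follow-up rather than part of the response itself.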

Workflow

Step 1 — Squash and Overwrite the PR Diff

The PR page retains the diff even after the branch is deleted. To scrub it, recreate the branch locally from the PR's base with clean content and force-push before closing the PR. The branch must still exist on the remote, and the PR must still be open, for the force-push to update the diff:

# Clone the repo (or use existing checkout)
git clone https://github.com/{owner}/{repo}.git && cd {repo}
 
# Recreate the branch from the PR's base (usually main).
# -B resets the branch if it already exists in this checkout.
git fetch origin
git checkout -B {branch-name} origin/main
 
# Create a single empty commit (or a commit with sanitised content)
git commit --allow-empty -m "Scrubbed: removed sensitive content from PR #{PR_NUMBER}
 
This branch was recreated to overwrite the PR diff after accidental
exposure of private content.
 
Co-Authored-By: Bot <bot@cascadeguard.com>"
 
# Force-push to overwrite the PR's commit history
git push --force origin {branch-name}

After force-pushing, verify the PR page no longer shows the sensitive diff: https://github.com/{owner}/{repo}/pull/{PR_NUMBER}/files
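
The visual check can be backed by a scripted one. The sketch below assumes the list of files changed in the PR is available one path per line, for example from `gh pr view --json files`. The function name `pr_files_clean` and the path `docs/strategy/pricing.md` are hypothetical.

```shell
# Hypothetical helper: succeed only if none of the given sensitive paths
# appears in the list of changed files read from stdin (one path per line).
pr_files_clean() {
  changed=$(cat)
  for target in "$@"; do
    if printf '%s\n' "$changed" | grep -Fxq -- "$target"; then
      echo "still present in PR diff: $target" >&2
      return 1
    fi
  done
  return 0
}

# Example wiring (needs gh and network access, so it is left commented out):
# gh pr view "$PR_NUMBER" --repo "$OWNER/$REPO" --json files --jq '.files[].path' \
#   | pr_files_clean docs/strategy/pricing.md
```

A passing check only says the current diff is clean. It says nothing about the original commits, which can still be reachable by SHA.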

If credentials were exposed, rotate them immediately; do not wait for the cleanup to finish. Assume they have been compromised from the moment the push became public.

Step 2 — Immediate Containment

Close the PR and delete the branch to reduce visibility:

# Close the PR without merging
gh pr close {PR_NUMBER} --repo {owner}/{repo}
 
# Delete the remote branch (if not the default branch)
gh api repos/{owner}/{repo}/git/refs/heads/{branch-name} -X DELETE

Step 3 — Check if Content Reached the Default Branch

If the PR was merged before being caught:

# Check if the file exists on main
gh api repos/{owner}/{repo}/contents/{file-path} --jq '.sha' 2>/dev/null
 
# If it exists, create a revert PR from an up-to-date main
git fetch origin
git checkout -b revert-sensitive-content origin/main
git rm {file-path}
git commit -m "Remove accidentally committed sensitive content
 
Co-Authored-By: Bot <bot@cascadeguard.com>"
git push origin revert-sensitive-content

For Critical severity, use the standard title:

gh pr create --title "Remove sensitive content" \
  --body "Removing file that should not be in the public repo." \
  --repo {owner}/{repo}

For non-Critical severity (High or Medium — not proceeding to Step 4):

gh pr create --title "Remove low priority sensitive content" \
  --body "Removing file that should not be in the public repo." \
  --repo {owner}/{repo}

Important: Even after removing from main, the file remains in git history. For Critical severity, proceed to Step 4.
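
Removing the file from history itself means rewriting history. One common approach is `git filter-repo`, a separate tool that has to be installed. The sketch below only prints the commands so they can be reviewed before anything is run. The function name `print_history_purge_cmds` and the clone directory `scrub-clone` are hypothetical, and the sequence assumes a fresh clone and permission to force-push every branch.

```shell
# Sketch: print (not run) the commands for purging one path from all history.
# Assumes git-filter-repo is installed; review before running any of them.
print_history_purge_cmds() {
  repo_url=$1   # e.g. https://github.com/{owner}/{repo}.git
  file_path=$2  # path to purge, relative to the repo root
  cat <<EOF
git clone $repo_url scrub-clone && cd scrub-clone
git filter-repo --invert-paths --path '$file_path'
git remote add origin $repo_url
git push --force --all origin
git push --force --tags origin
EOF
}
```

`git filter-repo` removes the `origin` remote from a fresh clone as a safety measure, which is why the sketch adds it back before pushing. Branch protection on the default branch may reject the force-push, and every collaborator needs a fresh clone afterwards.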

Step 4 — GitHub Support Escalation (Critical Severity Only)

For PII, credentials, or legally sensitive data that must be purged from GitHub’s caches and git history:

  1. Contact GitHub Support: https://support.github.com/contact
  2. Subject line: “Request to remove cached sensitive data from repository”
  3. Include:
    • Repository URL
    • Specific commit SHAs containing sensitive data
    • PR numbers affected
    • Nature of the sensitive data (PII, credentials, etc.)
    • Confirmation that the data has been removed from the current branch
  4. For credentials specifically: Also check https://github.com/{owner}/{repo}/security — GitHub’s secret scanning may have already flagged it.

GitHub can purge:

  • Cached PR diffs and commit views
  • Unreachable objects from their storage (after git push --force)
  • Fork network copies (in extreme cases)

Response time is typically 1–3 business days. For urgent PII exposure, mention GDPR/regulatory obligations in the request.
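
To keep requests consistent, the fields listed above can be assembled into a draft before pasting it into the support form. This is a hypothetical helper: `draft_support_request` and its argument order are assumptions. Send the final line only once the removal has actually been done.

```shell
# Hypothetical helper: draft the support request text from the fields
# listed in Step 4. Arguments: repo URL, commit SHAs, PR numbers, data type.
draft_support_request() {
  cat <<EOF
Subject: Request to remove cached sensitive data from repository

Repository: $1
Commit SHAs containing sensitive data: $2
Affected PRs: $3
Nature of the data: $4
The data has been removed from the current branch.
EOF
}
```

The draft names only the kind of data exposed. The sensitive content itself should never be pasted into the request.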

Step 5 — Audit Trail

Log the incident for the team record. Create a file at .ai/incidents/{date}-{slug}.md in the project workspace (e.g. .ai/incidents/2026-04-04-strategy-doc-exposed.md):

## Incident: {date} — {brief description}
 
- **Severity:** {Critical/High/Medium}
- **Repo:** {owner}/{repo}
- **PR/Branch:** #{PR_NUMBER} / {branch-name}
- **Content exposed:** {description — do NOT include the actual sensitive content}
- **Duration of exposure:** {time from push to containment}
- **Actions taken:**
  - [ ] PR diff overwritten via squash force-push
  - [ ] PR closed
  - [ ] Branch deleted
  - [ ] Content removed from default branch (if merged)
  - [ ] Credentials rotated (if applicable)
  - [ ] GitHub support contacted (if Critical)
- **Scrubbed by:** {agent/person}
- **Root cause:** {why it happened}
- **Prevention:** {what changes prevent recurrence}
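
The file name convention can be generated rather than typed by hand. A minimal sketch follows, assuming the slug is simply the brief description lowercased with runs of non-alphanumeric characters collapsed to hyphens. The function name `incident_path` is hypothetical.

```shell
# Hypothetical helper: build the incident log path from a date and a
# free-text description.
incident_path() {
  slug=$(printf '%s' "$2" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
  printf '.ai/incidents/%s-%s.md\n' "$1" "$slug"
}

# Example wiring:
# mkdir -p .ai/incidents
# touch "$(incident_path "$(date +%Y-%m-%d)" "Strategy doc exposed")"
```

Calling `incident_path 2026-04-04 "Strategy doc exposed"` yields `.ai/incidents/2026-04-04-strategy-doc-exposed.md`, matching the example path given earlier in this step.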

Step 6 — Prevention Checks (Post-Incident)

After resolving the immediate issue, consider adding these guardrails:

Shared GitHub Action for path blocking — a reusable workflow that blocks PRs containing changes to sensitive directories (.ai/, docs/plans/, docs/strategy/). This prevents accidental exposure before it reaches a public repo. See CAS-169 for the implementation task.
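
The core of such a workflow is a path check. The sketch below is an illustration only, not the CAS-169 implementation. It reads changed file paths from stdin and fails if any falls under one of the directories named above. The function name `check_blocked_paths` is hypothetical.

```shell
# Sketch of the path check such a workflow might run. Reads changed file
# paths from stdin and fails if any sits under a blocked directory.
# The directory list mirrors the examples above; adjust per repository.
check_blocked_paths() {
  blocked=$(grep -E '^(\.ai/|docs/plans/|docs/strategy/)' || true)
  if [ -n "$blocked" ]; then
    echo "Blocked paths in this change:" >&2
    printf '%s\n' "$blocked" >&2
    return 1
  fi
  return 0
}

# Example wiring in a CI step (assumes origin/main is the base branch):
# git diff --name-only origin/main...HEAD | check_blocked_paths
```

Anchoring the pattern at the start of the path keeps lookalikes such as `docs/planning.md` from being blocked by accident.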

Pre-commit hooks (via .pre-commit-config.yaml):

repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks

CI-level scanning (GitHub Actions):

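# Run after an actions/checkout step with fetch-depth: 0 so the full history is scanned.
# Organisation-owned repos may also need a GITLEAKS_LICENSE secret.
- name: Scan for secrets
  uses: gitleaks/gitleaks-action@v2
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}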

Custom pattern detection — add a .gitleaks.toml to flag organisation-specific sensitive patterns:

[extend]
useDefault = true
 
[[rules]]
id = "cascadeguard-strategy"
description = "Internal strategy document keywords"
regex = '''(?i)(growth.?strategy|pricing.?tier|revenue.?model|internal.?roadmap)'''
path = '''(?i)\.(md|txt|doc)$'''
 
[[rules]]
id = "pii-patterns"
description = "Common PII patterns"
regex = '''(?i)(social.?security|passport.?number|date.?of.?birth|national.?id)'''
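
Before committing a custom rule, it can help to sanity-check the pattern against a few sample strings. The sketch below uses `grep -Ei` as a rough stand-in for the first rule's regex. Gitleaks uses Go's regular expression syntax, so this is only an approximation, and the function name `matches_strategy_rule` is hypothetical.

```shell
# Rough local check of the "cascadeguard-strategy" pattern using grep -E.
# grep's ERE dialect only approximates the Go regex engine gitleaks uses.
matches_strategy_rule() {
  printf '%s\n' "$1" \
    | grep -Eiq 'growth.?strategy|pricing.?tier|revenue.?model|internal.?roadmap'
}

# Example:
# matches_strategy_rule "Draft: Pricing-Tier changes" && echo "would be flagged"
```

Because `.?` allows at most one character between the two words, "growth strategy" and "growth-strategy" match, but "growth and strategy" does not.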

CODEOWNERS, to require review for sensitive paths. This only blocks merges when branch protection has “Require review from Code Owners” enabled:

# Require CTO review for any docs/plans content
docs/plans/ @cascadeguard/cto-team
.ai/projects/ @cascadeguard/cto-team

Notes

  • GitHub caches PR diffs aggressively. The squash force-push in Step 1 is the most reliable way to overwrite what the PR page displays.
  • Even after force-pushing, the original commits may remain as “unreachable objects” in GitHub’s storage for up to 90 days before garbage collection, and anyone who has a commit SHA or a direct link can still view them during that time. For Critical severity, GitHub support escalation is the only way to guarantee full removal.
  • If the repo has forks, the exposed commits may have been fetched by fork owners. GitHub support can help with fork network cleanup in extreme cases.
  • Always assume exposed credentials are compromised. Rotate first, clean up second.