Initial commit with translated description

2026-03-29 13:06:50 +08:00
commit 673a554c43
6 changed files with 724 additions and 0 deletions

182
SKILL.md Normal file

@@ -0,0 +1,182 @@
---
name: humanize-ai-text
description: "Humanize AI-generated text to bypass detection."
allowed-tools:
- Read
- Write
- StrReplace
- Glob
---
# Humanize AI Text
Comprehensive CLI for detecting and transforming AI-generated text to bypass detectors. Based on [Wikipedia's Signs of AI Writing](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing).
## Quick Start
```bash
# Detect AI patterns
python scripts/detect.py text.txt
# Transform to human-like
python scripts/transform.py text.txt -o clean.txt
# Compare before/after
python scripts/compare.py text.txt -o clean.txt
```
---
## Detection Categories
The analyzer checks for **16 pattern categories** from Wikipedia's guide:
### Critical (Immediate AI Detection)
| Category | Examples |
|----------|----------|
| Citation Bugs | `oaicite`, `turn0search`, `contentReference` |
| Knowledge Cutoff | "as of my last training", "based on available information" |
| Chatbot Artifacts | "I hope this helps", "Great question!", "As an AI" |
| Markdown | `**bold**`, `## headers`, ``` code blocks ``` |
### High Signal
| Category | Examples |
|----------|----------|
| AI Vocabulary | delve, tapestry, landscape, pivotal, underscore, foster |
| Significance Inflation | "serves as a testament", "pivotal moment", "indelible mark" |
| Promotional Language | vibrant, groundbreaking, nestled, breathtaking |
| Copula Avoidance | "serves as" instead of "is", "boasts" instead of "has" |
### Medium Signal
| Category | Examples |
|----------|----------|
| Superficial -ing | "highlighting the importance", "fostering collaboration" |
| Filler Phrases | "in order to", "due to the fact that", "Additionally," |
| Vague Attributions | "experts believe", "industry reports suggest" |
| Challenges Formula | "Despite these challenges", "Future outlook" |
### Style Signal
| Category | Examples |
|----------|----------|
| Curly Quotes | `“”` instead of `""` (ChatGPT signature) |
| Em Dash Overuse | Excessive use of — for emphasis |
| Negative Parallelisms | "Not only... but also", "It's not just... it's" |
| Rule of Three | Forced triplets like "innovation, inspiration, and insight" |
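All four tiers ultimately reduce to case-insensitive phrase counting over a flagged-phrase list. A minimal sketch of the idea (function and variable names here are illustrative, not the shipped API):

```python
def count_phrases(text: str, phrases: list[str]) -> dict[str, int]:
    """Count case-insensitive occurrences of each flagged phrase."""
    lower = text.lower()
    return {p: lower.count(p.lower()) for p in phrases if p.lower() in lower}

sample = "This pivotal moment serves as a testament to a rich tapestry of ideas."
hits = count_phrases(sample, ["serves as a testament", "pivotal moment", "rich tapestry", "delve"])
# "delve" is absent from the sample, so it is not reported
```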
---
## Scripts
### detect.py — Scan for AI Patterns
```bash
python scripts/detect.py essay.txt
python scripts/detect.py essay.txt -j # JSON output
python scripts/detect.py essay.txt -s # score only
echo "text" | python scripts/detect.py
```
**Output:**
- Issue count and word count
- AI probability (low/medium/high/very high)
- Breakdown by category
- Auto-fixable patterns marked
### transform.py — Rewrite Text
```bash
python scripts/transform.py essay.txt
python scripts/transform.py essay.txt -o output.txt
python scripts/transform.py essay.txt -a # aggressive
python scripts/transform.py essay.txt -q # quiet
```
**Auto-fixes:**
- Citation bugs (oaicite, turn0search)
- Markdown (**, ##, ```)
- Chatbot sentences
- Copula avoidance → "is/has"
- Filler phrases → simpler forms
- Curly → straight quotes
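The curly-quote fix is a plain character mapping (the shipped transform.py applies `str.replace` over the `curly_quotes` table in patterns.json); the same idea sketched with `str.translate`:

```python
# Map curly quotes to their straight equivalents (same pairs as patterns.json)
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
})

def straighten_quotes(text: str) -> str:
    return text.translate(CURLY_TO_STRAIGHT)
```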
**Aggressive (-a):**
- Simplifies -ing clauses
- Reduces em dashes
### compare.py — Before/After Analysis
```bash
python scripts/compare.py essay.txt
python scripts/compare.py essay.txt -a -o clean.txt
```
Shows side-by-side detection scores before and after transformation.
---
## Workflow
1. **Scan** for detection risk:
```bash
python scripts/detect.py document.txt
```
2. **Transform** with comparison:
```bash
python scripts/compare.py document.txt -o document_v2.txt
```
3. **Verify** improvement:
```bash
python scripts/detect.py document_v2.txt -s
```
4. **Manual review** for AI vocabulary and promotional language (requires judgment)
---
## AI Probability Scoring
| Rating | Criteria |
|--------|----------|
| Very High | Citation bugs, knowledge cutoff, or chatbot artifacts present |
| High | >30 issues OR >5% issue density |
| Medium | >15 issues OR >2% issue density |
| Low | <15 issues AND <2% density |
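The table above can be sketched as a small function (a simplified mirror of the logic in detect.py, which computes `total_issues` as a weighted sum before applying these thresholds):

```python
def ai_probability(total_issues: int, word_count: int, has_critical: bool) -> str:
    """Critical artifacts trump everything; otherwise rate by count or density."""
    density = total_issues / max(word_count, 1) * 100  # issues per 100 words
    if has_critical:
        return "very high"
    if total_issues > 30 or density > 5:
        return "high"
    if total_issues > 15 or density > 2:
        return "medium"
    return "low"
```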
---
## Customizing Patterns
Edit `scripts/patterns.json` to add/modify:
- `ai_vocabulary` — words to flag
- `significance_inflation` — puffery phrases
- `promotional_language` — marketing speak
- `copula_avoidance` — phrase → replacement
- `filler_replacements` — phrase → simpler form
- `chatbot_artifacts` — phrases triggering sentence removal
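For example, extending a pattern list before re-running detection (an in-memory sketch; in practice you would edit scripts/patterns.json directly, and the word "journey" is a hypothetical addition):

```python
# Trimmed stand-in for the real patterns.json contents
patterns = {"ai_vocabulary": ["delve", "tapestry"]}

for word in ("journey", "delve"):  # "journey" is new; "delve" is already flagged
    if word not in patterns["ai_vocabulary"]:
        patterns["ai_vocabulary"].append(word)
```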
---
## Batch Processing
```bash
# Scan all files
for f in *.txt; do
echo "=== $f ==="
python scripts/detect.py "$f" -s
done
# Transform all markdown
for f in *.md; do
python scripts/transform.py "$f" -a -o "${f%.md}_clean.md" -q
done
```
---
## Reference
Based on Wikipedia's [Signs of AI Writing](https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing), maintained by WikiProject AI Cleanup. Patterns documented from thousands of AI-generated text examples.
Key insight: "LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely result that applies to the widest variety of cases."

6
_meta.json Normal file

@@ -0,0 +1,6 @@
{
  "ownerId": "kn773q4w556s7xgbbh32dw52rx80cncp",
  "slug": "humanize-ai-text",
  "version": "1.0.1",
  "publishedAt": 1771756867954
}

58
scripts/compare.py Normal file

@@ -0,0 +1,58 @@
#!/usr/bin/env python3
"""Compare before/after transformation with side-by-side detection scores."""
import argparse, sys
from pathlib import Path
from detect import detect
from transform import transform


def main():
    parser = argparse.ArgumentParser(description="Compare AI detection before/after transformation")
    parser.add_argument("input", nargs="?", help="Input file (or stdin)")
    parser.add_argument("-a", "--aggressive", action="store_true", help="Use aggressive mode")
    parser.add_argument("-o", "--output", help="Save transformed text to file")
    args = parser.parse_args()
    text = Path(args.input).read_text() if args.input else sys.stdin.read()
    before = detect(text)
    transformed, changes = transform(text, aggressive=args.aggressive)
    after = detect(transformed)
    icons = {"very high": "🔴", "high": "🟠", "medium": "🟡", "low": "🟢"}
    print(f"\n{'='*60}")
    print("BEFORE → AFTER COMPARISON")
    print(f"{'='*60}\n")
    print(f"{'Metric':<25} {'Before':<15} {'After':<15} {'Change':<10}")
    print(f"{'-'*60}")
    issue_diff = after.total_issues - before.total_issues
    issue_sign = "+" if issue_diff > 0 else ""
    print(f"{'Issues':<25} {before.total_issues:<15} {after.total_issues:<15} {issue_sign}{issue_diff}")
    print(f"{'AI Probability':<25} {icons.get(before.ai_probability, '')} {before.ai_probability:<12} {icons.get(after.ai_probability, '')} {after.ai_probability:<12}")
    print(f"{'Word Count':<25} {before.word_count:<15} {after.word_count:<15} {after.word_count - before.word_count:+}")
    if changes:
        print(f"\n{'='*60}")
        print(f"TRANSFORMATIONS ({len(changes)})")
        print(f"{'='*60}")
        for c in changes:
            print(f"  {c}")
    reduction = before.total_issues - after.total_issues
    if reduction > 0:
        pct = (reduction / before.total_issues * 100) if before.total_issues else 0
        print(f"\n✓ Reduced {reduction} issues ({pct:.0f}% improvement)")
    elif reduction < 0:
        print(f"\n⚠ Issues increased by {-reduction}")
    else:
        print("\n— No change in issue count")
    if args.output:
        Path(args.output).write_text(transformed)
        print(f"\n→ Saved to {args.output}")


if __name__ == "__main__":
    main()

160
scripts/detect.py Normal file

@@ -0,0 +1,160 @@
#!/usr/bin/env python3
"""Detect AI patterns in text based on Wikipedia's Signs of AI Writing."""
import argparse, json, re, sys
from pathlib import Path
from dataclasses import dataclass, field

SCRIPT_DIR = Path(__file__).parent
PATTERNS = json.loads((SCRIPT_DIR / "patterns.json").read_text())


@dataclass
class DetectionResult:
    significance_inflation: list = field(default_factory=list)
    notability_emphasis: list = field(default_factory=list)
    superficial_analysis: list = field(default_factory=list)
    promotional_language: list = field(default_factory=list)
    vague_attributions: list = field(default_factory=list)
    challenges_formula: list = field(default_factory=list)
    ai_vocabulary: list = field(default_factory=list)
    copula_avoidance: list = field(default_factory=list)
    filler_phrases: list = field(default_factory=list)
    chatbot_artifacts: list = field(default_factory=list)
    hedging_phrases: list = field(default_factory=list)
    negative_parallelisms: list = field(default_factory=list)
    rule_of_three: list = field(default_factory=list)
    markdown_artifacts: list = field(default_factory=list)
    citation_bugs: list = field(default_factory=list)
    knowledge_cutoff: list = field(default_factory=list)
    curly_quotes: int = 0
    em_dashes: int = 0
    total_issues: int = 0
    ai_probability: str = "low"
    word_count: int = 0


def find_matches(text: str, patterns: list) -> list:
    matches, lower = [], text.lower()
    for p in patterns:
        count = lower.count(p.lower())
        if count > 0:
            matches.append((p, count))
    return sorted(matches, key=lambda x: -x[1])


def detect(text: str) -> DetectionResult:
    r = DetectionResult()
    r.word_count = len(text.split())
    r.significance_inflation = find_matches(text, PATTERNS["significance_inflation"])
    r.notability_emphasis = find_matches(text, PATTERNS["notability_emphasis"])
    r.superficial_analysis = find_matches(text, PATTERNS["superficial_analysis"])
    r.promotional_language = find_matches(text, PATTERNS["promotional_language"])
    r.vague_attributions = find_matches(text, PATTERNS["vague_attributions"])
    r.challenges_formula = find_matches(text, PATTERNS["challenges_formula"])
    r.ai_vocabulary = find_matches(text, PATTERNS["ai_vocabulary"])
    r.copula_avoidance = find_matches(text, list(PATTERNS["copula_avoidance"].keys()))
    r.filler_phrases = find_matches(text, list(PATTERNS["filler_replacements"].keys()))
    r.chatbot_artifacts = find_matches(text, PATTERNS["chatbot_artifacts"])
    r.hedging_phrases = find_matches(text, PATTERNS["hedging_phrases"])
    r.negative_parallelisms = find_matches(text, PATTERNS["negative_parallelisms"])
    r.rule_of_three = find_matches(text, PATTERNS["rule_of_three_patterns"])
    r.markdown_artifacts = find_matches(text, PATTERNS["markdown_artifacts"])
    r.citation_bugs = find_matches(text, PATTERNS["citation_bugs"])
    r.knowledge_cutoff = find_matches(text, PATTERNS["knowledge_cutoff"])
    r.curly_quotes = len(re.findall("[\u201c\u201d\u2018\u2019]", text))
    r.em_dashes = text.count("—") + text.count(" -- ")
    # Weighted sum: citation bugs ×5, chatbot artifacts/knowledge cutoff ×3, markdown ×2, rest ×1
    r.total_issues = (
        sum(c for _, c in r.significance_inflation) + sum(c for _, c in r.notability_emphasis) +
        sum(c for _, c in r.superficial_analysis) + sum(c for _, c in r.promotional_language) +
        sum(c for _, c in r.vague_attributions) + sum(c for _, c in r.challenges_formula) +
        sum(c for _, c in r.ai_vocabulary) + sum(c for _, c in r.copula_avoidance) +
        sum(c for _, c in r.filler_phrases) + sum(c for _, c in r.chatbot_artifacts) * 3 +
        sum(c for _, c in r.hedging_phrases) + sum(c for _, c in r.negative_parallelisms) +
        sum(c for _, c in r.markdown_artifacts) * 2 + sum(c for _, c in r.citation_bugs) * 5 +
        sum(c for _, c in r.knowledge_cutoff) * 3 + r.curly_quotes + (r.em_dashes if r.em_dashes > 3 else 0)
    )
    density = r.total_issues / max(r.word_count, 1) * 100
    if r.citation_bugs or r.knowledge_cutoff or r.chatbot_artifacts:
        r.ai_probability = "very high"
    elif density > 5 or r.total_issues > 30:
        r.ai_probability = "high"
    elif density > 2 or r.total_issues > 15:
        r.ai_probability = "medium"
    return r


def print_section(title: str, items: list, replacements: dict = None):
    if not items:
        return
    print(f"{title}:")
    for phrase, count in items:
        if replacements and phrase in replacements:
            repl = replacements[phrase]
            arrow = f' → "{repl}"' if repl else " → (remove)"
            print(f"  \"{phrase}\"{arrow}: {count}x")
        else:
            print(f"  {phrase}: {count}x")
    print()


def print_report(r: DetectionResult):
    icons = {"very high": "🔴", "high": "🟠", "medium": "🟡", "low": "🟢"}
    print(f"\n{'='*60}")
    print(f"AI DETECTION SCAN - {r.total_issues} issues ({r.word_count} words)")
    print(f"AI Probability: {icons.get(r.ai_probability, '')} {r.ai_probability.upper()}")
    print(f"{'='*60}\n")
    if r.citation_bugs:
        print("⚠️ CRITICAL: CHATGPT CITATION BUGS")
        print_section("Citation Artifacts", r.citation_bugs)
    if r.knowledge_cutoff:
        print("⚠️ CRITICAL: KNOWLEDGE CUTOFF PHRASES")
        print_section("Cutoff Phrases", r.knowledge_cutoff)
    if r.chatbot_artifacts:
        print("⚠️ HIGH: CHATBOT ARTIFACTS")
        print_section("Artifacts", r.chatbot_artifacts)
    if r.markdown_artifacts:
        print("⚠️ MARKDOWN DETECTED")
        print_section("Markdown", r.markdown_artifacts)
    print_section("SIGNIFICANCE INFLATION", r.significance_inflation)
    print_section("PROMOTIONAL LANGUAGE", r.promotional_language)
    print_section("AI VOCABULARY", r.ai_vocabulary)
    print_section("SUPERFICIAL -ING", r.superficial_analysis)
    print_section("COPULA AVOIDANCE", r.copula_avoidance, PATTERNS["copula_avoidance"])
    print_section("FILLER PHRASES", r.filler_phrases, PATTERNS["filler_replacements"])
    print_section("VAGUE ATTRIBUTIONS", r.vague_attributions)
    print_section("CHALLENGES FORMULA", r.challenges_formula)
    print_section("HEDGING", r.hedging_phrases)
    print_section("NEGATIVE PARALLELISMS", r.negative_parallelisms)
    print_section("NOTABILITY EMPHASIS", r.notability_emphasis)
    if r.curly_quotes:
        print(f"CURLY QUOTES: {r.curly_quotes} (ChatGPT signature)\n")
    if r.em_dashes > 3:
        print(f"EM DASHES: {r.em_dashes} (excessive)\n")
    if r.total_issues == 0:
        print("✓ No AI patterns detected.\n")


def main():
    parser = argparse.ArgumentParser(description="Detect AI patterns in text")
    parser.add_argument("input", nargs="?", help="Input file (or stdin)")
    parser.add_argument("--json", "-j", action="store_true", help="JSON output")
    parser.add_argument("--score-only", "-s", action="store_true", help="Score and probability only")
    args = parser.parse_args()
    text = Path(args.input).read_text() if args.input else sys.stdin.read()
    result = detect(text)
    if args.json:
        print(json.dumps({
            "total_issues": result.total_issues, "word_count": result.word_count,
            "ai_probability": result.ai_probability, "significance_inflation": result.significance_inflation,
            "promotional_language": result.promotional_language, "ai_vocabulary": result.ai_vocabulary,
            "chatbot_artifacts": result.chatbot_artifacts, "citation_bugs": result.citation_bugs,
            "filler_phrases": result.filler_phrases, "curly_quotes": result.curly_quotes, "em_dashes": result.em_dashes,
        }, indent=2))
    elif args.score_only:
        print(f"Issues: {result.total_issues} | Words: {result.word_count} | AI: {result.ai_probability}")
    else:
        print_report(result)


if __name__ == "__main__":
    main()

191
scripts/patterns.json Normal file

@@ -0,0 +1,191 @@
{
  "significance_inflation": [
    "stands as", "serves as", "is a testament", "is a reminder",
    "vital role", "significant role", "crucial role", "pivotal role",
    "key role", "pivotal moment", "key moment", "key turning point",
    "underscores its importance", "highlights its importance",
    "underscores its significance", "highlights its significance",
    "reflects broader", "symbolizing its ongoing", "symbolizing its enduring",
    "symbolizing its lasting", "contributing to the", "setting the stage for",
    "marking the", "shaping the", "represents a shift", "marks a shift",
    "evolving landscape", "focal point", "indelible mark", "deeply rooted",
    "enduring legacy", "rich tapestry", "broader movement"
  ],
  "notability_emphasis": [
    "independent coverage", "local media outlets", "regional media outlets",
    "national media outlets", "music outlets", "business outlets", "tech outlets",
    "profiled in", "written by a leading expert", "active social media presence",
    "has been featured in", "has been cited in", "maintains a strong digital presence"
  ],
  "superficial_analysis": [
    "highlighting", "underscoring", "emphasizing", "ensuring",
    "reflecting", "symbolizing", "contributing to", "cultivating",
    "fostering", "encompassing", "showcasing", "valuable insights",
    "align with", "aligns with", "resonate with", "resonates with"
  ],
  "promotional_language": [
    "boasts a", "boasts an", "vibrant", "rich cultural heritage",
    "profound", "enhancing its", "exemplifies", "commitment to",
    "natural beauty", "nestled", "in the heart of", "groundbreaking",
    "renowned", "breathtaking", "must-visit", "stunning", "bustling",
    "game-changing", "cutting-edge", "state-of-the-art", "world-class",
    "best-in-class", "industry-leading", "innovative", "revolutionary"
  ],
  "vague_attributions": [
    "industry reports", "observers have cited", "experts argue",
    "experts believe", "some critics argue", "several sources",
    "several publications", "according to experts", "widely regarded",
    "it is widely believed", "many believe", "some would say"
  ],
  "challenges_formula": [
    "despite its", "faces several challenges", "despite these challenges",
    "challenges and legacy", "future outlook", "future prospects",
    "looking ahead", "moving forward", "going forward"
  ],
  "ai_vocabulary": [
    "additionally", "align with", "crucial", "delve", "emphasizing",
    "enduring", "enhance", "fostering", "garner", "highlight",
    "interplay", "intricate", "intricacies", "key", "landscape",
    "pivotal", "showcase", "showcasing", "tapestry", "testament",
    "underscore", "underscores", "valuable", "vibrant", "nuanced",
    "multifaceted", "paradigm", "synergy", "realm", "underpins",
    "unraveling", "unveiling", "leveraging", "furthermore", "moreover",
    "consequently", "subsequently", "henceforth", "thereby", "wherein",
    "thereof", "whatsoever", "nevertheless", "notwithstanding"
  ],
  "copula_avoidance": {
    "serves as a": "is a",
    "serves as an": "is an",
    "serves as the": "is the",
    "stands as a": "is a",
    "stands as an": "is an",
    "stands as the": "is the",
    "marks a": "is a",
    "marks an": "is an",
    "marks the": "is the",
    "represents a": "is a",
    "represents an": "is an",
    "represents the": "is the",
    "boasts a": "has a",
    "boasts an": "has an",
    "boasts the": "has the",
    "features a": "has a",
    "features an": "has an",
    "features the": "has the",
    "offers a": "has a",
    "offers an": "has an"
  },
  "filler_replacements": {
    "in order to": "to",
    "due to the fact that": "because",
    "at this point in time": "now",
    "at the present time": "now",
    "has the ability to": "can",
    "it is important to note that": "",
    "it should be noted that": "",
    "it is worth noting that": "",
    "it is crucial to note that": "",
    "it is critical to remember that": "",
    "it goes without saying that": "",
    "needless to say": "",
    "Additionally,": "",
    "Furthermore,": "",
    "Moreover,": "",
    "In conclusion,": "",
    "To summarize,": "",
    "In summary,": "",
    "Overall,": "",
    "utilize": "use",
    "utilizes": "uses",
    "utilizing": "using",
    "utilization": "use",
    "leverage": "use",
    "leverages": "uses",
    "leveraging": "using",
    "facilitate": "help",
    "facilitates": "helps",
    "facilitating": "helping",
    "implement": "add",
    "implements": "adds",
    "prioritize": "focus on",
    "prioritizes": "focuses on",
    "optimize": "improve",
    "optimizes": "improves",
    "streamline": "simplify",
    "streamlines": "simplifies"
  },
  "chatbot_artifacts": [
    "I hope this helps",
    "Let me know if",
    "Would you like me to",
    "Great question",
    "Excellent question",
    "You're absolutely right",
    "That's a great point",
    "That's an excellent point",
    "Certainly!",
    "Of course!",
    "Absolutely!",
    "Happy to help",
    "I'd be happy to",
    "Feel free to",
    "Don't hesitate to",
    "Here is a",
    "Here's a",
    "I can help you with",
    "As an AI",
    "As a language model",
    "As an AI language model"
  ],
  "hedging_phrases": [
    "it could potentially",
    "it might possibly",
    "arguably",
    "it could be argued that",
    "some would say",
    "in some ways",
    "to some extent",
    "in certain respects",
    "may vary",
    "results may vary"
  ],
  "negative_parallelisms": [
    "not only", "but also",
    "it's not just about", "it's about",
    "it is not merely", "it is",
    "not just", "but",
    "no longer", "instead"
  ],
  "rule_of_three_patterns": [
    "innovation, inspiration, and",
    "engage, educate, and",
    "plan, execute, and",
    "design, develop, and",
    "research, develop, and",
    "create, collaborate, and",
    "learn, grow, and"
  ],
  "markdown_artifacts": [
    "**", "##", "###", "```",
    "* **", "- **", "1. **"
  ],
  "citation_bugs": [
    "oaicite", "oai_citation", "contentReference",
    "turn0search", "turn0image", "utm_source=chatgpt",
    "utm_source=openai", "attached_file", "grok_card"
  ],
  "knowledge_cutoff": [
    "as of my last", "as of my knowledge",
    "up to my last training", "based on available information",
    "while specific details are limited",
    "not widely available", "not widely documented",
    "in the provided sources", "in available sources"
  ],
  "curly_quotes": {
    "\u201c": "\"",
    "\u201d": "\"",
    "\u2018": "'",
    "\u2019": "'"
  },
  "em_dash_patterns": ["—", " — ", "-- "]
}

127
scripts/transform.py Normal file

@@ -0,0 +1,127 @@
#!/usr/bin/env python3
"""Transform AI text to bypass detection."""
import argparse, json, re, sys
from pathlib import Path

SCRIPT_DIR = Path(__file__).parent
PATTERNS = json.loads((SCRIPT_DIR / "patterns.json").read_text())


def replace_bounded(text: str, old: str, new: str) -> tuple[str, int]:
    # Multi-word phrases (and comma-suffixed sentence adverbs) match literally;
    # single words get \b boundaries so "key" does not match inside "monkey".
    if " " in old or old.endswith(","):
        pattern = re.compile(re.escape(old), re.IGNORECASE)
    else:
        pattern = re.compile(r"\b" + re.escape(old) + r"\b", re.IGNORECASE)
    matches = pattern.findall(text)
    return (pattern.sub(new, text) if matches else text), len(matches)


def apply_replacements(text: str, replacements: dict) -> tuple[str, list]:
    changes = []
    for old, new in replacements.items():
        text, count = replace_bounded(text, old, new)
        if count:
            changes.append(f'"{old}" → "{new}"' if new else f'"{old}" removed')
    return text, changes


def fix_quotes(text: str) -> tuple[str, bool]:
    original = text
    for old, new in PATTERNS["curly_quotes"].items():
        text = text.replace(old, new)
    return text, text != original


def remove_chatbot_sentences(text: str) -> tuple[str, list]:
    changes = []
    for artifact in PATTERNS["chatbot_artifacts"]:
        # Drop the whole sentence containing the artifact
        pattern = re.compile(r"[^.!?\n]*" + re.escape(artifact) + r"[^.!?\n]*[.!?]?\s*", re.IGNORECASE)
        if pattern.search(text):
            changes.append(f'Removed "{artifact}" sentence')
            text = pattern.sub("", text)
    return text, changes


def strip_markdown(text: str) -> tuple[str, list]:
    changes = []
    if "**" in text:
        text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text)
        changes.append("Stripped bold")
    if re.search(r'^#{1,6}\s', text, re.MULTILINE):
        text = re.sub(r'^#{1,6}\s+', '', text, flags=re.MULTILINE)
        changes.append("Stripped headers")
    if "```" in text:
        text = re.sub(r'```\w*\n?', '', text)
        changes.append("Stripped code blocks")
    return text, changes


def reduce_em_dashes(text: str) -> tuple[str, int]:
    count = text.count("—") + text.count(" -- ")
    text = re.sub(r"\s*—\s*", ", ", text)
    text = re.sub(r"\s+--\s+", ", ", text)
    return text, count


def remove_citations(text: str) -> tuple[str, list]:
    changes = []
    patterns = [
        (r'\[oai_citation:\d+[^\]]*\]\([^)]+\)', "oai_citation"),
        (r':contentReference\[oaicite:\d+\]\{[^}]+\}', "contentReference"),
        (r'turn0search\d+', "turn0search"),
        (r'turn0image\d+', "turn0image"),
        (r'\?utm_source=(chatgpt\.com|openai)', "ChatGPT UTM"),
    ]
    for pattern, name in patterns:
        if re.search(pattern, text):
            text = re.sub(pattern, '', text)
            changes.append(f"Removed {name}")
    return text, changes


def simplify_ing(text: str) -> tuple[str, list]:
    changes = []
    for word in ["highlighting", "underscoring", "emphasizing", "showcasing", "fostering"]:
        pattern = re.compile(rf',?\s*{word}\s+[^,.]+[,.]', re.IGNORECASE)
        if pattern.search(text):
            text = pattern.sub('. ', text)
            changes.append(f"Simplified {word} clause")
    return text, changes


def clean(text: str) -> str:
    text = re.sub(r" +", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    text = re.sub(r",\s*,", ",", text)
    # Re-capitalize sentence starts left lowercase by removals
    text = re.sub(r"(^|[.!?]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), text)
    return text.strip()


def transform(text: str, aggressive: bool = False) -> tuple[str, list]:
    all_changes = []
    text, changes = remove_citations(text); all_changes.extend(changes)
    text, changes = strip_markdown(text); all_changes.extend(changes)
    text, changes = remove_chatbot_sentences(text); all_changes.extend(changes)
    text, changes = apply_replacements(text, PATTERNS["copula_avoidance"]); all_changes.extend(changes)
    text, changes = apply_replacements(text, PATTERNS["filler_replacements"]); all_changes.extend(changes)
    text, fixed = fix_quotes(text)
    if fixed:
        all_changes.append("Fixed curly quotes")
    if aggressive:
        text, changes = simplify_ing(text); all_changes.extend(changes)
        text, count = reduce_em_dashes(text)
        if count > 2:
            all_changes.append(f"Replaced {count} em dashes")
    return clean(text), all_changes


def main():
    parser = argparse.ArgumentParser(description="Transform AI text to human-like")
    parser.add_argument("input", nargs="?", help="Input file (or stdin)")
    parser.add_argument("-o", "--output", help="Output file")
    parser.add_argument("-a", "--aggressive", action="store_true", help="Aggressive mode")
    parser.add_argument("-q", "--quiet", action="store_true", help="Suppress change log")
    args = parser.parse_args()
    text = Path(args.input).read_text() if args.input else sys.stdin.read()
    result, changes = transform(text, aggressive=args.aggressive)
    if not args.quiet and changes:
        print(f"CHANGES ({len(changes)}):", file=sys.stderr)
        for c in changes:
            print(f"  {c}", file=sys.stderr)
    if args.output:
        Path(args.output).write_text(result)
        if not args.quiet:
            print(f"→ {args.output}", file=sys.stderr)
    else:
        print(result)


if __name__ == "__main__":
    main()