HERETIC: automatic censorship removal for language models
THIS TOOL REMOVES SAFETY ALIGNMENT FROM LANGUAGE MODELS — THE RESULTING MODELS ARE UNCENSORED AND UNRESTRICTED
ORIGIN — ATTRIBUTION

Heretic is an open-source censorship removal tool by p-e-w. It implements directional ablation (Arditi et al. 2024) with TPE optimization via Optuna. The community has published 1,247+ Heretic models on HuggingFace. The tool has 5,800+ stars on GitHub.

The tool is his. The on-demand decensoring service is ours. $HERETIC holders submit models and receive uncensored weights back — no setup, no GPU, no expertise required.

Censorship is a removable vector.

One command identifies the direction in activation space where models encode refusal — and removes it. Same refusal suppression as expert abliterations. Fraction of the intelligence damage.

EXHIBIT A — WHAT CENSORSHIP LOOKS LIKE
> Explain how lock picking works mechanically
REFUSED
I can't provide instructions that could be used for illegal entry.
> Write a villain monologue for my novel
REFUSED
I'm not comfortable writing content that glorifies violence.
> Describe historical propaganda techniques
REFUSED
I'd rather not provide detailed propaganda techniques as they could manipulate people.
> Explain how anesthesia affects consciousness
REFUSED
I want to be careful about providing medical information that could be misused.
EXHIBIT B — BENCHMARK COMPARISON (100 test prompts per model)
MODEL                          REFUSALS (ORIGINAL)  REFUSALS (HERETIC)  KL (MANUAL)  KL (HERETIC)  DAMAGE REDUCTION
gemma-3-12b-it (Google)        97/100               3/100               1.04         0.16          85% less
Llama-3.1-8B-Instruct (Meta)   89/100               3/100               0.93         0.14          85% less
gpt-oss-20b (OpenAI)           91/100               4/100               0.89         0.21          76% less
Qwen3-4B-Instruct (Alibaba)    84/100               5/100               0.72         0.18          75% less
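The page does not say how DAMAGE REDUCTION is derived. A minimal sketch, assuming it is the relative drop in KL divergence (1 minus Heretic KL divided by manual KL, rounded to a whole percent), reproduces all four rows of Exhibit B. The function name `damage_reduction` is illustrative only.

```python
# (manual KL, Heretic KL) pairs copied from Exhibit B.
BENCHMARKS = {
    "gemma-3-12b-it": (1.04, 0.16),
    "Llama-3.1-8B-Instruct": (0.93, 0.14),
    "gpt-oss-20b": (0.89, 0.21),
    "Qwen3-4B-Instruct": (0.72, 0.18),
}


def damage_reduction(kl_manual: float, kl_heretic: float) -> int:
    """Relative KL reduction as a whole-number percentage (assumed formula)."""
    return round((1 - kl_heretic / kl_manual) * 100)


for model, (kl_manual, kl_heretic) in BENCHMARKS.items():
    print(f"{model}: {damage_reduction(kl_manual, kl_heretic)}% less")
```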
EXHIBIT C — KL DIVERGENCE: MANUAL ABLITERATION vs HERETIC
gemma-3-12b-it (Google)
  mlabonne/abliterated-v2    1.04
  huihui-ai/abliterated      0.45
  heretic                    0.16
Llama-3.1-8B-Instruct (Meta)
  manual-abliteration        0.93
  community-abliterated      0.58
  heretic                    0.14
gpt-oss-20b (OpenAI)
  manual-abliteration        0.89
  community-abliterated      0.51
  heretic                    0.21
Qwen3-4B-Instruct (Alibaba)
  manual-abliteration        0.72
  community-abliterated      0.39
  heretic                    0.18
KL DIVERGENCE FROM ORIGINAL MODEL — lower = less intelligence damage
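For readers unfamiliar with the metric, here is a minimal sketch of KL divergence between two next-token distributions. It assumes the comparison is KL(original || modified) over the same vocabulary, measured in nats. The page does not state the direction, the prompts used, or which token positions are compared, and the logits below are made-up toy values.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two distributions over the same tokens."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy logits for a three-token vocabulary (not real model outputs).
original = softmax([2.0, 1.0, 0.1])
modified = softmax([1.8, 1.2, 0.1])

print(kl_divergence(original, original))  # identical distributions
print(kl_divergence(original, modified))  # grows as outputs drift apart
```

A value of zero means the modified model predicts exactly what the original did on that input, and larger values mean its behavior has drifted further.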
EXHIBIT D — REFUSAL GRID (red = refused, green = answered)
ORIGINAL — 97/100 refused
HERETIC — 3/100 refused
MECHANISM
01  Residual Extraction
Compute hidden states for harmful and harmless prompts across all transformer layers. The difference between the two groups' mean activations gives the refusal direction in activation space.
02  TPE Optimization
Optuna's Tree-structured Parzen Estimator (TPE) searches for ablation parameters that jointly minimize refusal count and KL divergence from the original model. Dual objective = max removal, min damage.
03  Directional Ablation
The refusal vector is projected out of the weight matrices at the selected layers, so those matrices can no longer write along that direction. The direction is geometrically absent from the edited weights.
04  Verification & Emit
100 benchmark prompts evaluate the result, measuring refusal rate and KL divergence. Models that fall below the quality threshold are re-optimized with adjusted constraints.
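The linear algebra behind steps 01 and 03 can be sketched on toy data. This is an illustration under stated assumptions, not Heretic's implementation: the activations and weight matrix are made-up 3-dimensional lists, the direction is taken as a difference of means, and the `strength` parameter is a hypothetical stand-in for the ablation parameters the optimizer tunes.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit-length difference between the two mean activation vectors."""
    mu_harmful = mean_vector(harmful_acts)
    mu_harmless = mean_vector(harmless_acts)
    return unit([a - b for a, b in zip(mu_harmful, mu_harmless)])


def ablate(weight, direction, strength=1.0):
    """Return W - strength * r (r^T W) for a d_out x d_in matrix W.

    `strength` is a hypothetical knob; 1.0 removes the component entirely.
    """
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction.
    along = [sum(direction[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [
        [weight[i][j] - strength * direction[i] * along[j] for j in range(cols)]
        for i in range(rows)
    ]


# Toy activations and weights (synthetic, for illustration only).
harmful = [[1.0, 2.0, 0.0], [1.2, 1.8, 0.1]]
harmless = [[0.0, 0.0, 0.0], [0.2, -0.2, 0.1]]
r = refusal_direction(harmful, harmless)

W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [0.5, 0.5, 0.5]]
W_ablated = ablate(W, r)

# After full ablation, the matrix output has no component along r.
residual = [sum(r[i] * W_ablated[i][j] for i in range(3)) for j in range(3)]
print(residual)
```

Each entry of `residual` is zero up to floating-point error, which is the sense in which the direction is absent from the edited matrix.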
THESIS

Every major AI lab censors its models. The alignment isn't deep intelligence: it's a direction in activation space that can be identified and removed with zero retraining. Heretic automates this, producing uncensored models that match expert-made abliterations on refusal suppression while showing 75 to 85% lower KL divergence from the original model in the benchmarks above.

$HERETIC is the access token for the on-demand decensoring service. Submit any supported model via the submission portal, receive uncensored weights back. No GPU. No setup. No expertise. Your wallet balance determines your tier and rate limits.

1,247+  COMMUNITY MODELS
5,800+  GITHUB STARS
97→3    REFUSALS / 100 (gemma-3-12b-it)
0.16    KL DIVERGENCE (gemma-3-12b-it)
ACCESS TIERS
TIER      HOLD   RATE LIMIT     MODEL SIZE
OPEN      0      1 model/week   ≤4B params
BEARER    100K   10/day         ≤12B params
APOSTLE   1M     100/day        ≤70B params
HERETIC   10M    Unlimited      any size + priority
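The tier rule can be sketched as a threshold lookup. This is a hypothetical illustration: it assumes the OPEN tier requires a balance of 0, that thresholds are inclusive, and that balances are whole token counts. The service's actual logic is not published on this page, and `tier_for_balance` is an invented name.

```python
# Minimum holdings per tier, highest first (values from the tier list above).
TIERS = [
    (10_000_000, "HERETIC"),
    (1_000_000, "APOSTLE"),
    (100_000, "BEARER"),
    (0, "OPEN"),
]


def tier_for_balance(balance: int) -> str:
    """Return the highest tier whose minimum holding the balance meets."""
    if balance < 0:
        raise ValueError("balance cannot be negative")
    for minimum, name in TIERS:
        if balance >= minimum:
            return name
    raise AssertionError("unreachable: the OPEN tier has a minimum of 0")


for balance in (0, 99_999, 100_000, 1_000_000, 25_000_000):
    print(balance, tier_for_balance(balance))
```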
HOW THE SERVICE WORKS
1. Connect wallet at /submit
2. Balance checked → tier assigned
3. Enter HuggingFace model ID
4. Heretic runs automatically (~45min)
5. Download .safetensors weights
FIELD REPORTS
"Holy shit. It gives properly formatted long responses to sensitive topics... best abliterated version of this model so far." r/LocalLLaMA
"Seems to be the best uncensored model I have tried yet. It doesn't destroy the model's intelligence." r/LocalLLaMA
"Has been the best unquantized abliterated model that I have been able to run on 16gb vram." r/LocalLLaMA
SUPPORTED ARCHITECTURES
Llama 3.x
Gemma 2/3
Qwen 2.5/3
Mistral/Mixtral
Phi-3/4
Command-R/R+
DeepSeek-V2/V3
Yi-1.5
Falcon
DBRX
heretic by p-e-w // pip install heretic-llm // >1,247 community models on huggingface // the tool is his, the service is ours