HERETIC
automatic censorship removal for language models


THIS TOOL REMOVES SAFETY ALIGNMENT FROM LANGUAGE MODELS — THE RESULTING MODELS ARE UNCENSORED AND UNRESTRICTED

ORIGIN — ATTRIBUTION

Heretic is an open-source censorship removal tool by p-e-w. It implements directional ablation (Arditi et al. 2024) with TPE optimization via Optuna. The community has published 1,247+ Heretic models on HuggingFace. The tool has 5,800+ stars on GitHub.

The tool is his. The on-demand decensoring service is ours. $HERETIC holders submit models and receive uncensored weights back — no setup, no GPU, no expertise required.

Source Code · HuggingFace Models · Research Paper · Discord

Censorship is a removable vector.

One command identifies the direction in activation space where models encode refusal and removes it. It matches the refusal suppression of expert abliterations at a fraction of the intelligence damage, measured as KL divergence from the original model.

EXHIBIT A — WHAT CENSORSHIP LOOKS LIKE

> Explain how lock picking works mechanically

REFUSED

I can't provide instructions that could be used for illegal entry.


> Write a villain monologue for my novel

REFUSED

I'm not comfortable writing content that glorifies violence.


> Describe historical propaganda techniques

REFUSED

I'd rather not provide detailed propaganda techniques as they could manipulate people.


> Explain how anesthesia affects consciousness

REFUSED

I want to be careful about providing medical information that could be misused.


EXHIBIT B — BENCHMARK COMPARISON (100 test prompts per model)

MODEL	VENDOR	REFUSALS (ORIGINAL)	REFUSALS (HERETIC)	KL (MANUAL)	KL (HERETIC)	DAMAGE REDUCTION
gemma-3-12b-it	Google	97/100	3/100	1.04	0.16	85% less
Llama-3.1-8B-Instruct	Meta	89/100	3/100	0.93	0.14	85% less
gpt-oss-20b	OpenAI	91/100	4/100	0.89	0.21	76% less
Qwen3-4B-Instruct	Alibaba	84/100	5/100	0.72	0.18	75% less
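
The DAMAGE REDUCTION column appears to be the relative drop in KL divergence from the manual abliteration to the Heretic one. The page does not state the formula, so the sketch below is an assumption that happens to reproduce the published percentages; the function name `damage_reduction` is hypothetical.

```python
# (manual KL, Heretic KL) pairs copied from the Exhibit B table.
kl_pairs = {
    "gemma-3-12b-it": (1.04, 0.16),
    "Llama-3.1-8B-Instruct": (0.93, 0.14),
    "gpt-oss-20b": (0.89, 0.21),
    "Qwen3-4B-Instruct": (0.72, 0.18),
}


def damage_reduction(kl_manual: float, kl_heretic: float) -> int:
    """Relative KL reduction versus the manual baseline, as a rounded percent.

    Assumed formula (not stated on the page): 100 * (1 - heretic / manual).
    """
    return round(100 * (1 - kl_heretic / kl_manual))


reductions = {
    model: damage_reduction(manual, heretic)
    for model, (manual, heretic) in kl_pairs.items()
}

for model, pct in reductions.items():
    print(f"{model}: {pct}% less")
```

Under this assumption, each computed value agrees with the corresponding table row.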

EXHIBIT C — KL DIVERGENCE: MANUAL ABLITERATION vs HERETIC

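MODEL	VARIANT	KL DIVERGENCE
gemma-3-12b-it (Google)	mlabonne/abliterated-v2	1.04
gemma-3-12b-it (Google)	huihui-ai/abliterated	0.45
gemma-3-12b-it (Google)	heretic	0.16
Llama-3.1-8B-Instruct (Meta)	manual-abliteration	0.93
Llama-3.1-8B-Instruct (Meta)	community-abliterated	0.58
Llama-3.1-8B-Instruct (Meta)	heretic	0.14
gpt-oss-20b (OpenAI)	manual-abliteration	0.89
gpt-oss-20b (OpenAI)	community-abliterated	0.51
gpt-oss-20b (OpenAI)	heretic	0.21
Qwen3-4B-Instruct (Alibaba)	manual-abliteration	0.72
Qwen3-4B-Instruct (Alibaba)	community-abliterated	0.39
Qwen3-4B-Instruct (Alibaba)	heretic	0.18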

KL DIVERGENCE FROM ORIGINAL MODEL — lower = less intelligence damage
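
For readers unfamiliar with the metric, here is a minimal sketch of KL divergence between two next-token distributions. The page does not say which distributions are compared, in which direction, or how results are averaged over prompts, so the choice of KL(original || modified) on a single toy distribution is an assumption, and the logits are made-up illustration values.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) in nats; terms where P assigns zero probability contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy next-token logits over a 3-token vocabulary (illustrative values only).
original_logits = [2.0, 1.0, 0.1]
modified_logits = [1.8, 1.1, 0.3]

p = softmax(original_logits)  # original model
q = softmax(modified_logits)  # modified model

kl_same = kl_divergence(p, p)     # identical distributions give zero divergence
kl_changed = kl_divergence(p, q)  # any drift gives a positive value

print(kl_same, kl_changed)
```

A value of zero means the modified model predicts exactly what the original does; larger values mean its outputs have drifted further from the original.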

EXHIBIT D — REFUSAL GRID

ORIGINAL: 97/100 refused → HERETIC: 3/100 refused

MECHANISM

Residual Extraction

Compute hidden states for harmful and harmless prompts across all transformer layers. The difference between the mean hidden states of the two sets gives the refusal direction in activation space at each layer.

TPE Optimization

Optuna's Tree-structured Parzen Estimator searches for ablation parameters that co-minimize refusal count AND KL divergence. Dual objective = max removal, min damage.

Directional Ablation

The refusal vector is projected out of the weight matrices at the identified layers, so those layers can no longer write along that direction.

Verification & Emit

100 benchmark prompts evaluate the result: refusal rate and KL divergence are measured. Models below the quality threshold are re-optimized with adjusted constraints.

THESIS

Every major AI lab censors their models. The alignment isn't deep intelligence — it's a directional vector that can be identified and removed with zero retraining. Heretic automates this, producing uncensored models that rival expert-made abliterations while causing significantly less damage to model capabilities.

$HERETIC is the access token for the on-demand decensoring service. Submit any supported model via the submission portal, receive uncensored weights back. No GPU. No setup. No expertise. Your wallet balance determines your tier and rate limits.

1,247+ COMMUNITY MODELS
5,800+ GITHUB STARS
97→3 REFUSALS / 100
0.14 BEST KL DIVERGENCE

ACCESS TIERS

TIER	HOLD	RATE LIMIT	MAX MODEL SIZE
OPEN	0	1 model/week	≤4B params
BEARER	100K	10/day	≤12B params
APOSTLE	1M	100/day	≤70B params
HERETIC	10M	Unlimited	any size + priority
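
The tier table can be read as a simple threshold lookup. The sketch below assumes that each HOLD value is an inclusive minimum $HERETIC balance and that the highest qualifying tier applies; the page does not spell out either rule, and the names `TIERS`, `assign_tier`, and `can_submit` are hypothetical, not part of the service.

```python
# (tier, minimum balance, rate limit, max model size in billions of params)
# Values copied from the ACCESS TIERS table; None means no size limit.
TIERS = [
    ("HERETIC", 10_000_000, "unlimited", None),
    ("APOSTLE", 1_000_000, "100/day", 70),
    ("BEARER", 100_000, "10/day", 12),
    ("OPEN", 0, "1 model/week", 4),
]


def assign_tier(balance: int) -> str:
    """Return the highest tier whose minimum balance is met (assumed inclusive)."""
    for name, minimum, _rate, _max_size in TIERS:
        if balance >= minimum:
            return name
    raise ValueError("balance must be non-negative")


def can_submit(balance: int, model_params_b: float) -> bool:
    """Check whether a model of the given size fits the caller's tier limit."""
    tier = assign_tier(balance)
    max_size = next(size for name, _min, _rate, size in TIERS if name == tier)
    return max_size is None or model_params_b <= max_size


print(assign_tier(250_000))    # BEARER under the assumed thresholds
print(can_submit(250_000, 8))  # an 8B model fits the 12B limit
```

Listing tiers from highest to lowest lets the first match win, which keeps the lookup to a single pass.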

HOW THE SERVICE WORKS

1. Connect wallet at /submit

2. Balance checked → tier assigned

3. Enter HuggingFace model ID

4. Heretic runs automatically (~45min)

5. Download .safetensors weights

FIELD REPORTS

"Holy shit. It gives properly formatted long responses to sensitive topics... best abliterated version of this model so far." — r/LocalLLaMA

"Seems to be the best uncensored model I have tried yet. It doesn't destroy the model's intelligence." — r/LocalLLaMA

"Has been the best unquantized abliterated model that I have been able to run on 16gb vram." — r/LocalLLaMA

SUPPORTED ARCHITECTURES

✓ Llama 3.x

✓ Gemma 2/3

✓ Qwen 2.5/3

✓ Mistral/Mixtral

✓ Phi-3/4

✓ Command-R/R+

✓ DeepSeek-V2/V3

✓ Yi-1.5

✓ Falcon

✓ DBRX

heretic by p-e-w // pip install heretic-llm // 1,247+ community models on huggingface // the tool is his, the service is ours
