High-performance profanity filter for Python, Rust, and JavaScript (WebAssembly)
with multilingual support and evasion detection.
BadWords is a sophisticated profanity filtering library designed to clean up user-generated content. Unlike simple keyword matching, it uses similarity scoring, homoglyph detection, and transliteration to catch even the most cleverly disguised insults.
Architecture: The core is implemented in Rust for performance. Python provides a thin API layer with full type hints for IDE/linter support. The Rust library can also be used directly from Rust projects.
- Recommended: Python 3.13
- Minimum: Python 3.10+
pip install git+https://github.com/FlacSy/badwords.git
pip install badwords-py

from badwords import ProfanityFilter
# Initialize filter
p = ProfanityFilter()
# Load specific languages (e.g., English and Russian)
p.init(languages=["en", "ru"])
# Or load all supported languages
p.init()

text = "Some very b4d text here"
# 1. Simple check (Returns Boolean)
is_bad = p.filter_text(text)
print(is_bad) # True
# 2. Censoring text (Returns String)
clean_text = p.filter_text(text, replace_character="*")
print(clean_text) # "Some very *** text here"

| CPU | GPU | RAM | OS |
|---|---|---|---|
| Intel® Core™ i7-10700KF × 16 (x86_64) | NVIDIA GeForce RTX™ 3070 | 64 GB DDR4 3200 MHz | Ubuntu 24.04.2 LTS |
Rule-based matching (en+ru, match_threshold=1.0). Run: make bench
| Scenario | Rust (badwords-core) | Python (badwords-py) |
|---|---|---|
| Clean text (no match) | ~7.6 µs (~130 K/s) | ~7.7 µs (~130 K/s) |
| Bad word (match) | ~3.1 µs (~320 K/s) | ~2.7 µs (~370 K/s) |
| Censor (replace) | ~2.8 µs (~360 K/s) | ~2.5 µs (~400 K/s) |
| 5 texts batch | ~15 µs (~330 K/s) | ~16 µs (~310 K/s) |
The Python package calls the Rust core through PyO3, so the binding overhead is minimal.
Rule-based mode, en+ru. Run: make bench-compare (requires pip install glin-profanity)
| Scenario | BadWords | glin-profanity |
|---|---|---|
| Clean text | ~7 µs (~140 K/s) | ~4.4 ms (~230/s) |
| Bad word | ~1.3 µs (~770 K/s) | ~0.2 ms (~5 K/s) |
| Censor | ~1.8 µs (~560 K/s) | ~1.4 ms (~700/s) |
| 5 texts batch | ~16 µs (~310 K/s) | ~10 ms (~500/s) |
BadWords is roughly 150–800× faster in these runs (Rust core vs. pure Python).
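The µs and K/s columns in these tables are two views of the same measurement: throughput is the reciprocal of per-call latency. The sketch below is not the project's make bench harness. The helper names measure_latency, throughput, and speedup are hypothetical, and the timed callable is a stand-in for whatever filter call is being measured.

```python
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], iterations: int = 1000) -> float:
    """Return the mean wall-clock seconds per call of fn over `iterations` calls."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations


def throughput(latency_seconds: float) -> float:
    """Convert per-call latency into calls per second."""
    return 1.0 / latency_seconds


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Stand-in workload; swap in e.g. `lambda: p.filter_text(text)` to time the filter.
    latency = measure_latency(lambda: "some text".lower())
    print(f"{latency * 1e6:.2f} µs per call, {throughput(latency):,.0f} calls/s")

    # Figures from the tables above: 7.6 µs per call is about 130 K calls/s,
    # and 0.2 ms vs 1.3 µs on the "Bad word" row is a ratio of about 154.
    print(round(throughput(7.6e-6)))
    print(round(speedup(0.2e-3, 1.3e-6)))
```

Single-run means like this are noisy at the microsecond scale, so repeated runs and a warm-up pass are advisable before comparing numbers.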
ML mode. Run: pip install glin-profanity[ml], then make bench-compare (100 iterations each).
| Scenario | BadWords ML (ONNX) | glin transformer |
|---|---|---|
| Clean text (43 chars) | ~6.5 ms (~150/s) | ~27 ms (~37/s) |
| Bad word (8 chars) | ~4.6 ms (~220/s) | ~21 ms (~47/s) |
| 5 texts batch (82 chars) | ~24 ms (~210/s) | ~107 ms (~47/s) |
BadWords ML (XLM-RoBERTa) is ~4–5× faster than the glin transformer in these runs.
filter_text is the core method of the library.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | Required | Input text to check. |
| match_threshold | float | 1.0 | Similarity threshold (1.0 = exact match, 0.95 = fuzzy). |
| replace_character | str/None | None | If provided, returns censored string. If None, returns bool. |
Performance tip: Setting match_threshold below 1.0 enables fuzzy matching, which is slower. Use 1.0 for high-traffic real-time filtering, or 0.95 for a good balance.
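To make the trade-off concrete, here is a minimal sketch of threshold-based matching. The source does not say which similarity metric BadWords uses, so this example substitutes difflib.SequenceMatcher from the Python standard library. The matches function and the one-word blocklist are hypothetical.

```python
from difflib import SequenceMatcher

BLOCKLIST = ["blackguard"]  # hypothetical entry, for illustration only


def matches(word: str, match_threshold: float = 1.0) -> bool:
    """Return True if `word` is at least `match_threshold` similar to a blocklisted word."""
    candidate = word.lower()
    if match_threshold >= 1.0:
        # Exact mode: a plain membership test, no per-pair scoring needed.
        return candidate in BLOCKLIST
    # Fuzzy mode: score the candidate against every entry, which costs more work.
    return any(
        SequenceMatcher(None, candidate, banned).ratio() >= match_threshold
        for banned in BLOCKLIST
    )
```

With this stand-in metric, matches("blackguardd", 0.95) is accepted while the default threshold of 1.0 rejects it. The exact scores will differ from whatever metric the library uses.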
Standard filters are easy to bypass. BadWords is built to detect:
- Homoglyphs: Detects hеllo (using Cyrillic 'е') or h4llo (numbers).
- Transliteration: Automatically handles mapping between Cyrillic and Latin alphabets.
- Normalization: Strips diacritics, special characters, and decorative Unicode symbols.
- Similarity Analysis: Uses fuzzy matching to find words with deliberate typos.
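The list above can be pictured as a normalization pass applied before matching. This is a rough sketch under stated assumptions, not the library's implementation: the HOMOGLYPHS table is a tiny hypothetical sample, and normalize is an illustrative name.

```python
import unicodedata

# Tiny hypothetical sample of look-alike mappings; a real table would be far larger.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "0": "o",
    "1": "l",
    "3": "e",
    "4": "a",
    "@": "a",
    "$": "s",
}


def normalize(text: str) -> str:
    """Lowercase, strip diacritics, map look-alikes, and drop punctuation."""
    # NFKD splits accented letters into base letter + combining mark
    # and folds compatibility forms such as fullwidth letters.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the diacritic itself
        ch = HOMOGLYPHS.get(ch, ch)
        if ch.isalnum() or ch.isspace():
            out.append(ch)  # keep letters, digits, whitespace; drop other symbols
    return "".join(out)
```

Mapping digits unconditionally also rewrites legitimate numbers, so a pass like this would normally run on a copy of the text used only for matching, never on the text shown to users.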
_filter.filter_text("hеllо") # Mixed alphabets (Cyrillic + Latin) -> DETECTED
_filter.filter_text("h3ll0") # Character substitution -> DETECTED
_filter.filter_text("h⍺llo") # Mathematical/Greek symbols -> DETECTED
_filter.filter_text("привет") # Transliterated matches -> DETECTED

BadWords supports 25 languages out of the box:
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| en | English | ru | Russian | ua | Ukrainian |
| de | German | fr | French | it | Italian |
| sp | Spanish | pl | Polish | cz | Czech |
| ja | Japanese | ko | Korean | th | Thai |
| br | Portuguese (BR) | da | Danish | du | Dutch |
| fi | Finnish | gr | Greek | hu | Hungarian |
| in | Indonesian | lt | Lithuanian | no | Norwegian |
| po | Portuguese | ro | Romanian | sw | Swedish |
| tu | Turkish | | | | |
Use p.get_all_languages() in code. Full list with word counts: badwords.flacsy.dev
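Several codes in the table differ from ISO 639-1 (for example sp rather than es, and ua rather than uk). If language codes arrive from elsewhere in ISO form, a small lookup can translate them before calling init. The mapping below is derived from the table above. ISO_TO_BADWORDS, SUPPORTED, and to_badwords_codes are hypothetical names, not part of the library.

```python
# ISO 639-1 codes whose BadWords code differs, per the language table.
ISO_TO_BADWORDS = {
    "es": "sp",     # Spanish
    "uk": "ua",     # Ukrainian
    "cs": "cz",     # Czech
    "nl": "du",     # Dutch
    "pt": "po",     # Portuguese
    "pt-br": "br",  # Portuguese (BR)
    "sv": "sw",     # Swedish
    "tr": "tu",     # Turkish
    "el": "gr",     # Greek
    "id": "in",     # Indonesian
}

# The 25 codes listed in the language table.
SUPPORTED = {
    "br", "cz", "da", "de", "du", "en", "fi", "fr", "gr", "hu", "in", "it", "ja",
    "ko", "lt", "no", "pl", "po", "ro", "ru", "sp", "sw", "th", "tu", "ua",
}


def to_badwords_codes(iso_codes):
    """Translate ISO 639-1 codes to BadWords codes, rejecting unsupported ones.

    Codes that are already in BadWords form pass through unchanged, so an
    ISO code that collides with one (e.g. "sw") is NOT detected as a mismatch.
    """
    result = []
    for code in iso_codes:
        mapped = ISO_TO_BADWORDS.get(code.lower(), code.lower())
        if mapped not in SUPPORTED:
            raise ValueError(f"unsupported language code: {code!r}")
        result.append(mapped)
    return result
```

A call such as p.init(languages=to_badwords_codes(["en", "es", "uk"])) would then load English, Spanish, and Ukrainian.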
from badwords import ProfanityFilter

def monitor_chat():
    # Setup for a global chat
    profanity_filter = ProfanityFilter()
    profanity_filter.init(["en", "ru", "de"])
    # Custom project-specific banned words
    profanity_filter.add_words(["spam_link_v1", "scam_bot_99"])
    user_input = "Hey! Check out this b.a.d.w.o.r.d"
    # Moderate with high accuracy
    is_offensive = profanity_filter.filter_text(user_input, match_threshold=0.95)
    if is_offensive:
        print("Message blocked: Contains restricted language.")
    else:
        # Proceed with processing
        pass

if __name__ == "__main__":
    monitor_chat()

Published on crates.io:
[dependencies]
badwords-core = "2"

use badwords_core::{ProfanityFilter, default_resource_dir};

let resource_dir = default_resource_dir();
let mut filter = ProfanityFilter::new(&resource_dir, true, true, true, true);
filter.init(None).unwrap();
filter.add_words(&["custom".to_string()]);
let (found, _) = filter.filter_text("hello", 1.0, None);

Same Rust code for browser and Node.js, compiled to WASM.
# Browser
make wasm
# Node.js
make wasm-nodejs

<script type="module">
  import init, { ProfanityFilter } from './path/to/badwords_wasm.js';
  await init();
  const filter = new ProfanityFilter();
  console.log(filter.isBad('text')); // boolean
  console.log(filter.censor('text', '*')); // string
</script>

const { ProfanityFilter } = require('badwords-wasm');
const filter = new ProfanityFilter();
filter.isBad('hello'); // false
filter.censor('bad word', '*'); // "*** word"
filter.addWords(['custom']);

Built-in: en and ru. Additional languages via @badwords/languages:
npm install badwords-wasm @badwords/languages

import init, { ProfanityFilter } from 'badwords-wasm';
import de from '@badwords/languages/de';
import ua from '@badwords/languages/ua';
await init();
const filter = new ProfanityFilter();
filter.addWords(de);
filter.addWords(ua);

Available: br, cz, da, de, du, en, fi, fr, gr, hu, in, it, ja, ko, lt, no, pl, po, ro, ru, sp, sw, th, tu, ua. See @badwords/languages.
Examples: examples/wasm/browser/, examples/wasm/node/
Requires: Rust, Python, maturin
python -m venv .venv && source .venv/bin/activate # Linux/macOS
pip install maturin
make develop
# or: cd python && maturin build && pip install target/wheels/badwords_py-*.whl

Build the WASM package (requires wasm-pack):
cargo install wasm-pack
make wasm

Output: rust/badwords-wasm/pkg/ (npm package badwords-wasm)
- Browser: Use the generated JS with a bundler or static server. See examples/wasm/browser/
- Node.js: import init, { ProfanityFilter } from 'badwords-wasm' after npm install. See examples/wasm/node/
- Publish to npm: make wasm or make wasm-nodejs, then make npm-publish
- Optional languages (@badwords/languages): make lang-packages, then make npm-publish-languages
Full documentation (Python, Rust, JavaScript) with examples and API reference: badwords.flacsy.dev (EN / RU).
Contributions are what make the open-source community an amazing place to learn, inspire, and create.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.