safetext
Fast profanity word, curse word, swear word, bad word filtering tool for English, Spanish, Chinese, Turkish and more.
🤔 why safetext?
Fast profanity detection and filtering for 13 languages.
- Multi-format Detection: Single words, phrases, and contextual profanity
- Custom Word Lists: Extend built-in lists with your own profanity words
- Whitelisting: Exclude specific words from detection
- Auto Language Detection: From text or subtitle files
- Precise Filtering: Exact position tracking and custom censoring
- Simple Integration: One-line setup with clean API
📦 installation
easily install safetext with pip:
pip install safetext
for development setup, see our scripts documentation.
🎯 quickstart
check and censor profanity
>>> from safetext import SafeText
>>> st = SafeText(language='en')
>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]
>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
"Some text with ***."
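The `start` and `end` values in the results can also drive your own masking when the default censoring is not what you want. The sketch below is not part of safetext: `mask_spans` is a hypothetical helper, and it assumes `start` and `end` are 0-based character offsets with `end` exclusive, which is what the sample output above suggests.

```python
def mask_spans(text, results, mask_char="#"):
    """Replace every reported span with mask characters of the same length.

    Assumes each result dict has 0-based 'start' (inclusive) and
    'end' (exclusive) character offsets, as in the quickstart output.
    """
    chars = list(text)
    for hit in results:
        for i in range(hit["start"], hit["end"]):
            chars[i] = mask_char
    return "".join(chars)


# Results copied from the quickstart example rather than computed here,
# so this snippet runs without safetext installed.
text = "Some text with <profanity-word>."
results = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]

masked = mask_spans(text, results)
print(masked)
```

Because the mask has the same length as the original word, the offsets of later matches stay valid while the text is rewritten.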
extending profanity lists with custom words
Add your own profanity words by providing a custom words directory:
# Directory structure:
# custom_profanity_words/
# ├── en.txt # English custom words
# ├── tr.txt # Turkish custom words
# └── es.txt # Spanish custom words
>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')
>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]
Custom word files should contain one word/phrase per line:
# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
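If the word lists are generated by a script, the directory layout above can be produced with the standard library alone. This is a minimal sketch: `write_word_list` is a hypothetical helper, and it writes into a temporary directory only so the example is safe to run anywhere.

```python
import tempfile
from pathlib import Path


def write_word_list(directory, language, words):
    """Write one word/phrase per line to <directory>/<language>.txt."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Skip blank entries and strip surrounding whitespace.
    cleaned = [w.strip() for w in words if w.strip()]
    path = directory / f"{language}.txt"
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path


# A temporary parent directory stands in for your project root.
base = Path(tempfile.mkdtemp()) / "custom_profanity_words"
en_path = write_word_list(
    base, "en", ["mycustomword", "inappropriate phrase", "company specific term"]
)
print(en_path)
```

The resulting directory can then be passed as `custom_words_dir` when constructing `SafeText`.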
using whitelist
exclude specific words from profanity detection:
# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])
# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')
# Combining custom words with whitelist
>>> st = SafeText(
... language='en',
... custom_words_dir='custom_profanity_words',
... whitelist=['allowedcustomword']
... )
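To illustrate what whitelisting means for the results, the sketch below removes allowed words from a result list after the fact. It is not how safetext implements the `whitelist` option: `drop_whitelisted` is a hypothetical helper, and the case-insensitive comparison is an assumption made for this example.

```python
def drop_whitelisted(results, whitelist):
    """Return only the hits whose word is not in the whitelist.

    Comparison is case-insensitive here; that is an assumption of
    this sketch, not documented safetext behavior.
    """
    allowed = {word.casefold() for word in whitelist}
    return [hit for hit in results if hit["word"].casefold() not in allowed]


# Hand-written results in the same shape as check_profanity output,
# for the text "This mycustomword and allowedcustomword".
results = [
    {"word": "mycustomword", "index": 2, "start": 5, "end": 17},
    {"word": "allowedcustomword", "index": 4, "start": 22, "end": 39},
]

remaining = drop_whitelisted(results, ["AllowedCustomWord"])
print(remaining)
```

Post-filtering like this can be handy when the set of allowed words changes per request, while the `whitelist` argument suits a fixed list.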
automated language detection
- from text:
>>> from safetext import SafeText
>>> eng_text = "This story is about to take a dark turn."
>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)
>>> st.language
'en'
- from .srt (subtitle) file:
>>> from safetext import SafeText
>>> turkish_srt_file_path = "turkish.srt"
>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)
>>> st.language
'tr'
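If your subtitles are already in memory rather than on disk, one option is to strip the .srt structure yourself and hand the plain text to `set_language_from_text`. The sketch below is an assumption about how that could be done, not safetext's own parsing: `srt_to_text` is a hypothetical helper that handles only the basic cue layout (counter line, timestamp line, text lines).

```python
import re

# Matches timestamp lines such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content):
    """Join the dialogue lines of an .srt document into one string."""
    kept = []
    for line in srt_content.splitlines():
        line = line.strip()
        # Skip blank lines, cue counters, and timestamp lines.
        # Note: a dialogue line made only of digits is dropped as well.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


sample = """1
00:00:01,000 --> 00:00:03,500
Merhaba, nasılsın?

2
00:00:04,000 --> 00:00:06,000
İyiyim, teşekkürler.
"""

plain = srt_to_text(sample)
print(plain)
```

The resulting string can be passed to `st.set_language_from_text(plain)` as in the text example above.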
🌍 supported languages
safetext currently supports profanity detection in 13 languages:
Flag | ISO 639-1 Code | Language Name |
---|---|---|
🇸🇦 | ar | Arabic |
🇦🇿 | az | Azerbaijani |
🇩🇪 | de | German |
🇬🇧 | en | English |
🇪🇸 | es | Spanish |
🇮🇷 | fa | Persian (Farsi) |
🇫🇷 | fr | French |
🇮🇳 | hi | Hindi |
🇯🇵 | ja | Japanese |
🇵🇹 | pt | Portuguese |
🇷🇺 | ru | Russian |
🇹🇷 | tr | Turkish |
🇨🇳 | zh | Chinese |
🤝 contribute to safetext
join our mission in refining content moderation!
contribute by:
- adding new languages: create a folder with the ISO 639-1 code and include a words.txt.
- enhancing word lists: improve detection accuracy.
- sharing feedback: your ideas can shape safetext.
see our contributing guidelines for development workflow, test documentation for running tests, and scripts guide for automation tools.
🏆 contributors
meet our awesome contributors who make safetext better every day!