safetext

Name: safetext
Author: Viddexa AI

Fast profanity filtering tool (curse words, swear words, bad words) for English, Spanish, Chinese, Turkish, and more.


🤔 why safetext?

Fast profanity detection and filtering for 13 languages.

Multi-format Detection: Single words, phrases, and contextual profanity
Custom Word Lists: Extend built-in lists with your own profanity words
Whitelisting: Exclude specific words from detection
Auto Language Detection: From text or subtitle files
Precise Filtering: Exact position tracking and custom censoring
Simple Integration: One-line setup with clean API

📦 installation

easily install safetext with pip:

pip install safetext

for development setup, see our scripts documentation.

🎯 quickstart

check and censor profanity

>>> from safetext import SafeText

>>> st = SafeText(language='en')

>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]

>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
'Some text with ***.'
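
The `start` and `end` offsets in each result also allow censoring styles other than the one `censor_profanity` produces. The sketch below is an illustration only, not part of safetext: `mask_spans` is a hypothetical helper, and it assumes each result is a dict with a `start` (inclusive) and `end` (exclusive) character offset, as the sample output above suggests.

```python
def mask_spans(text, results, mask_char="#"):
    """Replace every reported span with mask_char, keeping the text length unchanged.

    Hypothetical helper: `results` is assumed to look like the list
    returned by check_profanity in the quickstart above.
    """
    chars = list(text)
    for hit in results:
        start, end = hit["start"], hit["end"]
        # Overwrite the span character by character so offsets of later hits stay valid.
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


# Sample data copied from the quickstart output, so safetext is not needed to run this.
sample_text = "Some text with <profanity-word>."
sample_results = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]

masked = mask_spans(sample_text, sample_results)
print(masked)
```

Because the mask has the same length as the original span, the offsets of the remaining hits stay correct no matter the order in which they are processed.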

extending profanity lists with custom words

Add your own profanity words by providing a custom words directory:

# Directory structure:
# custom_profanity_words/
# ├── en.txt              # English custom words
# ├── tr.txt              # Turkish custom words
# └── es.txt              # Spanish custom words

>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')

>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]

Custom word files should contain one word/phrase per line:

# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
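
If the custom lists come from code rather than hand-edited files, the directory can be generated. This is a minimal sketch under the layout described above (one `<code>.txt` file per language, one entry per line); `write_custom_words` is a hypothetical helper, not a safetext function, and the demo writes to a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path


def write_custom_words(directory, words_by_language):
    """Write one <language-code>.txt file per language, one word or phrase per line.

    Hypothetical helper; the file layout follows the README description.
    """
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    written = []
    for code, words in words_by_language.items():
        # Drop blank entries and duplicates while keeping the original order.
        unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
        path = root / f"{code}.txt"
        path.write_text("\n".join(unique) + "\n", encoding="utf-8")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = write_custom_words(
        Path(tmp) / "custom_profanity_words",
        {"en": ["mycustomword", "inappropriate phrase", "company specific term", "mycustomword"]},
    )
    demo_name = paths[0].name
    demo_content = paths[0].read_text(encoding="utf-8")

print(demo_name)
print(demo_content)
```

The resulting directory path is what would be passed as `custom_words_dir`.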

using whitelist

exclude specific words from profanity detection:

# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])

# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')

# Combining custom words with whitelist
>>> st = SafeText(
...     language='en',
...     custom_words_dir='custom_profanity_words',
...     whitelist=['allowedcustomword']
... )
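
Since `whitelist` accepts either a list or a file path, a file-based whitelist can be read into a list first when it needs to be merged with words defined in code. The sketch below assumes the one-word-per-line format mentioned above; `load_whitelist` is a hypothetical helper, not part of safetext.

```python
import tempfile
from pathlib import Path


def load_whitelist(path):
    """Read a one-word-per-line file into a list, skipping blank lines.

    Hypothetical helper; the file format follows the README description.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    whitelist_path = Path(tmp) / "whitelist.txt"
    whitelist_path.write_text("word1\n\n  word2  \n", encoding="utf-8")
    from_file = load_whitelist(whitelist_path)

# Merge file entries with words defined in code, without duplicates.
combined = list(dict.fromkeys(from_file + ["allowedcustomword", "word1"]))
print(combined)
```

The merged list could then be passed as `whitelist=combined` when constructing `SafeText`.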

automated language detection

from text:

>>> from safetext import SafeText

>>> eng_text = "This story is about to take a dark turn."

>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)

>>> st.language
'en'

from .srt (subtitle) file:

>>> from safetext import SafeText

>>> turkish_srt_file_path = "turkish.srt"

>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)

>>> st.language
'tr'

🌍 supported languages

safetext currently supports profanity detection in 13 languages:

Flag	ISO 639-1 Code	Language
🇸🇦	`ar`	Arabic
🇦🇿	`az`	Azerbaijani
🇩🇪	`de`	German
🇬🇧	`en`	English
🇪🇸	`es`	Spanish
🇮🇷	`fa`	Persian (Farsi)
🇫🇷	`fr`	French
🇮🇳	`hi`	Hindi
🇯🇵	`ja`	Japanese
🇵🇹	`pt`	Portuguese
🇷🇺	`ru`	Russian
🇹🇷	`tr`	Turkish
🇨🇳	`zh`	Chinese
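
When the language code comes from user input or configuration, it can be checked against the table above before constructing `SafeText`. This is a sketch only: `SUPPORTED_LANGUAGES` and `validate_language` are hypothetical names, the mapping is copied by hand from the table, and how safetext itself reacts to an unsupported code is not covered here.

```python
# Codes and names copied from the supported-languages table above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "az": "Azerbaijani",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fa": "Persian (Farsi)",
    "fr": "French",
    "hi": "Hindi",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
    "tr": "Turkish",
    "zh": "Chinese",
}


def validate_language(code):
    """Return the normalized ISO 639-1 code, or raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; expected one of: {supported}")
    return normalized


print(validate_language(" EN "))
```

A validated code would then be passed as `SafeText(language=...)`.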

🤝 contribute to safetext

join our mission in refining content moderation!

contribute by:

adding new languages: create a folder with the ISO 639-1 code and include a words.txt.
enhancing word lists: improve detection accuracy.
sharing feedback: your ideas can shape safetext.

see our contributing guidelines for the development workflow, the test documentation for running tests, and the scripts guide for automation tools.