safetext

A fast profanity filter (curse words, swear words, bad words) for English, Spanish, Chinese, Turkish, and more.

🤔 why safetext?

Fast profanity detection and filtering for 13 languages.

  • Multi-format Detection: Single words, phrases, and contextual profanity
  • Custom Word Lists: Extend built-in lists with your own profanity words
  • Whitelisting: Exclude specific words from detection
  • Auto Language Detection: From text or subtitle files
  • Precise Filtering: Exact position tracking and custom censoring
  • Simple Integration: One-line setup with clean API
📦 installation

easily install safetext with pip:

pip install safetext

for development setup, see our scripts documentation.

🎯 quickstart
check and censor profanity
>>> from safetext import SafeText

>>> st = SafeText(language='en')

>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]
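
The `start` and `end` offsets in each result can drive custom censoring, for example masking every character of a match instead of using a fixed replacement. The sketch below does not call safetext at all; it only consumes a result list shaped like the one above. The helper name `mask_spans` and the `mask_char` parameter are hypothetical, and it assumes `end` is exclusive, which matches the example offsets (15 to 31 for a 16-character word).

```python
def mask_spans(text, results, mask_char="#"):
    """Replace each reported span with mask characters of the same length.

    Assumes each result has integer 'start' and 'end' keys, with 'end'
    exclusive, as the example output above suggests.
    """
    pieces = []
    cursor = 0
    for hit in sorted(results, key=lambda h: h["start"]):
        # Clamp to the cursor so overlapping spans are not masked twice.
        start = max(hit["start"], cursor)
        end = hit["end"]
        if end <= start:
            continue
        pieces.append(text[cursor:start])
        pieces.append(mask_char * (end - start))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Some text with <profanity-word>."
results = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]
masked = mask_spans(text, results)
print(masked)
```

Because every match is replaced by the same number of characters, the masked string keeps the original length, so any other offsets into the text stay valid.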

>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
"Some text with ***."
extending profanity lists with custom words

Add your own profanity words by providing a custom words directory:

# Directory structure:
# custom_profanity_words/
# ├── en.txt              # English custom words
# ├── tr.txt              # Turkish custom words
# └── es.txt              # Spanish custom words

>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')

>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]

Custom word files should contain one word/phrase per line:

# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
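
A custom words directory like the one above can also be generated from code. This is a minimal sketch using only the standard library; the temporary base directory is an assumption made so the example leaves no files behind, and any directory works as long as it is the one passed to `custom_words_dir`.

```python
import tempfile
from pathlib import Path

# Hypothetical location: a throwaway temp directory for the example.
base_dir = Path(tempfile.mkdtemp())
words_dir = base_dir / "custom_profanity_words"
words_dir.mkdir()

# One word or phrase per line, named after the language code (en.txt).
english_words = ["mycustomword", "inappropriate phrase", "company specific term"]
en_file = words_dir / "en.txt"
en_file.write_text("\n".join(english_words) + "\n", encoding="utf-8")

# Read the file back to confirm the one-entry-per-line layout.
written = en_file.read_text(encoding="utf-8").splitlines()
print(written)
```

The resulting directory can then be passed as `custom_words_dir=str(words_dir)` when creating `SafeText`, as in the example above.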
using whitelist

exclude specific words from profanity detection:

# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])

# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')

# Combining custom words with whitelist
>>> st = SafeText(
...     language='en',
...     custom_words_dir='custom_profanity_words',
...     whitelist=['allowedcustomword']
... )
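
One way to picture how custom words and a whitelist combine is as set arithmetic: built-in words plus custom words, minus whitelisted words. The sketch below is a conceptual model only, not safetext's implementation; the function name `effective_words` is hypothetical, and details such as case handling or phrase matching in the real library are not covered here.

```python
def effective_words(builtin, custom, whitelist):
    """Conceptual model: (builtin plus custom) minus whitelist."""
    return (set(builtin) | set(custom)) - set(whitelist)


# Placeholder word lists for illustration.
builtin = {"<profanity-word>"}
custom = {"mycustomword", "allowedcustomword"}
whitelist = {"allowedcustomword"}

active = effective_words(builtin, custom, whitelist)
print(sorted(active))
```

Under this model a whitelisted entry is excluded even when it also appears in a custom word file, which is the situation the `allowedcustomword` example above describes.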
automated language detection
  • from text:
>>> from safetext import SafeText

>>> eng_text = "This story is about to take a dark turn."

>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)

>>> st.language
'en'
  • from .srt (subtitle) file:
>>> from safetext import SafeText

>>> turkish_srt_file_path = "turkish.srt"

>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)

>>> st.language
'tr'
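
To experiment with subtitle input outside of `set_language_from_srt`, the dialogue text can be pulled out of `.srt` content by hand. This is a rough sketch, not how safetext parses subtitles; the helper `srt_to_text` is hypothetical and assumes the common SubRip layout of a cue number, a timestamp line, and one or more text lines per cue.

```python
import re

# Matches SubRip timestamp lines such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content):
    """Return the dialogue lines of SRT content joined into one string."""
    lines = []
    for raw in srt_content.splitlines():
        line = raw.strip()
        # Skip blanks, cue numbers, and timestamps. Note that a dialogue
        # line made only of digits would also be skipped by this check.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        lines.append(line)
    return " ".join(lines)


sample = """1
00:00:01,000 --> 00:00:03,500
Merhaba, nasılsın?

2
00:00:04,000 --> 00:00:06,000
İyiyim, teşekkür ederim.
"""

dialogue = srt_to_text(sample)
print(dialogue)
```

The extracted string could then be passed to `set_language_from_text`, as in the text example above.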
🌍 supported languages

safetext currently supports profanity detection in 13 languages:

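| Language | ISO 639-1 Code | Language Name |
|----------|----------------|-----------------|
| 🇸🇦 | ar | Arabic |
| 🇦🇿 | az | Azerbaijani |
| 🇩🇪 | de | German |
| 🇬🇧 | en | English |
| 🇪🇸 | es | Spanish |
| 🇮🇷 | fa | Persian (Farsi) |
| 🇫🇷 | fr | French |
| 🇮🇳 | hi | Hindi |
| 🇯🇵 | ja | Japanese |
| 🇵🇹 | pt | Portuguese |
| 🇷🇺 | ru | Russian |
| 🇹🇷 | tr | Turkish |
| 🇨🇳 | zh | Chinese |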
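
When the language code comes from user input or configuration, it can help to check it against the list above before constructing `SafeText`. The sketch below hard-codes the 13 codes from the table; the names `SUPPORTED_LANGUAGES` and `require_supported` are hypothetical and not part of safetext.

```python
# Codes and names copied from the supported languages table.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "az": "Azerbaijani",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fa": "Persian (Farsi)",
    "fr": "French",
    "hi": "Hindi",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
    "tr": "Turkish",
    "zh": "Chinese",
}


def require_supported(code):
    """Normalize a language code and raise ValueError if it is not listed."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


language = require_supported(" EN ")
print(language, SUPPORTED_LANGUAGES[language])
```

The validated code can then be passed as the `language` argument shown in the quickstart.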
🤝 contribute to safetext

join our mission in refining content moderation!

contribute by:

  • adding new languages: create a folder with the ISO 639-1 code and include a words.txt.
  • enhancing word lists: improve detection accuracy.
  • sharing feedback: your ideas can shape safetext.

see our contributing guidelines for development workflow, test documentation for running tests, and scripts guide for automation tools.


🏆 contributors

meet our awesome contributors who make safetext better every day!


follow us for more!

LinkedIn | Hugging Face | X