Library for detecting profanities in Go
MIT License
go-away is a stand-alone, lightweight library for detecting and censoring profanities in Go.
This library must remain extremely easy to use. Its original intent of not adding overhead will always remain.
go get -u github.com/TwiN/go-away
package main

import (
	"github.com/TwiN/go-away"
)

func main() {
	goaway.IsProfane("fuck this shit")        // returns true
	goaway.ExtractProfanity("fuck this shit") // returns "fuck"
	goaway.Censor("fuck this shit")           // returns "**** this ****"

	goaway.IsProfane("F u C k th1$ $h!t")        // returns true
	goaway.ExtractProfanity("F u C k th1$ $h!t") // returns "fuck"
	goaway.Censor("F u C k th1$ $h!t")           // returns "* * * * th1$ ****"

	goaway.IsProfane("@$$h073")        // returns true
	goaway.ExtractProfanity("@$$h073") // returns "asshole"
	goaway.Censor("@$$h073")           // returns "*******"

	goaway.IsProfane("hello, world!")        // returns false
	goaway.ExtractProfanity("hello, world!") // returns ""
	goaway.Censor("hello, world!")           // returns "hello, world!"
}
Calling goaway.IsProfane(s), goaway.ExtractProfanity(s) or goaway.Censor(s) uses the default profanity detector. To disable leet speak, numerical character or special character sanitization, create a ProfanityDetector instead:
profanityDetector := goaway.NewProfanityDetector().WithSanitizeLeetSpeak(false).WithSanitizeSpecialCharacters(false).WithSanitizeAccents(false)
profanityDetector.IsProfane("b!tch") // returns false because we're not sanitizing special characters
By default, the NewProfanityDetector constructor uses the default dictionaries for profanities, false positives and false negatives. These dictionaries are exposed as goaway.DefaultProfanities, goaway.DefaultFalsePositives and goaway.DefaultFalseNegatives respectively.
If you need to load a different dictionary, you can create a new instance of ProfanityDetector this way:
profanities := []string{"ass"}
falsePositives := []string{"bass"}
falseNegatives := []string{"dumbass"}
profanityDetector := goaway.NewProfanityDetector().WithCustomDictionary(profanities, falsePositives, falseNegatives)
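To illustrate why the three lists are kept separate, here is a minimal sketch of one way they could interact. The extract function is a hypothetical helper written only for this example and is not part of go-away's API; it also assumes the input has already been sanitized.

```go
package main

import (
	"fmt"
	"strings"
)

// extract is a hypothetical helper, not part of go-away's API.
// It assumes s has already been sanitized (lowercased, spaces removed, etc.).
func extract(s string, profanities, falsePositives, falseNegatives []string) string {
	// False negatives are checked first: "dumbass" contains the false
	// positive "bass", so it would otherwise be stripped before matching.
	for _, word := range falseNegatives {
		if strings.Contains(s, word) {
			return word
		}
	}
	// Remove false positives so that "bass" no longer matches "ass".
	for _, word := range falsePositives {
		s = strings.ReplaceAll(s, word, "")
	}
	// Whatever remains is checked against the profanities themselves.
	for _, word := range profanities {
		if strings.Contains(s, word) {
			return word
		}
	}
	return ""
}

func main() {
	profanities := []string{"ass"}
	falsePositives := []string{"bass"}
	falseNegatives := []string{"dumbass"}
	for _, s := range []string{"bass", "dumbass", "ass"} {
		fmt.Printf("%q -> %q\n", s, extract(s, profanities, falsePositives, falseNegatives))
	}
}
```

In this sketch, "bass" yields no match, while "dumbass" and "ass" are both reported.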
You may also specify custom character replacements using WithCustomCharacterReplacements on a ProfanityDetector. By default, this is set to goaway.DefaultCharacterReplacements.
Note that all character replacements with a value of ' ' are considered special characters, while all character replacements with a value other than ' ' are considered leetspeak characters. This means that profanityDetector.WithSanitizeSpecialCharacters(bool) and profanityDetector.WithSanitizeLeetSpeak(bool) let you toggle which character replacements are executed during the sanitization process.
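The following sketch shows how a single replacement table can be split by that rule. The table and the applyReplacements helper are assumptions made up for this example; they are not goaway.DefaultCharacterReplacements or any function from the library.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements is a small illustrative table, not the library's default one.
// A value of ' ' marks a special character; any other value marks leetspeak.
var replacements = map[rune]rune{
	'-': ' ',
	'_': ' ',
	'4': 'a',
	'3': 'e',
	'$': 's',
}

// applyReplacements is a hypothetical helper showing how two flags can
// select which entries of the table are applied.
func applyReplacements(s string, leetSpeak, specialCharacters bool) string {
	return strings.Map(func(r rune) rune {
		to, ok := replacements[r]
		if !ok {
			return r // not in the table: keep the character as-is
		}
		if to == ' ' {
			// Special character entry.
			if specialCharacters {
				return to
			}
			return r
		}
		// Leetspeak entry.
		if leetSpeak {
			return to
		}
		return r
	}, s)
}

func main() {
	fmt.Printf("%q\n", applyReplacements("h3ll_o", true, false)) // prints "hell_o"
	fmt.Printf("%q\n", applyReplacements("h3ll_o", false, true)) // prints "h3ll o"
	fmt.Printf("%q\n", applyReplacements("h3ll_o", true, true))  // prints "hell o"
}
```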
Currently, go-away does not support UTF-8. As such, if the strings you are feeding to this library come from unsanitized user input, you are advised to filter out all non-ASCII characters.
If you'd like to add support for UTF-8, see #43 and #47.
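One possible way to filter out non-ASCII characters before passing a string to the library is sketched below, using only the standard library. The stripNonASCII helper is a hypothetical name chosen for this example and is not provided by go-away.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripNonASCII is a hypothetical helper (not part of go-away) that drops
// every rune above unicode.MaxASCII.
func stripNonASCII(s string) string {
	return strings.Map(func(r rune) rune {
		if r > unicode.MaxASCII {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	fmt.Println(stripNonASCII("héllo, wörld!")) // prints "hllo, wrld!"
}
```

Dropping characters is the bluntest option; depending on the input, transliterating them to ASCII first may preserve more of the original text.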
While using a giant regex query to handle everything would be a way of doing it, as more words are added to the list of profanities, that would slow down the filtering considerably.
Instead, the following steps are taken before checking for profanities in a string:
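- Numbers and special characters are replaced with their letter counterparts, according to the character replacements described above.
- All spaces are removed from the resulting string, to catch w ords lik e tha t.
- The resulting string is converted to lowercase.
- All words deemed false positives (e.g. assassin) are removed from the resulting string.

In the future, the following additional steps could also be considered:

- Removing all non-transformed special characters, to catch s~tring li~ke tha~~t.
- Collapsing characters repeated more than twice in a row (e.g. poooop -> poop).
  - This is not a perfect approach, as words like fuuck wouldn't be detected, but it's better than nothing.
  - The upside of this method is that only base words need to be added, not every form of a word (e.g. the fuck entry would support fucker, fucking, etc.)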
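A rough sketch of this kind of pre-processing is shown below, using only the standard library. The function names and the tiny replacement table are assumptions made for illustration; this is not the library's actual implementation. collapseRuns shows the repeated-character idea, which is described here only as a possible future step.

```go
package main

import (
	"fmt"
	"strings"
)

// replacer holds a tiny illustrative replacement table (an assumption,
// not the library's default table).
var replacer = strings.NewReplacer("1", "i", "4", "a", "@", "a", "$", "s", "!", "i")

// sanitize replaces characters, removes spaces and lowercases the result.
func sanitize(s string) string {
	s = replacer.Replace(s)
	s = strings.ReplaceAll(s, " ", "")
	return strings.ToLower(s)
}

// removeFalsePositives strips every false positive from a sanitized string.
func removeFalsePositives(s string, falsePositives []string) string {
	for _, word := range falsePositives {
		s = strings.ReplaceAll(s, word, "")
	}
	return s
}

// collapseRuns shortens any run of more than two identical characters
// down to exactly two.
func collapseRuns(s string) string {
	var b strings.Builder
	var prev rune
	run := 0
	for _, r := range s {
		if run > 0 && r == prev {
			run++
		} else {
			prev, run = r, 1
		}
		if run <= 2 {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(sanitize("F u C k th1$ $h!t"))                                 // prints "fuckthisshit"
	fmt.Println(removeFalsePositives(sanitize("an assassin"), []string{"assassin"})) // prints "an"
	fmt.Println(collapseRuns("poooop"))                                        // prints "poop"
	fmt.Println(collapseRuns("fuuck"))                                         // prints "fuuck"
}
```

Because each step is a plain string transformation, the cost of sanitizing does not grow with the size of the dictionary, which is the concern raised about a single large regex.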