Class Stemmer
java.lang.Object
org.apache.lucene.analysis.hunspell.Stemmer
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word.
It follows the algorithm of the original Hunspell implementation, including recursive suffix
stripping.
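To make "recursive suffix stripping" concrete, here is a toy sketch that does not use Lucene or real Hunspell affix files. The class name, the root set, the suffix list, and the depth limit are all hypothetical, and the flag and condition checks a real Hunspell rule carries are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy illustration only: not the Lucene Stemmer and not real Hunspell rule handling. */
final class ToySuffixStripper {
  // Hypothetical data standing in for dictionary entries and suffix rules.
  private static final Set<String> ROOTS = Set.of("walk", "talk");
  private static final List<String> SUFFIXES = List.of("s", "ing", "er");
  // Assumed limit on how many suffixes may be removed in a row.
  private static final int MAX_DEPTH = 2;

  /** Returns every known root reachable by removing up to MAX_DEPTH suffixes. */
  static List<String> stems(String word) {
    List<String> out = new ArrayList<>();
    strip(word, 0, out);
    return out;
  }

  private static void strip(String word, int depth, List<String> out) {
    // The current form may itself be a dictionary root.
    if (ROOTS.contains(word) && !out.contains(word)) {
      out.add(word);
    }
    if (depth >= MAX_DEPTH) {
      return;
    }
    // Try each suffix, then recurse on what is left of the word.
    for (String suffix : SUFFIXES) {
      if (word.length() > suffix.length() && word.endsWith(suffix)) {
        strip(word.substring(0, word.length() - suffix.length()), depth + 1, out);
      }
    }
  }
}
```

With this toy data, `ToySuffixStripper.stems("walkers")` removes "s" and then "er" before it reaches the root "walk".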
-
Nested Class Summary
Nested Classes
Modifier and Type   Class   Description
(package private) static interface Stemmer.CaseVariationProcessor
(package private) static interface Stemmer.RootProcessor
-
Field Summary
Fields -
Constructor Summary
Constructors
Constructor   Description
Stemmer(Dictionary dictionary)
    Constructs a new Stemmer which will use the provided Dictionary to create its stems.
-
Method Summary
Modifier and Type   Method   Description
private boolean applyAffix(char[] strippedWord, int offset, int length, WordContext context, int affix, int previousAffix, int prefixId, int recursionDepth, boolean prefix, Stemmer.RootProcessor processor)
    Applies the affix rule to the given word, producing a list of stems if any are found
private boolean callProcessor(char[] word, int offset, int length, Stemmer.RootProcessor processor, IntsRef forms, int i)
private static char[] capitalizeAfterApostrophe(char[] word, int length)
private char[] caseFoldLower(char[] word, int length)
    folds lowercase variant of word (title cased) to lowerBuffer
private char[] caseFoldTitle(char[] word, int length)
    folds titlecase variant of word to titleBuffer
(package private) WordCase caseOf(char[] word, int length)
    returns EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word
(package private) boolean doStem(char[] word, int offset, int length, WordContext context, Stemmer.RootProcessor processor)
private boolean isAffixCompatible(int affix, char prevFlag, int recursionDepth, boolean isPrefix, boolean previousWasPrefix, WordContext context)
private boolean isFlagAppendedByAffix(int affixId, char flag)
private boolean isRootCompatibleWithContext(WordContext context, int lastAffix, int entryId)
private boolean needsAnotherAffix(int affix, int previousAffix, boolean isSuffix, int prefixId)
private CharsRef newStem
stem(char[] word, int length)
    Find the stem(s) of the provided word
private boolean stem(char[] word, int offset, int length, WordContext context, int previous, char prevFlag, int prefixId, int recursionDepth, boolean doPrefix, boolean previousWasPrefix, Stemmer.RootProcessor processor)
    Generates a list of stems for the provided word
stem
    Find the stem(s) of the provided word.
private String stemException(int morphDataId)
private char[] stripAffix(char[] word, int offset, int length, int affixLen, int affix, boolean isPrefix)
uniqueStems(char[] word, int length)
    Find the unique stem(s) of the provided word
(package private) boolean varyCase(char[] word, int length, WordCase wordCase, Stemmer.CaseVariationProcessor processor)
private boolean varySharpS(char[] word, int length, Stemmer.CaseVariationProcessor processor)
-
Field Details
-
dictionary
-
formStep
private final int formStep
-
-
Constructor Details
-
Stemmer
Constructs a new Stemmer which will use the provided Dictionary to create its stems.
Parameters:
dictionary - Dictionary that will be used to create the stems
-
-
Method Details
-
stem
Find the stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
-
stem
Find the stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
-
varyCase
boolean varyCase(char[] word, int length, WordCase wordCase, Stemmer.CaseVariationProcessor processor) -
caseOf
returns EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word -
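The three-way classification described for caseOf can be sketched as follows. This is a minimal illustration under stated assumptions, not the Lucene implementation: the `Kind` enum and the class name are hypothetical, and any dictionary-level setting that could influence the result is ignored.

```java
/** Sketch only: a hypothetical classifier, not the Lucene implementation. */
final class CaseOfSketch {
  /** Hypothetical enum named after the three values listed for caseOf. */
  enum Kind { EXACT_CASE, TITLE_CASE, UPPER_CASE }

  static Kind caseOf(char[] word, int length) {
    // Empty words, or words that do not start with an uppercase char, are kept as written.
    if (length == 0 || !Character.isUpperCase(word[0])) {
      return Kind.EXACT_CASE;
    }
    boolean seenUpper = false;
    boolean seenLower = false;
    for (int i = 1; i < length; i++) {
      boolean upper = Character.isUpperCase(word[i]);
      seenUpper |= upper;
      seenLower |= !upper; // anything that is not uppercase counts here, including digits
    }
    if (!seenLower) {
      return Kind.UPPER_CASE; // every char after the first is uppercase (or there is none)
    }
    if (!seenUpper) {
      return Kind.TITLE_CASE; // only the first char is uppercase
    }
    return Kind.EXACT_CASE; // mixed case: keep exactly as written
  }
}
```

In this sketch a single capital letter is treated as UPPER_CASE; whether the real method does the same is an assumption.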
caseFoldTitle
private char[] caseFoldTitle(char[] word, int length) folds titlecase variant of word to titleBuffer -
caseFoldLower
private char[] caseFoldLower(char[] word, int length) folds lowercase variant of word (title cased) to lowerBuffer -
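As a rough picture of what caseFoldTitle and caseFoldLower produce, the sketch below derives a title-cased and then a lowercased variant of a word. It is an assumption-laden illustration: it allocates fresh arrays instead of reusing the titleBuffer and lowerBuffer fields, it uses `Character.toLowerCase` where the real code may apply dictionary-specific folding, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Sketch only: hypothetical helpers, not the Lucene implementation. */
final class CaseFoldSketch {
  /** Keeps the first char and lowercases the rest, so "HELLO" becomes "Hello". */
  static char[] foldTitle(char[] word, int length) {
    char[] titleBuffer = new char[length];
    for (int i = 0; i < length; i++) {
      titleBuffer[i] = (i == 0) ? word[i] : Character.toLowerCase(word[i]);
    }
    return titleBuffer;
  }

  /** Lowercases the first char of an already title-cased word, so "Hello" becomes "hello". */
  static char[] foldLower(char[] titleCased, int length) {
    char[] lowerBuffer = Arrays.copyOf(titleCased, length);
    if (length > 0) {
      lowerBuffer[0] = Character.toLowerCase(lowerBuffer[0]);
    }
    return lowerBuffer;
  }
}
```

Chaining the two calls gives the variants a caller could try in turn when the word as written has no dictionary match.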
capitalizeAfterApostrophe
private static char[] capitalizeAfterApostrophe(char[] word, int length) -
varySharpS
-
doStem
boolean doStem(char[] word, int offset, int length, WordContext context, Stemmer.RootProcessor processor) -
uniqueStems
Find the unique stem(s) of the provided word.
Parameters:
word - Word to find the stems for
Returns:
List of stems for the word
-
stemException
-
newStem
-
stem
private boolean stem(char[] word, int offset, int length, WordContext context, int previous, char prevFlag, int prefixId, int recursionDepth, boolean doPrefix, boolean previousWasPrefix, Stemmer.RootProcessor processor)
Generates a list of stems for the provided word.
Parameters:
word - Word to generate the stems for
previous - previous affix that was removed (so we don't remove the same one twice)
prevFlag - Flag from a previous stemming step that needs to be cross-checked with any affixes in this recursive step
prefixId - ID of the innermost removed prefix, so that when removing a suffix, it's also checked against the word
recursionDepth - current recursion depth
doPrefix - true if we should remove prefixes
previousWasPrefix - true if the previous removal was a prefix: if we are removing a suffix, and it has no continuation requirements, it's ok. But two prefixes (COMPLEXPREFIXES) or two suffixes must have continuation requirements to recurse.
Returns:
whether the processing should be continued
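The boolean return value ("whether the processing should be continued") suggests a callback that can stop the search early. The sketch below shows that general pattern in isolation. `RootCallback`, `visit`, and `firstN` are hypothetical names, and the real Stemmer.RootProcessor interface may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of an early-stop callback protocol; all names here are hypothetical. */
final class EarlyStopSketch {
  /** Stand-in for a root callback: return false to stop the search. */
  interface RootCallback {
    boolean processRoot(String root);
  }

  /** Visits candidates in order; returns false as soon as the callback asks to stop. */
  static boolean visit(List<String> candidates, RootCallback callback) {
    for (String candidate : candidates) {
      if (!callback.processRoot(candidate)) {
        return false; // propagate "stop" to the caller
      }
    }
    return true; // everything was processed, so the caller may continue
  }

  /** Collects at most n roots by returning false from the callback once n are found. */
  static List<String> firstN(List<String> candidates, int n) {
    List<String> out = new ArrayList<>();
    if (n <= 0) {
      return out;
    }
    visit(candidates, root -> {
      out.add(root);
      return out.size() < n;
    });
    return out;
  }
}
```

Each recursive level can forward a `false` result unchanged, so a consumer that only needs one stem does not pay for a full search.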
-
stripAffix
private char[] stripAffix(char[] word, int offset, int length, int affixLen, int affix, boolean isPrefix)
Returns:
null if the affix conditions aren't met; a reference to the same char[] if the affix has no strip data and can thus be simply removed; or a new char[] containing the word after affix removal
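The three possible results of stripAffix (null, the same array, or a new array) can be illustrated with a simplified suffix-only sketch. The signature below is hypothetical: it takes the strip string and a condition predicate directly instead of an affix ID, and it ignores the offset and prefix handling of the real method.

```java
import java.util.function.Predicate;

/** Sketch only: a simplified, suffix-only stand-in with a hypothetical signature. */
final class StripSuffixSketch {
  /**
   * @param affixLen number of chars the suffix occupies at the end of the word
   * @param strip chars the rule removed from the root when the suffix was attached
   * @param condition assumed check on the restored root
   */
  static char[] stripSuffix(
      char[] word, int length, int affixLen, String strip, Predicate<String> condition) {
    int stemLen = length - affixLen;
    if (stemLen < 0) {
      return null; // the suffix cannot be longer than the word
    }
    String candidate = new String(word, 0, stemLen) + strip;
    if (!condition.test(candidate)) {
      return null; // the affix condition is not met
    }
    if (strip.isEmpty()) {
      return word; // same array: only the first stemLen chars are meaningful now
    }
    return candidate.toCharArray(); // new array: the suffix is gone and strip is restored
  }
}
```

For "parties" with a three-char suffix and strip "y", the call returns a new array holding "party"; for "walked" with a two-char suffix and no strip data, it returns the input array itself.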
-
isAffixCompatible
private boolean isAffixCompatible(int affix, char prevFlag, int recursionDepth, boolean isPrefix, boolean previousWasPrefix, WordContext context) -
applyAffix
private boolean applyAffix(char[] strippedWord, int offset, int length, WordContext context, int affix, int previousAffix, int prefixId, int recursionDepth, boolean prefix, Stemmer.RootProcessor processor)
Applies the affix rule to the given word, producing a list of stems if any are found.
Parameters:
strippedWord - Char array containing the word with the affix removed and the strip added
offset - where the word actually starts in the array
length - the length of the stripped word
affix - HunspellAffix representing the affix rule itself
prefixId - when we already stripped a prefix, we can't simply recurse and check the suffix unless both are compatible, so we must check the dictionary form against both to add it as a stem
recursionDepth - current recursion depth
prefix - true if we are removing a prefix (false if it's a suffix)
Returns:
whether the processing should be continued
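The prefixId note says a dictionary form must be checked against both the stripped prefix and the suffix before it is accepted as a stem. A minimal sketch of that idea, assuming the root's dictionary entry carries a set of affix flags, is shown below. The class, the method, and the flag representation are hypothetical, and real Hunspell rules add further checks (such as cross-product settings) that are omitted here.

```java
import java.util.Set;

/** Sketch only: a hypothetical flag check, not the Lucene implementation. */
final class RootFlagCheckSketch {
  /**
   * Returns true if the root allows the suffix and, when a prefix was also stripped,
   * allows that prefix as well. A null prefixFlag means no prefix was removed.
   */
  static boolean rootAccepts(Set<Character> rootFlags, char suffixFlag, Character prefixFlag) {
    if (!rootFlags.contains(suffixFlag)) {
      return false; // the root does not permit this suffix at all
    }
    // With a prefix already stripped, the same root must permit it too.
    return prefixFlag == null || rootFlags.contains(prefixFlag);
  }
}
```

A root flagged only for the suffix would therefore be rejected when the word was reached by removing a prefix and then that suffix.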
-
isRootCompatibleWithContext
-
callProcessor
private boolean callProcessor(char[] word, int offset, int length, Stemmer.RootProcessor processor, IntsRef forms, int i) -
needsAnotherAffix
private boolean needsAnotherAffix(int affix, int previousAffix, boolean isSuffix, int prefixId) -
isFlagAppendedByAffix
private boolean isFlagAppendedByAffix(int affixId, char flag)
-