Class ClassicAnalyzer
- java.lang.Object
-
- org.apache.lucene.analysis.Analyzer
-
- org.apache.lucene.analysis.StopwordAnalyzerBase
-
- org.apache.lucene.analysis.standard.ClassicAnalyzer
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public final class ClassicAnalyzer extends StopwordAnalyzerBase
Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words. ClassicAnalyzer was named StandardAnalyzer in Lucene versions prior to 3.1. As of 3.1, StandardAnalyzer implements Unicode text segmentation, as specified by UAX#29.
- Since:
- 3.1
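The filter chain described above can be illustrated with a minimal plain-Java sketch. It does not use any Lucene classes: the class name ClassicChainSketch, the method analyze, the stop list, and the split-on-non-alphanumerics rule are all assumptions made for illustration. ClassicTokenizer applies a richer grammar, and the ClassicFilter step is not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified model of tokenize -> lowercase -> stop-word removal.
// Not Lucene code; every name in this class is hypothetical.
class ClassicChainSketch {

    // Hypothetical stop list for illustration; not the contents of STOP_WORDS_SET.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "of"));

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Assumption: split on any run of characters that are not letters or digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element when text starts with a delimiter
            }
            String lower = raw.toLowerCase(Locale.ROOT); // stands in for LowerCaseFilter
            if (!stopWords.contains(lower)) {            // stands in for StopFilter
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Brown Fox", STOP_WORDS));
    }
}
```

Lowercasing runs before the stop-word check in this sketch, so a lowercase stop list also matches capitalized input such as "The".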
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
-
Field Summary
Fields
Modifier and Type      Field                       Description
static int             DEFAULT_MAX_TOKEN_LENGTH    Default maximum allowed token length
private int            maxTokenLength
static CharArraySet    STOP_WORDS_SET              An unmodifiable set containing some common English words that are usually not useful for searching.
-
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords
-
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
-
Constructor Summary
Constructors
Constructor                                  Description
ClassicAnalyzer()                            Builds an analyzer with the default stop words (STOP_WORDS_SET).
ClassicAnalyzer(java.io.Reader stopwords)    Builds an analyzer with the stop words from the given reader.
ClassicAnalyzer(CharArraySet stopWords)      Builds an analyzer with the given stop words.
-
Method Summary
All Methods
Modifier and Type                            Method                                                   Description
protected Analyzer.TokenStreamComponents     createComponents(java.lang.String fieldName)             Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
int                                          getMaxTokenLength()
protected TokenStream                        normalize(java.lang.String fieldName, TokenStream in)    Wrap the given TokenStream in order to apply normalization filters.
void                                         setMaxTokenLength(int length)                            Set maximum allowed token length.
-
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
-
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, initReaderForNormalization, normalize, setVersion, tokenStream, tokenStream
-
-
-
-
Field Detail
-
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length
- See Also:
- Constant Field Values
-
maxTokenLength
private int maxTokenLength
-
STOP_WORDS_SET
public static final CharArraySet STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
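Because the default set is unmodifiable, extending it means copying it first. The plain-Java sketch below shows that pattern with java.util collections; it assumes the set rejects mutation the way java.util's unmodifiable wrappers do, and the names UnmodifiableStopSetSketch, DEFAULT_STOP_WORDS, and withExtra, as well as the three sample words, are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Copy-then-extend pattern for a read-only default set.
// Uses java.util only; not Lucene's CharArraySet.
class UnmodifiableStopSetSketch {

    // Hypothetical defaults for illustration; not the contents of STOP_WORDS_SET.
    static final Set<String> DEFAULT_STOP_WORDS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "an", "the")));

    // Returns a new mutable set holding the defaults plus the extra words.
    static Set<String> withExtra(String... extra) {
        Set<String> copy = new HashSet<>(DEFAULT_STOP_WORDS);
        copy.addAll(Arrays.asList(extra));
        return copy;
    }

    // Returns true if adding to the default set is rejected.
    static boolean rejectsMutation() {
        try {
            DEFAULT_STOP_WORDS.add("foo");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("defaults reject add: " + rejectsMutation());
        System.out.println("extended copy has foo: " + withExtra("foo").contains("foo"));
    }
}
```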
-
-
Constructor Detail
-
ClassicAnalyzer
public ClassicAnalyzer(CharArraySet stopWords)
Builds an analyzer with the given stop words.
- Parameters:
stopWords - stop words
-
ClassicAnalyzer
public ClassicAnalyzer()
Builds an analyzer with the default stop words (STOP_WORDS_SET).
-
ClassicAnalyzer
public ClassicAnalyzer(java.io.Reader stopwords) throws java.io.IOException
Builds an analyzer with the stop words from the given reader.
- Parameters:
stopwords - Reader to read stop words from
- Throws:
java.io.IOException
- See Also:
WordlistLoader.getWordSet(Reader)
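As a rough illustration of loading stop words from a Reader, the sketch below reads one word per line, trims surrounding whitespace, and skips blank lines. The one-word-per-line format is an assumption here, and StopWordReaderSketch, readStopWords, and parse are hypothetical names; the code uses only java.io and java.util rather than WordlistLoader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java model of building a stop-word set from a Reader.
// Assumption: the input holds one stop word per line.
class StopWordReaderSketch {

    static Set<String> readStopWords(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>(); // keeps insertion order for readable output
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Convenience wrapper for in-memory text; rethrows IOException unchecked.
    static Set<String> parse(String text) {
        try {
            return readStopWords(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("the\n  and \n\nof\n"));
    }
}
```

The checked IOException on readStopWords mirrors the constructor signature above, which also declares java.io.IOException.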
-
-
Method Detail
-
setMaxTokenLength
public void setMaxTokenLength(int length)
Sets the maximum allowed token length. Any token that exceeds this length is discarded. The setting takes effect only the next time tokenStream(String, Reader) or tokenStream(String, String) is called.
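The discard rule can be sketched in plain Java as follows. This is a simplified model rather than the analyzer's implementation: MaxTokenLengthSketch and dropLongTokens are hypothetical names, and the limit of 3 used in main is chosen only for the example and is unrelated to DEFAULT_MAX_TOKEN_LENGTH.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Models "a token longer than the maximum is discarded".
class MaxTokenLengthSketch {

    // Keeps tokens whose length is at most maxTokenLength; drops the rest.
    static List<String> dropLongTokens(List<String> tokens, int maxTokenLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxTokenLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("ab", "abcdef", "abc");
        System.out.println(dropLongTokens(tokens, 3));
    }
}
```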
-
getMaxTokenLength
public int getMaxTokenLength()
- See Also:
setMaxTokenLength(int)
-
createComponents
protected Analyzer.TokenStreamComponents createComponents(java.lang.String fieldName)
Description copied from class: Analyzer
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
- Specified by:
createComponents in class Analyzer
- Parameters:
fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
- Returns:
the Analyzer.TokenStreamComponents for this analyzer.
-
normalize
protected TokenStream normalize(java.lang.String fieldName, TokenStream in)
Description copied from class: Analyzer
Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).
-
-