java.lang.Object
org.apache.lucene.analysis.fa.PersianStemmer

public class PersianStemmer extends Object
Stemmer for Persian.

Stemming is done in place for efficiency, operating directly on a term buffer.

Stemming is defined as:

  • Removal of attached definite article, conjunction, and prepositions.
  • Stemming of common suffixes.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final char
    private static final char
    private static final char
    private static final char
    private static final char[][]
    private static final char
    private static final char
    private static final char
  • Constructor Summary

    Constructors
    Constructor
    Description
    PersianStemmer()
  • Method Summary

    Modifier and Type
    Method
    Description
    private boolean
    endsWithCheckLength(char[] s, int len, char[] suffix)
    Returns true if the buffer ends with the given suffix and the word is long enough for the suffix to be stemmed.
    int
    stem(char[] s, int len)
    Stem an input buffer of Persian text.
    private int
    stemSuffix(char[] s, int len)
    Stem suffix(es) off a Persian word.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • PersianStemmer

      public PersianStemmer()
  • Method Details

    • stem

      public int stem(char[] s, int len)
      Stem an input buffer of Persian text.
      Parameters:
      s - input buffer
      len - length of input buffer
      Returns:
      new length of input buffer after stemming
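The in-place contract of stem means the caller owns the buffer and must read only the first returned-length characters afterwards. A minimal sketch of that calling pattern follows. The BufferStemmer interface, the StemUsageSketch class, and the toy stemmer are hypothetical stand-ins introduced here so the example runs without Lucene; they are not part of this class.

```java
// Hypothetical stand-in for any stemmer exposing the stem(char[] s, int len) signature.
interface BufferStemmer {
  int stem(char[] s, int len);
}

class StemUsageSketch {
  // Copies the term into a scratch buffer, stems it in place, and returns
  // only the first newLen characters, which hold the stem.
  static String stemToString(BufferStemmer stemmer, String term) {
    char[] buffer = term.toCharArray();
    int newLen = stemmer.stem(buffer, buffer.length);
    return new String(buffer, 0, newLen);
  }

  public static void main(String[] args) {
    // Toy stemmer (an assumption for illustration, not Lucene's logic):
    // shortens the logical length by one when the term ends in 's'.
    BufferStemmer toy = (s, len) -> (len > 1 && s[len - 1] == 's') ? len - 1 : len;
    System.out.println(stemToString(toy, "cats"));
  }
}
```

With Lucene on the classpath, a PersianStemmer instance should fit the same shape, for example by passing a method reference to its public stem method, since the signature matches.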
    • stemSuffix

      private int stemSuffix(char[] s, int len)
      Stem suffix(es) off a Persian word.
      Parameters:
      s - input buffer
      len - length of input buffer
      Returns:
      new length of input buffer after stemming
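One way to picture suffix stemming on a buffer is a loop over a suffix table that shrinks the logical length on each match, without reallocating or copying. The sketch below is an illustration under stated assumptions: the SuffixStemSketch class, its two-entry table (the Persian plural markers U+0647 U+0627 and U+0627 U+0646), and the rule that at least one character must remain are all chosen for the example and are not taken from this class.

```java
class SuffixStemSketch {
  // Illustrative suffix table (assumption): two Persian plural markers.
  // This is not the private suffix table of PersianStemmer.
  static final char[][] SUFFIXES = {
    {'\u0647', '\u0627'}, // HEH + ALEF
    {'\u0627', '\u0646'}, // ALEF + NOON
  };

  // True if the first len characters of s end with suffix.
  static boolean endsWith(char[] s, int len, char[] suffix) {
    if (suffix.length > len) {
      return false;
    }
    for (int i = 0; i < suffix.length; i++) {
      if (s[len - suffix.length + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }

  // Returns the new logical length. The buffer contents are left untouched;
  // only the count of valid leading characters changes.
  static int stemSuffix(char[] s, int len) {
    for (char[] suffix : SUFFIXES) {
      // Sketch-level guard (assumption): never stem a word down to nothing.
      if (len > suffix.length && endsWith(s, len, suffix)) {
        len -= suffix.length; // keep scanning, so more than one suffix may be removed
      }
    }
    return len;
  }
}
```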
    • endsWithCheckLength

      private boolean endsWithCheckLength(char[] s, int len, char[] suffix)
      Returns true if the buffer ends with the given suffix and the word is long enough for the suffix to be stemmed.
      Parameters:
      s - input buffer
      len - length of input buffer
      suffix - suffix to check
      Returns:
      true if the buffer ends with the suffix and the word is long enough for the suffix to be stemmed
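A length-checked suffix test of this kind can be sketched as a match plus a floor on what remains after removal. In the sketch below, the SuffixGuardSketch class and the MIN_REMAINING value of 2 are assumptions made for illustration; the documentation above does not state the threshold.

```java
class SuffixGuardSketch {
  // Assumed minimum number of characters that must remain after the suffix
  // is removed. The value 2 is an illustrative choice.
  static final int MIN_REMAINING = 2;

  static boolean endsWithCheckLength(char[] s, int len, char[] suffix) {
    if (len < suffix.length + MIN_REMAINING) {
      return false; // too short: removing the suffix would leave too little
    }
    // Compare the tail of the logical buffer against the suffix.
    for (int i = 0; i < suffix.length; i++) {
      if (s[len - suffix.length + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }
}
```

The guard runs before the comparison so that very short words are rejected without touching the buffer.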