Package org.apache.lucene.misc
Class SweetSpotSimilarity
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.TFIDFSimilarity
org.apache.lucene.search.similarities.ClassicSimilarity
org.apache.lucene.misc.SweetSpotSimilarity
A similarity with a lengthNorm that provides for a "plateau" of equally good lengths, and tf
helper functions.
For lengthNorm, a min/max can be specified to define the plateau of lengths that should all have a norm of 1.0. Below the min and above the max, the lengthNorm drops off following an inverse square root curve.
For tf, baselineTf and hyperbolicTf functions are provided, which subclasses can choose between.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
-
Field Summary
Fields (Modifier and Type, Field)
private int ln_max
private int ln_min
private float ln_steep
private float tf_base
private double tf_hyper_base
private float tf_hyper_max
private float tf_hyper_min
private float tf_hyper_xoffset
private float tf_min
Fields inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
discountOverlaps
-
Constructor Summary
Constructors
SweetSpotSimilarity()
-
Method Summary
Methods (Modifier and Type, Method, Description)
float baselineTf(float freq) Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min) ...but with a special case check for 0.
float hyperbolicTf(float freq) Uses a hyperbolic tangent function that allows for a hard max...
float lengthNorm(int numTerms) Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ).
void setBaselineTfFactors(float base, float min) Sets the baseline and minimum function variables for baselineTf.
void setHyperbolicTfFactors(float min, float max, double base, float xoffset) Sets the function variables for the hyperbolicTf functions.
void setLengthNormFactors(int min, int max, float steepness, boolean discountOverlaps) Sets the default function variables used by lengthNorm when no field specific variables have been set.
float tf(float freq) Delegates to baselineTf.
String toString()
Methods inherited from class org.apache.lucene.search.similarities.ClassicSimilarity
idf, idfExplain
Methods inherited from class org.apache.lucene.search.similarities.TFIDFSimilarity
computeNorm, getDiscountOverlaps, idfExplain, scorer, setDiscountOverlaps
-
Field Details
-
ln_min
private int ln_min -
ln_max
private int ln_max -
ln_steep
private float ln_steep -
tf_base
private float tf_base -
tf_min
private float tf_min -
tf_hyper_min
private float tf_hyper_min -
tf_hyper_max
private float tf_hyper_max -
tf_hyper_base
private double tf_hyper_base -
tf_hyper_xoffset
private float tf_hyper_xoffset
-
-
Constructor Details
-
SweetSpotSimilarity
public SweetSpotSimilarity()
-
-
Method Details
-
setBaselineTfFactors
public void setBaselineTfFactors(float base, float min)
Sets the baseline and minimum function variables for baselineTf.
-
setHyperbolicTfFactors
public void setHyperbolicTfFactors(float min, float max, double base, float xoffset)
Sets the function variables for the hyperbolicTf functions.
- Parameters:
min - the minimum tf value to ever be returned (default: 0.0)
max - the maximum tf value to ever be returned (default: 2.0)
base - the base value to be used in the exponential for the hyperbolic function (default: 1.3)
xoffset - the midpoint of the hyperbolic function (default: 10.0)
-
setLengthNormFactors
public void setLengthNormFactors(int min, int max, float steepness, boolean discountOverlaps)
Sets the default function variables used by lengthNorm when no field specific variables have been set.
-
lengthNorm
public float lengthNorm(int numTerms)
Implemented as: 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 ).
This degrades to 1/sqrt(x) when min and max are both 1 and steepness is 0.5.
TODO: potential optimization is to just flat out return 1.0f if numTerms is between min and max.
- Overrides:
lengthNorm in class ClassicSimilarity
- Parameters:
numTerms - the number of terms in the field, optionally discounting overlaps
- Returns:
a length normalization value
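As an illustration of the plateau behaviour, the lengthNorm formula can be transcribed into plain Java. This is a minimal sketch, not the Lucene source: the class name LengthNormSketch and the choice to pass min, max and steepness as arguments (rather than reading them from fields) are assumptions made so the example runs without Lucene on the classpath.

```java
// Hypothetical standalone class; it only mirrors the documented formula.
class LengthNormSketch {

    // 1/sqrt( steepness * (abs(x-min) + abs(x-max) - (max-min)) + 1 )
    static float lengthNorm(int numTerms, int min, int max, float steepness) {
        // Zero for any length inside [min, max], growing linearly outside it.
        float penalty = steepness
                * (Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min));
        return (float) (1.0 / Math.sqrt(penalty + 1.0f));
    }

    public static void main(String[] args) {
        // Illustrative plateau of 3..5 terms with steepness 0.5.
        for (int n : new int[] {1, 3, 4, 5, 9}) {
            System.out.println(n + " terms -> " + lengthNorm(n, 3, 5, 0.5f));
        }
    }
}
```

Lengths inside the plateau produce a penalty of zero and therefore a norm of 1.0, while lengths outside it are penalized in proportion to their distance from the nearest edge.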
-
tf
public float tf(float freq)
Delegates to baselineTf.
- Overrides:
tf in class ClassicSimilarity
- Parameters:
freq - the frequency of a term within a document
- Returns:
a score factor based on a term's within-document frequency
-
baselineTf
public float baselineTf(float freq)
Implemented as: (x <= min) ? base : sqrt(x+(base**2)-min) ...but with a special case check for 0.
This degrades to sqrt(x) when min and base are both 0.
-
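The baselineTf expression can be sketched in plain Java as follows. This is an illustrative transcription rather than the Lucene source: the class name BaselineTfSketch is hypothetical, base and min are passed as arguments instead of being read from fields, and the special case is assumed to return 0 for a frequency of 0, which the description above does not state explicitly.

```java
// Hypothetical standalone class; it only mirrors the documented expression.
class BaselineTfSketch {

    // (x <= min) ? base : sqrt(x + (base**2) - min), with a special case for 0
    static float baselineTf(float freq, float base, float min) {
        if (freq == 0.0f) {
            return 0.0f; // assumption: a term that does not occur contributes nothing
        }
        return (freq <= min) ? base : (float) Math.sqrt(freq + (base * base) - min);
    }

    public static void main(String[] args) {
        // Illustrative factors: every frequency up to 6 scores the flat baseline 1.5.
        for (float f : new float[] {0f, 1f, 6f, 10f, 30f}) {
            System.out.println(f + " -> " + baselineTf(f, 1.5f, 6f));
        }
    }
}
```

Adding base**2 - min under the square root makes the curve continuous at x = min, where sqrt(min + base**2 - min) equals base.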
hyperbolicTf
public float hyperbolicTf(float freq)
Uses a hyperbolic tangent function that allows for a hard max...
tf(x)=min+(max-min)/2*(((base**(x-xoffset)-base**-(x-xoffset))/(base**(x-xoffset)+base**-(x-xoffset)))+1)
This code is provided as a convenience for subclasses that want to use a hyperbolic tf function.
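To make the shape of the hyperbolic curve easier to see, here is a minimal plain-Java sketch of the formula. It is not the Lucene source: the class name HyperbolicTfSketch is hypothetical, the four factors are passed as arguments, and the ratio (base**d - base**-d)/(base**d + base**-d) is computed as tanh(d * ln(base)), which is algebraically the same quantity. Any special handling the real method may apply, for example for a frequency of 0, is not reproduced here.

```java
// Hypothetical standalone class; it only mirrors the documented formula.
class HyperbolicTfSketch {

    // tf(x) = min + (max-min)/2 * (tanh((x - xoffset) * ln(base)) + 1)
    static float hyperbolicTf(float freq, float min, float max, double base, float xoffset) {
        double d = freq - xoffset;
        // tanh stays within [-1, 1], so the result stays within [min, max].
        double t = Math.tanh(d * Math.log(base));
        return (float) (min + (max - min) / 2.0 * (t + 1.0));
    }

    public static void main(String[] args) {
        // Documented defaults: min 0.0, max 2.0, base 1.3, xoffset 10.0.
        for (float f : new float[] {1f, 5f, 10f, 15f, 100f}) {
            System.out.println(f + " -> " + hyperbolicTf(f, 0.0f, 2.0f, 1.3, 10.0f));
        }
    }
}
```

At x = xoffset the tangent term is 0, so the result is the midpoint between min and max; for large frequencies the result approaches max, which is the "hard max" the description refers to.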
-
toString
- Overrides:
toString in class ClassicSimilarity
-