Package org.apache.lucene.util.automaton
Class DaciukMihovAutomatonBuilder
- java.lang.Object
-
- org.apache.lucene.util.automaton.DaciukMihovAutomatonBuilder
-
public final class DaciukMihovAutomatonBuilder extends java.lang.Object
Builds a minimal, deterministicAutomaton
that accepts a set of strings. The algorithm requires sorted input data, but is very fast (nearly linear with the input size).
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static class
DaciukMihovAutomatonBuilder.State
DFSA state withchar
labels on transitions.
-
Field Summary
Fields Modifier and Type Field Description private static java.util.Comparator<CharsRef>
comparator
A comparator used for enforcing sorted UTF8 order, used in assertions only.(package private) static int
MAX_TERM_LENGTH
This builder rejects terms that are more than 1k chars long since it then uses recursion based on the length of the string, which might cause stack overflows.private CharsRef
previous
Previous sequence added to the automaton inadd(CharsRef)
.private DaciukMihovAutomatonBuilder.State
root
Root automaton state.private java.util.HashMap<DaciukMihovAutomatonBuilder.State,DaciukMihovAutomatonBuilder.State>
stateRegistry
A "registry" for state interning.
-
Constructor Summary
Constructors Modifier Constructor Description private
DaciukMihovAutomatonBuilder()
The default constructor is private.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description void
add(CharsRef current)
Add another character sequence to this automaton.private void
addSuffix(DaciukMihovAutomatonBuilder.State state, java.lang.CharSequence current, int fromIndex)
Add a suffix ofcurrent
starting atfromIndex
(inclusive) to statestate
.static Automaton
build(java.util.Collection<BytesRef> input)
Build a minimal, deterministic automaton from a sorted list ofBytesRef
representing strings in UTF-8.DaciukMihovAutomatonBuilder.State
complete()
Finalize the automaton and return the root state.private static int
convert(Automaton.Builder a, DaciukMihovAutomatonBuilder.State s, java.util.IdentityHashMap<DaciukMihovAutomatonBuilder.State,java.lang.Integer> visited)
Internal recursive traversal for conversion.private void
replaceOrRegister(DaciukMihovAutomatonBuilder.State state)
Replace last child ofstate
with an already registered state or stateRegistry the last child state.private boolean
setPrevious(CharsRef current)
Copycurrent
into an internal buffer.
-
-
-
Field Detail
-
MAX_TERM_LENGTH
static final int MAX_TERM_LENGTH
This builder rejects terms that are more than 1k chars long since it then uses recursion based on the length of the string, which might cause stack overflows.- See Also:
- Constant Field Values
-
stateRegistry
private java.util.HashMap<DaciukMihovAutomatonBuilder.State,DaciukMihovAutomatonBuilder.State> stateRegistry
A "registry" for state interning.
-
root
private DaciukMihovAutomatonBuilder.State root
Root automaton state.
-
previous
private CharsRef previous
Previous sequence added to the automaton inadd(CharsRef)
.
-
comparator
private static final java.util.Comparator<CharsRef> comparator
A comparator used for enforcing sorted UTF8 order, used in assertions only.
-
-
Method Detail
-
add
public void add(CharsRef current)
Add another character sequence to this automaton. The sequence must be lexicographically larger or equal compared to any previous sequences added to this automaton (the input must be sorted).
-
complete
public DaciukMihovAutomatonBuilder.State complete()
Finalize the automaton and return the root state. No more strings can be added to the builder after this call.- Returns:
- Root automaton state.
-
convert
private static int convert(Automaton.Builder a, DaciukMihovAutomatonBuilder.State s, java.util.IdentityHashMap<DaciukMihovAutomatonBuilder.State,java.lang.Integer> visited)
Internal recursive traversal for conversion.
-
build
public static Automaton build(java.util.Collection<BytesRef> input)
Build a minimal, deterministic automaton from a sorted list ofBytesRef
representing strings in UTF-8. These strings must be binary-sorted.
-
setPrevious
private boolean setPrevious(CharsRef current)
Copycurrent
into an internal buffer.
-
replaceOrRegister
private void replaceOrRegister(DaciukMihovAutomatonBuilder.State state)
Replace last child ofstate
with an already registered state or stateRegistry the last child state.
-
addSuffix
private void addSuffix(DaciukMihovAutomatonBuilder.State state, java.lang.CharSequence current, int fromIndex)
Add a suffix ofcurrent
starting atfromIndex
(inclusive) to statestate
.
-
-