Package org.apache.lucene.util.automaton
Class DaciukMihovAutomatonBuilder
java.lang.Object
org.apache.lucene.util.automaton.DaciukMihovAutomatonBuilder
Builds a minimal, deterministic
Automaton
that accepts a set of strings. The algorithm
requires sorted input data, but is very fast (nearly linear with the input size).-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static final class
DFSA state withchar
labels on transitions. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final Comparator<CharsRef>
A comparator used for enforcing sorted UTF8 order, used in assertions only.(package private) static final int
This builder rejects terms that are more than 1k chars long since it then uses recursion based on the length of the string, which might cause stack overflows.private CharsRef
Previous sequence added to the automaton inadd(CharsRef)
.Root automaton state.A "registry" for state interning. -
Constructor Summary
ConstructorsModifierConstructorDescriptionprivate
The default constructor is private. -
Method Summary
Modifier and TypeMethodDescriptionvoid
Add another character sequence to this automaton.private void
addSuffix
(DaciukMihovAutomatonBuilder.State state, CharSequence current, int fromIndex) Add a suffix ofcurrent
starting atfromIndex
(inclusive) to statestate
.static Automaton
build
(Collection<BytesRef> input) Build a minimal, deterministic automaton from a sorted list ofBytesRef
representing strings in UTF-8.complete()
Finalize the automaton and return the root state.private static int
convert
(Automaton.Builder a, DaciukMihovAutomatonBuilder.State s, IdentityHashMap<DaciukMihovAutomatonBuilder.State, Integer> visited) Internal recursive traversal for conversion.private void
Replace last child ofstate
with an already registered state or stateRegistry the last child state.private boolean
setPrevious
(CharsRef current) Copycurrent
into an internal buffer.
-
Field Details
-
MAX_TERM_LENGTH
static final int MAX_TERM_LENGTHThis builder rejects terms that are more than 1k chars long since it then uses recursion based on the length of the string, which might cause stack overflows.- See Also:
-
stateRegistry
A "registry" for state interning. -
root
Root automaton state. -
previous
Previous sequence added to the automaton inadd(CharsRef)
. -
comparator
A comparator used for enforcing sorted UTF8 order, used in assertions only.
-
-
Constructor Details
-
DaciukMihovAutomatonBuilder
private DaciukMihovAutomatonBuilder()The default constructor is private. Use static methods directly.
-
-
Method Details
-
add
Add another character sequence to this automaton. The sequence must be lexicographically larger or equal compared to any previous sequences added to this automaton (the input must be sorted). -
complete
Finalize the automaton and return the root state. No more strings can be added to the builder after this call.- Returns:
- Root automaton state.
-
convert
private static int convert(Automaton.Builder a, DaciukMihovAutomatonBuilder.State s, IdentityHashMap<DaciukMihovAutomatonBuilder.State, Integer> visited) Internal recursive traversal for conversion. -
build
Build a minimal, deterministic automaton from a sorted list ofBytesRef
representing strings in UTF-8. These strings must be binary-sorted. -
setPrevious
Copycurrent
into an internal buffer. -
replaceOrRegister
Replace last child ofstate
with an already registered state or stateRegistry the last child state. -
addSuffix
private void addSuffix(DaciukMihovAutomatonBuilder.State state, CharSequence current, int fromIndex) Add a suffix ofcurrent
starting atfromIndex
(inclusive) to statestate
.
-