Class BKDWriter

java.lang.Object
org.apache.lucene.util.bkd.BKDWriter
All Implemented Interfaces:
Closeable, AutoCloseable

public class BKDWriter extends Object implements Closeable
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= config.maxPointsInLeafNode. The tree is partially balanced, which means the leaf nodes will have the requested config.maxPointsInLeafNode except one that might have less. Leaf nodes may straddle the two bottom levels of the binary tree. Values that fall exactly on a cell boundary may be in either cell.

The number of dimensions can be 1 to 8, but every byte[] value is fixed length.

This consumes heap during writing: it allocates a Long[numLeaves], a byte[numLeaves*(1+config.bytesPerDim)] and then uses up to the specified maxMBSortInHeap heap space for writing.

NOTE: This can write at most Integer.MAX_VALUE * config.maxPointsInLeafNode / config.bytesPerDim total points.

  • Field Details

    • CODEC_NAME

      public static final String CODEC_NAME
      See Also:
    • VERSION_START

      public static final int VERSION_START
      See Also:
    • VERSION_LEAF_STORES_BOUNDS

      public static final int VERSION_LEAF_STORES_BOUNDS
      See Also:
    • VERSION_SELECTIVE_INDEXING

      public static final int VERSION_SELECTIVE_INDEXING
      See Also:
    • VERSION_LOW_CARDINALITY_LEAVES

      public static final int VERSION_LOW_CARDINALITY_LEAVES
      See Also:
    • VERSION_META_FILE

      public static final int VERSION_META_FILE
      See Also:
    • VERSION_CURRENT

      public static final int VERSION_CURRENT
      See Also:
    • SPLITS_BEFORE_EXACT_BOUNDS

      private static final int SPLITS_BEFORE_EXACT_BOUNDS
      Number of splits before we compute the exact bounding box of an inner node.
      See Also:
    • DEFAULT_MAX_MB_SORT_IN_HEAP

      public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
      Default maximum heap to use, before spilling to (slower) disk
      See Also:
    • config

      protected final BKDConfig config
      BKD tree configuration
    • comparator

      private final ArrayUtil.ByteArrayComparator comparator
    • equalsPredicate

      private final BKDUtil.ByteArrayPredicate equalsPredicate
    • commonPrefixComparator

      private final ArrayUtil.ByteArrayComparator commonPrefixComparator
    • tempDir

      final TrackingDirectoryWrapper tempDir
    • tempFileNamePrefix

      final String tempFileNamePrefix
    • maxMBSortInHeap

      final double maxMBSortInHeap
    • scratchDiff

      final byte[] scratchDiff
    • scratch1

      final byte[] scratch1
    • scratch2

      final byte[] scratch2
    • scratchBytesRef1

      final BytesRef scratchBytesRef1
    • scratchBytesRef2

      final BytesRef scratchBytesRef2
    • commonPrefixLengths

      final int[] commonPrefixLengths
    • docsSeen

      protected final FixedBitSet docsSeen
    • pointWriter

      private PointWriter pointWriter
    • finished

      private boolean finished
    • tempInput

      private IndexOutput tempInput
    • maxPointsSortInHeap

      private final int maxPointsSortInHeap
    • minPackedValue

      protected final byte[] minPackedValue
      Minimum per-dim values, packed
    • maxPackedValue

      protected final byte[] maxPackedValue
      Maximum per-dim values, packed
    • pointCount

      protected long pointCount
    • totalPointCount

      private final long totalPointCount
      An upper bound on how many points the caller will add (includes deletions)
    • maxDoc

      private final int maxDoc
    • docIdsWriter

      private final DocIdsWriter docIdsWriter
  • Constructor Details

    • BKDWriter

      public BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, BKDConfig config, double maxMBSortInHeap, long totalPointCount)
  • Method Details

    • verifyParams

      private static void verifyParams(double maxMBSortInHeap, long totalPointCount)
    • initPointWriter

      private void initPointWriter() throws IOException
      Throws:
      IOException
    • add

      public void add(byte[] packedValue, int docID) throws IOException
      Throws:
      IOException
    • writeField

      public Runnable writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree reader) throws IOException
      Write a field from a MutablePointTree. This way of writing points is faster than regular writes with add(byte[], int) since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.
      Throws:
      IOException
    • computePackedValueBounds

      private void computePackedValueBounds(MutablePointTree values, int from, int to, byte[] minPackedValue, byte[] maxPackedValue, BytesRef scratch)
    • writeFieldNDims

      private Runnable writeFieldNDims(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree values) throws IOException
      Throws:
      IOException
    • writeField1Dim

      private Runnable writeField1Dim(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree reader) throws IOException
      Throws:
      IOException
    • merge

      public Runnable merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<PointValues> readers) throws IOException
      More efficient bulk-add for incoming PointValuess. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.
      Throws:
      IOException
    • getNumLeftLeafNodes

      private int getNumLeftLeafNodes(int numLeaves)
    • checkMaxLeafNodeCount

      private void checkMaxLeafNodeCount(int numLeaves)
    • finish

      public Runnable finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) throws IOException
      Writes the BKD tree to the provided IndexOutputs and returns a Runnable that writes the index of the tree if at least one point has been added, or null otherwise.
      Throws:
      IOException
    • packIndex

      private byte[] packIndex(BKDWriter.BKDTreeLeafNodes leafNodes) throws IOException
      Packs the two arrays, representing a semi-balanced binary tree, into a compact byte[] structure.
      Throws:
      IOException
    • appendBlock

      private int appendBlock(ByteBuffersDataOutput writeBuffer, List<byte[]> blocks)
      Appends the current contents of writeBuffer as another block on the growing in-memory file
    • recursePackIndex

      private int recursePackIndex(ByteBuffersDataOutput writeBuffer, BKDWriter.BKDTreeLeafNodes leafNodes, long minBlockFP, List<byte[]> blocks, byte[] lastSplitValues, boolean[] negativeDeltas, boolean isLeft, int leavesOffset, int numLeaves) throws IOException
      lastSplitValues is per-dimension split value previously seen; we use this to prefix-code the split byte[] on each inner node
      Throws:
      IOException
    • writeIndex

      private void writeIndex(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, BKDWriter.BKDTreeLeafNodes leafNodes, long dataStartFP) throws IOException
      Throws:
      IOException
    • writeIndex

      private void writeIndex(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, int numLeaves, byte[] packedIndex, long dataStartFP) throws IOException
      Throws:
      IOException
    • writeLeafBlockDocs

      private void writeLeafBlockDocs(DataOutput out, int[] docIDs, int start, int count) throws IOException
      Throws:
      IOException
    • writeLeafBlockPackedValues

      private void writeLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues, int leafCardinality) throws IOException
      Throws:
      IOException
    • writeLowCardinalityLeafBlockPackedValues

      private void writeLowCardinalityLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, IntFunction<BytesRef> packedValues) throws IOException
      Throws:
      IOException
    • writeHighCardinalityLeafBlockPackedValues

      private void writeHighCardinalityLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, IntFunction<BytesRef> packedValues, int compressedByteOffset) throws IOException
      Throws:
      IOException
    • writeActualBounds

      private void writeActualBounds(DataOutput out, int[] commonPrefixLengths, int count, IntFunction<BytesRef> packedValues) throws IOException
      Throws:
      IOException
    • computeMinMax

      private static BytesRef[] computeMinMax(int count, IntFunction<BytesRef> packedValues, int offset, int length)
      Return an array that contains the min and max values for the [offset, offset+length] interval of the given BytesRefs.
    • writeLeafBlockPackedValuesRange

      private void writeLeafBlockPackedValuesRange(DataOutput out, int[] commonPrefixLengths, int start, int end, IntFunction<BytesRef> packedValues) throws IOException
      Throws:
      IOException
    • runLen

      private static int runLen(IntFunction<BytesRef> packedValues, int start, int end, int byteOffset)
    • writeCommonPrefixes

      private void writeCommonPrefixes(DataOutput out, int[] commonPrefixes, byte[] packedValue) throws IOException
      Throws:
      IOException
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
    • verifyChecksum

      private Error verifyChecksum(Throwable priorException, PointWriter writer) throws IOException
      Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.
      Throws:
      IOException
    • split

      protected int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)
      Pick the next dimension to split.
      Parameters:
      minPackedValue - the min values for all dimensions
      maxPackedValue - the max values for all dimensions
      parentSplits - how many times each dim has been split on the parent levels
      Returns:
      the dimension to split
    • switchToHeap

      private HeapPointWriter switchToHeap(PointWriter source) throws IOException
      Pull a partition back into heap once the point count is low enough while recursing.
      Throws:
      IOException
    • build

      private void build(int leavesOffset, int numLeaves, MutablePointTree reader, int from, int to, IndexOutput out, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds) throws IOException
      Throws:
      IOException
    • computePackedValueBounds

      private void computePackedValueBounds(BKDRadixSelector.PathSlice slice, byte[] minPackedValue, byte[] maxPackedValue) throws IOException
      Throws:
      IOException
    • build

      private void build(int leavesOffset, int numLeaves, BKDRadixSelector.PathSlice points, IndexOutput out, BKDRadixSelector radixSelector, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds) throws IOException
      The point writer contains the data that is going to be splitted using radix selection. /* This method is used when we are merging previously written segments, in the numDims > 1 case.
      Throws:
      IOException
    • computeCommonPrefixLength

      private void computeCommonPrefixLength(HeapPointWriter heapPointWriter, byte[] commonPrefix, int from, int to)
    • valuesInOrderAndBounds

      private static boolean valuesInOrderAndBounds(BKDConfig config, int count, int sortedDim, byte[] minPackedValue, byte[] maxPackedValue, IntFunction<BytesRef> values, int[] docs, int docsOffset)
    • valueInOrder

      private static boolean valueInOrder(BKDConfig config, long ord, int sortedDim, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset, int doc, int lastDoc)
    • valueInBounds

      private static boolean valueInBounds(BKDConfig config, BytesRef packedValue, byte[] minPackedValue, byte[] maxPackedValue)