pyspark.streaming.DStream.reduceByWindow¶
-
DStream.
reduceByWindow
(reduceFunc: Callable[[T, T], T], invReduceFunc: Optional[Callable[[T, T], T]], windowDuration: int, slideDuration: int) → pyspark.streaming.dstream.DStream[T][source]¶ Return a new DStream in which each RDD has a single element generated by reducing all elements in a sliding window over this DStream.
if invReduceFunc is not None, the reduction is done incrementally using the old window’s reduced value :
reduce the new values that entered the window (e.g., adding new counts)
2. “inverse reduce” the old values that left the window (e.g., subtracting old counts) This is more efficient than invReduceFunc is None.
- Parameters
- reduceFuncfunction
associative and commutative reduce function
- invReduceFuncfunction
inverse reduce function of reduceFunc; such that for all y, and invertible x: invReduceFunc(reduceFunc(x, y), x) = y
- windowDurationint
width of the window; must be a multiple of this DStream’s batching interval
- slideDurationint
sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream’s batching interval