SparseVector
- class pyspark.ml.linalg.SparseVector(size, *args)
A simple sparse vector class for passing data to MLlib. Users may alternatively pass SciPy’s scipy.sparse data types.
Methods
- dot(other): Dot product with a SparseVector or a 1- or 2-dimensional NumPy array.
- norm(p): Calculates the norm of a SparseVector.
- numNonzeros(): Number of nonzero elements.
- squared_distance(other): Squared distance from a SparseVector or a 1-dimensional NumPy array.
- toArray(): Returns a copy of this SparseVector as a 1-dimensional numpy.ndarray.
Attributes
- size: Size of the vector.
- indices: A list of indices corresponding to active entries.
- values: A list of values corresponding to active entries.
Methods Documentation
- dot(other)
Dot product with a SparseVector or a 1- or 2-dimensional NumPy array.
Examples
>>> import array
>>> import numpy as np
>>> from pyspark.ml.linalg import DenseVector, SparseVector
>>> a = SparseVector(4, [1, 3], [3.0, 4.0])
>>> a.dot(a)
25.0
>>> a.dot(array.array('d', [1., 2., 3., 4.]))
22.0
>>> b = SparseVector(4, [2], [1.0])
>>> a.dot(b)
0.0
>>> a.dot(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]))
array([ 22., 22.])
>>> a.dot([1., 2., 3.])
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(np.array([1., 2.]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(DenseVector([1., 2.]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(np.zeros((3, 2)))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
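To make the arithmetic in the examples above easier to follow, here is a minimal pure-Python sketch of a sparse-by-dense dot product. The helper name sparse_dot_dense is hypothetical and is not part of pyspark; it only illustrates the idea that active entries alone contribute to the result, and it may not match how the library computes it internally.

```python
def sparse_dot_dense(size, indices, values, dense):
    """Illustrative helper (not a pyspark API): dot product of a sparse
    vector given as (size, indices, values) with a dense sequence."""
    # Mirror the dimension check seen in the documented examples.
    assert len(dense) == size, "dimension mismatch"
    # Inactive positions are implicit zeros, so only active entries are summed.
    return sum(v * dense[i] for i, v in zip(indices, values))


# Same data as a = SparseVector(4, [1, 3], [3.0, 4.0]) in the examples.
result = sparse_dot_dense(4, [1, 3], [3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(result)  # 22.0, i.e. 3.0 * 2.0 + 4.0 * 4.0
```

The cost of this loop grows with the number of active entries rather than with the full vector size, which is the usual reason for keeping a vector sparse.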
- norm(p)
Calculates the norm of a SparseVector.
Examples
>>> from pyspark.ml.linalg import SparseVector
>>> a = SparseVector(4, [0, 1], [3., -4.])
>>> a.norm(1)
7.0
>>> a.norm(2)
5.0
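The two results above follow from the standard p-norm definition applied to the active values only, since implicit zeros add nothing to the sum. The sketch below assumes that definition for finite p >= 1; sparse_norm is a hypothetical name used for illustration, not a pyspark function.

```python
def sparse_norm(values, p):
    """Illustrative helper (not a pyspark API): p-norm of a sparse vector,
    computed from its active values for a finite p >= 1."""
    # (sum of |v|^p) raised to 1/p; zeros outside `values` do not contribute.
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


# Active values of a = SparseVector(4, [0, 1], [3., -4.]) from the examples.
active_values = [3.0, -4.0]
print(sparse_norm(active_values, 1))  # 7.0
print(sparse_norm(active_values, 2))  # 5.0
```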
- numNonzeros()
Number of nonzero elements. This scans all active values and counts the nonzeros.
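The distinction between active entries and nonzero entries matters when a vector stores an explicit zero. The following pure-Python sketch shows the scan described above on made-up data; count_nonzeros and the sample lists are hypothetical and only illustrate the counting rule.

```python
def count_nonzeros(values):
    """Illustrative helper (not a pyspark API): scan the active values
    and count those that are not zero."""
    return sum(1 for v in values if v != 0)


# Hypothetical sparse vector with an explicit zero stored at index 2.
indices = [0, 2, 3]
values = [1.5, 0.0, -2.0]

print(len(values))             # 3 active entries
print(count_nonzeros(values))  # 2 nonzero entries
```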
- squared_distance(other)
Squared distance from a SparseVector or a 1-dimensional NumPy array.
Examples
>>> import array
>>> import numpy as np
>>> from pyspark.ml.linalg import SparseVector
>>> a = SparseVector(4, [1, 3], [3.0, 4.0])
>>> a.squared_distance(a)
0.0
>>> a.squared_distance(array.array('d', [1., 2., 3., 4.]))
11.0
>>> a.squared_distance(np.array([1., 2., 3., 4.]))
11.0
>>> b = SparseVector(4, [2], [1.0])
>>> a.squared_distance(b)
26.0
>>> b.squared_distance(a)
26.0
>>> b.squared_distance([1., 2.])
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> b.squared_distance(SparseVector(3, [1,], [1.0,]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
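As a rough illustration of where the value 26.0 in the examples comes from, the sketch below computes the squared Euclidean distance between two sparse vectors by visiting only positions that are active in at least one of them. The function sparse_squared_distance is a hypothetical helper, and the library's own implementation may be organized differently.

```python
def sparse_squared_distance(size_a, idx_a, val_a, size_b, idx_b, val_b):
    """Illustrative helper (not a pyspark API): squared Euclidean distance
    between two sparse vectors given as (size, indices, values)."""
    # Mirror the dimension check seen in the documented examples.
    assert size_a == size_b, "dimension mismatch"
    a = dict(zip(idx_a, val_a))
    b = dict(zip(idx_b, val_b))
    # Positions active in neither vector contribute (0 - 0)^2 = 0, so skip them.
    return sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in set(a) | set(b))


# a = SparseVector(4, [1, 3], [3.0, 4.0]) and b = SparseVector(4, [2], [1.0])
print(sparse_squared_distance(4, [1, 3], [3.0, 4.0], 4, [2], [1.0]))  # 26.0
```

The three contributing terms are 3.0 squared, 4.0 squared, and 1.0 squared, and the result is the same whichever vector is passed first.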
Attributes Documentation
- size
Size of the vector.
- indices
A list of indices corresponding to active entries.
- values
A list of values corresponding to active entries.
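The three attributes together describe the whole vector: size gives the length, and each position listed in indices holds the matching entry of values, with every other position treated as zero. A minimal pure-Python sketch of that expansion, using the hypothetical helper to_dense (not a pyspark function), might look like this:

```python
def to_dense(size, indices, values):
    """Illustrative helper (not a pyspark API): expand the
    (size, indices, values) triple into a dense list."""
    dense = [0.0] * size           # every position starts as an implicit zero
    for i, v in zip(indices, values):
        dense[i] = v               # place each active entry at its index
    return dense


# Attributes of SparseVector(4, [1, 3], [3.0, 4.0]) from the examples above.
print(to_dense(4, [1, 3], [3.0, 4.0]))  # [0.0, 3.0, 0.0, 4.0]
```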