Scala: Is it possible to correctly calculate SVD on IndexedRowMatrix in Spark?
I have an IndexedRowMatrix of size [m x n] that contains only X non-zero rows. I'm setting k = 3.
When I calculate the SVD on this object with computeU set to true, the reported dimensions of the U matrix are [m x n], when the correct dimensions are [m x k].
Why does this happen?
I've already tried converting the IndexedRowMatrix to a RowMatrix and then calculating the SVD. The result has dimensions [X x k], so it is computed only for the non-zero rows (the matrix drops the row indices, as stated in the documentation).
Is it possible to convert the matrix while keeping the row indices?
import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRowMatrix, MatrixEntry, RowMatrix}

val csv = sc.textFile("hdfs://spark/nlp/merged_sparse.csv").cache() // original file

// Parse each space-separated line into a (row, column, value) entry; indices in the file are 1-based.
val data = csv.mapPartitions(lines => {
  val parser = new CSVParser(' ')
  lines.map(line => {
    parser.parseLine(line)
  })
}).map(line => {
  MatrixEntry(line(0).toLong - 1, line(1).toLong - 1, line(2).toInt)
})

val coordinateMatrix: CoordinateMatrix = new CoordinateMatrix(data)
val indexedRowMatrix: IndexedRowMatrix = coordinateMatrix.toIndexedRowMatrix()
val rowMatrix: RowMatrix = indexedRowMatrix.toRowMatrix()

// SVD on the RowMatrix: row indices are dropped.
val svd: SingularValueDecomposition[RowMatrix, Matrix] = rowMatrix.computeSVD(3, computeU = true, 1e-9)
val U: RowMatrix = svd.U // The U factor is a RowMatrix.
val s: Vector = svd.s // The singular values are stored in a local dense vector.
val V: Matrix = svd.V // The V factor is a local dense matrix.

// SVD on the IndexedRowMatrix: row indices are kept.
val indexedSvd: SingularValueDecomposition[IndexedRowMatrix, Matrix] = indexedRowMatrix.computeSVD(3, computeU = true, 1e-9)
val indexedU: IndexedRowMatrix = indexedSvd.U // The U factor is an IndexedRowMatrix.
val indexedS: Vector = indexedSvd.s // The singular values are stored in a local dense vector.
val indexedV: Matrix = indexedSvd.V // The V factor is a local dense matrix.
It looks like a bug in Spark MLlib. If you check the size of a row vector in the indexed matrix, it correctly returns 3 columns:
indexedU.rows.first().vector.size
I looked at the source, and it looks like they're incorrectly copying the current number of columns from the indexed matrix:
val U = if (computeU) {
  val indexedRows = indices.zip(svd.U.rows).map { case (i, v) =>
    IndexedRow(i, v)
  }
  new IndexedRowMatrix(indexedRows, nRows, nCols) // nCols is incorrect here
} else {
  null
}
This looks like a prime candidate for a bug fix and pull request.
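Until that is fixed, one possible workaround is to wrap the rows of U in a new IndexedRowMatrix with the column count declared explicitly. This is only a sketch: `SvdDimensions` and `withColumns` are hypothetical names, not part of MLlib, and it assumes the row vectors themselves already have size k, as the check above suggests. The rows, and therefore their indices, are reused unchanged.

```scala
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

object SvdDimensions {
  // Hypothetical helper (not part of MLlib): reuse the rows of U as they are,
  // keep the reported row count, and declare k columns instead of the n
  // that was copied from the input matrix.
  def withColumns(u: IndexedRowMatrix, k: Int): IndexedRowMatrix =
    new IndexedRowMatrix(u.rows, u.numRows(), k)
}
```

With the values from the question this would be called as `SvdDimensions.withColumns(indexedU, 3)`, after which `numCols()` on the result should report 3 while every row keeps its original index.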