scala - Is it possible to correctly calculate SVD on IndexedRowMatrix in Spark?


I've got an IndexedRowMatrix [m x n] which contains only X non-zero rows. I'm setting k = 3.

When I try to calculate the SVD on this object with computeU set to true, the dimensions of the U matrix are [m x n], when the correct dimensions are [m x k].

Why does this happen?

I've tried converting the IndexedRowMatrix to a RowMatrix and then calculating the SVD. The result dimensions are [X x k], so it calculates the result only for the non-zero rows (the matrix drops the indices, as stated in the documentation).

Is it possible to convert this matrix while keeping the row indices?

    import org.apache.spark.mllib.linalg.{Matrix, SingularValueDecomposition, Vector}
    import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRowMatrix, MatrixEntry, RowMatrix}
    // CSVParser is assumed to be opencsv's (au.com.bytecode.opencsv.CSVParser);
    // the original post does not show its import.

    val csv = sc.textFile("hdfs://spark/nlp/merged_sparse.csv").cache()  // original file

    // Each line is "row col value" with 1-based indices; shift them to 0-based.
    val data = csv.mapPartitions(lines => {
        val parser = new CSVParser(' ')
        lines.map(line => {
          parser.parseLine(line)
        })
      }).map(line => {
        MatrixEntry(line(0).toLong - 1, line(1).toLong - 1, line(2).toInt)
      }
    )

    val coordinateMatrix: CoordinateMatrix = new CoordinateMatrix(data)
    val indexedRowMatrix: IndexedRowMatrix = coordinateMatrix.toIndexedRowMatrix()
    val rowMatrix: RowMatrix = indexedRowMatrix.toRowMatrix()

    val svd: SingularValueDecomposition[RowMatrix, Matrix] = rowMatrix.computeSVD(3, computeU = true, 1e-9)

    val U: RowMatrix = svd.U // The U factor is a RowMatrix.
    val s: Vector = svd.s // The singular values are stored in a local dense vector.
    val V: Matrix = svd.V // The V factor is a local dense matrix.

    val indexedSvd: SingularValueDecomposition[IndexedRowMatrix, Matrix] = indexedRowMatrix.computeSVD(3, computeU = true, 1e-9)

    val indexedU: IndexedRowMatrix = indexedSvd.U // The U factor is an IndexedRowMatrix.
    val indexedS: Vector = indexedSvd.s // The singular values are stored in a local dense vector.
    val indexedV: Matrix = indexedSvd.V // The V factor is a local dense matrix.

It looks like a bug in Spark MLlib. If you get the size of a row vector in your indexed matrix, it correctly returns 3 columns:

    indexedU.rows.first().vector.size
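To make the mismatch visible, you can compare the column count the matrix reports with the length of one of its stored row vectors. This is only a minimal sketch: `columnCounts` is a hypothetical helper name, not part of MLlib, and it assumes the matrix has at least one row.

```scala
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

// Hypothetical helper (not an MLlib API): returns the column count the matrix
// declares and the length of its first stored row vector.
// Assumes the matrix is non-empty; rows.first() throws on an empty RDD.
def columnCounts(m: IndexedRowMatrix): (Long, Int) = {
  val declared = m.numCols()                // what the matrix reports
  val actual = m.rows.first().vector.size   // what is actually stored per row
  (declared, actual)
}
```

Calling `columnCounts(indexedU)` on the U factor from the question should give a pair whose two values differ if the behaviour described here is present.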

I looked at the source, and it looks like they're incorrectly copying the current number of columns from the indexed matrix:

    val U = if (computeU) {
      val indexedRows = indices.zip(svd.U.rows).map { case (i, v) =>
        IndexedRow(i, v)
      }
      new IndexedRowMatrix(indexedRows, nRows, nCols) // nCols is incorrect here
    } else {
      null
    }

Looks like a prime candidate for a bugfix/pull request.
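Until that is fixed, one possible workaround is to rebuild U yourself with the right column count, since the rows and their indices are already correct. This is a sketch under assumptions, not the library's own fix: `withActualColumnCount` is a hypothetical helper name, and it takes the column count from the first row vector rather than from k, because computeSVD can return fewer than k singular values.

```scala
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

// Hypothetical helper (not an MLlib API): wraps the same indexed rows in a new
// IndexedRowMatrix whose declared column count matches the stored row vectors.
// Assumes the matrix is non-empty; rows.first() throws on an empty RDD.
def withActualColumnCount(u: IndexedRowMatrix): IndexedRowMatrix = {
  val actualCols = u.rows.first().vector.size
  // Row indices are untouched, so the result keeps its [m x k] shape.
  new IndexedRowMatrix(u.rows, u.numRows(), actualCols)
}
```

With the variables from the question, `withActualColumnCount(indexedSvd.U)` would keep the original row indices and report the shortened column count.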

