C# - Sequential vs parallel solution memory usage
I have a slight issue with the following scenario: I'm given a list of id values, and I need to run a SELECT query for each (where the id is a parameter), then combine all the result sets into one big one and return it to the caller.
Since the query might run for minutes per id (that's another issue, but at the moment I consider it a given fact), and there can be 1000s of ids in the input, I tried to use tasks. With that approach I experience a slow but solid increase in memory use.
As a test, I made a simple sequential solution too. It has a normal memory usage graph but, as expected, it is very slow. There's an increase while it's running, but then everything drops back to the normal level when it's finished.
Here's the skeleton of the code:
    public class RowItem
    {
        public int Id { get; set; }
        public string Name { get; set; }
        // the rest of the properties
    }

    public List<RowItem> GetRowItems(List<int> customerIds)
    {
        var rowItems = new List<RowItem>();

        // Variant 1: this solution has the memory leak
        var tasks = new List<Task<List<RowItem>>>();
        foreach (var customerId in customerIds)
        {
            var task = Task.Factory.StartNew(() => ProcessCustomerId(customerId));
            tasks.Add(task);
        }

        while (tasks.Any())
        {
            var index = Task.WaitAny(tasks.ToArray());
            var task = tasks[index];
            rowItems.AddRange(task.Result);
            tasks.RemoveAt(index);
        }

        // Variant 2: this works fine, but is slow
        foreach (var customerId in customerIds)
        {
            rowItems.AddRange(ProcessCustomerId(customerId));
        }

        return rowItems;
    }

    private List<RowItem> ProcessCustomerId(int customerId)
    {
        var rowItems = new List<RowItem>();
        using (var conn = new OracleConnection("xxx"))
        {
            conn.Open();
            var sql = "select * ...";
            using (var command = new OracleCommand(sql, conn))
            {
                using (var dataReader = command.ExecuteReader())
                {
                    using (var dataTable = new DataTable())
                    {
                        dataTable.Load(dataReader);
                        rowItems = dataTable
                            .Rows
                            .OfType<DataRow>()
                            .Select(row => new RowItem
                            {
                                Id = Convert.ToInt32(row["id"]),
                                Name = row["name"].ToString(),
                                // the rest of the properties
                            })
                            .ToList();
                    }
                }
            }
            conn.Close();
        }
        return rowItems;
    }
What am I doing wrong when using tasks? According to this MSDN article, I don't need to bother disposing them manually, but there's barely anything else. I guess ProcessCustomerId is OK, as it's called in both variations.
Update: to log the current memory usage I used Process.GetCurrentProcess().PrivateMemorySize64, and I noticed the problem in Task Manager >> Processes.
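For reference, the kind of logging described in the update could look roughly like the sketch below. The `MemoryLog` class and its method names are made up for illustration and are not part of the original code. Printing the managed heap size next to the private bytes is an extra that may help tell whether the growth is in managed objects or elsewhere in the process.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not from the original post.
public static class MemoryLog
{
    private const double BytesPerMegabyte = 1024.0 * 1024.0;

    // Private bytes of the current process, in megabytes.
    public static double PrivateMegabytes()
    {
        // GetCurrentProcess returns a new Process component, so dispose it.
        using (var process = Process.GetCurrentProcess())
        {
            return process.PrivateMemorySize64 / BytesPerMegabyte;
        }
    }

    // Writes private bytes and the managed heap size with a caller-supplied label.
    public static void Report(string label)
    {
        // false: do not force a garbage collection before measuring.
        var managed = GC.GetTotalMemory(false) / BytesPerMegabyte;
        Console.WriteLine("{0}: private = {1:F1} MB, managed heap = {2:F1} MB",
            label, PrivateMegabytes(), managed);
    }
}
```

Calling `MemoryLog.Report("after batch")` at a few fixed points in both variants gives numbers that can be compared directly.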
Using Entity Framework, your ProcessCustomerId method would look like:

    List<RowItem> rowItems;
    using (var ctx = new OracleEntities())
    {
        rowItems = ctx.Customer
            .Where(o => o.Id == customerId)
            .Select(o => new RowItem
            {
                Id = o.Id,
                Name = o.Name,
                // the rest of the properties
            })
            .ToList();
    }
    return rowItems;
Unless you are transferring large amounts of data such as images, video, or BLOBs, this should be near instantaneous for a result of around 1k of data.
If it is unclear what is taking the time, and you use a pre-10g Oracle, it will be really hard to monitor this. If you use Entity Framework, however, you can attach monitoring to it: http://www.hibernatingrhinos.com/products/efprof
At least as of a year ago, Oracle supported Entity Framework 5.
In the sequential version the queries are executed one by one. In the parallel version they are literally all started at the same time, consuming your resources and creating deadlocks.
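If the goal is to keep some parallelism without starting every query at once, one option is to cap how many run at the same time. The following is only a sketch under that assumption: `ThrottledFetch`, `GetRows`, `fetchRows`, and `maxParallel` are hypothetical names, with `fetchRows` standing in for ProcessCustomerId. It uses `Parallel.ForEach` with `ParallelOptions.MaxDegreeOfParallelism` instead of one task per id.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not from the original post.
public static class ThrottledFetch
{
    // Runs fetchRows for every id, with at most maxParallel calls in flight.
    public static List<TRow> GetRows<TRow>(
        IEnumerable<int> customerIds,
        Func<int, List<TRow>> fetchRows,
        int maxParallel)
    {
        // Thread-safe collection for the per-id result lists.
        var results = new ConcurrentBag<List<TRow>>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Blocks until all ids are processed; exceptions surface as AggregateException.
        Parallel.ForEach(customerIds, options, id => results.Add(fetchRows(id)));

        // Flatten into one list. The order of ids is not preserved.
        return results.SelectMany(rows => rows).ToList();
    }
}
```

With the code from the question this would be called as `ThrottledFetch.GetRows<RowItem>(customerIds, ProcessCustomerId, 4)`. The value 4 is arbitrary; a sensible limit depends on what the database and the connection pool can handle, so it is worth measuring with a few different values.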