C# - Using threads to parse multiple HTML pages faster

- September 15, 2013

Here's what I'm trying to do:

1. Get one HTML page, by URL, that contains multiple links.
2. Visit each link.
3. Extract data from the visited link and create an object using it.

So far I did it the simple, slow way:

    public List<Link> SearchLinks(string name)
    {
        List<Link> foundLinks = new List<Link>();
        // GetHtmlDocument() returns an HtmlDocument using the input URL.
        HtmlDocument doc = GetHtmlDocument(AU_SEARCH_URL + FixSpaces(name));
        var link_list = doc.DocumentNode.SelectNodes(@"/html/body/div[@id='parent-container']/div[@id='main-content']/ol[@id='searchresult']/li/h2/a");
        foreach (var link in link_list)
        {
            // TODO: threads

            // GetObject() creates an object using the data gathered.
            foundLinks.Add(GetObject(link.InnerText, link.Attributes["href"].Value, GetLatestEpisode(link.Attributes["href"].Value)));
        }
        return foundLinks;
    }

To make it faster and more efficient I need to implement threads, but I'm not sure how I should approach it: I can't just start threads at random, because I need to wait for them to finish. Thread.Join() kind of solves the "wait for the threads to finish" problem, but then I think it's no longer fast, because each thread is launched only after the earlier one has finished.

The simplest way to offload the work to multiple threads is to use Parallel.ForEach() in place of your current loop. Something like this:

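    Parallel.ForEach(link_list, link =>
    {
        string href = link.Attributes["href"].Value;
        var item = GetObject(link.InnerText, href, GetLatestEpisode(href));
        // List<T> is not thread-safe, so guard the shared list while adding.
        lock (foundLinks) { foundLinks.Add(item); }
    });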

I'm not sure whether there are other threading concerns in the overall code. (Note, for example, that this no longer guarantees that the data is added to foundLinks in the same order.) As long as there's nothing explicitly preventing concurrent work from taking place, this should take advantage of threading on multiple CPU cores to process the work.
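If the result order matters, one possible alternative is PLINQ with AsOrdered(), which collects the results itself, so no shared list needs guarding. The following is only a minimal sketch under stated assumptions: FetchAndParse is a hypothetical stand-in for the GetObject/GetLatestEpisode work (it sleeps instead of downloading anything), the hrefs are made-up strings, and the degree of parallelism of 4 is an arbitrary choice, not a recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class OrderedParallelDemo
{
    // Hypothetical stand-in for the per-link work (download + parse).
    // It only sleeps briefly to imitate a slow request.
    public static string FetchAndParse(string href)
    {
        Thread.Sleep(100);
        return "parsed:" + href;
    }

    // Processes all hrefs concurrently and returns results in source order.
    public static List<string> ProcessAll(IEnumerable<string> hrefs)
    {
        return hrefs
            .AsParallel()
            .AsOrdered()                  // keep results in the order of the input
            .WithDegreeOfParallelism(4)   // assumed cap on concurrent workers
            .Select(href => FetchAndParse(href))
            .ToList();                    // PLINQ gathers the results; no lock needed
    }

    public static void Main()
    {
        var hrefs = new[] { "/a", "/b", "/c", "/d" };
        foreach (var result in ProcessAll(hrefs))
        {
            Console.WriteLine(result);
        }
    }
}
```

With the real code, the Select body would call GetObject(...) on each link instead of FetchAndParse. Since the work here is mostly waiting on the network rather than on the CPU, the best degree of parallelism probably depends on the target site more than on the number of cores.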
