C# - A better solution for web scraping


Goal:
Locate the sentence "From today's featured article" on the website "http://en.wikipedia.org/wiki/main_page" using web-scraping C# code.

Problem:
I retrieve the website's source code into a string value. I believe I can locate the sentence "From today's featured article" by looping over the string with Substring, but I have a feeling that this is an inefficient approach.

Is there a better solution for locating the sentence "From today's featured article" in the string input?

Info:
* I'm using C# in Visual Studio 2013 Community.
* The source code does not work properly. Only the first 3 rows are working.

WebClient w = new WebClient();
string s = w.DownloadString("http://en.wikipedia.org/wiki/main_page");
string svar = RegexUtil.MatchKey(input);

static class RegexUtil
{
    static Regex _regex = new Regex(@"$ddd$");

    /// <summary>
    /// Returns key matched within input.
    /// </summary>
    static public string MatchKey(string input)
    {
        //Match match = Regex.Match(input, @"from today's featured article", RegexOptions.IgnoreCase);

        Match match = _regex.Match(input);
        //  Match match = Regex.Match("dot 55 perls");

        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return null;
        }
    }
}

If you just want to find an occurrence of a string, all you need is this:

int pos = html.IndexOf("From today's featured article");

However, you should note that this will also find the string within quotes or markup, and not just in visible text.
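Here is a minimal sketch of that IndexOf approach. It assumes a hard-coded sample string in place of the downloaded page; the class name PhraseSearch, the helper Find and the sample markup are made up for illustration. It passes StringComparison.OrdinalIgnoreCase so that capitalization does not matter, and it checks for -1, which is what IndexOf returns when the text is absent.

```csharp
using System;

static class PhraseSearch
{
    // Returns the zero-based index of phrase in html, or -1 if it is absent.
    // The comparison ignores case.
    public static int Find(string html, string phrase)
    {
        return html.IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // Hypothetical sample markup standing in for the downloaded page.
        // The phrase occurs twice: in a title attribute and in a heading.
        string html = "<a title=\"From today's featured article\">link</a>" +
                      "<h2>From today's featured article</h2>";

        int pos = Find(html, "from today's featured article");

        // The first hit is the one inside the title attribute,
        // not the visible heading.
        Console.WriteLine(pos >= 0 ? "Found at index " + pos : "Not found");
    }
}
```

In this sample the first match is the attribute value, not the heading, which is exactly the caveat about quotes and markup.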

In order to search only the visible text, you'd need to parse the HTML to remove the tags, and then search the text in between.
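A rough sketch of that idea follows, using only the .NET base class library. Stripping tags with Regex is a shortcut I'm assuming here for brevity, not a real HTML parser: it will break on things like a ">" inside an attribute value, so a proper parser library is the safer choice for real pages. The names VisibleTextSearch, StripTags and ContainsVisible are made up for illustration.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class VisibleTextSearch
{
    // Crude approximation of "visible text": drop script/style blocks,
    // drop every remaining tag, decode entities, collapse whitespace.
    public static string StripTags(string html)
    {
        string noScripts = Regex.Replace(
            html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    // True if phrase occurs in the text left after stripping tags.
    public static bool ContainsVisible(string html, string phrase)
    {
        return StripTags(html)
            .IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    static void Main()
    {
        // Hypothetical samples: the phrase only in an attribute,
        // and the phrase as heading text with an encoded apostrophe.
        string attrOnly = "<a title=\"From today's featured article\">link</a>";
        string heading = "<h2>From today&#39;s featured article</h2>";
        string phrase = "From today's featured article";

        Console.WriteLine(ContainsVisible(attrOnly, phrase)); // False
        Console.WriteLine(ContainsVisible(heading, phrase));  // True
    }
}
```

To try it on the real page, you would pass the string returned by WebClient.DownloadString to ContainsVisible in place of the samples.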

