Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - How to get the rendered HTML (processed by JavaScript) in a WebBrowser control?

I have an ASP.NET page and a custom class that fetches a specified web page and returns its body.

protected String GetHtml()
{
    Thread thread = new Thread(new ThreadStart(GetHtmlWorker));
    thread.SetApartmentState(ApartmentState.STA);
    thread.Start();
    thread.Join();
    return docHtml;
}

protected void GetHtmlWorker()
{
    using (WebBrowser browser = new WebBrowser())
    {
        browser.ScriptErrorsSuppressed = true;
        browser.Navigate(_url);
        // Wait for control to load page
        while (browser.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();
        docHtml = browser.DocumentText;
    }
}

But DocumentText gives me the page source. What I need is the HTML of the DOM, because I do some extra operations on the DOM with jQuery after the page loads.

1 Reply

Here is one solution I found for getting the rendered HTML (the DOM) after the JavaScript has run:

Place a WebBrowser control named webBrowser1 on the Form of class Form1.

[Form1.cs[Design]]

Then use the following code:

[Form1.cs]

using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

namespace WebBrowserTest
{
    public partial class Form1 : Form
    {
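        public Form1()
        {
            InitializeComponent();
            // Expose MyScript to page scripts as window.external.
            this.webBrowser1.ObjectForScripting = new MyScript();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            // Replace with the URL of the page whose rendered DOM you want.
            webBrowser1.Navigate("http://localhost:6489/Default.aspx");
        }

        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // Once the document has loaded, call back into the host from script.
            webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
        }

        // The scripting object must be COM-visible to be reachable from script.
        [ComVisible(true)]
        public class MyScript
        {
            public void CallServerSideCode()
            {
                // Current DOM, including changes made by scripts that have already run.
                var doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
            }
        }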
    }
}

In Form1_Load, change the URL passed to webBrowser1.Navigate("http://localhost:6489/Default.aspx") to that of the page whose JavaScript-processed DOM you want to obtain.

You can access the modified DOM in the CallServerSideCode() method, for example:

doc.GetElementById("myDataTable");

Or you can access the rendered HTML like this:

var renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
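
If you need this without a Form, as in the question's GetHtml() scenario, the same idea can probably be folded into an STA worker thread. The following is only a minimal sketch under some assumptions: the class name RenderedHtmlFetcher and the method GetRenderedHtml are made up for illustration, it reads the DOM directly in DocumentCompleted instead of going through the window.external callback, and it has no timeout, so a page that never completes would block the caller. Scripts that keep changing the DOM after the load event (AJAX calls, timers) may not be reflected in the result.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper, not part of the original answer.
public static class RenderedHtmlFetcher
{
    // Loads url in a WebBrowser on a dedicated STA thread and returns the
    // serialized DOM once the top-level document has completed loading.
    public static string GetRenderedHtml(string url)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.DocumentCompleted += (sender, e) =>
                {
                    // Frames raise DocumentCompleted as well; skip those
                    // and wait for the top-level document.
                    if (e.Url != browser.Url) return;

                    // Same expression as above: the DOM as it stands now.
                    html = browser.Document.GetElementsByTagName("HTML")[0].OuterHtml;

                    // Ends the Application.Run() message loop below.
                    Application.ExitThread();
                };

                browser.Navigate(url);

                // A message loop is needed so the control can load the page.
                Application.Run();
            }
        });

        // WebBrowser requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }
}
```

You would call it as string html = RenderedHtmlFetcher.GetRenderedHtml(_url); in place of the GetHtml() method from the question. Running a real message loop with Application.Run() avoids the busy loop around Application.DoEvents().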
