I'm currently trying to scrape a dynamically generated web page from within an Android app.
I found these two solutions that use a WebView, but I'm not having much luck getting them to work:
Android Web Scraping with a Headless Browser
Selendroid as a web scraper
Here is my main Activity:
import android.os.Bundle;
import androidx.appcompat.app.AppCompatActivity;

public class MainActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        Scraper scraper = new Scraper();
        scraper.scrape(this);
    }
}
Here is my Scraper class:
import android.content.Context;
import android.os.NetworkOnMainThreadException;
import android.webkit.WebView;
import android.webkit.WebViewClient;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.net.URL;
import org.jsoup.Connection;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.UnsupportedMimeTypeException;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class Scraper {
    public void scrape(Context context) {
        final WebView webView = new WebView(context);
        webView.loadUrl("http://fanfox.net/manga/tales_of_demons_and_gods/c180/1.html#ipg5");
        webView.getSettings().setJavaScriptEnabled(true);
        webView.addJavascriptInterface(new HtmlHandler(), "Html-Handler");
        webView.setWebViewClient(new WebViewClient() {
            @Override
            public void onPageFinished(WebView view, String url) {
                // Load HTML
                webView.loadUrl("javascript:HtmlHandler.handleHtml(document.documentElement.outerHTML);");
            }
        });
    }
}
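One thing that may be worth checking in the class above: the interface is registered under the name "Html-Handler", but the injected script calls HtmlHandler.handleHtml(...). The second argument of addJavascriptInterface is the name the object gets in the page's JavaScript, and a hyphen is not legal in a JavaScript identifier, so HtmlHandler is most likely undefined in the page. The sketch below is plain Java; the class JsBridgeNames and its methods are hypothetical names. It keeps both sides in one constant and includes a simplified identifier check.

```java
// Hypothetical helper that keeps the Java-side interface name and the
// injected JavaScript in sync, so the two cannot drift apart.
final class JsBridgeNames {
    // Name to pass to webView.addJavascriptInterface(...); must be a valid JS identifier.
    static final String INTERFACE_NAME = "HtmlHandler";

    private JsBridgeNames() {}

    // Builds the javascript: URL that sends the rendered DOM back to Java.
    static String extractHtmlUrl() {
        return "javascript:" + INTERFACE_NAME
                + ".handleHtml(document.documentElement.outerHTML);";
    }

    // Simplified check: a letter, '$' or '_' first, then letters, digits, '$' or '_'.
    // It does not reject reserved words or cover every ECMAScript identifier rule.
    static boolean isValidJsIdentifier(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '$' || first == '_')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '$' || c == '_')) {
                return false;
            }
        }
        return true;
    }
}
```

In the Scraper this would mean calling webView.getSettings().setJavaScriptEnabled(true), webView.addJavascriptInterface(new HtmlHandler(), JsBridgeNames.INTERFACE_NAME) and webView.setWebViewClient(...) before the first webView.loadUrl(...), then loading JsBridgeNames.extractHtmlUrl() from onPageFinished. Whether the name mismatch also explains the renderer crash is not certain; the crash may have a separate cause.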
Finally, here is the HtmlHandler class:
import android.webkit.JavascriptInterface;
public class HtmlHandler {
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void handleHtml(String html) {
        // scrape the content here
        System.out.println(html);
    }
}
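A note on threading, since the goal is to end up with the HTML as a string: methods annotated with @JavascriptInterface are called on a private background thread of the WebView, not on the UI thread, so the string needs a thread-safe hand-off. A minimal sketch, assuming a CompletableFuture as that hand-off (available on Android from API level 24). HtmlCapture, awaitHtml and future are hypothetical names, and the @JavascriptInterface annotation is left out so the code compiles as plain Java.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical variant of HtmlHandler that hands the HTML to whoever is waiting for it.
// On Android, handleHtml would also carry @JavascriptInterface.
final class HtmlCapture {
    private final CompletableFuture<String> html = new CompletableFuture<>();

    // Called from the WebView's background thread; only the first call takes effect.
    public void handleHtml(String pageHtml) {
        html.complete(pageHtml);
    }

    // Blocks until the HTML arrives or the timeout expires.
    // Do not call this on the UI thread: onPageFinished runs there, so blocking it
    // would stop the extraction script from ever being injected.
    public String awaitHtml(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        return html.get(timeout, unit);
    }

    // Non-blocking alternative, e.g. future().thenAccept(h -> ...).
    public CompletableFuture<String> future() {
        return html;
    }
}
```

The same HtmlCapture instance would be passed to addJavascriptInterface and kept by the caller, which then either waits on a worker thread with awaitHtml(...) or attaches a callback via future().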
When I run this, I get the following error:
E/chromium: [ERROR:aw_browser_terminator.cc(125)] Renderer process (11241) crash detected (code -1).
I suspect I'm doing something wrong when creating and loading the WebView.
What I want is to get the page's HTML as a string once it has finished loading.
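If the JavaScript interface keeps misbehaving, WebView.evaluateJavascript (API level 19+) may be a simpler route: calling view.evaluateJavascript("document.documentElement.outerHTML", callback) from onPageFinished hands the result to a ValueCallback<String>. The value arrives JSON-encoded, i.e. as a quoted string literal in which characters such as < typically show up as \u003C escapes and quotes as \". A minimal sketch of decoding it, assuming only JSON string literals and null need handling (JsResultDecoder is a hypothetical name; Android's bundled JSON classes could do the same job).

```java
// Hypothetical helper: turns the JSON-encoded value that evaluateJavascript
// passes to its callback back into plain text.
final class JsResultDecoder {
    private JsResultDecoder() {}

    // Returns null for the JSON literal null (e.g. when the script returned nothing).
    static String decodeJsonString(String json) {
        if (json == null || json.equals("null")) {
            return null;
        }
        if (json.length() < 2 || json.charAt(0) != '"' || json.charAt(json.length() - 1) != '"') {
            throw new IllegalArgumentException("Not a JSON string literal: " + json);
        }
        StringBuilder out = new StringBuilder(json.length());
        // Walk the characters between the surrounding quotes, resolving escapes.
        for (int i = 1; i < json.length() - 1; i++) {
            char c = json.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char esc = json.charAt(++i);
            switch (esc) {
                case '"': out.append('"'); break;
                case '\\': out.append('\\'); break;
                case '/': out.append('/'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u':
                    // Four hex digits follow the u, e.g. 003C for '<'.
                    out.append((char) Integer.parseInt(json.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Bad escape: \\" + esc);
            }
        }
        return out.toString();
    }
}
```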
Once I have it as a string, I'll do some pattern matching to extract the information I want.
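For the pattern-matching step, a sketch using java.util.regex. What the page actually contains is an assumption here: the example collects the src attribute of every img tag, and ImageSrcExtractor is a hypothetical name. Regular expressions only cope with simple, predictable markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extraction step: collects the src attribute of every <img> tag.
final class ImageSrcExtractor {
    // Matches <img ... src="..."> or src='...', case-insensitively;
    // group 1 is the quote character, group 2 the attribute value.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private ImageSrcExtractor() {}

    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }
}
```

Since the Scraper already imports jsoup, Jsoup.parse(html).select("img[src]") would reach the same elements through a real HTML parser, which is sturdier against attribute order, quoting quirks and look-alike attributes such as data-src.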
Any help would be greatly appreciated.
Question from:
https://stackoverflow.com/questions/65935301/fetch-html-content-from-a-webview-so-that-i-can-scrape-dynamically