For some of my performance audits I need an exact copy of the webpage as it is served by my clients' infrastructure. In some cases, it can be hard to get to the actual artefact. I found it particularly hard to save a website exactly as it's delivered with some of the tools around. curl and wget have trouble when dealing with an SPA. And you need a browser context to record every request and response. That's why I decided to use a headless Chrome instance with puppeteer to store an exact copy.

I'm using Node v9 and only need a couple of extra packages: puppeteer and fs-extra. fs-extra features a couple of nice shortcuts if you want to create folders and files in a single line.

    const puppeteer = require('puppeteer'); // v 1.1.0
    const fse = require('fs-extra'); // v 5.0.0

And that's it! The url and path packages are from core. I need both to extract filenames and create a proper path to store the files on my disk.

Here's the full code for scraping and saving a website. Let it sink in for a bit, I'll explain each point afterwards in detail.

    async function start(urlToFetch) {
      // … `…/index.html` … await fse.…
    }

With every response in our page context, we execute a callback. This callback accesses a couple of properties to store an exact copy of the file on our hard disk.

The URL class from the url package helps us access parts of the response's URL. We take the pathname property to get the URL without the host name, and create a path on our local disk with the path.resolve method.

If the URL has no extension name specified, we transform the file into a directory and add an index.html file. This is how static site generators create pretty URLs for servers where you can't access routing directly.

The response.buffer() contains all the content from the response, in the right format. We store it as text, as image, as font, whatever is needed.

It's important that this response handler is defined before navigating to a URL. The page.goto method is the right tool to start navigation.
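The steps described above can be sketched roughly as follows. This is a minimal sketch and not the original listing: the helper name `localPathFor`, the `./output` target directory, the `networkidle2` wait condition, and the `pending` bookkeeping are all assumptions made for illustration. The path mapping is kept in its own function so it can be tried without launching a browser.

```javascript
const path = require('path');
const { URL } = require('url');

// Hypothetical helper: map a response URL to a file path on the local disk.
// URLs without an extension become a directory containing index.html.
function localPathFor(responseUrl, outputDir = './output') {
  const { pathname } = new URL(responseUrl); // URL without host and query
  let filePath = path.resolve(outputDir, `.${pathname}`);
  if (path.extname(pathname).trim() === '') {
    filePath = path.join(filePath, 'index.html');
  }
  return filePath;
}

async function start(urlToFetch) {
  // Required here so the helper above works without the browser dependencies.
  const puppeteer = require('puppeteer');
  const fse = require('fs-extra');

  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const pending = []; // assumption: track writes so we can wait for them

  // The handler has to be registered before navigation starts.
  page.on('response', (response) => {
    if (!/^https?:/.test(response.url())) return; // skip data: URLs and the like
    pending.push((async () => {
      try {
        // outputFile creates missing parent directories before writing.
        await fse.outputFile(localPathFor(response.url()), await response.buffer());
      } catch (err) {
        // buffer() can reject, for example on redirect responses.
        console.warn(`Skipped ${response.url()}: ${err.message}`);
      }
    })());
  });

  await page.goto(urlToFetch, { waitUntil: 'networkidle2' });
  await Promise.all(pending);
  await browser.close();
}
```

With puppeteer and fs-extra installed, calling `start('https://example.com')` (a placeholder URL) should leave a copy of every recorded response below `./output`. Waiting on `pending` before closing the browser is one possible way to avoid cutting off writes that are still in flight.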