stacksapien/scalable-nodejs-web-crawler

A Node.js-based web crawler that can scale on the go!

  • Supports both static and dynamic page crawling
  • Linux (Ubuntu)
  • Redis
  • Node.js
  • Install Node.js by running the following commands from the root directory of the project:
        $ cd init-scripts/
        $ sudo bash install-nodejs.sh
    
  • Install Redis:
        $ sudo bash install-redis.sh
    
  • Install the project dependencies by running the following command from the root directory of the project:
        $ npm install
    
  • Run the crawler, passing the start URL and the path where the URL lists should be stored:
        $ node index.js "<url>" "path-to-store-url"
        $ node index.js "https://stacksapien.com" "./temp"
  • In the example above, the files valid-urls.txt, external-urls.txt, and invalid-urls.txt are generated in the temp folder of your project directory.
