When it comes to retrieving data from the web through scraping, there is far less material on how to do it with Node.js/JavaScript than with languages like Python or PHP, which already have popular modules for the job. This post is going to show you exactly how you can scrape data from the web using Node.js/JavaScript.

We are going to use three packages to build our web scraping module, so we need to install them in our Node project:

- Cheerio
- Request
- Request-Promise

Once you have set up a working Node.js server for your project, go to the project terminal and install the packages using this command

```
npm install request cheerio request-promise
```

[Cheerio](https://github.com/cheeriojs/cheerio) is a lean implementation of core jQuery for the server, which lets us run jQuery-style selectors against HTML from the back end. [Request](https://www.npmjs.com/package/request) and [request-promise](https://github.com/request/request-promise) (a Promise-based wrapper around it) are the Node.js tools we will use to make HTTP requests.

Create a new file in the root directory of the project and name it `scrape.js`. In the file, add the following starter code as a boilerplate

```
const scraper = () => {
  console.log('Scraping tool')
}

module.exports = scraper
```

In `app.js`, which is in our project root directory, add the code

```
//run scraper
var scrape = require('./scrape');
scrape()
```

below the line

```
app.use('/users', usersRouter);
```

Save all files and rerun the server, and the `Scraping tool` message should show up in your console.

In `scrape.js`, we are now going to replace the contents of the file with the following code

```
const requestPromise = require('request-promise');

const url = 'https://cointelegraph.com/tags/cryptocurrencies';

const scraper = () => {
  requestPromise(url)
    .then(function(html){
      //success!
      console.log(html);
    })
    .catch(function(err){
      //handle error
      console.log(err)
    });
}

module.exports = scraper
```

Run your server again and the raw HTML of the page should be dumped in your terminal.

The code above uses the `request-promise` library we installed earlier to fetch the HTML contents of a given URL and log them to the console. In this case the URL is stored in the `url` variable and passed to `requestPromise`, which returns the HTML contents of [https://cointelegraph.com/tags/cryptocurrencies](https://cointelegraph.com/tags/cryptocurrencies), a page containing the latest crypto news on the Cointelegraph website.
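If the request ever fails or the site rejects the default client, it helps to know that `request-promise` also accepts an options object instead of a bare URL string, which lets you set things like request headers. Here is a minimal sketch of that variation, assuming the same `url` as above; the `User-Agent` value is just an illustrative placeholder, not something the original tutorial requires:

```
const requestPromise = require('request-promise');

const url = 'https://cointelegraph.com/tags/cryptocurrencies';

// Pass an options object instead of a plain URL string
requestPromise({
    uri: url,
    headers: {
      // illustrative browser-like User-Agent; adjust or omit as needed
      'User-Agent': 'Mozilla/5.0 (compatible; my-scraper/1.0)'
    }
  })
  .then(function(html){
    console.log(`received ${html.length} characters of HTML`);
  })
  .catch(function(err){
    console.log(err)
  });
```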
After getting the HTML code from the page, we need to sift through it and extract the data we want. Visit the page we scraped in the Chrome browser, right-click the element you want to scrape and click Inspect to open it in the Chrome inspector.

Once we have inspected the element we want to scrape (in this case, the title of each news piece on the page), we can use `Cheerio` to parse the HTML for those titles and extract what we need. Replace the code in `scrape.js` with the following code

```
const requestPromise = require('request-promise');
const $ = require('cheerio');

const url = 'https://cointelegraph.com/tags/cryptocurrencies';

const scraper = () => {
  requestPromise(url)
    .then(function(html){
      //success!
      // grab every news title <span> that sits inside an <a> tag
      const newsHead = $('a > span.post-card-inline__title', html).toArray()
      const newsTitles = []

      for (let i = 0; i < newsHead.length; i++) {
        newsTitles.push({
          // the parent <a> holds the relative link to the article
          newsLink: `https://www.cointelegraph.com${newsHead[i].parent.attribs.href}`,
          // the span's text node holds the title
          newsTitle: `${newsHead[i].children[0].data}`
        })
      }

      console.log(newsTitles)
    })
    .catch(function(err){
      //handle error
      console.log(err)
    });
}

module.exports = scraper
```

The code above takes each element that we scraped from the crypto news page and extracts two pieces of data:

- The link to the actual news content
- The news title

We then store the data for each news piece in an object and push the object into an array. If you run the server now and check the terminal, you should see an array listing each news object.

That shows how we can successfully scrape data from a web page and use it for our own purposes on our end. You can use this approach to get data from just about any page, so try it out and share your opinions in the comments.
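If you want to take the Express side of this a step further, one option is to have `scraper` return the array instead of only logging it, and then serve the result from a route. The sketch below assumes the same project layout as above; the `/news` route name and the `map` rewrite are illustrative choices, not part of the original tutorial:

```
// scrape.js: return the scraped array instead of only logging it
const requestPromise = require('request-promise');
const $ = require('cheerio');

const url = 'https://cointelegraph.com/tags/cryptocurrencies';

const scraper = () => {
  return requestPromise(url).then(function(html){
    const newsHead = $('a > span.post-card-inline__title', html).toArray();

    return newsHead.map(function(el){
      return {
        newsLink: `https://www.cointelegraph.com${el.parent.attribs.href}`,
        newsTitle: `${el.children[0].data}`
      };
    });
  });
}

module.exports = scraper
```

```
// app.js: register this route alongside the existing ones (illustrative /news path)
const scrape = require('./scrape');

app.get('/news', function(req, res, next){
  scrape()
    .then(function(newsTitles){
      res.json(newsTitles);
    })
    .catch(next);
});
```

With that in place, hitting `/news` in the browser returns the scraped titles and links as JSON instead of just printing them to the terminal.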
---

**Comments**

**gitplait-mod1:** Thanks for sharing an amazing Javascript tutorial. We are looking for people like you on our platform. Your post has been submitted to be curated with the @gitplait community account because this is the kind of publication we like to see in our community. Join our [Community on Hive](https://hive.blog/trending/hive-103590) and chat with us on [Discord](https://discord.gg/CWCj3rw). [[Gitplait-Team]](https://gitplait.tech/)
**joelagbo:** I love javascript; even though I'm only good at Reactjs and vanilla javascript, I know javascript as a 'language of all possibilities' and this tutorial proved it once again. Bookmarked!
**gotgame** (author, in reply): Thanks for dropping by, glad you love the piece.