
Scrapy - extracting the data you need from websites by teamhumble

· @teamhumble ·
$7.43
# Scrapy
extracting the data you need from websites

---
## Screenshots
<center><img alt="scrapy.jpg" src="https://s3-us-west-2.amazonaws.com/huntimages/production/steemhunt/2018-11-02/44435599-scrapy.jpg"/></center>





---
## Hunter's comment
'scraping' websites is a pretty common thing out there in the world. i knew companies many years back that did this at a micro level for people who just wanted information FAST to populate their CRMs with, just to be able to cold call a bunch of execs and 'decision makers' in a business.

not cheap either: they could run the reports and charge quite a lot of money to deliver this data -- often the data was collected illegally, or at least let's say it was a 'grey area' until laws came into place around it.

i'm sure it could also be incredibly useful for someone wanting to build web spiders that actually use this scraping technology in a productive way, maybe for building exports or collecting social media content to store away as legacy items.

---
## Link
https://scrapy.org

---
## Contributors
Hunter: @teamhumble

---
<center><br/>![Steemhunt.com](https://i.imgur.com/jB2axnW.png)<br/>
This is posted on Steemhunt - A place where you can dig products and earn STEEM.
[View on Steemhunt.com](https://steemhunt.com/@teamhumble/scrapy-extracting-the-data-you-need-from-websites)
</center>
created: 2018-11-02 13:11:57
tags: steemhunt, scrapy, extracting-data, web-spiders, cloud
@avocadoxy ·
$0.39
It's been useful actually, but I believe that many sites are blocking access from these kinds of tools. You may need to use it with a VPN. Also, I'm not sure if React-based websites nowadays would work well with these types of scraping tools, because they generate their HTML tags after the server call (or only on the client side).
created: 2018-11-02 20:34:27
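To illustrate @avocadoxy's point about client-rendered pages, here is a minimal sketch using only Python's standard-library `html.parser` rather than Scrapy itself. The two HTML snippets, the `h2.title` markup and the `extract_titles` helper are made up for the example. It shows why a scraper that only reads the HTML returned by the server finds nothing on a page whose content is filled in later by JavaScript.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text of every <h2 class="title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def extract_titles(html):
    """Hypothetical helper: return all title strings found in the markup."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical page whose content is already in the server response.
SERVER_RENDERED = """
<html><body>
  <h2 class="title">First article</h2>
  <h2 class="title">Second article</h2>
</body></html>
"""

# Hypothetical single-page-app shell: the titles only exist after the
# browser runs bundle.js, so they are not in the raw HTML at all.
CLIENT_RENDERED = """
<html><body>
  <div id="root"></div>
  <script src="/static/bundle.js"></script>
</body></html>
"""

print(extract_titles(SERVER_RENDERED))
print(extract_titles(CLIENT_RENDERED))
```

The second call comes back empty because the data simply is not in the markup the scraper receives. Common workarounds are to render the page in a headless browser first, or to request the JSON endpoint the page itself calls.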
@focusnow ·
$0.32
With some sites really loaded with information and ads, Scrapy can help a user to select valuable pieces of data. I will try it out soon.
created: 2018-11-02 17:31:18
@gungunkrishu ·
No more writing Python scripts to scrape websites. I can now simply use the Scrapy application and cron it to do the job. The best part is that it's open source. Thanks for sharing.
created: 2018-11-03 06:59:45
@healthdear ·
$0.15
A great website scraping tool. Definitely a tool to bookmark when you are looking to extract data from different websites. Thanks for sharing.
created: 2018-11-03 04:40:57
@heroldius ·
$0.16
Great app written in Python and running on all systems to extract data from websites easily and quickly. Thanks for sharing it @teamhumble, very useful.
created: 2018-11-03 12:07:21
@joannewong ·
$0.81
<center>https://media.giphy.com/media/3o7abKhOpu0NwenH3O/giphy.gif</center>
---

## <center>Impressive Hunt, Your **Hunt** just got **Verified!**</center>

---

Please read our [posting guidelines](https://github.com/Steemhunt/web/blob/master/POSTING_GUIDELINES.md). If you have any questions, please join our [Discord Group](https://discord.gg/mWXpgks).
created: 2018-11-02 15:50:09
@murphylee ·
$0.16
We all use data every day. Extracting it from websites is really cool. I think I like this hunt.
It's a really good hunt.
created: 2018-11-03 13:53:39
@nygma ·
This is quite a good tool for extracting only relevant data from sites or other sources. The most important thing is whether it can get the delta, so I will need to take a look at this.
created: 2018-11-02 18:27:30
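On @nygma's question about "the delta": whatever the scraper itself offers, one simple approach is to keep the output of each run and diff consecutive runs yourself. The sketch below assumes each run is stored as a dict keyed by a stable identifier (here the item URL). The `compute_delta` helper and the sample data are hypothetical and are not part of Scrapy.

```python
def compute_delta(previous, current):
    """Compare two scrape runs keyed by a stable ID (here: the item URL).

    Returns three dicts: items only in the new run, items only in the
    old run, and items present in both whose fields differ.
    """
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = {k: previous[k] for k in previous.keys() - current.keys()}
    changed = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return added, removed, changed


# Hypothetical results of two runs, keyed by item URL.
yesterday = {
    "https://example.com/a": {"title": "Widget A", "price": "10.00"},
    "https://example.com/b": {"title": "Widget B", "price": "12.50"},
}
today = {
    "https://example.com/b": {"title": "Widget B", "price": "11.00"},
    "https://example.com/c": {"title": "Widget C", "price": "8.00"},
}

added, removed, changed = compute_delta(yesterday, today)
print("added:", sorted(added))
print("removed:", sorted(removed))
print("changed:", sorted(changed))
```

This only works if every item has a key that stays the same between runs. For pages without stable URLs or IDs, a hash of the item's identifying fields could play the same role.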
@pele23 ·
$0.39
Indeed a grey zone, but the website owner is in charge of the privacy in my opinion. Great product and hunt!
created: 2018-11-02 20:46:36
@rabeel · (edited)
$0.32
## No doubt @teamhumble, "Scrapy" is very useful hunt you introduce here.
Steemhunt is great social media platform where we enjoy daily wonderful products, applications and other software. 
**Scrapy**  is very helpful scraping technology due to which we easily extract data we need from websites. Thanks a lot for always sharing useful hunts. stay blessed and keep sharing.
created: 2018-11-02 17:16:42 (last update: 2018-11-02 17:17:06)
@avocadoxy ·
hey, stop making silly comments. 

so you just have your own f***ing comment format in SH like,

> No doubt [username], [Product name] is very useful hunt you introduce here. Steemhunt is great social media platform where we enjoy daily wonderful products, applications and other software.

and just copied and pasted from the hunting post like this,

> Scrapy is very helpful scraping technology due to which we easily extract data we need from websites. Thanks a lot for always sharing useful hunts.

and again the format

>Thanks a lot for always sharing useful hunts. stay blessed and keep sharing.


Seriously, shame on you f***ing penny pickers. 
created: 2018-11-02 20:27:36 (reply to @rabeel)
@steem-ua ·
#### Hi @teamhumble!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your **UA** account score is currently 6.266 which ranks you at **#217** across all Steem accounts.
Your rank has improved 1 place in the last three days (old rank 218).

In our last Algorithmic Curation Round, consisting of 265 contributions, your post is ranked at **#20**.
##### Evaluation of your UA score:

* You've built up a nice network.
* The readers appreciate your great work!
* Good user engagement!


**Feel free to join our [@steem-ua Discord server](https://discord.gg/KpBNYGz)**
created: 2018-11-07 20:28:27
@steemhunt ·
### Congratulations!
We have upvoted your post for your contribution within our community.
Thanks again and look forward to seeing your next hunt!

Want to chat? Join us on:
* Discord: https://discord.gg/mWXpgks
* Telegram: https://t.me/joinchat/AzcqGxCV1FZ8lJHVgHOgGQ
created: 2018-11-04 07:21:30
@thedawn ·
Scrapy is a very good and innovative product; through it we can get whatever type of data we want or need very fast. It is a useful product and a great hunt.
created: 2018-11-02 17:13:42
@avocadoxy ·
You really think this comment is helpful for SH? If you thought this hunt was cool, then you could just say "Cool hunt!". THB already mentioned all the info that you just repeated. You're clearly a penny picker who is constantly collecting f***** pennies from SH's comment voting pool. Shame on you.
created: 2018-11-02 20:24:21 (reply to @thedawn)