Illegal Web Scraping: Makes Democratized Data Even More Crucial by taskmaster4450

It is amazing how things progress when one is following a [story](https://inleo.io/@leoglossary/leoglossary-story).


On a number of occasions we have discussed the concept of democratized [data](https://inleo.io/@leoglossary/leoglossary-data).  In fact, this is what I view to be Hive's number one use case at the moment.


People often say find a need and fill it.  


The reality is that, in this era of [AI](https://inleo.io/@leoglossary/leoglossary-artificial-intelligence) and training large models, data is crucial.  Naturally, people are quickly realizing the value of the data on their [servers](https://inleo.io/@leoglossary/leoglossary-server-computer).  Those who operate on the [Internet](https://inleo.io/@leoglossary/leoglossary-internet) are confronted with a situation where the ability to control that data is taking on added meaning.


With so much [money](https://inleo.io/@leoglossary/leoglossary-money) on the line, these entities are taking strides to lock things down.


While technological steps are one thing, it is something else when the law gets involved.

<center> https://img.inleo.io/DQmXtX4p6Atb3CHyVwUfPfxa2MpGyBcqMmwHmKsTZjaHGqc/image.png 
Image generated by Ideogram </center>


# Illegal Web Scraping Screams For Data Democratization


Here is a short [video](https://inleo.io/@leoglossary/leoglossary-video) that discusses a case that went in front of a United States [court](https://inleo.io/@leoglossary/leoglossary-court-legal) that dealt with data scraping.  The accused was found guilty.  This was a corporate case, but it does bring up some interesting questions.


https://inleo.io/threads/view/taskmaster4450le/re-leothreads-2mo8o3c7p


One element to note here is that we are looking at data extraction tied to [fraud](https://inleo.io/@leoglossary/leoglossary-fraud).  Of course, in many parts of the [world](https://inleo.io/@leoglossary/leoglossary-world), there are laws against fraud, regardless of how the information used was acquired.


Leaving that aspect aside, what about the act of designing an automated agent to pull data off [websites](https://inleo.io/@leoglossary/leoglossary-website)?  What happens if that becomes illegal?


Certainly, this is something that will have to filter through the courts, and every country will be different.  However, we have already seen a move toward holding developers responsible for what their [software](https://inleo.io/@leoglossary/leoglossary-software) does.


Most will remember the case of [the Tornado Cash developer who got 64 months for money laundering](https://www.coindesk.com/policy/2024/05/14/tornado-cash-developer-alexey-pertsev-found-guilty-of-money-laundering/).  Basically, he designed a [privacy](https://inleo.io/@leoglossary/leoglossary-privacy) [application](https://inleo.io/@leoglossary/leoglossary-application) that allowed for the mixing of [cryptocurrency](https://inleo.io/@leoglossary/cryptocurrency) to obscure where it came from.


Thus, we cannot call it unreasonable to think that some [governments](https://inleo.io/@leoglossary/leoglossary-government) will take such action.  If that is the case, could [developers](https://inleo.io/@leoglossary/leoglossary-developer) be held responsible?


## Democratized Data


The democratization of data solves this problem.


What this means is generating data that is placed in public [databases](https://inleo.io/@leoglossary/leoglossary-database), such as the Hive [blockchain](https://inleo.io/@leoglossary/blockchain), where anyone is free to utilize it.  Since nobody owns it, startups can gather the data to train their models.
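
To make "anyone is free to utilize it" a bit more concrete, here is a minimal sketch of reading a single post from a public Hive API node over JSON-RPC, with no scraping and no API key involved.  A few assumptions are baked in: the node URL `https://api.hive.blog` is just one public endpoint, the helper names are made up for illustration, and the `condenser_api.get_content` call reflects my understanding of the node API rather than anything specific to this post.

```python
import json
import urllib.request

# Assumption: one public Hive API node; any node exposing the same
# JSON-RPC interface should work the same way.
HIVE_NODE = "https://api.hive.blog"


def build_get_content_request(author, permlink):
    """Build the JSON-RPC payload that asks a node for one post."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_content",
        "params": [author, permlink],
        "id": 1,
    }


def extract_body(response):
    """Return the post text from a JSON-RPC response, or None if absent."""
    result = response.get("result") or {}
    return result.get("body")


def fetch_post_body(author, permlink, node=HIVE_NODE):
    """Send the request to the node and return the post body (hypothetical helper)."""
    payload = json.dumps(build_get_content_request(author, permlink)).encode("utf-8")
    request = urllib.request.Request(
        node,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return extract_body(json.load(reply))
```

Calling `fetch_post_body("taskmaster4450", "illegal-web-scraping-makes-democratized-data-even-more-crucial-eby")` would, under those assumptions, return the markdown of this very article.  The point is that the data is served by the network itself, so there is no site owner whose terms of service have to be worked around.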


This is not the case with entities such as [Reddit](https://inleo.io/@leoglossary/leoglossary-reddit) and X, which are locking down their sites.  The ability to scrape the Internet is diminishing.


We also have to factor in lawsuits.


[OpenAI](https://inleo.io/@leoglossary/leoglossary-openai) has been sued by a number of entities for training its models on data claimed to be protected by copyright.  This is going to have to make it through the court system before we know where the rulings stand.  Nevertheless, this [company](https://inleo.io/@leoglossary/leoglossary-company-business) faces potentially billions in verdicts.


It is obvious startups cannot withstand this.


So what are they to do?


Actually, a better question is what are we going to do?  Do we want a future where Big Tech is the only one with access to data?  Is the idea of a handful of mega-corporations being the developers of these models appealing to people?


The answer to this question should dictate future behavior.


If one has no problem with this future, then feeding the massive beasts is no problem.  [Google](https://inleo.io/@leoglossary/leoglossary-google-company), [Amazon](https://inleo.io/@leoglossary/leoglossary-amazon-company), X, and Meta will see their databases grow on a daily basis, allowing them to feed the increasing compute they acquire.


On the other hand, if one stands for [decentralization](https://inleo.io/@leoglossary/leoglossary-decentralization) and distribution, then these centralized entities are even less appealing.


## Web 3.0 = Decentralization


It is no secret that a core tenet of [Web 3.0](https://inleo.io/@leoglossary/leoglossary-web-3-0) is the idea of decentralization.


Actually, we are looking at a [technology](https://inleo.io/@leoglossary/leoglossary-technology) that was brought about with the idea of democratized data from the start.  The breakthrough of [Bitcoin](https://inleo.io/@leoglossary/leoglossary-bitcoin) came from the ability to arrive at [consensus](https://inleo.io/@leoglossary/leoglossary-consensus) without a centralized third party.  This means that the [ledger](https://inleo.io/@leoglossary/leoglossary-ledger), i.e. database, was not under the control of a single entity.


Bitcoin's data, for the most part, is limited to financial [transactions](https://inleo.io/@leoglossary/leoglossary-transaction).  Over the [years](https://inleo.io/@leoglossary/leoglossary-year), other databases have shown up that expand upon this concept.  Hive is an example of a [permissionless](https://inleo.io/@leoglossary/leoglossary-permissionless-blockchain) text database.


We are now seeing this grow in importance.  Some like to cite how "data is the new oil".  If that is the case, the question is who is getting more [oil](https://inleo.io/@leoglossary/leoglossary-oil).


Is humanity well served by creating another cartel like we see with the physical [commodity](https://inleo.io/@leoglossary/leoglossary-commodity), only this [time](https://inleo.io/@leoglossary/leoglossary-time) in the [digital](https://inleo.io/@leoglossary/leoglossary-digital) world?


Our success with cartels seems rather clear.


The foundation of the Internet is the database.  Everything we do is tied to it.  Without databases, we would have nothing on our [screen](https://inleo.io/@leoglossary/leoglossary-screen-film-industry).  This applies whether we are discussing [Web 2.0](https://inleo.io/@leoglossary/leoglossary-web-2-0) or Web 3.0.


AI training is taking this to another level.  We see the value grow, meaning the lead these large entities have keeps growing.


Permissionless databases hold the key to combating this.  Even if the law starts to swing in the direction of holding developers responsible, democratized data makes it a meaningless point.


____


<center> [What Is Hive](https://inleo.io/@leoglossary/leoglossary-what-is-hive) </center>





Posted Using [InLeo Alpha](https://inleo.io/@taskmaster4450/illegal-web-scraping-makes-democratized-data-even-more-crucial-eby)

Created: 2024-07-30 13:21:30

@daniasi ·
**Do we want a future where Big Tech is the only one with access to data?** Trust me, you wouldn't love witnessing this outcome. I recently had a discussion with a friend about how data may become too expensive for the masses to afford should centralized hands prevail again.

Created: 2024-07-30 15:02:48

@taskmaster4450le ·
Just look at what Google paid Reddit.

That sums it up.

Created: 2024-07-31 14:18:54

@novacadian ·
The concerns which you raise are of greater or lesser importance depending on the motivations of the developers. If the motivation is more political/ideological/non-profit driven like open source then it is a hard one to legislate against. Bitcoin itself deflected many attacks due to the fact there was no one person or corporation in charge that could be fined or prosecuted; just an elusive Satoshi Nakamoto. 

Remember the encryption schemes that the US government outlawed downloading outside of the USA in the 90s? That kinda fizzled, didn't it? If my server is using a VPN and the use of the data collected is neither centralized nor profit driven, then we are likely to see more fizzling, in my opinion.

Created: 2024-07-30 17:35:57

@taskmaster4450le ·
The downloading issue, much of that tied to copyright, did fizzle for a couple reasons.

To start, the sheer magnitude of that activity was overwhelming.  This is not going to be the case with the developers, not the same numbers.  

A second issue is the fact that we were dealing with "crimes" that were mostly civil, i.e. people being sued or facing fines.  When something is tied to a jail sentence, it can be.

They could easily tie this to espionage or something like that.

As for the open and decentralized nature, I agree with you 1000%.  That is why we have to get as much data on permissionless networks as possible.

Created: 2024-07-31 14:18:30

@outwars ·
I guess it depends on who you ask. For those who use Meta, Google [Android], and Twitter, they pretty much know and accept that their data is being used by these companies and they are ok with it. They believe their data is relatively safe in these big companies. Barring a data hack/leak, the worst that can happen is their email being sold to advertisers. But trusting and using unknown applications is much scarier for them.

Created: 2024-07-31 02:07:39

@taskmaster4450le ·
There is a big difference between building a scraping agent and a hack. 

Created: 2024-07-31 14:15:21

@sgt-dan ·
It is taking some time for *the law* to catch up with technology. Copyrighted content/information is subject to being exploited by artificial intelligence as it *scrapes the web*. It is a lot to keep up with.

Created: 2024-07-30 14:01:30

@taskmaster4450le ·
That is true.

The MO of governments, knowing they can't stop the masses, is to make an example of people.

When one is put on trial, and they feed the media machine, it is a scare tactic that can control those who are thinking about doing that.

Of course, with tech, we are dealing with something global so those in North Korea really do not care what the EU or US says.

Created: 2024-07-31 14:20:24