Teaching AI to be Evil with Unethical Data by mrosenquist

technology · @mrosenquist · Jul 4 '20

<html>
<p><img src="https://i.postimg.cc/wMMQMRFk/Laptop-fire.jpg"/></p>
<p>An Artificial Intelligence (AI) system is only as good as its training. For Machine Learning (ML) and Deep Learning (DL) systems, the training datasets are a crucial element that defines how the system will operate. Feed a system skewed or biased information and it will produce a flawed inference engine.</p>
<p><a href="https://thenextweb.com/neural/2020/07/01/mit-removes-huge-dataset-that-teaches-ai-systems-to-use-racist-misogynistic-slurs/">MIT recently removed a dataset</a> that has been popular with AI developers. The training set, 80 Million Tiny Images, was created in 2008 by scraping images from Google and other internet search engines, and it has been used to train AI software to identify objects. It consists of images that are labeled with descriptions. During the learning phase, an AI system ingests the dataset and ‘learns’ how to classify images. The problem is that many of the images are questionable and many of the labels are inappropriate. For example, women are described with derogatory terms, body parts are identified with offensive slang, and racial slurs are sometimes used to label people from minority groups. Training on such labels should never be allowed.</p>
<p>AI developers need vast amounts of data to train their systems. Collections are often created out of convenience, without consideration for appropriate content, copyright restrictions, compliance with licensing agreements, people’s privacy rights, or respect for society. Unfortunately, many of the available sets were haphazardly created by scraping the internet, social sites, copyrighted content, and human interactions without approval or notice.</p>
<p>Many of the most widely used training datasets have issues. A large number were created by unethically acquiring content, some contain derogatory or inflammatory information, and others are not representative because they exclude groups that would benefit from inclusion.</p>
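<p>One way to make the representativeness concern concrete is to measure how often each group appears in a dataset before training on it. The sketch below is only an illustration: the function name <code>representation_gaps</code>, the <code>min_share</code> threshold of 5%, and the placeholder group names are all assumptions made for this example, not part of any dataset or tool mentioned in this post.</p>

```python
from collections import Counter


def representation_gaps(group_labels, expected_groups, min_share=0.05):
    """Return the expected groups whose share of the samples is below min_share.

    group_labels: one (hypothetical) group annotation per sample.
    expected_groups: the groups the dataset is supposed to cover.
    min_share: illustrative threshold; a real audit would need to justify it.
    """
    counts = Counter(group_labels)
    total = len(group_labels)
    gaps = {}
    for group in expected_groups:
        # A group that never appears has a share of zero.
        share = counts.get(group, 0) / total if total else 0.0
        if share < min_share:
            gaps[group] = share
    return gaps


# Placeholder data: group "c" is barely present and group "d" is missing.
sample = ["a"] * 90 + ["b"] * 9 + ["c"] * 1
print(representation_gaps(sample, ["a", "b", "c", "d"]))
```

<p>A check like this only catches gaps in whatever annotations already exist. It cannot show whether those annotations were collected ethically or are themselves accurate.</p>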
<p>The problem has become worse over time. Flawed datasets that were made openly available to the developer community early on became so popular that they are now considered standards. These benchmarks are used to check accuracy and performance across different AI systems and configurations.</p>
<p>Too few datasets are vetted for inclusiveness, accuracy, or socially acceptable content. Using such flawed records is simply unethical because the resulting systems can be racially charged and biased, and can promote inequality.</p>
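<p>As a rough illustration of what a first vetting pass over labels could look like, the sketch below flags any label that contains a term from a reviewer-maintained blocklist. Everything in it is hypothetical: the <code>flag_labels</code> function, the <code>(image_id, label)</code> record layout, and the placeholder terms are assumptions for this example, not how MIT or any dataset maintainer actually audits data.</p>

```python
import re


def flag_labels(records, blocklist):
    """Return (index, label) pairs whose label contains a blocklisted term.

    records: list of (image_id, label) tuples (an assumed layout).
    blocklist: terms a human reviewer has decided are unacceptable.
    Matching is case-insensitive and on whole words only.
    """
    if not blocklist:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in blocklist) + r")\b",
        re.IGNORECASE,
    )
    return [
        (index, label)
        for index, (_, label) in enumerate(records)
        if pattern.search(label)
    ]


# Placeholder terms stand in for real offensive words.
records = [
    ("img1", "dog"),
    ("img2", "SLUR_A person"),
    ("img3", "slur_b"),
    ("img4", "nonslur_along"),
]
for index, label in flag_labels(records, ["slur_a", "slur_b"]):
    print(index, label)
```

<p>A word list will miss misspellings, context, and offensive images paired with harmless labels, so flagged and unflagged records alike would still need human review.</p>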
<p>We cannot have good AI if the commonly used datasets create unethical systems. All datasets should be vetted, and both their creators and the product developers who use them should be held responsible. Just as chefs are held accountable for the ingredients they put into their dishes, so should the AI community be held responsible for allowing poor data to result in harmful AI systems.</p>
<p><br/></p>
<p><br/></p>
<p>Interested in more? Follow me on <a href="https://www.linkedin.com/today/author/matthewrosenquist">LinkedIn</a>, <a href="https://medium.com/@matthew.rosenquist">Medium</a>, and <a href="https://twitter.com/Matt_Rosenquist">Twitter (@Matt_Rosenquist)</a> to hear insights, rants, and what is going on in cybersecurity.
<br/></p>
</html>

@gitplait-mod2 · Jul 4 '20

The way an AI system operates is based on the dataset it was trained on. So I now understand that much of the data used to train AI is just random material, which results in an unethical system.

Great post sir.

Thanks for sharing. 👍

<sub> **Your post has been submitted to be manually curated by @gitplait community account because this is the kind of publications we like to see in our community.** </sub>

Join our [Community on Hive](https://hive.blog/trending/hive-103590) and Chat with us on [Discord](https://discord.gg/CWCj3rw).

[[Gitplait-Team]](https://gitplait.tech/)

@ultimus · Jul 4 '20

How can developers justify using unethical data sets when creating an AI system?  Do they think it will magically turn out well?

@mrosenquist · Jul 4 '20

It is tough to quickly obtain vast amounts of training data.  So they just grab whatever they can.  They aren't thinking about the ethics of the training data, just how it will progress their project.  Shortcuts have consequences.
