<html>
<p>Today we are going to extract a table from the Internet that contains, for each character (its string or UTF-8 value), the binary, octal, decimal, and hexadecimal equivalents plus a short description.</p>
<p>I took the table from here: https://www.sciencebuddies.org/science-fair-projects/references/table-of-8-bit-ascii-character-codes</p>
<p>The interesting part is using Python scraping with BeautifulSoup to extract the table as a list. That list may contain errors, so we will filter it into clean Python lists and then turn it into a dictionary with a key and a value, for example:</p>
<blockquote>dictionary_unicode = {"A": ['065', '101', '041', '0100 0001']}</blockquote>
<p>Where calling:</p>
<blockquote><code>dictionary_unicode['A'][0]</code> returns <code>065</code>, which is simply the decimal code of "A". For the binary code, use <code>dictionary_unicode['A'][3]</code></blockquote>
<p>Here is the first piece of code. I will comment on what we are doing between snippets, and I may even repeat a lesson: STEEMIT has no good way to publish a series, so it is better to repeat lessons, refresh them, improve them, and add to them.</p>
<pre><code># -*- coding: utf-8 -*-</code></pre>
<pre><code>import urllib2</code></pre>
<pre><code>from bs4 import BeautifulSoup</code></pre>
<pre><code>import re</code></pre>
<pre><code>from prettytable import PrettyTable</code></pre>
<pre><code>import os, subprocess</code></pre>
<pre><code>url_unicode = "https://www.sciencebuddies.org/science-fair-projects/references/table-of-8-bit-ascii-character-codes"</code></pre>
<pre><code>page = urllib2.urlopen(url_unicode)</code></pre>
<pre><code>soup = BeautifulSoup(page, "lxml")</code></pre>
<pre><code>name_box = soup.find('div', attrs={'class': 'content-table page-break-avoid'})</code></pre>
<pre><code>name_boxaaa = soup.find('table')</code></pre>
<pre><code>table_headers = name_boxaaa.find_all('tr')</code></pre>
<p><br></p>
<p>Let's go through the code. In this first post we will get as far as sending the list to a file, so that once we have the list we do not have to download it again :D</p>
<pre><code>import urllib2</code></pre>
<p>This library fetches URLs, or more precisely the content behind a URL. It exists only in Python 2, which is what all the code in this post is written for (the <code>print</code> statements used below are Python 2 as well).</p>
<pre><code>from bs4 import BeautifulSoup</code></pre>
<p>We import BeautifulSoup from the bs4 library to handle the HTML (or XML) "soup" we need to extract data from.</p>
<pre><code>import re</code></pre>
<p>re is the regular expressions module. As the pros say, regular expressions are a world of their own within programming, one that every programmer, whether in Java, Python, or any other language, should know.</p>
<pre><code>import os, subprocess</code></pre>
<p>These are standard-library modules. I have only used them on Linux and never on Windows; the modules themselves are available on Windows too, but the <code>touch</code> command we run with them later is a Unix command.</p>
<p>Now for the good part:</p>
<pre><code>url_unicode = "https://www.sciencebuddies.org/science-fair-projects/references/table-of-8-bit-ascii-character-codes"</code></pre>
<pre><code>page = urllib2.urlopen(url_unicode)</code></pre>
<p>Here we open the URL with urllib2, or, to put it better, we make a request.</p>
<pre><code>soup = BeautifulSoup(page, "lxml")</code></pre>
<p>Next we parse the response with the lxml parser, which gives us a friendlier, navigable structure even if whoever wrote the page did not do their job. To see what I mean, print the raw HTML with <code>print page.read()</code> before creating soup in a separate run (reading the response consumes it, so do not leave that line in).</p>
<p>Now we need to extract the table we are after, like this:</p>
<pre><code>name_box = soup.find('div', attrs={'class': 'content-table page-break-avoid'})</code></pre>
<p>The class <code>content-table page-break-avoid</code> is the one that contains the table we need. Remember that a page can have many tables, and this div is where that class lives.</p>
<pre><code>name_boxaaa = soup.find('table')</code></pre>
<p>Now everything is more compact; I invite you to put a <code>print name_box</code> before the <code>name_boxaaa</code> line. In that line we use find to look for the table tag. Note that the call is made on <code>soup</code>, so it returns the first table in the whole document rather than searching inside <code>name_box</code>.</p>
<pre><code>table_headers = name_boxaaa.find_all('tr')</code></pre>
<p>Now we look for the "tr" tags, taking advantage of BeautifulSoup's find_all method (it belongs to BeautifulSoup, not to regular expressions) to build a list with every "tr". We need to send this list to a file so we do not have to download it every time.</p>
<p><br></p>
<p>Right now all the "tr" rows are in a list, but we will need to read it many times and should not have to connect to the Internet every time.</p>
<p>We do it like this:</p>
<pre><code>crear_LISTA_unicode = subprocess.call("touch lista_en.txt", shell=True)</code></pre>
<p>Here we use <code>subprocess.call</code> to run Linux commands from Python. (Opening the file in "w" mode in the next step would also create it on its own.)</p>
<pre><code>f=open("lista_en.txt","w")</code></pre>
<p>We open the file we just created.</p>
<pre><code>f.write(str(table_headers))</code></pre>
<p>We write the list we downloaded, <code>table_headers</code>, converting it to text with <code>str()</code> because <code>write</code> expects a string.</p>
<pre><code>f.close()</code></pre>
<p>And we close it. To see the list, just add a <code>print table_headers</code> at the end.</p>
<p><br></p>
<p>In the next session we will see how to read the list back from the file and turn it into a dictionary like the one I described at the beginning. See you next time.</p>
<p><br></p>
<p><br></p>
</html>
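The caching idea from the post (save the rows once, reload them from disk, then build the character dictionary) can be sketched as follows. This is only an illustration under several assumptions: it is written for Python 3 rather than the Python 2 used in the post, it stores the rows as JSON instead of plain text, and it uses computed rows as a stand-in for the scraped table so that it runs without a network connection. The names `build_row`, `save_rows`, `load_rows`, `rows_to_dictionary`, `sample_rows`, and `cache_path` are hypothetical and do not come from the original code.

```python
import json
import os
import tempfile


def build_row(char):
    """Return [decimal, octal, hex, binary] strings for one character."""
    code = ord(char)
    bits = format(code, "08b")
    return [
        format(code, "03d"),        # decimal, zero-padded to 3 digits
        format(code, "03o"),        # octal, zero-padded to 3 digits
        format(code, "03X"),        # hexadecimal, zero-padded to 3 digits
        bits[:4] + " " + bits[4:],  # binary split into two groups of 4 bits
    ]


def save_rows(rows, path):
    """Write the rows to disk as JSON so they can be reused offline."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle)


def load_rows(path):
    """Read the rows back from the JSON cache file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def rows_to_dictionary(rows):
    """Map each character (first cell) to the list of its codes."""
    return {row[0]: row[1:] for row in rows}


# Stand-in for the scraped table: [character, decimal, octal, hex, binary].
sample_rows = [[char] + build_row(char) for char in "ABC"]

# Hypothetical cache location; the post itself writes to lista_en.txt.
cache_path = os.path.join(tempfile.gettempdir(), "lista_en_sketch.json")
save_rows(sample_rows, cache_path)

dictionary_unicode = rows_to_dictionary(load_rows(cache_path))
print(dictionary_unicode["A"][0])  # decimal code of "A"
print(dictionary_unicode["A"][3])  # binary code of "A"
```

JSON is used here because it reloads as real Python lists, whereas the text written with `str()` in the post would need extra parsing to recover the rows.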
author | sethroot |
---|---|
permlink | scrapping-con-python-extrayendo-unicode-para-criptografia-mini-curso-python |
category | spanish |
json_metadata | {"tags":["spanish","python","programming","linux","cryptography"],"links":["https://www.sciencebuddies.org/science-fair-projects/references/table-of-8-bit-ascii-character-codes"],"app":"steemit/0.1","format":"html"} |
created | 2017-09-28 21:22:18 |
last_update | 2017-09-28 21:22:18 |
depth | 0 |
children | 4 |
root_title | "SCRAPPING con Python extrayendo Unicode para Criptografia!!! [Mini Curso Python]" |
Good afternoon @sethroot. Programming is a complex language, and one unknown to me, but it is no less deserving of merit for that.
author | don.quijote |
---|---|
permlink | re-sethroot-scrapping-con-python-extrayendo-unicode-para-criptografia-mini-curso-python-20171001t114322519z |
category | spanish |
json_metadata | {"tags":["spanish"],"users":["sethroot"],"app":"steemit/0.1"} |
created | 2017-10-01 11:43:21 |
last_update | 2017-10-01 11:43:21 |
depth | 1 |
children | 1 |
Glad you liked it, thank you very much!!!
author | sethroot |
---|---|
permlink | re-donquijote-re-sethroot-scrapping-con-python-extrayendo-unicode-para-criptografia-mini-curso-python-20171001t211217666z |
category | spanish |
json_metadata | {"tags":["spanish"],"app":"steemit/0.1"} |
created | 2017-10-01 21:12:18 |
last_update | 2017-10-01 21:12:18 |
depth | 2 |
children | 0 |
Seth gueko 😉
author | truffier |
---|---|
permlink | re-sethroot-scrapping-con-python-extrayendo-unicode-para-criptografia-mini-curso-python-20170928t212656973z |
category | spanish |
json_metadata | {"tags":["spanish"],"app":"steemit/0.1"} |
created | 2017-09-28 21:27:00 |
last_update | 2017-09-28 21:27:00 |
depth | 1 |
children | 1 |
thx :D
author | sethroot |
---|---|
permlink | re-truffier-re-sethroot-scrapping-con-python-extrayendo-unicode-para-criptografia-mini-curso-python-20170928t224540639z |
category | spanish |
json_metadata | {"tags":["spanish"],"app":"steemit/0.1"} |
created | 2017-09-28 22:45:42 |
last_update | 2017-09-28 22:45:42 |
depth | 2 |
children | 0 |