Extracting PDF Data With Pdfplumber - Lines, Rectangles, And Crop by geekgirl

python · @geekgirl · Aug 2 '22

![pdfplumber_data.png](https://images.hive.blog/DQmesCaeFmx11bh64otb8SBqoZribdW9eUiBTvH54VtjEBw/pdfplumber_data.png)

In the past I have written about how useful the **pdfplumber** library is for extracting data from pdf files. Its true power becomes evident when dealing with multiple pdf files that have hundreds of pages. When you know what you are looking for, don't want to go through hundreds of pages manually, and have to deal with such files on a daily basis, the best thing to do is to automate. That's what Python is great at: automating. As the name suggests, **pdfplumber** works with pdf files and makes it easy to extract data. It works best with machine-generated pdf files rather than scanned ones.

When extracting data from pdf files we can use multiple approaches. If we just need some text, we can start with the simple `.extract_text()` method. However, **pdfplumber** also lets us extract all objects in the document, such as images, lines, rectangles, curves, and chars, or get all of them at once with `.objects`. Machine-generated pdf files sometimes use lines and rectangles to separate the information on the page. This can help us identify the type of text within those lines or rectangles. I recently came across some financial pdf data formatted this way. The locations of these lines and rectangles can then be used to select the text in that area with **pdfplumber**'s `.crop()` method.

First, let's take a look at basic text extraction with `pdfplumber`.

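```
import pdfplumber

with pdfplumber.open('/Users/librarian/Desktop/document.pdf') as pdf:
    page1 = pdf.pages[0]  # pages are zero-indexed, so [0] is page one
    # extract_text() returns one long string; split it into individual lines
    page1_text = page1.extract_text().split('\n')
    for text in page1_text:
        print(text)
```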

We open the file with pdfplumber; `.pages` returns a list of the pages in the pdf, along with all the data within those pages. Since it is a list, we can access the pages one by one by index. In the example above we are only looking at page one (index `0`) for now. The `.extract_text()` method returns all the text of page one as one long string. To separate the text line by line, we use `.split('\n')`. Now that we have a list of lines of text from page one, we can iterate through it and print every line.
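Since `.pages` is a list, the same approach extends to every page of a long document. The sketch below is a hypothetical helper, `iter_page_lines` (not part of pdfplumber), that tags each non-empty line with its page number. It only relies on what the example above already uses: `pdf.pages`, `page.page_number`, and `page.extract_text()`. The `or ""` guard is a precaution, on the assumption that some pdfplumber versions may return `None` rather than an empty string for a page without text.

```python
from typing import Iterator, Tuple


# Hypothetical helper, not part of pdfplumber.
def iter_page_lines(pdf) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, line) pairs for every non-empty text line.

    `pdf` is expected to behave like a pdfplumber PDF object: it has a
    `.pages` list whose items expose `.page_number` and `.extract_text()`.
    """
    for page in pdf.pages:
        # Assumed precaution: treat a missing result as an empty page.
        text = page.extract_text() or ""
        for line in text.split("\n"):
            if line.strip():  # skip blank lines
                yield page.page_number, line
```

Inside the `with pdfplumber.open(...) as pdf:` block you could then write `for page_number, line in iter_page_lines(pdf): print(page_number, line)`.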

In most cases, this might be all you need. But sometimes you may want to extract these lines of text while retaining the layout formatting. To do this, we add the `layout=True` parameter to the `.extract_text()` method, like this: `page1.extract_text(layout=True).split('\n')`. Be careful when using `layout=True`, because this feature is experimental and not stable yet. It might work in most cases, but sometimes it may return unexpected results.

Now that we know how to extract the text from the page, we can apply some string manipulation and regex to get only the data that we actually need. If we know the exact area on the page where our data is located, we can use the `.crop()` method and extract only that data using the same extraction methods described above.
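As an illustration of that string-manipulation step, here is a small sketch that looks up a labeled amount such as `Total: $1,234.56` in the extracted lines. The label, the amount format, and the helper name `find_amount` are all hypothetical; the regular expression would need to be adapted to your own documents.

```python
import re
from typing import List, Optional

# Hypothetical pattern: a text label, an optional colon, then an amount such
# as "$1,234.56" or "1234.56". Adapt it to the layout of your own documents.
AMOUNT_PATTERN = re.compile(
    r"^(?P<label>[A-Za-z ]+?):?\s+\$?(?P<amount>[\d,]+\.\d{2})\s*$"
)


# Hypothetical helper, not part of pdfplumber.
def find_amount(lines: List[str], label: str) -> Optional[float]:
    """Return the amount on the first line whose label matches `label`, or None."""
    for line in lines:
        match = AMOUNT_PATTERN.match(line.strip())
        if match and match.group("label").strip().lower() == label.lower():
            # Drop thousands separators before converting to a number.
            return float(match.group("amount").replace(",", ""))
    return None
```

For example, `find_amount(page1.extract_text().split('\n'), 'Total')` would return the first matching total as a float, or `None` if no line matches.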

The **pdfplumber.Page** class has properties like `.page_number`, `.width`, and `.height`. We can use the width and height of the page to determine which area we are going to crop. Let's take a look at a code example using `.crop()`:

```
import pdfplumber

with pdfplumber.open('/Users/librarian/Desktop/document.pdf') as pdf:
    page1 = pdf.pages[0]
    # (x0, top, x1, bottom) in points, measured from the page's top-left corner
    bounding_box = (200, 300, 400, 450)
    crop_area = page1.crop(bounding_box)  # a new page covering only that area
    crop_text = crop_area.extract_text().split('\n')
    for text in crop_text:
        print(text)
```

Once we have our page instance, we call the `.crop(bounding_box)` method. The bounding box is a tuple of `(x0, top, x1, bottom)` coordinates in points, measured from the left and top edges of the page. The result is still a page, but it only covers the area defined by `bounding_box`. Think of it as a piece of the page that is still a page, so we can apply other methods like `.extract_text()` to it.
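Because the bounding box is just a tuple of numbers, it can also be derived from `page.width` and `page.height` instead of being hard-coded, which helps when the data always sits in the same relative part of the page. A minimal sketch, assuming the region is described as fractions of the page; `fraction_bbox` is a hypothetical helper, and it returns coordinates in the `(x0, top, x1, bottom)` order that `.crop()` expects.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]


# Hypothetical helper, not part of pdfplumber.
def fraction_bbox(width: float, height: float,
                  left: float, top: float, right: float, bottom: float) -> BBox:
    """Turn page fractions (0.0 to 1.0) into an (x0, top, x1, bottom) box in points.

    `top` and `bottom` are fractions measured from the top edge of the page,
    matching the coordinate system used by crop().
    """
    for value in (left, top, right, bottom):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    if left >= right or top >= bottom:
        raise ValueError("left must be < right and top must be < bottom")
    return (left * width, top * height, right * width, bottom * height)
```

For example, `page1.crop(fraction_bbox(page1.width, page1.height, 0, 0, 1, 0.5))` would give the top half of page one.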

Cropping an area can be very useful if you know exactly where your text is located. This feature becomes even more useful when the pdf documents we are working with use lines and rectangles for formatting and separating information. We can extract all the lines and rectangles on the page and get their locations. Using these locations, we can easily identify which area of the page we need to crop. To get the lines on the page we use the `.lines` property, and to get the rectangles we use the `.rects` property. To see how many lines we have on the page and what properties a line has, we can run the following code.

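```
import pdfplumber
import pprint

with pdfplumber.open('/Users/librarian/Desktop/document.pdf') as pdf:
    page1 = pdf.pages[0]
    lines = page1.lines  # a list of dicts, one per line object on the page
    print(len(lines))  # how many lines the page contains
    pprint.pprint(lines[0])  # every property of the first line
```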

The result shows the properties that line objects have, along with their values. Some of them will be useful; others we can ignore.

```
{'bottom': 130.64999999999998,
'doctop': 130.64999999999998,
'evenodd': False,
'fill': False,
'height': 0.0,
'linewidth': 1,
'non_stroking_color': [0.859],
'object_type': 'line',
'page_number': 1,
'pts': [(18.0, 661.35), (590.25, 661.35)],
'stroke': True,
'stroking_color': (0, 0, 0),
'top': 130.64999999999998,
'width': 572.25,
'x0': 18.0,
'x1': 590.25,
'y0': 661.35,
'y1': 661.35}
```

Which properties to use depends on the project. In my case I use ***top, bottom, x0, and x1***. The top and bottom values are the same in this example because the line is horizontal (its `height` is 0.0), but I still read both values in case a line is not perfectly horizontal.
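It also helps to know how the two sets of vertical coordinates relate. In pdfplumber, `y0` and `y1` are measured from the bottom of the page, while `top` and `bottom` are measured from the top, which is the system `.crop()` works in. In the output above `top + y0` equals 792, consistent with a US Letter page 792 points tall; that page height is an assumption here, and `to_top_coordinates` is a hypothetical helper that flips one system into the other.

```python
from typing import Tuple


# Hypothetical helper, not part of pdfplumber.
def to_top_coordinates(y0: float, y1: float, page_height: float) -> Tuple[float, float]:
    """Convert bottom-origin (y0, y1) into top-origin (top, bottom).

    y0/y1 are distances from the bottom edge, so subtracting them from the
    page height gives distances from the top edge. The higher edge (y1)
    becomes `top`, and the lower edge (y0) becomes `bottom`.
    """
    top = page_height - y1
    bottom = page_height - y0
    return top, bottom


# y0/y1 taken from the line printed above; the 792 pt page height is assumed.
top, bottom = to_top_coordinates(661.35, 661.35, 792)
print(top, bottom)
```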

We get the rectangles on the page the same way as the lines; we just use the `.rects` property instead. For rects, the top and bottom values will differ, because a rectangle has a height. Now that we have the coordinates of the area we want to crop and extract text from, we just plug the values we get from `.lines` and `.rects` into the `bounding_box` for the `.crop()` method.
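Here is a sketch of that last step for the common case where horizontal lines separate rows of information: keep the horizontal lines, sort them by `top`, and turn each gap between two consecutive lines into a bounding box. A rectangle already carries `x0`, `top`, `x1`, and `bottom`, so it maps to a box directly. The names `horizontal_bands` and `rect_to_bbox` are hypothetical, and the thresholds (1 point of height to count as horizontal, `min_height` to skip thin gaps) are assumptions you would tune for your documents.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]


# Hypothetical helper, not part of pdfplumber.
def rect_to_bbox(rect: Dict[str, float]) -> BBox:
    """A rect dict already has top-origin coordinates, so just reorder them."""
    return (rect["x0"], rect["top"], rect["x1"], rect["bottom"])


# Hypothetical helper, not part of pdfplumber.
def horizontal_bands(lines: List[Dict[str, float]], min_height: float = 2.0) -> List[BBox]:
    """Return an (x0, top, x1, bottom) box for each gap between consecutive horizontal lines."""
    # Assumed threshold: a line less than 1 pt tall counts as horizontal.
    horizontals = sorted(
        (ln for ln in lines if abs(ln["bottom"] - ln["top"]) < 1.0),
        key=lambda ln: ln["top"],
    )
    bands = []
    for upper, lower in zip(horizontals, horizontals[1:]):
        if lower["top"] - upper["bottom"] < min_height:
            continue  # gap too thin to hold text, e.g. a double rule
        # Span the combined horizontal extent of the two bounding lines.
        x0 = min(upper["x0"], lower["x0"])
        x1 = max(upper["x1"], lower["x1"])
        bands.append((x0, upper["bottom"], x1, lower["top"]))
    return bands
```

With a real page you could then write `for bbox in horizontal_bands(page1.lines): print(page1.crop(bbox).extract_text())`, and similarly call `page1.crop(rect_to_bbox(rect))` for each `rect` in `page1.rects`.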

I just started using these features of **pdfplumber** today, and so far everything is working great and I haven't seen any issues yet. If you work with many pdf files to extract data, and these documents have repeating lines and rectangles that separate information, you may find **pdfplumber** useful for automating these tasks too. Let me know your thoughts and experiences with text extraction from pdf documents in the comments.

Pdfplumber has great documentation. Feel free to visit the GitHub page: https://github.com/jsvine/pdfplumber

@chinito · Aug 2 '22

that is neat! very helpful tool! 😉🤙

@emeka4 · Aug 2 '22

This is really nice @geekgirl and thanks for sharing

@iykewatch12 · Aug 2 '22

You have widened my horizon via this information you have passed out I will use this system to get pdf data when ever I have the need. Thank you a lot.

@lhes · Aug 2 '22

$0.12

I am not that good with things like this.
Thank you for sharing.

@maggard · Aug 2 '22

Great information.  Thank you.

@marshallsss45 · Dec 20 '23

I will say that for those who deal with a large number of scanned documents, PDF Harvester from CoolUtils on the website https://www.coolutils.com/PDFCombine will be a real godsend, which is much easier to use. Not only does it merge files, but it also automatically removes those annoying blank pages. Saved me a lot of time and will definitely do the same for you. You can start by using the free version on the website.

@shohana1 · Aug 2 '22

$0.12

>Pdfplumber has great documentation

Agreed, and GitHub is a great place to collect resources. Thanks for sharing such a helpful blog with us.

@stemsocial · Aug 2 '22

<div class='text-justify'> <div class='pull-left'>
 <img src='https://stem.openhive.network/images/stemsocialsupport7.png'> </div>

Thanks for your contribution to the <a href='/trending/hive-196387'>STEMsocial community</a>. Feel free to join us on <a href='https://discord.gg/9c7pKVD'>discord</a> to get to know the rest of us!

Please consider delegating to the @stemsocial account (85% of the curation rewards are returned).

You may also include @stemsocial as a beneficiary of the rewards of this post to get stronger support.
</div>

@videoaddiction · Aug 2 '22

$0.17

Extracting text from a PDF is a real mess. With pdfplumber, we can also extract tables or shapes from a PDF page. Perhaps, after further development, it will also handle scanned PDFs much better.
