
Clean XML files by misticogama

<center>
![](https://images.ecency.com/DQmaCujnBwgHUizNcSAM4orXNxzH7uCQtk1GxNWrSc6Jntq/clean.png)
</center>

Greetings from your dearest favorite little neko incubus 😸, and today I bring you a super useful tip! As you know, I'm from Mexico, and I've run into a technical challenge that many of you may have faced too: cleaning up XML files. XML is very common here, especially for invoicing and official paperwork, but working with it can be a headache, because the files often arrive full of stray or jumbled characters that make them hard to read.

## Code Breakdown to Process and Clean XML

This short PHP snippet takes a response in XML format (read from a file or straight from the database), extracts the part we need, and cleans it for later use. In this case, the XML content we want is wrapped in a specific section marked by the ```<s0:xml> ... </s0:xml>``` tags. The goal is a clean, single-line XML string that is much easier to interpret. Below, I explain each part so you can implement it easily.

1. Decoding and removing spaces

```
$stm = trim(htmlspecialchars_decode(html_entity_decode($xml->response)), " \t\n\r\"");
```

- ```html_entity_decode($xml->response)```: Decodes every HTML entity in $xml->response (e.g., converts &amp; to &).
- ```htmlspecialchars_decode(...)```: Converts the special HTML entities such as &amp;, &quot;, &lt; and &gt; back into characters (e.g., converts &quot; to "). Because html_entity_decode has already run, this second pass mainly matters when the response was encoded twice: &amp;lt; becomes &lt; after the first call and < after this one.
- ```trim(..., " \t\n\r\"")```: Removes spaces, tabs, line feeds, carriage returns and double quotes (") from the beginning and end of the resulting string.

The result is that $stm contains the decoded XML text without surrounding whitespace or quotes.
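To see why both decoding functions are chained, here is a minimal sketch on a made-up, double-encoded sample string (real responses will look different). It prints the intermediate result after html_entity_decode and the final $stm after htmlspecialchars_decode and trim.

```php
<?php
// Hypothetical, double-encoded response used only for illustration.
$response = "  \"&amp;lt;s0:xml&amp;gt;&amp;lt;a&amp;gt;1&amp;lt;/a&amp;gt;&amp;lt;/s0:xml&amp;gt;\"\n";

// First pass: html_entity_decode decodes one level only,
// so &amp;lt; becomes &lt; (not yet a real "<").
$once = html_entity_decode($response);

// Second pass: &lt; and &gt; become < and >, giving real markup.
$twice = htmlspecialchars_decode($once);

// Strip spaces, tabs, line feeds, carriage returns and double quotes from both ends.
$stm = trim($twice, " \t\n\r\"");

echo $once, PHP_EOL;
echo $stm, PHP_EOL;
```

If the response is only encoded once, the second call simply finds nothing left to decode.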

2. Calculating the total length of the string

```$str_fin = strlen($stm);```
Here we simply obtain the length of the string $stm and save it in $str_fin, to be used later in the extraction.

3. Locating the position of the ```<s0:xml>``` tag
```$str_inicio = strpos($stm, '<s0:xml>') + 8;```

- ```strpos($stm, '<s0:xml>')```: Finds the position where the ```<s0:xml>``` tag first appears in $stm.
- ```+ 8```: Adds 8, the length of ```<s0:xml>```, so that ```$str_inicio``` points right after the opening tag.
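The + 8 works only because ```<s0:xml>``` is exactly 8 characters long, and strpos returns false when the tag is missing (false + 8 quietly evaluates to 8). Here is a small sketch that derives the offset from the tag itself and reports a missing tag; the helper offset_after_tag is a hypothetical name, not part of the original code.

```php
<?php
// Hypothetical helper: offset just after $openTag, or null if the tag is absent.
function offset_after_tag(string $haystack, string $openTag): ?int
{
    $pos = strpos($haystack, $openTag);
    if ($pos === false) {
        // Without this check, false + 8 evaluates to 8 and the
        // extraction would silently start at the wrong place.
        return null;
    }
    // strlen('<s0:xml>') is 8, which is where the "+ 8" comes from.
    return $pos + strlen($openTag);
}

var_dump(offset_after_tag('header<s0:xml><a>1</a></s0:xml>', '<s0:xml>'));
var_dump(offset_after_tag('no tag here', '<s0:xml>'));
```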

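4. Extracting the content after ```<s0:xml>```
```$str_tmp = substr($stm, $str_inicio, $str_fin);```

- ```substr($stm, $str_inicio, $str_fin)```: Extracts a substring of $stm starting at $str_inicio. The third argument of substr is a length, not an end position; since $str_fin (the full length of $stm) is larger than what remains, the substring simply runs to the end of $stm.
- This saves all the content after the ```<s0:xml>``` tag in $str_tmp.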
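Steps 2 and 4 together, as a sketch on a hypothetical input string: passing the full length as the third argument of substr just returns the rest of the string, which is the same as leaving the length out.

```php
<?php
// Hypothetical input string for illustration.
$stm = 'header<s0:xml><a>1</a></s0:xml>';

$str_fin = strlen($stm);                       // full length of $stm
$str_inicio = strpos($stm, '<s0:xml>') + 8;    // right after the opening tag

// The length argument is larger than what remains, so substr() stops at the end.
$str_tmp = substr($stm, $str_inicio, $str_fin);
echo $str_tmp, PHP_EOL;

// Omitting the length gives the same result and states the intent directly.
var_dump($str_tmp === substr($stm, $str_inicio));
```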

5. Finding the position of the closing ```</s0:xml>``` tag
```$str_fin = strpos($str_tmp, '</s0:xml>');```

- ```strpos($str_tmp, '</s0:xml>')```: Finds the position of the closing tag ```</s0:xml>``` within $str_tmp, indicating the end of the XML portion we are interested in.

6. Extracting the XML content
```$str_tmp = substr($str_tmp, 0, $str_fin);```

- ```substr($str_tmp, 0, $str_fin)```: Extracts the substring from the start of $str_tmp up to the closing ```</s0:xml>``` tag, keeping only the desired XML content in $str_tmp.
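Steps 5 and 6 assume the closing tag is present. Here is a sketch with an explicit check, using a hypothetical value for $str_tmp; throwing a RuntimeException is just one possible way to handle the error.

```php
<?php
// Hypothetical output of step 4, with extra data after the closing tag.
$str_tmp = '<a>1</a></s0:xml>trailing data';

$str_fin = strpos($str_tmp, '</s0:xml>');
if ($str_fin === false) {
    // strpos() returns false when the tag is absent; passed to substr() as the
    // length, false is treated as 0 and the result would be an empty string.
    throw new RuntimeException('Closing </s0:xml> tag not found');
}

// Keep only the characters before the closing tag.
$str_tmp = substr($str_tmp, 0, $str_fin);
echo $str_tmp, PHP_EOL;
```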

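7. Final line break cleanup
```$str_tmp = preg_replace("/[\r\n]+/", "", $str_tmp);```

- ```preg_replace("/[\r\n]+/", "", $str_tmp)```: Removes every run of carriage returns (\r) and line feeds (\n) from $str_tmp, leaving it on a single line. Inside square brackets, | is a literal character rather than "or", so a pattern like /[\r\n|\n|\r]+/ would also delete any pipe characters in the data; [\r\n] already covers \r\n, \n and \r.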
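A quick check of the line-break pattern on a made-up sample, comparing a character class that includes | with the plain [\r\n]+ class:

```php
<?php
// Hypothetical sample containing a pipe character and mixed line endings.
$sample = "<a>x|y</a>\r\n<b>2</b>\n";

// "|" inside [...] is a literal, so this pattern also deletes the pipe.
$withPipe = preg_replace("/[\r\n|\n|\r]+/", "", $sample);

// Only carriage returns and line feeds are removed here.
$fixed = preg_replace("/[\r\n]+/", "", $sample);

echo $withPipe, PHP_EOL;
echo $fixed, PHP_EOL;
```

Note that removing line breaks outright also joins text that was split across lines (e.g. "foo\nbar" becomes "foobar"); if that matters for your data, replacing with a space is an alternative.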

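Full code:

```
// Step 1: decode HTML entities (the second call handles double encoding) and trim
$stm = trim(htmlspecialchars_decode(html_entity_decode($xml->response)), " \t\n\r\"");
// Steps 2-4: keep everything after the opening <s0:xml> tag (8 characters long)
$str_fin = strlen($stm);
$str_inicio = strpos($stm, '<s0:xml>') + 8;
$str_tmp = substr($stm, $str_inicio, $str_fin);
// Steps 5-6: cut at the closing </s0:xml> tag
$str_fin = strpos($str_tmp, '</s0:xml>');
$str_tmp = substr($str_tmp, 0, $str_fin);
// Step 7: remove line breaks ([\r\n]+ keeps any "|" characters in the data)
$str_tmp = preg_replace("/[\r\n]+/", "", $str_tmp);
```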

With this process, we manage to take only the relevant information, clean those annoying characters, and leave the XML ready to use in our applications. Also, something I love about this solution is its flexibility, because you could adapt it for other tags or even for similar data that you need in other projects.
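Since the steps could be adapted to other tags, here is one way to wrap them in a reusable function. This is a sketch, not the original author's code: the name extract_tag_content and the sample input are hypothetical, the tag length is computed instead of hard-coded, a missing tag returns null, and the optional well-formedness check assumes PHP's SimpleXML extension (enabled by default) is available.

```php
<?php
// Hypothetical helper: extract and clean the content between <$tag> and </$tag>.
function extract_tag_content(string $raw, string $tag): ?string
{
    // Step 1: decode entities (twice, for double encoding) and trim.
    $stm = trim(htmlspecialchars_decode(html_entity_decode($raw)), " \t\n\r\"");

    $open  = '<' . $tag . '>';
    $close = '</' . $tag . '>';

    $start = strpos($stm, $open);
    if ($start === false) {
        return null;                       // opening tag missing
    }
    $start += strlen($open);               // replaces the hard-coded + 8

    $end = strpos($stm, $close, $start);   // search only after the opening tag
    if ($end === false) {
        return null;                       // closing tag missing
    }

    // Steps 4-7: cut out the content and remove CR/LF characters.
    $content = substr($stm, $start, $end - $start);
    return preg_replace("/[\r\n]+/", "", $content);
}

// Hypothetical encoded response with line breaks and a pipe in the data.
$raw = "\"&lt;s0:xml&gt;\n&lt;item id=&quot;1&quot;&gt;A|B&lt;/item&gt;\n&lt;/s0:xml&gt;\"";
$clean = extract_tag_content($raw, 's0:xml');
echo $clean, PHP_EOL;

// Optional: confirm the result is well-formed XML before using it.
libxml_use_internal_errors(true);
$doc = simplexml_load_string($clean);
echo $doc === false ? 'not well-formed' : 'well-formed: ' . $doc->getName(), PHP_EOL;
```

Returning null lets the caller decide what to do with a malformed response instead of continuing with a truncated string.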

With this technique, I hope you find a practical solution for cleaning up your XML files or, at least, a guide for solving similar problems in your own projects. The idea is to make your work easier so you can interpret your data without stress!

### I hope this tip is of great help to you!
So, if you ever run into a messy XML file or one full of strange characters, try this method and let me know how it goes! See you next time, and remember that I'm here to help you on your tech adventures. 😸


![](https://images.ecency.com/DQmYJfbBHnbc7Nmc8rMFCAt9hdyFzcM6mUzQxJEc6YuaGiW/separador_mistico_.png)

<hr>

<center>  

![](https://images.ecency.com/DQmWoDZ3uM1U33NcXDSp63byePECwJxLFXA6VC2B5iC6tky/creditos.png)

Cover made in Photoshop
Separator by @softy1231 [softy1231](https://linktr.ee/softy_1231)
VTuber and panels by @panna-natha [pannanatha](https://linktr.ee/natha_arceramos) 
Logo by [KivaVT](https://x.com/KivaVT)
Base cover by @smile27

[<img src="https://images.ecency.com/DQmY6nYuMxRjGNTCqtNvgdbaZBEv4d3Vfv2iQpUbtQuuDyS/redes.png" alt="Social Media">](https://linktr.ee/misticogama) 

![](https://images.ecency.com/DQmSbDJo2VWyLHpDThxRecw7cZ4jJYLU5bo6CcRTsD9d5xW/misticogama.gif) 

</center>


Posted by @misticogama on 2024-11-13 06:00:45 in hive-116823 (tags: spanish, php, clean, xml, tips, hueso, ecency)
@alberto0607 ·
@misticogama, a big hug. Thanks for sharing this script; it's good to have this information because it can be very useful when we face those annoying XML files. Welcome to the community. Today we'll be sharing #ViernesDeEscritorio, in case you use a GNU/Linux distribution.
@misticogama ·
Of course, I hope it helps someone in the future or gives them ideas for their own solution. What is the thing you mentioned all about?
@angeluxx ·
I don't understand any of it, but your information is very nice.
@misticogama ·
Thank you very much
@mariiale1979 ·
Interesting information that can help reduce hassles with these files; I'll keep it in mind.
@misticogama ·
That's right, thank you very much