Investigating High CPU Usage on the "Moon" Server by thecrazygm

Hey everyone, 

I've been keeping an eye on our "moon" server lately, and the CPU usage metrics have been consistently high, suggesting it might be time to invest in a new, more powerful machine. Before making that decision, I wanted to dig into the data to see exactly what was going on.

For some time now, I've been running a custom Python script, [`server_metrics.py`](https://gist.github.com/TheCrazyGM/9c5945224d89152827036b76e84bbbb1), at frequent intervals to collect data on system performance and store it in a SQLite database. This has given me a fantastic historical dataset to work with.
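
The linked gist is the real collector. Purely as an illustration of the storage side, here is a minimal sketch using only the standard library. The table layout is an assumption inferred from the column names in the query further down (`timestamp`, `top_cpu_name`, `top_cpu_percent`), and the function names `open_db` and `record_sample` are hypothetical, not taken from `server_metrics.py`.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed schema: only the columns that the query in this post refers to.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metrics (
    timestamp       TEXT NOT NULL,  -- UTC, formatted 'YYYY-MM-DD HH:MM:SS'
    top_cpu_name    TEXT NOT NULL,  -- name of the busiest process in this sample
    top_cpu_percent REAL NOT NULL   -- its CPU usage; may exceed 100 on multi-core hosts
)
"""


def open_db(path):
    """Open (or create) the metrics database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_sample(conn, name, percent, when=None):
    """Store one sample and return the timestamp string that was written.

    `when` is expected to already be in UTC so that it compares correctly
    against SQLite's datetime('now'), which is also UTC.
    """
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO metrics (timestamp, top_cpu_name, top_cpu_percent) "
            "VALUES (?, ?, ?)",
            (stamp, name, float(percent)),
        )
    return stamp
```

Storing timestamps as UTC text in this format keeps plain string comparison consistent with what SQLite's `datetime()` function returns.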

#### Visualizing the Problem

The first step was to visualize the trend. A picture is worth a thousand words, and plotting the data from the last two weeks confirmed my suspicions immediately.

![server_metrics_plot.png](https://files.peakd.com/file/peakd-hive/thecrazygm/23w3CrDCxfPgxXuPxK5CSyQAqWHU1yixukVc7poYh1v6fqcomrPpBanQcCJ1AfmLb9iYo.png)

As you can see, CPU usage spikes frequently and stays high for long stretches, which isn't ideal for a server running multiple applications. The question now is: what's causing it?

#### Digging into the Data

To find the culprits, I wrote a SQL query to go through the collected metrics. The goal was to find which process names appeared most often as the top CPU consumer, what their average CPU usage was in those samples, and their maximum recorded spike. Here is the query, followed by its results, which were unambiguous:

```sql
-- Count the samples in which each process was the top CPU consumer
SELECT
    top_cpu_name,
    COUNT(*)                 AS samples_as_top,
    AVG(top_cpu_percent)     AS avg_top_pct,
    MAX(top_cpu_percent)     AS max_top_pct
FROM metrics
-- restrict to last two weeks
WHERE timestamp >= datetime('now', '-14 days')
GROUP BY top_cpu_name
ORDER BY samples_as_top DESC
LIMIT 10;
```

| top_cpu_name       | samples_as_top | avg_top_pct | max_top_pct |
| :----------------- | -------------: | ----------: | ----------: |
| python3            |          14852 |        65.1 |       705.4 |
| systemd            |           2905 |         0.1 |       246.2 |
| mariadbd           |            661 |        2.86 |       150.0 |
| php-fpm7.4         |             96 |        6.59 |       633.1 |
| fail2ban-server    |             93 |         0.0 |         0.0 |
| caddy              |             91 |         0.0 |         0.0 |
| kworker/0:0-events |             33 |         0.0 |         0.0 |
| kworker/0:2-events |             24 |         0.0 |         0.0 |
| kworker/0:1-events |             23 |         0.0 |         0.0 |
| multipathd         |             22 |         0.0 |         0.0 |
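
For anyone who would rather run this kind of summary from Python than from the `sqlite3` shell, here is a minimal sketch using only the standard library. It assumes the same `metrics` table and columns as the query above; the function names `top_cpu_summary` and `as_markdown` are made up for this example.

```python
import sqlite3

# Same aggregation as the query above, with the time window and row limit
# passed in as parameters instead of being hard-coded.
TOP_CPU_QUERY = """
SELECT
    top_cpu_name,
    COUNT(*)             AS samples_as_top,
    AVG(top_cpu_percent) AS avg_top_pct,
    MAX(top_cpu_percent) AS max_top_pct
FROM metrics
WHERE timestamp >= datetime('now', ?)
GROUP BY top_cpu_name
ORDER BY samples_as_top DESC
LIMIT ?
"""


def top_cpu_summary(conn, days=14, limit=10):
    """Return (name, samples, avg_pct, max_pct) rows for the last `days` days."""
    window = f"-{int(days)} days"  # SQLite date modifier, e.g. '-14 days'
    return conn.execute(TOP_CPU_QUERY, (window, limit)).fetchall()


def as_markdown(rows):
    """Render the summary rows as a small markdown table."""
    lines = [
        "| top_cpu_name | samples_as_top | avg_top_pct | max_top_pct |",
        "| :-- | --: | --: | --: |",
    ]
    for name, samples, avg_pct, max_pct in rows:
        lines.append(f"| {name} | {samples} | {avg_pct:.2f} | {max_pct:.1f} |")
    return "\n".join(lines)
```

Called as `print(as_markdown(top_cpu_summary(sqlite3.connect("metrics.db"))))`, where `metrics.db` is a placeholder path, this would print a table shaped like the one above.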

As the data clearly shows, `python3` is the runaway top consumer of CPU resources on this server. It was the top process in over 14,800 samples, averaging 65% CPU in those samples. Most strikingly, it peaked at over 700%, indicating that at certain moments Python scripts were consuming the equivalent of 7 full CPU cores.
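
The "7 cores" figure is just the percentage divided by 100, since a value of 100% here corresponds to one fully busy core. A tiny sketch of that arithmetic follows; the helper names are hypothetical, and the 8-core count in the usage note is an assumption for illustration only, not the actual size of the "moon" server.

```python
import os


def cores_equivalent(cpu_percent):
    """Convert a per-process CPU percentage into 'full cores' (100% = 1 core)."""
    return cpu_percent / 100.0


def share_of_machine(cpu_percent, core_count=None):
    """Fraction of the whole machine that this percentage represents.

    If `core_count` is not given, fall back to the core count of the host
    running this code (or 1 if it cannot be determined).
    """
    core_count = core_count or os.cpu_count() or 1
    return cpu_percent / (100.0 * core_count)
```

For the 705.4% peak, `cores_equivalent(705.4)` gives roughly 7.05 cores; on a hypothetical 8-core box, `share_of_machine(705.4, 8)` would be about 0.88 of the machine.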

This analysis narrows down the problem significantly. It's not a system-level issue with something like Caddy or the database (`mariadbd`); the load is coming directly from the Python applications I'm running.

The next logical step in this investigation is to dig deeper and differentiate _between_ the various `python3` processes to see which specific scripts are the heaviest hitters. But for now, we have a very clear answer to "What's using the CPU?". The answer is: Python.

#### Next Steps

I have added more verbose data gathering to `server_metrics.py` to track the command line arguments of each process, so we know which one is which. I'll continue to monitor the data and report back as I find new insights.
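
To give a rough idea of how command line arguments can tell one `python3` process from another, here is a minimal sketch. It is not the code in `server_metrics.py`; the function name `script_label` is hypothetical, and the parsing is a heuristic that only knows about a couple of interpreter options.

```python
import os

# Interpreter options that take their value as a separate argument.
OPTIONS_WITH_VALUE = {"-W", "-X"}


def script_label(cmdline):
    """Return a short label for a process, given its argv as a list of strings.

    Non-Python processes are labelled with their executable name. For Python
    processes the label is the script file name, or '-m <module>' when the
    interpreter was started with -m.
    """
    if not cmdline:
        return "<unknown>"
    exe = os.path.basename(cmdline[0])
    if not exe.startswith("python"):
        return exe

    args = cmdline[1:]
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-m" and i + 1 < len(args):
            return f"-m {args[i + 1]}"
        if arg == "-c":
            return "-c <inline code>"
        if arg in OPTIONS_WITH_VALUE:
            i += 2  # skip the option and its value
            continue
        if not arg.startswith("-"):
            return os.path.basename(arg)  # first non-option argument is the script
        i += 1
    return exe  # bare interpreter, e.g. an interactive session
```

With a made-up example, `script_label(["python3", "/opt/bots/market_maker.py", "--fast"])` yields `market_maker.py`, which could then be stored next to `top_cpu_name` and used as the grouping key instead of the bare process name.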

As always,
Michael Garcia a.k.a. TheCrazyGM

Posted: 2025-07-15 15:34:39 (last updated 2025-07-15 16:27:09)

@ecoinstant ·
Keep us in the loop. This is a fascinating case, since I would expect us to be using less CPU now, with a smaller active player base between seasons.

@tydynrain ·
I've been through that process of narrowing down possible causes to issues countless times, and I know how challenging it can be sometimes, so congratulations on finding part of the cause. I'm sure that you'll noodle out the last specifics in no time! 😁🙏💚✨🤙