Random heap corruption (regression?) on servers

nta · December 31, 2022, 7:43pm

Okay, this is indeed another heap corruption crash, except WER was slow to act and the live dump got tripped instead.

(this specific case of heap corruption seems to have come from a different place than usual - attaching procdump to catch a future case may be helpful)

Plouffe · December 31, 2022, 7:55pm

Will do, I’ll let you know when I have a few more dumps.

nta · December 31, 2022, 7:58pm

For what it’s worth, once 6183 is out it should fix the crash handler not getting invoked for these crashes, so the server should behave in a somewhat more predictable/obvious way.

BennoBaba · January 1, 2023, 11:59pm

Just wanted to share an update in case anyone is interested.

On the latest scheduled restart, I completely closed the server process (closed txAdmin) and started it again. So far, no crashes or errors of any kind.

I’ll also do this for the next 2 scheduled restarts, then let txAdmin do its own restart procedure, and report here if crashes occur after that.

Thanks!

nta · January 2, 2023, 9:20am

Updating the server without restarting txAdmin probably led to the old version still being started somehow.

BennoBaba · January 3, 2023, 4:22pm

@Plouffe Do you have, by any chance, multiple FiveM servers on that same machine?

Plouffe · January 4, 2023, 4:27pm

No, I do not.

Plouffe · January 4, 2023, 4:31pm

So like I told you, I kept monitoring. I have a few more dumps, but I haven’t seen anything more. I’m still running 6181; I’ll update to 6185 today.
Would it be of any help if I provided dumps from 6181, or would you rather have the ones from 6185?

BennoBaba · January 6, 2023, 4:03pm

Okay, here is something I noticed in my case. Nobody from my team opened a remote desktop session to the server for 2+ days.

The server did fine: no restarts, no issues or anything. A few minutes ago, I opened a remote connection to start a second test server on it.

After launching the test server and then closing the remote connection, the main server (115+ players) instantly crashed (while the test server was still on and didn’t crash; both are using txAdmin).

Maybe it’s time to switch to Linux again.

Plouffe · January 6, 2023, 5:44pm

After updating to the latest artifact, I actually have an FXServer crash message in the FXServer console, and I also have actual crash logs!
This looks like some kind of progress, which is great.

Here is the link for the latest crash logs from FXServer.

Looking forward to seeing what’s up with it!

nta · January 6, 2023, 7:20pm

Do you happen to have a crash dump for that? Also, are you running the server from an interactive session or as an NT service?

Running background apps on an interactive session is a bad idea no matter what…

BennoBaba · January 7, 2023, 1:28am

Hello,

I do have a crash dump from that specific time; here is the link. Both servers have the exact same resources turned on and share the same database.

To be honest, I do not know what you mean by that… I was running both the main and the test server the same way, by clicking on FXServer.exe in the artifacts folder (via a shortcut to it). The two are separate, of course.

Thanks!

nta · January 7, 2023, 7:39am

Try using something like NSSM (https://nssm.cc/) to run it decoupled from the actual interactive session.

Plouffe · January 18, 2023, 7:51pm

Just throwing this out there, but could this be caused by ‘SaveResourceFile’?

nta · January 18, 2023, 9:42pm

What makes you think this to be the case?

Plouffe · January 18, 2023, 10:05pm

I have a resource that, once the server has started, creates 3 pretty big JSON files as a ‘Backup’, and it backs up once every 30 minutes.

As the server sometimes crashes right on startup, I figured this could be linked to the issue.
I also remember seeing an error that mentioned something about write access.
It might not be anywhere near my issue, but I’m still searching.

I just rewrote the whole thing to save this data with MySQL to test it out.

Also, I’m not sure if you saw it, or maybe it wasn’t conclusive, but I uploaded a crash dump that was logged by FXServer in my last post.

squizer · January 19, 2023, 10:23pm

We don’t use SaveResourceFile much. We handle everything through a MySQL database, but it still crashes sometimes.

BennoBaba · February 9, 2023, 7:32pm

@Plouffe Have you had any luck with this one? Did you fix the crashing?

I did pretty much everything: removed most of the StateBags, changed hosting, used older artifacts.
The only difference is that I now get:

[ citizen-server-impl] Server list query returned an error: System.Threading.Tasks.TaskCanceledException: A task was canceled. <- System.TimeoutException: A task was canceled. <- System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 30 seconds elapsing.

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
Unhandled exception in Mono script environment: System.NullReferenceException: Object reference not set to an instance of an object
(null)> txaEvent "serverShuttingDown" "{＂delay＂:5000,＂author＂:＂txAdmin＂,＂message＂:＂Server is shutting down: (Server stopped).＂}"

Also, @nta, does this line mean anything to you?

Feb 10 18:44:13 5600x kernel: [97146.307251] traps: luv_tcp5[212245] general protection fault ip:7fd02ecf09aa sp:7fd008e7d360 error:0 in ld-musl-x86_64.so.1[7fd02ecdf000+4b000]

That is the message that appears in the kernel log right when the server crashes.
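
For anyone comparing several of these kernel lines, here is a minimal Python sketch that pulls the fields out of a `traps:` line and computes where the faulting instruction pointer sits relative to the mapped region the kernel printed. The helper name `parse_trap_line` and the regular expression are my own assumptions, written against the line above; they are not part of FXServer or any kernel tooling. The result is an offset from the start of that mapping, which is not necessarily the offset into the file on disk.

```python
import re

# Regex written against the "traps:" line quoted above (an assumption, not an
# official format description). The trailing "in module[base+size]" part is optional.
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] general protection fault "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r"(?: in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_trap_line(line):
    """Hypothetical helper: return the parsed fields, or None if the line does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    info = {
        "comm": m["comm"],        # thread name as logged by the kernel
        "pid": int(m["pid"]),
        "ip": int(m["ip"], 16),   # faulting instruction pointer
        "module": m["module"],    # None when the kernel printed no mapping
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the instruction pointer from the start of the printed mapping.
        info["offset"] = info["ip"] - int(m["base"], 16)
    return info


if __name__ == "__main__":
    line = (
        "Feb 10 18:44:13 5600x kernel: [97146.307251] traps: luv_tcp5[212245] "
        "general protection fault ip:7fd02ecf09aa sp:7fd008e7d360 error:0 "
        "in ld-musl-x86_64.so.1[7fd02ecdf000+4b000]"
    )
    info = parse_trap_line(line)
    print(info["comm"], info["module"], hex(info["offset"]))
```

If the same module and a similar offset show up across several crashes, that is at least a hint that they share a cause; the crash dump is still what the developers need.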

EDIT: Also, I noticed that the crash usually occurs RIGHT AFTER a heartbeat:

[ citizen-server-impl] Sending heartbeat to https://servers-ingress-live.fivem.net/ingress
[ citizen-server-impl] sync thread hitch warning: timer interval of 102 milliseconds


=================================================================
FXServer crashed.
A dump can be found at /root/FIVEM/MAIN/alpine/opt/cfx-server/crashes/7a3e30b6-6309-4a1e-12abd89a-da288941.dmp.
Crash report ID: bc54b898-55ee-417b-81d0-bfc57a5c0d20
=================================================================
> txaEvent "serverShuttingDown" "{＂delay＂:5000,＂author＂:＂txAdmin＂,＂message＂:＂Server se restartuje: (Server se zaustavio).＂}"

squizer · February 12, 2023, 10:28am

On our server, it happens when there are ±190 players, and people say the server crashes when a blimp goes down or someone blows up the gas station.

nta · February 12, 2023, 11:04am

Entirely unrelated to the crashes that have been discussed in this topic so far. If that is a thing, you should provide dumps for that scenario separately.