Random heap corruption (regression?) on servers

Try using something like NSSM (https://nssm.cc/) to run it decoupled from the actual interactive session.

Just throwing this out there, but could this be caused by ‘SaveResourceFile’?

What makes you think this to be the case?

I have a resource that, once the server has started, creates 3 pretty big JSON files as a ‘Backup’, and it backs up once every 30 minutes.

As the server sometimes crashes right on startup, I figured this could be linked to the issue.
I also remember seeing an error that mentioned something about write access.
It might not be anywhere near my issue, but I’m still searching.

I just rewrote the whole thing to save this data with MySQL to test it out.

Also, I’m not sure if you saw it, or maybe it wasn’t conclusive, but I uploaded a crash dump that was logged by FXServer in my last post.

We don’t use SaveResourceFile much. We handle everything through a MySQL database, but it still crashes sometimes.

@Plouffe Did you have any luck with this one? Did you fix the crashing? :confused:

I did pretty much everything: removed most of the StateBags, changed hosting, used older artifacts.
The only difference is that I now get

[ citizen-server-impl] Server list query returned an error: System.Threading.Tasks.TaskCanceledException: A task was canceled. <- System.TimeoutException: A task was canceled. <- System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 30 seconds elapsing.

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
Unhandled exception in Mono script environment: System.NullReferenceException: Object reference not set to an instance of an object
(null)> txaEvent "serverShuttingDown" "{＂delay＂:5000,＂author＂:＂txAdmin＂,＂message＂:＂Server is shutting down: (Server stopped).＂}"

Also, @nta, does this line mean anything to you?

Feb 10 18:44:13 5600x kernel: [97146.307251] traps: luv_tcp5[212245] general protection fault ip:7fd02ecf09aa sp:7fd008e7d360 error:0 in ld-musl-x86_64.so.1[7fd02ecdf000+4b000]

That is the message that appears in the kernel log right when the server crashes.

EDIT: I also noticed that the crash usually occurs RIGHT AFTER a heartbeat:

[ citizen-server-impl] Sending heartbeat to https://servers-ingress-live.fivem.net/ingress
[ citizen-server-impl] sync thread hitch warning: timer interval of 102 milliseconds


=================================================================
FXServer crashed.
A dump can be found at /root/FIVEM/MAIN/alpine/opt/cfx-server/crashes/7a3e30b6-6309-4a1e-12abd89a-da288941.dmp.
Crash report ID: bc54b898-55ee-417b-81d0-bfc57a5c0d20
=================================================================
> txaEvent "serverShuttingDown" "{＂delay＂:5000,＂author＂:＂txAdmin＂,＂message＂:＂Server se restartuje: (Server se zaustavio).＂}"

On our server, it happens when there are ± 190 players. People also say the server crashes when a blimp drops down or someone blows up the gas station.

Entirely unrelated to the crashes that have been discussed in this topic so far. If that is a thing, you should provide dumps for that scenario separately.


Semi-related (but not really… was looking through the uses of HttpResponse).

Should ResourceHttpComponent be ending or closing the HttpResponse if there is no handler associated with a resource?

In theory, this would be fine once all references vanish… but this is indeed a little bit weird-looking.

There was another thread about this running in parallel. It has been closed and now redirects here:

(yes, the other thread is technically older but this one had more recent activity and is the one that usually ends up found instead)

We found out that we are getting rate limited every time before the crashes. I also think it’s related to the network: why else would RDP hang before a crash on one server, why are there no logs before the crash, why is there always a network hitch, etc.? Here are some screenshots of the console before the crashes:




Since an earlier investigation as recent as 2023-03-08T17:39:00Z resulted in claims that the heap corruption exists ‘as low as 5914’, and theoretically ‘5848’ was still fine, there’s a new theory about what might be going on here:

As another experiment, I’ve just pushed tweak(net/http-server): flag to remove EASTL usage · citizenfx/fivem@6fa9f9f · GitHub (build 6314), which should at least behave differently here (and might also finally show the original corruption in cases with a memory debugger attached, so if the issue still occurs there I’ll probably throw an ASan build out there again).


tl;dr

(tl;dr: try again with 6314. If it still fails, do upload a full dump, and if it’s the same failure and still needs more info, there’ll be an ASan build to try again with, which will hopefully catch it, unlike last time.)

This seems to be a print from this ‘SekulBanSyste’ resource. What does this resource do that makes it send requests that ‘get rate limited’, and when do these get tripped initially?

That behavior also seems to add up with some sort of deliberate attack again, by the way.

We are now on 5855 and have crashes all the time.

It is Discord logging when someone joins.

After having something like 30 crashes in a row right after startup, we have disabled every single HTTP request, making an exception only for txAdmin. We are still running.

I will keep it disabled for 2 days or more to confirm that the problem is gone (we had at least 5 crashes daily).

-- Override of PerformHttpRequest: only the 'monitor' resource (txAdmin)
-- may send requests; every other resource gets an immediate failure.
function PerformHttpRequest(url, cb, method, data, headers, options)
    if GetCurrentResourceName() == 'monitor' then
        local followLocation = true

        if options and options.followLocation ~= nil then
            followLocation = options.followLocation
        end

        local t = {
            url = url,
            method = method or 'GET',
            data = data or '',
            headers = headers or {},
            followLocation = followLocation
        }

        local id = PerformHttpRequestInternalEx(t)

        if id ~= -1 then
            -- Remember the callback so it is invoked when the response arrives.
            httpDispatch[id] = cb
        else
            cb(0, nil, {}, 'Failure handling HTTP request')
        end
    else
        -- Request blocked: report a failure without touching the network.
        cb(0, nil, {}, 'Failure handling HTTP request')
    end
end

Paste this into the scheduler to disable every HTTP request except those made by txAdmin (the 'monitor' resource).

Third time’s the charm, christ.

Since apparently the D word is a very bad word (even though it was already used in this thread) I am going to censor this url.

https://verybadword.com/developers/docs/topics/rate-limits#exceeding-a-rate-limit-example-exceeded-resource-rate-limit-response
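
For a logging resource that trips limits like the one documented at the link above, one option is to queue messages and back off when the API says so, instead of firing a request per event. Below is a minimal sketch in plain Lua, not taken from any resource in this thread. `newThrottledQueue`, `send`, `interval` and `tick` are hypothetical names, and `send` is assumed to be synchronous and to return `true` on success or `false` plus the number of seconds to wait when rate limited. A real resource would have to adapt this to the asynchronous `PerformHttpRequest` callback.

```lua
-- Hypothetical helper: sends at most one queued message per `interval`
-- seconds and pauses when `send` reports a rate limit.
local function newThrottledQueue(send, interval)
    local queue = {}
    local nextAllowed = 0 -- earliest time (seconds) the next send may happen
    local self = {}

    -- Add a message to the end of the queue.
    function self.push(message)
        queue[#queue + 1] = message
    end

    -- Number of messages still waiting to be sent.
    function self.pending()
        return #queue
    end

    -- Call periodically with the current time in seconds.
    -- Returns true if a message was sent successfully on this call.
    function self.tick(now)
        if #queue == 0 or now < nextAllowed then
            return false
        end

        local message = table.remove(queue, 1)
        -- Assumption: `send` returns true, or false plus a wait in seconds.
        local ok, retryAfter = send(message)

        if ok then
            nextAllowed = now + interval
        else
            -- Put the message back at the front and wait before retrying.
            table.insert(queue, 1, message)
            nextAllowed = now + (retryAfter or interval)
        end

        return ok
    end

    return self
end
```

With this shape, a burst of joins only grows the queue, and a rate-limit response delays the next attempt rather than triggering more requests.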


What else can we use for logging?

Although true, that is not relevant for this thread.
So let’s keep the subject centered on the crashes and solutions, shall we?!