Thank you for the answer. I can provide anything. I write all of my resources myself and don’t use any frameworks, i.e. I have a 100% overview of the code. So if you point me in the right direction, I can figure out the problem. If the problem is in ‘GetClientData’, which server natives use it? I can do a code review of any suspect server native.
PS: From what you found out, is it related to ‘GetClientData’?
Looking at the dump, svMain looks to be executing GET_PLAYER_ROUTING_BUCKET.
Given that no registers/stack values look zeroed, I wonder if multiple threads are racing to create the initial m_syncData (given that the copy assignment operator of shared_ptr is not atomic, this seems possible). For example, three potentially competing paths (see the sketch after this list):
svMain: ProcessServerFrame/GetEntityLockdownMode
svSync: Tick/UpdateWorldGrid
svNetwork: ProcessPacket/SendObjectIds
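To illustrate the suspected pattern, here is a minimal self-contained sketch (ClientData/GetSyncData are stand-in names, not the actual fxserver code) of a lazily-created shared_ptr that is unsafe when hit from several threads:

#include <memory>

struct SyncData
{
	// per-client sync state
};

struct ClientData
{
	std::shared_ptr<SyncData> m_syncData;

	SyncData* GetSyncData()
	{
		// Not thread-safe: if svMain and svSync both observe an empty
		// m_syncData, both construct a SyncData and both assign it. The
		// copy/move assignment of shared_ptr is not atomic, so a concurrent
		// reader can see a half-updated control block and dereference junk.
		if (!m_syncData)
		{
			m_syncData = std::make_shared<SyncData>();
		}

		return m_syncData.get();
	}
};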
And comparing Windows/Linux artifacts, the Windows implementation looks way more durable and less likely to crash outright.
Perhaps. I don’t really like the lazy creation here at all; it’s weird and difficult/impossible to clean up properly, so I’ve moved it to be handled by OnClientCreated. The idea is roughly sketched below.
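A rough, self-contained sketch of that idea (the names are stand-ins; this is not the actual commit): create the sync data once on the single client-creation path, rather than lazily from whichever thread asks first.

#include <memory>

struct SyncData {};

struct Client
{
	std::shared_ptr<SyncData> m_syncData;
};

// Stand-in for the OnClientCreated path: sync data is created exactly once,
// before the client becomes visible to svMain/svSync/svNetwork, so no
// thread can race on first access.
Client MakeClient()
{
	Client client;
	client.m_syncData = std::make_shared<SyncData>();
	return client;
}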
There are some other fx::Client lifetime concerns elsewhere too (such as @AvarianKnight’s post), but I thought those were related to freeing and should have been mitigated by one of the recent changes in the ClientSharedPtr stuff.
Thanks for your advice, I appreciate it. In all resources, I put a print("…") before the native ‘GET_PLAYER_ROUTING_BUCKET’ to locate in which part of the code the crash occurs. I’ll get back to you as soon as I find out more details.
Update your server artifacts to 6419 to see if today’s changes have helped your case.
Also, that linked report is using 5848. That’s a bit dated, so given the recent changes it isn’t worthwhile to check it for anything new or actionable.
[ script:mychat] [13:17:41] JohnHonza(5): j
[ script:mychat] [13:17:43] JohnHonza(5): taky
[ script:acc] playerDropped 30 Server->client connection timed out. Last seen 28 msec ago.
[ script:acc] JohnHonza
[ citizen-server-impl] network thread hitch warning: timer interval of 158 milliseconds
=================================================================
FXServer crashed.
A dump can be found at /alpine/opt/cfx-server/crashes/3494acd5-8bc3-4ac8-8a63209a-af479696.dmp.
Crash report ID: d5eec8a1-a6b9-4b19-b29d-1a6443a9b8e0
=================================================================
And here is the code that printed this message before the crash:
AddEventHandler("playerDropped",function(reason)
local id = source
if serverids[id] then
playerids[serverids[id]] = nil
end
serverids[id] = nil
if acc[id] then
print("playerDropped",id,reason)
print(GetPlayerName(id))
local pos = GetPlayerRoutingBucket(id) == 0 and GetEntityCoords(GetPlayerPed(id)) or vector3(0.0,0.0,0.0)
MySQL.Async.execute("UPDATE user SET disc = CURRENT_TIMESTAMP, x = @x, y = @y, z = @z WHERE nick = @nick",
{
["@x"] = pos.x,
["@y"] = pos.y,
["@z"] = pos.z,
["@nick"] = GetPlayerName(id)
},
function()
end)
end
LogPlayer(id,"disconnect",GetPlayerEndpoint(id).." "..reason)
acc[id] = nil
SharedData[id] = nil
TriggerClientEvent("fivem:OnPlayerDisconnect",-1,GetPlayerName(id),id)
print(os.date("[%X] ").."Disconnect: "..GetPlayerName(id).." Reason: "..reason)
end)
Right, in your case this might be something specifically involving calling a function that depends on ‘sync data’ (like getting routing buckets) from the playerDropped event.
It turns out both crashes might be related to a client being dropped twice: the other dump is after ServerGameState::HandleClientDrop gets called while the client’s ‘sync data’ pointer is already null.
The other dump might instead be a case of a regression from the earlier fix, where dropping a deferral twice would now crash.
I don’t think I’m going to be able to actually look into or successfully fix this any further, sorry; I’ve no idea what is even going on at all.
[ script:acc] [10:43:49] Disconnect: _K_r_p_a_t_a_ Reason: Server->client connection timed out. Last seen 13 msec ago.
[ script:acc] playerDropped 128 Server->client connection timed out. Last seen 13 msec ago.
[ script:acc] _K_r_p_a_t_a_
[ script:acc] playerDropped2
[ script:acc] playerDropped3
[ script:acc] playerDropped4
[ script:acc] playerDropped5
[ citizen-server-impl] sync thread hitch warning: timer interval of 152 milliseconds
=================================================================
FXServer crashed.
A dump can be found at /30120/alpine/opt/cfx-server/crashes/f010b827-2819-451c-64ee9aa4-01872cda.dmp.
Crash report ID: 36631bd0-7358-40fe-88cb-3066226c0e01
=================================================================
And playerDropped:
AddEventHandler("playerDropped",function(reason)
local id = source
local name = GetPlayerName(id)
local ip = GetPlayerEndpoint(id)
print(os.date("[%X] ").."Disconnect: "..name.." Reason: "..reason)
if serverids[id] then
playerids[serverids[id]] = nil
end
serverids[id] = nil
if acc[id] then
print("playerDropped",id,reason)
print(name)
local ped = GetPlayerPed(id)
local pos = vector3(0.0,0.0,0.0)
if GetSharedData(id,"VW") == 0 and ped ~= 0 then
pos = GetEntityCoords(ped)
end
MySQL.Async.execute("UPDATE user SET disc = CURRENT_TIMESTAMP, x = @x, y = @y, z = @z WHERE nick = @nick",
{
["@x"] = pos.x,
["@y"] = pos.y,
["@z"] = pos.z,
["@nick"] = name
},
function()
end)
end
print("playerDropped2")
LogPlayer(id,"disconnect",ip.." "..reason)
print("playerDropped3")
acc[id] = nil
SharedData[id] = nil
print("playerDropped4")
TriggerClientEvent("fivem:OnPlayerDisconnect",-1,name,id)
print("playerDropped5")
end)
It is a bit late, so I don’t have this fully fleshed out…
Consider auth->RunAuthentication doing an HTTP request: its callback will be executed on another thread, causing done to be run on that thread. The contents of request->SetCancelHandler may then race with the subsequent execute_callback_on_main_thread in a few places, creating pathways for RemoveClient + client->OnDrop() to be called twice. A sketch of a guard against that is below.
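To make the double-drop concrete, here is a minimal sketch (DropGuard/DropOnce are assumed names, not actual fxserver code) of one way to make the drop path idempotent, so that racing callers can’t run it twice:

#include <atomic>
#include <utility>

struct DropGuard
{
	std::atomic<bool> m_dropped{ false };

	template<typename Fn>
	void DropOnce(Fn&& doDrop)
	{
		// exchange() flips the flag and returns the previous value
		// atomically: even if the HTTP-callback thread and the main thread
		// both reach this point, only one of them observes `false` and
		// actually performs the drop.
		if (!m_dropped.exchange(true))
		{
			std::forward<Fn>(doDrop)();
		}
	}
};

// usage (illustrative):
// guard.DropOnce([&] { RemoveClient(client); client->OnDrop(); });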
Side question: Should the m_keepAliveTimer + TimerEvent connection handle be disposed of after use? Seems a bit leaky.
Yeah, the temp client logic in deferrals looked extremely fishy, but I didn’t look into it too much. RemoveClient being invoked twice would make sense.
I also don’t really like the way RemoveClient calls are sprinkled around there so much, but refactoring it is more risky without the ability to verify changes in a realistic environment.
Right, as uvw::Handle objects don’t close on destruction, this probably is an oversight. No, huh, m_keepAliveTimer does get cleaned up in fx::ClientDeferral::~ClientDeferral.
I don’t think the dtor of ClientDeferral is ever being invoked:
The SetCardResponseHandler lambda captures self by value, and that creates a circular reference through the ClientDeferral shared_ptr (never allowing its use_count to hit zero). Making things weak is an easy fix; a sketch is below.
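For illustration, a self-contained sketch of the weak-capture fix (the struct shape here is a stand-in, not the actual fxserver ClientDeferral):

#include <functional>
#include <memory>
#include <string>

struct ClientDeferral : std::enable_shared_from_this<ClientDeferral>
{
	std::function<void(const std::string&)> m_cardResponseHandler;

	void SetupCardResponseHandler()
	{
		// Capturing a shared_ptr (`self`) by value would store an owning
		// reference inside a member of the same object: a cycle, so
		// use_count never reaches zero and ~ClientDeferral never runs.
		// A weak_ptr capture breaks the cycle:
		m_cardResponseHandler = [weakSelf = weak_from_this()](const std::string& response)
		{
			if (auto self = weakSelf.lock())
			{
				// deferral is still alive; safe to use `self` and `response`
			}
		};
	}
};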
Mega-unrelated side note: It would also be nice if this WriteColor could be wrapped in a g_allowVt check.