Select/Drag & drop hangs CEF entirely

Hello, after a lot of testing, I have finally managed to repro a weird bug that I’ve encountered with the CEF layer when dragging and selecting things.

These repro steps reliably reproduce a CEF freeze, but there are probably other ways to trigger it: many players report a freeze after accidental drags that are clearly not related to changing window focus. This is just the most convenient way I found to induce the bug.

Repro steps :

  • Have a resource that allows selecting things (Example repro : GitHub - Ekinoxx0/nui-hang-repro)
  • This resource needs to have SetNuiFocusKeepInput enabled
  • Select text, then move focus away from the window, then drag the content (best replicated by selecting text, then spamming the “Windows” key while dragging)

CEF seems to enter a frozen state and hangs indefinitely. This also blocks dragging in all other windows (like dragging files in Explorer).
This frozen state can be exited if you had the nui_devtools window open before trying the repro and you focus back on the dev tools window.
The CEF remote debug tool (localhost:13172) is not accessible during the frozen state.
This CEF frozen state breaks every resource, not only the one doing the repro.

Example video :

This has been tested on multiple configurations and by multiple players.
I also searched for this specific bug on the Cfx discord, and found a possible match for this bug dating back to 02/10/2022 : Discord

Thanks for all your hard work.
Good luck fixing this :+1: I had a hard time finding this repro (players reporting weird behavior without being specific)

Native HTML5 drag and drop does not work and should not be used. jQuery UI Drag and Drop is recommended by most.

Seems to me like this is related, but I’m not trying to implement any drag and drop feature here.
This post mentions user-select being disabled by default, but that is not the case here, so this behavior may have changed since.

In this case, even accidental dragging shouldn’t cause a hang for a normal user. It has happened that users selected text to copy it and accidentally caused this freeze.

Note that I’m using dnd-kit where needed (in my React inventory resource), as dbub recommends in this post, but neither dnd-kit nor jQuery dnd will solve the problem described here.

Another note: this repro doesn’t contain a single line of JS, and using JS will not solve this problem. A temporary solution would be to disable user-select and user-drag with CSS, but I suppose developers shouldn’t be expected to add this, and it will not solve the problem in input fields, which force user selection and drag.

Thanks anyway for the help. You probably answered in good faith, and your link was still a great read.

That’s fine. Given the title of the topic, it came across as an attempt at HTML5 Drag and Drop, which, as you’ve stated, it is not. Very likely, dragging from the game window and losing focus causes the system to wait on that event to complete, but as it’s not supported, it’s most likely an edge case for players running the game in a window or, as you’ve shown, moving focus off the game.

A suitable solution is to disable text selection with CSS user-select: none; and see if the same issue still happens.

The problem is that it happens even when user-select is set to none, because you can’t disable text selection in an input. Besides, I don’t want to disable selection in the first place.

As Eki said, the freeze only occurs when one of the natives is called.

This is in fact a side effect of the partial implementation of this in CEF/host bits, but that doesn’t disqualify this from being a confusing bug. :confused:

It also is not related to selection, since dragging images e.g. still initiates a drag even if not selecting anything.


Breakage here seems to be the CrBrowserMain thread getting stuck during DoDragDrop:

 	win32u.dll!NtUserPeekMessage()	Unknown
 	user32.dll!_PeekMessage()	Unknown
 	user32.dll!PeekMessageW()	Unknown
 	GameOverlayRenderer64.dll!00007ffb57a87674()	Unknown
 	ole32.dll!CDragOperation::RuntimeClassInitialize(IDataObject * pDataObject, IDropSource * pDropSource, unsigned long dwOKEffects, unsigned long * pdwEffect, int fWinRTDrag, void * * phClientToken, int fBackgroundContainerDrag) Line 1435	C++
 	ole32.dll!DoDragDrop(IDataObject * pDataObject, IDropSource * pDropSource, unsigned long dwOKEffects, unsigned long * pdwEffect) Line 3091	C++
>	nui-core.dll!DropTargetWin::StartDragging(scoped_refptr<CefBrowser> browser, scoped_refptr<CefDragData> drag_data, cef_drag_operations_mask_t allowed_ops, int x, int y) Line 405	C++
 	nui-core.dll!NUIRenderHandler::StartDragging(scoped_refptr<CefBrowser> browser, scoped_refptr<CefDragData> drag_data, cef_drag_operations_mask_t allowed_ops, int x, int y) Line 279	C++
 	nui-core.dll!`anonymous namespace'::render_handler_start_dragging(_cef_render_handler_t * self, _cef_browser_t * browser, _cef_drag_data_t * drag_data, cef_drag_operations_mask_t allowed_ops, int x, int y) Line 399	C++
 	libcef.dll!CefRenderHandlerCToCpp::StartDragging(scoped_refptr<CefBrowser> browser, scoped_refptr<CefDragData> drag_data, <unnamed-tag> allowed_ops, int x, int y) Line 343	C++
 	libcef.dll!CefBrowserPlatformDelegateOsr::StartDragging(const content::DropData & drop_data, blink::DragOperationsMask allowed_ops, const gfx::ImageSkia & image, const gfx::Vector2d & image_offset, const blink::mojom::DragEventSourceInfo & event_info, content::RenderWidgetHostImpl * source_rwh) Line 508	C++
 	libcef.dll!CefWebContentsViewOSR::StartDragging(const content::DropData & drop_data, blink::DragOperationsMask allowed_ops, const gfx::ImageSkia & image, const gfx::Vector2d & image_offset, const blink::mojom::DragEventSourceInfo & event_info, content::RenderWidgetHostImpl * source_rwh) Line 166	C++
 	libcef.dll!content::RenderWidgetHostImpl::StartDragging(mojo::StructPtr<blink::mojom::DragData> drag_data, blink::DragOperationsMask drag_operations_mask, const SkBitmap & bitmap, const gfx::Vector2d & bitmap_offset_in_dip, mojo::StructPtr<blink::mojom::DragEventSourceInfo> event_info) Line 2857	C++
 	libcef.dll!blink::mojom::FrameWidgetHostStubDispatch::Accept(blink::mojom::FrameWidgetHost * impl, mojo::Message * message) Line 3121	C++
 	libcef.dll!mojo::InterfaceEndpointClient::HandleValidatedMessage(mojo::Message * message) Line 925	C++
 	libcef.dll!mojo::MessageDispatcher::Accept(mojo::Message * message) Line 43	C++
 	libcef.dll!mojo::InterfaceEndpointClient::HandleIncomingMessage(mojo::Message * message) Line 664	C++
 	libcef.dll!IPC::`anonymous namespace'::ChannelAssociatedGroupController::AcceptOnEndpointThread(mojo::Message message) Line 1010	C++

… which implies the drag is not getting canceled.
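For context on why a drag that never gets canceled means a hang: as documented, DoDragDrop keeps looping until the drop source's QueryContinueDrag returns something other than S_OK, and it only asks when it detects a keyboard or mouse-button change. Below is a minimal sketch of the usual decision logic, written as a portable model. The constant values are the Windows SDK ones repeated locally, the function name is made up for illustration, and this is not nui-core's actual IDropSource code.

```cpp
#include <cstdint>

// Values as defined in the Windows SDK, repeated so the sketch builds anywhere.
constexpr std::int32_t  kSOk             = 0x00000000; // S_OK: keep dragging
constexpr std::int32_t  kDragDropSDrop   = 0x00040100; // DRAGDROP_S_DROP
constexpr std::int32_t  kDragDropSCancel = 0x00040101; // DRAGDROP_S_CANCEL
constexpr std::uint32_t kMkLButton       = 0x0001;     // MK_LBUTTON
constexpr std::uint32_t kMkRButton       = 0x0002;     // MK_RBUTTON

// Hypothetical model of a typical IDropSource::QueryContinueDrag body.
// escapePressed: whether Escape was pressed since the last call.
// keyState: the MK_* flags describing which mouse buttons are held.
inline std::int32_t QueryContinueDragModel(bool escapePressed,
                                           std::uint32_t keyState)
{
    if (escapePressed)
        return kDragDropSCancel;                 // user aborted the drag
    if ((keyState & (kMkLButton | kMkRButton)) == 0)
        return kDragDropSDrop;                   // buttons released: drop
    return kSOk;                                 // still dragging
}
```

If the thread inside DoDragDrop never sees the button-up or Escape input, this function is never given a reason to return anything but S_OK, which would match a stack parked in PeekMessage.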

Thank you for the clarification. Could it be related to not getting a Mouse Enter/Leave event, as that has been an issue in the past with other CEF WinForms? Or is that not the case here?

Instead of using AttachThreadInput, which seems to turn a blocking-issue into a race-condition issue (using osr_dragdrop_win from CEF as a reference implementation)…

Would it be possible to use PostThreadMessage* from CefInput (or elsewhere) on the relevant subset of messages (0x100, 0x104, 0x200-to-0x20E, etc) to emulate consistent DoDragDrop functionality? CefInput already formats-and-forwards some events to CEF, and g_isDragging exists, so it would not look too out of place.
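To make the message subset concrete, here is a rough sketch of a filter for the IDs listed above. It only covers the values actually named (0x100 is WM_KEYDOWN, 0x104 is WM_SYSKEYDOWN, 0x200 to 0x20E is the mouse range from WM_MOUSEMOVE to WM_MOUSEHWHEEL); whatever falls under "etc" is left out. The helper name is hypothetical and nothing like it exists in CefInput as far as this sketch is concerned.

```cpp
#include <cstdint>

// Win32 message IDs, copied here so the sketch compiles without <windows.h>.
constexpr std::uint32_t kWmKeyDown    = 0x0100; // WM_KEYDOWN
constexpr std::uint32_t kWmSysKeyDown = 0x0104; // WM_SYSKEYDOWN
constexpr std::uint32_t kWmMouseFirst = 0x0200; // WM_MOUSEFIRST (WM_MOUSEMOVE)
constexpr std::uint32_t kWmMouseLast  = 0x020E; // WM_MOUSEHWHEEL

// Hypothetical helper: true if `msg` is in the subset that would be
// forwarded to the thread running DoDragDrop while a drag is in flight.
inline bool ShouldForwardToDragThread(std::uint32_t msg)
{
    return msg == kWmKeyDown
        || msg == kWmSysKeyDown
        || (msg >= kWmMouseFirst && msg <= kWmMouseLast);
}
```

In the real host code this check would presumably sit behind g_isDragging, with the actual forwarding done by PostThreadMessage; both of those are left out here since they need the Windows headers and the target thread ID.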

Also semi-related comments while looking through this:

  1. Should m_dropTarget’s creation be deferred to ensure it references g_gameWindow instead of g_fallbackGameWindow?
  2. On my debug build of CEF, TranslateUiClickEvent is throwing a fit because DCHECK_GE(clickCount, 1) is failing: browser->GetHost()->SendMouseClickEvent with lastClickCount as zero.
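For the second point, one possible stopgap is to clamp the count before it reaches SendMouseClickEvent. This is only a sketch: the helper name is made up, and it assumes a zero lastClickCount is harmless to round up to 1 rather than being a symptom of a bug elsewhere in the click tracking.

```cpp
#include <algorithm>

// Hypothetical helper: the debug check quoted above (DCHECK_GE(clickCount, 1))
// requires at least one click, so never pass anything smaller.
inline int ClampClickCount(int lastClickCount)
{
    return std::max(lastClickCount, 1);
}
```

The call site would then pass ClampClickCount(lastClickCount) instead of the raw value. Fixing wherever the zero comes from may well be the better option.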

Does this imply ‘not using OLE drag/drop at all’, or ‘forward just enough messages so that DoDragDrop is happy being called on another thread’?

The former was the most likely outcome I found here after a small amount of experimentation ~a week ago, and it is also one that would make it possible for drag thumbnails to exist so drag/drop would consistently work for use cases where the web page itself is a drop target.

(Other outcomes included somehow marshaling DoDragDrop to the render thread/thread owning the window, or worst-case running the call off-thread and having a timeout to ‘fake’ signal to CEF that system drag/drop is actually done, but then risking hanging drag/drop forever)
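The worst-case option (run the call off-thread, give up after a timeout) could look roughly like this in portable C++. The function name and the use of std::packaged_task are choices made for this sketch, not anything in nui-core, and a real version would wrap DoDragDrop plus the OLE initialization that thread needs.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Runs `blockingCall` on a detached worker thread and waits up to `timeout`.
// Returns true if the call finished in time, false if the caller gave up.
// On false the worker is NOT stopped: it keeps running in the background,
// which is exactly the "risk hanging drag/drop forever" part.
inline bool RunWithTimeout(std::function<void()> blockingCall,
                           std::chrono::milliseconds timeout)
{
    std::packaged_task<void()> task(std::move(blockingCall));
    std::future<void> done = task.get_future();
    std::thread(std::move(task)).detach();
    return done.wait_for(timeout) == std::future_status::ready;
}
```

A false return is where the host would ‘fake’ the drag-ended signal to CEF. The abandoned worker is still stuck in the system drag loop at that point, so this hides the hang from CEF without resolving it.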

Getting mouse events and forwarding them to the message queue of the thread running DoDragDrop might actually be a viable solution though since this race condition is a pain to invoke and DoDragDrop’s implementation is opaque enough I didn’t directly look into that.

Also, from the MSDN page on PostThreadMessage:

Messages sent by PostThreadMessage are not associated with a window. As a general rule, messages that are not associated with a window cannot be dispatched by the DispatchMessage function. Therefore, if the recipient thread is in a modal loop (as used by MessageBox or DialogBox), the messages will be lost. To intercept thread messages while in a modal loop, use a thread-specific hook.

Would this issue also apply to DoDragDrop?


The other concerns

Was this a concern for cl2 launches?

Right. I don’t usually use debug-type builds since building/linking is already slow and repeating that process for another GN config is a pain, so this likely wasn’t caught.

This.

Before continuing it may be worth referencing/testing chromium’s other workarounds: DesktopWindowTreeHostWin::StartTouchDrag and DesktopDragDropClientWin::StartDragAndDrop. One of which seems related to what I had in mind (artificial events so DoDragDrop may run in its own event loop).

In my few experiments, I did not check whether there were regressions, or changes, in thumbnail dragging. And besides a few strategic printf insertions, I did not look for the exact reason as to what was causing the race condition. I did observe the Render thread was always in 0x1412E3D08 (1604) already working through its message queue when CrBrowserMain hung on DoDragDrop.

Debugging these types of issues is generally not fun and requires somebody who measures on the masochist-scale.

I don’t believe so. Not in this instance anyway. The os_modal_loop bits in chromium are also never touched.

Just an observation looking through MSVC’s debugger.