Wine with a bit of extra spice
This is eventfd-based synchronization, or 'esync' for short. Turn it on with WINEESYNC=1; debug it with +esync.
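As a rough sketch, the enable check (the do_esync() guard referenced later in this file) can be imagined as reading the environment variable once and caching the result; this is illustrative, not necessarily the exact code in the patchset:

    #include <stdlib.h>

    /* Illustrative sketch: read WINEESYNC once and cache the result. */
    static int do_esync(void)
    {
        static int cached = -1;

        if (cached == -1)
        {
            const char *env = getenv("WINEESYNC");
            cached = env && atoi(env) != 0;
        }
        return cached;
    }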
== BUGS AND LIMITATIONS ==
Please let me know if you find any bugs. If you can, also attach a log with +seh,+pid,+esync,+server,+timestamp.
If you get something like "eventfd: Too many open files" and then things start crashing, you've probably run out of file descriptors. esync creates one eventfd descriptor for each synchronization object, and some games may use a large number of these. Linux by default limits a process to 4096 file descriptors, which probably was reasonable back in the nineties but isn't really anymore. (Fortunately Debian and derivatives [Ubuntu, Mint] already have a reasonable limit.) To raise the limit you'll want to edit /etc/security/limits.conf and add a line like

    * hard nofile 1048576

then restart your session.
On distributions using systemd, the settings in /etc/security/limits.conf will be overridden by systemd's own settings. If you run ulimit -Hn and it returns a lower number than the one you've previously set, then you can set DefaultLimitNOFILE=1024:1048576 in both /etc/systemd/system.conf and /etc/systemd/user.conf. You can then execute sudo systemctl daemon-reexec and restart your session. Check again with ulimit -Hn that the limit is correct.
Also note that if the wineserver has esync active, all of its clients must too, and vice versa. Otherwise things will probably crash quite badly.
== EXPLANATION ==
The aim is to execute all synchronization operations in "user-space", that is, without going through wineserver. We do this using Linux's eventfd facility. The main impetus for using eventfd is so that we can poll multiple objects at once; in particular we can't do this with futexes, or pthread semaphores, or the like. The only way I know of to wait on any of multiple objects is to use select/poll/epoll to wait on multiple fds, and eventfd gives us those fds in a quite usable way.
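To illustrate the point, here is a standalone demo (not Wine code): each object gets its own eventfd, and poll() can then wait on any number of them at once, which futexes cannot express:

    #include <poll.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int main(void)
    {
        struct pollfd fds[2];
        uint64_t value = 1;
        int i;

        /* One eventfd per "object"; nonblocking so reads never hang. */
        for (i = 0; i < 2; i++)
        {
            fds[i].fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
            fds[i].events = POLLIN;
        }

        /* "Signal" the second object. */
        write(fds[1].fd, &value, sizeof(value));

        /* Wait on both objects at once. */
        if (poll(fds, 2, -1) > 0)
            for (i = 0; i < 2; i++)
                if (fds[i].revents & POLLIN)
                    printf("object %d is signalled\n", i);
        return 0;
    }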
Whenever a semaphore, event, or mutex is created, we have the server create an 'esync' primitive instead of a traditional server-side event/semaphore/mutex. These live in esync.c and are very slim objects; in fact, they don't even know what type of primitive they are. The server is involved at all only because we still need a way of creating named objects, passing handles to another process, etc.
The server creates an eventfd file descriptor with the requested parameters and passes it back to ntdll. ntdll creates an object of the appropriate type, then caches it in a table. This table is copied almost wholesale from the fd cache code in server.c.
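The descriptor crosses the process boundary the way fds are normally passed on Unix, over a Unix-domain socket with SCM_RIGHTS. A generic sketch of that technique follows; the helper name is illustrative, and this is not the wineserver's actual code:

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send one fd over a connected Unix-domain socket. */
    static int send_fd(int sock, int fd)
    {
        struct msghdr msg = {0};
        struct cmsghdr *cmsg;
        char buf[CMSG_SPACE(sizeof(int))] = {0};
        char dummy = 0;
        struct iovec iov = { &dummy, 1 };

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = buf;
        msg.msg_controllen = sizeof(buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }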
Specific operations follow quite straightforwardly from eventfd:

* To release an object, or set an event, we write() to it.
* An object is signalled if read() succeeds on it. Notably, we create all eventfd descriptors with O_NONBLOCK, so that if we try to read and nothing is there, we don't block.
* For objects whose state should not be reset upon waiting (e.g. manual-reset events) we simply check for the POLLIN flag instead of reading.
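Concretely, those three operations map onto eventfd like this (an illustrative sketch, not the patchset's actual functions):

    #include <poll.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Set an event / release an object: write() to the eventfd. */
    static void esync_set(int fd)
    {
        uint64_t value = 1;
        write(fd, &value, sizeof(value));
    }

    /* Auto-reset acquire: signalled if read() succeeds (O_NONBLOCK means
     * an unsignalled object gives EAGAIN instead of blocking). */
    static int esync_try_acquire(int fd)
    {
        uint64_t value;
        return read(fd, &value, sizeof(value)) > 0;
    }

    /* Manual-reset check: peek at POLLIN without consuming the state. */
    static int esync_is_signalled(int fd)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
    }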
The interesting part about esync is that (almost) all waits happen in ntdll, including those on server-bound objects. The idea here is that on the server side, for any waitable object, we create an eventfd file descriptor (not an esync primitive), and then pass it to ntdll if the program tries to wait on it. These are cached too, so only the first wait will require a round trip to the server. Then the server signals the file descriptor as appropriate, and thereby wakes up the client. So far this is implemented for processes, threads, message queues (difficult; see below), and device managers (necessary for drivers to work). All of these are necessarily server-bound, so we wouldn't really gain anything by signalling on the client side instead. Of course, except possibly for message queues, it's not likely that any program (cutting-edge D3D game or not) is going to be causing a great wineserver load by waiting on any of these objects; the motivation was rather to provide a way to wait on ntdll-bound and server-bound objects at the same time.
Some cases are still passed to the server, and there's probably no reason not to keep them that way. Those that I noticed while testing include:

* async objects, which are internal to the file APIs and never exposed to userspace;
* startup_info objects, which are internal to the loader and signalled when a process starts;
* keyed events, which are exposed through an ntdll API (although not through kernel32) but can't be mixed with other objects (you have to use NtWaitForKeyedEvent()).

Other cases include named pipes, debug events, sockets, and timers. It's unlikely we'll want to optimize debug events or sockets (or any of the other, rather rare, objects), but it is possible we'll want to optimize named pipes or timers.
There were two sorts of complications when working out the above. The first one was events. The trouble is that (1) the server actually creates some events by itself and (2) the server sometimes manipulates events passed by the client. Resolving the first case was easy enough, and merely entailed creating eventfd descriptors for those events the same way as for processes and threads (note that we don't really lose anything this way; the events include "LowMemoryCondition" and the event that signals system processes to shut down). For the second case I basically had to hook the server-side event functions to redirect to esync versions if the event was actually an esync primitive.
The second complication was message queues. The difficulty here is that X11 signals events by writing into a pipe (at least I think it's a pipe?), and as a result wineserver has to poll on that descriptor. In theory we could just let wineserver do so and then signal us as appropriate, except that wineserver only polls on the pipe when the thread is waiting for events (otherwise we'd get e.g. keyboard input while the thread is doing something else, and spin forever trying to wake up a thread that doesn't care). The obvious solution is just to poll on that fd ourselves, and that's what I did; it's just that getting the fd from wineserver was kind of ugly, and the code for waiting is also kind of ugly, basically because we have to wait on both X11's fd and the "normal" process/thread-style wineserver fd that we use to signal sent messages. The upshot of the whole thing is that races are basically impossible, since a thread can only wait on its own queue.
System APCs already work, since the server will forcibly suspend a thread if it's not already waiting, and so we just need to check for EINTR from poll(). User APCs and alertable waits are implemented in a similar style to message queues (well, sort of): whenever someone executes an alertable wait, we add an additional eventfd to the list, which the server signals when an APC arrives. If that eventfd gets signalled, we hand it off to the server to take care of, and return STATUS_USER_APC.
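The shape of an alertable wait is then roughly as follows (the names and layout here are assumptions for illustration, not the actual esync code):

    #include <poll.h>

    #define STATUS_USER_APC 0x000000C0   /* NTSTATUS value */

    /* The caller reserves one extra slot at the end of 'fds'. */
    static int wait_objects(struct pollfd *fds, int count, int apc_fd, int alertable)
    {
        int total = count;

        if (alertable)
        {
            /* Extra eventfd, signalled by the server when an APC arrives. */
            fds[count].fd = apc_fd;
            fds[count].events = POLLIN;
            total++;
        }

        if (poll(fds, total, -1) < 0) return -1;   /* EINTR: system APC */

        if (alertable && (fds[count].revents & POLLIN))
            return STATUS_USER_APC;   /* hand the APC off to the server */

        /* ... otherwise determine which object woke us up ... */
        return 0;
    }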
Originally I kept the volatile state of semaphores and mutexes inside a variable local to the handle, with the knowledge that this would break if someone tried to open the handle elsewhere or duplicate it. It did, and so now this state is stored inside shared memory. This is of the POSIX variety, is allocated by the server (but never mapped there) and lives under the path "/wine-esync".
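For illustration, opening and mapping that region from the client side might look like the following; the struct layout here is hypothetical, and the real esync state and allocation scheme are more involved:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical per-object state; the real esync layout differs. */
    struct shm_state
    {
        int count;   /* e.g. semaphore count */
        int tid;     /* e.g. owning thread id for a mutex */
    };

    static struct shm_state *map_esync_state(void)
    {
        struct shm_state *state;
        int fd = shm_open("/wine-esync", O_RDWR, 0);  /* created by the server */

        if (fd == -1) return NULL;
        state = mmap(NULL, sizeof(*state), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
        close(fd);
        return state == MAP_FAILED ? NULL : state;
    }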
There are a couple things that this infrastructure can't handle, although surprisingly there aren't that many. In particular:
There are some things that are perfectly implementable but that I just haven't done yet:
This patchset was inspired by Daniel Santos' "hybrid synchronization" patchset. My idea was to create a framework whereby even contended waits could be executed in userspace, eliminating a lot of the complexity that his synchronization primitives used. I do however owe some significant gratitude toward him for setting me on the right path.
I've tried to maximize code separation, both to make any potential rebases easier and to ensure that esync is only active when configured. All code in existing source files is guarded with "if (do_esync())", and generally that condition is followed by "return esync_version_of_this_method(...);", where the latter lives in esync.c and is declared in esync.h. I've also tried to make the patchset very clear and readable; I wanted to write it as if I were going to submit it upstream. (Some intermediate patches do break things, which Wine is generally against, but I think it's for the better in this case.) I have cut some corners, though; there is some error checking missing, and some implicit assumptions that the program is behaving correctly.
I've tried to be careful about races. There are a lot of comments whose purpose is basically to assure me that races are impossible. In most cases we don't have to worry about races, since all of the low-level synchronization is done by the kernel.
Anyway, yeah, this is esync. Use it if you like.
--Zebediah Figura