Hi,

In the performance paper
http://shorestreet.com/sites/shorestreet.com/...
there's discussion of scheduler issues where the bus only gets the same CPU slice as each individual dependent app, so in the aggregate the apps get way too much time and the bus way too little.

I was just reading Lennart's post here
http://0pointer.de/blog/projects/cgroups-vs-c...
where he mentions balancing mysql and httpd even though they are made up of different numbers of processes. Some docs here:
http://docs.redhat.com/docs/en-US/Red_Hat_Ent...

If you were to split the system into "dbus-daemon" and "everything else" and say that when both need CPU, the CPU should be divided equally between those two groups, I wonder if that would solve the scheduling problem. I don't know if it's possible.

Havoc
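To put rough numbers on Havoc's question, here is a sketch of the arithmetic under idealized assumptions: every process is CPU-bound at the same moment, all have equal weight, and one fair scheduler splits time in proportion to weight. The helper names are hypothetical. Competing as one of N+1 equal peers, the bus gets 1/(N+1) of the CPU; with two equally weighted groups it gets half, however many apps there are.

```c
#include <assert.h>

/* Bus share when it is one of (n_apps + 1) equally weighted runnable
 * processes in a single flat fair-share runqueue. */
double bus_share_flat(int n_apps)
{
    assert(n_apps >= 0);
    return 1.0 / (double)(n_apps + 1);
}

/* Bus share when the system is split into two equally weighted groups,
 * "dbus-daemon" and "everything else", and both groups are runnable.
 * Group weight, not process count, decides the split. */
double bus_share_two_groups(int n_apps)
{
    (void)n_apps;
    return 0.5;
}

/* Share of each individual app in the two-group case. */
double app_share_two_groups(int n_apps)
{
    assert(n_apps > 0);
    return 0.5 / (double)n_apps;
}
```

For example, with 9 busy apps, bus_share_flat(9) is 0.1 while bus_share_two_groups(9) is 0.5. On Linux the two-group split would presumably map to giving the cgroup containing dbus-daemon the same cpu.shares value as the cgroup containing everything else; that mapping is an assumption about configuration, not something the sketch exercises.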
On Wednesday, 11 April 2012 09:49:49, Havoc Pennington wrote:
> If you were to split the system into "dbus-daemon" and "everything
> else" and say that when both need CPU, the CPU should be divided
> equally between those two groups, I wonder if that would solve the
> scheduling problem. I don't know if it's possible.

I think it makes sense: if D-Bus is used as a communication mechanism between
two apps, it should get as much opportunity to handle its input as those two
are generating data for the daemon.
I'm not sure we need to do an equal division of CPU time, though. In fact,
wouldn't it suffice to ensure that the daemon has a higher priority?
Also, I imagine that the applications need more CPU time between messages than
the daemon, so even if it had a less-than-equal share compared to the
senders, it should still suffice.
I still need to RTFP though....
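The alternative of simply giving the daemon a higher priority can be estimated the same way. A sketch assuming Linux CFS with its nice-to-weight table (nice 0 = 1024, nice -5 = 3121, nice -10 = 9548), a single flat runqueue, and CPU-bound processes; the function name is hypothetical.

```c
#include <assert.h>

/* Excerpt of the Linux CFS nice-to-weight table. */
enum {
    WEIGHT_NICE_MINUS_10 = 9548,
    WEIGHT_NICE_MINUS_5 = 3121,
    WEIGHT_NICE_0 = 1024
};

/* CPU fraction for a daemon with weight bus_weight competing against
 * n_apps CPU-bound apps at nice 0, all in one runqueue. */
double bus_share_niced(int bus_weight, int n_apps)
{
    assert(bus_weight > 0 && n_apps >= 0);
    double total = (double)bus_weight + (double)n_apps * WEIGHT_NICE_0;
    return (double)bus_weight / total;
}
```

At nice -5 against 3 busy apps the daemon gets 3121/6193, about 50%; against 10 apps, about 23%. Unlike the two-group split, its share still shrinks as apps are added, which may be acceptable if, as suggested above, the daemon needs less CPU per message than the apps do. Lowering a nice value (for example with setpriority()) normally requires CAP_SYS_NICE or a suitable RLIMIT_NICE.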
At Wed, 11 Apr 2012 09:49:49 -0400,
Havoc Pennington wrote:
> In the performance paper
> http://shorestreet.com/sites/shorestreet.com/...
> there's discussion of scheduler issues where the bus only gets the
> same CPU slice as each individual dependent app, so in the aggregate
> the apps get way too much time and the bus way too little.

This is a basic problem thoroughly examined by the real-time scheduling
community. Start looking at things like priority inheritance
(globally fairer) or priority ceiling (easier to implement).
Neal
On 2012-04-13, at 10:10 AM, Neal H. Walfield wrote:
> At Wed, 11 Apr 2012 09:49:49 -0400,
> Havoc Pennington wrote:
>> In the performance paper
>> http://shorestreet.com/sites/shorestreet.com/...
>> there's discussion of scheduler issues where the bus only gets the
>> same CPU slice as each individual dependent app, so in the aggregate
>> the apps get way too much time and the bus way too little.
>
> This is a basic problem thoroughly examined by the real-time scheduling
> community. Start looking at things like priority inheritance
> (globally fairer) or priority ceiling (easier to implement).

I am the primary author of the cited paper.
I agree that the scheduling problems discussed in the paper would be trivially solvable with real-time scheduling facilities like those mentioned above. Unfortunately, on the systems on which D-Bus operates (Windows, vanilla Linux), those in-kernel scheduling facilities are not available. I wish that they were.
There may be some good solutions available with cgroups. Windows does not have those, AFAIK.
No reason not to use these solutions on Linux mobile devices and Linux
distributions, though.
Havoc

On Fri, Apr 13, 2012 at 2:32 PM, Robin Bate Boerop wrote:
> I agree that the scheduling problems discussed in the paper would be trivially solvable with real-time scheduling facilities like those mentioned above. Unfortunately, on the systems on which D-Bus operates (Windows, vanilla Linux), those in-kernel scheduling facilities are not available. I wish that they were.
>
> There may be some good solutions available with cgroups. Windows does not have those, AFAIK.
On 2012-04-13, at 3:48 PM, Havoc Pennington wrote:
> No reason not to use these solutions on Linux mobile devices and Linux
> distributions, though.

It would be a good idea to solve the problem on platforms which admit straightforward solutions via either real-time (or other nonstandard) scheduling facilities.

One argument against solving the problem this way is the following (I play devil's advocate):

D-Bus is fundamentally broken performance-wise in the way described in the paper. Fixing it on some platforms and not others creates an open source product which is broken on some platforms and not others. This is a confusing story for adopters of D-Bus. It's also confusing for current users of D-Bus: they can't be sure that performance will be reasonable on platforms on which their software runs; D-Bus may or may not be broken on those platforms.

Thanks, everyone, for thinking about this problem some more.

> On Fri, Apr 13, 2012 at 2:32 PM, Robin Bate Boerop wrote:
>> I agree that the scheduling problems discussed in the paper would be trivially solvable with real-time scheduling facilities like those mentioned above. Unfortunately, on the systems on which D-Bus operates (Windows, vanilla Linux), those in-kernel scheduling facilities are not available. I wish that they were.
>>
>> There may be some good solutions available with cgroups. Windows does not have those, AFAIK.
At Fri, 13 Apr 2012 16:08:00 -0300,
Robin Bate Boerop wrote:
> One argument against solving the problem this way is the following (I play devil's advocate):
>
> D-Bus is fundamentally broken performance-wise in the way described in the paper. Fixing it on some platforms and not others creates an open source product which is broken on some platforms and not others. This is a confusing story for adopters of D-Bus. It's also confusing for current users of D-Bus - they can't be sure that performance will be reasonable on platforms on which their software runs; D-Bus may or may not be broken on those platforms.

Playing devil's advocate: let's rename D-Bus the Harrison Bergeron Bus.
Neal
Hi,

On Fri, Apr 13, 2012 at 3:08 PM, Robin Bate Boerop wrote:
> D-Bus is fundamentally broken performance-wise in the way described in the paper.

I don't agree with you here, as may already be clear. For me the question
is: what is the user-visible impact? And in most cases (on the desktop
at least), it has been zero. I don't agree with saying "fundamentally
broken" when something is widely-adopted and solving real problems.
Now yes, there are cases where it doesn't work and it could, but "it
could be useful in more situations than the original design target"
and "fundamentally broken" are VERY different statements.
I'm not just trying to nitpick language.
If you are thinking "fundamentally broken" then any measure is
worthwhile even if it breaks other stuff and takes a lot of community
effort. So that's why I don't think "fundamentally broken" is the
right way to frame it.
The right way to frame it is _tradeoffs_. The performance issue here
was known on _day one_ when implementing dbus, and deemed worth the
tradeoff; which isn't to say it can't be improved, but in some sense
the performance issue was "the point" (because the central daemon
brought other advantages, both conceptual and practical). I think the
value of that choice has been apparent in that dbus has been widely
adopted and solved a whole lot of problems on the Linux desktop, and
that dbus _exists_. If there were no tradeoff, then we might have a
solution that was best of all worlds by now, and we do not.
Conceptually such a best-of-all-worlds solution could exist (at least
if you can change the kernel, though that won't help on Windows
either), and it's great to work on it, but don't assume there's no
baby in the bathwater.
In terms of user-visible impact, it sure sounds like there are
potentially significant, low-hanging performance gains on both desktop
and mobile Linux devices, whether it's messing with scheduler config
or some of the dbus changes we discussed in the earlier thread. That's
why I keep lobbying this point: it sure seems like someone should take
a crack at some of these; even if they are not "fundamental", they are
still real.
You can say it's fundamentally broken _for some purpose_ and that's
fine. But I would strongly argue that it is not broken for the actual
original intended use (= most but not all Linux desktop IPC scenarios)
and I also suspect that it can be made to work well for mobile Linux
devices with relatively incremental improvements to dbus and/or apps.
Havoc
On Friday, 13 April 2012 15:10:50, Neal H. Walfield wrote:
> At Wed, 11 Apr 2012 09:49:49 -0400,
>
> Havoc Pennington wrote:
> > In the performance paper
> > http://shorestreet.com/sites/shorestreet.com/....pdf
> > there's discussion of scheduler issues where the bus only gets the
> > same CPU slice as each individual dependent app, so in the aggregate the
> > apps get way too much time and the bus way too little.
>
> This is a basic problem thoroughly examined by the real-time scheduling
> community. Start looking at things like priority inheritance
> (globally fairer) or priority ceiling (easier to implement).

Futex priority inheritance is implemented in all current Linux versions (PI futexes have been available since 2.6.18), and other
systems have similar functionality -- though I have no clue how they'd
transmit the priority token to other processes.
For Linux, I'd start with:
1) create a temp file, mmap it to memory, create a futex with PI there
2) pass the file's FD as part of the message, with a new header field
containing the FD index
3) the receiver side gets the extra FD, maps it to memory and uses the futex
PI to get the enhanced priority
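Steps 1 and 3 can be sketched with ordinary POSIX calls: a process-shared pthread mutex with PTHREAD_PRIO_INHERIT placed in a mapped temporary file, which glibc implements on top of Linux PI futexes. This is a sketch under assumptions (glibc on Linux, a writable /tmp, a hypothetical path prefix and function names), and it inherits problems a) to d) listed below: the boost only happens while a higher-priority thread is blocked in pthread_mutex_lock() on a token that another thread holds.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Step 1: create an unlinked temp file, size it, map it, and place a
 * process-shared, priority-inheriting mutex in it. Returns the fd (to be
 * sent with the message) and stores the mapped mutex in *out, or -1. */
int create_pi_token(pthread_mutex_t **out)
{
    char path[] = "/tmp/dbus-pi-XXXXXX";     /* hypothetical prefix */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                             /* the fd keeps the file alive */
    if (ftruncate(fd, sizeof(pthread_mutex_t)) != 0) {
        close(fd);
        return -1;
    }

    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (m == MAP_FAILED) {
        close(fd);
        return -1;
    }

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(m, sizeof *m);
        close(fd);
        return -1;
    }
    *out = m;
    return fd;
}

/* Step 3: the receiver maps the same file and gets its own view of the
 * same mutex. Returns NULL on failure. */
pthread_mutex_t *map_pi_token(int fd)
{
    void *p = mmap(NULL, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (pthread_mutex_t *)p;
}
```

Mapping the same descriptor twice yields two views of one mutex, so a lock taken through one mapping is seen as held through the other.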
The bus daemon would do that while processing that message, then pass the PI
token to the target process and stop using it itself.
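Step 2, and the daemon handing the token on, rely on Unix-domain sockets carrying file descriptors as SCM_RIGHTS ancillary data, the same mechanism behind D-Bus's existing Unix FD passing. A minimal sketch: the one-byte payload stands in for a real D-Bus message, and the proposed new header field holding the FD index is not modelled.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one payload byte plus one descriptor. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'P';                          /* placeholder payload */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    memset(&u, 0, sizeof u);

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive the byte and return the passed descriptor, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            int fd;
            memcpy(&fd, CMSG_DATA(c), sizeof(int));
            return fd;
        }
    }
    return -1;
}
```

The receiver gets a new descriptor number referring to the same open file, so a token created by the sender, or re-sent by the bus daemon, can be mapped by the target process.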
There are a lot of problems with this initial idea. Those come readily to
mind:
a) the futex PI interface was created so that a thread trying to lock would
give its priority to the thread currently holding the lock. We have a race
condition: the sender needs to lock when the receiver has already obtained the
lock.
b) worse: for the receiver to know it needs to lock and to know what to lock,
it needs to have received the message and started processing, which is the
very priority inversion problem we're trying to solve
c) in order for the sender to give its priority to something else, it needs
to be suspended in a futex lock. That means we can't do asynchronous
processing nor can we pop messages off the socket while our priority is being
given away.
d) by the way, how can one thread give its priority *and* continue executing?
So I think that this is a nice exercise to get the brain started, but we'll
need kernel help. I'd suggest instead:
- a new system call that returns a priority-inheritance file descriptor
- said FD is passed in the D-Bus message
- whenever a thread is polling that FD, its priority is automatically
inherited by a process holding that FD open -- this includes FDs currently
queued in a Unix socket buffer
- if a process has multiple priority FDs open, it gets the highest priority
among them
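The proposed semantics can be written down as a small userspace model, which makes it easier to check the rules against each other. Everything here is hypothetical: no such kernel object exists, and the struct and function names are made up. The model encodes two of the rules above: a token donates only while its donor is polling, and a holder runs at the highest of its base priority and the donations it holds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of one proposed priority FD. */
struct prio_token {
    int donor_priority;   /* priority of the thread that created it */
    bool donor_polling;   /* true while the donor sleeps in poll()/select() */
};

/* Model of a process holding tokens, including ones still queued in
 * its socket buffer. Higher number = higher priority. */
struct holder {
    int base_priority;
    const struct prio_token *held[8];
    size_t n_held;
};

/* Effective priority: max of the base priority and every held token
 * whose donor is currently polling. */
int effective_priority(const struct holder *h)
{
    int p = h->base_priority;
    for (size_t i = 0; i < h->n_held; i++) {
        const struct prio_token *t = h->held[i];
        if (t->donor_polling && t->donor_priority > p)
            p = t->donor_priority;
    }
    return p;
}
```

When a donor wakes up (point d) or its call times out (point e), its token stops counting and the holder drops to the next-highest donation or its base priority.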
This solves the problems above:
a) the priority is given by poll / select, so there's no race condition
b) since the priority is received automatically, there's no priority
inversion: you have the FD open, you get it
c) since we're giving the priority by way of the very syscall we use to find
out if there's more data on the socket, the sender can be woken up by socket
data and process it (except if you've given an RT FIFO priority away).
d) since the priority is given when the calling thread goes to sleep, the
thread is, as a consequence, not running; if it does wake up, the given
priority is taken away
Additional benefits:
e) if the call timed out, the calling thread resumes execution and takes away
its priority
f) the daemon code and the library actually need no modification to receive
priorities: since they receive all file descriptors and keep them open for the
duration of the DBusMessage, they have the priority.
However, the daemon code should be modified so it does *not* forward the
priority FD to eavesdropping receivers. In fact, it should *only* forward to
the intended receiver, which also neatly solves the issue of priority FDs
being passed in signal messages: the bus gets the priority, but doesn't pass
it along
g) user code often keeps the original DBusMessage around before sending its
reply, if nothing else for the serial ID. Code that drops the message should
be adapted to keep it around or at least keep the priority FD. The priority FD
should be sent along with the reply, so the bus daemon gets the priority needed to
process it.
h) since a thread gets the highest priority from the priority FDs it has
open, the bus daemon is automatically running at the highest priority of its
pending messages, and so are target processes
Problem:
priority is usually given per thread, but a file descriptor is a per-process
thing. Which thread gets the enhanced priority? All of them?
What do you think?
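The forwarding rule in point f is small enough to state as code. A hypothetical sketch of the daemon-side decision: the enum mirrors the four D-Bus message types, and treating replies and errors like method calls (forwarded only to their destination) is an interpretation, since the proposal only spells out eavesdroppers and signals.

```c
#include <stdbool.h>

/* The four D-Bus message types. */
enum msg_type { MSG_METHOD_CALL, MSG_METHOD_RETURN, MSG_ERROR, MSG_SIGNAL };

/* Should the priority FD that arrived with a message be attached when
 * routing it to this particular recipient? */
bool forward_priority_fd(enum msg_type type, bool recipient_is_destination,
                         bool recipient_is_eavesdropper)
{
    if (recipient_is_eavesdropper)
        return false;                 /* never lend priority to eavesdroppers */
    if (type == MSG_SIGNAL)
        return false;                 /* the bus keeps it; signals don't pass it on */
    return recipient_is_destination;  /* only the intended receiver */
}
```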
Any reactions to the proposal below?
I'd like to bring it to some kernel developers to have their feedback, but I
don't want to waste their time if we can poke holes with the D-Bus side of it.On sexta-feira, 13 de abril de 2012 16.55.26, Thiago Macieira wrote:
> So I think that this is a nice exercise to get the brain started, but we'll
> need kernel help. I'd suggest instead:
>
> - a new system call that returns a priority-inheritance file descriptor
>
> - said FD is passed in the D-Bus message
>
> - whenever a thread is polling that FD, its priority is automatically
> inherited by a process holding that FD open -- this includes FDs currently
> queued in a Unix socket buffer
>
> - if a process has multiple priority FDs open, it gets the highest priority
> among them
On Tue 17 Apr 2012 22:02:20 CEST, Thiago Macieira wrote:
> Any reactions to the proposal below?
>
> I'd like to bring it to some kernel developers to have their feedback, but I
> don't want to waste their time if we can poke holes with the D-Bus side of it.

To me it sounds like we want to solve a priority inversion problem by deliberately raising priority. Clever scheduler tricks can certainly help, but don't solve the fundamental problem of the daemon being the bottleneck.
On Wednesday, 18 April 2012 08:13:11, Bart Cerneels wrote:
> > I'd like to bring it to some kernel developers to have their feedback, but
> > I don't want to waste their time if we can poke holes with the D-Bus side
> > of it.
>
> To me it sounds like we want to solve a priority inversion problem by
> deliberately raising priority.

Do you know of any other ways of solving a priority inversion problem?

> Clever scheduler tricks can certainly help, but don't solve the
> fundamental problem of the daemon being the bottleneck.

Broadcasting via the kernel is a good solution, but unlike the daemon, the kernel cannot use limitless memory. At some point it has to block on receiving data from userland. Therefore, the kernel solution swaps one problem for another.
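The point that the kernel cannot buffer without limit is easy to observe: a Unix-domain socket whose peer never reads accepts only a bounded amount of data, after which a non-blocking sender gets EAGAIN (a blocking sender would sleep instead). A minimal sketch; the exact byte count depends on the kernel's socket buffer settings, and the function name is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write to a socket pair whose receiving end never reads, with a
 * non-blocking sender. Returns how many bytes the kernel accepted before
 * send() failed with EAGAIN/EWOULDBLOCK, or -1 on any other error. */
long fill_until_full(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int flags = fcntl(sv[0], F_GETFL);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    char chunk[4096] = { 0 };
    long total = 0;
    for (;;) {
        ssize_t n = send(sv[0], chunk, sizeof chunk, 0);
        if (n > 0) {
            total += n;
            continue;
        }
        /* Buffer full: the kernel refuses more data instead of growing. */
        long result = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                          ? total : -1;
        close(sv[0]);
        close(sv[1]);
        return result;
    }
}
```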
On 18-04-12 12:26, Thiago Macieira wrote:
> On Wednesday, 18 April 2012 08:13:11, Bart Cerneels wrote:
>>> I'd like to bring it to some kernel developers to have their feedback, but
>>> I don't want to waste their time if we can poke holes with the D-Bus side
>>> of it.
>> To me it sounds like we want to solve a priority inversion problem by
>> deliberately raising priority.
>
> Do you know of any other ways of solving a priority inversion problem?

In this case you just have to limit the size of the problem's effect. You can still have priority inversion between 2 D-Bus peers communicating heavily. But at least it will not affect all the bus participants if there is no single central serialized access channel in the daemon.

>
>> Clever scheduler tricks can certainly help, but don't solve the
>> fundamental problem of the daemon being the bottleneck.
>
> The broadcasting via the kernel is a good solution, but unlike the daemon, the
> kernel cannot use limitless memory. It has to block on receiving data from the
> userland at some time. Therefore, the kernel solution swaps one problem for
> another.
>

It is even a problem in userspace when the used memory balloons out of control. I think a solution to this will also fix the problem in the kernel. How about a much smaller (like 4K / 1 page size) limit to message size, and handling larger messages transparently using e.g. fd passing?

Bart
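Bart's suggestion might look roughly like this on the sending side: bodies up to one 4 KiB page stay inline, and larger ones are written to an unlinked temporary file whose descriptor would travel with the message (for example via SCM_RIGHTS). Everything here is hypothetical (struct, names, path prefix). Note that it bounds how much a single message occupies in socket buffers, not how many messages a fast sender can queue, which is the objection Thiago raises in his reply.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>

#define INLINE_LIMIT 4096   /* one 4 KiB page, as suggested above */

/* Outgoing message body: inline, or stored in a file whose descriptor
 * would be attached to the message. */
struct out_body {
    const void *inline_data;
    size_t inline_len;
    int fd;                 /* -1 when the body is inline */
    size_t total_len;
};

/* Returns 0 on success, -1 on error. */
int prepare_body(struct out_body *b, const void *data, size_t len)
{
    b->inline_data = NULL;
    b->inline_len = 0;
    b->fd = -1;
    b->total_len = len;

    if (len <= INLINE_LIMIT) {
        b->inline_data = data;
        b->inline_len = len;
        return 0;
    }

    char path[] = "/tmp/dbus-body-XXXXXX";   /* hypothetical prefix */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                            /* the fd keeps the data alive */

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    b->fd = fd;
    return 0;
}
```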
On Wednesday, 18 April 2012 13:06:36, Bart Cerneels wrote:
> > Do you know of any other ways of solving a priority inversion problem?
>
> In this case you just have to limit the effect size of the problem. You
> can still have priority inversion between 2 D-Bus peers communicating
> heavily. But at least it will not affect all the bus participants if
> there is no single central serialized access channel in the daemon.

In other words, not solving the problem.

I'd like to get some feedback on the solution I proposed then, or see if anyone has alternative solutions.

> It is even a problem in userspace when the used memory balloons out of
> control. I think a solution to this will also fix the problem in the kernel.
> How about a much smaller (like 4K / 1 page size) limit to message size and
> handling larger messages transparently using e.g. fd passing?

That's something between a D-Bus 2.0 wire format and an extremely different transport protocol. And it doesn't solve the problem of a client sending messages more quickly than the receiver reads them. It will just take longer for the memory ballooning to go out of control.
At Wed, 18 Apr 2012 08:13:11 +0200,
Bart Cerneels wrote:
>
> On di 17 apr 2012 22:02:20 CEST, Thiago Macieira wrote:
> > Any reactions to the proposal below?
> >
> > I'd like to bring it to some kernel developers to have their feedback, but I
> > don't want to waste their time if we can poke holes with the D-Bus side of it.
>
> To me it sounds like we want to solve a priority inversion problem by
> deliberately raising priority.
> Clever scheduler tricks can certainly help, but don't solve the
> fundamental problem of the daemon being the bottleneck.

I'm not sure what you mean. Priority ceiling is a standard technique
to deal with priority inversion. Proper scheduling and accounting of
the resources consumed by the daemon require that the daemon binds to
the client's scheduling context when servicing a request so that the
client gets charged and the operation is executed with the right
parameters. But, even just supporting priority ceiling will help.
Neal
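The "client gets charged" half of Neal's description can be approximated in userspace by measuring the CPU time a daemon thread spends on each request and booking it to the requesting client. A hypothetical sketch using the POSIX per-thread CPU clock (CLOCK_THREAD_CPUTIME_ID); the struct and function names are made up. Actually running under the client's scheduling parameters would additionally need something like pthread_setschedparam(), which for real-time policies usually requires privileges, and is not shown.

```c
#define _GNU_SOURCE
#include <time.h>

/* Per-client account kept by the daemon. */
struct client_account {
    const char *name;
    long long cpu_ns;   /* daemon CPU time spent on this client's requests */
};

/* CPU time consumed so far by the calling thread, in nanoseconds. */
static long long thread_cpu_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Run one request handler on this thread and charge the CPU time it
 * consumed to the requesting client. */
void service_request(struct client_account *client,
                     void (*handler)(void *), void *arg)
{
    long long start = thread_cpu_ns();
    handler(arg);
    client->cpu_ns += thread_cpu_ns() - start;
}

/* Example handler standing in for real request processing; the volatile
 * pointer keeps the compiler from collapsing the loop. */
void spin_handler(void *arg)
{
    volatile unsigned long *counter = arg;
    for (unsigned long i = 0; i < 10000000UL; i++)
        (*counter)++;
}
```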