Eugene,
I ran pipe performance tests on a 2.56GHz PIV with a 333MHz FSB, the
same box I ran the 2.16.16 and 2.16.18 pipe performance tests on last
time (about 18 months ago). I show 2.18 as comparable to 2.16.16. The
performance increases of 2.16.18 only showed up when message sizes fit
within a FASTBUF.
Performance tests on RH7.2 2.4.20-28.7bigmem (the SMP kernel I tested on
last time) show LiS 2.18 STREAMS-based pipes clocking in at a dismal
13% of the performance of Linux SVR3-style native pipes. 2.16.18 showed
about 20% 18 months ago, but only beneath 64-byte read/write sizes, and
then fell back to about 10% above that.
Performance tests on Centos4 (RHEL4 clone) and FC4 show the performance
gains of the 2.6 kernel (and recent re-optimizing compilers) to be quite
significant.
For native pipes, the per-byte read/write latency drops from about
750 ps (picoseconds) on RH7.2 and CL4 to about 500 ps on FC4 (a
regparms kernel). LiS 2.18 experiences a per-byte read/write latency
drop from 1600 ps on RH7.2 to 900 ps on CL4 and FC4. I attribute the
gain on native to the regparms FC4 kernel. I attribute the gain on LiS
to the better compilers (3.4.3 and 4.0) on CL4 and FC4 that better find
their way around cruft in the code.
Per-message read/write delay for LiS 2.18 drops from 20 us on RH7.2 to
8 us on CL4 and FC4, while native pipes sit at around 2.5 us on all
three. I attribute the gain on LiS to the tighter scheduling latency
and O(1) scheduler of the 2.6 kernel. STREAMS-based pipes are far more
susceptible to scheduling latency.
Overall, when compared to native pipes, LiS 2.18 performs at 12.7% for
RH7.2, 28.1% for CL4 and a top end of 38.8% for FC4. The FC4 native
pipes really cruise, so 38.8% is quite good. I attribute the good FC4
results to the O(1) scheduler, the regparms kernel and the re-optimizing
GCC 4 compiler.
It is interesting that kernel improvements generate better performance
gains than could be accomplished within LiS with the changes from
2.16.16 to 2.16.18. There, it was only a 2x gain when compared to
native pipes and only beneath 64-byte writes. The FC4 improvements are
across all message sizes (tested linear with .999 correlation up to
4096 bytes).
So I suppose the story with LiS is, if you want the best performance use
a good 2.6 kernel. Because 2.16.18 only runs on a 2.4 kernel, 2.18 is
the better choice of the two for performance. If you are running on a
2.4 kernel, however, expect better performance from 2.16.18 at message
sizes within a FASTBUF.
Now, Linux Fast-STREAMS...
LfS (streams-0.7a.4) in the same performance tests relative to Linux
native pipes clocked in at 40%, 60% and 75% on RH7.2, CL4 and FC4 over
all message sizes. When compared to LiS at 13%, 28% and 39% in the
same tests, LfS runs 3.1x, 2.1x and 1.9x as fast as LiS. The 3x
performance gain on 2.4 SMP over LiS 2.18 is quite impressive,
particularly when you consider that compared to native pipes, LfS runs
as fast on RH7.2 2.4 as LiS 2.18 runs on FC4. The other impressive
figure is that LfS on FC4 is running at 75% of the performance of a
native Linux pipe. This exceeds John Boyd's "impressed" threshold
(better than 50% native pipe performance).
LfS is the best performance choice on any kernel. Transitioning from
LiS 2.16.18 to LfS is a better performance choice than to LiS 2.18.
But then, that's why I called it "Fast".
I will send a separate note on some of my discoveries regarding
performance on LiS and LfS.
Here is the raw (well, half-cooked) data, obtained using the perftest
program included in the OpenSS7 LiS 2.18.2 release and the streams
0.7a.4 release:
Linear regression was performed on pipe throughput at 4, 8, 16, 32,
64, 128, 256, 512, 1024, 2048 and 4096 byte message sizes running the
pipe wide open (100% cpu utilization). Correlations were usually 99.9%.
Slope is per-byte read/write delay, intercept is per message read/write
delay. The delay is y = mx + b, where x is the message size in bytes.
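
For anyone who wants to repeat the fit, here is a minimal sketch of the
least-squares line described above. It is not the perftest code. The
function name fit_line and the sample delays are made up for
illustration: the delays are generated from an arbitrary line (1000 ps
per byte, 5 us per message), not measured.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx                    # slope: per-byte delay
    b = mean_y - m * mean_x          # intercept: per-message delay
    r = sxy / (sxx * syy) ** 0.5     # correlation coefficient
    return m, b, r


# Message sizes used in the tests (bytes).
SIZES = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]

# Synthetic per-message delays in microseconds (NOT measured data):
# 1000 ps/byte is 1000e-6 us/byte, plus 5 us per message.
delays_us = [1000e-6 * x + 5.0 for x in SIZES]

m, b, r = fit_line(SIZES, delays_us)
print(f"slope = {m * 1e6:.0f} ps/byte, intercept = {b:.2f} us, r = {r:.3f}")
```

With real measurements, delays_us would hold the observed per-message
delay at each size; the slope and intercept then correspond to the
per-byte and per-message figures in the tables.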
Per-byte delay, slope (picoseconds):

            RH7.2      CL4      FC4
          -------  -------  -------
  LiS        1620      886      925
  LfS        1230      919      987
  Linux       760      750      482

Per-message delay, intercept (microseconds):

            RH7.2      CL4      FC4
          -------  -------  -------
  LiS       19.20     7.72     7.83
  LfS        6.54     3.57     4.02
  Linux      2.43     2.17     3.03

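
To play with these numbers, a small sketch that plugs the slopes and
intercepts from the tables above into y = mx + b and compares each
STREAMS implementation against native pipes at a given message size.
The names FIT, delay_us and relative_to_native are mine, chosen for
illustration. As it happens, the ratio of the intercepts alone (size 0)
lands close to the overall percentages quoted earlier; that is an
observation about the numbers, not a claim about how those percentages
were derived.

```python
# (slope in ps/byte, intercept in us/message), copied from the tables.
FIT = {
    "LiS":   {"RH7.2": (1620, 19.20), "CL4": (886, 7.72), "FC4": (925, 7.83)},
    "LfS":   {"RH7.2": (1230, 6.54),  "CL4": (919, 3.57), "FC4": (987, 4.02)},
    "Linux": {"RH7.2": (760, 2.43),   "CL4": (750, 2.17), "FC4": (482, 3.03)},
}


def delay_us(impl, system, size):
    """Modelled read/write delay in microseconds for one message."""
    slope_ps, intercept_us = FIT[impl][system]
    return slope_ps * 1e-6 * size + intercept_us  # 1 ps = 1e-6 us


def relative_to_native(impl, system, size):
    """Native delay divided by impl delay (1.0 means parity)."""
    return delay_us("Linux", system, size) / delay_us(impl, system, size)


for system in ("RH7.2", "CL4", "FC4"):
    for impl in ("LiS", "LfS"):
        at_0 = 100 * relative_to_native(impl, system, 0)
        at_4k = 100 * relative_to_native(impl, system, 4096)
        print(f"{system:6s} {impl}: {at_0:5.1f}% at 0 bytes, "
              f"{at_4k:5.1f}% at 4096 bytes")
```

The model is only as good as the fit, which was tested up to 4096
bytes, so sizes beyond that are extrapolation.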
--brian
Post by e***@netscape.net
Hello,
I'm curious about people's impressions of LiS-2.18 performance.
Is it better compared to LiS-2.16.18?
Are there any known 2.18 issues that can be fixed to improve
performance?
My understanding is that in LiS-2.18 most (all?) of the queue processing
is done by LiS kernel threads and queuerun is never executed from
the driver tasklet context. That may result, I guess, in excessive
process switching overhead and poorer performance.
I might be missing something, though.
The other thing I noticed when I ran my tests on a 4 processor system
root 9574 1 0 Dec01 ? 00:02:27 [LiS-2.18.0:0] <-------
root 9575 1 0 Dec01 ? 00:00:01 [LiS-2.18.0:1]
root 9576 1 0 Dec01 ? 00:00:00 [LiS-2.18.0:2]
root 9577 1 0 Dec01 ? 00:00:00 [LiS-2.18.0:3]
Is it the way it's supposed to be, or is it a bug?
I'd appreciate any comments/advice regarding performance issues on
LiS-2.18.
--
Eugene
--
Brian F. G. Bidulock ¦ The reasonable man adapts himself to the ¦
***@openss7.org ¦ world; the unreasonable one persists in ¦
http://www.openss7.org/ ¦ trying to adapt the world to himself. ¦
¦ Therefore all progress depends on the ¦
¦ unreasonable man. -- George Bernard Shaw ¦