nanog mailing list archives

Re: DPDK and energy efficiency


From: Eric Kuhnke <eric.kuhnke () gmail com>
Date: Thu, 4 Mar 2021 16:26:45 -0800

A great deal of this discussion could be resolved by the use of a $20
in-line 120VAC watt meter [1] plugged into something as simple as a $500 1U
server with some of the DPDK-enabled network cards connected to its PCI-E
bus, running DANOS.

Characterizing the idle load, average usage load, and absolute maximum
wattage load of an x86-64 platform is not excessively difficult or complicated.
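
As a rough illustration of that characterization, readings from such a
meter could be logged by hand and summarized as below. This is a minimal
sketch: the function name, the sample values, and the choice of treating the
lowest reading as the idle figure are all assumptions, not measurements from
any real server.

```python
from statistics import mean

def summarize_watts(samples):
    """Summarize watt-meter readings (in watts) taken at regular intervals.

    Assumption: the lowest reading approximates idle load and the highest
    approximates the maximum load seen during the test.
    """
    if not samples:
        raise ValueError("need at least one reading")
    return {
        "idle_w": min(samples),
        "average_w": mean(samples),
        "max_w": max(samples),
    }

# Hypothetical readings, for illustration only.
readings = [48.0, 50.0, 75.0, 110.0, 52.0]
summary = summarize_watts(readings)
print(summary)
```

Running the same summary once with the DPDK application stopped and once with
it running but carrying no traffic would show directly whether idle polling
changes the wall-socket draw on a given box.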

[1]
https://www.homedepot.com/p/Kill-A-Watt-Electricity-Monitor-P4400/202196386


On Thu, Mar 4, 2021 at 11:28 AM Etienne-Victor Depasquale <edepa () ieee org>
wrote:

*TL;DR - DPDK applications embody the phrase caveat emptor.*

As Robert Bays put it:  "Please ask your open source dev and/or vendor of
choice to verify."
On the other hand, I do not recommend taking the following (citing Robert
Bays again) for granted:
"But the reality is [open source projects and commercial products] have
all been designed from day one not to unnecessarily consume power."

This note is presented in two sections.
Section 1 presents the preamble necessary to avoid misinformation.
Section 2 presents the survey.

If so inclined, please read on.

*SECTION 1*
There are three issues at stake:

1.  the ground truth about the power/energy efficiency of (current)
deployments that use DPDK,
2.  my choice of words for the first question, as this constitutes the
claimed source of misinformation, and
3.  apportionment of responsibility for the attained level of
power/energy efficiency of a deployment that uses DPDK.

*Issue #1: ground truth on current deployments*
I base this on (a) research papers and (b) Pawel Malachowski's data. Numbered
references are listed at the end of this e-mail.

[1] investigates software data planes, including OvS-DPDK. Citing directly:
"DPDK-OVS always works with high power consumption even when [there is] no
traffic to handle.
Considering the inefficiency [in] power, DPDK provides power management
APIs to compromise
between power consumption and performance."
"For DPDK-OVS, due to the feature of DPDK’s Polling Mode Driver (PMD),
once the first DPDK port is added to vswitchd process,
it creates a polling thread and polls DPDK device in continuous loop.
Therefore CPU utilization for that thread is always 100%,
and the power consumption rises to about 138 Watt"

[2] investigates multimedia content delivery and benchmarks *DPDK-OvS* in
the process. Citing directly:
"Even when no traffic was in transit,
OvS-DPDK consumed approximately three
times more energy than the other two data
planes, adding 250 percent energy overhead
(15.57 W) on top of the host OS."

[3] proposes the use of ACPI P-states and the halt instruction to control
power consumption,
in the context of *a bespoke application*. Citing directly:
"For example, a Xeon(R) E5-2620 v3 dual socket CPU consumes
about 22W of power when it is idle; but if a DPDK-based software
router runs on it, the CPU power soars to 83W even
when no packets arrive. That is a power gap of more than
60W."
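
To put the figures quoted from [3] in perspective, the gap can be worked
through as simple arithmetic. The sketch below assumes the difference is
sustained around the clock for a full year, which is an illustrative
assumption rather than a claim made in the paper.

```python
IDLE_W = 22.0              # idle CPU power quoted from [3]
POLLING_W = 83.0           # DPDK-based router running, no packets, from [3]
HOURS_PER_YEAR = 24 * 365  # assumption: continuous operation, non-leap year

gap_w = POLLING_W - IDLE_W
energy_kwh = gap_w * HOURS_PER_YEAR / 1000.0

print(f"gap: {gap_w:.0f} W, about {energy_kwh:.0f} kWh per year")
```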

[4] investigates the energy-efficient use of *Pktgen-DPDK*. Citing
directly:
"We find that high performance comes at the cost of high energy
consumption."

Pawel Malachowski shows a list of cores (13 out of 16) in use by a DPDK
application
("DPDK-based 100G DDoS scrubber currently lifting some low traffic using
cores 1-13
on 16 core host. It uses naive DPDK::rte_pause() throttling to enter C1").
The list shows the cores spending most of their time in C1.
This means that cores are in a low-power-idle state and therefore not in
an active (C0) state.
This shows a power-aware DPDK application.
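
On Linux, per-core idle-state residency of this kind can be read from the
cpuidle sysfs counters. The following is a minimal sketch and not necessarily
how the list above was produced: the helper names and the snapshot values are
hypothetical, and it assumes the standard
/sys/devices/system/cpu/cpuN/cpuidle/stateM/name and .../time files, where
time is the cumulative number of microseconds spent in that state.

```python
from pathlib import Path

def read_idle_times(cpu):
    """Return {state_name: cumulative_microseconds} for one CPU (Linux only)."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpuidle")
    times = {}
    for state in sorted(base.glob("state*")):
        name = (state / "name").read_text().strip()
        times[name] = int((state / "time").read_text())
    return times

def residency(before, after, interval_us):
    """Fraction of the interval spent in each idle state between two snapshots.

    Whatever is left over is treated as active (C0) time.
    """
    shares = {name: (after[name] - before[name]) / interval_us for name in after}
    shares["C0"] = max(0.0, 1.0 - sum(shares.values()))
    return shares

# Hypothetical snapshots taken one second (1_000_000 us) apart.
before = {"POLL": 0, "C1": 5_000_000}
after = {"POLL": 0, "C1": 5_900_000}
print(residency(before, after, 1_000_000))
```

On a real host, two calls to read_idle_times() separated by a known interval
would supply the before and after snapshots.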

*Issue #2: my choice of words, as a source of misinformation*
Issue has been taken with the text of question 1.
I addressed this to the NANOG community,
who are busy and knowledgeable.
I chose, *with hindsight wrongly*, to paraphrase,
with the expectation that a reader would interpret correctly.
A better expression, that would still have been terse, would be:
"Are you aware that *naïve* use of DPDK on a processor core keeps
utilization at 100% regardless of packet activity?"


*Issue #3: apportionment of responsibility for the attained level of
power/energy efficiency of a deployment that uses DPDK*
Pawel Malachowski states that "It consumes 100% only if you busy poll
(which is the default approach)."

Since it is the application that exploits the DPDK API,
and since the DPDK API promotes run-to-completion (
https://doc.dpdk.org/guides/prog_guide/poll_mode_drv.html),
then *it is the application that determines power consumption*
but it is DPDK's poll-mode driver *that poses a real threat to power
efficiency, if used in "the default approach".*

Robert Bays states:
"The vast majority of applications that this audience would actually
install in their networks do not do tight polling all the time and
therefore don’t consume 100% of the CPU all the time."
*Would this audience (an audience of network operators) truly not be
interested in using OvS-DPDK?*
*Caveat emptor.*

*SECTION 2: Survey results*
*Q1*
[image: image.png]
*Q2*
[image: image.png]



[1] Z. Xu, F. Liu, T. Wang, and H. Xu, “Demystifying the energy efficiency
of Network Function Virtualization,”
in 2016 IEEE/ACM 24th International Symposium on Quality of Service
(IWQoS), Jun. 2016, pp. 1–10.
DOI: 10.1109/IWQoS.2016.7590429.

[2] S. Fu, J. Liu, and W. Zhu, “Multimedia Content Delivery with Network
Function Virtualization: The Energy Perspective,”
IEEE MultiMedia, vol. 24, no. 3, pp. 38–47, 2017, ISSN: 1941-0166.
DOI: 10.1109/MMUL.2017.3051514.

[3] X. Li, W. Cheng, T. Zhang, F. Ren, and B. Yang, “Towards Power
Efficient High Performance Packet I/O,”
IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 4, pp.
981–996, April 2020,
ISSN: 1558-2183. DOI: 10.1109/TPDS.2019.2957746.

[4] G. Li, D. Zhang, Y. Li, and K. Li, “Toward energy efficiency
optimization of pktgen-DPDK for green network testbeds,”
China Communications, vol. 15, no. 11, pp. 199–207, November 2018,
ISSN: 1673-5447. DOI: 10.1109/CC.2018.8543100.


On Sat, Feb 27, 2021 at 5:11 PM Etienne-Victor Depasquale <edepa () ieee org>
wrote:

Just a quick note to say that I've closed the survey.

I haven't published the results yet as I said that I would write notes
necessary as a preamble to correctly inform potential readers,
and these notes are taking longer to write than I have time available.

Cheers,

Etienne

On Wed, Feb 24, 2021 at 7:07 PM Etienne-Victor Depasquale <edepa () ieee org>
wrote:

I think I need to calm this thread down.

I'm a researcher, and my interest is in the truth, not in my opinion.

I've read some facts in this thread that are necessary
as a prerequisite to the publication of the results on Friday.

I do want to ensure that no future reader is misinformed and will do my
best,
with the help of contributions from my peers in this good community,
to summarize all objections to this survey's questions,
in the same message as that which publishes the result.

All peace and good wishes,

Etienne

On Wed, Feb 24, 2021 at 4:35 PM Robert Bays <robert () gdk org> wrote:

To the nanog community, I’m sorry to have dragged this conversation out
further.  I'm only responding to this because there are a significant
number of open source projects and commercial products that use DPDK, or
similar userspace network environment in their implementations.  The
statements in this thread incorrectly cast them, because they use DPDK, as
inefficient.  But the reality is they have all been designed from day one
not to unnecessarily consume power.  Please ask your open source dev and/or
vendor of choice to verify.  But please don’t rely on the information in
this thread to make decisions about what you deploy in your network.

On Feb 23, 2021, at 11:44 PM, Etienne-Victor Depasquale <edepa () ieee org>
wrote:

Hello Robert,

Your statement that DPDK “keeps utilization at 100% regardless of
packet activity” is just not correct.  You further pre-suppose "widespread
DPDK's core operating inefficiency” without any data to back up the
operating inefficacy assertion.


This statement is incorrect.
I have provided references (please see earlier e-mails) that
investigate the operation of DPDK.
These references are items of peer-reviewed research that investigate a
perceived problem with deployment of DPDK.
If the power consumption incurred while running DPDK were a corner
case,
then there would be little to no research value in investigating such
behavior.


Your references don’t take into account the code that this community
would actually deploy; open source implementations like DANOS, FD.io,
or OVS.  They don’t audit any commercial products that implement userspace
stacks.  None of your references say that DPDK is inherently inefficient.
The closest they come is to say that tight polling is inefficient.  But
tight polling, even in the earliest days of DPDK, was never meant to be a
design pattern that was actually deployed into production.  I was there for
those early conversations.

Please don’t mislead the community into believing that DPDK == power bad

I have to object to this statement. It does seem to imply malice, or,
at best, amateurish behaviour, whether you intended it or not.


Object all you want.  You are misleading people with your comments.
And in the process you are denigrating a large swath of OSS projects and
commercial products that use DPDK.  Your survey questions are leading and
provide a false dichotomy.  And when you post the results here, they will
be archived forever to continue to spread misinformation, unfortunately.

Everything following is informational.  Stop here if so inclined.

Please stop delving into the detail of DPDK's facilities without
regard for your logical omission:
that whether the facilities are available or not, DPDK's deployment
profile (meaning: how it's being used in general), as indicated by the
references I've provided,
is leading to high power inefficiency on cores partitioned to the data
plane.


I’ve been writing network appliance code for over 20 years.  I designed
network architectures for years before that.  I have 10s of thousands of
DPDK based appliances in production at this moment across multiple
different use cases. I work with companies that have 100s of thousands of
units in production that leverage userspace runtimes.  I do think I
understand DPDK’s deployment profile better than you.  That’s what I have
been trying to tell you.  People don’t write inefficient DPDK code to put
into production.  We’re not dumb.  We’ve been thinking about power
consumption from day one.  DPDK was never supposed to be just a tight loop
poll.  You were always supposed to put in the very minimal extra work to
modulate power consumption.

The takeaway is that DPDK (and similar) doesn’t guarantee runaway power
bills.

Of course it doesn't.
Even the second question of that bare-bones survey tried to communicate
this much.

If you have questions, I’d be happy to discuss off line

I would be happy to answer your objections in detail off line too.
Just let me know.


Unfortunately, you don’t seem to be receptive to the numerous people
contradicting your assertions.  So I’m out.  I’ll let my comments stand
here.

Cheers,

Etienne


On Wed, Feb 24, 2021 at 12:12 AM Robert Bays <robert () gdk org> wrote:

Hi Etienne,

Your statement that DPDK “keeps utilization at 100% regardless of
packet activity” is just not correct.  You further pre-suppose "widespread
DPDK's core operating inefficiency” without any data to back up the
operating inefficacy assertion.  Your statements, taken at face value, lead
people to believe that if a project uses DPDK it’s going to increase their
power costs.  And that’s just not the case.  Please don’t mislead the
community into believing that DPDK == power bad.

Everything following is informational.  Stop here if so inclined.

DPDK does not dictate CPU utilization or power consumption, the
application leveraging DPDK does.  It’s the application that decides how to
poll packets.  If an application implements DPDK using only a tight polling
loop, then it will keep CPU cores that are running DPDK threads at 100%.
But only the most simple and/or bespoke (think trading) applications are
implemented this way.  You don’t need tight polling all the time to get the
performance gains provided by DPDK or similar environments.  The vast
majority of applications that this audience would actually install in their
networks do not do tight polling all the time and therefore don’t consume
100% of the CPU all the time.   An interesting, helpful research effort you
could lead would be to survey the ecosystem to catalog those applications
that do fall into the power hungry category and help them to change their
code.

Intel DPDK application development guidelines don’t pre-suppose tight
polling all the time and offer at least two methods for optimizing power
against throughput.  The older method is to use adaptive polling;
increasing the polling frequency as traffic load increases.  This keeps cpu
utilization low when packet load is light and increases it as traffic
levels warrant.  The second method is to use P-states and/or C-states to
put the processor into lower power modes when traffic loads are lighter.
We have found that adaptive polling works better across a larger pool of
hardware types, and therefore that is what DANOS uses, amongst other
things.
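
The adaptive polling idea described above can be modelled without any DPDK
code. The sketch below is a toy simulation in Python, not DANOS's algorithm
and not a DPDK API: the function name and the back-off constants are
assumptions. It doubles the sleep between polls whenever a poll returns
nothing, up to a cap, and resets to tight polling as soon as packets arrive.

```python
def adaptive_poll(arrivals, min_sleep_us=0, max_sleep_us=256):
    """Simulate an adaptive polling loop over a sequence of poll results.

    arrivals[i] is the number of packets found by the i-th poll.
    Returns the sleep (in microseconds) chosen after each poll.
    """
    sleep_us = min_sleep_us
    chosen = []
    for packets in arrivals:
        if packets > 0:
            # Traffic present: go back to tight polling.
            sleep_us = min_sleep_us
        else:
            # Idle: back off exponentially, up to the cap.
            sleep_us = min(max(1, sleep_us * 2), max_sleep_us)
        chosen.append(sleep_us)
    return chosen

# Three busy polls, four empty polls, then traffic returns.
print(adaptive_poll([4, 2, 1, 0, 0, 0, 0, 3]))
```

The cap is the tuning knob in this model: a larger cap lets an idle core sleep
longer, at the price of added latency for the first packet after a quiet
period.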

Further, performance and power consumption are dictated by a
multivariate set of application decisions including: design patterns such
as single thread run to completion models vs. passing mbufs between
multiple threads, buffer sizes and cache management algorithms, combining
and/or separating tx/rx threads, binding threads to specific lcores,
reserved cores for DPDK threads, hyperthreading, kernel schedulers,
hypervisor schedulers, interface drivers, etc.  All of these are
application specific, not DPDK generic.  Well written applications that
leverage DPDK provide knobs for the user to tune these settings for their
specific environment and use case.  None of this is unique to DPDK.  Solution
designs were cribbed from previous technologies.

The takeaway is that DPDK (and similar) doesn’t guarantee runaway
power bills.  Power consumption is dictated by the application.  Look for
well behaved applications and everything will be alright.

If you have questions, I’d be happy to discuss off line.

Thanks,
Robert.


On Feb 22, 2021, at 11:27 PM, Etienne-Victor Depasquale <
edepa () ieee org> wrote:

Sorry, last line should have been:
"intended to get an impression of how widespread ***knowledge of***
DPDK's core operating inefficiency is",
not:
"intended to get an impression of how widespread DPDK's core
operating inefficiency is"

On Tue, Feb 23, 2021 at 8:22 AM Etienne-Victor Depasquale <
edepa () ieee org> wrote:
Beyond RX/TX CPU affinity, in DANOS you can further tune power
consumption by changing the adaptive polling rate.  It doesn’t, per the
survey, "keep utilization at 100% regardless of packet activity.”

Robert, you seem to be conflating DPDK
with DANOS' power control algorithms that modulate DPDK's default
behaviour.

Let me know what you think; otherwise, I'm pretty confident that
DPDK does:
"keep utilization at 100% regardless of packet activity.”

Keep in mind that this is a bare-bones survey intended for busy,
knowledgeable people (the ones you'd find on NANOG) -
not a detailed breakdown of modes of operation of DPDK or DANOS.
DPDK has been designed for fast I/O that's unencumbered by the
trappings of general-purpose OSes,
and that's the impression that needs to be forefront.
Power control, as well as any other dimensions of modulation,
are detailed modes of operation that are well beyond the scope of a
bare-bones 2-question survey
intended to get an impression of how widespread DPDK's core
operating inefficiency is.

Cheers,

Etienne

On Mon, Feb 22, 2021 at 10:20 PM Robert Bays <robert () gdk org> wrote:
Beyond RX/TX CPU affinity, in DANOS you can further tune power
consumption by changing the adaptive polling rate.  It doesn’t, per the
survey, "keep utilization at 100% regardless of packet activity.”  Adaptive
polling changes in DPDK optimize for tradeoffs between power consumption,
latency/jitter and drops during throughput ramp up periods.  Ideally your
DPDK implementation has an algorithm that tries to automatically optimize
based on current traffic patterns.

In DANOS refer to the “system default dataplane power-profile”
config command tree for adaptive polling settings.  Interface RX/TX
affinity is configured on a per interface basis under the “interfaces
dataplane” config command tree.

-robert


On Feb 22, 2021, at 11:46 AM, Jared Geiger <jared () compuwizz net>
wrote:

DANOS lets you specify how many dataplane cores you use versus
control plane cores. So if you put a 16 core host in to handle 2GB of
traffic, you can adjust the dataplane worker cores as needed. Control plane
cores don't stay at 100% utilization.

I use that technique, and DANOS runs on VMware (not
oversubscribed), which allows me to use the hardware for other VMs. NICs are
attached to the VM via PCI passthrough, which helps eliminate the overhead
of the VMware hypervisor itself.

I have an 8 core VM with 4 cores set to dataplane and 4 to control
plane. The 4 control plane cores are typically idle, only processing BGP
route updates, SNMP, logs, etc.

~Jared

On Sun, Feb 21, 2021 at 11:30 PM Etienne-Victor Depasquale <
edepa () ieee org> wrote:
Hello folks,

I've just followed a thread regarding use of CGNAT and noted a
suggestion (regarding DANOS) that includes use of DPDK.

As I'm interested in the breadth of adoption of DPDK, and as I'm a
researcher into energy and power efficiency, I'd love to hear your feedback
on your use of power consumption control by DPDK.

I've drawn up a bare-bones, 2-question survey at this link:

https://www.surveymonkey.com/r/J886DPY.

Responses have been set to anonymous.

Cheers,

Etienne

--
Ing. Etienne-Victor Depasquale
Assistant Lecturer
Department of Communications & Computer Engineering
Faculty of Information & Communication Technology
University of Malta
Web. https://www.um.edu.mt/profile/etiennedepasquale


