[pmacct-discussion] pmacct performance

Hi Anthony,

I map the word 'message' to 'flow' and not to NetFlow packet; please
correct me if this assumption is wrong. 55m flows/min makes it roughly
1m flows/sec. I would not recommend stretching a single nfacctd daemon
beyond 200K flows/sec, and the beauty of NetFlow, being UDP, is
that it can be easily scaled horizontally. As a start, since details and
complexity may vary from use-case to use-case, I would recommend
looking in the following direction: point all NetFlow to a single IP/port
where an nfacctd in replicator mode is listening. You should test
that it is able to absorb the full feed on your CPU resources. Then you
replicate parts of the full feed to nfacctd collectors downstream, i.e.
with some headroom you can instantiate around 6-8 nfacctd collectors.
You can balance the incoming NetFlow packets using round-robin, by
assigning flow exporters to flow collectors, or with some hashing. Here
is how to start with it:

https://github.com/pmacct/pmacct/blob/master/QUICKSTART#L1384-L1445

Of course you can do the same with your load-balancer of preference.

Paolo
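
The sizing arithmetic in the message above can be checked with a minimal sketch. The 55 million flows/min and the 200K flows/sec per-daemon ceiling come from the thread; the `headroom` parameter and the function name are illustrative assumptions, not pmacct settings.

```python
import math

FLOWS_PER_MIN = 55_000_000   # "55 million messages a minute" from the original post
PER_DAEMON_MAX = 200_000     # flows/sec ceiling per nfacctd suggested above


def collectors_needed(flows_per_sec, per_daemon_max=PER_DAEMON_MAX, headroom=0.0):
    """Return how many collectors keep each one below per_daemon_max.

    headroom is the fraction of per-daemon capacity left unused
    (an illustrative knob, not a pmacct setting).
    """
    usable = per_daemon_max * (1.0 - headroom)
    return math.ceil(flows_per_sec / usable)


flows_per_sec = FLOWS_PER_MIN / 60
print(f"{flows_per_sec:,.0f} flows/sec")               # 916,667 flows/sec
print(collectors_needed(flows_per_sec))                # 5 with no headroom
print(collectors_needed(flows_per_sec, headroom=0.3))  # 7 with 30% headroom
```

With no headroom the feed needs at least 5 collectors; leaving 30% spare capacity per daemon gives 7, which sits inside the 6-8 range suggested above.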

Post by Anthony Caiafa
Hi! So my use case may be slightly larger than most. I am processing 1:1
NetFlow data for a larger infrastructure. We are receiving about 55 million
messages a minute, which isn’t much, but pmacct does not seem to like
it so much. I have pmacct scheduled with Nomad running across a few
machines, with 2 designated ports accepting the flow traffic and outputting
it to Kafka.
About every 5m or so pmacct dies and restarts, basically dropping all
traffic for a short period of time. The two settings I have that may be
related are:
kafka_refresh_time[name]: 300
kafka_history[name]: 5m
So I am not sure whether it is one of these or not, since the logs only
indicate that it lost a connection to Kafka and that's about it.
_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists

Anthony Caiafa

2017-11-17 15:41:40 UTC

Hi! So I have the load spread between 3 machines and 2 ports per
box. The biggest thing in the NetFlow data for me is the ordering. I
guess where I am still curious is whether either of those settings would
be causing the complete drop in the service, where it starts and stops
every 5 minutes on the dot. I am going to play around with the times
on it to see if it is one of those settings. I will eventually have to
increase this to about 2-4m flows per second, so maybe the replicator
is the best way forward.
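
As a rough sketch of what the 2-4m flows/sec target implies, the snippet below applies the 200K flows/sec per-daemon ceiling mentioned earlier in the thread, and shows the hashing option in its simplest form: a stable hash of the exporter address, so that all packets from one exporter land on the same collector. The collector endpoints, exporter addresses and the `collector_for` helper are hypothetical, and whether per-exporter stickiness addresses the ordering concern depends on which ordering is meant.

```python
import math
import zlib

PER_DAEMON_MAX = 200_000  # flows/sec ceiling per nfacctd from earlier in the thread

# Collector counts for the 2-4m flows/sec growth target, before any headroom.
low = math.ceil(2_000_000 / PER_DAEMON_MAX)
high = math.ceil(4_000_000 / PER_DAEMON_MAX)
print(low, high)  # 10 20


def collector_for(exporter_ip, collectors):
    """Map an exporter to one collector using a stable hash of its address."""
    return collectors[zlib.crc32(exporter_ip.encode("ascii")) % len(collectors)]


# Hypothetical collector endpoints and exporter addresses, for illustration only.
collectors = [f"10.0.0.{i}:2100" for i in range(1, low + 1)]
for exporter in ("192.0.2.1", "192.0.2.2", "192.0.2.1"):
    # The same exporter always maps to the same collector.
    print(exporter, "->", collector_for(exporter, collectors))
```

Round-robin spreads load more evenly but scatters one exporter's packets across collectors; hashing on the exporter trades some balance for stickiness.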


Paolo Lucente

2017-11-18 14:25:05 UTC

Hi Anthony,

Keep me posted on the ordering part. Wrt the complete drop in the
service, as you described in your original email, I have too little info
to comment: let's say it should never happen, but I don't know at this
point if it's a crash or a graceful shutdown with some message in the
logs. If you wish, we can take this further, and you could start from
this section of the docs about suspected crashes:

https://github.com/pmacct/pmacct/blob/master/QUICKSTART#L1994-L2013

Any output from gdb and such, you can freely take it off list and
unicast to me directly. We can then summarise things back on list.

Paolo


Anthony Caiafa

2017-11-18 14:27:33 UTC

Sounds good. I’ll be sending out some data to you.


Anthony Caiafa

2017-11-21 19:34:33 UTC

Yep, so it looks like every time kafka_history runs, no matter what
interval you put it on, it crashes pmacct and restarts the service.


Paolo Lucente

2017-11-21 21:32:45 UTC