Discussion:
[pmacct-discussion] pmacct performance
Anthony Caiafa
2017-11-16 18:16:48 UTC
Hi! So my use case may be slightly larger than most. I am processing 1:1
NetFlow data for a larger infrastructure. We are receiving about 55 million
messages a minute, which isn’t much, but pmacct does not seem to cope with
it well. I have pmacct scheduled with Nomad, running across a few
machines, with 2 designated ports accepting the flow traffic and outputting
it to Kafka.

About every 5m or so pmacct dies and restarts, basically dropping all
traffic for a short period of time. The two configuration settings I have
that do anything every 5 minutes are:

kafka_refresh_time[name]: 300
kafka_history[name]: 5m


So I am not sure whether it's one of these or not, since the logs only
indicate that it lost a connection to Kafka and that's about it.
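
For readers less familiar with the two directives: as far as I understand the pmacct documentation, kafka_history puts data into fixed time bins (5 minutes here) and kafka_refresh_time sets how often, in seconds, the plugin cache is written out to Kafka. The sketch below only illustrates the general idea of fixed time bins; it is not pmacct's code, and bin_start is a hypothetical helper.

```python
from datetime import datetime, timezone


def bin_start(ts: float, bin_seconds: int = 300) -> int:
    """Floor a Unix timestamp to the start of its fixed-width time bin."""
    return int(ts // bin_seconds) * bin_seconds


# Timestamp of the first post in this thread: 2017-11-16 18:16:48 UTC.
ts = datetime(2017, 11, 16, 18, 16, 48, tzinfo=timezone.utc).timestamp()
start = bin_start(ts)

# With 5-minute bins, a flow seen at 18:16:48 lands in the 18:15:00 bin.
print(datetime.fromtimestamp(start, tz=timezone.utc).isoformat())
```

With both settings at 5 minutes, the bin boundary and the write to Kafka coincide, which fits the observation that something happens every 5 minutes but does not by itself say which setting is involved.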
Paolo Lucente
2017-11-17 14:47:32 UTC
Hi Anthony,

I map the word 'message' to 'flow' and not to NetFlow packet, please
correct me if this assumption is wrong. 55m flows/min makes it roughly
1m flows/sec. I would not recommend stretching a single nfacctd daemon
beyond 200K flows/sec, and the beauty of NetFlow, being UDP, is
that it can be easily scaled horizontally. As a start (details and
complexity may vary from use case to use case), I would recommend
looking in the following direction: point all NetFlow to a single
IP/port where an nfacctd in replicator mode is listening. You should
test that it is able to absorb the full feed with your CPU resources.
Then you replicate parts of the full feed to nfacctd collectors
downstream, i.e. you can instantiate, with some headroom, around 6-8
nfacctd collectors. You can balance the incoming NetFlow packets using
round-robin, by assigning flow exporters to flow collectors, or with
some hashing. Here is how to start with it:

https://github.com/pmacct/pmacct/blob/master/QUICKSTART#L1384-L1445

Of course you can do the same with your load-balancer of preference.

Paolo
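
The sizing in the message above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in the thread (55 million flows per minute, 200K flows/sec per nfacctd daemon); collectors_needed is a hypothetical helper, not part of pmacct.

```python
import math


def collectors_needed(flows_per_min: float, per_daemon_fps: float = 200_000) -> int:
    """Minimum number of collectors so that none exceeds per_daemon_fps."""
    flows_per_sec = flows_per_min / 60
    return math.ceil(flows_per_sec / per_daemon_fps)


flows_per_sec = 55_000_000 / 60
print(round(flows_per_sec))            # 916667, i.e. roughly 1m flows/sec
print(collectors_needed(55_000_000))   # 5
```

Five collectors is the bare minimum at that guideline; the 6-8 suggested above adds headroom.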
_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
Anthony Caiafa
2017-11-17 15:41:40 UTC
Hi! So I have the load spread across 3 machines, with 2 ports per
box. The biggest thing in the NetFlow data for me is the ordering. I
guess where I am still curious is whether either of those settings could
be causing the complete drop in the service, where it stops and restarts
every 5 minutes on the dot. I am going to play around with the times
to see if it is one of those settings. I will eventually have to
increase this to about 2-4m flows per second, so maybe the replicator
is the best way forward.
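
On the ordering concern: of the balancing options mentioned earlier in the thread, hashing on the exporter keeps every packet from a given exporter on the same collector, whereas round-robin spreads them around. The sketch below illustrates that difference only; it is not how pmacct implements balancing, and the collector names, exporter addresses, and by_exporter_hash helper are all made up for the example.

```python
import zlib
from itertools import cycle

# Hypothetical downstream collectors.
COLLECTORS = [f"collector-{i}" for i in range(1, 7)]


def by_exporter_hash(exporter_ip: str, collectors=COLLECTORS) -> str:
    """Pick a collector from a stable hash of the exporter address."""
    return collectors[zlib.crc32(exporter_ip.encode()) % len(collectors)]


# Source addresses of four consecutive NetFlow packets (made-up values).
packets = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1"]

hashed = [by_exporter_hash(ip) for ip in packets]
rr = cycle(COLLECTORS)
round_robin = [next(rr) for _ in packets]

# Hashing: all packets from 10.0.0.1 reach one collector.
print(hashed[0] == hashed[2] == hashed[3])   # True
# Round-robin: the same exporter's packets are spread over collectors.
print(round_robin[0] != round_robin[2])      # True
```

At the 200K flows/sec per daemon guideline quoted earlier, 2-4m flows per second would mean on the order of 10-20 collectors, so the choice of balancing scheme matters more as the feed grows.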
Paolo Lucente
2017-11-18 14:25:05 UTC
Hi Anthony,

Keep me posted on the ordering part. Wrt the complete drop in the
service, as you described in your original email, I have too little info
to comment: let's say it should never happen, but I don't know at this
point whether it's a crash or a graceful shutdown with some message in
the logs. If you wish, we can take this further and you could start from
this section of the docs about suspected crashes:

https://github.com/pmacct/pmacct/blob/master/QUICKSTART#L1994-L2013

Any output from gdb and such, you can freely take it off list and
unicast to me directly. We can then summarise things back on list.

Paolo
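
One way to tell a crash from a graceful shutdown, before reaching for gdb, is to look at how the process ended. A minimal sketch, assuming the daemon is started in the foreground from a Python wrapper via subprocess (the thread uses Nomad, which reports exit status in its own way); describe_exit is a hypothetical helper. On POSIX, subprocess reports a negative return code -N when the child was terminated by signal N.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Classify a subprocess return code (POSIX convention)."""
    if returncode == 0:
        return "clean exit"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with error code {returncode}"


# Example wiring (not executed here; the config path is a placeholder):
#   rc = subprocess.run(["nfacctd", "-f", "/path/to/nfacctd.conf"]).returncode
#   print(describe_exit(rc))
print(describe_exit(-signal.SIGSEGV))   # killed by SIGSEGV
print(describe_exit(0))                 # clean exit
```

A SIGSEGV or SIGABRT points to a crash worth a backtrace; a SIGTERM or a zero exit suggests something asked the daemon to stop.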
Anthony Caiafa
2017-11-18 14:27:33 UTC
Sounds good. I’ll be sending out some data to you.
Anthony Caiafa
2017-11-21 19:34:33 UTC
Yep, so it looks like every time kafka_history runs, no matter what
interval you set it to, it crashes pmacct and the service restarts.
Paolo Lucente
2017-11-21 21:32:45 UTC
Hi Anthony,

What version are you using? You can confirm this with nfacctd -V.
Also, please post your full config. I use kafka_history myself in
production, so I lean towards: either you are running some code from
master that unluckily is not working, or some combination of directives
in your configuration is triggering a bug. In either case, what I
pointed out in my previous email would give a clue as to where the code
breaks, which could be useful info especially in the latter case.

Paolo