Gianluca Marcari
2005-09-02 10:34:36 UTC
Hello,
We deployed a 14-host parallel postfix+amavis installation and are
having performance definitely not in line with what we were expecting
(and actually getting from a previos lower-scale installation).
Scenario:
Load balancer publishes a virtual IP which spreads the load on the 14
content filtering servers (plus the 5 non-filtering servers which were
previously handling the load, we had to bring them back into the group
because the 14 servers can't cope with the traffic)
The boxes have a postfix instance which performs some
RBL+header+black/whitelist checks and then forwards to the local
amavisd-new instance listening on the usual port 10024
Amavisd does an LDAP query against our centralized ldap, then performs
SpamAssassin + ClamAV checks
Viruses get destroyed
Spam gets destroyed unless the recipient is an AmavisVirusLover
(400.000 users out of 6 millions in the LDAP)
There is no quarantine at all, for anything (virus/spam/badheader/etc)
Everything not destroyed (ham, spam to AmavisVirusLovers) is sent to
the original 5 SMTP servers which handle it with no apparent proble
(If we shunt all the traffic to the 5 original SMTP servers,
everything flows smoothly).
Boxes are 2x2.66Ghz HT Xeons, 1GB Ram, 2 Ultra-320 HDDs in hardware
RAID1, Linux 2.6.12.3, amavis has its $TEMPBASE on a ramdrive
(/dev/shm).
The systems are processing between 7 and 14 emails per second, while
receiving up to 40, thus falling behind constantly.
Other relevant data:
Perl v5.8.5
Logging via syslog
Filesystems are reiserfs v3
SpamAssassin version 3.0.4
Relevant data in /etc/amavisd:conf:
$max_servers = 10;
$max_requests = 10;
$enable_ldap = 1;
Even though there is only 1 (mirrored) disk, we are seeing almost no
iowait (2% max, often 0%). System load is split between 74%usr, 22%
sys and 3% idle (fairly constant on most boxes)
From our logs, many mails get processed and handed over to the backend
in 300-500 milliseconds, but some are in the 5000-150000 milliseconds
range:
Sep 1 08:35:12 asav1 amavis[1224]: (01224-02-3) TIMING [total 7626 ms]
Sep 1 08:35:12 asav1 amavis[1299]: (01299-02-4) TIMING [total 5853 ms]
Sep 1 08:35:13 asav1 amavis[1720]: (01720-02-4) TIMING [total 5570 ms]
Sep 1 08:35:13 asav1 amavis[1592]: (01592-01-10) TIMING [total 417 ms]
Sep 1 08:35:13 asav1 amavis[1926]: (01926-01-4) TIMING [total 5485 ms]
Sep 1 08:35:13 asav1 amavis[1566]: (01566-01-10) TIMING [total 5504 ms]
Sep 1 08:35:13 asav1 amavis[1777]: (01777-01-4) TIMING [total 5550 ms]
Sep 1 08:35:13 asav1 amavis[1224]: (01224-02-4) TIMING [total 1050 ms]
Sep 1 08:35:13 asav1 amavis[1200]: (01200-02-10) TIMING [total 5186 ms]
Sep 1 08:35:14 asav1 amavis[1566]: (01566-01-11) TIMING [total 610 ms]
Sep 1 08:35:14 asav1 amavis[1224]: (01224-02-5) TIMING [total 811 ms]
Sep 1 08:35:14 asav1 amavis[1200]: (01200-02-11) TIMING [total 677 ms]
Sep 1 08:35:14 asav1 amavis[1720]: (01720-02-5) TIMING [total 1829 ms]
Sep 1 08:35:14 asav1 amavis[1299]: (01299-02-5) TIMING [total 2206 ms]
Sep 1 08:35:15 asav1 amavis[1921]: (01921-01-10) TIMING [total 5537 ms]
Sep 1 08:35:15 asav1 amavis[1394]: (01394-02-4) TIMING [total 5678 ms]
Sep 1 08:35:16 asav1 amavis[1720]: (01720-02-6) TIMING [total 1436 ms]
Sep 1 08:35:16 asav1 amavis[1921]: (01921-01-11) TIMING [total 1265 ms]
Sep 1 08:35:16 asav1 amavis[1969]: (01969-01-7) TIMING [total 6756 ms]
Sep 1 08:35:16 asav1 amavis[1566]: (01566-02) TIMING [total 2813 ms]
Sep 1 08:35:17 asav1 amavis[2093]: (02093-01) TIMING [total 5355 ms]
Sep 1 08:35:17 asav1 amavis[1885]: (01885-01-11) TIMING [total 5401 ms]
Sep 1 08:35:17 asav1 amavis[1238]: (01238-02-5) TIMING [total 5412 ms]
Sep 1 08:35:17 asav1 amavis[2093]: (02093-01-2) TIMING [total 481 ms]
Sep 1 08:35:18 asav1 amavis[1969]: (01969-01-8) TIMING [total 1298 ms]
Sep 1 08:35:18 asav1 amavis[2149]: (02149-01) TIMING [total 2717 ms]
Sep 1 08:35:18 asav1 amavis[1238]: (01238-02-6) TIMING [total 1021 ms]
Sep 1 08:35:18 asav1 amavis[1885]: (01885-02) TIMING [total 1090 ms]
Sep 1 08:35:19 asav1 amavis[1885]: (01885-02-2) TIMING [total 813 ms]
Sep 1 08:35:19 asav1 amavis[1720]: (01720-02-7) TIMING [total 3098 ms]
Sep 1 08:35:19 asav1 amavis[1969]: (01969-01-9) TIMING [total 1437 ms]
Sep 1 08:35:19 asav1 amavis[1238]: (01238-02-7) TIMING [total 1184 ms]
Sep 1 08:35:19 asav1 amavis[1777]: (01777-01-5) TIMING [total 6222 ms]
Sep 1 08:35:19 asav1 amavis[1299]: (01299-02-6) TIMING [total 4772 ms]
Sep 1 08:35:19 asav1 amavis[1592]: (01592-01-11) TIMING [total 6732 ms]
Sep 1 08:35:19 asav1 amavis[1969]: (01969-01-10) TIMING [total 279 ms]
Mail which have long processing time exhibit an odd breakdown,
spending a huge chunk of that time in fwd-rcpt-to and sometimes
best_try_originator
Sep 1 08:37:16 asav1 amavis[3682]: (03682-02-4) TIMING [total 5382
ms] - lookup_ldap: 16 (0%)0, SMTP pre-DATA-flush: 1 (0%)0, SMTP DATA:
66 (1%)2, body_digest: 1 (0%)2, gen_mail_id: 0 (0%)2, mime_decode: 33
(1%)2, get-file-type4: 16 (0%)2, decompose_part: 1 (0%)3,
decompose_part: 3 (0%)3, get-file-type1: 14 (0%)3, decompose_part: 57
(1%)4, parts_decode: 0 (0%)4, AV-scan-1: 10 (0%)4,
read_snmp_variables: 1 (0%)4, best_try_originator: 7 (0%)4,
update_cache: 1 (0%)4, fwd-connect: 24 (0%)5, fwd-mail-from: 2 (0%)5,
fwd-rcpt-to: 5094 (95%)99, fwd-rundown: 3 (0%)99, deal_with_mail_size:
1 (0%)99, main_log_entry: 27 (1%)100, update_snmp: 2 (0%)100,
unlink-4-files: 1 (0%)100, rundown: 0 (0%)100
Sep 1 08:37:16 asav1 amavis[4051]: (04051-01-5) TIMING [total 5437
ms] - lookup_ldap: 12 (0%)0, SMTP pre-DATA-flush: 1 (0%)0, SMTP DATA:
69 (1%)2, body_digest: 1 (0%)2, gen_mail_id: 0 (0%)2, mime_decode: 17
(0%)2, get-file-type2: 14 (0%)2, decompose_part: 1 (0%)2,
decompose_part: 51 (1%)3, parts_decode: 0 (0%)3, best_try_originator:
9 (0%)3, update_cache: 35 (1%)4, fwd-connect: 22 (0%)4, fwd-mail-from:
1 (0%)4, fwd-rcpt-to: 5158 (95%)99, fwd-rundown: 4 (0%)99,
deal_with_mail_size: 1 (0%)99, main_log_entry: 34 (1%)100,
update_snmp: 3 (0%)100, unlink-2-files: 2 (0%)100, rundown: 1 (0%)100
Sep 1 08:37:16 asav1 amavis[3425]: (03425-01-9) TIMING [total 5407
ms] - lookup_ldap: 33 (1%)1, SMTP pre-DATA-flush: 2 (0%)1, SMTP DATA:
12 (0%)1, body_digest: 1 (0%)1, gen_mail_id: 0 (0%)1, mime_decode: 70
(1%)2, get-file-type2: 30 (1%)3, decompose_part: 1 (0%)3,
decompose_part: 43 (1%)4, parts_decode: 0 (0%)4, AV-scan-1: 9 (0%)4,
read_snmp_variables: 1 (0%)4, best_try_originator: 21 (0%)4,
update_cache: 1 (0%)4, fwd-connect: 15 (0%)4, fwd-mail-from: 2 (0%)4,
fwd-rcpt-to: 5135 (95%)99, fwd-rundown: 4 (0%)100,
deal_with_mail_size: 1 (0%)100, main_log_entry: 21 (0%)100,
update_snmp: 2 (0%)100, unlink-2-files: 2 (0%)100, rundown: 1 (0%)100
Sep 1 08:37:19 asav1 amavis[3580]: (03580-02-3) TIMING [total 5032
ms] - lookup_ldap: 7 (0%)0, lookup_ldap: 5 (0%)0, lookup_ldap: 5
(0%)0, lookup_ldap: 6 (0%)0, lookup_ldap: 6 (0%)1, lookup_ldap: 6
(0%)1, lookup_ldap: 13 (0%)1, lookup_ldap: 17 (0%)1, lookup_ldap: 7
(0%)1, lookup_ldap: 36 (1%)2, lookup_ldap: 109 (2%)4, lookup_ldap: 100
(2%)6, lookup_ldap: 57 (1%)7, lookup_ldap: 46 (1%)8, lookup_ldap: 38
(1%)9, lookup_ldap: 60 (1%)10, lookup_ldap: 23 (0%)11, lookup_ldap: 45
(1%)12, lookup_ldap: 23 (0%)12, lookup_ldap: 9 (0%)12, lookup_ldap: 11
(0%)13, lookup_ldap: 14 (0%)13, lookup_ldap: 7 (0%)13, lookup_ldap: 8
(0%)13, lookup_ldap: 8 (0%)13, lookup_ldap: 7 (0%)13, lookup_ldap: 8
(0%)14, lookup_ldap: 17 (0%)14, lookup_ldap: 21 (0%)14, lookup_ldap:
16 (0%)15, SMTP pre-DATA-flush: 6 (0%)15, SMTP DATA: 42 (1%)16,
body_digest: 1 (0%)16, gen_mail_id: 1 (0%)16, mime_decode: 28 (1%)16,
get-file-type2: 14 (0%)16, parts_decode: 0 (0%)16, AV-scan-1: 52
(1%)18, spam-wb-list: 121 (2%)20, SA msg read: 5 (0%)20, ...
Sorry for the big mail, I tried to strike a balance between providing
enough useful information to troubleshoot the issue and flooding the
list.
Many thanks for any help you may provide!
Ciao,
Gianluca Marcari
-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
AMaViS-user mailing list
AMaViS-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/
We deployed a 14-host parallel postfix+amavis installation and are
having performance definitely not in line with what we were expecting
(and actually getting from a previos lower-scale installation).
Scenario:
Load balancer publishes a virtual IP which spreads the load on the 14
content filtering servers (plus the 5 non-filtering servers which were
previously handling the load, we had to bring them back into the group
because the 14 servers can't cope with the traffic)
The boxes have a postfix instance which performs some
RBL+header+black/whitelist checks and then forwards to the local
amavisd-new instance listening on the usual port 10024
Amavisd does an LDAP query against our centralized ldap, then performs
SpamAssassin + ClamAV checks
Viruses get destroyed
Spam gets destroyed unless the recipient is an AmavisVirusLover
(400.000 users out of 6 millions in the LDAP)
There is no quarantine at all, for anything (virus/spam/badheader/etc)
Everything not destroyed (ham, spam to AmavisVirusLovers) is sent to
the original 5 SMTP servers which handle it with no apparent proble
(If we shunt all the traffic to the 5 original SMTP servers,
everything flows smoothly).
Boxes are 2x2.66Ghz HT Xeons, 1GB Ram, 2 Ultra-320 HDDs in hardware
RAID1, Linux 2.6.12.3, amavis has its $TEMPBASE on a ramdrive
(/dev/shm).
The systems are processing between 7 and 14 emails per second, while
receiving up to 40, thus falling behind constantly.
Other relevant data:
Perl v5.8.5
Logging via syslog
Filesystems are reiserfs v3
SpamAssassin version 3.0.4
Relevant data in /etc/amavisd:conf:
$max_servers = 10;
$max_requests = 10;
$enable_ldap = 1;
Even though there is only 1 (mirrored) disk, we are seeing almost no
iowait (2% max, often 0%). System load is split between 74%usr, 22%
sys and 3% idle (fairly constant on most boxes)
From our logs, many mails get processed and handed over to the backend
in 300-500 milliseconds, but some are in the 5000-150000 milliseconds
range:
Sep 1 08:35:12 asav1 amavis[1224]: (01224-02-3) TIMING [total 7626 ms]
Sep 1 08:35:12 asav1 amavis[1299]: (01299-02-4) TIMING [total 5853 ms]
Sep 1 08:35:13 asav1 amavis[1720]: (01720-02-4) TIMING [total 5570 ms]
Sep 1 08:35:13 asav1 amavis[1592]: (01592-01-10) TIMING [total 417 ms]
Sep 1 08:35:13 asav1 amavis[1926]: (01926-01-4) TIMING [total 5485 ms]
Sep 1 08:35:13 asav1 amavis[1566]: (01566-01-10) TIMING [total 5504 ms]
Sep 1 08:35:13 asav1 amavis[1777]: (01777-01-4) TIMING [total 5550 ms]
Sep 1 08:35:13 asav1 amavis[1224]: (01224-02-4) TIMING [total 1050 ms]
Sep 1 08:35:13 asav1 amavis[1200]: (01200-02-10) TIMING [total 5186 ms]
Sep 1 08:35:14 asav1 amavis[1566]: (01566-01-11) TIMING [total 610 ms]
Sep 1 08:35:14 asav1 amavis[1224]: (01224-02-5) TIMING [total 811 ms]
Sep 1 08:35:14 asav1 amavis[1200]: (01200-02-11) TIMING [total 677 ms]
Sep 1 08:35:14 asav1 amavis[1720]: (01720-02-5) TIMING [total 1829 ms]
Sep 1 08:35:14 asav1 amavis[1299]: (01299-02-5) TIMING [total 2206 ms]
Sep 1 08:35:15 asav1 amavis[1921]: (01921-01-10) TIMING [total 5537 ms]
Sep 1 08:35:15 asav1 amavis[1394]: (01394-02-4) TIMING [total 5678 ms]
Sep 1 08:35:16 asav1 amavis[1720]: (01720-02-6) TIMING [total 1436 ms]
Sep 1 08:35:16 asav1 amavis[1921]: (01921-01-11) TIMING [total 1265 ms]
Sep 1 08:35:16 asav1 amavis[1969]: (01969-01-7) TIMING [total 6756 ms]
Sep 1 08:35:16 asav1 amavis[1566]: (01566-02) TIMING [total 2813 ms]
Sep 1 08:35:17 asav1 amavis[2093]: (02093-01) TIMING [total 5355 ms]
Sep 1 08:35:17 asav1 amavis[1885]: (01885-01-11) TIMING [total 5401 ms]
Sep 1 08:35:17 asav1 amavis[1238]: (01238-02-5) TIMING [total 5412 ms]
Sep 1 08:35:17 asav1 amavis[2093]: (02093-01-2) TIMING [total 481 ms]
Sep 1 08:35:18 asav1 amavis[1969]: (01969-01-8) TIMING [total 1298 ms]
Sep 1 08:35:18 asav1 amavis[2149]: (02149-01) TIMING [total 2717 ms]
Sep 1 08:35:18 asav1 amavis[1238]: (01238-02-6) TIMING [total 1021 ms]
Sep 1 08:35:18 asav1 amavis[1885]: (01885-02) TIMING [total 1090 ms]
Sep 1 08:35:19 asav1 amavis[1885]: (01885-02-2) TIMING [total 813 ms]
Sep 1 08:35:19 asav1 amavis[1720]: (01720-02-7) TIMING [total 3098 ms]
Sep 1 08:35:19 asav1 amavis[1969]: (01969-01-9) TIMING [total 1437 ms]
Sep 1 08:35:19 asav1 amavis[1238]: (01238-02-7) TIMING [total 1184 ms]
Sep 1 08:35:19 asav1 amavis[1777]: (01777-01-5) TIMING [total 6222 ms]
Sep 1 08:35:19 asav1 amavis[1299]: (01299-02-6) TIMING [total 4772 ms]
Sep 1 08:35:19 asav1 amavis[1592]: (01592-01-11) TIMING [total 6732 ms]
Sep 1 08:35:19 asav1 amavis[1969]: (01969-01-10) TIMING [total 279 ms]
Mail which have long processing time exhibit an odd breakdown,
spending a huge chunk of that time in fwd-rcpt-to and sometimes
best_try_originator
Sep 1 08:37:16 asav1 amavis[3682]: (03682-02-4) TIMING [total 5382
ms] - lookup_ldap: 16 (0%)0, SMTP pre-DATA-flush: 1 (0%)0, SMTP DATA:
66 (1%)2, body_digest: 1 (0%)2, gen_mail_id: 0 (0%)2, mime_decode: 33
(1%)2, get-file-type4: 16 (0%)2, decompose_part: 1 (0%)3,
decompose_part: 3 (0%)3, get-file-type1: 14 (0%)3, decompose_part: 57
(1%)4, parts_decode: 0 (0%)4, AV-scan-1: 10 (0%)4,
read_snmp_variables: 1 (0%)4, best_try_originator: 7 (0%)4,
update_cache: 1 (0%)4, fwd-connect: 24 (0%)5, fwd-mail-from: 2 (0%)5,
fwd-rcpt-to: 5094 (95%)99, fwd-rundown: 3 (0%)99, deal_with_mail_size:
1 (0%)99, main_log_entry: 27 (1%)100, update_snmp: 2 (0%)100,
unlink-4-files: 1 (0%)100, rundown: 0 (0%)100
Sep 1 08:37:16 asav1 amavis[4051]: (04051-01-5) TIMING [total 5437
ms] - lookup_ldap: 12 (0%)0, SMTP pre-DATA-flush: 1 (0%)0, SMTP DATA:
69 (1%)2, body_digest: 1 (0%)2, gen_mail_id: 0 (0%)2, mime_decode: 17
(0%)2, get-file-type2: 14 (0%)2, decompose_part: 1 (0%)2,
decompose_part: 51 (1%)3, parts_decode: 0 (0%)3, best_try_originator:
9 (0%)3, update_cache: 35 (1%)4, fwd-connect: 22 (0%)4, fwd-mail-from:
1 (0%)4, fwd-rcpt-to: 5158 (95%)99, fwd-rundown: 4 (0%)99,
deal_with_mail_size: 1 (0%)99, main_log_entry: 34 (1%)100,
update_snmp: 3 (0%)100, unlink-2-files: 2 (0%)100, rundown: 1 (0%)100
Sep 1 08:37:16 asav1 amavis[3425]: (03425-01-9) TIMING [total 5407
ms] - lookup_ldap: 33 (1%)1, SMTP pre-DATA-flush: 2 (0%)1, SMTP DATA:
12 (0%)1, body_digest: 1 (0%)1, gen_mail_id: 0 (0%)1, mime_decode: 70
(1%)2, get-file-type2: 30 (1%)3, decompose_part: 1 (0%)3,
decompose_part: 43 (1%)4, parts_decode: 0 (0%)4, AV-scan-1: 9 (0%)4,
read_snmp_variables: 1 (0%)4, best_try_originator: 21 (0%)4,
update_cache: 1 (0%)4, fwd-connect: 15 (0%)4, fwd-mail-from: 2 (0%)4,
fwd-rcpt-to: 5135 (95%)99, fwd-rundown: 4 (0%)100,
deal_with_mail_size: 1 (0%)100, main_log_entry: 21 (0%)100,
update_snmp: 2 (0%)100, unlink-2-files: 2 (0%)100, rundown: 1 (0%)100
Sep 1 08:37:19 asav1 amavis[3580]: (03580-02-3) TIMING [total 5032
ms] - lookup_ldap: 7 (0%)0, lookup_ldap: 5 (0%)0, lookup_ldap: 5
(0%)0, lookup_ldap: 6 (0%)0, lookup_ldap: 6 (0%)1, lookup_ldap: 6
(0%)1, lookup_ldap: 13 (0%)1, lookup_ldap: 17 (0%)1, lookup_ldap: 7
(0%)1, lookup_ldap: 36 (1%)2, lookup_ldap: 109 (2%)4, lookup_ldap: 100
(2%)6, lookup_ldap: 57 (1%)7, lookup_ldap: 46 (1%)8, lookup_ldap: 38
(1%)9, lookup_ldap: 60 (1%)10, lookup_ldap: 23 (0%)11, lookup_ldap: 45
(1%)12, lookup_ldap: 23 (0%)12, lookup_ldap: 9 (0%)12, lookup_ldap: 11
(0%)13, lookup_ldap: 14 (0%)13, lookup_ldap: 7 (0%)13, lookup_ldap: 8
(0%)13, lookup_ldap: 8 (0%)13, lookup_ldap: 7 (0%)13, lookup_ldap: 8
(0%)14, lookup_ldap: 17 (0%)14, lookup_ldap: 21 (0%)14, lookup_ldap:
16 (0%)15, SMTP pre-DATA-flush: 6 (0%)15, SMTP DATA: 42 (1%)16,
body_digest: 1 (0%)16, gen_mail_id: 1 (0%)16, mime_decode: 28 (1%)16,
get-file-type2: 14 (0%)16, parts_decode: 0 (0%)16, AV-scan-1: 52
(1%)18, spam-wb-list: 121 (2%)20, SA msg read: 5 (0%)20, ...
Sorry for the big mail, I tried to strike a balance between providing
enough useful information to troubleshoot the issue and flooding the
list.
Many thanks for any help you may provide!
Ciao,
Gianluca Marcari
-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
AMaViS-user mailing list
AMaViS-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/