2008-05-31 00:03:40

DSPAM v3.4.2 README

DSPAM v3.0

Copyright (c) 2003 Network Dweebs Corporation

http://www.nuclearelephant.com/projects/dspam/

LICENSE

This program is free software; you can redistribute it and/or

modify it under the terms of the GNU General Public License

as published by the Free Software Foundation; either version 2

of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

GNU General Public License for more details.

You should have received a copy of the GNU General Public License

along with this program; if not, write to the Free Software

Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

TABLE OF CONTENTS

General DSPAM Information

1.0 About DSPAM

1.1 Installation

1.2 Testing

1.3 Troubleshooting

1.4 DSPAM Tools

1.5 Agent Commandline Arguments

Advanced DSPAM functionality

2.0 Linking with libdspam

2.1 Configuring groups

2.2 External Inoculation Theory

2.3 Client/Server Mode

2.4 LMTP

Miscellaneous

3.0 Bugs, Ports, and the like

3.1 Known Bugs

3.2 Adding the dspam logo button to your website

3.3 CVS Access

1.0 ABOUT DSPAM

DSPAM is an open-source, freely available anti-spam solution designed to combat

unsolicited commercial email using an advanced implementation of statistical

analysis coupled with deobfuscation techniques and other related approaches.

DSPAM is capable of learning each user's individual mail behavior based on

what they tell the filter spam is and isn't. This allows DSPAM to provide

highly-accurate, personalized filtering for each user on even a large

system. The result is a filter that needs no administrative maintenance and

learns each user's email behavior with very few false positives.

DSPAM is among the more popular and successful attempts at truly

accurate spam filtering, and is rapidly gaining a large support forum.

Contributions to the project are welcome via the dspam-dev mailing list.

DSPAM can be implemented in one of two ways:

1. The DSPAM mailer-agent provides server-side spam filtering, quarantine

box, and a mechanism for forwarding spams into the system to be automatically

analyzed. Advanced features, such as opt-in/opt-out filtering, inoculation,

and shared groups are supported.

2. Developers may link their projects to the dspam core engine (libdspam) in

accordance with the GPL license agreement. This enables developers to

incorporate libdspam as a "drop-in" for instant spam filtering within their

applications - such as mail clients, other anti-spam tools, and so on.

Many of the foundational principles incorporated into this agent were

contributed by Paul Graham's white paper on combatting SPAM, which can be

found at http://paulgraham.com/spam.html. Many new approaches have been

layered on top of the original core, some of which may be explained in

white papers on the DSPAM home page.

The DSPAM Solution is split up into the following pieces:

DSPAM CORE ENGINE

The DSPAM core engine, also known as libdspam, provides all major spam

filtering functions. The engine is linked to other dspam components (or

shells) to provide functionality. The core engine is capable of being linked

in with any other application as a "drop-in" to provide spam filtering to

mail clients, other anti-spam tools, and other such projects that

would benefit from its use. Both static and shared versions are built by

libtool into .libs.

libdspam provides a storage driver abstraction layer, enabling developers to

easily change how information is stored on the system (for example Berkeley

DB, MySQL, Oracle, etc.) with enough flexibility to write a storage

driver utilizing stone tablets and chisels.

DSPAM AGENT

The DSPAM agent is a shell for libdspam providing a direct interface to

mail servers or other tools for server-side spam filtering. The agent

is normally integrated into one of two places:

1. The agent can masquerade as a mail server's local delivery agent. DSPAM then

processes email piped to it from the mail server and then either delivers it

using the real delivery agent (procmail, mail.local, or a proxy to pass it

along to another server), or will quarantine it if the message is spam (DSPAM

can optionally tag and deliver spams as well).

2. As a POP3 proxy, DSPAM can be configured to process email when the user

checks their mail, tagging spam accordingly. This allows DSPAM to front-end

any mail system without the need for integration.

The agent is also responsible for correcting misclassifications (missed spams

or false positives). This is critical to the learning operations of DSPAM.

The MTA (sendmail, exim, qmail, etc) or the POP3 proxy calls DSPAM with

parameters identifying the destination user and other operational parameters.

DSPAM performs its internal calculations and will then perform the appropriate

action based on the result.

When an email is delivered to the end-user, the agent appends a serial number

to each email. This serial number references temporary information stored on

the server which contains the original training data for the message, and is

used to re-learn the original message in the event DSPAM made a mistake. This

allows DSPAM to accurately learn without having to provide the full headers

of the message - making life much easier for end-users.

CGI CLIENT

The CGI client is an end-user tool enabling a mail user to view their

quarantine box, reverse the occasional false positive, view their historical

activity, and most importantly to delete their spams permanently.

The CGI client works in conjunction with the DSPAM agent. It is possible to

eliminate the quarantine box in lieu of an alternative solution, such as

client-filtering/forwarding, but many users will appreciate not having to

download all of their spam, and being able to view usage graphs and other

useful information.

TOOLS

Some basic tools are provided to manage dictionaries, automate

corpus feeding, and create seeded [composite] dictionaries.

1.1 INSTALLATION

UPGRADING

------------------------------------------------------

IMPORTANT UPGRADE STEPS FOR USERS UPGRADING FROM

------------------------------------------------------

Version 3.0 incorporates many changes to the user interface, but preserves

the general data structure so that users don't need to re-train in order

to upgrade.

Step 1: Shut down the existing DSPAM installation.

The existing DSPAM installation should be shut down prior to any upgrade

changes. The easiest way to do this is to turn off the MTA and web

server serving the DSPAM CGI. No mail should be processed while the

changes are being made.

Step 2: Data Storage Changes

If you are using a SQL-Based driver, a few minor changes will need to

be made. These changes are only necessary to SQL-Based drivers; no

changes need to be made if you are using the BerkeleyDB storage drivers.

The following SQL code should upgrade both MySQL and Oracle databases to

the v3.0 format. Be sure to log into the schema used by DSPAM to apply

these changes.

alter table dspam_stats add spam_learned int;

alter table dspam_stats add innocent_learned int;

alter table dspam_stats add spam_classified int;

alter table dspam_stats add innocent_classified int;

update dspam_stats set spam_learned = total_spam;

update dspam_stats set innocent_learned = total_innocent;

update dspam_stats set spam_classified = 0, innocent_classified = 0;

alter table dspam_stats drop column total_spam;

alter table dspam_stats drop column total_innocent;

alter table dspam_stats add spam_misclassified int;

alter table dspam_stats add innocent_misclassified int;

update dspam_stats set spam_misclassified = spam_misses;

update dspam_stats set innocent_misclassified = false_positives;

alter table dspam_stats drop column spam_misses;

alter table dspam_stats drop column false_positives;

Step 3: Compile DSPAM v3.0

DSPAM v3.0 has moved many features out to the commandline. Other

configure-time arguments have also been changed or removed. The following

is a list of configure-time changes that have been made:

--with-userdir-* changed 'userdir' to 'dspam-home'

--with-local-delivery-agent changed to --with-delivery-agent

--enable/disable-chained-tokens removed from configure

--enable/disable-bnr removed from configure

--enable/disable-whitelist removed from configure

--enable/disable-toe removed from configure

--enable/disable-tum removed from configure

--enable/disable-spam-delivery removed from configure

--enable/disable-deliver-fp removed from configure

Once you have configured DSPAM, run: make && make install

to build and install the software.

NOTE: The default DSPAM home has been changed from /etc/mail/dspam to

/var/dspam. If you would like to use the old path, specify it using

--with-dspam-home=/etc/mail/dspam.

Step 4: Upgrade the CGI

The DSPAM CGI has been updated, and should be upgraded. Many calling

arguments have been changed.

Step 5: Reconfigure your MTA

DSPAM's commandline arguments have been rewritten. You'll want to

consult the 'AGENT COMMANDLINE ARGUMENTS' section for a full explanation

of all the new commandline components. Some of the basic changes are:

--addspam, --falsepositive, --corpus, and --inoculate have been

replaced with two flags to specify a pre-classification and a

classification source:

--addspam becomes:

--class=spam --source=error

--falsepositive becomes:

--class=innocent --source=error

--corpus becomes:

--class=innocent --source=corpus

--class=spam --source=corpus

--inoculate becomes:

--class=spam --source=inoculation

It will also be necessary to specify the training mode, delivery

preferences, and feature selection on the commandline as well. For

example:

--mode=teft --deliver=innocent --feature=chained,bnr

Or if you prefer delivery of both innocent and spam messages, you should

use --deliver=innocent,spam. Keep in mind that these preferences will

be applied to whatever operation you are calling. For example, if you

are retraining a false positive or forwarding in a spam, the --deliver

argument will specify whether or not you want to deliver it, so you have

the luxury of defining different commandline arguments between your MTA,

aliases, and CGI.
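
The flag replacements listed in Step 5 can be applied mechanically to an
existing argument list. The following is only a sketch of that mapping,
not part of DSPAM itself: the helper name translate_args and the
corpus_class parameter are assumptions made for illustration. Because
--corpus maps to either class, the caller has to say which one is meant.

```python
# Hypothetical helper (not shipped with DSPAM): rewrite pre-3.0 flags
# into the --class/--source pairs listed in Step 5 above.
LEGACY_FLAGS = {
    "--addspam": ["--class=spam", "--source=error"],
    "--falsepositive": ["--class=innocent", "--source=error"],
    "--inoculate": ["--class=spam", "--source=inoculation"],
}


def translate_args(args, corpus_class="innocent"):
    """Return a new argument list with legacy flags replaced.

    --corpus can mean either class, so the caller chooses one through
    corpus_class ("innocent" or "spam").
    """
    if corpus_class not in ("innocent", "spam"):
        raise ValueError("corpus_class must be 'innocent' or 'spam'")
    result = []
    for arg in args:
        if arg == "--corpus":
            result += ["--class=" + corpus_class, "--source=corpus"]
        else:
            # Unknown arguments are passed through unchanged.
            result += LEGACY_FLAGS.get(arg, [arg])
    return result
```

The training mode, delivery preference, and feature list described above
still have to be added by hand, since the old flags carry no information
about them.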

Step 6: The directory structure has been changed so that all user directories

go into $DSPAM_HOME/data. You'll want to either delete all user

directories (the .stats files and such will be rebuilt, but the

quarantine boxes won't), or move them into $DSPAM_HOME/data.

Step 7: Turn your MTA and CGIs back on, and TEST EVERYTHING.

FRESH INSTALLATION

First you will need to download a few prerequisite tools:

Depending on which storage driver you want to use, you will need:

libdb4_drv: Berkeley DB-4.

libdb3_drv: Berkeley DB-3.

mysql_drv: MySQL client libraries (and a server to connect to)

ora_drv: Oracle Call Interface (and a server to connect to)

pgsql_drv: PostgreSQL client libraries (and a server to connect to)

MySQL is the recommended storage driver, even for small implementations, as

it is more stable and tested than the other drivers. If you are incapable

of running a stateful server, the libdb drivers should suffice, but please

be aware that libdb can occasionally result in some problems including

data corruption and lock contention. As a result, you'll want to maintain

a backup of your dictionary in the event such problems arise.

In general, MySQL is a faster solution with a smaller storage footprint,

and is well suited for both small and large-scale implementations.

You can download Berkeley DB from http://www.sleepycat.com.

You can download MySQL from http://www.mysql.com.

You can download PostgreSQL from http://www.postgresql.com

You can obtain more information about Oracle at http://www.oracle.com.

Be sure the necessary libraries are available to root, the MTA user, and

the CGI user. The easiest way to do this is to copy them to /usr/lib or

/lib.

Documentation for the setup of your selected storage driver can be found

in the tools.[storage driver]/ directory.

NOTE: Some operating system distributions include their own version of

libdb3_drv and libdb4_drv. A majority of these packaged versions

do work correctly with DSPAM, however a few do not. If you experience

problems with one of the libdb storage drivers, consider downloading

and compiling the official source tree from http://www.sleepycat.com.

1. CONFIGURATION

./configure [options]

DSPAM supports the configuration options below.

PATH SWITCHES

--with-dspam-home=DIR

Specify an alternative storage directory for dspam user information. The

default is /var/dspam.

--prefix=DIR

Specify an alternative root prefix for installation. The default is

/usr/local. This does not affect DSPAM_HOME.

FILESYSTEM SCALE

The default filesystem scale is "small-scale", and writes each user to

its own directory in the top-level DSPAM_HOME/data directory.

The following two switches allow the scale to be changed to be more

suitable for larger installations.

--enable-large-scale

Switch for large-scale implementation. User data will be stored as

$DSPAM_HOME/data/u/s/user instead of $DSPAM_HOME/data/user

--enable-domain-scale

Switch for domain-scale implementation. When used, username@domain should

be passed in as the user id and user data will be stored as

$DSPAM_HOME/data/domain.com/user and $DSPAM_HOME/opt-in/domain/user.dspam

instead of $DSPAM_HOME/data/user
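
To make the three filesystem layouts concrete, the sketch below computes
the per-user data directory for each scale. It is an illustration only:
the function name user_data_dir is hypothetical, and it assumes that the
"u/s" in the large-scale path stands for the first two characters of the
username, which the text above implies but does not state outright.

```python
import posixpath


def user_data_dir(dspam_home, user, scale="small"):
    """Return the data directory for a user under the given scale.

    scale is "small" (default build), "large" (--enable-large-scale)
    or "domain" (--enable-domain-scale).
    """
    if scale == "small":
        return posixpath.join(dspam_home, "data", user)
    if scale == "large":
        # Assumption: the two intermediate directories are the first
        # two characters of the username (data/u/s/user).
        if len(user) < 2:
            raise ValueError("large scale needs at least two characters")
        return posixpath.join(dspam_home, "data", user[0], user[1], user)
    if scale == "domain":
        # The user id is passed in as username@domain.
        name, sep, domain = user.partition("@")
        if not sep or not name or not domain:
            raise ValueError("domain scale expects username@domain")
        return posixpath.join(dspam_home, "data", domain, name)
    raise ValueError("unknown scale: " + scale)
```

With dspam_home set to /var/dspam, the username "user" resolves to
/var/dspam/data/user in the default build and to
/var/dspam/data/u/s/user in a large-scale build.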

INTEGRATION SWITCHES

--with-delivery-agent=PROG

The delivery agent is the tool called to deliver messages.

Use this to specify an alternative delivery agent, other than the one

specific to your operating system. If you are building on an unsupported

platform, you will need to specify this. You may use quotes if you wish to

include additional commandline flags. DSPAM will automatically relay the

commandline parameters it was initially given, with the exception of any

DSPAM-specific parameters (such as --user, --process, etc.). This does

not necessarily need to be a local agent, but can be configured to call

a proxy pass-through.

Currently, DSPAM has a default delivery agent selected for Linux,

FreeBSD, Solaris, and Cygwin platforms.

NOTE: When specifying a series of arguments, you will need to use quotes

around PROG. You may also use the $u identifier to specify that you

wish DSPAM to place the destination user's ID in the corresponding space

in the arguments list. For example:

Where $u will be replaced by the destination user prior to calling the LDA.

This could potentially cause problems, however, if your MTA requires the

user argument list to come last, which is why DSPAM, by default, will allow

you to set this in the MTA configuration.

NOTE: be sure to escape the $ in $u. Only do this when specifying $u on the

commandline. This will prevent $u from being overwritten with the shell's

environment variable 'u'. You may alternatively use %u.
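
The placeholder substitution described above can be pictured with a
short sketch. This is not DSPAM's own code: the function name
build_agent_argv is hypothetical, the procmail path in the usage note is
only an example, and the real agent may split and substitute arguments
differently.

```python
import re
import shlex


def build_agent_argv(template, user):
    """Split a delivery-agent command and fill in the destination user.

    Every literal $u or %u found in a token is replaced by user.
    """
    argv = []
    for token in shlex.split(template):
        # A function is used as the replacement so that characters in
        # the username are never interpreted as regex escapes.
        argv.append(re.sub(r"[$%]u", lambda match: user, token))
    return argv
```

For instance, build_agent_argv("/usr/bin/procmail -d %u", "bob") yields
the list ["/usr/bin/procmail", "-d", "bob"].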

--with-quarantine-agent=PROG

By default, DSPAM automatically quarantines spams in its internal

user quarantine box. If you wish to override this default behavior,

however, you may do so by specifying your own quarantine agent. The same

notes from the --with-delivery-agent option apply here. The quarantine agent

will be called whenever a message is believed to be spam, with the message

provided as stdin into the tool.

--enable-broken-mta

You should enable this if your MTA is broken and passes messages into DSPAM

with CTRL-M's (^M) in them.

--enable-broken-return-codes

Causes DSPAM to return an exit code of 99 if a message is spam, 0 if

innocent, and any other code if an error has occurred. The default is to

return 0 whenever the operation is successful, regardless of outcome. Only

use this if you know what you're doing!
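
A calling script could interpret these exit codes along the following
lines. This is a minimal sketch that applies only to a build made with
--enable-broken-return-codes; the function name classify_exit_code is an
assumption for illustration.

```python
def classify_exit_code(code):
    """Map a DSPAM exit code to a result label.

    Only meaningful for builds configured with
    --enable-broken-return-codes; a default build returns 0 for every
    successful operation regardless of the classification.
    """
    if code == 0:
        return "innocent"
    if code == 99:
        return "spam"
    # Any other value signals that an error occurred.
    return "error"
```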

--with-storage-driver=DRIVER

Specify an alternative storage driver. A storage driver is a driver

written specifically for DSPAM to store tokens, signature data, and

perform other proprietary operations. The default driver is libdb4_drv,

which incorporates Berkeley DB v4. The following drivers have been provided:

libdb4_drv: Berkeley DB4 Library

libdb3_drv: Berkeley DB3 Library

mysql_drv: MySQL Drivers

ora_drv: Oracle Drivers (BETA)

pgsql_drv: PostgreSQL Drivers (BETA)

You may also need to use some of the driver-specific configure flags

(discussed later).

--enable-client-compression

Enables data source client compression for storage drivers where it is

available (presently only mysql_drv). This causes data between the

data source and its clients to be compressed. You should use this option

if your data source is on a separate machine from the DSPAM agent(s) as it

conserves bandwidth, but at the expense of a few CPU cycles.

--disable-trusted-user-security

Administrators who wish to disable trusted user security may do so by

using this configure flag. This will cause DSPAM to treat each user as

if they were "trusted" which could allow them to potentially execute

arbitrary commands on the server via DSPAM. Because of this, administrators

should only use this option on either a closed server, or configure their

DSPAM binary to be executable only by users who can be trusted. This

option SHOULD NOT be used as a solution to your MTA dropping privileges

prior to calling DSPAM. Instead, see the TRUSTED SECURITY section of this

document.

--enable-homedir-dotfiles

When enabled, instead of checking for $DSPAM_HOME/$USER/opt-in/

$USER[.nodspam|.dspam], DSPAM will check for a .nodspam|.dspam file in the

user's home directory. These two dotfiles are used for opt-out or opt-in

filtering.

--enable-opt-in

Causes DSPAM to filter mail only for users with a .dspam dotfile. The

default is opt-out, which requires a .nodspam file to exist to bypass

filtering.
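
The opt-in and opt-out rules reduce to a small decision, sketched below.
The function and parameter names are hypothetical; the sketch only
restates the rule given above and does not look at the filesystem.

```python
def should_filter(opt_in_build, has_dspam_file, has_nodspam_file):
    """Decide whether mail for a user is filtered.

    opt_in_build is True for a build made with --enable-opt-in.
    The two remaining flags say whether the user's .dspam and
    .nodspam dotfiles exist.
    """
    if opt_in_build:
        # Opt-in: filter only users who have a .dspam file.
        return has_dspam_file
    # Opt-out (default): filter everyone without a .nodspam file.
    return not has_nodspam_file
```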

DEBUGGING SWITCHES

--enable-debug

Turns on support for debugging output to DSPAM_HOME/dspam.debug and

DSPAM_HOME/dspam.messages (see description of --with-dspam-home option for

details about DSPAM_HOME). This option allows you to turn on debugging

messages for specific users by dropping a DSPAM_HOME/userpath/user.debug

file or for all users by dropping a DSPAM_HOME/.debug file. Enabling

debug only enables support for this feature; dotfiles must still be

dropped in order to turn messages on.

--enable-verbose-debug

Turns on extremely verbose debugging output to DSPAM_HOME/dspam.debug

and DSPAM_HOME/dspam.messages (see description of --with-dspam-home option

for details about DSPAM_HOME). Dotfiles must still be dropped in order to

activate messages, just like with --enable-debug.

TRAINING SET IDENTIFICATION SWITCHES

The default behavior for DSPAM is to store all original training data

on the server-side as temporary information, and embed a serial number

into the body of each message referencing the data. This is used for

misclassifications and providing a true 1:1 retraining. Some

implementations may call for a slightly different approach to training set

identification.

--enable-signature-attachments

Instead of storing the DSPAM signatures on the server (which could take

up considerable disk space), this option will cause DSPAM to rewrite

each message to include a dspam.dat attachment, which contains all of

the tokens used to calculate the original message. When the spam or

false positive is processed back into the system, this signature will

be read. This may increase bandwidth by 2k-32k per message on average,

depending on the original message's size.

NOTE: This option doesn't work correctly with mail clients that quote an

embedded, forwarded message (such as some or all versions of elm) and

should only be used on networks where all clients can properly understand

an embedded multipart message (Outlook, Ximian Evolution, etc.), and

forward the attachment as an attachment instead of quoted text. In other

words, this breaks a lot of stuff if you're not on a standardized GUI-based

client network. Server-side signatures are still the most reliable method

and work for all known clients.

This also puts a "paper clip" on every message you receive.

--enable-signature-headers

This option will cause the DSPAM signature to be written to the message

header instead of body.

IMPORTANT: This option requires that all users either bounce their messages

into DSPAM, forward as an attachment, or implement some macro that will

retain the X-DSPAM-Signature header, which will NORMALLY BE DROPPED by

standard forwarding.

--enable-webmail

The webmail switch is designed for systems where the original message

remains server side and can therefore be presented in pristine format for

retraining. This option will cause DSPAM to cease all writing of

signatures and DSPAM headers to the message, and deliver the message in as

pristine format as possible. This mode REQUIRES that the original message

in its pristine format (as of delivery) be presented for retraining, as in

the case of webmail or other applications where the message is actually

kept server-side during reading, and is preserved. DO NOT use this switch

unless the original message can be presented for retraining with the

ORIGINAL HEADERS and NO MODIFICATIONS.

--with-signature-life=DAYS

Specifies the default length (in days) a signature should remain stored on

the server. The default is 14 days. This value should accurately represent

the maximum amount of time a user would need to identify and forward

a missed spam, or mark a false positive. Consider vacations. This can

be changed in calls to dspam_clean on the commandline.
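
The effect of the signature lifetime can be illustrated with a short
check. This is a sketch only: the names below are hypothetical, and
whether a signature is purged exactly at the limit or only after it is
an assumption, since the text above does not say.

```python
from datetime import datetime, timedelta

# Default taken from the text above; the constant name is hypothetical.
DEFAULT_SIGNATURE_LIFE_DAYS = 14


def signature_expired(created, now, life_days=DEFAULT_SIGNATURE_LIFE_DAYS):
    """Return True once a signature is older than life_days.

    Assumption: a signature is kept up to and including the limit and
    counts as expired only after it.
    """
    return now - created > timedelta(days=life_days)
```

A user who returns from a three-week vacation would find that the
signatures for week-one messages have already expired under the default,
which is why the lifetime should cover the longest expected absence.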

FEATURE ACTIVATION

--enable-neural-networking (EXPERIMENTAL)

Enables neural networking support (see the section NEURAL NETWORKING). This

feature is presently supported only by the mysql_drv and pgsql_drv

storage drivers, and is still considered experimental.

--enable-source-address-tracking

Logs the source address of spams and innocent messages via syslog.

You can create a file DSPAM_HOME/mta.whitelist which can contain a list of

local MTA IPs, which will cause DSPAM to skip to the next 'Received' header.

Each IP should be on a new line.

Also writes SBL blacklist files for use with the Streamlined Blackhole

Server (http://www.nuclearelephant.com/projects/sbl/).
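As a minimal sketch of the whitelist file described above, the commands below write two relay addresses to mta.whitelist, one per line. Both addresses are placeholders, and a scratch directory stands in for DSPAM_HOME so the snippet can be tried without touching a live installation; substitute your real DSPAM_HOME and your own MTA IPs.

```shell
#!/bin/sh
# Scratch directory standing in for DSPAM_HOME (assumption for the demo;
# a real installation would use its configured home, e.g. /var/dspam).
DSPAM_HOME=$(mktemp -d)

# One local MTA IP per line. Both addresses are placeholders.
cat > "$DSPAM_HOME/mta.whitelist" <<'EOF'
127.0.0.1
192.0.2.25
EOF

# Show the result for review.
cat "$DSPAM_HOME/mta.whitelist"
```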

--enable-spam-subject

Prepends [SPAM] to the subject header of any messages suspected to be spam.

This is sometimes more useful than the X-DSPAM-Result field, because not

all mail clients support mail rules with custom headers.
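To make the effect concrete, the sketch below rewrites the Subject header of a message read from standard input. It only illustrates what a tagged message looks like to a mail client; tag_subject is a hypothetical helper and is not DSPAM's own code.

```shell
#!/bin/sh
# tag_subject: prepend [SPAM] to the first Subject: header on stdin.
# Illustration only; lines after the first blank line (the body) are untouched.
tag_subject() {
    awk '
        /^$/ { in_body = 1 }                        # blank line ends the headers
        !in_body && !done && /^Subject: / {
            sub(/^Subject: /, "Subject: [SPAM] ")   # tag the header once
            done = 1
        }
        { print }
    '
}

printf 'From: a@example.com\nSubject: Cheap pills\n\nSubject: in body\n' | tag_subject
```

A client-side rule can then match on the subject prefix instead of on a custom header.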

--disable-user-logging

Disables the writing of per-user .log files. Users will not be able to

view graphs or history with this feature disabled.

--disable-system-logging

Disables the writing of the system.log. Admins will not be able to view

graphs or other related information with this feature disabled.

ALGORITHM ACTIVATION

The default algorithms enabled are quite sufficient, and represent the most

well-tested algorithms in DSPAM. It is not necessary to change any of

these options unless you are interested in altering DSPAM's default behavior.

--disable-traditional-bayesian

Disables the traditional Bayesian algorithm (it is enabled by default).

--disable-alternative-bayesian

Disables Brian Burton's alternative Bayesian algorithm. The differences are:

- 27 Samples are used instead of 15

- Tokens appearing more than once may take up to 2 slots in the

calculation. This is ideal when there is very limited data

(it is enabled by default)

--enable-robinson

Enables Robinson's geometric mean test. The differences are:

- A window-size of 25 is used instead of 15

- The combination algorithm is different. See:

http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html

for more information.

This algorithm is obsolete, and not recommended for production builds.

--enable-chi-square

Enables Fisher-Robinson's Inverse Chi-Square

Defaults in libdspam.c:

- Exclusionary radius of 0.45

- Ham/Spam Cutoff of 0.5

- Strength: 0.1

- Assumed probability: 0.5
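The following is a rough sketch of how two of these defaults could enter the calculation. It assumes that Strength and Assumed probability play the roles of s and x in Robinson's published degree-of-belief formula, f(w) = (s*x + n*p(w)) / (s + n); that mapping is an assumption made for illustration, and libdspam.c remains the authority. The exclusionary radius and cutoff are not modeled here.

```shell
#!/bin/sh
# robinson_fw N P
#   N = number of messages the token has been seen in
#   P = raw spam probability p(w) observed for the token
# Uses s=0.1 (Strength) and x=0.5 (Assumed probability) from the list above.
# Assumption: these defaults map onto s and x in Robinson's formula.
robinson_fw() {
    awk -v n="$1" -v p="$2" -v s=0.1 -v x=0.5 \
        'BEGIN { printf "%.4f\n", (s * x + n * p) / (s + n) }'
}

robinson_fw 0 0.9     # unseen token: falls back to the assumed probability
robinson_fw 3 0.9     # a few sightings: pulled most of the way toward p(w)
```

With these inputs the first call prints 0.5000 and the second prints 0.8871, which shows how a small strength lets observed data dominate quickly.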

NOTE: You may have multiple algorithms enabled simultaneously; if any of

the enabled algorithms believe the message is spam, it will be marked

accordingly. Naturally, you also have the potential problem of any

false positives generated by the enabled algorithms, so it is recommended

to either stick with a single algorithm, or use only Bayesian or only

Robinson's type algorithms. Bayesian+Alt-Bayesian seems to be the most

effective combination (not using Robinson's at all).

For this reason, if you plan on enabling any algorithms which are

disabled by default, it is strongly recommended that you also:

--disable-traditional-bayesian --disable-alternative-bayesian

Generally, the alternative-Bayesian algorithm appears to catch some spams

that the traditional Bayesian algorithm does not; however, it also misses

far more spams than the traditional algorithm. Therefore, an

implementation using both Bayesian algorithms appears to be the most

effective in catching spam.
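For instance, a build that follows the recommendation above and relies on Chi-Square alone might be configured as sketched here. Only flags named in this section are used; the command is printed rather than executed so it can be reviewed and then run from the DSPAM source tree.

```shell
#!/bin/sh
# Flags taken from the text above: enable Chi-Square and switch off both
# Bayesian algorithms, which are otherwise enabled by default.
CONFIGURE_ARGS="--enable-chi-square --disable-traditional-bayesian --disable-alternative-bayesian"

# Print the command line instead of running it.
echo "./configure $CONFIGURE_ARGS"
```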

--disable-bias

When bias is disabled, dspam no longer biases the statistics in favor of

innocent mail, but measures both spam and innocent tokens equally in the

calculation. This may provide more effective spam filtering,

but has been shown to increase the number of false positives.

--enable-robinson-pvalues

Enables Robinson's technique for combining p-values. This is an alternative

approach to generating word probabilities described here:

http://www.linuxjournal.com/article.php?sid=6467

Robinson's p-values are presently used in Chi-Square calculations, but

enabling them with this flag will use them for *all* calculations effectively

replacing (or rather building upon) Graham's tokenization approach. This

flag may also be used without enabling Chi-Square.

--disable-test-conditional

Disables test-conditional training. Test-conditional training is a more

aggressive approach to training than traditional training, and corrects

misclassifications more rapidly.

Enabled by default, this mode of training will automatically re-train the

user's dictionary on spam or false positive until the training condition is

met (e.g. until the user's dictionary no longer results in

misclassification of the message being retrained). This training has a

maximum number of 5 iterations, and will only invoke when:

- The user has 1000 innocent messages in their corpus, and is reporting

a spam

- The user is reporting a false positive (regardless of the number of

messages in their corpus)
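The rules above can be sketched as a small loop. Both functions below are hypothetical stand-ins that do not call DSPAM; the simulated dictionary simply "agrees" after a given number of passes, and the 1000-message condition is read as "at least 1000", which is an assumption.

```shell
#!/bin/sh
# Toy sketch of test-conditional training; it only mirrors the rules in the text.

MAX_ITERATIONS=5   # "maximum number of 5 iterations"

# should_retrain REPORT INNOCENT_COUNT
# Succeeds when test-conditional training would be invoked:
#   - a false positive report, regardless of corpus size, or
#   - a spam report once the corpus holds 1000 innocent messages.
should_retrain() {
    case $1 in
        false-positive) return 0 ;;
        spam) [ "$2" -ge 1000 ] ;;
        *) return 1 ;;
    esac
}

# retrain_until_correct WANT NEEDED
#   WANT   = class the user reported ("spam" or "innocent")
#   NEEDED = passes the simulated dictionary needs before it agrees
# Prints "<passes used> <final result>".
retrain_until_correct() {
    want=$1
    needed=$2
    passes=0
    result=misclassified
    while [ "$passes" -lt "$MAX_ITERATIONS" ]; do
        passes=$((passes + 1))                 # one more training pass
        if [ "$passes" -ge "$needed" ]; then   # condition met: stop early
            result=$want
            break
        fi
    done
    echo "$passes $result"
}

if should_retrain spam 1200; then
    retrain_until_correct spam 3    # converges on the third pass
    retrain_until_correct spam 9    # gives up after the fifth pass
fi
```

The second call shows the cap at work: training stops after five passes even though the message is still misclassified.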

This method of training has its controversial points as well. All of these

issues revolve around the assumption this approach to training makes: that

you are likely to receive the same (or a very similar) message again one or more

times in the future.

- Since the message is being retrained repeatedly, the learning curve is

going to be based solely on that one message rather than the natural flow

of similar messages that may contain slightly different text.

- It's possible a user may aggressively train a spam they will only receive

once, and potentially increase their risk of false positives by

training this aggressively.

- If there is a significant overlap of dictionary tokens between a user's

regular mail and the incoming spams being aggressively trained, the user

could potentially end up retraining with spam, then retraining with

false positives, then retraining with spam again.

In spite of these controversial points, this approach to training has had

successful results with several implementations.

DRIVER SPECIFIC CONFIGURE SWITCHES

libdb4_drv:

--with-db4-includes=DIR

Specify a path to the Berkeley db4 includes

--with-db4-libraries=DIR

Specify a path to the Berkeley db4 libraries

libdb3_drv:

--with-db3-includes=DIR

Specify a path to the Berkeley db3 includes

--with-db3-libraries=DIR

Specify a path to the Berkeley db3 libraries

(Currently links to -ldb3, so you may need to symlink libdb-3.3.so to

libdb3.so if it doesn't exist)
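A minimal sketch of that symlink step follows. The library directory is a scratch stand-in and the libdb-3.3.so created here is an empty placeholder, both assumptions for the demonstration; on a real system, point LIBDIR at the directory that actually holds your Berkeley DB 3 library and skip the placeholder line.

```shell
#!/bin/sh
# Scratch directory standing in for the real library directory (assumption).
LIBDIR=$(mktemp -d)
: > "$LIBDIR/libdb-3.3.so"     # empty placeholder so the demo has a target

# Create libdb3.so only if it does not already exist.
[ -e "$LIBDIR/libdb3.so" ] || ln -s libdb-3.3.so "$LIBDIR/libdb3.so"

ls -l "$LIBDIR/libdb3.so"
```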

mysql_drv:

--with-mysql-includes=DIR

Specify a path to the MySQL includes

--with-mysql-libraries=DIR

Specify a path to the MySQL libraries

(Currently links to -lmysqlclient, also -lcrypto on some systems)

--enable-virtual-users

Tells DSPAM to create virtual user ids. Use this if your users don't

actually exist on the system (e.g. in /etc/passwd if using a password file)

NOTE: Please see the file tools.mysql_drv/README for more information

about configuring the mysql_drv storage driver.

pgsql_drv:

--with-pgsql-includes=DIR

Specify a path to the PgSQL includes

--with-pgsql-libraries=DIR

Specify a path to the PgSQL libraries

(Currently links to -lpq, and netlibs on some systems)

--enable-virtual-users

Tells DSPAM to create virtual user ids. Use this if your users don't

actually exist on the system (e.g. in /etc/passwd if using a password file)

NOTE: Please see the file tools.pgsql_drv/README for more information

about configuring the pgsql_drv storage driver.

ora_drv:

--with-oracle-home=DIR

Specify the Oracle Home (or client home)

--enable-virtual-users

Tells DSPAM to create virtual user ids. Use this if your users don't

actually exist on the system (e.g. in /etc/passwd if using a password file)

NOTE: Please see the file tools.ora_drv/README for more information

about configuring the ora_drv storage driver.

2. BUILDING AND INSTALLING

After you have run configure with the correct options, build and install

DSPAM by performing:

make && make install

If you are a developer wanting to link to the core engine of dspam,

libdspam will be built during this process. Please see the

example.c file for examples of how to link to and use libdspam. Static

and dynamic libraries are built in the .libs directory. Needed headers

will be installed in $prefix$/include/dspam.

3. PERMISSIONS

After install, the DSPAM_HOME will have been created for you automatically

(the default is /var/dspam). Ensure that the directory

is writable by both your MTA and CGI user.

You may need to add root and your MTA user to the directory's [mail] group

in /etc/group. The MTA user is usually 'daemon' or 'smmsp' although on

FreeBSD the default is 'mailnull'. This is very important, as your MTA

user needs to be able to lock and work with files.
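One way to verify this is a quick write check, sketched below. check_dspam_home is a hypothetical helper, and the demo runs against a scratch directory; to check an installation, run it against your real DSPAM_HOME once as the MTA user and once as the CGI user.

```shell
#!/bin/sh
# check_dspam_home DIR: report whether the invoking user can write to DIR.
check_dspam_home() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "writable: $1"
    else
        echo "NOT writable: $1"
        return 1
    fi
}

# Demo against a scratch directory (assumption); use your real DSPAM_HOME,
# e.g. /var/dspam, when checking an installation.
SCRATCH=$(mktemp -d)
check_dspam_home "$SCRATCH"
```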

IMPORTANT!!!

FreeBSD's mail.local changes its effective uid, and so in order to use it

dspam must be installed as setuid root to work on the commandline properly.

This is done automatically on install.

If you find that DSPAM is erroneously processing all operations as a single

user, chances are that user should be added to trusted.users as an

administrative user.

TRUSTED USERS SECURITY

DSPAM has tighter security for untrusted users on the system, to prevent

them from being able to spoof other users or specify their own passthru

arguments to potentially hijack the delivery agent. This method

of security has been implemented due to the fact that some implementations

(such as those using procmail) may require the DSPAM agent to be setuid or

setgid.

The trusted.users file should be created in $DSPAM_HOME (defaulted to

/var/dspam). This file should contain a list of trusted users who

should be allowed to set the dspam user, passthru parameters, and other

information that would be potentially dangerous for a malicious user to

be able to set. The file should contain one username per line, and will

generally contain the usernames of the MTA and CGI users. Example:

root

smmsp

daemon

cgi

mailnull

Where cgi represents the special CGI user you configure Apache to

run your dspam.cgi as.
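As a sketch, the example file above could be created and spot-checked as follows. A scratch directory stands in for $DSPAM_HOME, and is_trusted is a hypothetical helper for inspecting the file's contents, not DSPAM's own lookup code.

```shell
#!/bin/sh
DSPAM_HOME=$(mktemp -d)   # scratch stand-in (assumption); normally /var/dspam

# One trusted username per line, exactly as in the example list above.
printf '%s\n' root smmsp daemon cgi mailnull > "$DSPAM_HOME/trusted.users"

# is_trusted USER: succeeds when USER appears as a whole line in the file.
is_trusted() {
    grep -qxF -- "$1" "$DSPAM_HOME/trusted.users"
}

if is_trusted cgi; then echo "cgi is trusted"; fi
if ! is_trusted bob; then echo "bob is untrusted"; fi
```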

Be sure to examine DSPAM_HOME/dspam.debug to ensure that you don't get any

untrusted user warnings when submitting spam or a false positive, as both

of these actions frequently call dspam from a different user than

standard mail delivery.

If you are using an MTA that changes its userid before calling DSPAM to

match the destination user, you should NOT add each user to the trusted

users file, but instead configure a preset commandline. DSPAM will see

that the user is not trusted and automatically set their DSPAM user id

and optionally the passthru delivery agent arguments.

To override an untrusted user's passthru delivery agent arguments

(arguments which could be used to hijack the delivery agent to gain

privileged access to the system) you will need to set up a file called

untrusted.mailer_args in the same directory ($DSPAM_HOME). The first line

should contain the path to the delivery agent followed by a list of

all the LDA arguments to pass through (including a user identity flag if

necessary). This file's information will override any passthru commandline

parameters specified by the user. For example:

/bin/mail -d $u

The variable $u informs DSPAM that you would like the destination username

to be used in the position $u is specified, so when DSPAM calls your LDA

for user 'bob', it will call it with:

/bin/mail -d bob
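The sketch below writes that example file and then mimics the $u substitution so the resulting command line can be previewed. A scratch directory stands in for $DSPAM_HOME, and expand_mailer_args is a hypothetical helper written for this illustration; DSPAM performs the real substitution internally.

```shell
#!/bin/sh
DSPAM_HOME=$(mktemp -d)   # scratch stand-in (assumption); normally /var/dspam

# First line: delivery agent path followed by its arguments.
# Single quotes keep $u literal so the shell does not expand it.
printf '%s\n' '/bin/mail -d $u' > "$DSPAM_HOME/untrusted.mailer_args"

# expand_mailer_args FILE USER
# Replaces each $u on the first line of FILE with USER and prints the result.
# Assumes USER contains no characters special to sed (such as / or &).
expand_mailer_args() {
    sed -e "1s/\\\$u/$2/g" -e 1q "$1"
}

expand_mailer_args "$DSPAM_HOME/untrusted.mailer_args" bob
```

Writing the file with single quotes matters: with double quotes the shell would replace $u with the (probably empty) value of its own variable u before the line reached the file.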

NOTE: In the event that ALL of the following are true:

- Your MTA performs a setuid() to the destination user prior to calling

DSPAM

- There are additional _dynamically assigned_ parameters that must be

passed to DSPAM which cannot be specified in configuration

- The delivery agent has no potentially dangerous commandline

options, or you are placing a wrapper around the delivery

agent

Then you may want to remove the untrusted.mailer_args file altogether.

If the file cannot be found, dspam will permit the user to specify their

own passthru arguments to the preconfigured LDA (with some basic sanity

checking) which COULD POTENTIALLY BE INSECURE if improperly set up. It

is strongly recommended you use this file to override the user.

DSPAM logs a warning when it is unable to open the

untrusted.mailer_args file.

If you don't want to see this warning, make sure the untrusted.mailer_args

file exists, even if it is empty.
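A minimal sketch of creating the empty file, assuming a scratch directory in place of $DSPAM_HOME; the guard leaves an existing file untouched rather than truncating it.

```shell
#!/bin/sh
DSPAM_HOME=$(mktemp -d)   # scratch stand-in (assumption); normally /var/dspam

# Create an empty untrusted.mailer_args only if none exists yet.
[ -e "$DSPAM_HOME/untrusted.mailer_args" ] || : > "$DSPAM_HOME/untrusted.mailer_args"

ls -l "$DSPAM_HOME/untrusted.mailer_args"
```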

4. SERVER CONFIGURATION

There are two ways DSPAM can be configured:

Mail Server: The default approach integrates DSPAM directly with the mail

server and filters spam as mail comes in.

POP3 Proxy: The alternative approach implements a POP3 proxy where users

connect to the proxy to check their email, and email is filtered when

being downloaded. The POP3 proxy is a much easier approach, as it

requires much less integration work with the mail server (and is ideal

for implementing DSPAM on Exchange, etc.).

The biggest difference is that the mail server approach filters mail at MTA

time, while the POP3 proxy filters it at MUA time; the proxy has the added

benefit that you do not need to worry about virtual users and the like.

 
DSPAM v3.4.2 README DSPAM v3.0 Copyright (c) 2003 Network Dweebs Corporation http://www.nuclearelephant.com/projects/dspam/ LICENSE This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. TABLE OF CONTENTS 目录 DSPAM的一般知识 1.0 DSPAM介绍 1.1 安装 1.2 测试 1.3 故障处理 1.4 DSPAM工具 1.5 代理命令行参数 DSPAM的高级功能 2.0 连接libdspam 2.1 配置组群 2.2 外部播种学理论 其它功能 3.0 故障、端口等问题 3.1 已知的故障 3.2 给您的站点添加dspam标识按钮 3.3 访问CVS General DSPAM Information 1.0 About DSPAM 1.1 Installation 1.2 Testing 1.3 Troubleshooting 1.4 DSPAM Tools 1.5 Agent Commandline Arguments Advancced DSPAM functionality 2.0 Linking with libdspam 2.1 Configuring groups 2.2 External Inoculation Theory 2.3 Client/Server Mode 2.4 LMTP Miscellaneous 3.0 Bugs, Ports, and the like 3.1 Known Bugs 3.2 Adding the dspam logo button to your website 3.3 CVS Access 1.0 ABOUT DSPAM DSPAM是一个开放源代码,通过使用较高级的统计分析工具加上deobfuscation技术以及其他相关的方法, 来直接用于抗击商业邮件的反垃圾方案中。DSPAM能够学习每个用户的不同邮件的习性:根据这些习性告诉 过滤器什么是垃圾邮件。这就使得即使在一个非常庞大的系统中,DSPAM仍要为每个用户提供高精确度的、智能 的过滤功能。他提供了一个能够学习每个用户的邮件习性的管理维护功能,这些习性可能带有些许的假阳性 (false positives)。DSPAM是非常流行的防垃圾工具之一,他成功地完成了真正精确的垃圾过滤功能, 并且迅速获得一个巨大的支持论坛。 DSPAM is an open-source, freely available anti-spam solution designed to combat unsolicited commercial email using an advanced implementation of statistical analysis coupled with deobfuscation techniques and other related approaches. 
DSPAM is capable of learning each user's individual mail behavior based on what they tell the filter spam is and isn't. This allows DSPAM to provide highly-accurate, personalized filtering for each user on even a large system. This provides an administratively maintenance free system capable of learning each user's email behaviors with very few false positives. DSPAM is among one of the more popular and successful attempts at truly accurate spam filtering, and is rapidly gaining a large support forum. Contributions to the project are welcome via the dspam-dev mailing list. DSPAM可以通过一下两种方式实现: 1.DSPAM邮件代理提供服务器支持的垃圾过滤,隔离箱,和一个促进系统进行自动分析垃圾的机制。支持先 进的特性,例如,进/出选择(opt-in/opt-out)过滤,接种(inoculation),和共享组群。 2.开发人员可以把他们的项目连接到dspam的内核引擎(libdspam)中,并与GPL协议许可证一致。这就使 得开发人员可以把libdspam立即并入自己的垃圾过滤应用软件:例如邮件客户机,其他的反垃圾邮件工具,等等。 许多基本原则合成了这个代理,在http://paulgraham.com/spam.html网站上可以看到PaulGraham抗击垃圾 邮件的白皮书中写到了这些原则。人们对原始内核提出可许多新的方法,有些方法的说明可以在DSPAM主页 的白皮书中看到。 DSPAM can be implemented in one of two ways: 1. The DSPAM mailer-agent provides server-side spam filtering, quarantine box, and a mechanism for forwarding spams into the system to be automatically analyzed. Advanced features, such as opt-in/opt-out filtering, inoculation, and shared groups are supported. 2. Developers may link their projects to the dspam core engine (libdspam) in accordance with the GPL license agreement. This enables developers to incorporate libdspam as a "drop-in" for instant spam filtering within their applications - such as mail clients, other anti-spam tools, and so on. Many of the foundational principles incorporated into this agent were contributed by Paul Graham's white paper on combatting SPAM, which can be found at http://paulgraham.com/spam.html. Many new approaches have been layered on top of the original core, some of which may be explained in white papers on the DSPAM home page. 
DSPAM可以分解成以下几部分: DSPAM内核引擎 DSPAM内核引擎,即libdspam,提供几乎所有主要的垃圾过滤函数。该引擎连接到其他的dspam构件(或shell) 上提供功能性。内核引擎能够同任何其他的应用软件连接,并作为”顺便拜访者“向邮件客户机,或其他的 反垃圾工具、或其他类似的项目提供垃圾过滤功能,并使之受益于此。许多静态的和共享的版本就是通过.libs 下的libtool建立的。 libdspam提供一个存储驱动提取层,这使得开发人员可以较容易的理解信息是如何被存储在系统(例如Berkeley DB, MySQL, Oracle, 等等)中,这就使得他们有足够的灵活性利用石板和凿子写存储驱动。 The DSPAM Solution is split up into the following pieces: DSPAM CORE ENGINE The DSPAM core engine, also known as libdspam, provides all major spam filtering functions. The engine is linked to other dspam components (or shells) to provide functionality. The core engine is capable of being linked in with any other application as a "drop-in" to provide spam filtering to mail clients, other anti-spam tools, and other such type projects that would benefit from its use. Both static and shared versions are built by libtool into .libs. libdspam provides a storage driver abstraction layer, enabling developers to easily change how information is stored on the system (for example Berkeley DB, MySQL, Oracle, etc.) with enough flexibility to write a storage driver utilizing stone tablets and chisels. DSPAM代理 DSPAM代理是为libdspam提供垃圾过滤邮件服务器或其他服务器支持工具的直接接口的shell。代理通常指以 下两者之一: 1.代理可以化妆成邮件服务器的本地发送代理。然后DSPAM就处理邮件服务器通过管道传来的邮件,接着用 真实发送代理(procmail,mail.local,或通过一代理人把他传到另一个服务器),但若是垃圾邮件就会将 其隔离(DSPAM也可以选择标识并发送垃圾邮件)。 2.作为POP3代理,当用户检查时,DSPAM能被设定可处理电子邮件以及标识垃圾邮件。这就允许DSPAM在没有 综合需求的情况下自动到达任何邮件的前端。 MTA(sendmail,exim,qmail,等等)或POP3代理用参数识别目的地用户和其他操作参数调用DSPAM。DSPAM完成 其内部计算后将基于此结果执行适当的动作。 当邮件发送至终端用户时,代理对每个邮件设置了一系列的数字搜索路径。这些数字代表存储在服务器的临 时信息:包含邮件的原始数据,同时也用于当DSPAM出错时重学原始邮件。这允许DSPAM在不必提供完整邮件 头时精确地获悉————为了终端用户的生活跟方便。 DSPAM AGENT The DSPAM agent is a shell for libdspam providing a direct interface to mail servers or other tools for server-side spam filtering. The agent is normally integrated into one of two places: 1. The agent can masquerade as a mail server's local delivery agent. 
DSPAM then processes email piped to it from the mail server and then either delivers it using the real delivery agent (procmail, mail.local, or a proxy to pass it along to another server), or will quarantine it if the message is spam (DSPAM can optionally tag and deliver spams as well). 2. As a POP3 proxy, DSPAM can be configured to processes email when the user checks theirs, and tags spam accordingly. This allows DSPAM to front-end any mail system without the need for integration. The agent is also responsible for correcting misclassifications (missed spams or false positives). This is critical to the learning operations of DSPAM. The MTA (sendmail, exim, qmail, etc) or the POP3 proxy calls DSPAM with parameters identifying the destination user and other operational parameters. DSPAM performs its internal calculations and will then perform the appropriate action based on the result. When an email is delivered to the end-user, the agent appends a serial number to each email. This serial number references temporary information stored on the server which contains the original training data for the message, and is used to re-learn the original message in the event DSPAM made a mistake. This allows DSPAM to accurately learn without having to provide the full headers of the message - making life much easier for end-users. CGI CLIENT CGI客户机是一个是邮件用户看到其隔离箱的终端用户工具,可以倒转偶然的假阳性,看到用户的历史动作, 而且最重要的是可以永久删除垃圾邮件。CGI客户机用来和DSPAM代理联合一致。在一可选择的解决方案中, 比如客户机-过滤/前进,消除隔离箱是必要的,但是很多用户会感激--要是不用下载所有的垃圾邮件的话, 同时也能够查看使用曲线图和其他有用的信息。 工具 提供管理字典,自动化文集,创造种子[合成的]字典。 CGI CLIENT The CGI client is an end-user tool enabling a mail user to view their quarantine box, reverse the occasional false positive, view their historical activity, and most importantly to delete their spams permanently. The CGI client works in conjunction with the DSPAM agent. 
It is possible to eliminate the quarantine box in lieu of an alternative solution, such as client-filtering/forwarding, but many users will appreciate not having to download all of their spam, and being able to view usage graphs and other useful information. TOOLS Some basic tools which have been provided to manage dictionaries, automate corpus feeding, and create seeded [composite] dictionaries. 1.1 安装 主要步骤 ------------------------------------------------------ 重要升级步骤: 适用于版本 ------------------------------------------------------ 3.0版本合并了用户接口的许多变化,但是保留了常规的数据结构,因此用户没有必要为了升级再重新学。 步骤1:关闭现有的DSPAM安装。 现有的DSPAM安装必须先于升级前关掉。最简单的方法是关掉为DSPAM CGI服务的MTA和web服务器。在升级时不得处理任何邮件。 步骤2:数据存储升级 如果您用的是基于SQL的驱动程序,只需作很小的改动即可。这些改变只对基于SQL的驱动程序是必要 的;如果您用的是BerkeleyDB存储驱动则无须作改动。 下面的SQL代码应该更新MySQL和Oracle数据库的3.0版格式。确定用DSPAM进入schema以便能运用这些改动。 alter table dspam_stats add spam_learned int; alter table dspam_stats add innocent_learned int; alter table dspam_stats add spam_classified int; alter table dspam_stats add innocent_classified int; update dspam_stats set spam_learned = total_spam; update dspam_stats set innocent_learned = total_innocent; update dspam_stats set spam_classified = 0, innocent_classified = 0; alter table dspam_stats drop column total_spam; alter table dspam_stats drop column total_innocent; alter table dspam_stats add spam_misclassified int; alter table dspam_stats add innocent_misclassified int; update dspam_stats set spam_misclassified = spam_misses; update dspam_stats set innocent_misclassified = false_positives; alter table dspam_stats drop column spam_misses; alter table dspam_stats drop column false_positives; 步骤3:编译DSPAM V3.0 DSPAM v3.0改动了很多命令行的特征。其他的configure-time参数也有所改动或删除。下面列举了对 configure-timed做的变动: --with-userdir-* changed 'userdir' to 'dspam-home' --with-local-delivery-agent changed to --with-delivery-agent --enable/disable-chained-tokens removed from configure --enable/disable-bnr removed from configure --enable/disable-whitelist removed from configure 
--enable/disable-toe removed from configure --enable/disable-tum removed from configure --enable/disable-spam-delivery removed from configure --enable/disable-deliver-fp removed from configure 一旦您配置了DSPAM,运行:make %26amp;%26amp; make install 进行编译安装软件。 注:默认的DSPAM路径由changed from /etc/mail/dspam 变为 /var/dspam.如果您想用老的路径, 请用 --with-dspam-home=/etc/mail/dspam 指定。 1.1 INSTALLATION UPGRADING ------------------------------------------------------ IMPORTANT UPGRADE STEPS FOR USERS UPGRADING FROM ------------------------------------------------------ Version 3.0 incorporates many changes to the user interface, but preserves the general data structure so that users don't need to re-train in order to upgrade. Step 1: Shut down the existing DSPAM installation. The existing DSPAM installation should be shut down prior to any upgrade changes. The easiest way to do this is to turn off the MTA and web server serving the DSPAM CGI. No mail should be processed while the changes are being made. Step 2: Data Storage Changes If you are using a SQL-Based driver, a few minor changes will need to be made. These changes are only necessary to SQL-Based drivers; no changes need to be made if you are using the BerkeleyDB storage drivers. The following SQL code should upgrade both MySQL and Oracle databases to the v3.0 format. Be sure to log into the schema used by DSPAM to apply these changes. 
alter table dspam_stats add spam_learned int; alter table dspam_stats add innocent_learned int; alter table dspam_stats add spam_classified int; alter table dspam_stats add innocent_classified int; update dspam_stats set spam_learned = total_spam; update dspam_stats set innocent_learned = total_innocent; update dspam_stats set spam_classified = 0, innocent_classified = 0; alter table dspam_stats drop column total_spam; alter table dspam_stats drop column total_innocent; alter table dspam_stats add spam_misclassified int; alter table dspam_stats add innocent_misclassified int; update dspam_stats set spam_misclassified = spam_misses; update dspam_stats set innocent_misclassified = false_positives; alter table dspam_stats drop column spam_misses; alter table dspam_stats drop column false_positives; Step 3: Compile DSPAM v3.0 DSPAM v3.0 has moved many features out to the commandline. Other configure-time arguments have also been changed or removed. The following is a list of configure-time changes that have been made: --with-userdir-* changed 'userdir' to 'dspam-home' --with-local-delivery-agent changed to --with-delivery-agent --enable/disable-chained-tokens removed from configure --enable/disable-bnr removed from configure --enable/disable-whitelist removed from configure --enable/disable-toe removed from configure --enable/disable-tum removed from configure --enable/disable-spam-delivery removed from configure --enable/disable-deliver-fp removed from configure once you have configured DSPAM, run: make %26amp;%26amp; make install to build and install the software. NOTE: The default DSPAM home has been changed from /etc/mail/dspam to /var/dspam. If you would like to use the old path, specify it using --with-dspam-home=/etc/mail/dspam. 
步骤4:更新CGI DSPAM CGI必须得更新。许多的调用参数已经被改变了。 步骤5:重行配置MTA DSPAM的命令行参数已被重写。您应该为所有新的命令行成分的圆满解释,考虑'AGENT COMMANDLINE ARGUMENTS' 部分。一些变动如下: --addspam, --falsepositive, --corpus, and --inoculate have been replaced with two flags to specify a pre-classification and a classification source: --addspam becomes: --class=spam --source=error --falsepositive becomes: --class=innocent --source=error --corpus becomes: --class=innocent --source=corpus --class=spam --source=corpus --inoculate becomes: --class=spam --source=inoculation 指定训练模式(training mode),发送参数选择,还有命令行参数选择是必要的。例如: --mode=teft --deliver=innocent --feature=chained,bnr 否则,如果您喜欢一起发送合法邮件和垃圾邮件,您应该用 --deliver=innocent,spam。谨记您的这点喜好将被用于您将使用的任何操作系统。例如,假如您重新训练 一个假阳性或接近垃圾的邮件,--deliver参数将指定您是否想发送他,因此您可以享受在MTA,aliases, 和CGI中定义不同的命令行参数。 Step 4: Upgrade the CGI The DSPAM CGI has been updated, and should be upgraded. Many calling arguments have been changed. Step 5: Reconfigure your MTA DSPAM's commandline arguments have been rewritten. You'll want to consult the 'AGENT COMMANDLINE ARGUMENTS' section for a full explanation of all the new commandline components. Some of the basic changes are: --addspam, --falsepositive, --corpus, and --inoculate have been replaced with two flags to specify a pre-classification and a classification source: --addspam becomes: --class=spam --source=error --falsepositive becomes: --class=innocent --source=error --corpus becomes: --class=innocent --source=corpus --class=spam --source=corpus --inoculate becomes: --class=spam --source=inoculation It will also be necessary to specify the training mode, delivery preferences, and feature selection on the commandline as well. For example: --mode=teft --deliver=innocent --feature=chained,bnr Or if you prefer delivery of both innocent and spam messages, you should use --deliver=innocent,spam. Keep in mind that these preferences will be applied to whatever operation you are calling. 
For example, if you are retraining a false positive or forwarding in a spam, the --deliver argument will specify whether or not you want to deliver it, so you have the luxury of defining different commandline arguments between your MTA, aliases, and CGI. 步骤6:目录结构已被改变,因此所有的用户目录都在$DSPAM_HOME/data中。您可以删除所有的用户目录 (.stats等文件将被重新编译,但是隔离箱不会),或者把他们移到 $DSPAM_HOME/data中。 步骤7:打开您的MTA和CGI,做一个全面的测试。 刷新安装 首先您得下载一些必要的工具: 这取决于您想用那种驱动程序,您需要: libdb4_drv: Berkeley DB-4. libdb3_drv: Berkeley DB-3. mysql_drv: MySQL client libraries (and a server to connect to) ora_drv: Oracle Call Interface (and a server to connect to) pgsql_drv: PostgreSQL client libraries (and a server to connect to) MYSQL时被推荐的存储驱动程序,即使是执行小项目,他也比其他驱动稳定且好测试。如果您没有办法运行 一个稳定的服务器,libdb驱动应该满足了,但是请注意,libdb 偶尔会导致一些问题,包括data corruption 和 lock contention。结果,您不得不做一个备份以免出现这些问题。 一般来说,MYSQL是一个较快的解决方案且占用较小的存储器,同时适合小型或大规模的运行。 Step 6: The directory structure has been changed so that all user directories go into $DSPAM_HOME/data. You'll want to either delete all user directories (the .stats files and such will be rebuilt, but the quarantine boxes won't), or move them into $DSPAM_HOME/data. Step 7: Turn your MTA and CGIs back on, and TEST EVERYTHING. FRESH INSTALLATION First you will need to download a few prerequisite tools: Depending on which storage driver you want to use, you will need: libdb4_drv: Berkeley DB-4. libdb3_drv: Berkeley DB-3. mysql_drv: MySQL client libraries (and a server to connect to) ora_drv: Oracle Call Interface (and a server to connect to) pgsql_drv: PostgreSQL client libraries (and a server to connect to) MySQL is the recommended storage driver, even for small implementations, as it is more stable and tested than the other drivers. If you are incapable of running a stateful server, the libdb drivers should suffice, but please be aware that libdb can occasionally result in some problems including data corruption and lock contention. 
As a result, you'll want to maintain a backup of your dictionary in case
such problems arise. In general, MySQL is a faster solution with a
smaller storage footprint, and is well suited for both small and
large-scale implementations.

You can download Berkeley DB from http://www.sleepycat.com.
You can download MySQL from http://www.mysql.com.
You can download PostgreSQL from http://www.postgresql.com.
You can obtain more information about Oracle at http://www.oracle.com.

Be sure the necessary libraries are available to root, the MTA user, and
the CGI user. The easiest way to do this is to copy them to /usr/lib or
/lib.

Documentation for the setup of your selected storage driver can be found
in the tools.[storage driver]/ directory.

NOTE: Some operating system distributions include their own packaged
Berkeley DB libraries (used by libdb3_drv and libdb4_drv). A majority of
these packaged versions work correctly with DSPAM; however, a few do not.
If you experience problems with one of the libdb storage drivers,
consider downloading and compiling the official source tree from
http://www.sleepycat.com.

1. CONFIGURATION

  ./configure [options]

DSPAM supports the configuration options below.

PATH SWITCHES

--with-dspam-home=DIR
  Specify an alternative storage directory for dspam user information.
  The default is /var/dspam.

--prefix=DIR
  Specify an alternative root prefix for installation.
  The default is /usr/local. This does not affect DSPAM_HOME.

FILESYSTEM SCALE

The default filesystem scale is "small-scale", which writes each user to
its own directory in the top-level DSPAM_HOME/data directory. The
following two switches change the scale to suit larger installations.

--enable-large-scale
  Switch for large-scale implementation. User data will be stored as
  $DSPAM_HOME/data/u/s/user instead of $DSPAM_HOME/data/user.

--enable-domain-scale
  Switch for domain-scale implementation. When used, username@domain
  should be passed in as the user id, and user data will be stored as
  $DSPAM_HOME/data/domain.com/user and
  $DSPAM_HOME/opt-in/domain/user.dspam instead of $DSPAM_HOME/data/user.

INTEGRATION SWITCHES

--with-delivery-agent=PROG
  The delivery agent is the tool called to deliver messages. Use this to
  specify an alternative delivery agent, other than the one specific to
  your operating system. If you are building on an unsupported platform,
  you must specify this. You may use quotes if you wish to include
  additional commandline flags. DSPAM will automatically relay the
  commandline parameters it was initially given, with the exception of
  any DSPAM-specific parameters (such as --user, --process, etc.). The
  delivery agent does not necessarily need to be a local agent; it can
  be configured to call a proxy pass-through.
  Currently, DSPAM has a default delivery agent selected for Linux,
  FreeBSD, Solaris, and Cygwin platforms.

  NOTE: When specifying a series of arguments, you will need to use
  quotes around PROG. You may also use the $u identifier to tell DSPAM
  to place the destination user's ID in the corresponding position in
  the argument list; $u will be replaced by the destination user prior
  to calling the LDA. This could potentially cause problems, however, if
  your MTA requires the user argument to come last, which is why DSPAM,
  by default, allows you to set this in the MTA configuration.

  NOTE: Be sure to escape the $ in $u, but only when specifying $u on
  the commandline. This prevents $u from being replaced with the shell's
  environment variable 'u'. You may alternatively use %u.

--with-quarantine-agent=PROG
  By default, DSPAM automatically quarantines spam in its internal user
  quarantine box. If you wish to override this default behavior, you may
  do so by specifying your own quarantine agent. The same notes from the
  --with-delivery-agent option apply here. The quarantine agent will be
  called whenever a message is believed to be spam, with the message
  provided to the tool on stdin.

--enable-broken-mta
  Enable this if your MTA is broken and passes messages into DSPAM with
  CTRL-M's (^M) in them.

--enable-broken-return-codes
  Causes DSPAM to return an exit code of 99 if a message is spam, 0 if
  innocent, and any other code if an error has occurred. The default is
  to return 0 whenever the operation is successful, regardless of
  outcome.
  Only use this if you know what you're doing!

--with-storage-driver=DRIVER
  Specify an alternative storage driver. A storage driver is a driver
  written specifically for DSPAM to store tokens, signature data, and
  perform other proprietary operations. The default driver is
  libdb4_drv, which incorporates Berkeley DB v4. The following drivers
  have been provided:

    libdb4_drv: Berkeley DB4 Library
    libdb3_drv: Berkeley DB3 Library
    mysql_drv:  MySQL Drivers
    ora_drv:    Oracle Drivers (BETA)
    pgsql_drv:  PostgreSQL Drivers (BETA)

  You may also need to use some of the driver-specific configure flags
  (discussed later).

--enable-client-compression
  Enables data source client compression for storage drivers where it is
  available (presently only mysql_drv). This causes data between the
  data source and its clients to be compressed. Use this option if your
  data source is on a separate machine from the DSPAM agent(s): it
  conserves bandwidth, at the expense of a few CPU cycles.

--disable-trusted-user-security
  Administrators who wish to disable trusted user security may do so
  with this configure flag. This causes DSPAM to treat each user as if
  they were "trusted", which could allow them to execute arbitrary
  commands on the server via DSPAM.
  Because of this, administrators should only use this option on a
  closed server, or should configure their DSPAM binary to be executable
  only by users who can be trusted. This option SHOULD NOT be used as a
  solution to your MTA dropping privileges prior to calling DSPAM.
  Instead, see the TRUSTED USERS SECURITY section of this document.

--enable-homedir-dotfiles
  When enabled, instead of checking for
  $DSPAM_HOME/$USER/opt-in/$USER[.nodspam|.dspam], DSPAM will check for
  a .nodspam|.dspam file in the user's home directory. These two
  dotfiles are used for opt-out or opt-in filtering.

--enable-opt-in
  Causes DSPAM to filter mail only for users with a .dspam dotfile. The
  default is opt-out, which requires a .nodspam file to exist to bypass
  filtering.

DEBUGGING SWITCHES

--enable-debug
  Turns on support for debugging output to DSPAM_HOME/dspam.debug and
  DSPAM_HOME/dspam.messages (see the description of the
  --with-dspam-home option for details about DSPAM_HOME). This option
  allows you to turn on debugging messages for specific users by
  dropping a DSPAM_HOME/userpath/user.debug file, or for all users by
  dropping a DSPAM_HOME/.debug file. Enabling debug only enables support
  for this feature; dotfiles must still be dropped in order to turn
  messages on.
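As a rough sketch of the dotfile mechanism described for --enable-debug,
the commands below turn debugging messages on for all users and then for
a single user. Several things here are assumptions made for illustration
only: a throwaway temporary directory stands in for the real DSPAM_HOME,
the user name "bob" is made up, and "userpath" is taken to be data/bob
(the small-scale layout) with the dotfile named after the user.

```shell
#!/bin/sh
# Sketch only: a temporary directory stands in for the real DSPAM_HOME
# (default /var/dspam) so that nothing on a live system is touched.
DSPAM_HOME="$(mktemp -d)"

# Assumptions: user "bob", small-scale layout, so "userpath" is data/bob.
DEBUG_USER="bob"
USER_DIR="$DSPAM_HOME/data/$DEBUG_USER"
mkdir -p "$USER_DIR"

# Debugging messages for every user: drop DSPAM_HOME/.debug
touch "$DSPAM_HOME/.debug"

# Debugging messages for one user only: drop DSPAM_HOME/userpath/user.debug
touch "$USER_DIR/$DEBUG_USER.debug"

# Removing the dotfiles would turn the messages off again:
#   rm -f "$DSPAM_HOME/.debug" "$USER_DIR/$DEBUG_USER.debug"
```

None of this has any effect unless DSPAM was configured with
--enable-debug (or --enable-verbose-debug) in the first place.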
--enable-verbose-debug
  Turns on extremely verbose debugging output to DSPAM_HOME/dspam.debug
  and DSPAM_HOME/dspam.messages (see the description of the
  --with-dspam-home option for details about DSPAM_HOME). Dotfiles must
  still be dropped in order to activate messages, just as with
  --enable-debug.

TRAINING SET IDENTIFICATION SWITCHES

The default behavior for DSPAM is to store all original training data on
the server side as temporary information, and to embed a serial number
into the body of each message referencing the data. This is used for
misclassifications and provides true 1:1 retraining. Some
implementations may call for a slightly different approach to training
set identification.

--enable-signature-attachments
  Instead of storing the DSPAM signatures on the server (which could
  take up considerable disk space), this option causes DSPAM to rewrite
  each message to include a dspam.dat attachment, which contains all of
  the tokens used in the calculation for the original message. When the
  spam or false positive is processed back into the system, this
  signature will be read. This may increase bandwidth by an average of
  2k-32k per message, depending on the original message's size.
  NOTE: This option doesn't work correctly with mail clients that quote
  an embedded, forwarded message (such as some or all versions of elm).
  It should only be used on networks where all clients properly
  understand an embedded multipart message (Outlook, Ximian Evolution,
  etcetera) and forward the attachment as an attachment instead of
  quoted text. In other words, this breaks a lot of stuff if you're not
  on a standardized GUI-based client network. Server-side signatures are
  still the most reliable method and work for all known clients. This
  option also puts a "paper clip" on every message you receive.

--enable-signature-headers
  This option causes the DSPAM signature to be written to the message
  header instead of the body.

  IMPORTANT: This option requires that all users either bounce their
  messages into DSPAM, forward them as an attachment, or implement some
  macro that retains the X-DSPAM-Signature header, which will NORMALLY
  BE DROPPED by standard forwarding.

--enable-webmail
  The webmail switch is designed for systems where the original message
  remains server-side and can therefore be presented in pristine format
  for retraining. This option causes DSPAM to cease all writing of
  signatures and DSPAM headers to the message, and to deliver the
  message in as pristine a format as possible.
  This mode REQUIRES that the original message be presented for
  retraining in its pristine format (as of delivery), as in the case of
  webmail or other applications where the message is actually kept
  server-side during reading, and is preserved. DO NOT use this switch
  unless the original message can be presented for retraining with the
  ORIGINAL HEADERS and NO MODIFICATIONS.

--with-signature-life=DAYS
  Specifies the default length of time (in days) a signature should
  remain stored on the server. The default is 14 days. This value should
  accurately represent the maximum amount of time a user would need to
  identify and forward a missed spam, or mark a false positive. Consider
  vacations. This can be changed in calls to dspam_clean on the
  commandline.

FEATURE ACTIVATION

--enable-neural-networking (EXPERIMENTAL)
  Enables neural networking support (see the section NEURAL NETWORKING).
  This feature is presently supported only by the mysql_drv and
  pgsql_drv storage drivers, and is still considered experimental.

--enable-source-address-tracking
  Logs the source address of spams and innocent messages via syslog. You
  can create a file DSPAM_HOME/mta.whitelist containing a list of local
  MTA IPs, which will cause DSPAM to skip to the next 'Received' header.
  Each IP should be on a new line.
  Also writes SBL blacklist files for use with the Streamlined Blackhole
  Server (http://www.nuclearelephant.com/projects/sbl/).

--enable-spam-subject
  Prepends [SPAM] to the subject header of any message suspected to be
  spam. This is sometimes more useful than the X-DSPAM-Result field,
  because not all mail clients support mail rules with custom headers.

--disable-user-logging
  Disables the writing of per-user .log files. Users will not be able to
  view graphs or history with this feature disabled.

--disable-system-logging
  Disables the writing of the system.log. Admins will not be able to
  view graphs or other related information with this feature disabled.

ALGORITHM ACTIVATION

The default algorithms enabled are quite sufficient, and represent the
most well-tested algorithms in DSPAM. It is not necessary to change any
of these options unless you are interested in altering DSPAM's default
behavior.

--disable-traditional-bayesian
  Disables the traditional Bayesian algorithm (it is enabled by
  default).

--disable-alternative-bayesian
  Disables Brian Burton's alternative Bayesian algorithm (it is enabled
  by default). The differences from the traditional algorithm are:
  - 27 samples are used instead of 15
  - Tokens appearing more than once may take up to 2 slots in the
    calculation. This is ideal when there is very limited data.

--enable-robinson
  Enables Robinson's geometric mean test. The differences are:
  - A window size of 25 is used instead of 15
  - The combination algorithm is different.
  See: http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
  for more information. This algorithm is obsolete, and not recommended
  for production builds.

--enable-chi-square
  Enables Fisher-Robinson's Inverse Chi-Square. Defaults in libdspam.c:
  - Exclusionary radius of 0.45
  - Ham/Spam cutoff of 0.5
  - Strength: 0.1
  - Assumed probability: 0.5

NOTE: You may have multiple algorithms enabled simultaneously; if any of
the enabled algorithms believes the message is spam, it will be marked
accordingly. Naturally, you also have the potential problem of false
positives generated by any of the enabled algorithms, so it is
recommended to either stick with a single algorithm, or use only
Bayesian or only Robinson's type algorithms. Bayesian+Alt-Bayesian seems
to be the most effective combination (not using Robinson's at all). For
this reason, if you plan on enabling any algorithms which are disabled
by default, it is strongly recommended that you also use:

  --disable-traditional-bayesian
  --disable-alternative-bayesian

Generally, the alternative Bayesian algorithm appears to catch some
spams that the traditional Bayesian algorithm does not; however, it also
misses far more spams than the traditional algorithm. Therefore, an
implementation using both Bayesian algorithms appears to be the most
effective in catching spam.
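If you do decide to try an algorithm that is disabled by default, the
recommendation above translates into a configure invocation along these
lines. This is only a sketch: it uses Chi-Square as the example, and any
other switches your installation needs (storage driver, delivery agent,
and so on) would have to be added.

```shell
# Sketch: enable Fisher-Robinson's Inverse Chi-Square on its own by
# turning off both Bayesian algorithms that are enabled by default.
./configure \
    --enable-chi-square \
    --disable-traditional-bayesian \
    --disable-alternative-bayesian
```

Leaving the two --disable flags out would keep all three algorithms
active at once, so a message would be marked as spam as soon as any one
of them flagged it.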
--disable-bias
  When bias is disabled, dspam no longer biases the statistics in favor
  of innocent mail, but weighs spam and innocent tokens equally in the
  calculation. This may provide more effective spam filtering, but has
  been shown to increase the number of false positives.

--enable-robinson-pvalues
  Enables Robinson's technique for combining p-values. This is an
  alternative approach to generating word probabilities, described here:
  http://www.linuxjournal.com/article.php?sid=6467
  Robinson's p-values are presently used in Chi-Square calculations, but
  enabling them with this flag will use them for *all* calculations,
  effectively replacing (or rather building upon) Graham's tokenization
  approach. This flag may also be used without enabling Chi-Square.

--disable-test-conditional
  Disables test-conditional training. Test-conditional training is a
  more aggressive approach to training than traditional training, and
  provides more inoculous results rapidly. Enabled by default, this mode
  of training will automatically re-train the user's dictionary on a
  spam or false positive until the training condition is met (e.g. until
  the user's dictionary no longer results in misclassification of the
  message being retrained).
  This training has a maximum of 5 iterations, and will only be invoked
  when:
  - The user has 1000 innocent messages in their corpus, and is
    reporting a spam
  - The user is reporting a false positive (regardless of the number of
    messages in their corpus)

  This method of training has its controversial points as well. All of
  these issues revolve around the assumption this approach makes: that
  you are likely to receive the same (or a very similar) message again
  one or more times in the future.
  - Since the message is being retrained repeatedly, the learning curve
    is going to be based solely on that one message rather than the
    natural flow of similar messages that may contain slightly different
    text.
  - It's possible a user may aggressively train a spam they will only
    receive once, but could potentially increase their risk of false
    positives by training this aggressively.
  - If there is a significant overlap of dictionary tokens between a
    user's regular mail and the incoming spams being aggressively
    trained, the user could potentially end up retraining with spam,
    then retraining with false positives, then retraining with spam
    again.

  In spite of these controversial points, this approach to training has
  had successful results with several implementations.
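The control flow of test-conditional training can be pictured with a
small simulation. This is not DSPAM's code: the functions
still_misclassified, qualifies, and retrain are hypothetical names made
up for this sketch, the classifier is replaced by a stub that pretends a
message needs three training passes, "has 1000 innocent messages" is
read as "at least 1000", and the first training pass is counted as
iteration one. Only the limit of 5 iterations and the two triggering
conditions come from the text above.

```shell
#!/bin/sh
# Illustrative simulation only; not taken from the DSPAM sources.
MAX_ITERATIONS=5      # from the text: at most 5 iterations
MIN_INNOCENT=1000     # from the text: spam reports qualify at 1000 innocent messages

# Hypothetical stub for "does the dictionary still misclassify this message?"
# Here we pretend the message needs PASSES_NEEDED training passes.
PASSES_NEEDED=3
still_misclassified() {    # $1 = training passes done so far
    [ "$1" -lt "$PASSES_NEEDED" ]
}

# Does test-conditional training apply to this report?
# $1 = report type (spam | falsepositive), $2 = innocent messages in corpus
qualifies() {
    case "$1" in
        falsepositive) return 0 ;;
        spam) [ "$2" -ge "$MIN_INNOCENT" ] ;;
        *) return 1 ;;
    esac
}

# Train once, then keep retraining until the message classifies correctly
# or the iteration cap is reached. Prints the number of passes performed.
retrain() {                # $1 = report type, $2 = innocent count
    passes=1
    if qualifies "$1" "$2"; then
        while still_misclassified "$passes" && [ "$passes" -lt "$MAX_ITERATIONS" ]; do
            passes=$((passes + 1))
        done
    fi
    echo "$passes"
}

retrain spam 1500          # prints 3: qualifies, stops once the condition is met
retrain spam 200           # prints 1: too few innocent messages, trained once
retrain falsepositive 10   # prints 3: false positives always qualify
```

The simulation also makes the first controversial point visible: every
extra pass is driven by the same single message.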
DRIVER SPECIFIC CONFIGURE SWITCHES

libdb4_drv:

--with-db4-includes=DIR
  Specify a path to the Berkeley db4 includes.

--with-db4-libraries=DIR
  Specify a path to the Berkeley db4 libraries.

libdb3_drv:

--with-db3-includes=DIR
  Specify a path to the Berkeley db3 includes.

--with-db3-libraries=DIR
  Specify a path to the Berkeley db3 libraries. (Currently links to
  -ldb3, so you may need to symlink libdb-3.3.so to libdb3.so if it
  doesn't exist.)

mysql_drv:

--with-mysql-includes=DIR
  Specify a path to the MySQL includes.

--with-mysql-libraries=DIR
  Specify a path to the MySQL libraries. (Currently links to
  -lmysqlclient, and also -lcrypto on some systems.)

--enable-virtual-users
  Tells DSPAM to create virtual user ids. Use this if your users don't
  actually exist on the system (e.g. in /etc/passwd if using a password
  file).

NOTE: Please see the file tools.mysql_drv/README for more information
about configuring the mysql_drv storage driver.

pgsql_drv:

--with-pgsql-includes=DIR
  Specify a path to the PgSQL includes.

--with-pgsql-libraries=DIR
  Specify a path to the PgSQL libraries. (Currently links to -lpq, and
  netlibs on some systems.)

--enable-virtual-users
  Tells DSPAM to create virtual user ids. Use this if your users don't
  actually exist on the system (e.g. in /etc/passwd if using a password
  file).

NOTE: Please see the file tools.pgsql_drv/README for more information
about configuring the pgsql_drv storage driver.

ora_drv:

--with-oracle-home=DIR
  Specify the Oracle Home (or client home).

--enable-virtual-users
  Tells DSPAM to create virtual user ids. Use this if your users don't
  actually exist on the system (e.g. in /etc/passwd if using a password
  file).

NOTE: Please see the file tools.ora_drv/README for more information
about configuring the ora_drv storage driver.

2. BUILDING AND INSTALLING

After you have run configure with the correct options, build and install
DSPAM by running:

  make && make install

If you are a developer wanting to link to the core engine of dspam,
libdspam will be built during this process. Please see the example.c
file for examples of how to link to and use libdspam. Static and dynamic
libraries are built in the .libs directory. Needed headers will be
installed in $prefix$/include/dspam.

3. PERMISSIONS

After install, the DSPAM_HOME will have been created for you
automatically (the default is /var/dspam). Ensure the directory is
writable by both your MTA and CGI user. You may need to add root and
your MTA user to the directory's [mail] group in /etc/group. The MTA
user is usually 'daemon' or 'smmsp', although on FreeBSD the default is
'mailnull'. This is very important, as your MTA user needs to be able to
lock and work with files.

IMPORTANT!!! FreeBSD's mail.local changes its effective uid, and so in
order to use it, dspam must be installed setuid root to work on the
commandline properly. This is done automatically on install.
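One way to picture the permission requirement on DSPAM_HOME is the
sketch below. It is an illustration under stated assumptions, not a
documented procedure: a temporary directory stands in for DSPAM_HOME,
mode 770 is simply one choice that gives the owner and group write
access, and the chgrp line is shown only as a comment because the right
group depends on your system.

```shell
#!/bin/sh
# Sketch only: a temporary directory stands in for DSPAM_HOME (default /var/dspam).
DSPAM_HOME="$(mktemp -d)"

# Owner and group get full access, everyone else none. Mode 770 is an
# assumption for this sketch, not a value from the DSPAM documentation.
chmod 770 "$DSPAM_HOME"

# On a real system the directory's group would be one that your MTA and
# CGI users belong to (the text mentions a mail group), for example:
#   chgrp mail /var/dspam

# Show the resulting mode bits (first ten characters of the ls -ld line)
perms=$(ls -ld "$DSPAM_HOME" | cut -c1-10)
echo "$perms"

# The current user can lock and work with files only if the directory is writable
if [ -w "$DSPAM_HOME" ]; then
    echo "writable"
fi
```

Running the final check as the MTA user and as the CGI user, against
the real directory, is what actually tells you whether both can write
to it.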
If you find that DSPAM is erroneously processing all operations as a
single user, chances are that user should be added to trusted.users as
an administrative user.

TRUSTED USERS SECURITY

DSPAM has tighter security for untrusted users on the system, to prevent
them from being able to spoof other users or specify their own passthru
arguments to potentially hijack the delivery agent. This method of
security has been implemented because some implementations (such as
those using procmail) may require the DSPAM agent to be setuid or
setgid.

The trusted.users file should be created in $DSPAM_HOME (default
/var/dspam). This file should contain a list of trusted users who should
be allowed to set the dspam user, passthru parameters, and other
information that would be potentially dangerous for a malicious user to
be able to set. The file should contain one username per line, and will
generally hold the usernames of the MTA and CGI users. Example:

  root
  smmsp
  daemon
  cgi
  mailnull

Where cgi represents the special CGI user you configure Apache to run
your dspam.cgi as.
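A minimal sketch of creating the example trusted.users file follows. A
temporary directory stands in for $DSPAM_HOME, and is_trusted is a
helper invented for this sketch to check the file's contents; it is not
a DSPAM tool and says nothing about how DSPAM itself reads the file.

```shell
#!/bin/sh
# Sketch only: a temporary directory stands in for $DSPAM_HOME (default /var/dspam).
DSPAM_HOME="$(mktemp -d)"

# One trusted username per line. "cgi" is a placeholder for whatever
# user Apache runs dspam.cgi as on your system.
cat > "$DSPAM_HOME/trusted.users" <<'EOF'
root
smmsp
daemon
cgi
mailnull
EOF

# Hypothetical helper: is the given user listed (exact whole-line match)?
is_trusted() {
    grep -qxF "$1" "$DSPAM_HOME/trusted.users"
}

is_trusted smmsp && echo "smmsp is listed"
is_trusted bob || echo "bob is not listed"
```

Matching whole lines only (grep -x) avoids treating a name such as
"roo" as listed just because "root" is.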
Be sure to examine DSPAM_HOME/dspam.debug to ensure that you don't get
any untrusted user warnings when submitting spam or a false positive, as
both of these actions frequently call dspam from a different user than
standard mail delivery.

If you are using an MTA that changes its userid before calling DSPAM to
match the destination user, you should NOT add each user to the trusted
users file, but instead configure a preset commandline. DSPAM will see
that the user is not trusted and automatically set their DSPAM user id
and, optionally, the passthru delivery agent arguments.

To override an untrusted user's passthru delivery agent arguments
(arguments which could be used to hijack the delivery agent to gain
privileged access to the system), you will need to set up a file called
untrusted.mailer_args in the same directory ($DSPAM_HOME). The first
line should contain the path to the delivery agent, followed by a list
of all the LDA arguments to pass through (including a user identity flag
if necessary). This file's information will override any passthru
commandline parameters specified by the user.
For example:

  /bin/mail -d $u

The variable $u informs DSPAM that you would like the destination
username to be used in the position where $u is specified, so when DSPAM
calls your LDA for user 'bob', it will call it with:

  /bin/mail -d bob

NOTE: In the event that ALL of the following are true:
- Your MTA performs a setuid() to the destination user prior to calling
  DSPAM
- There are additional _dynamically assigned_ parameters that must be
  passed to DSPAM which cannot be specified in configuration
- The delivery agent has no potentially dangerous commandline options,
  or you are placing a wrapper around the delivery agent
then you may want to remove the untrusted.mailer_args file altogether.
If the file cannot be found, dspam will permit the user to specify their
own passthru arguments to the preconfigured LDA (with some basic sanity
checking), which COULD POTENTIALLY BE INSECURE if improperly set up. It
is strongly recommended you use this file to override the user.

DSPAM warns you (with a log record) when it is unable to open the
untrusted.mailer_args file. If you don't want to see this warning, make
sure an untrusted.mailer_args file exists, even if it is empty.

4. SERVER CONFIGURATION

There are two ways DSPAM can be configured:

Mail Server:
  The default approach integrates DSPAM directly with the mail server
  and filters spam as mail comes in.

POP3 Proxy:
  The alternative approach implements a POP3 proxy where users connect
  to the proxy to check their email, and email is filtered when being
  downloaded. The POP3 proxy is a much easier approach, as it requires
  much less integration work with the mail server (and is ideal for
  implementing DSPAM on Exchange, etcetera).

The biggest difference is that the mail server approach filters mail at
MTA time, while the POP3 proxy filters at MUA time, with the added
benefit that you need not worry about virtual users and the like.