Boost 1.33.0 Regex: Hands-On Notes

王朝 other · Author unknown  2006-01-09

Boost 1.33.0 is about to be released. Its regex library has been rewritten and adds:

* Unicode support

* Support for the ATL/MFC CString

***********

I couldn't wait, so I downloaded a copy to take a look.

Source download:

=========

Boost:

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/boost login

cvs -z9 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/boost co -P boost

ICU: (the Unicode support in Boost 1.33.0's regex is built on ICU, IBM's Unicode library)

http://www.ibm.com/software/globalization/icu/

Building the source:

=============

The build environment is VC7.1 with the C++ STL that ships with it. Change to BOOST_ROOT\libs\regex\build and run:

bjam -sICU_PATH=d:\icu32 -sTOOLS=vc-7_1 stage

Unicode support test:

================

I looked at the ICU DLLs: the three DLLs that Boost.Regex links against dynamically add up to about 10 MB. That put me off, so I skipped this test.

ATL/MFC support:

===============

In VC7.1, create a new Win32 console project and add the following code:

/*

*

* Copyright (c) 2004

* John Maddock

*

* Use, modification and distribution are subject to the

* Boost Software License, Version 1.0. (See accompanying file

* LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

*

*/

/*

* LOCATION: see http://www.boost.org for most recent version.

* FILE mfc_example.cpp

* VERSION see <boost/version.hpp>

* DESCRIPTION: examples of using Boost.Regex with MFC and ATL string types.

*/

#define TEST_MFC

#ifdef TEST_MFC

#include <boost/regex/mfc.hpp>

#include <cstringt.h>

#include <atlstr.h>

#include <assert.h>

#include <tchar.h>

#include <iostream>

#include <stdexcept>

#ifdef _UNICODE

#define cout wcout

#endif

//

// Find out if *password* meets our password requirements,

// as defined by the regular expression *requirements*.

//

bool is_valid_password(const CString& password, const CString& requirements)

{

return boost::regex_match(password, boost::make_regex(requirements));

}

//

// Extract filename part of a path from a CString and return the result

// as another CString:

//

CString get_filename(const CString& path)

{

boost::tregex r(__T("(?:\\A|.*\\\\)([^\\\\]+)"));

boost::tmatch what;

if(boost::regex_match(path, what, r))

{

// extract $1 as a CString:

return CString(what[1].first, what.length(1));

}

else

{

throw std::runtime_error("Invalid pathname");

}

}

CString extract_postcode(const CString& address)

{

// searches through address for a UK postcode and returns the result,

// the expression used is by Phil A. on www.regxlib.com:

boost::tregex r(__T("^(([A-Z]{1,2}[0-9]{1,2})|([A-Z]{1,2}[0-9][A-Z]))\\s?([0-9][A-Z]{2})$"));

boost::tmatch what;

if(boost::regex_search(address, what, r))

{

// extract $0 as a CString:

return CString(what[0].first, what.length());

}

else

{

throw std::runtime_error("No postcode found");

}

}

void enumerate_links(const CString& html)

{

// enumerate and print all the <a> links in some HTML text,

// the expression used is by Andew Lee on www.regxlib.com:

boost::tregex r(__T("href=[\"\']((http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?)[\"\']"));

boost::tregex_iterator i(boost::make_regex_iterator(html, r)), j;

while(i != j)

{

std::cout << (*i)[1] << std::endl;

++i;

}

}

void enumerate_links2(const CString& html)

{

// enumerate and print all the <a> links in some HTML text,

// the expression used is by Andew Lee on www.regxlib.com:

boost::tregex r(__T("href=[\"\']((http:\\/\\/|\\.\\/|\\/)?\\w+(\\.\\w+)*(\\/\\w+(\\.\\w+)?)*(\\/|\\?\\w*=\\w*(&\\w*=\\w*)*)?)[\"\']"));

boost::tregex_token_iterator i(boost::make_regex_token_iterator(html, r, 1)), j;

while(i != j)

{

std::cout << *i << std::endl;

++i;

}

}

//

// Take a credit card number as a string of digits,

// and reformat it as a human readable string with "-"

// separating each group of four digits:

//

const boost::tregex e(__T("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z"));

const CString human_format = __T("$1-$2-$3-$4");

CString human_readable_card_number(const CString& s)

{

return boost::regex_replace(s, e, human_format);

}

int main()

{

// password checks using regex_match:

CString pwd = "abcDEF---";

CString pwd_check = "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}";

bool b = is_valid_password(pwd, pwd_check);

assert(b);

pwd = "abcD-";

b = is_valid_password(pwd, pwd_check);

assert(!b);

// filename extraction with regex_match:

CString file = "abc.hpp";

file = get_filename(file);

assert(file == "abc.hpp");

file = "c:\\a\\b\\c\\d.h";

file = get_filename(file);

assert(file == "d.h");

// postcode extraction with regex_search:

CString address = "Joe Bloke, 001 Somestreet, Somewhere,\nPL2 8AB";

CString postcode = extract_postcode(address);

// compare (not assign) against the postcode that appears in address:

assert(postcode == "PL2 8AB");

// html link extraction with regex_iterator:

CString text = "<dt><a href=\"syntax_perl.html\">Perl Regular Expressions</a></dt><dt><a href=\"syntax_extended.html\">POSIX-Extended Regular Expressions</a></dt><dt><a href=\"syntax_basic.html\">POSIX-Basic Regular Expressions</a></dt>";

enumerate_links(text);

enumerate_links2(text);

CString credit_card_number = "1234567887654321";

credit_card_number = human_readable_card_number(credit_card_number);

assert(credit_card_number == "1234-5678-8765-4321");

return 0;

}

#else

#include <iostream>

int main()

{

std::cout << "<NOTE>MFC support not enabled, feature unavailable</NOTE>";

return 0;

}

#endif
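For readers without MFC or ATL at hand, the password check and the card-number formatting from the listing can be sketched with std::regex, the standard-library descendant of Boost.Regex. This is only a sketch under assumptions: it needs a C++11 compiler, it uses std::string instead of CString, and it replaces \A and \z with ^ and $ because the default ECMAScript grammar of std::regex does not have them. It is not the Boost example itself.

```cpp
#include <regex>
#include <string>

// Returns true only if the WHOLE password matches the requirements pattern
// (regex_match never accepts a partial match).
inline bool is_valid_password(const std::string& password,
                              const std::string& requirements)
{
    return std::regex_match(password, std::regex(requirements));
}

// Reformats a 15- or 16-digit card number as groups separated by "-".
// Assumption: ^ and $ stand in for \A and \z from the Boost listing.
// Input that does not match the pattern is returned unchanged.
inline std::string human_readable_card_number(const std::string& s)
{
    static const std::regex e(
        "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
    return std::regex_replace(s, e, "$1-$2-$3-$4");
}
```

Called with the same inputs as main() in the listing, is_valid_password accepts "abcDEF---" and rejects "abcD-" under the pwd_check pattern, and human_readable_card_number turns "1234567887654321" into "1234-5678-8765-4321".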

Setting up the build environment:

=============

* Add $(BOOST_ROOT);$(ICU_PATH)\include to the include paths, both placed after the VC7.1 include directories.

Setting the compile properties:

============

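* Use the Unicode character set

* Use /Zc:wchar_t (note: when Boost is built with VC7.1 using the default settings, wchar_t is treated as a built-in type, so if you want Unicode rather than MBCS support, compile your project with this option)

* Use the multi-threaded debug DLL runtime, /MDd (if you are not sure what this means, do not pick anything else)

* Define the macro BOOST_REGEX_DYN_LINK (by default regex is linked statically; define this macro if you want dynamic linking)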
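The /Zc:wchar_t setting matters because, without it, VC7.1 makes wchar_t a typedef for unsigned short, so a project compiled that way cannot link against wide-character functions from a library that was built with wchar_t as a distinct type. A minimal sketch of a compile-time guard follows; it assumes a C++11 compiler for static_assert, and the helper name is_native_wchar is made up for illustration.

```cpp
#include <type_traits>

// Fails to compile if wchar_t is only a typedef for unsigned short,
// which is what VC7.1 does when /Zc:wchar_t is NOT set.
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t must be a built-in type; compile with /Zc:wchar_t");

// Hypothetical helpers: these two overloads can only coexist when
// wchar_t and unsigned short are different types.
inline bool is_native_wchar(wchar_t)        { return true; }
inline bool is_native_wchar(unsigned short) { return false; }
```

With the option enabled, is_native_wchar(L'a') selects the wchar_t overload and returns true.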

The project compiled and linked "smoothly".

The compiler command line is:

/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "BOOST_REGEX_DYN_LINK" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /MDd /Zc:wchar_t /Yu"stdafx.h" /Fp"Debug/capture.pch" /Fo"Debug/" /Fd"Debug/vc70.pdb" /W3 /nologo /c /Wp64 /ZI /TP

The linker command line is:

/OUT:"Debug/capture.exe" /INCREMENTAL /NOLOGO /DEBUG /PDB:"Debug/capture.pdb" /SUBSYSTEM:CONSOLE /MACHINE:X86 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib

BOOST 1.33.0 regex changelog

=====================

Boost 1.33.0.

* Completely rewritten expression parsing code, and traits class support; now conforms to the standardization proposal.

* Added support for (?imsx-imsx) constructs.

* Added support for lookbehind expressions (?<=positive-lookbehind) and (?<!negative-lookbehind).

* Added support for conditional expressions (?(assertion)true-expression|false-expression).

* Added MFC/ATL string wrappers.

* Added Unicode support; based on ICU.

* Changed newline support to recognise \f as a line separator (all character types), and \x85 as a line separator for wide characters / Unicode only.

Boost 1.32.1.

* Fixed bug in partial matches of bounded repeats of '.'.

Boost 1.31.0.

* Completely rewritten pattern matching code - it is now up to 10 times faster than before.

* Reorganized documentation.

* Deprecated all interfaces that are not part of the regular expression standardization proposal.

* Added regex_iterator and regex_token_iterator.

* Added support for Perl style independent sub-expressions.

* Added non-member operators to the sub_match class, so that you can compare sub_match's with strings, or add them to a string to produce a new string.

* Added experimental support for extended capture information.

* Changed the match flags so that they are a distinct type (not an integer); if you try to pass the match flags as an integer rather than match_flag_type to the regex algorithms then you will now get a compiler error.
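The regex_iterator, regex_token_iterator and sub_match comparison entries from the 1.31.0 list can be illustrated with a short sketch. It is written against std::regex (C++11), whose std::sregex_iterator and std::sregex_token_iterator are assumed here to stand in for the Boost classes of the same names. The href pattern is a deliberately simplified, hypothetical one, not the expression used in the listing, and the function names are made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Simplified pattern: captures whatever sits between the quotes of href.
inline const std::regex& link_regex()
{
    static const std::regex r("href=[\"']([^\"']+)[\"']");
    return r;
}

// regex_iterator: walk over every match and read sub-expression 1.
inline std::vector<std::string> links_with_iterator(const std::string& html)
{
    std::vector<std::string> out;
    for (std::sregex_iterator i(html.begin(), html.end(), link_regex()), end;
         i != end; ++i)
        out.push_back((*i)[1].str());
    return out;
}

// regex_token_iterator: ask for sub-expression 1 of each match directly.
inline std::vector<std::string> links_with_token_iterator(const std::string& html)
{
    std::sregex_token_iterator first(html.begin(), html.end(), link_regex(), 1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}

// sub_match non-member operators: compare a sub_match with a string.
inline bool first_link_is(const std::string& html, const std::string& expected)
{
    std::smatch what;
    return std::regex_search(html, what, link_regex()) && what[1] == expected;
}
```

Both collection functions return the same list; the token iterator simply saves the explicit indexing into each match result.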

 
 
 