Word Boundaries

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.

There are four different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between a word character and a non-word character following right after the word character.
  • Between a non-word character and a word character following right after the non-word character.
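The four cases above can be checked by listing every position where \b matches. The following is a minimal sketch in Python, assuming the standard `re` module; the helper name `boundary_positions` is made up for this illustration.

```python
import re

def boundary_positions(text):
    # Collect the index of every zero-length \b match in text.
    return [m.start() for m in re.finditer(r"\b", text)]

# Starts and ends with a word character: boundaries at both ends
# and on either side of the space.
print(boundary_positions("ab cd"))   # [0, 2, 3, 5]

# Starts and ends with a non-word character: no boundary at either end.
print(boundary_positions("!ab!"))    # [1, 3]
```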

Simply put: \b allows you to perform a "whole words only" search using a regular expression in the form of \bword\b. A "word character" is a character that can be used to form words. All characters that are not "word characters" are "non-word characters". The exact list of characters is different for each regex flavor, but all word characters are always matched by the short-hand character class \w. All non-word characters are always matched by \W.
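As a rough illustration of a "whole words only" search, the sketch below compares \bcat\b with the plain regex cat in Python's `re` module. The sample text and variable names are assumptions chosen for this example.

```python
import re

text = "cat concatenate cat. bobcat"

# Start offsets of matches that are complete words.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# Start offsets of every occurrence, including those inside longer words.
anywhere = [m.start() for m in re.finditer(r"cat", text)]

print(whole_words)  # [0, 16]
print(anywhere)     # [0, 7, 16, 24]
```

The full stop after the second cat is a non-word character, so \b still matches between the t and the full stop.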

In Perl and the other regex flavors discussed in this tutorial, there is only one metacharacter that matches both before a word and after a word. This is because any position between characters can never be both at the start and at the end of a word. Using only one operator makes things easier for you.

Note that \w usually also matches digits. So \b4\b can be used to match a 4 that is not part of a larger number. This regex finds no match in the string 44 sheets of a4, because each 4 in that string is adjacent to another word character. So saying "\b matches before and after an alphanumeric sequence" is more exact than saying "before and after a word".
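The digit example can be tried directly. This is a small sketch using Python's `re` module, where \w matches letters, digits and the underscore; other flavors may define \w slightly differently, as noted above.

```python
import re

standalone_four = re.compile(r"\b4\b")

# Every 4 here touches another word character, so nothing matches.
no_match = standalone_four.findall("44 sheets of a4")

# A 4 surrounded by non-word characters (or the string edge) matches.
one_match = standalone_four.findall("4 sheets of a4")

# In Python, \w matches letters, digits and the underscore.
word_chars = re.findall(r"\w", "a_4 !")

print(no_match)    # []
print(one_match)   # ['4']
print(word_chars)  # ['a', '_', '4']
```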

Negated Word Boundary

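\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters. It also matches at the start or the end of the string if the first or last character is not a word character.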
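To see \B at work, the sketch below lists the positions where \B matches in a short string and then uses \Bcat\B to find cat only when it sits inside a longer word. It assumes Python's `re` module; the sample strings are made up for illustration.

```python
import re

# Positions where \B matches: between a and b, between the comma
# and the space, and between c and d.
non_boundaries = [m.start() for m in re.finditer(r"\B", "ab, cd")]

# \Bcat\B requires a word character on both sides of "cat".
inside_word = [m.start() for m in re.finditer(r"\Bcat\B", "cat concatenate bobcat")]

print(non_boundaries)  # [1, 3, 5]
print(inside_word)     # [7]
```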

Looking Inside the Regex Engine

Let's see what happens when we apply the regex \bis\b to the string This island is beautiful. The engine starts with the first token \b at the first character T. Since this token is zero-length, the position before the character is inspected. \b matches here, because the T is a word character and the character before it is the void before the start of the string. The engine continues with the next token: the literal i. The engine does not advance to the next character in the string, because the previous regex token was zero-length. i does not match T, so the engine retries the first token at the next character position.

\b cannot match at the position between the T and the h. It cannot match between the h and the i either, nor between the i and the s.

The next character in the string is a space. \b matches here because the space is not a word character, and the preceding character is. Again, the engine continues with the i which does not match with the space.

Advancing a character and restarting with the first regex token, \b matches between the space and the second i in the string. Continuing, the regex engine finds that i matches i and s matches s. Now, the engine tries to match the second \b at the position before the l. This fails because this position is between two word characters. The engine reverts to the start of the regex and advances one character to the s in island. Again, the \b fails to match and continues to do so until the second space is reached. It matches there, but matching the i fails.

But \b matches at the position before the third i in the string. The engine continues, and finds that i matches i and s matches s. The last token in the regex, \b, also matches at the position before the second space in the string because the space is not a word character, and the character before it is.

The engine has successfully matched the word is in our string, skipping the two earlier occurrences of the characters i and s. If we had used the regular expression is, it would have matched the is in This.
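The walkthrough above can be replayed with a simplified simulation. This is only a sketch of the steps described in the text, not how Python's `re` module is implemented internally; the helpers `is_word_boundary` and `trace` are hypothetical names introduced for this example.

```python
import re

text = "This island is beautiful"

def is_word_boundary(s, pos):
    # A boundary exists when exactly one side of pos is a word character.
    # The void before the start and after the end counts as non-word.
    before = pos > 0 and re.match(r"\w", s[pos - 1]) is not None
    after = pos < len(s) and re.match(r"\w", s[pos]) is not None
    return before != after

def trace(s, literal="is"):
    # Try each start position in turn and record why an attempt fails.
    log = []
    for start in range(len(s) + 1):
        if not is_word_boundary(s, start):
            continue  # the first boundary token fails here
        if not s.startswith(literal, start):
            log.append((start, "literal fails"))
            continue
        end = start + len(literal)
        if not is_word_boundary(s, end):
            log.append((start, "second boundary fails"))
            continue
        log.append((start, "match"))
        return (start, end), log
    return None, log

span, log = trace(text)
print(log)
print(span)                                # (12, 14)
print(re.search(r"\bis\b", text).span())   # (12, 14)
print(re.search(r"is", text).span())       # (2, 4)
```

The log records attempts at offsets 0, 4, 5, 11 and 12, which correspond to the positions discussed in the walkthrough: the T, the first space, the i of island, the second space and the word is.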

 
