grep and Other RegEx Functions

grep & grepl

grep stands for " globally search for a regular expression and print all matches," just as in UNIX. The function allows you to use regular expressions to search for a pattern in a vector of strings or characters, and returns the index (indices) of the match(es).

Additionally, the function grepl (derived from grep-logical) uses the same inputs, but returns a logical vector, where TRUE indicates a match at that index, and FALSE indicates the opposite.

grep and grepl can be used on individual strings, though they match the entire string, not the index of the character that matches the regular expression. For grep, a hit would return [1] 1 and a miss would return integer(0). For grepl, you still get TRUE or FALSE.


Examples

Given a character vector, return the index of any words ending in 's'.

Click to see solution
grep(".*s$", c("waffle", "waffles", "pancake", "pancakes"))
[1] 2 4

Given a character vector, return TRUE of any words ending in 's', FALSE otherwise.

Click to see solution
grepl(".*s$", c("cats", "bats", "geese", "meese"))
# fun fact: meese is not the plural of moose
[1]  TRUE  TRUE FALSE FALSE


sub & gsub

Oftentimes finding the indices of matches in your text isn’t what you want — your goal is to change the text into a format that’s better for parsing. For this, we have sub and gsub. These functions take a regular expression and a replacement expression, applying the replacement to a string or a vector of strings.

The key difference here is that sub applies only to the first match, while gsub applies to all matches (derived from global-substitution).

For these functions, it’s equally valid to apply vector-wise or individually. Applying on vectors will repeat the substitution process for each individual string, so naturally a single string would work.


Examples

Given a string, replace the first 'l' with '?'

Click to see solution
sub("l", "?", "The best part of waking up is Folgers in your cup")
# not sponsored or affiliated
[1] "The best part of waking up is Fo?gers in your cup"

Given a string, replace all vowels with '!'

Click to see solution
gsub("[aeiou]", "!", "The best part of waking up is Folgers in your cup")
# globally not sponsored or affiliated
[1] "Th! b!st p!rt !f w!k!ng !p !s F!lg!rs !n y!!r c!p"


Resources

RStudio Cheat Sheets: 6th sheet down, titled "String manipulation with stringr cheatsheet"

Regular expressions are hard, even for some veteran programmers, as the rules and match characters are subtly different for each programming language. This is a great resource for RegEx basics — everything on the second page is useful — and string manipulations which encompass more than that of sub and gsub.