Intro to Regex 🔍
Quick Intro to Regular Expressions
Feb 3, 2021
Regular expressions (aka regex) define patterns for matching, extracting, and/or transforming text. They are supported in most major programming languages, including Python, R, Java, and JavaScript.
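The three uses can be made concrete with a minimal sketch in Python's standard-library `re` module. Python is just one possible choice here, and the sample text and variable names are made up for illustration. `re.search` tests whether a pattern matches, `re.findall` extracts every match, and `re.sub` rewrites the matches. The patterns rely on a few constructs (`\d`, `+`, `{4}`, groups) that are covered in the lists below.

```python
import re

# Hypothetical sample text, invented for this example
text = "Order #123 shipped on 2021-02-03; order #456 is pending."

# Matching: does the text contain an order number at all?
has_order = re.search(r"#\d+", text) is not None

# Extracting: collect every order number (the group keeps only the digits)
order_ids = re.findall(r"#(\d+)", text)

# Transforming: rewrite ISO dates (YYYY-MM-DD) as DD/MM/YYYY using group references
reformatted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(has_order)    # True
print(order_ids)    # ['123', '456']
print(reformatted)  # Order #123 shipped on 03/02/2021; order #456 is pending.
```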
Essentials
- Character set [abc] = Matches any single character listed inside the brackets (here, a, b, or c)
- Optional element a? = Matches 0 or 1 occurrences of the preceding element (here, an optional a)
- Range [0-9] = Matches any single character in the inclusive range (here, any digit from 0 to 9)
- (Kleene) star/asterisk * = Matches 0 or more occurrences of the preceding element
- Plus sign + = Matches 1 or more occurrences of the preceding element
- Dot . = Matches any single character except a line break
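- Escape character \ = Treats the next character literally, removing any special meaning (e.g., \. matches a period)
- Alternation | = Boolean "or": matches the expression on either side
- Quantifiers {n}, {m,n}, {m,} = Match exactly n, between m and n (inclusive), or at least m occurrences of the preceding element
- Capturing group (abc) = Groups multiple tokens into one unit and captures the matched text, which can be reused later through a backreference such as \1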
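Each item in the list can be tried with a short Python sketch. This assumes Python 3 and the standard `re` module. The sample strings and the `checks` dictionary are invented for illustration. `re.fullmatch` is used so that each pattern must match the entire test string, and every entry is written to evaluate to `True`. The last pattern shows a capturing group reused through the backreference `\1`.

```python
import re

# Each pattern exercises one item from the list; re.fullmatch requires
# the whole string to match, which makes each rule easy to see.
checks = {
    "character set": re.fullmatch(r"[abc]", "b") is not None,       # one of a, b, c
    "optional":      re.fullmatch(r"colou?r", "color") is not None,  # u may be absent
    "range":         re.fullmatch(r"[0-9]", "7") is not None,        # a single digit
    "star":          re.fullmatch(r"ab*", "a") is not None,          # zero b's is fine
    "plus":          re.fullmatch(r"ab+", "a") is None,              # needs at least one b
    "dot":           re.fullmatch(r"a.c", "a\nc") is None,           # dot skips newline by default
    "escape":        re.fullmatch(r"3\.14", "3x14") is None,         # \. is a literal period only
    "alternation":   re.fullmatch(r"cat|dog", "dog") is not None,    # either side may match
    "exact count":   re.fullmatch(r"\d{3}", "123") is not None,      # exactly three digits
    "bounded count": re.fullmatch(r"a{2,3}", "aaaa") is None,        # four a's is too many
    "open count":    re.fullmatch(r"a{2,}", "aaaa") is not None,     # two or more a's
}

# Capturing group + backreference: \1 must repeat whatever group 1 matched,
# so this finds a doubled word such as "the the".
doubled = re.search(r"\b(\w+) \1\b", "this is the the end")

for name, ok in checks.items():
    print(f"{name:15} {ok}")
print(doubled.group(1))  # the
```

Most languages share this Perl-style syntax for these constructs, but flavors differ in details. Check your language's documentation before porting a pattern.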
Here are some character classes and anchors that are useful for building an expression quickly:
- \w, \d, \s = word character (letter, digit, or underscore), digit, whitespace character
- \W, \D, \S = non-word, non-digit, non-whitespace character
- [abc] = a, b, or c
- [^abc] = any character other than a, b, or c
- [a-g] = any character from a to g (inclusive)
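- ^abc = abc at the start of the string
- abc$ = abc at the end of the string
- xyz\b = xyz at a word boundary (e.g., at the end of a word)
- xyz\B = xyz not at a word boundary (e.g., inside a longer word)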
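The classes and anchors above can be exercised with a short Python sketch, again assuming the standard `re` module. The sample strings are invented for illustration. The example uses four functions:

- `re.findall` lists every match.
- `re.split` uses a pattern to cut text.
- `re.sub` uses a pattern to rewrite text.
- `re.finditer` exposes each match's start position, which makes the difference between `\b` and `\B` visible.

```python
import re

sentence = "Item A7 costs $15; item B12 costs $9."

# \w, \d, \s and their negated forms \W, \D, \S
words = re.findall(r"\w+", sentence)                  # runs of word characters
numbers = re.findall(r"\d+", sentence)                # runs of digits
digits_only = re.sub(r"\D", "", "+1 (555) 010-9999")  # delete every non-digit
fields = re.split(r"\s+", "a  b\tc")                  # split on runs of whitespace

# Bracketed sets, negated sets, and ranges
not_abc = re.findall(r"[^abc]", "abcxcab")            # anything except a, b, c
a_to_g = re.findall(r"[a-g]", "hexadecimal")          # only letters a through g

# Anchors: ^ and $ pin the pattern to the start or end of the string
starts = re.search(r"^abc", "abcdef") is not None     # "abcdef" begins with abc
ends = re.search(r"abc$", "abcdef") is not None       # ...but ends with def

# Word boundaries: \b is the position between a word character and a
# non-word character (or the edge of the string)
s = "cat catalog bobcat"
at_boundary = [m.start() for m in re.finditer(r"cat\b", s)]  # "cat" ending a word
inside_word = [m.start() for m in re.finditer(r"cat\B", s)]  # "cat" inside "catalog"

# With re.MULTILINE, ^ also matches right after each newline
first_only = re.findall(r"^\w+", "one two\nthree four")
each_line = re.findall(r"^\w+", "one two\nthree four", re.MULTILINE)

print(words)        # ['Item', 'A7', 'costs', '15', 'item', 'B12', 'costs', '9']
print(numbers)      # ['7', '15', '12', '9']
print(digits_only)  # 15550109999
print(fields)       # ['a', 'b', 'c']
print(not_abc)      # ['x']
print(a_to_g)       # ['e', 'a', 'd', 'e', 'c', 'a']
print(starts, ends) # True False
print(at_boundary)  # [0, 15]
print(inside_word)  # [4]
print(first_only)   # ['one']
print(each_line)    # ['one', 'three']
```

Whether ^ and $ match only at the edges of the string or also at line breaks depends on a multiline flag. In Python that flag is `re.MULTILINE`, as the last two patterns show.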