Intro to Regex 🔍

Quick Intro to Regular Expressions

Sam Parmar
2 min read · Feb 3, 2021

Regular expressions (aka regex) define patterns for matching, extracting, and transforming text. They are available in most major programming languages (Python, R, Java, JavaScript, etc.).
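
To make the three uses concrete, here is a minimal sketch in Python using the standard re module. The sample sentence and the date pattern are made up for illustration; they are not from any particular dataset.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 66 shipped on 2021-02-03; order 67 ships on 2021-02-10."

# Matching: does the text contain a date in YYYY-MM-DD form?
has_date = re.search(r"\d{4}-\d{2}-\d{2}", text) is not None

# Extracting: collect every date found in the text.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

# Transforming: rewrite each date as DD/MM/YYYY using capturing groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(has_date)
print(dates)
print(swapped)
```

The same pattern serves all three tasks; only the function called on it changes.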

Essentials

  • Character set [abc] = Matches any single character in the set (here a, b, or c)
  • Optional element a? = Matches 0 or 1 occurrences of the preceding element (here the character a)
  • Range [0-9] = Matches any single character in the inclusive range (here any digit from 0 to 9)
  • (Kleene) star/asterisk * = Matches 0 or more occurrences of the preceding element
  • Plus sign + = Matches 1 or more occurrences of the preceding element
  • Dot . = Matches any single character (except line breaks, by default)
  • Escape character \ = Takes the next special character literally (no special meaning), e.g. \. matches a literal dot
  • Alternation | = Boolean “or”; matches the expression on either side
  • Exact quantifiers {n} {a,b} {a,} = Match exactly n, between a and b, or at least a occurrences of the preceding element
  • Capturing group (a) = Groups multiple tokens together and captures the match so it can be used for backreference
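
The essentials above can be tried out one at a time. The following is a small sketch using Python's re module; the sample strings are invented for illustration, and other regex flavors may differ in small details.

```python
import re

# re.findall returns every non-overlapping match, left to right.
char_set = re.findall(r"[abc]", "cabbage")           # any single a, b, or c
optional = re.findall(r"colou?r", "color colour")    # the u is optional
digit_range = re.findall(r"[0-9]", "a1b22")          # any single digit
star = re.findall(r"ab*", "a ab abbb")               # a, then 0 or more b
plus = re.findall(r"ab+", "a ab abbb")               # a, then 1 or more b
dot = re.findall(r"c.t", "cat cot c\nt")             # dot skips the newline
escaped = re.findall(r"3\.14", "3.14 3x14")          # \. is a literal dot
either = re.findall(r"cat|dog", "cat dog bird")      # cat or dog

# Quantifiers are greedy: they take as many repeats as the bounds allow.
exactly = re.findall(r"[0-9]{4}", "7 42 2021")       # exactly 4 digits
between = re.findall(r"[0-9]{2,3}", "7 42 2021")     # 2 to 3 digits
at_least = re.findall(r"[0-9]{2,}", "7 42 2021")     # 2 or more digits

# Capturing group with a backreference: \1 repeats whatever group 1 matched,
# so this pattern finds a doubled letter.
doubled = re.search(r"([a-z])\1", "hello")

print(char_set, optional, digit_range, star, plus)
print(dot, escaped, either)
print(exactly, between, at_least)
print(doubled.group(0), doubled.group(1))
```

Note that {2,3} on 2021 yields 202: the quantifier stops at its upper bound and the leftover 1 is too short to match on its own.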

Here are some character classes and anchors that are useful for quickly building an expression:

  • \w \d \s = word character, digit, whitespace character
  • \W \D \S = non-word character, non-digit, non-whitespace character
  • [abc] = a, b, or c
  • [^abc] = any character other than a, b, or c
  • [a-g] = any character between a and g (inclusive)
  • ^abc = abc at the start of the string
  • abc$ = abc at the end of the string
  • xyz\b = xyz followed by a word boundary
  • xyz\B = xyz not followed by a word boundary
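
As a rough illustration of the classes and anchors above, here is a short Python sketch using the re module. The sample strings are invented, and the behavior shown is Python's default (no flags set).

```python
import re

sample = "cat 42\tdog_7"  # hypothetical sample containing a space and a tab

words = re.findall(r"\w+", sample)        # runs of letters, digits, underscore
digits = re.findall(r"\d+", sample)       # runs of digits
spaces = re.findall(r"\s", sample)        # each whitespace character
non_words = re.findall(r"\W", sample)     # each non-word character
non_digits = re.findall(r"\D+", sample)   # runs of non-digits
non_spaces = re.findall(r"\S+", sample)   # runs of non-whitespace

negated = re.findall(r"[^abc]", "abcde")       # anything except a, b, or c
in_range = re.findall(r"[a-g]+", "badge hill") # runs of letters a through g

# Anchors and boundaries match positions, not characters.
starts = bool(re.search(r"^abc", "abcdef")), bool(re.search(r"^abc", "xabc"))
ends = bool(re.search(r"abc$", "xyzabc")), bool(re.search(r"abc$", "abcx"))
boundary = bool(re.search(r"xyz\b", "xyz!")), bool(re.search(r"xyz\b", "xyzw"))
no_boundary = bool(re.search(r"xyz\B", "xyzw")), bool(re.search(r"xyz\B", "xyz!"))

print(words, digits, spaces, non_words)
print(non_digits, non_spaces, negated, in_range)
print(starts, ends, boundary, no_boundary)
```

Each anchor example pairs one string that matches with one that does not, so every tuple comes out as (True, False).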

Resources

Interactive learning here

Read this book chapter here

Practice via crossword game here
