<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="Defines a Thompson NFA and provides the `PikeVM` and `BoundedBacktracker` regex engines."><title>regex_automata::nfa::thompson - Rust</title><script>if(window.location.protocol!=="file:")document.head.insertAdjacentHTML("beforeend","SourceSerif4-Regular-6b053e98.ttf.woff2,FiraSans-Italic-81dc35de.woff2,FiraSans-Regular-0fe48ade.woff2,FiraSans-MediumItalic-ccf7e434.woff2,FiraSans-Medium-e1aa3f0a.woff2,SourceCodePro-Regular-8badfe75.ttf.woff2,SourceCodePro-Semibold-aa29a496.ttf.woff2".split(",").map(f=>`<link rel="preload" as="font" type="font/woff2"href="../../../static.files/${f}">`).join(""))</script><link rel="stylesheet" href="../../../static.files/normalize-9960930a.css"><link rel="stylesheet" href="../../../static.files/rustdoc-ca0dd0c4.css"><meta name="rustdoc-vars" data-root-path="../../../" data-static-root-path="../../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.93.1 (01f6ddf75 2026-02-11) (Arch Linux rust 1:1.93.1-1)" data-channel="1.93.1" data-search-js="search-9e2438ea.js" data-stringdex-js="stringdex-a3946164.js" data-settings-js="settings-c38705f0.js" ><script src="../../../static.files/storage-e2aeef58.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../../static.files/main-a410ff4d.js"></script><noscript><link rel="stylesheet" href="../../../static.files/noscript-263c88ec.css"></noscript><link rel="alternate icon" type="image/png" href="../../../static.files/favicon-32x32-eab170b8.png"><link rel="icon" type="image/svg+xml" href="../../../static.files/favicon-044be391.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky 
things.</div><![endif]--><rustdoc-topbar><h2><a href="#">Module thompson</a></h2></rustdoc-topbar><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../../regex_automata/index.html">regex_<wbr>automata</a><span class="version">0.4.14</span></h2></div><div class="sidebar-elems"><section id="rustdoc-toc"><h2 class="location"><a href="#">Module thompson</a></h2><h3><a href="#">Sections</a></h3><ul class="block top-toc"><li><a href="#details" title="Details">Details</a></li></ul><h3><a href="#modules">Module Items</a></h3><ul class="block"><li><a href="#modules" title="Modules">Modules</a></li><li><a href="#structs" title="Structs">Structs</a></li><li><a href="#enums" title="Enums">Enums</a></li></ul></section><div id="rustdoc-modnav"><h2><a href="../index.html">In regex_<wbr>automata::<wbr>nfa</a></h2></div></div></nav><div class="sidebar-resizer" title="Drag to resize sidebar"></div><main><div class="width-limiter"><section id="main-content" class="content"><div class="main-heading"><div class="rustdoc-breadcrumbs"><a href="../../index.html">regex_automata</a>::<wbr><a href="../index.html">nfa</a></div><h1>Module <span>thompson</span> <button id="copy-path" title="Copy item path to clipboard">Copy item path</button></h1><rustdoc-toolbar></rustdoc-toolbar><span class="sub-heading"><a class="src" href="../../../src/regex_automata/nfa/thompson/mod.rs.html#1-81">Source</a> </span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>Defines a Thompson NFA and provides the <a href="pikevm/struct.PikeVM.html" title="struct regex_automata::nfa::thompson::pikevm::PikeVM"><code>PikeVM</code></a> and
<a href="backtrack/struct.BoundedBacktracker.html" title="struct regex_automata::nfa::thompson::backtrack::BoundedBacktracker"><code>BoundedBacktracker</code></a> regex engines.</p>
<p>A Thompson NFA (non-deterministic finite automaton) is arguably <em>the</em> central
data type in this library. It is the result of what is commonly referred to as
“regex compilation.” That is, turning a regex pattern from its concrete syntax
string into something that can run a search looks roughly like this:</p>
<ul>
<li>A <code>&str</code> is parsed into a <a href="../../../regex_syntax/ast/enum.Ast.html" title="enum regex_syntax::ast::Ast"><code>regex-syntax::ast::Ast</code></a>.</li>
<li>An <code>Ast</code> is translated into a <a href="../../../regex_syntax/hir/struct.Hir.html" title="struct regex_syntax::hir::Hir"><code>regex-syntax::hir::Hir</code></a>.</li>
<li>An <code>Hir</code> is compiled into a <a href="struct.NFA.html" title="struct regex_automata::nfa::thompson::NFA"><code>NFA</code></a>.</li>
<li>The <code>NFA</code> is then used to build one of a few different regex engines:
<ul>
<li>An <code>NFA</code> is used directly in the <code>PikeVM</code> and <code>BoundedBacktracker</code> engines.</li>
<li>An <code>NFA</code> is used by a <a href="../../hybrid/index.html" title="mod regex_automata::hybrid">hybrid NFA/DFA</a> to build out a DFA’s
transition table at search time.</li>
<li>An <code>NFA</code>, assuming it is one-pass, is used to build a full
<a href="../../dfa/onepass/index.html" title="mod regex_automata::dfa::onepass">one-pass DFA</a> ahead of time.</li>
<li>An <code>NFA</code> is used to build a <a href="../../dfa/index.html" title="mod regex_automata::dfa">full DFA</a> ahead of time.</li>
</ul>
</li>
</ul>
<p>The <a href="../../meta/index.html" title="mod regex_automata::meta"><code>meta</code></a> regex engine makes all of these choices for you based
on various criteria. However, if you have a lower level use case, <em>you</em> can
build any of the above regex engines and use them directly. But you must start
here by building an <code>NFA</code>.</p>
<h2 id="details"><a class="doc-anchor" href="#details">§</a>Details</h2>
<p>It is perhaps worth expanding a bit more on what it means to go through the
<code>&str</code>-><code>Ast</code>-><code>Hir</code>-><code>NFA</code> process.</p>
<ul>
<li>Parsing a string into an <code>Ast</code> gives it a structured representation.
Crucially, the size and amount of work done in this step is proportional to the
size of the original string. No optimization or Unicode handling is done at
this point. This means that parsing into an <code>Ast</code> has very predictable costs.
Moreover, an <code>Ast</code> can be round-tripped back to its original pattern string as
written.</li>
<li>Translating an <code>Ast</code> into an <code>Hir</code> is a process by which the structured
representation is simplified down to its most fundamental components.
Translation deals with flags such as case insensitivity by converting things
like <code>(?i:a)</code> to <code>[Aa]</code>. Translation is also where Unicode tables are consulted
to resolve things like <code>\p{Emoji}</code> and <code>\p{Greek}</code>. It also flattens each
character class, regardless of how deeply nested it is, into a single sequence
of non-overlapping ranges. All the various literal forms are thrown out in
favor of one common representation. Overall, the <code>Hir</code> is small enough to fit
into your head and makes analysis and other tasks much simpler.</li>
<li>Compiling an <code>Hir</code> into an <code>NFA</code> formulates the regex into a finite state
machine whose transitions are defined over bytes. For example, an <code>Hir</code> might
have a Unicode character class corresponding to a sequence of ranges defined
in terms of <code>char</code>. Compilation is then responsible for turning those ranges
into a UTF-8 automaton. That is, an automaton that matches the UTF-8 encoding
of just the codepoints specified by those ranges. Otherwise, the main job of
an <code>NFA</code> is to serve as a byte-code of sorts for a virtual machine. It can be
seen as a sequence of instructions for how to match a regex.</li>
</ul>
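<p>To make the byte-code view in the last item concrete, here is a minimal sketch, assuming a toy instruction set that is not this crate's API. The names <code>Inst</code>, <code>bit</code>, <code>closure</code>, <code>is_match</code> and <code>greek_lower</code> are hypothetical, and the program is compiled by hand rather than by a real compiler. It encodes the <code>char</code> range U+03B1 to U+03C9 as the UTF-8 byte sequences <code>CE [B1-BF]</code> and <code>CF [80-89]</code>, then matches them with a lockstep simulation over bytes.</p>

```rust
/// One instruction of a toy byte-oriented NFA program.
/// Hypothetical type for illustration; it is not the crate's `State` enum.
#[derive(Clone, Copy)]
enum Inst {
    /// Consume one byte in `start..=end`, then continue at `next`.
    ByteRange { start: u8, end: u8, next: usize },
    /// Epsilon transition: continue at both targets without consuming input.
    Split(usize, usize),
    /// Accepting instruction.
    Match,
}

/// Bitmask with only the bit for instruction `pc` set. State sets are stored
/// in a `u64`, so this sketch supports at most 64 instructions.
fn bit(pc: usize) -> u64 {
    1u64 << pc
}

/// Returns `set` plus every instruction reachable from `pc` by following
/// epsilon transitions.
fn closure(prog: &[Inst], pc: usize, set: u64) -> u64 {
    if set & bit(pc) != 0 {
        return set; // already visited
    }
    let set = set | bit(pc);
    match prog[pc] {
        Inst::Split(a, b) => closure(prog, b, closure(prog, a, set)),
        _ => set,
    }
}

/// Anchored match of the whole haystack, advancing all live states in lockstep.
fn is_match(prog: &[Inst], haystack: &[u8]) -> bool {
    let mut current = closure(prog, 0, 0);
    for &byte in haystack {
        let mut next = 0u64;
        for (pc, inst) in prog.iter().enumerate() {
            if current & bit(pc) == 0 {
                continue; // this instruction is not live
            }
            if let Inst::ByteRange { start, end, next: target } = *inst {
                if (start..=end).contains(&byte) {
                    next = closure(prog, target, next);
                }
            }
        }
        current = next;
    }
    // Accept only if a `Match` instruction is live once the input is consumed.
    prog.iter()
        .enumerate()
        .any(|(pc, inst)| current & bit(pc) != 0 && matches!(inst, Inst::Match))
}

/// Hand-compiled program for the `char` range U+03B1..=U+03C9, expressed as
/// an alternation of two UTF-8 byte sequences.
fn greek_lower() -> [Inst; 6] {
    [
        Inst::Split(1, 3),
        Inst::ByteRange { start: 0xCE, end: 0xCE, next: 2 }, // lead byte CE
        Inst::ByteRange { start: 0xB1, end: 0xBF, next: 5 }, // U+03B1..=U+03BF
        Inst::ByteRange { start: 0xCF, end: 0xCF, next: 4 }, // lead byte CF
        Inst::ByteRange { start: 0x80, end: 0x89, next: 5 }, // U+03C0..=U+03C9
        Inst::Match,
    ]
}

fn main() {
    let prog = greek_lower();
    for s in ["\u{3b1}", "\u{3c0}", "\u{3c9}", "a", "\u{3ca}", "\u{3b1}\u{3b2}"] {
        println!("{s:?} matches: {}", is_match(&prog, s.as_bytes()));
    }
}
```

<p>Because every transition consumes a single byte, the matcher in this sketch never decodes UTF-8 itself; the encoding is baked into the program. The real <code>NFA</code> has more kinds of states than this toy, such as the capture states and the sparse and dense transitions listed on this page.</p>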
</div></details><h2 id="modules" class="section-header">Modules<a href="#modules" class="anchor">§</a></h2><dl class="item-table"><dt><a class="mod" href="backtrack/index.html" title="mod regex_automata::nfa::thompson::backtrack">backtrack</a></dt><dd>An NFA backed bounded backtracker for executing regex searches with capturing
groups.</dd><dt><a class="mod" href="pikevm/index.html" title="mod regex_automata::nfa::thompson::pikevm">pikevm</a></dt><dd>An NFA backed Pike VM for executing regex searches with capturing groups.</dd></dl><h2 id="structs" class="section-header">Structs<a href="#structs" class="anchor">§</a></h2><dl class="item-table"><dt><a class="struct" href="struct.BuildError.html" title="struct regex_automata::nfa::thompson::BuildError">Build<wbr>Error</a></dt><dd>An error that can occur during the construction of a Thompson NFA.</dd><dt><a class="struct" href="struct.Builder.html" title="struct regex_automata::nfa::thompson::Builder">Builder</a></dt><dd>An abstraction for building Thompson NFAs by hand.</dd><dt><a class="struct" href="struct.Compiler.html" title="struct regex_automata::nfa::thompson::Compiler">Compiler</a></dt><dd>A builder for compiling an NFA from a regex’s high-level intermediate
representation (HIR).</dd><dt><a class="struct" href="struct.Config.html" title="struct regex_automata::nfa::thompson::Config">Config</a></dt><dd>The configuration used for a Thompson NFA compiler.</dd><dt><a class="struct" href="struct.DenseTransitions.html" title="struct regex_automata::nfa::thompson::DenseTransitions">Dense<wbr>Transitions</a></dt><dd>A sequence of transitions used to represent a dense state.</dd><dt><a class="struct" href="struct.NFA.html" title="struct regex_automata::nfa::thompson::NFA">NFA</a></dt><dd>A byte oriented Thompson non-deterministic finite automaton (NFA).</dd><dt><a class="struct" href="struct.PatternIter.html" title="struct regex_automata::nfa::thompson::PatternIter">Pattern<wbr>Iter</a></dt><dd>An iterator over all pattern IDs in an NFA.</dd><dt><a class="struct" href="struct.SparseTransitions.html" title="struct regex_automata::nfa::thompson::SparseTransitions">Sparse<wbr>Transitions</a></dt><dd>A sequence of transitions used to represent a sparse state.</dd><dt><a class="struct" href="struct.Transition.html" title="struct regex_automata::nfa::thompson::Transition">Transition</a></dt><dd>A single transition to another state.</dd></dl><h2 id="enums" class="section-header">Enums<a href="#enums" class="anchor">§</a></h2><dl class="item-table"><dt><a class="enum" href="enum.State.html" title="enum regex_automata::nfa::thompson::State">State</a></dt><dd>A state in an NFA.</dd><dt><a class="enum" href="enum.WhichCaptures.html" title="enum regex_automata::nfa::thompson::WhichCaptures">Which<wbr>Captures</a></dt><dd>A configuration indicating which kinds of
<a href="enum.State.html#variant.Capture" title="variant regex_automata::nfa::thompson::State::Capture"><code>State::Capture</code></a> states to include.</dd></dl></section></div></main></body></html>