105 lines
14 KiB
HTML
105 lines
14 KiB
HTML
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="A module for building and searching with lazy deterministic finite automata (DFAs)."><title>regex_automata::hybrid - Rust</title><script>if(window.location.protocol!=="file:")document.head.insertAdjacentHTML("beforeend","SourceSerif4-Regular-6b053e98.ttf.woff2,FiraSans-Italic-81dc35de.woff2,FiraSans-Regular-0fe48ade.woff2,FiraSans-MediumItalic-ccf7e434.woff2,FiraSans-Medium-e1aa3f0a.woff2,SourceCodePro-Regular-8badfe75.ttf.woff2,SourceCodePro-Semibold-aa29a496.ttf.woff2".split(",").map(f=>`<link rel="preload" as="font" type="font/woff2"href="../../static.files/${f}">`).join(""))</script><link rel="stylesheet" href="../../static.files/normalize-9960930a.css"><link rel="stylesheet" href="../../static.files/rustdoc-ca0dd0c4.css"><meta name="rustdoc-vars" data-root-path="../../" data-static-root-path="../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.93.1 (01f6ddf75 2026-02-11) (Arch Linux rust 1:1.93.1-1)" data-channel="1.93.1" data-search-js="search-9e2438ea.js" data-stringdex-js="stringdex-a3946164.js" data-settings-js="settings-c38705f0.js" ><script src="../../static.files/storage-e2aeef58.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../static.files/main-a410ff4d.js"></script><noscript><link rel="stylesheet" href="../../static.files/noscript-263c88ec.css"></noscript><link rel="alternate icon" type="image/png" href="../../static.files/favicon-32x32-eab170b8.png"><link rel="icon" type="image/svg+xml" href="../../static.files/favicon-044be391.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><rustdoc-topbar><h2><a href="#">Module hybrid</a></h2></rustdoc-topbar><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../regex_automata/index.html">regex_<wbr>automata</a><span class="version">0.4.14</span></h2></div><div class="sidebar-elems"><section id="rustdoc-toc"><h2 class="location"><a href="#">Module hybrid</a></h2><h3><a href="#">Sections</a></h3><ul class="block top-toc"><li><a href="#overview" title="Overview">Overview</a></li><li><a href="#example-basic-regex-searching" title="Example: basic regex searching">Example: basic regex searching</a></li><li><a href="#example-searching-with-multiple-regexes" title="Example: searching with multiple regexes">Example: searching with multiple regexes</a></li><li><a href="#when-should-i-use-this" title="When should I use this?">When should I use this?</a></li><li><a href="#differences-with-fully-compiled-dfas" title="Differences with fully compiled DFAs">Differences with fully compiled DFAs</a></li><li><a href="#syntax" title="Syntax">Syntax</a></li></ul><h3><a href="#modules">Module Items</a></h3><ul class="block"><li><a href="#modules" title="Modules">Modules</a></li><li><a href="#structs" title="Structs">Structs</a></li><li><a href="#enums" title="Enums">Enums</a></li></ul></section><div id="rustdoc-modnav"><h2 class="in-crate"><a href="../index.html">In crate regex_<wbr>automata</a></h2></div></div></nav><div class="sidebar-resizer" title="Drag to resize sidebar"></div><main><div class="width-limiter"><section id="main-content" class="content"><div class="main-heading"><div class="rustdoc-breadcrumbs"><a href="../index.html">regex_automata</a></div><h1>Module <span>hybrid</span> <button id="copy-path" title="Copy item path to clipboard">Copy item path</button></h1><rustdoc-toolbar></rustdoc-toolbar><span class="sub-heading"><a class="src" href="../../src/regex_automata/hybrid/mod.rs.html#1-144">Source</a> </span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>A module for building and searching with lazy deterministic finite automata
|
||
(DFAs).</p>
|
||
<p>Like other modules in this crate, lazy DFAs support a rich regex syntax with
|
||
Unicode features. The key feature of a lazy DFA is that it builds itself
|
||
incrementally during search, and never uses more than a configured capacity of
|
||
memory. Thus, when searching with a lazy DFA, one must supply a mutable “cache”
|
||
in which the actual DFA’s transition table is stored.</p>
|
||
<p>If you’re looking for fully compiled DFAs, then please see the top-level
|
||
<a href="../dfa/index.html" title="mod regex_automata::dfa"><code>dfa</code> module</a>.</p>
|
||
<h2 id="overview"><a class="doc-anchor" href="#overview">§</a>Overview</h2>
|
||
<p>This section gives a brief overview of the primary types in this module:</p>
|
||
<ul>
|
||
<li>A <a href="regex/struct.Regex.html" title="struct regex_automata::hybrid::regex::Regex"><code>regex::Regex</code></a> provides a way to search for matches of a regular
|
||
expression using lazy DFAs. This includes iterating over matches with both the
|
||
start and end positions of each match.</li>
|
||
<li>A <a href="dfa/struct.DFA.html" title="struct regex_automata::hybrid::dfa::DFA"><code>dfa::DFA</code></a> provides direct low level access to a lazy DFA.</li>
|
||
</ul>
|
||
<h2 id="example-basic-regex-searching"><a class="doc-anchor" href="#example-basic-regex-searching">§</a>Example: basic regex searching</h2>
|
||
<p>This example shows how to compile a regex using the default configuration
|
||
and then use it to find matches in a byte string:</p>
|
||
|
||
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{hybrid::regex::Regex, Match};
|
||
|
||
<span class="kw">let </span>re = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>)<span class="question-mark">?</span>;
|
||
<span class="kw">let </span><span class="kw-2">mut </span>cache = re.create_cache();
|
||
|
||
<span class="kw">let </span>haystack = <span class="string">"2018-12-24 2016-10-08"</span>;
|
||
<span class="kw">let </span>matches: Vec<Match> = re.find_iter(<span class="kw-2">&mut </span>cache, haystack).collect();
|
||
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
||
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
||
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
||
]);</code></pre></div><h2 id="example-searching-with-multiple-regexes"><a class="doc-anchor" href="#example-searching-with-multiple-regexes">§</a>Example: searching with multiple regexes</h2>
|
||
<p>The lazy DFAs in this module all fully support searching with multiple regexes
|
||
simultaneously. You can use this support with standard leftmost-first style
|
||
searching to find non-overlapping matches:</p>
|
||
|
||
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{hybrid::regex::Regex, Match};
|
||
|
||
<span class="kw">let </span>re = Regex::new_many(<span class="kw-2">&</span>[<span class="string">r"\w+"</span>, <span class="string">r"\S+"</span>])<span class="question-mark">?</span>;
|
||
<span class="kw">let </span><span class="kw-2">mut </span>cache = re.create_cache();
|
||
|
||
<span class="kw">let </span>haystack = <span class="string">"@foo bar"</span>;
|
||
<span class="kw">let </span>matches: Vec<Match> = re.find_iter(<span class="kw-2">&mut </span>cache, haystack).collect();
|
||
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
||
Match::must(<span class="number">1</span>, <span class="number">0</span>..<span class="number">4</span>),
|
||
Match::must(<span class="number">0</span>, <span class="number">5</span>..<span class="number">8</span>),
|
||
]);</code></pre></div><h2 id="when-should-i-use-this"><a class="doc-anchor" href="#when-should-i-use-this">§</a>When should I use this?</h2>
|
||
<p>Generally speaking, if you can abide the use of mutable state during search,
|
||
and you don’t need things like capturing groups or Unicode word boundary
|
||
support in non-ASCII text, then a lazy DFA is likely a robust choice with
|
||
respect to both search speed and memory usage. Note however that its speed
|
||
may be worse than a general purpose regex engine if you don’t select a good
|
||
<a href="../util/prefilter/index.html" title="mod regex_automata::util::prefilter">prefilter</a>.</p>
|
||
<p>If you know ahead of time that your pattern would result in a very large DFA
|
||
if it was fully compiled, it may be better to use an NFA simulation instead
|
||
of a lazy DFA. Either that, or increase the cache capacity of your lazy DFA
|
||
to something that is big enough to hold the state machine (likely through
|
||
experimentation). The issue here is that if the cache is too small, then it
|
||
could wind up being reset too frequently and this might decrease searching
|
||
speed significantly.</p>
|
||
<h2 id="differences-with-fully-compiled-dfas"><a class="doc-anchor" href="#differences-with-fully-compiled-dfas">§</a>Differences with fully compiled DFAs</h2>
|
||
<p>A <a href="regex/struct.Regex.html" title="struct regex_automata::hybrid::regex::Regex"><code>hybrid::regex::Regex</code></a> and a
|
||
<a href="../dfa/regex/struct.Regex.html" title="struct regex_automata::dfa::regex::Regex"><code>dfa::regex::Regex</code></a> both have the same capabilities
|
||
(and similarly for their underlying DFAs), but they achieve them through
|
||
different means. The main difference is that a hybrid or “lazy” regex builds
|
||
its DFA lazily during search, where as a fully compiled regex will build its
|
||
DFA at construction time. While building a DFA at search time might sound like
|
||
it’s slow, it tends to work out where most bytes seen during a search will
|
||
reuse pre-built parts of the DFA and thus can be almost as fast as a fully
|
||
compiled DFA. The main downside is that searching requires mutable space to
|
||
store the DFA, and, in the worst case, a search can result in a new state being
|
||
created for each byte seen, which would make searching quite a bit slower.</p>
|
||
<p>A fully compiled DFA never has to worry about searches being slower once
|
||
it’s built. (Aside from, say, the transition table being so large that it
|
||
is subject to harsh CPU cache effects.) However, of course, building a full
|
||
DFA can be quite time consuming and memory hungry. Particularly when large
|
||
Unicode character classes are used, which tend to translate into very large
|
||
DFAs.</p>
|
||
<p>A lazy DFA strikes a nice balance <em>in practice</em>, particularly in the
|
||
presence of Unicode mode, by only building what is needed. It avoids the
|
||
worst case exponential time complexity of DFA compilation by guaranteeing that
|
||
it will only build at most one state per byte searched. While the worst
|
||
case here can lead to a very high constant, it will never be exponential.</p>
|
||
<h2 id="syntax"><a class="doc-anchor" href="#syntax">§</a>Syntax</h2>
|
||
<p>This module supports the same syntax as the <code>regex</code> crate, since they share the
|
||
same parser. You can find an exhaustive list of supported syntax in the
|
||
<a href="https://docs.rs/regex/1/regex/#syntax">documentation for the <code>regex</code> crate</a>.</p>
|
||
<p>There are two things that are not supported by the lazy DFAs in this module:</p>
|
||
<ul>
|
||
<li>Capturing groups. The DFAs (and <a href="regex/struct.Regex.html" title="struct regex_automata::hybrid::regex::Regex"><code>Regex</code></a>es built on top
|
||
of them) can only find the offsets of an entire match, but cannot resolve
|
||
the offsets of each capturing group. This is because DFAs do not have the
|
||
expressive power necessary. Note that it is okay to build a lazy DFA from an
|
||
NFA that contains capture groups. The capture groups will simply be ignored.</li>
|
||
<li>Unicode word boundaries. These present particularly difficult challenges for
|
||
DFA construction and would result in an explosion in the number of states.
|
||
One can enable <a href="dfa/struct.Config.html#method.unicode_word_boundary" title="method regex_automata::hybrid::dfa::Config::unicode_word_boundary"><code>dfa::Config::unicode_word_boundary</code></a> though, which provides
|
||
heuristic support for Unicode word boundaries that only works on ASCII text.
|
||
Otherwise, one can use <code>(?-u:\b)</code> for an ASCII word boundary, which will work
|
||
on any input.</li>
|
||
</ul>
|
||
<p>There are no plans to lift either of these limitations.</p>
|
||
<p>Note that these restrictions are identical to the restrictions on fully
|
||
compiled DFAs.</p>
|
||
</div></details><h2 id="modules" class="section-header">Modules<a href="#modules" class="anchor">§</a></h2><dl class="item-table"><dt><a class="mod" href="dfa/index.html" title="mod regex_automata::hybrid::dfa">dfa</a></dt><dd>Types and routines specific to lazy DFAs.</dd><dt><a class="mod" href="regex/index.html" title="mod regex_automata::hybrid::regex">regex</a></dt><dd>A lazy DFA backed <code>Regex</code>.</dd></dl><h2 id="structs" class="section-header">Structs<a href="#structs" class="anchor">§</a></h2><dl class="item-table"><dt><a class="struct" href="struct.BuildError.html" title="struct regex_automata::hybrid::BuildError">Build<wbr>Error</a></dt><dd>An error that occurs when initial construction of a lazy DFA fails.</dd><dt><a class="struct" href="struct.CacheError.html" title="struct regex_automata::hybrid::CacheError">Cache<wbr>Error</a></dt><dd>An error that occurs when cache usage has become inefficient.</dd><dt><a class="struct" href="struct.LazyStateID.html" title="struct regex_automata::hybrid::LazyStateID">Lazy<wbr>StateID</a></dt><dd>A state identifier specifically tailored for lazy DFAs.</dd></dl><h2 id="enums" class="section-header">Enums<a href="#enums" class="anchor">§</a></h2><dl class="item-table"><dt><a class="enum" href="enum.StartError.html" title="enum regex_automata::hybrid::StartError">Start<wbr>Error</a></dt><dd>An error that can occur when computing the start state for a search.</dd></dl></section></div></main></body></html> |