270 lines
28 KiB
HTML
270 lines
28 KiB
HTML
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="A module for building and searching with deterministic finite automata (DFAs)."><title>regex_automata::dfa - Rust</title><script>if(window.location.protocol!=="file:")document.head.insertAdjacentHTML("beforeend","SourceSerif4-Regular-6b053e98.ttf.woff2,FiraSans-Italic-81dc35de.woff2,FiraSans-Regular-0fe48ade.woff2,FiraSans-MediumItalic-ccf7e434.woff2,FiraSans-Medium-e1aa3f0a.woff2,SourceCodePro-Regular-8badfe75.ttf.woff2,SourceCodePro-Semibold-aa29a496.ttf.woff2".split(",").map(f=>`<link rel="preload" as="font" type="font/woff2"href="../../static.files/${f}">`).join(""))</script><link rel="stylesheet" href="../../static.files/normalize-9960930a.css"><link rel="stylesheet" href="../../static.files/rustdoc-ca0dd0c4.css"><meta name="rustdoc-vars" data-root-path="../../" data-static-root-path="../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.93.1 (01f6ddf75 2026-02-11) (Arch Linux rust 1:1.93.1-1)" data-channel="1.93.1" data-search-js="search-9e2438ea.js" data-stringdex-js="stringdex-a3946164.js" data-settings-js="settings-c38705f0.js" ><script src="../../static.files/storage-e2aeef58.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../static.files/main-a410ff4d.js"></script><noscript><link rel="stylesheet" href="../../static.files/noscript-263c88ec.css"></noscript><link rel="alternate icon" type="image/png" href="../../static.files/favicon-32x32-eab170b8.png"><link rel="icon" type="image/svg+xml" href="../../static.files/favicon-044be391.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><rustdoc-topbar><h2><a href="#">Module dfa</a></h2></rustdoc-topbar><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../regex_automata/index.html">regex_<wbr>automata</a><span class="version">0.4.14</span></h2></div><div class="sidebar-elems"><section id="rustdoc-toc"><h2 class="location"><a href="#">Module dfa</a></h2><h3><a href="#">Sections</a></h3><ul class="block top-toc"><li><a href="#overview" title="Overview">Overview</a></li><li><a href="#example-basic-regex-searching" title="Example: basic regex searching">Example: basic regex searching</a></li><li><a href="#example-searching-with-regex-sets" title="Example: searching with regex sets">Example: searching with regex sets</a></li><li><a href="#example-use-sparse-dfas" title="Example: use sparse DFAs">Example: use sparse DFAs</a></li><li><a href="#example-deserialize-a-dfa" title="Example: deserialize a DFA">Example: deserialize a DFA</a></li><li><a href="#support-for-no_std-and-alloc-only" title="Support for `no_std` and `alloc`-only">Support for <code>no_std</code> and <code>alloc</code>-only</a></li><li><a href="#syntax" title="Syntax">Syntax</a></li><li><a href="#differences-with-general-purpose-regexes" title="Differences with general purpose regexes">Differences with general purpose regexes</a></li></ul><h3><a href="#modules">Module Items</a></h3><ul class="block"><li><a href="#modules" title="Modules">Modules</a></li><li><a href="#structs" title="Structs">Structs</a></li><li><a href="#enums" title="Enums">Enums</a></li><li><a href="#traits" title="Traits">Traits</a></li></ul></section><div id="rustdoc-modnav"><h2 class="in-crate"><a href="../index.html">In crate regex_<wbr>automata</a></h2></div></div></nav><div class="sidebar-resizer" title="Drag to resize sidebar"></div><main><div class="width-limiter"><section id="main-content" class="content"><div class="main-heading"><div class="rustdoc-breadcrumbs"><a href="../index.html">regex_automata</a></div><h1>Module <span>dfa</span> <button id="copy-path" title="Copy item path to clipboard">Copy item path</button></h1><rustdoc-toolbar></rustdoc-toolbar><span class="sub-heading"><a class="src" href="../../src/regex_automata/dfa/mod.rs.html#1-360">Source</a> </span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>A module for building and searching with deterministic finite automata (DFAs).</p>
|
||
<p>Like other modules in this crate, DFAs support a rich regex syntax with Unicode
|
||
features. DFAs also have extensive options for configuring the best space vs
|
||
time trade off for your use case and provides support for cheap deserialization
|
||
of automata for use in <code>no_std</code> environments.</p>
|
||
<p>If you’re looking for lazy DFAs that build themselves incrementally during
|
||
search, then please see the top-level <a href="../hybrid/index.html" title="mod regex_automata::hybrid"><code>hybrid</code> module</a>.</p>
|
||
<h2 id="overview"><a class="doc-anchor" href="#overview">§</a>Overview</h2>
|
||
<p>This section gives a brief overview of the primary types in this module:</p>
|
||
<ul>
|
||
<li>A <a href="regex/struct.Regex.html" title="struct regex_automata::dfa::regex::Regex"><code>regex::Regex</code></a> provides a way to search for matches of a regular
|
||
expression using DFAs. This includes iterating over matches with both the start
|
||
and end positions of each match.</li>
|
||
<li>A <a href="dense/struct.DFA.html" title="struct regex_automata::dfa::dense::DFA"><code>dense::DFA</code></a> provides low level access to a DFA that uses a dense
|
||
representation (uses lots of space, but fast searching).</li>
|
||
<li>A <a href="sparse/struct.DFA.html" title="struct regex_automata::dfa::sparse::DFA"><code>sparse::DFA</code></a> provides the same API as a <code>dense::DFA</code>, but uses a sparse
|
||
representation (uses less space, but slower searching).</li>
|
||
<li>An <a href="trait.Automaton.html" title="trait regex_automata::dfa::Automaton"><code>Automaton</code></a> trait that defines an interface that both dense and sparse
|
||
DFAs implement. (A <code>regex::Regex</code> is generic over this trait.)</li>
|
||
<li>Both dense DFAs and sparse DFAs support serialization to raw bytes (e.g.,
|
||
<a href="dense/struct.DFA.html#method.to_bytes_little_endian" title="method regex_automata::dfa::dense::DFA::to_bytes_little_endian"><code>dense::DFA::to_bytes_little_endian</code></a>) and cheap deserialization (e.g.,
|
||
<a href="dense/struct.DFA.html#method.from_bytes" title="associated function regex_automata::dfa::dense::DFA::from_bytes"><code>dense::DFA::from_bytes</code></a>).</li>
|
||
</ul>
|
||
<p>There is also a <a href="onepass/index.html" title="mod regex_automata::dfa::onepass"><code>onepass</code></a> module that provides a <a href="onepass/struct.DFA.html" title="struct regex_automata::dfa::onepass::DFA">one-pass
|
||
DFA</a>. The unique advantage of this DFA is that, for the class
|
||
of regexes it can be built with, it supports reporting the spans of matching
|
||
capturing groups. It is the only DFA in this crate capable of such a thing.</p>
|
||
<h2 id="example-basic-regex-searching"><a class="doc-anchor" href="#example-basic-regex-searching">§</a>Example: basic regex searching</h2>
|
||
<p>This example shows how to compile a regex using the default configuration
|
||
and then use it to find matches in a byte string:</p>
|
||
|
||
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::regex::Regex};
|
||
|
||
<span class="kw">let </span>re = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>)<span class="question-mark">?</span>;
|
||
<span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
||
<span class="kw">let </span>matches: Vec<Match> = re.find_iter(text).collect();
|
||
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
||
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
||
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
||
]);</code></pre></div><h2 id="example-searching-with-regex-sets"><a class="doc-anchor" href="#example-searching-with-regex-sets">§</a>Example: searching with regex sets</h2>
|
||
<p>The DFAs in this module all fully support searching with multiple regexes
|
||
simultaneously. You can use this support with standard leftmost-first style
|
||
searching to find non-overlapping matches:</p>
|
||
|
||
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::regex::Regex};
|
||
|
||
<span class="kw">let </span>re = Regex::new_many(<span class="kw-2">&</span>[<span class="string">r"\w+"</span>, <span class="string">r"\S+"</span>])<span class="question-mark">?</span>;
|
||
<span class="kw">let </span>text = <span class="string">b"@foo bar"</span>;
|
||
<span class="kw">let </span>matches: Vec<Match> = re.find_iter(text).collect();
|
||
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
||
Match::must(<span class="number">1</span>, <span class="number">0</span>..<span class="number">4</span>),
|
||
Match::must(<span class="number">0</span>, <span class="number">5</span>..<span class="number">8</span>),
|
||
]);</code></pre></div><h2 id="example-use-sparse-dfas"><a class="doc-anchor" href="#example-use-sparse-dfas">§</a>Example: use sparse DFAs</h2>
|
||
<p>By default, compiling a regex will use dense DFAs internally. This uses more
|
||
memory, but executes searches more quickly. If you can abide slower searches
|
||
(somewhere around 3-5x), then sparse DFAs might make more sense since they can
|
||
use significantly less space.</p>
|
||
<p>Using sparse DFAs is as easy as using <code>Regex::new_sparse</code> instead of
|
||
<code>Regex::new</code>:</p>
|
||
|
||
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::regex::Regex};
|
||
|
||
<span class="kw">let </span>re = Regex::new_sparse(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>).unwrap();
|
||
<span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
||
<span class="kw">let </span>matches: Vec<Match> = re.find_iter(text).collect();
|
||
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
||
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
||
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
||
]);</code></pre></div>
|
||
<p>If you already have dense DFAs for some reason, they can be converted to sparse
|
||
DFAs and used to build a new <code>Regex</code>. For example:</p>
|
||
|
||
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::regex::Regex};
|
||
|
||
<span class="kw">let </span>dense_re = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>).unwrap();
|
||
<span class="kw">let </span>sparse_re = Regex::builder().build_from_dfas(
|
||
dense_re.forward().to_sparse()<span class="question-mark">?</span>,
|
||
dense_re.reverse().to_sparse()<span class="question-mark">?</span>,
|
||
);
|
||
<span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
||
<span class="kw">let </span>matches: Vec<Match> = sparse_re.find_iter(text).collect();
|
||
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
||
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
||
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
||
]);</code></pre></div><h2 id="example-deserialize-a-dfa"><a class="doc-anchor" href="#example-deserialize-a-dfa">§</a>Example: deserialize a DFA</h2>
|
||
<p>This shows how to first serialize a DFA into raw bytes, and then deserialize
|
||
those raw bytes back into a DFA. While this particular example is a
|
||
bit contrived, this same technique can be used in your program to
|
||
deserialize a DFA at start up time or by memory mapping a file.</p>
|
||
|
||
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::{dense, regex::Regex}};
|
||
|
||
<span class="kw">let </span>re1 = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>).unwrap();
|
||
<span class="comment">// serialize both the forward and reverse DFAs, see note below
|
||
</span><span class="kw">let </span>(fwd_bytes, fwd_pad) = re1.forward().to_bytes_native_endian();
|
||
<span class="kw">let </span>(rev_bytes, rev_pad) = re1.reverse().to_bytes_native_endian();
|
||
<span class="comment">// now deserialize both---we need to specify the correct type!
|
||
</span><span class="kw">let </span>fwd: dense::DFA<<span class="kw-2">&</span>[u32]> = dense::DFA::from_bytes(<span class="kw-2">&</span>fwd_bytes[fwd_pad..])<span class="question-mark">?</span>.<span class="number">0</span>;
|
||
<span class="kw">let </span>rev: dense::DFA<<span class="kw-2">&</span>[u32]> = dense::DFA::from_bytes(<span class="kw-2">&</span>rev_bytes[rev_pad..])<span class="question-mark">?</span>.<span class="number">0</span>;
|
||
<span class="comment">// finally, reconstruct our regex
|
||
</span><span class="kw">let </span>re2 = Regex::builder().build_from_dfas(fwd, rev);
|
||
|
||
<span class="comment">// we can use it like normal
|
||
</span><span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
||
<span class="kw">let </span>matches: Vec<Match> = re2.find_iter(text).collect();
|
||
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
||
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
||
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
||
]);</code></pre></div>
|
||
<p>There are a few points worth noting here:</p>
|
||
<ul>
|
||
<li>We need to extract the raw DFAs used by the regex and serialize those. You
|
||
can build the DFAs manually yourself using <a href="dense/struct.Builder.html" title="struct regex_automata::dfa::dense::Builder"><code>dense::Builder</code></a>, but using
|
||
the DFAs from a <code>Regex</code> guarantees that the DFAs are built correctly. (In
|
||
particular, a <code>Regex</code> constructs a reverse DFA for finding the starting
|
||
location of matches.)</li>
|
||
<li>To convert the DFA to raw bytes, we use the <code>to_bytes_native_endian</code> method.
|
||
In practice, you’ll want to use either <a href="dense/struct.DFA.html#method.to_bytes_little_endian" title="method regex_automata::dfa::dense::DFA::to_bytes_little_endian"><code>dense::DFA::to_bytes_little_endian</code></a>
|
||
or <a href="dense/struct.DFA.html#method.to_bytes_big_endian" title="method regex_automata::dfa::dense::DFA::to_bytes_big_endian"><code>dense::DFA::to_bytes_big_endian</code></a>, depending on which platform you’re
|
||
deserializing your DFA from. If you intend to deserialize on either platform,
|
||
then you’ll need to serialize both and deserialize the right one depending on
|
||
your target’s endianness.</li>
|
||
<li>Safely deserializing a DFA requires verifying the raw bytes, particularly if
|
||
they are untrusted, since an invalid DFA could cause logical errors, panics
|
||
or even undefined behavior. This verification step requires visiting all of
|
||
the transitions in the DFA, which can be costly. If cheaper verification is
|
||
desired, then <a href="dense/struct.DFA.html#method.from_bytes_unchecked" title="associated function regex_automata::dfa::dense::DFA::from_bytes_unchecked"><code>dense::DFA::from_bytes_unchecked</code></a> is available that only does
|
||
verification that can be performed in constant time. However, one can only use
|
||
this routine if the caller can guarantee that the bytes provided encoded a
|
||
valid DFA.</li>
|
||
</ul>
|
||
<p>The same process can be achieved with sparse DFAs as well:</p>
|
||
|
||
<div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex_automata::{Match, dfa::{sparse, regex::Regex}};
|
||
|
||
<span class="kw">let </span>re1 = Regex::new(<span class="string">r"[0-9]{4}-[0-9]{2}-[0-9]{2}"</span>).unwrap();
|
||
<span class="comment">// serialize both
|
||
</span><span class="kw">let </span>fwd_bytes = re1.forward().to_sparse()<span class="question-mark">?</span>.to_bytes_native_endian();
|
||
<span class="kw">let </span>rev_bytes = re1.reverse().to_sparse()<span class="question-mark">?</span>.to_bytes_native_endian();
|
||
<span class="comment">// now deserialize both---we need to specify the correct type!
|
||
</span><span class="kw">let </span>fwd: sparse::DFA<<span class="kw-2">&</span>[u8]> = sparse::DFA::from_bytes(<span class="kw-2">&</span>fwd_bytes)<span class="question-mark">?</span>.<span class="number">0</span>;
|
||
<span class="kw">let </span>rev: sparse::DFA<<span class="kw-2">&</span>[u8]> = sparse::DFA::from_bytes(<span class="kw-2">&</span>rev_bytes)<span class="question-mark">?</span>.<span class="number">0</span>;
|
||
<span class="comment">// finally, reconstruct our regex
|
||
</span><span class="kw">let </span>re2 = Regex::builder().build_from_dfas(fwd, rev);
|
||
|
||
<span class="comment">// we can use it like normal
|
||
</span><span class="kw">let </span>text = <span class="string">b"2018-12-24 2016-10-08"</span>;
|
||
<span class="kw">let </span>matches: Vec<Match> = re2.find_iter(text).collect();
|
||
<span class="macro">assert_eq!</span>(matches, <span class="macro">vec!</span>[
|
||
Match::must(<span class="number">0</span>, <span class="number">0</span>..<span class="number">10</span>),
|
||
Match::must(<span class="number">0</span>, <span class="number">11</span>..<span class="number">21</span>),
|
||
]);</code></pre></div>
|
||
<p>Note that unlike dense DFAs, sparse DFAs have no alignment requirements.
|
||
Conversely, dense DFAs must be aligned to the same alignment as a
|
||
<a href="../util/primitives/struct.StateID.html" title="struct regex_automata::util::primitives::StateID"><code>StateID</code></a>.</p>
|
||
<h2 id="support-for-no_std-and-alloc-only"><a class="doc-anchor" href="#support-for-no_std-and-alloc-only">§</a>Support for <code>no_std</code> and <code>alloc</code>-only</h2>
|
||
<p>This crate comes with <code>alloc</code> and <code>std</code> features that are enabled by default.
|
||
When the <code>alloc</code> or <code>std</code> features are enabled, the API of this module will
|
||
include the facilities necessary for compiling, serializing, deserializing
|
||
and searching with DFAs. When only the <code>alloc</code> feature is enabled, then
|
||
implementations of the <code>std::error::Error</code> trait are dropped, but everything
|
||
else generally remains the same. When both the <code>alloc</code> and <code>std</code> features are
|
||
disabled, the API of this module will shrink such that it only includes the
|
||
facilities necessary for deserializing and searching with DFAs.</p>
|
||
<p>The intended workflow for <code>no_std</code> environments is thus as follows:</p>
|
||
<ul>
|
||
<li>Write a program with the <code>alloc</code> or <code>std</code> features that compiles and
|
||
serializes a regular expression. You may need to serialize both little and big
|
||
endian versions of each DFA. (So that’s 4 DFAs in total for each regex.)</li>
|
||
<li>In your <code>no_std</code> environment, follow the examples above for deserializing
|
||
your previously serialized DFAs into regexes. You can then search with them as
|
||
you would any regex.</li>
|
||
</ul>
|
||
<p>Deserialization can happen anywhere. For example, with bytes embedded into a
|
||
binary or with a file memory mapped at runtime.</p>
|
||
<p>The <code>regex-cli</code> command (found in the same repository as this crate) can be
|
||
used to serialize DFAs to files and generate Rust code to read them.</p>
|
||
<h2 id="syntax"><a class="doc-anchor" href="#syntax">§</a>Syntax</h2>
|
||
<p>This module supports the same syntax as the <code>regex</code> crate, since they share the
|
||
same parser. You can find an exhaustive list of supported syntax in the
|
||
<a href="https://docs.rs/regex/1/regex/#syntax">documentation for the <code>regex</code> crate</a>.</p>
|
||
<p>There are two things that are not supported by the DFAs in this module:</p>
|
||
<ul>
|
||
<li>Capturing groups. The DFAs (and <a href="regex/struct.Regex.html" title="struct regex_automata::dfa::regex::Regex"><code>Regex</code></a>es built on top
|
||
of them) can only find the offsets of an entire match, but cannot resolve
|
||
the offsets of each capturing group. This is because DFAs do not have the
|
||
expressive power necessary.</li>
|
||
<li>Unicode word boundaries. These present particularly difficult challenges for
|
||
DFA construction and would result in an explosion in the number of states.
|
||
One can enable <a href="dense/struct.Config.html#method.unicode_word_boundary" title="method regex_automata::dfa::dense::Config::unicode_word_boundary"><code>dense::Config::unicode_word_boundary</code></a> though, which provides
|
||
heuristic support for Unicode word boundaries that only works on ASCII text.
|
||
Otherwise, one can use <code>(?-u:\b)</code> for an ASCII word boundary, which will work
|
||
on any input.</li>
|
||
</ul>
|
||
<p>There are no plans to lift either of these limitations.</p>
|
||
<p>Note that these restrictions are identical to the restrictions on lazy DFAs.</p>
|
||
<h2 id="differences-with-general-purpose-regexes"><a class="doc-anchor" href="#differences-with-general-purpose-regexes">§</a>Differences with general purpose regexes</h2>
|
||
<p>The main goal of the <a href="https://docs.rs/regex"><code>regex</code></a> crate is to serve as a
|
||
general purpose regular expression engine. It aims to automatically balance low
|
||
compile times, fast search times and low memory usage, while also providing
|
||
a convenient API for users. In contrast, this module provides a lower level
|
||
regular expression interface based exclusively on DFAs that is a bit less
|
||
convenient while providing more explicit control over memory usage and search
|
||
times.</p>
|
||
<p>Here are some specific negative differences:</p>
|
||
<ul>
|
||
<li><strong>Compilation can take an exponential amount of time and space</strong> in the size
|
||
of the regex pattern. While most patterns do not exhibit worst case exponential
|
||
time, such patterns do exist. For example, <code>[01]*1[01]{N}</code> will build a DFA
|
||
with approximately <code>2^(N+2)</code> states. For this reason, untrusted patterns should
|
||
not be compiled with this module. (In the future, the API may expose an option
|
||
to return an error if the DFA gets too big.)</li>
|
||
<li>This module does not support sub-match extraction via capturing groups, which
|
||
can be achieved with the regex crate’s “captures” API.</li>
|
||
<li>While the regex crate doesn’t necessarily sport fast compilation times,
|
||
the regexes in this module are almost universally slow to compile, especially
|
||
when they contain large Unicode character classes. For example, on my system,
|
||
compiling <code>\w{50}</code> takes about 1 second and almost 15MB of memory! (Compiling
|
||
a sparse regex takes about the same time but only uses about 1.2MB of
|
||
memory.) Conversely, compiling the same regex without Unicode support, e.g.,
|
||
<code>(?-u)\w{50}</code>, takes under 1 millisecond and about 15KB of memory. For this
|
||
reason, you should only use Unicode character classes if you absolutely need
|
||
them! (They are enabled by default though.)</li>
|
||
<li>This module does not support Unicode word boundaries. ASCII word boundaries
|
||
may be used though by disabling Unicode or selectively doing so in the syntax,
|
||
e.g., <code>(?-u:\b)</code>. There is also an option to
|
||
<a href="dense/struct.Config.html#method.unicode_word_boundary" title="method regex_automata::dfa::dense::Config::unicode_word_boundary">heuristically enable Unicode word boundaries</a>,
|
||
where the corresponding DFA will give up if any non-ASCII byte is seen.</li>
|
||
<li>As a lower level API, this module does not do literal optimizations
|
||
automatically. Although it does provide hooks in its API to make use of the
|
||
<a href="../util/prefilter/struct.Prefilter.html" title="struct regex_automata::util::prefilter::Prefilter"><code>Prefilter</code></a> trait. Missing literal
|
||
optimizations means that searches may run much slower than what you’re
|
||
accustomed to, although, it does provide more predictable and consistent
|
||
performance.</li>
|
||
<li>There is no <code>&str</code> API like in the regex crate. In this module, all APIs
|
||
operate on <code>&[u8]</code>. By default, match indices are
|
||
guaranteed to fall on UTF-8 boundaries, unless either of
|
||
<a href="../util/syntax/struct.Config.html#method.utf8" title="method regex_automata::util::syntax::Config::utf8"><code>syntax::Config::utf8</code></a> or
|
||
<a href="../nfa/thompson/struct.Config.html#method.utf8" title="method regex_automata::nfa::thompson::Config::utf8"><code>thompson::Config::utf8</code></a> are disabled.</li>
|
||
</ul>
|
||
<p>With some of the downsides out of the way, here are some positive differences:</p>
|
||
<ul>
|
||
<li>Both dense and sparse DFAs can be serialized to raw bytes, and then cheaply
|
||
deserialized. Deserialization can be done in constant time with the unchecked
|
||
APIs, since searching can be performed directly on the raw serialized bytes of
|
||
a DFA.</li>
|
||
<li>This module was specifically designed so that the searching phase of a
|
||
DFA has minimal runtime requirements, and can therefore be used in <code>no_std</code>
|
||
environments. While <code>no_std</code> environments cannot compile regexes, they can
|
||
deserialize pre-compiled regexes.</li>
|
||
<li>Since this module builds DFAs ahead of time, it will generally out-perform
|
||
the <code>regex</code> crate on equivalent tasks. The performance difference is likely
|
||
not large. However, because of a complex set of optimizations in the regex
|
||
crate (like literal optimizations), an accurate performance comparison may be
|
||
difficult to do.</li>
|
||
<li>Sparse DFAs provide a way to build a DFA ahead of time that sacrifices search
|
||
performance a small amount, but uses much less storage space. Potentially even
|
||
less than what the regex crate uses.</li>
|
||
<li>This module exposes DFAs directly, such as <a href="dense/struct.DFA.html" title="struct regex_automata::dfa::dense::DFA"><code>dense::DFA</code></a> and
|
||
<a href="sparse/struct.DFA.html" title="struct regex_automata::dfa::sparse::DFA"><code>sparse::DFA</code></a>, which enables one to do less work in some cases. For example,
|
||
if you only need the end of a match and not the start of a match, then you can
|
||
use a DFA directly without building a <code>Regex</code>, which always requires a second
|
||
DFA to find the start of a match.</li>
|
||
<li>This module provides more control over memory usage. Aside from choosing
|
||
between dense and sparse DFAs, one can also choose a smaller state identifier
|
||
representation to use less space. Also, one can enable DFA minimization
|
||
via <a href="dense/struct.Config.html#method.minimize" title="method regex_automata::dfa::dense::Config::minimize"><code>dense::Config::minimize</code></a>, but it can increase compilation times
|
||
dramatically.</li>
|
||
</ul>
|
||
</div></details><h2 id="modules" class="section-header">Modules<a href="#modules" class="anchor">§</a></h2><dl class="item-table"><dt><a class="mod" href="dense/index.html" title="mod regex_automata::dfa::dense">dense</a></dt><dd>Types and routines specific to dense DFAs.</dd><dt><a class="mod" href="onepass/index.html" title="mod regex_automata::dfa::onepass">onepass</a></dt><dd>A DFA that can return spans for matching capturing groups.</dd><dt><a class="mod" href="regex/index.html" title="mod regex_automata::dfa::regex">regex</a></dt><dd>A DFA-backed <code>Regex</code>.</dd><dt><a class="mod" href="sparse/index.html" title="mod regex_automata::dfa::sparse">sparse</a></dt><dd>Types and routines specific to sparse DFAs.</dd></dl><h2 id="structs" class="section-header">Structs<a href="#structs" class="anchor">§</a></h2><dl class="item-table"><dt><a class="struct" href="struct.OverlappingState.html" title="struct regex_automata::dfa::OverlappingState">Overlapping<wbr>State</a></dt><dd>Represents the current state of an overlapping search.</dd></dl><h2 id="enums" class="section-header">Enums<a href="#enums" class="anchor">§</a></h2><dl class="item-table"><dt><a class="enum" href="enum.StartError.html" title="enum regex_automata::dfa::StartError">Start<wbr>Error</a></dt><dd>An error that can occur when computing the start state for a search.</dd><dt><a class="enum" href="enum.StartKind.html" title="enum regex_automata::dfa::StartKind">Start<wbr>Kind</a></dt><dd>The kind of anchored starting configurations to support in a DFA.</dd></dl><h2 id="traits" class="section-header">Traits<a href="#traits" class="anchor">§</a></h2><dl class="item-table"><dt><a class="trait" href="trait.Automaton.html" title="trait regex_automata::dfa::Automaton">Automaton</a></dt><dd>A trait describing the interface of a deterministic finite automaton (DFA).</dd></dl></section></div></main></body></html> |