Files
GopherGate/target/doc/regex_automata/util/alphabet/index.html
2026-02-26 12:00:21 -05:00

36 lines
7.3 KiB
HTML
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="This module provides APIs for dealing with the alphabets of finite state machines."><title>regex_automata::util::alphabet - Rust</title><script>if(window.location.protocol!=="file:")document.head.insertAdjacentHTML("beforeend","SourceSerif4-Regular-6b053e98.ttf.woff2,FiraSans-Italic-81dc35de.woff2,FiraSans-Regular-0fe48ade.woff2,FiraSans-MediumItalic-ccf7e434.woff2,FiraSans-Medium-e1aa3f0a.woff2,SourceCodePro-Regular-8badfe75.ttf.woff2,SourceCodePro-Semibold-aa29a496.ttf.woff2".split(",").map(f=>`<link rel="preload" as="font" type="font/woff2"href="../../../static.files/${f}">`).join(""))</script><link rel="stylesheet" href="../../../static.files/normalize-9960930a.css"><link rel="stylesheet" href="../../../static.files/rustdoc-ca0dd0c4.css"><meta name="rustdoc-vars" data-root-path="../../../" data-static-root-path="../../../static.files/" data-current-crate="regex_automata" data-themes="" data-resource-suffix="" data-rustdoc-version="1.93.1 (01f6ddf75 2026-02-11) (Arch Linux rust 1:1.93.1-1)" data-channel="1.93.1" data-search-js="search-9e2438ea.js" data-stringdex-js="stringdex-a3946164.js" data-settings-js="settings-c38705f0.js" ><script src="../../../static.files/storage-e2aeef58.js"></script><script defer src="../sidebar-items.js"></script><script defer src="../../../static.files/main-a410ff4d.js"></script><noscript><link rel="stylesheet" href="../../../static.files/noscript-263c88ec.css"></noscript><link rel="alternate icon" type="image/png" href="../../../static.files/favicon-32x32-eab170b8.png"><link rel="icon" type="image/svg+xml" href="../../../static.files/favicon-044be391.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><rustdoc-topbar><h2><a href="#">Module alphabet</a></h2></rustdoc-topbar><nav class="sidebar"><div class="sidebar-crate"><h2><a href="../../../regex_automata/index.html">regex_<wbr>automata</a><span class="version">0.4.14</span></h2></div><div class="sidebar-elems"><section id="rustdoc-toc"><h2 class="location"><a href="#">Module alphabet</a></h2><h3><a href="#structs">Module Items</a></h3><ul class="block"><li><a href="#structs" title="Structs">Structs</a></li></ul></section><div id="rustdoc-modnav"><h2><a href="../index.html">In regex_<wbr>automata::<wbr>util</a></h2></div></div></nav><div class="sidebar-resizer" title="Drag to resize sidebar"></div><main><div class="width-limiter"><section id="main-content" class="content"><div class="main-heading"><div class="rustdoc-breadcrumbs"><a href="../../index.html">regex_automata</a>::<wbr><a href="../index.html">util</a></div><h1>Module <span>alphabet</span>&nbsp;<button id="copy-path" title="Copy item path to clipboard">Copy item path</button></h1><rustdoc-toolbar></rustdoc-toolbar><span class="sub-heading"><a class="src" href="../../../src/regex_automata/util/alphabet.rs.html#1-1139">Source</a> </span></div><details class="toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>This module provides APIs for dealing with the alphabets of finite state
machines.</p>
<p>There are two principal types in this module, <a href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses"><code>ByteClasses</code></a> and <a href="struct.Unit.html" title="struct regex_automata::util::alphabet::Unit"><code>Unit</code></a>.
The former defines the alphabet of a finite state machine while the latter
represents an element of that alphabet.</p>
<p>To a first approximation, the alphabet of all automata in this crate is just
a <code>u8</code>. Namely, every distinct byte value. All 256 of them. In practice, this
can be quite wasteful when building a transition table for a DFA, since it
requires storing a state identifier for each element in the alphabet. Instead,
we collapse the alphabet of an automaton down into equivalence classes, where
every byte in the same equivalence class never discriminates between a match or
a non-match from any other byte in the same class. For example, in the regex
<code>[a-z]+</code>, then you could consider it having an alphabet consisting of two
equivalence classes: <code>a-z</code> and everything else. In terms of the transitions on
an automaton, it doesnt actually require representing every distinct byte.
Just the equivalence classes.</p>
<p>The downside of equivalence classes is that, of course, searching a haystack
deals with individual byte values. Those byte values need to be mapped to
their corresponding equivalence class. This is what <code>ByteClasses</code> does. In
practice, doing this for every state transition has negligible impact on modern
CPUs. Moreover, it helps make more efficient use of the CPU cache by (possibly
considerably) shrinking the size of the transition table.</p>
<p>One last hiccup concerns <code>Unit</code>. Namely, because of look-around and how the
DFAs in this crate work, we need to add a sentinel value to our alphabet
of equivalence classes that represents the “end” of a search. We call that
sentinel <a href="struct.Unit.html#method.eoi" title="associated function regex_automata::util::alphabet::Unit::eoi"><code>Unit::eoi</code></a> or “end of input.” Thus, a <code>Unit</code> is either an
equivalence class corresponding to a set of bytes, or it is a special “end of
input” sentinel.</p>
<p>In general, you should not expect to need either of these types unless youre
doing lower level shenanigans with DFAs, or even building your own DFAs.
(Although, you dont have to use these types to build your own DFAs of course.)
For example, if youre walking a DFAs state graph, its probably useful to
make use of <a href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses"><code>ByteClasses</code></a> to visit each element in the DFAs alphabet instead
of just visiting every distinct <code>u8</code> value. The latter isnt necessarily wrong,
but it could be potentially very wasteful.</p>
</div></details><h2 id="structs" class="section-header">Structs<a href="#structs" class="anchor">§</a></h2><dl class="item-table"><dt><a class="struct" href="struct.ByteClassElements.html" title="struct regex_automata::util::alphabet::ByteClassElements">Byte<wbr>Class<wbr>Elements</a></dt><dd>An iterator over all elements in an equivalence class.</dd><dt><a class="struct" href="struct.ByteClassIter.html" title="struct regex_automata::util::alphabet::ByteClassIter">Byte<wbr>Class<wbr>Iter</a></dt><dd>An iterator over each equivalence class.</dd><dt><a class="struct" href="struct.ByteClassRepresentatives.html" title="struct regex_automata::util::alphabet::ByteClassRepresentatives">Byte<wbr>Class<wbr>Representatives</a></dt><dd>An iterator over representative bytes from each equivalence class.</dd><dt><a class="struct" href="struct.ByteClasses.html" title="struct regex_automata::util::alphabet::ByteClasses">Byte<wbr>Classes</a></dt><dd>A representation of byte oriented equivalence classes.</dd><dt><a class="struct" href="struct.Unit.html" title="struct regex_automata::util::alphabet::Unit">Unit</a></dt><dd>Unit represents a single unit of haystack for DFA based regex engines.</dd></dl></section></div></main></body></html>