Skip to main content

bouncycastle_mldsa_lowmemory/
lib.rs

1//! This crate implements the Module Lattice Digital Signature Algorithm (ML-DSA) as per FIPS 204 optimized
2//! to have the lowest-reasonable runtime memory footprint (aka peak memory usage).
3//!
4//! We achieve an approximate **1/10 the memory footprint** at a cost of approximately **3x
5//! the runtime for signing and no appreciable difference for keygen and verification**,
6//! compared with our un-optimized implementation in the \[bouncycastle_mldsa] crate.
7//! We are extremely happy with!
8//!
9//! # Philosophy of a low-memory implementation
10//! First, a little primer on the objects that make up an ML-DSA key pair.
11//! The "elements" of the lattice are polynomials of degree 256, represented in memory as
12//! arrays of 256 i32's. That's 1 kb per polynomial.
13//! Then we build vectors and matrices composed of these polynomials.
14//! ML-DSA is parametrized as ML-DSA-k,l where k and l are the sizes of the vectors, and the matrices
15//! have size k x l.
16//! So ML-DSA-44 carries vectors of 4 polynomials and matrices of 4 x 4 = 16 polynomials.
17//! ML-DSA-65 is 6, 5 and 6 x 5 = 30 polynomials, and ML-DSA-87 is 8, 7, and 8 x 7 = 56 polynomials.
18//!
19//! A straightforward implementation of ML-DSA will start by un-compressing all the key material into
20//! memory into the format that you need to perform the computation.
21//! A ready-to-use private key consists of a `Vector<l>` and two `Vector<k>`s,
22//! while the public key is a `Vector<k>` and a `Matrix<k,l>`.
23//! For ML-DSA-65, you expect to use 53 kb of RAM just for holding expanded key material, and then
24//! you expect the `.sign()` operation to require several multiples of that as variables for holding
25//! intermediate values as the computation proceeds.
26//! A well-written but not memory-optimized ML-DSA-65 can be expected to consume 200 - 250 kb of RAM
27//! at the widest point of the `.sign()` operation.
28//!
29//! This crate strives to do better!
30//!
31//! The core observation that makes this implementation possible is that by a careful examination of
32//! how the matrix multiplication works, you don't ever need the vectors and matrices to be fully
33//! expanded at the same time.
34//! In fact, you can work one polynomial at a time.
35//! This is because the ML-DSA keygen algorithm starts with a single 32-byte seed and expands that
36//! into intermediate seeds `rho` (32 byte), `rho_prime`(64 byte), and `K` (32 byte), from which all
37//! of the vectors and matrices are derived via hash functions.
38//! The public matrix A can be derived in a random-access fashion from `rho` and the matrix index `k,l`.
39//! The various vectors cannot, but the polynomial compression algorithm given in FIPS 204 as part of
40//! the key encoding procedure can be used to hold the vectors in memory compressed and only un-compress
41//! a single polynomial entry at a time.
42//! The downside of this approach is that it costs performance: throughout this implementation we are re-deriving
43//! bits and pieces of matrices that we had in memory previously, but released.
44//!
45//! We also get a surprising amount of memory-savings by good coding hygiene:
46//! Using un-named scopes to tell the compiler when an intermediate variable is no longer needed and
47//! can be popped off the stack. This sometimes requires re-ordering the steps of the algorithms given in
48//! FIPS 204 so that variables can be created, used, and released in a self-contained block.
49//! Sometimes this is not possible and we have to make a choice between keeping the variable around
50//! or releasing it and re-deriving it later.
51//! We also attempt to be clean about noting the last time a long-lived variable is used and
52//! re-using / re-naming / moving it to a new purpose rather than allocating an additional variable.
53//! These hygiene points can always be further improved with increasingly aggressive design choices.
54//! We feel that we have hit the point of diminishing returns that maintains an acceptable balance
55//! of memory footprint, performance, and code readability, but we welcome pull requests if, for example,
56//! we've missed a polynomial that doesn't need to be created.
57//!
58//! All this combined, we achieve an approximate *1/10 the memory footprint* at a cost of approximately *3x
59//! the runtime for signing and no appreciable difference for keygen and verification*,
60//! compared with our un-optimized implementation in the \[bouncycastle_mldsa] crate.
61//! We are extremely happy with!
62//!
63//! # Performance and size numbers
64//! Below, find performance charts relative to the standard ML-DSA implementation in the \[bouncycastle_mldsa] crate.
65//!
66//! ## Keys sizes in memory and on disk
67//! This implementation greatly reduces the size of keys both on disk and in memory
68//! by only handling the matrices and vectors either as seeds or in their compressed representation
69//! expanding on-demand as part of a sign or verify operation rather than storing them in memory as part of a keygen or key load.
70//!
71//! | Key Object | PK size on disk | PK size in memory | SK Size on disk | SK size in memory |
72//! |------------|-----------------|-------------------|-----------------|-------------------|
73//! | ML-DSA-44  | 1312 (1312)     | 1312 (4128)       | 32 (2560) | 192 (12416) |
74//! | ML-DSA-65  | 1952 (1952)     | 1952 (6176)       | 32 (4032) | 192 (17536) |
75//! | ML-DSA-87  | 2592 (2592)     | 2592 (8224)       | 32 (4896) | 192 (23680) |
76//!
77//! All values are in bytes.
78//! Values in parentheses are the usual sizes in our un-optimized implementation in the \[bouncycastle_mldsa] crate.
79//! These are the sizes of the data objects, not necessarily the physical size of stack memory usage.
80//!
81//!
82//! ## Algorithm performance and peak memory usage
83//! The table below shows the performance and peak memory usage of the ML-DSA algorithms.
84//! Values in parentheses are the comparison values from the un-optimized implementation in the \[bouncycastle_mldsa] crate.
85//! Size numbers were collected with valgrind using a simple main program that calls only the measured function.
86//! Performance throughput numbers were collected on my laptop using the library's provided benchmarks, so
87//! performance throughput numbers should be taken with an extreme grain of salt.
88//!
89//! Actual values may vary based on build configuration.
90//!
91//! | Algorithm | Peak memory usage (kB) | Throughput (ops/s) |
92//! |-----------|-----------------------|-------------------|
93//! | MLDSA44_lowmemory/KeyGen  | 12.9 (77.9)    | 8100 (8700) |
94//! | MLDSA65_lowmemory/KeyGen  | 14.8 (156.9)   | 4900 (5400) |
95//! | MLDSA87_lowmemory/KeyGen  | 16.1 (270.7)   | 2900 (3400) |
96//! | MLDSA44_lowmemory/Sign    | 25.5 (135.3)   | 630 (1400) |
97//! | MLDSA65_lowmemory/Sign    | 29.8 (227.6)   | 370 (960) |
98//! | MLDSA87_lowmemory/Sign    | 32.8 (362.0)   | 250 (730) |
99//! | MLDSA44_lowmemory/Verify  | 15.8 (68.9)    | 7500 (9000) |
100//! | MLDSA65_lowmemory/Verify  | 19.0 (115.7)   | 4500 (5700) |
101//! | MLDSA87_lowmemory/Verify  | 20.6 (237.2)   | 2700 (3600) |
102//!
103//! Values in parentheses are the usual sizes in our un-optimized implementation in the \[bouncycastle_mldsa] crate.
104//!
105//!
106//! # Usage
107//!
108//! This crate has been designed to serve a wide range of use cases, from people dabbling in
109//! cryptography for the first time, to cryptographic protocol designers who need access to the advanced
110//! functionality of the ML-DSA algorithm, to embedded systems developers who want access to memory
111//! and performance optimized functions.
112//!
113//! This page gives examples of simple usage for generating keys and signatures, and verifying signatures.//!
114//!
115//! More examples on advanced usage can be found on the [mldsa] and [hash_mldsa] pages.
116//!
117//! ## Generating Keys
118//!
119//! ```rust
120//! use bouncycastle_mldsa_lowmemory::MLDSA65;
121//! use bouncycastle_core_interface::traits::Signature;
122//!
123//! let (pk, sk) = MLDSA65::keygen().unwrap();
124//! ```
125//! That's it. That will use the library's default OS-backend RNG.
126//!
127//! Commonly with the ML-DSA algorithm, a 32-byte seed is used as the private key, and expanded into
128//! a full private key as needed. This is offered through the library's [KeyMaterial] object:
129//!
130//! ```rust
131//! use bouncycastle_core_interface::traits::KeyMaterial;
132//! use bouncycastle_core_interface::key_material::{KeyMaterial256, KeyType};
133//! use bouncycastle_mldsa::{MLDSA65, MLDSATrait};
134//!
135//! let seed = KeyMaterial256::from_bytes_as_type(
136//!     &hex::decode("000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f").unwrap(),
137//!     KeyType::Seed,
138//! ).unwrap();
139//!
140//! let (pk, sk) = MLDSA65::keygen_from_seed(&seed).unwrap();
141//! ```
142//!
143//! See [MLDSATrait] and [MLDSATrait::sign_mu_deterministic_from_seed] for an API flow that uses a merged
144//! keygen-and-sign function to provide improved speed and memory performance compared with making
145//! separate calls to [MLDSATrait::keygen_from_seed] followed by [Signature::sign].
146//!
147//! ## Generating and Verifying Signatures
148//!
149//! ```rust
150//! use bouncycastle_core_interface::errors::SignatureError;
151//! use bouncycastle_core_interface::traits::Signature;
152//! use bouncycastle_mldsa_lowmemory::{MLDSA65, MLDSATrait};
153//!
154//! let msg = b"The quick brown fox";
155//!
156//! let (pk, sk) = MLDSA65::keygen().unwrap();
157//!
158//! let sig: Vec<u8> = MLDSA65::sign(&sk, msg, None).unwrap();
159//! // This is the signature value that you can save to a file or whatever you need.
160//!
161//! match MLDSA65::verify(&pk, msg, None, &sig) {
162//!     Ok(()) => println!("Signature is valid!"),
163//!     Err(SignatureError::SignatureVerificationFailed) => println!("Signature is invalid!"),
164//!     Err(e) => panic!("Something else went wrong: {:?}", e),
165//! }
166//!
167//! ```
168//! And that's the basic usage! There are lots more bells-and-whistles in the form of exposed algorithm
169//! parameters, streaming APIs and other goodies that you can find by poking around this documentation.
170//!
171//! # Security
172//! All functionality exposed by this crate is considered secure to use.
173//! In other words, this crate does not contain any "hazmat" except for the obvous points about
174//! handling your private keys properly: if you post your private key to github, or you generate
175//! production keys from a weak seed, I can't help you, that's on you.
176//!
177//! While the full formulation of the ML-DSA and HashML-DSA algorithms look complex with parameters
178//! like `seed`, `mu`, `ph`, `ctx`, and `rnd`, rest assured that use (or misuse) of these parameters
179//! do not really affect security of the algorithm; they just mean that you might produce a signature
180//! that nobody else can verify.
181//!
182//! A note about cryptographic side-channel attacks: considerable effort has been expended to attempt
183//! to make this implementation constant-time, which generally means that the core mathematical algorithm
184//! code that handles secret data uses bitshift-and-xor type constructions instead of if-and-loop
185//! constructions. That should give this implementation reasonably good resistance to timing and
186//! power analysis key extraction attacks, however: A) this is a "best-effort" and not formally verified,
187//! and B) the Rust compiler does not guarantee constant-time behaviour no matter how clever your code,
188//! so like all Safe Rust code (ie Rust code that does not include inline assembly), we are at the mercy
189//! of the Rust compiler's optimizer for whether our bitshift-and-xor code actually remains
190//! constant-time after compilation.
191
192// todo -- need to tag the all the vec's with the alloc feature
193// #![no_std]
194
195#![forbid(missing_docs)]
196
197#![forbid(unsafe_code)]
198#![allow(incomplete_features)] // needed because currently generic_const_exprs is experimental
199#![feature(generic_const_exprs)]
200#![feature(int_roundings)]
201#![feature(inherent_associated_types)]
202#![feature(adt_const_params)]
203
204// These are because I'm matching variable names exactly against FIPS 204, for example both 'K' and 'k',
205// or 'A' and 'a' are used and have specific meanings.
206// But need to tell the rust linter to not care.
207#![allow(non_snake_case)]
208#![allow(non_upper_case_globals)]
209
210// so I can use private traits to hide internal stuff that needs to be generic within the
211// MLDSA implementation, but I don't want accessed from outside, such as FIPS-internal functions.
212#![allow(private_bounds)]
213
214// Used in HashMLDSA
215#![feature(unsized_const_params)]
216
217// imports needed just for docs
218#[allow(unused_imports)]
219use bouncycastle_core_interface::traits::{KeyMaterial, Signature, PHSignature};
220
221pub mod mldsa;
222pub mod hash_mldsa;
223mod mldsa_keys;
224mod polynomial;
225mod aux_functions;
226mod low_memory_helpers;
227/*** Exported types ***/
228pub use mldsa::{MLDSATrait, MLDSA, MLDSA44, MLDSA65, MLDSA87};
229pub use hash_mldsa::{HashMLDSA44_with_SHA256, HashMLDSA65_with_SHA256, HashMLDSA87_with_SHA256};
230pub use hash_mldsa::{HashMLDSA44_with_SHA512, HashMLDSA65_with_SHA512, HashMLDSA87_with_SHA512};
231pub use mldsa_keys::{MLDSAPrivateKeyTrait, MLDSAPublicKeyTrait};
232pub use mldsa_keys::{MLDSAPublicKey, MLDSA44PublicKey, MLDSA65PublicKey, MLDSA87PublicKey};
233pub use mldsa_keys::{MLDSASeedPrivateKey, MLDSA44PrivateKey, MLDSA65PrivateKey, MLDSA87PrivateKey};
234pub use mldsa::{MuBuilder};
235
236/*** Exported constants ***/
237pub use mldsa::ML_DSA_44_NAME;
238pub use mldsa::ML_DSA_65_NAME;
239pub use mldsa::ML_DSA_87_NAME;
240
241pub use hash_mldsa::Hash_ML_DSA_44_with_SHA256_NAME;
242pub use hash_mldsa::Hash_ML_DSA_65_with_SHA256_NAME;
243pub use hash_mldsa::Hash_ML_DSA_87_with_SHA256_NAME;
244
245pub use hash_mldsa::Hash_ML_DSA_44_with_SHA512_NAME;
246pub use hash_mldsa::Hash_ML_DSA_65_with_SHA512_NAME;
247pub use hash_mldsa::Hash_ML_DSA_87_with_SHA512_NAME;
248
249pub use mldsa::{TR_LEN, RND_LEN, MU_LEN};
250pub use mldsa::{MLDSA44_PK_LEN, MLDSA44_SK_LEN, MLDSA44_SIG_LEN};
251pub use mldsa::{MLDSA65_PK_LEN, MLDSA65_SK_LEN, MLDSA65_SIG_LEN};
252pub use mldsa::{MLDSA87_PK_LEN, MLDSA87_SK_LEN, MLDSA87_SIG_LEN};