bouncycastle_mldsa_lowmemory/lib.rs
1//! This crate implements the Module Lattice Digital Signature Algorithm (ML-DSA) as per FIPS 204 optimized
2//! to have the lowest-reasonable runtime memory footprint (aka peak memory usage).
3//!
4//! We achieve an approximate **1/10 the memory footprint** at a cost of approximately **3x
5//! the runtime for signing and no appreciable difference for keygen and verification**,
6//! compared with our un-optimized implementation in the \[bouncycastle_mldsa] crate.
7//! We are extremely happy with!
8//!
9//! # Philosophy of a low-memory implementation
10//!
11//! First, a little primer on the objects that make up an ML-DSA key pair.
12//! The "elements" of the lattice are polynomials of degree 256, represented in memory as
13//! arrays of 256 i32's. That's 1 kb per polynomial.
14//! Then we build vectors and matrices composed of these polynomials.
15//! ML-DSA is parametrized as ML-DSA-k,l where k and l are the sizes of the vectors, and the matrices
16//! have size k x l.
17//! So ML-DSA-44 carries vectors of 4 polynomials and matrices of 4 x 4 = 16 polynomials.
18//! ML-DSA-65 is 6, 5 and 6 x 5 = 30 polynomials, and ML-DSA-87 is 8, 7, and 8 x 7 = 56 polynomials.
19//!
20//! A straightforward implementation of ML-DSA will start by un-compressing all the key material into
21//! memory into the format that you need to perform the computation.
22//! A ready-to-use private key consists of a `Vector<l>` and two `Vector<k>`s,
23//! while the public key is a `Vector<k>` and a `Matrix<k,l>`.
24//! For ML-DSA-65, you expect to use 53 kb of RAM just for holding expanded key material, and then
25//! you expect the `.sign()` operation to require several multiples of that as variables for holding
26//! intermediate values as the computation proceeds.
27//! A well-written but not memory-optimized ML-DSA-65 can be expected to consume approximately 150 kb of RAM
28//! at the widest point of the `.sign()` operation.
29//!
30//! This crate strives to do better!
31//!
32//! The core observation that makes this implementation possible is that by a careful examination of
33//! how the matrix multiplication works, you don't ever need the vectors and matrices to be fully
34//! expanded at the same time.
35//! In fact, you can work one polynomial at a time.
36//! This is because the ML-DSA keygen algorithm starts with a single 32-byte seed and expands that
37//! into intermediate seeds `rho` (32 byte), `rho_prime`(64 byte), and `K` (32 byte), from which all
38//! of the vectors and matrices are derived via hash functions.
39//! The public matrix A can be derived in a random-access fashion from `rho` and the matrix index `i,j`.
40//! The various vectors cannot, but the polynomial compression algorithm given in FIPS 204 as part of
41//! the key encoding procedure can be used to hold the vectors in memory compressed and only un-compress
42//! a single polynomial entry at a time.
43//! The downside of this approach is that it costs performance: throughout this implementation we are re-deriving
44//! bits and pieces of matrices that we had in memory previously, but released.
45//!
46//! We also get a surprising amount of memory-savings by good coding hygiene:
47//! Using un-named scopes to tell the compiler when an intermediate variable is no longer needed and
48//! can be popped off the stack. This sometimes requires re-ordering the steps of the algorithms given in
49//! FIPS 204 so that variables can be created, used, and released in a self-contained block.
50//! Sometimes this is not possible and we have to make a choice between keeping the variable around
51//! or releasing it and re-deriving it later.
52//! We also attempt to be clean about noting the last time a long-lived variable is used and
53//! re-using / re-naming / moving it to a new purpose rather than allocating an additional variable.
54//! These hygiene points can always be further improved with increasingly aggressive design choices.
55//! We feel that we have hit the point of diminishing returns that maintains an acceptable balance
56//! of memory footprint, performance, and code readability, but we welcome pull requests if, for example,
57//! we've missed a polynomial that doesn't need to be created.
58//!
59//! All this combined, we achieve an approximate *1/10 the memory footprint* at a cost of approximately *3x
60//! the runtime for signing and no appreciable difference for keygen and verification*,
61//! compared with our un-optimized implementation in the \[bouncycastle_mldsa] crate, which we are extremely happy with!
62//!
63//! # Memory Footprint
64//!
65//! Below, find performance charts relative to the standard ML-DSA implementation in the \[bouncycastle_mldsa] crate.
66//!
67//! ## Keys sizes in memory and on disk
68//!
69//! This implementation greatly reduces the size of keys both on disk and in memory
70//! by only handling the matrices and vectors either as seeds or in their compressed representation
71//! expanding on-demand as part of a sign or verify operation rather than storing them in memory as part of a keygen or key load.
72//!
73//! | Key Object | PK size on disk | PK size in memory | SK Size on disk | SK size in memory |
74//! |------------|-----------------|-------------------|-----------------|-------------------|
75//! | ML-DSA-44 | 1312 (1312) | 1312 (4128) | 32 (2560) | 176 (12464) |
76//! | ML-DSA-65 | 1952 (1952) | 1952 (6176) | 32 (4032) | 176 (17584) |
77//! | ML-DSA-87 | 2592 (2592) | 2592 (8224) | 32 (4896) | 176 (23728) |
78//!
79//! All values are in bytes. The "in memory" sizes are measured by rust's `std::mem::size_of`.
80//! Values in parentheses are the usual sizes in our un-optimized implementation in the \[bouncycastle_mldsa] crate.
81//!
82//!
83//! ## Algorithm Peak Memory Usage
84//! The table below shows peak memory usage of the ML-DSA algorithms and the rough performanc (throughput) impact.
85//!
86//! Measuring peak application memory usage can be a bit tricky, and the numbers you get depend heavily on how you designed your
87//! measurement harness. Here, we aim to provide a conservative measurement, meaning that we are aiming for an
88//! over-estimate so that any deployment within an existing application will use incrementally less additional memory
89//! than the amount stated here.
90//!
91//! Our measurement methodology is to compile a simple standalone HelloWorld application that only calls the function under test
92//! with as minimal as possible hard-coded data (such as keys or ciphertexts) and measure the peak memory usage of running
93//! the compiled binary using `valgrind --tool=massif --heap=no --stack=yes`. The flags for heap and stack
94//! reflect the fact that this is a `no_std` rust application and therefore the cryptographic functions use no heap memory.
95//! The measurements may over-estimate by as much as 3 kb since that that's the measured peak memory usage of a do-nothing
96//! HelloWorld rust application.
97//!
98//! | Algorithm | Peak swap memory usage (kB) | Throughput (ops/s) |
99//! |-----------|-----------------------|-------------------|
100//! | MLDSA44_lowmemory/KeyGen | 12.6 (113.8) | 11,800 (11,300) |
101//! | MLDSA65_lowmemory/KeyGen | 15.0 (124.1) | 5,500 (7.000) |
102//! | MLDSA87_lowmemory/KeyGen | 15.2 (197.8) | 3,300 (4,200) |
103//! | MLDSA44_lowmemory/Sign | 24.8 (117.7) | 850 (4,000) |
104//! | MLDSA65_lowmemory/Sign | 28.2 (159.6) | 580 (2,900) |
105//! | MLDSA87_lowmemory/Sign | 31.1 (236.7) | 315 (2,000) |
106//! | MLDSA44_lowmemory/Verify | 17.1 (73.0) | 10,100 (14,000) |
107//! | MLDSA65_lowmemory/Verify | 18.0 (134.4) | 6,300 (8,400) |
108//! | MLDSA87_lowmemory/Verify | 20.6 (211.6) | 3,500 (5,000) |
109//!
110//! Values in parentheses are the comparison values from the un-optimized implementation in the \[bouncycastle_mldsa] crate.
111//! Size numbers were collected with valgrind using a simple main program that calls only the measured function.
112//! Performance throughput numbers were collected on my laptop using the library's provided benchmarks, so
113//! performance they should be taken with an extreme grain of salt.
114//!
115//! Actual values may vary based on build configuration and target architecture.
116//!
117//! # Usage
118//!
119//! This crate has been designed to serve a wide range of use cases, from people dabbling in
120//! cryptography for the first time, to cryptographic protocol designers who need access to the advanced
121//! functionality of the ML-DSA algorithm, to embedded systems developers who want access to memory
122//! and performance optimized functions.
123//!
124//! This page gives examples of simple usage for generating keys and signatures, and verifying signatures.//!
125//!
126//! More examples on advanced usage can be found on the [mldsa] and [hash_mldsa] pages.
127//!
128//! ## Generating Keys
129//!
130//! ```rust
131//! use bouncycastle_core::traits::Signature;
132//! use bouncycastle_mldsa_lowmemory::MLDSA65;
133//!
134//! let (pk, sk) = MLDSA65::keygen().unwrap();
135//! ```
136//! That's it. That will use the library's default OS-backend RNG.
137//!
138//! Commonly with the ML-DSA algorithm, a 32-byte seed is used as the private key, and expanded into
139//! a full private key as needed. This is offered through the library's [KeyMaterialTrait] object:
140//!
141//! ```rust
142//! use bouncycastle_core::key_material::{KeyMaterial256, KeyType, KeyMaterialTrait};
143//! use bouncycastle_hex as hex;
144//! use bouncycastle_mldsa_lowmemory::{MLDSA65, MLDSATrait};
145//!
146//! let seed = KeyMaterial256::from_bytes_as_type(
147//! &hex::decode("000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f").unwrap(),
148//! KeyType::Seed,
149//! ).unwrap();
150//!
151//! let (pk, sk) = MLDSA65::keygen_from_seed(&seed).unwrap();
152//! ```
153//!
154//! See [MLDSATrait] and [MLDSATrait::sign_mu_deterministic_from_seed] for an API flow that uses a merged
155//! keygen-and-sign function to provide improved speed and memory performance compared with making
156//! separate calls to [MLDSATrait::keygen_from_seed] followed by [Signature::sign].
157//!
158//! ## Generating and Verifying Signatures
159//!
160//! ```rust
161//! use bouncycastle_core::errors::SignatureError;
162//! use bouncycastle_core::traits::Signature;
163//! use bouncycastle_mldsa_lowmemory::{MLDSA65, MLDSATrait};
164//!
165//! let msg = b"The quick brown fox";
166//!
167//! let (pk, sk) = MLDSA65::keygen().unwrap();
168//!
169//! let sig = MLDSA65::sign(&sk, msg, None).unwrap();
170//! // This is the signature value that you can save to a file or whatever you need.
171//!
172//! match MLDSA65::verify(&pk, msg, None, &sig) {
173//! Ok(()) => println!("Signature is valid!"),
174//! Err(SignatureError::SignatureVerificationFailed) => println!("Signature is invalid!"),
175//! Err(e) => panic!("Something else went wrong: {:?}", e),
176//! }
177//!
178//! ```
179//! And that's the basic usage! There are lots more bells-and-whistles in the form of exposed algorithm
180//! parameters, streaming APIs and other goodies that you can find by poking around this documentation.
181//!
182//! # Security
183//! All functionality exposed by this crate is considered secure to use.
184//! In other words, this crate does not contain any "hazmat" except for the obvous points about
185//! handling your private keys properly: if you post your private key to github, or you generate
186//! production keys from a weak seed, I can't help you, that's on you.
187//!
188//! While the full formulation of the ML-DSA and HashML-DSA algorithms look complex with parameters
189//! like `seed`, `mu`, `ph`, `ctx`, and `rnd`, rest assured that use (or misuse) of these parameters
190//! do not really affect security of the algorithm; they just mean that you might produce a signature
191//! that nobody else can verify.
192//!
193//! A note about cryptographic side-channel attacks: considerable effort has been expended to attempt
194//! to make this implementation constant-time, which generally means that the core mathematical algorithm
195//! code that handles secret data uses bitshift-and-xor type constructions instead of if-and-loop
196//! constructions. That should give this implementation reasonably good resistance to timing and
197//! power analysis key extraction attacks, however: A) this is a "best-effort" and not formally verified,
198//! and B) the Rust compiler does not guarantee constant-time behaviour no matter how clever your code,
199//! so like all Safe Rust code (ie Rust code that does not include inline assembly), we are at the mercy
200//! of the Rust compiler's optimizer for whether our bitshift-and-xor code actually remains
201//! constant-time after compilation.
202
203#![no_std]
204#![forbid(missing_docs)]
205#![forbid(unsafe_code)]
206#![allow(incomplete_features)] // needed because currently generic_const_exprs is experimental
207#![feature(generic_const_exprs)]
208#![feature(adt_const_params)]
209// These are because I'm matching variable names exactly against FIPS 204, for example both 'K' and 'k',
210// or 'A' and 'a' are used and have specific meanings.
211// But need to tell the rust linter to not care.
212#![allow(non_snake_case)]
213#![allow(non_upper_case_globals)]
214// so I can use private traits to hide internal stuff that needs to be generic within the
215// MLDSA implementation, but I don't want accessed from outside, such as FIPS-internal functions.
216#![allow(private_bounds)]
217// Used in HashMLDSA
218#![feature(unsized_const_params)]
219
220// imports needed just for docs
221#[allow(unused_imports)]
222use bouncycastle_core::key_material::KeyMaterialTrait;
223#[allow(unused_imports)]
224use bouncycastle_core::traits::Signature;
225
226mod aux_functions;
227pub mod hash_mldsa;
228mod low_memory_helpers;
229pub mod mldsa;
230mod mldsa_keys;
231mod polynomial;
232
233/*** Exported types ***/
234pub use hash_mldsa::{HashMLDSA44_with_SHA256, HashMLDSA65_with_SHA256, HashMLDSA87_with_SHA256};
235pub use hash_mldsa::{HashMLDSA44_with_SHA512, HashMLDSA65_with_SHA512, HashMLDSA87_with_SHA512};
236pub use mldsa::MuBuilder;
237pub use mldsa::{MLDSA, MLDSA44, MLDSA65, MLDSA87, MLDSATrait};
238pub use mldsa_keys::{
239 MLDSA44PrivateKey, MLDSA65PrivateKey, MLDSA87PrivateKey, MLDSASeedPrivateKey,
240};
241pub use mldsa_keys::{MLDSA44PublicKey, MLDSA65PublicKey, MLDSA87PublicKey, MLDSAPublicKey};
242pub use mldsa_keys::{MLDSAPrivateKeyTrait, MLDSAPublicKeyTrait};
243
244/*** Exported constants ***/
245pub use mldsa::ML_DSA_44_NAME;
246pub use mldsa::ML_DSA_65_NAME;
247pub use mldsa::ML_DSA_87_NAME;
248
249pub use hash_mldsa::HASH_ML_DSA_44_with_SHA256_NAME;
250pub use hash_mldsa::HASH_ML_DSA_65_WITH_SHA256_NAME;
251pub use hash_mldsa::HASH_ML_DSA_87_with_SHA256_NAME;
252
253pub use hash_mldsa::HASH_ML_DSA_44_with_SHA512_NAME;
254pub use hash_mldsa::HASH_ML_DSA_65_WITH_SHA512_NAME;
255pub use hash_mldsa::HASH_ML_DSA_87_WITH_SHA512_NAME;
256
257pub use mldsa::{MLDSA_MU_LEN, MLDSA_RND_LEN, MLDSA_TR_LEN};
258pub use mldsa::{MLDSA44_PK_LEN, MLDSA44_SIG_LEN, MLDSA44_SK_LEN};
259pub use mldsa::{MLDSA65_PK_LEN, MLDSA65_SIG_LEN, MLDSA65_SK_LEN};
260pub use mldsa::{MLDSA87_PK_LEN, MLDSA87_SIG_LEN, MLDSA87_SK_LEN};
261
262// re-export just so it's visible to unit tests
263pub use polynomial::Polynomial;