bouncycastle_mlkem_lowmemory/
lib.rs

1//! This crate implements the Module Lattice Key-Encapsulation Mechanism (ML-KEM) as per FIPS 203.
2//!
3//! # Philosophy of a low-memory implementation
4//!
5//! First, a little primer on the objects that make up an ML-KEM key pair.
6//! The "elements" of the lattice are polynomials of degree 256, represented in memory as
7//! arrays of 256 i16's. That's 512 bytes per polynomial.
8//! Then we build vectors and matrices composed of these polynomials.
9//! ML-KEM is parametrized as ML-KEM-k where k in {2, 3, 4} is the size of the vectors, and the matrices
10//! have size k x k.
11//! So ML-KEM-512 carries vectors of 2 polynomials and matrices of 2 x 2 = 4 polynomials.
12//! ML-KEM-768 is 3 and 3 x 3 = 9 polynomials, and ML-KEM-1024 is 4, and 4 x 4 = 16 polynomials.
13//!
14//! A straightforward implementation of ML-KEM will start by un-compressing all the key material into
15//! memory into the format that you need to perform the computation.
16//! A ready-to-use private key consists of a `Vector<k>`
17//! while the public key is a `Vector<k>` and a `Matrix<k,k>`.
18//! For ML-KEM-768, you expect to use 6 kb of RAM just for holding expanded key material, and then
19//! you expect the `.encaps()` and `.decaps()` operations to require several multiples of that as variables for holding
20//! intermediate values as the computation proceeds.
21//! A well-written but not memory-optimized ML-KEM-768 can be expected to consume approximately 40 kb of RAM
22//! at the widest point of the `.decaps()` operation.
23//!
24//! This crate strives to do better!
25//!
26//! The core observation that makes this implementation possible is that by a careful examination of
27//! how the matrix multiplication works, you don't ever need the vectors and matrices to be fully
28//! expanded at the same time.
29//! In fact, you can work one polynomial at a time.
30//! This is because the ML-KEM keygen algorithm starts with a single 64-byte seed and expands that
31//! into intermediate seeds `rho` (32 byte), and `sigma` (32 byte), from which all
32//! of the vectors and matrices are derived via hash functions.
33//! The public matrix A can be derived in a random-access fashion from `rho` and the matrix index `i,j`.
34//! The various vectors cannot, but the polynomial compression algorithm given in FIPS 203 as part of
35//! the key encoding procedure can be used to hold the vectors in memory compressed and only un-compress
36//! a single polynomial entry at a time.
37//! The downside of this approach is that it costs performance: throughout this implementation we are re-deriving
38//! bits and pieces of matrices that we had in memory previously, but released.
39//!
40//! We also get a surprising amount of memory-savings by good coding hygiene:
41//! Using un-named scopes to tell the compiler when an intermediate variable is no longer needed and
42//! con be popped off the stack. This sometimes requires re-ordering the steps of the algorithms given in
43//! FIPS 203 so that variables can be created, used, and released in a self-contained block.
44//! Sometimes this is not possible and we have to make a choice between keeping the variable around
45//! or releasing it and re-deriving it later.
46//! We also attempt to be clean about noting the last time a long-lived variable is used and
47//! re-using / re-naming / moving it to a new purpose rather than allocating an additional variable.
48//! These hygiene points can always be further improved with increasingly aggressive design choices.
49//! We feel that we have hit the point of diminishing returns that maintains an acceptable balance
50//! of memory footprint, performance, and code readability, but we welcome pull requests if, for example,
51//! we've missed a polynomial that doesn't need to be created.
52//!
53//! All this combined, we achieve an approximate *1/3 the memory footprint* at a cost of approximately *3x
54//! the runtime for decapsulating and no appreciable difference for encapsulation*,
55//! compared with our un-optimized implementation in the \[bouncycastle_mlkem] crate, which we are extremely happy with!
56//!
57//! # Memory Footprint
58//!
59//! Below, find performance charts relative to the standard ML-KEM implementation in the \[bouncycastle_mlkem] crate.
60//!
61//! ## Keys sizes in memory and on disk
62//!
63//! This implementation greatly reduces the size of keys both on disk and in memory
64//! by only handling the matrices and vectors either as seeds or in their compressed representation
65//! expanding on-demand as part of a sign or verify operation rather than storing them in memory as part of a keygen or key load.
66//!
67//! | Key Object | PK size on disk | PK size in memory | SK Size on disk | SK size in memory |
68//! |------------|-----------------|-------------------|-----------------|-------------------|
69//! | ML-KEM-512  | 800 (800)      | 800 (1056)        | 32 (1632)       | 161 (2178) |
70//! | ML-KEM-768  | 1184 (1184)    | 1184 (1568)       | 32 (2400)       | 161 (3202) |
71//! | ML-KEM-1024 | 1568 (1568)    | 1568 (2080)       | 32 (3168)       | 161 (4226) |
72//!
73//! All values are in bytes. The "in memory" sizes are measured by rust's `std::mem::size_of`.
74//! Values in parentheses are the usual sizes in our un-optimized implementation in the \[bouncycastle_mldsa] crate.
75//!
76//!
77//! ## Algorithm Peak Memory Usage
78//!
79//! The table below shows peak memory usage of the ML-KEM algorithms and the rough performanc (throughput) impact.
80//!
81//! Measuring peak application memory usage can be a bit tricky, and the numbers you get depend heavily on how you designed your
82//! measurement harness. Here, we aim to provide a conservative measurement, meaning that we are aiming for an
83//! over-estimate so that any deployment within an existing application will use incrementally less additional memory
84//! than the amount stated here.
85//!
86//! Our measurement methodology is to compile a simple standalone HelloWorld application that only calls the function under test
87//! with as minimal as possible hard-coded data (such as keys or ciphertexts) and measure the peak memory usage of running
88//! the compiled binary using `valgrind --tool=massif --heap=no --stack=yes`. The flags for heap and stack
89//! reflect the fact that this is a `no_std` rust application and therefore the cryptographic functions use no heap memory.
90//! The measurements may over-estimate by as much as 3 kb since that that's the measured peak memory usage of a do-nothing
91//! HelloWorld rust application.
92//!
93//! | Algorithm                  | Peak stack memory usage (kB) | Throughput (Kops/s) |
94//! |----------------------------|------------------------|-------------------|
95//! | MLKEM512_lowmemory/KeyGen  | 5.8 (21.7)             | 50.1 (48.7) |
96//! | MLKEM512_lowmemory/KeyGen  | 7.3 (28.9)             | 28.3 (28.8) |
97//! | MLKEM1024_lowmemory/KeyGen | 9.3 (41.4)             | 16.9 (18.1) |
98//! | MLKEM512_lowmemory/Encaps  | 8.9 (18.8)             | 37.8 (43.7) |
99//! | MLKEM768_lowmemory/Encaps  | 9.9 (27.9)             | 22.3 (26.0) |
100//! | MLKEM1024_lowmemory/Encaps | 11.2 (44.2)            | 14.2 (15.6) |
101//! | MLKEM512_lowmemory/Decaps  | 13.6 (25.7)            | 13.4 (31.6) |
102//! | MLKEM768_lowmemory/Decaps  | 16.6 (40.6)            | 7.8 (21.0) |
103//! | MLKEM1024_lowmemory/Decaps | 20.5 (58.4)            | 4.9 (13.6) |
104//!
105//! Values in parentheses are the comparison values from the un-optimized implementation in the \[bouncycastle_mldsa] crate.
106//! Size numbers were collected with valgrind using a simple main program that calls only the measured function.
107//! Performance throughput numbers were collected on my laptop using the library's provided benchmarks, so
108//! performance they should be taken with an extreme grain of salt.
109//!
110//! Actual values may vary based on build configuration and target architecture.
111//!
112//!
113//! # Usage
114//!
115//! This crate has been designed to serve a wide range of use cases, from people dabbling in
116//! cryptography for the first time, to cryptographic protocol designers who need access to the internal
117//! functionality of the ML-KEM algorithm, to embedded systems developers who want access to memory
118//! and performance optimized functions.
119//!
120//! This page gives examples of simple usage for generating keys and performing encapsulation and decapsulation operations.
121//!
122//! More examples on advanced usage can be found on the [mlkem] page.
123//!
124//! # Primer on KEM algorithms
125//!
126//! Since Key-Encapsulation Mechanisms behave differently from the Diffie-Hellman key exchanges that many people are familiar with,
127//! we start with a brief primer on KEMs.
128//!
129//! The core operation of a KEM is "encapsulation" which performs a mathematical operation against a KEM public key and yields
130//! a shared secret key, and a ciphertext. Internally, the encapsulation routine will use a cryptographic random number generator
131//! to ensure that the shared secret key and ciphertext are strongly unique to this encapsulation operation.
132//! The receiving party can then perform a "decapsulation" using the corresponding private key to obtain
133//! the same shared secret key from the ciphertext.
134//!
135//! Some sources, including FIPS 203 refer to the public key as the "encapsulation key `ek`" and the private key
136//! as the "decapsulation key `dk`". These are used interchangeable with the public key `pk` and private key (or secret key) `sk`.
137//!
138//! The three operations of a KEM algorithm are:
139//!
140//! * `keygen() -> (pk, sk)`
141//! * `encaps(pk) -> (ss, ct)`
142//! * `decaps(sk, ct) -> (ss)`
143//!
144//! Since both `keygen` and `encaps` require a source of randomness, it is also common for a cryptographic
145//! library to expose deterministic versions, which are often labelled as "internal" since they should
146//! be used carefully as their misuse can catastrophically reduce the algorithm's security.
147//! For ML-KEM in particular, these are:
148//!
149//! * `ML-KEM.keygen_internal(seed: \[u8; 64])`, sometimes written as `ML-KEM.keygen_internal(d: \[u8; 32], z: \[u8; 32])`
150//! * `ML-KEM.encaps_internal(pk, m: \[u8; 32])`
151//!
152//!
153//!
154//! ## Generating Keys
155//!
156//! ```rust
157//! use bouncycastle_mlkem_lowmemory::MLKEM768;
158//! use bouncycastle_core::traits::KEM;
159//!
160//! let (pk, sk) = MLKEM768::keygen().unwrap();
161//! ```
162//! That's it. That will use the library's default OS-backend RNG.
163//!
164//! Commonly with the ML-KEM algorithm, a 64-byte seed is used as the private key, and expanded into
165//! a full private key as needed. This is offered through the library's [KeyMaterialTrait] object:
166//!
167//! ```rust
168//! use bouncycastle_core::key_material::{KeyMaterial512, KeyType, KeyMaterialTrait};
169//! use bouncycastle_mlkem_lowmemory::{MLKEM768, MLKEMTrait};
170//! use bouncycastle_hex as hex;
171//!
172//! let seed = KeyMaterial512::from_bytes_as_type(
173//!     &hex::decode("000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
174//!                   202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f").unwrap(),
175//!     KeyType::Seed,
176//! ).unwrap();
177//!
178//! let (pk, sk) = MLKEM768::keygen_from_seed(&seed).unwrap();
179//! ```
180//!
181//! See [MLKEM] and [MLKEM::decaps_from_seed] for an API that uses a merged
182//! keygen-and-decaps function to that allows you to store the private key only as a 64-byte seed.
183//!
184//! ## Encapsulating and Decapsulating
185//!
186//! ```rust
187//! use bouncycastle_mlkem_lowmemory::{MLKEM768, MLKEMTrait};
188//! use bouncycastle_core::traits::KEM;
189//! use bouncycastle_core::errors::KEMError;
190//!
191//! let (pk, sk) = MLKEM768::keygen().unwrap();
192//!
193//! // Create the shared secret and ciphertext using the public key
194//! let (ss, ct) = MLKEM768::encaps(&pk).unwrap();
195//!
196//! // Recover the shared secret using the private key
197//! let ss1 = match MLKEM768::decaps(&sk, &ct) {
198//!     Err(KEMError) => panic!("Error decapsulating"),
199//!     Ok(ss) => ss,
200//! };
201//!
202//! assert_eq!(ss, ss1);
203//! ```
204//! And that's the basic usage!
205//!
206//! # Security
207//! All functionality exposed by this crate is considered secure to use.
208//! In other words, this crate does not contain any "hazmat" except for the obvious points about
209//! handling your private keys properly: if you post your private key to github, or you generate
210//! production keys from a weak seed, I can't help you, that's on you.
211//! It is worth mentioning, however, that if using a [MLKEM::keygen_from_seed], then it is your
212//! responsibility to ensure that the seed is cryptographically random and unpredictable.
213//! And also that [MLKEM::encaps_internal] requires you to provide the randomness, so the ciphertext
214//! will only be as strong as the randomness that you provide.
215//!
216//! A note about cryptographic side-channel attacks: considerable effort has been expended to attempt
217//! to make this implementation constant-time, which generally means that the core mathematical algorithm
218//! code that handles secret data uses bitshift-and-xor type constructions instead of if-and-loop
219//! constructions. That should give this implementation reasonably good resistance to timing and
220//! power analysis key extraction attacks, however: A) this is a "best-effort" and not formally verified,
221//! and B) the Rust compiler does not guarantee constant-time behaviour no matter how clever your code,
222//! so like all Safe Rust code (ie Rust code that does not include inline assembly), we are at the mercy
223//! of the Rust compiler's optimizer for whether our bitshift-and-xor code actually remains
224//! constant-time after compilation.
225
226#![no_std]
227
228#![forbid(missing_docs)]
229
230#![forbid(unsafe_code)]
231#![allow(incomplete_features)] // needed because currently generic_const_exprs is experimental
232#![feature(generic_const_exprs)]
233#![feature(adt_const_params)]
234
235// These are because I'm matching variable names exactly against FIPS 204, for example both 'K' and 'k',
236// or 'A' and 'a' are used and have specific meanings.
237// But need to tell the rust linter to not care.
238#![allow(non_snake_case)]
239#![allow(non_upper_case_globals)]
240
241// so I can use private traits to hide internal stuff that needs to be generic within the
242// MLKEM implementation, but I don't want accessed from outside, such as FIPS-internal functions.
243#![allow(private_bounds)]
244
245// imports needed just for docs
246#[allow(unused_imports)]
247use bouncycastle_core::key_material::KeyMaterialTrait;
248
249// todo -- re-run mutants
250
251// todo -- crucible tests
252
253pub mod mlkem;
254mod mlkem_keys;
255mod polynomial;
256mod aux_functions;
257mod low_memory_helpers;
258
259/*** Exported types ***/
260pub use mlkem::{MLKEMTrait, MLKEM, MLKEM512, MLKEM768, MLKEM1024};
261pub use mlkem_keys::{MLKEMPrivateKeyTrait, MLKEMPublicKeyTrait};
262pub use mlkem_keys::{MLKEMPublicKey, MLKEM512PublicKey, MLKEM768PublicKey, MLKEM1024PublicKey};
263pub use mlkem_keys::{MLKEMSeedPrivateKey, MLKEM512PrivateKey, MLKEM768PrivateKey, MLKEM1024PrivateKey};
264
265/*** Exported constants ***/
266pub use mlkem::ML_KEM_512_NAME;
267pub use mlkem::ML_KEM_768_NAME;
268pub use mlkem::ML_KEM_1024_NAME;
269
270pub use mlkem::{MLKEM_SEED_LEN, MLKEM_SS_LEN, MLKEM_RND_LEN};
271
272pub use mlkem::{MLKEM512_PK_LEN, MLKEM512_SK_LEN, MLKEM512_CT_LEN};
273pub use mlkem::{MLKEM768_PK_LEN, MLKEM768_SK_LEN, MLKEM768_CT_LEN};
274pub use mlkem::{MLKEM1024_PK_LEN, MLKEM1024_SK_LEN, MLKEM1024_CT_LEN};
275
276// re-export just so it's visible to unit tests
277pub use polynomial::Polynomial;
bouncycastle_mlkem_lowmemory/lib.rs

bouncycastle_mlkem_lowmemory/
lib.rs