SaFi Bank Space : Thoughts on signing requests

Introduction

The goal of request signing is to verify that a request was made by a particular user and wasn't tampered with on the way to our servers.

The way we sign requests should be:

  • Robust against implementation errors

  • Robust against differences in parsing of different parts of requests

Possible approaches

At the moment we’re planning on sending 3 different fields along with the request payload, which are also signed:

  • Timestamp – timestamp with microsecond precision, which should be unique for every request

  • Credential ID – identification of the signing key, used to sign this request

  • Customer ID – identification of the customer to which the signing key should belong

Some definitions:

Sign(payload) – hash payload and sign it with key identified by Credential ID

a | b – concatenate a and b , producing ab

"literal" – string literal, with value literal

Concatenation

1. Simple concatenation

The simplest approach for generating the payload for signing is to just concatenate the values together:

Signature = Sign(ts | cred_id | customer_id | body)

This construction unfortunately allows for one signature to represent multiple requests. Let’s say the original request uses the following values:

timestamp = 1662581122.491322
credential_id = 1be15f8a-4f74-422b-992f-61d185a8846a
customer_id = 6f00dad0-5b5f-4960-9cb7-420392197ccf

Leading to signature:

Signature = Sign("1662581122.4913221be15f8a-4f74-422b-992f-61d185a8846a6f00dad0-5b5f-4960-9cb7-420392197ccf{body}")

But if we change the values to:

timestamp = 1662581122.
credential_id = 4913221be15f8a-4f74-422b-992f-61d185a8846a
customer_id = 6f00dad0-5b5f-4960-9cb7-420392197ccf

this leads to the same signature:

Signature = Sign("1662581122.4913221be15f8a-4f74-422b-992f-61d185a8846a6f00dad0-5b5f-4960-9cb7-420392197ccf{body}")
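
The collision above can be reproduced with a minimal sketch. HMAC-SHA256 with a hard-coded key is only a stand-in for Sign() here, and DEMO_KEY, sign, signing_input and the body value are hypothetical names and data introduced for illustration:

```python
import hashlib
import hmac

DEMO_KEY = b"demo-key"  # hypothetical; stands in for the key behind Credential ID


def sign(message: str) -> str:
    # HMAC-SHA256 is only a stand-in for Sign(); the real scheme may differ.
    return hmac.new(DEMO_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def signing_input(ts: str, cred_id: str, customer_id: str, body: str) -> str:
    # Simple concatenation: ts | cred_id | customer_id | body
    return ts + cred_id + customer_id + body


customer = "6f00dad0-5b5f-4960-9cb7-420392197ccf"
body = '{"amount": 100}'  # hypothetical request body

original = signing_input(
    "1662581122.491322", "1be15f8a-4f74-422b-992f-61d185a8846a", customer, body
)
forged = signing_input(
    "1662581122.", "4913221be15f8a-4f74-422b-992f-61d185a8846a", customer, body
)

print(original == forged)              # True: both produce the same bytes
print(sign(original) == sign(forged))  # True: so the signature is the same
```

Because the signed bytes are identical, no choice of signing algorithm can tell the two requests apart; the ambiguity is in how the signing input is built.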

2. Concatenation with separators

If we add separators between different fields, like for example ".":

Signature = Sign(ts | "." | cred_id | "." | customer_id | "." | body)

This has the same issue; a forger only has to account for the separators. For example, if we change the request like this:

timestamp = 1662581122
credential_id = 491322.1be15f8a-4f74-422b-992f-61d185a8846a
customer_id = 6f00dad0-5b5f-4960-9cb7-420392197ccf

the signature will still be valid, because the signed string is unchanged.

The solution is to ensure that none of the fields contain the chosen separator. As the AWS Sig V4 section shows, choosing a separator like "\n" can give us secure signatures: in theory HTTP header values can’t contain new lines, so a sane HTTP implementation should refuse requests that don’t fulfil this requirement. However, there were HTTP servers in the wild that would process such requests anyway, so it would be best to also filter them in our implementation.
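
A minimal sketch of that filtering, assuming the three metadata fields arrive as strings and the body is placed last (so it may contain anything, including new lines). The function name signing_input and the example body are hypothetical:

```python
SEPARATOR = "\n"


def signing_input(ts: str, cred_id: str, customer_id: str, body: str) -> str:
    # Refuse any metadata field that contains a line break; otherwise bytes
    # could be moved between fields without changing the signed string.
    fields = (("timestamp", ts), ("credential_id", cred_id), ("customer_id", customer_id))
    for name, value in fields:
        if "\n" in value or "\r" in value:
            raise ValueError(f"{name} must not contain line breaks")
    # The first three separators delimit the metadata; everything after the
    # third one is the body.
    return SEPARATOR.join([ts, cred_id, customer_id, body])


message = signing_input(
    "1662581122.491322",
    "1be15f8a-4f74-422b-992f-61d185a8846a",
    "6f00dad0-5b5f-4960-9cb7-420392197ccf",
    '{"amount": 100}',
)
print(message.split(SEPARATOR, 3)[1])  # 1be15f8a-4f74-422b-992f-61d185a8846a
```

Rejecting "\r" as well is an extra precaution, not something the text above requires.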

Security of these approaches

An argument can be made that in practice there’s little chance of these forgeries being accepted as genuine requests. I would argue that they at least show that these schemes are fragile: without knowing the context in which they are used, one cannot say with certainty whether they are secure. Even if we accept that they are secure at the moment of implementation, as the applications/services change these forgeries might become useful to an attacker in ways we can’t predict. This is why I think these approaches are not robust enough.

Serialization

3. Serializing payload as one JSON object

Another suggested approach was to serialize the payload to be signed as a JSON object:

Signature = Sign(JSONEncode({
  "timestamp": "1662581122.491322",
  "credential_id": "1be15f8a-4f74-422b-992f-61d185a8846a",
  "customer_id": "6f00dad0-5b5f-4960-9cb7-420392197ccf",
  "data": {
    ...
  }
}))

This removes the ambiguity of the previous examples, but the problem with this approach is that different JSON serializers might produce different byte sequences, and therefore different signatures, for semantically identical data. Because we can’t count on the order of elements in a map, one serializer might produce:

{
  "timestamp": "1662581122.491322",
  "credential_id": "1be15f8a-4f74-422b-992f-61d185a8846a",
  "customer_id": "6f00dad0-5b5f-4960-9cb7-420392197ccf",
  "data": {
    ...
  }
}

and the other:

{
  "credential_id": "1be15f8a-4f74-422b-992f-61d185a8846a",
  "customer_id": "6f00dad0-5b5f-4960-9cb7-420392197ccf",
  "data": {
    ...
  },
  "timestamp": "1662581122.491322"
}

Problems with differences in JSON libraries

While semantically they are the same, they will produce different signatures, leading to problems with verification. This could be solved by requiring canonicalization of the resulting JSON (e.g. RFC 8785 - JCS), but this would complicate the implementation.
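
The ordering problem is easy to reproduce. The sketch below uses sorted keys and compact separators as a rough approximation of canonicalization; it is not a full RFC 8785 implementation (JCS also fixes number formatting, string escaping and the exact key ordering rule). The content of "data" and the helper name canonical are hypothetical:

```python
import json

# Same members, different insertion order (the "data" content is hypothetical).
first = {
    "timestamp": "1662581122.491322",
    "credential_id": "1be15f8a-4f74-422b-992f-61d185a8846a",
    "customer_id": "6f00dad0-5b5f-4960-9cb7-420392197ccf",
    "data": {"amount": 100},
}
second = {
    "credential_id": "1be15f8a-4f74-422b-992f-61d185a8846a",
    "customer_id": "6f00dad0-5b5f-4960-9cb7-420392197ccf",
    "data": {"amount": 100},
    "timestamp": "1662581122.491322",
}


def canonical(obj) -> bytes:
    # Approximation only: sort keys and drop optional whitespace.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


print(json.dumps(first) == json.dumps(second))  # False: bytes differ
print(canonical(first) == canonical(second))    # True: same bytes to sign
```

Every client and server would have to agree on the same canonical form, which is the implementation cost mentioned above.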

4. JWS-inspired signatures

We can take a lesson from the JOSE specifications and generate signatures like this:

header = JSONEncode({
  "timestamp": "1662581122.491322",
  "credential_id": "1be15f8a-4f74-422b-992f-61d185a8846a",
  "customer_id": "6f00dad0-5b5f-4960-9cb7-420392197ccf"
})

Signature = base64url(header) | "." | Sign(base64url(header) | "." | body)

This solves the problems of the concatenation solutions: all values that would have been concatenated are JSON encoded, so it is impossible to move bytes between them. It also solves the problem with differences in JSON serialization libraries by embedding the JSON encoded header directly into the signature, so the verifier checks the exact bytes the client signed instead of re-serializing them (this has other benefits, but more about that later).
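
A minimal sketch of such a token, under stated assumptions: HMAC-SHA256 with a hard-coded key stands in for Sign(), the signature bytes are base64url encoded as well (the formula above leaves their encoding open), and DEMO_KEY, make_token and verify_token are hypothetical names:

```python
import base64
import hashlib
import hmac
import json

DEMO_KEY = b"demo-key"  # hypothetical; stands in for the key behind Credential ID


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def mac(encoded_header: str, body: bytes) -> bytes:
    # Stand-in for Sign(base64url(header) | "." | body)
    return hmac.new(DEMO_KEY, encoded_header.encode("ascii") + b"." + body, hashlib.sha256).digest()


def make_token(header: dict, body: bytes) -> str:
    encoded_header = b64url(json.dumps(header).encode("utf-8"))
    return encoded_header + "." + b64url(mac(encoded_header, body))


def verify_token(token: str, body: bytes) -> dict:
    # base64url never contains ".", so the first "." is the only boundary.
    encoded_header, _, encoded_sig = token.partition(".")
    if not hmac.compare_digest(mac(encoded_header, body), b64url_decode(encoded_sig)):
        raise ValueError("signature mismatch")
    # Decode the header only after the exact received bytes were verified.
    return json.loads(b64url_decode(encoded_header))


header = {
    "timestamp": "1662581122.491322",
    "credential_id": "1be15f8a-4f74-422b-992f-61d185a8846a",
    "customer_id": "6f00dad0-5b5f-4960-9cb7-420392197ccf",
}
token = make_token(header, b'{"amount": 100}')
print(verify_token(token, b'{"amount": 100}') == header)  # True
```

The verifier never re-serializes the header, so key order and whitespace chosen by the client's JSON library do not matter.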

More problems with differences in JSON libraries

The JSON specification says that names within objects SHOULD be unique, but in RFC speak SHOULD is a weak requirement, and therefore a lot of JSON libraries happily accept duplicate keys and differ in what they do with them. The Security Considerations section of the JWS specification requires treating this SHOULD as a MUST (https://www.rfc-editor.org/rfc/rfc7515#section-10.12), which eliminates these concerns, but it is something that needs care when implementing this solution.

An example of how this could be abused is a malicious client signing a message with multiple customer_ids: if one layer uses the first one (let’s say it corresponds with the owner of the credentials) and another layer uses the second one, this would lead to an authentication bypass.
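
As an illustration of how one common parser behaves, Python's json module silently keeps the last duplicate. A strict parser can be sketched with the object_pairs_hook parameter of json.loads; the helper names reject_duplicates and strict_loads and the customer values are hypothetical:

```python
import json

header_text = '{"customer_id": "owner", "customer_id": "victim"}'

# Default behaviour: no error, the last duplicate wins.
print(json.loads(header_text)["customer_id"])  # victim


def reject_duplicates(pairs):
    # Called by json.loads with the (key, value) pairs of every object.
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key}")
        result[key] = value
    return result


def strict_loads(text: str):
    return json.loads(text, object_pairs_hook=reject_duplicates)


try:
    strict_loads(header_text)
except ValueError as error:
    print(error)  # duplicate key: customer_id
```

A layer that took the first value instead would disagree with this parser about who the customer is, which is exactly the mismatch described above.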

Side note: I would suggest removing customer_id entirely, because as far as I know credential_id should uniquely identify both the customer and key, and that way nobody will be tempted to use it directly from the header and only the customer identifier associated with credential_id will be used.

Signing body is not enough

So far we’ve worked with the requirements stated in the original user story, but I would argue that they are not sufficient: they don’t uniquely identify the request, because things like the path, the method and any of the headers can be changed. Let’s take a look at some other implementations of request signing, for example AWS SigV4 and the HTTP Message Signatures RFC draft.

AWS Sig V4

The whole specification can be found here: https://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html, but here’s pseudo-code of the whole process:

# normalized path according to RFC 3986
CanonicalURI = normalize(URL.path)

# lexicographically sorted query parameters, with URL encoding ensured
CanonicalQueryString = "..."

CanonicalHeaders =
CanonicalHeadersEntry0 + CanonicalHeadersEntry1 + ... + CanonicalHeadersEntryN
CanonicalHeadersEntry = Lowercase(HeaderName) + ':' + Trimall(HeaderValue) + '\n'

# List of headers that must be used when verifying HTTP signature
SignedHeaders = Lowercase(HeaderName0)
                + ";" + Lowercase(HeaderName1)
                + ";" + ...
                + ";" + Lowercase(HeaderNameN)

CanonicalRequest =
  HTTPRequestMethod + '\n' +
  CanonicalURI + '\n' +
  CanonicalQueryString + '\n' +
  CanonicalHeaders + '\n' +
  SignedHeaders + '\n' +
  HexEncode(Hash(RequestPayload))
  
Signature = Sign(
  Algorithm + "\n" 
  + RequestDateTime + "\n"
  + CredentialScope + "\n"
  + HexEncode(Hash(CanonicalRequest))
)
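
The canonical request part of the pseudo-code can be sketched in runnable form. This is a simplified illustration, not a SigV4 implementation: it assumes the path and query string are already canonical, uses SHA-256 as the hash, and leaves out key derivation and the final signing step. The function name canonical_request and the example host are hypothetical:

```python
import hashlib


def canonical_request(method: str, path: str, query: str, headers: dict, payload: bytes) -> str:
    # Lowercase header names, trim and collapse whitespace in values,
    # then sort by name so both sides build the same string.
    normalized = sorted(
        (name.lower(), " ".join(value.split())) for name, value in headers.items()
    )
    canonical_headers = "".join(f"{name}:{value}\n" for name, value in normalized)
    signed_headers = ";".join(name for name, _ in normalized)
    return "\n".join(
        [
            method,
            path,
            query,
            canonical_headers,
            signed_headers,
            hashlib.sha256(payload).hexdigest(),
        ]
    )


request = canonical_request(
    "GET",
    "/account/balances",
    "",
    {"X-Date": "  20220907T200522Z ", "Host": "api.example.com"},
    b"",
)
print(request.split("\n")[-2])  # host;x-date
```

Every component sits on its own line, and none of them is allowed to contain a new line, which is what makes the concatenation unambiguous.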

RFC drafts HTTP Message Signatures

Full specification can be found here: https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-message-signatures

The specification is much more verbose than the one for AWS SigV4 signatures, and I don’t think I can summarise it properly, because it does a much better job of handling edge cases in the HTTP specifications.

But the general idea of both approaches is pretty much the same.

Reasoning for signing more than a body of the request

In our original proposal we only signed the body, but as we have seen, other request signing schemes sign a lot more. The reason for this is simple: every part of the HTTP request can change its semantics. In our scheme the same signed body can be sent to any endpoint with any method, so for example a signed request to GET /account/balances could also be used to call DELETE /account.
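
A minimal sketch of the difference, again with HMAC-SHA256 and a hard-coded key as a stand-in for Sign(). The names DEMO_KEY, sign and signing_input are hypothetical, and the timestamp and credential fields are left out to keep the example short:

```python
import hashlib
import hmac

DEMO_KEY = b"demo-key"  # hypothetical; stands in for the key behind Credential ID


def sign(message: str) -> str:
    # HMAC-SHA256 is only a stand-in for Sign().
    return hmac.new(DEMO_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def signing_input(method: str, path: str, body: str) -> str:
    # Bind the method and path to the body, separated by new lines.
    for value in (method, path):
        if "\n" in value or "\r" in value:
            raise ValueError("method and path must not contain line breaks")
    return "\n".join([method, path, body])


body = "{}"

# Body-only signing: one signature, valid whichever endpoint receives the body.
body_only = sign(body)

# Signing method and path too: each target needs its own signature.
read_balances = sign(signing_input("GET", "/account/balances", body))
delete_account = sign(signing_input("DELETE", "/account", body))
print(read_balances != delete_account)  # True
```

With the method and path included, a signature captured for one endpoint no longer verifies when replayed against another.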

Solution proposal (WIP)

I hope this document showed that signing requests is not an easy job. HTTP is an old protocol with lots of edge cases, and because we can’t control the whole request processing stack, it’s hard to make assumptions about the security of our solution unless we do something complicated like the RFC draft, for which I don’t know if we could find reasonable implementations (and doing it ourselves would waste precious time).

So what I would propose is to learn from the https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-message-signatures draft about the possible edge cases, but implement our own signatures either using the concatenation method with new lines as separators or using the JWS-inspired tokens, with some reasonable subset of the specification.

Also, in the implementation we should only pass what has been verified down to the actual service (we can have a whitelist for headers that don’t need signing but are still passed through, if we find a use-case for it), eliminating mistakes in which the service would process the request in a different way from the signature verification layer.

Solution (Phase1)

We use \n as the separator and simply concatenate the fields.

Solution (Phase N)

We will also investigate signing the path and method of the HTTP call.