PART 5: GENESIS OF LEDGER RECOVER – OPERATIONAL SECURITY

Welcome to Part 5 of our blog series on the Genesis of Ledger Recover. This series explores the many technical hurdles encountered when building a seed recovery service, and how Ledger Recover, provided by Coincover, solves them with a secure design and infrastructure.

Part 6 of our Genesis of Ledger Recover blog series is available here.

So far, we have shown in parts 1 and 2 how Ledger Recover splits your seed into shares and sends those shares securely to ~~friends~~ trusted backup providers. In part 3, we showed how it safely stores (and restores) the shares of your seed, protected by hardware encryption, tied to your identity, and diversified. In part 4, we explored how Ledger Recover ensures that you, and only you, can access your backup.

It is now time to take a closer look at how we ensure maximum security at the operational level. At a glance, operational security is achieved by:

Hardening the infrastructure underpinning Ledger Recover,
Applying separation of duties to the various operators of Ledger Recover,
Monitoring critical components and operations,
Implementing a Recover-specific Incident Response.

Let’s dive into the details of what each of those items means.

Hardening of infrastructure

Infrastructure hardening comes in many shapes. It's a 360° exercise that involves a wide range of activities driven by a thorough analysis of security risks. It usually starts with a catalog of attack scenarios that could lead to security issues, such as data leaks, impersonation of clients leading to the unauthorized restoration of shares, unresponsive systems, and service disruption. At the operational level, preventing these issues is organized around activities like resource isolation, system access regulation, network traffic control, vulnerability management, and many more.

Here’s a rundown of our key measures to harden Ledger Recover’s infrastructure:

Service availability

The infrastructure is designed so that there is no single point of failure (NSPOF), meaning that the system is resilient to the failure of any component. Let’s take the following example: our data centers are served by two independent Internet service providers (ISPs), at two opposite ends of the building. If the fiber is damaged due to ongoing construction work in one part of the building, data will simply be routed through the other ISP. Disruption-free maintenance is yet another benefit that enhances availability. Given that there are at least two instances of all software components of Ledger Recover, we can reconfigure the system to use only instance A while replacing/upgrading/fixing instance B.
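As a rough illustration of the redundancy and disruption-free maintenance described above, here is a minimal Python sketch of a router that only sends traffic to healthy instances and refuses to drain the last one still serving. `Instance`, `RedundantService`, `drain`, and `restore` are hypothetical names. Real deployments rely on load balancers and health checks rather than code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    healthy: bool = True
    in_maintenance: bool = False


@dataclass
class RedundantService:
    """Routes each request to a healthy instance, so no single instance is critical."""

    instances: list = field(default_factory=list)

    def available(self):
        # Instances that are both healthy and not taken out for maintenance.
        return [i for i in self.instances if i.healthy and not i.in_maintenance]

    def route(self) -> Instance:
        candidates = self.available()
        if not candidates:
            raise RuntimeError("no healthy instance left")
        return candidates[0]

    def drain(self, name: str) -> None:
        """Take an instance out of rotation for maintenance, unless it is the last one serving."""
        if not any(i.name != name for i in self.available()):
            raise RuntimeError(f"draining {name} would leave no healthy instance")
        for i in self.instances:
            if i.name == name:
                i.in_maintenance = True

    def restore(self, name: str) -> None:
        # Bring a maintained instance back into rotation.
        for i in self.instances:
            if i.name == name:
                i.in_maintenance = False
```

For example, `svc.drain("B")` routes everything to instance A while B is upgraded, and `svc.restore("B")` brings it back into rotation.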

Limited admin access to Ledger Recover applications

Only a small set of users is granted admin access to the resources dedicated to Ledger Recover. The shorter this list, the lower the risk that an insider threat gains admin access.

Secured physical data centers

The Backup Providers’ HSMs are hosted in geographically redundant physical data centers, protected from physical and virtual threats using industry-grade security techniques and procedures. The level of physical protection ensures that no unauthorized individual can casually walk away with an HSM. Relying on data centers across multiple sites means that if one location experiences an issue, another location can take over, providing uninterrupted service availability. Last but not least, managing our own HSMs gives us control over who has access to them and what code is deployed on them.

Isolation of Ledger Recover resources

All Ledger Recover resources are isolated from any other resources within Ledger Recover’s service providers, including within Coincover and Ledger. This isolation is needed to ensure that we can contain potential attacks from one network slice aimed at exploiting resources of other network slices.

Code-level security ensured via multiple pillars

We use code scanners to help us identify and address vulnerabilities early on, preventing them from making their way into production.
Code is reviewed and approved by a team independent of the one developing Ledger Recover. This separation is yet another measure to help improve overall code quality by catching logical flaws that might lead to security concerns.
The code of the critical modules of Ledger Recover is cryptographically signed. Because the signature is computed over the code's content, any modification to the code invalidates it. The signature is verified before the code is executed, which prevents tampered code from being deployed.
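The check-before-execute idea can be sketched with a toy hash-based signature. The article does not say which signature algorithm Ledger Recover uses, and Python's standard library has no public-key signature primitives. This minimal sketch therefore uses a Lamport one-time signature built from SHA-256, where each key pair may sign only one message. `keygen`, `sign`, `verify`, and `run_if_signed` are hypothetical names. Production systems would use a standard scheme such as ECDSA or Ed25519 from a vetted library.

```python
import hashlib
import secrets

# Toy Lamport one-time signature over SHA-256: illustration only.
# Each key pair must sign a single message; signing twice leaks secrets.


def keygen():
    # 256 pairs of random secrets; the public key is the hash of each secret.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk


def _bits(message: bytes):
    # The 256 bits of the message's SHA-256 digest, most significant bit first.
    digest = hashlib.sha256(message).digest()
    return [(byte >> (7 - j)) & 1 for byte in digest for j in range(8)]


def sign(sk, message: bytes):
    # Reveal one secret per digest bit: sk[i][0] for a 0 bit, sk[i][1] for a 1 bit.
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(
        hashlib.sha256(sig).digest() == pk[i][bit]
        for i, (bit, sig) in enumerate(zip(_bits(message), signature))
    )


def run_if_signed(pk, code: bytes, signature) -> str:
    # Refuse to execute anything whose signature does not verify.
    if not verify(pk, code, signature):
        raise PermissionError("signature check failed: refusing to run")
    return "executed"
```

Changing a single byte of the code changes its SHA-256 digest. The revealed secrets then no longer match the public key, and `run_if_signed` refuses to proceed.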

Network traffic control

Network traffic is tightly controlled via policies that define rules for traffic flows for all three Backup Providers. By defining rules for allowed and denied traffic, we limit the attack surface and reduce the risk of unauthorized access. Restricting communication between individual services also limits an attacker's lateral movement, even if one component is compromised. In addition, we apply mutual TLS (mTLS) authentication to prevent Man-in-the-Middle (MitM) attacks. By verifying the identity of both parties with certificates, mTLS ensures that only trusted entities can establish a secure connection.
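Both ideas in this paragraph, default-deny flow rules and mutual TLS, can be sketched with the Python standard library. The service names and ports in `ALLOWED_FLOWS` are hypothetical, as are `is_allowed` and `build_mtls_server_context`. The TLS part uses the real `ssl` module, and the certificate file paths are assumptions left to the caller.

```python
import ssl
from typing import NamedTuple


class Flow(NamedTuple):
    source: str
    destination: str
    port: int


# Hypothetical allowlist: any flow not listed here is denied (default deny).
ALLOWED_FLOWS = {
    Flow("api-gateway", "recover-service", 8443),
    Flow("recover-service", "hsm-proxy", 9443),
}


def is_allowed(flow: Flow) -> bool:
    return flow in ALLOWED_FLOWS


def build_mtls_server_context(cafile=None, certfile=None, keyfile=None) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: clients must present a certificate too.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cafile:
        # CA that issued the client certificates we trust.
        ctx.load_verify_locations(cafile=cafile)
    if certfile:
        # The server's own certificate and private key.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

Setting `verify_mode` to `ssl.CERT_REQUIRED` on the server side is what makes the handshake mutual: a client that cannot present a certificate issued by the configured CA is rejected.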

Key rotation

Encryption keys (used, for example, to encrypt data or communications) are changed regularly in line with cryptography best practices. The advantage is that if a key is compromised, the damage is limited to the data encrypted with that key during the period before it was rotated.
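The blast-radius argument can be made concrete with a key ring that tags every record with the version of the key that protected it. Python's standard library has no authenticated encryption, so this minimal sketch uses an HMAC tag as a stand-in for encryption. `KeyRing` and `exposed_by` are hypothetical names.

```python
import hashlib
import hmac
import secrets


class KeyRing:
    """Versioned keys: new data uses the current key, old versions stay for reading."""

    def __init__(self):
        self.keys = {}
        self.current = 0
        self.rotate()  # start with key version 1

    def rotate(self) -> int:
        self.current += 1
        self.keys[self.current] = secrets.token_bytes(32)
        return self.current

    def protect(self, data: bytes) -> dict:
        # HMAC tag as a stdlib stand-in for encryption; the record remembers its key version.
        tag = hmac.new(self.keys[self.current], data, hashlib.sha256).digest()
        return {"key_id": self.current, "data": data, "tag": tag}

    def check(self, record: dict) -> bool:
        key = self.keys[record["key_id"]]
        expected = hmac.new(key, record["data"], hashlib.sha256).digest()
        return hmac.compare_digest(expected, record["tag"])


def exposed_by(records, compromised_key_id: int):
    """Records affected if one key leaks: only those produced under that key version."""
    return [r for r in records if r["key_id"] == compromised_key_id]
```

If key version 1 leaks, `exposed_by(records, 1)` returns only the records created before the first rotation. Everything protected afterwards is unaffected.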

Outbound traffic security

Outbound traffic is limited to known domains and IP addresses only (Backup Providers, service providers). Limiting and monitoring outbound traffic helps us stay alert to potential data leaks: if the volume of outbound data is higher than expected, a malicious actor might be extracting sensitive data from the Ledger Recover system on a significant scale.
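Here is a minimal sketch of the two controls described above: an egress allowlist and a volume check against an expected baseline. The destinations (reserved `.example` names and a documentation IP address), the baseline, and the `EgressMonitor` class are hypothetical. In practice these controls live in firewalls and network monitoring tools.

```python
from collections import defaultdict

# Hypothetical egress allowlist (Backup Providers and service providers).
ALLOWED_DESTINATIONS = {
    "backup-provider-a.example",
    "backup-provider-b.example",
    "203.0.113.10",
}


class EgressMonitor:
    def __init__(self, expected_bytes_per_window: int):
        self.expected = expected_bytes_per_window
        self.sent = defaultdict(int)  # bytes sent per destination in the current window

    def request(self, destination: str, size: int) -> bool:
        """Return True if the connection may proceed; unknown destinations are blocked."""
        if destination not in ALLOWED_DESTINATIONS:
            return False
        self.sent[destination] += size
        return True

    def alerts(self) -> list:
        """Destinations whose outbound volume in this window exceeds the expected baseline."""
        return [d for d, total in self.sent.items() if total > self.expected]

    def reset_window(self) -> None:
        self.sent.clear()
```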

Inbound traffic security

Incoming traffic is protected by a combination of anti-DDoS, Web Application Firewall (WAF), and IP filtering techniques. Distributed denial-of-service (DDoS) attacks cause harm by flooding their target system with requests, and limiting the rate of incoming requests is a well-known countermeasure. Not all attacks are about quantity, though; some are about quality. This is where the WAF comes into play: it inspects incoming requests and their intended behavior, and if a request aims at gaining unauthorized access or manipulating data, the WAF blocks it. Finally, IP filtering combines two techniques: a) whitelisting, that is, allowing traffic only from specific IP addresses or ranges, and b) blacklisting, that is, blocking traffic from known attacker IPs.
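Two of the three inbound controls lend themselves to a short sketch. The first is rate limiting with a token bucket, one common way to cap request volume (the article does not name the algorithm used). The second is IP whitelisting and blacklisting with Python's `ipaddress` module. The ranges come from the RFC 5737 documentation prefixes and, like `TokenBucket` and `ip_permitted`, are hypothetical. WAF inspection is not sketched here.

```python
import ipaddress
import time


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical ranges taken from RFC 5737 documentation prefixes.
WHITELIST = [ipaddress.ip_network("198.51.100.0/24")]
BLACKLIST = [ipaddress.ip_network("198.51.100.66/32")]


def ip_permitted(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    if any(ip in net for net in BLACKLIST):
        return False  # known-bad addresses are rejected even inside a whitelisted range
    return any(ip in net for net in WHITELIST)
```

The `clock` parameter makes the bucket testable. With `rate=1.0` and `capacity=2`, two requests pass immediately, a third is refused, and one more is allowed after a second has elapsed.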

Vulnerability management

The components of the Ledger Recover infrastructure are continuously and systematically scanned for known vulnerabilities and misconfigurations, and patches and updates are applied regularly. This helps us respond to new types of threats as they emerge and keep our security measures up to date and world-class.
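At its core, scanning for known vulnerabilities means matching an inventory of deployed component versions against advisories. Here is a minimal sketch, assuming simple dotted numeric version strings. The component names, advisory IDs, and fixed versions in `ADVISORIES` are invented placeholders, not real advisories.

```python
# Invented placeholder advisories: versions below `fixed_in` are considered vulnerable.
ADVISORIES = {
    "tls-lib": {"id": "EXAMPLE-0001", "fixed_in": "3.0.13"},
    "web-proxy": {"id": "EXAMPLE-0002", "fixed_in": "1.25.4"},
}


def parse(version: str) -> tuple:
    """'3.0.12' -> (3, 0, 12); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def findings(inventory: dict) -> list:
    """Report each deployed component whose version predates the advisory's fix."""
    hits = []
    for component, version in sorted(inventory.items()):
        advisory = ADVISORIES.get(component)
        if advisory and parse(version) < parse(advisory["fixed_in"]):
            hits.append(f"{component} {version}: {advisory['id']}, upgrade to {advisory['fixed_in']}")
    return hits
```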

Separation of duties

Separation of duties is at the core of the security strategy of Ledger Recover.

The separation of duties between the various Backup Providers (part 3) and IDV Providers (part 4) has been described in the previous posts. You may recall that there are:

3 shares of the Secret Recovery Phrase managed by 3 independent Backup Providers (with database diversification on top to prevent collusion)
2 independent Identity Validators (IDV Providers)

At the infrastructure level, separation of duties is applied between the different roles involved in the development and operation of Ledger Recover.

In addition, we combine the separation of duties with the “least privilege” principle, which applies to system operators and administrators: they are granted only the lowest level of permission required to perform their duties.

When “least privilege” is combined with “separation of duties”, admin roles are allocated to different people so that no single person can damage or compromise the confidentiality or integrity of any system component. For example, developers of Ledger Recover code do not have access to the system that runs the code they wrote.
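Least privilege and separation of duties can be expressed as role-to-permission mappings plus a list of permission pairs that must never be held by one person. Here is a minimal sketch. The roles, permissions, and conflict pairs are hypothetical and only mirror the developer and deployment example above.

```python
# Hypothetical roles, each holding only what it needs (least privilege).
ROLE_PERMISSIONS = {
    "developer": {"write_code"},
    "reviewer": {"approve_code"},
    "operator": {"deploy", "read_logs"},
}

# Permission pairs that must never be held by the same person (separation of duties).
CONFLICTS = [{"write_code", "deploy"}, {"write_code", "approve_code"}]


def permissions_of(roles: set) -> set:
    return set().union(*(ROLE_PERMISSIONS[r] for r in roles))


def violations(assignments: dict) -> list:
    """List people whose combined roles grant a conflicting pair of permissions."""
    bad = []
    for person, roles in assignments.items():
        perms = permissions_of(roles)
        for pair in CONFLICTS:
            if pair <= perms:
                bad.append(f"{person}: {' + '.join(sorted(pair))}")
    return bad


def can(person: str, action: str, assignments: dict) -> bool:
    # Unknown people get no permissions at all.
    return action in permissions_of(assignments.get(person, set()))
```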

Governance: Quorums

Similar to blockchain consensus mechanisms, which guarantee integrity and security by having multiple actors verify blocks, we have adopted quorums within the Ledger Recover system to enhance our operational security.

Despite our robust background checks for our employees, the fact remains that humans can be a weak link in any system, and the cryptosphere is no exception. High-profile security incidents, such as the Mt. Gox hack of 2014, demonstrate how individuals can be exploited or lead to security lapses. People can be influenced or coerced through various motivations – Money, Ideology, Coercion, Ego (aka, MICE(S)) – rendering even the most stringent background checks not entirely foolproof.

To mitigate such risks, we use a system based on the concept of a quorum. This framework requires the consensus of at least three authorized individuals from different teams or departments within the backup providers before any significant decision or critical action can be taken.

The exact number of people involved in our different quorums remains undisclosed for security reasons. Still, the mere existence of quorums significantly enhances our operational security by diluting the potential influence of any single compromised individual.
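A quorum check boils down to counting distinct, authorized approvers and the distinct teams they belong to. Here is a minimal sketch. The thresholds are parameters precisely because the real quorum sizes are undisclosed, and `Approval` and `quorum_reached` are hypothetical names. A real system would also verify a cryptographic signature on each approval, for example one produced on a quorum member's Ledger device, rather than trusting plain records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    person: str
    team: str


def quorum_reached(approvals, authorized: dict, min_people: int, min_teams: int) -> bool:
    """True only if enough distinct, authorized people from enough distinct teams approved.

    `authorized` maps each authorized person to their team. Approvals from unknown
    people, or claiming the wrong team, are ignored; duplicates count once.
    """
    valid = {a for a in approvals if authorized.get(a.person) == a.team}
    people = {a.person for a in valid}
    teams = {a.team for a in valid}
    return len(people) >= min_people and len(teams) >= min_teams
```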

Here are some of the activities where we use quorums:

1. Generating the private keys for Ledger Recover HSMs: This critical operation is safeguarded by independent quorums within each entity – Coincover, EscrowTech, and Ledger. Each member of these distinct quorums must be present to generate private keys in their respective HSMs. Each quorum member has access to a backup key, which is crucial for restoring and regenerating their HSM secrets if needed. This structure not only protects against the risk of any single person having undue influence over one of the three backup provider HSMs, but also enhances overall system integrity, as each quorum operates independently and is unaware of the others' specifics.

Keep in mind that even a fully compromised quorum can’t put user assets at risk. Remember from blog post 2: each backup provider handles only a single share, and without the required number of shares, reconstructing a user’s seed is impossible.
Moreover, extracting the private key of the HSM, which is needed to decipher existing shares, cannot be done with the quorum’s backup keys. Backup provider quorum members will only be able to restore and recreate a new HSM.

2. Deciding on an exceptional release of a customer’s share: Specific, albeit rare, situations may require an exceptional release of a customer’s share. These could be due to Identity Verification failures (name change, physical disfigurement, etc.), or to our undisclosed security measures incorrectly blocking/blacklisting a device. When such a situation arises, a quorum consisting of multiple individuals from the backup providers comes together. This procedure, necessitating broad consensus, ensures that decisions are not made hastily or unilaterally, thus enhancing customer security. Each member of the quorum uses their Ledger Nano device (with their own PIN) to approve the release, adding another layer of security against possible collusion or individual errors.

3. Signing HSM firmware code update: Before deploying a new firmware update to the HSMs, our product security team, the Ledger Donjon, conducts a comprehensive review process. As part of the firmware quorum, the Ledger Donjon ensures that no backdoors or malicious code have been introduced by a malicious insider or by a compromised development pipeline via a supply chain attack. That way, they maintain the integrity and security of the firmware update.

4. Signing Ledger devices (Nano & Stax) firmware code update: Much like the firmware for the HSMs, updates to our Ledger devices’ firmware go through a strict review process and require quorum approval before they are proposed to our users via Ledger Live.
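The note under item 1, that a single share is useless on its own, is the defining property of threshold secret sharing. Here is a minimal sketch of plain Shamir secret sharing over a prime field, assuming a 2-of-3 threshold for illustration. It is not Ledger Recover's production scheme (earlier parts of this series describe how shares are actually generated and protected), `split` and `combine` are hypothetical names, and it must not be used for real seeds.

```python
import secrets

# A prime larger than any secret we share (2**127 - 1 is a Mersenne prime).
PRIME = 2**127 - 1


def split(secret: int, threshold: int = 2, shares: int = 3):
    """Shamir secret sharing: points on a random polynomial whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    # One point per share holder, at x = 1, 2, ..., shares (never x = 0).
    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any two of the three points determine the line and therefore its value at x = 0. A single point is consistent with every possible secret, which is why one backup provider alone learns nothing.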

Wrapping up, quorums are an integral part of Ledger Recover’s security architecture. They play an important role in fortifying defenses against internal rogue threats and collusion during vital operations. Leveraging the top-notch security of Ledger devices and services, quorums help ensure trust and protect users’ digital assets against malicious insiders.

Monitoring critical components and operations

As we delve into this chapter, it’s important to note that, for security reasons, we’re only disclosing a subset of the extensive monitoring activities for the Ledger Recover service. While we stand by our commitment to transparency, we also recognize the importance of maintaining discretion around the details of the internal controls and monitoring for operational security.

At Ledger, security is our priority. It’s at the core of our solutions, which are built on robust cryptographic protocols as detailed in our Ledger Recover whitepaper. But our work continues beyond the creation of secure systems. We constantly monitor and assess our operations, looking for any suspicious activities. This continuous vigilance strengthens our security stance, ensuring we’re always ready to respond.

Let’s explore some examples of our multi-layered approach:

Monitoring Administrator Activities: We enforce stringent access control for our administrators. Not only do we require 2FA (Two-Factor Authentication) for all administrative connections to our infrastructure, but we also mandate multiple-person validation for administrator infrastructure access on critical parts of the system. Furthermore, our systems meticulously log and track every administrative activity. These logs are cross-referenced automatically with our internal ticketing systems to detect any unplanned actions. This cautious correlation enables us to promptly alert our security teams about any unusual or suspicious behavior, reinforcing our operational security.
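The correlation between administrative logs and the ticketing system can be sketched as follows: flag every admin action that is not covered by an approved ticket assigned to that admin for that time window. The `AdminAction` and `Ticket` record shapes and the `unplanned` function are hypothetical, since the article does not describe the actual log or ticket formats.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Ticket:
    ticket_id: str
    assignee: str
    start: datetime
    end: datetime
    approved: bool


@dataclass
class AdminAction:
    admin: str
    command: str
    at: datetime
    ticket_id: Optional[str] = None  # change ticket referenced by the admin, if any


def unplanned(actions, tickets):
    """Return actions not covered by an approved ticket assigned to that admin at that time."""
    flagged = []
    for action in actions:
        ticket = tickets.get(action.ticket_id) if action.ticket_id else None
        covered = (
            ticket is not None
            and ticket.approved
            and ticket.assignee == action.admin
            and ticket.start <= action.at <= ticket.end
        )
        if not covered:
            flagged.append(action)  # candidate for an alert to the security team
    return flagged
```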

Cross Control Among Backup Providers: Transparency and accountability form the basis of the relationships between the backup providers, Ledger, EscrowTech and Coincover. We’ve established a real-time exchange of logs used for system monitoring and security. This enables cross-verification of activities. If any inconsistency is detected, the service is immediately locked to protect users’ assets.
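One way to cross-verify logs between providers is to compare fingerprints of each provider's view of the same events and fail closed on any mismatch. Here is a minimal sketch, assuming each provider records the same cross-provider events in the same order. `digest` and `CrossControl` are hypothetical names, and the lock here is just a flag.

```python
import hashlib


def digest(events: list) -> str:
    """Order-sensitive fingerprint of one provider's view of the shared event stream."""
    h = hashlib.sha256()
    for e in events:
        # Hash each event first so that event boundaries cannot be shifted.
        h.update(hashlib.sha256(e.encode()).digest())
    return h.hexdigest()


class CrossControl:
    def __init__(self, providers: list):
        self.providers = providers
        self.locked = False

    def reconcile(self, views: dict) -> bool:
        """Compare each provider's log view; a missing view or any mismatch locks the service."""
        if set(views) != set(self.providers):
            self.locked = True
        elif len({digest(v) for v in views.values()}) != 1:
            self.locked = True
        # Once locked, the service stays locked (fail closed).
        return not self.locked
```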

Overseeing Exceptional Release Activity: The rare instances of manual share releases are meticulously controlled through a multi-quorum process as we explained in the previous section. After the execution of the Exceptional Release Activity, Ledger Recover systems proceed with comprehensive monitoring, including detailed logging and analysis of the parties involved, time of operation, and other relevant details. This process, involving both the multi-quorum execution and the post-action monitoring, ensures that the exceptional release of shares is tightly controlled at all stages of the decision-making process.

Leveraging Security Information and Event Management (SIEM): The SIEM solution forms a crucial part of the Ledger Recover monitoring strategy. This dedicated SIEM enhances the ability to identify and respond to potential security issues in real time. It’s fine-tuned to identify a variety of Indicators of Compromise (IoCs) based on cluster and Ledger Recover application logs, thanks to detection rules developed specifically for the Ledger Recover service. If a custom IoC is detected, the response is automatic and immediate: the entire cluster is locked until a thorough analysis is conducted. In the Ledger Recover service, confidentiality is prioritized over availability to ensure the utmost protection of users’ assets.
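The detect-then-lock behavior can be sketched as a list of rules evaluated against each parsed log event, where any match locks the cluster. The two rules below are invented examples of IoCs, and `Rule` and `Detector` are hypothetical names. Real SIEM rules are written in the SIEM's own query language and, as noted above, are not disclosed.

```python
from dataclasses import dataclass, field
from typing import Callable

Event = dict  # a parsed log line, e.g. {"action": "restore_share", "idv_ok": True}


@dataclass
class Rule:
    name: str
    matches: Callable[[Event], bool]


@dataclass
class Detector:
    rules: list
    locked: bool = False
    hits: list = field(default_factory=list)

    def ingest(self, event: Event) -> None:
        for rule in self.rules:
            if rule.matches(event):
                self.hits.append(rule.name)
                # Confidentiality over availability: lock on the first sign of compromise.
                self.locked = True

    def accepting_requests(self) -> bool:
        # Stays False until an operator clears the lock after analysis (not modeled here).
        return not self.locked


# Hypothetical IoC rules for illustration only.
RULES = [
    Rule("share restore without successful IDV",
         lambda e: e.get("action") == "restore_share" and not e.get("idv_ok", False)),
    Rule("interactive shell in application container",
         lambda e: e.get("source") == "container" and e.get("process") in {"/bin/sh", "/bin/bash"}),
]
```

Note the fail-closed choice: nothing in the sketch clears `locked`, mirroring the point that the cluster stays locked until a thorough analysis has been done.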

In the dynamic landscape of cybersecurity, we have strategized and prepared for various scenarios. Our threat model accounts for the unlikely situation where multiple infrastructure administrators from different backup providers might be compromised. With rigorous safeguards and automatic responses in place, the Ledger Recover service aims to ensure the continued security of users’ assets even in such extraordinary circumstances. In the following section, we’ll outline the comprehensive response measures built to tackle such hypothetical situations.

Ledger Recover-specific Incident Response

For the Ledger Recover service, an Incident Response strategy has been designed collaboratively with the three backup providers. A central part of this strategy is a set of automatic safeguards that immediately lock the entire system upon detecting suspicious activity in any part of the infrastructure.

In essence, an “always secure, never sorry” protocol has been engineered into the Ledger Recover service. Security is the number one priority, and it is a commitment that will never be compromised.

While we continuously strive to provide a seamless user experience to onboard the next 100 million people into Web3, we will never hesitate to activate these safeguards, effectively locking down the entire Ledger Recover service, if a potential threat arises. In our mission to protect, the choice between running a potentially compromised service and ensuring ultimate security is clear – we choose security.

Conclusion

Here we are at the end of the Operational Security part of this series. In this part, we’ve tried to answer any concerns you may have about how the security of the Ledger Recover system is ensured at the operational level. We talked about the infrastructure, the separation of duties, governance and monitoring, and finally the Incident Response strategy.

Thank you once again for reading all the way to this point! You should now have a comprehensive understanding of Ledger Recover’s operational security. The final part of this blog series will be about the last security concern we had: how did we manage our internal and external security audits to guarantee the maximum level of security to our users? Stay tuned!