GitHub Designing Data-Intensive Applications Site

Second, and more radically, GitHub implemented horizontal partitioning (sharding), supported by tooling such as gh-ost (GitHub's online schema migration tool for MySQL) and, later, a Vitess-inspired system. They split the massive issues and pull_requests tables by repository ID, so that all data for a single repository lives on one shard. This is a thoughtful choice: most queries (e.g., “list all issues in this repo”) are naturally local to a shard, which avoids costly distributed joins. The downside, as Kleppmann warns, is the loss of cross-shard transactional guarantees. For example, moving an issue from one repository to another can span two shards and so becomes a distributed operation, something GitHub handles with asynchronous workflows and idempotent retries.
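To make the trade-off concrete, here is a minimal in-memory sketch of sharding by repository ID, with a cross-shard move that is safe to retry. It is an illustration under stated assumptions, not GitHub's implementation: the shard count, the hash-based `shard_for_repo` function, the `ShardedIssueStore` class, and the operation-ID scheme are all hypothetical, and issue IDs are assumed to be globally unique.

```python
import hashlib
from dataclasses import dataclass, field

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for_repo(repo_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a repository ID to a shard index with a stable hash."""
    digest = hashlib.sha256(str(repo_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


@dataclass
class Shard:
    issues: dict = field(default_factory=dict)     # issue_id -> (repo_id, title)
    applied_ops: set = field(default_factory=set)  # (step, op_id) pairs already applied


class ShardedIssueStore:
    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard(self, repo_id: int) -> Shard:
        return self.shards[shard_for_repo(repo_id, len(self.shards))]

    def add_issue(self, repo_id: int, issue_id: int, title: str) -> None:
        self._shard(repo_id).issues[issue_id] = (repo_id, title)

    def list_issues(self, repo_id: int) -> list:
        # "List all issues in this repo" touches exactly one shard.
        shard = self._shard(repo_id)
        return sorted(i for i, (r, _) in shard.issues.items() if r == repo_id)

    def move_issue(self, op_id: str, issue_id: int, src_repo: int, dst_repo: int) -> None:
        """Move an issue between repositories; retrying with the same op_id is a no-op."""
        src, dst = self._shard(src_repo), self._shard(dst_repo)

        # Step 1: write the issue to the destination shard, at most once per op_id.
        if ("copy", op_id) not in dst.applied_ops:
            _, title = src.issues[issue_id]
            dst.issues[issue_id] = (dst_repo, title)
            dst.applied_ops.add(("copy", op_id))

        # Step 2: remove the old record from the source shard, at most once per op_id.
        # If both repos share a shard, step 1 already overwrote the record,
        # so only delete when it still belongs to the source repository.
        if ("delete", op_id) not in src.applied_ops:
            record = src.issues.get(issue_id)
            if record is not None and record[0] == src_repo:
                del src.issues[issue_id]
            src.applied_ops.add(("delete", op_id))


store = ShardedIssueStore()
store.add_issue(1, 101, "Crash on startup")
store.add_issue(1, 102, "Typo in README")
store.add_issue(2, 201, "Add dark mode")

store.move_issue("op-1", 101, src_repo=1, dst_repo=2)
store.move_issue("op-1", 101, src_repo=1, dst_repo=2)  # a retry changes nothing
print(store.list_issues(1), store.list_issues(2))
```

There is no transaction spanning the two shards here. Between step 1 and step 2 the issue is briefly visible in both repositories, and correctness relies on each step being repeatable until the whole operation completes. That window is the cost of giving up cross-shard atomicity.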

Data processing frameworks are used to transform and analyze data.

This draft gives an overview of designing data-intensive applications, covering key concepts, principles, and best practices. It is inspired by Martin Kleppmann's book "Designing Data-Intensive Applications" and is intended as a guide for software engineers and architects building scalable and fault-tolerant systems.

Search GitHub for Raft or Paxos implementations. The etcd repository is a fantastic place to see Raft in action: the consensus algorithm keeps a replicated log consistent across nodes, so the distributed system maintains a single source of truth.
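Before reading a full implementation, it can help to see one Raft rule in isolation. The sketch below shows how a leader might decide which log entries are committed: an entry counts as committed once it is stored on a majority of servers and belongs to the leader's current term. This is a simplified illustration of the rule described in the Raft paper, not code from etcd, and the function and parameter names are hypothetical.

```python
def majority_match_index(match_index: list) -> int:
    """Highest log index stored on a majority of servers.

    match_index holds, for every server including the leader itself,
    the highest log index known to be replicated on that server.
    """
    majority = len(match_index) // 2 + 1
    return sorted(match_index, reverse=True)[majority - 1]


def advance_commit_index(commit_index: int, match_index: list,
                         log_terms: list, current_term: int) -> int:
    """Return the leader's new commit index (unchanged if nothing can be committed).

    log_terms[i] is the term of the entry at Raft index i + 1 (Raft indexes from 1).
    """
    candidate = majority_match_index(match_index)
    # A leader only commits entries from its own term by counting replicas;
    # earlier entries become committed indirectly once such an entry commits.
    if candidate > commit_index and log_terms[candidate - 1] == current_term:
        return candidate
    return commit_index


# Five servers: entries up to index 4 are on three of them (a majority).
print(advance_commit_index(2, [5, 5, 4, 2, 1], [1, 1, 2, 2, 2], current_term=2))
```

Because terms never decrease along the log, checking only the highest majority-replicated index is enough: if that entry is from an older term, no lower index can be from the current term either.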

As GitHub's user base expanded, the company encountered issues with data retrieval and processing. The platform's search functionality, for instance, was slow and often returned incomplete results. The team struggled to keep up with the increasing demands on their databases, which led to performance degradation and timeouts.

In the modern software landscape, the challenge has shifted from limited computing power to the overwhelming volume, complexity, and velocity of data. Martin Kleppmann’s seminal book, Designing Data-Intensive Applications (DDIA), has become the "bible" for engineers navigating this shift.